K-fold cross validation with a pandas DataFrame in Python

You need DataFrame.iloc to select rows by position:

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((10,5)), columns=list('ABCDE'))
df.index = df.index * 10
print (df)
           A         B         C         D         E
0   0.543405  0.278369  0.424518  0.844776  0.004719
10  0.121569  0.670749  0.825853  0.136707  0.575093
20  0.891322  0.209202  0.185328  0.108377  0.219697
30  0.978624  0.811683  0.171941  0.816225  0.274074
40  0.431704  0.940030  0.817649  0.336112  0.175410
50  0.372832  0.005689  0.252426  0.795663  0.015255
60  0.598843  0.603805  0.105148  0.381943  0.036476
70  0.890412  0.980921  0.059942  0.890546  0.576901
80  0.742480  0.630184  0.581842  0.020439  0.210027
90  0.544685  0.769115  0.250695  0.285896  0.852395

from sklearn.model_selection import KFold

# shuffle the rows and fix the random state so the split is reproducible
kf = KFold(n_splits=5, shuffle=True, random_state=2)
result = next(kf.split(df), None)  # first (train, test) pair of positional indices
print (result)
(array([0, 2, 3, 5, 6, 7, 8, 9]), array([1, 4]))

train = df.iloc[result[0]]
test = df.iloc[result[1]]

print (train)
           A         B         C         D         E
0   0.543405  0.278369  0.424518  0.844776  0.004719
20  0.891322  0.209202  0.185328  0.108377  0.219697
30  0.978624  0.811683  0.171941  0.816225  0.274074
50  0.372832  0.005689  0.252426  0.795663  0.015255
60  0.598843  0.603805  0.105148  0.381943  0.036476
70  0.890412  0.980921  0.059942  0.890546  0.576901
80  0.742480  0.630184  0.581842  0.020439  0.210027
90  0.544685  0.769115  0.250695  0.285896  0.852395

print (test)
           A         B         C         D         E
10  0.121569  0.670749  0.825853  0.136707  0.575093
40  0.431704  0.940030  0.817649  0.336112  0.175410
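
If you need all of the folds rather than just the first one, you can loop over kf.split(df) directly. This is a minimal sketch continuing the example above; it only collects the subsets, no model is fitted here:

# kf.split yields positional indices for every fold, so .iloc is used
# even though the DataFrame index is 0, 10, 20, ...
for fold, (train_idx, test_idx) in enumerate(kf.split(df)):
    train = df.iloc[train_idx]
    test = df.iloc[test_idx]
    print(f'fold {fold}: train rows {list(train.index)}, test rows {list(test.index)}')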

class sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)

K-Folds cross-validator

Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).

Each fold is then used once as a validation while the k - 1 remaining folds form the training set.

Read more in the User Guide.

Parameters:

n_splits : int, default=5

Number of folds. Must be at least 2.

Changed in version 0.22: n_splits default value changed from 3 to 5.

shuffle : bool, default=False

Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled.

random_state : int, RandomState instance or None, default=None

When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See Glossary.
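
To see what shuffle and random_state actually change, compare the test indices produced with and without shuffling. A small illustrative sketch on ten dummy samples (not part of the official docs):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten dummy samples

# shuffle=False (the default): each test fold is a consecutive block
for _, test_idx in KFold(n_splits=5).split(X):
    print(test_idx)  # [0 1], [2 3], [4 5], [6 7], [8 9]

# shuffle=True with a fixed random_state: the folds are shuffled,
# but every call to split() returns the same partition
kf = KFold(n_splits=5, shuffle=True, random_state=0)
first = [test for _, test in kf.split(X)]
second = [test for _, test in kf.split(X)]
print(all(np.array_equal(a, b) for a, b in zip(first, second)))  # True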

See also

StratifiedKFold

Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).

GroupKFold

K-fold iterator variant with non-overlapping groups.

RepeatedKFold

Repeats K-Fold n times.
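
As a toy illustration of why the StratifiedKFold variant listed above exists (the data here is made up): with an imbalanced, unshuffled label vector, plain KFold puts both minority-class samples into the same test fold, while StratifiedKFold keeps one in each fold.

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.zeros((8, 1))
y = np.array([1, 1, 0, 0, 0, 0, 0, 0])  # imbalanced: two positives, six negatives

for name, cv in [('KFold', KFold(n_splits=2)),
                 ('StratifiedKFold', StratifiedKFold(n_splits=2))]:
    for _, test_idx in cv.split(X, y):
        # unshuffled KFold puts both positives in the first test fold;
        # StratifiedKFold puts one positive in each test fold
        print(name, 'test labels:', y[test_idx])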

Notes

The first n_samples % n_splits folds have size n_samples // n_splits + 1, other folds have size n_samples // n_splits, where n_samples is the number of samples.

Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
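
The fold-size rule from the first note can be checked directly: with n_samples = 10 and n_splits = 3, the first 10 % 3 = 1 fold gets 10 // 3 + 1 = 4 samples and the remaining two folds get 3 each. A quick check (not from the official docs):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)
sizes = [len(test_idx) for _, test_idx in KFold(n_splits=3).split(X)]
print(sizes)  # [4, 3, 3]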

Examples

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]

Methods

get_n_splits([X, y, groups])

Returns the number of splitting iterations in the cross-validator

split(X[, y, groups])

Generate indices to split data into training and test set.

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator

Parameters:

X : object

Always ignored, exists for compatibility.

y : object

Always ignored, exists for compatibility.

groups : object

Always ignored, exists for compatibility.

Returns:

n_splits : int

Returns the number of splitting iterations in the cross-validator.

split(X, y=None, groups=None)

Generate indices to split data into training and test set.

Parameters:

X : array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,), default=None

The target variable for supervised learning problems.

groups : array-like of shape (n_samples,), default=None

Group labels for the samples used while splitting the dataset into train/test set.

Yields:

train : ndarray

The training set indices for that split.

test : ndarray

The testing set indices for that split.


How do you calculate k-fold cross validation?

Below are the steps for it (a sketch following the list shows them in code):

1. Randomly split your entire dataset into k "folds".
2. For each fold in your dataset, build your model on the other k - 1 folds of the dataset.
3. Record the error you see on each of the predictions.
4. Repeat this until each of the k folds has served as the test set.
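
A minimal sketch of those steps; the LinearRegression model, the synthetic data, and the mean-squared-error metric are arbitrary stand-ins, not prescribed by the recipe:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                                      # toy features
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.rand(100)   # toy target

kf = KFold(n_splits=5, shuffle=True, random_state=0)      # step 1: random k folds
errors = []
for train_idx, test_idx in kf.split(X):                   # step 4: repeat for each fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # step 2: build on k - 1 folds
    pred = model.predict(X[test_idx])
    errors.append(mean_squared_error(y[test_idx], pred))        # step 3: record the error

print(errors)  # one error per fold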

How do you do k-fold cross-validation?

k-Fold cross-validation (a shortcut using cross_val_score is sketched after this list):

1. Pick a number of folds, k.
2. Split the dataset into k equal (if possible) parts (they are called folds).
3. Choose k - 1 folds as the training set.
4. Train the model on the training set.
5. Validate on the test set.
6. Save the result of the validation.
7. Repeat steps 3-6 k times.
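
Steps 3 through 7 are what sklearn's cross_val_score does in a single call. A minimal sketch; the estimator and the synthetic data are arbitrary choices for illustration:

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
X = rng.rand(50, 2)
y = X.sum(axis=1) + 0.1 * rng.rand(50)

kf = KFold(n_splits=5, shuffle=True, random_state=1)        # steps 1-2: pick k and split
scores = cross_val_score(LinearRegression(), X, y, cv=kf)   # steps 3-7: train, validate, save, repeat
print(scores)         # one score (R^2 for a regressor) per fold
print(scores.mean())  # common summary of the k results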

How do you implement K fold in Python?

K-Fold Cross Validation in Python (Step-by-Step):

1. Randomly divide a dataset into k groups, or "folds", of roughly equal size.
2. Choose one of the folds to be the holdout set and fit the model on the remaining folds.
3. Repeat this process k times, using a different fold each time as the holdout set.

What is KFold in Python?

K-Folds cross-validator. Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation while the k - 1 remaining folds form the training set. Read more in the User Guide.