You need DataFrame.iloc
to select rows by position:
Sample:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((10, 5)), columns=list('ABCDE'))
# make the index labels differ from the row positions
df.index = df.index * 10
print(df)
A B C D E
0 0.543405 0.278369 0.424518 0.844776 0.004719
10 0.121569 0.670749 0.825853 0.136707 0.575093
20 0.891322 0.209202 0.185328 0.108377 0.219697
30 0.978624 0.811683 0.171941 0.816225 0.274074
40 0.431704 0.940030 0.817649 0.336112 0.175410
50 0.372832 0.005689 0.252426 0.795663 0.015255
60 0.598843 0.603805 0.105148 0.381943 0.036476
70 0.890412 0.980921 0.059942 0.890546 0.576901
80 0.742480 0.630184 0.581842 0.020439 0.210027
90 0.544685 0.769115 0.250695 0.285896 0.852395
from sklearn.model_selection import KFold
# added some parameters
kf = KFold(n_splits=5, shuffle=True, random_state=2)
# take the first (train, test) pair of positional indices
result = next(kf.split(df), None)
print(result)
(array([0, 2, 3, 5, 6, 7, 8, 9]), array([1, 4]))
train = df.iloc[result[0]]
test = df.iloc[result[1]]
print(train)
A B C D E
0 0.543405 0.278369 0.424518 0.844776 0.004719
20 0.891322 0.209202 0.185328 0.108377 0.219697
30 0.978624 0.811683 0.171941 0.816225 0.274074
50 0.372832 0.005689 0.252426 0.795663 0.015255
60 0.598843 0.603805 0.105148 0.381943 0.036476
70 0.890412 0.980921 0.059942 0.890546 0.576901
80 0.742480 0.630184 0.581842 0.020439 0.210027
90 0.544685 0.769115 0.250695 0.285896 0.852395
print(test)
A B C D E
10 0.121569 0.670749 0.825853 0.136707 0.575093
40 0.431704 0.940030 0.817649 0.336112 0.175410
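The reason iloc is needed here can be sketched as follows: KFold.split returns row positions, while the sample index holds the labels 0, 10, ..., 90, so label-based selection with loc does not find them. This is a minimal sketch that reuses the sample data; the names train_pos, test_pos, loc_failed and test_via_labels are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

np.random.seed(100)
df = pd.DataFrame(np.random.random((10, 5)), columns=list("ABCDE"))
df.index = df.index * 10  # labels 0, 10, ..., 90 no longer match positions

kf = KFold(n_splits=5, shuffle=True, random_state=2)
train_pos, test_pos = next(kf.split(df))  # arrays of row positions

# Positional selection works regardless of the index labels.
test = df.iloc[test_pos]

# Label-based selection looks the positions up as labels and fails
# when they are not present in the index.
try:
    df.loc[test_pos]
    loc_failed = False
except KeyError:
    loc_failed = True

# Equivalent label-based route: translate positions to labels first.
test_via_labels = df.loc[df.index[test_pos]]
```

If label-based selection is preferred, translating through df.index as in the last line gives the same rows as iloc.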
K-Folds cross-validator
Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds (without shuffling by default).
Each fold is then used once as a validation while the k - 1 remaining folds form the training set.
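A small check of this property, assuming a toy array X made up for illustration: across all splits, every sample should land in the test (validation) fold exactly once, and the train and test indices of a split should never overlap.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
kf = KFold(n_splits=5)

test_counts = np.zeros(len(X), dtype=int)
overlaps = []
for train_idx, test_idx in kf.split(X):
    # Record how many indices appear in both train and test for this split.
    overlaps.append(np.intersect1d(train_idx, test_idx).size)
    # Count how often each sample is used for validation.
    test_counts[test_idx] += 1
```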
Read more in the User Guide.
Parameters:
n_splits : int, default=5
    Number of folds. Must be at least 2.
    Changed in version 0.22: n_splits default value changed from 3 to 5.
shuffle : bool, default=False
    Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled.
random_state : int, RandomState instance or None, default=None
    When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See Glossary.
See also
StratifiedKFold
Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
GroupKFold
K-fold iterator variant with non-overlapping groups.
RepeatedKFold
Repeats K-Fold n times.
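To illustrate the StratifiedKFold entry above, the sketch below compares per-fold class counts for KFold and StratifiedKFold. The labels y are a hypothetical toy example (8 samples of class 0, then 4 of class 1), chosen so the difference is easy to see.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 8 samples of class 0 followed by 4 of class 1.
y = np.array([0] * 8 + [1] * 4)
X = np.zeros((len(y), 1))

# Plain KFold ignores y, so unshuffled test folds simply follow the row order.
kfold_counts = [np.bincount(y[test], minlength=2).tolist()
                for _, test in KFold(n_splits=4).split(X)]

# StratifiedKFold uses y to keep the class ratio similar in every test fold.
strat_counts = [np.bincount(y[test], minlength=2).tolist()
                for _, test in StratifiedKFold(n_splits=4).split(X, y)]
```

Each entry is [count of class 0, count of class 1] in one test fold. With plain KFold some folds contain only one class, while the stratified folds keep the 2:1 ratio.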
Notes
The first n_samples % n_splits folds have size n_samples // n_splits + 1, other folds have size n_samples // n_splits, where n_samples is the number of samples.
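The fold-size rule can be checked with a quick sketch. The values n_samples = 10 and n_splits = 3 are arbitrary choices for illustration: 10 % 3 = 1, so one fold should have 10 // 3 + 1 = 4 samples and the remaining two should have 3.

```python
import numpy as np
from sklearn.model_selection import KFold

# Arbitrary illustrative values.
n_samples, n_splits = 10, 3
X = np.zeros((n_samples, 1))

# Size of each test fold, in the order the splits are generated.
fold_sizes = [len(test_idx) for _, test_idx in KFold(n_splits=n_splits).split(X)]

# Sizes predicted by the rule in the Notes.
n_larger = n_samples % n_splits
expected = ([n_samples // n_splits + 1] * n_larger
            + [n_samples // n_splits] * (n_splits - n_larger))
```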
Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
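A minimal sketch of the reproducibility point, assuming a toy array X and a helper shuffled_test_folds that is introduced here only for illustration: two splitters built with the same integer random_state should produce identical folds.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples, 1 feature.
X = np.arange(10).reshape(-1, 1)

def shuffled_test_folds(random_state):
    """Return the test indices of every fold for a shuffled 5-fold split."""
    kf = KFold(n_splits=5, shuffle=True, random_state=random_state)
    return [test_idx.tolist() for _, test_idx in kf.split(X)]

# Same integer seed, two independent splitters: the folds match.
folds_a = shuffled_test_folds(0)
folds_b = shuffled_test_folds(0)
```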
Examples
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Methods
| get_n_splits(X=None, y=None, groups=None) | Returns the number of splitting iterations in the cross-validator |
| split(X, y=None, groups=None) | Generate indices to split data into training and test set. |
get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator
Parameters:
X : object
    Always ignored, exists for compatibility.
y : object
    Always ignored, exists for compatibility.
groups : object
    Always ignored, exists for compatibility.
Returns:
n_splits : int
    Returns the number of splitting iterations in the cross-validator.
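Because the arguments of get_n_splits are ignored, the result depends only on how the splitter was constructed. A short sketch, with n_splits=4 chosen arbitrarily:

```python
from sklearn.model_selection import KFold

kf = KFold(n_splits=4)

# The arguments exist only for API compatibility, so they can be omitted.
n_without_args = kf.get_n_splits()

# Passing data makes no difference to the result.
n_with_args = kf.get_n_splits([[0], [1], [2], [3]])
```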
split(X, y=None, groups=None)
Generate indices to split data into training and test set.
Parameters:
X : array-like of shape (n_samples, n_features)
    Training data, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,), default=None
    The target variable for supervised learning problems.
groups : array-like of shape (n_samples,), default=None
    Group labels for the samples used while splitting the dataset into train/test set.
Yields:
train : ndarray
    The training set indices for that split.
test : ndarray
    The testing set indices for that split.