@@ -41,59 +41,67 @@ data in *folds* that we use for training and testing::

     >>> print(scores)
     [0.93489148580968284, 0.95659432387312182, 0.93989983305509184]

-.. currentmodule:: sklearn.cross_validation
+.. currentmodule:: sklearn.model_selection

 This is called a :class:`KFold` cross validation

 .. _cv_generators_tut:

-Cross-validation generators
-=============================
+Cross-validation classes
+========================


-The code above to split data in train and test sets is tedious to write.
-Scikit-learn exposes cross-validation generators to generate list
-of indices for this purpose::
+The above code, to split data into train and test sets, is tedious to write.
+Scikit-learn provides a set of classes that can be used to generate lists
+of train/test indices based on popular cross-validation strategies.

-    >>> from sklearn import cross_validation
-    >>> k_fold = cross_validation.KFold(n=6, n_folds=3)
-    >>> for train_indices, test_indices in k_fold:
+These classes expose a ``split`` method that generates the train/test
+indices. In the following example, we use dummy values for ``X`` to get
+the train/test indices for a 3-fold cross-validation strategy::
+
+    >>> from sklearn.model_selection import KFold
+    >>> import numpy as np
+    >>> k_fold = KFold(n_folds=3)
+    >>> for train_indices, test_indices in k_fold.split(X=np.ones(6)):
     ...      print('Train: %s | test: %s' % (train_indices, test_indices))
     Train: [2 3 4 5] | test: [0 1]
     Train: [0 1 4 5] | test: [2 3]
     Train: [0 1 2 3] | test: [4 5]

-The cross-validation can then be implemented easily::
+Using these indices, cross-validation can then be implemented easily::

-    >>> kfold = cross_validation.KFold(len(X_digits), n_folds=3)
-    >>> [svc.fit(X_digits[train], y_digits[train]).score(X_digits[test], y_digits[test])
-    ...          for train, test in kfold]
+    >>> kfold = KFold(n_folds=3)
+    >>> [svc.fit(X_digits[train], y_digits[train]).score(
+    ...         X_digits[test], y_digits[test])
+    ...  for train, test in kfold.split(X_digits)]
     [0.93489148580968284, 0.95659432387312182, 0.93989983305509184]

 To compute the ``score`` method of an estimator, the sklearn exposes
 a helper function::

-    >>> cross_validation.cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=-1)
+    >>> from sklearn.model_selection import cross_val_score
+    >>> cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=-1)
     array([ 0.93489149,  0.95659432,  0.93989983])

 `n_jobs=-1` means that the computation will be dispatched on all the CPUs
 of the computer.

-**Cross-validation generators**
+**Cross-validation classes**


 .. list-table::

    *

-    - :class:`KFold` **(n, k)**
+    - :class:`KFold` **(k)**

-    - :class:`StratifiedKFold` **(y, k)**
+    - :class:`StratifiedKFold` **(k)**

-    - :class:`LeaveOneOut` **(n)**
+    - :class:`LeaveOneOut` **()**

-    - :class:`LeaveOneLabelOut` **(labels)**
+    - :class:`LeaveOneLabelOut` **()**

    *
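
A minimal, self-contained sketch of the ``split``-based pattern the hunk
above introduces. It assumes a released scikit-learn (>= 0.18), where the
constructor parameter is named ``n_splits`` rather than the ``n_folds``
used in this development-era diff::

    from sklearn.model_selection import KFold
    import numpy as np

    # KFold is configured with the number of folds only; the data is
    # passed later to split(), which yields (train, test) index arrays.
    X = np.ones(6)
    k_fold = KFold(n_splits=3)
    for train_indices, test_indices in k_fold.split(X):
        print('Train: %s | test: %s' % (train_indices, test_indices))
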
@@ -132,14 +140,14 @@ Grid-search and cross-validated estimators
 Grid-search
 -------------

-.. currentmodule:: sklearn.grid_search
+.. currentmodule:: sklearn.model_selection

 The sklearn provides an object that, given data, computes the score
 during the fit of an estimator on a parameter grid and chooses the
 parameters to maximize the cross-validation score. This object takes an
 estimator during the construction and exposes an estimator API::

-    >>> from sklearn.grid_search import GridSearchCV
+    >>> from sklearn.model_selection import GridSearchCV
     >>> Cs = np.logspace(-6, -1, 10)
     >>> clf = GridSearchCV(estimator=svc, param_grid=dict(C=Cs),
     ...                    n_jobs=-1)
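
For context on the hunk above, a short sketch of the ``GridSearchCV``
estimator API now imported from ``sklearn.model_selection``. The dataset
loading and the ``best_score_``/``best_params_`` inspection are
illustrative assumptions, not part of the diff::

    import numpy as np
    from sklearn import datasets, svm
    from sklearn.model_selection import GridSearchCV

    X_digits, y_digits = datasets.load_digits(return_X_y=True)
    svc = svm.SVC(kernel='linear')

    # GridSearchCV behaves like an estimator: fit() runs an internal
    # cross-validation for every candidate C and keeps the best model.
    Cs = np.logspace(-6, -1, 10)
    clf = GridSearchCV(estimator=svc, param_grid=dict(C=Cs), n_jobs=-1)
    clf.fit(X_digits, y_digits)
    print(clf.best_score_)
    print(clf.best_params_)
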
@@ -163,8 +171,8 @@ a stratified 3-fold.

 ::

-    >>> cross_validation.cross_val_score(clf, X_digits, y_digits)
-    ...                                  # doctest: +ELLIPSIS
+    >>> cross_val_score(clf, X_digits, y_digits)
+    ...                 # doctest: +ELLIPSIS
     array([ 0.938...,  0.963...,  0.944...])

 Two cross-validation loops are performed in parallel: one by the
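
The last context line refers to nested cross-validation: ``cross_val_score``
runs an outer loop over folds while each ``GridSearchCV`` fit runs its own
inner loop on its training fold. A minimal sketch, assuming the digits
dataset and the parameter grid from the snippets above::

    import numpy as np
    from sklearn import datasets, svm
    from sklearn.model_selection import GridSearchCV, cross_val_score

    X_digits, y_digits = datasets.load_digits(return_X_y=True)
    clf = GridSearchCV(svm.SVC(kernel='linear'),
                       param_grid={'C': np.logspace(-6, -1, 10)})

    # Outer loop: cross_val_score splits the data into folds.
    # Inner loop: each GridSearchCV fit re-runs cross-validation on its
    # training fold to choose C, so the outer scores are not biased by
    # the parameter search.
    print(cross_val_score(clf, X_digits, y_digits))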