Cross-validation: evaluating estimator performance
===================================================

- .. currentmodule:: sklearn.cross_validation
+ .. currentmodule:: sklearn.model_selection

Learning the parameters of a prediction function and testing it on the
same data is a methodological mistake: a model that would just repeat
@@ -24,7 +24,7 @@ can be quickly computed with the :func:`train_test_split` helper function.
Let's load the iris data set to fit a linear support vector machine on it::

>>> import numpy as np
- >>> from sklearn import cross_validation
+ >>> from sklearn.model_selection import train_test_split
>>> from sklearn import datasets
>>> from sklearn import svm

@@ -35,7 +35,7 @@ Let's load the iris data set to fit a linear support vector machine on it::
We can now quickly sample a training set while holding out 40% of the
data for testing (evaluating) our classifier::

- >>> X_train, X_test, y_train, y_test = cross_validation.train_test_split(
+ >>> X_train, X_test, y_train, y_test = train_test_split(
... iris.data, iris.target, test_size=0.4, random_state=0)

>>> X_train.shape, y_train.shape
@@ -101,10 +101,9 @@ kernel support vector machine on the iris dataset by splitting the data, fitting
a model and computing the score 5 consecutive times (with different splits each
time)::

+ >>> from sklearn.model_selection import cross_val_score
>>> clf = svm.SVC(kernel='linear', C=1)
- >>> scores = cross_validation.cross_val_score(
- ... clf, iris.data, iris.target, cv=5)
- ...
+ >>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores # doctest: +ELLIPSIS
array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])

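The per-fold scores returned above are commonly summarised by their mean and twice their standard deviation; a minimal sketch, reusing the ``scores`` array from the snippet above (output not asserted here)::

>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))  # doctest: +SKIP
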
@@ -119,8 +118,8 @@ method of the estimator. It is possible to change this by using the
scoring parameter::

>>> from sklearn import metrics
- >>> scores = cross_validation.cross_val_score(clf, iris.data, iris.target,
- ... cv=5, scoring='f1_weighted')
+ >>> scores = cross_val_score(
+ ... clf, iris.data, iris.target, cv=5, scoring='f1_weighted')
>>> scores # doctest: +ELLIPSIS
array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])

@@ -136,11 +135,11 @@ being used if the estimator derives from :class:`ClassifierMixin
It is also possible to use other cross validation strategies by passing a cross
validation iterator instead, for instance::

- >>> n_samples = iris.data.shape[0]
- >>> cv = cross_validation.ShuffleSplit(n_samples, n_iter=3,
- ... test_size=0.3, random_state=0)
+ >>> from sklearn.model_selection import ShuffleSplit

- >>> cross_validation.cross_val_score(clf, iris.data, iris.target, cv=cv)
+ >>> n_samples = iris.data.shape[0]
+ >>> cv = ShuffleSplit(n_iter=3, test_size=0.3, random_state=0)
+ >>> cross_val_score(clf, iris.data, iris.target, cv=cv)
... # doctest: +ELLIPSIS
array([ 0.97..., 0.97..., 1. ])

@@ -153,7 +152,7 @@ validation iterator instead, for instance::
be learnt from a training set and applied to held-out data for prediction::

>>> from sklearn import preprocessing
- >>> X_train, X_test, y_train, y_test = cross_validation.train_test_split(
+ >>> X_train, X_test, y_train, y_test = train_test_split(
... iris.data, iris.target, test_size=0.4, random_state=0)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train_transformed = scaler.transform(X_train)
@@ -167,7 +166,7 @@ validation iterator instead, for instance::

>>> from sklearn.pipeline import make_pipeline
>>> clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
- >>> cross_validation.cross_val_score(clf, iris.data, iris.target, cv=cv)
+ >>> cross_val_score(clf, iris.data, iris.target, cv=cv)
... # doctest: +ELLIPSIS
array([ 0.97..., 0.93..., 0.95...])

@@ -184,8 +183,8 @@ can be used (otherwise, an exception is raised).

These predictions can then be used to evaluate the classifier::

- >>> predicted = cross_validation.cross_val_predict(clf, iris.data,
- ... iris.target, cv=10)
+ >>> from sklearn.model_selection import cross_val_predict
+ >>> predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
>>> metrics.accuracy_score(iris.target, predicted) # doctest: +ELLIPSIS
0.966...

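Since ``cross_val_predict`` returns one prediction per sample, any metric that compares predictions with the ground truth can be applied; a rough sketch reusing ``predicted`` from the example above (output omitted)::

>>> metrics.confusion_matrix(iris.target, predicted)  # doctest: +SKIP
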
@@ -223,10 +222,11 @@ learned using :math:`k - 1` folds, and the fold left out is used for test.
Example of 2-fold cross-validation on a dataset with 4 samples::

>>> import numpy as np
- >>> from sklearn.cross_validation import KFold
+ >>> from sklearn.model_selection import KFold

- >>> kf = KFold(4, n_folds=2)
- >>> for train, test in kf:
+ >>> X = np.ones(4)
+ >>> kf = KFold(n_folds=2)
+ >>> for train, test in kf.split(X):
... print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]
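
Each split yields two index arrays, so the corresponding training and test data can be built with NumPy fancy indexing; a minimal sketch, assuming the ``X`` array and the last ``train``/``test`` pair from the loop above::

>>> X_train, X_test = X[train], X[test]
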
@@ -250,11 +250,12 @@ target class as the complete set.
Example of stratified 3-fold cross-validation on a dataset with 10 samples from
two slightly unbalanced classes::

- >>> from sklearn.cross_validation import StratifiedKFold
+ >>> from sklearn.model_selection import StratifiedKFold

- >>> labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
- >>> skf = StratifiedKFold(labels, 3)
- >>> for train, test in skf:
+ >>> X = np.ones(10)
+ >>> y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
+ >>> skf = StratifiedKFold(n_folds=3)
+ >>> for train, test in skf.split(X, y):
... print("%s %s" % (train, test))
[2 3 6 7 8 9] [0 1 4 5]
[0 1 3 4 5 8 9] [2 6 7]
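
The stratification can be checked by counting how many samples of each class end up in a training split; a minimal sketch reusing ``y`` and the last ``train`` array from the loop above (output not asserted)::

>>> np.bincount(np.asarray(y)[train])  # doctest: +SKIP
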
@@ -271,10 +272,12 @@ training sets and :math:`n` different tests set. This cross-validation
procedure does not waste much data as only one sample is removed from the
training set::

- >>> from sklearn.cross_validation import LeaveOneOut
+ >>> from sklearn.model_selection import LeaveOneOut

- >>> loo = LeaveOneOut(4)
- >>> for train, test in loo:
+ >>> n_samples = 4
+ >>> X = np.ones(n_samples)
+ >>> loo = LeaveOneOut()
+ >>> for train, test in loo.split(X):
... print("%s %s" % (train, test))
[1 2 3] [0]
[0 2 3] [1]
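
As stated above, LOO produces exactly as many splits as there are samples; a minimal sketch, reusing ``loo`` and ``X`` from the example::

>>> len(list(loo.split(X)))
4
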
@@ -329,10 +332,11 @@ overlap for :math:`p > 1`.

Example of Leave-2-Out on a dataset with 4 samples::

- >>> from sklearn.cross_validation import LeavePOut
+ >>> from sklearn.model_selection import LeavePOut

- >>> lpo = LeavePOut(4, p=2)
- >>> for train, test in lpo:
+ >>> X = np.ones(4)
+ >>> lpo = LeavePOut(p=2)
+ >>> for train, test in lpo.split(X):
... print("%s %s" % (train, test))
[2 3] [0 1]
[1 3] [0 2]
@@ -357,11 +361,13 @@ For example, in the cases of multiple experiments, *LOLO* can be used to
create a cross-validation based on the different experiments: we create
a training set using the samples of all the experiments except one::

- >>> from sklearn.cross_validation import LeaveOneLabelOut
+ >>> from sklearn.model_selection import LeaveOneLabelOut

+ >>> X = [1, 5, 10, 50]
+ >>> y = [0, 1, 1, 2]
>>> labels = [1, 1, 2, 2]
- >>> lolo = LeaveOneLabelOut(labels)
- >>> for train, test in lolo:
+ >>> lolo = LeaveOneLabelOut()
+ >>> for train, test in lolo.split(X, y, labels):
... print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]
@@ -389,11 +395,13 @@ samples related to :math:`P` labels for each training/test set.

Example of Leave-2-Label Out::

- >>> from sklearn.cross_validation import LeavePLabelOut
+ >>> from sklearn.model_selection import LeavePLabelOut

+ >>> X = np.arange(6)
+ >>> y = [1, 1, 1, 2, 2, 2]
>>> labels = [1, 1, 2, 2, 3, 3]
- >>> lplo = LeavePLabelOut(labels, p=2)
- >>> for train, test in lplo:
+ >>> lplo = LeavePLabelOut(n_labels=2)
+ >>> for train, test in lplo.split(X, y, labels):
... print("%s %s" % (train, test))
[4 5] [0 1 2 3]
[2 3] [0 1 4 5]
@@ -416,9 +424,9 @@ generator.

Here is a usage example::

- >>> ss = cross_validation.ShuffleSplit(5, n_iter=3, test_size=0.25,
- ... random_state=0)
- >>> for train_index, test_index in ss:
+ >>> X = np.arange(5)
+ >>> ss = ShuffleSplit(n_iter=3, test_size=0.25, random_state=0)
+ >>> for train_index, test_index in ss.split(X):
... print("%s %s" % (train_index, test_index))
...
[1 3 4] [2 0]
@@ -480,4 +488,4 @@ Cross validation and model selection

Cross validation iterators can also be used to directly perform model
selection using Grid Search for the optimal hyperparameters of the
- model. This is the topic if the next section: :ref:`grid_search`.
+ model. This is the topic of the next section: :ref:`grid_search`.
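
As a rough sketch of that idea (assuming :class:`GridSearchCV` is also exposed from ``sklearn.model_selection`` after this refactoring, and with an illustrative parameter grid only; ``svm`` and ``iris`` come from the examples above)::

>>> from sklearn.model_selection import GridSearchCV  # doctest: +SKIP
>>> param_grid = {'C': [0.1, 1, 10]}
>>> cv = ShuffleSplit(n_iter=3, test_size=0.3, random_state=0)
>>> search = GridSearchCV(svm.SVC(kernel='linear'), param_grid, cv=cv)  # doctest: +SKIP
>>> search.fit(iris.data, iris.target)  # doctest: +SKIP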