DOC/FIX update doc to refer to model_selection · raghavrv/scikit-learn@a545a4f · GitHub
Commit a545a4f

DOC/FIX update doc to refer to model_selection
FIX/DOC import ShuffleSplit from model_selection
FIX/DOC p --> n_labels
1 parent 5a308b2 commit a545a4f

8 files changed, with 110 additions and 127 deletions.

doc/modules/classes.rst

Lines changed: 35 additions & 61 deletions
@@ -143,43 +143,58 @@ Classes
 covariance.oas
 covariance.graph_lasso

+.. _model_selection_ref:

-.. _cross_validation_ref:
-
-:mod:`sklearn.cross_validation`: Cross Validation
-=================================================
+:mod:`sklearn.model_selection`: Model Selection
+===============================================

-.. automodule:: sklearn.cross_validation
+.. automodule:: sklearn.model_selection
 :no-members:
 :no-inherited-members:

-**User guide:** See the :ref:`cross_validation` section for further details.
+**User guide:** See the :ref:`cross_validation`, :ref:`grid_search` and
+:ref:`learning_curve` sections for further details.
+

+Classes
+-------
 .. currentmodule:: sklearn

 .. autosummary::
 :toctree: generated/
 :template: class.rst

-cross_validation.KFold
-cross_validation.LeaveOneLabelOut
-cross_validation.LeaveOneOut
-cross_validation.LeavePLabelOut
-cross_validation.LeavePOut
-cross_validation.PredefinedSplit
-cross_validation.StratifiedKFold
-cross_validation.ShuffleSplit
-cross_validation.StratifiedShuffleSplit
+model_selection.KFold
+model_selection.LeaveOneLabelOut
+model_selection.LeaveOneOut
+model_selection.LeavePLabelOut
+model_selection.LeavePOut
+model_selection.PredefinedSplit
+model_selection.StratifiedKFold
+model_selection.ShuffleSplit
+model_selection.StratifiedShuffleSplit
+model_selection.GridSearchCV
+model_selection.ParameterGrid
+model_selection.ParameterSampler
+model_selection.RandomizedSearchCV
+
+
+Functions
+---------
+.. currentmodule:: sklearn

 .. autosummary::
 :toctree: generated/
 :template: function.rst

-cross_validation.train_test_split
-cross_validation.cross_val_score
-cross_validation.cross_val_predict
-cross_validation.permutation_test_score
-cross_validation.check_cv
+model_selection.train_test_split
+model_selection.cross_val_score
+model_selection.cross_val_predict
+model_selection.permutation_test_score
+model_selection.check_cv
+model_selection.learning_curve
+model_selection.validation_curve
+

 .. _datasets_ref:

@@ -508,29 +523,6 @@ From text
 gaussian_process.regression_models.quadratic


-.. _grid_search_ref:
-
-:mod:`sklearn.grid_search`: Grid Search
-=======================================
-
-.. automodule:: sklearn.grid_search
-:no-members:
-:no-inherited-members:
-
-**User guide:** See the :ref:`grid_search` section for further details.
-
-.. currentmodule:: sklearn
-
-.. autosummary::
-:toctree: generated/
-:template: class.rst
-
-grid_search.GridSearchCV
-grid_search.ParameterGrid
-grid_search.ParameterSampler
-grid_search.RandomizedSearchCV
-
-
 .. _isotonic_ref:

 :mod:`sklearn.isotonic`: Isotonic regression
@@ -618,24 +610,6 @@ From text
 lda.LDA


-.. _learning_curve_ref:
-
-:mod:`sklearn.learning_curve` Learning curve evaluation
-=======================================================
-
-.. automodule:: sklearn.learning_curve
-:no-members:
-:no-inherited-members:
-
-.. currentmodule:: sklearn
-
-.. autosummary::
-:toctree: generated/
-:template: function.rst
-
-learning_curve.learning_curve
-learning_curve.validation_curve
-
 .. _linear_model_ref:

 :mod:`sklearn.linear_model`: Generalized Linear Models
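
The renames in this file follow one pattern: every entry from `cross_validation`, `grid_search` and `learning_curve` moves under `model_selection` with its own name unchanged. A minimal sketch of that mapping, useful for rewriting dotted references elsewhere in the docs. The helper `migrate_reference` is hypothetical and not part of scikit-learn, and it assumes names are written relative to `sklearn`, as in the autosummary entries above.

```python
# Hypothetical helper (not part of scikit-learn): rewrite dotted references
# from the three old modules to the merged ``model_selection`` module,
# following the renames visible in the classes.rst hunks.
OLD_MODULES = ("cross_validation", "grid_search", "learning_curve")
NEW_MODULE = "model_selection"


def migrate_reference(dotted_name):
    """Return ``dotted_name`` with a leading old module swapped for the new one."""
    module, sep, attribute = dotted_name.partition(".")
    if sep and module in OLD_MODULES:
        return NEW_MODULE + "." + attribute
    # Anything else (other modules, bare names) is left untouched.
    return dotted_name


print(migrate_reference("cross_validation.KFold"))
print(migrate_reference("grid_search.GridSearchCV"))
print(migrate_reference("learning_curve.learning_curve"))
print(migrate_reference("covariance.oas"))
```

Only the first path component is inspected, so a function that shares its name with an old module, such as `learning_curve.learning_curve`, keeps its own name.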

doc/modules/cross_validation.rst

Lines changed: 47 additions & 39 deletions
@@ -4,7 +4,7 @@
 Cross-validation: evaluating estimator performance
 ===================================================

-.. currentmodule:: sklearn.cross_validation
+.. currentmodule:: sklearn.model_selection

 Learning the parameters of a prediction function and testing it on the
 same data is a methodological mistake: a model that would just repeat
@@ -24,7 +24,7 @@ can be quickly computed with the :func:`train_test_split` helper function.
 Let's load the iris data set to fit a linear support vector machine on it::

 >>> import numpy as np
->>> from sklearn import cross_validation
+>>> from sklearn.model_selection import train_test_split
 >>> from sklearn import datasets
 >>> from sklearn import svm

@@ -35,7 +35,7 @@ Let's load the iris data set to fit a linear support vector machine on it::
 We can now quickly sample a training set while holding out 40% of the
 data for testing (evaluating) our classifier::

->>> X_train, X_test, y_train, y_test = cross_validation.train_test_split(
+>>> X_train, X_test, y_train, y_test = train_test_split(
 ... iris.data, iris.target, test_size=0.4, random_state=0)

 >>> X_train.shape, y_train.shape
@@ -101,10 +101,9 @@ kernel support vector machine on the iris dataset by splitting the data, fitting
 a model and computing the score 5 consecutive times (with different splits each
 time)::

+>>> from sklearn.model_selection import cross_val_score
 >>> clf = svm.SVC(kernel='linear', C=1)
->>> scores = cross_validation.cross_val_score(
-... clf, iris.data, iris.target, cv=5)
-...
+>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
 >>> scores # doctest: +ELLIPSIS
 array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])

@@ -119,8 +118,8 @@ method of the estimator. It is possible to change this by using the
 scoring parameter::

 >>> from sklearn import metrics
->>> scores = cross_validation.cross_val_score(clf, iris.data, iris.target,
-... cv=5, scoring='f1_weighted')
+>>> scores = cross_val_score(
+... clf, iris.data, iris.target, cv=5, scoring='f1_weighted')
 >>> scores # doctest: +ELLIPSIS
 array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])

@@ -136,11 +135,11 @@ being used if the estimator derives from :class:`ClassifierMixin
 It is also possible to use other cross validation strategies by passing a cross
 validation iterator instead, for instance::

->>> n_samples = iris.data.shape[0]
->>> cv = cross_validation.ShuffleSplit(n_samples, n_iter=3,
-... test_size=0.3, random_state=0)
+>>> from sklearn.model_selection import ShuffleSplit

->>> cross_validation.cross_val_score(clf, iris.data, iris.target, cv=cv)
+>>> n_samples = iris.data.shape[0]
+>>> cv = ShuffleSplit(n_iter=3, test_size=0.3, random_state=0)
+>>> cross_val_score(clf, iris.data, iris.target, cv=cv)
 ... # doctest: +ELLIPSIS
 array([ 0.97..., 0.97..., 1. ])

@@ -153,7 +152,7 @@ validation iterator instead, for instance::
 be learnt from a training set and applied to held-out data for prediction::

 >>> from sklearn import preprocessing
->>> X_train, X_test, y_train, y_test = cross_validation.train_test_split(
+>>> X_train, X_test, y_train, y_test = train_test_split(
 ... iris.data, iris.target, test_size=0.4, random_state=0)
 >>> scaler = preprocessing.StandardScaler().fit(X_train)
 >>> X_train_transformed = scaler.transform(X_train)
@@ -167,7 +166,7 @@ validation iterator instead, for instance::

 >>> from sklearn.pipeline import make_pipeline
 >>> clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
->>> cross_validation.cross_val_score(clf, iris.data, iris.target, cv=cv)
+>>> cross_val_score(clf, iris.data, iris.target, cv=cv)
 ... # doctest: +ELLIPSIS
 array([ 0.97..., 0.93..., 0.95...])

@@ -184,8 +183,8 @@ can be used (otherwise, an exception is raised).

 These prediction can then be used to evaluate the classifier::

->>> predicted = cross_validation.cross_val_predict(clf, iris.data,
-... iris.target, cv=10)
+>>> from sklearn.model_selection import cross_val_predict
+>>> predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
 >>> metrics.accuracy_score(iris.target, predicted) # doctest: +ELLIPSIS
 0.966...

@@ -223,10 +222,11 @@ learned using :math:`k - 1` folds, and the fold left out is used for test.
 Example of 2-fold cross-validation on a dataset with 4 samples::

 >>> import numpy as np
->>> from sklearn.cross_validation import KFold
+>>> from sklearn.model_selection import KFold

->>> kf = KFold(4, n_folds=2)
->>> for train, test in kf:
+>>> X = np.ones(4)
+>>> kf = KFold(n_folds=2)
+>>> for train, test in kf.split(X):
 ... print("%s %s" % (train, test))
 [2 3] [0 1]
 [0 1] [2 3]
@@ -250,11 +250,12 @@ target class as the complete set.
 Example of stratified 3-fold cross-validation on a dataset with 10 samples from
 two slightly unbalanced classes::

->>> from sklearn.cross_validation import StratifiedKFold
+>>> from sklearn.model_selection import StratifiedKFold

->>> labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
->>> skf = StratifiedKFold(labels, 3)
->>> for train, test in skf:
+>>> X = np.ones(10)
+>>> y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
+>>> skf = StratifiedKFold(n_folds=3)
+>>> for train, test in skf.split(X, y):
 ... print("%s %s" % (train, test))
 [2 3 6 7 8 9] [0 1 4 5]
 [0 1 3 4 5 8 9] [2 6 7]
@@ -271,10 +272,12 @@ training sets and :math:`n` different tests set. This cross-validation
 procedure does not waste much data as only one sample is removed from the
 training set::

->>> from sklearn.cross_validation import LeaveOneOut
+>>> from sklearn.model_selection import LeaveOneOut

->>> loo = LeaveOneOut(4)
->>> for train, test in loo:
+>>> n_samples = 4
+>>> X = np.ones(n_samples)
+>>> loo = LeaveOneOut()
+>>> for train, test in loo.split(X):
 ... print("%s %s" % (train, test))
 [1 2 3] [0]
 [0 2 3] [1]
@@ -329,10 +332,11 @@ overlap for :math:`p > 1`.

 Example of Leave-2-Out on a dataset with 4 samples::

->>> from sklearn.cross_validation import LeavePOut
+>>> from sklearn.model_selection import LeavePOut

->>> lpo = LeavePOut(4, p=2)
->>> for train, test in lpo:
+>>> X = np.ones(4)
+>>> lpo = LeavePOut(p=2)
+>>> for train, test in lpo.split(X):
 ... print("%s %s" % (train, test))
 [2 3] [0 1]
 [1 3] [0 2]
@@ -357,11 +361,13 @@ For example, in the cases of multiple experiments, *LOLO* can be used to
 create a cross-validation based on the different experiments: we create
 a training set using the samples of all the experiments except one::

->>> from sklearn.cross_validation import LeaveOneLabelOut
+>>> from sklearn.model_selection import LeaveOneLabelOut

+>>> X = [1, 5, 10, 50]
+>>> y = [0, 1, 1, 2]
 >>> labels = [1, 1, 2, 2]
->>> lolo = LeaveOneLabelOut(labels)
->>> for train, test in lolo:
+>>> lolo = LeaveOneLabelOut()
+>>> for train, test in lolo.split(X, y, labels):
 ... print("%s %s" % (train, test))
 [2 3] [0 1]
 [0 1] [2 3]
@@ -389,11 +395,13 @@ samples related to :math:`P` labels for each training/test set.

 Example of Leave-2-Label Out::

->>> from sklearn.cross_validation import LeavePLabelOut
+>>> from sklearn.model_selection import LeavePLabelOut

+>>> X = np.arange(6)
+>>> y = [1, 1, 1, 2, 2, 2]
 >>> labels = [1, 1, 2, 2, 3, 3]
->>> lplo = LeavePLabelOut(labels, p=2)
->>> for train, test in lplo:
+>>> lplo = LeavePLabelOut(n_labels=2)
+>>> for train, test in lplo.split(X, y, labels):
 ... print("%s %s" % (train, test))
 [4 5] [0 1 2 3]
 [2 3] [0 1 4 5]
@@ -416,9 +424,9 @@

 Here is a usage example::

->>> ss = cross_validation.ShuffleSplit(5, n_iter=3, test_size=0.25,
-... random_state=0)
->>> for train_index, test_index in ss:
+>>> X = np.arange(5)
+>>> ss = ShuffleSplit(n_iter=3, test_size=0.25, random_state=0)
+>>> for train_index, test_index in ss.split(X):
 ... print("%s %s" % (train_index, test_index))
 ...
 [1 3 4] [2 0]
@@ -480,4 +488,4 @@ Cross validation and model selection

 Cross validation iterators can also be used to directly perform model
 selection using Grid Search for the optimal hyperparameters of the
-model. This is the topic if the next section: :ref:`grid_search`.
+model. This is the topic of the next section: :ref:`grid_search`.
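
The recurring change in this file is that splitters no longer receive the data, its size, or its labels in the constructor: configuration goes to the constructor and the data goes to `split`. A toy sketch of that interface in plain Python follows. `ToyKFold` is a hypothetical stand-in written for illustration, not scikit-learn's implementation, and it only covers contiguous, unshuffled folds.

```python
class ToyKFold:
    """Hypothetical stand-in illustrating the data-independent interface."""

    def __init__(self, n_folds=3):
        # Only configuration is stored; no data or sample count is needed here.
        self.n_folds = n_folds

    def split(self, X):
        """Yield (train_indices, test_indices) for each contiguous fold of X."""
        n_samples = len(X)
        if self.n_folds > n_samples:
            raise ValueError("n_folds cannot exceed the number of samples")
        # Spread the remainder over the first folds so sizes differ by at most 1.
        base, extra = divmod(n_samples, self.n_folds)
        start = 0
        for fold in range(self.n_folds):
            stop = start + base + (1 if fold < extra else 0)
            test = list(range(start, stop))
            train = list(range(0, start)) + list(range(stop, n_samples))
            yield train, test
            start = stop


kf = ToyKFold(n_folds=2)
for train, test in kf.split([1.0, 1.0, 1.0, 1.0]):
    print(train, test)

# The same configured object can be reused on data of a different size.
print(list(kf.split(range(6))))
```

For four samples and two folds the sketch yields the same index pairs as the `KFold` doctest in this diff. Because the object holds no reference to the data, one instance can be passed around and applied to several datasets.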

doc/modules/ensemble.rst

Lines changed: 5 additions & 5 deletions
@@ -156,7 +156,7 @@ picked as the splitting rule. This usually allows to reduce the variance
 of the model a bit more, at the expense of a slightly greater increase
 in bias::

->>> from sklearn.cross_validation import cross_val_score
+>>> from sklearn.model_selection import cross_val_score
 >>> from sklearn.datasets import make_blobs
 >>> from sklearn.ensemble import RandomForestClassifier
 >>> from sklearn.ensemble import ExtraTreesClassifier
@@ -357,7 +357,7 @@ Usage
 The following example shows how to fit an AdaBoost classifier with 100 weak
 learners::

->>> from sklearn.cross_validation import cross_val_score
+>>> from sklearn.model_selection import cross_val_score
 >>> from sklearn.datasets import load_iris
 >>> from sklearn.ensemble import AdaBoostClassifier

@@ -945,7 +945,7 @@ Usage
 The following example shows how to fit the majority rule classifier::

 >>> from sklearn import datasets
->>> from sklearn import cross_validation
+>>> from sklearn.model_selection import cross_val_score
 >>> from sklearn.linear_model import LogisticRegression
 >>> from sklearn.naive_bayes import GaussianNB
 >>> from sklearn.ensemble import RandomForestClassifier
@@ -961,7 +961,7 @@ The following example shows how to fit the majority rule classifier::
 >>> eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')

 >>> for clf, label in zip([clf1, clf2, clf3, eclf], ['Logistic Regression', 'Random Forest', 'naive Bayes', 'Ensemble']):
-... scores = cross_validation.cross_val_score(clf, X, y, cv=5, scoring='accuracy')
+... scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
 ... print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
 Accuracy: 0.90 (+/- 0.05) [Logistic Regression]
 Accuracy: 0.93 (+/- 0.05) [Random Forest]
@@ -1038,7 +1038,7 @@ Using the `VotingClassifier` with `GridSearch`
 The `VotingClassifier` can also be used together with `GridSearch` in order
 to tune the hyperparameters of the individual estimators::

->>> from sklearn.grid_search import GridSearchCV
+>>> from sklearn.model_selection import GridSearchCV
 >>> clf1 = LogisticRegression(random_state=1)
 >>> clf2 = RandomForestClassifier(random_state=1)
 >>> clf3 = GaussianNB()
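
Code that has to run both before and after this rename can try the new module first and fall back to the old one. A minimal sketch using only the standard library follows. `import_first` is a hypothetical helper, not a scikit-learn function, and whether either scikit-learn module can be imported depends on the installed version.

```python
import importlib


def import_first(attribute, module_names):
    """Return ``attribute`` from the first module in ``module_names`` that provides it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Module missing in this environment; try the next one.
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError("%s not found in any of %s" % (attribute, list(module_names)))


try:
    # Prefer the new location used throughout this commit, then the old one.
    cross_val_score = import_first(
        "cross_val_score",
        ["sklearn.model_selection", "sklearn.cross_validation"],
    )
except ImportError:
    cross_val_score = None  # scikit-learn is not installed in this environment.
```

The same call works for `GridSearchCV` by listing `sklearn.model_selection` before `sklearn.grid_search`.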
