ENH Major to Minor incremental enhancements to the model_selection · raghavrv/scikit-learn@d0be9d9

Commit d0be9d9

ENH Major to Minor incremental enhancements to the model_selection
Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
  To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris: pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
    - MAINT remove redundant p variable
    - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201
    - DOC improve random_state param doc
  - From scikit-learn#5190
    - LabelKFold and test
  - From scikit-learn#4583
    - LabelShuffleSplit and tests
  - From scikit-learn#5300
    - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378
    - Make the GridSearchCV docs more accurate.

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
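Note on the "ENH+TST KFold: make rng to be generated at every split call for reproducibility" item: the idea can be sketched as below. `ToyKFold` is a hypothetical, stdlib-only stand-in and not the class changed in this commit; its constructor arguments and fold assignment are assumptions made for illustration. The point shown is only that building the generator inside `split` makes repeated calls with an integer seed return identical folds.

```python
import random


class ToyKFold:
    """Hypothetical, simplified K-fold splitter (not scikit-learn's KFold)."""

    def __init__(self, n_folds=3, shuffle=False, random_state=None):
        self.n_folds = n_folds
        self.shuffle = shuffle  # kept as a public attribute
        self.random_state = random_state

    def get_n_splits(self, X=None, y=None):
        return self.n_folds

    def split(self, X, y=None):
        indices = list(range(len(X)))
        if self.shuffle:
            # A fresh generator is built on every call, so an integer seed
            # reproduces the same permutation each time split() is called.
            random.Random(self.random_state).shuffle(indices)
        for k in range(self.n_folds):
            test = indices[k::self.n_folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


X = list("abcdefghi")
cv = ToyKFold(n_folds=3, shuffle=True, random_state=0)
first = list(cv.split(X))
second = list(cv.split(X))  # identical to `first` because of the fixed seed
```

Had the generator been created once in `__init__` and consumed by each call, the second call would continue from the advanced state and could yield different folds.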
1 parent 0e6c2f9 commit d0be9d9

23 files changed, with 886 additions and 427 deletions

sklearn/cross_validation.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 "model_selection module into which all the refactored classes "
 "and functions are moved. Also note that the interface of the "
 "new CV iterators are different from that of this module. "
-"Refer to model_selection for more info.", DeprecationWarning)
+"This module will be removed in 0.19.", DeprecationWarning)
 
 
 __all__ = ['KFold',
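For readers checking how such a module-level deprecation notice behaves, here is a minimal sketch using only the standard `warnings` module. `import_legacy_module` is a hypothetical stand-in for the body of a deprecated module, and the message text is shortened from the one in the hunk above.

```python
import warnings


def import_legacy_module():
    """Hypothetical stand-in for the top of a deprecated module."""
    warnings.warn("This module has been deprecated in favor of the "
                  "model_selection module. "
                  "This module will be removed in 0.19.",
                  DeprecationWarning)


with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is hidden by default in many contexts, so the
    # filter is opened up before triggering it.
    warnings.simplefilter("always")
    import_legacy_module()

messages = [str(w.message) for w in caught]
categories = [w.category for w in caught]
```

Stating the removal version in the message gives callers a concrete deadline, which is what the added string in this hunk does.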

sklearn/feature_selection/rfe.py

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@
 from ..base import clone
 from ..base import is_classifier
 from ..model_selection import check_cv
-from ..model_selection.validate import _safe_split, _score
+from ..model_selection._validation import _safe_split, _score
 from ..metrics.scorer import check_scoring
 from .base import SelectorMixin
 
@@ -447,7 +447,7 @@ def fit(self, X, y):
 self.estimator_.set_params(**self.estimator_params)
 self.estimator_.fit(self.transform(X), y)
 
-# Fixing a normalization error, n is equal to len_cv - 1
-# here, the scores are normalized by len_cv
-self.grid_scores_ = scores / cv.n_splits(X, y)
+# Fixing a normalization error, n is equal to get_n_splits(X, y) - 1
+# here, the scores are normalized by get_n_splits(X, y)
+self.grid_scores_ = scores / cv.get_n_splits(X, y)
 return self
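The normalization in the second hunk (summed per-fold scores divided by `cv.get_n_splits(X, y)`) can be sketched as follows. `ToySplitter`, `mean_score_per_step`, and the scoring lambda are all hypothetical names introduced for illustration; only the `get_n_splits(X, y)` and `split(X, y)` method names come from the diff.

```python
class ToySplitter:
    """Hypothetical splitter exposing get_n_splits/split, as in the diff."""

    def __init__(self, n_folds):
        self.n_folds = n_folds

    def get_n_splits(self, X, y=None):
        return self.n_folds

    def split(self, X, y=None):
        indices = list(range(len(X)))
        for k in range(self.n_folds):
            test = [i for i in indices if i % self.n_folds == k]
            train = [i for i in indices if i % self.n_folds != k]
            yield train, test


def mean_score_per_step(cv, X, y, score_fold):
    """Sum the per-fold score vectors, then divide by the number of splits."""
    totals = None
    for train, test in cv.split(X, y):
        fold_scores = score_fold(train, test)
        if totals is None:
            totals = list(fold_scores)
        else:
            totals = [t + s for t, s in zip(totals, fold_scores)]
    n_splits = cv.get_n_splits(X, y)
    return [t / n_splits for t in totals]


X = list(range(6))
y = [0, 1, 0, 1, 0, 1]
# Made-up scoring function: two "steps", scored from the test-fold size.
grid_scores = mean_score_per_step(
    ToySplitter(3), X, y, lambda train, test: [len(test), 2 * len(test)])
```

Asking the splitter for the count, instead of caching a separate length, keeps the divisor tied to the same object that produced the folds.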

sklearn/grid_search.py

Lines changed: 2 additions & 1 deletion
@@ -38,7 +38,8 @@
 
 warnings.warn("This module has been deprecated in favor of the "
 "model_selection module into which all the refactored classes "
-"and functions are moved.", DeprecationWarning)
+"and functions are moved. This module will be removed in 0.19.",
+DeprecationWarning)
 
 
 class ParameterGrid(object):

sklearn/learning_curve.py

Lines changed: 2 additions & 1 deletion
@@ -18,7 +18,8 @@
 
 
 warnings.warn("This module has been deprecated in favor of the "
-"model_selection module into which all the functions are moved.",
+"model_selection module into which all the functions are moved."
+" This module will be removed in 0.19",
 DeprecationWarning)
 
 

sklearn/linear_model/coordinate_descent.py

Lines changed: 0 additions & 24 deletions
@@ -1371,7 +1371,6 @@ class ElasticNetCV(LinearModelCV, RegressorMixin):
 dual gap for optimality and continues until it is smaller
 than ``tol``.
 
-<<<<<<< HEAD
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy.
 Possible inputs for cv are:
@@ -1384,13 +1383,6 @@ class ElasticNetCV(LinearModelCV, RegressorMixin):
 
 Refer :ref:`User Guide <cross_validation>` for the various
 cross-validation strategies that can be used here.
-=======
-cv : integer or cross-validation generator, optional
-If an integer is passed, it is the number of fold (default 3).
-Specific cross-validation objects can be passed, see the
-:mod:`sklearn.model_selection.split` module for the list of
-possible objects.
->>>>>>> ENH introduce the model_selection module
 
 verbose : bool or integer
 Amount of verbosity.
@@ -1867,7 +1859,6 @@ class MultiTaskElasticNetCV(LinearModelCV, RegressorMixin):
 dual gap for optimality and continues until it is smaller
 than ``tol``.
 
-<<<<<<< HEAD
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy.
 Possible inputs for cv are:
@@ -1880,13 +1871,6 @@ class MultiTaskElasticNetCV(LinearModelCV, RegressorMixin):
 
 Refer :ref:`User Guide <cross_validation>` for the various
 cross-validation strategies that can be used here.
-=======
-cv : integer or cross-validation generator, optional
-If an integer is passed, it is the number of fold (default 3).
-Specific cross-validation objects can be passed, see the
-:mod:`sklearn.model_selection.split` module for the list of
-possible objects.
->>>>>>> ENH introduce the model_selection module
 
 verbose : bool or integer
 Amount of verbosity.
@@ -2032,7 +2016,6 @@ class MultiTaskLassoCV(LinearModelCV, RegressorMixin):
 dual gap for optimality and continues until it is smaller
 than ``tol``.
 
-<<<<<<< HEAD
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy.
 Possible inputs for cv are:
@@ -2045,13 +2028,6 @@ class MultiTaskLassoCV(LinearModelCV, RegressorMixin):
 
 Refer :ref:`User Guide <cross_validation>` for the various
 cross-validation strategies that can be used here.
-=======
-cv : integer or cross-validation generator, optional
-If an integer is passed, it is the number of fold (default 3).
-Specific cross-validation objects can be passed, see the
-:mod:`sklearn.model_selection.split` module for the list of
-possible objects.
->>>>>>> ENH introduce the model_selection module
 
 verbose : bool or integer
 Amount of verbosity.
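The three hunks above delete merge-conflict markers that had been committed inside docstrings. As a reviewer aid, a small stdlib check along the following lines could flag such leftovers before a merge; `find_conflict_markers` and its pattern are hypothetical and not part of this commit.

```python
import re

# Git writes conflict markers at the start of a line: "<<<<<<< ", "=======",
# and ">>>>>>> ". A reST underline of exactly seven "=" characters would also
# match the middle alternative, so every hit still needs a human look.
CONFLICT_MARKER = re.compile(r"^(<{7} |={7}$|>{7} )")


def find_conflict_markers(text):
    """Return (line_number, line) pairs that look like conflict markers."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if CONFLICT_MARKER.match(line)]


sample = "\n".join([
    "than ``tol``.",
    "<<<<<<< HEAD",
    "cv : int, cross-validation generator or an iterable, optional",
    "=======",
    "cv : integer or cross-validation generator, optional",
    ">>>>>>> ENH introduce the model_selection module",
    "verbose : bool or integer",
])
hits = find_conflict_markers(sample)
```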

sklearn/linear_model/least_angle.py

Lines changed: 1 addition & 1 deletion
@@ -1089,7 +1089,7 @@ def fit(self, X, y):
 method=self.method, verbose=max(0, self.verbose - 1),
 normalize=self.normalize, fit_intercept=self.fit_intercept,
 max_iter=self.max_iter, eps=self.eps, positive=self.positive)
-for train, test in cv)
+for train, test in cv.split(X, y))
 all_alphas = np.concatenate(list(zip(*cv_paths))[0])
 # Unique also sorts
 all_alphas = np.unique(all_alphas)
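The change from `for train, test in cv` to `for train, test in cv.split(X, y)` reflects the new interface, in which the splitter is built without data and receives it at split time. A minimal sketch, assuming a hypothetical `ToyLeaveOneOut` class that is not the library's implementation:

```python
class ToyLeaveOneOut:
    """Hypothetical data-independent splitter (illustration only)."""

    def get_n_splits(self, X, y=None):
        return len(X)

    def split(self, X, y=None):
        n_samples = len(X)
        for i in range(n_samples):
            yield [j for j in range(n_samples) if j != i], [i]


cv = ToyLeaveOneOut()  # constructed without seeing any data
small = list(cv.split([10, 20, 30]))
large = list(cv.split(list(range(5))))  # same object, different dataset

try:
    iter(cv)  # the splitter object itself is not iterable
    directly_iterable = True
except TypeError:
    directly_iterable = False
```

With this design, old-style loops over the object fail loudly with a `TypeError`, which is presumably why call sites such as this one had to be updated in the same commit.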

sklearn/linear_model/logistic.py

Lines changed: 1 addition & 1 deletion
@@ -1310,7 +1310,7 @@ class LogisticRegressionCV(LogisticRegression, BaseEstimator,
 cv : integer or cross-validation generator
 The default cross-validation generator used is Stratified K-Folds.
 If an integer is provided, then it is the number of folds used.
-See the module :mod:`sklearn.model_selection.split` module for the
+See the module :mod:`sklearn.model_selection` module for the
 list of possible cross-validation objects.
 
 penalty : str, 'l1' or 'l2'

sklearn/linear_model/tests/test_logistic.py

Lines changed: 11 additions & 4 deletions
@@ -1,4 +1,3 @@
-
 import numpy as np
 import scipy.sparse as sp
 from scipy import linalg, optimize, sparse
@@ -454,16 +453,24 @@ def test_ovr_multinomial_iris():
 train, target = iris.data, iris.target
 n_samples, n_features = train.shape
 
-# Use pre-defined fold as folds generated for different y
+# The cv indices from stratified kfold (where stratification is done based
+# on the fine-grained iris classes, i.e, before the classes 0 and 1 are
+# conflated) is used for both clf and clf1
 cv = StratifiedKFold(3)
-clf = LogisticRegressionCV(cv=cv)
+precomputed_folds = list(cv.split(train, target))
+
+# Train clf on the original dataset where classes 0 and 1 are separated
+clf = LogisticRegressionCV(cv=precomputed_folds)
 clf.fit(train, target)
 
-clf1 = LogisticRegressionCV(cv=cv)
+# Conflate classes 0 and 1 and train clf1 on this modifed dataset
+clf1 = LogisticRegressionCV(cv=precomputed_folds)
 target_copy = target.copy()
 target_copy[target_copy == 0] = 1
 clf1.fit(train, target_copy)
 
+# Ensure that what OvR learns for class2 is same regardless of whether
+# classes 0 and 1 are separated or not
 assert_array_almost_equal(clf.scores_[2], clf1.scores_[2])
 assert_array_almost_equal(clf.intercept_[2:], clf1.intercept_)
 assert_array_almost_equal(clf.coef_[2][np.newaxis, :], clf1.coef_)
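Why the test materializes `precomputed_folds` once: a stratified splitter derives its folds from `y`, so handing the same splitter two different targets may produce two different sets of folds. The sketch below uses a hypothetical round-robin stratifier, `toy_stratified_folds`, which is an assumption for illustration and not the algorithm `StratifiedKFold` uses; the targets are made up.

```python
from collections import defaultdict


def toy_stratified_folds(y, n_folds):
    """Hypothetical stratifier: deal each class's samples round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)
    test_folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        for position, index in enumerate(by_class[label]):
            test_folds[position % n_folds].append(index)
    everything = set(range(len(y)))
    return [(sorted(everything - set(test)), sorted(test))
            for test in test_folds]


target = [0, 0, 1, 1, 1, 1, 2, 2, 2]
# Conflate classes 0 and 1, as the test does with target_copy.
conflated = [1 if label == 0 else label for label in target]

# Folds computed once from the fine-grained target, to be reused for both fits.
precomputed_folds = toy_stratified_folds(target, n_folds=3)
# Folds the same splitter would produce if it saw the conflated target.
folds_from_conflated = toy_stratified_folds(conflated, n_folds=3)
```

In this toy case the two fold sets differ, so comparing models fitted on them would mix a change in the data with a change in the splits. Passing one precomputed list to both estimators removes that second source of variation.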

sklearn/metrics/scorer.py

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
 arbitrary score functions.
 
 A scorer object is a callable that can be passed to
-:class:`sklearn.model_selection.search.GridSearchCV` or
-:func:`sklearn.model_selection.validation.cross_val_score` as the ``scoring``
+:class:`sklearn.model_selection.GridSearchCV` or
+:func:`sklearn.model_selection.cross_val_score` as the ``scoring``
 parameter, to specify how a model should be evaluated.
 
 The signature of the call is ``(estimator, X, y)`` where ``estimator``
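The docstring context above describes a scorer as a callable invoked as `(estimator, X, y)`. A minimal stdlib-only sketch of that shape follows; `MajorityClassifier` and `accuracy_scorer` are hypothetical names, and the data is made up.

```python
class MajorityClassifier:
    """Hypothetical estimator with fit/predict, for illustration only."""

    def fit(self, X, y):
        labels = list(y)
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def accuracy_scorer(estimator, X, y):
    """Callable with the (estimator, X, y) signature; returns one number."""
    predictions = estimator.predict(X)
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


X = [[0], [1], [2], [3]]
y = [1, 1, 1, 0]
score = accuracy_scorer(MajorityClassifier().fit(X, y), X, y)
```

Because the scorer receives the fitted estimator and not its predictions, it can decide for itself which prediction method to call.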

sklearn/model_selection/__init__.py

Lines changed: 32 additions & 45 deletions
@@ -1,48 +1,35 @@
-from .split import KFold
-from .split import StratifiedKFold
-from .split import LeaveOneLabelOut
-from .split import LeaveOneOut
-from .split import LeavePLabelOut
-from .split import LeavePOut
-from .split import ShuffleSplit
-from .split import StratifiedShuffleSplit
-from .split import PredefinedSplit
-from .split import train_test_split
-from .split import check_cv
+from ._split import BaseCrossValidator
+from ._split import KFold
+from ._split import LabelKFold
+from ._split import StratifiedKFold
+from ._split import LeaveOneLabelOut
+from ._split import LeaveOneOut
+from ._split import LeavePLabelOut
+from ._split import LeavePOut
+from ._split import ShuffleSplit
+from ._split import LabelShuffleSplit
+from ._split import StratifiedShuffleSplit
+from ._split import PredefinedSplit
+from ._split import train_test_split
+from ._split import check_cv
 
-from .validate import cross_val_score
-from .validate import cross_val_predict
-from .validate import learning_curve
-from .validate import permutation_test_score
-from .validate import validation_curve
+from ._validation import cross_val_score
+from ._validation import cross_val_predict
+from ._validation import learning_curve
+from ._validation import permutation_test_score
+from ._validation import validation_curve
 
-from .search import GridSearchCV
-from .search import RandomizedSearchCV
-from .search import ParameterGrid
-from .search import ParameterSampler
-from .search import fit_grid_point
+from ._search import GridSearchCV
+from ._search import RandomizedSearchCV
+from ._search import ParameterGrid
+from ._search import ParameterSampler
+from ._search import fit_grid_point
 
-__all__ = ('split',
-'validate',
-'search',
-'KFold',
-'StratifiedKFold',
-'LeaveOneLabelOut',
-'LeaveOneOut',
-'LeavePLabelOut',
-'LeavePOut',
-'ShuffleSplit',
-'StratifiedShuffleSplit',
-'PredefinedSplit',
-'train_test_split',
-'check_cv',
-'cross_val_score',
-'cross_val_predict',
-'permutation_test_score',
-'learning_curve',
-'validation_curve',
-'GridSearchCV',
-'ParameterGrid',
-'fit_grid_point',
-'ParameterSampler',
-'RandomizedSearchCV')
+__all__ = ('BaseCrossValidator', 'GridSearchCV', 'KFold', 'LabelKFold',
+'LeaveOneLabelOut', 'LeaveOneOut', 'LeavePLabelOut', 'LeavePOut',
+'ParameterGrid', 'ParameterSampler', 'PredefinedSplit',
+'RandomizedSearchCV', 'ShuffleSplit', 'LabelShuffleSplit',
+'StratifiedKFold', 'StratifiedShuffleSplit', 'check_cv',
+'cross_val_predict', 'cross_val_score', 'fit_grid_point',
+'learning_curve', 'permutation_test_score', 'train_test_split',
+'validation_curve')
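The layout this hunk moves to, underscore-prefixed submodules with public names re-exported from the package `__init__` and listed in `__all__`, can be tried out with a throwaway package. Everything below (the `toy_selection` name, its single class, the temporary directory) is hypothetical and only mirrors the pattern, not the real module contents.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a tiny package on disk: a private submodule plus a re-exporting
# __init__, in the same shape as the hunk above.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "toy_selection"
    package.mkdir()
    (package / "_split.py").write_text(
        "class KFold:\n    pass\n\n\ndef _helper():\n    pass\n")
    (package / "__init__.py").write_text(
        "from ._split import KFold\n\n__all__ = ('KFold',)\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        toy_selection = importlib.import_module("toy_selection")
        namespace = {}
        # A star-import only picks up the names listed in __all__.
        exec("from toy_selection import *", namespace)
    finally:
        sys.path.remove(tmp)

public_names = sorted(name for name in namespace if not name.startswith("__"))
defined_in = toy_selection.KFold.__module__
```

Users import `KFold` from the package while the class still reports the private module as its origin, which leaves the maintainers free to reorganize the submodules later without breaking the documented import paths.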
