ENH Major to Minor incremental enhancements to the model_selection · raghavrv/scikit-learn@dedb892

Commit dedb892

ENH Major to Minor incremental enhancements to the model_selection
Squashed commit messages - (For reference)

Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
  To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris: pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
  - - MAINT remove redundant p variable
  - - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
  - From scikit-learn#5458 - Remove shuffle from LabelKFold
  - From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
  - From scikit-learn#4826 - Move custom error / warnings into sklearn.exception

Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
1 parent c41c667 commit dedb892

26 files changed, 881 additions and 441 deletions

sklearn/cross_validation.py

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
 "model_selection module into which all the refactored classes "
 "and functions are moved. Also note that the interface of the "
 "new CV iterators are different from that of this module. "
-"Refer to model_selection for more info.", DeprecationWarning)
+"This module will be removed in 0.19.", DeprecationWarning)


 __all__ = ['KFold',

sklearn/exceptions.py

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ class FitFailedWarning(RuntimeWarning):

 Examples
 --------
->>> from sklearn.grid_search import GridSearchCV
+>>> from sklearn.model_selection import GridSearchCV
 >>> from sklearn.svm import LinearSVC
 >>> from sklearn.exceptions import FitFailedWarning
 >>> import warnings

sklearn/feature_selection/rfe.py

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@
 from ..base import clone
 from ..base import is_classifier
 from ..model_selection import check_cv
-from ..model_selection.validate import _safe_split, _score
+from ..model_selection._validation import _safe_split, _score
 from ..metrics.scorer import check_scoring
 from .base import SelectorMixin

@@ -414,7 +414,7 @@ def fit(self, X, y):
 self.estimator_ = clone(self.estimator)
 self.estimator_.fit(self.transform(X), y)

-# Fixing a normalization error, n is equal to len_cv - 1
-# here, the scores are normalized by len_cv
-self.grid_scores_ = scores / cv.n_splits(X, y)
+# Fixing a normalization error, n is equal to get_n_splits(X, y) - 1
+# here, the scores are normalized by get_n_splits(X, y)
+self.grid_scores_ = scores / cv.get_n_splits(X, y)
 return self
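
Note on the n_splits --> get_n_splits rename above: a minimal sketch of why the method takes the data as arguments, assuming a recent scikit-learn release. The array X below is illustrative and not part of the commit. For KFold the answer is fixed at construction, while for LeaveOneOut it depends on the number of samples passed in.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Illustrative data (not from the commit): 6 samples, 2 features.
X = np.arange(12).reshape(6, 2)

# For KFold the number of splits is known from the constructor.
kfold_splits = KFold(n_splits=3).get_n_splits(X)

# For LeaveOneOut it can only be computed once the data is known,
# which is why the splitter asks for X at call time.
loo_splits = LeaveOneOut().get_n_splits(X)
```

With 6 samples, the LeaveOneOut count equals the number of rows in X, so a normalization such as `scores / cv.get_n_splits(X, y)` stays correct whichever splitter is passed.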

sklearn/grid_search.py

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,8 @@

 warnings.warn("This module has been deprecated in favor of the "
 "model_selection module into which all the refactored classes "
-"and functions are moved.", DeprecationWarning)
+"and functions are moved. This module will be removed in 0.19.",
+DeprecationWarning)


 class ParameterGrid(object):

sklearn/learning_curve.py

Lines changed: 2 additions & 1 deletion
@@ -18,7 +18,8 @@


 warnings.warn("This module has been deprecated in favor of the "
-"model_selection module into which all the functions are moved.",
+"model_selection module into which all the functions are moved."
+" This module will be removed in 0.19",
 DeprecationWarning)

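
Note on the deprecation messages in cross_validation.py, grid_search.py and learning_curve.py: a minimal standard-library sketch of how such a warning can be emitted and observed. The helper name `warn_deprecated_module` and its arguments are hypothetical; the commit itself calls `warnings.warn` directly at module level.

```python
import warnings


def warn_deprecated_module(name, replacement, removal_version):
    # Hypothetical helper, for illustration only.
    warnings.warn(
        "The %s module has been deprecated in favor of the %s module. "
        "This module will be removed in %s."
        % (name, replacement, removal_version),
        DeprecationWarning)


# DeprecationWarning is often filtered out by default, so record
# everything explicitly in order to inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated_module("grid_search", "model_selection", "0.19")

message = str(caught[0].message)
```

Stating the removal version in the message, as this commit does, tells users how long the old import path will keep working.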

sklearn/linear_model/coordinate_descent.py

Lines changed: 0 additions & 24 deletions
@@ -1361,7 +1361,6 @@ class ElasticNetCV(LinearModelCV, RegressorMixin):
 dual gap for optimality and continues until it is smaller
 than ``tol``.

-<<<<<<< HEAD
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy.
 Possible inputs for cv are:
@@ -1374,13 +1373,6 @@ class ElasticNetCV(LinearModelCV, RegressorMixin):

 Refer :ref:`User Guide <cross_validation>` for the various
 cross-validation strategies that can be used here.
-=======
-cv : integer or cross-validation generator, optional
-If an integer is passed, it is the number of fold (default 3).
-Specific cross-validation objects can be passed, see the
-:mod:`sklearn.model_selection.split` module for the list of
-possible objects.
->>>>>>> ENH introduce the model_selection module

 verbose : bool or integer
 Amount of verbosity.
@@ -1857,7 +1849,6 @@ class MultiTaskElasticNetCV(LinearModelCV, RegressorMixin):
 dual gap for optimality and continues until it is smaller
 than ``tol``.

-<<<<<<< HEAD
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy.
 Possible inputs for cv are:
@@ -1870,13 +1861,6 @@ class MultiTaskElasticNetCV(LinearModelCV, RegressorMixin):

 Refer :ref:`User Guide <cross_validation>` for the various
 cross-validation strategies that can be used here.
-=======
-cv : integer or cross-validation generator, optional
-If an integer is passed, it is the number of fold (default 3).
-Specific cross-validation objects can be passed, see the
-:mod:`sklearn.model_selection.split` module for the list of
-possible objects.
->>>>>>> ENH introduce the model_selection module

 verbose : bool or integer
 Amount of verbosity.
@@ -2022,7 +2006,6 @@ class MultiTaskLassoCV(LinearModelCV, RegressorMixin):
 dual gap for optimality and continues until it is smaller
 than ``tol``.

-<<<<<<< HEAD
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy.
 Possible inputs for cv are:
@@ -2035,13 +2018,6 @@ class MultiTaskLassoCV(LinearModelCV, RegressorMixin):

 Refer :ref:`User Guide <cross_validation>` for the various
 cross-validation strategies that can be used here.
-=======
-cv : integer or cross-validation generator, optional
-If an integer is passed, it is the number of fold (default 3).
-Specific cross-validation objects can be passed, see the
-:mod:`sklearn.model_selection.split` module for the list of
-possible objects.
->>>>>>> ENH introduce the model_selection module

 verbose : bool or integer
 Amount of verbosity.

sklearn/linear_model/least_angle.py

Lines changed: 1 addition & 1 deletion
@@ -1089,7 +1089,7 @@ def fit(self, X, y):
 method=self.method, verbose=max(0, self.verbose - 1),
 normalize=self.normalize, fit_intercept=self.fit_intercept,
 max_iter=self.max_iter, eps=self.eps, positive=self.positive)
-for train, test in cv)
+for train, test in cv.split(X, y))
 all_alphas = np.concatenate(list(zip(*cv_paths))[0])
 # Unique also sorts
 all_alphas = np.unique(all_alphas)
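
Note on the `for train, test in cv.split(X, y)` change above: a minimal sketch of the model_selection interface, assuming a recent scikit-learn release. The data is illustrative and not taken from the commit. The splitter is constructed without the data, and the index pairs come from calling `split` on it.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative data (not from the commit): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# The splitter only stores its configuration.
cv = KFold(n_splits=5)

# Index arrays are produced when split() is given the data.
folds = [(train, test) for train, test in cv.split(X, y)]

# Every sample should appear in exactly one test fold.
all_test = np.sort(np.concatenate([test for _, test in folds]))
```

Because the data is supplied at `split` time, the same `cv` object can be reused across datasets of different sizes.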

sklearn/linear_model/logistic.py

Lines changed: 1 addition & 1 deletion
@@ -1312,7 +1312,7 @@ class LogisticRegressionCV(LogisticRegression, BaseEstimator,
 cv : integer or cross-validation generator
 The default cross-validation generator used is Stratified K-Folds.
 If an integer is provided, then it is the number of folds used.
-See the module :mod:`sklearn.model_selection.split` module for the
+See the module :mod:`sklearn.model_selection` module for the
 list of possible cross-validation objects.

 penalty : str, 'l1' or 'l2'

sklearn/linear_model/tests/test_least_angle.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 import numpy as np
 from scipy import linalg

-from sklearn.cross_validation import train_test_split
+from sklearn.model_selection import train_test_split
 from sklearn.utils.testing import assert_array_almost_equal
 from sklearn.utils.testing import assert_true
 from sklearn.utils.testing import assert_less

sklearn/linear_model/tests/test_logistic.py

Lines changed: 11 additions & 4 deletions
@@ -1,4 +1,3 @@
-
 import numpy as np
 import scipy.sparse as sp
 from scipy import linalg, optimize, sparse
@@ -455,16 +454,24 @@ def test_ovr_multinomial_iris():
 train, target = iris.data, iris.target
 n_samples, n_features = train.shape

-# Use pre-defined fold as folds generated for different y
+# The cv indices from stratified kfold (where stratification is done based
+# on the fine-grained iris classes, i.e, before the classes 0 and 1 are
+# conflated) is used for both clf and clf1
 cv = StratifiedKFold(3)
-clf = LogisticRegressionCV(cv=cv)
+precomputed_folds = list(cv.split(train, target))
+
+# Train clf on the original dataset where classes 0 and 1 are separated
+clf = LogisticRegressionCV(cv=precomputed_folds)
 clf.fit(train, target)

-clf1 = LogisticRegressionCV(cv=cv)
+# Conflate classes 0 and 1 and train clf1 on this modifed dataset
+clf1 = LogisticRegressionCV(cv=precomputed_folds)
 target_copy = target.copy()
 target_copy[target_copy == 0] = 1
 clf1.fit(train, target_copy)

+# Ensure that what OvR learns for class2 is same regardless of whether
+# classes 0 and 1 are separated or not
 assert_array_almost_equal(clf.scores_[2], clf1.scores_[2])
 assert_array_almost_equal(clf.intercept_[2:], clf1.intercept_)
 assert_array_almost_equal(clf.coef_[2][np.newaxis, :], clf1.coef_)
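
Note on the precomputed folds in test_ovr_multinomial_iris: a minimal sketch of the same idea on made-up data, assuming a recent scikit-learn release. The toy arrays and the use of DummyClassifier with cross_val_score are illustrative choices, not what the test does. Materializing the splits as a list freezes the indices, so they no longer depend on whichever target is passed later.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustrative data (not from the commit): 3 classes, 4 samples each.
X = np.arange(24).reshape(12, 2)
y = np.repeat([0, 1, 2], 4)

# Stratify on the fine-grained labels and freeze the result as a list.
cv = StratifiedKFold(3)
precomputed_folds = list(cv.split(X, y))

# Conflate classes 0 and 1, as the test does with the iris targets.
y_merged = y.copy()
y_merged[y_merged == 0] = 1

# The same index pairs are reused for the modified target, because an
# iterable of (train, test) pairs is accepted wherever cv is expected.
scores = cross_val_score(DummyClassifier(strategy="most_frequent"),
                         X, y_merged, cv=precomputed_folds)

covered = np.sort(np.concatenate([test for _, test in precomputed_folds]))
```

Passing the `StratifiedKFold` object itself would instead re-stratify on the target given at fit time, so the two classifiers could end up with different folds.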
