[MRG+1] ENH/MNT Rename labels --> groups in CV tools (#6660) · raghavrv/scikit-learn@9a12555 · GitHub

Commit 9a12555

raghavrv authored and ogrisel committed
[MRG+1] ENH/MNT Rename labels --> groups in CV tools (scikit-learn#6660)
1 parent 8da5092 commit 9a12555

10 files changed: 618 additions and 528 deletions

doc/modules/cross_validation.rst

Lines changed: 186 additions & 141 deletions
Large diffs are not rendered by default.

doc/tutorial/statistical_inference/model_selection.rst

Lines changed: 8 additions & 8 deletions
@@ -110,7 +110,7 @@ scoring method.

     - :class:`StratifiedKFold` **(n_iter, test_size, train_size, random_state)**

-    - :class:`LabelKFold` **(n_splits, shuffle, random_state)**
+    - :class:`GroupKFold` **(n_splits, shuffle, random_state)**


    *
@@ -119,7 +119,7 @@ scoring method.

     - Same as K-Fold but preserves the class distribution within each fold.

-    - Ensures that the same label is not in both testing and training sets.
+    - Ensures that the same group is not in both testing and training sets.


 .. list-table::
@@ -130,34 +130,34 @@ scoring method.

     - :class:`StratifiedShuffleSplit`

-    - :class:`LabelShuffleSplit`
+    - :class:`GroupShuffleSplit`

    *

     - Generates train/test indices based on random permutation.

     - Same as shuffle split but preserves the class distribution within each iteration.

-    - Ensures that the same label is not in both testing and training sets.
+    - Ensures that the same group is not in both testing and training sets.


 .. list-table::

    *

-    - :class:`LeaveOneLabelOut` **()**
+    - :class:`LeaveOneGroupOut` **()**

-    - :class:`LeavePLabelOut` **(p)**
+    - :class:`LeavePGroupsOut` **(p)**

     - :class:`LeaveOneOut` **()**



    *

-    - Takes a label array to group observations.
+    - Takes a group array to group observations.

-    - Leave P labels out.
+    - Leave P groups out.

     - Leave one observation out.
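
The guarantee described in the table above ("the same group is not in both testing and training sets") can be illustrated with a minimal pure-Python sketch. This is not scikit-learn's implementation: the function name `leave_one_group_out` and the toy `groups` list are hypothetical, chosen only to show the invariant that group-aware splitters such as `LeaveOneGroupOut` are meant to uphold.

```python
def leave_one_group_out(groups):
    """Yield (train_indices, test_indices), holding out one group per split.

    Illustrative sketch only; not scikit-learn's code.
    """
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield train, test


# Hypothetical group assignment: one entry per sample.
groups = ["a", "a", "b", "b", "b", "c"]
splits = list(leave_one_group_out(groups))

for train, test in splits:
    train_groups = {groups[i] for i in train}
    test_groups = {groups[i] for i in test}
    # No group may appear on both sides of a split.
    assert train_groups.isdisjoint(test_groups)
```

Each distinct value in `groups` becomes the test set exactly once, so the number of splits equals the number of distinct groups.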

doc/whats_new.rst

Lines changed: 48 additions & 9 deletions
@@ -64,16 +64,41 @@ Model Selection Enhancements and API Changes
 - **Parameters ``n_folds`` and ``n_iter`` renamed to ``n_splits``**

   Some parameter names have changed:
-  The ``n_folds`` parameter in :class:`model_selection.KFold`,
-  :class:`model_selection.LabelKFold`, and
-  :class:`model_selection.StratifiedKFold` is now renamed to ``n_splits``.
-  The ``n_iter`` parameter in :class:`model_selection.ShuffleSplit`,
-  :class:`model_selection.LabelShuffleSplit`,
-  and :class:`model_selection.StratifiedShuffleSplit` is now renamed
-  to ``n_splits``.
+  The ``n_folds`` parameter in new :class:`model_selection.KFold`,
+  :class:`model_selection.GroupKFold` (see below for the name change),
+  and :class:`model_selection.StratifiedKFold` is now renamed to
+  ``n_splits``. The ``n_iter`` parameter in
+  :class:`model_selection.ShuffleSplit`, the new class
+  :class:`model_selection.GroupShuffleSplit` and
+  :class:`model_selection.StratifiedShuffleSplit` is now renamed to
+  ``n_splits``.
+
+- **Rename of splitter classes which accept group labels along with data**
+
+  The cross-validation splitters ``LabelKFold``,
+  ``LabelShuffleSplit``, ``LeaveOneLabelOut`` and ``LeavePLabelOut`` have
+  been renamed to :class:`model_selection.GroupKFold`,
+  :class:`model_selection.GroupShuffleSplit`,
+  :class:`model_selection.LeaveOneGroupOut` and
+  :class:`model_selection.LeavePGroupsOut` respectively.
+
+  NOTE the change from singular to plural form in
+  :class:`model_selection.LeavePGroupsOut`.
+
+- **Fit parameter ``labels`` renamed to ``groups``**
+
+  The ``labels`` parameter in the :func:`split` method of the newly renamed
+  splitters :class:`model_selection.GroupKFold`,
+  :class:`model_selection.LeaveOneGroupOut`,
+  :class:`model_selection.LeavePGroupsOut`,
+  :class:`model_selection.GroupShuffleSplit` is renamed to ``groups``
+  following the new nomenclature of their class names.
+
+- **Parameter ``n_labels`` renamed to ``n_groups``**
+
+  The parameter ``n_labels`` in the newly renamed
+  :class:`model_selection.LeavePGroupsOut` is changed to ``n_groups``.

-Changelog
----------

 New features
 ............
@@ -464,6 +489,20 @@ API changes summary
   :func:`metrics.classification.hamming_loss`.
   (`#7260 <https://github.com/scikit-learn/scikit-learn/pull/7260>`_) by
   `Sebastián Vanrell`_.
+
+- The splitter classes ``LabelKFold``, ``LabelShuffleSplit``,
+  ``LeaveOneLabelOut`` and ``LeavePLabelOut`` are renamed to
+  :class:`model_selection.GroupKFold`,
+  :class:`model_selection.GroupShuffleSplit`,
+  :class:`model_selection.LeaveOneGroupOut`
+  and :class:`model_selection.LeavePGroupsOut` respectively.
+  Also the parameter ``labels`` in the :func:`split` method of the newly
+  renamed splitters :class:`model_selection.LeaveOneGroupOut` and
+  :class:`model_selection.LeavePGroupsOut` is renamed to
+  ``groups``. Additionally in :class:`model_selection.LeavePGroupsOut`,
+  the parameter ``n_labels`` is renamed to ``n_groups``.
+  (`#6660 <https://github.com/scikit-learn/scikit-learn/pull/6660>`_)
+  by `Raghav RV`_.


 .. currentmodule:: sklearn
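
The class renames listed in this changelog can be captured as a small lookup table. The sketch below is a hypothetical migration helper, not part of scikit-learn: the names `CLASS_RENAMES` and `rename_splitters` are assumptions made for illustration. It rewrites only the four class names and deliberately leaves the keyword renames (``labels`` to ``groups``, ``n_labels`` to ``n_groups``) alone, since those words are too generic to replace safely with a regular expression.

```python
import re

# Old splitter name -> new splitter name, as listed in the changelog entry.
CLASS_RENAMES = {
    "LabelKFold": "GroupKFold",
    "LabelShuffleSplit": "GroupShuffleSplit",
    "LeaveOneLabelOut": "LeaveOneGroupOut",
    "LeavePLabelOut": "LeavePGroupsOut",
}

# Match any old name as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CLASS_RENAMES) + r")\b"
)


def rename_splitters(source):
    """Return `source` with old splitter class names replaced (hypothetical helper)."""
    return _PATTERN.sub(lambda match: CLASS_RENAMES[match.group(1)], source)


old = "from sklearn.model_selection import LabelKFold, LeavePLabelOut"
new = rename_splitters(old)
```

Names that are not in the table, such as `KFold` or `LeaveOneOut`, pass through unchanged because the pattern only matches the four old identifiers as whole words.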

sklearn/model_selection/__init__.py

Lines changed: 8 additions & 8 deletions
@@ -1,14 +1,14 @@
 from ._split import BaseCrossValidator
 from ._split import KFold
-from ._split import LabelKFold
+from ._split import GroupKFold
 from ._split import StratifiedKFold
 from ._split import TimeSeriesSplit
-from ._split import LeaveOneLabelOut
+from ._split import LeaveOneGroupOut
 from ._split import LeaveOneOut
-from ._split import LeavePLabelOut
+from ._split import LeavePGroupsOut
 from ._split import LeavePOut
 from ._split import ShuffleSplit
-from ._split import LabelShuffleSplit
+from ._split import GroupShuffleSplit
 from ._split import StratifiedShuffleSplit
 from ._split import PredefinedSplit
 from ._split import train_test_split
@@ -30,11 +30,11 @@
            'GridSearchCV',
            'TimeSeriesSplit',
            'KFold',
-           'LabelKFold',
-           'LabelShuffleSplit',
-           'LeaveOneLabelOut',
+           'GroupKFold',
+           'GroupShuffleSplit',
+           'LeaveOneGroupOut',
            'LeaveOneOut',
-           'LeavePLabelOut',
+           'LeavePGroupsOut',
            'LeavePOut',
            'ParameterGrid',
            'ParameterSampler',

sklearn/model_selection/_search.py

Lines changed: 10 additions & 10 deletions
@@ -528,15 +528,15 @@ def inverse_transform(self, Xt):
         self._check_is_fitted('inverse_transform')
         return self.best_estimator_.transform(Xt)

-    def _fit(self, X, y, labels, parameter_iterable):
+    def _fit(self, X, y, groups, parameter_iterable):
         """Actual fitting, performing the search over parameters."""

         estimator = self.estimator
         cv = check_cv(self.cv, y, classifier=is_classifier(estimator))
         self.scorer_ = check_scoring(self.estimator, scoring=self.scoring)

-        X, y, labels = indexable(X, y, labels)
-        n_splits = cv.get_n_splits(X, y, labels)
+        X, y, groups = indexable(X, y, groups)
+        n_splits = cv.get_n_splits(X, y, groups)
         if self.verbose > 0 and isinstance(parameter_iterable, Sized):
             n_candidates = len(parameter_iterable)
             print("Fitting {0} folds for each of {1} candidates, totalling"
@@ -554,7 +554,7 @@ def _fit(self, X, y, labels, parameter_iterable):
                                   self.fit_params, return_parameters=True,
                                   error_score=self.error_score)
                 for parameters in parameter_iterable
-                for train, test in cv.split(X, y, labels))
+                for train, test in cv.split(X, y, groups))

         test_scores, test_sample_counts, _, parameters = zip(*out)

@@ -876,7 +876,7 @@ def __init__(self, estimator, param_grid, scoring=None, fit_params=None,
         self.param_grid = param_grid
         _check_param_grid(param_grid)

-    def fit(self, X, y=None, labels=None):
+    def fit(self, X, y=None, groups=None):
         """Run fit with all sets of parameters.

         Parameters
@@ -890,11 +890,11 @@ def fit(self, X, y=None, labels=None):
             Target relative to X for classification or regression;
             None for unsupervised learning.

-        labels : array-like, with shape (n_samples,), optional
+        groups : array-like, with shape (n_samples,), optional
             Group labels for the samples used while splitting the dataset into
             train/test set.
         """
-        return self._fit(X, y, labels, ParameterGrid(self.param_grid))
+        return self._fit(X, y, groups, ParameterGrid(self.param_grid))


 class RandomizedSearchCV(BaseSearchCV):
@@ -1104,7 +1104,7 @@ def __init__(self, estimator, param_distributions, n_iter=10, scoring=None,
             n_jobs=n_jobs, iid=iid, refit=refit, cv=cv, verbose=verbose,
             pre_dispatch=pre_dispatch, error_score=error_score)

-    def fit(self, X, y=None, labels=None):
+    def fit(self, X, y=None, groups=None):
         """Run fit on the estimator with randomly drawn parameters.

         Parameters
@@ -1117,11 +1117,11 @@ def fit(self, X, y=None, labels=None):
             Target relative to X for classification or regression;
             None for unsupervised learning.

-        labels : array-like, with shape (n_samples,), optional
+        groups : array-like, with shape (n_samples,), optional
             Group labels for the samples used while splitting the dataset into
             train/test set.
         """
         sampled_params = ParameterSampler(self.param_distributions,
                                           self.n_iter,
                                           random_state=self.random_state)
-        return self._fit(X, y, labels, sampled_params)
+        return self._fit(X, y, groups, sampled_params)
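
A short usage sketch may help show what the renamed `groups` argument looks like from the caller's side. It assumes a scikit-learn release that already contains this rename; the toy arrays, the choice of `DecisionTreeClassifier`, and the parameter grid are made up for illustration and do not come from the commit.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumed values): 12 samples in 4 groups of 3 samples each.
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
groups = np.repeat([0, 1, 2, 3], 3)

cv = GroupKFold(n_splits=4)

# Each split keeps every group entirely on one side.
for train_idx, test_idx in cv.split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2]},
    cv=cv,
)
# `groups` is passed by keyword and forwarded to the splitter's split method.
search.fit(X, y, groups=groups)
n_candidates = len(search.cv_results_["params"])
```

Passing `groups` by keyword rather than by position keeps the call readable and makes the old `labels=` spelling fail loudly instead of being silently misinterpreted.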
