[MRG + 2] ShuffleLabelsOut cross-validation iterator #4583

bmcfee · 2015-04-13T15:47:41Z

This PR implements a stochastic variant of leave-p-labels-out CV splitting. Rather than iterating over all (n_labels choose p) subsets of labels to form the test set, a user-specified number (or fraction) of the labels is randomly selected for testing. The number of iterations is also user-specified, as in ShuffleSplit.
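
A minimal pure-Python sketch of the splitting scheme described above. The function name `shuffle_labels_out` is hypothetical and this is not the PR's implementation; it assumes `test_size` counts unique labels (an int is a number of labels, a float a fraction of them, rounded up). Because labels carry different numbers of samples, the test set size in samples varies between iterations.

```python
import math
import random


def shuffle_labels_out(labels, n_iter=5, test_size=0.2, random_state=None):
    """Yield (train, test) index lists; samples sharing a label stay together.

    Hypothetical sketch: test_size counts unique labels (int = number of
    labels, float = fraction of labels, rounded up), not samples.
    """
    rng = random.Random(random_state)
    unique_labels = sorted(set(labels))
    n_labels = len(unique_labels)
    if isinstance(test_size, float):
        n_test = int(math.ceil(test_size * n_labels))
    else:
        n_test = int(test_size)
    if not 0 < n_test < n_labels:
        raise ValueError("test_size must select between 1 and n_labels - 1 labels")
    for _ in range(n_iter):
        # Draw a fresh random subset of labels on every iteration,
        # the way ShuffleSplit draws a fresh subset of samples.
        test_labels = set(rng.sample(unique_labels, n_test))
        train = [i for i, lab in enumerate(labels) if lab not in test_labels]
        test = [i for i, lab in enumerate(labels) if lab in test_labels]
        yield train, test


artists = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
for train, test in shuffle_labels_out(artists, n_iter=3, test_size=2, random_state=0):
    print(sorted({artists[i] for i in test}), test)
```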

This is similar to #4444, except that no balance is maintained at the level of individual samples.

I'm not sure whether another PR already implements exactly this functionality, but I couldn't find one. If I missed it and someone has already implemented this, feel free to ignore this one!

A realistic use case for this is music information retrieval, where it is common to have a collection of recordings indexed by artist. Each artist has a small number of data points, and all of an artist's points should reside on one side of the train/test split. In this setting the number of labels is large, so an exhaustive LeavePLabelOut search is undesirable.
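
To make the cost concrete: an exhaustive leave-p-labels-out scheme produces one split per p-subset of labels, i.e. the binomial coefficient C(n_labels, p). The sketch below (Python 3.8+ for `math.comb`) just computes that count; the helper name and the label counts are illustrative assumptions, not figures from the PR. A randomized scheme instead runs a fixed number of iterations regardless of n_labels.

```python
from math import comb


def n_exhaustive_splits(n_labels, p):
    """Number of train/test splits an exhaustive leave-p-labels-out CV visits."""
    return comb(n_labels, p)


# Illustrative label counts (e.g. number of artists), holding out p = 10 labels.
for n_labels in (20, 100, 500):
    print(n_labels, n_exhaustive_splits(n_labels, 10))
```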

merge downstream

amueller · 2015-04-13T19:25:32Z

sklearn/cross_validation.py

+    The difference between LeavePLabelOut and ShuffleLabelsOut is that
+    the former generates splits using all subsets of size `p` unique labels,
+    whereas ShuffleLabelsOut generates a user-determined number of random
+    test splits, each with `p` unique labels.


There is no p in the parametrization, right?

amueller · 2015-04-13T19:25:56Z

Can you please remove the "merged master" commit from the PR?

ogrisel · 2015-04-14T19:23:52Z

The easiest way to do that is probably to create a new branch off master and then cherry-pick the non-merge commits of your current shuffle-labels-out branch into that new branch.

If the result of that operation looks good on your local host (the tests still pass and you did not lose any code), please feel free to `git push --force bmcfee shuffle-labels-out` to update this PR with the code from your new branch.

BigCrunsh · 2015-05-12T06:20:58Z

@bmcfee: thx for implementing that. I think this is really useful. However, I discovered a bug:

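    >>> import numpy as np
    >>> from sklearn.cross_validation import ShuffleLabelsOut  # as added in this PR
    >>> ShuffleLabelsOut(np.array([1, 2, 3]))
    AttributeError: 'ShuffleLabelsOut' object has no attribute 'y'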

The reason is that you forgot to initialize `self.y`, which is used in `__repr__`. It's an easy fix ;)
Btw, it would be nice if you could add tests that just call all the new functions.
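
A minimal sketch of the fix, using a hypothetical stand-in class (`LabelSplitterSketch`, not the PR's code) whose parameters follow the constructor signature quoted in the review below, with `labels` in place of `y`. The point is that every attribute `__repr__` reads is assigned in `__init__`; the last line is the kind of "just call it" smoke test suggested above.

```python
class LabelSplitterSketch(object):
    """Hypothetical stand-in for ShuffleLabelsOut, showing only the repr fix."""

    def __init__(self, labels, n_iter=5, test_size=0.2, train_size=None,
                 random_state=None):
        # Store every argument that __repr__ reads. The reported
        # AttributeError came from __repr__ reading an attribute (self.y)
        # that __init__ never assigned.
        self.labels = labels
        self.n_iter = n_iter
        self.test_size = test_size
        self.train_size = train_size
        self.random_state = random_state

    def __repr__(self):
        return ("%s(labels=%r, n_iter=%d, test_size=%r, train_size=%r, "
                "random_state=%r)" % (type(self).__name__, self.labels,
                                      self.n_iter, self.test_size,
                                      self.train_size, self.random_state))


# Smoke test: constructing the object and calling repr() must not raise.
print(repr(LabelSplitterSketch([1, 2, 3])))
```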

amueller · 2015-05-12T15:53:50Z

sklearn/cross_validation.py

+        Pseudo-random number generator state used for random sampling.
+    '''
+
+    def __init__(self, y, n_iter=5, test_size=0.2, train_size=None,


I feel this shouldn't be called `y`, but rather `labels`.

amueller · 2015-06-23T15:20:54Z

LGTM.

vene · 2015-06-23T16:03:10Z

sklearn/tests/test_cross_validation.py

+                                    random_state=0)
+
+        # Make sure the repr works
+        repr(slo)


At some point this should be in a common test for all cross-validation generators, but this is not the place.

vene · 2015-06-23T16:23:08Z

The implementation looks good to me. I was just thinking that the _iter_indices implementation could be reused to provide a LabelKFold as well, but I don't think that's important.

Could you emphasize that train_size and test_size refer to labels and not samples? I would add something like this to the top part of the docstring:

For example, a less computationally intensive alternative to LeavePLabelOut(labels, p=10) would be ShuffleLabelsOut(labels, test_size=10, n_iter=100).

Note: the parameters test_size and train_size refer to labels, not to samples as in ShuffleSplit.
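
To illustrate the label-versus-sample distinction, here is a hypothetical helper (`resolve_label_counts`, not part of the PR) that turns test_size/train_size into numbers of labels. It assumes ShuffleSplit-style conventions: a float is a fraction (rounded up for the test side, down for the train side), an int is an absolute count, and train_size=None means "all remaining labels". With, say, 50 distinct labels, test_size=10 holds out 10 labels, however many samples they carry.

```python
import math


def resolve_label_counts(n_labels, test_size=0.2, train_size=None):
    """Return (n_train_labels, n_test_labels) for a label-based shuffle split.

    Hypothetical helper: sizes count unique labels, never samples.
    A float is a fraction of n_labels, an int is an absolute count,
    and train_size=None means "every label not used for testing".
    """
    if isinstance(test_size, float):
        n_test = int(math.ceil(test_size * n_labels))  # round test side up
    else:
        n_test = int(test_size)
    if train_size is None:
        n_train = n_labels - n_test
    elif isinstance(train_size, float):
        n_train = int(math.floor(train_size * n_labels))  # round train side down
    else:
        n_train = int(train_size)
    if n_test < 1 or n_train < 1 or n_train + n_test > n_labels:
        raise ValueError("cannot split %d labels into %d train / %d test labels"
                         % (n_labels, n_train, n_test))
    return n_train, n_test


# With 50 distinct labels (illustrative), test_size=10 holds out 10 labels.
print(resolve_label_counts(50, test_size=10))  # (40, 10)
```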

Also, could you add this to /doc/whats_new.rst and to the user's guide (/doc/modules/cross_validation.rst)?

Thanks!

ogrisel · 2015-07-02T07:12:53Z

@bmcfee could you please rebase on top of master and reference this new class in doc/modules/classes.rst? Please also cross-reference related classes in docstrings using a "See also" section as done in #4444.

ogrisel · 2015-07-03T14:10:56Z

Also, I wonder whether we should rename this class ShuffleDisjointLabelSplit, to be more in line with the new DisjointLabelKFold class while staying consistent with the basic ShuffleSplit alternative.

amueller · 2015-07-12T21:35:57Z

LabelShuffleSplit?

jnothman · 2015-08-03T03:48:10Z

I'm looking for something along the lines of this or DisjointLabelKFold atm. I think we need some clarity for ourselves as to when each is the right choice.

As for this particular proposal, I think users often find themselves expecting this functionality from LeavePLabelOut. At a minimum, the "see also" section needs to be emphasised. At the other extreme, we could incorporate this functionality into LeavePLabelOut, with n_iters='all' giving the current behaviour.
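
A rough sketch of what that merged interface could look like, written as a plain function with hypothetical names (`leave_p_labels_out`, `n_iter="all"`); it is not an existing scikit-learn API. With `n_iter="all"` it enumerates every p-subset of labels exhaustively; with an int it draws that many random p-subsets. Independent draws mean the same subset can occur more than once, a design choice a real implementation might want to revisit.

```python
import itertools
import random


def leave_p_labels_out(labels, p, n_iter="all", random_state=None):
    """Yield (train, test) index lists with p held-out labels per split.

    Hypothetical API: n_iter="all" enumerates every p-subset of labels
    (exhaustive, like LeavePLabelOut); an int draws that many random
    p-subsets instead (like ShuffleLabelsOut).
    """
    unique_labels = sorted(set(labels))
    if n_iter == "all":
        subsets = itertools.combinations(unique_labels, p)
    else:
        rng = random.Random(random_state)
        # Each draw is independent, so the same subset can appear twice.
        subsets = (rng.sample(unique_labels, p) for _ in range(n_iter))
    for subset in subsets:
        test_labels = set(subset)
        train = [i for i, lab in enumerate(labels) if lab not in test_labels]
        test = [i for i, lab in enumerate(labels) if lab in test_labels]
        yield train, test


labels = ["x", "x", "y", "z", "z", "w"]
print(len(list(leave_p_labels_out(labels, 2))))  # exhaustive: C(4, 2) = 6 splits
```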

jnothman · 2015-08-03T03:49:22Z

doc/whats_new.rst

@@ -13,6 +13,10 @@ Changelog
 New features
 ............

+   - :class:`cross_validation.ShuffleLabelsOut` generates random train-test splits,
+     similar to `ShuffleSplit`, except that the splits are conditioned on a label array.


Use :class:

amueller · 2015-08-03T16:11:34Z

I'd be ok with just adding more to the "see also" section (for now?)

glouppe · 2015-08-30T09:26:43Z

I'll squash and fix this last comment myself.

FedericoV · 2015-08-30T09:45:25Z

+1 - this looks really good, and it's a simple functionality that was missing.

glouppe · 2015-08-30T10:00:57Z

Merged by rebase as 21a966a

choldgraf · 2015-08-30T15:03:05Z

Great, now I can stop using my hacky workaround to this :) thanks so much!

amueller · 2015-08-31T18:10:40Z

🍻 @bmcfee :)

bmcfee · 2015-08-31T18:15:30Z

🎉 thanks for merging!

--------------------
* ENH Reorganize classes/fn from grid_search into search.py
* ENH Reorganize classes/fn from cross_validation into split.py
* ENH Reorganize cls/fn from cross_validation/learning_curve into validate.py
* MAINT Merge _check_cv into check_cv inside the model_selection module
* MAINT Update all the imports to point to the model_selection module
* FIX use iter_cv to iterate through the new style/old style cv objs
* TST Add tests for the new model_selection members
* ENH Wrap the old-style cv obj/iterables instead of using iter_cv
* ENH Use scipy's binomial coefficient function comb for calculation of nCk
* ENH Few enhancements to the split module
* ENH Improve check_cv input validation and docstring
* MAINT _get_test_folds(X, y, labels) --> _get_test_folds(labels)
* TST if 1d arrays for X introduce any errors
* ENH use 1d X arrays for all tests;
* ENH X_10 --> X (global var)

Minor
-----
* ENH _PartitionIterator --> _BaseCrossValidator;
* ENH CVIterator --> CVIterableWrapper
* TST Import the old SKF locally
* FIX/TST Clean up the split module's tests.
* DOC Improve documentation of the cv parameter
* COSMIT consistently hyphenate cross-validation/cross-validator
* TST Calculate n_samples from X
* COSMIT Use separate lines for each import.
* COSMIT cross_validation_generator --> cross_validator

Commits merged manually
-----------------------
* FIX Document the random_state attribute in RandomSearchCV
* MAINT Use check_cv instead of _check_cv
* ENH refactor OVO decision function, use it in SVC for sklearn-like decision_function shape
* FIX avoid memory cost when sampling from large parameter grids

ENH Major to Minor incremental enhancements to the model_selection

Squashed commit messages - (For reference)

Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name) To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris: pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
    - From #5161
        - MAINT remove redundant p variable
        - Add check for sparse prediction in cross_val_predict
    - From #5201 - DOC improve random_state param doc
    - From #5190 - LabelKFold and test
    - From #4583 - LabelShuffleSplit and tests
    - From #5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
    - From #5378 - Make the GridSearchCV docs more accurate.
    - From #5458 - Remove shuffle from LabelKFold
    - From #5466(#4270) - Gaussian Process by Jan Metzen
    - From #4826 - Move custom error / warnings into sklearn.exception

Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc

FIX/DOC/MAINT Addressing the review comments by Arnaud and Andy
COSMIT Sort the members alphabetically
COSMIT len_cv --> n_splits
COSMIT Merge 2 if; FIX Use kwargs
DOC Add my name to the authors :D
DOC make labels parameter consistent
FIX Remove hack for boolean indices; + COSMIT idx --> indices; DOC Add Returns
COSMIT preds --> predictions
DOC Add Returns and neatly arrange X, y, labels
FIX idx(s)/ind(s)--> indice(s)
COSMIT Merge if and else to elif
COSMIT n --> n_samples
COSMIT Use bincount only once
COSMIT cls --> class_i / class_i (ith class indices) --> perm_indices_class_i
FIX/ENH/TST Addressing the final reviews
COSMIT c --> count
FIX/TST make check_cv raise ValueError for string cv value
TST nested cv (gs inside cross_val_score) works for diff cvs
FIX/ENH Raise ValueError when labels is None for label based cvs; TST if labels is being passed correctly to the cv and that the ValueError is being propagated to the cross_val_score/predict and grid search
FIX pass labels to cross_val_score
FIX use make_classification
DOC Add Returns; COSMIT Remove scaffolding
TST add a test to check the _build_repr helper
REVERT the old GS/RS should also be tested by the common tests.
ENH Add a tuple of all/label based CVS
FIX raise VE even at get_n_splits if labels is None
FIX Fabian's comments
PEP8

Merge pull request #1 from scikit-learn/master

693f425

merge downstream

amueller reviewed Apr 13, 2015

bmcfee added 4 commits April 15, 2015 14:11

added ShuffleLabelsOut cv iterator

ebfea02

fixed tests for shufflelabelsout

36c9c3d

updated docstring

3493ec7

Fixed an error in call to the super constructor

62e1996

BigCrunsh mentioned this pull request May 12, 2015

LabelSegmentedKFold cross-validation iterator #4709

Closed

amueller reviewed May 12, 2015

bmcfee added 2 commits May 12, 2015 19:41

fixed repr, variable names in ShuffleLabelsOut

ba7f81e

added length and repr tests to ShuffleLabelsOut

a0a2764

amueller changed the title ~~ShuffleLabelsOut cross-validation iterator~~ [MRG + 1] ShuffleLabelsOut cross-validation iterator Jun 23, 2015

vene reviewed Jun 23, 2015

added documentation for ShuffleLabelsOut

0030d32

amueller mentioned this pull request Jul 1, 2015

[MRG + 1] Added DisjointLabelKFold to perform K-Fold cv on sets with disjoint labels. #4444

Closed

jnothman reviewed Aug 3, 2015

glouppe closed this Aug 30, 2015

raghavrv mentioned this pull request Sep 10, 2015

[MRG+1] Make cross-validators data independent + Reorganize grid_search, cross_validation and learning_curve into model_selection #4294

Merged

24 tasks

raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Sep 13, 2015

From scikit-learn#4583 - LabelShuffleSplit and tests

ac70abe

raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Sep 14, 2015

From scikit-learn#4583 - LabelShuffleSplit and tests

36cd441

raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Oct 5, 2015

From scikit-learn#4583 - LabelShuffleSplit and tests

c7354ed

JeanKossaifi mentioned this pull request Oct 13, 2015

LabelKFold: shuffling and preserving original order #5390

Closed
