Nested CV of LeaveOneGroupOut fails in permutation_test_score · Issue #8127 · scikit-learn/scikit-learn · GitHub

Closed
oesteban opened this issue Dec 28, 2016 · 2 comments · Fixed by #27058

oesteban commented Dec 28, 2016

Description

Nested CV of LeaveOneGroupOut fails in permutation_test_score

Steps/Code to Reproduce

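from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV, permutation_test_score

# clf_object, clf_params, sample_x and labels_y are placeholders that the report does not define.
folds_groups = ['group1', 'group2', 'group3']
# Inner loop: hyperparameter search that itself splits by group.
clf = GridSearchCV(clf_object, clf_params, cv=LeaveOneGroupOut())
# Outer loop: permutation test, also split by group.
perm_res = permutation_test_score(
    clf, sample_x, labels_y, scoring='accuracy', cv=LeaveOneGroupOut(),
    n_permutations=5000, groups=folds_groups,
    n_jobs=-1)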

Expected Results

With N the total number of groups, each outer fold holds out one group and keeps N-1 groups for training. Within those N-1 groups, the inner cross-validation trains on every combination of N-2 groups and validates on the remaining one. The outer permutation test loop then trains the best classifier found by the inner loop on the N-1 groups and evaluates it on the left-out group.
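The scheme described above can be written as an explicit outer loop that hands each inner search only the group labels of its own training samples. This is a minimal sketch, not the code from the report: the synthetic data, the LogisticRegression estimator, and the two-value C grid are all assumptions made for illustration, and the permutation step is left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

# Synthetic stand-ins (assumptions): 60 samples, 4 groups of 15 samples each.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)
groups = np.repeat(np.arange(4), 15)  # one group label per sample

outer_cv = LeaveOneGroupOut()
outer_scores = []
inner_split_counts = []

for train, test in outer_cv.split(X, y, groups):
    # Inner loop: leave-one-group-out over the N-1 training groups.
    search = GridSearchCV(
        LogisticRegression(),
        {"C": [0.1, 1.0]},
        cv=LeaveOneGroupOut(),
        scoring="accuracy",
    )
    # The inner splitter needs the group labels of the training samples only.
    search.fit(X[train], y[train], groups=groups[train])
    inner_split_counts.append(search.n_splits_)
    # Outer evaluation: the refitted best estimator scored on the held-out group.
    outer_scores.append(search.score(X[test], y[test]))

print(inner_split_counts)
print(np.mean(outer_scores))
```

With four groups, each inner search runs three splits, one per remaining training group, which matches the N-1 and N-2 counts in the description.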

Actual Results

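The groups are not passed to the inner CV loop: permutation_test_score calls estimator.fit(X[train], y[train]) without them, so the inner LeaveOneGroupOut raises a ValueError.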

Traceback (most recent call last):
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/bin/mriqc_fit", line 9, in <module>
    load_entry_point('mriqc', 'console_scripts', 'mriqc_fit')()
  File "/home/oesteban/workspace/mriqc/mriqc/classifier/cli.py", line 95, in main
    cvhelper.fit(folds=folds)
  File "/home/oesteban/workspace/mriqc/mriqc/classifier/cv.py", line 181, in fit
    n_jobs=self.n_jobs)
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 609, in permutation_test_score
    score = _permutation_test_score(clone(estimator), X, y, groups, cv, scorer)
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 627, in _permutation_test_score
    estimator.fit(X[train], y[train])
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 543, in _fit
    n_splits = cv.get_n_splits(X, y, groups)
  File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 810, in get_n_splits
    raise ValueError("The groups parameter should not be None")
ValueError: The groups parameter should not be None
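The last frames of the traceback can be reproduced in isolation by fitting a grid search that uses LeaveOneGroupOut without supplying any groups. The sketch below uses synthetic data and a LogisticRegression estimator as assumptions; the exact wording of the error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

# Synthetic stand-ins (assumptions), not the data from the report.
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 3))
y = np.tile([0, 1], 15)

search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=LeaveOneGroupOut())

try:
    # No groups are given, mirroring the estimator.fit(X[train], y[train]) call
    # shown in the traceback.
    search.fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```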

Versions

Linux-4.4.0-53-generic-x86_64-with-debian-stretch-sid
('Python', '2.7.11 |Continuum Analytics, Inc.| (default, Dec  6 2015, 18:08:32) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.2')
('SciPy', '0.17.0')
('Scikit-Learn', '0.18.1')

Comments

Does this nested cross-validation scheme make sense? I would appreciate any comments on it.

In particular, I have a binary classification problem with ~1000 samples, split into ~20 groups of 20-300 samples each. I want the classifier to generalize well when a new, unseen group (20-300 samples) is received. Does that sound about right?

jnothman (Member) commented Dec 29, 2016 via email

@BurchamLab

Has the addition of groups been resolved for use in nested CV yet?
