Nested CV of LeaveOneGroupOut fails in permutation_test_score #8127
We're aware that we don't really have nested handling of groups yet. A grid
search within cross_val_score won't work either, I think. A more general
approach to attaching properties to samples (#4497) will be needed.
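The second point can be demonstrated with a minimal sketch (the synthetic data, LogisticRegression estimator, and parameter grid below are illustrative assumptions, not from the thread): cross_val_score forwards groups only to its own splitter, so the inner GridSearchCV never receives them and its LeaveOneGroupOut raises the same error as in the report.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = rng.randint(0, 2, 60)
groups = np.repeat([0, 1, 2], 20)  # one group label per sample

inner = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]},
                     cv=LeaveOneGroupOut())
try:
    # groups reaches the outer splitter only; the inner GridSearchCV
    # calls its LeaveOneGroupOut splitter with groups=None and fails.
    cross_val_score(inner, X, y, groups=groups, cv=LeaveOneGroupOut(),
                    error_score="raise")
except ValueError as err:
    print(err)  # the splitter complains that groups must not be None
```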
On 29 December 2016 at 00:15, Oscar Esteban wrote:
Description
Nested CV of LeaveOneGroupOut fails in permutation_test_score
Steps/Code to Reproduce
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV, permutation_test_score

# clf_object, clf_params, sample_x and labels_y are defined elsewhere;
# folds_groups must contain one group label per sample.
folds_groups = ['group1', 'group2', 'group3']
clf = GridSearchCV(clf_object, clf_params, cv=LeaveOneGroupOut())
perm_res = permutation_test_score(
    clf, sample_x, labels_y, scoring='accuracy', cv=LeaveOneGroupOut(),
    n_permutations=5000, groups=folds_groups, n_jobs=-1)
Expected Results
With N being the total number of groups, the inner cross-validation trains
on all possible combinations of N-2 groups and validates on the remaining
group. The outer permutation-test loop then trains the best classifier from
the inner search on N-1 groups and runs validation on the left-out group.
Actual Results
The inner CV loop is not passed the groups and raises a ValueError.
Traceback (most recent call last):
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/bin/mriqc_fit", line 9, in <module>
load_entry_point('mriqc', 'console_scripts', 'mriqc_fit')()
File "/home/oesteban/workspace/mriqc/mriqc/classifier/cli.py", line 95, in main
cvhelper.fit(folds=folds)
File "/home/oesteban/workspace/mriqc/mriqc/classifier/cv.py", line 181, in fit
n_jobs=self.n_jobs)
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 609, in permutation_test_score
score = _permutation_test_score(clone(estimator), X, y, groups, cv, scorer)
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 627, in _permutation_test_score
estimator.fit(X[train], y[train])
File "/hom
8000
e/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_search.py", line 543, in _fit
n_splits = cv.get_n_splits(X, y, groups)
File "/home/oesteban/miniconda2/envs/nipypedev-2.7/lib/python2.7/site-packages/sklearn/model_selection/_split.py", line 810, in get_n_splits
raise ValueError("The groups parameter should not be None")
ValueError: The groups parameter should not be None
Versions
Linux-4.4.0-53-generic-x86_64-with-debian-stretch-sid
('Python', '2.7.11 |Continuum Analytics, Inc.| (default, Dec 6 2015, 18:08:32) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.2')
('SciPy', '0.17.0')
('Scikit-Learn', '0.18.1')
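One workaround for the scheme described under Expected Results, sketched under assumptions (the synthetic data and LogisticRegression grid are illustrative, not from the report): GridSearchCV.fit accepts a groups argument directly, so the outer LeaveOneGroupOut loop can be written by hand, slicing groups alongside X and y.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = rng.randint(0, 2, 60)
groups = np.repeat([0, 1, 2], 20)  # one group label per sample

logo = LeaveOneGroupOut()
scores = []
for train, test in logo.split(X, y, groups):
    # The inner CV sees only the N-1 training groups; passing groups[train]
    # lets LeaveOneGroupOut split them into N-2 train / 1 validation group.
    grid = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]},
                        cv=LeaveOneGroupOut())
    grid.fit(X[train], y[train], groups=groups[train])
    scores.append(grid.score(X[test], y[test]))
# scores holds one outer estimate per held-out group
```

Each outer iteration leaves one group out for testing while the inner search leaves one of the remaining groups out for validation, matching the N-2 / N-1 scheme described above.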
Comments
Does this nested cross-validation scheme make sense? I would appreciate any
comments on it.
In particular, I have a binary classification problem with ~1000 samples
split into ~20 groups, each group holding 20-300 samples. I want the model
to generalize well when an unseen new group (20-300 samples) arrives. Does
that sound about right?
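On the generalization question, note that a non-nested evaluation with LeaveOneGroupOut already works, since groups only has to reach a single splitter. A sketch on assumed synthetic data (the estimator and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = rng.randint(0, 2, 60)
groups = np.repeat([0, 1, 2], 20)  # one group label per sample

# Each score estimates accuracy on one entirely held-out group,
# which matches the "unseen new group" generalization target.
scores = cross_val_score(LogisticRegression(), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
```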
Has the addition of groups been resolved for use in nested CV yet?