How can cv folds be more than the number of groups in cross_validate? #13972
Comments
I don't think groups is being used in that case at all unless you specify a
cv object that can utilise groups
|
Do you mean something like cv=GroupKFold(n_splits=3), or some other type of object? |
Yes. I think it would be good if the documentation of cross_validate was
more explicit about this. A Pull Request clarifying the documentation would
be welcome.
|
I think it should be fixed in the library too. An exception should be raised if groups is passed but the cv argument is not a splitter object. |
@omarcr I'm not sure that's easy to do in a very consistent way, and I'm also not sure it's desirable. We could raise an error any time |
@amueller The way I understand the method is the following: if, for example, I have 3 groups of labels, then there should be 3 folds. After cross-validation, each of the 3 models built will have an independent group for testing. If, for example, I don't declare the groups, then the ... Therefore, if I pass cv=KFold(n_splits=5), I need to declare the group labels of the data samples. Is this the way the method is implemented? |
Groups are only used if you explicitly pass a cross-validation strategy that uses groups, so GroupKFold or LeaveOneGroupOut. |
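The difference can be seen by iterating the splitters directly. This is a minimal sketch, not code from the original report: the toy arrays `X`, `y`, `groups` and the helper `leaky_folds` are made up for illustration. It counts the folds in which a group appears on both the train and the test side.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# Toy data (assumed for illustration): 12 samples in 3 groups of 4.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)
groups = np.repeat([0, 1, 2], 4)


def leaky_folds(cv):
    """Count folds where some group is present in both train and test."""
    leaks = 0
    for train_idx, test_idx in cv.split(X, y, groups):
        if set(groups[train_idx]) & set(groups[test_idx]):
            leaks += 1
    return leaks


# KFold accepts `groups` but does not use it when building the folds
# (recent scikit-learn versions may emit a warning here).
kfold_leaks = leaky_folds(KFold(n_splits=5))

# GroupKFold keeps every group entirely on one side of each split.
group_leaks = leaky_folds(GroupKFold(n_splits=3))

print("KFold folds with group overlap:", kfold_leaks)
print("GroupKFold folds with group overlap:", group_leaks)
```

With `KFold`, five folds can be requested even though there are only three groups, because the group labels play no role in the split.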
I agree that the logic of the code is not obvious from the API. I'd be
happy to have cv=int fail if groups is passed for now. I would also
consider warning if a non-groups splitter is used and groups is passed "Did
you mean to use a group-based splitter?"
|
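The behaviour proposed in the comment above could be sketched in user code as follows. This is a hypothetical helper, not scikit-learn's implementation: the name `check_groups_usage`, the tuple `GROUP_SPLITTERS`, and the message texts are assumptions, and the tuple lists only some of the group-based splitters.

```python
import warnings

from sklearn.model_selection import (
    GroupKFold,
    GroupShuffleSplit,
    KFold,
    LeaveOneGroupOut,
    LeavePGroupsOut,
)

# Assumed, non-exhaustive list of splitters that make use of `groups`.
GROUP_SPLITTERS = (GroupKFold, GroupShuffleSplit, LeaveOneGroupOut, LeavePGroupsOut)


def check_groups_usage(cv, groups):
    """Hypothetical pre-check to run before calling cross_validate."""
    if groups is None:
        return
    if isinstance(cv, int):
        # cv=int builds a non-group splitter, so groups would be ignored.
        raise ValueError("groups was passed but cv is an integer.")
    if not isinstance(cv, GROUP_SPLITTERS):
        warnings.warn(
            "groups was passed but the splitter does not use it. "
            "Did you mean to use a group-based splitter?",
            UserWarning,
        )


groups = [0, 0, 1, 1, 2, 2]
check_groups_usage(GroupKFold(n_splits=3), groups)  # silent
check_groups_usage(KFold(n_splits=3), groups)       # emits a UserWarning
```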
Yes I agree. |
I think this can be closed. With #28210 you now get a user warning when the splitter doesn't support groups. And you will get an error if the number of folds is greater than the number of groups. |
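The error case can be reproduced with a group-aware splitter. This is a minimal sketch with made-up data, assuming `GroupKFold` as the splitter; it asks for 5 folds when only 3 groups exist and captures the resulting `ValueError`.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data (assumed for illustration): 6 samples in 3 groups of 2.
X = np.zeros((6, 1))
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

try:
    # 5 folds requested, but only 3 distinct groups are available.
    list(GroupKFold(n_splits=5).split(X, y, groups))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```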
In the following code:
Shouldn't it be returned that:
groups should be equal to CV for the folds?