-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
y
, and groups
parameters toStratifiedGroupKFold.split()
are optional
#30742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hello @Teagum, thanks for taking the time to make this issue. I have seen that also the docs for GroupKFold.split() suggest that users could pass I think we should improve the documentation and also make sure we raise a helpful error messages. (I actually wonder why we haven't used I will go ahead and make a PR to fix this, unless you want to do it @Teagum? |
Seems like a good first issue. So let me try it! |
Sure. And let's get a core-dev's confirmation before you put too much work in. Maybe there is a reason to it that I cannot see. Maybe @adrinjalali, would you confirm that it makes sense to make |
This is an interesting one, and tangentially related to #26821. Basically, historically, all splitters have had That was before metadata routing was a thing. These days, we could potentially consolidate a lot of our splitters into very few (if not one) classes/class, and to routing and validation in a nice consistent way. The proposed "fix" here would be somewhat of a patch work, which we can apply to all relevant classes here, but it would be a sort of a short term fix. I'm honestly not sure what the path forward in this particular case should be. In an ideal world, I'd like to see the large refactoring done in the splitter part of the code base, but I understand that's not an easy project to tackle, and I'm not sure when I'd have the chance to tackle it myself. cc @scikit-learn/core-devs for consult really 😁 |
Thank you @adrinjalali. I can see that we might want to re-design all the splitters in the future now with metadata routing. But we're talking about a two(-or-more)-year perspective here. Maybe it's okay to start in small steps, and do some intermediary short-term fixes that lead to the splitters become more equal to each other in the meantime? I don't think that it would be an API-relevant change if users get to see better error messages and have a clearer documentation until we know what to do with the splitters in general. |
Alright, so I'll start a PR for this. |
Is this issue still open to work on? |
Hi, @hoipranav! Thank you for your interest in this issue. I already decided to work on this a few weeks back, but didn't have the time to get into it. It would be great if you would let me finish this. Thank you! |
Describe the bug
StratifiedGroupKFold.split
has the signature(self, X, y=None, groups=None)
indicating that bothy
, andgroups
may not be specified when callingsplit
.However, omitting only
groups
results inTypeError: iteration over a 0-d array
. Also, when omitting bothy
andgroups
, or onlyy
the result isValueError: Supported target types are: ('binary', 'multiclass'). Got 'unknown' instead.
This indicates, contrary to the signature, thaty
and `groups are required and not optional.I would instead expect consisted behavior with e.g.
StratifiedKFold
, where they
parameter tosplit
is not optional.StratifiedKFold
andStratifiedGroupKFold
both inherit from_BaseKFold
, which provides.split
. HoweverStratifiedKFold
implements its ownsplit
method, instead of using_BaseKFold
likeStratifiedGroupKFold
does.Steps/Code to Reproduce
Expected Results
Either no error if
y
,groups
, or both are not specified. Or remove the default ofNone
for both parameters from the function signature.Actual Results
Versions
The text was updated successfully, but these errors were enco 8000 untered: