-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
ENH Adds warning & docs for splitters that do not support goups #28210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Adds warning & docs for splitters that do not support goups #28210
Conversation
@@ -96,6 +96,10 @@ | |||
) | |||
digits = load_digits() | |||
|
|||
pytestmark = pytest.mark.filterwarnings( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit weird for me to have this filter globally set for the file.
We should have our test catching the warning to make sure that we raise it when we expect or otherwise, we should raise an error.
Is there a particular reason to go fine grain here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I converted this pytestmark
to error instead and updated the tests.
sklearn/model_selection/_split.py
Outdated
@@ -1698,7 +1761,31 @@ def __init__(self, *, n_splits=5, n_repeats=10, random_state=None): | |||
|
|||
|
|||
class BaseShuffleSplit(_MetadataRequester, metaclass=ABCMeta): | |||
"""Base class for ShuffleSplit and StratifiedShuffleSplit.""" | |||
"""Base class for ShuffleSplit and StratifiedShuffleSplit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the GroupShuffleSplit
as well. Maybe "*ShuffleSplit
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense. I was actually not aware of the predefined splitter. I think that we don't document it in our API documentation.
I'll open a PR.
We do have it documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a secret plan to make all splitters support group and strata (#26821) to all splitters and to simply the interface a bit.
But in the meantime this is a nice fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @thomasjpfan
…it-learn#28210) Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Reference Issues/PRs
Closes #22848
What does this implement/fix? Explain your changes.
This PR:
_UnsupportedGroupCVMixin
mixin that updates the docstrings of CV splitters that do not support groups. It also includes a warning if groups are passed in.split
method, this PR adds a warning there. ForPredefinedSplitter
andTimeSeriesSplitter
, it raises a warning duringsplit
when the generator is created.Any other comments?
Implementation wise, I moved
ShuffleSplit._iter_indices
toBaseShuffleSplit._iter_indicies
, so thatGroupShuffleSplit
can directly inherit fromBaseShuffleSplit
. This wayGroupShuffleSplit
is not inheriting from a splitter that does not support groups.