DOC/TST Clarify group order in GroupKFold and LeaveOneGroupOut #22582
Otherwise LGTM
sklearn/model_selection/_split.py
@@ -1138,6 +1142,11 @@ class LeaveOneGroupOut(BaseCrossValidator):
[3 4]] [[5 6]
[7 8]] [1 2] [1 2]

Notes
Could you move this section above the Examples section?
sklearn/model_selection/_split.py
@@ -496,6 +496,10 @@ class GroupKFold(_BaseKFold):
[7 8]] [[1 2]
[3 4]] [3 4] [1 2]

Notes
The same applies to this section.
X = np.ones(len(groups))

splits = iter(LeaveOneGroupOut().split(X, groups=groups))
Let's make a small for loop here.
expected_indices = [
([0, 1, 4, 5], [2, 3]),
([0, 1, 2, 3], [4, 5]),
([2, 3, 4, 5], [0, 1]),
]
for expected_train, expected_test in expected_indices:
train, test = next(splits)
assert_array_equal(train, expected_train)
assert_array_equal(test, expected_test)
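For reference, the ordering this test pins down can be reproduced without scikit-learn. This is a minimal sketch assuming LeaveOneGroupOut yields one split per distinct label in the order of np.unique(groups), i.e. sorted labels, which is what the expected indices imply. The helper leave_one_group_out_indices and the groups array [2, 2, 0, 0, 1, 1] are hypothetical: the test's actual groups value is not shown in this thread, so the array was chosen only to be consistent with the expected indices.

```python
import numpy as np


def leave_one_group_out_indices(groups):
    """Yield (train, test) index arrays, one split per distinct group label.

    Hypothetical sketch: splits follow the order of np.unique(groups),
    i.e. sorted labels, not the order in which groups first appear.
    """
    groups = np.asarray(groups)
    for label in np.unique(groups):  # np.unique returns sorted labels
        test_mask = groups == label
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical labels consistent with the expected indices above:
# label 0 -> samples 2, 3; label 1 -> samples 4, 5; label 2 -> samples 0, 1.
groups = np.array([2, 2, 0, 0, 1, 1])

expected_indices = [
    ([0, 1, 4, 5], [2, 3]),
    ([0, 1, 2, 3], [4, 5]),
    ([2, 3, 4, 5], [0, 1]),
]

splits = leave_one_group_out_indices(groups)
for expected_train, expected_test in expected_indices:
    train, test = next(splits)
    np.testing.assert_array_equal(train, expected_train)
    np.testing.assert_array_equal(test, expected_test)
```

Because the order comes from sorting labels, it does not depend on where each group first appears in the data.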
Thanks for your comments. I've made the changes you suggested. Just to confirm: do we not want to make the group ordering in GroupKFold stable across future versions by using a stable sorting algorithm?
I assume that this is fine for the moment; at least the grouping is deterministic. I'm not sure the ordering alone justifies changing the behaviour in future versions. We can merge this PR for now and revisit the ordering issue in a separate one.
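To make the stability question concrete, here is a sketch of size-balanced group assignment that breaks ties with a stable sort. It is not scikit-learn's GroupKFold code: the function greedy_group_folds is hypothetical, and it assumes a greedy rule in which groups are visited in decreasing size and each one goes to the fold that currently holds the fewest samples.

```python
import numpy as np


def greedy_group_folds(groups, n_splits):
    """Map each unique label (in np.unique order) to a fold index.

    Hypothetical sketch: visit groups from largest to smallest and put each
    one in the fold with the fewest samples so far.
    """
    unique_groups, group_idx = np.unique(groups, return_inverse=True)
    sizes = np.bincount(group_idx)
    # Sorting the negated sizes gives decreasing size; kind="stable" keeps
    # equal-sized groups in label order, so ties are always broken the same way.
    order = np.argsort(-sizes, kind="stable")
    fold_sizes = np.zeros(n_splits, dtype=int)
    group_to_fold = np.empty(len(unique_groups), dtype=int)
    for g in order:
        lightest = np.argmin(fold_sizes)  # first fold with the minimum count
        fold_sizes[lightest] += sizes[g]
        group_to_fold[g] = lightest
    return group_to_fold


# Groups "a" and "b" tie at size 2; the stable sort visits "a" first.
folds = greedy_group_folds(["a", "a", "b", "b", "c", "c", "c", "d"], n_splits=2)
```

With kind="stable", tied groups keep their label order, so the assignment is fully determined by the data; NumPy's default kind="quicksort" makes no such guarantee about equal elements. Reversing an ascending stable sort (np.argsort(sizes)[::-1]) is also deterministic, but it visits tied groups in reverse label order.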
Thanks @SparklePigBang
Reference Issues/PRs
Fixes #18338
What does this implement/fix? Explain your changes.
Any other comments?
Ordering of tied groups in argsort: in GroupKFold (line 526) we take the indices which sort the array of group sizes. Two groups of the same size may be swapped if we don't use a stable sorting algorithm. One option would be to add kind="stable" to argsort.
On the shuffle option: in #18338 (Clarify group order in GroupKFold and LeaveOneGroupOut), one possible suggestion with respect to GroupKFold is: