Clarify group order in GroupKFold and LeaveOneGroupOut · Issue #18338 · scikit-learn/scikit-learn · GitHub
Clarify group order in GroupKFold and LeaveOneGroupOut #18338
Closed
@lorentzenchr

Description

Describe the bug

GroupKFold builds folds with non-overlapping groups. The order in which groups are assigned to folds seems arbitrary rather than sorted by group ID. A naive user like me expects the groups to be ordered across the folds.

Steps/Code to Reproduce

import numpy as np
from sklearn.model_selection import GroupKFold


n_splits, n_samples, n_features = 3, 2, 2
X = np.arange(n_splits * n_samples * n_features).reshape(n_splits * n_samples, n_features)
groups = np.concatenate([np.full(n_samples, i) for i in range(n_splits)])
splits = list(GroupKFold(n_splits=n_splits).split(X, groups=groups))
[(np.unique(groups[train]), np.unique(groups[test])) for train, test in splits]

The results show the group IDs in the train and test folds of each split:

[(array([0, 1]), array([2])),
 (array([0, 2]), array([1])),
 (array([1, 2]), array([0]))]

Now we add a few samples to the first group (group ID = 0) by duplicating its rows, so that it becomes larger than the other groups.

X = np.r_[X[:n_samples, :], X]
groups = np.r_[groups[:n_samples], groups]
splits = list(GroupKFold(n_splits=n_splits).split(X, groups=groups))
[(np.unique(groups[train]), np.unique(groups[test])) for train, test in splits]

Result:

[(array([1, 2]), array([0])),
 (array([0, 1]), array([2])),
 (array([0, 2]), array([1]))]
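
The change in order is consistent with a greedy, size-based assignment: groups are visited from largest to smallest, and each one goes to the fold that currently holds the fewest samples. The following is a simplified pure-Python sketch of that idea, not scikit-learn's actual code. The function name `greedy_group_folds` is hypothetical, and stable tie-breaking between equally sized groups is an assumption.

```python
from collections import Counter


def greedy_group_folds(groups, n_splits):
    """Map each group ID to a test-fold index, visiting the largest groups first.

    Hypothetical helper for illustration only; it is not part of scikit-learn.
    """
    sizes = Counter(groups)
    # Sort group IDs by size in ascending order (Python's sort is stable, so
    # ties keep ascending ID order), then reverse to visit large groups first.
    order = sorted(sorted(sizes), key=lambda g: sizes[g])[::-1]
    samples_per_fold = [0] * n_splits
    fold_of_group = {}
    for g in order:
        # Pick the fold that currently holds the fewest samples
        # (the lowest index wins when several folds are tied).
        lightest = samples_per_fold.index(min(samples_per_fold))
        samples_per_fold[lightest] += sizes[g]
        fold_of_group[g] = lightest
    return fold_of_group


equal_sizes = greedy_group_folds([0, 0, 1, 1, 2, 2], n_splits=3)
group_0_larger = greedy_group_folds([0, 0, 0, 0, 1, 1, 2, 2], n_splits=3)
print(sorted(equal_sizes.items()))
print(sorted(group_0_larger.items()))
```

Under these assumptions, equal group sizes put group 2 in the first test fold, group 1 in the second and group 0 in the third. Once group 0 is the largest, it is visited first and moves to the first test fold, which matches the two listings above.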

Expected Results

I expect the same (first) output for both:

[(array([0, 1]), array([2])),
 (array([0, 2]), array([1])),
 (array([1, 2]), array([0]))]
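
For illustration, one way to get an assignment that depends only on the group IDs and not on the group sizes is to derive the fold index from the rank of each sorted group ID. This is a minimal sketch, not a proposal for the library: the function name `ordered_test_folds` is hypothetical, and the descending mapping is chosen only so that it reproduces the expected listing above. Unlike a size-based assignment, it makes no attempt to balance the number of samples per fold.

```python
def ordered_test_folds(groups, n_splits):
    """Return one test-fold index per sample, determined by group ID rank only.

    Hypothetical helper for illustration only; it is not part of scikit-learn.
    """
    unique_ids = sorted(set(groups))
    n_groups = len(unique_ids)
    # The highest group ID goes to fold 0, the next one to fold 1, and so on,
    # wrapping around when there are more groups than folds.
    fold_of_group = {
        g: (n_groups - 1 - rank) % n_splits for rank, g in enumerate(unique_ids)
    }
    return [fold_of_group[g] for g in groups]


folds_equal = ordered_test_folds([0, 0, 1, 1, 2, 2], n_splits=3)
folds_group_0_larger = ordered_test_folds([0, 0, 0, 0, 1, 1, 2, 2], n_splits=3)
print(folds_equal)
print(folds_group_0_larger)
```

In both cases group 2 lands in the first test fold, group 1 in the second and group 0 in the third, regardless of how many samples group 0 has. A per-sample list like this has the shape that `sklearn.model_selection.PredefinedSplit` accepts as its `test_fold` argument.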

Versions

python: 3.7.2
sklearn: 0.24.dev0 (current master as of this morning)

Labels

Documentation
Easy (well-defined and straightforward way to resolve)
module:test-suite (everything related to our tests)