Revert "Instead of linking to NB, explain the problem inside the test itself." · paulgb/scikit-learn@38f703c · GitHub

Commit 38f703c

Revert "Instead of linking to NB, explain the problem inside the test itself."

This reverts commit cf9788c.

1 parent f7d1b6b · commit 38f703c

File tree

1 file changed (+5, −30)

sklearn/tests/test_cross_validation.py

Lines changed: 5 additions & 30 deletions
@@ -648,36 +648,11 @@ def test_stratified_kfold_preserve_order(): # see #2372
 
 
 def test_stratified_kfold_preserve_order_with_digits(): # see #2372
-    # The digits samples are dependent as they are apparently grouped
-    # by authors although we don't have any information on the groups'
-    # segment locations for this data. We can highlight this fact by
-    # computing k-fold cross-validation with and without shuffling: we
-    # observe that the shuffling case makes the IID assumption and is
-    # therefore too optimistic: it estimates a much higher accuracy
-    # (around 0.965) than the non-shuffling variant (around
-    # 0.905).
-
+    # A regression test, taken from
+    # http://nbviewer.ipython.org/urls/raw.github.com/ogrisel/notebooks/master/Non%2520IID%2520cross-validation.ipynb
     digits = load_digits()
-    X, y = digits.data[:800], digits.target[:800]
-    model = SVC(C=10, gamma=0.005)
-    n = len(y)
-
-    cv = cval.KFold(n, 5, shuffle=False)
-    assert_greater(0.91, cval.cross_val_score(model, X, y, cv=cv).mean())
-
-    cv = cval.KFold(n, 5, shuffle=True, random_state=0)
-    assert_greater(cval.cross_val_score(model, X, y, cv=cv).mean(), 0.95)
-
-    cv = cval.KFold(n, 5, shuffle=True, random_state=1)
-    assert_greater(cval.cross_val_score(model, X, y, cv=cv).mean(), 0.95)
-
-    cv = cval.KFold(n, 5, shuffle=True, random_state=2)
-    assert_greater(cval.cross_val_score(model, X, y, cv=cv).mean(), 0.95)
-
-    # Similarly, StratifiedKFold should try to shuffle the data as few
-    # as possible (while respecting the balanced class constraints)
-    # and thus be able to detect the dependency by not overestimating
-    # the CV score either:
+    X, y = digits.data, digits.target
 
+    model = SVC(C=10, gamma=0.005)
     cv = cval.StratifiedKFold(y, 5)
-    assert_greater(0.91, cval.cross_val_score(model, X, y, cv=cv).mean())
+    assert cval.cross_val_score(model, X, y, cv=cv, n_jobs=-1).mean() < 0.91
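
The comment block removed by this revert explains the underlying issue: the digits samples are not IID (they appear grouped by writer), so a shuffled K-fold split leaks samples from the same group into both the training and test sides and overestimates accuracy. Below is a minimal standalone sketch of that experiment, using the modern sklearn.model_selection API rather than the old sklearn.cross_validation (cval) module shown in the diff; the 0.91/0.95 thresholds come from the reverted test, and exact scores may vary across scikit-learn versions.

from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

digits = load_digits()
# First 800 samples, as in the reverted variant of the test.
X, y = digits.data[:800], digits.target[:800]
model = SVC(C=10, gamma=0.005)

# Unshuffled folds keep same-writer samples together, so the estimate
# stays honest (the reverted test expected a mean below 0.91).
no_shuffle = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=False))
print("KFold, no shuffle: %.3f" % no_shuffle.mean())

# Shuffling spreads each writer's samples across train and test folds,
# which inflates the estimate (the reverted test expected above 0.95).
shuffled = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("KFold, shuffled:   %.3f" % shuffled.mean())

# StratifiedKFold reorders as little as possible while balancing classes,
# so it should not overestimate either; this mirrors the assertion that
# survives the revert (mean score below 0.91).
stratified = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5))
print("StratifiedKFold:   %.3f" % stratified.mean())

Note that the assertion kept by the revert runs the stratified check on the full digits dataset with n_jobs=-1; the slice above just keeps the sketch close to the reverted variant.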

0 commit comments
