-
-
Notifications
You must be signed in to change notification settings - Fork 26.1k
Description
In this PR some newly added test cases made some unrelated tests fail. The culprit was mutating state due to missing random_state
(s) in the unrelated tests' function/class invocations. @amueller also mention that:
Ideally we would explicitly pass random states everywhere in all tests. To fix the test failure, maybe fix the random states in the test that's failing. You can also add it in the test that you added.
I made a simple python script here that goes through each of sklearn
's test files and look for estimators/classes that take a random_state
parameter, and see if this was passed in the invocation(s). If not then print out the class/function name followed by a list of line numbers with missing invocations.
On the latest master branch (ca687ba) there are 1 155 cases in 154 test files where a class/function that takes a random_state
is called without setting it to the canonical 42
:
erik scikit-learn$ find * -type f | grep "/test_.*py$" | python rcheck.py
sklearn/tree/tests/test_tree.py
DecisionTreeClassifier [398, 1173, 1210, 1231, 1237, 1531]
DecisionTreeRegressor [1532]
sklearn/tree/tests/test_export.py
DecisionTreeClassifier [197, 212]
.
.
.
sklearn/model_selection/tests/test_search.py
DecisionTreeClassifier [868]
DecisionTreeRegressor [868]
GroupShuffleSplit [242]
KFold [241, 250, 491, 503, 919, 1172, 1193]
LinearSVC [292, 356, 363, 376, 383]
LinearSVCNoScore [101, 210]
ParameterSampler [1093, 1096, 1113]
RandomizedSearchCV [265, 704, 708, 753, 798, 839, 886, 946, 983]
SGDClassifier [1126, 1141]
SVC [292, 301, 305, 325, 333, 340, 348, 356, 363, 376, 383, 417, 441, 644, 647, 704, 708, 752, 753, 795, 798, 838, 839, 1003]
StratifiedKFold [250, 919]
StratifiedShuffleSplit [250]
Found 1155 missing in 154 files
The full output is here
If desirable, I could work on a PR for a find/replace on all these cases? If so, then the scikit-learn contributing docs should perhaps be explicit in stating that all added test cases that invoke an estimator that takes a random_state
, should also set it / or add some kind of flake plugin that checks this?