Should all test cases set the random_state?

@amueller

In this PR some newly added test cases made some unrelated tests fail. The culprit was mutating state due to missing random_state(s) in the unrelated tests' function/class invocations. @amueller also mention that:

Ideally we would explicitly pass random states everywhere in all tests. To fix the test failure, maybe fix the random states in the test that's failing. You can also add it in the test that you added.

I made a simple python script here that goes through each of sklearn's test files and look for estimators/classes that take a random_state parameter, and see if this was passed in the invocation(s). If not then print out the class/function name followed by a list of line numbers with missing invocations.

On the latest master branch (ca687ba) there are 1 155 cases in 154 test files where a class/function that takes a random_state is called without setting it to the canonical 42:

erik  scikit-learn$ find * -type f | grep "/test_.*py$" | python rcheck.py
sklearn/tree/tests/test_tree.py
	DecisionTreeClassifier [398, 1173, 1210, 1231, 1237, 1531]
	DecisionTreeRegressor [1532]
sklearn/tree/tests/test_export.py
        DecisionTreeClassifier [197, 212]
.
.
.
sklearn/model_selection/tests/test_search.py
	DecisionTreeClassifier [868]
	DecisionTreeRegressor [868]
	GroupShuffleSplit [242]
	KFold [241, 250, 491, 503, 919, 1172, 1193]
	LinearSVC [292, 356, 363, 376, 383]
	LinearSVCNoScore [101, 210]
	ParameterSampler [1093, 1096, 1113]
	RandomizedSearchCV [265, 704, 708, 753, 798, 839, 886, 946, 983]
	SGDClassifier [1126, 1141]
	SVC [292, 301, 305, 325, 333, 340, 348, 356, 363, 376, 383, 417, 441, 644, 647, 704, 708, 752, 753, 795, 798, 838, 839, 1003]
	StratifiedKFold [250, 919]
	StratifiedShuffleSplit [250]

Found 1155 missing in 154 files

The full output is here

If desirable, I could work on a PR for a find/replace on all these cases? If so, then the scikit-learn contributing docs should perhaps be explicit in stating that all added test cases that invoke an estimator that takes a random_state, should also set it / or add some kind of flake plugin that checks this?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions