Should all test cases set the random_state? · Issue #8194 · scikit-learn/scikit-learn · GitHub

Open

erikcs opened this issue Jan 12, 2017 · 12 comments · May be fixed by #8563
Labels
Needs Decision (Requires decision)

Comments

@erikcs
Contributor
erikcs commented Jan 12, 2017

In this PR, some newly added test cases made unrelated tests fail. The culprit was mutated RNG state due to missing random_state(s) in the unrelated tests' function/class invocations. @amueller also mentioned that:

Ideally we would explicitly pass random states everywhere in all tests. To fix the test failure, maybe fix the random states in the test that's failing. You can also add it in the test that you added.

I made a simple Python script here that goes through each of sklearn's test files, looks for estimators/classes that take a random_state parameter, and checks whether it was passed in the invocation(s). If not, it prints the class/function name followed by a list of line numbers with missing invocations.
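For reference, the gist of the check is roughly the following. This is a minimal sketch, not the actual rcheck.py; in particular the WANTS_RANDOM_STATE set is a made-up stand-in, since the real script has to discover which callables actually accept random_state.

import ast
import sys

# Hypothetical stand-in: the real script derives this from scikit-learn's
# signatures rather than hard-coding a list.
WANTS_RANDOM_STATE = {"DecisionTreeClassifier", "KFold", "LinearSVC", "SVC"}

def missing_random_state(path):
    """Map callable name -> line numbers of calls without a random_state kwarg."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    missing = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in WANTS_RANDOM_STATE and not any(
                kw.arg == "random_state" for kw in node.keywords):
            missing.setdefault(name, []).append(node.lineno)
    return missing

if __name__ == "__main__":
    total = 0
    # File names are piped in on stdin, as in the find | grep | python pipeline below.
    for path in (line.strip() for line in sys.stdin if line.strip()):
        report = missing_random_state(path)
        if report:
            print(path)
            for name, linenos in sorted(report.items()):
                print("\t%s %s" % (name, linenos))
                total += len(linenos)
    print("\nFound %d missing" % total)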

On the latest master branch (ca687ba) there are 1,155 cases in 154 test files where a class/function that takes a random_state is called without setting it to the canonical 42:

erik  scikit-learn$ find * -type f | grep "/test_.*py$" | python rcheck.py
sklearn/tree/tests/test_tree.py
	DecisionTreeClassifier [398, 1173, 1210, 1231, 1237, 1531]
	DecisionTreeRegressor [1532]
sklearn/tree/tests/test_export.py
        DecisionTreeClassifier [197, 212]
.
.
.
sklearn/model_selection/tests/test_search.py
	DecisionTreeClassifier [868]
	DecisionTreeRegressor [868]
	GroupShuffleSplit [242]
	KFold [241, 250, 491, 503, 919, 1172, 1193]
	LinearSVC [292, 356, 363, 376, 383]
	LinearSVCNoScore [101, 210]
	ParameterSampler [1093, 1096, 1113]
	RandomizedSearchCV [265, 704, 708, 753, 798, 839, 886, 946, 983]
	SGDClassifier [1126, 1141]
	SVC [292, 301, 305, 325, 333, 340, 348, 356, 363, 376, 383, 417, 441, 644, 647, 704, 708, 752, 753, 795, 798, 838, 839, 1003]
	StratifiedKFold [250, 919]
	StratifiedShuffleSplit [250]

Found 1155 missing in 154 files

The full output is here

If desirable, I could work on a PR doing a find/replace on all these cases. If so, the scikit-learn contributing docs should perhaps state explicitly that every added test case invoking an estimator that takes a random_state should also set it, or some kind of flake8 plugin that checks this could be added.

@GaelVaroquaux
Member
GaelVaroquaux commented Jan 14, 2017 via email

@amueller added the Easy (Well-defined and straightforward way to resolve), Need Contributor, and Sprint labels on Mar 4, 2017
@amueller
Member
amueller commented Mar 4, 2017

You didn't actually check for 42 in the script, right? We also use other states (mostly 0 I think).

@erikcs
Contributor Author
erikcs commented Mar 4, 2017

I did not check what it was set to; I just checked whether it was set (and I apologize for leaving this issue hanging).

@amueller
Member
amueller commented Mar 4, 2017

@Nuffe no need to apologize, you did the hardest part in finding them all.
The rest is mostly leg-work and ideal for new contributors.

@kkatrio
kkatrio commented Mar 6, 2017

Hi, I am working on this.

@kkatrio linked a pull request on Mar 9, 2017 that will close this issue
@erikcs
Contributor Author
erikcs commented Mar 11, 2017

@amueller and @kkatrio Regarding avoiding future commits with these issues:

It of course makes sense to mention in CONTRIBUTING.md that test cases should set random_state, but good intentions are usually not enough without an enforcement mechanism in place.

I tried to write a Flake8 plugin here that checks scikit-learn test files for these transgressions.

If you install this plugin in the Travis build, flake8_diff.sh automatically makes sure that Travis raises a build error for the pull request.

I am not an expert on static analysis, but I got this plugin running on my machine. If you would like GitHub access to that repo, please let me know, or you could of course just fork it :)
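The core of such a plugin is small. Below is a minimal sketch of the idea, not the actual plugin code: the plugin name, the R001 error code and the hard-coded estimator set are made up for illustration, and the real plugin would need to discover the relevant callables itself. It uses flake8's standard AST-checker interface (a class with name/version attributes and a run() generator, registered through a flake8.extension entry point).

import ast

# Hypothetical stand-in; the real plugin builds this from scikit-learn itself.
ESTIMATORS_WITH_RANDOM_STATE = {"DecisionTreeClassifier", "SVC", "KFold"}

class RandomStateChecker(object):
    name = "flake8-random-state"  # assumed plugin name, for illustration only
    version = "0.1"

    def __init__(self, tree, filename):
        self.tree = tree
        self.filename = filename

    def run(self):
        # Only lint scikit-learn test files.
        if "tests/test_" not in self.filename:
            return
        for node in ast.walk(self.tree):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in ESTIMATORS_WITH_RANDOM_STATE and not any(
                    kw.arg == "random_state" for kw in node.keywords):
                yield (node.lineno, node.col_offset,
                       "R001 %s called without an explicit random_state" % name,
                       type(self))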

@kkatrio
kkatrio commented Mar 12, 2017

@Nuffe I found your plugin very useful. I located a few more cases with it and added the fix for them. I think there are still some off suggestions, like when the random_state already has a default value in a class, or when it indicates that random_state is missing from some assert_raises(..) call. I will try to improve it. (Also, after installing it I think something was tweaked in my system and some unrelated nosetests failed, but I'll look into it to be sure.)

@erikcs
Contributor Author
erikcs commented Mar 13, 2017

@kkatrio thank you, I added you as a contributor on the plugin repo if you want to improve it directly.

a. I thought there were only classes/functions where random_state has a default value of None (a design decision by the scikit-learn developers)? And shouldn't random_state also be set in an assert_raises(..) call, since the call will still mutate the state of the RNG?
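To make point a concrete, here is a tiny sketch (simplified, using only numpy and sklearn.utils.check_random_state) of why an unseeded call still mutates shared RNG state, even when it happens inside an assert_raises(..):

import numpy as np
from sklearn.utils import check_random_state

np.random.seed(0)
first_draw = np.random.rand()      # what a later test would expect to see

np.random.seed(0)
rng = check_random_state(None)     # random_state=None means the global numpy RandomState
rng.rand()                         # stand-in for an estimator fit with random_state unset
later_draw = np.random.rand()      # the later test now sees a different value

assert first_draw != later_draw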

b. Hmm, yes, installed system-wide this plugin may cause problems since it is always on. Ideally it would only be installed in a suitable Travis test env, and only be invoked when Travis runs flake8_diff.sh to see whether any newly added test case forgets to set the random_state.

With the plugin installed and running nosetests -v on the latest master, I didn't get any failures (Anaconda Python 3.6). But if I run flake8 in the scikit-learn directory, so that flake8 runs on every file in the repo, the plugin can raise an error since it attempts to load all (test) files as modules. In my case this happened because I had an old system-wide install of sklearn and ran flake8 in a directory with the latest master, so the plugin tried to load a test module (using the system sklearn) that had newly added functions.

However, I don't think this should be a big issue for the intended use case of only linting scikit-learn test files in a Travis build with all dependencies contained in a test env. (I added a more descriptive error to the plugin, for when using it on a local machine)

@kkatrio
kkatrio commented Mar 16, 2017

@Nuffe There are a few cases where the random_state is set by default, for example the load_files function in datasets/tests/test_base.py and the set_random_state function in utils/tests/test_testing.py, so I pushed some changes to avoid raising the warning when the random_state is already set in the definition. I will also try to fix cases where the arguments (including random_state) are passed to the estimator with **.

Also, in datasets/tests/test_kddcup99.py there is an assertion comparing shuffled and unshuffled data. Should the warning be raised for line 14? If not, I am thinking about how we could prevent cases like this, if possible.

@glemaitre
Member

I tried to write a Flake8 plugin here that checks scikit-learn test files for these transgressions.
If you install this plugin in the Travis build, flake8_diff.sh automatically makes sure that Travis raises a build error for the pull request.
I am not an expert on static analysis, but I got this plugin running on my machine. If you would like GitHub access to that repo, please let me know, or you could of course just fork it :)

@lesteve Do you think that it could be a good feature to add to the flake8/sanity script if #8563 is merged?

@lesteve
Member
lesteve commented Mar 22, 2017

@lesteve Do you think that it could be a good feature to add to the flake8/sanity script if #8563 is merged?

Sorry, I forgot about this. Personally I am not fully convinced that checking that random_state is set explicitly is always a great idea. Examples where it is not: checking error messages, checking invariants of the algorithm.

More generally, I get the point that having deterministic tests is something to long for. At the same time, a test could pass only because I was lucky with my choice of random_state, which is less than great.

IMO the picture is not so clear-cut and I am not in favour of enforcing through an automated tool that random_state is always set.

@rth
Member
rth commented Jun 1, 2018

IMO the picture is not so clear-cut and I am not in favour of enforcing through an automated tool that random_state is always set.

There was some related work with LGTM queries in #10416

@cmarmo added the Needs Decision (Requires decision) label and removed the Easy (Well-defined and straightforward way to resolve) label on Apr 20, 2021