Should all test cases set the random_state? · Issue #8194 · scikit-learn/scikit-learn · GitHub

Open

erikcs opened this issue Jan 12, 2017 · 12 comments · May be fixed by #8563
Labels
Needs Decision (Requires decision)

Comments

@erikcs
Contributor
erikcs commented Jan 12, 2017

In this PR, some newly added test cases made unrelated tests fail. The culprit was mutated RNG state due to missing random_state(s) in the unrelated tests' function/class invocations. @amueller also mentioned that:

Ideally we would explicitly pass random states everywhere in all tests. To fix the test failure, maybe fix the random states in the test that's failing. You can also add it in the test that you added.

I made a simple Python script here that goes through each of sklearn's test files, looks for estimators/classes that take a random_state parameter, and checks whether it was passed in the invocation(s). If not, it prints the class/function name followed by a list of line numbers with missing invocations.
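For reference, the gist of the check is roughly the following. This is a minimal sketch, not the actual rcheck.py; in particular the WANTS_RANDOM_STATE set is a made-up stand-in, since the real script has to discover which callables actually accept random_state.

import ast
import sys

# Hypothetical stand-in: the real script derives this from scikit-learn's
# signatures rather than hard-coding a list.
WANTS_RANDOM_STATE = {"DecisionTreeClassifier", "KFold", "LinearSVC", "SVC"}

def missing_random_state(path):
    """Map callable name -> line numbers of calls without a random_state kwarg."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    missing = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in WANTS_RANDOM_STATE and not any(
                kw.arg == "random_state" for kw in node.keywords):
            missing.setdefault(name, []).append(node.lineno)
    return missing

if __name__ == "__main__":
    total = 0
    # File names are piped in on stdin, as in the find | grep | python pipeline below.
    for path in (line.strip() for line in sys.stdin if line.strip()):
        report = missing_random_state(path)
        if report:
            print(path)
            for name, linenos in sorted(report.items()):
                print("\t%s %s" % (name, linenos))
                total += len(linenos)
    print("\nFound %d missing" % total)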

On the latest master branch (ca687ba) there are 1,155 cases in 154 test files where a class/function that takes a random_state is called without setting it to the canonical 42:

erik  scikit-learn$ find * -type f | grep "/test_.*py$" | python rcheck.py
sklearn/tree/tests/test_tree.py
	DecisionTreeClassifier [398, 1173, 1210, 1231, 1237, 1531]
	DecisionTreeRegressor [1532]
sklearn/tree/tests/test_export.py
        DecisionTreeClassifier [197, 212]
.
.
.
sklearn/model_selection/tests/test_search.py
	DecisionTreeClassifier [868]
	DecisionTreeRegressor [868]
	GroupShuffleSplit [242]
	KFold [241, 250, 491, 503, 919, 1172, 1193]
	LinearSVC [292, 356, 363, 376, 383]
	LinearSVCNoScore [101, 210]
	ParameterSampler [1093, 1096, 1113]
	RandomizedSearchCV [265, 704, 708, 753, 798, 839, 886, 946, 983]
	SGDClassifier [1126, 1141]
	SVC [292, 301, 305, 325, 333, 340, 348, 356, 363, 376, 383, 417, 441, 644, 647, 704, 708, 752, 753, 795, 798, 838, 839, 1003]
	StratifiedKFold [250, 919]
	StratifiedShuffleSplit [250]

Found 1155 missing in 154 files

The full output is here

If desirable, I could work on a PR doing a find/replace on all these cases. If so, the scikit-learn contributing docs should perhaps state explicitly that every added test case invoking an estimator that takes a random_state should also set it, or some kind of flake8 plugin that checks this could be added.

@GaelVaroquaux
Member
GaelVaroquaux commented Jan 14, 2017 via email

@amueller added the Easy (Well-defined and straightforward way to resolve), Need Contributor, and Sprint labels on Mar 4, 2017
@amueller
Member
amueller commented Mar 4, 2017

You didn't actually check for 42 in the script, right? We also use other states (mostly 0 I think).

@erikcs
Contributor Author
erikcs commented Mar 4, 2017

I did not check what it was set to; I just checked whether it was set (and I apologize for leaving this issue hanging).

@amueller
Member
amueller commented Mar 4, 2017

@Nuffe no need to apologize, you did the hardest part in finding them all.
The rest is mostly leg-work and ideal for new contributors.

@kkatrio
kkatrio commented Mar 6, 2017

Hi, I am working on this.

@kkatrio linked a pull request on Mar 9, 2017 that will close this issue
@erikcs
Contributor Author
erikcs commented Mar 11, 2017

@amueller and @kkatrio Regarding avoiding future commits with these issues:

It of course makes sense to mention in CONTRIBUTING.md that test cases should set random_state, but good intentions are usually not enough without an enforcement mechanism in place.

I tried to write a Flake8 plugin here that checks scikit-learn test files for these transgressions.

If you install this plugin in the Travis build, flake8_diff.sh automatically makes sure that Travis raises a build error for the pull request.

I am not an expert on static analysis, but I got this plugin running on my machine. If you would like GitHub access to that repo, please let me know, or you could of course just fork it :)
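The core of such a plugin is small. Below is a minimal sketch of the idea, not the actual plugin code: the plugin name, the R001 error code and the hard-coded estimator set are made up for illustration, and the real plugin would need to discover the relevant callables itself. It uses flake8's standard AST-checker interface (a class with name/version attributes and a run() generator, registered through a flake8.extension entry point).

import ast

# Hypothetical stand-in; the real plugin builds this from scikit-learn itself.
ESTIMATORS_WITH_RANDOM_STATE = {"DecisionTreeClassifier", "SVC", "KFold"}

class RandomStateChecker(object):
    name = "flake8-random-state"  # assumed plugin name, for illustration only
    version = "0.1"

    def __init__(self, tree, filename):
        self.tree = tree
        self.filename = filename

    def run(self):
        # Only lint scikit-learn test files.
        if "tests/test_" not in self.filename:
            return
        for node in ast.walk(self.tree):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in ESTIMATORS_WITH_RANDOM_STATE and not any(
                    kw.arg == "random_state" for kw in node.keywords):
                yield (node.lineno, node.col_offset,
                       "R001 %s called without an explicit random_state" % name,
                       type(self))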

@kkatrio
kkatrio commented Mar 12, 2017

@Nuffe I found your plugin very useful. I located a few more cases with it and added the fix for them. I think there are still some off suggestions, like when the random_state already has a default value in a class, or when it indicates that random_state is missing from some assert_raises(..) call. I will try to improve it. (Also, after installing it I think something was tweaked in my system and some unrelated nosetests failed, but I'll look into it to be sure.)

@erikcs
Contributor Author
erikcs commented Mar 13, 2017

@kkatrio thank you, I added you as a contributor on the plugin repo if you want to improve it directly.

a. I thought there were only classes/functions where random_state has a default value of None (a design decision by the scikit-learn developers)? And shouldn't random_state also be set in an assert_raises(..) call, since the call will still mutate the state of the RNG?
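To make point a concrete, here is a tiny sketch (simplified, using only numpy and sklearn.utils.check_random_state) of why an unseeded call still mutates shared RNG state, even when it happens inside an assert_raises(..):

import numpy as np
from sklearn.utils import check_random_state

np.random.seed(0)
first_draw = np.random.rand()      # what a later test would expect to see

np.random.seed(0)
rng = check_random_state(None)     # random_state=None means the global numpy RandomState
rng.rand()                         # stand-in for an estimator fit with random_state unset
later_draw = np.random.rand()      # the later test now sees a different value

assert first_draw != later_draw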

b. Hmm, yes, installed system-wide this plugin may cause problems since it is always on. Ideally it would only be installed in a suitable Travis test env, and only be invoked when Travis runs flake8_diff.sh to see whether any newly added test case forgets to set the random_state.

With the plugin installed and running nosetests -v on the latest master, I didn't get any failures (Anaconda Python 3.6). But if I run flake8 in the scikit-learn directory, so that flake8 runs on every file in the repo, the plugin can raise an error since it attempts to load all (test) files as modules. In my case this happened because I had an old system-wide install of sklearn and ran flake8 in a directory with the latest master, so the plugin tried to load a test module (using the system sklearn) that had newly added functions.

However, I don't think this should be a big issue for the intended use case of only linting scikit-learn test files in a Travis build with all dependencies contained in a test env. (I added a more descriptive error to the plugin, for when using it on a local machine)

@kkatrio
kkatrio commented Mar 16, 2017

@Nuffe There are a few cases where the random_state is set by default, for example the load_files function in datasets/tests/test_base.py and the set_random_state function in utils/tests/test_testing.py, so I pushed some changes to avoid raising the warning when the random_state is already set in the definition. I will also try to fix cases where the arguments (including random_state) are passed to the estimator with **.

Also, in datasets/tests/test_kddcup99.py there is an assertion comparing shuffled and unshuffled data. Should the warning be raised for line 14? If not, I am thinking about how we could prevent cases like this, if possible.

@glemaitre
Member

I tried to write a Flake8 plugin here that checks scikit-learn test files for these transgressions.
If you install this plugin in the Travis build, flake8_diff.sh automatically makes sure that Travis raises a build error for the pull request.
I am not an expert on static analysis, but I got this plugin running on my machine. If you would like GitHub access to that repo, please let me know, or you could of course just fork it :)

@lesteve Do you think that it could be a good feature to add to the flake8/sanity script if #8563 is merged?

@lesteve
Member
lesteve commented Mar 22, 2017

@lesteve Do you think that it could be a good feature to add to the flake8/sanity script if #8563 is merged?

Sorry, I forgot about this. Personally I am not fully convinced that checking that random_state is set explicitly is always a great idea. Examples where it is not: checking error messages, checking invariants of the algorithm.

More generally, I get the point that having deterministic tests is something to long for. At the same time, a test could pass only because I was lucky with my choice of random_state, which is less than great.

IMO the picture is not so clear-cut and I am not in favour of enforcing through an automated tool that random_state is always set.

@rth
Member
rth commented Jun 1, 2018

IMO the picture is not so clear-cut and I am not in favour of enforcing through an automated tool that random_state is always set.

There was some related work with LGTM queries in #10416

@cmarmo added the Needs Decision (Requires decision) label and removed the Easy (Well-defined and straightforward way to resolve) label on Apr 20, 2021