-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Should all test cases set the random_state? #8194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
IMHO, they shoud, and this is desirable.
|
You didn't actually check for 42 in the script, right? We also use other states (mostly 0 I think). |
I did not check what it was set to, I just checked if it was set (and I apologize for leaving this issue hanging). |
@Nuffe no need to apologize, you did the hardest part in finding them all. |
Hi, I am working on this. |
@amueller and @kkatrio Regarding avoiding future commits with these issues It of course makes sense to mention in CONTRIBUTING.md that test cases should set I tried to write a Flake8 plugin here that checks scikit-learn test files for these transgressions. If you install this plugin in the travis build then flake8_diff.sh automatically makes sure that Travis raises a build error for the pull request. I am not an expert on static analysis, but got this plugin running on my machine. If you guys would want github access to that repo please let me know, or you could of course just fork it :) |
@Nuffe I found your plugin very useful. I located a few more cases with it and added the fix to them. I think there are still some off suggestions, like when the random_state has already a default value in a class or by indicating that the random_state is missing from some "assert_raises(..)" . I will try to improve it. (Also by installing it I think something was tweaked in my system and some not relevant nosetests failed, but I ll look into it to be sure). |
@kkatrio thank you, I added you as a contributor on the plugin repo if you want to improve it directly. a. I thought there where only classes/functions where b. Hmm, yes system-wide this plugin may cause problems since it is always on. Ideally it will only be installed in a suitable Travis test env, and only be invoked when Travis runs With the plugin installed and running However, I don't think this should be a big issue for the intended use case of only linting scikit-learn test files in a Travis build with all dependencies contained in a test env. (I added a more descriptive error to the plugin, for when using it on a local machine) |
@Nuffe There are a few cases where the random_state is set by default, for example in Also, in |
@lesteve Do you think that it could be a good feature to add to the flake8/sanity script if #8563 is merged? |
Sorry I forgot about this, personally I am not fully convinced that checking that random_state is set explicitly is always a great idea. Examples where it is not: checking error messages, checking invariants of the algorithm. More generally I get the point that having deterministic tests is something to long for. At the same time, the test could only pass because I was lucky with my choice of IMO the picture is not so clear-cut and I am not in favour of enforcing through a automated tool that |
There was some related work with LGTM queries in #10416 |
Uh oh!
There was an error while loading. Please reload this page.
In this PR some newly added test cases made some unrelated tests fail. The culprit was mutating state due to missing
random_state
(s) in the unrelated tests' function/class invocations. @amueller also mention that:I made a simple python script here that goes through each of
sklearn
's test files and look for estimators/classes that take arandom_state
parameter, and see if this was passed in the invocation(s). If not then print out the class/function name followed by a list of line numbers with missing invocations.On the latest master branch (ca687ba) there are 1 155 cases in 154 test files where a class/function that takes a
random_state
is called without setting it to the canonical42
:The full output is here
If desirable, I could work on a PR for a find/replace on all these cases? If so, then the scikit-learn contributing docs should perhaps be explicit in stating that all added test cases that invoke an estimator that takes a
random_state
, should also set it / or add some kind of flake plugin that checks this?The text was updated successfully, but these errors were encountered: