Description
Test failure observed in gh-11095:
scipy.stats.multiscale_graphcorr
--------------------------------
File "build/testenv/lib/python3.6/site-packages/scipy/stats/stats.py", line 4612, in multiscale_graphcorr
Failed example:
'%.3f, %.2f' % (mgc.stat, mgc.pvalue)
Expected:
'0.033, 0.02'
Got:
'0.033, 0.03'
The issue is in the `_ParallelP` class: it uses `np.random.permutation` without the option of providing a `Generator` or `RandomState` object, or a seed. This should be possible for reproducibility. The design pattern to use is a `random_state=None` keyword I think, just like the `rvs` method of all distributions, or `stats.rvs_ratio_uniforms`.
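A rough sketch of what that `random_state=None` pattern could look like. The helper names `as_rng` and `permuted_order` are hypothetical and not part of SciPy; accepting a `Generator` assumes NumPy >= 1.17.

```python
import numpy as np


def as_rng(random_state=None):
    """Hypothetical helper: normalize `random_state` to an RNG object.

    None or an int -> a new RandomState (unseeded for None).
    RandomState / Generator instances are passed through unchanged.
    """
    if random_state is None or isinstance(random_state, (int, np.integer)):
        return np.random.RandomState(random_state)
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    raise ValueError("%r cannot be used to seed an RNG" % (random_state,))


def permuted_order(n, random_state=None):
    """Hypothetical stand-in for a call site that permutes indices.

    Uses the RNG's own `permutation` method instead of the global
    `np.random.permutation`, so the caller controls reproducibility.
    """
    rng = as_rng(random_state)
    return rng.permutation(n)


# Same seed, same permutation
first = permuted_order(10, random_state=123)
second = permuted_order(10, random_state=123)
```

With something like this in place, the doctest could pass a fixed `random_state` and the p-value would no longer depend on the global RNG state.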
There's actually more to it than that: for parallel usage it's not clear that `_ParallelP` does the right thing. It looks like it should be using one of the parallel RNG methods: https://numpy.org/devdocs/reference/random/index.html#parallel-generation
Those are not available in NumPy < 1.17 though. In the absence of that, we are likely to get issues with non-independent streams: numpy/numpy#9650. We should either at least generate different seeds before spawning new processes, or disable the feature.
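To make the two options concrete, here is a minimal sketch. The function names are hypothetical. The first variant assumes NumPy >= 1.17 and uses `SeedSequence.spawn`; the second is the fallback of drawing distinct integer seeds up front, which avoids identical streams in forked workers but does not strictly guarantee independence.

```python
import numpy as np


def spawn_generators(seed, n_workers):
    """One Generator per worker, derived from a single root seed.

    Assumes NumPy >= 1.17 (SeedSequence and default_rng).
    """
    root = np.random.SeedSequence(seed)
    return [np.random.default_rng(child) for child in root.spawn(n_workers)]


def spawn_legacy_states(seed, n_workers):
    """Fallback for older NumPy: draw one integer seed per worker
    from a parent RandomState *before* any process is spawned.
    """
    parent = np.random.RandomState(seed)
    seeds = parent.randint(0, 2**31 - 1, size=n_workers)
    return [np.random.RandomState(s) for s in seeds]


# Each worker would receive its own RNG and permute with it
rngs = spawn_generators(12345, 4)
orders = [rng.permutation(20) for rng in rngs]

legacy = spawn_legacy_states(12345, 4)
legacy_orders = [rs.permutation(20) for rs in legacy]
```

In both variants the RNG objects are created in the parent and handed to the workers, so the result is reproducible from one seed regardless of how the work is scheduled.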
@sampan501 @jovo thoughts?