BUG: multiscale_graphcorr random state seeding and parallel use · Issue #11100 · scipy/scipy
@rgommers

Test failure observed in gh-11095:

scipy.stats.multiscale_graphcorr
--------------------------------
File "build/testenv/lib/python3.6/site-packages/scipy/stats/stats.py", line 4612, in multiscale_graphcorr
Failed example:
    '%.3f, %.2f' % (mgc.stat, mgc.pvalue)
Expected:
    '0.033, 0.02'
Got:
    '0.033, 0.03'

The issue is in the _ParallelP class: it calls np.random.permutation with no way to pass in a Generator or RandomState object, or a seed. That should be possible, for reproducibility. The design pattern to use is, I think, a random_state=None keyword, just like the rvs method of all distributions or stats.rvs_ratio_uniforms.
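As a rough illustration of the random_state=None pattern, here is a minimal sketch. The helper names `resolve_random_state` and `permuted_dot` are hypothetical and are not part of SciPy; the sketch assumes NumPy >= 1.17 so that `np.random.Generator` exists.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Hypothetical helper: turn None, an int seed, or an existing RNG
    object into something that has a .permutation method."""
    if random_state is None:
        # Fresh, OS-seeded RandomState: not reproducible by design.
        return np.random.RandomState()
    if isinstance(random_state, (int, np.integer)):
        return np.random.RandomState(random_state)
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        # Caller-supplied RNG is used as-is, so its state advances.
        return random_state
    raise ValueError("random_state must be None, an int, a RandomState "
                     "or a Generator, got %r" % (random_state,))


def permuted_dot(x, y, random_state=None):
    """Toy permutation statistic (illustration only): dot product of x
    with a random permutation of y."""
    rng = resolve_random_state(random_state)
    # Both RandomState and Generator provide .permutation.
    return float(np.dot(x, rng.permutation(y)))


x = np.arange(10.0)
y = np.arange(10.0) ** 2

# The same integer seed reproduces the same permutation and result.
first = permuted_dot(x, y, random_state=1234)
second = permuted_dot(x, y, random_state=1234)
```

With an explicit seed, repeated calls agree, which is what a doctest like the failing one above needs in order to print a stable p-value.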

There's actually more to it than that: for parallel usage it's not clear that _ParallelP does the right thing. It looks like it should be using one of the parallel RNG methods: https://numpy.org/devdocs/reference/random/index.html#parallel-generation
Those are not available in NumPy < 1.17, though. Without them, we are likely to get issues with non-independent streams: numpy/numpy#9650. We should either at least generate different seeds before spawning new processes, or disable the feature.
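To make the two options concrete, here is a hedged sketch of seeding workers before they are spawned. The function names are hypothetical and not taken from SciPy. The first variant uses `np.random.SeedSequence.spawn` (NumPy >= 1.17). The second is a stopgap for older NumPy that draws distinct integer seeds from a parent RandomState; it does not give the independence guarantees of the first.

```python
import numpy as np


def spawn_worker_seeds(n_workers, seed=None):
    """NumPy >= 1.17: derive one child SeedSequence per worker."""
    return np.random.SeedSequence(seed).spawn(n_workers)


def worker_permutation(child_seed, n):
    """What each worker would do: build its own Generator from the
    child seed it was handed, then permute."""
    rng = np.random.default_rng(child_seed)
    return rng.permutation(n)


def legacy_worker_seeds(n_workers, seed=None):
    """Stopgap for NumPy < 1.17 (assumption: acceptable as a workaround).
    Draws integer seeds in the parent process so workers do not all
    start from the same state. Streams are not guaranteed independent."""
    parent = np.random.RandomState(seed)
    return parent.randint(0, 2**31 - 1, size=n_workers)


# Parent process: create the seeds once, then hand one to each worker.
children = spawn_worker_seeds(3, seed=42)
perms = [worker_permutation(child, 20) for child in children]

# Legacy path: each worker would call np.random.RandomState(worker_seed).
legacy_seeds = legacy_worker_seeds(4, seed=42)
legacy_rngs = [np.random.RandomState(int(s)) for s in legacy_seeds]
```

Generating the seeds in the parent keeps the whole run reproducible from a single top-level seed, regardless of how many workers are used or in what order they finish.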

@sampan501 @jovo thoughts?

Labels: defect, scipy.stats