numpy.random.RandomState() fails silently if /dev/urandom returns nothing · Issue #14844 · numpy/numpy · GitHub

Closed
kskyten opened this issue Nov 6, 2019 · 3 comments

kskyten commented Nov 6, 2019

For some reason, I get nothing back when I run cat /dev/urandom | base64 | head -c 5 on my system. This silently makes numpy deterministic.

Reproducing code example:

Running

import numpy as np
print(np.random.RandomState().get_state()[1][0])
print(np.random.RandomState().get_state()[1][0])

should print two different numbers. I get the same number twice.

Numpy/Python version information:

1.17.3 3.7.5 (default, Oct 17 2019, 12:09:47)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

Author
kskyten commented Nov 6, 2019

Actually, I misread the output from /dev/urandom (there was no newline in the output). I get random values when running echo $(cat /dev/urandom | base64 | head -c 5). I'm still getting deterministic output for the above script, though.

Member
rkern commented Nov 7, 2019

Only the first element of the state is deterministic. The rest of the state is initialized correctly from the seeding data provided by /dev/urandom. This is a bug, but not one that affects the quality of the initialization much. If you draw random numbers from RandomState, you should get different numbers each time. Try print(np.random.RandomState().random()) instead.
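A minimal sketch of that check, assuming a NumPy version where an unseeded RandomState draws its seed from OS entropy. It compares the first state word, the remaining state words, and an actual draw for two fresh generators. The variable names are illustrative only.

```python
import numpy as np

HIGH_BIT = 0x80000000  # most significant bit of a 32-bit state word

# Two independently, automatically seeded generators.
a = np.random.RandomState()
b = np.random.RandomState()

key_a = a.get_state()[1]
key_b = b.get_state()[1]

# First word of each state (the value inspected in the original report).
first_word_a = int(key_a[0])
first_word_b = int(key_b[0])

# The remaining 623 words should differ between the two generators.
rest_differs = bool(np.any(key_a[1:] != key_b[1:]))

# Actual draws should differ as well.
draw_a = a.random_sample()
draw_b = b.random_sample()

print(hex(first_word_a), hex(first_word_b))
print(rest_differs)
print(draw_a, draw_b)
```

Whatever the first word prints as, the point is that the rest of the state and the drawn values differ between the two generators.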

@mattip The bug is in this line:

https://github.com/numpy/numpy/blob/master/numpy/random/_mt19937.pyx#L132

You're supposed to |= 0x80000000UL, not = 0x80000000UL.

https://github.com/numpy/numpy/blob/v1.16.3/numpy/random/mtrand/randomkit.c#L186

Member
rkern commented Nov 8, 2019

Actually, this is not true. The lower 31 bits of the first word in the state never get used by the algorithm.

https://github.com/numpy/numpy/blob/v1.17.3/numpy/random/src/mt19937/mt19937.c#L87-L90

UPPER_MASK = 0x80000000UL and LOWER_MASK = 0x7fffffffUL

key[0] & UPPER_MASK gets used and then key[0] gets reassigned. key[0] gets read later, but only after it gets obliterated.
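The following is a pure-Python transcription of the standard MT19937 state regeneration step, written here for illustration; it is not NumPy's C code, and the function name twist is hypothetical. It is a sketch showing that two states differing only in the 31 low bits of key[0] regenerate to the same state, while flipping the high bit changes the result.

```python
import random

N, M = 624, 397
MATRIX_A = 0x9908B0DF
UPPER_MASK = 0x80000000
LOWER_MASK = 0x7FFFFFFF


def twist(key):
    """Return the regenerated state for a 624-word MT19937 key (illustrative)."""
    key = list(key)
    for kk in range(N - 1):
        # Only the high bit of key[kk] and the low 31 bits of key[kk + 1] are read.
        y = (key[kk] & UPPER_MASK) | (key[kk + 1] & LOWER_MASK)
        # (kk + M) % N covers both halves of the usual two-loop formulation.
        key[kk] = key[(kk + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
    # key[0] is read here, but it was already overwritten at kk == 0.
    y = (key[N - 1] & UPPER_MASK) | (key[0] & LOWER_MASK)
    key[N - 1] = key[M - 1] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
    return key


rng = random.Random(0)  # arbitrary fixed seed, just to fill the key
key_a = [rng.getrandbits(32) for _ in range(N)]
key_a[0] |= UPPER_MASK  # the "|=" construction: keep the random low bits

key_b = list(key_a)
key_b[0] = UPPER_MASK   # the "=" construction: low bits cleared

key_c = list(key_a)
key_c[0] ^= UPPER_MASK  # flip the one bit of key[0] that is actually used

low_bits_ignored = twist(key_a) == twist(key_b)
high_bit_matters = twist(key_a) != twist(key_c)
print(low_bits_ignored, high_bit_matters)
```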

This is why the algorithm has a period of 2**19937 - 1, or 2**(623*32 + 1) - 1 (the -1s are because the all-0 case is not part of the cycle). The state is actually 623 32-bit words plus the one high bit from the first word. The 31 low bits of the first word do not participate in the algorithm.
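The bit count can be checked with a few lines of arithmetic. This is only a worked restatement of the numbers in the paragraph above; the constant names are illustrative.

```python
STATE_WORDS = 624  # 32-bit words stored in the MT19937 state
WORD_BITS = 32

# 623 full words plus the single high bit of the first word.
effective_bits = (STATE_WORDS - 1) * WORD_BITS + 1

# The all-zero state is excluded from the cycle, hence the "- 1".
period = 2**effective_bits - 1

print(effective_bits)
print(period == 2**19937 - 1)
```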

Back when numpy.random was first implemented, I used freely available C code that implemented the algorithm. In the routine that read from /dev/urandom, it used the |= construction rather than the = construction. Somehow, that is what stuck in my head, rather than the init_by_array() function that I adopted from CPython's random module. That function uses the = construction, which is also valid (and arguably preferable).

np.random.RandomState().get_state()[1][0] used to change on every call, but that made no difference. The only bit in that value that mattered was always forced to 1 in both 1.16 and 1.17.
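The same point can be checked against NumPy itself. The sketch below assumes the legacy 5-tuple form of get_state() and set_state(), and sets the position to 624 explicitly so that the state is regenerated before the first draw. The seed 12345 is arbitrary.

```python
import numpy as np

STATE_LEN = 624  # position value that forces a regeneration on the next draw

# Take a valid key from a seeded generator (seed chosen arbitrarily).
name, key, _pos, _has_gauss, _cached = np.random.RandomState(12345).get_state()

# Same key, except the 31 low bits of the first word are flipped.
key_flipped = key.copy()
key_flipped[0] = key[0] ^ np.uint32(0x7FFFFFFF)

a = np.random.RandomState()
a.set_state((name, key, STATE_LEN, 0, 0.0))

b = np.random.RandomState()
b.set_state((name, key_flipped, STATE_LEN, 0, 0.0))

draws_a = a.random_sample(5)
draws_b = b.random_sample(5)

same_draws = bool(np.array_equal(draws_a, draws_b))
print(same_draws)
```

If the low bits of the first word took part in the algorithm, the two generators would diverge immediately.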

rkern closed this as completed Nov 8, 2019
charris removed this from the 1.17.4 release milestone Nov 12, 2019
charris removed the 09 - Backport-Candidate label Nov 12, 2019