8000 BUG: Fix nonrandom first uint32 in MT19937 state initialization. by mattip · Pull Request #14847 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix nonrandom first uint32 in MT19937 state initialization. #14847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

mattip
Copy link
Member
@mattip mattip commented Nov 7, 2019

Fixes gh-14844. As noted in the comments there by @rkern, the bug was to set the first byte to a value rather than to OR it with that value.

@mattip mattip added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported labels Nov 7, 2019
@charris charris added this to the 1.17.4 release. milestone Nov 7, 2019
@charris
Copy link
Member
charris commented Nov 7, 2019

I suppose this would be rather tricky to test except by taking a large number of samples.

@charris charris changed the title BUG: fix first-drawn byte in MT19937 BUG: Fix nonrandom first-drawn uint32 in MT19937 state initialization. Nov 7, 2019
@charris charris changed the title BUG: Fix nonrandom first-drawn uint32 in MT19937 state initialization. BUG: Fix nonrandom first uint32 in MT19937 state initialization. Nov 7, 2019
Copy link
Member
@rkern rkern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is still wrong.

@rkern
Copy link
Member
rkern commented Nov 7, 2019

For testing, you can give it a SeedSequence that generates all 0s.

@rkern
Copy link
Member
rkern commented Nov 7, 2019

Or rather, a variety of values, in a controlled fashion.

@rkern
Copy link
Member
rkern commented Nov 7, 2019
|15> from numpy.random.bit_generator import ISeedSequence

|16> class FullSeedSequence(ISeedSequence):
...>     def __init__(self, x):
...>         self.x = x
...>     def generate_state(self, n_words, dtype=np.uint32):
...>         return np.full(n_words, self.x, dtype=dtype)
...>     

|17> for i in range(32):
...>     ss = FullSeedSequence(1 << i)
...>     key = MT19937(ss).state['state']['key']
...>     expected = np.full(624, 1 << i, dtype=np.uint32)
...>     expected[0] |= 0x80000000
...>     np.testing.assert_array_equal(key, expected)
...>     
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-357821e12562> in <module>()
      4     expected = np.full(624, 1 << i, dtype=np.uint32)
      5     expected[0] |= 0x80000000
----> 6     np.testing.assert_array_equal(key, expected)
      7 

/home/rkern/.edm/envs/python3/lib/python3.6/site-packages/numpy/testing/_private/utils.py in assert_array_equal(x, y, err_msg, verbose)
    916     __tracebackhide__ = True  # Hide traceback for py.test
    917     assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
--> 918                          verbose=verbose, header='Arrays are not equal')
    919 
    920 

/home/rkern/.edm/envs/python3/lib/python3.6/site-packages/numpy/testing/_private/utils.py in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    839                                 verbose=verbose, header=header,
    840                                 names=('x', 'y'), precision=precision)
--> 841             raise AssertionError(msg)
    842     except ValueError:
    843         import traceback

AssertionError: 
Arrays are not equal

Mismatch: 0.16%
Max absolute difference: 4294967295
Max relative difference: 2.
 x: array([2147483648,          1,          1,          1,          1,
                1,          1,          1,          1,          1,
                1,          1,          1,          1,          1,...
 y: array([2147483649,          1,          1,          1,          1,
                1,          1,          1,          1,          1,
                1,          1,          1,          1,          1,...

@rkern
Copy link
Member
rkern commented Nov 7, 2019

Also, don't forget to fix the other instance of this bug in _legacy_seeding(). That will be harder to test, for sure, but hopefully, it will be correct by inspection.

@mattip
Copy link
Member Author
mattip commented Nov 7, 2019

tests added for both fixes

Copy link
Member
@rkern rkern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is also this bug in mt19937_init_by_array, which is likely the original typo that got translated into the other code paths. It's only used in the legacy seeding for RandomState

[~]
|4> np.random.RandomState([10, 20, 30]).get_state()[1][0]
2147483648

[~]
|5> np.random.RandomState([10, 20, 31]).get_state()[1][0]
2147483648

[~]
|6> np.random.RandomState([10, 20, 32]).get_state()[1][0]
2147483648

[~]
|8> np.random.MT19937([10, 20, 30]).state['state']['key'][0]
3742924469

[~]
|9> np.random.MT19937([10, 20, 31]).state['state']['key'][0]
2480826684

[~]
|10> np.random.MT19937([10, 20, 32]).state['state']['key'][0]
3174699929

@mattip
Copy link
Member Author
mattip commented Nov 7, 2019

@rkern tested and fixed for mt19937_init_by_array

Copy link
Member
@rkern rkern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for seeing this through!

This will definitely require some release notes.

@charris
Copy link
Member
charris commented Nov 7, 2019

Was this also a problem with the legacy code in 1.16.x?

EDIT: Looks like yes. Is it worth backporting to 1.16.x?

@rkern
8000
Copy link
Member
rkern commented Nov 7, 2019

No, all this was introduced in 1.17

@charris
Copy link
Member
charris commented Nov 7, 2019

There is this in 1.16.x initarray.c

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */

@rkern
Copy link
Member
rkern commented Nov 7, 2019

Huh. Okay. So I guess we ought to revert that one to maintain our stream-compatibility guarantees.

@rkern
Copy link
Member
rkern commented Nov 7, 2019

That one dates all the way back. It's still that way in Python. My apologies. I knew that randomkit's rk_randomseed() initialization had the |=, which meant that RandomState().get_state()[1][0] returned something different each time, and we want to do the same whenever a SeedSequence is used (regardless of how it was initialized). I didn't thoroughly check all of the code paths in the pre-1.17 code.

@rkern
Copy link
Member
rkern commented Nov 7, 2019

Do we have legacy-seeding stream tests for the array-seed initialization in addition to the integer-seed route? We probably should.

@mattip
Copy link
Member Author
mattip commented Nov 8, 2019

Only mt19937_init_by_array should be reverted, the other changes are bugs that made post-randomgen state['key'] incompatible with pre-randomgen?

@rkern
Copy link
Member
rkern commented Nov 8, 2019

Yes. Let's add some stream tests to be sure.

@rkern
Copy link
Member
rkern commented Nov 8, 2019

Hmm, seems like TestSeed.test_array() ought to have caught the change.

@rkern
Copy link
Member
rkern commented Nov 8, 2019

Ah! The lower bits of the first word in the key never actually participate in the algorithm!

for (i = 0; i < N - M; i++) {
y = (state->key[i] & UPPER_MASK) | (state->key[i + 1] & LOWER_MASK);
state->key[i] = state->key[i + M] ^ (y >> 1) ^ (-(y & 1) & MATRIX_A);
}

UPPER_MASK = 0x80000000UL and LOWER_MASK = 0x7fffffffUL

key[0] & UPPER_MASK gets used and then key[0] gets reassigned. key[0] gets read later, but only after it gets obliterated.

Nothing was ever broken! It just looked distressing to the user. We can just close this, if we want. Sorry, @mattip!

@mattip mattip closed this Nov 8, 2019
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 8, 2019
@charris charris removed this from the 1.17.4 release. milestone Nov 8, 2019
@mattip mattip deleted the issue-14844 branch November 2, 2020 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

numpy.random.RandomState() fails silently if /dev/urandom returns nothing
3 participants
0