-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: use SeedSequence instead of seed() #13780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, python3.5 does not have a |
You can use random_entropy from numpy.random.entropy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the multitude of comments.
I think BitGenerator is effectively to defining the interface and so needs to be adaptable to different generators.
numpy/random/bit_generator.pyx
Outdated
else: | ||
raise TypeError('{}(seed) expects int or SeedSequence for seed ' | ||
'not {}'.format(type(self).__name__, seed)) | ||
self._seed = seed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the seed get set into the bit generators state? In the old model, you would not call self.seed(self._seed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could use a private class self._set_seed(seed) that is requried by children that handles specific seeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand. I was thinking that the the BitGenerator
class would call self._set_seed(self._seed)
(which is provided by PCG64._set_seed(self, value)
) at the end of its __init__
. Might not be worth the hassle since then PCG64.__init__
would have to ensure that everything existed before the super call (which is fine for PCG64, but not for a generator that needs aligned memory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BitGenerators call SeedSequence.generate_state
, then use that value to set their seed. in particular PCG64 calls
val = self._seed_seq.generate_state(4, np.uint64);
pcg64_set_seed(&self.rng_state,
<uint64_t *>np.PyArray_DATA(val),
<uint64_t *>np.PyArray_DATA(val) + 2)
IMO this is not a good idea. Think about testing. Why create many generators when you really want to use the same one with the same state? The cost of creating a generator may also depend on the size of the state. For example, MT19937 allocates 4*624 + 4 bytes of state. |
It's actually pretty hard in most test frameworks to make just one object for the test suite and reseed it each test. You have to go out of your way to do it. It's much easier just to make it fresh each time in the |
Doc failure looks legitimate. |
What I don't understand is that is the upside to moving seed from being public to private? There is a downside (creating 2 or 3 extra objects, depending on how the seed is implemented). IME in pytest is easy to create a global automatically reset generator in conftest using a fixture, e.g.,
and then
Of course one can do the same thing by grabbing and then resetting the initial state, so maybe this isn't much of a concern in this context. The proof is in the pudding I suppose: In [20]: %timeit rg.Generator(rg.PCG64())
27 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [21]: %timeit g.seed(12334)
5.49 µs ± 75.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [22]: state = g.state
In [23]: %timeit g.state = state
2.39 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) So directly setting the state is fastest, which makes sense since it has the fewest safeties. This said, none are too slow in any important way, and even 10K calls is < 1s. |
I am aiming for the minimal interface for the first release. We can always add |
We've seen such great misuse of |
I have to agree with being minimal. If there is any other reason except slight optimization, sure. But I doubt few tests even notice that slowdown. |
Does the design pattern for SeedSequence seem reasonable (create it in the base |
I have moved dSFMT, MT19937 and PCG32 to use seeding via SeedSequence (MT19937 will only use the new SeedSequence if a SeedSequence is provided for legacy reasons). I also updated the documentation. I don't think it makes sense to use SeedSequence in the counter-based ThreeFry and Philox since they already mix some entropy into their initial state. |
The use of The only thing that needs the old |
numpy/random/mt19937.pyx
Outdated
seed = random_entropy(1, 'fallback') | ||
mt19937_seed(&self.rng_state, seed[0]) | ||
elif isinstance(seed, SeedSequence): | ||
val = self._seed_seq.generate_state(1, np.uint32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please get at least 128 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is more better? Should I use RK_STATE_LEN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we'd bypass mt19937_init_by_array()
entirely by pulling RK_STATE_LEN
uint32
words, then make sure the first word isn't 0 like in the usual routine. Then that's the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will let us replace SeedSequence
with a LegacyMTSeedSequence
at the Python level that replicates mt19937_init_by_array()
. That would be a good way to implement RandomState(seed)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If following this path, should probably disallow BitGenerators
from being input into RandomState
, so that RandomState(MT19937(seed))
is not allowed since it will be different from RandomState(seed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to refactor the code from mt19937_init_by_array
into a LegacyMTSeedSequence
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, @classmethod
it is, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked. Now RandomState.seed
calls MT19337.legacy_seeded
. If RandomState
has a different BitGenerator, the call will fail. Also reworked the tests to create new Generators for each test rather than reseed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, NEP 19 says (well, I said) that in such cases we should pass the provided seed through to the BitGenerator
. Though at the time I was probably still thinking of BitGenerator.seed()
being an API that people might use, however disrecommended it was. I think I prefer the semantics you have now, though. If people are calling np.random.seed()
, they probably wanted to reproduce the old bitstream, so making this error out when someone has threaded through another BitGenerator
should raise the discrepancy up to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If needed we can propose a change to the NEP since we now have a better picture of how this all should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly formatting issues.
numpy/random/mt19937.pyx
Outdated
seed = random_entropy(1, 'fallback') | ||
mt19937_seed(&self.rng_state, seed[0]) | ||
elif isinstance(seed, SeedSequence): | ||
val = self._seed_seq.generate_state(1, np.uint32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If following this path, should probably disallow BitGenerators
from being input into RandomState
, so that RandomState(MT19937(seed))
is not allowed since it will be different from RandomState(seed)
I've been wondering about the API design for this. Is it necessary for a BitGenerator to expose this up to a Generator? In particular, I could imagine another approach that has a signature like
that would be able to enable sequenced creation of any Generator-like object from any BitGenerator-like object as long as the BitGenerator can accept 32-bit unsigned integers. This avoids tight coupling of SeedSequence at the bottom and simplifies writing Generator-like classes that could contain user-provided distributions. This is orthogonal to the use of SeedSequence internally to the BitGenerators I think. |
Can you show a usage example? I don't have a clear idea from that constructor what you are proposing. |
Adopting |
What is "This"? |
Well now you have the test's name |
IIRC all of the tests use a scalar seed. Might be worth triple checking that there is at least one that uses the vector seed path, and adding one with a random-looking seed. |
TestSeed seems to have the cases you mentioned. |
Are you able to see the code context of my comment? I made it on the commit, so maybe it's not being displayed to you.
|
Since it seems Philox is one of the BitGenerators that will be shipped, I refactored it to use SeedSequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are just a few items left that could be addressed in a subsequent PR. Getting this in will help unblock some of the other PRs.
Seeds can be passed to any of the BitGenerators. The provided value is mixed | ||
via `~.SeedSequence` to spread a possible sequence of seeds across a wider | ||
range of initialization states for the BitGenerator (except for `~.MT19937` | ||
which for legacy reasons will use the older initialization methods). Here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now untrue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updating
`RandomState` object. Random number generation is separated into two | ||
components, a bit generator and a random generator. | ||
`RandomState` object. Random number generation is separated into three | ||
components, a seed sequencer, a bit generator and a random generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
components, a seed sequencer, a bit generator and a random generator. | |
components, a seed sequence, a bit generator and a random generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adopting
@@ -153,6 +170,11 @@ The included BitGenerators are: | |||
* MT19937 - The standard Python BitGenerator. Produces identical results to | |||
Python using the same seed/state. Adds a `~mt19937.MT19937.jumped` function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only true with _legacy_seeding()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing
numpy/random/mt19937.pyx
Outdated
sequence) of unsigned 32-bit integers, a `SeedSequence`, or ``None`` | ||
(the default). If `seed` is ``None``, a 32-bit unsigned integer is | ||
read from ``/dev/urandom`` (or the Windows equivalent) if available. If | ||
unavailable, a 32-bit hash of the time and process ID is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text is out of date. It should be rationalized with the other SeedSequence
-using BitGenerator
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing
self.rng_state.ctr.v[i] = counter[i] | ||
|
||
self._reset_state_variables() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ctypes
and cffi
properties can also be deleted, as they are provided by BitGenerator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing
@@ -132,7 +131,7 @@ def test_permutation_subclass(self): | |||
class N(np.ndarray): | |||
pass | |||
|
|||
mt19937.bit_generator.seed(1) | |||
mt19937 = RandomState(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test using RandomState
while all of the others using Generator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing last vestiges of RandomState from this file.
Needs a rebase, of course. |
I will rebase once #13793 is merged |
There are 31 commits here, should I rebase and squash them down to 1 or leave it be? |
Might as well squash them down, it will make the rebase easier. |
Should be good to rebase now. |
CircleCI has found a couple of nits. |
def __init__(self, seed=None, counter=None, key=None): | ||
def __init__(self, seed_seq=None, counter=None, key=None): | ||
if seed_seq is not None and key is not None: | ||
raise ValueError('seed and key cannot be both used') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be 'seed_seq and key`
n_children_spawned : {int}, optional | ||
The number of children already spawned. Only pass this if | ||
reconstructing a `SeedSequence` from a serialized form. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need an Attributes section for program_entropy
and pool
? If these are private should they be _ed?
Fixed the merge problem, let's see if things go. @bashtage Lets get this in and then do the fixups. I think it is getting late out your way and Matti will be traveling tomorrow. |
What should be the fix here?
|
Just import it. With all of the squashing, it's hard to tell what happened. |
Thank you all for your patience and endurance! |
Modify BitGenerators to use a SeedSequence class for seeded entropy, the class also provides a spawned() API to generate a number of independent entropy states for creating independent random streams.
Based on this gist by @rkern, the idea is
BitGenerator(seed)
accepts an int, a sequence of ints, or a SeedSequence.seed
method to modify an existing BitGenerator, since creation of a new one is cheap. To seed a BitGenerator, simply create a new one with the desired seed value. This may change based on user acceptance of the idea.The first commit only changes PCG64, as a proof of concept, since the change is rather large, and does not expose the SeedSequence's
spawn
method to the BitGenerator.