BUG: bit generator spawns different child generators despite having the same random state · Issue #27882 · numpy/numpy · GitHub

BUG: bit generator spawns different child generators despite having the same random state #27882


Closed
JakobGruen opened this issue Dec 1, 2024 · 4 comments
Labels
33 - Question Question about NumPy usage or development

Comments

@JakobGruen
JakobGruen commented Dec 1, 2024

Describe the issue:

When I use a generator created by default_rng to spawn child rngs, the random states of the children differ, even if I reset the state of the parent generator before spawning again.

If the random states are the same, all the generators' functions should produce the same result given the same input. I think this should include spawning children.

Reproduce the code example:

```python
import numpy as np

# create rng and save state
rng = np.random.default_rng()
print(rng.bit_generator.state)
state = rng.bit_generator.state

# spawn a child
child1 = rng.spawn(1)[0]
print(child1.bit_generator.state)
print(child1.uniform(0, 1, 2))

# reset the state and confirm
rng.bit_generator.state = state
print(rng.bit_generator.state == state)

# spawn a second child from the "reset" parent
child2 = rng.spawn(1)[0]
print(child2.bit_generator.state)
print(child2.uniform(0, 1, 2))
```

Error message:

No response

Python and NumPy Versions:

Python 3.12.7

numpy 1.26.4
Also in numpy 2.1.3

Runtime Environment:

No response

Context for the issue:

I am not completely sure if this is a bug or intentional behavior, but it is confusing at least. I could not find any explanation of this behavior in the documentation and it seems counterintuitive.

If I set the seed for the first rng, the two children still have different states from each other, but those states are the same on every run.

I was trying to spawn child and grandchild rngs to be able to reproduce only parts of my simulation without having to rerun everything. But when I tried to recreate it from a child rng, I got different results. If I set the seed in the beginning, I can still recreate the whole simulation and I could create a list of new seeds instead of spawning children, but I would prefer not to do it that way.

@seberg added the "33 - Question" label and removed the "00 - Bug" label on Dec 1, 2024
@rkern
Member
rkern commented Dec 2, 2024

Spawning is covered here. Please note that the state that is spawned is the SeedSequence from the initial seeding of the BitGenerator, not the current state of the BitGenerator. This is somewhat obscured by Generator.spawn() really being syntactic sugar for reaching down to call SeedSequence.spawn(), but we do point to the documentation that covers it thoroughly.

We can still talk about solving your problem, but we'd need more details, and the Discourse forum would probably be a better venue. In short, my advice would probably be to avoid resetting the states of BitGenerators. I don't know what your use case is for it, but there are probably better ways to accomplish what you are doing. I've discussed similar use cases here and here with recommended patterns.

@JakobGruen
Author

Okay, I see. I have to admit that I was not aware of how the SeedSequence object works, but I think I get it now. Thanks for your answer and the links.

I thought it must be a bug since the state should be the only important datum to determine the behavior of a BitGenerator. This is still true, I suppose. It's just that when I call spawn on the BitGenerator, it's actually the linked SeedSequence that is doing the spawning.

About my use case, I have a main script that is starting several independent jobs on a server. Each of those jobs is running processes in parallel. Since the jobs are running independently on different machines, I needed a way to save and load the random generators to a file. Also, I want to be able to reproduce any of the jobs without rerunning the entire script.

I was switching from passing down a generated list of seeds to spawning child rngs, and I thought that passing along the initial state would amount to the same thing. But apparently it is the SeedSequence of the children, and in particular its initial entropy, that I should pass down and save for later.
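One way this idea could look in practice is sketched below, under the assumption that each child's `SeedSequence` is recorded before anything has been spawned from it. The record layout and the helper `rng_from_record` are hypothetical names for illustration and not part of NumPy; `SeedSequence.entropy`, `SeedSequence.spawn_key`, and the `entropy=`/`spawn_key=` constructor arguments are NumPy's own.

```python
import json
import numpy as np

root = np.random.SeedSequence()   # fresh OS entropy, stored as a Python int
child_seqs = root.spawn(3)        # one SeedSequence per job

# Hypothetical record layout: entropy plus spawn_key identify each child.
records = [
    {"entropy": ss.entropy, "spawn_key": list(ss.spawn_key)}
    for ss in child_seqs
]
payload = json.dumps(records)     # could sit next to other JSON parameters


def rng_from_record(record):
    """Rebuild a Generator from a saved record (hypothetical helper)."""
    ss = np.random.SeedSequence(
        entropy=record["entropy"],
        spawn_key=tuple(record["spawn_key"]),
    )
    return np.random.default_rng(ss)


loaded = json.loads(payload)
original = np.random.default_rng(child_seqs[1]).uniform(0, 1, 2)
restored = rng_from_record(loaded[1]).uniform(0, 1, 2)
print(np.array_equal(original, restored))
```

This sketch only captures a child as it was at spawn time. It does not record draws already consumed from a generator or children already spawned from it.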

@rkern
Member
rkern commented Dec 3, 2024

Spawning Generators in the parent process and sending them down to the child processes should still work fine and insulate you from the details of working manually with SeedSequence (also a viable option, but there are simpler ways). You do need to save the spawned Generator faithfully (i.e. by pickle) immediately before consuming anything from them (either random method calls or further spawning), but that would be good practice, regardless.

I suspect what led you down the wrong path was trying to manually reach into the PRNG state and saving that instead of using pickle, which will save every important detail, including the SeedSequence state. See this discussion for more details and alternatives (if you absolutely must avoid pickle).

@JakobGruen
Author

Yes, I suppose using pickle would be the best option. I was passing along some other parameters in a JSON file (so that I can read it easily) and wanted to avoid adding another file. But as you say, that led me down the wrong path...

3 participants