-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: Add PCG64DXSM BitGenerator #18906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@rkern I fixed the linting. Could you add a release note? |
`PCG64DXSM`: collisions are possible, but both streams have to simultaneously | ||
be both "close" in the 128 bit state space *and* "close" in the 127-bit | ||
increment space, so the negligible chance of colliding in the 128-bit internal | ||
`SeedSequence` pool is more likely. The DXSM output function is more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"pool is less likely"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It is more likely that you will observe a collision in the SeedSequence
pool (creating completely identical seeds) than to see one of the poorly-correlated pairs of distinct streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the verb to "would be" would help (and be more consistent with the other verbiage we have on the subject).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the SeedSequence
affect both increment and state, or just the state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both increment and state. During the mega-thread, we decided that was the best design, especially once we got strong seeding with SeedSequence
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both increment and state
Does this make the seed sequences differ between bit-generators or is the increment derived from the state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me back up a bit to where I suspect the disconnect is. We'll see if I guessed right.
SeedSequence
takes an arbitrary sequence of (arbitrary-sized) integers and hashes them down into an internal pool of bits. The default pool size is 128 bits. The BitGenerator
then asks the SeedSequence
for however many bits it needs to initialize itself. For MT19937
, that's a plump 624 uint32
s. For the PCG64
s, that's the relatively-svelte 128-bit state and 127-bit increment (added together and rounded up to an even 256 bits total). The SeedSequence
uses what amounts to a slow, carefully constructed PRNG to generate however many bits are requested. It's not just copying its internal pool of 128 bits into the BitGenerator
state.
The claim I am making here is that the odds that two distinct SeedSequence
s generate a pair of distinct, bad PCG64DXSM
instances is less than the odds that two distinct user inputs hashed down to the same 128-bit internal pool. And the odds of that 128-bit collision are ignorably tiny.
I like the technical explanation. It might be useful to others considering using a PCG64 generator, for instance the Go 2 developers. |
Is this PCG64CMDXSM or PCG64DXSM? I always thought these were distinct variants of PCG64. If it is CM then I think the docstring for the class should mention the difference since both the step and the output function are changing. |
@bashtage I added some more words to the appropriate part of the docstring. I'm not going to expand the name of the class, though. It's already a carcrash of random characters, and I don't want to make it any longer. :-) |
I just ping @imneme to keep her updated. |
LGTM. I'll redo timings after this is merged. |
Thanks Robert. |
@rkern can you please explain how the CSV files with the test data were generated, are they for PCG64 CM DXSM or another variant? E.g. for 0 seed, shouldn't we start with:
This is what I get when using |
@flyingmutant Are you transforming the seed by |
I am using the seed directly (like |
Not anymore. We should probably delete that. |
It comes from an initial version of |
Closes #16313
This PR adds the
PCG64DXSM
BitGenerator
.I have a documentation page explaining the issue and when you should worry about it. It is my usual overcomplicated prose (with my penchant for parentheticals), but I did try to give clear practical advice upfront and shunt all of the technical explanation down to the bottom. I imagine it could use a few more editing passes, but probably nothing that would prevent a release. Inserting references to it in all of the relevant places was even less elegant, but it gets the job done. As in NEP 19, I name myself as the maker of mistakes, which I feel is appropriate, but I'm not wedded to it if it doesn't match the tone we want in the documentation.
I updated the performance benchmarking script that feeds the tables in the documentation, but I did not replace the tables as I don't have ready access to all of the platforms. @bashtage I think you ran those, am I right?