WIP: Small Fast Chaotic PRNGs #41


Closed · wants to merge 22 commits

Conversation

@rkern commented Jun 22, 2019

Here are two more PRNGs to enter the fray. They are each at least as fast as Xoshiro256, at least on my 64-bit Linux machine. I expect they'll be about the same on 32-bit machines, but I have not tested that.

❯ python benchmark.py
--------------------------------------------------------------------------------

Time to produce 1,000,000 Uniforms
************************************************************
JSF64         2.38 ms
MT19937       6.88 ms
PCG32         3.58 ms
PCG64         4.10 ms
Philox        6.15 ms
SFC64         2.44 ms
ThreeFry      8.53 ms
Xoshiro256    2.80 ms
numpy         6.88 ms
dtype: object

Uniforms per second
************************************************************
JSF64         419.99 million
MT19937       145.28 million
PCG32         279.56 million
PCG64         243.88 million
Philox        162.65 million
SFC64         410.45 million
ThreeFry      117.18 million
Xoshiro256    356.89 million
numpy         145.28 million
dtype: object

Speed-up relative to NumPy
************************************************************
JSF64         189.1%
MT19937         0.0%
PCG32          92.4%
PCG64          67.9%
Philox         12.0%
SFC64         182.5%
ThreeFry      -19.3%
Xoshiro256    145.7%
dtype: object
--------------------------------------------------------------------------------

They are each 256-bit PRNGs architected fairly similarly, based on random invertible mappings. They each have a small number of separate cycles within their state space, but it is exceedingly unlikely that your seeding won't fall on the big ones. You can expect a period of about 2**255. SFC64 incorporates a 64-bit counter in its state so you have the extra safety that, should you happen to be enormously unlucky, you always have a minimum of 2**64 states in your cycle.
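
For concreteness, here is a minimal C++ sketch of the SFC64 step function as it appears in PractRand (rotation 24, right shift 11, left shift 3); the seeding shown is purely illustrative and is not the scheme used by this PR.

// Minimal sketch of the SFC64 step function (PractRand constants: rotate 24,
// right shift 11, left shift 3). Illustrative only -- not the code in this PR.
#include <cstdint>
#include <cstdio>

struct sfc64_state { uint64_t a, b, c, counter; };

static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

static inline uint64_t sfc64_next(sfc64_state &s) {
    const uint64_t out = s.a + s.b + s.counter++;  // counter term drives the minimum period
    s.a = s.b ^ (s.b >> 11);
    s.b = s.c + (s.c << 3);
    s.c = rotl64(s.c, 24) + out;
    return out;
}

int main() {
    // Illustrative seeding only: fill a/b/c with arbitrary seed material, start
    // the counter at 1, and discard some initial outputs to mix the state.
    sfc64_state s = {0x9e3779b97f4a7c15ULL, 0xbf58476d1ce4e5b9ULL,
                     0x94d049bb133111ebULL, 1};
    for (int i = 0; i < 12; ++i) (void)sfc64_next(s);
    std::printf("%016llx\n", (unsigned long long)sfc64_next(s));
}

The counter term is what gives the 2**64 minimum cycle length mentioned above: even if a, b, and c happened to land on a short cycle, the full state cannot repeat until the counter wraps.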

Both are very well-tested. JSF64 has been analyzed for a long time. SFC64 seems to be inspired by that work; it was written by the author of PractRand so it too has been pretty thoroughly tested.

Their one downside is that they don't have an efficient jumpahead operation. @imneme (whose implementation I referenced) seems to think that it might be doable for SFC64, but I don't know where to start with that.

@bashtage

Some Win64 timings

Time to produce 1,000,000 Uniforms
************************************************************
JSF64          5.45 ms
MT19937       10.48 ms
PCG64          6.23 ms
Philox        27.40 ms
SFC64          5.20 ms
ThreeFry      20.50 ms
Xoshiro256     6.89 ms

and Win32

Time to produce 1,000,000 Uniforms
************************************************************
JSF64          13.88 ms
MT19937        13.98 ms
PCG64          46.19 ms
Philox         60.32 ms
SFC64          13.19 ms
ThreeFry       41.87 ms
Xoshiro256     15.91 ms

@mattip (Owner) commented Jun 22, 2019

I would prefer to add only one of them to numpy. We could add both to the collection of alternative BitGenerators in numpy/bitgenerators.

@rkern (Author) commented Jun 22, 2019

I would prefer to add only one of them to numpy.

Certainly, hence the WIP. I wanted both up for comparison purposes. I'd pick SFC64 of the two.

@imneme commented Jun 23, 2019

BTW, I suspect the comment about jump-ahead is a copy-and-paste-o. The same comment is at the bottom of some other PRNGs where it is actually true.

FWIW, I reported to Chris that PractRand claimed to find a problem with SFC64 (but not the smaller variant). In version 0.94 of PractRand Chris changed the constants for SFC, which may address/avoid the issue.

@rkern (Author) commented Jun 23, 2019

Hmm, no, there's no difference in the sfc64 implementation between 0.93 and 0.94. The last update to the constants happened in 0.93.

@imneme commented Jun 23, 2019

Oh, sorry, I misread the release notes. You're right, the constants were improved in 0.93. Which is unfortunate because it means that the version that failed pat5 is still the standard one. Someone should rerun the test just to be sure, but it'll take 12 days 13.5 hours on my (somewhat slow but 48-core) machine. You can rerun it with:

./RNG_test sfc64 -te 1 -tlmin 20 -tlmax 50 -seed 0xdf7e5fd0802274af

In the interests of making sure the result reproduces, I've set it rerunning on a faster machine. It should be done in about a week.

@rkern (Author) commented Jun 23, 2019

I've patched PractRand to let me just run the Pat5 tests. Let's see if that provokes the failure any faster in wall-clock time.

@imneme commented Jun 23, 2019

FWIW, here's the full log file from when I tested it. The whole thing is pretty weird. It's super unhappy about it, but then its outrage fades as more data comes in.

sfc64-test.log

@rkern (Author) commented Jun 23, 2019

Interesting. In my Pat5-only run, I got to 64 TiB without a failure:

❯ ./RNG_test sfc64 -te 100 -tlmin 30 -tlmax 50 -seed 0xdf7e5fd0802274af -multithreaded
...
rng=sfc64, seed=0xdf7e5fd0802274af
length= 64 terabytes (2^46 bytes), time= 46871 seconds
  no anomalies in 72 test result(s)

And then promptly mis-pressed Ctrl-C to copy that here, so I have to start again.

My other run with a different seed is still running at 64 TiB as well and is still going.

@imneme commented Jun 23, 2019

It's all a bit of a PITA. When I tried yesterday to reproduce it with 0.94, I saw early differences from my run with some of the other tests. Rerunning with 0.93, I see the same output for these other tests which gives me more confidence. I looked briefly at backporting your pat5-only option to 0.93, but the code is structured rather differently, and as I contemplated doing so I wasn't sure what it would mean anyway if it did or didn't work with 0.93.

Alas, running the code on my iMac at home (an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz) was supposed to be faster than on my main test machine (an Intel(R) Xeon(R) CPU E7-4830 v3 @ 2.10GHz), but as the test wears on that is proving not to be the case.

(It's possible that the whole thing was a glitch related to a race condition in the -multithreaded option; but I can't say I'll be happy if that's the case, because you really want to feel you can trust that a test failure that shows up after running a test for a couple of weeks [or more than half a year in some cases] actually means what it says.)

@rkern (Author) commented Jun 23, 2019

Interesting. I'll do the backporting. If I can provoke the same failure, then I think that'll be confirmation enough. If not, then a full run is still probably warranted.

I've pushed an implementation of David Blackman's GJrand as well. It has many of the same desirable properties as SFC64, like each of the 2**128 unique seed states being at least 2**64 steps away from any other. It's moderately slower than JSF64/SFC64/Xoshiro256, about halfway between PCG64 and that lot on my 64-bit system.

@imneme commented Jun 24, 2019

One issue with all of these chaotic PRNGs is that their authors don't allow you to seed the full state of the generator, just part of it. For JSF, it's merely because the author did some testing on the limited range of allowed seedings (beyond the probabilistic guarantee that would have existed); for the counter-based variants, it's because the guarantees that the counter provides disappear if you let the counter have an arbitrary starting value.

Having less random seed data than the state size of the PRNG will typically introduce a kind of bias. This bias is only easily detectable if the space of allowed seeds is very small (e.g., 32 bits); otherwise we're in too-big-to-fail territory.

To explore this, I just wrote seeding_bias.cpp, which is a modest tweak on code I wrote to show bias initializing the Mersenne Twister with simple seeds.
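
That file isn't reproduced here, but the enumeration idea it implements is roughly the following (a minimal sketch, not imneme's seeding_bias.cpp, using Bob Jenkins's published jsf32 — rotate constants 27/17, a = 0xf1ea5eed, 20 warm-up rounds — as the generator under test):

// Sketch of the bias test: seed jsf32 with every possible 32-bit value, take
// the first output from each seed, and count 32-bit outputs that never occur.
// (Needs ~512 MB for the bitmap and takes a while to run.)
#include <cstdint>
#include <cstdio>
#include <vector>

struct jsf32 { uint32_t a, b, c, d; };
static inline uint32_t rot32(uint32_t x, int k) { return (x << k) | (x >> (32 - k)); }

static inline uint32_t jsf32_next(jsf32 &s) {
    uint32_t e = s.a - rot32(s.b, 27);
    s.a = s.b ^ rot32(s.c, 17);
    s.b = s.c + s.d;
    s.c = s.d + e;
    s.d = e + s.a;
    return s.d;
}

// Jenkins's seeding: only 32 of the 128 state bits come from the seed.
static void jsf32_seed(jsf32 &s, uint32_t seed) {
    s.a = 0xf1ea5eed; s.b = s.c = s.d = seed;
    for (int i = 0; i < 20; ++i) (void)jsf32_next(s);
}

int main() {
    std::vector<bool> seen(1ull << 32, false);   // one bit per possible output
    for (uint64_t seed = 0; seed <= 0xffffffffULL; ++seed) {
        jsf32 s;
        jsf32_seed(s, static_cast<uint32_t>(seed));
        seen[jsf32_next(s)] = true;
    }
    uint64_t missing = 0;
    for (uint64_t v = 0; v <= 0xffffffffULL; ++v)
        if (!seen[v]) ++missing;
    // A bijective seed-to-first-output map (seed size == state size) would give
    // zero missing values; an under-seeded chaotic PRNG behaves more like a
    // random function, so roughly a 1/e fraction of outputs never appears.
    std::printf("outputs never occurring: %llu\n", (unsigned long long)missing);
}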

Here's what happens if we run it on jsf32

unix% g++ -Wall -O3 -std=c++14 seeding_bias.cpp -o jsf32_bias -DRNG_TYPE=jsf32 -DINCLUDE_FILE='"jsf.hpp"'

unix% ./jsf32_bias 1
Bias test of jsf32:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: 1, 3, 4, 8, 11, 12, 13, 17, 19, 20, 22, 23, 27, 28, 29, 31, 32, 40, 44, 48, ...
Most overrepresented: 264362982 -- repeated 12 times

unix% ./jsf32_bias 4
Bias test of jsf32:
- Will enumerate through all RNG seeds, reading 4 outputs from each one.
- Enumerating: ................................................................

Never occurring: 130, 135, 387, 613, 685, 757, 766, 875, 890, 935, 983, 988, 1167, 1175, 1199, 1251, 1258, 1312, 1350, 1353, ...
Most overrepresented: 790806589 -- repeated 25 times

unix% ./jsf32_bias 16
Bias test of jsf32:
- Will enumerate through all RNG seeds, reading 16 outputs from each one.
- Enumerating: ................................................................

Never occurring: 3835856, 7814077, 7839806, 19677170, 22684530, 37104417, 37886080, 44763576, 44915593, 50908421, 51401130, 62306837, 83121896, 86907698, 101248875, 106001438, 106066804, 106908924, 109382321, 119572514, ...
Most overrepresented: 1674563904 -- repeated 47 times

This also happens for gjrand and sfc:

unix% g++ -Wall -O3 -std=c++14 seeding_bias.cpp -o gjrand_bias -DRNG_TYPE=gjrand32 -DINCLUDE_FILE='"gjrand.hpp"'

unix% g++-8 -Wall -O3 -std=c++14 seeding_bias.cpp -o sfc_bias -DRNG_TYPE=sfc32 -DINCLUDE_FILE='"sfc.hpp"'

unix% ./sfc_bias 1
Bias test of sfc32:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: 1, 4, 5, 8, 9, 10, 13, 18, 21, 23, 24, 28, 29, 30, 37, 38, 40, 41, 43, 45, ...
Most overrepresented: 1585543738 -- repeated 13 times

unix% ./gjrand_bias 1
Bias test of gjrand32:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: 6, 8, 12, 17, 18, 22, 23, 26, 27, 31, 34, 39, 40, 41, 43, 44, 45, 46, 47, 49, ...
Most overrepresented: 210398684 -- repeated 12 times

(These two PRNGs also allow three-word and two-word seeding respectively, which is too large a space to test exhaustively, but which would technically have bias as well for the same reasons.)

Just to show that it doesn't have to be this way (i.e., if the amount of seed data matches the internal state size):

unix% g++ -Wall -O3 -std=c++14 seeding_bias.cpp -o xorshift32_bias -DRNG_TYPE=xorshift32plain32a -DINCLUDE_FILE='"xorshift.hpp"'

unix% g++ -Wall -O3 -std=c++14 seeding_bias.cpp -o xorshift32star16_bias -DRNG_TYPE=xorshift32star16a -DSEED_TYPE=uint32_t -DINCLUDE_FILE='"xorshift.hpp"'

unix% g++ -Wall -O3 -std=c++14 seeding_bias.cpp -o pcg32_once_insecure_bias -DRNG_TYPE='pcg32_once_insecure' -DSEED_TYPE=uint32_t -DINCLUDE_FILE='"pcg_random.hpp"'

unix% g++ -Wall -O3 -std=c++14 seeding_bias.cpp -o pcg16_bias -DRNG_TYPE='pcg_engines::setseq_xsh_rr_32_16' -DSEED_TYPE=uint32_t -DINCLUDE_FILE='"pcg_random.hpp"'

unix% ./xorshift32_bias      
Bias test of xorshift32plain32a:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: (none)
Most overrepresented: (nothing overrepresented)

unix% ./xorshift32star16_bias
Bias test of xorshift32star16a:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: (none)
Most overrepresented: (nothing overrepresented)

unix% ./pcg32_once_insecure_bias 
Bias test of pcg32_once_insecure:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: (none)
Most overrepresented: (nothing overrepresented)

unix% ./pcg16_bias 
Bias test of pcg_engines::setseq_xsh_rr_32_16:
- Will enumerate through all 4294967296 seeds, reading 1 outputs from each one.
- Enumerating: ................................................................

Never occurring: (none)
Most overrepresented: (nothing overrepresented)

FWIW, for both pcg16 and pcg32_once_insecure, in one sense we're actually only providing half the state, because we haven't specified the stream (but could). This is a (rare) positive that comes from the fact that streams control a parameter of the generator rather than changing state.
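
To illustrate that last point (a hedged sketch against the pcg-cpp API at https://github.com/imneme/pcg-cpp, not code from this PR): the stream is a constructor parameter supplied alongside the seed, not state that the seed has to fill in.

// Illustrative only: in pcg-cpp the stream selector is a generator parameter,
// passed next to the seed, independent of the seeded state.
#include <cstdint>
#include <iostream>
#include "pcg_random.hpp"   // from https://github.com/imneme/pcg-cpp

int main() {
    pcg32 rng_default(42u);        // seed only; default stream
    pcg32 rng_streamed(42u, 54u);  // same seed, explicitly selected stream
    std::cout << rng_default() << " " << rng_streamed() << "\n";
}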

@rkern (Author) commented Jun 24, 2019

@imneme commented Jun 24, 2019

Thanks. I had to also patch show_checkpoint to have return type void rather than double to avoid undefined (and problematic) behavior. Running now.

(Clang also warned about various other problems with the overall source.)

@rkern (Author) commented Jun 24, 2019

Yeah, I applied a patch for that as well.

@mattip (Owner) commented Jun 24, 2019

Would it make sense to add GJrand, JSF64 and SFC64 to numpy/bitgenerators and then change this PR to only add SFC64 to numpy.random?

I understand that we would then remove PCG64 and PCG32 from numpy.random as well.

@bashtage

I don't think PCG64 is going away, is it?

@mattip (Owner) commented Jun 24, 2019

As I understood it, the goal is to release 1.17.0 with two BitGenerators: MT19937 and one other. All the rest would live in numpy/bitgenerators.

@bashtage

I might have missed that thread/comment.

@imneme commented Jun 24, 2019

With the pat5-only test on PractRand 0.93, I see:

rng=sfc64, seed=0xdf7e5fd0802274af
length= 64 terabytes (2^46 bytes), time= 29910 seconds
  no anomalies in 18 test result(s)

That leads me to believe it was some kind of strange one-time glitch in the tester, and sfc64 gets a clean bill of health. It's good for sfc, but bad in other ways, because it means we can't be as sure about drawing inferences from seeing PractRand fail something.

@rkern (Author) commented Jun 24, 2019

As I understood it, the goal is to release 1.17.0 with two BitGenerators: MT19937 and one other. All the rest would live in numpy/bitgenerators.

No consensus has been reached on that point. I proposed it, but was convinced that including Philox as a representative of the counter-based cryptoish PRNG clade was a good idea. One often uses those in a different way than other PRNGs thanks to their guarantees, so including one would assist porting code that uses such a PRNG (setting aside the actual numbers that come out). I still like PCG64 as the default as a good all-rounder; the same argument for a counter-based PRNG applies to jumpable PRNGs. But I think there's also room for one more speed-at-all-costs-except-quality PRNG, even on Win32, which SFC64 would provide.

My current proposal is PCG64 as the default, MT19937 for legacy, Philox and SFC64 for minimal amounts of diversity.

@mattip (Owner) commented Jun 24, 2019

I added GJrand, JSF64, and SFC64 to numpy/bitgenerators. Could you redo this PR against numpy/numpy to add SFC64? Then we can move toward PCG64, MT19937, Philox, and SFC64 only.

@rkern (Author) commented Jun 24, 2019

That leads me to believe it was some kind of strange one-time glitch in the tester and sfc64 gets a clean bill of health. It's good for sfc, but bad in other ways because it means we can't be as sure about drawing inferences from seeing PractRand fail something.

I concur.

Possibly-insulting question: did the original machine on which you ran the tests have ECC memory? I always enjoy cosmic-ray-corrupted-my-process stories. But a bug in PractRand's multithreading seems more likely.

@mattip (Owner) commented Jun 24, 2019

I updated the README to describe the bitgenerators; I hope I was accurate.

@imneme commented Jun 24, 2019

Possibly-insulting question: on the original machine which you ran the tests, did you have ECC memory? I always enjoy cosmic-ray-corrupted-my-process stories. But a bug in PractRand's multithreading seems more likely.

FWIW, my main test machine is a Silicon Mechanics Rackform R422.v5 [based on the SuperMicro X10QBL board], with four 12-core Intel(R) Xeon(R) CPU E7-4830 v3 @ 2.10GHz and 32 Samsung M393B2G70QH0-YK0 16GB DDR3 1600 ECC Registered DIMMs, giving me 96 hyperthreaded cores and 512 GB of goodness.

(It's a really nice machine!)

@rkern (Author) commented Jun 26, 2019

Replaced by numpy#13838

@rkern closed this Jun 26, 2019