ENH: use SeedSequence to generate entropy for seeding · numpy/numpy@efa35e7
Commit efa35e7

ENH: use SeedSequence to generate entropy for seeding
1 parent 8bb4645 commit efa35e7

32 files changed, +7461 -7308 lines changed
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+:orphan:
+
+BitGenerator
+------------
+
+.. currentmodule:: numpy.random.bit_generator
+
+.. autosummary::
+   :toctree: generated/
+
+   BitGenerator
Lines changed: 30 additions & 5 deletions
@@ -1,24 +1,49 @@
 .. _bit_generator:

+.. currentmodule:: numpy.random
+
 Bit Generators
 --------------

-.. currentmodule:: numpy.random
-
 The random values produced by :class:`~Generator`
 originate in a BitGenerator. The BitGenerators do not directly provide
 random numbers and only contain methods used for seeding, getting or
 setting the state, jumping or advancing the state, and for accessing
 low-level wrappers for consumption by code that can efficiently
 access the functions provided, e.g., `numba <https://numba.pydata.org>`_.

-Stable RNGs
-===========
-
 .. toctree::
    :maxdepth: 1

+   BitGenerator <bitgenerators>
    MT19937 <mt19937>
    PCG64 <pcg64>
    Philox <philox>

+Seeding and Entropy
+-------------------
+
+A BitGenerator provides a stream of random values. In order to generate
+reproducible streams, BitGenerators support setting their initial state via a
+seed. But how best to seed the BitGenerator? On first impulse one would like to
+do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
+independent BitGenerators. However using a highly correlated set of seeds could
+generate BitGenerators that are correlated or overlap within a few samples.
+
+NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
+introduces the necessary entropy to produce independent and largely non-
+overlapping streams. Small seeds may still be unable to reach all possible
+initialization states, which can cause biases among an ensemble of small-seed
+runs. For many cases, that doesn't matter. If you just want to hold things in
+place while you debug something, biases aren't a concern. For actual
+simulations whose results you care about, let ``SeedSequence(None)`` do its
+thing and then log/print the `SeedSequence.entropy` for repeatable
+`BitGenerator` streams.
+
+.. autosummary::
+   :toctree: generated/
+
+   bit_generator.ISeedSequence
+   bit_generator.ISpawnableSeedSequence
+   SeedSequence
+   bit_generator.SeedlessSeedSequence
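
The "Seeding and Entropy" text added above recommends logging `SeedSequence.entropy` and avoiding consecutive integer seeds. A minimal sketch of that workflow follows. It assumes the `SeedSequence` API as released in NumPy 1.17 and later (`entropy`, `spawn`), which may differ in detail from the snapshot at this commit; the variable names are illustrative only.

```python
from numpy.random import Generator, PCG64, SeedSequence

# Let SeedSequence gather fresh entropy from the OS, then record it so the
# run can be replayed later.
root = SeedSequence()
logged_entropy = root.entropy  # an int; print or log this value

# Derive 12 child sequences instead of seeding with range(12).
children = root.spawn(12)
generators = [Generator(PCG64(child)) for child in children]
first_draws = [g.random() for g in generators]

# Replay: rebuild the same tree of seed sequences from the logged entropy.
replay_root = SeedSequence(logged_entropy)
replay_draws = [Generator(PCG64(c)).random() for c in replay_root.spawn(12)]
```

Under these assumptions, `replay_draws` should reproduce `first_draws`, because both trees start from the same entropy and spawn children in the same order.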

doc/source/reference/random/bit_generators/mt19937.rst

Lines changed: 2 additions & 3 deletions
@@ -8,13 +8,12 @@ Mersenne Twister (MT19937)
 .. autoclass:: MT19937
    :exclude-members:

-Seeding and State
-=================
+State
+=====

 .. autosummary::
    :toctree: generated/

-   ~MT19937.seed
    ~MT19937.state

 Parallel generation

doc/source/reference/random/bit_generators/pcg64.rst

Lines changed: 0 additions & 1 deletion
@@ -14,7 +14,6 @@ Seeding and State
 .. autosummary::
    :toctree: generated/

-   ~PCG64.seed
    ~PCG64.state

 Parallel generation

doc/source/reference/random/bit_generators/philox.rst

Lines changed: 2 additions & 3 deletions
@@ -8,13 +8,12 @@ Philox Counter-based RNG
 .. autoclass:: Philox
    :exclude-members:

-Seeding and State
-=================
+State
+=====

 .. autosummary::
    :toctree: generated/

-   ~Philox.seed
    ~Philox.state

 Parallel generation

doc/source/reference/random/index.rst

Lines changed: 39 additions & 28 deletions
@@ -9,6 +9,10 @@ Numpy's random number routines produce pseudo random numbers using
 combinations of a `BitGenerator` to create sequences and a `Generator`
 to use those sequences to sample from different statistical distributions:

+* SeedSequence: Objects that provide entropy for the initial state of a
+  BitGenerator. A good SeedSequence will provide initializations across the
+  entire range of possible states for the BitGenerator, otherwise biases may
+  creep into the generated bit streams.
 * BitGenerators: Objects that generate random numbers. These are typically
   unsigned integer words filled with sequences of either 32 or 64 random bits.
 * Generators: Objects that transform sequences of random bits from a
@@ -52,28 +56,37 @@ the random values are generated by `~PCG64`. The
   rg.standard_normal()
   rg.bit_generator

-
-Seeds can be passed to any of the BitGenerators. Here `mt19937.MT19937` is used
-and is the wrapped with a `~.Generator`.
-
+Seeds can be passed to any of the BitGenerators. The provided value is mixed
+via `~.SeedSequence` to spread a possible sequence of seeds across a wider
+range of initialization states for the BitGenerator. Here `~.PCG64` is used and
+is wrapped with a `~.Generator`.

 .. code-block:: python

-  from numpy.random import Generator, MT19937
-  rg = Generator(MT19937(12345))
+  from numpy.random import Generator, PCG64
+  rg = Generator(PCG64(12345))
   rg.standard_normal()

-
 Introduction
 ------------
 RandomGen takes a different approach to producing random numbers from the
-`RandomState` object. Random number generation is separated into two
-components, a bit generator and a random generator.
+`RandomState` object. Random number generation is separated into three
+components, a seed sequence, a bit generator and a random generator.

-The bit generator has a limited set of responsibilities. It manages state
+The `BitGenerator` has a limited set of responsibilities. It manages state
 and provides functions to produce random doubles and random unsigned 32- and
-64-bit values. The bit generator also handles all seeding which varies with
-different bit generators.
+64-bit values.
+
+The `SeedSequence` takes a seed and provides the initial state for the
+`BitGenerator`. Since consecutive seeds can cause bad effects when comparing
+`BitGenerator` streams, the `SeedSequence` uses current best-practice methods
+to spread the initial state out. However small seeds may still be unable to
+reach all possible initialization states, which can cause biases among an
+ensemble of small-seed runs. For many cases, that doesn't matter. If you just
+want to hold things in place while you debug something, biases aren't a
+concern. For actual simulations whose results you care about, let
+``SeedSequence(None)`` do its thing and then log/print the
+`SeedSequence.entropy` for repeatable `BitGenerator` streams.

 The `random generator <Generator>` takes the
 bit generator-provided stream and transforms them into more useful
@@ -86,15 +99,15 @@ The `Generator` is the user-facing object that is nearly identical to
 the sole argument. Note that the BitGenerator must be instantiated.
 .. code-block:: python

-  from numpy.random import Generator, MT19937
-  rg = Generator(MT19937())
+  from numpy.random import Generator, PCG64
+  rg = Generator(PCG64())
   rg.random()

 Seed information is directly passed to the bit generator.

 .. code-block:: python

-  rg = Generator(MT19937(12345))
+  rg = Generator(PCG64(12345))
   rg.random()

 What's New or Different
@@ -150,9 +163,14 @@ Supported BitGenerators
 -----------------------
 The included BitGenerators are:

-* MT19937 - The standard Python BitGenerator. Produces identical results to
-  Python using the same seed/state. Adds a `~mt19937.MT19937.jumped` function
-  that returns a new generator with state as-if ``2**128`` draws have been made.
+* MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
+  function that returns a new generator with state as-if ``2**128`` draws have
+  been made.
+* PCG-64 - Fast generator that supports many parallel streams and
+  can be advanced by an arbitrary amount. See the documentation for
+  :meth:`~.PCG64.advance`. PCG-64 has a period of
+  :math:`2^{128}`. See the `PCG author's page`_ for more details about
+  this class of PRNG.
 * Xorshiro256** and Xorshiro512** - The most recently introduced XOR,
   shift, and rotate generator. Supports ``jumped`` and so can be used in
   parallel applications. See the documentation for
@@ -163,21 +181,14 @@ The included BitGenerators are:
 .. _`PCG author's page`: http://www.pcg-random.org/
 .. _`Random123`: https://www.deshawresearch.com/resources_random123.html

-Generator
----------
+Concepts
+--------
 .. toctree::
    :maxdepth: 1

    generator
    legacy mtrand <legacy>
-
-BitGenerators
--------------
-
-.. toctree::
-   :maxdepth: 1
-
-   BitGenerators <bit_generators/index>
+   BitGenerators, SeedSequences <bit_generators/index>

 Features
 --------
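
The revised text in this file says that a seed passed to a BitGenerator is mixed via `SeedSequence`. The sketch below illustrates what that implies, assuming the released NumPy behaviour (1.17 and later) in which a plain integer seed is wrapped in a `SeedSequence` internally. The variable names are illustrative, and the equivalence shown is an assumption about the released library rather than something this diff states outright.

```python
from numpy.random import Generator, PCG64, SeedSequence

# Two generators built from the same integer seed produce the same stream.
rg_a = Generator(PCG64(12345))
rg_b = Generator(PCG64(12345))
same_seed_matches = (
    rg_a.standard_normal(5).tolist() == rg_b.standard_normal(5).tolist()
)

# Passing an explicit SeedSequence built from that integer should give the
# same initial state as passing the integer directly.
rg_c = Generator(PCG64(SeedSequence(12345)))
rg_d = Generator(PCG64(12345))
explicit_matches = rg_c.random(5).tolist() == rg_d.random(5).tolist()
```

Both flags are expected to be true under the stated assumption, which is why an integer seed remains convenient for debugging while a logged `SeedSequence.entropy` is preferable for real simulations.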

numpy/random/__init__.py

Lines changed: 15 additions & 6 deletions
@@ -16,7 +16,6 @@
 bytes                Uniformly distributed random bytes.
 permutation          Randomly permute a sequence / generate a random sequence.
 shuffle              Randomly permute a sequence in place.
-seed                 Seed the random number generator.
 choice               Random sample from 1-D array.
 ==================== =========================================================

@@ -32,6 +31,7 @@
                      (deprecated, use ``integers(..., closed=True)`` instead)
 random_sample        Alias for `random_sample`
 randint              Uniformly distributed integers in a given range
+seed                 Seed the legacy random number generator.
 ==================== =========================================================

 ==================== =========================================================
@@ -102,6 +102,12 @@
 Philox
 ============================================= ===

+============================================= ===
+Getting entropy to initialize a BitGenerator
+--------------------------------------------- ---
+SeedSequence
+============================================= ===
+
 """
 from __future__ import division, absolute_import, print_function

@@ -161,22 +167,25 @@
 from . import mtrand
 from .mtrand import *
 from .generator import Generator
+from .bit_generator import SeedSequence
 from .mt19937 import MT19937
 from .pcg64 import PCG64
 from .philox import Philox
 from .mtrand import RandomState

-__all__ += ['Generator', 'MT19937', 'Philox', 'PCG64', 'RandomState']
+__all__ += ['Generator', 'RandomState', 'SeedSequence', 'MT19937',
+            'Philox', 'PCG64']
+

 def __RandomState_ctor():
     """Return a RandomState instance.

     This function exists solely to assist (un)pickling.

-    Note that the state of the RandomState returned here is irrelevant, as this function's
-    entire purpose is to return a newly allocated RandomState whose state pickle can set.
-    Consequently the RandomState returned by this function is a freshly allocated copy
-    with a seed=0.
+    Note that the state of the RandomState returned here is irrelevant, as this
+    function's entire purpose is to return a newly allocated RandomState whose
+    state pickle can set. Consequently the RandomState returned by this function
+    is a freshly allocated copy with a seed=0.

     See https://github.com/numpy/numpy/issues/4763 for a detailed discussion

numpy/random/bit_generator.pxd

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+
+from .common cimport bitgen_t
+cimport numpy as np
+
+cdef class BitGenerator():
+    cdef readonly object _seed_seq
+    cdef readonly object lock
+    cdef bitgen_t _bitgen
+    cdef readonly object _ctypes
+    cdef readonly object _cffi
+    cdef readonly object capsule
+
+
+cdef class SeedSequence():
+    cdef readonly object entropy
+    cdef readonly object program_entropy
+    cdef readonly tuple spawn_key
+    cdef readonly int pool_size
+    cdef readonly object pool
+    cdef readonly int n_children_spawned
+
+    cdef mix_entropy(self, np.ndarray[np.npy_uint32, ndim=1] mixer,
+                     np.ndarray[np.npy_uint32, ndim=1] entropy_array)
+    cdef get_assembled_entropy(self)
+
+cdef class SeedlessSequence():
+    pass
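
Several of the `readonly` attributes declared on `SeedSequence` above are visible from Python. The sketch below inspects them, assuming the API as released in NumPy 1.17 and later; it deliberately leaves out `program_entropy`, which is declared at this commit but may not exist in released versions. `generate_state` is a method of the released class and is not shown in this `.pxd` excerpt.

```python
from numpy.random import SeedSequence

ss = SeedSequence(42)

# Read-only attributes corresponding to the cdef declarations above.
attrs_before = (ss.entropy, ss.spawn_key, ss.pool_size, ss.n_children_spawned)

# Spawning children advances the parent's counter and gives each child a
# spawn_key that extends the parent's.
children = ss.spawn(3)
child_keys = [child.spawn_key for child in children]

# Produce the uint32 words that a BitGenerator would consume as its seed.
state_words = ss.generate_state(4)
```

Each child records its position in the spawn tree through `spawn_key`, which is how sibling streams stay distinct even though they share the same root entropy.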
