Merge pull request #13849 from rkern/doc/random-cleanups · rkern/numpy@588310c · GitHub

Commit 588310c

Merge pull request numpy#13849 from rkern/doc/random-cleanups
DOC: np.random documentation cleanup and expansion.
2 parents 1461f1d + 4945048 commit 588310c

11 files changed: 343 additions & 215 deletions

doc/source/reference/random/bit_generators/index.rst

Lines changed: 65 additions & 24 deletions
@@ -17,21 +17,23 @@ Supported BitGenerators
 
 The included BitGenerators are:
 
+* PCG-64 - The default. A fast generator that supports many parallel streams
+and can be advanced by an arbitrary amount. See the documentation for
+:meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
+author's page`_ for more details about this class of PRNG.
 * MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
-function that returns a new generator with state as-if ``2**128`` draws have
+function that returns a new generator with state as-if :math:`2^{128}` draws have
 been made.
-* PCG-64 - Fast generator that support many parallel streams and
-can be advanced by an arbitrary amount. See the documentation for
-:meth:`~.PCG64.advance`. PCG-64 has a period of
-:math:`2^{128}`. See the `PCG author's page`_ for more details about
-this class of PRNG.
-* Philox - a counter-based generator capable of being advanced an
+* Philox - A counter-based generator capable of being advanced an
 arbitrary number of steps or generating independent streams. See the
 `Random123`_ page for more details about this class of bit generators.
+* SFC64 - A fast generator based on random invertible mappings. Usually the
+fastest generator of the four. See the `SFC author's page`_ for (a little)
+more detail.
 
 .. _`PCG author's page`: http://www.pcg-random.org/
 .. _`Random123`: https://www.deshawresearch.com/resources_random123.html
-
+.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt
 
 .. toctree::
 :maxdepth: 1
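
The BitGenerator list in this hunk can be exercised with a short script. This is a minimal sketch, assuming a released NumPy (1.17 or later) where `PCG64`, `MT19937`, `Philox`, and `SFC64` are importable from `numpy.random`; the seed `12345` and the step count `1000` are arbitrary. It drives a `Generator` from each BitGenerator, checks that `PCG64.advance` lands in the same place as stepping the state one raw draw at a time, and takes a separate stream with `MT19937.jumped`.

```python
from numpy.random import MT19937, PCG64, SFC64, Generator, Philox

SEED = 12345  # arbitrary example seed

# Each of the four included BitGenerators can drive a Generator.
samples = {}
for bit_gen_cls in (PCG64, MT19937, Philox, SFC64):
    gen = Generator(bit_gen_cls(SEED))
    samples[bit_gen_cls.__name__] = gen.random(3)

# PCG64.advance(delta) moves the state forward as if `delta` raw draws
# had been made, without producing them one by one.
stepped = PCG64(SEED)
for _ in range(1000):
    stepped.random_raw()
advanced = PCG64(SEED)
advanced.advance(1000)
same_after_advance = stepped.random_raw() == advanced.random_raw()

# MT19937.jumped() returns a *new* BitGenerator whose state is as if
# 2**128 draws had been made; the original instance is left untouched.
base = MT19937(SEED)
jumped = base.jumped()

for name, values in samples.items():
    print(name, values)
print("advance matches stepping:", same_after_advance)
```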
@@ -46,26 +48,65 @@ Seeding and Entropy
 -------------------
 
 A BitGenerator provides a stream of random values. In order to generate
-reproducableis streams, BitGenerators support setting their initial state via a
-seed. But how best to seed the BitGenerator? On first impulse one would like to
-do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
-independent BitGenerators. However using a highly correlated set of seeds could
-generate BitGenerators that are correlated or overlap within a few samples.
-
-NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
-introduces the necessary entropy to produce independent and largely non-
-overlapping streams. Small seeds are unable to fill the complete range of
-initializaiton states, and lead to biases among an ensemble of small-seed
-runs. For many cases, that doesn't matter. If you just want to hold things in
-place while you debug something, biases aren't a concern. For actual
-simulations whose results you care about, let ``SeedSequence(None)`` do its
-thing and then log/print the `SeedSequence.entropy` for repeatable
-`BitGenerator` streams.
+reproducible streams, BitGenerators support setting their initial state via a
+seed. All of the provided BitGenerators will take an arbitrary-sized
+non-negative integer, or a list of such integers, as a seed. BitGenerators
+need to take those inputs and process them into a high-quality internal state
+for the BitGenerator. All of the BitGenerators in numpy delegate that task to
+`~SeedSequence`, which uses hashing techniques to ensure that even low-quality
+seeds generate high-quality initial states.
+
+.. code-block:: python
+
+from numpy.random import PCG64
+
+bg = PCG64(12345678903141592653589793)
+
+.. end_block
+
+`~SeedSequence` is designed to be convenient for implementing best practices.
+We recommend that a stochastic program defaults to using entropy from the OS so
+that each run is different. The program should print out or log that entropy.
+In order to reproduce a past value, the program should allow the user to
+provide that value through some mechanism, a command-line argument is common,
+so that the user can then re-enter that entropy to reproduce the result.
+`~SeedSequence` can take care of everything except for communicating with the
+user, which is up to you.
+
+.. code-block:: python
+
+from numpy.random import PCG64, SeedSequence
+
+# Get the user's seed somehow, maybe through `argparse`.
+# If the user did not provide a seed, it should return `None`.
+seed = get_user_seed()
+ss = SeedSequence(seed)
+print('seed = {}'.format(ss.entropy))
+bg = PCG64(ss)
+
+.. end_block
+
+We default to using a 128-bit integer using entropy gathered from the OS. This
+is a good amount of entropy to initialize all of the generators that we have in
+numpy. We do not recommend using small seeds below 32 bits for general use.
+Using just a small set of seeds to instantiate larger state spaces means that
+there are some initial states that are impossible to reach. This creates some
+biases if everyone uses such values.
+
+There will not be anything *wrong* with the results, per se; even a seed of
+0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you
+just need *some* fixed value for unit tests or debugging, feel free to use
+whatever seed you like. But if you want to make inferences from the results or
+publish them, drawing from a larger set of seeds is good practice.
+
+If you need to generate a good seed "offline", then ``SeedSequence().entropy``
+or using ``secrets.randbits(128)`` from the standard library are both
+convenient ways.
 
 .. autosummary::
 :toctree: generated/
 
+SeedSequence
 bit_generator.ISeedSequence
 bit_generator.ISpawnableSeedSequence
-SeedSequence
 bit_generator.SeedlessSeedSequence
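
The second code block in this hunk leaves `get_user_seed()` undefined. Below is a minimal runnable sketch of the same log-and-replay pattern, assuming `argparse` as the user-facing mechanism and a released NumPy in which `SeedSequence`, `PCG64`, and `Generator` are importable from `numpy.random`. The names `get_user_seed`, `make_generator`, and the `--seed` flag are hypothetical, chosen for illustration; they are not part of NumPy.

```python
import argparse
import secrets

from numpy.random import PCG64, Generator, SeedSequence


def get_user_seed(argv=None):
    """Hypothetical helper: return the --seed value, or None if absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None)
    return parser.parse_args(argv).seed


def make_generator(argv=None):
    """Hypothetical helper: build a Generator and log the seed entropy."""
    # SeedSequence(None) draws fresh entropy from the OS.
    ss = SeedSequence(get_user_seed(argv))
    print("seed = {}".format(ss.entropy))
    return ss.entropy, Generator(PCG64(ss))


# First run: no --seed given, so the entropy comes from the OS.
entropy, gen = make_generator([])
first = gen.random(5)

# Re-entering the logged entropy reproduces exactly the same stream.
_, gen_replay = make_generator(["--seed", str(entropy)])
replayed = gen_replay.random(5)

# Generating a good seed "offline" with the standard library instead.
offline_seed = secrets.randbits(128)
offline_gen = Generator(PCG64(offline_seed))
```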

doc/source/reference/random/index.rst

Lines changed: 37 additions & 44 deletions
@@ -1,18 +1,16 @@
 .. _numpyrandom:
 
+.. py:module:: numpy.random
+
 .. currentmodule:: numpy.random
 
-numpy.random
-============
+Random sampling (:mod:`numpy.random`)
+=====================================
 
 Numpy's random number routines produce pseudo random numbers using
 combinations of a `BitGenerator` to create sequences and a `Generator`
 to use those sequences to sample from different statistical distributions:
 
-* SeedSequence: Objects that provide entropy for the initial state of a
-BitGenerator. A good SeedSequence will provide initializations across the
-entire range of possible states for the BitGenerator, otherwise biases may
-creep into the generated bit streams.
 * BitGenerators: Objects that generate random numbers. These are typically
 unsigned integer words filled with sequences of either 32 or 64 random bits.
 * Generators: Objects that transform sequences of random bits from a
@@ -24,28 +22,28 @@ Since Numpy version 1.17.0 the Generator can be initialized with a
 number of different BitGenerators. It exposes many different probability
 distributions. See `NEP 19 <https://www.numpy.org/neps/
 nep-0019-rng-policy.html>`_ for context on the updated random Numpy number
-routines. The legacy `RandomState` random number routines are still
+routines. The legacy `.RandomState` random number routines are still
 available, but limited to a single BitGenerator.
 
-For convenience and backward compatibility, a single `RandomState`
+For convenience and backward compatibility, a single `~.RandomState`
 instance's methods are imported into the numpy.random namespace, see
 :ref:`legacy` for the complete list.
 
 Quick Start
 -----------
 
-By default, `Generator` uses normals provided by `PCG64` which will be
-statistically more reliable than the legacy methods in `RandomState`
+By default, `~Generator` uses normals provided by `~pcg64.PCG64` which will be
+statistically more reliable than the legacy methods in `~.RandomState`
 
 .. code-block:: python
 
 # Uses the old numpy.random.RandomState
 from numpy import random
 random.standard_normal()
 
-`Generator` can be used as a direct replacement for `~RandomState`, although
-the random values are generated by `~PCG64`. The
-`Generator` holds an instance of a BitGenerator. It is accessible as
+`~Generator` can be used as a direct replacement for `~.RandomState`, although
+the random values are generated by `~.PCG64`. The
+`~Generator` holds an instance of a BitGenerator. It is accessible as
 ``gen.bit_generator``.
 
 .. code-block:: python
@@ -69,45 +67,37 @@ is wrapped with a `~.Generator`.
 
 Introduction
 ------------
-RandomGen takes a different approach to producing random numbers from the
-`RandomState` object. Random number generation is separated into three
-components, a seed sequence, a bit generator and a random generator.
+The new infrastructure takes a different approach to producing random numbers
+from the `~.RandomState` object. Random number generation is separated into
+two components, a bit generator and a random generator.
 
 The `BitGenerator` has a limited set of responsibilities. It manages state
 and provides functions to produce random doubles and random unsigned 32- and
 64-bit values.
 
-The `SeedSequence` takes a seed and provides the initial state for the
-`BitGenerator`. Since consecutive seeds can cause bad effects when comparing
-`BitGenerator` streams, the `SeedSequence` uses current best-practice methods
-to spread the initial state out. However small seeds may still be unable to
-reach all possible initialization states, which can cause biases among an
-ensemble of small-seed runs. For many cases, that doesn't matter. If you just
-want to hold things in place while you debug something, biases aren't a
-concern. For actual simulations whose results you care about, let
-``SeedSequence(None)`` do its thing and then log/print the
-`SeedSequence.entropy` for repeatable `BitGenerator` streams.
-
 The `random generator <Generator>` takes the
 bit generator-provided stream and transforms them into more useful
 distributions, e.g., simulated normal random values. This structure allows
 alternative bit generators to be used with little code duplication.
 
 The `Generator` is the user-facing object that is nearly identical to
-`RandomState`. The canonical method to initialize a generator passes a
-`~mt19937.MT19937` bit generator, the underlying bit generator in Python -- as
-the sole argument. Note that the BitGenerator must be instantiated.
+`.RandomState`. The canonical method to initialize a generator passes a
+`~.PCG64` bit generator as the sole argument.
+
 .. code-block:: python
 
-from numpy.random import Generator, PCG64
-rg = Generator(PCG64())
+from numpy.random import default_gen
+rg = default_gen(12345)
 rg.random()
 
-Seed information is directly passed to the bit generator.
+One can also instantiate `Generator` directly with a `BitGenerator` instance.
+To use the older `~mt19937.MT19937` algorithm, one can instantiate it directly
+and pass it to `Generator`.
 
 .. code-block:: python
 
-rg = Generator(PCG64(12345))
+from numpy.random import Generator, MT19937
+rg = Generator(MT19937(12345))
 rg.random()
 
 What's New or Different
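
The construction paths described in this hunk can be tried directly. This sketch builds each `Generator` explicitly from a BitGenerator instance rather than through `default_gen` as shown in the diff, since released NumPy spells that convenience function `default_rng`. It assumes a released NumPy (1.17 or later); the seed `12345` is arbitrary.

```python
from numpy.random import MT19937, PCG64, Generator

# Wrap an explicitly constructed BitGenerator in a Generator.
gen_pcg = Generator(PCG64(12345))
gen_mt = Generator(MT19937(12345))  # the older Mersenne Twister algorithm

# The wrapped BitGenerator is reachable as `gen.bit_generator`.
print(type(gen_pcg.bit_generator).__name__)  # PCG64
print(type(gen_mt.bit_generator).__name__)   # MT19937

# Same BitGenerator class and same seed -> identical streams.
x = Generator(PCG64(12345)).random(4)
y = Generator(PCG64(12345)).random(4)
print((x == y).all())  # True
```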
@@ -117,9 +107,9 @@ What's New or Different
 The Box-Muller method used to produce NumPy's normals is no longer available
 in `Generator`. It is not possible to reproduce the exact random
 values using Generator for the normal distribution or any other
-distribution that relies on the normal such as the `numpy.random.gamma` or
-`numpy.random.standard_t`. If you require bitwise backward compatible
-streams, use `RandomState`.
+distribution that relies on the normal such as the `.RandomState.gamma` or
+`.RandomState.standard_t`. If you require bitwise backward compatible
+streams, use `.RandomState`.
 
 * The Generator's normal, exponential and gamma functions use 256-step Ziggurat
 methods which are 2-10 times faster than NumPy's Box-Muller or inverse CDF
@@ -133,9 +123,8 @@ What's New or Different
 source of randomness that is used in cryptographic applications (e.g.,
 ``/dev/urandom`` on Unix).
 * All BitGenerators can produce doubles, uint64s and uint32s via CTypes
-(`~PCG64.ctypes`) and CFFI
-(:meth:`~PCG64.cffi`). This allows the bit generators to
-be used in numba.
+(`~.PCG64.ctypes`) and CFFI (`~.PCG64.cffi`). This allows the bit generators
+to be used in numba.
 * The bit generators can be used in downstream projects via
 :ref:`Cython <randomgen_cython>`.
 * `~.Generator.integers` is now the canonical way to generate integer
@@ -144,8 +133,11 @@ What's New or Different
 The ``endpoint`` keyword can be used to specify open or closed intervals.
 This replaces both ``randint`` and the deprecated ``random_integers``.
 * `~.Generator.random` is now the canonical way to generate floating-point
-random numbers, which replaces `random_sample`, `sample`, and `ranf`. This
-is consistent with Python's `random.random`.
+random numbers, which replaces `.RandomState.random_sample`,
+`.RandomState.sample`, and `.RandomState.ranf`. This is consistent with
+Python's `random.random`.
+* All BitGenerators in numpy use `~SeedSequence` to convert seeds into
+initialized states.
 
 See :ref:`new-or-different` for a complete list of improvements and
 differences from the traditional ``Randomstate``.
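
The `integers` and `random` entries in this hunk can be seen side by side with their legacy `RandomState` spellings. A minimal sketch, assuming a released NumPy where `Generator.integers` accepts the `endpoint` keyword; the seed `42` and the sample sizes are arbitrary.

```python
from numpy.random import PCG64, Generator, RandomState

gen = Generator(PCG64(42))

# `integers` replaces randint/random_integers; `endpoint` selects the interval.
half_open = gen.integers(0, 6, size=1000)              # values in [0, 6)
closed = gen.integers(1, 6, size=1000, endpoint=True)  # values in [1, 6]

# `random` replaces random_sample/sample/ranf: floats in [0.0, 1.0).
floats = gen.random(5)

# The legacy spellings are still available on RandomState.
rs = RandomState(42)
legacy_ints = rs.randint(0, 6, size=5)
legacy_floats = rs.random_sample(5)
```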
@@ -154,10 +146,11 @@ Parallel Generation
 ~~~~~~~~~~~~~~~~~~~
 
 The included generators can be used in parallel, distributed applications in
-one of two ways:
+one of three ways:
 
+* :ref:`seedsequence-spawn`
 * :ref:`independent-streams`
-* :ref:`jump-and-advance`
+* :ref:`parallel-jumped`
 
 Concepts
 --------
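
Two of the three parallel approaches listed in this hunk, spawning and jumping, can be sketched in a few lines. This assumes a released NumPy where `SeedSequence.spawn` and `PCG64.jumped` are available; the worker count `4` and the seed `12345` are arbitrary.

```python
from numpy.random import PCG64, Generator, SeedSequence

N_WORKERS = 4  # arbitrary number of parallel workers
SEED = 12345   # arbitrary root seed

# SeedSequence.spawn: derive independent child seeds from one root,
# then give each worker its own Generator.
root = SeedSequence(SEED)
spawned = [Generator(PCG64(child)) for child in root.spawn(N_WORKERS)]

# Jumping: start from one seed, and give each worker a BitGenerator
# jumped further ahead so the streams begin far apart in one sequence.
jumped = []
bit_gen = PCG64(SEED)
for _ in range(N_WORKERS):
    jumped.append(Generator(bit_gen))
    bit_gen = bit_gen.jumped()

spawned_draws = [g.random() for g in spawned]
jumped_draws = [g.random() for g in jumped]
```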

doc/source/reference/random/legacy.rst

Lines changed: 17 additions & 20 deletions
@@ -1,3 +1,5 @@
+.. currentmodule:: numpy.random
+
 .. _legacy:
 
 Legacy Random Generation
@@ -8,46 +10,41 @@ no further improvements. It is guaranteed to produce the same values
 as the final point release of NumPy v1.16. These all depend on Box-Muller
 normals or inverse CDF exponentials or gammas. This class should only be used
 if it is essential to have randoms that are identical to what
-would have been produced by NumPy.
+would have been produced by previous versions of NumPy.
 
 `~mtrand.RandomState` adds additional information
 to the state which is required when using Box-Muller normals since these
 are produced in pairs. It is important to use
 `~mtrand.RandomState.get_state`, and not the underlying bit generators
 `state`, when accessing the state so that these extra values are saved.
 
-.. warning::
-
-:class:`~randomgen.legacy.LegacyGenerator` only contains functions
-that have changed. Since it does not contain other functions, it
-is not directly possible to replace :class:`~numpy.random.RandomState`.
-In order to full replace :class:`~numpy.random.RandomState`, it is
-necessary to use both :class:`~randomgen.legacy.LegacyGenerator`
-and :class:`~randomgen.generator.RandomGenerator` both driven
-by the same basic RNG. Methods present in :class:`~randomgen.legacy.LegacyGenerator`
-must be called from :class:`~randomgen.legacy.LegacyGenerator`. Other Methods
-should be called from :class:`~randomgen.generator.RandomGenerator`.
-
+Although we provide the `~mt19937.MT19937` BitGenerator for use independent of
+`~mtrand.RandomState`, note that its default seeding uses `~SeedSequence`
+rather than the legacy seeding algorithm. `~mtrand.RandomState` will use the
+legacy seeding algorithm. The methods to use the legacy seeding algorithm are
+currently private as the main reason to use them is just to implement
+`~mtrand.RandomState`. However, one can reset the state of `~mt19937.MT19937`
+using the state of the `~mtrand.RandomState`:
 
 .. code-block:: python
 
 from numpy.random import MT19937
 from numpy.random import RandomState
 
-# Use same seed
 rs = RandomState(12345)
-mt19937 = MT19937(12345)
-lg = RandomState(mt19937)
+mt19937 = MT19937()
+mt19937.state = rs.get_state()
+rs2 = RandomState(mt19937)
 
-# Identical output
+# Same output
 rs.standard_normal()
-lg.standard_normal()
+rs2.standard_normal()
 
 rs.random()
-lg.random()
+rs2.random()
 
 rs.standard_exponential()
-lg.standard_exponential()
+rs2.standard_exponential()
 
 
 .. currentmodule:: numpy.random.mtrand
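
The claims in this hunk can be checked directly: copying a `RandomState` state into a standalone `MT19937` reproduces the legacy stream, while seeding `MT19937` with an integer goes through `SeedSequence` and so gives a different stream. A minimal sketch, assuming a released NumPy whose `MT19937.state` setter accepts the legacy tuple returned by `RandomState.get_state()`, as the example in the diff relies on; the seed `12345` is arbitrary.

```python
from numpy.random import MT19937, RandomState

rs = RandomState(12345)

# Copy the legacy-seeded state into a standalone MT19937.
mt = MT19937()
mt.state = rs.get_state()
rs2 = RandomState(mt)

# Starting from the same state, both produce the same values.
a = [rs.standard_normal(), rs.random(), rs.standard_exponential()]
b = [rs2.standard_normal(), rs2.random(), rs2.standard_exponential()]

# Seeding MT19937 with an integer uses SeedSequence, not the legacy
# algorithm, so the stream differs from RandomState(12345).
via_seedsequence = RandomState(MT19937(12345)).random()
via_legacy = RandomState(12345).random()
```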

doc/source/reference/random/multithreading.rst

Lines changed: 5 additions & 3 deletions
@@ -1,9 +1,11 @@
 Multithreaded Generation
 ========================
 
-The four core distributions all allow existing arrays to be filled using the
-``out`` keyword argument. Existing arrays need to be contiguous and
-well-behaved (writable and aligned). Under normal circumstances, arrays
+The four core distributions (:meth:`~.Generator.random`,
+:meth:`~.Generator.standard_normal`, :meth:`~.Generator.standard_exponential`,
+and :meth:`~.Generator.standard_gamma`) all allow existing arrays to be filled
+using the ``out`` keyword argument. Existing arrays need to be contiguous and
+well-behaved (writable and aligned). Under normal circumstances, arrays
 created using the common constructors such as :meth:`numpy.empty` will satisfy
 these requirements.
 
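
The ``out`` mechanism described in this hunk can be combined with threads to fill one preallocated array in parallel. A minimal sketch, assuming a released NumPy and Python's standard `concurrent.futures`; the thread count, chunk size, and seed are arbitrary, and giving each thread its own `Generator` via `SeedSequence.spawn` is one reasonable choice rather than the only one.

```python
import concurrent.futures

import numpy as np
from numpy.random import PCG64, Generator, SeedSequence

N_THREADS = 4      # arbitrary thread count
CHUNK = 250_000    # arbitrary number of values per thread
SEED = 12345       # arbitrary root seed

# One independent Generator per thread, derived from a single root seed.
gens = [Generator(PCG64(s)) for s in SeedSequence(SEED).spawn(N_THREADS)]

# np.empty returns a contiguous, writable, aligned array, so its
# contiguous slices are valid `out` targets.
values = np.empty(N_THREADS * CHUNK)


def fill(i):
    # Each thread writes into its own non-overlapping slice in place.
    gens[i].standard_normal(out=values[i * CHUNK:(i + 1) * CHUNK])


with concurrent.futures.ThreadPoolExecutor(max_workers=N_THREADS) as executor:
    # list() waits for completion and re-raises any worker exception.
    list(executor.map(fill, range(N_THREADS)))
```

Because every slice is owned by exactly one thread and one `Generator`, the result does not depend on thread scheduling.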