DOC: np.random documentation cleanup and expansion. · rkern/numpy@ed72319

Commit ed72319

DOC: np.random documentation cleanup and expansion.
1 parent b976458 commit ed72319

File tree

10 files changed

+291
-184
lines changed

doc/source/reference/random/bit_generators/index.rst

Lines changed: 65 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -17,21 +17,23 @@ Supported BitGenerators
1717

1818
The included BitGenerators are:
1919

20+
* PCG-64 - The default. A fast generator that supports many parallel streams
21+
and can be advanced by an arbitrary amount. See the documentation for
22+
:meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
23+
author's page`_ for more details about this class of PRNG.
2024
* MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
21-
function that returns a new generator with state as-if ``2**128`` draws have
25+
function that returns a new generator with state as-if :math:`2^{128}` draws have
2226
been made.
23-
* PCG-64 - Fast generator that support many parallel streams and
24-
can be advanced by an arbitrary amount. See the documentation for
25-
:meth:`~.PCG64.advance`. PCG-64 has a period of
26-
:math:`2^{128}`. See the `PCG author's page`_ for more details about
27-
this class of PRNG.
28-
* Philox - a counter-based generator capable of being advanced an
27+
* Philox - A counter-based generator capable of being advanced an
2928
arbitrary number of steps or generating independent streams. See the
3029
`Random123`_ page for more details about this class of bit generators.
30+
* SFC64 - A fast generator based on random invertible mappings. Usually the
31+
fastest generator of the four. See the `SFC author's page`_ for (a little)
32+
more detail.
3133

3234
.. _`PCG author's page`: http://www.pcg-random.org/
3335
.. _`Random123`: https://www.deshawresearch.com/resources_random123.html
34-
36+
.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt
3537

3638
.. toctree::
3739
:maxdepth: 1
@@ -46,26 +48,65 @@ Seeding and Entropy
4648
-------------------
4749

4850
A BitGenerator provides a stream of random values. In order to generate
49-
reproducableis streams, BitGenerators support setting their initial state via a
50-
seed. But how best to seed the BitGenerator? On first impulse one would like to
51-
do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
52-
independent BitGenerators. However using a highly correlated set of seeds could
53-
generate BitGenerators that are correlated or overlap within a few samples.
54-
55-
NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
56-
introduces the necessary entropy to produce independent and largely non-
57-
overlapping streams. Small seeds are unable to fill the complete range of
58-
initializaiton states, and lead to biases among an ensemble of small-seed
59-
runs. For many cases, that doesn't matter. If you just want to hold things in
60-
place while you debug something, biases aren't a concern. For actual
61-
simulations whose results you care about, let ``SeedSequence(None)`` do its
62-
thing and then log/print the `SeedSequence.entropy` for repeatable
63-
`BitGenerator` streams.
51+
reproducible streams, BitGenerators support setting their initial state via a
52+
seed. All of the provided BitGenerators will take an arbitrary-sized
53+
non-negative integer, or a list of such integers, as a seed. BitGenerators
54+
need to take those inputs and process them into a high-quality internal state
55+
for the BitGenerator. All of the BitGenerators in numpy delegate that task to
56+
`~SeedSequence`, which uses hashing techniques to ensure that even low-quality
57+
seeds generate high-quality initial states.
58+
59+
.. code-block:: python
60+
61+
from numpy.random import PCG64
62+
63+
bg = PCG64(12345678903141592653589793)
64+
65+
.. end_block
66+
67+
`~SeedSequence` is designed to be convenient for implementing best practices.
68+
We recommend that a stochastic program defaults to using entropy from the OS so
69+
that each run is different. The program should print out or log that entropy.
70+
In order to reproduce a past value, the program should allow the user to
71+
provide that value through some mechanism, a command-line argument is common,
72+
so that the user can then re-enter that entropy to reproduce the result.
73+
`~SeedSequence` can take care of everything except for communicating with the
74+
user, which is up to you.
75+
76+
.. code-block:: python
77+
78+
from numpy.random import PCG64, SeedSequence
79+
80+
# Get the user's seed somehow, maybe through `argparse`.
81+
# If the user did not provide a seed, it should return `None`.
82+
seed = get_user_seed()
83+
ss = SeedSequence(seed)
84+
print(f'seed = {ss.entropy}')
85+
bg = PCG64(ss)
86+
87+
.. end_block
88+
89+
We default to using a 128-bit integer using entropy gathered from the OS. This
90+
is a good amount of entropy to initialize all of the generators that we have in
91+
numpy. We do not recommend using small seeds below 32 bits for general use.
92+
Using just a small set of seeds to instantiate larger state spaces means that
93+
there are some initial states that are impossible to reach. This creates some
94+
biases if everyone uses such values.
95+
96+
There will not be anything *wrong* with the results, per se; even a seed of
97+
0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you
98+
just need *some* fixed value for unit tests or debugging, feel free to use
99+
whatever seed you like. But if you want to make inferences from the results or
100+
publish them, drawing from a larger set of seeds is good practice.
101+
102+
If you need to generate a good seed "offline", then ``SeedSequence().entropy``
103+
or using ``secrets.randbits(128)`` from the standard library are both
104+
convenient ways.
64105

65106
.. autosummary::
66107
:toctree: generated/
67108

109+
SeedSequence
68110
bit_generator.ISeedSequence
69111
bit_generator.ISpawnableSeedSequence
70-
SeedSequence
71112
bit_generator.SeedlessSeedSequence
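
Note on the seeding workflow described in the added text above: a minimal sketch of the "default to OS entropy, log it, allow replay" pattern. The helper name `make_bit_generator` is hypothetical and not part of NumPy; `SeedSequence`, its `entropy` attribute, `PCG64`, and `random_raw` are the only NumPy calls used.

```python
import secrets

from numpy.random import PCG64, SeedSequence


def make_bit_generator(seed=None):
    """Hypothetical helper: return (entropy, bit_generator).

    With seed=None, SeedSequence gathers fresh entropy from the OS.
    """
    ss = SeedSequence(seed)
    return ss.entropy, PCG64(ss)


# First run: no seed supplied, so log the entropy that was drawn.
entropy, bg = make_bit_generator()
print(f"seed = {entropy}")

# Later run: feed the logged value back in to replay the same stream.
replay_entropy, bg_replay = make_bit_generator(entropy)
original_draws = bg.random_raw(4)
replayed_draws = bg_replay.random_raw(4)

# Generating a good seed "offline" with the standard library.
offline_seed = secrets.randbits(128)
```

Under these assumptions, `original_draws` and `replayed_draws` should hold the same raw values, because both bit generators were initialized from the same entropy.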

doc/source/reference/random/index.rst

Lines changed: 23 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -9,10 +9,6 @@ Numpy's random number routines produce pseudo random numbers using
99
combinations of a `BitGenerator` to create sequences and a `Generator`
1010
to use those sequences to sample from different statistical distributions:
1111

12-
* SeedSequence: Objects that provide entropy for the initial state of a
13-
BitGenerator. A good SeedSequence will provide initializations across the
14-
entire range of possible states for the BitGenerator, otherwise biases may
15-
creep into the generated bit streams.
1612
* BitGenerators: Objects that generate random numbers. These are typically
1713
unsigned integer words filled with sequences of either 32 or 64 random bits.
1814
* Generators: Objects that transform sequences of random bits from a
@@ -24,28 +20,28 @@ Since Numpy version 1.17.0 the Generator can be initialized with a
2420
number of different BitGenerators. It exposes many different probability
2521
distributions. See `NEP 19 <https://www.numpy.org/neps/
2622
nep-0019-rng-policy.html>`_ for context on the updated random Numpy number
27-
routines. The legacy `RandomState` random number routines are still
23+
routines. The legacy `~RandomState` random number routines are still
2824
available, but limited to a single BitGenerator.
2925

30-
For convenience and backward compatibility, a single `RandomState`
26+
For convenience and backward compatibility, a single `~RandomState`
3127
instance's methods are imported into the numpy.random namespace, see
3228
:ref:`legacy` for the complete list.
3329

3430
Quick Start
3531
-----------
3632

37-
By default, `Generator` uses normals provided by `PCG64` which will be
38-
statistically more reliable than the legacy methods in `RandomState`
33+
By default, `~Generator` uses normals provided by `~PCG64` which will be
34+
statistically more reliable than the legacy methods in `~RandomState`
3935

4036
.. code-block:: python
4137
4238
# Uses the old numpy.random.RandomState
4339
from numpy import random
4440
random.standard_normal()
4541
46-
`Generator` can be used as a direct replacement for `~RandomState`, although
42+
`~Generator` can be used as a direct replacement for `~RandomState`, although
4743
the random values are generated by `~PCG64`. The
48-
`Generator` holds an instance of a BitGenerator. It is accessible as
44+
`~Generator` holds an instance of a BitGenerator. It is accessible as
4945
``gen.bit_generator``.
5046

5147
.. code-block:: python
@@ -69,45 +65,37 @@ is wrapped with a `~.Generator`.
6965
7066
Introduction
7167
------------
72-
RandomGen takes a different approach to producing random numbers from the
73-
`RandomState` object. Random number generation is separated into three
74-
components, a seed sequence, a bit generator and a random generator.
68+
The new infrastructure takes a different approach to producing random numbers
69+
from the `RandomState` object. Random number generation is separated into
70+
two components, a bit generator and a random generator.
7571

7672
The `BitGenerator` has a limited set of responsibilities. It manages state
7773
and provides functions to produce random doubles and random unsigned 32- and
7874
64-bit values.
7975

80-
The `SeedSequence` takes a seed and provides the initial state for the
81-
`BitGenerator`. Since consecutive seeds can cause bad effects when comparing
82-
`BitGenerator` streams, the `SeedSequence` uses current best-practice methods
83-
to spread the initial state out. However small seeds may still be unable to
84-
reach all possible initialization states, which can cause biases among an
85-
ensemble of small-seed runs. For many cases, that doesn't matter. If you just
86-
want to hold things in place while you debug something, biases aren't a
87-
concern. For actual simulations whose results you care about, let
88-
``SeedSequence(None)`` do its thing and then log/print the
89-
`SeedSequence.entropy` for repeatable `BitGenerator` streams.
90-
9176
The `random generator <Generator>` takes the
9277
bit generator-provided stream and transforms them into more useful
9378
distributions, e.g., simulated normal random values. This structure allows
9479
alternative bit generators to be used with little code duplication.
9580

9681
The `Generator` is the user-facing object that is nearly identical to
9782
`RandomState`. The canonical method to initialize a generator passes a
98-
`~mt19937.MT19937` bit generator, the underlying bit generator in Python -- as
99-
the sole argument. Note that the BitGenerator must be instantiated.
83+
`~pcg64.PCG64` bit generator as the sole argument.
84+
10085
.. code-block:: python
10186
102-
from numpy.random import Generator, PCG64
103-
rg = Generator(PCG64())
87+
from numpy.random import default_gen
88+
rg = default_gen(12345)
10489
rg.random()
10590
106-
Seed information is directly passed to the bit generator.
91+
One can also instantiate `Generator` directly with a `BitGenerator` instance.
92+
To use the older `~mt19937.MT19937` algorithm, one can instantiate it directly
93+
and pass it to `Generator`.
10794

10895
.. code-block:: python
10996
110-
rg = Generator(PCG64(12345))
97+
from numpy.random import Generator, MT19937
98+
rg = Generator(MT19937(12345))
11199
rg.random()
112100
113101
What's New or Different
@@ -146,6 +134,8 @@ What's New or Different
146134
* `~.Generator.random` is now the canonical way to generate floating-point
147135
random numbers, which replaces `random_sample`, `sample`, and `ranf`. This
148136
is consistent with Python's `random.random`.
137+
* All BitGenerators in numpy use `~SeedSequence` to process convert seeds into
138+
initialized states.
149139

150140
See :ref:`new-or-different` for a complete list of improvements and
151141
differences from the traditional ``Randomstate``.
@@ -154,10 +144,11 @@ Parallel Generation
154144
~~~~~~~~~~~~~~~~~~~
155145

156146
The included generators can be used in parallel, distributed applications in
157-
one of two ways:
147+
one of three ways:
158148

149+
* :ref:`seedsequence-spawn`
159150
* :ref:`independent-streams`
160-
* :ref:`jump-and-advance`
151+
* :ref:`parallel-jumped`
161152

162153
Concepts
163154
--------
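
Note on the "Parallel Generation" list in the hunk above: a hedged sketch of two of the three listed approaches, spawning child seeds from a `SeedSequence` and stepping a bit generator forward with `jumped`. The seed value and stream count are arbitrary choices for illustration, and the key-based independent-streams approach is not shown.

```python
from numpy.random import Generator, PCG64, SeedSequence

n_streams = 3  # arbitrary, for illustration only

# Approach 1: spawn child SeedSequences and give each worker its own Generator.
ss = SeedSequence(12345)
spawned = [Generator(PCG64(child)) for child in ss.spawn(n_streams)]

# Approach 2: start from one bit generator and repeatedly jump ahead.
# jumped() returns a new bit generator and leaves the original untouched.
bg = PCG64(12345)
jumped = []
for _ in range(n_streams):
    jumped.append(Generator(bg))
    bg = bg.jumped()

# One draw from every stream; each Generator wraps a distinct bit generator.
draws = [gen.random() for gen in spawned + jumped]
```

Spawning is usually the simpler choice when the number of workers is not known in advance, since each child `SeedSequence` can itself spawn further children.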

doc/source/reference/random/legacy.rst

Lines changed: 10 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
.. currentmodule:: numpy.random
2+
13
.. _legacy:
24

35
Legacy Random Generation
@@ -8,26 +10,20 @@ no further improvements. It is guaranteed to produce the same values
810
as the final point release of NumPy v1.16. These all depend on Box-Muller
911
normals or inverse CDF exponentials or gammas. This class should only be used
1012
if it is essential to have randoms that are identical to what
11-
would have been produced by NumPy.
13+
would have been produced by previous versions of NumPy.
1214

1315
`~mtrand.RandomState` adds additional information
1416
to the state which is required when using Box-Muller normals since these
1517
are produced in pairs. It is important to use
1618
`~mtrand.RandomState.get_state`, and not the underlying bit generators
1719
`state`, when accessing the state so that these extra values are saved.
1820

19-
.. warning::
20-
21-
:class:`~randomgen.legacy.LegacyGenerator` only contains functions
22-
that have changed. Since it does not contain other functions, it
23-
is not directly possible to replace :class:`~numpy.random.RandomState`.
24-
In order to full replace :class:`~numpy.random.RandomState`, it is
25-
necessary to use both :class:`~randomgen.legacy.LegacyGenerator`
26-
and :class:`~randomgen.generator.RandomGenerator` both driven
27-
by the same basic RNG. Methods present in :class:`~randomgen.legacy.LegacyGenerator`
28-
must be called from :class:`~randomgen.legacy.LegacyGenerator`. Other Methods
29-
should be called from :class:`~randomgen.generator.RandomGenerator`.
30-
21+
Although we provide the `~mt19937.MT19937` BitGenerator for use independent of
22+
`~mtrand.RandomState`, note that its default seeding uses `~SeedSequence`
23+
rather than the legacy seeding algorithm. `~mtrand.RandomState` will use the
24+
legacy seeding algorithm. The methods to use the legacy seeding algorithm are
25+
currently private as the main reason to use them is just to implement
26+
`~mtrand.RandomState`.
3127

3228
.. code-block:: python
3329
@@ -39,7 +35,7 @@ are produced in pairs. It is important to use
3935
mt19937 = MT19937(12345)
4036
lg = RandomState(mt19937)
4137
42-
# Identical output
38+
# Different output, sorry.
4339
rs.standard_normal()
4440
lg.standard_normal()
4541
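
Note on the legacy seeding change above: a small sketch, assuming a NumPy release with the new `numpy.random` infrastructure, showing that the same integer seed gives different streams depending on whether `RandomState` seeds itself (legacy algorithm) or wraps an `MT19937` seeded through `SeedSequence`. It also shows why `get_state` is the right way to save state. The seed value is arbitrary.

```python
from numpy.random import MT19937, RandomState

# Legacy seeding algorithm, applied by RandomState itself.
rs = RandomState(12345)

# Same integer, but MT19937 processes it through SeedSequence first.
lg = RandomState(MT19937(12345))

legacy_value = rs.standard_normal()
seedseq_value = lg.standard_normal()

# get_state() also captures the cached second normal of the Box-Muller pair,
# so restoring it reproduces the subsequent draws exactly.
saved = rs.get_state()
expected = rs.standard_normal(3)
rs.set_state(saved)
restored = rs.standard_normal(3)
```

Here `legacy_value` and `seedseq_value` are expected to differ, while `expected` and `restored` should match element for element.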

doc/source/reference/random/multithreading.rst

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
11
Multithreaded Generation
22
========================
33

4-
The four core distributions all allow existing arrays to be filled using the
5-
``out`` keyword argument. Existing arrays need to be contiguous and
6-
well-behaved (writable and aligned). Under normal circumstances, arrays
4+
The four core distributions (:meth:`~.Generator.random`,
5+
:meth:`~.Generator.standard_normal`, :meth:`~.Generator.standard_exponential`,
6+
and :meth:`~.Generator.standard_gamma`) all allow existing arrays to be filled
7+
using the ``out`` keyword argument. Existing arrays need to be contiguous and
8+
well-behaved (writable and aligned). Under normal circumstances, arrays
79
created using the common constructors such as :meth:`numpy.empty` will satisfy
810
these requirements.
911
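
Note on the ``out`` keyword described above: a hedged sketch of filling one preallocated array from several threads, each with its own `Generator`. The thread count, chunk size, seed, and the helper name `fill` are illustrative assumptions. The per-thread generators come from `SeedSequence.spawn`, which is one reasonable way to keep the streams independent.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

n_threads = 4      # illustrative values
chunk = 250_000

# numpy.empty returns a contiguous, writable, aligned array, as `out` requires.
values = np.empty(n_threads * chunk)

# One independent Generator per thread.
generators = [Generator(PCG64(s)) for s in SeedSequence(12345).spawn(n_threads)]


def fill(i):
    """Fill the i-th contiguous block of `values` in place."""
    block = values[i * chunk:(i + 1) * chunk]
    generators[i].standard_normal(out=block)


with ThreadPoolExecutor(n_threads) as pool:
    # Consume the iterator so any exception raised in a worker surfaces here.
    list(pool.map(fill, range(n_threads)))
```

Each thread writes to a disjoint slice, so no locking is needed around `values`; the same pattern should apply to the other three core distributions that accept ``out``.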
