DOC: np.random documentation cleanup and expansion. by rkern · Pull Request #13849 · numpy/numpy · GitHub
DOC: np.random documentation cleanup and expansion. #13849

Merged
merged 9 commits into from
Jun 28, 2019
89 changes: 65 additions & 24 deletions doc/source/reference/random/bit_generators/index.rst
@@ -17,21 +17,23 @@ Supported BitGenerators

The included BitGenerators are:

* PCG-64 - The default. A fast generator that supports many parallel streams
and can be advanced by an arbitrary amount. See the documentation for
:meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
author's page`_ for more details about this class of PRNG.
* MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
function that returns a new generator with state as-if ``2**128`` draws have
function that returns a new generator with state as-if :math:`2^{128}` draws have
been made.
* PCG-64 - Fast generator that support many parallel streams and
can be advanced by an arbitrary amount. See the documentation for
:meth:`~.PCG64.advance`. PCG-64 has a period of
:math:`2^{128}`. See the `PCG author's page`_ for more details about
this class of PRNG.
* Philox - a counter-based generator capable of being advanced an
* Philox - A counter-based generator capable of being advanced an
arbitrary number of steps or generating independent streams. See the
`Random123`_ page for more details about this class of bit generators.
* SFC64 - A fast generator based on random invertible mappings. Usually the
fastest generator of the four. See the `SFC author's page`_ for (a little)
more detail.
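
As a rough illustration of the ``advance`` and ``jumped`` behaviour mentioned in the list above, the sketch below checks that advancing a second ``PCG64`` by ten steps lines it up with one that has already produced ten raw 64-bit draws. The seed value and variable names are arbitrary choices for illustration. The one-step-per-draw correspondence is assumed to hold for raw output only, not for draws from arbitrary distributions.

```python
from numpy.random import MT19937, PCG64

seed = 2019  # arbitrary illustrative seed

# One generator consumes ten raw 64-bit outputs ...
stepped = PCG64(seed)
stepped.random_raw(10)

# ... the other moves its state forward ten steps without drawing.
skipped = PCG64(seed)
skipped.advance(10)

# Both should now produce the same next raw value.
same_next = int(stepped.random_raw()) == int(skipped.random_raw())

# jumped() returns a new bit generator; the original is left untouched.
mt = MT19937(seed)
mt_far = mt.jumped()
```

Here ``same_next`` is expected to be true, and ``mt_far`` is a separate object whose state is far ahead of ``mt``.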

.. _`PCG author's page`: http://www.pcg-random.org/
.. _`Random123`: https://www.deshawresearch.com/resources_random123.html

.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt

.. toctree::
:maxdepth: 1
@@ -46,26 +48,65 @@ Seeding and Entropy
-------------------

A BitGenerator provides a stream of random values. In order to generate
reproducableis streams, BitGenerators support setting their initial state via a
seed. But how best to seed the BitGenerator? On first impulse one would like to
do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
independent BitGenerators. However using a highly correlated set of seeds could
generate BitGenerators that are correlated or overlap within a few samples.

NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
introduces the necessary entropy to produce independent and largely non-
overlapping streams. Small seeds are unable to fill the complete range of
initializaiton states, and lead to biases among an ensemble of small-seed
runs. For many cases, that doesn't matter. If you just want to hold things in
place while you debug something, biases aren't a concern. For actual
simulations whose results you care about, let ``SeedSequence(None)`` do its
thing and then log/print the `SeedSequence.entropy` for repeatable
`BitGenerator` streams.
reproducible streams, BitGenerators support setting their initial state via a
seed. All of the provided BitGenerators will take an arbitrary-sized
non-negative integer, or a list of such integers, as a seed. BitGenerators
need to take those inputs and process them into a high-quality internal state
for the BitGenerator. All of the BitGenerators in numpy delegate that task to
`~SeedSequence`, which uses hashing techniques to ensure that even low-quality
seeds generate high-quality initial states.

.. code-block:: python

from numpy.random import PCG64

bg = PCG64(12345678903141592653589793)

.. end_block
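
To make the seeding description above more concrete, here is a minimal sketch showing the two accepted seed shapes and how ``SeedSequence`` turns adjacent small seeds into different state words. The particular seed values are made up for illustration, and requesting four 32-bit words is an arbitrary choice.

```python
from numpy.random import PCG64, SeedSequence

# A single arbitrary-sized non-negative integer, or a list of such integers.
bg_int = PCG64(12345678903141592653589793)
bg_list = PCG64([2019, 6, 28])

# Adjacent small seeds are hashed into different initial state words.
words_0 = SeedSequence(0).generate_state(4)
words_1 = SeedSequence(1).generate_state(4)

# The hashing is deterministic: the same seed gives the same words again.
repeat_0 = SeedSequence(0).generate_state(4)
```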

`~SeedSequence` is designed to be convenient for implementing best practices.
We recommend that a stochastic program default to using entropy from the OS so
that each run is different. The program should print out or log that entropy.
To reproduce a past run, the program should allow the user to provide that
value through some mechanism (a command-line argument is common) so that the
user can re-enter that entropy and reproduce the result.
`~SeedSequence` can take care of everything except for communicating with the
user, which is up to you.

.. code-block:: python

from numpy.random import PCG64, SeedSequence

# Get the user's seed somehow, maybe through `argparse`.
# If the user did not provide a seed, it should return `None`.
seed = get_user_seed()
ss = SeedSequence(seed)
print('seed = {}'.format(ss.entropy))
bg = PCG64(ss)

.. end_block
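
One way the log-and-replay pattern above could be wired up with ``argparse`` is sketched below. The ``--seed`` flag, the ``get_user_seed`` implementation, and the ``make_bit_generator`` helper are hypothetical names chosen for this illustration; they are not part of numpy.

```python
import argparse

from numpy.random import PCG64, SeedSequence


def get_user_seed(argv=None):
    """Return the integer passed via --seed, or None if it was omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None)
    return parser.parse_args(argv).seed


def make_bit_generator(argv=None):
    """Build a PCG64 and report the entropy needed to reproduce it."""
    ss = SeedSequence(get_user_seed(argv))
    print("seed = {}".format(ss.entropy))
    return PCG64(ss), ss.entropy


# First run: no seed given, so entropy comes from the OS and is logged.
bg_fresh, logged = make_bit_generator([])

# Later run: the user passes the logged value back in.
bg_replay, _ = make_bit_generator(["--seed", str(logged)])
```

With this arrangement ``bg_fresh`` and ``bg_replay`` should start from the same state, so a rerun reproduces the original stream.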

By default, we use a 128-bit integer built from entropy gathered from the OS.
This is a good amount of entropy to initialize all of the generators that we
have in numpy. We do not recommend using small seeds below 32 bits for general
use. Using just a small set of seeds to instantiate larger state spaces means
that there are some initial states that are impossible to reach. This creates
some biases if everyone uses such values.

There will not be anything *wrong* with the results, per se; even a seed of
0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you
just need *some* fixed value for unit tests or debugging, feel free to use
whatever seed you like. But if you want to make inferences from the results or
publish them, drawing from a larger set of seeds is good practice.

If you need to generate a good seed "offline", then ``SeedSequence().entropy``
and ``secrets.randbits(128)`` from the standard library are both convenient
ways to do so.
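
Both offline options mentioned above can be tried in a few lines. This is only a sketch; the variable names are illustrative, and either value would normally be pasted into a script or configuration file afterwards.

```python
import secrets

from numpy.random import PCG64, SeedSequence

# Option 1: let SeedSequence gather entropy from the OS and read it back.
seed_a = SeedSequence().entropy

# Option 2: ask the standard library for 128 random bits directly.
seed_b = secrets.randbits(128)

# Either integer can then be passed as a fixed seed.
bg = PCG64(seed_b)
```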

.. autosummary::
:toctree: generated/

SeedSequence
bit_generator.ISeedSequence
bit_generator.ISpawnableSeedSequence
SeedSequence
bit_generator.SeedlessSeedSequence
81 changes: 37 additions & 44 deletions doc/source/reference/random/index.rst
@@ -1,18 +1,16 @@
.. _numpyrandom:

.. py:module:: numpy.random

.. currentmodule:: numpy.random

numpy.random
============
Random sampling (:mod:`numpy.random`)
=====================================

Numpy's random number routines produce pseudo random numbers using
combinations of a `BitGenerator` to create sequences and a `Generator`
to use those sequences to sample from different statistical distributions:

* SeedSequence: Objects that provide entropy for the initial state of a
BitGenerator. A good SeedSequence will provide initializations across the
entire range of possible states for the BitGenerator, otherwise biases may
creep into the generated bit streams.
* BitGenerators: Objects that generate random numbers. These are typically
unsigned integer words filled with sequences of either 32 or 64 random bits.
* Generators: Objects that transform sequences of random bits from a
@@ -24,28 +22,28 @@ Since Numpy version 1.17.0 the Generator can be initialized with a
number of different BitGenerators. It exposes many different probability
distributions. See `NEP 19 <https://www.numpy.org/neps/
nep-0019-rng-policy.html>`_ for context on the updated NumPy random number
routines. The legacy `RandomState` random number routines are still
routines. The legacy `.RandomState` random number routines are still
available, but limited to a single BitGenerator.

For convenience and backward compatibility, a single `RandomState`
For convenience and backward compatibility, a single `~.RandomState`
instance's methods are imported into the numpy.random namespace, see
:ref:`legacy` for the complete list.

Quick Start
-----------

By default, `Generator` uses normals provided by `PCG64` which will be
statistically more reliable than the legacy methods in `RandomState`
By default, `~Generator` uses normals provided by `~pcg64.PCG64` which will be
statistically more reliable than the legacy methods in `~.RandomState`

.. code-block:: python

# Uses the old numpy.random.RandomState
from numpy import random
random.standard_normal()

`Generator` can be used as a direct replacement for `~RandomState`, although
the random values are generated by `~PCG64`. The
`Generator` holds an instance of a BitGenerator. It is accessible as
`~Generator` can be used as a direct replacement for `~.RandomState`, although
the random values are generated by `~.PCG64`. The
`~Generator` holds an instance of a BitGenerator. It is accessible as
``gen.bit_generator``.

.. code-block:: python
@@ -69,45 +67,37 @@ is wrapped with a `~.Generator`.

Introduction
------------
RandomGen takes a different approach to producing random numbers from the
`RandomState` object. Random number generation is separated into three
components, a seed sequence, a bit generator and a random generator.
The new infrastructure takes a different approach to producing random numbers
from the `~.RandomState` object. Random number generation is separated into
two components, a bit generator and a random generator.

The `BitGenerator` has a limited set of responsibilities. It manages state
and provides functions to produce random doubles and random unsigned 32- and
64-bit values.

The `SeedSequence` takes a seed and provides the initial state for the
`BitGenerator`. Since consecutive seeds can cause bad effects when comparing
`BitGenerator` streams, the `SeedSequence` uses current best-practice methods
to spread the initial state out. However small seeds may still be unable to
reach all possible initialization states, which can cause biases among an
ensemble of small-seed runs. For many cases, that doesn't matter. If you just
want to hold things in place while you debug something, biases aren't a
concern. For actual simulations whose results you care about, let
``SeedSequence(None)`` do its thing and then log/print the
`SeedSequence.entropy` for repeatable `BitGenerator` streams.

The `random generator <Generator>` takes the
bit generator-provided stream and transforms it into more useful
distributions, e.g., simulated normal random values. This structure allows
alternative bit generators to be used with little code duplication.

The `Generator` is the user-facing object that is nearly identical to
`RandomState`. The canonical method to initialize a generator passes a
`~mt19937.MT19937` bit generator, the underlying bit generator in Python -- as
the sole argument. Note that the BitGenerator must be instantiated.
`.RandomState`. The canonical method to initialize a generator passes a
`~.PCG64` bit generator as the sole argument.

.. code-block:: python

from numpy.random import Generator, PCG64
rg = Generator(PCG64())
from numpy.random import default_gen
rg = default_gen(12345)
[Review comment from the PR author: Oops. This depends on #13840]

rg.random()

Seed information is directly passed to the bit generator.
One can also instantiate `Generator` directly with a `BitGenerator` instance.
To use the older `~mt19937.MT19937` algorithm, one can instantiate it directly
and pass it to `Generator`.

.. code-block:: python

rg = Generator(PCG64(12345))
from numpy.random import Generator, MT19937
rg = Generator(MT19937(12345))
rg.random()

What's New or Different
@@ -117,9 +107,9 @@ What's New or Different
The Box-Muller method used to produce NumPy's normals is no longer available
in `Generator`. It is not possible to reproduce the exact random
values using Generator for the normal distribution or any other
distribution that relies on the normal such as the `numpy.random.gamma` or
`numpy.random.standard_t`. If you require bitwise backward compatible
streams, use `RandomState`.
distribution that relies on the normal such as the `.RandomState.gamma` or
`.RandomState.standard_t`. If you require bitwise backward compatible
streams, use `.RandomState`.

* The Generator's normal, exponential and gamma functions use 256-step Ziggurat
methods which are 2-10 times faster than NumPy's Box-Muller or inverse CDF
@@ -133,9 +123,8 @@ What's New or Different
source of randomness that is used in cryptographic applications (e.g.,
``/dev/urandom`` on Unix).
* All BitGenerators can produce doubles, uint64s and uint32s via CTypes
(`~PCG64.ctypes`) and CFFI
(:meth:`~PCG64.cffi`). This allows the bit generators to
be used in numba.
(`~.PCG64.ctypes`) and CFFI (`~.PCG64.cffi`). This allows the bit generators
to be used in numba.
* The bit generators can be used in downstream projects via
:ref:`Cython <randomgen_cython>`.
* `~.Generator.integers` is now the canonical way to generate integer
@@ -144,8 +133,11 @@ What's New or Different
The ``endpoint`` keyword can be used to specify open or closed intervals.
This replaces both ``randint`` and the deprecated ``random_integers``.
* `~.Generator.random` is now the canonical way to generate floating-point
random numbers, which replaces `random_sample`, `sample`, and `ranf`. This
is consistent with Python's `random.random`.
random numbers, which replaces `.RandomState.random_sample`,
`.RandomState.sample`, and `.RandomState.ranf`. This is consistent with
Python's `random.random`.
* All BitGenerators in numpy use `~SeedSequence` to convert seeds into
initialized states.
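
The ``integers`` and ``random`` items in the list above can be sketched as follows. The seed and the bounds are arbitrary illustrative values.

```python
from numpy.random import Generator, PCG64

rg = Generator(PCG64(12345))  # arbitrary illustrative seed

# Default half-open interval: values fall in [0, 6), i.e. 0 through 5.
half_open = rg.integers(0, 6, size=1000)

# endpoint=True closes the interval: values fall in [1, 6], like a die roll.
closed = rg.integers(1, 6, size=1000, endpoint=True)

# Floating-point values in [0.0, 1.0).
floats = rg.random(5)
```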

See :ref:`new-or-different` for a complete list of improvements and
differences from the traditional ``RandomState``.
@@ -154,10 +146,11 @@ Parallel Generation
~~~~~~~~~~~~~~~~~~~

The included generators can be used in parallel, distributed applications in
one of two ways:
one of three ways:

* :ref:`seedsequence-spawn`
* :ref:`independent-streams`
* :ref:`jump-and-advance`
* :ref:`parallel-jumped`
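
As a minimal sketch of the first approach in the list above, ``SeedSequence.spawn`` can produce child seed sequences, one per worker. The root seed and the number of children are arbitrary illustrative choices.

```python
from numpy.random import Generator, PCG64, SeedSequence

root = SeedSequence(12345)  # arbitrary illustrative root seed

# Each child seeds its own bit generator, e.g. one per worker process.
children = root.spawn(4)
streams = [Generator(PCG64(child)) for child in children]

# Every stream starts from a different state.
first_draws = [g.random() for g in streams]
```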

Concepts
--------
37 changes: 17 additions & 20 deletions doc/source/reference/random/legacy.rst
@@ -1,3 +1,5 @@
.. currentmodule:: numpy.random

.. _legacy:

Legacy Random Generation
@@ -8,46 +10,41 @@ no further improvements. It is guaranteed to produce the same values
as the final point release of NumPy v1.16. These all depend on Box-Muller
normals or inverse CDF exponentials or gammas. This class should only be used
if it is essential to have randoms that are identical to what
would have been produced by NumPy.
would have been produced by previous versions of NumPy.

`~mtrand.RandomState` adds additional information
to the state which is required when using Box-Muller normals since these
are produced in pairs. It is important to use
`~mtrand.RandomState.get_state`, and not the underlying bit generator's
`state`, when accessing the state so that these extra values are saved.

.. warning::

:class:`~randomgen.legacy.LegacyGenerator` only contains functions
that have changed. Since it does not contain other functions, it
is not directly possible to replace :class:`~numpy.random.RandomState`.
In order to full replace :class:`~numpy.random.RandomState`, it is
necessary to use both :class:`~randomgen.legacy.LegacyGenerator`
and :class:`~randomgen.generator.RandomGenerator` both driven
by the same basic RNG. Methods present in :class:`~randomgen.legacy.LegacyGenerator`
must be called from :class:`~randomgen.legacy.LegacyGenerator`. Other Methods
should be called from :class:`~randomgen.generator.RandomGenerator`.

Although we provide the `~mt19937.MT19937` BitGenerator for use independent of
`~mtrand.RandomState`, note that its default seeding uses `~SeedSequence`
rather than the legacy seeding algorithm. `~mtrand.RandomState` will use the
legacy seeding algorithm. The methods to use the legacy seeding algorithm are
currently private as the main reason to use them is just to implement
`~mtrand.RandomState`. However, one can reset the state of `~mt19937.MT19937`
using the state of the `~mtrand.RandomState`:

.. code-block:: python

from numpy.random import MT19937
from numpy.random import RandomState

# Use same seed
rs = RandomState(12345)
mt19937 = MT19937(12345)
lg = RandomState(mt19937)
mt19937 = MT19937()
mt19937.state = rs.get_state()
rs2 = RandomState(mt19937)

# Identical output
# Same output
rs.standard_normal()
lg.standard_normal()
rs2.standard_normal()

rs.random()
lg.random()
rs2.random()

rs.standard_exponential()
lg.standard_exponential()
rs2.standard_exponential()


.. currentmodule:: numpy.random.mtrand
8 changes: 5 additions & 3 deletions doc/source/reference/random/multithreading.rst
@@ -1,9 +1,11 @@
Multithreaded Generation
========================

The four core distributions all allow existing arrays to be filled using the
``out`` keyword argument. Existing arrays need to be contiguous and
well-behaved (writable and aligned). Under normal circumstances, arrays
The four core distributions (:meth:`~.Generator.random`,
:meth:`~.Generator.standard_normal`, :meth:`~.Generator.standard_exponential`,
and :meth:`~.Generator.standard_gamma`) all allow existing arrays to be filled
using the ``out`` keyword argument. Existing arrays need to be contiguous and
well-behaved (writable and aligned). Under normal circumstances, arrays
created using the common constructors such as :meth:`numpy.empty` will satisfy
these requirements.
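
The ``out`` keyword described above can be exercised with a short sketch like this one. The buffer size and seed are arbitrary, and the slice is chosen to stay contiguous so that it meets the stated requirements.

```python
import numpy as np
from numpy.random import Generator, PCG64

rg = Generator(PCG64(12345))  # arbitrary illustrative seed

# Preallocate once; numpy.empty gives a contiguous, writable, aligned array.
buf = np.empty(1000)

# Fill the whole buffer in place with standard normals.
result = rg.standard_normal(out=buf)

# A contiguous slice can be filled separately, e.g. by a different thread
# that owns its own Generator.
rg.random(out=buf[:500])
```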
