DOC: add docs on thread safety in NumPy by ngoldbaum · Pull Request #27223 · numpy/numpy
Merged
merged 3 commits into from
Aug 17, 2024
7 changes: 7 additions & 0 deletions doc/source/reference/c-api/array.rst
@@ -1264,6 +1264,13 @@ User-defined data types
registered (checked only by the address of the pointer), then
return the previously-assigned type-number.

The number of user DTypes known to NumPy is stored in
``NPY_NUMUSERTYPES``, a static global variable that is public in the
C API. Accessing this symbol is inherently *not* thread-safe. If
for some reason you need to use this API in a multithreaded context,
you will need to add your own locking; NumPy does not ensure that new
data types can be added in a thread-safe manner.

.. c:function:: int PyArray_RegisterCastFunc( \
PyArray_Descr* descr, int totype, PyArray_VectorUnaryFunc* castfunc)

17 changes: 8 additions & 9 deletions doc/source/reference/global_state.rst
@@ -1,14 +1,13 @@
.. _global_state:

************
Global state
************

NumPy has a few import-time, compile-time, or runtime options
which change the global behaviour.
Most of these are related to performance or for debugging
purposes and will not be interesting to the vast majority
of users.
****************************
Global Configuration Options
****************************

NumPy has a few import-time, compile-time, or runtime configuration
options which change the global behaviour. Most of these are related to
performance or for debugging purposes and will not be interesting to the
vast majority of users.


Performance-related options
1 change: 1 addition & 0 deletions doc/source/reference/index.rst
@@ -58,6 +58,7 @@ Other topics

array_api
simd/index
thread_safety
global_state
security
distutils_status_migration
51 changes: 51 additions & 0 deletions doc/source/reference/thread_safety.rst
@@ -0,0 +1,51 @@
.. _thread_safety:

*************
Thread Safety
*************

NumPy supports use in a multithreaded context via the `threading` module in the
standard library. Many NumPy operations release the GIL, so, unlike in many
other situations in Python, threads running NumPy code can execute in parallel
and improve performance.

The easiest performance gains happen when each worker thread owns its own array
or set of array objects, with no data directly shared between threads. Because
NumPy releases the GIL for many low-level operations, threads that spend most of
the time in low-level code will run in parallel.
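
As a minimal sketch of this pattern, the example below gives each worker its
own random generator and its own array, so no array data is shared between
threads. The names ``worker`` and ``totals`` and the array size are
illustrative assumptions, not part of NumPy.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def worker(seed, n=200_000):
    """Build a private array and reduce it; nothing is shared between threads."""
    rng = np.random.default_rng(seed)  # generator owned by this call only
    data = rng.standard_normal(n)      # array owned by this call only
    # Elementwise arithmetic and reductions run in low-level code.
    return float(np.sqrt(data * data).sum())


# Each task works on its own data, so no locking is needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(worker, range(4)))
```

Whether this runs faster than a serial loop depends on the array sizes and on
how much of each task is spent in code that releases the GIL.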

It is possible to share NumPy arrays between threads, but extreme care must be
taken to avoid creating thread safety issues when mutating arrays that are
shared between multiple threads. If two threads simultaneously read from and
write to the same array, they will at best produce inconsistent, racy results that
are not reproducible, let alone correct. It is also possible to crash the Python
interpreter by, for example, resizing an array while another thread is reading
from it to compute a ufunc operation.

In the future, we may add locking to ndarray to make writing multithreaded
algorithms using NumPy arrays safer, but for now we suggest focusing on
read-only access of arrays that are shared between threads, or adding your own
locking if you need both mutation and multithreading.
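
One way to follow this advice, sketched under the assumption that a single
``threading.Lock`` is enough for your workload: mark the shared input as
read-only so accidental writes fail loudly, and take the lock around every
write to the one array that is mutated. The names ``shared_input``,
``accumulator``, ``accumulator_lock`` and ``work`` are hypothetical.

```python
import threading

import numpy as np

shared_input = np.arange(1_000, dtype=np.float64)
shared_input.flags.writeable = False   # accidental writes now raise ValueError

accumulator = np.zeros(1)              # the only array mutated by the threads
accumulator_lock = threading.Lock()    # hypothetical name; guards accumulator


def work(index):
    chunk = shared_input[index::4]     # read-only view of the shared data
    partial = chunk.sum()              # reading needs no lock
    with accumulator_lock:             # serialize every write
        accumulator[0] += partial


threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock only helps if every thread that writes to ``accumulator`` acquires
it; NumPy itself does not enforce this.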

Note that operations that *do not* release the GIL will see no performance gains
from use of the `threading` module, and instead might be better served with
`multiprocessing`. In particular, operations on arrays with ``dtype=object`` do
not release the GIL.

Free-threaded Python
--------------------

.. versionadded:: 2.1

Starting with NumPy 2.1 and CPython 3.13, NumPy also has experimental support
for Python runtimes with the GIL disabled. See
https://py-free-threading.github.io for more information about installing and
using free-threaded Python, as well as information about supporting it in
libraries that depend on NumPy.

Because free-threaded Python does not have a global interpreter lock to
serialize access to Python objects, there are more opportunities for threads to
mutate shared state and create thread safety issues. In addition to the
limitations about locking of the ndarray object noted above, this also means
that arrays with ``dtype=object`` are not protected by the GIL, creating data
races for Python objects that are not possible outside free-threaded Python.
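
Code that needs to know whether it can rely on the GIL can check at runtime.
The sketch below uses ``sysconfig.get_config_var("Py_GIL_DISABLED")`` and
``sys._is_gil_enabled()``, which CPython provides from 3.13 onward; the
fallback for older interpreters and the variable names are assumptions made
for illustration.

```python
import sys
import sysconfig

# True when the interpreter was built with free-threading support.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists on CPython 3.13+; assume older interpreters
# always run with the GIL.
_is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
gil_enabled = _is_gil_enabled() if _is_gil_enabled is not None else True

if not gil_enabled:
    # Shared object arrays need explicit locking here.
    print("GIL is disabled: protect shared arrays with your own locks")
```

A free-threaded build can still run with the GIL enabled, which is why the
build-time and runtime checks are kept separate.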
3 changes: 3 additions & 0 deletions doc/source/user/c-info.beyond-basics.rst
@@ -268,6 +268,9 @@ specifies your data-type. This type number should be stored and made
available by your module so that other modules can use it to recognize
your data-type.

Note that this API is inherently thread-unsafe. See :ref:`thread_safety` for more
details about thread safety in NumPy.


Registering a casting function
------------------------------