DOC: add docs on thread safety in NumPy by ngoldbaum · Pull Request #27223 · numpy/numpy · GitHub

DOC: add docs on thread safety in NumPy #27223

Merged
merged 3 commits into from
Aug 17, 2024
DOC: add docs on thread safety in NumPy
[skip azp][skip actions][skip cirrus]
ngoldbaum committed Aug 15, 2024
commit 908169c55aef1f2fbadab59a67f79af56722bae2
16 changes: 11 additions & 5 deletions doc/source/reference/global_state.rst
@@ -4,11 +4,10 @@
Global state
************

NumPy has a few import-time, compile-time, or runtime options
which change the global behaviour.
Most of these are related to performance or for debugging
purposes and will not be interesting to the vast majority
of users.
NumPy exposes global state in legacy APIs and a few import-time,
compile-time, or runtime options which change the global behaviour.
Most of these are related to performance or for debugging purposes and
will not be interesting to the vast majority of users.


Performance-related options
@@ -71,3 +70,10 @@ and set the ``ndarray.base``.

.. versionchanged:: 1.25.2
This variable is only checked on the first import.

Legacy User DTypes
Contributor

This does not feel very logical here. Better at the end of thread safety? Also, isn't this only an issue if one defines legacy user types, which I'd think very few programs do?

Member Author

I could combine the two pages maybe?

It's here because it's global state.

I agree it's extremely niche and it's a legacy feature that we aren't really planning to advocate for people to use going forward. I could also not mention it, no one has cared or noticed up until now.

Member

We could put the info in PyArray_RegisterDataType and link from the free-threaded section?
(Right now the user-dtype won't be compiled with free-threaded support anyway, but it is tempting of course... And you probably get away with it since most will probably be added at import time when no threads are active.)

I don't think it needs to be here really. It is global state, but it modifies NumPy runtime behavior very explicitly (and ideally not at all unless you use that dtype).
(I.e. maybe the global state name is not great but not sure what is better. "Global config"?)

==================

The number of legacy user DTypes is stored in ``NPY_NUMUSERTYPES``, a global
variable that is exposed in the NumPy C API. This means that the legacy DType
API is inherently not thread-safe.
1 change: 1 addition & 0 deletions doc/source/reference/index.rst
@@ -58,6 +58,7 @@ Other topics

array_api
simd/index
thread_safety
global_state
security
distutils_status_migration
49 changes: 49 additions & 0 deletions doc/source/reference/thread_safety.rst
@@ -0,0 +1,49 @@
.. _thread_safety:

*************
Thread Safety
*************

NumPy supports use in a multithreaded context via the `threading` module in the
standard library. Many NumPy operations release the GIL, so unlike much
pure-Python code, NumPy code can often run faster when its work is spread
across multiple threads.

The easiest performance gains happen when each worker thread owns its own array
or set of array objects, with no data directly shared between threads. Because
NumPy releases the GIL for many low-level operations, threads that spend most of
the time in low-level code will run in parallel.
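
As an illustration of the "one array per worker" pattern described above, here
is a minimal sketch. The ``worker`` function, the array sizes, and the thread
count are assumptions chosen for the example and are not part of the PR. Each
thread creates and uses only its own array and writes its result to a distinct
slot of a plain Python list.

```python
import threading

import numpy as np


def worker(seed, results, index):
    # Hypothetical worker: every thread builds its own private array,
    # so no array data is shared between threads.
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((200, 200))
    # Matrix multiplication and the norm run in low-level code.
    results[index] = float(np.linalg.norm(data @ data.T))


n_threads = 4
results = [None] * n_threads  # one slot per thread, never written twice
threads = [
    threading.Thread(target=worker, args=(seed, results, seed))
    for seed in range(n_threads)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whether this runs faster than a serial loop depends on how much of each
worker's time is spent in operations that release the GIL.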

It is possible to share NumPy arrays between threads, but extreme care must be
taken to avoid creating thread safety issues when mutating shared arrays. If
two threads simultaneously read from and write to the same array, at best they
will see inconsistent views of the same array data. It is also possible to crash
Contributor

It may be worth paying special attention to the wording here, given the potential for confusion with array views (in the NumPy sense of the word "view"). For example, maybe "shared arrays" would be better expressed as "arrays that share the same underlying data", or something similar but less wordy. Similarly, the "inconsistent views of the same array data" might be confusing.

Overall certainly not a blocker and not worth bike-shedding over at this stage. I'd be in favor of getting this in and worrying about refining later!

the Python interpreter by, for example, resizing an array while another thread
is reading from it to compute a ufunc operation.

In the future, we may add locking to ndarray to make working with shared NumPy
arrays easier, but for now we suggest focusing on read-only access of arrays
that are shared between threads.
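
One way to follow the read-only suggestion above is sketched below. The names
``shared`` and ``chunk_sum`` and the chunk size are assumptions for this
example. Clearing the ``writeable`` flag only guards against accidental writes
through this array object; it is not a lock and does not protect against writes
made through other writeable views of the same memory.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

shared = np.arange(1_000_000, dtype=np.float64)
# Mark the array read-only so an accidental in-place write raises an error.
shared.setflags(write=False)


def chunk_sum(bounds):
    # Hypothetical helper: each task only reads its slice of the shared array.
    start, stop = bounds
    return float(shared[start:stop].sum())


chunk = 250_000
bounds = [(i, i + chunk) for i in range(0, shared.size, chunk)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(chunk_sum, bounds))

total = sum(partial_sums)
```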

Note that operations that *do not* release the GIL will see no performance gains
from use of the `threading` module, and instead might be better served with
`multiprocessing`. In particular, operations on arrays with ``dtype=object`` do
not release the GIL.

Free-threaded Python
--------------------

.. versionadded:: 2.1

Starting with NumPy 2.1 and CPython 3.13, NumPy also has experimental support
for Python runtimes with the GIL disabled. See
https://py-free-threading.github.io for more information about installing and
using free-threaded Python, as well as information about supporting it in
libraries that depend on NumPy.

Because free-threaded Python does not have a global interpreter lock to
serialize access to Python objects, there are more opportunities for threads to
mutate shared state and create thread safety issues. In addition to the
limitations about locking of the ndarray object noted above, this also means
that arrays with ``dtype=object`` are not protected by the GIL, creating data
races for Python objects that are not possible outside free-threaded Python.
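
To decide at runtime whether the extra caution described above applies, code
can check whether it is running on a free-threaded build and whether the GIL is
currently enabled. The sketch below is an illustration only: ``gil_status`` is
a hypothetical helper name, and it assumes the CPython 3.13 interfaces
``sysconfig.get_config_var("Py_GIL_DISABLED")`` and ``sys._is_gil_enabled()``,
falling back to "GIL enabled" on older interpreters where the latter does not
exist.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on CPython 3.13 and later.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return free_threaded_build, gil_enabled


free_threaded_build, gil_enabled = gil_status()
```

On a free-threaded build the GIL can still be re-enabled at runtime, which is
why the two values are reported separately.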