DOC: add docs on thread safety in NumPy #27223
[skip azp][skip actions][skip cirrus]
@@ -58,6 +58,7 @@ Other topics

array_api
simd/index
thread_safety
global_state
security
distutils_status_migration

@@ -0,0 +1,49 @@
.. _thread_safety:

*************
Thread Safety
*************

NumPy supports use in a multithreaded context via the `threading` module in the
standard library. Many NumPy operations release the GIL, so, unlike in many
other situations in Python, it is possible to improve performance by running
NumPy code in multiple threads.

The easiest performance gains happen when each worker thread owns its own array
or set of array objects, with no data directly shared between threads. Because
NumPy releases the GIL for many low-level operations, threads that spend most of
their time in low-level code will run in parallel.

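As a minimal sketch of the "each thread owns its own data" pattern described above: the function name ``largest_sample``, the array size, and the thread count are illustrative assumptions, not part of the NumPy documentation.

```python
import threading

import numpy as np


def largest_sample(seed, results, index):
    # Each thread builds its own generator and array; no array data is shared.
    rng = np.random.default_rng(seed)
    data = rng.standard_normal(100_000)
    # The sort runs in compiled code. Whether a particular operation releases
    # the GIL depends on the operation and the dtype.
    results[index] = float(np.sort(data)[-1])


n_threads = 4
results = [None] * n_threads  # one slot per thread, each written exactly once
threads = [
    threading.Thread(target=largest_sample, args=(seed, results, seed))
    for seed in range(n_threads)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The only object the threads have in common is the plain Python list ``results``, and each thread writes to a distinct slot, so no NumPy array is ever touched by more than one thread.
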
It is possible to share NumPy arrays between threads, but extreme care must be
taken to avoid creating thread safety issues when mutating shared arrays. If
two threads simultaneously read from and write to the same array, at best they
will see inconsistent views of the same array data. It is also possible to crash
the Python interpreter by, for example, resizing an array while another thread
is reading from it to compute a ufunc operation.

It may be worth paying special attention to the wording here, given the potential for confusion with array views (in the NumPy sense of the word "view"). For example, maybe "shared arrays" would be better expressed as "arrays that share the same underlying data", or something similar but less wordy. Similarly, the "inconsistent views of the same array data" might be confusing. Overall certainly not a blocker and not worth bike-shedding over at this stage. I'd be in favor of getting this in and worrying about refining later!

In the future, we may add locking to ndarray to make working with shared NumPy
arrays easier, but for now we suggest focusing on read-only access of arrays
that are shared between threads.

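One way to follow the read-only suggestion is to clear the array's ``writeable`` flag before handing it to worker threads. This is a sketch under assumptions: the helper ``partial_sum`` and the chunk boundaries are made up for illustration, and the flag only guards against accidental writes through this array object and views created from it afterwards. It is not a lock, and views created earlier keep their own flags.

```python
import threading

import numpy as np

shared = np.arange(1_000_000, dtype=np.float64)
# Guard against accidental in-place writes; this is a safety net, not a lock.
shared.flags.writeable = False


def partial_sum(start, stop, out, index):
    # Slicing returns a view, which inherits the read-only flag.
    out[index] = float(shared[start:stop].sum())


bounds = [(0, 250_000), (250_000, 500_000), (500_000, 750_000), (750_000, 1_000_000)]
partials = [0.0] * len(bounds)
threads = [
    threading.Thread(target=partial_sum, args=(start, stop, partials, i))
    for i, (start, stop) in enumerate(bounds)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = sum(partials)

# A write attempt now fails loudly instead of racing with the readers.
try:
    shared[0] = -1.0
    write_rejected = False
except ValueError:
    write_rejected = True
```
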
Note that operations that *do not* release the GIL will see no performance gains
from use of the `threading` module, and instead might be better served by
`multiprocessing`. In particular, operations on arrays with ``dtype=object`` do
not release the GIL.

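To decide whether an array falls into the ``dtype=object`` case, one option is to inspect ``dtype.hasobject``. The helper name ``holds_python_objects`` below is hypothetical. The example also illustrates why such arrays hold the GIL: each element-wise step calls back into Python-level code (here, ``Fraction.__add__``).

```python
from fractions import Fraction

import numpy as np


def holds_python_objects(arr):
    # True when the array stores references to Python objects rather than
    # raw numeric values.
    return arr.dtype.hasobject


numeric = np.array([0.5, 0.25, 0.25])
objects = np.array([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)], dtype=object)

numeric_flag = holds_python_objects(numeric)
object_flag = holds_python_objects(objects)

# The reduction adds the elements with Python's ``+`` operator, so the
# result is itself a Fraction.
total = objects.sum()
```
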
Free-threaded Python
--------------------

.. versionadded:: 2.1

Starting with NumPy 2.1 and CPython 3.13, NumPy also has experimental support
for Python runtimes with the GIL disabled. See
https://py-free-threading.github.io for more information about installing and
using free-threaded Python, as well as information about supporting it in
libraries that depend on NumPy.

Because free-threaded Python does not have a global interpreter lock to
serialize access to Python objects, there are more opportunities for threads to
mutate shared state and create thread safety issues. In addition to the
limitations about locking of the ndarray object noted above, this also means
that arrays with ``dtype=object`` are not protected by the GIL, creating data
races for Python objects that are not possible outside free-threaded Python.
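
A sketch of how an application might detect a free-threaded runtime and serialize its own writes to a shared object array. The application-level ``lock`` and the ``bump`` helper are assumptions for illustration, since NumPy itself does not lock the array. The ``getattr`` fallback assumes the GIL is enabled on interpreters older than CPython 3.13, where ``sys._is_gil_enabled`` does not exist.

```python
import sys
import sysconfig
import threading

import numpy as np

# Py_GIL_DISABLED is set to 1 in free-threaded builds of CPython 3.13+.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
# sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is on otherwise.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

counts = np.array([0, 0], dtype=object)
lock = threading.Lock()  # application-level lock (not provided by NumPy)


def bump(n):
    for _ in range(n):
        # Serialize the read-modify-write so no increment is lost.
        with lock:
            counts[0] = counts[0] + 1


threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because every update happens under the lock, ``counts[0]`` ends at 4000 whether or not the GIL is enabled.
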
This does not feel very logical here. Better at the end of thread safety? Also, isn't this only an issue if one defines legacy user types, which I'd think very few programs do?
I could combine the two pages maybe?
It's here because it's global state.
I agree it's extremely niche and it's a legacy feature that we aren't really planning to advocate for people to use going forward. I could also not mention it, no one has cared or noticed up until now.
We could put the info in `PyArray_RegisterDataType` and link from the free-threaded section? (Right now the user-dtype won't be compiled with free-threaded support anyway, but it is tempting of course... And you probably get away with it since most will probably be added at import time when no threads are active.)
I don't think it needs to be here really. It is global state, but it modifies NumPy runtime behavior very explicitly (and ideally not at all unless you use that dtype).
(I.e. maybe the global state name is not great but not sure what is better. "Global config"?)