BUG: Race on `descr->byteorder` under free threading · Issue #28143 · numpy/numpy · GitHub

Closed
hawkinsp opened this issue Jan 10, 2025 · 5 comments · Fixed by #28154
Labels
00 - Bug
39 - free-threading (PRs and issues related to support for free-threading CPython, a.k.a. no-GIL, PEP 703)

Comments

@hawkinsp
Contributor

Describe the issue:

When running the following code under TSAN and free-threading, TSAN reports an apparently real race:

Reproduce the code example:

import concurrent.futures
import functools
import threading

import numpy as np

num_threads = 20

def closure(b, x):
  b.wait()
  for _ in range(100):
    y = np.arange(10)
    y.flat[x] = x

with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
  for i in range(1000):
    b = threading.Barrier(num_threads)
    for _ in range(num_threads):
      x = np.arange(100)
      executor.submit(functools.partial(closure, b, x))

Error message:

# TSAN error

WARNING: ThreadSanitizer: data race (pid=3673664)
  Read of size 1 at 0x7f2973cd94a2 by main thread:
    #0 PyArray_PromoteTypes /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/convert_datatype.c:1001:16 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x248c2e) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #1 handle_promotion /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/array_coercion.c:720:32 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x224e6c) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #2 handle_scalar /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/array_coercion.c:777:9 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x22499c) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #3 PyArray_DiscoverDTypeAndShape_Recursive /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/array_coercion.c:1022:20 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x223b45) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #4 PyArray_DiscoverDTypeAndShape /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/array_coercion.c:1307:16 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x225162) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #5 PyArray_DTypeFromObject /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/common.c:132:12 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x241ecd) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #6 PyArray_DescrFromObject /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/ctors.c:2628:9 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x25c02b) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #7 PyArray_ArangeObj /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/ctors.c:3340:9 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x25c02b)
    #8 array_arange /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/multiarraymodule.c:3100:13 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2db297) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #9 cfunction_vectorcall_FASTCALL_KEYWORDS /usr/local/google/home/phawkins/p/cpython/Objects/methodobject.c:441:24 (python3.13+0x289f20) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #10 _PyObject_VectorcallTstate /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_call.h:168:11 (python3.13+0x1eafea) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #11 PyObject_Vectorcall /usr/local/google/home/phawkins/p/cpython/Objects/call.c:327:12 (python3.13+0x1eafea)
    #12 _PyEval_EvalFrameDefault /usr/local/google/home/phawkins/p/cpython/Python/generated_cases.c.h:813:23 (python3.13+0x3e290b) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #13 _PyEval_EvalFrame /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_ceval.h:119:16 (python3.13+0x3de712) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #14 _PyEval_Vector /usr/local/google/home/phawkins/p/cpython/Python/ceval.c:1807:12 (python3.13+0x3de712)
    #15 PyEval_EvalCode /usr/local/google/home/phawkins/p/cpython/Python/ceval.c:597:21 (python3.13+0x3de712)
    #16 run_eval_code_obj /usr/local/google/home/phawkins/p/cpython/Python/pythonrun.c:1337:9 (python3.13+0x4a0a7e) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #17 run_mod /usr/local/google/home/phawkins/p/cpython/Python/pythonrun.c:1422:19 (python3.13+0x4a01a5) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #18 pyrun_file /usr/local/google/home/phawkins/p/cpython/Python/pythonrun.c:1255:15 (python3.13+0x49c2a0) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #19 _PyRun_SimpleFileObject /usr/local/google/home/phawkins/p/cpython/Python/pythonrun.c:490:13 (python3.13+0x49c2a0)
    #20 _PyRun_AnyFileObject /usr/local/google/home/phawkins/p/cpython/Python/pythonrun.c:77:15 (python3.13+0x49b968) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #21 pymain_run_file_obj /usr/local/google/home/phawkins/p/cpython/Modules/main.c:410:15 (python3.13+0x4d7e8f) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #22 pymain_run_file /usr/local/google/home/phawkins/p/cpython/Modules/main.c:429:15 (python3.13+0x4d7e8f)
    #23 pymain_run_python /usr/local/google/home/phawkins/p/cpython/Modules/main.c:697:21 (python3.13+0x4d70dc) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #24 Py_RunMain /usr/local/google/home/phawkins/p/cpython/Modules/main.c:776:5 (python3.13+0x4d70dc)
    #25 pymain_main /usr/local/google/home/phawkins/p/cpython/Modules/main.c:806:12 (python3.13+0x4d7518) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #26 Py_BytesMain /usr/local/google/home/phawkins/p/cpython/Modules/main.c:830:12 (python3.13+0x4d759b) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #27 main /usr/local/google/home/phawkins/p/cpython/./Programs/python.c:15:12 (python3.13+0x15c7eb) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)

  Previous write of size 1 at 0x7f2973cd94a2 by thread T147:
    #0 PyArray_CheckFromAny_int /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/ctors.c:1843:33 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x259d09) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #1 PyArray_CheckFromAny /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/ctors.c:1811:22 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x25999d) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #2 iter_ass_subscript /usr/local/google/home/phawkins/p/numpy/.mesonpy-v3nmau22/../numpy/_core/src/multiarray/iterators.c:1016:19 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2b3e0b) (BuildId: 01f027aee9bd9e769c42b1931cadc90163397cb6)
    #3 PyObject_SetItem /usr/local/google/home/phawkins/p/cpython/Objects/abstract.c:232:19 (python3.13+0x1b9728) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #4 _PyEval_EvalFrameDefault /usr/local/google/home/phawkins/p/cpython/Python/generated_cases.c.h:5777:27 (python3.13+0x3f5fcb) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #5 _PyEval_EvalFrame /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_ceval.h:119:16 (python3.13+0x3dea3a) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #6 _PyEval_Vector /usr/local/google/home/phawkins/p/cpython/Python/ceval.c:1807:12 (python3.13+0x3dea3a)
    #7 _PyFunction_Vectorcall /usr/local/google/home/phawkins/p/cpython/Objects/call.c (python3.13+0x1eb65f) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #8 _PyObject_VectorcallTstate /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_call.h:168:11 (python3.13+0x572352) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #9 partial_vectorcall /usr/local/google/home/phawkins/p/cpython/./Modules/_functoolsmodule.c:252:16 (python3.13+0x572352)
    #10 _PyVectorcall_Call /usr/local/google/home/phawkins/p/cpython/Objects/call.c:273:16 (python3.13+0x1eb2d3) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #11 _PyObject_Call /usr/local/google/home/phawkins/p/cpython/Objects/call.c:348:16 (python3.13+0x1eb2d3)
    #12 PyObject_Call /usr/local/google/home/phawkins/p/cpython/Objects/call.c:373:12 (python3.13+0x1eb355) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #13 _PyEval_EvalFrameDefault /usr/local/google/home/phawkins/p/cpython/Python/generated_cases.c.h:1355:26 (python3.13+0x3e4af2) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #14 _PyEval_EvalFrame /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_ceval.h:119:16 (python3.13+0x3dea3a) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #15 _PyEval_Vector /usr/local/google/home/phawkins/p/cpython/Python/ceval.c:1807:12 (python3.13+0x3dea3a)
    #16 _PyFunction_Vectorcall /usr/local/google/home/phawkins/p/cpython/Objects/call.c (python3.13+0x1eb65f) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #17 _PyObject_VectorcallTstate /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_call.h:168:11 (python3.13+0x1ef62f) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #18 method_vectorcall /usr/local/google/home/phawkins/p/cpython/Objects/classobject.c:70:20 (python3.13+0x1ef62f)
    #19 _PyVectorcall_Call /usr/local/google/home/phawkins/p/cpython/Objects/call.c:273:16 (python3.13+0x1eb2d3) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #20 _PyObject_Call /usr/local/google/home/phawkins/p/cpython/Objects/call.c:348:16 (python3.13+0x1eb2d3)
    #21 PyObject_Call /usr/local/google/home/phawkins/p/cpython/Objects/call.c:373:12 (python3.13+0x1eb355) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #22 thread_run /usr/local/google/home/phawkins/p/cpython/./Modules/_threadmodule.c:337:21 (python3.13+0x564a32) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)
    #23 pythread_wrapper /usr/local/google/home/phawkins/p/cpython/Python/thread_pthread.h:243:5 (python3.13+0x4bddb7) (BuildId: 19c569fa942016d8ac49d19fd40ccb1ddded939b)

  Location is global 'LONG_Descr' of size 136 at 0x7f2973cd9478 (_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0xad94a2)

Python and NumPy Versions:

2.3.0.dev0+git20250110.dc78e30
3.13.1+ experimental free-threading build (heads/3.13:65da5db28a3, Jan 10 2025, 14:52:18) [Clang 18.1.8 (11)]

Runtime Environment:

[{'numpy_version': '2.3.0.dev0+git20250110.dc78e30',
'python': '3.13.1+ experimental free-threading build '
'(heads/3.13:65da5db28a3, Jan 10 2025, 14:52:18) [Clang 18.1.8 '
'(11)]',
'uname': uname_result(system='Linux', node='redacted', release='6.10.11-redacted-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.10.11-redacted(2024-10-16)', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Zen',
'filepath': '/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.27.so',
'internal_api': 'openblas',
'num_threads': 128,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

Found when running the JAX test suite with TSAN and free-threading.

@ngoldbaum
Member

This (kinda silly) diff seems to fix the race:

diff --git a/numpy/_core/src/multiarray/ctors.c b/numpy/_core/src/multiarray/ctors.c
index 0723e54f34..8379480389 100644
--- a/numpy/_core/src/multiarray/ctors.c
+++ b/numpy/_core/src/multiarray/ctors.c
@@ -1839,7 +1839,7 @@ PyArray_CheckFromAny_int(PyObject *op, PyArray_Descr *in_descr,
         else if (in_descr && !PyArray_ISNBO(in_descr->byteorder)) {
             PyArray_DESCR_REPLACE(in_descr);
         }
-        if (in_descr && in_descr->byteorder != NPY_IGNORE) {
+        if (in_descr && in_descr->byteorder != NPY_IGNORE && in_descr->byteorder != NPY_NATIVE) {
             in_descr->byteorder = NPY_NATIVE;
         }
     }

@hawkinsp in your opinion, are changes like the one above the sort of thing we need to do now, or is this the sort of thing it makes sense to write a suppression for? I'll also wait for Sebastian to comment before sending that in, since I don't have context on whether a bigger refactor to avoid writing to effectively global descriptor objects makes sense.
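
Editorial sketch of why the extra condition in the diff helps. This is a pure-Python toy model, not NumPy code: `ToyDescr`, `normalize_unguarded`, and `normalize_guarded` are hypothetical names, and the byte-order characters are stand-ins for `NPY_NATIVE` and `NPY_IGNORE`. It only counts stores. Assuming the shared descriptor is already in native byte order, the original condition still stores to it on every call, while the guarded condition performs no store at all, so concurrent readers have no write to race with.

```python
# Stand-in values for NPY_NATIVE and NPY_IGNORE (illustrative only).
NATIVE = "="
IGNORE = "|"


class ToyDescr:
    """Hypothetical stand-in for a shared dtype descriptor that counts stores."""

    def __init__(self, byteorder):
        self._byteorder = byteorder
        self.writes = 0

    @property
    def byteorder(self):
        return self._byteorder

    @byteorder.setter
    def byteorder(self, value):
        # Every assignment counts as a store, even if the value is unchanged.
        self.writes += 1
        self._byteorder = value


def normalize_unguarded(descr):
    # Mirrors the condition before the diff: stores even when already native.
    if descr is not None and descr.byteorder != IGNORE:
        descr.byteorder = NATIVE


def normalize_guarded(descr):
    # Mirrors the condition after the diff: skips the redundant store.
    if descr is not None and descr.byteorder not in (IGNORE, NATIVE):
        descr.byteorder = NATIVE


shared = ToyDescr(NATIVE)
for _ in range(3):
    normalize_unguarded(shared)
unguarded_writes = shared.writes

shared = ToyDescr(NATIVE)
for _ in range(3):
    normalize_guarded(shared)
guarded_writes = shared.writes

print(unguarded_writes, guarded_writes)
```

The model says nothing about the non-native path, where the diff context shows the descriptor being replaced via `PyArray_DESCR_REPLACE` before the store.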

@hawkinsp
Contributor Author

There's no such thing as a benign race, e.g., see usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf or similar. Even things you think might be safe at a hardware level are extremely difficult to reason about as soon as there is an optimizing compiler involved. So it's probably sensible to fix pretty much anything tsan tells you about.

Is this particular bug the most urgent? No. For my own purposes, I have added a tsan suppression and moved on with my life, but I don't want the issue to be forgotten, and hence this bug. But, contrast that with, say, #28048 which I can't work around: I'd rate that sort of issue higher priority.
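
Editorial sketch of the suppression workaround mentioned above. The thread does not show the suppression that was actually used; this assumes a ThreadSanitizer suppressions file with a `race:` entry naming the writing function from the report (`PyArray_CheckFromAny_int`), passed through the `TSAN_OPTIONS` environment variable. The helper `tsan_env` is a hypothetical name.

```python
import os
import tempfile


def tsan_env(suppressed_functions, base_env=None):
    """Write a TSAN suppressions file and return (env, path) pointing at it.

    Hypothetical helper: one `race:<name>` line is written per function name.
    """
    fd, path = tempfile.mkstemp(suffix=".supp", text=True)
    with os.fdopen(fd, "w") as f:
        for name in suppressed_functions:
            f.write(f"race:{name}\n")
    env = dict(os.environ if base_env is None else base_env)
    # Note: this overwrites any TSAN_OPTIONS already present in the environment.
    env["TSAN_OPTIONS"] = f"suppressions={path}"
    return env, path


# Function name taken from frame #0 of the "Previous write" stack in the report.
env, supp_path = tsan_env(["PyArray_CheckFromAny_int"], base_env={})
print(env["TSAN_OPTIONS"])
```

The returned `env` could then be passed to `subprocess.run(..., env=env)` when launching a TSAN-instrumented interpreter. A suppression only hides the report and does not remove the race.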

@ngoldbaum
Member

But, contrast that with, say, #28048 which I can't work around: I'd rate that sort of issue higher priority.

For sure, that's definitely high up on my priority queue to fix, and thank you for finding and reporting these issues.

@seberg
Member
seberg commented Jan 10, 2025

No arcane knowledge. You just need to look two lines up: it should be part of the `else if` (and then it doesn't even need the `if`).

@seberg
Member
seberg commented Jan 10, 2025

Sorry, nonsense of course... Yeah, I think the fix is good. The code just has a slightly weird flow, and I'm not sure how well the descr replacement works for user dtypes, but that is a different issue.

@ngoldbaum ngoldbaum added the 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) label Jan 13, 2025