TSAN failures seen running PyO3 tests with the free-threaded build #130421
Comments
For future reference:
py_get_monotonic_clock (macOS only)
The initialization of the global
Lines 1160 to 1166 in 38642bf
long_from_non_binary_base
Non-thread-safe initialization of global state. Either initialize this during runtime initialization (if it's fast enough) or use
Lines 2835 to 2839 in f963239
_PyType_AllocNoTrack
Line 2237 in 38642bf
Line 8667 in 38642bf
get_or_create_weakref and PyType_FromMetaclass
Seems pretty similar to the
cpython/Objects/weakrefobject.c
Line 413 in 38642bf
Line 5055 in 38642bf
best_base / type_ready
Not sure, but seems related to the above two issues.
_Py_IncRef and _PyObject_SetDeferredRefcount
(We could also consider trying to make The type shouldn't be exposed to other threads, but it might be getting accessed via a Line 2549 in 38642bf
PyTuple_Pack / mi_page_free_list_extend
Possible duplicate of:
@ngoldbaum - I think a few of these races, such as https://github.com/PyO3/pyo3/blob/b07871d962d56e66243d2c52a700e978545ff417/tests/test_gc.rs#L185
Is it possible to remove the
Likely yes, I'll take a look at that.
I tried disabling the call to
Here I'm using a suppression file with the following content:
Windows and macOS require precomputing a "timebase" in order to convert OS timestamps into nanoseconds. Retrieve and compute this value during runtime initialization to avoid data races when accessing the time.
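The idea in that commit message can be sketched outside of C. The following is only an illustration in Python, not the CPython change itself: the names `Timebase`, `runtime_init`, `ticks_to_ns`, and `convert_concurrently` are hypothetical, and the 125/3 ratio is an assumed example value rather than something read from the OS. The point is that the shared ratio is written exactly once, before any worker thread starts, so threads only ever read it.

```python
import threading


class Timebase:
    """Hypothetical stand-in for the OS tick-to-nanosecond ratio."""

    def __init__(self, numer, denom):
        self.numer = numer
        self.denom = denom

    def to_ns(self, ticks):
        # Integer arithmetic avoids floating-point rounding.
        return ticks * self.numer // self.denom


_TIMEBASE = None  # written once in runtime_init(), read-only afterwards


def runtime_init():
    """Compute the timebase eagerly, before any threads exist."""
    global _TIMEBASE
    # Assumed example ratio; a real runtime would query the OS here.
    _TIMEBASE = Timebase(numer=125, denom=3)


def ticks_to_ns(ticks):
    # No lazy "if not initialized" branch, so nothing to race on.
    return _TIMEBASE.to_ns(ticks)


def convert_concurrently(ticks, n_threads=4):
    """Convert the same tick count from several threads at once."""
    results = [None] * n_threads

    def worker(i):
        results[i] = ticks_to_ns(ticks)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `runtime_init()` once at startup and then `ticks_to_ns()` from any thread mirrors the "compute during runtime initialization" approach; the lazy alternative would need some form of synchronization around the first call.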
I'm seeing TSAN warnings running the PyO3 tests using CPython commit 38642bf
I've done this on an M3 MacBook Pro running macOS Sequoia as well as @nascheme's cpython_sanity docker image, which has LLVM 20 installed (as well as Python 3.13 with TSAN and some packages, but I didn't use that). I think it's only possible to run TSAN on both the rust code and CPython using LLVM 20, and I can't easily install that on my Mac right now since it's not yet packaged on homebrew.
See this comment in the PyO3 repo if you want to use the docker image, there are some small tweaks you need to do before it will work correctly.
On an ARM Mac, I installed llvm from homebrew and then did
CONFIGURE_OPTS="--with-thread-sanitizer" pyenv install 3.14t-dev
to get a TSAN CPython build. You'll also need to install a rust toolchain. You'll also need a copy of PyO3 checked out to this branch.
Because homebrew doesn't have LLVM 20, I had to resort to just running the cargo tests as normal using a CPython with TSAN. I think this should still detect races happening inside CPython.
Here is the full output from one invocation on my Mac: https://gist.github.com/ngoldbaum/e198d87149617ecdaf881f29a03b8126
Here is a sampling of the warning summaries:
You can ignore all the test_compiler_error messages - nightly rust always has compiler error message failures.
Another way to trigger these failures is with cargo stress, which runs the tests in a loop to try to trigger safety issues like this. I have a hacked together version of cargo stress on this branch that makes it so that instead of crashing if a thread writes to stderr, it prints the stderr to the terminal and continues. If you run TSAN with TSAN_OPTIONS=exit_code=0, my version of cargo stress will happily continue running after the first TSAN warning. This is a good way to generate lots of warnings quickly without waiting to rerun the full test suite manually.
Here are some additional summaries that I see in a cargo stress run:
And here is the full terminal output (this ran for about 10 seconds before I killed it with ctrl-c): https://gist.github.com/ngoldbaum/1d1e29c8e10f0ac979ef27a95c73d39f
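For anyone without the patched cargo stress, the "keep going and print stderr" behaviour described above can be approximated with a small wrapper. This is a rough sketch, not the actual cargo stress code: the `stress` function and its parameters are hypothetical, and it simply reruns an arbitrary command with TSAN_OPTIONS set, echoing any stderr output instead of stopping.

```python
import os
import subprocess
import sys


def stress(cmd, iterations, tsan_options="exit_code=0"):
    """Run cmd repeatedly; return the non-empty stderr text of each run."""
    # exit_code=0 keeps a TSAN report from turning into a failing exit status.
    env = dict(os.environ, TSAN_OPTIONS=tsan_options)
    reports = []
    for i in range(iterations):
        proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
        if proc.stderr:
            # Echo the output and carry on, so later iterations still run.
            print(f"--- iteration {i} stderr ---\n{proc.stderr}", file=sys.stderr)
            reports.append(proc.stderr)
    return reports
```

Something like `stress(["cargo", "test"], iterations=20)` would then collect whatever each run wrote to stderr, TSAN reports included, without waiting to rerun the suite by hand.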
When I try to do the same tests in the docker container using a version of 3.14t-dev I built on the container, I don't see any of the TSAN reports seen above. Maybe they don't happen on x86_64?
Also note that there is a race inside PyO3 triggered by the PyO3 test test_thread_safety_2; you may see that if you are running the tests inside the docker container. There are also two test failures due to unexpected panics that I only see under TSAN in the docker container. I'm not sure what's happening with the failures yet.