MNT: disable the allocator cache for nogil builds #26251
Conversation
What is the performance penalty for disabling the caches?
I am not sure how the nogil branch works, but I think I once heard about it using … OTOH, that may also be something for the future to worry about. (EDIT: Actually, I'm not even sure we need "Raw", since I think these need the GIL anyway.)
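(For context, and assuming "Raw" above refers to CPython's PyMem_Raw* allocator family, here is a minimal C sketch of the distinction. The wrapper function names are illustrative, not NumPy code.)

```c
#include <Python.h>

/* PyMem_RawMalloc/PyMem_RawFree are documented as safe to call without
 * holding the GIL, while PyMem_Malloc/PyMem_Free must only be used while
 * the GIL is held (on GIL-enabled builds). */

static void *
alloc_no_gil_needed(size_t n)
{
    /* Safe from any thread, whether or not the GIL is held. */
    return PyMem_RawMalloc(n);
}

static void *
alloc_gil_required(size_t n)
{
    /* Caller must hold the GIL on a regular build. */
    return PyMem_Malloc(n);
}
```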
I ran the benchmarks with a nogil python build but with the GIL explicitly enabled to avoid any threading bugs in the benchmarks. Unfortunately asv is sort of broken with nogil python at the moment, so I had to run the comparison manually, and the tools in asv to compare benchmark results don't work with this mode. The first file in the gist is the core benchmark suite with this PR applied, the second is without. https://gist.github.com/ngoldbaum/b632c35e5359ea78d90f541ed5cc21d2 Eyeballing the results, I see approximately a 10% performance penalty for operations on small arrays. As a general rule for the nogil work, I'm hoping to end up with a version of numpy that runs stably with nogil python, and that goal may initially be achieved at the cost of single-threaded performance in the nogil build of python. Does that sound like a reasonable rule of thumb for this work?
Also, I'm happy to add a section to the tracking issue where we keep track of places where we sacrifice single-threaded performance to avoid threading bugs, as an opportunity to come back and add e.g. locking or a more sophisticated implementation later when we're optimizing nogil numpy.
That sounds like the right thing to do to me. It's too early to micro-optimize. And optimizing performance will be a lot easier to do later, once …
I like how this is isolated by the …
We talked about this yesterday at the community meeting and there seemed to be agreement that focusing on stability at the cost of performance is OK for this work, at least as long as there aren't any catastrophic single-threaded performance implications. There also shouldn't be negative performance implications for the "normal" python build. I'll open a followup issue to track spots where we should come back and improve single-threaded performance and link it from the tracking issue. @seberg, you weren't at the meeting yesterday and I just want to double-check, does this plan sound reasonable to you?
Yes of course, even if this ends up lost in time, I don't think it is a huge issue. I was just a bit curious if using the default allocator (and thus …
OK great, let's bring this in then.
These caches are implicitly locked by the GIL, so in the nogil build I see use-after-free errors in tests that use python threads. See for example this ASAN error in one of the stringdtype tests that does multithreaded array mutation and access:
I initially thought we could keep the caching by marking the caches as thread-local, but that won't work for allocations happening in dynamically created threads. Instead, I think the only safe thing we can do without adding a mutex guarding access to the caches is to disable the cache entirely.
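To make the change concrete, here is a minimal sketch of the pattern involved; this is not NumPy's actual alloc.c, and the names (block_cache, NCACHE, BLOCK_SIZE) are illustrative. A free-list cache like this is only correct because the GIL serializes access to ncached; on a free-threaded build (where Py_GIL_DISABLED is defined) two threads can pop the same pointer and produce exactly the kind of use-after-free described above, so the cache path is compiled out and every call falls through to plain malloc/free.

```c
#include <Python.h>  /* defines Py_GIL_DISABLED on free-threaded builds */
#include <stdlib.h>

#define NCACHE 7
#define BLOCK_SIZE 16384  /* this sketch caches a single fixed block size */

static void *block_cache[NCACHE];
static int ncached = 0;

static void *
cached_alloc(void)
{
#ifndef Py_GIL_DISABLED
    /* GIL-protected fast path: reuse a previously freed block. */
    if (ncached > 0) {
        return block_cache[--ncached];
    }
#endif
    return malloc(BLOCK_SIZE);
}

static void
cached_free(void *p)
{
#ifndef Py_GIL_DISABLED
    /* GIL-protected: stash the block for later reuse instead of freeing it. */
    if (ncached < NCACHE) {
        block_cache[ncached++] = p;
        return;
    }
#endif
    free(p);
}
```

NumPy's real caches are bucketed by allocation size and cover a few different allocation paths, but the shape of the guard is the same idea.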
Happy to entertain alternative approaches.
With this change, when the GIL is disabled,
numpy/_core/tests/test_stringdtype.py::test_threaded_access_and_mutation
goes from segfaulting every time it's executed to working every time I run it.