8000 PERF: add a fast path to ufunc type resolution by ngoldbaum · Pull Request #27859 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

PERF: add a fast path to ufunc type resolution #27859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ngoldbaum
Copy link
Member

Fixes #27786.

Our group is thinking about ways to add better benchmarking so we don't regress on issues like this, see Quansight-Labs/free-threaded-compatibility#117. For now if it's alright I'd just like to merge this fix.

With the test script in the issue and this PR applied, I get the following scaling:

mflops_array_length_1000

generating this made my laptop pretty hot!

@ngoldbaum ngoldbaum added 09 - Backport-Candidate PRs tagged should be backported 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) labels Nov 26, 2024
@ngoldbaum ngoldbaum added this to the 2.1.4 release milestone Nov 26, 2024
PyArray_DTypeMeta *op_dtypes[],
npy_bool legacy_promotion_is_possible)
{
PyObject *info = PyArrayIdentityHash_GetItem(ufunc->_dispatch_cache,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is thread-safe because we may read from the hash table while another thread is concurrently modifying it, including possibly resizing the table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😓 That's a good point.

Is there a thread-safe way to do it without an RW lock?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make PyArrayIdentityHash a concurrent hash table. That's generally hard, but there are a few things that make this specific case easier:

  • We only care about concurrent read performance
  • Entries are never deleted or overwritten

But we'd still have to deal with hash table resizing, which makes it more complex.

I think a RW lock will be simpler as long as it provides good enough read performance.

@ngoldbaum
Copy link
Member Author

I'm going to try again using an RW-lock

@ngoldbaum ngoldbaum closed this Dec 2, 2024
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Dec 3, 2024
@charris charris removed this from the 2.1.4 release milestone Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: poor multithreaded performance scaling for small arrays (nogil)
3 participants
0