PERF: Improve performance of special attribute lookups #21423
Conversation
Cache PyUnicode object with value __array_ufunc__ for attribute lookup
force-pushed from 8dc6a24 to ee8c683
Oh, I had not realized that this branch kicks in for our scalars. I think this makes sense in any case, although I am curious about cleaning the code up a bit more. Would you be up for that? The main point is that I am pretty sure there are two things to note here:
There are two ways to go about it, either introduce a …
I don't want to overstretch you though. So if you like, I could also try to do the larger refactor and you can review it instead?
@seberg I can take a shot at refactoring. There is an issue though with both …
A screenshot of profiling results from calculating … About 37% of the … There are some options to eliminate the exception generation:
Since I am new to numpy I cannot see the implications of these options (and perhaps there are more), so I welcome your thoughts on this.
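For a feel of the cost being discussed, here is a minimal sketch, not the original profiling setup; the attribute names and the use of `np.float64` are only examples. It times a lookup of an attribute that exists on the scalar type against one that does not, where the miss typically has to create and discard an `AttributeError` internally:

```python
import timeit

import numpy as np

x = np.float64(1.0)
g = {"x": x}

# An attribute that exists on the scalar type ("real") versus one that does
# not; on a type object the miss has to create and then discard an
# AttributeError inside CPython before the default is returned.
hit = timeit.timeit('getattr(type(x), "real", None)', globals=g, number=1_000_000)
miss = timeit.timeit('getattr(type(x), "__not_an_attribute__", None)',
                     globals=g, number=1_000_000)
print(f"hit:  {hit:.3f} s")
print(f"miss: {miss:.3f} s")
```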
@seberg The attribute lookup is slow because numpy looks for the attribute on the type and not on the instance. The following benchmark shows what happens:
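A minimal sketch of such a benchmark, assuming a `np.float64` operand and a `timeit` setup (the original script may have differed):

```python
import timeit

import numpy as np

x = np.float64(1.0)
g = {"x": x}

# Attribute lookup on the instance versus on the type (the latter is what the
# C code effectively does via tp_getattro of the PyTypeObject).
t_instance = timeit.timeit('getattr(x, "__array_ufunc__", None)',
                           globals=g, number=1_000_000)
t_type = timeit.timeit('getattr(type(x), "__array_ufunc__", None)',
                       globals=g, number=1_000_000)
print(f"instance lookup: {t_instance:.3f} s")
print(f"type lookup:     {t_type:.3f} s")
```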
Result:
The difference in execution times is because in the first two tests we follow this path: … and in the last two we follow …
Not sure why the attribute lookup for types should be slower; I will look into this.
About the earlier option:
Looking up on the type is correct; I am surprised that looking up on the instance is so much faster! Squinting at the Python code, it might have to do with metaclass handling, but these don't even have a metaclass (well, besides `type` itself).
But, no matter all of those other things, storing the interned string somewhere and working with that exclusively still seems like a good start; that is independent of the other things you found?
The third option works:
It results in a 5% speedup (and perhaps more when added on the C side). I will first complete refactoring the unicode string conversion and then continue on the attribute lookup.
Yeah, but I had not double checked that it will not have any effect on binary operators in weird subclassing situations. Not that this is likely to matter in practice :/. Defining …
The alternative would be to use …
Squinting at it, if it is important to not slow down unknown/arbitrary objects in some of these places, we could even add a bloom filter (a bloom filter is like a mini set; see the sketch below). Let's focus on the other things first!
Interesting CPython issue, so is it the error formatting mainly/only? I guess CPython would have to add a fast-path for …
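For reference, a bloom filter is a small probabilistic set: membership tests can give false positives but never false negatives, so a definite miss can skip the expensive lookup. A minimal illustrative sketch in plain Python (nothing like this exists in NumPy):

```python
class TinyBloom:
    """Tiny illustrative bloom filter: a probabilistic set where membership
    tests may report false positives but never false negatives, so a miss
    can skip the expensive attribute lookup entirely."""

    def __init__(self, bits=64):
        self.bits = bits
        self.mask = 0

    def _positions(self, name):
        h = hash(name)
        # Two cheap bit positions derived from one hash value.
        return (h % self.bits, (h >> 7) % self.bits)

    def add(self, name):
        for p in self._positions(name):
            self.mask |= 1 << p

    def might_contain(self, name):
        return all(self.mask & (1 << p) for p in self._positions(name))


bf = TinyBloom()
bf.add("__array_ufunc__")
assert bf.might_contain("__array_ufunc__")   # never a false negative
# Most other attribute names will miss and could skip the slow lookup path.
```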
Ah, so you know what I was looking at. The binary operator logic in question is the logic following here: numpy/numpy/core/src/common/binop_override.h, line 131 (commit d40f630).
Refactor code so the unicode names are interned
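The refactor caches the attribute name as a static, interned `PyUnicode` object on the C side; a rough Python-level analogy of that pattern (the cached name and helper function below are illustrative, not NumPy API):

```python
import sys

# Build (and intern) the attribute name once at module scope and reuse it,
# instead of constructing the string on every lookup.
_ARRAY_UFUNC_NAME = sys.intern("__array_ufunc__")

def has_array_ufunc(obj):
    # Look up on the type, mirroring what the C code does via tp_getattro.
    return getattr(type(obj), _ARRAY_UFUNC_NAME, None) is not None
```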
Frankly, we should not use a header there anymore; it should be a C file now, but that is better left for another PR/day.
Going to put this in once CI is happy, thanks @eendebakpt! We should have a few benchmarks that notice this just fine. At least one of the array-coercion benchmarks does …
Further, the benchmarks for …
@eendebakpt, since you managed to run the benchmarks: if you like, it would be cool to follow up with a small ufunc benchmark to test ufunc call overhead on NumPy scalars (and maybe also 0-D or just very small arrays). But I am happy with these changes, so will put them in, thanks.
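A rough sketch of the suggested follow-up, in the asv style used under numpy's `benchmarks/` directory (class name, method names, and operand sizes are hypothetical):

```python
import numpy as np

class ScalarUFuncOverhead:
    # Hypothetical asv benchmark: ufunc call overhead on a NumPy scalar,
    # a 0-D array, and a very small array.
    def setup(self):
        self.scalar = np.float64(1.5)
        self.zero_d = np.array(1.5)
        self.small = np.arange(4, dtype=np.float64)

    def time_add_scalar(self):
        np.add(self.scalar, self.scalar)

    def time_add_zero_d(self):
        np.add(self.zero_d, self.zero_d)

    def time_add_small(self):
        np.add(self.small, self.small)
```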
This PR reduces the overhead of `ufunc_generic_fastcall`, which is used by many numpy methods. For many input arguments a check is performed on the existence of the `__array_ufunc__` attribute. The check involves a call to `tp_getattro` from `PyTypeObject`, which requires a `PyUnicode` object. This PR avoids constructing the `PyUnicode` object for each invocation of the attribute lookup.

Benchmark
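A minimal stand-in for such a micro-benchmark (the choice of `np.add`, the operands, and the repetition count are assumptions):

```python
import timeit

import numpy as np

scalar = np.float64(1.0)
small = np.arange(10.0)
g = {"np": np, "scalar": scalar, "small": small}

# Every np.add call goes through ufunc_generic_fastcall, which probes the
# operands for an __array_ufunc__ attribute before dispatching.
print("scalar:", timeit.timeit("np.add(scalar, scalar)", globals=g, number=100_000))
print("small: ", timeit.timeit("np.add(small, small)", globals=g, number=100_000))
```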
Results
Results of numpy benchmarks
Benchmark:
The results show some tests with improved and some with decreased performance. It looks like some instability on my system.
Most changes are in `bench_ufunc_strides.BinaryInt`; on a re-run those values changed and the increases or decreases do not look systematic.
Full results