Ufunc calls on scalars are very slow #11232
Comments
Does this have a benchmark? If not, would be good to add one before digging any further.
IIRC one of the slowest things right now may be the type resolution loop of the ufunc: I remember discussions that implementing a hash table for it would be good, and I don't think that has been done. A few years back some optimizations were done in the scalar paths, in a GSoC I think, so a lot of the lowest-hanging fruit is likely gone (though new ones might have come up, I guess).
@seberg - I'm mostly looking to see if my idea of intercepting already at the
Yeah, it sounds like a good idea; a few simple fast paths like that for scalars could go a long way. If we can get such a thing right in a not too complex way, maybe it can even help with reducing the code complexity/duplication that is currently really annoying for scalars? (I believe in scalars, but that extra code I do not believe in.)
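A rough Python-level sketch of what such a scalar fast path means semantically. This is an assumption-laden illustration, not the proposed C change: `call_ufunc` and `_MATH_FAST_PATH` are hypothetical names, only plain Python floats and three ufuncs are handled, and a wrapper written in Python like this one is not expected to be faster than the existing path (the real gain would have to come from doing the check in C).

```python
import math

import numpy as np

# Hypothetical table mapping a ufunc to an equivalent `math` function.
_MATH_FAST_PATH = {np.sin: math.sin, np.cos: math.cos, np.exp: math.exp}


def call_ufunc(ufunc, *args, **kwargs):
    """Call `ufunc`, short-circuiting the single-Python-float case."""
    if not kwargs and len(args) == 1 and type(args[0]) is float:
        fast = _MATH_FAST_PATH.get(ufunc)
        if fast is not None:
            try:
                # Wrap the result so the return type stays a NumPy scalar.
                return np.float64(fast(args[0]))
            except (ValueError, OverflowError):
                # math raises where NumPy returns nan/inf (e.g. sin(inf),
                # exp(1000.0)); fall through to the regular ufunc call.
                pass
    return ufunc(*args, **kwargs)


print(call_ufunc(np.sin, 1.0))
print(call_ufunc(np.sin, np.array([0.0, 1.0])))
```

Anything that is not an exact `float`, including arrays, NumPy scalars and calls with keyword arguments such as `out=`, takes the ordinary ufunc route unchanged.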
@eric-wieser - good point about the benchmarks. From a quick look, it looks like there are benchmarks for array scalars but not for true scalars (neither numpy nor python ones).
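A benchmark covering those missing cases could look roughly like the sketch below. It follows the usual asv conventions (`params`, `param_names`, `setup`, `time_*` methods); the class name `ScalarUfuncSuite` and the parameter labels are made up for illustration and do not refer to an existing benchmark.

```python
import numpy as np


class ScalarUfuncSuite:
    """asv-style benchmark: ufunc calls on different kinds of scalar input."""

    params = ["python float", "numpy float64", "0-d array"]
    param_names = ["kind"]

    def setup(self, kind):
        # Build the input once, outside the timed region.
        values = {
            "python float": 1.5,
            "numpy float64": np.float64(1.5),
            "0-d array": np.array(1.5),
        }
        self.x = values[kind]

    def time_sin(self, kind):
        # Single-input ufunc on a scalar.
        np.sin(self.x)

    def time_add(self, kind):
        # Two-input ufunc on scalars.
        np.add(self.x, self.x)
```

asv would call `setup` before each timed method and run every `time_*` method once per entry in `params`.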
@seberg - there is room for improvement... I now added a few timings on top.
Here is one of the type lookup optimizations:
Still would be good to see if one cannot optimize ufuncs called on scalars a bit. Timings on numpy-dev (nearly 2.0)
It is well known that ufunc calls on scalars are rather slow, but it is probably good to have a summary of why, for which it is useful to go along the `ufunc_generic_call` path. I got only partway, but one possible solution might be for the scalars to already get overridden in `CheckOverride`, i.e., to treat them as if they had their own `__array_ufunc__` (with priority even below that of ndarray; an actual `__array_ufunc__` calling `math` is slightly slower than our present state).

- `PyUFunc_CheckOverride`: for non-arrays (thus including scalars), this checks whether the scalar has `__array_ufunc__`. Easy to avoid if our whole API is available - needs ENH: implement nep 0015: merge multiarray and umath #10915.
- `PyUFunc_GenericFunction`: to be done (will edit).
- `make_full_arg_tuple`: EDIT now fast (with MAINT: ensure we do not create unnecessary tuples for outputs #11231).
- `_find_array_wrap` -> `_find_array_method`: skips arrays and scalars, so should be reasonably fast (though a subclass check for `Generic` is done before type checks on python objects in `PyArray_IsAnyScalar` (in `ndarrayobject.h`)).

Simple timings
Single-input ufunc, comparing with `math`
Somewhat more random, for addition
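A minimal sketch of how timings for these two cases could be collected with the standard-library `timeit` module. The helper `best_time` and the particular set of cases are assumptions chosen for illustration; no measured numbers are implied, and the results depend on the machine and NumPy version.

```python
import math
import operator
import timeit

import numpy as np


def best_time(func, number=10000, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


x = 1.0              # Python float
xs = np.float64(x)   # NumPy scalar

cases = {
    # Single-input ufunc, comparing with math
    "math.sin(float)": lambda: math.sin(x),
    "np.sin(float)": lambda: np.sin(x),
    "np.sin(np.float64)": lambda: np.sin(xs),
    # Addition
    "operator.add(float, float)": lambda: operator.add(x, x),
    "np.add(float, float)": lambda: np.add(x, x),
    "np.add(np.float64, np.float64)": lambda: np.add(xs, xs),
}

for label, func in cases.items():
    print(f"{label:32s} {best_time(func) * 1e9:10.0f} ns per call")
```

Each case pays the same lambda-call overhead, so the differences between rows are more meaningful than the absolute values.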