Rebase, WIP: implement __rop__ logic for scalar operators #8126
This is the second half of the fix for numpygh-1296 (trac 698). None + np.longdouble(3) would cause either a RuntimeError or an interpreter crash. The swapped case (np.longdouble(3) + None) was fixed in 2008 (376d483). Binary operators written in C must treat both arguments equivalently, because the same function is called for both __op__ and __rop__. This bug fix is necessary because the longdouble and clongdouble operators did not maintain this symmetry.
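The symmetry requirement can be illustrated in pure Python. This is only a sketch: `Scalar` and its `_add` helper are hypothetical stand-ins, not NumPy's C implementation. The point is that one shared function serves both `__add__` and `__radd__`, so it cannot assume which argument is the foreign object.

```python
class Scalar:
    """Hypothetical scalar wrapper used only for illustration."""

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _add(a, b):
        # Shared by __add__ and __radd__: either argument may be the
        # foreign object, so both are unwrapped in exactly the same way.
        operands = []
        for x in (a, b):
            if isinstance(x, Scalar):
                operands.append(x.value)
            elif isinstance(x, (int, float)):
                operands.append(x)
            else:
                # Unknown type on either side: let Python try the other
                # operand, or raise TypeError if nothing handles it.
                return NotImplemented
        return Scalar(operands[0] + operands[1])

    def __add__(self, other):
        return Scalar._add(self, other)

    def __radd__(self, other):
        return Scalar._add(other, self)
```

With this layout, `Scalar(3) + None` and `None + Scalar(3)` both end in an ordinary `TypeError` raised by the interpreter, rather than one order being handled and the other not.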
Previously it took 10x longer to do np.type1(1) op np.type2(2) when type2 could not be safely cast to type1 than the equivalent operation with type1 == type2 or when type2 could be safely upcast to type1. This was due to falling back to calling the equivalent ufunc without first trying to defer the call to the scalar operator of type2 (and safely upcasting type1).

So previously:

    In [2]: a = np.float32(4)
    In [3]: b = np.float64(4)
    In [4]: timeit a * a
    10000000 loops, best of 3: 69.1 ns per loop
    In [5]: timeit b * b
    10000000 loops, best of 3: 69.5 ns per loop
    In [6]: timeit a * b
    1000000 loops, best of 3: 1.29 µs per loop
    In [7]: timeit b * a
    10000000 loops, best of 3: 116 ns per loop

and with these changes:

    In [2]: a = np.float32(4)
    In [3]: b = np.float64(4)
    In [4]: timeit a * a
    10000000 loops, best of 3: 74 ns per loop
    In [5]: timeit b * b
    10000000 loops, best of 3: 73.7 ns per loop
    In [6]: timeit a * b
    10000000 loops, best of 3: 181 ns per loop
    In [7]: timeit b * a
    10000000 loops, best of 3: 125 ns per loop

Operations on mixed-type scalars that result in a scalar of a new type still use the ufunc fallback and hit the speed penalty, e.g. F op d -> D:

    In [2]: a = np.complex64(1+2j)
    In [3]: b = np.double(4)
    In [4]: timeit a * b
    1000000 loops, best of 3: 1.22 µs per loop
When evaluating np.type1(2) op np.type2(4), don't fall back to calling the op ufunc if the output type is neither np.type1 nor np.type2. Instead, defer the op call to the operator of the correct output type. This speeds up things like F * d -> D by about 5x and, together with the previous changes to the safely castable case, causes the scalar power operators to always return a floating point type when raising integer types to negative powers. Fixes numpygh-7449.
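The dispatch rule described above can be sketched as follows. Everything here is an assumption made for illustration: the `Kind` tuple, the `result_kind` promotion rule and the `multiply` helper are hypothetical and only model four type codes (f, d, F, D), with `bits` standing for per-component precision. It is not the code in this pull request.

```python
from collections import namedtuple

# Hypothetical model of four scalar kinds, named after their type codes.
Kind = namedtuple("Kind", "name bits is_complex")
f = Kind("f", 32, False)   # float32
d = Kind("d", 64, False)   # float64
F = Kind("F", 32, True)    # complex64
D = Kind("D", 64, True)    # complex128
KINDS = (f, d, F, D)


def result_kind(a, b):
    """Toy promotion rule: widest precision, complex if either side is."""
    bits = max(a.bits, b.bits)
    is_complex = a.is_complex or b.is_complex
    for kind in KINDS:
        if kind.bits == bits and kind.is_complex == is_complex:
            return kind
    raise TypeError("no common kind")


def multiply(a_kind, a_val, b_kind, b_val):
    """Report which operator would handle the call, then compute the product."""
    out = result_kind(a_kind, b_kind)
    if out == a_kind:
        handler = "left"      # b casts safely to a's type
    elif out == b_kind:
        handler = "right"     # defer to b's operator, upcasting a
    else:
        handler = "result"    # neither input type: defer to the output type
    convert = complex if out.is_complex else float
    return out.name, handler, convert(a_val) * convert(b_val)
```

For example, `multiply(F, 1 + 2j, d, 4.0)` selects the `"result"` path with output kind `D`, which is the F * d -> D case that previously fell through to the ufunc.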
☔ The latest upstream changes (presumably 0a02bb6) made this pull request unmergeable. Please resolve the merge conflicts.
Pushing this off to 1.13.0. Not sure how much applies.
@ewmoore ISTR that you have run out of spare time ;) I think we do need to take a look at the
__rop__
logic for scalar operators
Pushing this off again, don't have time to work on this before the release.
Pushing off to 1.16.
Rebase of #7459.
Re. #7449.
This is a work in progress, both because I'd like to get some feedback on the approach and because there are currently two test failures that I don't yet understand.