WIP: implement __rop__ logic for scalar operators by ewmoore · Pull Request #7459 · numpy/numpy · GitHub


WIP: implement __rop__ logic for scalar operators #7459


Closed
wants to merge 3 commits into from
ENH: implement __rop__ logic for mixed type scalar-scalar operators
Previously, np.type1(1) op np.type2(2) took more than 10x longer when
type2 could not be safely cast to type1 than the equivalent operation
with type1 == type2, or than when type2 could be safely upcast to type1.
This was because the operator fell back to calling the equivalent ufunc
without first trying to defer to the scalar operator of type2 (after
safely upcasting type1).
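The deferral described above mirrors Python's own reflected-operator
protocol: if a.__mul__(b) returns NotImplemented, Python then tries
b.__rmul__(a). Below is a minimal pure-Python sketch of that idea. It
assumes two hypothetical toy classes, F32 and F64, ordered by a "rank"
attribute that stands in for safe casting; it models the dispatch only
and is not NumPy's C implementation.

```python
class ToyScalar:
    """Hypothetical toy scalar; a higher `rank` means a wider type."""
    rank = 0

    def __init__(self, value):
        self.value = value

    def _can_take(self, other):
        # Toy stand-in for a safe cast: `other` is no wider than `self`.
        return isinstance(other, ToyScalar) and other.rank <= self.rank

    def __mul__(self, other):
        if not self._can_take(other):
            # Wider right operand: give up so Python tries other.__rmul__.
            return NotImplemented
        return type(self)(self.value * other.value)

    def __rmul__(self, other):
        # Called for `other * self` after other.__mul__ returned
        # NotImplemented; the result takes self's (wider) type.
        if not self._can_take(other):
            return NotImplemented
        return type(self)(other.value * self.value)


class F32(ToyScalar):
    rank = 1


class F64(ToyScalar):
    rank = 2


a, b = F32(4.0), F64(4.0)
print(type(a * b).__name__)  # F32.__mul__ defers, F64.__rmul__ handles it
print(type(b * a).__name__)  # F64.__mul__ handles it directly
```

Both mixed products end up in F64's code without any generic fallback,
which is the shape of the fast path this commit adds for NumPy scalars.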

So, previously:

In [2]: a = np.float32(4)

In [3]: b = np.float64(4)

In [4]: timeit a * a
10000000 loops, best of 3: 69.1 ns per loop

In [5]: timeit b * b
10000000 loops, best of 3: 69.5 ns per loop

In [6]: timeit a * b
1000000 loops, best of 3: 1.29 µs per loop

In [7]: timeit b * a
10000000 loops, best of 3: 116 ns per loop

And with these changes:

In [2]: a = np.float32(4)

In [3]: b = np.float64(4)

In [4]: timeit a * a
10000000 loops, best of 3: 74 ns per loop

In [5]: timeit b * b
10000000 loops, best of 3: 73.7 ns per loop

In [6]: timeit a * b
10000000 loops, best of 3: 181 ns per loop

In [7]: timeit b * a
10000000 loops, best of 3: 125 ns per loop
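The IPython sessions above can be reproduced outside IPython with the
standard-library timeit module. A minimal sketch follows; the helper name
time_scalar_ops is hypothetical, and absolute numbers will vary with the
machine and NumPy version.

```python
import timeit

import numpy as np


def time_scalar_ops(number=100_000):
    """Hypothetical helper: best-of-3 nanoseconds per evaluation."""
    env = {"a": np.float32(4), "b": np.float64(4)}
    results = {}
    for expr in ("a * a", "b * b", "a * b", "b * a"):
        # Take the best of 3 repeats, like IPython's "best of 3" output.
        best = min(timeit.repeat(expr, globals=env, repeat=3, number=number))
        results[expr] = best / number * 1e9
    return results


if __name__ == "__main__":
    for expr, ns in time_scalar_ops().items():
        print(f"{expr}: {ns:.1f} ns per loop")
```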

Operations on mixed-type scalars whose result is a scalar of a third
type still use the ufunc fallback and pay the speed penalty, e.g.
F op d -> D (complex64 op float64 -> complex128):

In [2]: a = np.complex64(1+2j)

In [3]: b = np.double(4)

In [4]: timeit a * b
1000000 loops, best of 3: 1.22 µs per loop
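Which of the three cases a pair of scalar types falls into can be checked
with np.can_cast and np.result_type. The classifier below is a
hypothetical model of the rules described in this commit message, not
NumPy's own dispatch code.

```python
import numpy as np


def scalar_op_path(t1, t2):
    """Hypothetical model of the path `t1(x) op t2(y)` would take."""
    t1, t2 = np.dtype(t1), np.dtype(t2)
    if np.can_cast(t2, t1):
        return "type1 scalar op"              # t2 is safely upcast to t1
    if np.can_cast(t1, t2):
        return "deferred to type2 scalar op"  # the new __rop__ path
    return "ufunc fallback"                   # result is a third type


print(scalar_op_path("f", "d"))  # float32 op float64
print(scalar_op_path("d", "f"))  # float64 op float32
print(scalar_op_path("F", "d"))  # complex64 op float64
print(np.result_type(np.dtype("F"), np.dtype("d")))  # complex128
```

For F op d neither type safely casts to the other, so there is no
existing scalar loop to defer to and the ufunc is still used.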
ewmoore committed Apr 18, 2016
commit f2fc804fd9e271a762458af9518d095c2572d072