PERF: Reduce overhead of np.linalg.norm for small arrays #21394
Conversation
Reduce the overhead of `np.linalg.norm` by replacing `dot(x, x)` with `x.dot(x)` and by replacing the `sqrt` calculation with a version specialized for float numbers.
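A rough sketch of what the adjusted code path looks like (assumed shape of the code, not the actual diff from the PR):

```python
import math
import numpy as np

def _norm_2(x):
    # Sketch of the 2-norm fast path for 1-D input.
    x = np.asarray(x).ravel()      # x is guaranteed to be a 1-D ndarray here
    sqnorm = x.dot(x)              # change 1: x.dot(x) instead of dot(x, x)
    if isinstance(sqnorm, float):  # change 2 (dropped later in the PR):
        return math.sqrt(sqnorm)   # skip ufunc dispatch for the float case
    return np.sqrt(sqnorm)
```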
Thanks! The first change I think we might as well just do, since the input is an exact array (and not even a subclass).
Agreed, but are we sure that the second change actually has an appreciable impact on the benchmarks? It'd be good to know what the contribution to the performance improvement is from each change. Also, re: benchmarking, we could likely get more comprehensive results with the NumPy benchmarking suite, e.g. …
@seberg @rossbar I am not a fan of the second change either, but was hoping we could somehow use an (internal) numpy method to perform the sqrt calculation for the various floating point types (e.g. …). I tried running the benchmarks, but they do not work on my systems. Instead, let me give you some numbers so you can decide whether to include the second optimization or not. Note: these results are on Python 3.10 / Windows (earlier results were on Python 3.8 / Linux).
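Such a scalar-only `sqrt` helper is not exposed by NumPy; a rough sketch of the idea (the name and dispatch below are hypothetical, not an existing NumPy API):

```python
import math
import numpy as np

def _scalar_sqrt(value):
    # Hypothetical helper, not part of NumPy: a sqrt that only handles
    # scalars, dispatching per floating-point type. math.sqrt avoids the
    # ufunc dispatch overhead of np.sqrt for 0-d inputs.
    if type(value) is np.float64 or type(value) is float:
        return np.float64(math.sqrt(value))
    if type(value) is np.float32:
        return np.float32(math.sqrt(value))
    # Fall back to the ufunc for complex, longdouble, etc.
    return np.sqrt(value)
```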
Benchmark:
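(The original benchmark script is not preserved in this excerpt; a minimal `timeit` comparison along these lines would exercise the same path:)

```python
import timeit
import numpy as np

# Small array, where the Python-level overhead of np.linalg.norm dominates.
x = np.random.rand(3)
n = 100_000
t = timeit.timeit(lambda: np.linalg.norm(x), number=n)
print(f"norm of size-3 array: {t / n * 1e6:.2f} us per call")
```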
Result:
(This is on Python 3.10 / Windows; on Python 3.8 / Linux I see similar results.)
Please open an issue with the tracebacks. This is the standard way for reporting performance enhancements in a reproducible way and allows us to track regressions. Maybe this PR improves one measure of performance but hurts others (unlikely, but stranger things have happened).
I removed the specialization for float. With the ideas in #21423 (and some other ideas) the overhead from …

Running benchmarks still does not work; it is due to #21288 (but that issue does not yet give me a solution).
Yeah, I just ran into the benchmark problem as well and opened gh-21428 for it. I am not sure it would be super useful here anyway, because we are a bit short on tests for small arrays/scalars. So it would be nice to add tests for the norm of very small arrays, but considering how simple it is now and that the benchmarks don't run easily, I am happy to glance over it now. Can you clean up the …?
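A test along the lines suggested might look like this (an illustrative sketch, not code from the PR):

```python
import numpy as np
from numpy.testing import assert_allclose

def test_norm_small_arrays():
    # Norms of very small arrays, checked against the explicit definition.
    for n in range(1, 4):
        x = np.arange(1.0, n + 1.0)
        assert_allclose(np.linalg.norm(x), np.sqrt(np.sum(x * x)))
    # Scalar-like inputs are ravelled to 1-D, so these should also work.
    assert_allclose(np.linalg.norm([3.0]), 3.0)
    assert_allclose(np.linalg.norm(3.0), 3.0)
```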
Benchmark:
OK, the benchmark seems to at least notice a small difference, and I don't think this adds any maintenance burden in the current state anyway. So let's just get it in. Thanks @eendebakpt.
The `numpy.linalg.norm` method has some overhead for small and medium-sized arrays. This PR reduces the overhead by making two small adjustments to the implementation. Details are discussed below.

Benchmark:

Results for main:

This PR:

1. The call `dot(x, x)` is replaced with `x.dot(x)`. Since `x` has been cast with `asarray` and `ravel`, we know that `x` is a 1D numpy array. An alternative would be `np.inner(x, x)`. The call with `inner` is faster than `dot(x, x)` but slower than `x.dot(x)`.

2. The call to `np.sqrt` is replaced with a specialized call for float numbers (the common case). Ideally, the call would be replaced with a call to a numpy version of `sqrt` that only handles the scalar case. I could not find such a method exposed in the numpy codebase. Would it be possible to create such a method?
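The relative ordering claimed in point 1 can be checked with a quick timing comparison (an illustrative sketch; exact numbers vary by machine and NumPy version):

```python
import timeit
import numpy as np
from numpy import dot, inner

x = np.random.rand(5)  # small array: call overhead dominates the actual work
n = 100_000

for label, fn in [
    ("dot(x, x)",   lambda: dot(x, x)),
    ("inner(x, x)", lambda: inner(x, x)),
    ("x.dot(x)",    lambda: x.dot(x)),
]:
    t = timeit.timeit(fn, number=n)
    print(f"{label:12s} {t / n * 1e9:8.1f} ns per call")
```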