8000 MAINT: Avoid moveaxis overhead in median. by anntzer · Pull Request #18324 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

MAINT: Avoid moveaxis overhead in median. #18324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 9, 2021
Merged

MAINT: Avoid moveaxis overhead in median. #18324

merged 1 commit into from
Feb 9, 2021

Conversation

anntzer
Copy link
Contributor
@anntzer anntzer commented Feb 4, 2021

This change speeds up taking the median of 1001 floats by ~20%, as
measured by
python -mtimeit -s 'import numpy as np; x = np.random.randn(1001)' -- 'np.median(x)'

(See also #18298, #18203.)

@seberg
Copy link
Member
seberg commented Feb 5, 2021

What does this do for high dimensional arrays, say arr = np.random.random((10000, 20)); np.median(arr, axis=-1)? Just curious, since take seems pretty terrible compared to slicing in that case, but likely the partition is just so much slower anyway, that it doesn't matter.

I can't say I like this, but I can live with it. If we want to do this, do we have a benchmark?

@anntzer
Copy link
Contributor Author
anntzer commented Feb 5, 2021

fwiw python -mtimeit -s 'import numpy as np; x = np.random.rand(10000, 20)' -- 'np.median(x, axis=-1)' shows no difference whatsoever in performance arising from this PR. As you say, likely the partition() call dwarfs everything.

Do you want something like the following?

diff --git i/benchmarks/benchmarks/bench_lib.py w/benchmarks/benchmarks/bench_lib.py
index c22ceaa5e..b86b2f6e9 100644
--- i/benchmarks/benchmarks/bench_lib.py
+++ w/benchmarks/benchmarks/bench_lib.py
@@ -6,6 +6,23 @@ from .common import Benchmark
 import numpy as np
 
 
+class Median(Benchmark):
+    """Benchmarks for `numpy.median`"""
+
+    param_names = ["array_size", "axis"]
+    params = [
+        [(1001,), (10000, 20)],
+        [None, -1],
+    ]
+
+    def setup(self, array_size, axis):
+        np.random.seed(123)
+        self.array = np.random.random(array_size)
+
+    def time_median(self, array_size, axis):
+        np.median(self.array, axis=axis)
+
+
 class Pad(Benchmark):
     """Benchmarks for `numpy.pad`.

@seberg
Copy link
Member
seberg commented Feb 5, 2021

Yeah, was thinking of something like that. Seems we already have benchmarks including "small arrays" (500 elements) in benchmarks/bench_function_base.py, could ammend that with a multi-dimensional example, but its probably enough.

Anyway, if no-one else has an opinion... It feels hackish to me, but I guess we should just roll with it and feel free to change it later (hopefully we would notice if there is a serious performance regression with the benchmarks).

I am wondering now, if I got the axes wrong, and it should have been arr = np.random.random((20, 10000)); np.median(arr, axis=0, but yeah, it probably just doesn't matter.

This change speeds up taking the median of 1001 floats by ~20%, as
measured by
`python -mtimeit -s 'import numpy as np; x = np.random.randn(1001)' -- 'np.median(x)'`
@anntzer
Copy link
Contributor Author
anntzer commented Feb 6, 2021

It feels hackish to me

I agree making moveaxis/asarray faster would be better, but that's beyond my pay grade :)

could ammend that with a multi-dimensional example, but its probably enough.

Added the benchmarks there.

I am wondering now, if I got the axes wrong, and it should have been arr = np.random.random((20, 10000)); np.median(arr, axis=0, but yeah, it probably just doesn't matter.

timeit doesn't show a difference in this case either.

@Qiyu8
Copy link
Member
Qiyu8 commented Feb 7, 2021

Are you suggesting that take is superio than np.moveaxis? I can see a lot of places such as chebyshev.py hermite.py laguerre.py legendre.py using functions like np.moveaxis(data, axis, -1) , if the answer is yes, can you provide the asv benchmark result here and try to improve the np.moveaxis itself ?

@seberg
Copy link
Member
seberg commented Feb 7, 2021

@Qiyu8 take is pretty much never good performance wise, this is absolutely an exception, and even here its more that moveaxis should be faster probably.

@anntzer
Copy link
Contributor Author
anntzer commented Feb 7, 2021

I did try to change the polynomial calls to moveaxis but t 8000 here there was no performance improvement.
Based on some quick profiling I believe the main costs here are Python vs. C overheads, so really the benefit here is that take() is fully implemented in C (and the cost of making a new array is less than the gain of minimizing Python calls). Speeding up moveaxis likely(?) means moving it to C too, which would be a somewhat bigger endeavour.

@seberg
Copy link
Member
seberg commented Feb 7, 2021

A not-so-small part of it may also be that take is a method, which removes __array_function__ overheads. np.median is really is all about overheads (and right now that is true even for 1000+ elements, which is not too nice).

Moving np.moveaxis to C is probably worth it (or maybe parts of it if it is enough) if it means we can skip this here and also make a real difference in other places.

Speeding up np.asarray, etc. I have an old PR for that, which I will probably dig up when the next PyPy beta release comes around hopefully (so fairly soon).

@mhvk
Copy link
Contributor
mhvk commented Feb 8, 2021

Slightly faster still (but less elegant to me) is

a = np.random.normal(size=(10,10,10))
%timeit item = [slice(None)]*a.ndim; item[axis] = -1; a[tuple(item)]
# 497 ns ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit a.take(-1, axis=axis)
# 762 ns ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.moveaxis(a, axis, -1); a[..., -1]
# 3.68 µs ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Ideally, np.median would just become a gufunc though...

@anntzer
Copy link
Contributor Author
anntzer commented Feb 8, 2021

I can go with @mhvk's solution if the rest agree (it seems best, except for the slight ugliness of the code needed to set up the right index tuple).

Alternatively (but more complex), given the existence of the same problem in #18203, perhaps a solution would be to fold nan-checking into PyArray_Partition and to have a variant of it (PyArray_PartitionCheckNan) which handles the nan checking and, if a nan is present, also sets all the requested partition entries to nan.

@seberg
Copy link
Member
seberg commented Feb 8, 2021

I am ok with it, its ugly, but maybe we should add a brief comment that it is a micro-opt for moveaxes. The doc of the function suggests multiple axes are allowed. that seems simply not true, but lets fix that?

I guess we do have a test for things like: np.random.random((5, 6, 7, 8)); np.median(arr, axis=(1, -2)) though?

I am not sure how easy it is to check for NaN during partition, I think if anything, it might be more useful to create an "optimization" gufunc "all_finite()" or so. Applying that to the upper half is in theory a bit faster probably.

EDIT: To be a bit more assertive, the list trick is fairly common also in other functions I think. Maybe np.moveaxes would be nicer there as well, but at least it will be in good company, which is maybe nicer than a stray take that looks like its there for a very good reason, but is not :).

@mhvk
Copy link
Contributor
mhvk commented Feb 8, 2021

Definitely need to check the multiple-axis case... To me, a.take is actually the more elegant solution.

@anntzer
Copy link
Contributor Author
anntzer commented Feb 9, 2021

We don't actually need to support multiple axis here because median goes through _ureduce for that. In fact the docstring of _median_nancheck is wrong, it doesnt support axis being a sequence of int (the call to moveaxis() would fail). See #18379.

@seberg
Copy link
Member
seberg commented Feb 9, 2021

OK, thanks @anntzer. I agree take reads the nicest and the unnecessary copy doesn't seem to matter here (its unlikely to do I guess, except in extreme circumstances, such as taking the median of thousands of length 3 arrays).

Going to put this in, although I am would be more than happy to pull it out again for a better alternative ;).

@seberg seberg merged commit 3f3e604 into numpy:master Feb 9, 2021
@anntzer anntzer deleted the fm branch February 9, 2021 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0