median() is slower than partition(..., n//2)[n//2] #18298


Open
anntzer opened this issue Feb 2, 2021 · 3 comments

Comments

@anntzer
Contributor
anntzer commented Feb 2, 2021

For arrays up to ~1e6 elements, calling median() is slower than calling partition(..., n//2) followed by getting the (n//2)th element. In fact, there seems to be so much overhead in median() that it's even slower than sorting the whole array and getting the (n//2)th element for arrays up to ~1e3 elements.

Reproducing code example:

(I cheated a bit and only test odd-size arrays here, but for even-size arrays the overhead for partition() would be no more than ~1.5x: partition to get the (n//2)th element and then take the min of all elements above it to get the (n//2+1)th; see the sketch after the timings below.)

In [1]: for n in [11, 101, 1_001, 10_001, 100_001, 1_000_001, 10_000_001]:
   ...:     print(n)
   ...:     x = np.random.rand(n)
   ...:     assert np.median(x) == np.sort(x)[n//2] == np.partition(x, n//2)[n//2]
   ...:     %timeit np.median(x)
   ...:     %timeit np.sort(x)[n//2]
   ...:     %timeit np.partition(x, n//2)[n//2]
   ...: 
11
74.6 µs ± 5.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
5.76 µs ± 397 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
6.68 µs ± 244 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
101
88.4 µs ± 4.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.32 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.95 µs ± 386 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1001
103 µs ± 7.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
61.2 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
15 µs ± 720 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10001
398 µs ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.05 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
243 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
100001
2.61 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.2 ms ± 376 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.36 ms ± 53.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000001
25.1 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
164 ms ± 3.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
21.6 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
10000001
349 ms ± 33.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.06 s ± 95.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
375 ms ± 54.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
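
As a rough illustration of the even-size case mentioned above, here is a minimal sketch (mine, not part of the original report) of a median for even-length arrays built from a single partition plus a min over the upper half; the helper name and exact indexing are my own choices, not NumPy API:

import numpy as np

def median_even_via_partition(x):
    # Sketch: median of an even-length 1-D float array using one partition
    # plus a min, roughly the "~1.5x" approach described above.
    n = x.size
    assert n % 2 == 0
    p = np.partition(x, n // 2 - 1)   # lower middle element lands in its sorted position
    lower = p[n // 2 - 1]             # the (n//2)-th smallest value
    upper = p[n // 2:].min()          # smallest value above it, i.e. the (n//2+1)-th smallest
    return (lower + upper) / 2

x = np.random.rand(10_000)
assert np.isclose(median_even_via_partition(x), np.median(x))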

NumPy/Python version information:

NumPy 1.20.0
Python 3.9.1 (default, Dec 13 2020, 11:55:53) [GCC 10.2.0]
@seberg
Member
seberg commented Feb 2, 2021

median has quite a lot of overhead compared to partition (I have some work that might significantly reduce it soon, by ~50% IIRC). But I think the main difference you are timing is the fact that median adds a check for NaN (by also partitioning the -1 element):

In [6]: for n in [10_001, 100_001]:
   ...:     print(n)
   ...:     x = np.random.rand(n)
   ...:     assert np.median(x) == np.sort(x)[n//2] == np.partition(x, n//2)[n//2]
   ...:     %timeit np.median(x)
   ...:     %timeit np.partition(x, [n//2, -1])[n//2]
   ...: 
10001
112 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
79.8 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
100001
1.09 ms ± 577 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.06 ms ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
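
For context (my own illustration, not from the thread): the extra -1 partition is what lets median notice NaNs, whereas a bare partition-based median silently returns a finite value when a NaN is present:

import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
print(np.median(x))                               # nan (NumPy may also emit a RuntimeWarning)
print(np.partition(x, x.size // 2)[x.size // 2])  # 4.0 here: the NaN goes unnoticed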

@anntzer
Contributor Author
anntzer commented Feb 4, 2021

Fair point, but I'd guess partitioning two elements should not take more than twice the time it takes to partition just one element(?), and the overhead is much more than twofold for small arrays.

@seberg
Member
seberg commented Feb 4, 2021

Yah, sorry, I guess your focus was really the overheads. There are many reasons for the fairly large overheads, the main one being that np.median is written in Python, which lets smaller overheads pile up across a fairly long and complex function.

One thing I have been looking at (by now mostly because it led me to clean up ufunc code along the way, to be honest) is generally reducing the overheads of some functions (array and friends, ufunc calls, and method calls with keyword arguments). That had a pretty big effect on median IIRC (although maybe much more so for even-sized arrays).

Another big thing might be to speed up __array_function__. The route that seems most promising requires two design changes (neither of which is likely a problem, though I am not sure how big a project it would be), but it would reduce a lot of overheads considerably.

More limited to median itself, but maybe far more promising:

  • Maybe the code can be restructured to be shorter/more elegant (and, in that case, also have less overhead).
  • median could be made a gufunc (potentially with a Python core). Even a Python core may help, since it simplifies some of the setup code at least (if the 1-D median has much less overhead compared to the N-D version); see the sketch below.
  • Parts of median could be moved to C.
  • Or it may soon be time to look into other acceleration approaches (Cython/Pythran) for certain functions...

Median is a bit of a complex beast, so not sure how easy these are.
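
To make the "less overhead for 1-D" idea concrete, here is a rough sketch (mine, not NumPy's actual implementation, and the helper name is made up) of a minimal 1-D median that keeps the NaN check by also partitioning the last element, as discussed above:

import numpy as np

def median_1d(x):
    # Sketch of a low-overhead 1-D median: one partition call, plus the NaN
    # check that np.median performs by also partitioning the -1 element.
    # np.median itself additionally handles axes, out=, masked arrays, etc.
    n = x.size
    h = n // 2
    kth = [h, -1] if n % 2 else [h - 1, h, -1]
    p = np.partition(x, kth)
    if np.isnan(p[-1]):        # NaNs sort to the end, so the last element reveals them
        return np.nan
    return p[h] if n % 2 else (p[h - 1] + p[h]) / 2

x = np.random.rand(100_001)
assert median_1d(x) == np.median(x)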
