median() is slower than partition(..., n//2)[n//2] #18298


Open
anntzer opened this issue Feb 2, 2021 · 3 comments

Comments

@anntzer
Contributor
anntzer commented Feb 2, 2021

For arrays up to ~1e6 elements, calling median() is slower than calling partition(..., n//2) followed by getting the (n//2)th element. In fact, there seems to be so much overhead in median() that it's even slower than sorting the whole array and getting the (n//2)th element for arrays up to ~1e3 elements.

Reproducing code example:

(I cheated a bit and only test odd-size arrays here, but for even-size arrays the overhead for partition() would be no more than ~1.5x: partition to get the (n//2)th element and then take the min of all elements above it to get the (n//2+1)th; see the sketch after the timings below.)

In [1]: for n in [11, 101, 1_001, 10_001, 100_001, 1_000_001, 10_000_001]:
   ...:     print(n)
   ...:     x = np.random.rand(n)
   ...:     assert np.median(x) == np.sort(x)[n//2] == np.partition(x, n//2)[n//2]
   ...:     %timeit np.median(x)
   ...:     %timeit np.sort(x)[n//2]
   ...:     %timeit np.partition(x, n//2)[n//2]
   ...: 
11
74.6 µs ± 5.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
5.76 µs ± 397 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
6.68 µs ± 244 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
101
88.4 µs ± 4.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.32 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.95 µs ± 386 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1001
103 µs ± 7.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
61.2 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
15 µs ± 720 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10001
398 µs ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.05 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
243 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
100001
2.61 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.2 ms ± 376 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.36 ms ± 53.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000001
25.1 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
164 ms ± 3.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
21.6 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
10000001
349 ms ± 33.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.06 s ± 95.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
375 ms ± 54.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
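
As a rough illustration of the even-size case mentioned above, here is a minimal sketch (mine, not part of the original report) of a median for even-length arrays built from a single partition plus a min over the upper half; the helper name and exact indexing are my own choices, not NumPy API:

import numpy as np

def median_even_via_partition(x):
    # Sketch: median of an even-length 1-D float array using one partition
    # plus a min, roughly the "~1.5x" approach described above.
    n = x.size
    assert n % 2 == 0
    p = np.partition(x, n // 2 - 1)   # lower middle element lands in its sorted position
    lower = p[n // 2 - 1]             # the (n//2)-th smallest value
    upper = p[n // 2:].min()          # smallest value above it, i.e. the (n//2+1)-th smallest
    return (lower + upper) / 2

x = np.random.rand(10_000)
assert np.isclose(median_even_via_partition(x), np.median(x))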

NumPy/Python version information:

NumPy 1.20.0
Python 3.9.1 (default, Dec 13 2020, 11:55:53) [GCC 10.2.0]
@seberg
Member
seberg commented Feb 2, 2021

median has quite a lot of overhead compared to partition (I have some work that might significantly reduce it soon, by ~50% IIRC). But I think the main difference you are timing is the fact that median adds a check for NaN (by also partitioning the -1 element):

In [6]: for n in [10_001, 100_001]:
   ...:     print(n)
   ...:     x = np.random.rand(n)
   ...:     assert np.median(x) == np.sort(x)[n//2] == np.partition(x, n//2)[n//2]
   ...:     %timeit np.median(x)
   ...:     %timeit np.partition(x, [n//2, -1])[n//2]
   ...: 
10001
112 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
79.8 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
100001
1.09 ms ± 577 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.06 ms ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
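
For context (my own illustration, not from the thread): the extra -1 partition is what lets median notice NaNs, whereas a bare partition-based median silently returns a finite value when a NaN is present:

import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
print(np.median(x))                               # nan (NumPy may also emit a RuntimeWarning)
print(np.partition(x, x.size // 2)[x.size // 2])  # 4.0 here: the NaN goes unnoticed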

@anntzer
Contributor Author
anntzer commented Feb 4, 2021

Fair point, but I'd guess partitioning two elements should not take more than twice the time it takes to partition just one element(?), and the overhead is much more than twofold for small arrays.

@seberg
Member
seberg commented Feb 4, 2021

Yah, sorry, I guess your focus was really the overheads. There are many reasons for the fairly large overheads, the main one being that np.median is written in Python, which lets smaller overheads pile up across a fairly long and complex function.

One thing I have been looking at (by now mostly because it led me to clean up ufunc code along the way, to be honest) is generally reducing the overheads of some functions (array and friends, ufunc calls, and method calls with keyword arguments). That had a pretty big effect on median IIRC (although maybe much more so for even-sized arrays).

Another big thing might be to speed up __array_function__. The route that seems most promising requires two design changes (neither of which is likely a problem, though I am not sure how big a project it would be), but it would reduce a lot of overheads considerably.

More limited to median itself, but maybe far more promising:

  • Maybe the code can be restructured to be shorter/more elegant (and, in that case, also have less overhead).
  • median could be made a gufunc (potentially with a Python core). Even a Python core may help, since it simplifies some of the setup code at least (if the 1-D median has much less overhead compared to the N-D version); see the sketch below.
  • Parts of median could be moved to C.
  • Or it may soon be time to look into other acceleration approaches (Cython/Pythran) for certain functions...

Median is a bit of a complex beast, so not sure how easy these are.
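
To make the "less overhead for 1-D" idea concrete, here is a rough sketch (mine, not NumPy's actual implementation, and the helper name is made up) of a minimal 1-D median that keeps the NaN check by also partitioning the last element, as discussed above:

import numpy as np

def median_1d(x):
    # Sketch of a low-overhead 1-D median: one partition call, plus the NaN
    # check that np.median performs by also partitioning the -1 element.
    # np.median itself additionally handles axes, out=, masked arrays, etc.
    n = x.size
    h = n // 2
    kth = [h, -1] if n % 2 else [h - 1, h, -1]
    p = np.partition(x, kth)
    if np.isnan(p[-1]):        # NaNs sort to the end, so the last element reveals them
        return np.nan
    return p[h] if n % 2 else (p[h - 1] + p[h]) / 2

x = np.random.rand(100_001)
assert median_1d(x) == np.median(x)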
