# API: Optimize np.unique for integer arrays; add `kind` parameter #21843
## Conversation
Thanks for the ping.

> (Though I am not sure if this is the best way to do it. Comments appreciated!)

`np.add.at` is definitely how I'd approach this too.

Some early minor remarks below.
---

You could do normal slicing, except on a 2D array, then sum along one axis. Of course this would be N^2 memory usage...
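A minimal sketch of how I read that suggestion (an editorial addition, not code from the thread; `N` and `idx` are assumed to match the benchmark below):

```python
import numpy as np

N = 1_000
idx = np.random.randint(0, N, size=N)

# Broadcast-compare every possible value against every index, then sum
# along one axis. This materializes an (N, N) boolean array, which is
# the N^2 memory usage mentioned above.
counts = (np.arange(N)[:, None] == idx[None, :]).sum(axis=1)
```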
---

I don't know about the implementations, but I assume the "unbuffered" in "Performs unbuffered in place operation" might be relevant. Although you already lose a factor of 2 if you use `counts[idx] += 1`.

Side note: it's better to post code as text rather than images, because people will be able to copy-paste it, and people using screen readers will also be able to see your code and results. Here's more or less your code:

```python
N = 100_000
counts = np.zeros(N, dtype=np.intp)
idx = np.random.randint(0, N, size=(N,))

>>> %timeit counts[idx] = 1
... %timeit counts[idx] += 1
... %timeit np.add.at(counts, idx, 1)
257 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
486 µs ± 9.57 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
6.36 ms ± 594 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
---

Right, will do that!

Regarding `counts[idx] += 1`:

```python
In [7]: N = 100_000
   ...: counts = np.zeros(N, dtype=np.intp)
   ...: idx = np.random.randint(0, 2, size=(N,))

In [8]: counts[idx] += 1

In [9]: np.max(counts)
Out[9]: 1
```

So I don't think it could be used here, unfortunately.
---

Indeed. My point was just that when you compare the performance of `np.add.at` against plain assignment, the two aren't doing equivalent work.
---

Ah I see, sorry. I guess my point is that I think even 12x or 25x seems extremely large. Why is it that slow?

I tested this. Here's a barebones implementation in C: https://gist.github.com/MilesCranmer/d262f468d570df4b90c8769ebdcce213

If I call this with counting turned on: about 64 ms. With counting turned off: about 38 ms. So there is only a <2x difference. However, here is the Python attempt:

```python
In [2]: x = np.random.randint(0, 10_000_000, size=(10_000_000,))

In [3]: counts = np.zeros(10_000_000, dtype=np.intp)

In [10]: %%timeit
    ...: counts[x] = 1
38.3 ms ± 83.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [11]: counts[:] = 0

In [12]: %%timeit
    ...: np.add.at(counts, x, 1)
1.14 s ± 7.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

So the counting one is much, much slower. Maybe `np.add.at` is written in pure Python or something?

Edit: I had incorrectly passed a bool to the C code, so the times were the same regardless of the `if` statement. Now fixed; there is about a 2x difference overall.

Edit 2: Moved the C example to a gist.
---

Good news: I sped up the new method for `return_counts=True`.

Here are the updated speeds, showing `kind='sort'` versus `kind='table'`:

```python
In [2]: x = np.random.randint(0, 10_000_000, size=(10_000_000,))

In [3]: %%timeit
   ...: np.unique(x, kind='sort')
828 ms ± 5.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %%timeit
   ...: np.unique(x, kind='table')
150 ms ± 6.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %%timeit
   ...: np.unique(x, kind='sort', return_counts=True)
876 ms ± 5.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %%timeit
   ...: np.unique(x, kind='table', return_counts=True)
329 ms ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
---

Okay, I think the core API should be ready for a review. I checked through the unit tests, and many of them are for non-integral types, so it won't be as easy to…
---

Here are the results of some newly added benchmarks, parameterized as `bench_lib.Unique.<test>(<array_size>, <array_range>)`. They show that one gets >12x speedups for large integer arrays, including with `return_counts=True`.

The only regressions look to be with very tiny arrays, which differ on the order of μs. I think that is entirely due to the check for which method to use. If those regressions are not acceptable, I could switch off the new method for small arrays before the range computation is done.

Speedups:

```
       before           after         ratio
     [54c52f13]       [158f25b3]
     <v1.23.0^0>      <unique-speed>
+      7.69±0.03s       489±20ms      0.06  bench_lib.Unique.time_unique(100000000, 1778279)
+      5.81±0.03s       447±10ms      0.08  bench_lib.Unique.time_unique(100000000, 31622)
+       101±0.7ms      8.09±0.7ms     0.08  bench_lib.Unique.time_unique(1778279, 31622)
+         101±1ms      9.18±1ms       0.09  bench_lib.Unique.time_unique_with_counts(1778279, 31622)
+     1.22±0.01ms       111±40μs      0.09  bench_lib.Unique.time_unique(31622, 562)
+        70.6±1ms      7.84±0.9ms     0.11  bench_lib.Unique.time_unique(1778279, 562)
+      3.98±0.04s       442±10ms      0.11  bench_lib.Unique.time_unique(100000000, 562)
+      7.74±0.06s       870±200ms     0.11  bench_lib.Unique.time_unique_with_counts(100000000, 1778279)
+      69.6±0.4ms      8.89±0.7ms     0.13  bench_lib.Unique.time_unique_with_counts(1778279, 562)
+       131±0.9ms      19.0±1ms       0.14  bench_lib.Unique.time_unique(1778279, 1778279)
+         9.20±0s      1.49±0.09s     0.16  bench_lib.Unique.time_unique(100000000, 100000000)
+         607±2μs       111±40μs      0.18  bench_lib.Unique.time_unique(31622, 10)
+      2.37±0.04s       444±10ms      0.19  bench_lib.Unique.time_unique(100000000, 10)
+      38.4±0.9ms      7.35±1ms       0.19  bench_lib.Unique.time_unique(1778279, 10)
+      2.33±0.06s       462±200ms     0.20  bench_lib.Unique.time_unique_with_counts(100000000, 10)
+     1.72±0.01ms       357±20μs      0.21  bench_lib.Unique.time_unique(31622, 31622)
+     1.24±0.01ms       260±50μs      0.21  bench_lib.Unique.time_unique_with_counts(31622, 562)
+        38.8±1ms      8.92±0.9ms     0.23  bench_lib.Unique.time_unique_with_counts(1778279, 10)
+         139±3ms      38.2±3ms       0.28  bench_lib.Unique.time_unique_with_counts(1778279, 1778279)
+     1.91±0.02ms       717±100μs     0.38  bench_lib.Unique.time_unique_with_counts(31622, 31622)
+         630±6μs       253±40μs      0.40  bench_lib.Unique.time_unique_with_counts(31622, 10)
```

Slowdowns:

```
       before           after         ratio
-      11.1±0.1μs      20.2±0.9μs     1.82  bench_lib.Unique.time_unique(562, 562)
-      19.0±0.7μs       34.4±7μs      1.82  bench_lib.Unique.time_unique_with_counts(562, 562)
-      10.1±0.2μs      22.3±0.9μs     2.20  bench_lib.Unique.time_unique(562, 1778279)
-     14.4±0.08μs       32.2±8μs      2.24  bench_lib.Unique.time_unique_with_counts(562, 10)
-      9.92±0.2μs      22.3±0.7μs     2.25  bench_lib.Unique.time_unique(562, 31622)
-      10.7±0.2μs       24.3±7μs      2.27  bench_lib.Unique.time_unique_with_counts(10, 10)
-     9.75±0.07μs      22.3±0.6μs     2.29  bench_lib.Unique.time_unique(562, 100000000)
-      8.22±0.1μs       19.1±1μs      2.32  bench_lib.Unique.time_unique(562, 10)
-      18.2±0.6μs       49.0±20μs     2.69  bench_lib.Unique.time_unique_with_counts(562, 1778279)
-      10.5±0.1μs       29.7±10μs     2.83  bench_lib.Unique.time_unique_with_counts(10, 31622)
-      16.9±0.1μs       49.1±10μs     2.90  bench_lib.Unique.time_unique_with_counts(562, 31622)
-      17.4±0.1μs       52.1±20μs     2.99  bench_lib.Unique.time_unique_with_counts(562, 100000000)
-      10.7±0.1μs       32.4±8μs      3.04  bench_lib.Unique.time_unique_with_counts(10, 100000000)
-     10.4±0.09μs       33.0±10μs     3.18  bench_lib.Unique.time_unique_with_counts(10, 562)
-      10.6±0.1μs       37.5±10μs     3.55  bench_lib.Unique.time_unique_with_counts(10, 1778279)
-     4.43±0.04μs       16.3±1μs      3.67  bench_lib.Unique.time_unique(10, 31622)
-      4.46±0.1μs       16.7±2μs      3.74  bench_lib.Unique.time_unique(10, 100000000)
-     4.33±0.08μs       17.1±1μs      3.95  bench_lib.Unique.time_unique(10, 1778279)
-     4.41±0.03μs      17.5±0.6μs     3.98  bench_lib.Unique.time_unique(10, 562)
-     4.50±0.06μs       18.1±5μs      4.02  bench_lib.Unique.time_unique(10, 10)
```
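For illustration, the size-based dispatch being discussed could look like the following (an editorial sketch; the function name and thresholds are invented, not the PR's actual code):

```python
def _choose_kind(ar, kind=None, small_cutoff=1000, memory_factor=6):
    """Pick 'table' or 'sort' for an integer array `ar` (hypothetical helper)."""
    if kind is not None:
        return kind                   # an explicit user choice always wins
    if ar.size < small_cutoff:
        return 'sort'                 # skip the min/max range computation entirely
    ar_range = int(ar.max()) - int(ar.min())
    if ar_range <= memory_factor * ar.size:
        return 'table'                # the lookup table is affordable
    return 'sort'
```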
---

Posted to the mailing list.
---

I have not seen any email; maybe you got the wrong list? This is the right one: https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
---

Ah, I see, sorry. It said "Your message to the NumPy-Discussion mailing-list was rejected for the following reasons: …" Will do it correctly this time...
---

This PR fixes the issue discussed in #12065 and #21843 where `timedelta64` was noted to be a subtype of `numpy.integer`. This in principle should detect any cases where `int(np.min(ar2))` fails. This PR also adds unit tests for these.

* TST: Create in1d test for timedelta input
* MAINT: fix in1d for timedelta input
* TST: in1d raise ValueError for timedelta input
* MAINT: Clean up type checking for isin kind="table"
* TST: Add test for mixed boolean/integer in1d
* MAINT: Increase readability of in1d type checking
* STY: Apply small code style tweaks (This is probably really mainly my personal opinion...)

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
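The gotcha being fixed is easy to reproduce (an editorial sketch): in NumPy's scalar type hierarchy, `timedelta64` sits under `signedinteger`, so naive integer checks let it through.

```python
import numpy as np

# Both checks pass for timedelta64, which is why a dtype check
# based on np.integer alone is not sufficient here.
print(issubclass(np.timedelta64, np.signedinteger))           # True
print(np.issubdtype(np.dtype('timedelta64[s]'), np.integer))  # True
```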
Force-pushed from 67da234 to 837101a.
---

Hey @seberg! It's been a while, but I realized we never merged this yet. I have a bit of time, so I was wondering if we could push on this PR and try to get it merged? I just updated my branch to the latest `main`.

Recent update: there was an issue with `timedelta64` input, which the commit above addresses.

Cheers,
---

This commit implements the same lookup table method from #12065 for `np.unique`. `np.unique` is, like `np.isin`, based on sorting and array comparison. These methods, though more generic, can be greatly sped up for integer dtypes with a lookup table. For unique, this works as follows:

1. `calloc` a lookup table with each element corresponding to an integer: `x = np.zeros(ar_range)`
2. Write `1` wherever a number exists in the array: `x[ar - ar_min] = 1`
3. Return the indices equal to `1`: `return np.nonzero(x) + ar_min`

A similar strategy can be used for `return_counts=True`. Instead of filling the lookup table with `1`, you can use the table to count, with, for example, `np.add.at(x, ar - ar_min, 1)`, which will count the occurrences of each integer. (Though I am not sure if this is the best way to do it. Comments appreciated!)
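A minimal, self-contained sketch of the above (an editorial addition, simplified relative to the PR, which also handles dtype limits and memory checks; the `np.add.at` counting step is the reconstruction discussed above):

```python
import numpy as np

def unique_table(ar, return_counts=False):
    # Build a lookup table spanning the value range of the array.
    ar_min, ar_max = ar.min(), ar.max()
    ar_range = int(ar_max) - int(ar_min) + 1

    if return_counts:
        table = np.zeros(ar_range, dtype=np.intp)
        np.add.at(table, ar - ar_min, 1)      # count occurrences per value
        nz = np.nonzero(table)[0]
        return nz + ar_min, table[nz]

    table = np.zeros(ar_range, dtype=bool)
    table[ar - ar_min] = True                 # mark values that occur
    return np.nonzero(table)[0] + ar_min
```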
I can see speedups similar to those found with `np.isin` for large integral arrays.

Comments very much appreciated!
cc'ing everybody recently active in the `isin` PR: @seberg @WarrenWeckesser @hameerabbasi @shoyer @adeak

Mailing list discussion: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/3SJTDDF7D4VLZIUPQKHJMSYU2RVY3MHH/
Edit 1: Improved speed for `return_counts=True`.

Edit 2: No longer a work in progress.

Edit 3: Added mailing list link.