MAINT: Don't always make full copies in (arg)searchsorted. by ewmoore · Pull Request #16942 · numpy/numpy · GitHub

MAINT: Don't always make full copies in (arg)searchsorted. #16942

Open. ewmoore wants to merge 5 commits into base: main


ewmoore
Contributor
@ewmoore ewmoore commented Jul 24, 2020

When searchsorted is run on arrays with different dtypes, full copies of each array are currently made to a common dtype. This is rather inefficient, since only a few elements actually need to be converted. This PR makes searchsorted and argsearchsorted copy only when necessary. This would close gh-13579.
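The idea can be illustrated with a minimal pure-Python sketch. It is not the C implementation in this PR; `searchsorted_lazy` and its `convert` parameter are hypothetical names used only for illustration. The binary search converts just the elements it probes, so the number of conversions grows with the number of keys times log2 of the array length rather than with the array length itself.

```python
from array import array
from bisect import bisect_left


def searchsorted_lazy(haystack, keys, convert=int):
    """Return left-insertion indices of keys in a sorted haystack.

    Hypothetical helper: only the probed haystack elements are passed
    through convert(), instead of converting the whole haystack first.
    """
    conversions = 0
    indices = []
    for key in keys:
        lo, hi = 0, len(haystack)
        while lo < hi:
            mid = (lo + hi) // 2
            probe = convert(haystack[mid])  # convert a single element
            conversions += 1
            if probe < key:
                lo = mid + 1
            else:
                hi = mid
        indices.append(lo)
    return indices, conversions


# A sorted array of C ints searched with three 64-bit keys.
haystack = array("i", range(0, 2_000_000, 2))  # 1,000,000 elements
keys = array("q", [7, 1_000_000, 1_999_999])

indices, conversions = searchsorted_lazy(haystack, keys)
print(indices)      # insertion index for each key
print(conversions)  # far fewer than len(haystack)
```

With three keys and one million elements, the sketch converts at most 60 elements, whereas converting the whole array up front touches all one million.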

I've labeled this WIP since either the approach I'm using will be rejected, or we will need to move some more code around so that the dtype transfer functions are exposed enough to be called from within npysort. I'm not sure what is really gained by having the sorting code off by itself, so hopefully compiling it with the rest of multiarray will be fine. I don't think it is linked to anything else.

I've added a benchmark for these changes; the results are included below. It almost certainly has too many parameter sets to keep around forever, but since changes will be needed here anyway, that can change too. I'm not entirely sure why those three floating point loops are slowed down a little; everything else is as fast or faster.

This PR also changes runtests.py. Maybe this isn't needed, but --bench-compare wouldn't work for me without it.

Looking forward to your feedback.


       before           after         ratio
     [327327dd]       [0d82f47d]
     <ss_nocopy_orig~1>       <ss_nocopy_orig>
+         453±1μs          510±1μs     1.13  bench_function_base.SearchSorted.time_argsearchsorted(10000, 1000, 'sorted', ('f8', 'f8'))
+     70.1±0.02μs       76.1±0.1μs     1.08  bench_function_base.SearchSorted.time_argsearchsorted(1000, 1000, 'sorted', ('f8', 'f8'))
+       403±0.2μs          426±1μs     1.06  bench_function_base.SearchSorted.time_searchsorted(10000, 1000, 'sorted', ('f8', 'f8'))
-     18.7±0.02μs      17.8±0.05μs     0.95  bench_function_base.SearchSorted.time_argsearchsorted(100, 10000, 'random', ('i8', 'i8'))
-     1.63±0.01μs      1.54±0.01μs     0.95  bench_function_base.SearchSorted.time_searchsorted(1, 1000, 'sorted', ('i8', 'i8'))
-     4.46±0.02μs      4.23±0.02μs     0.95  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000, 'sorted', ('i8', 'i4'))
-         712±3μs          674±6μs     0.95  bench_function_base.SearchSorted.time_argsearchsorted(1000, 1000000, 'sorted', ('i8', 'i4'))
-     27.7±0.09μs      26.1±0.04μs     0.94  bench_function_base.SearchSorted.time_argsearchsorted(100, 100000, 'random', ('i8', 'i4'))
-     2.48±0.04μs      2.34±0.02μs     0.94  bench_function_base.SearchSorted.time_searchsorted(1, 10, 'sorted', ('i4', 'i8'))
-      26.5±0.1μs      24.9±0.07μs     0.94  bench_function_base.SearchSorted.time_argsearchsorted(100, 100000, 'random', ('i8', 'i8'))
-     16.1±0.09μs      15.1±0.02μs     0.94  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000, 'random', ('i4', 'u4'))
-     15.5±0.07μs      14.4±0.03μs     0.93  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000, 'random', ('i4', 'i8'))
-     6.50±0.05μs      6.05±0.03μs     0.93  bench_function_base.SearchSorted.time_argsearchsorted(100, 10, 'random', ('i4', 'i8'))
-     13.4±0.06μs      12.5±0.07μs     0.93  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000, 'random', ('i8', 'i4'))
-     2.58±0.02μs      2.40±0.03μs     0.93  bench_function_base.SearchSorted.time_searchsorted(10, 10, 'random', ('i4', 'i8'))
-     6.64±0.03μs      6.15±0.05μs     0.93  bench_function_base.SearchSorted.time_argsearchsorted(100, 10, 'random', ('i8', 'i4'))
-     12.2±0.04μs      11.3±0.06μs     0.93  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000, 'random', ('i8', 'i8'))
-     7.21±0.04μs      6.66±0.06μs     0.92  bench_function_base.SearchSorted.time_argsearchsorted(100, 10, 'random', ('i4', 'u4'))
-        3.63±0ms         3.30±0ms     0.91  bench_function_base.SearchSorted.time_searchsorted(100000, 1000, 'sorted', ('f8', 'f8'))
-     5.44±0.05μs      4.94±0.02μs     0.91  bench_function_base.SearchSorted.time_argsearchsorted(100, 10, 'random', ('i8', 'i8'))
-         385±2μs        296±0.5μs     0.77  bench_function_base.SearchSorted.time_argsearchsorted(1000, 100000, 'random', ('i4', 'i8'))
-     21.2±0.08ms      16.2±0.06ms     0.77  bench_function_base.SearchSorted.time_argsearchsorted(10000, 1000000, 'random', ('i4', 'i8'))
-         390±7μs        298±0.5μs     0.76  bench_function_base.SearchSorted.time_argsearchsorted(1000, 100000, 'random', ('i4', 'u4'))
-     21.4±0.07ms      16.3±0.07ms     0.76  bench_function_base.SearchSorted.time_argsearchsorted(10000, 1000000, 'random', ('i4', 'u4'))
-      33.7±0.2μs      25.3±0.05μs     0.75  bench_function_base.SearchSorted.time_argsearchsorted(100, 10000, 'random', ('i4', 'u4'))
-      32.9±0.1μs      24.5±0.08μs     0.74  bench_function_base.SearchSorted.time_argsearchsorted(100, 10000, 'random', ('i4', 'i8'))
-         270±4μs        197±0.3μs     0.73  bench_function_base.SearchSorted.time_argsearchsorted(1000, 100000, 'sorted', ('i4', 'i8'))
-       294±0.5μs        213±0.3μs     0.73  bench_function_base.SearchSorted.time_searchsorted(1000, 100000, 'random', ('i4', 'u4'))
-       292±0.6μs        211±0.3μs     0.72  bench_function_base.SearchSorted.time_searchsorted(1000, 100000, 'random', ('i4', 'i8'))
-         277±2μs        200±0.2μs     0.72  bench_function_base.SearchSorted.time_argsearchsorted(1000, 100000, 'sorted', ('i4', 'u4'))
-     10.3±0.06ms      7.43±0.02ms     0.72  bench_function_base.SearchSorted.time_argsearchsorted(10000, 1000000, 'sorted', ('i4', 'u4'))
-     10.3±0.05ms      7.36±0.02ms     0.72  bench_function_base.SearchSorted.time_argsearchsorted(10000, 1000000, 'sorted', ('i4', 'i8'))
-     7.17±0.06μs      4.99±0.04μs     0.70  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000, 'random', ('i4', 'u4'))
-      30.1±0.1μs      20.7±0.05μs     0.69  bench_function_base.SearchSorted.time_argsearchsorted(100, 10000, 'sorted', ('i4', 'u4'))
-     29.3±0.06μs      19.7±0.04μs     0.67  bench_function_base.SearchSorted.time_argsearchsorted(100, 10000, 'sorted', ('i4', 'i8'))
-     7.25±0.06μs      4.85±0.03μs     0.67  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000, 'sorted', ('i4', 'u4'))
-      27.2±0.1μs      18.2±0.03μs     0.67  bench_function_base.SearchSorted.time_searchsorted(100, 10000, 'random', ('i4', 'u4'))
-     26.5±0.09μs      17.4±0.03μs     0.66  bench_function_base.SearchSorted.time_searchsorted(100, 10000, 'random', ('i4', 'i8'))
-     6.50±0.06μs      4.17±0.01μs     0.64  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000, 'random', ('i4', 'i8'))
-     5.31±0.02μs      3.35±0.04μs     0.63  bench_function_base.SearchSorted.time_searchsorted(10, 1000, 'sorted', ('i4', 'u4'))
-       228±0.3μs        142±0.1μs     0.62  bench_function_base.SearchSorted.time_searchsorted(1000, 100000, 'sorted', ('i4', 'u4'))
-       225±0.2μs        140±0.1μs     0.62  bench_function_base.SearchSorted.time_searchsorted(1000, 100000, 'sorted', ('i4', 'i8'))
-     6.61±0.03μs      4.09±0.03μs     0.62  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000, 'sorted', ('i4', 'i8'))
-     5.29±0.03μs      3.27±0.05μs     0.62  bench_function_base.SearchSorted.time_searchsorted(10, 1000, 'random', ('i4', 'u4'))
-     24.8±0.05μs      14.8±0.02μs     0.60  bench_function_base.SearchSorted.time_searchsorted(100, 10000, 'sorted', ('i4', 'u4'))
-     6.56±0.03μs      3.87±0.01μs     0.59  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000, 'random', ('i4', 'u4'))
-     6.62±0.02μs      3.86±0.01μs     0.58  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000, 'sorted', ('i4', 'u4'))
-     4.69±0.03μs      2.73±0.01μs     0.58  bench_function_base.SearchSorted.time_searchsorted(10, 1000, 'sorted', ('i4', 'i8'))
-      24.2±0.1μs      14.0±0.04μs     0.58  bench_function_base.SearchSorted.time_searchsorted(100, 10000, 'sorted', ('i4', 'i8'))
-     4.68±0.02μs      2.65±0.02μs     0.57  bench_function_base.SearchSorted.time_searchsorted(10, 1000, 'random', ('i4', 'i8'))
-     5.90±0.07μs      3.11±0.03μs     0.53  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000, 'random', ('i4', 'i8'))
-     5.95±0.05μs      3.09±0.02μs     0.52  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000, 'sorted', ('i4', 'i8'))
-     4.95±0.02μs      2.41±0.06μs     0.49  bench_function_base.SearchSorted.time_searchsorted(1, 1000, 'random', ('i4', 'u4'))
-     4.96±0.03μs      2.39±0.06μs     0.48  bench_function_base.SearchSorted.time_searchsorted(1, 1000, 'sorted', ('i4', 'u4'))
-     4.39±0.01μs      1.80±0.03μs     0.41  bench_function_base.SearchSorted.time_searchsorted(1, 1000, 'random', ('i4', 'i8'))
-     4.42±0.03μs      1.79±0.03μs     0.41  bench_function_base.SearchSorted.time_searchsorted(1, 1000, 'sorted', ('i4', 'i8'))
-      11.9±0.1ms      4.54±0.03ms     0.38  bench_function_base.SearchSorted.time_searchsorted(10000, 1000000, 'random', ('i4', 'u4'))
-     11.7±0.08ms      4.41±0.05ms     0.38  bench_function_base.SearchSorted.time_searchsorted(10000, 1000000, 'random', ('i4', 'i8'))
-     7.05±0.04ms      2.56±0.01ms     0.36  bench_function_base.SearchSorted.time_searchsorted(10000, 1000000, 'sorted', ('i4', 'u4'))
-     6.97±0.02ms      2.53±0.02ms     0.36  bench_function_base.SearchSorted.time_searchsorted(10000, 1000000, 'sorted', ('i4', 'i8'))
-     18.8±0.06μs      5.72±0.04μs     0.30  bench_function_base.SearchSorted.time_argsearchsorted(10, 10000, 'random', ('i4', 'u4'))
-     19.0±0.07μs      5.73±0.03μs     0.30  bench_function_base.SearchSorted.time_argsearchsorted(10, 10000, 'sorted', ('i4', 'u4'))
-     18.2±0.06μs      4.90±0.02μs     0.27  bench_function_base.SearchSorted.time_argsearchsorted(10, 10000, 'random', ('i4', 'i8'))
-     18.3±0.07μs      4.91±0.02μs     0.27  bench_function_base.SearchSorted.time_argsearchsorted(10, 10000, 'sorted', ('i4', 'i8'))
-     17.0±0.03μs      3.80±0.06μs     0.22  bench_function_base.SearchSorted.time_searchsorted(10, 10000, 'random', ('i4', 'u4'))
-         155±1μs      34.0±0.06μs     0.22  bench_function_base.SearchSorted.time_argsearchsorted(100, 100000, 'random', ('i4', 'u4'))
-     18.0±0.06μs      3.91±0.03μs     0.22  bench_function_base.SearchSorted.time_argsearchsorted(1, 10000, 'random', ('i4', 'u4'))
-       154±0.2μs      33.1±0.08μs     0.22  bench_function_base.SearchSorted.time_argsearchsorted(100, 100000, 'random', ('i4', 'i8'))
-     17.0±0.04μs      3.64±0.07μs     0.21  bench_function_base.SearchSorted.time_searchsorted(10, 10000, 'sorted', ('i4', 'u4'))
-     18.0±0.09μs      3.87±0.03μs     0.21  bench_function_base.SearchSorted.time_argsearchsorted(1, 10000, 'sorted', ('i4', 'u4'))
-     16.4±0.06μs      3.14±0.02μs     0.19  bench_function_base.SearchSorted.time_searchsorted(10, 10000, 'random', ('i4', 'i8'))
-       149±0.3μs      27.9±0.06μs     0.19  bench_function_base.SearchSorted.time_argsearchsorted(100, 100000, 'sorted', ('i4', 'u4'))
-     16.3±0.08μs      3.03±0.01μs     0.19  bench_function_base.SearchSorted.time_searchsorted(10, 10000, 'sorted', ('i4', 'i8'))
-     17.3±0.05μs      3.14±0.01μs     0.18  bench_function_base.SearchSorted.time_argsearchsorted(1, 10000, 'random', ('i4', 'i8'))
-       148±0.3μs      26.9±0.08μs     0.18  bench_function_base.SearchSorted.time_argsearchsorted(100, 100000, 'sorted', ('i4', 'i8'))
-     17.3±0.04μs      3.13±0.04μs     0.18  bench_function_base.SearchSorted.time_argsearchsorted(1, 10000, 'sorted', ('i4', 'i8'))
-     5.74±0.03ms          969±7μs     0.17  bench_function_base.SearchSorted.time_argsearchsorted(1000, 1000000, 'random', ('i4', 'u4'))
-       144±0.2μs      23.7±0.05μs     0.16  bench_function_base.SearchSorted.time_searchsorted(100, 100000, 'random', ('i4', 'u4'))
-     5.75±0.02ms         944±10μs     0.16  bench_function_base.SearchSorted.time_argsearchsorted(1000, 1000000, 'random', ('i4', 'i8'))
-       143±0.2μs      23.0±0.08μs     0.16  bench_function_base.SearchSorted.time_searchsorted(100, 100000, 'random', ('i4', 'i8'))
-     16.3±0.04μs      2.49±0.06μs     0.15  bench_function_base.SearchSorted.time_searchsorted(1, 10000, 'random', ('i4', 'u4'))
-     16.4±0.04μs      2.49±0.03μs     0.15  bench_function_base.SearchSorted.time_searchsorted(1, 10000, 'sorted', ('i4', 'u4'))
-       141±0.2μs      19.3±0.06μs     0.14  bench_function_base.SearchSorted.time_searchsorted(100, 100000, 'sorted', ('i4', 'u4'))
-       140±0.2μs      18.6±0.09μs     0.13  bench_function_base.SearchSorted.time_searchsorted(100, 100000, 'sorted', ('i4', 'i8'))
-     15.7±0.03μs      1.86±0.03μs     0.12  bench_function_base.SearchSorted.time_searchsorted(1, 10000, 'random', ('i4', 'i8'))
-     15.7±0.07μs      1.84±0.01μs     0.12  bench_function_base.SearchSorted.time_searchsorted(1, 10000, 'sorted', ('i4', 'i8'))
-     5.07±0.02ms         413±10μs     0.08  bench_function_base.SearchSorted.time_argsearchsorted(1000, 1000000, 'sorted', ('i4', 'u4'))
-     5.05±0.02ms          394±8μs     0.08  bench_function_base.SearchSorted.time_argsearchsorted(1000, 1000000, 'sorted', ('i4', 'i8'))
-     4.62±0.04ms        291±0.3μs     0.06  bench_function_base.SearchSorted.time_searchsorted(1000, 1000000, 'random', ('i4', 'u4'))
-     4.67±0.02ms        287±0.5μs     0.06  bench_function_base.SearchSorted.time_searchsorted(1000, 1000000, 'random', ('i4', 'i8'))
-       133±0.1μs      6.90±0.04μs     0.05  bench_function_base.SearchSorted.time_argsearchsorted(10, 100000, 'random', ('i4', 'u4'))
-       132±0.2μs      6.60±0.05μs     0.05  bench_function_base.SearchSorted.time_argsearchsorted(10, 100000, 'sorted', ('i4', 'u4'))
-     4.42±0.04ms        209±0.2μs     0.05  bench_function_base.SearchSorted.time_searchsorted(1000, 1000000, 'sorted', ('i4', 'u4'))
-     4.46±0.02ms        206±0.3μs     0.05  bench_function_base.SearchSorted.time_searchsorted(1000, 1000000, 'sorted', ('i4', 'i8'))
-       132±0.2μs      6.09±0.07μs     0.05  bench_function_base.SearchSorted.time_argsearchsorted(10, 100000, 'random', ('i4', 'i8'))
-       132±0.2μs      5.79±0.08μs     0.04  bench_function_base.SearchSorted.time_argsearchsorted(10, 100000, 'sorted', ('i4', 'i8'))
-       130±0.2μs      4.33±0.03μs     0.03  bench_function_base.SearchSorted.time_searchsorted(10, 100000, 'random', ('i4', 'u4'))
-       130±0.2μs      4.03±0.09μs     0.03  bench_function_base.SearchSorted.time_searchsorted(10, 100000, 'sorted', ('i4', 'u4'))
-       131±0.3μs      3.92±0.02μs     0.03  bench_function_base.SearchSorted.time_argsearchsorted(1, 100000, 'sorted', ('i4', 'u4'))
-       131±0.2μs      3.90±0.03μs     0.03  bench_function_base.SearchSorted.time_argsearchsorted(1, 100000, 'random', ('i4', 'u4'))
-       130±0.1μs      3.69±0.02μs     0.03  bench_function_base.SearchSorted.time_searchsorted(10, 100000, 'random', ('i4', 'i8'))
-       129±0.2μs      3.36±0.03μs     0.03  bench_function_base.SearchSorted.time_searchsorted(10, 100000, 'sorted', ('i4', 'i8'))
-       130±0.3μs      3.19±0.03μs     0.02  bench_function_base.SearchSorted.time_argsearchsorted(1, 100000, 'random', ('i4', 'i8'))
-       130±0.2μs      3.18±0.03μs     0.02  bench_function_base.SearchSorted.time_argsearchsorted(1, 100000, 'sorted', ('i4', 'i8'))
-       129±0.2μs      2.47±0.06μs     0.02  bench_function_base.SearchSorted.time_searchsorted(1, 100000, 'sorted', ('i4', 'u4'))
-       129±0.3μs      2.46±0.06μs     0.02  bench_function_base.SearchSorted.time_searchsorted(1, 100000, 'random', ('i4', 'u4'))
-       128±0.2μs      1.87±0.02μs     0.01  bench_function_base.SearchSorted.time_searchsorted(1, 100000, 'sorted', ('i4', 'i8'))
-       129±0.2μs      1.86±0.02μs     0.01  bench_function_base.SearchSorted.time_searchsorted(1, 100000, 'random', ('i4', 'i8'))
-     4.12±0.03ms       40.7±0.5μs     0.01  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000000, 'random', ('i4', 'u4'))
-     4.15±0.02ms       39.7±0.6μs     0.01  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000000, 'random', ('i4', 'i8'))
-     4.11±0.02ms       36.4±0.4μs     0.01  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000000, 'sorted', ('i4', 'u4'))
-     4.11±0.02ms       35.5±0.3μs     0.01  bench_function_base.SearchSorted.time_argsearchsorted(100, 1000000, 'sorted', ('i4', 'i8'))
-     4.04±0.01ms       29.1±0.2μs     0.01  bench_function_base.SearchSorted.time_searchsorted(100, 1000000, 'random', ('i4', 'u4'))
-     4.02±0.02ms       28.4±0.1μs     0.01  bench_function_base.SearchSorted.time_searchsorted(100, 1000000, 'random', ('i4', 'i8'))
-     4.02±0.02ms      25.9±0.06μs     0.01  bench_function_base.SearchSorted.time_searchsorted(100, 1000000, 'sorted', ('i4', 'u4'))
-     4.04±0.03ms      25.2±0.08μs     0.01  bench_function_base.SearchSorted.time_searchsorted(100, 1000000, 'sorted', ('i4', 'i8'))
-     3.93±0.02ms      7.20±0.08μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000000, 'random', ('i4', 'u4'))
-     3.96±0.01ms      7.10±0.08μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000000, 'sorted', ('i4', 'u4'))
-     3.94±0.04ms      6.39±0.09μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000000, 'random', ('i4', 'i8'))
-     3.97±0.03ms       6.32±0.1μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(10, 1000000, 'sorted', ('i4', 'i8'))
-     3.96±0.03ms       4.72±0.1μs     0.00  bench_function_base.SearchSorted.time_searchsorted(10, 1000000, 'sorted', ('i4', 'u4'))
-     3.93±0.02ms      4.67±0.04μs     0.00  bench_function_base.SearchSorted.time_searchsorted(10, 1000000, 'random', ('i4', 'u4'))
-     3.95±0.02ms      4.01±0.06μs     0.00  bench_function_base.SearchSorted.time_searchsorted(10, 1000000, 'sorted', ('i4', 'i8'))
-     3.94±0.04ms      3.99±0.02μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000000, 'sorted', ('i4', 'u4'))
-     3.93±0.02ms      3.97±0.03μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000000, 'random', ('i4', 'u4'))
-     3.96±0.03ms      3.95±0.03μs     0.00  bench_function_base.SearchSorted.time_searchsorted(10, 1000000, 'random', ('i4', 'i8'))
-     3.95±0.01ms      3.24±0.03μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000000, 'random', ('i4', 'i8'))
-     3.96±0.03ms      3.23±0.04μs     0.00  bench_function_base.SearchSorted.time_argsearchsorted(1, 1000000, 'sorted', ('i4', 'i8'))
-     3.94±0.02ms      2.53±0.06μs     0.00  bench_function_base.SearchSorted.time_searchsorted(1, 1000000, 'random', ('i4', 'u4'))
-     3.96±0.02ms      2.54±0.07μs     0.00  bench_function_base.SearchSorted.time_searchsorted(1, 1000000, 'sorted', ('i4', 'u4'))
-     3.98±0.03ms      1.93±0.05μs     0.00  bench_function_base.SearchSorted.time_searchsorted(1, 1000000, 'random', ('i4', 'i8'))
-     3.94±0.03ms      1.90±0.06μs     0.00  bench_function_base.SearchSorted.time_searchsorted(1, 1000000, 'sorted', ('i4', 'i8'))

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
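As a reading aid for the table above: the ratio column is the "after" time divided by the "before" time, so values below 1 are speedups and values above 1 are slowdowns. The short sketch below recomputes a few rows from the timings shown; `asv_ratio` is a hypothetical helper name, not part of asv.

```python
def asv_ratio(before, after):
    """Ratio as shown in the table: after / before, in the same time unit."""
    return round(after / before, 2)


# Slowest regression in the table: ('f8', 'f8'), 453 us -> 510 us.
regression = asv_ratio(453, 510)

# A mixed-dtype row: ('i4', 'i8'), 4.39 us -> 1.80 us.
improvement = asv_ratio(4.39, 1.80)

# Last row: 3.94 ms -> 1.90 us. The ratio rounds to 0.00, so the
# speedup factor (before / after) is more informative here.
last_ratio = asv_ratio(3.94e3, 1.90)
last_speedup = 3.94e3 / 1.90

print(regression, improvement, last_ratio, round(last_speedup))
```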


ewmoore added 2 commits July 30, 2020 22:35
This will no longer build npysort as a library that is linked into
multiarray. This library was not installed and was not linked to
anything else.
@ewmoore ewmoore changed the title WIP: Don't always make full copies in (arg)searchsorted. Don't always make full copies in (arg)searchsorted. Jul 31, 2020
@ewmoore
Contributor Author
ewmoore commented Jul 31, 2020

I've adjusted the build so that npysort is no longer a library that is built and then linked into multiarray. I've removed the WIP label from the title. From my end this is complete and ready to go.

@charris charris changed the title Don't always make full copies in (arg)searchsorted. MAINT: Don't always make full copies in (arg)searchsorted. Aug 2, 2020
@seberg
Member
seberg commented Aug 3, 2020

It may be good to have a few more tests with mixed types, but that is maybe something for when everything is settled, which will probably take a while. At first sight, the approach seems sound. I am curious whether we could, in theory, translate it to a gufunc. It seems a bit like we can, but only if we move the casting into the gufunc itself (for these types) rather than having it handled by the gufunc machinery. That would come with the issue that broadcast searches may cast multiple times.

In any case, merging the libraries seems good. The main worries I would have are whether it's worth the added complexity (in real life) and the fact that, in theory, the transfer might fail for some combination of types. That is something I should look into soon anyway: retrofitting the transfer functions with an error return.

Other than that, it would be nice to have someone with more sorting experience have a first look at this.

@mattip
Member
mattip commented Aug 13, 2020

So this PR does a few things:

  • cleans up the tail end of merging _multiarray and _umath by not building npysort as a separate library, and moves copycast_isaligned to the common directory
  • changes runtests
  • adds benchmarks
  • adds PyArray_TransferBinSearchFunc and PyArray_TransferArgBinSearchFunc and changes the interfaces of get_argbinsearch_func and get_binsearch_func

It would be easier to review if these were broken into separate PRs, maybe the last two should remain together.

Base automatically changed from master to main March 4, 2021 02:05
@InessaPawson
Member

@ewmoore I can see that you started splitting this PR into multiple ones, as @mattip advised. Should we close this PR then?

@seberg seberg added the 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. label Jan 11, 2023
@mattip
Member
mattip commented Jan 11, 2023

This seems worthwhile. @ewmoore do you want to continue or does anyone else want to take this over (preserving the current commits and attribution)?

@seberg seberg added 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. and removed 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. labels Jan 11, 2023
Labels
03 - Maintenance 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. component: numpy._core
Development

Successfully merging this pull request may close these issues.

Speed problem for searchsorted when different integer dtypes
5 participants