Description
Reproducing code example:
>>> n = 1_000_000
>>> arr = np.array([1] * n + [2] * n + [3] * n, dtype='int8')
>>> %timeit arr.searchsorted(2) # search for a python int
7.38 ms # slow
>>> code = np.int16(1) # numpy int, but different dtype
>>> %timeit arr.searchsorted(code)
4.04 ms # slow
>>> code = np.int8(1)
>>> %timeit arr.searchsorted(code) # same dtype
2.57 µs # fast
The cause of the slowness is that searchsorted upcasts the input array arr
and value v
to the same dtype, if they have different dtypes, and this costs time. This is not always needed.
In Pandas we've worked around this by making special integer type checks in a custom version of searchsorted, but IMO everyone would benefit if this/something similar was put into numpy instead.
See https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L1726 for the pandas version. The relevant PR is pandas-dev/pandas#22034. The pandas version of runs at 17.3 µs. This is slower than the optimal numpy version, probably because it's done in python + makes some checks that likely wouldn't be needed to do in in numpy.
BTW, I wouldn't be able to implement this in numpy, because I don't know C...
Numpy/Python version information:
1.15.4 3.6.7 |Anaconda, Inc.| (default, Oct 28 2018, 19:44:12) [MSC v.1915 64 bit (AMD64)]