Speed problem for searchsorted when different integer dtypes #13579
Comments
I'm not sure about the code. To really speed things up for such mismatches, you would have to decide that the needle is small and only cast each element as it is needed (at least that would be my intuition).
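To illustrate the intuition above, here is a minimal pure-Python sketch, not NumPy's actual C implementation. The function name searchsorted_lazy is hypothetical. It assumes a sorted 1-D integer array and a scalar integer needle, and it converts only the elements the binary search actually visits instead of casting the whole array up front.

```python
import numpy as np

def searchsorted_lazy(arr, v):
    """Hypothetical sketch: left-side binary search that converts only
    the visited elements, so the array itself is never upcast."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        # Convert a single element to a Python int for the comparison.
        if int(arr[mid]) < v:
            lo = mid + 1
        else:
            hi = mid
    return lo

arr = np.array([1, 3, 5, 7, 9], dtype=np.int8)
print(searchsorted_lazy(arr, 6))  # 3
```

A binary search visits only about log2(n) elements, so per-element conversion touches far less data than casting all n elements. In pure Python the loop overhead outweighs that saving, so this only shows the idea, not a usable speed-up.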
This was previously reported in gh-5370, which has a similar discussion (closed that in favor of this one).
Reproducing code example:
The cause of the slowness is that searchsorted upcasts the input array arr and the value v to a common dtype if they have different dtypes, and this costs time. The upcast is not always needed.
In Pandas we've worked around this by making special integer type checks in a custom version of searchsorted, but IMO everyone would benefit if this or something similar was put into numpy instead.
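As a rough illustration of the kind of integer type check described above, here is a minimal sketch. It is not the pandas implementation, and the helper name searchsorted_same_dtype is hypothetical. It assumes a sorted 1-D integer array and a scalar integer value: if the value fits in the array's dtype, the value is cast down to that dtype so the array does not have to be upcast.

```python
import numpy as np

def searchsorted_same_dtype(arr, v, side="left"):
    """Hypothetical helper: cast an integer scalar v to arr's dtype
    when it fits, so searchsorted sees matching dtypes."""
    if np.issubdtype(arr.dtype, np.integer) and isinstance(v, (int, np.integer)):
        info = np.iinfo(arr.dtype)
        if info.min <= int(v) <= info.max:
            # v is representable in arr's dtype, so casting it is lossless.
            v = arr.dtype.type(int(v))
    # Otherwise fall back to NumPy's normal behaviour.
    return arr.searchsorted(v, side=side)

arr = np.arange(100, dtype=np.int8)
print(searchsorted_same_dtype(arr, np.int64(50)))  # 50
```

When the value does not fit in the array's dtype, the sketch leaves it untouched and lets NumPy handle the mismatch as usual, so results stay the same and only the in-range case takes the cheaper path.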
See https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L1726 for the pandas version. The relevant PR is pandas-dev/pandas#22034. The pandas version runs at 17.3 µs. This is slower than the optimal numpy version, probably because it is done in Python and makes some checks that likely wouldn't be needed in numpy.
BTW, I wouldn't be able to implement this in numpy, because I don't know C...
Numpy/Python version information:
1.15.4 3.6.7 |Anaconda, Inc.| (default, Oct 28 2018, 19:44:12) [MSC v.1915 64 bit (AMD64)]