-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
MAINT: Don't always make full copies in (arg)searchsorted. #16942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This will no longer build npysort as library that is linked into multiarray. This library was not installed and was not linked to anything else.
I've adjusted the build so that npysort is no longer a library that is built and then linked into multiarray. I've removed the WIP label in the title. From my end this complete and ready to go. |
It may be good to have a few more tests with mixed types, but maybe something for when everything is settled, which will probably take a while. On first sight, the approach seems sound, I am curious if we can translate it to a gufunc in theory, and it seems a bit like we can, but only if we move the casting into the gufunc itself (for these types) and not have it handled by the gufunc machinery. That would come with the issue that broadcasted searches may cast multiple times. In any case, merging the libraries seems good, the main worry I would have is whether its worth the added complexity (in real life) and the fact that in theory the transfer might be able to fail for some combination of types. Which is one thing I should look into soon, to retrofit the transfer functions with an error return, though. Other than that, it would be nice to have someone with more sorting experience have a first look at this. |
So this PR does a few things:
It would be easier to review if these were broken into separate PRs, maybe the last two should remain together. |
This seems worthwhile. @ewmoore do you want to continue or does anyone else want to take this over (preserving the current commits and attribution)? |
When running
searchsorted
on arrays with different dtypes currently full copies are made of of each to a common dtype. This is rather inefficient since we really only need a few elements to be converted. This makessearchsorted
andargsearchsorted
only copy when necessary. This would close gh-13579.I've labeled this WIP since either the approach I'm using will be rejected or we will need to move some more code around so that the dtype transfer functions are expose enough that they can be called from with npysort. I'm not sure what is gained really by having the sorting code off by itself, so hopefully compiling it with the rest of multiarray will be fine. I don't think it is linked to anything else.
I've added a benchmark for these changes. The results are included below. It almost certainly has too many sets of parameters for keeping around forever, but since there will be changes needed here, that can change too. I'm not entirely sure why those three floating point loops are slowed a little bit, everything is as fast or faster.
This PR also has a change to
runtests.py
, maybe this isn't needed, but --bench-compare wouldn't work for me without it.Looking forward to your feedback.