-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
argsort/sort with O(n) algorithms: counting-sort, radix-sort #6050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree 8000 to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I've been toying with the idea of adding some of the O(n) sorting algorithms to NumPy for some time now: counting sort for u/int8s and bools, radix sort for larger ints, timsort for object arrays... The most difficult thing to do is figure out how to adapt the current NumPy API to fit the new functionality without cluttering it to much. At some point I was thinking on adding to the current options of quicksort, mergesort, heapsort, that sort and argsort provide, two more options: fastest and stable, probably with better names. The idea is that these two new selectors do not have to point to new implementations (you could map fastest to quicksort and stable to mergesort and be done with it), and if they do, they do so in a more or less concealed manner, so that the internals can be changed at will without breaking any user code. |
If thinking about such changes then it would also be good to keep in mind
|
Everyone loves a super fast sort! In terms of API, why not keep the old
|
I realize this is an old issue, but this will be super useful. I have an application where things to be sorted are integers and sorting takes 70+% of the time.
I'd much rather prefer that for |
There are good implementations of radix/counting sort/argsort in this repo: https://github.com/eloj/radix-sorting O(n) integer sort is important for PyData/Sparse as a lot of time is spent on integer sorting in many operations. See pydata/sparse#52. It’s also important to Zarr (zarr-developers/zarr-python#236) Ideally, radix sort should be the default sort for integer arrays. The reason is that it’s O(n) instead of O(n log n). Benchmarks in the repository I linked show as much as a 7x speed up for moderately sized arrays. Radix sort is also stable, which means it doesn’t change the relative ordering of any equal elements. The only downside to radix sort is that it isn’t in-place: it needs a temporary array to get the work done. |
You can already sort with couting-sort using numpy: as stated in an SO answer, it is simply:
But there is no way (as far as I can tell) to do an argsort.
I think it would be fairly easy to implement and slot into the argsort/sort documentation. The benefit of counting sort is that it's O(n), although it only works for not-too-large non-negative integers...however those are actually quite common in the real world! Pandas uses it behind the scenes apparently.
The text was updated successfully, but these errors were encountered: