-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Vectorize np.sort and np.partition with AVX2 #25045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -38,7 +38,7 @@ def test_argequivalent(self): | |||
(np.sort, np.argsort, dict()), | |||
(_add_keepdims(np.min), _add_keepdims(np.argmin), dict()), | |||
(_add_keepdims(np.max), _add_keepdims(np.argmax), dict()), | |||
(np.partition, np.argpartition, dict(kth=2)), | |||
#(np.partition, np.argpartition, dict(kth=2)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure what to do with this test. When there is no unique defined output to np.partition
or np.argpartition
, we obviously cannot expect their outputs to match with each other, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we obviously cannot expect their outputs to match with each other, right?
I agree since the order is supposed to be undefined.
Commit f92e3b3 updates test |
Rebased after #24018. Also split highway dispatch to a separate file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just scrolled over this, think the split looks good but maybe we can avoid the last pointer check.
Cygwin failure seems unrelated. |
Rebased with main to fix the cygwin CI failure. |
This reverts commit 76d5534.
Perf improvements to AVX2 sorting: see intel/x86-simd-sort#104
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but it'd be good to restart CI with this month's credits. @ngoldbaum, do you have buttons for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you Raghuveer!. Could you please make a few changes as suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done, Thank you!
Thank you @r-devulap. |
Does this need a release note? |
You could add an improvement release note, won't hurt. |
Adds AVX2 implementations of np.sort and np.partition for 32-bit and 64-bit dtypes. Speeds up 32-bit sort by up to 13x and 64-bit sort by up to 7x.
This patch includes commits from #24924 (makes it easier to rebase later).
EDIT: updated benchmark numbers after we made a fewmore improvments to 64-bit AVX2 sorting in intel/x86-simd-sort#99.
Detailed benchmarks are here. https://gist.github.com/r-devulap/4d55b5c1909a7ed0743746cf719bf9d2