-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Use openmp on x86-simd-sort to speed up np.sort and np.argsort #28619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
130cac5
Update x86-simd-sort module to latest
r-devulap 7fd938b
BLD: Add openmp flags to build x86-simd-sort
r-devulap f8cfa4e
ENH: Update x86-simd-sort to port openmp support for argsort
r-devulap 02c4728
Add meson option to toggle building with openMP
r-devulap 21bc19f
TST: Add np.argsort test for openmp paths
r-devulap e425de8
CI: Add openmp flags to test openMP code paths
r-devulap ac59ea9
Update x86-simd-sort: detect already sorted arrays for np.argsort
r-devulap 8ba425e
DOCS: add release notes
r-devulap e0f0247
Minor changes to meson.build
r-devulap 6eff29e
Initialize omp to empty variable
r-devulap File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
Building NumPy with OpenMP Parallelization | ||
------------------------------------------- | ||
NumPy now supports OpenMP parallel processing capabilities when built with the | ||
``-Denable_openmp=true`` Meson build flag. This feature is disabled by default. | ||
When enabled, ``np.sort`` and ``np.argsort`` functions can utilize OpenMP for | ||
parallel thread execution, improving performance for these operations. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Performance improvements to ``np.sort`` and ``np.argsort`` | ||
---------------------------------------------------------- | ||
``np.sort`` and ``np.argsort`` functions now can leverage OpenMP for parallel | ||
thread execution, resulting in up to 3.5x speedups on x86 architectures with | ||
AVX2 or AVX-512 instructions. This opt-in feature requires NumPy to be built | ||
with the -Denable_openmp Meson flag. Users can control the number of threads | ||
used by setting the OMP_NUM_THREADS environment variable. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Submodule x86-simd-sort
updated
34 files
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we "all good" to use OpenMP in NumPy directly? I thought there were other ecosystem interaction concerns to consider, like cross-interactions with OpenBLAS or wheel-related issues, etc. Maybe this is just for a custom local build rather than for activation in wheels.
If we are actually "ok" for that, I guess that in addition to the env variable Sayed mentioned there is also
threadpoolctl
where one might need to modulate both with something likewith controller.limit(limits={"openblas": 2, "openmp": 4})
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with implications of using openMP and how it could potentially interact with other modules. I was hoping to get an answer to that via the pull request and everyone's input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenBLAS usually manages OpenMP itself, which creates a bit of confusion when nesting it.
OpenBLAS now has:
https://github.com/OpenMathLib/OpenBLAS/blob/develop/driver/others/blas_server_callback.c
Which we can use if we have a central thread pool we want to re-use across multiple things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ogrisel I was wondering if you know anything about this (i.e. whether it is safe to use OpenMP in NumPy, or it would create issues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
libopenblas
we ship inside our wheels is always built with pthreads, not openmp. Build scripts live at https://github.com/MacPython/openblas-libs/tree/main/tools.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scikit-learn does run into issues with running OpenMP's threadpool together with OpenBLAS: scikit-learn/scikit-learn#28883
There is a draft PR here to force OpenBLAS to use OpenMP (alway from pthreads): scikit-learn/scikit-learn#29403
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are some quite ugly issues with multiple installed openmp libraries and segfaults depending on import order, see
microsoft/LightGBM#6595