-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Use Highway with Static Dispatch for some types in Absolute #24385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see this simplified example :D
9726c37
to
8939670
Compare
const npy_intp ssrc = steps[0] / lsize; | ||
const npy_intp sdst = steps[1] / lsize; | ||
auto load_index = hwy::AllocateAligned<TI>(hn::Lanes(d)); | ||
for (size_t i = 0; i < hn::Lanes(d); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could actually vectorize this, just Mul() for <= 32 bit types, and perhaps converting to double and then Mul of that for 64-bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example of how to specialize a generic function for certain types:
https://github.com/libjxl/libjxl/blob/e6202f7181eff36c78bfdb79aa9bd45c3d1d614b/lib/jxl/rational_polynomial-inl.h#L29
numpy/core/meson.build
Outdated
@@ -79,6 +79,10 @@ endif | |||
# important, because code in NumPy typically does not check the value but only | |||
# whether the symbol is defined. So `#define HAVE_ 8000 SOMETHING 0` is wrong. | |||
|
|||
#cmake = import('cmake') | |||
#hwy = cmake.subproject('highway') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mousius was this useful for local debugging? I'm working on Meson subproject usage elsewhere, so I'm curious about the details here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could have been more useful. I wanted to use the meson integration with cmake to be cleaner, but I ended up just adding the files manually as there seemed to be a lot of steps involved.
I tried to build this branch, to test with it a little bit. I had trouble checking out the submodule, since this points at a commit that's not part of the main repo. That can be updated now though, since the Gather/Scatter PR was merged. A lot of CI jobs failed with:
and after merging |
Ack. Apologies, I've been focussing on other things, I didn't originally do the spike on x86 so didn't pick this up, I'll take a look and follow up. |
Demonstrates how this fits together with the user toggles within the existing system.
8939670
to
84695e0
Compare
@rgommers updated this, took me longer than I expected to unpick the new dispatch system. The issue was that I had not made sure all the values line up across the various dispatch files, so Because I've taken examples from both |
Thanks @Mousius, that works for me now. I'll try to play with this branch a bit. |
This variant on #24384 uses the static dispatch within Highway, but the dynamic dispatch exists in NumPy.
Similarly, loads and stores can only be contiguous rather than supporting all potential layouts (waiting for the Scatter equivalent for google/highway@c6c09c4, then it should be trivial).