Use Highway with Static Dispatch for some types in Absolute#24385
Use Highway with Static Dispatch for some types in Absolute#24385Mousius wants to merge 7 commits intonumpy:mainfrom
Conversation
There was a problem hiding this comment.
Nice to see this simplified example :D
9726c37 to
8939670
Compare
| const npy_intp ssrc = steps[0] / lsize; | ||
| const npy_intp sdst = steps[1] / lsize; | ||
| auto load_index = hwy::AllocateAligned<TI>(hn::Lanes(d)); | ||
| for (size_t i = 0; i < hn::Lanes(d); ++i) { |
There was a problem hiding this comment.
We could actually vectorize this, just Mul() for <= 32 bit types, and perhaps converting to double and then Mul of that for 64-bit?
There was a problem hiding this comment.
Here's an example of how to specialize a generic function for certain types:
https://github.com/libjxl/libjxl/blob/e6202f7181eff36c78bfdb79aa9bd45c3d1d614b/lib/jxl/rational_polynomial-inl.h#L29
numpy/core/meson.build
Outdated
| # whether the symbol is defined. So `#define HAVE_SOMETHING 0` is wrong. | 8000||
|
|
||
| #cmake = import('cmake') | ||
| #hwy = cmake.subproject('highway') |
There was a problem hiding this comment.
@Mousius was this useful for local debugging? I'm working on Meson subproject usage elsewhere, so I'm curious about the details here.
There was a problem hiding this comment.
This could have been more useful. I wanted to use the meson integration with cmake to be cleaner, but I ended up just adding the files manually as there seemed to be a lot of steps involved.
|
I tried to build this branch, to test with it a little bit. I had trouble checking out the submodule, since this points at a commit that's not part of the main repo. That can be updated now though, since the Gather/Scatter PR was merged. A lot of CI jobs failed with: and after merging |
Ack. Apologies, I've been focussing on other things, I didn't originally do the spike on x86 so didn't pick this up, I'll take a look and follow up. |
Demonstrates how this fits together with the user toggles within the existing system.
8939670 to
84695e0
Compare
|
@rgommers updated this, took me longer than I expected to unpick the new dispatch system. The issue was that I had not made sure all the values line up across the various dispatch files, so Because I've taken examples from both |
|
Thanks @Mousius, that works for me now. I'll try to play with this branch a bit. |
This variant on #24384 uses the static dispatch within Highway, but the dynamic dispatch exists in NumPy.
Similarly, loads and stores can only be contiguous rather than supporting all potential layouts (waiting for the Scatter equivalent for google/highway@c6c09c4, then it should be trivial).