8000 Use Highway with Static Dispatch for some types in Absolute by Mousius · Pull Request #24385 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content
8000

Use Highway with Static Dispatch for some types in Absolute #24385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

Mousius
Copy link
Member
@Mousius Mousius commented Aug 10, 2023

This variant on #24384 uses the static dispatch within Highway, but the dynamic dispatch exists in NumPy.

Similarly, loads and stores can only be contiguous rather than supporting all potential layouts (waiting for the Scatter equivalent for google/highway@c6c09c4, then it should be trivial).

@Mousius
Copy link
Member Author
Mousius commented Aug 10, 2023

Copy link
Contributor
@jan-wassenberg jan-wassenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see this simplified example :D

@Mousius Mousius force-pushed the hwy-spike-static branch 3 times, most recently from 9726c37 to 8939670 Compare August 11, 2023 16:32
const npy_intp ssrc = steps[0] / lsize;
const npy_intp sdst = steps[1] / lsize;
auto load_index = hwy::AllocateAligned<TI>(hn::Lanes(d));
for (size_t i = 0; i < hn::Lanes(d); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually vectorize this, just Mul() for <= 32 bit types, and perhaps converting to double and then Mul of that for 64-bit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of how to specialize a generic function for certain types:
https://github.com/libjxl/libjxl/blob/e6202f7181eff36c78bfdb79aa9bd45c3d1d614b/lib/jxl/rational_polynomial-inl.h#L29

@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Aug 28, 2023
@@ -79,6 +79,10 @@ endif
# important, because code in NumPy typically does not check the value but only
# whether the symbol is defined. So `#define HAVE_ 8000 SOMETHING 0` is wrong.

#cmake = import('cmake')
#hwy = cmake.subproject('highway')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mousius was this useful for local debugging? I'm working on Meson subproject usage elsewhere, so I'm curious about the details here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have been more useful. I wanted to use the meson integration with cmake to be cleaner, but I ended up just adding the files manually as there seemed to be a lot of steps involved.

@rgommers
Copy link
Member

I tried to build this branch, to test with it a little bit. I had trouble checking out the submodule, since this points at a commit that's not part of the main repo. That can be updated now though, since the Gather/Scatter PR was merged.

A lot of CI jobs failed with:

/usr/bin/ld: build/temp.linux-x86_64-3.9/numpy/core/src/umath/umathmodule.o: relocation R_X86_64_PC32 against undefined hidden symbol `DOUBLE_absolute_SSE41' can not be used when making a shared object

and after merging main into this branch (which is needed to get SIMD support in the Meson build) I am also getting this error locally. @Mousius would you be able to rebase this? CI doesn't have to pass, but it'd be great if it worked with regular GCC on Linux x86-64. And don't worry about the setup.py-based build, since we're almost done with removing it (~4 CI jobs left to go).

@Mousius
Copy link
Member Author
Mousius commented Sep 7, 2023

I tried to build this branch, to test with it a little bit. I had trouble checking out the submodule, since this points at a commit that's not part of the main repo. That can be updated now though, since the Gather/Scatter PR was merged.

A lot of CI jobs failed with:

/usr/bin/ld: build/temp.linux-x86_64-3.9/numpy/core/src/umath/umathmodule.o: relocation R_X86_64_PC32 against undefined hidden symbol `DOUBLE_absolute_SSE41' can not be used when making a shared object

and after merging main into this branch (which is needed to get SIMD support in the Meson build) I am also getting this error locally. @Mousius would you be able to rebase this? CI doesn't have to pass, but it'd be great if it worked with regular GCC on Linux x86-64. And don't worry about the setup.py-based build, since we're almost done with removing it (~4 CI jobs left to go).

Ack. Apologies, I've been focussing on other things, I didn't originally do the spike on x86 so didn't pick this up, I'll take a look and follow up.

Demonstrates how this fits together with the user toggles within the
existing system.
< 8000 div class="js-socket-channel js-updatable-content" data-channel="eyJjIjoicmVwbzo5MDg2MDc6Y29tbWl0Ojg0Njk1ZTAyNzE4ZjU1ZDYwN2QxNDdjODg0MjhhNWUwMzI0N2VjOGIiLCJ0IjoxNzQ3Njc5MTkyfQ==--1e38570b3784409dc2a0a2b8c88788ba1855a8d96cfe0702545863fb2e0b3f4e" data-url="/numpy/numpy/pull/24385/partials/commit_status_icon?oid=84695e02718f55d607d147c88428a5e03247ec8b">
@Mousius
Copy link
Member Author
Mousius commented Sep 7, 2023

@rgommers updated this, took me longer than I expected to unpick the new dispatch system. The issue was that I had not made sure all the values line up across the various dispatch files, so meson.build wasn't building that symbol.

Because I've taken examples from both loops_autovec and loops_unary_fp I had to align those with the example, but we can unpick that easily for other functions - if I built SSE41 in addition it would use the same HWY_NAMESPACE as AVX2 by accident. This is not an issue for a real world implementation.

@rgommers
Copy link
Member
rgommers commented Sep 7, 2023

Thanks @Mousius, that works for me now. I'll try to play with this branch a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0