BLD, SIMD: The meson CPU dispatcher implementation #23096
Ah it works! |
Kinda works! The meson_devpy build is fine; musllinux is not, but it's still progress. |
close/open to update feature module to fix musllinux build |
Hi @seiko2plus, this looks pretty good to me! I added a few initial questions and comments inline. And below is the result of my first test.
Looking at the configure stage output:
- CPU feature detection doesn't work for me on a regular x86-64 Intel CPU (i9-7920X), I'll have to look into that:
Test feature "SSE" : Unsupported
- There is one entry per dispatch source file now (minor, but maybe we can avoid that):
Configuring _simd.dispatch.h using configuration
Configuring _umath_tests.dispatch.h using configuration
Configuring x86-qsort.dispatch.h using configuration
...
- Also minor, and can be left for later: let's try to use regular and more compact output for CPU optimization than the "########### CPU OPTIMIZATION" message; it stands out a bit much now.
Regarding the feature detection issue on my machine, a quick look in build/meson-logs/meson-log.txt shows:
Compiler stderr:
/tmp/tmpesdxhkp2/testfile.c: In function 'main':
/tmp/tmpesdxhkp2/testfile.c:5:26: error: parameter name omitted
5 | int main(int, char **argv)
| ^~~
Test feature "SSE" : Unsupported
Test feature "SSE" : Unsupported due to implied feature "SSE3" compiler was not able to compile the test code
Arguments: ['-msse3']
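The log excerpt above reports a feature as unsupported when a feature it implies fails its compile test. As a rough illustration only (this is not the Meson feature module's code), the propagation could be sketched in Python as below. The `implies` table, the `compiles` results, and the function names are hypothetical; the only detail taken from the log is that "SSE" is rejected because of the implied "SSE3".

```python
def implied_closure(feature, implies):
    """Return every feature reachable from `feature` through `implies`.

    Cycles are tolerated (features may imply each other), and the
    starting feature itself is never included in the result.
    """
    seen = set()
    stack = [feature]
    while stack:
        current = stack.pop()
        for dep in implies.get(current, ()):
            if dep != feature and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def check_feature(feature, implies, compiles):
    """Return (supported, reason) for `feature`.

    `compiles` maps a feature name to whether its own test code compiled.
    A feature is rejected if its own test fails or if any implied
    feature's test fails.
    """
    if not compiles.get(feature, False):
        return False, f'"{feature}" failed its own compile test'
    for dep in sorted(implied_closure(feature, implies)):
        if not compiles.get(dep, False):
            return False, f'implied feature "{dep}" failed its compile test'
    return True, ""


if __name__ == "__main__":
    # Hypothetical table; it is an assumption made to mirror the log line.
    implies = {"SSE": ["SSE2", "SSE3"], "SSE2": ["SSE"], "SSE3": ["SSE", "SSE2"]}
    # Pretend only the SSE3 test failed to compile.
    compiles = {"SSE": True, "SSE2": True, "SSE3": False}
    print(check_feature("SSE", implies, compiles))
```

In the case reported here every test failed for the same reason (the `int main(int, char **argv)` line in the generated test file), so a check like this would reject every feature regardless of what the CPU or compiler supports.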
Now that I've got it working (one-line diff for that posted on mesonbuild/meson#11307), I will try this out more. The configure output is very verbose; that seems fine during development but will need silencing sooner or later. It's now this, many times over:
|
I guess you meant "abandon"? The static libraries per optimization setting and then compiling those into the final extension module seems logical to me. Just checking: are you happy with this change, or is it forced by the split between Meson and NumPy? For others, this is a snippet of what the relevant Ninja rules look like:
build numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c: CUSTOM_COMMAND ../numpy/core/src/_simd/_simd.dispatch.c.src | /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py
DESC = Generating$ 'numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c'.
COMMAND = /home/rgommers/mambaforge/envs/numpy-dev/bin/python3.9 /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py ../numpy/core/src/_simd/_simd.dispatch.c.src --outfile numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c
build numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h: CUSTOM_COMMAND ../numpy/core/src/_simd/_simd_inc.h.src | /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py
DESC = Generating$ 'numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h'.
COMMAND = /home/rgommers/mambaforge/envs/numpy-dev/bin/python3.9 /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py ../numpy/core/src/_simd/_simd_inc.h.src --outfile numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h
build numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc: CUSTOM_COMMAND ../numpy/core/src/_simd/_simd_data.inc.src | /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py
DESC = Generating$ 'numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc'.
COMMAND = /home/rgommers/mambaforge/envs/numpy-dev/bin/python3.9 /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py ../numpy/core/src/_simd/_simd_data.inc.src --outfile numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc
build numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o: c_COMPILER numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c || numpy/core/__multiarray_api.h numpy/core/__ufunc_api.h numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h numpy/core/npy_math_internal.h
DEPFILE = numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o.d
DEPFILE_UNQUOTED = numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o.d
ARGS = -Inumpy/core/meson_cpu/lib_simd_SSE42.a.p -Inumpy/core/meson_cpu -I../numpy/core/meson_cpu -Imeson_cpu -I../meson_cpu -I../numpy/core/src/_simd -Inumpy/core -I../numpy/core -Inumpy/core/include -I../numpy/core/include -I../numpy/core/src/common -I/home/rgommers/mambaforge/envs/numpy-dev/include/python3.9 -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -O2 -g -msse -msse2 -msse3 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/rgommers/mambaforge/envs/numpy-dev/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/rgommers/mambaforge/envs/numpy-dev/include -fPIC -DNPY__CPU_TARGET_SSE42 -DNPY__CPU_TARGET_CURRENT=SSE42 -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2
build numpy/core/meson_cpu/lib_simd_SSE42.a: STATIC_LINKER numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o
LINK_ARGS = csrDT
build numpy/core/_simd.cpython-39-x86_64-linux-gnu.so: c_LINKER numpy/core/_simd.cpython-39-x86_64-linux-gnu.so.p/src_common_npy_cpu_features.c.o numpy/core/_simd.cpython-39-x86_64-linux-gnu.so.p/src__simd__simd.c.o | /home/rgommers/mambaforge/envs/numpy-dev/x86_64-conda-linux-gnu/sysroot/lib64/libm-2.12.so /home/rgommers/mambaforge/envs/numpy-dev/x86_64-conda-linux-gnu/sysroot/usr/lib64/libm.a numpy/core/libnpymath.a numpy/core/meson_cpu/lib_simd_AVX2_FMA3.a numpy/core/meson_cpu/lib_simd_AVX512_COMMON.a numpy/core/meson_cpu/lib_simd_AVX512_SKX.a numpy/core/meson_cpu/lib_simd_BASELINE.a numpy/core/meson_cpu/lib_simd_SSE42.a
LINK_ARGS = -L/home/rgommers/mambaforge/envs/numpy-dev/lib -Wl,--as-needed -Wl,--allow-shlib-undefined -shared -fPIC -Wl,--start-group -lm -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/home/rgommers/mambaforge/envs/numpy-dev/lib -Wl,-rpath-link,/home/rgommers/mambaforge/envs/numpy-dev/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/rgommers/mambaforge/envs/numpy-dev/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/rgommers/mambaforge/envs/numpy-dev/include numpy/core/libnpymath.a numpy/core/meson_cpu/lib_simd_BASELINE.a numpy/core/meson_cpu/lib_simd_SSE42.a numpy/core/meson_cpu/lib_simd_AVX2_FMA3.a numpy/core/meson_cpu/lib_simd_AVX512_COMMON.a numpy/core/meson_cpu/lib_simd_AVX512_SKX.a -Wl,--end-group |
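For readers who want the pattern in the Ninja rules above spelled out: each dispatch target gets its own static library, compiled with that target's flags plus two `NPY__CPU_TARGET_*` defines, and the extension module links all of them. The Python sketch below only reproduces the naming and arguments visible in the snippet. The helper names are made up for illustration, and it makes no claim about how Meson or the feature module generates these rules.

```python
# Flags copied from the SSE42 compile rule quoted above. Flags for the
# other targets are not shown in the snippet, so they are left out here.
SSE42_FLAGS = [
    "-msse", "-msse2", "-msse3", "-mssse3", "-msse4.1", "-mpopcnt", "-msse4.2",
]

# Target names taken from the libraries on the final link line.
TARGETS = ["BASELINE", "SSE42", "AVX2_FMA3", "AVX512_COMMON", "AVX512_SKX"]


def library_name(module, target):
    """Static library file name for one dispatch target (hypothetical helper)."""
    return f"lib{module}_{target}.a"


def target_args(target, flags):
    """Extra compile arguments for one target: the two defines, then its flags."""
    return [
        f"-DNPY__CPU_TARGET_{target}",
        f"-DNPY__CPU_TARGET_CURRENT={target}",
        *flags,
    ]


def link_inputs(module, targets):
    """All per-target libraries that end up on the extension's link line."""
    return [library_name(module, t) for t in targets]


if __name__ == "__main__":
    print(library_name("_simd", "SSE42"))
    print(" ".join(target_args("SSE42", SSE42_FLAGS)))
    print(link_inputs("_simd", TARGETS))
```

The point of the layout is that the same dispatch source is compiled once per target with different flags, which is why the object file lives under a per-library directory such as `lib_simd_SSE42.a.p`.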
Hi Ralf,
Thanks for the fix
No, but it can be reduced if we have multiple dispatch sources share the same required targets.
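One way to read this suggestion: if the configuration step is keyed by the set of required targets rather than by file, sources that share a target set could share one entry. A small Python sketch of that grouping follows. The file names come from the configure output quoted earlier, but the target sets assigned to them are invented for the example, and `group_by_targets` is a hypothetical helper, not part of NumPy or Meson.

```python
from collections import defaultdict


def group_by_targets(dispatch_targets):
    """Group dispatch sources by their set of required targets.

    `dispatch_targets` maps a source name to an iterable of target names.
    Order and duplicates within a target list are ignored.
    """
    groups = defaultdict(list)
    for source, targets in dispatch_targets.items():
        groups[frozenset(targets)].append(source)
    return dict(groups)


if __name__ == "__main__":
    # Invented target sets, for illustration only.
    example = {
        "_simd.dispatch.h": ["SSE42", "AVX2"],
        "_umath_tests.dispatch.h": ["AVX2", "SSE42"],
        "x86-qsort.dispatch.h": ["AVX512_SKX"],
    }
    for targets, sources in group_by_targets(example).items():
        print(sorted(targets), "->", sources)
```

With these made-up inputs, three dispatch sources collapse into two groups, so only two configurations would need to be reported.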
I see it as very important for debugging. I think we also need to ship it with NumPy, maybe part of |
To be honest, no. Relying on static libraries increased the complexity, but Meson didn't give me another option. |
It is important, but it is a lot of text that is overshadowing other important things and filling the screen. Can you make it more compact and less repetitive? |
Definitely should go into |
02e757a to 3b40f10
With the meson @seiko2plus do you have a preference for merging this roughly as it is now, or switching more of the SIMD-specific CI jobs over to Meson first? |
One error is left, related to the build artifact; it should be fixed before this PR gets merged. The rest of the notes from the reviews can be addressed in another PR. |
close to rebuild |
Okay, thanks, that sounds good to me. I already had the CircleCI thing fixed. The last thing is related to picking up the forked Meson on Windows; I'm debugging that on my own fork right now. |
I just saw you suppress it, thanks! |
Make sure to squash this for the merge to make the backport easier. |
[skip circle] [skip cirrus] [skip travis]
Okay, all green 🎉. In it goes. Thanks a lot for all the work on this @seiko2plus!
This provides almost the same functionality as Distutils/CCompilerOpt, with a few changes to the way we specify the targets. It also abandons the idea of wrapping the dispatchable sources; instead, it relies on static libraries to enable different paths and flags.
requires meson feature module
mesonbuild/meson#11307
Almost gives the same functionality as Distutils/CCompilerOpt,
with a few changes to the way we specify the targets, also its
abend idea of wrapping the dispatchable sources, instead it counts
on static libraries to enable different paths and flags.
TODO:
[x] Add test cases. No need for new test cases for either runtime or compile time; the current tests are good enough.