BLD, SIMD: The meson CPU dispatcher implementation by seiko2plus · Pull Request #23096 · numpy/numpy

Merged: 31 commits, Aug 11, 2023


@seiko2plus
Member
seiko2plus commented Jan 26, 2023

Requires the Meson feature module: mesonbuild/meson#11307

Almost gives the same functionality as Distutils/CCompilerOpt,
with a few changes to the way we specify the targets, also its
abend idea of wrapping the dispatchable sources, instead it counts
on static libraries to enable different paths and flags.

TODO:

  • Add support for native build
  • Add support for X86
  • Add support for Arm
  • Add support for PPC64
  • Add support for IBMZ
  • [x] Add test cases: no new test cases are needed for either runtime or compile time; the current tests are good enough
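
To illustrate the static-library approach described above: each dispatchable source is compiled once per target with that target's flags, and the most specialised variant the CPU can run is picked at runtime. The following is a minimal Python model of that selection step, not NumPy's actual C dispatcher. The target names match the static libraries listed later in this thread, but the ordering, the required-feature sets, and the `select_target` helper are assumptions made for this sketch.

```python
from typing import FrozenSet, List, Set, Tuple

# Targets ordered from most to least specialised. The ordering and the
# required-feature sets are assumptions made for this sketch.
TARGETS: List[Tuple[str, FrozenSet[str]]] = [
    ("AVX512_SKX", frozenset({"AVX512_SKX"})),
    ("AVX2_FMA3", frozenset({"AVX2", "FMA3"})),
    ("SSE42", frozenset({"SSE42"})),
    ("BASELINE", frozenset()),  # requires nothing, so it is always runnable
]


def select_target(cpu_features: Set[str]) -> str:
    """Return the first target whose required features the CPU provides."""
    for name, required in TARGETS:
        if required <= cpu_features:
            return name
    raise RuntimeError("unreachable: BASELINE requires nothing")


if __name__ == "__main__":
    print(select_target({"SSE42", "AVX2", "FMA3"}))  # AVX2_FMA3
    print(select_target({"SSE42"}))                  # SSE42
    print(select_target(set()))                      # BASELINE
```

In the real build the selection would hand back a function compiled into one of the per-target libraries rather than a name.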

@seiko2plus changed the title from "The meson CPU dispatcher implementation" to "BLD, SIMD: The meson CPU dispatcher implementation" Jan 26, 2023
@seiko2plus
Member Author

Ah it works!

@seiko2plus
Member Author

Kinda works! The meson_devpy build is fine; musllinux is not, but it still counts as progress.

@seiko2plus
Member Author
seiko2plus commented Jan 26, 2023

@mattip, @rgommers, @stefanv, your early opinion matters before finalizing both implementations from our side and the meson side.

@seiko2plus
Member Author

Closing and reopening to update the feature module and fix the musllinux build.

@seiko2plus closed this Jan 26, 2023
@seiko2plus reopened this Jan 26, 2023
@rgommers
Member
rgommers left a comment

Hi @seiko2plus, this looks pretty good to me! I added a few initial questions and comments inline. And below is the result of my first test.

Looking at the configure stage output:

  • CPU feature detection doesn't work for me on a regular x86-64 Intel CPU (i9-7920X), I'll have to look into that:
Test feature "SSE" : Unsupported 
  • There is one entry per dispatch source file now (minor, but maybe we can avoid that):
Configuring _simd.dispatch.h using configuration
Configuring _umath_tests.dispatch.h using configuration
Configuring x86-qsort.dispatch.h using configuration
...
  • Also minor, and can be left for later: let's try to use regular, more compact output for CPU optimization instead of the ########### CPU OPTIMIZATION message; it stands out a bit much now.

Regarding the feature detection issue on my machine, a quick look in build/meson-logs/meson-log.txt shows:

Compiler stderr:
 /tmp/tmpesdxhkp2/testfile.c: In function 'main':
/tmp/tmpesdxhkp2/testfile.c:5:26: error: parameter name omitted
    5 |                 int main(int, char **argv)
      |                          ^~~

Test feature "SSE" : Unsupported
Test feature "SSE" : Unsupported due to implied feature "SSE3" compiler was not able to compile the test code
Arguments: ['-msse3']
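
The log above suggests that a feature is reported as unsupported when the compile test for a feature it implies fails. A rough Python sketch of such a cascade follows. The `IMPLIES` table, the `first_failure` helper, and the `compiles` callback are all hypothetical stand-ins chosen for illustration; they are not the Meson feature module's API or its real implication data.

```python
from typing import Callable, Dict, List, Optional

# Made-up implication table, for illustration only.
IMPLIES: Dict[str, List[str]] = {
    "SSE": ["SSE2", "SSE3"],
    "SSE2": [],
    "SSE3": [],
}


def first_failure(feature: str, compiles: Callable[[str], bool]) -> Optional[str]:
    """Return the first feature whose compile test fails, or None if all pass.

    Implied features are checked before the feature itself.
    """
    for implied in IMPLIES.get(feature, []):
        failed = first_failure(implied, compiles)
        if failed is not None:
            return failed
    return None if compiles(feature) else feature


if __name__ == "__main__":
    # Pretend only the SSE3 test program fails to compile.
    print(first_failure("SSE", lambda name: name != "SSE3"))  # SSE3
```

With a test program that never compiles, as in the log, every feature would be rejected regardless of what the CPU or compiler supports.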

@rgommers added the "01 - Enhancement" and "Meson" (items related to the introduction of Meson as the new build system for NumPy) labels Jan 26, 2023
@rgommers
Member

Now that I've got it working (one line diff for that posted on mesonbuild/meson#11307), I will try this out more. The configure output is very verbose, that seems fine during development but will need silencing sooner or later - it's now this, many times over:

Message: 
########### CPU OPTIMIZATION ###########
CPU baseline  :
  Requested   : min
  Enabled     : SSE SSE2 SSE3

CPU dispatch  :
  Requested   : max -xop -fma4
  Enabled     : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512_COMMON AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL

Generated     : 
  Target      : BASELINE
  Enabled     : SSE SSE2 SSE3
  Flags       : -msse -msse2 -msse3
  Detect      : SSE SSE2 SSE3
  Defines     : SSE SSE2 SSE3
  Undefines   : 
  Sources     : _simd.dispatch.h
                _umath_tests.dispatch.h
                argfunc.dispatch.h
                loops_arithm_fp.dispatch.h
                loops_arithmetic.dispatch.h
                loops_comparison.dispatch.h
                loops_exponent_log.dispatch.h
                loops_hyperbolic.dispatch.h
                loops_minmax.dispatch.h
                loops_modulo.dispatch.h
                loops_trigonometric.dispatch.h
                loops_umath_fp.dispatch.h
                loops_unary_fp.dispatch.h
                loops_logical.dispatch.h
                loops_unary.dispatch.h
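
The `Requested` and `Enabled` lines above can be read as set arithmetic: `max` stands for every feature the compiler and platform support, a leading `-` removes a feature, and anything already in the baseline is not dispatched again. A small sketch of that reading follows. The `resolve_dispatch` helper and the shortened `SUPPORTED` list are assumptions for illustration; this is not the Meson module's parser, and it ignores implied features.

```python
from typing import List

# Shortened feature list in the order used by the configure output above,
# plus XOP and FMA4, which the request explicitly removes.
SUPPORTED = [
    "SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "POPCNT", "SSE42",
    "AVX", "XOP", "FMA4", "F16C", "FMA3", "AVX2", "AVX512_COMMON",
]


def resolve_dispatch(request: str, baseline: List[str]) -> List[str]:
    """Expand a request such as 'max -xop -fma4' into the dispatched features."""
    enabled: List[str] = []
    for token in request.split():
        if token == "max":
            enabled = list(SUPPORTED)
        elif token.startswith("-"):
            name = token[1:].upper()
            enabled = [f for f in enabled if f != name]
        else:
            name = token.upper()
            if name in SUPPORTED and name not in enabled:
                enabled.append(name)
    # Baseline features are always on, so they are not dispatched separately.
    return [f for f in enabled if f not in baseline]


if __name__ == "__main__":
    print(" ".join(resolve_dispatch("max -xop -fma4", ["SSE", "SSE2", "SSE3"])))
```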

@rgommers
Member

also its abend idea of wrapping the dispatchable sources, instead it counts on static libraries to enable different paths and flags.

I guess you meant "abandon"? The static libraries per optimization setting and then compiling those into the final extension module seems logical to me. Just checking: are you happy with this change, or is it forced by the split between Meson and NumPy?

For others, this is a snippet of what the relevant Ninja rules look like:

build numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c: CUSTOM_COMMAND ../numpy/core/src/_simd/_simd.dispatch.c.src | /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py
 DESC = Generating$ 'numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c'.
 COMMAND = /home/rgommers/mambaforge/envs/numpy-dev/bin/python3.9 /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py ../numpy/core/src/_simd/_simd.dispatch.c.src --outfile numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c

build numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h: CUSTOM_COMMAND ../numpy/core/src/_simd/_simd_inc.h.src | /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py
 DESC = Generating$ 'numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h'.
 COMMAND = /home/rgommers/mambaforge/envs/numpy-dev/bin/python3.9 /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py ../numpy/core/src/_simd/_simd_inc.h.src --outfile numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h

build numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc: CUSTOM_COMMAND ../numpy/core/src/_simd/_simd_data.inc.src | /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py
 DESC = Generating$ 'numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc'.
 COMMAND = /home/rgommers/mambaforge/envs/numpy-dev/bin/python3.9 /home/rgommers/code/numpy/numpy/_build_utils/process_src_template.py ../numpy/core/src/_simd/_simd_data.inc.src --outfile numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc

build numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o: c_COMPILER numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd.dispatch.c || numpy/core/__multiarray_api.h numpy/core/__ufunc_api.h numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_data.inc numpy/core/meson_cpu/lib_simd_SSE42.a.p/_simd_inc.h numpy/core/npy_math_internal.h
 DEPFILE = numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o.d
 DEPFILE_UNQUOTED = numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o.d
 ARGS = -Inumpy/core/meson_cpu/lib_simd_SSE42.a.p -Inumpy/core/meson_cpu -I../numpy/core/meson_cpu -Imeson_cpu -I../meson_cpu -I../numpy/core/src/_simd -Inumpy/core -I../numpy/core -Inumpy/core/include -I../numpy/core/include -I../numpy/core/src/common -I/home/rgommers/mambaforge/envs/numpy-dev/include/python3.9 -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -O2 -g -msse -msse2 -msse3 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/rgommers/mambaforge/envs/numpy-dev/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/rgommers/mambaforge/envs/numpy-dev/include -fPIC -DNPY__CPU_TARGET_SSE42 -DNPY__CPU_TARGET_CURRENT=SSE42 -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2

build numpy/core/meson_cpu/lib_simd_SSE42.a: STATIC_LINKER numpy/core/meson_cpu/lib_simd_SSE42.a.p/meson-generated__simd.dispatch.c.o
 LINK_ARGS = csrDT



build numpy/core/_simd.cpython-39-x86_64-linux-gnu.so: c_LINKER numpy/core/_simd.cpython-39-x86_64-linux-gnu.so.p/src_common_npy_cpu_features.c.o numpy/core/_simd.cpython-39-x86_64-linux-gnu.so.p/src__simd__simd.c.o | /home/rgommers/mambaforge/envs/numpy-dev/x86_64-conda-linux-gnu/sysroot/lib64/libm-2.12.so /home/rgommers/mambaforge/envs/numpy-dev/x86_64-conda-linux-gnu/sysroot/usr/lib64/libm.a numpy/core/libnpymath.a numpy/core/meson_cpu/lib_simd_AVX2_FMA3.a numpy/core/meson_cpu/lib_simd_AVX512_COMMON.a numpy/core/meson_cpu/lib_simd_AVX512_SKX.a numpy/core/meson_cpu/lib_simd_BASELINE.a numpy/core/meson_cpu/lib_simd_SSE42.a
 LINK_ARGS = -L/home/rgommers/mambaforge/envs/numpy-dev/lib -Wl,--as-needed -Wl,--allow-shlib-undefined -shared -fPIC -Wl,--start-group -lm -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/home/rgommers/mambaforge/envs/numpy-dev/lib -Wl,-rpath-link,/home/rgommers/mambaforge/envs/numpy-dev/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/rgommers/mambaforge/envs/numpy-dev/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/rgommers/mambaforge/envs/numpy-dev/include numpy/core/libnpymath.a numpy/core/meson_cpu/lib_simd_BASELINE.a numpy/core/meson_cpu/lib_simd_SSE42.a numpy/core/meson_cpu/lib_simd_AVX2_FMA3.a numpy/core/meson_cpu/lib_simd_AVX512_COMMON.a numpy/core/meson_cpu/lib_simd_AVX512_SKX.a -Wl,--end-group
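
Reading the rules above, the SSE42 copy of `_simd.dispatch.c` goes into its own static library and is compiled with the baseline flags, the extra SSE42 flags, and two `NPY__CPU_TARGET_*` defines. The sketch below only reproduces those names for illustration. The `target_build_args` helper is hypothetical, and the flag table covers just the one target whose compile line appears in the snippet.

```python
from typing import Dict, List

# Baseline flags as reported in the configure output in this thread.
BASELINE_FLAGS = ["-msse", "-msse2", "-msse3"]

# Extra flags per target, copied from the end of the ARGS line above.
# Only SSE42 is shown in the snippet, so only SSE42 is listed here.
EXTRA_FLAGS: Dict[str, List[str]] = {
    "SSE42": ["-mssse3", "-msse4.1", "-mpopcnt", "-msse4.2"],
}


def target_build_args(prefix: str, target: str) -> Dict[str, object]:
    """Return the library name, defines and flags for one dispatch target."""
    return {
        "library": f"lib{prefix}_{target}.a",
        "defines": [
            f"-DNPY__CPU_TARGET_{target}",
            f"-DNPY__CPU_TARGET_CURRENT={target}",
        ],
        "flags": BASELINE_FLAGS + EXTRA_FLAGS[target],
    }


if __name__ == "__main__":
    args = target_build_args("_simd", "SSE42")
    print(args["library"])
    print(" ".join(args["defines"] + args["flags"]))
```

The final link step then lists every per-target library next to the extension module's own objects, as the last rule in the snippet shows.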

@seiko2plus
Member Author
seiko2plus commented Jan 27, 2023

Hi Ralf,

CPU feature detection doesn't work for me on a regular x86-64 Intel CPU (i9-7920X), I'll have to look into that:

Thanks for the fix

There is one entry per dispatch source file now (minor, but maybe we can avoid that):

No, but it can be reduced if multiple dispatch sources share the same required targets.
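
One way to picture the reduction mentioned here: if dispatch sources are grouped by the set of targets they request, one configuration entry per group would be enough. A minimal sketch follows, with a hypothetical `group_by_targets` helper and made-up target lists (the real per-file targets are not shown in this thread).

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_by_targets(sources: Dict[str, List[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group dispatch sources that request exactly the same set of targets."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for name, targets in sources.items():
        groups[frozenset(targets)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Made-up target lists, for illustration only.
    example = {
        "loops_minmax.dispatch.h": ["AVX2", "AVX512_SKX"],
        "loops_modulo.dispatch.h": ["AVX512_SKX", "AVX2"],
        "_simd.dispatch.h": ["SSE42"],
    }
    for targets, names in group_by_targets(example).items():
        print(sorted(targets), names)
```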

Also minor, and can be left for later: let's try to use regular, more compact output for CPU optimization instead of the ########### CPU OPTIMIZATION message; it stands out a bit much now.

I see it as very important for debugging. I think we also need to ship it with NumPy, maybe as part of show_config().

@seiko2plus
Member Author

Just checking: are you happy with this change, or is it forced by the split between Meson and NumPy?

To be honest, no. Counting on static libraries increased the complexity, but Meson didn't give me another option.

@mattip
Member
mattip commented Jan 27, 2023

I see it as very important for debugging. I think we also need to ship it with NumPy, maybe as part of show_config().

It is important, but it is a lot of text that is overshadowing other important things and filling the screen. Can you make it more compact and less repetitive?

@rgommers
Member

Definitely should go into show_config - @ganesh-k13 prepared the ground for that already: https://github.com/numpy/numpy/blob/main/numpy/__config__.py.in#L95

@seiko2plus force-pushed the meson_simd branch 14 times, most recently from 02e757a to 3b40f10, July 21, 2023 00:28
@rgommers added this to the 2.0.0 release milestone Aug 10, 2023
@rgommers
Member

With the meson feature module now available in numpy (after gh-24379), rebased this and reverted the tweaks to the build requirements for meson. Let's see if we can make this PR happy now and get it in. First push looks like it's unhappy about something, I'll try to sort that out now.

@seiko2plus do you have a preference for merging this roughly as it is now, or switching more of the SIMD-specific CI jobs over to Meson first?

@seiko2plus
Member Author

do you have a preference for merging this roughly as it is now

One error is left, related to the build artifact; it should be fixed before this PR gets merged. The rest of the notes from the reviews can be addressed in another PR.

@seiko2plus
Member Author

Closing to rebuild.

@seiko2plus closed this Aug 10, 2023
@seiko2plus reopened this Aug 10, 2023
@rgommers
Member

Okay, thanks, that sounds good to me. I already had the CircleCI thing fixed. The last thing is related to picking up the forked Meson on Windows; I'm debugging that on my own fork right now.

@seiko2plus
Member Author

Okay, thanks, that sounds good to me. I already had the CircleCI thing fixed

I just saw you suppress it, thanks!

@charris
Member
charris commented Aug 11, 2023

Make sure to squash this for the merge to make the backport easier.

@charris added the "09 - Backport-Candidate" (PRs tagged should be backported) label Aug 11, 2023
[skip circle] [skip cirrus] [skip travis]
@rgommers
Member
rgommers left a comment

Okay, all green 🎉. In it goes. Thanks a lot for all the work on this @seiko2plus!

@rgommers merged commit 4ec0182 into numpy:main Aug 11, 2023
charris pushed a commit to charris/numpy that referenced this pull request Aug 12, 2023
Almost gives the same functionality as Distutils/CCompilerOpt,
with a few changes to the way we specify the targets. Also, it
abandons the idea of wrapping the dispatchable sources, instead it
counts on static libraries to enable different paths and flags.
@charris removed the "09 - Backport-Candidate" (PRs tagged should be backported) label Aug 12, 2023
charris pushed a commit to charris/numpy that referenced this pull request Nov 11, 2023
@TimotheusBachinger mentioned this pull request Nov 25, 2024
Labels: 01 - Enhancement, Meson
5 participants