BLD, SIMD: The meson CPU dispatcher implementation by seiko2plus · Pull Request #23096 · numpy/numpy · GitHub
Merged 31 commits on Aug 11, 2023
Changes from 1 commit
ab496b3
ENH, SIMD: The meson CPU dispatcher implementation
seiko2plus Jul 16, 2023
9a139d3
Provide compatibility with distutils
seiko2plus Jul 18, 2023
166477d
Extend test_requirements/pyproject to cover meson module
seiko2plus Jul 18, 2023
9421795
BUG: Fix SSE build on meson
seiko2plus Jul 18, 2023
0c3111c
fix the build when it disabled
seiko2plus Jul 19, 2023
a78ef6b
enable AVX512_SPR for quicksort
seiko2plus Jul 19, 2023
a3058a9
Add support for build option --test-simd
seiko2plus Jul 19, 2023
e4b3d72
Fix sdist build
seiko2plus Jul 20, 2023
4203a5b
Pass Opt level 3 to all dispach-able sources
seiko2plus Jul 20, 2023
04b2e2a
Disable SIMD kernels of log/exp/sin/cos on clang-cl
seiko2plus Jul 20, 2023
1917470
CI: Transition x86 specialized tests to Meson from Distutils
seiko2plus Jul 21, 2023
0963fe0
Cleanup the main configration header and improves docs
seiko2plus Jul 23, 2023
481114a
Detect global architecture args
seiko2plus Jul 23, 2023
58239f3
update the meson module name
seiko2plus Jul 25, 2023
2271d02
fix SSE41 flag on Intel-cl
seiko2plus Aug 1, 2023
080e19c
rename method multi_target to multi_targets
seiko2plus Aug 1, 2023
35292aa
Disables mmx when AVX512 is enabled similar to distutils
seiko2plus Aug 1, 2023
077a09f
Add build target AVX512_ICL for simd_qsort
seiko2plus Aug 1, 2023
7ec6933
CI: Allow noblas for SIMD tests
seiko2plus Aug 1, 2023
0beab65
Bybass sort validation for _simd module
seiko2plus Aug 2, 2023
f580462
Removes build option boolean warning
seiko2plus Aug 2, 2023
de71d9b
removes py_dep from _simd extention
seiko2plus Aug 2, 2023
dedd413
fix Initlize typo
seiko2plus Aug 2, 2023
a751c20
Minimize the log of CPU optimization
seiko2plus Aug 2, 2023
76327b8
Remove debug log and count on multi_targets() debug
seiko2plus Aug 4, 2023
e13ce41
update multi_targets to reduce the number of objects
seiko2plus Aug 7, 2023
9509508
BLD: updates to build and test dependencies
rgommers Aug 10, 2023
371bb76
BLD: add Meson version check, to catch older installed versions early
rgommers Aug 10, 2023
3363fa3
CI: fix doc refguide check failure on CircleCI
rgommers Aug 10, 2023
b868d25
Merge branch 'main' into meson_simd
rgommers Aug 11, 2023
b1855c0
STY: minor fixes to code comments
rgommers Aug 11, 2023
STY: minor fixes to code comments
[skip circle] [skip cirrus] [skip travis]
rgommers committed Aug 11, 2023
commit b1855c07932e006e558716baf5798ab623a3190c
76 changes: 41 additions & 35 deletions meson_cpu/meson.build
@@ -1,41 +1,48 @@
# The CPU Dispatcher implementation.
#
# This script handles the CPU dispatcher and requires the Meson module 'features'.
# This script handles the CPU dispatcher and requires the Meson module
# 'features'.
#
# The CPU dispatcher script is responsible for three main tasks:
#
# 1. Defining the enabled baseline and dispatched features by parsing build options
# or compiler arguments, including detection of native flags.
# 1. Defining the enabled baseline and dispatched features by parsing build
# options or compiler arguments, including detection of native flags.
#
# 2. Specifying the baseline arguments and definitions across all sources.
#
# 3. Generating the main configuration file, which contains information about the enabled features,
# along with a collection of C macros necessary for runtime dispatching.
# For more details, see the template file `main_config.h.in`.
# 3. Generating the main configuration file, which contains information about
# the enabled features, along with a collection of C macros necessary for
# runtime dispatching. For more details, see the template file
# `main_config.h.in`.
#
# This script exposes the following variables:
#
# - `CPU_BASELINE`: A set of CPU feature objects obtained from `mod_features.new()`,
# representing the minimum CPU features specified by the build option `-Dcpu-baseline`.
# - `CPU_BASELINE`: A set of CPU feature objects obtained from
# `mod_features.new()`, representing the minimum CPU features
# specified by the build option `-Dcpu-baseline`.
#
# - `CPU_BASELINE_NAMES`: A set of enabled CPU feature names, representing the minimum CPU features
# specified by the build option `-Dcpu-baseline`.
# - `CPU_BASELINE_NAMES`: A set of enabled CPU feature names, representing the
# minimum CPU features specified by the build option
# `-Dcpu-baseline`.
#
# - `CPU_DISPATCH_NAMES`: A set of enabled CPU feature names, representing the additional CPU features
# that can be dispatched at runtime, specified by the build option `-Dcpu-dispatch`.
# - `CPU_DISPATCH_NAMES`: A set of enabled CPU feature names, representing the
# additional CPU features that can be dispatched at
# runtime, specified by the build option
# `-Dcpu-dispatch`.
#
# - `CPU_FEATURES`: A dictionary containing all supported CPU feature objects.
#
# Additionally, this script exposes a set of variables that represent each supported feature
# to be used within the Meson function `mod_features.multi_targets()`.
# Additionally, this script exposes a set of variables that represent each
# supported feature to be used within the Meson function
# `mod_features.multi_targets()`.

# Prefix used by all macros and features definitions
CPU_CONF_PREFIX = 'NPY_'
# main configuration name
CPU_CONF_CONFIG = 'npy_cpu_dispatch_config.h'

if get_option('disable-optimization')
add_project_arguments('-D'+CPU_CONF_PREFIX+'DISABLE_OPTIMIZATION', language: ['c', 'cpp'])
add_project_arguments('-D' + CPU_CONF_PREFIX + 'DISABLE_OPTIMIZATION', language: ['c', 'cpp'])
CPU_CONF_BASELINE = 'none'
CPU_CONF_DISPATCH = 'none'
else
@@ -62,10 +69,8 @@ else
CPU_CONF_DISPATCH = get_option('cpu-dispatch')
endif

# Initialize the CPU features
# Export the X86 features objects 'SSE', 'AVX', etc
# and a dictionary "X86_FEATURES" which maps to each
# object by its name
# Initialize the CPU features Export the X86 features objects 'SSE', 'AVX',
# etc. plus a dictionary "X86_FEATURES" which maps to each object by its name
subdir('x86')
subdir('ppc64')
subdir('s390x')
@@ -77,8 +82,8 @@ CPU_FEATURES += X86_FEATURES
CPU_FEATURES += PPC64_FEATURES
CPU_FEATURES += S390X_FEATURES

# Parse the requsted baseline(CPU_CONF_BASELINE)
# and dispatch features(CPU_CONF_DISPATCH).
# Parse the requsted baseline (CPU_CONF_BASELINE) and dispatch features
# (CPU_CONF_DISPATCH).
cpu_family = host_machine.cpu_family()
# Used by build option 'min'
min_features = {
@@ -207,14 +212,16 @@ foreach opt_name, conf : parse_options
endif
endforeach # opt_name, conf : parse_options

# Test the baseline and dispatch features and set their flags and #definitions across all sources.
# Test the baseline and dispatch features and set their flags and #definitions
# across all sources.
#
# It is important to know that this test enables the maximum supported features by the platform
# depending on the required features.
# It is important to know that this test enables the maximum supported features
# by the platform depending on the required features.
#
# For example, if the user specified `--cpu-baseline=avx512_skx`, and the compiler doesn't support it,
# but still supports any of the implied features, then we enable the maximum supported implied features, e.g., AVX2,
# which can be done by specifying `anyfet: true` to the test function.
# For example, if the user specified `--cpu-baseline=avx512_skx`, and the
# compiler doesn't support it, but still supports any of the implied features,
# then we enable the maximum supported implied features, e.g., AVX2, which can
# be done by specifying `anyfet: true` to the test function.
if parse_result['cpu-baseline'].length() > 0
baseline = mod_features.test(parse_result['cpu-baseline'], anyfet: true)[1]
baseline_args = baseline['args']
@@ -225,16 +232,15 @@ if parse_result['cpu-baseline'].length() > 0
else
baseline = {}
endif
# The name of the baseline features including
# its implied features.
# The name of the baseline features including its implied features.
CPU_BASELINE_NAMES = baseline.get('features', [])
CPU_BASELINE = []
foreach fet_name : CPU_BASELINE_NAMES
CPU_BASELINE += [CPU_FEATURES[fet_name]]
endforeach
# Loop all initlized features and disable any
# feature not part of requsted baseline and dispatch features
# to avoid been enabled by import('feature').multi_targets
# Loop over all initialized features and disable any feature that is not part
# of the requested baseline and dispatch features to avoid it enabled by
# import('feature').multi_targets
foreach fet_name, fet_obj : CPU_FEATURES
if fet_obj in parse_result['cpu-dispatch'] or fet_name in CPU_BASELINE_NAMES
continue
@@ -254,8 +260,8 @@ foreach fet_obj : parse_result['cpu-dispatch']
endif
CPU_DISPATCH_NAMES += [fet_obj.get('name')]
endforeach
# Generate main configuration header
# see 'main_config.h.in' for more clearfiction
# Generate main configuration header see 'main_config.h.in' for more
# clarification.
main_config = {
'P': CPU_CONF_PREFIX,
'WITH_CPU_BASELINE': ' '.join(CPU_BASELINE_NAMES),
@@ -287,7 +293,7 @@ add_project_arguments(

message(
'''
CPU Optimiztion Options
CPU Optimization Options
baseline:
Requested : @0@
Enabled : @1@
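
Reviewer's sketch (not part of the commit): the comment above the baseline test describes falling back to the maximum supported implied features when `anyfet: true` is passed. The Python below only models that idea under stated assumptions. The `IMPLIES` table, `implied_closure`, and `probe_features` are hypothetical names, the implication chain is simplified, and the real `mod_features.test()` returns a different structure.

```python
# Hypothetical, simplified "implies" chain. The real graph is built by the
# mod_features.new(..., implies: ...) calls in meson_cpu/x86/meson.build.
IMPLIES = {
    "SSE": [],
    "SSE2": ["SSE"],
    "AVX": ["SSE2"],
    "AVX2": ["AVX"],
    "AVX512F": ["AVX2"],
    "AVX512_SKX": ["AVX512F"],
}


def implied_closure(name):
    """Return `name` together with every feature it transitively implies."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES[current])
    return seen


def probe_features(requested, compiler_supports, anyfet=False):
    """Model of the fallback: with anyfet, keep whatever subset is supported.

    Returns (ok, sorted list of enabled feature names).
    """
    wanted = set()
    for name in requested:
        wanted |= implied_closure(name)
    enabled = wanted & set(compiler_supports)
    if enabled != wanted and not anyfet:
        # Strict mode: a single unsupported feature fails the whole request.
        return False, []
    return bool(enabled), sorted(enabled)
```

With a compiler that stops at AVX2, a request for `AVX512_SKX` fails in strict mode but yields the supported implied features in the `anyfet` mode, which matches the behaviour the comment describes.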
4 changes: 2 additions & 2 deletions meson_cpu/ppc64/meson.build
@@ -17,8 +17,8 @@ VSX2 = mod_features.new(
detect: {'val': 'VSX2', 'match': 'VSX'},
test_code: files(source_root + '/numpy/distutils/checks/cpu_vsx2.c')[0],
)
# VSX2 is hardware baseline feature on ppc64le
# since the first little-endian support was part of Power8
# VSX2 is hardware baseline feature on ppc64le since the first little-endian
# support was part of Power8
if host_machine.endian() == 'little'
VSX.update(implies: VSX2)
endif
1 change: 1 addition & 0 deletions meson_cpu/s390x/meson.build
@@ -1,5 +1,6 @@
source_root = meson.project_source_root()
mod_features = import('features')

VX = mod_features.new(
'VX', 1, args: ['-mzvector', '-march=arch11'],
test_code: files(source_root + '/numpy/distutils/checks/cpu_vx.c')[0],
29 changes: 17 additions & 12 deletions meson_cpu/x86/meson.build
@@ -1,5 +1,6 @@
source_root = meson.project_source_root()
mod_features = import('features')

SSE = mod_features.new(
'SSE', 1, args: '-msse',
test_code: files(source_root + '/numpy/distutils/checks/cpu_sse.c')[0]
@@ -9,8 +10,7 @@ SSE2 = mod_features.new(
args: '-msse2',
test_code: files(source_root + '/numpy/distutils/checks/cpu_sse2.c')[0]
)
# enabling SSE without SSE2 is useless also
# it's non-optional for x86_64
# enabling SSE without SSE2 is useless also it's non-optional for x86_64
SSE.update(implies: SSE2)
SSE3 = mod_features.new(
'SSE3', 3, implies: SSE2,
@@ -66,9 +66,9 @@ AVX2 = mod_features.new(
# 25-40 left as margin for any extra features
AVX512F = mod_features.new(
'AVX512F', 40, implies: [FMA3, AVX2],
# Disables mmx due to cause stack corruption
# that may happens during mask converstions.
# TODO(seiko2plus): provides more clarification
# Disables mmx because of stack corruption that may happen during mask
# conversions.
# TODO (seiko2plus): provide more clarification
args: ['-mno-mmx', '-mavx512f'],
detect: {'val': 'AVX512F', 'match': '.*'},
test_code: files(source_root + '/numpy/distutils/checks/cpu_avx512f.c')[0],
@@ -125,17 +125,19 @@ AVX512_SPR = mod_features.new(
group: ['AVX512FP16'],
test_code: files(source_root + '/numpy/distutils/checks/cpu_avx512_spr.c')[0]
)

# Specializations for non unix-like compilers
# ---------------------------------------------
# -------------------------------------------
cpu_family = host_machine.cpu_family()
compiler_id = meson.get_compiler('c').get_id()
if compiler_id not in ['gcc', 'clang']
AVX512_SPR.update(disable: compiler_id + ' compiler does not support it')
endif
# Common specializations bettween both Intel compilers (unix-like and msvc-like)

# Common specializations between both Intel compilers (unix-like and msvc-like)
if compiler_id in ['intel', 'intel-cl']
# POPCNT, and F16C don't own private FLAGS still
# the compiler provides ISA capability for them.
# POPCNT, and F16C don't own private FLAGS however the compiler still
# provides ISA capability for them.
POPCNT.update(args: '')
F16C.update(args: '')
# Intel compilers don't support the following features independently
@@ -146,6 +148,7 @@ if compiler_id in ['intel', 'intel-cl']
XOP.update(disable: 'Intel Compiler does not support it')
FMA4.update(disable: 'Intel Compiler does not support it')
endif

if compiler_id == 'intel-cl'
foreach fet : [SSE, SSE2, SSE3, SSSE3, AVX]
fet.update(args: {'val': '/arch:' + fet.get('name'), 'match': '/arch:.*'})
@@ -163,6 +166,7 @@ if compiler_id == 'intel-cl'
AVX512_CNL.update(args: {'val': '/Qx:CANNONLAKE', 'match': '/[arch|Qx]:.*'})
AVX512_ICL.update(args: {'val': '/Qx:ICELAKE-CLIENT', 'match': '/[arch|Qx]:.*'})
endif

if compiler_id == 'intel'
clear_m = '^(-mcpu=|-march=)'
clear_any = '^(-mcpu=|-march=|-x[A-Z0-9\-])'
@@ -177,13 +181,14 @@ if compiler_id == 'intel'
AVX512_CNL.update(args: {'val': '-xCANNONLAKE', 'match': clear_any})
AVX512_ICL.update(args: {'val': '-xICELAKE-CLIENT', 'match': clear_any})
endif

if compiler_id == 'msvc'
# MSVC Compiler doesn't support the following features
# MSVC compiler doesn't support the following features
foreach fet : [AVX512_KNL, AVX512_KNM]
fet.update(disable: compiler_id + ' compiler does not support it')
endforeach
# The following features don't own private FLAGS still
# the compiler provides ISA capability for them.
# The following features don't own private FLAGS, however the compiler still
# provides ISA capability for them.
foreach fet : [
SSE3, SSSE3, SSE41, POPCNT, SSE42, AVX, F16C, XOP, FMA4,
AVX512F, AVX512CD, AVX512_CLX, AVX512_CNL,
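
Reviewer's sketch (not part of the commit): several `update(args: {'val': ..., 'match': ...})` calls above pair a flag with a regex. The Python below models one plausible reading, assumed here and not confirmed by this diff: when a feature's arguments are accumulated on top of its implied features, any earlier argument matching `match` is dropped before `val` is appended, so only one `/arch:` flag survives. The helper `accumulate_args` is hypothetical, and the use of `re.match` (anchored at the start) is also an assumption.

```python
import re


def accumulate_args(features):
    """Combine compiler args from implied features up to the requested one.

    `features` is a list of (val, match) pairs ordered from the most basic
    feature to the most specific. `match` may be None when a feature does
    not need to clear earlier flags.
    """
    args = []
    for val, match in features:
        if match is not None:
            pattern = re.compile(match)
            # Drop earlier flags that the new flag supersedes.
            args = [arg for arg in args if not pattern.match(arg)]
        if val:
            args.append(val)
    return args
```

Under that reading, stacking `/arch:SSE2` and then `/arch:AVX` with `match` set to `/arch:.*` leaves only `/arch:AVX` on the command line, while flags without a `match` simply accumulate.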
12 changes: 6 additions & 6 deletions meson_options.txt
@@ -1,7 +1,7 @@
option('blas', type: 'string', value: 'openblas',
description: 'option for BLAS library switching')
description: 'Option for BLAS library switching')
option('lapack', type: 'string', value: 'openblas',
description: 'option for LAPACK library switching')
description: 'Option for LAPACK library switching')
option('allow-noblas', type: 'boolean', value: false,
description: 'If set to true, allow building with (slow!) internal fallback routines')
option('use-ilp64', type: 'boolean', value: false,
@@ -13,11 +13,11 @@ option('disable-svml', type: 'boolean', value: false,
option('disable-threading', type: 'boolean', value: false,
description: 'Disable threading support (see `NPY_ALLOW_THREADS` docs)')
option('disable-optimization', type: 'boolean', value: false,
description: 'Disable CPU optimized code(dispatch,simd,unroll...)')
description: 'Disable CPU optimized code (dispatch,simd,unroll...)')
option('cpu-baseline', type: 'string', value: 'min',
description: 'minimal set of required CPU features')
description: 'Minimal set of required CPU features')
option('cpu-dispatch', type: 'string', value: 'max -xop -fma4',
description: 'dispatched set of additional CPU features')
description: 'Dispatched set of additional CPU features')
option('test-simd', type: 'array',
value: [
'BASELINE', 'SSE2', 'SSE42', 'XOP', 'FMA4',
@@ -28,6 +28,6 @@ option('test-simd', type: 'array',
],
description: 'Specify a list of CPU features to be tested against NumPy SIMD interface')
option('test-simd-args', type: 'string', value: '',
description: 'Extra args to be passed to _simd module that used for testing Numpy SIMD inteface')
description: 'Extra args to be passed to the `_simd` module that is used for testing the NumPy SIMD interface')
option('relaxed-strides-debug', type: 'boolean', value: false,
description: 'Enable relaxed strides debug mode (see `NPY_RELAXED_STRIDES_DEBUG` docs)')
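
Reviewer's sketch (not part of the commit): the default `cpu-dispatch` value `'max -xop -fma4'` reads as "everything, minus XOP and FMA4". The Python below illustrates that reading with a hypothetical helper, `parse_feature_option`. The real parsing lives in `meson_cpu/meson.build` and is not shown in this diff, so the token rules here (case-insensitive names, `+`/`-` prefixes, `max`, `none`) are assumptions, and `min` and `native` are deliberately left out.

```python
def parse_feature_option(value, max_features):
    """Expand a feature option string into an ordered list of feature names.

    `max_features` is the list that the keyword 'max' expands to; it is
    supplied by the caller because it depends on the target architecture.
    """
    enabled = []
    for token in value.replace(",", " ").split():
        name = token.upper()
        if name == "MAX":
            # 'max' pulls in every known feature, keeping the given order.
            for feature in max_features:
                if feature not in enabled:
                    enabled.append(feature)
        elif name == "NONE":
            # 'none' contributes nothing.
            continue
        elif name.startswith("-"):
            # '-NAME' removes a feature that was enabled earlier.
            removed = name[1:]
            enabled = [feature for feature in enabled if feature != removed]
        else:
            # 'NAME' or '+NAME' adds a single feature.
            added = name.lstrip("+")
            if added not in enabled:
                enabled.append(added)
    return enabled
```

For example, `parse_feature_option('max -xop -fma4', ['SSE42', 'AVX', 'XOP', 'FMA4', 'AVX2'])` keeps `SSE42`, `AVX`, and `AVX2` in that order.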
25 changes: 11 additions & 14 deletions numpy/core/meson.build
@@ -87,8 +87,7 @@ cdata.set('NPY_API_VERSION', C_API_VERSION)
use_svml = (
host_machine.system() == 'linux' and
host_machine.cpu_family() == 'x86_64' and
('AVX512_SKX' in CPU_DISPATCH_NAMES or
'AVX512_SKX' in CPU_BASELINE_NAMES) and
('AVX512_SKX' in CPU_DISPATCH_NAMES or 'AVX512_SKX' in CPU_BASELINE_NAMES) and
not get_option('disable-svml')
)
if use_svml
@@ -324,16 +323,14 @@ optional_function_attributes = [
# endif
#endforeach

# Max possible optimiztion flags
# We pass this flags to all our dispatch-able
# Max possible optimization flags. We pass this flags to all our dispatch-able
# (multi_targets) sources.
compiler = meson.get_compiler('c')
compiler_id = compiler.get_id()
compiler_id = cc.get_id()
max_opt = {
'msvc': ['/O2'],
'intel-cl': ['/O3'],
}.get(compiler_id, ['-O3'])
max_opt = compiler.has_multi_arguments(max_opt) ? max_opt : []
max_opt = cc.has_multi_arguments(max_opt) ? max_opt : []

# Optional GCC compiler builtins and their call arguments.
# If given, a required header and definition name (HAVE_ prepended)
@@ -844,8 +841,8 @@ foreach gen_mtargets : [
[
'loops_exponent_log.dispatch.h',
src_file.process('src/umath/loops_exponent_log.dispatch.c.src'),
# Enables SIMD on clang-cl raises spurious FP exceptions
# TODO(seiko2plus): Debug spurious FP exceptions for single-precision log/exp
# Enabling SIMD on clang-cl raises spurious FP exceptions
# TODO (seiko2plus): debug spurious FP exceptions for single-precision log/exp
compiler_id == 'clang-cl' ? [] : [
AVX512_SKX, AVX512F, [AVX2, FMA3]
]
@@ -890,8 +887,8 @@ foreach gen_mtargets : [
[
'loops_trigonometric.dispatch.h',
src_file.process('src/umath/loops_trigonometric.dispatch.c.src'),
# Enables SIMD on clang-cl raises spurious FP exceptions
# TODO(seiko2plus): Debug spurious FP exceptions for single-precision sin/cos
# Enabling SIMD on clang-cl raises spurious FP exceptions
# TODO (seiko2plus): debug spurious FP exceptions for single-precision sin/cos
compiler_id == 'clang-cl' ? [] : [
AVX512F, [AVX2, FMA3],
VSX4, VSX3, VSX2,
@@ -1181,9 +1178,9 @@ _simd_mtargets = mod_features.multi_targets(
src_file.process('src/_simd/_simd_data.inc.src'),
src_file.process('src/_simd/_simd.dispatch.c.src'),
],
# Skip validating the sort of `_simd_dispatch` because we execute all these features,
# not just the highest interest one. The sorting doesn't matter here,
# given the nature of this testing unit.
# Skip validating the order of `_simd_dispatch` because we execute all these
# features, not just the highest interest one. The sorting doesn't matter
# here, given the nature of this testing unit.
keep_sort: true,
dispatch: _simd_dispatch,
baseline: _simd_baseline,