Question: Should __cpu_features__ provide target information #15558
Comments
IMHO
#13516 adds two new attributes to the umath module:
>>> from numpy.core._multiarray_umath import __cpu_baseline__, __cpu_dispatch__
>>> __cpu_baseline__
['SSE', 'SSE2', 'SSE3']
>>> __cpu_dispatch__
['SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_KNL', 'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL']
#13516 also combines all the new attributes together to provide a minimal report in pytester:
NumPy version 1.19.0.dev0+6e51f50
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
but I think it needs to be improved, for example by generating a Python file that contains the new attributes and importing it into the NumPy module, instead of adding the new attributes to the build log report from #13516 on gcc and x86_64:
########### EXT COMPILER OPTIMIZATION ###########
CPU baseline :
Requested : 'min'
Enabled : SSE SSE2 SSE3
Flags : -msse -msse2 -msse3
CPU dispatch :
Requested : 'max -xop -fma4'
Enabled : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
Generated :
:
SSE42 : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT
Flags : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2
Detect : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
:
AVX2 : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C
Flags : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mavx2
Detect : AVX F16C AVX2
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/umath/loops_fast.dispatch.c
:
AVX512F : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2
Flags : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mavx512f
Detect : AVX512F
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
:
AVX512_SKX : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD
Flags : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq
Detect : AVX512_SKX
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
: build/src.linux-x86_64-3.7/numpy/core/src/umath/loops_fast.dispatch.c
CCompilerOpt._cache_write[728] : write cache to path -> /home/seiko/repos/py/numpy/build/temp.linux-x86_64-3.7/ccompiler_opt_cache_ext.py
########### CLIB COMPILER OPTIMIZATION ###########
CPU baseline :
Requested : 'min'
Enabled : SSE SSE2 SSE3
Flags : -msse -msse2 -msse3
CPU dispatch :
Requested : 'max -xop -fma4'
Enabled : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
Generated : none
CCompilerOpt._cache_write[728] : write cache to path -> /home/seiko/repos/py/numpy/build/temp.linux-x86_64-3.7/ccompiler_opt_cache_clib.py
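For illustration, the one-line "NumPy CPU features" summary in the report could be assembled from the three attributes roughly as sketched here. This is only a sketch: the helper name `format_cpu_features` and the stand-in values are hypothetical, and it assumes that `*` marks a dispatched feature the running CPU supports while `?` marks a dispatched feature it does not.

```python
# Stand-in values so the sketch runs without NumPy. In a real build these
# would be __cpu_baseline__, __cpu_dispatch__ and __cpu_features__.
cpu_baseline = ["SSE", "SSE2", "SSE3"]
cpu_dispatch = ["SSSE3", "SSE41", "AVX2", "AVX512F"]
cpu_features = {
    "SSE": True, "SSE2": True, "SSE3": True,
    "SSSE3": True, "SSE41": True, "AVX2": True,
    "AVX512F": False,
}


def format_cpu_features(baseline, dispatch, features):
    """Build the summary string (hypothetical helper, not NumPy API)."""
    # Baseline features carry no marker: the module cannot load without them.
    parts = list(baseline)
    for name in dispatch:
        # Assumption: '*' = supported by this CPU, '?' = not supported.
        marker = "*" if features.get(name) else "?"
        parts.append(name + marker)
    return " ".join(parts)


report_line = format_cpu_features(cpu_baseline, cpu_dispatch, cpu_features)
print("NumPy CPU features:", report_line)
```

Keeping the formatting in a small function like this would make it easy to reuse the same summary from a generated Python file, as suggested above.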
Sounds good. It seems that if I want code to run only if AVX2 is supported, I would do:
which is fair enough, thanks for the info. I guess we may bike-shed on the naming at some point, but you have it covered for starters.
@seberg, the user can change the default settings, and AVX2 may be part of the baseline, so it should be like this:
from numpy.core._multiarray_umath import (
    __cpu_features__, __cpu_baseline__, __cpu_dispatch__
)

# better to have a function instead
def feature_is_supported(name):
    assert isinstance(name, str)
    NAME = name.upper()  # let's allow lower case
    if NAME not in __cpu_features__:
        raise ValueError("Invalid CPU feature '%s'" % name)
    if NAME in __cpu_baseline__:
        # there's no need to check whether the CPU supports it,
        # since the module will fail to load with a runtime error
        # if the running machine doesn't support the baseline features
        return True
    if NAME in __cpu_dispatch__ and __cpu_features__[NAME]:
        return True
    return False

avx2_is_being_used = feature_is_supported("AVX2")
We just added the __cpu_features__ structure. I am not quite sure what it provides, but there are up to three things that are interesting. The CPU features:
I am wondering if we should add that information, @seiko2plus, @mattip? That could be done either by providing an additional struct or, assuming that we currently have the "supported by the system" definition, by changing True to the strings "enabled" and "supported" (both of which are truthy, so existing truth checks evaluate identically). Information "2." would not be available, but I am not sure it is super relevant.
EDIT: Sorry, if we are using definition "3.", I think that is probably all good, but I thought we may be using definition "1.", in which case I wonder whether that is what we actually want to know in most cases.
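To make the string-valued idea concrete, here is a minimal sketch of how it might behave. Everything in it is an assumption for illustration: the helper `feature_status` is hypothetical, the data are stand-ins, and it reads "enabled" as "supported by the machine and built into NumPy (baseline or dispatch)" and "supported" as "supported by the machine but not built in".

```python
# Stand-in data; none of this is read from a real NumPy build.
machine_supports = {"SSE2": True, "AVX2": True, "XOP": True, "AVX512F": False}
baseline = ["SSE", "SSE2", "SSE3"]
dispatch = ["AVX2", "AVX512F"]  # XOP deliberately left out of the build


def feature_status(name, supports, baseline, dispatch):
    """Return "enabled", "supported" or False (hypothetical helper)."""
    if not supports.get(name, False):
        return False  # the running CPU lacks the feature
    if name in baseline or name in dispatch:
        return "enabled"  # CPU has it and the build targets it
    return "supported"  # CPU has it, but the build does not use it


status = {
    name: feature_status(name, machine_supports, baseline, dispatch)
    for name in machine_supports
}

# Non-empty strings are truthy, so a plain truth test behaves exactly as it
# would with True/False values.
truthy_features = [name for name, value in status.items() if value]
print(status)
print(truthy_features)
```

Code that only needs a yes/no answer keeps working unchanged, while code that cares about the distinction can compare against the string.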