Question: Should `__cpu_features__` provide target information · Issue #15558 · numpy/numpy · GitHub

Question: Should __cpu_features__ provide target information #15558


Closed
seberg opened this issue Feb 12, 2020 · 3 comments

Comments

@seberg
Member
seberg commented Feb 12, 2020

We just added the __cpu_features__ structure. I am not quite sure what it provides, but there are up to three things that are interesting.

The CPU features:

  1. supported by the system
  2. NumPy is compiled with (targeted)
  3. the intersection of both (i.e. the ones that are actually in use for the running instance)

Should we add that information, @seiko2plus, @mattip? That could be done either by providing an additional struct or, assuming we currently use the "supported by the system" definition, by changing True to the strings "enabled" and "supported" (both of which are truthy, so existing checks behave identically). Information "2." would not be available, but I am not sure it is very relevant.

EDIT: Sorry, if we are using definition "3.", I think that is probably all good, but I thought we may be using definition "1.", in which case I wonder whether it is what we actually want to know in most cases.
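
To make the three definitions concrete, here is a minimal sketch using made-up sets rather than NumPy's real data. The names `supported`, `targeted`, and `in_use` are illustrative assumptions, not NumPy attributes; the point is only that a feature is usable when it is both compiled in and supported by the CPU.

```python
# Hypothetical feature sets, for illustration only (not read from NumPy).
supported = {"SSE", "SSE2", "SSE3", "AVX", "AVX2"}      # 1. what the CPU reports
targeted = {"SSE", "SSE2", "SSE3", "AVX2", "AVX512F"}   # 2. what the build was compiled for

# 3. features that can actually be used at runtime:
#    compiled into the build AND supported by the running CPU.
in_use = supported & targeted

print(sorted(in_use))
```

In this made-up example AVX is supported but was not targeted, and AVX512F was targeted but is not supported, so neither ends up in `in_use`.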

@seiko2plus
Member
seiko2plus commented Feb 13, 2020

IMHO __cpu_features__ should only contain what the running machine supports, and it should be kept separate from the build infrastructure, for the following reasons:

  • it may be used by other projects in different areas that don't rely on NumPy distutils

  • it is easy for the end user to dump: plain True or False values are easy to read, which makes it useful for reports

  • it is easy to use in unit tests, which is also the reason for combining the features of all architectures in one dict, even if a given CPU feature will always remain False on the running platform,
    for example:

    from numpy.core._multiarray_umath import __cpu_features__ as cpu_have
    # On x86 the "VSX" key always exists and is False,
    # so there is no need to set a default value via get()
    if cpu_have["VSX"]:
        # good enough
        pass
    if cpu_have.get("VSX", False):
        # not necessary
        pass
    if cpu_have.get("VSX"):
        # not necessary either
        pass
    # It can also be used to check whether NumPy supports a certain feature,
    # or to validate a feature name
    a_feature = "bla"
    if a_feature not in cpu_have:
        # cross-platform checking
        print("'%s' is an invalid CPU feature or maybe NumPy doesn't support it yet" % a_feature)
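
For the reporting use case in the list above, a small helper is enough. This is a sketch that assumes the feature table is a plain dict of name to bool, as in the snippet above; `sample_features` is made-up data standing in for `__cpu_features__`, and `summarize_features` is a hypothetical helper name.

```python
def summarize_features(cpu_have):
    """Split a {feature_name: bool} dict into sorted supported/unsupported lists."""
    supported = sorted(name for name, ok in cpu_have.items() if ok)
    unsupported = sorted(name for name, ok in cpu_have.items() if not ok)
    return supported, unsupported


# Made-up sample standing in for __cpu_features__ on an x86 machine.
sample_features = {
    "SSE": True, "SSE2": True, "AVX2": True,
    "AVX512F": False, "VSX": False,
}

supported, unsupported = summarize_features(sample_features)
print("supported:  ", " ".join(supported))
print("unsupported:", " ".join(unsupported))
```

Because every known feature name is a key, the helper needs no per-architecture special cases.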

2- NumPy is compiled with (targeted)

#13516 adds two new attributes to the umath module:

  • __cpu_baseline__, a list of the CPU baseline feature names configured by --cpu-baseline
  • __cpu_dispatch__, a list of the CPU dispatch feature names configured by --cpu-dispatch
>>> from numpy.core._multiarray_umath import __cpu_baseline__, __cpu_dispatch__
>>> __cpu_baseline__
['SSE', 'SSE2', 'SSE3']
>>> __cpu_dispatch__
['SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_KNL', 'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL']

#13516 also combines all the new attributes to provide a minimal report in pytester:

NumPy version 1.19.0.dev0+6e51f50
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
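
The thread does not spell out what the markers in that report line mean. One plausible reading, which is an assumption here, is that unmarked names are baseline features, `*` marks dispatched features the running CPU supports, and `?` marks dispatched features it lacks. Under that assumption the line could be assembled roughly as follows; `format_cpu_report` is a hypothetical helper and the inputs are made up.

```python
def format_cpu_report(baseline, dispatch, cpu_have):
    """Build a one-line summary: baseline names bare, dispatch names marked.

    Assumed marker meaning: '*' = dispatched and supported by the CPU,
    '?' = dispatched but not supported by the CPU.
    """
    parts = list(baseline)
    for name in dispatch:
        marker = "*" if cpu_have.get(name, False) else "?"
        parts.append(name + marker)
    return " ".join(parts)


# Made-up inputs standing in for __cpu_baseline__, __cpu_dispatch__, __cpu_features__.
baseline = ["SSE", "SSE2", "SSE3"]
dispatch = ["SSSE3", "AVX2", "AVX512F"]
cpu_have = {"SSE": True, "SSE2": True, "SSE3": True,
            "SSSE3": True, "AVX2": True, "AVX512F": False}

report = format_cpu_report(baseline, dispatch, cpu_have)
print("NumPy CPU features:", report)
```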

But I think it needs to be improved, for example by generating a Python file that contains the new attributes and importing it into the NumPy module instead of adding the new attributes to the _multiarray_umath module. It could also add a new dict containing the actual targeted features along with the list of dispatchable sources generated for them, almost the same information that the build log report provides.
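
As a rough illustration of what such a dict might look like, here is a sketch populated with two entries copied from the gcc/x86_64 build log in this comment. The variable name `cpu_targets_sketch`, the key names, and the `targets_for_source` helper are all hypothetical; nothing here is an existing NumPy attribute.

```python
# Hypothetical layout only: key names and the variable name are assumptions.
_SRC = "build/src.linux-x86_64-3.7/numpy/core/src/"

cpu_targets_sketch = {
    "SSE42": {
        "implies": ["SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "POPCNT"],
        "detect": ["SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "POPCNT", "SSE42"],
        "sources": [
            _SRC + "_simd/_simd.dispatch.c",
            _SRC + "_simd/_simd_intrinsics.dispatch.c",
            _SRC + "_simd/_simd_type.dispatch.c",
        ],
    },
    "AVX2": {
        "implies": ["SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "POPCNT",
                    "SSE42", "AVX", "F16C"],
        "detect": ["AVX", "F16C", "AVX2"],
        "sources": [
            _SRC + "_simd/_simd.dispatch.c",
            _SRC + "_simd/_simd_intrinsics.dispatch.c",
            _SRC + "_simd/_simd_type.dispatch.c",
            _SRC + "umath/loops_fast.dispatch.c",
        ],
    },
}


def targets_for_source(targets, suffix):
    """Return the target names that have a generated source ending in `suffix`."""
    return [name for name, info in targets.items()
            if any(path.endswith(suffix) for path in info["sources"])]


print(targets_for_source(cpu_targets_sketch, "loops_fast.dispatch.c"))
```

A structure like this would let a user ask which targets a given dispatchable source was built for without reading the build log.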

The build log report from #13516 with gcc on x86_64:

########### EXT COMPILER OPTIMIZATION ###########
CPU baseline  : 
  Requested   : 'min'
  Enabled     : SSE SSE2 SSE3
  Flags       : -msse -msse2 -msse3

CPU dispatch  : 
  Requested   : 'max -xop -fma4'
  Enabled     : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
  Generated   : 
              : 
  SSE42       : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT
  Flags       : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2
  Detect      : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
              : 
  AVX2        : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C
  Flags       : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mavx2
  Detect      : AVX F16C AVX2
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/umath/loops_fast.dispatch.c
              : 
  AVX512F     : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2
  Flags       : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mavx512f
  Detect      : AVX512F
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
              : 
  AVX512_SKX  : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD
  Flags       : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq
  Detect      : AVX512_SKX
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_intrinsics.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/_simd/_simd_type.dispatch.c
              : build/src.linux-x86_64-3.7/numpy/core/src/umath/loops_fast.dispatch.c
CCompilerOpt._cache_write[728] : write cache to path -> /home/seiko/repos/py/numpy/build/temp.linux-x86_64-3.7/ccompiler_opt_cache_ext.py

########### CLIB COMPILER OPTIMIZATION ###########
CPU baseline  : 
  Requested   : 'min'
  Enabled     : SSE SSE2 SSE3
  Flags       : -msse -msse2 -msse3

CPU dispatch  : 
  Requested   : 'max -xop -fma4'
  Enabled     : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
  Generated   : none
CCompilerOpt._cache_write[728] : write cache to path -> /home/seiko/repos/py/numpy/build/temp.linux-x86_64-3.7/ccompiler_opt_cache_clib.py

@seberg
Member Author
seberg commented Feb 13, 2020

Sounds good, seems like if I want code to run only if AVX2 is supported I would do:

avx_is_being_used = "AVX2" in __cpu_dispatch__ and __cpu_features__.get("AVX2", False)

which is fair enough, thanks for the info. I guess we may bike-shed on the naming at some point, but you got it covered for starters.

@seberg seberg closed this as completed Feb 13, 2020
@seiko2plus
Member
seiko2plus commented Feb 13, 2020

@seberg, the user can change the default settings, and AVX2 may be part of the baseline, so it should be like this:

from numpy.core._multiarray_umath import (
    __cpu_features__, __cpu_baseline__, __cpu_dispatch__
)

# better to have a function instead
def feature_is_supported(name):
    assert isinstance(name, str)
    NAME = name.upper()  # let's allow lower case
    if NAME not in __cpu_features__:
        raise ValueError("Invalid CPU feature '%s'" % name)
    if NAME in __cpu_baseline__:
        # there's no need to check whether the CPU supports it,
        # since the module will fail to load with a runtime error
        # if the running machine doesn't support the baseline features
        return True
    if NAME in __cpu_dispatch__ and __cpu_features__[NAME]:
        return True
    return False

avx2_is_being_used = feature_is_supported("AVX2")
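
Because the function above reads module-level attributes, it is awkward to exercise without a specific build. A variant that takes the three collections as arguments can be checked against made-up data. The function name `feature_in_use`, its argument names, and the sample values below are assumptions for illustration, not NumPy API.

```python
def feature_in_use(name, cpu_features, cpu_baseline, cpu_dispatch):
    """Same logic as above, but with the three tables passed in explicitly."""
    key = name.upper()  # accept lower-case names as well
    if key not in cpu_features:
        raise ValueError("Invalid CPU feature %r" % name)
    if key in cpu_baseline:
        # Baseline features are assumed present; otherwise loading would fail.
        return True
    return key in cpu_dispatch and bool(cpu_features[key])


# Made-up tables standing in for the real attributes.
features = {"SSE2": True, "AVX": True, "AVX2": True, "AVX512F": False}
baseline = ["SSE2"]
dispatch = ["AVX2", "AVX512F"]

print(feature_in_use("sse2", features, baseline, dispatch))     # baseline feature
print(feature_in_use("avx2", features, baseline, dispatch))     # dispatched and supported
print(feature_in_use("AVX512F", features, baseline, dispatch))  # dispatched, not supported
print(feature_in_use("AVX", features, baseline, dispatch))      # supported, not compiled in
```

The last case is the one a plain `__cpu_features__` lookup would get wrong: the CPU supports the feature, but the build contains no code path for it.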
