import numpy crashes at xgetbv instruction in npy_cpu_supports on a VM · Issue #15342 · numpy/numpy · GitHub

import numpy crashes at xgetbv instruction in npy_cpu_supports on a VM #15342

Closed
ryandesign opened this issue Jan 18, 2020 · 8 comments

Comments

@ryandesign
ryandesign commented Jan 18, 2020

Hi, I run the MacPorts automated build system and we have been experiencing an issue for a while where import numpy causes Python to crash. I don't know if this happens for any users running macOS directly on their Macs, but it does happen on one of our build machines: the one running macOS Sierra 10.12. The other build machines (which run earlier and later macOS versions) don't experience the problem. All the build machines are virtual machines running under VMware ESXi 6 on 2009-model Xserves.

I believe this is similar to or the same as #13059 but in case it's not I didn't want to mix up my information in that issue. In that issue, @eric-wieser speculated that HAVE_XGETBV is being set incorrectly. As my lldb trace below shows, we are crashing at the xgetbv instruction.

@seberg asked in #13059 if the problem was the VM claiming to support AVX when it does not. I have used the sample program posted by @ulido in #10330 to check whether the machine claims to have avx support (using __builtin_cpu_supports("avx")). In my case here in this macOS 10.12 VM, the sample program says "AVX not supported!"

Reproducing code example:

import numpy

Error message:

I am sorry, I am not familiar with gdb, but here's what happens under lldb:

Process 3114 resuming
Python 3.7.6 (default, Jan 17 2020, 18:32:15)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Process 3114 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x000000010391c691 _multiarray_umath.cpython-37m-darwin.so`npy_cpu_supports + 118
_multiarray_umath.cpython-37m-darwin.so`npy_cpu_supports:
->  0x10391c691 <+118>: xgetbv
    0x10391c694 <+121>: andl   $0x76, %eax
    0x10391c697 <+124>: movl   $0x76, %ecx
    0x10391c69c <+129>: jmp    0x10391c6dc               ; <+193>
Target 0: (Python) stopped.

Numpy/Python version information:

Python 3.7.6
Numpy 1.18.1
macOS 10.12.6 Build version 16G2136
Xcode 9.2 Build version 9C40b
Apple LLVM version 9.0.0 (clang-900.0.39.2)

@ryandesign
Author

While trying to learn more about this problem, I found a bug report at another project where there was a crash at xgetbv on a machine that had avx support disabled. The fix there was to check that the CPUID.1:ECX.OSXSAVE bit is set first. I wasn't sure if something like that might be applicable for numpy too.

@qfan
qfan commented Jan 23, 2020

I got the same error on a Google Cloud VM.
The /proc/cpuinfo I got is:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping        : 1
microcode       : 0x1
cpu MHz         : 2199.816
cache size      : 4096 KB
physical id     : 3
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap arat
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
bogomips        : 4399.63
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

So it appears that this thing does have avx and avx2 support in general, but not the xgetbv instruction.
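One rough way to spot this situation from the flags line is to check whether avx is advertised while xsave is not. The sketch below is illustrative only: the helper name avx_without_xsave is hypothetical, and it assumes that Linux lists an xsave flag in /proc/cpuinfo whenever the XSAVE/XGETBV instructions are available. The flags quoted above contain avx and avx2 but no xsave entry.

```python
def avx_without_xsave(flags_line):
    """Return True if AVX is advertised but the 'xsave' flag is absent.

    Hypothetical helper: assumes the kernel reports an 'xsave' flag when
    the XSAVE/XGETBV instructions are available.
    """
    # Accept either the raw "flags : ..." line or just the flag list.
    _, _, tail = flags_line.rpartition(":")
    flags = set(tail.split())
    return "avx" in flags and "xsave" not in flags
```

On Linux, the line could be taken from /proc/cpuinfo by picking the first line that starts with "flags". A True result is only a hint that xgetbv may be unavailable, not proof.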

In my case the crash points to the specific version and line:
https://github.com/numpy/numpy/blob/v1.17.3/numpy/core/src/umath/cpuid.c#L30

It seems that the xgetbv instruction is called without first checking whether it is available.

According to "Intel(R) Advanced Vector Extensions Programming Reference", Ref. # 319433-011:

Note: It is unwise for an application to rely exclusively on CPUID.1:ECX.AVX[bit 28]
or at all on CPUID.1:ECX.XSAVE[bit 26]: These indicate hardware support but not
operating system support. If YMM state management is not enabled by an operating
systems, AVX instructions will #UD regardless of CPUID.1:ECX.AVX[bit 28].
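The ordering that the note implies can be sketched as a small pure function. This is not NumPy's code, only an illustration of the decision logic in Python: the function name avx_usable and the read_xcr0 callable (a stand-in for executing xgetbv) are hypothetical. It relies on bit 27 of CPUID.1:ECX being OSXSAVE, bit 28 being AVX, and XCR0 bits 1 and 2 covering XMM and YMM state.

```python
# Bit positions in CPUID leaf 1, register ECX.
OSXSAVE_BIT = 1 << 27  # OS has enabled XSAVE, so xgetbv may be executed
AVX_BIT = 1 << 28      # hardware supports AVX

# XCR0 bits 1 and 2: the OS saves and restores XMM and YMM state.
XCR0_XMM_YMM = 0x6


def avx_usable(cpuid1_ecx, read_xcr0):
    """Return True only if both the CPU and the OS support AVX.

    `read_xcr0` is a hypothetical callable standing in for the xgetbv
    instruction; it is called only after OSXSAVE has been confirmed.
    """
    if not cpuid1_ecx & OSXSAVE_BIT:
        # Executing xgetbv here would fault with an invalid-opcode error.
        return False
    if not cpuid1_ecx & AVX_BIT:
        return False
    return (read_xcr0() & XCR0_XMM_YMM) == XCR0_XMM_YMM
```

The point of passing read_xcr0 as a callable is that it is never invoked when the OSXSAVE bit is clear, which is the case that crashed in this issue.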

@seberg
Member
seberg commented Jan 23, 2020

@mattip, @seiko2plus does the new/proposed cpu detection code solve this issue? And @qfan is there a simple fix you can suggest to our current cpu detection functionality?

@qfan
qfan commented Jan 23, 2020

@mattip, @seiko2plus does the new/proposed cpu detection code solve this issue? And @qfan is there a simple fix you can suggest to our current cpu detection functionality?

I do not know any x86 assembly. But the "Intel(R) Advanced Vector Extensions Programming Reference" PDF (you can google it, but Intel does not provide a direct link) has 17 lines of pseudo-code for checking this, near the note I quoted in my previous comment.

@seiko2plus
Member

@ryandesign, yes, that's right: the OSXSAVE bit must be checked before executing xgetbv.
@seberg, @qfan, gh-13421 should fix this issue; could you please try it? You can also dump the new module attribute __cpu_features__ that comes with it. It should contain the runtime detection result.

@qfan
qfan commented Jan 24, 2020

@ryandesign, yes that's right, OSXSAVE bit must be checked first before executing xgetbv,
@seberg, @qfan, gh-13421 should fix this issue, please could you try it? also you can dump the new module attribute __cpu_features__ that come with it. it should contain the runtime detection result.

I applied your pull request to numpy/master and it does work on that Westmere CPU. Great Job!

>>> import numpy
>>> numpy.__cpu_features__
{'MMX': True, 'SSE': True, 'SSE2': True, 'SSE3': True, 'SSSE3': True, 'SSE41': True, 'POPCNT': True, 'SSE42': True, 'AVX': False, 'F16C': True, 'XOP': False, 'FMA4': False, 'FMA3': False, 'AVX2': False, 'AVX512F': False, 'AVX512CD': False, 'AVX512ER': False, 'AVX512PF': False, 'AVX5124FMAPS': False, 'AVX5124VNNIW': False, 'AVX512VPOPCNTDQ': False, 'AVX512VL': False, 'AVX512BW': False, 'AVX512DQ': False, 'AVX512VNNI': False, 'AVX512IFMA': False, 'AVX512VBMI': False, 'AVX512VBMI2': False, 'AVX512BITALG': False, 'AVX512_KNL': False, 'AVX512_KNM': False, 'AVX512_SKX': False, 'AVX512_CLX': False, 'AVX512_CNL': False, 'AVX512_ICL': False, 'VSX': False, 'VSX2': False, 'VSX3': False, 'NEON': False, 'NEON_FP16': False, 'NEON_VFPV4': False, 'ASIMD': False, 'FPHP': False, 'ASIMDHP': False, 'ASIMDDP': False, 'ASIMDFHM': False} 

>>>

I'm going to lose access to this VM tomorrow, though. So I probably won't be able to test this again if you make any change later.
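Since the full dictionary is long, a short filter makes the result easier to read. This is a minimal sketch: the helper name enabled_features is hypothetical, and it only assumes that __cpu_features__ is a plain dict mapping feature names to booleans, as in the output above.

```python
def enabled_features(cpu_features):
    """Return the names of features reported as True, in dict order."""
    return [name for name, supported in cpu_features.items() if supported]
```

With the patched build from this thread it could be called as enabled_features(numpy.__cpu_features__). The attribute may live elsewhere in other NumPy versions.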

@ryandesign
Author

gh-13421 should fix this issue, please could you try it?

It does on my macOS Sierra VM, thank you.

also you can dump the new module attribute __cpu_features__ that come with it.

$ python3.7
Python 3.7.6 (default, Jan 17 2020, 18:32:15)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__cpu_features__
{'MMX': True, 'SSE': True, 'SSE2': True, 'SSE3': True, 'SSSE3': True, 'SSE41': True, 'POPCNT': True, 'SSE42': True, 'AVX': False, 'F16C': False, 'XOP': False, 'FMA4': False, 'FMA3': False, 'AVX2': False, 'AVX512F': False, 'AVX512CD': False, 'AVX512ER': False, 'AVX512PF': False, 'AVX5124FMAPS': False, 'AVX5124VNNIW': False, 'AVX512VPOPCNTDQ': False, 'AVX512VL': False, 'AVX512BW': False, 'AVX512DQ': False, 'AVX512VNNI': False, 'AVX512IFMA': False, 'AVX512VBMI': False, 'AVX512VBMI2': False, 'AVX512BITALG': False, 'AVX512_KNL': False, 'AVX512_KNM': False, 'AVX512_SKX': False, 'AVX512_CLX': False, 'AVX512_CNL': False, 'AVX512_ICL': False, 'VSX': False, 'VSX2': False, 'VSX3': False, 'NEON': False, 'NEON_FP16': False, 'NEON_VFPV4': False, 'ASIMD': False, 'FPHP': False, 'ASIMDHP': False, 'ASIMDDP': False, 'ASIMDFHM': False}
>>>

@ryandesign
Author

1.19.0 fixes this. Thanks.
