PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU #145702
Comments
I suggest forwarding this issue to the Microsoft folks so they can take a look. |
I suspect it has something to do with the way we detect and bypass flags: by default PyTorch should be built without AVX2 support, and only specific dispatches should be enabled with it (there used to be a test on the Linux side that launched qemu to validate that this is not the case). |
Update: |
Thank you for flagging. We will discuss it at triage review. It would be good to understand whether the issue is limited to Windows or whether Linux builds are affected as well (from #146792 it looks like a generic contamination problem). But maybe for the x86 platform it's time to revisit the oldest supported CPU, as AVX2 was introduced in 2013. |
Hi @malfet |
We need to keep AVX2 support in PyTorch, because almost all client CPUs (desktop and laptop) are AVX2 CPUs. |
Hi @malfet, I checked the ISA support table for Intel client CPUs. Among Intel client CPUs, only Gen11 CPUs support AVX512, so disabling AVX2 would be a horrible change.
Actually, I have made many attempts to fix the issue, but I can't reproduce it locally. |
Sorry, I misread the issue. The proposal was not to drop AVX2, but rather to make it the new base supported architecture instead of SSE4, which is the current base. As to the nature of the problem: a few ops use mutexes, so in some cases the linker can pick an avx512-accelerated implementation instead of the base one. And I guess it's non-deterministic. |
While debugging #147917, I found out how to add PDB files to the wheel package: 85ed6ad. So I debugged this issue again with the PDB file. The call stack is below:
ModLoad: 00007ff8`820d0000 00007ff8`820e2000 C:\Users\Xuhan\.conda\envs\debug_cpu_illegal_instruction\DLLs\_asyncio.pyd
ModLoad: 00007ff8`89b20000 00007ff8`89b2e000 C:\Users\Xuhan\.conda\envs\debug_cpu_illegal_instruction\DLLs\_overlapped.pyd
ModLoad: 00007ff8`e4e50000 00007ff8`e4eba000 C:\WINDOWS\system32\mswsock.dll
(3eb4.3a4c): Illegal instruction - code c000001d (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\Xuhan\.conda\envs\debug_cpu_illegal_instruction\Lib\site-packages\torch\lib\torch_cpu.dll
torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x661ed:
00007ff8`509e952d 62e17c28106515 vmovups ymm20,ymmword ptr [rbp+2A0h] ss:00000087`4efec3f5=00
2:009> k
# Child-SP RetAddr Call Site
00 00000087`4efec3a0 00007ff8`5094c4d0 torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x661ed
01 00000087`4efec850 00007ff8`509a09ce torch_cpu!at::vec::AVX2::convert_to_int_of_same_size<c10::Half,short>+0x17f50
02 00000087`4efeca40 00007ff8`509b346d torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x1d68e
03 00000087`4efecad0 00007ff8`508e8ee5 torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x3012d
04 00000087`4efecc70 00007ff8`4cb72f7e torch_cpu!at::vec::AVX2::int_elementwise_binary_256<unsigned char,std::divides<unsigned char> >+0xdd205
05 00000087`4efecd50 00007ff8`4d86e5c1 torch_cpu!at::native::grid_sampler_2d_cpu+0x1fe
06 00000087`4efece60 00007ff8`4d864e14 torch_cpu!at::cpu::where_outf+0xba1
07 00000087`4efeceb0 00007ff8`4d7544a1 torch_cpu!at::cpu::bucketize_outf+0xef14
08 00000087`4efecf00 00007ff8`4f1e9d0f torch_cpu!at::_ops::grid_sampler_2d::redispatch+0xb1
09 00000087`4efecf70 00007ff8`4f269b4a torch_cpu!std::optional<std::tuple<at::Tensor,at::Tensor,at::Tensor> >::~optional<std::tuple<at::Tensor,at::Tensor,at::Tensor> >+0x13bf
0a 00000087`4efed160 00007ff8`4f24ecc8 torch_cpu!c10::impl::BoxedKernelWrapper<at::Tensor __cdecl(at::Tensor const & __ptr64,at::Tensor const & __ptr64,std::optional<at::Tensor> const & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,bool & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::SymInt & __ptr64),void>::call+0x3074a
0b 00000087`4efed3d0 00007ff8`4d6e9925 torch_cpu!c10::impl::BoxedKernelWrapper<at::Tensor __cdecl(at::Tensor const & __ptr64,at::Tensor const & __ptr64,std::optional<at::Tensor> const & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,bool & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::SymInt & __ptr64),void>::call+0x158c8
0c 00000087`4efed430 00007ff8`45b87ca7 torch_cpu!at::_ops::grid_sampler_2d::call+0x1d5
0d 00000087`4efed540 00007ff8`45be7d25 torch_python!THPPointer<_frame>::release+0xcdeb7
0e 00000087`4efed5a0 00007ff8`61bfa92c torch_python!THPPointer<_frame>::release+0x12df35
0f 00000087`4efed7e0 00007ff8`61bb003f python312!cfunction_call+0x5c [\objects\methodobject.c @ 540]
10 00000087`4efed810 00007ff8`61bb0235 python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240]
11 00000087`4efed870 00007ff8`61cc545d python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327]
12 00000087`4efed8b0 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711]
13 00000087`4efeda70 00007ff8`61bb2d1b python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
14 (Inline Function) --------`-------- python312!_PyObject_VectorcallTstate+0x2f [\include\internal\pycore_call.h @ 92]
15 00000087`4efedab0 00007ff8`61bb018e python312!method_vectorcall+0x11b [\objects\classobject.c @ 69]
16 00000087`4efedb60 00007ff8`61bb02c6 python312!_PyVectorcall_Call+0xce [\objects\call.c @ 271]
17 00000087`4efedba0 00007ff8`61cc5768 python312!_PyObject_Call+0x46 [\objects\call.c @ 373]
18 00000087`4efedbe0 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x6738 [\pcbuild\python\bytecodes.c @ 3259]
19 00000087`4efedda0 00007ff8`61bafd31 python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
1a 00000087`4efedde0 00007ff8`61bb0752 python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146]
1b 00000087`4efede30 00007ff8`61c25d55 python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508]
1c 00000087`4efeded0 00007ff8`61bb003f python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764]
1d 00000087`4efedf30 00007ff8`61bb0235 python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240]
1e 00000087`4efedf90 00007ff8`61cc545d python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327]
1f 00000087`4efedfd0 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711]
20 00000087`4efee190 00007ff8`61bafd31 python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
21 00000087`4efee1d0 00007ff8`61bb0752 python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146]
22 00000087`4efee220 00007ff8`61c25d55 python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508]
23 00000087`4efee2c0 00007ff8`61bb0356 python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764]
24 00000087`4efee320 00007ff8`61cc5768 python312!_PyObject_Call+0xd6 [\objects\call.c @ 369]
25 00000087`4efee360 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x6738 [\pcbuild\python\bytecodes.c @ 3259]
26 00000087`4efee520 00007ff8`61bafd31 python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
27 00000087`4efee560 00007ff8`61bb0752 python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146]
28 00000087`4efee5b0 00007ff8`61c25d55 python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508]
29 00000087`4efee650 00007ff8`61bb003f python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764]
2a 00000087`4efee6b0 00007ff8`61bb0235 python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240]
2b 00000087`4efee710 00007ff8`61cc545d python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327]
2c 00000087`4efee750 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711]
2d 00000087`4efee910 00007ff8`61bafd31 python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
2e 00000087`4efee950 00007ff8`61bb0752 python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146]
2f 00000087`4efee9a0 00007ff8`61c25d55 python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508]
30 00000087`4efeea40 00007ff8`61bb003f python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764]
31 00000087`4efeeaa0 00007ff8`61bb0235 python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240]
32 00000087`4efeeb00 00007ff8`61cc545d python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327]
33 00000087`4efeeb40 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711]
34 00000087`4efeed00 00007ff8`61bafd31 python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
35 00000087`4efeed40 00007ff8`61bb0752 python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146]
36 00000087`4efeed90 00007ff8`61c25d55 python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508]
37 00000087`4efeee30 00007ff8`61bb003f python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764]
38 00000087`4efeee90 00007ff8`61bb0235 python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240]
39 00000087`4efeeef0 00007ff8`61cc545d python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327]
3a 00000087`4efeef30 00007ff8`61cbee66 python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711]
3b (Inline Function) --------`-------- python312!_PyEval_EvalFrame+0x1b [\include\internal\pycore_ceval.h @ 88]
3c (Inline Function) --------`-------- python312!_PyEval_Vector+0x55 [\python\ceval.c @ 1675]
3d 00000087`4efef0f0 00007ff8`61cba816 python312!PyEval_EvalCode+0xe6 [\python\ceval.c @ 570]
3e 00000087`4efef170 00007ff8`61cb7e6e python312!builtin_exec_impl+0x256 [\python\bltinmodule.c @ 1096]
3f 00000087`4efef200 00007ff8`61bfa5f1 python312!builtin_exec+0xce [\python\clinic\bltinmodule.c.h @ 586]
40 00000087`4efef2b0 00007ff8`61bafae9 python312!cfunction_vectorcall_FASTCALL_KEYWORDS+0x81 [\objects\methodobject.c @ 439]
41 00000087`4efef2e0 00007ff8`61bb0235 python312!_PyObject_VectorcallTstate+0x49 [\include\internal\pycore_call.h @ 93]
42 00000087`4efef320 00007ff8`61cc545d python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327]
43 00000087`4efef360 00007ff8`61bb0494 python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711]
44 00000087`4efef520 00007ff8`61bb018e python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424]
45 00000087`4efef560 00007ff8`61bb02c6 python312!_PyVectorcall_Call+0xce [\objects\call.c @ 271]
46 00000087`4efef5a0 00007ff8`61b2fa5f python312!_PyObject_Call+0x46 [\objects\call.c @ 373]
47 (Inline Function) --------`-------- python312!PyObject_Call+0x41 [\objects\call.c @ 379]
48 00000087`4efef5e0 00007ff8`61b307e1 python312!pymain_run_module+0x2cf [\modules\main.c @ 301]
49 00000087`4efef630 00007ff8`61b30cf8 python312!pymain_run_python+0x411 [\modules\main.c @ 608]
4a 00000087`4efef6c0 00007ff8`61b30d82 python312!Py_RunMain+0x18 [\modules\main.c @ 691]
4b (Inline Function) --------`-------- python312!pymain_main+0x4b [\modules\main.c @ 719]
4c 00000087`4efef6f0 00007ff6`94aa1494 python312!Py_Main+0x52 [\modules\main.c @ 732]
4d (Inline Function) --------`-------- python!invoke_main+0x22 [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90]
4e 00000087`4efef760 00007ff8`e87ae8d7 python!__scrt_common_main_seh+0x10c [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
4f 00000087`4efef7a0 00007ff8`e98fbf2c KERNEL32!BaseThreadInitThunk+0x17
50 00000087`4efef7d0 00000000`00000000 ntdll!RtlUserThreadStart+0x2c
Because of code optimization, the call stack does not carry enough information for debugging. |
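The faulting instruction bytes in the dump above (`62e17c28106515`) can be checked by hand. The following is a minimal sketch, not a full disassembler: it assumes 64-bit mode (where a leading `0x62` byte is the EVEX prefix), assumes no embedded rounding, and reads only the fields needed to name the `reg` operand. The function name `decode_evex_register` is hypothetical. An EVEX-encoded instruction that uses a register in the 16 to 31 range needs AVX-512 class hardware, which would be consistent with the illegal-instruction exception on a CPU whose highest ISA level is AVX2.

```python
def decode_evex_register(code: bytes) -> dict:
    """Decode the ModRM.reg operand of an EVEX-encoded instruction.

    Simplified sketch: assumes 64-bit mode, the layout
    0x62 P0 P1 P2 opcode ModRM ..., and no embedded rounding.
    """
    if len(code) < 6 or code[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction (first byte is not 0x62)")

    p0, p2, opcode, modrm = code[1], code[3], code[4], code[5]

    # The R (bit 7) and R' (bit 4) fields of P0 are stored inverted.
    r_bit = ((p0 >> 7) & 1) ^ 1
    r_hi_bit = ((p0 >> 4) & 1) ^ 1

    # The register index is R' : R : ModRM.reg (5 bits, so 0..31).
    reg_index = (r_hi_bit << 4) | (r_bit << 3) | ((modrm >> 3) & 0b111)

    # The L'L field (bits 6:5 of P2) selects the vector length.
    length_bits = {0b00: 128, 0b01: 256, 0b10: 512}.get((p2 >> 5) & 0b11)
    prefix = {128: "xmm", 256: "ymm", 512: "zmm"}.get(length_bits, "?mm")

    return {
        "register": f"{prefix}{reg_index}",
        "vector_bits": length_bits,
        "opcode": opcode,
    }


# Instruction bytes copied from the WinDBG output in this thread.
faulting = bytes.fromhex("62e17c28106515")
info = decode_evex_register(faulting)
print(info)
```

The decoded register matches the `ymm20` operand that WinDBG printed for the faulting `vmovups`. A VEX-encoded AVX2 instruction can only address `ymm0` to `ymm15`, so the register number alone already points at EVEX encoding.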
VS2019 code was cleaned up: #145863 |
Some initial observations:
|
Hi @Blackhex, I tried to reproduce locally with the latest VS2022 installed via the GUI, and the issue did not occur. So I replaced the VS2022 download URL with the latest version, but the issue still occurred:
Based on the above steps, I suspect two potential reasons:
|
Hello @xuhancn. Thank you for this important observation. I can confirm that building the repro case with VS2022 17.13.6 does not reproduce the issue on a 13th gen CPU, while it does with VS2022 17.8.18, which is the version that nightly currently uses. I was not able to reproduce it with either of those versions on an 11th gen CPU. |
Could you please validate this case also: #152385 |
I can confirm that the #152385 repro case fails with Build Tools 17.8.18 while it works with 17.13.6 on a 13th gen CPU. I will bisect which version fixed the issue to get more insight. As this seems to be a compiler issue that has already been fixed, it would help if anyone had a narrow C++ repro case. Until we fully understand the root cause, updating the Build Tools seems to be the best fix/workaround. |
@Blackhex Good to hear that VS2022 17.13.6 fixes the issue. I also checked the VS2022 release notes: https://learn.microsoft.com/en-us/visualstudio/releases/2022/release-notes |
JFYI, the latest Build Tools version that has the issue is 17.8.20. Versions 17.9.0-17.9.3 failed to compile with ICE and the earliest working version is 17.9.4. |
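The bisect result above can be written down as a small lookup, for example to guard a build script. This is only a sketch: the function name `classify_build_tools` and the labels are hypothetical, and it encodes nothing beyond the version numbers reported in this thread. Versions outside the reported ranges are returned as unknown, and treating every version from 17.9.4 onward as working is an assumption (only 17.9.4 and 17.13.6 were reported as tested).

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '17.8.20' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify_build_tools(version: str) -> str:
    """Classify a VS2022 Build Tools version using the results reported in this thread."""
    v = parse_version(version)
    if (17, 8, 18) <= v <= (17, 8, 20):
        # 17.8.18 and 17.8.20 were reported to reproduce the illegal instruction.
        return "affected"
    if (17, 9, 0) <= v <= (17, 9, 3):
        # Reported to fail to compile with an internal compiler error (ICE).
        return "ice"
    if v >= (17, 9, 4):
        # Assumption: every version after the earliest working one also works.
        return "working"
    # Anything older than 17.8.18 was not reported on in this thread.
    return "unknown"


for candidate in ("17.8.18", "17.8.20", "17.9.2", "17.9.4", "17.13.6", "17.7.0"):
    print(candidate, classify_build_tools(candidate))
```

Comparing tuples of integers rather than raw strings matters here, because a plain string comparison would sort 17.13.6 before 17.8.18.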
Updated status on May 10th:
I manually launched it and checked the version info.
This behavior is the same as what I saw in Dec 2024. @Blackhex, how do we upgrade VS2022 to a new version correctly? Could the installation command be causing the issue, as I mentioned in #145702 (comment)? |
@xuhancn The toolchain for the PyTorch build is selected by loading the development shell, either from:
for Build Tools, or:
for full Visual Studio for local builds. The exact path may differ depending on the product you've installed. It has been quite a long time since I worked on updating VS2019 to VS2022, so I don't remember the details. I'd guess that the https://github.com/pytorch/pytorch/pull/153322/files#diff-3bbee23c5ce8d35d620708f0a6107b0fd0fb55fc552528aecc22bf49d24925f7 script is not used for the CI build at all. I can refer you to my old PR doing the update, pytorch/test-infra#1175, but I am not sure how relevant it is nowadays. |
I checked the build log: https://github.com/pytorch/pytorch/actions/runs/14945315292/job/41987884812?pr=153322 It called my updated script. |
I have added back the VS2019 installation script, and I found that the installer download location is different:
For VS2022, it is downloaded from the official MSFT link:
So, how can we install VS2022 correctly? |
IIRC, infrastructure engineers placed the Build Tools installer on AWS storage because there were occasional download failures from the official source. |
I've checked the relevant MSVC issues and I can offer an unverified hypothesis about what the root cause could be, along with another possible workaround:
|
Hi @kalaskarsanket , @Blackhex It works: #153480 |
Awesome. Thank you for handling this. |
🐛 Describe the bug
Background
This issue is a re-submission of #145042. We talked about it and think the original issue could be mistaken for an XPU-related issue, but it is actually a CPU-only issue.
Reproduce steps:
ciflow/binaries tag to trigger nightly binary build.
wheel-py3_10-cpu wheel from Artifacts page https://github.com/pytorch/pytorch/actions/runs/12972478091?pr=143791
AVX2.
pytest: command line:
Root cause
I debugged it with WinDBG. The reason is that VS2022 generated an AVX512 instruction, which then ran on a client CPU whose max ISA level is AVX2.
Additional information
low priority tag.
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @malfet @seemethere @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10