PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU · Issue #145702 · pytorch/pytorch · GitHub

PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU #145702

Open
xuhancn opened this issue Jan 26, 2025 · 27 comments · May be fixed by #153480
Assignees
Labels
high priority module: build Build system issues module: cpu CPU specific problem (e.g., perf, algorithm) module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: windows Windows support for PyTorch triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@xuhancn
Collaborator
xuhancn commented Jan 26, 2025

🐛 Describe the bug

Background

This issue is a re-submission of #145042. After discussing it, we think the original issue could be misread as XPU-related, but it is actually a CPU-only issue.

Reproduce steps:

  1. It is easy to reproduce: just switch the Windows CPU build to VS2022. See [don't merge] use vs2022 build windows cpu wheel. #143791
  2. Add the ciflow/binaries tag to trigger a nightly binary build.
  3. Download the wheel-py3_10-cpu wheel from the Artifacts page https://github.com/pytorch/pytorch/actions/runs/12972478091?pr=143791
  4. Install the wheel on a machine whose CPU's maximum ISA level is AVX2.
  5. Run the reproduction code via pytest:
# cpu_vs2022_inst_issue.py
import torch
class TestClass:
    def test_grid_sampler_2d(self):
        torch.manual_seed(0)
        b = torch.rand(2, 13, 10, 2, dtype=torch.float64)
        a = torch.rand(2, 3, 5, 20, dtype=torch.float64)
        torch.grid_sampler_2d(a, b, interpolation_mode=0, padding_mode=0, align_corners=False)

command line:

pytest -v cpu_vs2022_inst_issue.py
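
When the reproduction hits the bad instruction, the interpreter is killed rather than the test failing, so it can help to run pytest in a child process and interpret the exit status. The following is a minimal sketch under stated assumptions: `describe_exit` and `run_repro` are hypothetical helper names, the script name is the one from the steps above, and the Windows status value is the `c000001d` code that appears in the debugger output later in this thread.

```python
import signal
import subprocess
import sys

# NTSTATUS reported by Windows for an illegal instruction.
STATUS_ILLEGAL_INSTRUCTION = 0xC000001D


def describe_exit(returncode: int) -> str:
    """Classify a child-process return code (hypothetical helper)."""
    # Windows reports the NTSTATUS; mask so signed and unsigned forms both match.
    if returncode & 0xFFFFFFFF == STATUS_ILLEGAL_INSTRUCTION:
        return "illegal instruction"
    # POSIX reports death-by-signal as the negated signal number.
    if returncode == -signal.SIGILL:
        return "illegal instruction"
    return "ok" if returncode == 0 else f"failed ({returncode})"


def run_repro(script: str = "cpu_vs2022_inst_issue.py") -> str:
    """Run the pytest reproduction in a child process and classify the result."""
    proc = subprocess.run([sys.executable, "-m", "pytest", "-v", script])
    return describe_exit(proc.returncode)
```

Calling `run_repro()` from the directory that holds the script would return "illegal instruction" on an affected machine and "ok" where the wheel works.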

Root cause

Image

I debugged it with WinDBG. The cause is that VS2022 generated an AVX512 instruction, which then ran on a client CPU whose maximum ISA level is AVX2.

Additional information

  1. This issue does not affect the current official PyTorch binaries, because they are built with VS2019.
  2. The PyTorch CI cannot catch this issue, because the CI runs on server CPUs, which support AVX512.
  3. I tried to reproduce it in my local VS2022 build environment, but could not. I think it only occurs in the official PyTorch build environment.
  4. I am opening this issue to track it, so that a future upgrade to VS2022 does not run into this problem.
  5. Since it does not affect the official PyTorch binaries, I will add the low priority tag.
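
Since the crash depends on the highest ISA level the host supports, it is worth confirming what a test machine actually offers before drawing conclusions from a passing run. This is a rough standard-library sketch, not part of the PyTorch tooling: `classify_isa` and `host_flags` are hypothetical helpers, the Linux path reads `/proc/cpuinfo`, and the Windows path assumes the `IsProcessorFeaturePresent` feature IDs 40 (AVX2) and 41 (AVX512F) from the Windows SDK headers.

```python
import sys


def classify_isa(flags: set) -> str:
    """Return the highest ISA level implied by a set of lowercase CPU flags."""
    if "avx512f" in flags:
        return "AVX512"
    if "avx2" in flags:
        return "AVX2"
    return "baseline"


def host_flags() -> set:
    """Best-effort flag detection for the current machine (hypothetical helper)."""
    if sys.platform == "win32":
        import ctypes
        # Feature IDs assumed from winnt.h: 40 = AVX2, 41 = AVX512F.
        probe = ctypes.windll.kernel32.IsProcessorFeaturePresent
        return {name for name, fid in (("avx2", 40), ("avx512f", 41)) if probe(fid)}
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(classify_isa(host_flags()))
```

A machine that prints "AVX512" here, like the CI runners described above, would not be expected to expose the problem.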

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @malfet @seemethere @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@xuhancn xuhancn added low priority We're unlikely to get around to doing this in the near future module: windows Windows support for PyTorch module: cpu CPU specific problem (e.g., perf, algorithm) labels Jan 26, 2025
@xuhancn
Collaborator Author
xuhancn commented Jan 26, 2025

I suggest forwarding this issue to the Microsoft folks so they can take a look.

@malfet
Contributor
malfet commented Jan 27, 2025

I suspect it has something to do with the way we detect and bypass flags, as by default PyTorch should be built even without AVX2 support, and only specific dispatches should be enabled like that (there used to be a test on the Linux side that launched qemu to validate this is not the case).

@malfet malfet added module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: build Build system issues triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Jan 27, 2025
@xuhancn
Collaborator Author
xuhancn commented Feb 20, 2025

Update:
PyTorch official build switched to VS2022: #145319

@xuhancn
Collaborator Author
xuhancn commented Feb 20, 2025

I noticed that the official PyTorch build has switched to VS2022, so I tried the latest PyTorch CPU-only nightly build (20250219).
The issue still exists in the nightly build, so I will remove the low priority tag.
CC #145319 author and reviewer: @Camyll, @huydhn

@xuhancn xuhancn removed the low priority We're unlikely to get around to doing this in the near future label Feb 20, 2025
@malfet malfet removed the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 20, 2025
@malfet
Contributor
malfet commented Feb 20, 2025

Thank you for flagging. Will discuss it at triage review. It would be good to understand whether the issue is limited to Windows or whether Linux builds are affected as well (from #146792 it looks like a generic contamination problem).

But maybe for the x86 platform it's time to revisit the oldest supported CPU, as AVX2 was introduced in 2013.

@xuhancn
Collaborator Author
xuhancn commented Feb 20, 2025

Hi @malfet
I don't know why it only occurs in the official PyTorch build environment; it builds fine in our local environment. I think we have a chance to fix it by figuring out the difference between the official and local environments.

@xuhancn
Collaborator Author
xuhancn commented Feb 20, 2025

We need to keep AVX2 support in PyTorch, because almost all client CPUs (desktop and laptop) are AVX2 CPUs.

@xuhancn
Collaborator Author
xuhancn commented Feb 20, 2025

Hi @malfet ,

I checked the ISA support table for Intel client CPUs. Among Intel client CPUs, only Gen11 CPUs support AVX512, so dropping AVX2 would be a terrible change.
I propose two options:

  1. Could you please provide more detailed information, so that I can try to fix the VS2022 illegal instruction issue? As you mentioned: PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU #145702 (comment)
  2. Revert the PR that switched to VS2022 (Windows builds with VS2022 #145319) and roll back to VS2019.

Actually, I have made many attempts to fix the issue, but it can't be reproduced locally.

@malfet
Contributor
malfet commented Feb 20, 2025

Sorry, I misread the issue. Proposal was not to drop AVX2, but rather make it a new base supported architecture, instead of SSE4, which is the current base.

As to the nature of the problem: a few ops use mutexes, so the linker can in some cases pick an AVX512-accelerated implementation instead of the base one. And I guess it's non-deterministic.

@mikaylagawarecki mikaylagawarecki added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module and removed triage review labels Feb 24, 2025
@xuhancn
Collaborator Author
xuhancn commented Mar 4, 2025

While debugging issue #147917, I found out how to include the PDB files in the wheel package: 85ed6ad

So I debugged this issue again with the PDB file. The call stack is below:

ModLoad: 00007ff8`820d0000 00007ff8`820e2000   C:\Users\Xuhan\.conda\envs\debug_cpu_illegal_instruction\DLLs\_asyncio.pyd
ModLoad: 00007ff8`89b20000 00007ff8`89b2e000   C:\Users\Xuhan\.conda\envs\debug_cpu_illegal_instruction\DLLs\_overlapped.pyd
ModLoad: 00007ff8`e4e50000 00007ff8`e4eba000   C:\WINDOWS\system32\mswsock.dll
(3eb4.3a4c): Illegal instruction - code c000001d (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\Xuhan\.conda\envs\debug_cpu_illegal_instruction\Lib\site-packages\torch\lib\torch_cpu.dll
torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x661ed:
00007ff8`509e952d 62e17c28106515  vmovups ymm20,ymmword ptr [rbp+2A0h] ss:00000087`4efec3f5=00
2:009> k
 # Child-SP          RetAddr               Call Site
00 00000087`4efec3a0 00007ff8`5094c4d0     torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x661ed
01 00000087`4efec850 00007ff8`509a09ce     torch_cpu!at::vec::AVX2::convert_to_int_of_same_size<c10::Half,short>+0x17f50
02 00000087`4efeca40 00007ff8`509b346d     torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x1d68e
03 00000087`4efecad0 00007ff8`508e8ee5     torch_cpu!at::vec::AVX2::interleave2<c10::BFloat16>+0x3012d
04 00000087`4efecc70 00007ff8`4cb72f7e     torch_cpu!at::vec::AVX2::int_elementwise_binary_256<unsigned char,std::divides<unsigned char> >+0xdd205
05 00000087`4efecd50 00007ff8`4d86e5c1     torch_cpu!at::native::grid_sampler_2d_cpu+0x1fe
06 00000087`4efece60 00007ff8`4d864e14     torch_cpu!at::cpu::where_outf+0xba1
07 00000087`4efeceb0 00007ff8`4d7544a1     torch_cpu!at::cpu::bucketize_outf+0xef14
08 00000087`4efecf00 00007ff8`4f1e9d0f     torch_cpu!at::_ops::grid_sampler_2d::redispatch+0xb1
09 00000087`4efecf70 00007ff8`4f269b4a     torch_cpu!std::optional<std::tuple<at::Tensor,at::Tensor,at::Tensor> >::~optional<std::tuple<at::Tensor,at::Tensor,at::Tensor> >+0x13bf
0a 00000087`4efed160 00007ff8`4f24ecc8     torch_cpu!c10::impl::BoxedKernelWrapper<at::Tensor __cdecl(at::Tensor const & __ptr64,at::Tensor const & __ptr64,std::optional<at::Tensor> const & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,bool & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::SymInt & __ptr64),void>::call+0x3074a
0b 00000087`4efed3d0 00007ff8`4d6e9925     torch_cpu!c10::impl::BoxedKernelWrapper<at::Tensor __cdecl(at::Tensor const & __ptr64,at::Tensor const & __ptr64,std::optional<at::Tensor> const & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,bool & __ptr64,c10::ArrayRef<c10::SymInt> & __ptr64,c10::SymInt & __ptr64),void>::call+0x158c8
0c 00000087`4efed430 00007ff8`45b87ca7     torch_cpu!at::_ops::grid_sampler_2d::call+0x1d5
0d 00000087`4efed540 00007ff8`45be7d25     torch_python!THPPointer<_frame>::release+0xcdeb7
0e 00000087`4efed5a0 00007ff8`61bfa92c     torch_python!THPPointer<_frame>::release+0x12df35
0f 00000087`4efed7e0 00007ff8`61bb003f     python312!cfunction_call+0x5c [\objects\methodobject.c @ 540] 
10 00000087`4efed810 00007ff8`61bb0235     python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240] 
11 00000087`4efed870 00007ff8`61cc545d     python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327] 
12 00000087`4efed8b0 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711] 
13 00000087`4efeda70 00007ff8`61bb2d1b     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
14 (Inline Function) --------`--------     python312!_PyObject_VectorcallTstate+0x2f [\include\internal\pycore_call.h @ 92] 
15 00000087`4efedab0 00007ff8`61bb018e     python312!method_vectorcall+0x11b [\objects\classobject.c @ 69] 
16 00000087`4efedb60 00007ff8`61bb02c6     python312!_PyVectorcall_Call+0xce [\objects\call.c @ 271] 
17 00000087`4efedba0 00007ff8`61cc5768     python312!_PyObject_Call+0x46 [\objects\call.c @ 373] 
18 00000087`4efedbe0 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x6738 [\pcbuild\python\bytecodes.c @ 3259] 
19 00000087`4efedda0 00007ff8`61bafd31     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
1a 00000087`4efedde0 00007ff8`61bb0752     python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146] 
1b 00000087`4efede30 00007ff8`61c25d55     python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508] 
1c 00000087`4efeded0 00007ff8`61bb003f     python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764] 
1d 00000087`4efedf30 00007ff8`61bb0235     python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240] 
1e 00000087`4efedf90 00007ff8`61cc545d     python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327] 
1f 00000087`4efedfd0 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711] 
20 00000087`4efee190 00007ff8`61bafd31     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
21 00000087`4efee1d0 00007ff8`61bb0752     python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146] 
22 00000087`4efee220 00007ff8`61c25d55     python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508] 
23 00000087`4efee2c0 00007ff8`61bb0356     python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764] 
24 00000087`4efee320 00007ff8`61cc5768     python312!_PyObject_Call+0xd6 [\objects\call.c @ 369] 
25 00000087`4efee360 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x6738 [\pcbuild\python\bytecodes.c @ 3259] 
26 00000087`4efee520 00007ff8`61bafd31     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
27 00000087`4efee560 00007ff8`61bb0752     python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146] 
28 00000087`4efee5b0 00007ff8`61c25d55     python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508] 
29 00000087`4efee650 00007ff8`61bb003f     python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764] 
2a 00000087`4efee6b0 00007ff8`61bb0235     python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240] 
2b 00000087`4efee710 00007ff8`61cc545d     python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327] 
2c 00000087`4efee750 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711] 
2d 00000087`4efee910 00007ff8`61bafd31     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
2e 00000087`4efee950 00007ff8`61bb0752     python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146] 
2f 00000087`4efee9a0 00007ff8`61c25d55     python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508] 
30 00000087`4efeea40 00007ff8`61bb003f     python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764] 
31 00000087`4efeeaa0 00007ff8`61bb0235     python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240] 
32 00000087`4efeeb00 00007ff8`61cc545d     python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327] 
33 00000087`4efeeb40 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711] 
34 00000087`4efeed00 00007ff8`61bafd31     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
35 00000087`4efeed40 00007ff8`61bb0752     python312!_PyObject_FastCallDictTstate+0xb1 [\objects\call.c @ 146] 
36 00000087`4efeed90 00007ff8`61c25d55     python312!_PyObject_Call_Prepend+0xa2 [\objects\call.c @ 508] 
37 00000087`4efeee30 00007ff8`61bb003f     python312!slot_tp_call+0xf5 [\objects\typeobject.c @ 8764] 
38 00000087`4efeee90 00007ff8`61bb0235     python312!_PyObject_MakeTpCall+0x13f [\objects\call.c @ 240] 
39 00000087`4efeeef0 00007ff8`61cc545d     python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327] 
3a 00000087`4efeef30 00007ff8`61cbee66     python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711] 
3b (Inline Function) --------`--------     python312!_PyEval_EvalFrame+0x1b [\include\internal\pycore_ceval.h @ 88] 
3c (Inline Function) --------`--------     python312!_PyEval_Vector+0x55 [\python\ceval.c @ 1675] 
3d 00000087`4efef0f0 00007ff8`61cba816     python312!PyEval_EvalCode+0xe6 [\python\ceval.c @ 570] 
3e 00000087`4efef170 00007ff8`61cb7e6e     python312!builtin_exec_impl+0x256 [\python\bltinmodule.c @ 1096] 
3f 00000087`4efef200 00007ff8`61bfa5f1     python312!builtin_exec+0xce [\python\clinic\bltinmodule.c.h @ 586] 
40 00000087`4efef2b0 00007ff8`61bafae9     python312!cfunction_vectorcall_FASTCALL_KEYWORDS+0x81 [\objects\methodobject.c @ 439] 
41 00000087`4efef2e0 00007ff8`61bb0235     python312!_PyObject_VectorcallTstate+0x49 [\include\internal\pycore_call.h @ 93] 
42 00000087`4efef320 00007ff8`61cc545d     python312!PyObject_Vectorcall+0x35 [\objects\call.c @ 327] 
43 00000087`4efef360 00007ff8`61bb0494     python312!_PyEval_EvalFrameDefault+0x642d [\pcbuild\python\bytecodes.c @ 2711] 
44 00000087`4efef520 00007ff8`61bb018e     python312!_PyFunction_Vectorcall+0x54 [\objects\call.c @ 424] 
45 00000087`4efef560 00007ff8`61bb02c6     python312!_PyVectorcall_Call+0xce [\objects\call.c @ 271] 
46 00000087`4efef5a0 00007ff8`61b2fa5f     python312!_PyObject_Call+0x46 [\objects\call.c @ 373] 
47 (Inline Function) --------`--------     python312!PyObject_Call+0x41 [\objects\call.c @ 379] 
48 00000087`4efef5e0 00007ff8`61b307e1     python312!pymain_run_module+0x2cf [\modules\main.c @ 301] 
49 00000087`4efef630 00007ff8`61b30cf8     python312!pymain_run_python+0x411 [\modules\main.c @ 608] 
4a 00000087`4efef6c0 00007ff8`61b30d82     python312!Py_RunMain+0x18 [\modules\main.c @ 691] 
4b (Inline Function) --------`--------     python312!pymain_main+0x4b [\modules\main.c @ 719] 
4c 00000087`4efef6f0 00007ff6`94aa1494     python312!Py_Main+0x52 [\modules\main.c @ 732] 
4d (Inline Function) --------`--------     python!invoke_main+0x22 [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90] 
4e 00000087`4efef760 00007ff8`e87ae8d7     python!__scrt_common_main_seh+0x10c [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
4f 00000087`4efef7a0 00007ff8`e98fbf2c     KERNEL32!BaseThreadInitThunk+0x17
50 00000087`4efef7d0 00000000`00000000     ntdll!RtlUserThreadStart+0x2c

Because of code optimization, the call stack does not carry enough information for debugging.
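
The faulting instruction in the trace can be decoded by hand to confirm that it needs AVX-512 hardware. The sketch below assumes the standard EVEX layout from the Intel SDM (a 0x62 prefix followed by three payload bytes); `decode_evex_reg` is a hypothetical helper that only handles the register-plus-disp8 form seen here (ModRM.mod == 01, no SIB byte).

```python
def decode_evex_reg(code: bytes) -> dict:
    """Decode a few EVEX fields from one instruction (sketch; disp8 form only)."""
    if code[0] != 0x62:
        raise ValueError("not EVEX-encoded (first byte is not 0x62)")
    p0, p2, modrm = code[1], code[3], code[5]
    # EVEX stores R and R' inverted; together with ModRM.reg they form a
    # 5-bit register index, which is how registers 16-31 become addressable.
    r = (~p0 >> 7) & 1
    r_hi = (~p0 >> 4) & 1
    reg = (r_hi << 4) | (r << 3) | ((modrm >> 3) & 0b111)
    # L'L selects the vector length: 0 -> 128, 1 -> 256, 2 -> 512 bits.
    vector_bits = 128 << ((p2 >> 5) & 0b11)
    # disp8 is signed and scaled by the vector width in bytes for full-vector loads.
    disp8 = int.from_bytes(code[6:7], "little", signed=True)
    return {"reg": reg, "vector_bits": vector_bits, "disp": disp8 * (vector_bits // 8)}


# Bytes of the faulting instruction, copied from the debugger output.
fields = decode_evex_reg(bytes.fromhex("62e17c28106515"))
print(fields)
```

For these bytes the result is register 20 at 256-bit width (ymm20) with displacement 0x2A0, matching the disassembly. Registers 16 to 31 exist only with AVX-512, so the instruction cannot execute on an AVX2-only CPU even though the debugger attributes it to an `at::vec::AVX2` symbol.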

@xuhancn
Collaborator Author
xuhancn commented Apr 16, 2025

VS2019 code was cleaned up: #145863

@Blackhex
Collaborator
Blackhex commented May 5, 2025

Some initial observations:

  • Generation of AVX512 instructions is controlled by the /arch:AVX512 compiler flag, which seems to be added to the compile command when the CMake scripts detect that it is supported on the build machine.
  • The wheel build machine seems to support AVX512, as the CMake scripts report Performing Test COMPILER_SUPPORTS_AVX512 - Success.
  • Newer Intel CPUs (Alder Lake and Raptor Lake) either do not support AVX512 instructions at all (Efficiency cores) or disable them for compatibility reasons (Performance cores) because they consume too much energy for the benefit.
  • VS2019 added support for AVX512 in version 16.3 on February 2022 (https://devblogs.microsoft.com/cppblog/avx-512-auto-vectorization-in-msvc/). This alone does not explain why the VS2019 and VS2022 builds behave differently, so there might also be an issue in the CMake scripts responsible for the AVX512 support detection.
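
One way to check the first observation is to list which translation units are actually compiled with `/arch:AVX512`. The sketch below reads the `compile_commands.json` database that CMake can emit for Ninja and Makefile generators when `CMAKE_EXPORT_COMPILE_COMMANDS` is ON. The default path and both helper names are assumptions for illustration, not part of the PyTorch build scripts.

```python
import json


def files_with_flag(compile_commands: list, flag: str = "/arch:AVX512") -> list:
    """Return source files whose compile command contains `flag` (case-insensitive)."""
    hits = []
    for entry in compile_commands:
        # Entries carry either a single "command" string or an "arguments" list.
        command = entry.get("command") or " ".join(entry.get("arguments", []))
        if flag.lower() in command.lower():
            hits.append(entry["file"])
    return hits


def load_compile_commands(path: str = "build/compile_commands.json") -> list:
    """Load the compilation database; the default path is an assumption."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

If the flag is confined to the AVX512-specific kernel sources, the flag itself is unlikely to explain AVX512 code appearing in AVX2 kernels.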

@xuhancn
Collaborator Author
xuhancn commented May 6, 2025

Hi @Blackhex,

I tried to reproduce it locally with the latest VS2022 installed through the GUI, and the issue did not occur. So I replaced the VS2022 download URL with the latest version, and the issue still occurred:

$VS_DOWNLOAD_LINK = "https://download.visualstudio.microsoft.com/download/pr/8f480125-28b8-4a2c-847c-c2b02a8cdd1b/64be21d4ada005d7d07896ed0b004c322409bd04d6e8eba4c03c9fa39c928e7a/vs_BuildTools.exe"

Based on the steps above, I see two possibilities:

  1. The VS2022 command-line installation is the cause:
    $VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
    "--add Microsoft.Component.MSBuild",
    "--add Microsoft.VisualStudio.Component.Roslyn.Compiler",
    "--add Microsoft.VisualStudio.Component.TextTemplating",
    "--add Microsoft.VisualStudio.Component.VC.CoreIde",
    "--add Microsoft.VisualStudio.Component.VC.Redist.14.Latest",
    "--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Core",
    "--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
    "--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")
  2. Should we try setting up another build machine?

@Blackhex
Collaborator
Blackhex commented May 6, 2025

Hello @xuhancn.

Thank you for this important observation.

I can confirm that building the repro case with VS2022 17.13.6 does not reproduce the issue on a 13th gen CPU, while it does with VS2022 17.8.18, which is the version that nightly currently uses. I was not able to reproduce it with either of those versions on an 11th gen CPU.

@xuhancn
Collaborator Author
xuhancn commented May 6, 2025

Could you please also validate this case: #152385
BTW, 11th gen Intel CPUs are the only client CPUs that support AVX512, so we can't validate this issue on them.

Image

@Blackhex
Collaborator
Blackhex commented May 7, 2025

I can confirm that the #152385 repro case fails with Build Tools 17.8.18 while it works with 17.13.6 on a 13th gen CPU. I will bisect which version fixed the issue to get more insight. As this seems to be a compiler issue that has already been fixed, it would help if anyone had a narrow C++ repro case. Until we fully understand the root cause, updating the Build Tools seems to be the best fix/workaround.

@xuhancn
Collaborator Author
xuhancn commented May 7, 2025

@Blackhex Good to hear that VS2022 17.13.6 fixes the issue. I also checked the VS2022 release notes: https://learn.microsoft.com/en-us/visualstudio/releases/2022/release-notes
VS2022 17.13.6 was released on Feb 11, 2025. I tried upgrading the compiler in Dec 2024, and VS2022 had not been fixed at that time.
Could you please submit a PR to upgrade the VS2022 compiler on the main branch? @atalman can help cherry-pick it to fix the release/2.7 branch.
CC: @malfet

@Blackhex
Collaborator
Blackhex commented May 8, 2025

JFYI, the latest Build Tools version that has the issue is 17.8.20. Versions 17.9.0-17.9.3 failed to compile with ICE and the earliest working version is 17.9.4.
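
The bisection result can be written down as a small lookup, which also shows why versions should be compared numerically rather than as strings ("17.13.6" sorts before "17.9.4" as text). This sketch encodes only the observations reported in this thread and assumes behavior is monotonic between the tested versions, as a bisection does. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn '17.9.4' into (17, 9, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def classify_build_tools(version: str) -> str:
    """Classify a VS2022 Build Tools version using only the results reported here."""
    v = parse_version(version)
    if v <= (17, 8, 20):
        return "affected"        # 17.8.20 is the latest version seen with the issue
    if (17, 9, 0) <= v <= (17, 9, 3):
        return "ICE"             # reported to fail with an internal compiler error
    if v >= (17, 9, 4):
        return "not reproduced"  # 17.9.4 is the earliest version reported working
    return "unknown"             # e.g. a 17.8.x release newer than 17.8.20


for candidate in ("17.8.18", "17.9.2", "17.13.6"):
    print(candidate, classify_build_tools(candidate))
```

The classification applies to the compiler that actually produced the binary, which may differ from the installer version that was requested.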

@xuhancn
Collaborator Author
xuhancn commented May 10, 2025

Updated status on May 10th:

  1. Following the comment PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU #145702 (comment), I created a PR to upgrade VS2022 to v17.13.6: [don't merge] upgrade vs2022 to v17.13.6 #153322.
Image

I manually launched it and checked the version info.

  2. I downloaded the official binary build from: https://github.com/pytorch/pytorch/actions/runs/14945315292?pr=153322
  3. The issue still exists; please check the snapshot below.
Image
  4. I also dumped the compiler info with an open source tool: https://github.com/dishather/richprint
Image The compiler info shows `VS2022 v17.8.0`.

This is the same behavior I saw when I tried in Dec 2024.

@Blackhex, how do we upgrade VS2022 to a new version correctly? Could the installation command be causing the issue, as I mentioned in #145702 (comment)?
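
The compiler information dumped above comes from the "Rich" header that the Microsoft linker embeds near the start of a PE image. This is a rough sketch of how such a header can be read, assuming the layout described in public reverse-engineering write-ups (Microsoft does not document it): a "DanS" marker and three padding dwords, then (comp.id, count) pairs, all XOR-masked with the key that follows the literal "Rich". `parse_rich_header` is a hypothetical helper, not the richprint implementation.

```python
import struct


def parse_rich_header(data: bytes) -> list:
    """Return (product_id, build, count) tuples from a PE image's Rich header."""
    end = data.find(b"Rich")
    if end < 0:
        return []
    (key,) = struct.unpack_from("<I", data, end + 4)
    dans = struct.unpack("<I", b"DanS")[0] ^ key
    # Walk backwards in 4-byte steps until the XOR-masked "DanS" marker.
    start = end - 4
    while start >= 0 and struct.unpack_from("<I", data, start)[0] != dans:
        start -= 4
    if start < 0:
        return []
    entries = []
    # Skip the marker and three padding dwords, then read (comp.id, count) pairs.
    for off in range(start + 16, end, 8):
        comp_id, count = (v ^ key for v in struct.unpack_from("<II", data, off))
        # High word: product (tool) ID; low word: toolchain build number.
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return entries
```

Feeding it the first few kilobytes of `torch_cpu.dll` should list the build numbers of the tools that produced the objects. Mapping a build number to a Visual Studio release still needs a lookup table, such as the one richprint ships.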

@Blackhex
Collaborator
Blackhex commented May 10, 2025

@xuhancn The toolchain for the PyTorch build is selected by loading the development shell, either from:

"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64

for Build Tools, or:

"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

for full Visual Studio for local builds. The exact path may be different based on the product you've installed.

It has been quite a long time since I worked on updating VS2019 to VS2022, so I don't remember the details. I'd guess that the https://github.com/pytorch/pytorch/pull/153322/files#diff-3bbee23c5ce8d35d620708f0a6107b0fd0fb55fc552528aecc22bf49d24925f7 script is not used for the CI build at all. I can refer you to my old PR that did the update, pytorch/test-infra#1175, but I am not sure how relevant it is nowadays.
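
To see which toolchain a development shell really selected, one option is to run `cl` with no arguments after loading `vcvarsall.bat` and map the version banner to a Visual Studio release. The sketch below assumes the usual banner wording and the convention that MSVC 19.30 corresponds to VS2022 17.0, 19.31 to 17.1, and so on up to 19.44 for 17.14. The helper name is hypothetical.

```python
import re

BANNER_RE = re.compile(r"Compiler Version (\d+)\.(\d+)\.(\d+)")


def vs2022_release_from_banner(banner: str):
    """Map a cl.exe banner to a VS2022 release such as '17.8', or None."""
    match = BANNER_RE.search(banner)
    if not match:
        return None
    major, minor, _build = (int(g) for g in match.groups())
    # Assumed mapping: MSVC 19.30 shipped with VS2022 17.0, 19.31 with 17.1, ...
    if major == 19 and 30 <= minor <= 44:
        return f"17.{minor - 30}"
    return None
```

Logging this value at the start of a CI build would show directly whether an installer upgrade changed the compiler that the build picks up.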

@xuhancn
Collaborator Author
xuhancn commented May 10, 2025

I checked the build log: https://github.com/pytorch/pytorch/actions/runs/14945315292/job/41987884812?pr=153322

Image

It called my updated script.

@xuhancn
Collaborator Author
xuhancn commented May 10, 2025

I added back the VS2019 installation script and found that the installers are downloaded from different places.
For VS2019, it is downloaded from AWS storage:

$VS_DOWNLOAD_LINK = "https://ossci-windows.s3.us-east-1.amazonaws.com/vs16.8.6_BuildTools.exe"

For VS2022, it is downloaded from the official Microsoft link:
$VS_DOWNLOAD_LINK = "https://download.visualstudio.microsoft.com/download/pr/8f480125-28b8-4a2c-847c-c2b02a8cdd1b/64be21d4ada005d7d07896ed0b004c322409bd04d6e8eba4c03c9fa39c928e7a/vs_BuildTools.exe"

So, how can we install VS2022 correctly?

@Blackhex
Collaborator
Blackhex commented May 12, 2025

IIRC, infrastructure engineers placed the Build Tools installer on AWS storage because there were occasional download failures from the official source.

@Blackhex
Collaborator
Blackhex commented May 12, 2025

I've checked the relevant MSVC issues and can offer an unverified hypothesis about the root cause, along with another possible workaround:

  • The situation might be caused by certain AVX512 intrinsics like _mm256_extract_epi32 being optimized with other AVX512 instructions, which are not available on 12th+ gen CPUs, as the compiler expects that such intrinsics are used only when targeting AVX512. See https://developercommunity.visualstudio.com/t/Invalid-code-gen-when-using-AVX2-and-SSE/10527298 for reference.
  • The behavior of the optimization changed between versions, which explains why it is not an issue for the newer Build Tools. Nevertheless, all versions of the compiler work "as expected", as they assume those intrinsics should not be used for non-AVX512-targeting code.
  • As the source of the issue lies in code-generated objects, I could not yet identify which particular usage of an intrinsic is causing it.
  • I partially confirmed that this might be an instance of the above by applying the /d2implyavx512upperregs- compiler flag, which disables the mentioned optimization and is the suggested workaround.
  • One can build a PyTorch that does not fail on the repro cases by adding the following lines to the topmost CMakeLists.txt:
    string(APPEND CMAKE_C_FLAGS " /d2implyavx512upperregs-")
    string(APPEND CMAKE_CXX_FLAGS " /d2implyavx512upperregs-")
    
  • If confirmed, this would mean that the proper fix is to find the usages of such intrinsics and use them only when targeting AVX512.
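
To narrow down where such code ends up, one heuristic is to scan a textual disassembly (for example the output of `dumpbin /disasm`) for registers that only exist with AVX-512. This is an illustrative sketch with a hypothetical helper name. It would miss EVEX-encoded instructions that happen to use only the low registers, so it can confirm contamination but not rule it out.

```python
import re

# Registers that only exist with AVX-512: zmm0-31, xmm/ymm16-31, mask regs k0-k7.
AVX512_ONLY = re.compile(
    r"\b(?:zmm\d+|[xy]mm(?:1[6-9]|2\d|3[01])|k[0-7])\b", re.IGNORECASE
)


def avx512_only_lines(disassembly: str) -> list:
    """Return disassembly lines that reference AVX-512-only registers."""
    return [line for line in disassembly.splitlines() if AVX512_ONLY.search(line)]


sample = "\n".join([
    "vmovups ymm20,ymmword ptr [rbp+2A0h]",  # the faulting instruction from the trace
    "vmovups ymm2,ymmword ptr [rax]",        # plain AVX/AVX2 register, not flagged
])
print(avx512_only_lines(sample))
```

Run over the disassembly of the AVX2 kernels, before and after adding the workaround flag, this would show whether the upper-register usage disappears.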

@kalaskarsanket

@malfet any thoughts on @Blackhex 's comment ?

@xuhancn
Collaborator Author
xuhancn commented May 15, 2025

Hi @kalaskarsanket , @Blackhex

It works: #153480

@Blackhex
Collaborator

Awesome. Thank you for handling this.
