Description
🐛 Describe the bug
Background
The Intel team found that the PyTorch Windows XPU nightly binary raises an illegal-instruction error on CPUs whose maximum ISA level is AVX2. The original issue is intel/torch-xpu-ops#1173.
Steps to reproduce:
Install the PyTorch Windows XPU binary and run it on an Intel client CPU whose maximum ISA level is AVX2.
For example, using the 2024-12-11 nightly build:
python -m pip install https://download.pytorch.org/whl/nightly/xpu/torch-2.6.0.dev20241211%2Bxpu-cp39-cp39-win_amd64.whl
Reproducer:
import torch

class TestClass:
    def test_grid_sampler_2d(self):
        torch.manual_seed(0)
        b = torch.rand(2, 13, 10, 2, dtype=torch.float64)
        a = torch.rand(2, 3, 5, 20, dtype=torch.float64)
        torch.grid_sampler_2d(a, b, interpolation_mode=0, padding_mode=0, align_corners=False)

Running this test crashes the process with an illegal-instruction error.
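When the reproducer is run from a harness, the crash shows up only as an abnormal exit status, so it can help to classify that status explicitly. The following is a minimal sketch, not part of the original report: the helper names `is_illegal_instruction` and `run_reproducer` are hypothetical, and it assumes the crash surfaces as the Windows NTSTATUS code `0xC000001D` (STATUS_ILLEGAL_INSTRUCTION) or, on POSIX systems, as termination by `SIGILL`.

```python
import signal
import subprocess
import sys

# NTSTATUS value Windows reports when a process executes an unsupported instruction.
STATUS_ILLEGAL_INSTRUCTION = 0xC000001D


def is_illegal_instruction(returncode: int) -> bool:
    """Return True if a child's exit status indicates an illegal instruction."""
    # POSIX: subprocess reports termination by signal N as -N.
    if returncode == -int(signal.SIGILL):
        return True
    # Windows: the exit code is the NTSTATUS value. Mask to 32 bits in case
    # it was surfaced as a signed integer.
    return (returncode & 0xFFFFFFFF) == STATUS_ILLEGAL_INSTRUCTION


def run_reproducer(script_path: str) -> bool:
    """Run a script in a child interpreter; True if it died with an illegal instruction."""
    result = subprocess.run([sys.executable, script_path])
    return is_illegal_instruction(result.returncode)
```

Running the reproducer in a child process keeps the harness alive after the crash, which makes it easier to compare several wheels on the same machine.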
Debug notes:
- The Intel team built the PyTorch Windows XPU binary locally, but could not reproduce the issue with that build.
- The Intel team debugged the official binary with WinDbg.
WinDbg caught the fault: the binary contains a compiler-generated AVX512 instruction, which raises an illegal-instruction exception on a CPU whose maximum ISA level is AVX2.
We could not trace the fault to source level because we do not have the debug symbol (.pdb) files. PyTorch currently has problems generating .pdb files.
- We switched the PyTorch Windows CPU-only build to VS2022: [don't merge] use vs2022 build windows cpu wheel. #143791
We tested the PyTorch Windows CPU-only binary built by that PR. The issue reproduces with it.
Conclusion:
- The issue occurs only with binaries produced by the official PyTorch build system, and only when they are built with Visual Studio 2022.
- The illegal instruction is an AVX512 instruction that the compiler generated in code targeting the AVX2 ISA.
- Because of the previous item, the issue occurs only on CPUs whose maximum ISA level is AVX2.
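One way to probe the conclusion above on an affected machine is to rerun the reproducer with ATen's CPU dispatch forced to a lower level through the `ATEN_CPU_CAPABILITY` environment variable. This is a hedged sketch rather than a step from the original investigation: the helper `run_with_cpu_capability` and the script name `repro.py` are hypothetical, and it assumes the faulting code lives in a kernel selected by ATen's runtime CPU dispatch, which the report does not confirm.

```python
import os
import subprocess
import sys


def run_with_cpu_capability(args, capability):
    """Run a child Python process with ATEN_CPU_CAPABILITY set; return its exit code."""
    env = dict(os.environ)
    # "default" asks ATen for its baseline kernels; "avx2" asks for the AVX2 variants.
    env["ATEN_CPU_CAPABILITY"] = capability
    return subprocess.run([sys.executable, *args], env=env).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python this_file.py repro.py  (repro.py is a hypothetical script name)
    for level in ("default", "avx2"):
        print(level, run_with_cpu_capability(sys.argv[1:], level))
```

If the crash disappears with `default` but persists with `avx2`, that would be consistent with the stray AVX512 instruction sitting in an AVX2-compiled kernel variant. It would not by itself identify the source file.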
Solution
Option 1: Fix the official PyTorch build system. This is required if we want to switch the PyTorch Windows CPU build to VS2022 in the future.
Because we can't reproduce the issue locally, we suggest involving the Microsoft PyTorch team or the Microsoft Visual Studio team. The PR that reproduces the issue is #143791.
Option 2: The Intel PyTorch team downgrades the PyTorch Windows XPU build to VS2019.
Versions
NA
cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10