Python 3.10 + intel-openmp failed to use numactl after import torch._C #136307
Comments
Not sure I understand what is the problem/ask here... |
The problem is that when we use the torch launcher for CPU tests, it relies on numactl for core binding. We found that on Python 3.10 the numactl core binding does not take effect. Digging deeper, we found that it is caused by
This case occurs when the following conditions are satisfied:
|
Hi, @malfet. Something seems wrong with
How to reproduce:
Without specifying the Intel OpenMP shared library:
When specifying the Intel OpenMP shared library:
I don't know why specifying the Intel OpenMP shared library causes a segmentation fault. |
Here is the latest finding: after setting the KMP_AFFINITY parameter, numactl seems to recognize only one CPU core.
Prepare the environment:
The reproduction shell script:
The test results are as follows: after setting the KMP_AFFINITY parameter, numactl can bind only to core 0. It seems that numactl can recognize only one CPU core after setting
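For anyone who wants to observe the binding from inside the process, here is a minimal sketch (not the script used above) that records the CPU affinity before and after importing a module. It assumes Linux, where `os.sched_getaffinity` is available; the helper names `current_affinity` and `affinity_around_import` are hypothetical and only for illustration.

```python
import importlib
import os


def current_affinity():
    """Return the sorted CPU ids this process may run on, or None if unsupported."""
    if not hasattr(os, "sched_getaffinity"):
        return None  # os.sched_getaffinity is not available on this platform
    return sorted(os.sched_getaffinity(0))


def affinity_around_import(module_name):
    """Record the CPU affinity before and after importing `module_name`."""
    before = current_affinity()
    try:
        importlib.import_module(module_name)
        imported = True
    except ImportError:
        imported = False
    after = current_affinity()
    return {"module": module_name, "imported": imported,
            "before": before, "after": after}


# Hypothetical usage, launched under numactl so that a binding is in place:
#   numactl -C 0-3 python -c "import check; print(check.affinity_around_import('torch'))"
# If "after" is smaller than "before", the import changed the affinity mask.
```

If the import shrinks the mask, the `before` and `after` lists will differ; comparing runs with and without KMP_AFFINITY set may help narrow down which setting triggers the change.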
|
Update on the latest findings, along with some debugging progress: I tried using the compilation option |
I think it is expected that linking/dlopening multiple lib*omp implementations will cause problems. Last I checked, pytorch's MKL DNN component links to gomp regardless of whether you have it on the machine, so I suspect you'd also need to disable MKL DNN in the custom build. |
The flavor of OpenMP MKL DNN uses is defined by the compiler. GCC builds will always link to |
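One way to check whether more than one OpenMP runtime has been loaded into a process is to scan its memory map. The following is a rough sketch under a few assumptions: Linux with `/proc/<pid>/maps`, and runtimes named with the usual prefixes `libgomp` (GNU), `libiomp5` (Intel), and `libomp` (LLVM). The function names are hypothetical.

```python
import os
import re

# Assumed naming: libgomp*.so*, libiomp5*.so*, libomp*.so* (e.g. libgomp-<hash>.so.1).
_OMP_NAME = re.compile(r"^(libgomp|libiomp5|libomp)([.-][^/]*)?\.so")


def openmp_libs_in_maps(maps_text):
    """Return the set of OpenMP library paths found in a /proc/<pid>/maps dump."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname is optional).
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping without a path
        path = parts[5].strip()
        if _OMP_NAME.match(os.path.basename(path)):
            found.add(path)
    return found


def loaded_openmp_libs(pid="self"):
    """Read /proc/<pid>/maps and report OpenMP runtimes; empty set if unreadable."""
    try:
        with open(f"/proc/{pid}/maps") as fh:
            return openmp_libs_in_maps(fh.read())
    except OSError:
        return set()
```

Calling `loaded_openmp_libs()` after `import torch` and seeing more than one distinct runtime in the result would be consistent with the mixed-runtime situation described above.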
This seg fault issue can be fixed by the next Intel OpenMP release (ETA end of March). |
This issue is caused by intel-openmp and is not related to PyTorch. It can be reproduced in 3 steps:
Evidently, MKL in PyTorch calls some OpenMP functions during initialization. |
This seg fault issue should be fixed by the latest intel-openmp, 2025.1.0, which has already been released. @LifengWang, could you check whether using intel-openmp==2025.1.0 fixes this seg fault error? |
Hi, @chunyuan-w. I have verified that using intel-openmp==2025.1.0 with the PyTorch 0427 nightly wheels resolves the segmentation fault issue. |
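To confirm which intel-openmp version is active in an environment, a small check along these lines may help. It is only a sketch: it assumes the package was installed with pip under the distribution name `intel-openmp` (a conda install may not expose the same metadata), and the helper names are hypothetical.

```python
from importlib import metadata


def parse_release(version):
    """Turn '2025.1.0' into (2025, 1, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad so '2025.1' compares equal to '2025.1.0'
    return tuple(parts)


def has_fixed_intel_openmp(minimum="2025.1.0"):
    """Return True/False if intel-openmp is installed, or None if it is not found."""
    try:
        installed = metadata.version("intel-openmp")
    except metadata.PackageNotFoundError:
        return None
    return parse_release(installed) >= parse_release(minimum)
```

A result of `None` only means pip metadata for `intel-openmp` was not found; it does not prove that no Intel OpenMP runtime is present on the machine.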
This bug can be fixed by the Intel OpenMP in OneAPI 2025.2 (ETA end of June). |
🐛 Describe the bug
Insert debug code in torch.__init__.py
How to reproduce:
Print:
Versions
Python 3.10
Intel-openmp: 2024
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @frank-wei