Pytorch 1.5.0 (installed from conda) errors with complaints about incompatibility between MKL and libgomp when using Pytorch's multiprocessing #37377
Comments
What happens if you follow the advice suggested in the error message?
It's kind of interesting; I set the variable the error message suggests, but the message continues to be printed to stderr. I have a job that's been running for about 2 hours on 10x Titan V's and my stderr log file has about 2000 repetitions of this error message.
I also get the same error. I tried @mawright's method, but the training log still prints the message.
Got the same error using the following docker image:
Using 4x RTX 2080ti and an AMD Threadripper (have not tested on Intel). There is no difference between using fp16 or fp32.
Same issue with me. Distributed training breaks with this error, and forcing MKL_SERVICE_FORCE_INTEL doesn't stop it. PyTorch version: 1.6.0a0+bc09478 (compiled from source)
Same issue. PyTorch 1.5 / fairseq 0.9 / 2x P40. The error message keeps printing. Has anyone solved this?
Setting
I searched for this piece of code (IntelPython) and I found this:

```c
else if(strcasecmp(mtlayer, "intel") == 0) { /* Intel runtime is requested */
    if(omp && !iomp) {
        fprintf(stderr, "Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with %s library."
                "\n\tTry to import numpy first or set the threading layer accordingly. "
                "Set MKL_SERVICE_FORCE_INTEL to force it.\n", omp_name);
        if(!getenv("MKL_SERVICE_FORCE_INTEL"))
            exit(1);
    } else
        PRELOAD(libiomp);
}
```

The error message is printed either way, with or without MKL_SERVICE_FORCE_INTEL set; the variable only suppresses the exit(1), not the fprintf.
Can someone please try to set MKL_THREADING_LAYER=GNU?
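A minimal sketch of how one might try that (the script layout and worker body are just illustrative, not from this thread): set the variable before anything imports mkl/numpy/torch, so the spawned child inherits it, which is equivalent to exporting it in the shell before launching training.

```python
import os
# Must run before any MKL-related import so the spawned worker inherits it.
os.environ["MKL_THREADING_LAYER"] = "GNU"

import torch
from torch import multiprocessing as mp

def worker():
    # The child imports torch and touches its threading config, which is
    # where the error is reported to show up in this thread.
    torch.set_num_threads(1)
    print("child ok")

if __name__ == "__main__":
    mp.set_start_method("spawn")
    p = mp.Process(target=worker)
    p.start()
    p.join()
```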
You can get rid of this error and the error message by setting MKL_THREADING_LAYER=GNU. I'm using a Threadripper too, so together with Tetratrio's report this might be an AMD thing. Grepping conda manifests, libgomp is pulled in by libgcc-ng.

```python
import os

def print_layer(prefix):
    print(f'{prefix}: {os.environ.get("MKL_THREADING_LAYER")}')

if __name__ == '__main__':
    print_layer('Pre-import')
    import numpy as np
    from torch import multiprocessing as mp
    print_layer('Post-import')

    mp.set_start_method('spawn')
    p = mp.Process(target=print_layer, args=('Child',))
    p.start()
    p.join()
```

See: if torch is imported before numpy, then the child process here gets a GNU threading layer (even though the parent doesn't have the variable defined). But if the imports are swapped so numpy is imported before torch, the child process gets an INTEL threading layer. So I suspect numpy - or one of its imports - is messing with the MKL_THREADING_LAYER environment variable.
Realised I could just grep through site-packages for MKL_THREADING_LAYER, and it turns out the culprit is the mkl package.

Minimal example:

```python
def child():
    import torch
    torch.set_num_threads(1)

if __name__ == '__main__':
    import mkl
    from torch import multiprocessing as mp

    mp.set_start_method('spawn')
    p = mp.Process(target=child)
    p.start()
    p.join()
```

If you switch the order of the imports, you get a GNU threading layer in the child and everything is fine.
Could the solution be as simple as putting the MKL_THREADING_LAYER setting into torch itself?
I think which one you want is specific to the build and distribution mechanism you're using, so you shouldn't hardcode it in this repo.
What is "everything"? By default NumPy doesn't contain any OpenMP code; Anaconda shipped a patched versions till recently (and (IntelPython probably still does) with patches to use MKL routines, and I believe Intel's OpenMP implementation. Having NumPy pull in |
Pardon me, I should've been clearer. Grep'ing through my conda-meta, libgomp turns up in exactly one place: libgcc-ng. I'll admit I'm confused as to why libgomp is in there. The first result from googling "libgomp" "libgcc-ng" is this discussion on conda-forge - is that the 'until recently'? - but frankly I'm a bit out of my depth at this point.
I think the first order of business is to file an issue against mkl complaining that they are setting an environment variable (libraries should NEVER EVER do this). Has anyone done this yet?
Agreed, done.
@rgommers confirming that updating mkl fixed this for me.
Thanks for confirming @protonish!
This has happened by now, so closing.
Going to go ahead and remove this from the 1.10.1 milestone as well, since we've already confirmed this has a workaround that requires no underlying changes.
Will TBB make this work?
I was having a similar issue while doing distributed training using accelerate. I added this to my notebook cells and it worked: `%env MKL_SERVICE_FORCE_INTEL=1`
Fixes error introduced by 995edef when running unit tests using `run_test.py`:

```
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
	Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
```

Refer to the solution suggested in: pytorch#37377 (comment)

Need to cherry-pick this change to release/2.3 and release/2.4 branches as well.
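The actual diff isn't shown here, but based on the workaround referenced above it presumably boils down to something like this (a sketch, not the real change; the use of `setdefault` and its exact placement in `run_test.py` are assumptions):

```python
import os

# Prefer the GNU threading layer unless the caller already chose one, so that
# test subprocesses don't trip the mkl-service threading-layer check.
os.environ.setdefault("MKL_THREADING_LAYER", "GNU")
```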
🐛 Bug
I am using fairseq to run some multi-GPU training of NLP models. After upgrading Pytorch from 1.4.0 to 1.5.0 through conda (on the pytorch channel), I consistently get this error:
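(The message in question, reproduced here as it is quoted verbatim in the mkl-service source and the PR description earlier in this thread:)

```
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
	Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
```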
The above error message was generated when trying to train on 3 GPUs, so I assume that the three repetitions of the incompatibility error mean each process is generating one.
This error does not occur if I downgrade to Pytorch 1.4.0.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The above error does not occur
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
How you installed PyTorch (conda, pip, source): conda
Here is the yaml of my conda environment:
cc @ezyang @gchanan @zou3519 @malfet