Pytorch 1.5.0 (installed from conda) errors with complaints about incompatibility between MKL and libgomp when using Pytorch's multiprocessing #37377

Closed
mawright opened this issue Apr 27, 2020 · 39 comments
Assignees
Labels
has workaround, high priority, module: build, module: dependency bug, module: known issue, module: mkl, module: multiprocessing, module: regression, triaged

Comments

@mawright
mawright commented Apr 27, 2020

🐛 Bug

I am using fairseq to run some multi-GPU training of NLP models. After upgrading Pytorch from 1.4.0 to 1.5.0 through conda (on the pytorch channel), I consistently get this error:

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Traceback (most recent call last):
  File "/data/mwright/anaconda3/envs/gpu/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/data/mwright/fairseq/fairseq_cli/train.py", line 355, in cli_main
    nprocs=args.distributed_world_size,
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 113, in join
    (error_index, exitcode)
Exception: process 0 terminated with exit code 1

The above error message was generated when trying to train on 3 GPUs, so I assume the three repetitions of the incompatibility error mean each process is generating one.

This error does not occur if I downgrade to Pytorch 1.4.0.

To Reproduce

Steps to reproduce the behavior:

  1. Install Pytorch 1.5.0 through conda.
  2. Install fairseq to the conda environment from source
  3. Run a multi-GPU training run on fairseq
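
For reference, a rough sketch of these steps (versions match the environment below; the fairseq-train arguments are placeholders and depend on your data and config):

```
# rough reproduction sketch -- versions and paths are illustrative
conda install pytorch=1.5.0 cudatoolkit=10.1 -c pytorch
git clone https://github.com/pytorch/fairseq
cd fairseq && pip install --editable .
# any multi-GPU launch reproduces it, e.g.
fairseq-train <path-to-data> --distributed-world-size 3 ...
```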

Expected behavior

The above error does not occur

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0): 1.5.0
  • OS (e.g., Linux): Ubuntu Linux 18.04.2
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source): N/A
  • Python version: 3.7.7
  • CUDA/cuDNN version: 10.1/7.6.5 (also from conda)
  • GPU models and configuration: 3x Tesla V100
  • Any other relevant information:
    Here is the yaml of my conda environment:
name: gpu
channels:
  - pytorch
  - defaults
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.1.0=gpu
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=3.1.1=py_0
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.1.1=py_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cudnn=7.6.5=cuda10.1_0
  - cupti=10.1.168=0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - dbus=1.13.12=h746ee38_0
  - expat=2.2.6=he6710b0_0
  - fontconfig=2.13.0=h9420a91_0
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_0
  - gast=0.2.2=py37_0
  - glib=2.63.1=h5a9c865_0
  - google-auth=1.13.1=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=h9c2bf20_1
  - idna=2.9=py_1
  - intel-openmp=2020.0=166
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.1.0=py37he6710b0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.11.4=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.13=h1bed415_1
  - libxml2=2.9.9=hea5a465_1
  - markdown=3.1.1=py37_0
  - matplotlib=3.1.3=py37_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_0
  - ninja=1.9.0=py37hfd86e86_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - openssl=1.1.1g=h7b6447c_0
  - opt_einsum=3.1.0=py_0
  - pcre=8.43=he6710b0_0
  - pip=20.0.2=py37_1
  - portalocker=1.5.2=py37_0
  - protobuf=3.11.4=py37he6710b0_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.20=py_0
  - pyjwt=1.7.1=py37_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pyqt=5.9.2=py37h05f1152_2
  - pysocks=1.7.1=py37_0
  - python=3.7.7=hcf32534_0_cpython
  - python-dateutil=2.8.1=py_0
  - pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - qt=5.9.7=h5867ecd_1
  - readline=8.0=h7b6447c_0
  - regex=2020.4.4=py37h7b6447c_0
  - requests=2.23.0=py37_0
  - requests-oauthlib=1.3.0=py_0
  - rsa=4.0=py_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=46.1.3=py37_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.14.0=py37_0
  - sqlite=3.31.1=h62c20be_1
  - tensorboard=2.1.0=py3_0
  - tensorboardx=2.0=py_0
  - tensorflow=2.1.0=gpu_py37h7a4bb67_0
  - tensorflow-base=2.1.0=gpu_py37h6c5654b_0
  - tensorflow-estimator=2.1.0=pyhd54b08b_0
  - tensorflow-gpu=2.1.0=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - tornado=6.0.4=py37h7b6447c_1
  - tqdm=4.45.0=py_0
  - typing=3.6.4=py37_0
  - urllib3=1.25.8=py37_0
  - werkzeug=1.0.1=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.12.1=py37h7b6447c_1
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - mecab-python3==0.996.5
    - sacrebleu==1.4.8
prefix: /data/mwright/anaconda3/envs/gpu

cc @ezyang @gchanan @zou3519 @malfet

@mawright changed the title from "Pytorch 1.5.0 installed from conda package errors with complaints about incompatibility between MKL and libgomp when using Pytorch's multiprocessing" to "Pytorch 1.5.0 (installed from conda) errors with complaints about incompatibility between MKL and libgomp when using Pytorch's multiprocessing" Apr 27, 2020
@mrshenli added the module: mkl, module: multiprocessing, triage review, high priority, and module: regression labels Apr 28, 2020
@ezyang
Contributor
ezyang commented Apr 29, 2020

What happens if you follow the advice suggested in the error message?

@mawright
Author

It's kind of interesting: I set export MKL_SERVICE_FORCE_INTEL=1 before calling the fairseq script and training seems to work (I'm unsure whether it performs as well as with 1.4.0), but the error message

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.

continues to be printed to stderr. I have a job that's been running for about 2 hours on 10x Titan V's and my stderr log file has about 2000 repetitions of this error message.

@zixiliuUSC

I also get the same error and tried @mawright 's method, but the training log still prints the message.

@adriansahlman

Got the same error using the following docker image:

FROM pytorch/pytorch:1.5-cuda10.1-cudnn7-devel

RUN apt-get update && apt-get install -y --no-install-recommends \
         build-essential \
         cmake \
         git \
         curl \
         vim \
         wget \
         ca-certificates \
         libjpeg-dev \
         libpng-dev

WORKDIR /opt
RUN git clone https://github.com/NVIDIA/apex.git
WORKDIR /opt/apex
RUN pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--xentropy" .

WORKDIR /opt
RUN git clone https://github.com/pytorch/fairseq.git
WORKDIR /opt/fairseq
RUN pip install --editable .

Using 4x RTX 2080ti and an AMD Threadripper (have not tested on Intel).

There is no difference between using fp16 or fp32.

Setting MKL_SERVICE_FORCE_INTEL=1 allows training to continue with the error message printed twice for each process. First error message is written before any other output from fairseq-train while the second is written right after "Using FusedAdam".

@ngimel added the module: build and triaged labels May 4, 2020
@lovishmadaan
lovishmadaan commented May 7, 2020

(Quoting @adriansahlman's comment above.)

Same issue with me. Distributed training breaks with this error and forcing MKL_SERVICE_FORCE_INTEL to 1 continues the training with error message repeated for each process.

PyTorch version: 1.6.0a0+bc09478 (Compiled from source)
fairseq version: 0.9.0

@Lyusungwon

Same issue. Pytorch 1.5 / fairseq 0.9 / 2x P40. The error message keeps printing. Has anyone solved this?

@alvesmarcos
alvesmarcos commented May 15, 2020

Setting MKL_SERVICE_FORCE_INTEL=1 works fine but the following error message is still printed.

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.

I searched for this error message in the source (IntelPython) and found this:

 else if(strcasecmp(mtlayer, "intel") == 0) {   /* Intel runtime is requested */
        if(omp && !iomp) {
            fprintf(stderr, "Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with %s library."
                            "\n\tTry to import numpy first or set the threading layer accordingly. "
                            "Set MKL_SERVICE_FORCE_INTEL to force it.\n", omp_name);
            if(!getenv("MKL_SERVICE_FORCE_INTEL"))
                exit(1);
        } else
            PRELOAD(libiomp);
    }

The error message is printed either way, with or without MKL_SERVICE_FORCE_INTEL.

@malfet
Contributor
malfet commented May 15, 2020

Can someone please try to set MKL_THREADING_LAYER=GNU and see if it also fixes those errors? (By default numpy is trying to use Intel's implementation of OpenMP, while PyTorch is linked with GNU, which seems to trigger this error message.)
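
If you want to try this without touching any code, setting the variable in the shell that launches the job (so the spawned workers inherit it) should be enough; a minimal sketch, with the actual launch command depending on your setup:

```
export MKL_THREADING_LAYER=GNU
fairseq-train ...   # or whatever command starts the worker processes
```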

@andyljones
Contributor

You can get rid of this error and the error message by setting MKL_THREADING_LAYER=GNU.

I'm using a Threadripper too, so together with Tetratrio's report this might be an AMD thing.

Grepping conda manifests, libgomp is pulled in by libgcc-ng, which is in turn pulled in by, uh, pretty much everything. So the culprit is more likely to be whoever's setting MKL_THREADING_LAYER=INTEL. As far as that goes, well, it's weird.

import os

def print_layer(prefix):
    print(f'{prefix}: {os.environ.get("MKL_THREADING_LAYER")}')

if __name__ == '__main__':
    print_layer('Pre-import')
    import numpy as np
    from torch import multiprocessing as mp
    print_layer('Post-import')

    mp.set_start_method('spawn')
    p = mp.Process(target=print_layer, args=('Child',))
    p.start()
    p.join()

See, if torch is imported before numpy then the child process here gets a GNU threading layer (even though the parent doesn't have the variable defined).

Pre-import: None
Post-import: None
Child: GNU

But if the imports are swapped so numpy is imported before torch, the child process gets an INTEL threading layer

Pre-import: None
Post-import: None
Child: INTEL

So I suspect numpy - or one of its imports - is messing with the env parameter of Popen, but after half an hour's searching I can't figure out how.

@Baranowski self-assigned this May 16, 2020
@andyljones
Contributor
andyljones commented May 16, 2020

Realised I could just grep through site-packages for MKL_THREADING_LAYER, and it turns out the culprit is mkl itself. The bug's triggered if

  • you're using libgomp, which is pulled in by everything (though maybe just on AMD?)
  • mkl is imported before torch, so that the child gets an INTEL threading layer
  • the number of threads is set in the child, which forces mkl to initialize

Minimal example:

def child():
    import torch
    torch.set_num_threads(1)

if __name__ == '__main__':
    import mkl
    from torch import multiprocessing as mp

    mp.set_start_method('spawn')
    p = mp.Process(target=child)
    p.start()
    p.join()

If you switch the order of the imports, you get a GNU threading layer in the child and everything is fine.
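
For anyone who wants to repeat that search, something along these lines should work from an activated conda environment (the python3.7 path matches the environment reported above and may differ for you):

```
grep -rl MKL_THREADING_LAYER "$CONDA_PREFIX/lib/python3.7/site-packages"
```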

@Baranowski
Contributor

Could the solution be as simple as putting os.environ['MKL_THREADING_LAYER'] = 'GNU' in torch/__init__.py?

@rgommers
Collaborator

I think which one you want is specific to the build and distribution mechanism you're using, so you shouldn't hardcode it in this repo.

you're using libgomp, which is pulled in by everything (though maybe just on AMD?)

What is "everything"? By default NumPy doesn't contain any OpenMP code; Anaconda shipped a patched versions till recently (and (IntelPython probably still does) with patches to use MKL routines, and I believe Intel's OpenMP implementation. Having NumPy pull in libgomp (if that is what you're seeing) is very likely AMD-specific.

@andyljones
Contributor
andyljones commented May 17, 2020

Pardon me, should've been clearer. Grep'ing through my conda-meta, libgomp turns up in exactly one place: libgcc-ng (check the info.*/info/paths.json in that archive). libgcc-ng in turn shows up in 20-odd other meta files.

I'll admit, I'm confused as to why libgomp is in there. The first result from googling "libgomp" "libgcc-ng" is this discussion on conda-forge (is that the 'until recently'?), but frankly I'm a bit out of my depth at this point.

@Baranowski removed their assignment May 18, 2020
@ezyang added the module: dependency bug label May 18, 2020
@ezyang
Contributor
ezyang commented May 18, 2020

I think the first order of business is to file an issue to mkl complaining that they are setting an environment variable (libraries should NEVER EVER do this). Has anyone done this yet?

@rgommers
Collaborator

I think the first order of business is to file an issue to mkl complaining that they are setting an environment variable (libraries should NEVER EVER do this). Has anyone done this yet?

Agreed, done.

@protonish

@rgommers confirming that updating to mkl-service=2.4.0 fixes this problem.

@rgommers
Collaborator

Thanks for confirming @protonish!

Hopefully it'll propagate to defaults shortly; until then this issue should remain open.

This has happened by now, so closing.

@seemethere
Member

Going to go ahead and remove this from the 1.10.1 milestone as well, since we've already confirmed this has a workaround that requires no underlying changes.

@zhanwenchen

Will TBB make this work?

@jaydeepborkar
jaydeepborkar commented Nov 24, 2023

I was having a similar issue while doing distributed training using accelerate.

I added this to my cells and it worked:
```
%env MKL_THREADING_LAYER=GNU
%env MKL_SERVICE_FORCE_INTEL=1
```
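
Outside a notebook, exporting the same variables in the shell before launching should have the same effect; a sketch, where train.py stands in for your own script:

```
export MKL_THREADING_LAYER=GNU
export MKL_SERVICE_FORCE_INTEL=1
accelerate launch train.py
```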

pruthvistony pushed a commit to ROCm/pytorch that referenced this issue Oct 4, 2024
Fixes error introduced by
995edef
when running unit tests using `run_test.py`:
```
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
	Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
```

Refer solution suggested in:
pytorch#37377 (comment)

Need to cherry-pick this change to release/2.3 and release/2.4 branches
as well.