Can't import torch --> OSError related to libcublasLt.so.11 · Issue #88882 · pytorch/pytorch · GitHub
Can't import torch --> OSError related to libcublasLt.so.11 #88882

Open
tbloch1 opened this issue Nov 11, 2022 · 4 comments
Labels
module: binaries - Anything related to official binaries that we release to users
module: cuda - Related to torch.cuda, and CUDA support in general
module: regression - It used to work, and now it doesn't
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

tbloch1 commented Nov 11, 2022

🐛 Describe the bug

When I try to import torch in my docker container, I get an OSError:
/usr/local/lib/python3.8/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

I've followed the steps from Issue #51080, which seems like it might be similar, but to no effect. Note that I don't have conda, so I didn't follow those specific steps, just the pip ones.

However, I have found that if I import tensorflow first, list my devices, and then import torch, it works fine...

try:
    import torch
except OSError as e:
    print(e)
    import tensorflow as tf
    tf.config.list_physical_devices()
    import torch
    print([torch.device(i) for i in range(torch.cuda.device_count())])
>>> /usr/local/lib/python3.8/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
>>> [device(type='cuda', index=0), device(type='cuda', index=1)]

This makes me suspect that it is a different issue from Issue #51080.
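For reference, a rough (untested) sketch of how one could check whether the libcublasLt.so.11 that dlopen picks up by default actually exports the missing symbol:

import ctypes

# Rough sketch: load whichever libcublasLt.so.11 the dynamic loader finds first,
# then probe for cublasLtGetStatusString. ctypes raises AttributeError for a
# missing symbol, so hasattr() works as a quick check; an older system copy
# (e.g. from CUDA 11.2) that lacks the symbol would explain the error above.
lt = ctypes.CDLL("libcublasLt.so.11")
print(hasattr(lt, "cublasLtGetStatusString"))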

The full traceback of import torch is:

OSError                                   Traceback (most recent call last)
/root/GI_Data/KPVESQC5_AI4Q_P/Exp_workflow_a.ipynb Cell 2' in <cell line: 1>()
----> 1 import torch

File /usr/local/lib/python3.8/dist-packages/torch/__init__.py:191, in <module>
    180 else:
    181     # Easy way.  You want this most of the time, because it will prevent
    182     # C++ symbols from libtorch clobbering C++ symbols from other
   (...)
    188     #
    189     # See Note [Global dependencies]
    190     if USE_GLOBAL_DEPS:
--> 191         _load_global_deps()
    192     from torch._C import *  # noqa: F403
    194 # Appease the type checker; ordinarily this binding is inserted by the
    195 # torch._C module initialization code in C

File /usr/local/lib/python3.8/dist-packages/torch/__init__.py:153, in _load_global_deps()
    150 here = os.path.abspath(__file__)
    151 lib_path = os.path.join(os.path.dirname(here), 'lib', lib_name)
--> 153 ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)

File /usr/lib/python3.8/ctypes/__init__.py:373, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    370 self._FuncPtr = _FuncPtr
    372 if handle is None:
--> 373     self._handle = _dlopen(self._name, mode)
    374 else:
    375     self._handle = handle

OSError: /usr/local/lib/python3.8/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Versions

Python version: 3.8.10
Docker version: 20.10.21
BASE_IMAGE=tensorflow/tensorflow
IMAGE_VERSION=2.9.1-gpu-jupyter

Output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

Output of torch.__version__ in python:

'1.13.0+cu117'

Output of collect_env.py:

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.10 (default, Mar 15 2022, 12:22:08)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.29
Is CUDA available: N/A
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: 
GPU 0: NVIDIA A10
GPU 1: NVIDIA A10

Nvidia driver version: 520.61.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0
[pip3] torchvision==0.14.0
[conda] Could not collect

cc @seemethere @malfet @osalpekar @atalman @ptrblck @ezyang @ngimel

malfet added the module: cuda and module: binaries labels on Nov 11, 2022
Contributor
malfet commented Nov 11, 2022

Can you please clarify how you've installed PyTorch into your container? Using pip, I assume? In that case, do you mind sharing the output of pip list?

Also, please try installing the torch wheel with pip3 install torch --extra-index-url https://download.pytorch.org/whl/cu117/ and see if that makes the error go away.

malfet added the module: regression label on Nov 11, 2022
Author
tbloch1 commented Nov 14, 2022

Yep, I installed pytorch using pip, following the install guide.

Sure thing. The pip list output is below.

I've tried that install method; it says all the requirements are already satisfied, and the error remains.

Package                      Version
---------------------------- --------------------
absl-py                      1.0.0
affine                       2.3.1
argon2-cffi                  21.3.0
argon2-cffi-bindings         21.2.0
asciitree                    0.3.3
asttokens                    2.0.5
astunparse                   1.6.3
attrs                        21.4.0
backcall                     0.2.0
beautifulsoup4               4.11.1
bleach                       5.0.0
cachetools                   5.1.0
celluloid                    0.2.0
certifi                      2019.11.28
cffi                         1.15.0
chardet                      3.0.4
click                        8.1.3
click-plugins                1.1.1
cligj                        0.7.2
cloudpickle                  2.2.0
cycler                       0.11.0
Cython                       0.29.32
dask                         2022.10.2
dbus-python                  1.2.16
debugpy                      1.6.0
decorator                    5.1.1
defusedxml                   0.7.1
entrypoints                  0.4
executing                    0.8.3
fasteners                    0.18
fastjsonschema               2.15.3
Fiona                        1.8.22
flatbuffers                  1.12
fonttools                    4.33.3
fsspec                       2022.10.0
gast                         0.4.0
geopandas                    0.12.1
google-auth                  2.6.6
google-auth-oauthlib         0.4.6
google-pasta                 0.2.0
grpcio                       1.46.3
h5py                         3.6.0
hdbscan                      0.8.29
idna                         2.8
importlib-metadata           4.11.4
importlib-resources          5.7.1
ipykernel                    5.1.1
ipympl                       0.9.2
ipython                      8.3.0
ipython-genutils             0.2.0
ipywidgets                   7.7.0
jedi                         0.17.2
Jinja2                       3.1.2
joblib                       1.2.0
jsonschema                   4.5.1
jupyter                      1.0.0
jupyter-client               7.3.1
jupyter-console              6.4.3
jupyter-core                 4.10.0
jupyter-http-over-ws         0.0.8
jupyterlab-pygments          0.2.2
jupyterlab-widgets           1.1.0
keras                        2.9.0
Keras-Preprocessing          1.1.2
kiwisolver                   1.4.2
libclang                     14.0.1
lightgbm                     3.3.3
llvmlite                     0.39.1
locket                       1.0.0
Markdown                     3.3.7
MarkupSafe                   2.1.1
matplotlib                   3.5.2
matplotlib-inline            0.1.3
mistune                      0.8.4
munch                        2.5.0
nbclient                     0.6.3
nbconvert                    6.5.0
nbformat                     5.7.0
nest-asyncio                 1.5.5
notebook                     6.4.11
numba                        0.56.3
numcodecs                    0.10.2
numpy                        1.22.4
nvidia-cublas-cu11           11.10.3.66
nvidia-cuda-nvrtc-cu11       11.7.99
nvidia-cuda-runtime-cu11     11.7.99
nvidia-cudnn-cu11            8.5.0.96
nvidia-pyindex               1.0.9
oauthlib                     3.2.0
opt-einsum                   3.3.0
packaging                    21.3
pandas                       1.5.1
pandocfilters                1.5.0
parso                        0.7.1
partd                        1.3.0
pexpect                      4.8.0
pickleshare                  0.7.5
Pillow                       9.1.1
pip                          22.3.1
prometheus-client            0.14.1
prompt-toolkit               3.0.29
protobuf                     3.19.4
psutil                       5.9.1
ptyprocess                   0.7.0
pure-eval                    0.2.2
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pycparser                    2.21
Pygments                     2.12.0
PyGObject                    3.36.0
pynndescent                  0.5.8
pyparsing                    3.0.9
pyproj                       3.4.0
pyrsistent                   0.18.1
python-apt                   2.0.0+ubuntu0.20.4.7
python-dateutil              2.8.2
pytz                         2022.6
PyYAML                       6.0
pyzmq                        23.0.0
qtconsole                    5.3.0
QtPy                         2.1.0
rasterio                     1.3.3
requests                     2.22.0
requests-oauthlib            1.3.1
requests-unixsocket          0.2.0
rioxarray                    0.12.4
rsa                          4.8
scikit-learn                 1.1.3
scipy                        1.9.3
seaborn                      0.12.1
Send2Trash                   1.8.0
setuptools                   65.5.1
Shapely                      1.8.5.post1
six                          1.14.0
snuggs                       1.4.7
soupsieve                    2.3.2.post1
stack-data                   0.2.0
tensorboard                  2.9.0
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.9.1
tensorflow-estimator         2.9.0
tensorflow-io-gcs-filesystem 0.26.0
termcolor                    1.1.0
terminado                    0.15.0
threadpoolctl                3.1.0
tinycss2                     1.1.1
toolz                        0.12.0
torch                        1.13.0
torchaudio                   0.13.0
torchvision                  0.14.0
tornado                      6.1
tqdm                         4.64.1
traitlets                    5.2.1.post0
typing_extensions            4.4.0
umap-learn                   0.5.3
urllib3                      1.25.8
wcwidth                      0.2.5
webencodings                 0.5.1
Werkzeug                     2.1.2
wheel                        0.38.3
widgetsnbextension           3.6.0
wrapt                        1.14.1
xarray                       2022.10.0
zarr                         2.13.3
zipp                         3.8.0

Contributor
mergian commented Dec 16, 2022

I just ran into the same issue on Ubuntu 22.04. The reason is that PyTorch loads libcublas.so in _preload_cuda_deps (in torch/__init__.py), which uses symbols from libcublasLt.so.11, but dlopen cannot find that library. I suspect NVIDIA forgot to set RPATH=$ORIGIN within libcublas.so, so the library's own directory is not searched when its dependencies are loaded.

Quick solution: set LD_LIBRARY_PATH to point to that directory, in your case:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.8/dist-packages/torch/lib/../../nvidia/cublas/lib/

A more solid fix would be to change _preload_cuda_deps to:

def _preload_cuda_deps():
    """ Preloads cudnn/cublas deps if they could not be found otherwise """
    # Should only be called on Linux if default path resolution have failed
    assert platform.system() == 'Linux', 'Should only be called on Linux'
    for path in sys.path:
        nvidia_path = os.path.join(path, 'nvidia')
        if not os.path.exists(nvidia_path):
            continue
        cublaslt_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublasLt.so.11')
        cublas_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublas.so.11')
        cudnn_path = os.path.join(nvidia_path, 'cudnn', 'lib', 'libcudnn.so.8')
        if not os.path.exists(cublaslt_path) or not os.path.exists(cublas_path) or not os.path.exists(cudnn_path):
            continue
        break

    ctypes.CDLL(cublaslt_path)
    ctypes.CDLL(cublas_path)
    ctypes.CDLL(cudnn_path)
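Until a fix like that lands in torch, a user-side workaround along the same lines (a minimal sketch, assuming the nvidia-cublas-cu11 pip wheel is installed under site-packages) is to preload the libraries before importing torch:

import ctypes
import os
import sys

# Minimal sketch, assuming the nvidia pip wheels are installed: loading
# libcublasLt.so.11 and libcublas.so.11 into the process before "import torch"
# lets dlopen resolve cublasLtGetStatusString without RPATH or LD_LIBRARY_PATH.
for path in sys.path:
    lib_dir = os.path.join(path, 'nvidia', 'cublas', 'lib')
    if not os.path.isdir(lib_dir):
        continue
    for lib in ('libcublasLt.so.11', 'libcublas.so.11'):
        candidate = os.path.join(lib_dir, lib)
        if os.path.exists(candidate):
            ctypes.CDLL(candidate)
    break

import torch  # should now resolve against the preloaded cuBLAS libraries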

Author
tbloch1 commented Dec 16, 2022

Thanks for the reply @mergian! Seems like that's the right direction for a full solution.

In my case, I managed to solve it by trial and error, eventually using the following to install PyTorch:

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

I guess this fixed it because there was previously a mismatch in CUDA versions? I'm not too sure. Either way, I no longer have the problem after this, and torch works fine.

I don't want to close this issue, since it seems like there is a deeper underlying problem that needs addressing.
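In case anyone wants a quick sanity check after reinstalling, something like this (a minimal sketch) is enough to confirm the import and the GPUs:

import torch

# Quick sanity check: the import should no longer raise the OSError,
# and the GPUs should be visible.
print(torch.__version__)
print(torch.cuda.is_available())
print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])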
