Can't import torch --> OSError related to libcublasLt.so.11 #88882
Comments
Can you please clarify how you installed PyTorch into your container? Using [...]? Also, please try installing the torch wheel using [...]
Yep, I installed PyTorch using pip, following the install guide. Sure thing. The [...] I've tried that install method, and it says all the requirements have already been met, and the error remains.
I just ran into the same issue on Ubuntu 22.04. The reason is that PyTorch loads [...] Quick solution: set [...] A more solid fix would be to change [...]
def _preload_cuda_deps():
    """ Preloads cudnn/cublas deps if they could not be found otherwise """
    # Should only be called on Linux if default path resolution has failed
    assert platform.system() == 'Linux', 'Should only be called on Linux'
    for path in sys.path:
        nvidia_path = os.path.join(path, 'nvidia')
        if not os.path.exists(nvidia_path):
            continue
        cublaslt_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublasLt.so.11')
        cublas_path = os.path.join(nvidia_path, 'cublas', 'lib', 'libcublas.so.11')
        cudnn_path = os.path.join(nvidia_path, 'cudnn', 'lib', 'libcudnn.so.8')
        if not os.path.exists(cublaslt_path) or not os.path.exists(cublas_path) or not os.path.exists(cudnn_path):
            continue
        break

    ctypes.CDLL(cublaslt_path)
    ctypes.CDLL(cublas_path)
    ctypes.CDLL(cudnn_path)
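One gap in the snippet above is that, if no entry in sys.path contains all three libraries, the loop ends without a break and the ctypes.CDLL calls run on paths that are undefined or do not exist. A minimal sketch of the same idea with that case handled could look like the following. The names find_cuda_deps, preload_cuda_deps and CUDA_LIBS are hypothetical and are not part of PyTorch; the library file names are copied from the snippet above and would differ for other CUDA or cuDNN versions.

```python
import ctypes
import os
import sys

# (package directory, file name) pairs, copied from the snippet above.
# libcublasLt is listed first so that it is loaded before libcublas.
CUDA_LIBS = (
    ('cublas', 'libcublasLt.so.11'),
    ('cublas', 'libcublas.so.11'),
    ('cudnn', 'libcudnn.so.8'),
)


def find_cuda_deps(search_paths):
    """Return the library paths from the first entry that has all of them, else None."""
    for path in search_paths:
        nvidia_path = os.path.join(path, 'nvidia')
        candidates = [
            os.path.join(nvidia_path, pkg, 'lib', name) for pkg, name in CUDA_LIBS
        ]
        if all(os.path.exists(candidate) for candidate in candidates):
            return candidates
    return None


def preload_cuda_deps(search_paths=None):
    """Load the libraries in the listed order; return the paths that were loaded."""
    paths = find_cuda_deps(sys.path if search_paths is None else search_paths)
    if paths is None:
        # Nothing suitable found: leave resolution to the dynamic linker.
        return []
    for lib_path in paths:
        ctypes.CDLL(lib_path)
    return paths
```

Splitting the search from the loading keeps the lookup testable on a machine without CUDA; only preload_cuda_deps touches the dynamic loader.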
Thanks for the reply @mergian! That seems like the right direction for a full solution. I managed to solve it by trial and error, eventually using the following to install PyTorch: [...]
I guess that fixed it because of a mismatch in CUDA versions previously? Not too sure. Either way, I do not have the problem after this, and torch works fine. I don't want to close this issue, since there seems to be a deeper underlying problem that needs addressing.
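To check the version-mismatch guess, one option is to compare the release reported by nvcc -V with the CUDA version the installed torch build reports (torch.version.cuda). The sketch below only does the string comparison; nvcc_release and same_cuda_release are hypothetical helper names, and a difference between the two is a hint rather than proof of a problem, since pip wheels can ship their own CUDA libraries.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r'release (\d+)\.(\d+)', nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_cuda_release(nvcc_output, torch_cuda):
    """Compare the nvcc release with a version string such as torch.version.cuda.

    Returns True or False, or None when either version cannot be parsed.
    """
    release = nvcc_release(nvcc_output)
    match = re.match(r'(\d+)\.(\d+)', torch_cuda or '')
    if release is None or match is None:
        return None
    return release == (int(match.group(1)), int(match.group(2)))
```

In a session where torch imports successfully, the second argument would be torch.version.cuda and the first the captured text of nvcc -V.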
🐛 Describe the bug
When I try to import torch in my docker container, I get an OSError:
/usr/local/lib/python3.8/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
I've followed the steps from Issue 51080, which seems like it might be similar, but to no effect. Note that I don't have conda, so I didn't follow those specific steps, just the pip ones.
However, I have found that if I import tensorflow first, list my devices and then import torch it works fine...
Which makes me suspect that this is a different issue to Issue 51080.
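Since the import order changes the outcome, it may help to see which cuBLAS files the process has actually mapped at each point. The following is a rough diagnostic sketch, assuming Linux (it reads /proc/self/maps); loaded_libraries and report are hypothetical helper names, not part of torch or tensorflow.

```python
def loaded_libraries(maps_text, needle):
    """Return the sorted, de-duplicated mapped file paths whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname is optional).
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5].rsplit('/', 1)[-1]:
            paths.add(fields[5])
    return sorted(paths)


def report(needle='cublas'):
    """List matching libraries mapped into the current process (Linux only)."""
    with open('/proc/self/maps') as maps_file:
        return loaded_libraries(maps_file.read(), needle)
```

Calling report() right after importing tensorflow and listing devices, and again after a failed import torch in a fresh interpreter, would show whether the libcublasLt.so.11 in use comes from the image's CUDA installation or from the pip-installed nvidia directory.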
The full traceback of import torch is:
Versions
Python version: 3.8.10
Docker version: 20.10.21
BASE_IMAGE=tensorflow/tensorflow
IMAGE_VERSION=2.9.1-gpu-jupyter
Output of nvcc -V:
Output of torch.__version__ in Python:
Output of collect_env.py:
cc @seemethere @malfet @osalpekar @atalman @ptrblck @ezyang @ngimel