Torch model compile error "/usr/bin/ld: cannot find -lcuda" though cuda is installed via run file #103417

alexcpn · 2023-06-12T08:08:53Z

🐛 Describe the bug

I have installed the NVIDIA driver and CUDA separately.

libcuda.so --> is provided by the NVIDIA Driver and is here

/usr/lib/x86_64-linux-gnu/libcuda.so.525.105.17
/usr/lib/x86_64-linux-gnu/libcuda.so.1

libcudart.so --> is provided by CUDA Runtime and is here

ld  -L/usr/local/cuda/lib64/ -lcudart --verbose
attempt to open /usr/local/cuda/lib64//libcudart.so succeeded

and it is linked to CUDA 12.0

ll /usr/local/cuda/lib64//libcudart.so
lrwxrwxrwx 1 root root 15 Jun  6 21:14 /usr/local/cuda/lib64//libcudart.so -> libcudart.so.12*

All this is fine and as expected

I have set LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64
sudo ldconfig

I am able to run a model on the GPU. However, when I run torch.compile, it links against libcuda.so. From my understanding it should also be able to work with libcudart.so, but I cannot find any environment variable or flag that makes torch use this library.

Sample Code

import torch
import torchvision

print("torch version is ",torch.__version__)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)


x=torch.ones(1,3,224,224).to(device)
model=torchvision.models.resnet50().to(device)
compiled=torch.compile(model)
compiled(x)

Output

python test_cuda.py 
torch version is  2.0.0.dev20230202+cu116
Using device: cuda
/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:89: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status

Versions

Collecting environment information...
PyTorch version: 2.0.0.dev20230202+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Pop!_OS 22.04 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: Could not collect
CMake version: version 3.25.0
Libc version: glibc-2.35

Python version: 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-6.2.6-76060206-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 525.105.17
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 5800H with Radeon Graphics
CPU family: 25
Model: 80
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU max MHz: 4462.5000
CPU min MHz: 1200.0000
BogoMIPS: 6388.26
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
Virtualization: AMD-V
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.1
[pip3] pytorch-triton==2.0.0+0d7e753227
[pip3] torch==2.0.0.dev20230202+cu116
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==2.0.0.dev20230201+cu116
[pip3] torchvision==0.15.0.dev20230201+cu116
[conda] Could not collect

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305


Aidyn-A · 2023-06-12T15:50:57Z

The PyTorch version installed on your system (2.0.0.dev20230202+cu116) is built against CUDA 11.6. I guess it was installed via pip or conda; if so, the CUDA Runtime library is already shipped with PyTorch. In my case, libcudart-24af1308.so.12 is in ~miniconda3/envs/pytorch-nightly/lib/python3.9/site-packages/torch/lib. So exporting the CUDA 12 path does not help.
It fails with "-lcuda not found" because torch.compile uses the CUDA Driver API directly.
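
The distinction matters at link time: given -lcuda, ld searches for a file named exactly libcuda.so (or libcuda.a), whereas the listing in the original report shows only the versioned names libcuda.so.1 and libcuda.so.525.105.17. A minimal sketch for checking which names exist where; the helper name classify_libcuda and the directory list are illustrative assumptions, not part of PyTorch or Triton:

```python
import os

def classify_libcuda(directories):
    """Report, per directory, whether the link-time and run-time names of libcuda exist."""
    report = {}
    for d in directories:
        report[d] = {
            # File name that `ld` looks for when given -lcuda.
            "link_name": os.path.exists(os.path.join(d, "libcuda.so")),
            # SONAME that the dynamic loader resolves at run time.
            "runtime_name": os.path.exists(os.path.join(d, "libcuda.so.1")),
        }
    return report

if __name__ == "__main__":
    # Assumed candidate directories; adjust for your distribution.
    candidates = ["/usr/lib/x86_64-linux-gnu", "/usr/lib", "/usr/local/cuda/lib64"]
    for d, found in classify_libcuda(candidates).items():
        print(d, found)
```

If a directory reports runtime_name as present but link_name as absent, running programs can load the driver library from it, but a gcc invocation with -lcuda would probably not resolve there.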

malfet · 2023-06-12T21:42:06Z

Yeah, I want to contribute a small change to Triton that would switch to dynamic linking with libcuda, similar to PyTorch's LazyNVRTC bindings.
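
To illustrate the general idea of resolving the driver library at run time instead of at link time, here is a rough sketch using Python's ctypes. It is not Triton's or PyTorch's actual implementation; the helper names load_cuda_driver and driver_version are made up for this example, and it assumes the driver's SONAME is libcuda.so.1, as in the listing in the original report. cuDriverGetVersion is a real CUDA Driver API function.

```python
import ctypes

def load_cuda_driver(soname="libcuda.so.1"):
    """Open the CUDA driver library at run time; return None if it cannot be loaded."""
    try:
        return ctypes.CDLL(soname)
    except OSError:
        return None

def driver_version(lib):
    """Return the integer reported by cuDriverGetVersion, or None on failure."""
    version = ctypes.c_int(0)
    # The Driver API returns 0 (CUDA_SUCCESS) when the call succeeds.
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value

if __name__ == "__main__":
    lib = load_cuda_driver()
    if lib is None:
        print("CUDA driver library not found by the dynamic loader")
    else:
        print("driver API version:", driver_version(lib))
```

Because the library is opened by its SONAME through the dynamic loader, this approach does not need an unversioned libcuda.so on the linker search path.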

alexcpn · 2023-06-14T06:42:51Z

@Aidyn-A
I uninstalled torch and reinstalled it via pip, and it did not help.

The torch version is now 2.0.1+cu117, and I am not sure whether that means CUDA 11.7 or some other CUDA library that torch is using (I cannot tell from the torch requirements where it got the 11.7 libraries). Anyway, I get the same error: /usr/bin/ld: cannot find -lcuda

Is there any other way to install torch so that torch.compile uses the CUDA Runtime library (libcudart.so in /usr/local/cuda/lib64) or the proper NVIDIA driver library (/usr/lib/x86_64-linux-gnu/libcuda.so.525.105.17)?

Only torch.compile has the problem; other torch functions work properly on the GPU.

export LD_LIBRARY_PATH=/usr/local/cuda/lib64
sudo ldconfig

pip install torch
Defaulting to user installation because normal site-packages is not writeable
Collecting torch
  Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 792.9 kB/s eta 0:00:00
Collecting nvidia-cublas-cu11==11.10.3.66
  Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 1.6 MB/s eta 0:00:00
Collecting triton==2.0.0
  Downloading triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 MB 5.5 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in /home/alex/.local/lib/python3.10/site-packages (from torch) (4.4.0)
Collecting nvidia-nvtx-cu11==11.7.91
  Downloading nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.6/98.6 KB 1.8 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu11==11.4.0.1
  Downloading nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.6/102.6 MB 3.4 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu11==11.7.99
  Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 7.7 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu11==11.7.4.91
  Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 173.2/173.2 MB 2.3 MB/s eta 0:00:00
Collecting nvidia-nccl-cu11==2.14.3
  Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.1/177.1 MB 2.6 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11==11.7.99
  Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 KB 5.8 MB/s eta 0:00:00
Collecting nvidia-curand-cu11==10.2.10.91
  Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.6/54.6 MB 6.2 MB/s eta 0:00:00
Requirement already satisfied: filelock in /home/alex/.local/lib/python3.10/site-packages (from torch) (3.9.0)
Collecting nvidia-cufft-cu11==10.9.0.58
  Downloading nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.4/168.4 MB 2.4 MB/s eta 0:00:00
Requirement already satisfied: jinja2 in /home/alex/.local/lib/python3.10/site-packages (from torch) (3.0.2)
Requirement already satisfied: networkx in /home/alex/.local/lib/python3.10/site-packages (from torch) (3.0rc1)
Requirement already satisfied: sympy in /home/alex/.local/lib/python3.10/site-packages (from torch) (1.11.1)
Collecting nvidia-cudnn-cu11==8.5.0.96
  Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 937.8 kB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu11==11.7.101
  Downloading nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 8.8 MB/s eta 0:00:00
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (0.37.1)
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (59.6.0)
Requirement already satisfied: cmake in /home/alex/.local/lib/python3.10/site-packages (from triton==2.0.0->torch) (3.25.0)
Collecting lit
  Downloading lit-16.0.5.post0.tar.gz (138 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 138.1/138.1 KB 2.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /home/alex/.local/lib/python3.10/site-packages (from jinja2->torch) (2.0.1)
Requirement already satisfied: mpmath>=0.19 in /home/alex/.local/lib/python3.10/site-packages (from sympy->torch) (1.2.1)
Building wheels for collected packages: lit
  Building wheel for lit (setup.py) ... done
  Created wheel for lit: filename=lit-16.0.5.post0-py3-none-any.whl size=88273 sha256=bb5e6b868d36072af37c6975d0a51cb1ec267447f5eb1629d82807bcc72da208
  Stored in directory: /home/alex/.cache/pip/wheels/1a/24/92/1e1c9e37be8411a7c7c18a4c54962f5d0a75c56bab4a6f7f57
Successfully built lit
Installing collected packages: lit, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, nvidia-cusolver-cu11, nvidia-cudnn-cu11, triton, torch
Successfully installed lit-16.0.5.post0 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 torch-2.0.1 triton-2.0.0

output

 python test_cuda.py 
torch version is  2.0.1+cu117
Using device: cuda
/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:90: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 549, in _worker_compile
    kernel.precompile(warm_cache_only_with_cc=cc)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 69, in precompile
    self.launchers = [
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 70, in <listcomp>
    self._precompile_config(c, warm_cache_only_with_cc)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 83, in _precompile_config
    triton.compile(
  File "/home/alex/.local/lib/python3.10/site-packages/triton/compiler.py", line 1588, in compile
    so_path = make_stub(name, signature, constants)
  File "/home/alex/.local/lib/python3.10/site-packages/triton/compiler.py", line 1477, in make_stub
    so = _build(name, src_path, tmpdir)
  File "/home/alex/.local/lib/python3.10/site-packages/triton/compiler.py", line 1392, in _build
    ret = subprocess.check_call(cc_cmd)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpu6tgmg13/main.c', '-O3', '-I/usr/local/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmpu6tgmg13', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpu6tgmg13/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
    compiled_fn = compiler_fn(gm, self.fake_example_inputs())
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 1055, in debug_wrapper
/usr/bin/ld: cannot find -lcuda: No such file or directory
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/__init__.py", line 1390, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 455, in compile_fx
    return aot_autograd(
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 48, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2822, in aot_module_simplified
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
    compiled_fn = create_aot_dispatcher_function(
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2515, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1715, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2150, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 430, in fw_compiler
    return inner_compile(
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 595, in debug_wrapper
    compiled_fn = compiler_fn(gm, example_inputs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/debug.py", line 239, in inner
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
    return fn(*args, **kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 177, in compile_fx_inner
    compiled_fn = graph.compile_to_fn()
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 586, in compile_to_fn
    return self.compile_to_module().call
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 575, in compile_to_module
    mod = PyCodeCache.load(code)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 528, in load
    exec(code, mod.__dict__, mod.__dict__)
  File "/tmp/torchinductor_alex/a2/ca2mqricjxyc2l6pwom64pcxh4uhe4f3ttrcaueqrxkih2hsyaua.py", line 1855, in <module>
    async_compile.wait(globals())
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 715, in wait
    scope[key] = result.result()
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 573, in result
    self.future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpu6tgmg13/main.c', '-O3', '-I/usr/local/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmpu6tgmg13', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpu6tgmg13/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/alex/coding/tranformer_learn/test_cuda.py", line 12, in <module>
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
    compiled(x)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
    return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
    return callback(frame, cache_size, hooks)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
    result = inner_convert(frame, cache_size, hooks)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
    return fn(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
    return _compile(
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
    out_code = transform_code_object(code, transform)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
    transformations(instructions, code_options)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
    tracer.run()
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
    super().run()
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
    getattr(self, inst.opname)(inst)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 517, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/alex/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpu6tgmg13/main.c', '-O3', '-I/usr/local/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmpu6tgmg13', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpu6tgmg13/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.

Set torch._dynamo.config.verbose=True for more information


You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True

Yeah2333 · 2023-06-15T05:34:48Z

Hi, I found a solution to this problem. You can find libcuda.so.525.105.17 in /usr/lib/x86_64-linux-gnu/, so just create a symbolic link to it:
sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.525.105.17 /usr/lib/libcuda.so

alexcpn · 2023-06-16T17:47:16Z

Hi, i find a solution to deal with this problem. You can find libcuda.so.525.105.17 in /usr/lib/x86_64-linux-gnu/. So, just create symbolic link for this : sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.525.105.17 /usr/lib/libcuda.so

I am not sure that is really the right way.

From https://forums.developer.nvidia.com/t/checkmacros-cpp-272-error-code-1-cuda-runtime-cuda-driver-is-a-stub-library/202911/11

And by all means, make sure that at no point does your LD_LIBRARY_PATH env var include the path /usr/local/cuda/lib64/stubs. And by all means, don’t copy the stub version of libcuda.so anywhere. You shouldn’t ever copy or symlink to libcuda.so under any circumstances.

Also note that it generally should not be necessary to have the GPU driver install location on your LD_LIBRARY_PATH variable. The runtime loader is usually already configured (e.g. by ldconfig or similar) to look in the location that the GPU driver installer places it.
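
The first point of the quoted advice can be checked mechanically. A small sketch, assuming that a directory whose last path component is "stubs" is a CUDA stub directory; the function name stub_dirs_on_path is hypothetical:

```python
import os

def stub_dirs_on_path(ld_library_path):
    """Return entries of an LD_LIBRARY_PATH-style string whose last component is 'stubs'."""
    entries = [p for p in ld_library_path.split(os.pathsep) if p]
    return [p for p in entries if os.path.basename(os.path.normpath(p)) == "stubs"]

if __name__ == "__main__":
    flagged = stub_dirs_on_path(os.environ.get("LD_LIBRARY_PATH", ""))
    print("stub directories on LD_LIBRARY_PATH:", flagged or "none")
```

An empty result only means that no stub directory is listed in the variable; it says nothing about whether a stub copy of libcuda.so has been placed elsewhere.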

bhack · 2023-06-16T17:52:32Z

I have the same issue in the official pytorch Docker image. So we have an official env to reproduce this issue.

bhack · 2023-06-17T11:49:53Z

Same problem confirmed also on pytorch nightly images:
https://github.com/orgs/pytorch/packages/container/package/pytorch-nightly

mlazos · 2024-03-05T06:38:21Z

Hi @alexcpn are you still encountering this issue?

bhack · 2024-03-05T15:25:25Z

@mlazos We need to re-test once #119457 has produced a new nightly.

bhack · 2024-03-06T17:15:11Z

I've tried with the last official Docker nightly:
https://github.com/orgs/pytorch/packages/container/pytorch-nightly/187386166?tag=2.3.0.dev20240306-cuda12.1-cudnn8-runtime

The problem is still there.

sdake · 2024-07-27T17:51:10Z

There is a problem with cutlass using functions from the driver library (libcuda.so). This is a relatively new introduction. Since nothing is linked with libcuda.so, the dependent software fails. See:
vllm-project/vllm#6864

malfet added oncall: pt2 upstream triton Upstream Triton Issue labels Jun 12, 2023

bhack mentioned this issue Jun 20, 2023

Docker images: faster linker for torch.compile #103891

Open

bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 29, 2023

msaroufim mentioned this issue Oct 3, 2023

Fix for GPU regression failure pytorch/serve#2636

Merged

8 tasks
