Pytorch 2.4 RC cu118 wheels do not work on old drivers #130684

Closed

ppwwyyxx opened this issue Jul 13, 2024 · 4 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general) · module: third_party · oncall: pt2 · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) · upstream triton (Upstream Triton Issue)

ppwwyyxx (Collaborator) commented Jul 13, 2024

🐛 Describe the bug

PyTorch 2.4 uses a new version of Triton that adds the cuTensorMapEncodeTiled API (triton-lang/triton@7289a23#diff-0d645ca31937abba9a3357062ee2c3708f6d49f66d7842d5f6577a2044f962f5).

This API requires a sufficiently new NVIDIA driver; on older drivers, Triton refuses to compile anything. To reproduce, run any Triton kernel (the traceback below is from Triton's 06-fused-attention.py tutorial):

Traceback (most recent call last):
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 81, in <module>
    configs = [
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 85, in <listcomp>
    for s in ([1] if is_hip() else [3, 4, 7])\
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 22, in is_hip
    return triton.runtime.driver.active.get_current_target().backend == "hip"
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
    self.utils = CudaUtils()  # TODO: make static
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 62, in compile_module_from_src
    mod = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/XXXg/.triton/cache/2920354f453efffb492e73b112abcee1d2d301a37ade21e318a1ba26fa4fcd7c/cuda_utils.so: undefined symbol: cuTensorMapEncodeTiled

My driver version is: NVIDIA-SMI 470.161.03, Driver Version: 470.161.03. Note that this driver had been running older PyTorch cu118 wheels without problems.
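A quick way to check whether the installed driver exports the symbol at all is a ctypes lookup against libcuda (a minimal sketch, not part of the original report):

```python
import ctypes

# Load the NVIDIA driver's CUDA library; this is the library whose exported
# symbols Triton's compiled cuda_utils module links against.
libcuda = ctypes.CDLL("libcuda.so.1")

# ctypes raises AttributeError for symbols the library does not export,
# so hasattr doubles as a presence check.
if hasattr(libcuda, "cuTensorMapEncodeTiled"):
    print("driver exports cuTensorMapEncodeTiled")
else:
    print("symbol missing: this driver is too old for Triton 3.0's cuda_utils")
```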

Related issue: triton-lang/triton#2062

Versions

PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.0 (default, Dec 18 2023, 03:34:21) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.250-2-velinux1u1-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY


Nvidia driver version: 470.161.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0

cc @ptrblck @msaroufim @ezyang @anijain2305 @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @chenyang78 @embg @malfet @seemethere

@colesbury colesbury added module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jul 15, 2024
@malfet malfet added oncall: pt2 module: third_party upstream triton Upstream Triton Issue and removed module: build Build system issues labels Jul 15, 2024
malfet (Contributor) commented Jul 15, 2024

Removing module: build and adding oncall: pt2 and upstream triton labels.
I remember seeing a similar issue around the 2.2 timeframe and a PR that fixed the problem: https://github.com/triton-lang/triton/pull/2771/files#diff-9c7ee36285036f90cb3f4fdec9c78cd2b9bc8229f52076e16f05eeab65f40a20 but it looks like it has resurfaced via a different API call.

Tentative fix triton-lang/triton#4330
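The shape of both fixes is to stop resolving the symbol at load time and look it up on first use instead, so that loading Triton still succeeds on old drivers and only the code path that needs the new API fails. A rough Python analogy of that dlsym-style pattern (the real fix lives in Triton's C driver code; the wrapper name here is illustrative):

```python
import ctypes

_libcuda = ctypes.CDLL("libcuda.so.1")
_fn = None

def cu_tensor_map_encode_tiled(*args):
    # Hypothetical wrapper: resolve the symbol lazily so that module load
    # never fails; old drivers only error if this API is actually called.
    global _fn
    if _fn is None:
        try:
            _fn = _libcuda.cuTensorMapEncodeTiled
        except AttributeError:
            raise RuntimeError("driver too old: cuTensorMapEncodeTiled unavailable")
    return _fn(*args)
```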

ptrblck (Collaborator) commented Jul 16, 2024

It's unclear to me whether PyTorch + CUDA 11.8 + Triton is a supported and valid combination: #106144 (comment) and #115075 both point to compatibility issues with Triton.

@malfet We should discuss how much effort to spend on fixing Triton support for CUDA 11, given that the PyTorch + CUDA 11.8 binaries are built for users who cannot update their drivers and are using older GPU architectures (for which Triton might offer limited speedup opportunities, but please correct me if that's not the case).
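Until support is settled, one workaround for users stuck on old drivers (a sketch, not something proposed in this thread) is to probe the same Triton driver init that fails in the traceback above and fall back to the eager backend:

```python
import torch

def triton_usable() -> bool:
    # Probe the call that fails on old drivers in the traceback above;
    # any failure here means Inductor's Triton backend cannot be used.
    try:
        import triton
        triton.runtime.driver.active.get_current_target()
        return True
    except Exception:
        return False

model = torch.nn.Linear(8, 8).cuda()
compiled = torch.compile(model, backend="inductor" if triton_usable() else "eager")
```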

atalman (Contributor) commented Jul 22, 2024

Confirmed that this is resolved with 2.4. Running as follows:

TRITON_PTXAS_PATH=/usr/local/lib/python3.10/site-packages/torch/bin/ptxas  python smoke_test.py --package torchonly
torch: 2.4.0+cu118
ATen/Parallel:
	at::get_num_threads() : 8
	at::get_num_interop_threads() : 16
OpenMP 201511 (a.k.a. OpenMP 4.5)
	omp_get_max_threads() : 8
Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
	mkl_get_max_threads() : 8
Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
std::thread::hardware_concurrency() : 16
Environment variables:
	OMP_NUM_THREADS : [not set]
	MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

Skip version check for channel None as stable version is None
Testing smoke_test_conv2d
Testing smoke_test_linalg on cpu
Testing smoke_test_compile for cuda and torch.float16
Testing smoke_test_compile for cuda and torch.float32
Testing smoke_test_compile for cuda and torch.float64
Testing smoke_test_compile with mode 'max-autotune' for torch.float32
AUTOTUNE convolution(64x32x26x26, 64x32x3x3)
  triton_convolution_9 0.0963 ms 100.0%
  triton_convolution_7 0.0983 ms 97.9%
....
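For reference, the same environment variable can also be set from Python, as long as it happens before anything triggers Triton compilation (path copied from the command above; adjust for your install):

```python
import os

# Point Triton at the ptxas binary bundled with the PyTorch wheel.
os.environ["TRITON_PTXAS_PATH"] = (
    "/usr/local/lib/python3.10/site-packages/torch/bin/ptxas"
)
```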

masnesral (Contributor) commented

> Confirmed with 2.4, this is resolved

@atalman I'm assuming we can close this then. Please reopen if I misunderstood.
