Pytorch 2.4 RC cu118 wheels do not work on old drivers #130684

Closed

ppwwyyxx opened this issue Jul 13, 2024 · 4 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general) · module: third_party · oncall: pt2 · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) · upstream triton (Upstream Triton Issue)

ppwwyyxx (Collaborator) commented Jul 13, 2024

🐛 Describe the bug

PyTorch 2.4 uses a new version of Triton that adds the cuTensorMapEncodeTiled API (triton-lang/triton@7289a23#diff-0d645ca31937abba9a3357062ee2c3708f6d49f66d7842d5f6577a2044f962f5).

This API requires a sufficiently new NVIDIA driver; on older drivers, Triton refuses to compile anything. To reproduce, run any Triton kernel (the traceback below is from Triton's 06-fused-attention.py tutorial):

Traceback (most recent call last):
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 81, in <module>
    configs = [
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 85, in <listcomp>
    for s in ([1] if is_hip() else [3, 4, 7])\
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 22, in is_hip
    return triton.runtime.driver.active.get_current_target().backend == "hip"
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
    self.utils = CudaUtils()  # TODO: make static
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 62, in compile_module_from_src
    mod = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/XXXg/.triton/cache/2920354f453efffb492e73b112abcee1d2d301a37ade21e318a1ba26fa4fcd7c/cuda_utils.so: undefined symbol: cuTensorMapEncodeTiled

My driver version is: NVIDIA-SMI 470.161.03, Driver Version: 470.161.03. Note that this driver had been running older PyTorch cu118 wheels without problems.
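A quick way to check whether the installed driver exports the symbol at all is a ctypes lookup against libcuda (a minimal sketch, not part of the original report):

```python
import ctypes

# Load the NVIDIA driver's CUDA library; this is the library whose exported
# symbols Triton's compiled cuda_utils module links against.
libcuda = ctypes.CDLL("libcuda.so.1")

# ctypes raises AttributeError for symbols the library does not export,
# so hasattr doubles as a presence check.
if hasattr(libcuda, "cuTensorMapEncodeTiled"):
    print("driver exports cuTensorMapEncodeTiled")
else:
    print("symbol missing: this driver is too old for Triton 3.0's cuda_utils")
```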

Related issue: triton-lang/triton#2062

Versions

PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.0 (default, Dec 18 2023, 03:34:21) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.250-2-velinux1u1-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY


Nvidia driver version: 470.161.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0

cc @ptrblck @msaroufim @ezyang @anijain2305 @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @chenyang78 @embg @malfet @seemethere

@colesbury colesbury added module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jul 15, 2024
@malfet malfet added oncall: pt2 module: third_party upstream triton Upstream Triton Issue and removed module: build Build system issues labels Jul 15, 2024
malfet (Contributor) commented Jul 15, 2024

Removing module: build and adding oncall: pt2 and upstream triton labels.
I remember seeing a similar issue around the 2.2 timeframe and a PR that fixed the problem: https://github.com/triton-lang/triton/pull/2771/files#diff-9c7ee36285036f90cb3f4fdec9c78cd2b9bc8229f52076e16f05eeab65f40a20 but it looks like it has resurfaced via a different API call.

Tentative fix triton-lang/triton#4330
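The shape of both fixes is to stop resolving the symbol at load time and look it up on first use instead, so that loading Triton still succeeds on old drivers and only the code path that needs the new API fails. A rough Python analogy of that dlsym-style pattern (the real fix lives in Triton's C driver code; the wrapper name here is illustrative):

```python
import ctypes

_libcuda = ctypes.CDLL("libcuda.so.1")
_fn = None

def cu_tensor_map_encode_tiled(*args):
    # Hypothetical wrapper: resolve the symbol lazily so that module load
    # never fails; old drivers only error if this API is actually called.
    global _fn
    if _fn is None:
        try:
            _fn = _libcuda.cuTensorMapEncodeTiled
        except AttributeError:
            raise RuntimeError("driver too old: cuTensorMapEncodeTiled unavailable")
    return _fn(*args)
```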

ptrblck (Collaborator) commented Jul 16, 2024

It's unclear to me whether PyTorch + CUDA 11.8 + Triton is a supported and valid combination: #106144 (comment) and #115075 both point to compatibility issues with Triton.

@malfet We should discuss how much effort to spend on fixing Triton support for CUDA 11, given that the PyTorch + CUDA 11.8 binaries are built for users who cannot update their drivers and are using older GPU architectures (for which Triton might offer limited speedup opportunities, but please correct me if that's not the case).
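Until support is settled, one workaround for users stuck on old drivers (a sketch, not something proposed in this thread) is to probe the same Triton driver init that fails in the traceback above and fall back to the eager backend:

```python
import torch

def triton_usable() -> bool:
    # Probe the call that fails on old drivers in the traceback above;
    # any failure here means Inductor's Triton backend cannot be used.
    try:
        import triton
        triton.runtime.driver.active.get_current_target()
        return True
    except Exception:
        return False

model = torch.nn.Linear(8, 8).cuda()
compiled = torch.compile(model, backend="inductor" if triton_usable() else "eager")
```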

atalman (Contributor) commented Jul 22, 2024

Confirmed that this is resolved with 2.4. Running as follows:

TRITON_PTXAS_PATH=/usr/local/lib/python3.10/site-packages/torch/bin/ptxas  python smoke_test.py --package torchonly
torch: 2.4.0+cu118
ATen/Parallel:
	at::get_num_threads() : 8
	at::get_num_interop_threads() : 16
OpenMP 201511 (a.k.a. OpenMP 4.5)
	omp_get_max_threads() : 8
Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
	mkl_get_max_threads() : 8
Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
std::thread::hardware_concurrency() : 16
Environment variables:
	OMP_NUM_THREADS : [not set]
	MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

Skip version check for channel None as stable version is None
Testing smoke_test_conv2d
Testing smoke_test_linalg on cpu
Testing smoke_test_compile for cuda and torch.float16
Testing smoke_test_compile for cuda and torch.float32
Testing smoke_test_compile for cuda and torch.float64
Testing smoke_test_compile with mode 'max-autotune' for torch.float32
AUTOTUNE convolution(64x32x26x26, 64x32x3x3)
  triton_convolution_9 0.0963 ms 100.0%
  triton_convolution_7 0.0983 ms 97.9%
....
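For reference, the same environment variable can also be set from Python, as long as it happens before anything triggers Triton compilation (path copied from the command above; adjust for your install):

```python
import os

# Point Triton at the ptxas binary bundled with the PyTorch wheel.
os.environ["TRITON_PTXAS_PATH"] = (
    "/usr/local/lib/python3.10/site-packages/torch/bin/ptxas"
)
```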

masnesral (Contributor) commented

> Confirmed with 2.4, this is resolved

@atalman I'm assuming we can close this then. Please reopen if I misunderstood.
