Description
🐛 Describe the bug
By default, Triton release/3.5.x ships a ptxas binary based on CUDA 12.8.

**In environments where the latest CTK is NOT installed**

Compared to the ptxas from CUDA 13.0, the CUDA 12.8 ptxas cannot handle the THOR device or other newer devices. (THOR underwent a renaming; see llvm/llvm-project#156096 for the related background issue. Note that this LLVM issue has been fixed in triton/3.5.x via triton-lang/llvm-project#2, which can be verified with CTK 13.0; it is referenced here only for the renaming context.)
Users on THOR would encounter:
```
ptxas fatal : Value 'sm_110a' is not defined for option 'gpu-name'
```
Users on an SM_121 device (https://docs.nvidia.com/cuda/pdf/CUDA_Features_Archive.pdf) would encounter:
```
ptxas fatal : Value 'sm_121a' is not defined for option 'gpu-name'
```
See also the report llvm/llvm-project#156096 (comment) from @mcr-ksh
**In environments where the latest CTK is installed**

Users may still need an explicit `export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas` to get Triton to pick up the right ptxas.
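A minimal sketch of that workaround, assuming a default CUDA 13.0 install location (adjust `/usr/local/cuda` to your installation):

```shell
# Point Triton at the system ptxas from the installed CTK instead of the
# CUDA 12.8-based one it bundles, then confirm which toolkit it comes from.
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
"$TRITON_PTXAS_PATH" --version   # should report a CUDA 13.x release
```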
We have a few options:
- According to @ptrblck, one workaround could be to ship both a ptxas12 and a ptxas13 binary and select the appropriate one with a runtime check of the PyTorch/CUDA version. We did this in the past for Blackwell (using ptxas_blackwell) when ptxas==12.8.
- Have the PyTorch cu126/cu128/cu130 wheels ship a suitable ptxas, so Triton won't need to bundle one.
- Build Triton CUDA wheels separately for cu126/cu128/cu130.

Option 1 seems doable for the final v2.9 RC. Thoughts?
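The runtime check in option 1 could look roughly like the sketch below. The binary names (`ptxas-cu12`, `ptxas-cu13`) and the selection policy are assumptions for illustration only, not Triton's actual packaging layout:

```python
def pick_ptxas(cuda_version: str) -> str:
    """Return the bundled ptxas binary name for a 'major.minor' CUDA version.

    Hypothetical helper: a CUDA 13+ toolchain understands the renamed
    targets (e.g. sm_110a on THOR), while older toolchains must fall
    back to the CUDA 12-based ptxas.
    """
    major = int(cuda_version.split(".")[0])
    return "ptxas-cu13" if major >= 13 else "ptxas-cu12"

print(pick_ptxas("12.8"))  # ptxas-cu12
print(pick_ptxas("13.0"))  # ptxas-cu13
```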
cc @seemethere @malfet @atalman @ptrblck @eqy @tinglvv @xwang233 @davidberard98
Versions
Triton release/3.5.x