Closed
Labels
module: rocm (AMD GPU support for Pytorch), oncall: distributed (add this issue/PR to distributed oncall triage queue)
Description
🐛 Describe the bug
AMDGPU build without FLASH_ATTENTION fails to compile attention_backward.hip with:
```
cd /var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /usr/bin/cmake -E make_directory /var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELWITHDEBINFO -D generated_file:STRING=/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
clang++: warning: argument unused during compilation: '--offload-compress' [-Wunused-command-line-argument]
/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip:335:3: error: use of undeclared identifier 'CHECK_NOSPARSE_CONTIGUOUS_CUDA'
  335 |   CHECK_NOSPARSE_CONTIGUOUS_CUDA(grad_out);
      |   ^
/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip:337:3: error: use of undeclared identifier 'CHECK_NOSPARSE_LASTCONTIGUOUS_CUDA'
  337 |   CHECK_NOSPARSE_LASTCONTIGUOUS_CUDA(query);
      |   ^
/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip:338:3: error: use of undeclared identifier 'CHECK_NOSPARSE_LASTCONTIGUOUS_CUDA'
  338 |   CHECK_NOSPARSE_LASTCONTIGUOUS_CUDA(key);
      |   ^
/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip:339:3: error: use of undeclared identifier 'CHECK_NOSPARSE_LASTCONTIGUOUS_CUDA'
  339 |   CHECK_NOSPARSE_LASTCONTIGUOUS_CUDA(value);
      |   ^
/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip:349:5: error: use of undeclared identifier 'CHECK_NOSPARSE_CONTIGUOUS_CUDA'
  349 |     CHECK_NOSPARSE_CONTIGUOUS_CUDA((*cu_seqlens_q));
      |     ^
/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip:350:5: error: use of undeclared identifier 'CHECK_NOSPARSE_CONTIGUOUS_CUDA'
  350 |     CHECK_NOSPARSE_CONTIGUOUS_CUDA((*cu_seqlens_k));
      |     ^
6 errors generated when compiling for gfx1100.
failed to execute:/usr/lib/llvm/20/bin/clang++ --offload-arch=gfx1100 -O3 -c -x hip /var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/native/transformers/hip/attention_backward.hip -o "/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o" --offload-compress -DUSE_NCCL -DUSE_ROCM -D__HIP_PLATFORM_AMD__ -DUSE_MEM_EFF_ATTENTION -DTORCH_HIP_BUILD_MAIN_LIB -DROCM_VERSION=60403 -DTORCH_HIP_VERSION=604 -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -D__HIP_PLATFORM_AMD__=1 -DUSE_PROF_API=1 -DGFLAGS_IS_A_DLL=0 -DGLOG_CUSTOM_PREFIX_SUPPORT -DAT_PER_OPERATOR_HEADERS -DPROTOBUF_USE_DLLS -D__HIP_PLATFORM_AMD__ -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=604 -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-duplicate-decl-specifier -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -DHIPBLAS_V2 -DHIP_ENABLE_WARP_SYNC_BUILTINS -fno-gpu-rdc -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0 -I/usr/include -I/usr/include/eigen3 -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/moodycamel -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/INTERFACE -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/third_party/concurrentqueue -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/THH -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/hip -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/../../../../composable_kernel-8086bbe3a78d931eb96fe12fdc014082e18d18d3/include -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/../../../../composable_kernel-8086bbe3a78d931eb96fe12fdc014082e18d18d3/library/include -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/aten/src/ATen/composable_kernel -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/caffe2/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src/ATen/.. -I/usr/include -I/usr/include/rocblas -I/usr/include -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/c10/hip/../.. -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/c10/../ -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build -I/usr/include -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/torch/csrc/api -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/torch/csrc/api/include -I/usr/include -I/usr/include/rocblas -I/usr/include -I/usr/include/hiprand -I/usr/include -I/usr/include/rocrand -I/usr/include -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/aten/src -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0_build -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0 -I/usr/include -I/usr/include/eigen3 -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/moodycamel -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/INTERFACE -I/var/tmp/portage/sci-ml/caffe2-2.8.0/work/pytorch-2.8.0/third_party/concurrentqueue
```

(Consecutive duplicate `-I/usr/include` flags in the failed command have been collapsed; they are no-ops to the compiler.)
These macros are defined in a header that is only included conditionally. From pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu, lines 36 to 39 at ba56102:

```cpp
#ifdef USE_FLASH_ATTENTION
// FlashAttention Specific Imports
#include <ATen/native/transformers/cuda/flash_attn/flash_api.h>
#endif
```
In the previous release these macros lived in aotriton_adapter.h; commit d547a56 moved them into flash_api.h, which is only included when FLASH_ATTENTION is enabled, so builds without it no longer see the definitions.
Full build log: build.log
CC @xw285cornell @jianyuh @houseroad
Versions
2.8.0
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd