pytorch/pytorch · Commit a99cc48
ROCm SDPA: Ensure attn_mask has the same dtype with q (#144398)
ROCm SDPA: Ensure attn_mask has the same dtype with q (#143242)

This is required by the current AOTriton backend. Fixes NaN outputs from the SDPA memory-efficient (ME) backend when `q.dtype() != attn_mask.dtype()`, observed when training llama2 using transformers+deepspeed+pytorch.

The corresponding CUDA check appears to be here:
https://github.com/pytorch/pytorch/blob/708ce3c0082d670d9eaff84bc3c43cad4554a75d/aten/src/ATen/native/transformers/cuda/attention.cu#L1331-L1336

Pull Request resolved: #143242
Approved by: https://github.com/jeffdaily

(cherry picked from commit 3068ce0)
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
1 parent 4d9de27 commit a99cc48
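
To make the failure mode concrete, here is a minimal user-level sketch in Python. The shapes, sizes, and the all-zeros additive mask are illustrative assumptions, not taken from the original bug report: a floating-point attn_mask whose dtype differs from q could previously yield NaN from the ROCm ME backend; casting the mask to q's dtype, or passing a boolean mask, satisfies the new check.

import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim); not from the report.
q = torch.randn(1, 4, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# An fp32 additive mask against fp16 q/k/v: the mismatch this commit guards.
mask = torch.zeros(1, 4, 8, 8, device="cuda", dtype=torch.float32)

# Either workaround passes the new ROCm eligibility check:
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask.to(q.dtype))

# Boolean masks are also accepted (True = position may be attended).
bool_mask = torch.ones(1, 4, 8, 8, device="cuda", dtype=torch.bool)
out2 = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)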

File tree

1 file changed: +8 −0 lines

aten/src/ATen/native/transformers/cuda/sdp_utils.cpp

Lines changed: 8 additions & 0 deletions
@@ -705,6 +705,14 @@ bool can_use_mem_efficient_attention(sdp_params const& params, bool debug) {
   }

 #ifdef USE_ROCM
+  if (params.attn_mask.has_value()) {
+    const auto q_dtype = params.query.dtype();
+    const auto bias_dtype = params.attn_mask.value().dtype();
+    if (bias_dtype != at::kBool && bias_dtype != q_dtype) {
+      TORCH_WARN("Efficient attention on ROCM requires attn_mask be boolean, or has the same datatype as of q,k,v");
+      return false;
+    }
+  }
   return check_tensor_dtype(params, aotriton_mem_efficient_dtypes, debug);
 #else
   auto dprop = at::cuda::getCurrentDeviceProperties();
