SDPA: CUDNN backend error w/ q_seq_len = 1 #138529

Open
drisspg opened this issue Oct 22, 2024 · 2 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@drisspg
Contributor
drisspg commented Oct 22, 2024

Summary

Repro script

import torch
import torch.nn as nn
import torch.nn.functional as F


q = torch.randn(1, 16, 1, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
k = torch.randn(1, 16, 2**16, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
v = torch.randn(1, 16, 2**16, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
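# Note: q has seq_len 1 while k and v have seq_len 2**16; the q_seq_len == 1 case is what trips the cuDNN backend.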


from torch.nn.attention import sdpa_kernel, SDPBackend    

with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
    out.backward(torch.ones_like(out))

Error:

/home/drisspg/meta/pytorch/torch/autograd/graph.py:825: UserWarning: cuDNN SDPA backward got an innermost stride of 0 in grad_out, which is unsupported. Materializing a contiguous tensor which will increase memory usage... (Triggered internally at /home/drisspg/meta/pytorch/aten/src/ATen/native/cudnn/MHA.cpp:664.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/drisspg/meta/scripts/sdpa/repro_gqa.py", line 15, in <module>
    out.sum().backward()
  File "/home/drisspg/meta/pytorch/torch/_tensor.py", line 624, in backward
    torch.autograd.backward(
  File "/home/drisspg/meta/pytorch/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/home/drisspg/meta/pytorch/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.
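
Until this is fixed, one workaround is to route this shape to a different SDPA backend. A minimal sketch under the same setup as the repro (choosing flash/efficient attention as the fallback is my assumption, not something prescribed in this issue):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(1, 16, 1, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
k = torch.randn(1, 16, 2**16, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
v = torch.randn(1, 16, 2**16, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)

# Allow flash and memory-efficient attention for this shape instead of cuDNN.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
    out.backward(torch.ones_like(out))
```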

cc @ptrblck @msaroufim @eqy @mikaylagawarecki

@drisspg drisspg added the module: cuda (Related to torch.cuda, and CUDA support in general) and module: multi-headed-attention labels Oct 22, 2024
@drisspg
Contributor Author
drisspg commented Oct 22, 2024

Frontend log: frontendlog.txt

Backend log: backendlog.txt

@eqy
Collaborator
eqy commented Oct 22, 2024

Looks like it's not happy with sequence length 1 rather than the mismatched s_q vs. s_kv; forwarding to the cuDNN team...
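
A caller-side guard along those lines keeps small-query shapes away from cuDNN until the dispatcher does this itself. This is a sketch: the helper name is made up, and the choice of fallback backends is an assumption, not something stated in this thread.

```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

def sdpa_avoid_cudnn_for_short_queries(q, k, v, **kwargs):
    # Hypothetical helper: skip cuDNN attention when the query sequence
    # length is 1, the case this issue reports as unsupported.
    if q.size(-2) == 1:
        backends = [SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION]
    else:
        backends = [SDPBackend.CUDNN_ATTENTION]
    with sdpa_kernel(backends):
        return F.scaled_dot_product_attention(q, k, v, **kwargs)
```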

@drisspg drisspg changed the title from "CuDNN Backend cuDNN Frontend error" to "SDPA: CUDNN backend error w/ q_seq_len = 1" Oct 22, 2024
drisspg added a commit that referenced this issue Oct 22, 2024
# Summary
Currently we have a `cudnn_order` that says that on H100, with a new enough cuDNN backend (we ship a 9.1 version in OSS), we try to run cuDNN attention first. We have already encountered a few bugs with the release of 2.5:

1. #138529
2. huggingface/diffusers#9704
3. #138354

In light of the above, we are going to make the cuDNN backend opt-in rather than on by default.

This can be done easily with the context manager for choosing backends, i.e.:
```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```

This PR puts the cuDNN backend at the lowest precedence in the backend list, meaning that the Math backend will always be chosen ahead of it unless the other backends are explicitly disabled via the context manager, i.e. cuDNN attention now has to be opted into.


Cc atalman

cc mikaylagawarecki

[ghstack-poisoned]
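
For reference, `sdpa_kernel` also accepts a list of backends, so callers who opt into cuDNN attention can keep other backends enabled as fallbacks. A sketch, not taken from the PR; `q`, `k`, and `v` are assumed to be defined as in the repro above:

```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Opt in to cuDNN attention while keeping flash attention enabled as a
# fallback for inputs cuDNN declines to handle.
with sdpa_kernel([SDPBackend.CUDNN_ATTENTION, SDPBackend.FLASH_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
```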
pytorchbot pushed a commit that referenced this issue Oct 22, 2024
Pull Request resolved: #138522
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet

(cherry picked from commit 9a9a0ab)
kit1980 pushed a commit that referenced this issue Oct 22, 2024
[SDPA-CUDNN] Make CuDNN Attention Opt in (#138522)

Pull Request resolved: #138522
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet

(cherry picked from commit 9a9a0ab)

Co-authored-by: drisspg <drisspguessous@gmail.com>
@bdhirsh bdhirsh added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Oct 23, 2024
SamGinzburg pushed a commit that referenced this issue Oct 28, 2024
Pull Request resolved: #138522
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet
pytorchmergebot pushed a commit that referenced this issue Oct 29, 2024
Forwarded #138529 to the cuDNN team, but for now we want to avoid dispatching to unsupported cases.

Pull Request resolved: #138531
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this issue Oct 29, 2024
Pull Request resolved: pytorch#138531
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this issue Nov 5, 2024
Pull Request resolved: pytorch#138531
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
@drisspg drisspg added the module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention) label and removed the module: multi-headed-attention label Nov 27, 2024