[ROCm] Prevent accidental enablement of efficient attention. #133331

xinyazhang · 2024-08-13T17:15:28Z

Currently, Efficient Attention and Flash Attention share the same set of GPU
kernels on ROCm and therefore have the same limitations on head sizes.
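The idea can be illustrated with a minimal Python sketch. On ROCm, the Efficient Attention eligibility check reuses the Flash Attention head-size check, because both backends run the same kernels. This is not the code from this PR. The function names, the assumed 256 cap for the shared ROCm kernel, and the multiple-of-8 rule used as a placeholder for other platforms are all hypothetical.

```python
# HYPOTHETICAL limits for illustration only; real values depend on the
# PyTorch/ROCm version and GPU architecture.
ROCM_SHARED_KERNEL_MAX_HEAD_DIM = 256  # assumed cap of the shared ROCm kernel
NON_ROCM_HEAD_DIM_MULTIPLE = 8         # assumed placeholder rule elsewhere


def flash_head_dim_ok_rocm(head_dim: int) -> bool:
    """Hypothetical head-size check for Flash Attention on ROCm."""
    return 0 < head_dim <= ROCM_SHARED_KERNEL_MAX_HEAD_DIM


def efficient_head_dim_ok(head_dim: int, is_rocm: bool) -> bool:
    """Hypothetical head-size check for Efficient Attention.

    On ROCm, Efficient Attention runs the same GPU kernels as Flash
    Attention, so it reuses the Flash check instead of its own rule.
    Without this, it could be selected for head sizes that the shared
    kernel cannot handle.
    """
    if is_rocm:
        return flash_head_dim_ok_rocm(head_dim)
    # Placeholder for a separate, backend-specific rule on other platforms.
    return head_dim > 0 and head_dim % NON_ROCM_HEAD_DIM_MULTIPLE == 0
```

Under these assumed limits, a head size of 512 passes the non-ROCm rule but is rejected on ROCm. That is the kind of accidental enablement the title refers to.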

Fixes #132004

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

pytorch-bot · 2024-08-13T17:15:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133331

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit e094699 with merge base 89795da:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

rocm / linux-focal-rocm6.1-py3.8 / test (default, 1, 6, linux.rocm.gpu.2) (gh) (disabled by #126853 but the issue was closed recently and a rebase is needed to make it pass)
inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_vertical_pointwise_reduction_fusion_cuda

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

rocm / linux-focal-rocm6.1-py3.8 / test (default, 2, 6, linux.rocm.gpu.2) (gh) (trunk failure)
'test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet · 2024-08-16T18:16:09Z

@xinyazhang is this ready for review? If so, can you please remove the draft status?

xinyazhang · 2024-08-16T18:17:22Z

is this ready for review? If so, can you please remove the draft status?

Yes, this is ready. I'll implement your suggestion and move it out of draft status.

malfet · 2024-08-26T20:38:43Z

@pytorchbot merge

pytorchmergebot · 2024-08-26T20:40:26Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

xinyazhang · 2024-08-26T20:42:40Z

@pytorchbot label "topic: not user facing"

xinyazhang · 2024-08-26T20:46:25Z

@pytorchbot merge

pytorchmergebot · 2024-08-26T20:48:06Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2024-08-27T00:00:58Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

jithunnair-amd · 2024-08-27T00:01:52Z

@pytorchbot merge -f "Unrelated CI failures. Critical fix needed for 2.4.1"

pytorchmergebot · 2024-08-27T00:03:35Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

jithunnair-amd · 2024-08-27T00:04:35Z

@pytorchbot cherry-pick --onto release/2.4 -c critical

Currently Efficient attention and Flash attention share the same set of GPU kernels on ROCM and have common limitations on head sizes. Fixes #132004 Pull Request resolved: #133331 Approved by: https://github.com/malfet, https://github.com/jithunnair-amd (cherry picked from commit 46ecc67)

pytorchbot · 2024-08-27T00:08:43Z

Cherry picking #133331

The cherry pick PR is at #134531 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

[v2.4.1] Release Tracker #132400 (comment)

[ROCm] Prevent accidental enablement of efficient attention. (#133331) Currently Efficient attention and Flash attention share the same set of GPU kernels on ROCM and have common limitations on head sizes. Fixes #132004 Pull Request resolved: #133331 Approved by: https://github.com/malfet, https://github.com/jithunnair-amd (cherry picked from commit 46ecc67) Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>

…#134531) [ROCm] Prevent accidental enablement of efficient attention. (pytorch#133331) Currently Efficient attention and Flash attention share the same set of GPU kernels on ROCM and have common limitations on head sizes. Fixes pytorch#132004 Pull Request resolved: pytorch#133331 Approved by: https://github.com/malfet, https://github.com/jithunnair-amd (cherry picked from commit 46ecc67) Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>

…#134531) (#1565) [ROCm] Prevent accidental enablement of efficient attention. (pytorch#133331) Currently Efficient attention and Flash attention share the same set of GPU kernels on ROCM and have common limitations on head sizes. Pull Request resolved: pytorch#133331 Approved by: https://github.com/malfet, https://github.com/jithunnair-amd (cherry picked from commit 46ecc67) Fixes pytorch#132004 Co-authored-by: pytorchbot <soumith+bot@pytorch.org>

…#133331) Currently Efficient attention and Flash attention share the same set of GPU kernels on ROCM and have common limitations on head sizes. Fixes pytorch#132004 Pull Request resolved: pytorch#133331 Approved by: https://github.com/malfet, https://github.com/jithunnair-amd

xinyazhang added 2 commits August 13, 2024 17:12

Prevent accidental enablement of efficient attention.

5be3b68

This fixes pytorch#132004

Fix test_invalid_fused_inputs_head_dim_kernel1_cuda and test_unaligned_tensors_cuda

5f02177

ROCm's Efficient Attention (whose GPU kernel is shared with FA) is more tolerant of its inputs.

pytorchbot added the open source label Aug 13, 2024

pruthvistony added the module: rocm, rocm, rocm priority, ciflow/rocm, and topic: bug fixes labels Aug 13, 2024

pruthvistony added this to the 2.4.1 milestone Aug 13, 2024

malfet reviewed Aug 13, 2024

test/test_transformers.py (outdated, resolved)

Use xfailIfRocm for test_invalid_fused_inputs_head_dim

efe5bf3

malfet reviewed Aug 16, 2024

test/test_transformers.py (outdated, resolved)

Prefer context manager over xfail

12c92eb
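The review suggestion to prefer a context manager over xfail can be illustrated with a small, self-contained `unittest` sketch. Everything here is hypothetical. `run_fused_attention` stands in for a real kernel launch, and the platform is passed as a plain `is_rocm` parameter. The rule that non-ROCm rejects head size 203 while ROCm tolerates it is an assumption modeled on the commit note that ROCm's Efficient Attention is more tolerant of its inputs. This is not the actual test code from `test/test_transformers.py`.

```python
import contextlib
import unittest


def run_fused_attention(head_dim: int, is_rocm: bool) -> None:
    """HYPOTHETICAL launcher: non-ROCm rejects unaligned head sizes,
    while the (assumed) ROCm kernel tolerates them."""
    if not is_rocm and head_dim % 8 != 0:
        raise RuntimeError("unsupported head_dim")


class InvalidHeadDimTest(unittest.TestCase):
    def check(self, is_rocm: bool) -> None:
        # Pick the expectation per platform instead of marking the whole
        # test as an expected failure on ROCm; the test body is shared.
        if is_rocm:
            ctx = contextlib.nullcontext()
        else:
            ctx = self.assertRaises(RuntimeError)
        with ctx:
            run_fused_attention(head_dim=203, is_rocm=is_rocm)

    def test_non_rocm_rejects(self):
        self.check(is_rocm=False)

    def test_rocm_tolerates(self):
        self.check(is_rocm=True)


if __name__ == "__main__":
    unittest.main()
```

With this approach, the test states the expected behavior on each platform directly. It does not mark the ROCm run as an expected failure of an assertion written for the other platform.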

xinyazhang marked this pull request as ready for review August 16, 2024 18:22

xinyazhang mentioned this pull request Aug 16, 2024

Improve Backward Performance and Navi31 Support ROCm/aotriton#39

Merged

colesbury added the triaged label Aug 16, 2024

Fix lint

e094699

jithunnair-amd requested a review from malfet August 22, 2024 22:55

jeffdaily changed the title from "Prevent accidental enablement of efficient attention." to "[ROCm] Prevent accidental enablement of efficient attention." Aug 26, 2024

malfet approved these changes Aug 26, 2024

pytorch-bot added the ciflow/trunk label Aug 26, 2024

pytorchmergebot added the merging label Aug 26, 2024

pytorchmergebot removed the merging label Aug 26, 2024

pytorch-bot added the topic: not user facing label Aug 26, 2024

jithunnair-amd approved these changes Aug 26, 2024

pytorchmergebot added the merging label Aug 26, 2024

pytorchmergebot added the Merged label Aug 27, 2024

pytorchmergebot closed this in 46ecc67 Aug 27, 2024

pytorchmergebot removed the merging label Aug 27, 2024

pytorchbot mentioned this pull request Aug 27, 2024

[v2.4.1] Release Tracker #132400

Closed

jithunnair-amd mentioned this pull request Aug 28, 2024

DISABLED test_transformerencoderlayer_cuda_float32 (__main__.TestNNDeviceTypeCUDA) #134687

Open

xinyazhang mentioned this pull request Aug 28, 2024

[ROCm] Prevent accidental enablement of efficient attention. (#134531) ROCm/pytorch#1565

Merged

atalman mentioned this pull request Aug 28, 2024

Release 2.4.1 validations checklist and cherry-picks #134694

Closed

40 tasks
