Enable non power of 2 head_dim for FlexAttention #133495

drisspg · 2024-08-14T19:24:38Z

Stack from ghstack (oldest at bottom):

-> Enable non power of 2 head_dim for FlexAttention #133495

Summary

Adds support for non-power-of-2 head_dim by launching blocks with head_dim rounded up to the next power of 2.
The other option I considered was building up the final dot products from smaller blocks. That would probably work, but for the sake of code complexity I am going with this option for now.
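
To make the rounding approach concrete, here is a minimal sketch in plain Python. It is not the PR's Triton kernel: it assumes the padded lanes are masked to zero on load, and `next_power_of_2` and `padded_dot` are hypothetical helper names chosen for illustration.

```python
from typing import Sequence


def next_power_of_2(n: int) -> int:
    """Return the smallest power of 2 that is >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def padded_dot(q: Sequence[float], k: Sequence[float]) -> float:
    """Dot product over a power-of-2 sized block, emulating masked loads."""
    head_dim = len(q)
    block = next_power_of_2(head_dim)
    # Emulated masked load: lanes at or beyond head_dim read as 0.0
    # instead of touching out-of-bounds memory.
    q_blk = [q[i] if i < head_dim else 0.0 for i in range(block)]
    k_blk = [k[i] if i < head_dim else 0.0 for i in range(block)]
    return sum(a * b for a, b in zip(q_blk, k_blk))
```

Because the padded lanes contribute zero, a head_dim of 96 would run in a 128-wide block and still produce the same dot product as the unpadded computation.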

Corollary

We had a bug in our backward kernel where we were using index_k instead of index_v. This should have shown up for the qk_head_dim != v_head_dim cases.
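
The kernel code itself is not quoted in this thread, so the following is only a toy model in plain Python of why such a mix-up stays hidden when the two head dims match. It assumes `index_k` and `index_v` are index ranges sized by the QK and V head dims respectively; `masked_load` and `load_v_row` are hypothetical names.

```python
from typing import List, Sequence


def masked_load(row: Sequence[float], indices: Sequence[int], valid_dim: int) -> List[float]:
    """Read row[i] for each lane i, returning 0.0 for lanes outside valid_dim."""
    return [row[i] if i < valid_dim else 0.0 for i in indices]


def load_v_row(qk_head_dim: int, v_head_dim: int, use_index_k: bool) -> List[float]:
    """Load one row of V using either the (wrong) K indices or the V indices."""
    index_k = range(qk_head_dim)  # sized for Q/K
    index_v = range(v_head_dim)   # sized for V
    v_row = [float(i) for i in range(v_head_dim)]
    indices = index_k if use_index_k else index_v
    return masked_load(v_row, indices, v_head_dim)


# Equal head dims: the two index ranges coincide, so the bug is invisible.
same = load_v_row(8, 8, use_index_k=True) == load_v_row(8, 8, use_index_k=False)
# Different head dims: the K-sized range reads only part of the V row.
differs = load_v_row(4, 8, use_index_k=True) != load_v_row(4, 8, use_index_k=False)
```

In this model the wrong index range only changes the result once qk_head_dim and v_head_dim differ, which matches the observation that the bug should have surfaced in those cases.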

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @Chillee @yanboliang @BoyuanFeng

[ghstack-poisoned]

pytorch-bot · 2024-08-14T19:24:41Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133495


✅ You can merge normally! (2 Unrelated Failures)

As of commit 055f65a with merge base 40e27fb:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

inductor / unit-test / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_avx2, 2, 2, linux.10xlarge.avx2) (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 1
pull / linux-focal-py3.9-clang10 / test (dynamo_wrapped, 3, 3, lf.linux.2xlarge) (gh) (disabled by #116746 but the issue was closed recently and a rebase is needed to make it pass)
torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_second_order_accurate

This comment was automatically generated by Dr. CI and updates every 15 minutes.

========= COMPUTE-SANITIZER
Test completed successfully!
========= ERROR SUMMARY: 0 errors

## NOTE:

Hmm, very interestingly: if the og_head_dim is odd, this works as expected. However, when the og_head_dim is a multiple of 2, this segfaults here:

```Shell
(lldb) bt
* thread #67, name = 'pt_autograd_0', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x10)
  * frame #0: 0x00007ffed327fbfe libtriton.so`scheduleRemainingToLastStage(forOp=ForOp @ 0x00007ffcafdfd658, schedule=0x00007ffcafdfd9e0, afterPrologue=<unavailable>, numStages=2) at MatmulLoopPipeline.cpp:893:9
    frame #1: 0x00007ffed328d970 libtriton.so`mlir::triton::preProcessLoopAndGetSchedule(forOp=0x00007ffcafdfddc0, numStages=2, options=0x00007ffcafdfde80) at MatmulLoopPipeline.cpp:1230:31
    frame #2: 0x00007ffed32a6a43 libtriton.so`mlir::triton::gpu::PipelinePass::runOnOperation() [inlined] pipelineLoop(numStages=2, forOp=ForOp @ 0x00007ffcafdfddc0) at SoftwarePipeliner.cpp:79:47
    frame #3: 0x00007ffed32a6998 libtriton.so`mlir::triton::gpu::PipelinePass::runOnOperation(this=0x00007ffc54767f10) at SoftwarePipeliner.cpp:125:36
    frame #4: 0x00007ffed385147c libtriton.so`mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 700
    frame #5: 0x00007ffed3851df2 libtriton.so`mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 354
    frame #6: 0x00007ffed385481c libtriton.so`mlir::PassManager::run(mlir::Operation*) + 876
    frame #7: 0x00007ffed3542bad libtriton.so`<lambda(mlir::PassManager&, mlir::ModuleOp&)>::operator(self=<unavailable>, mod=0x00007ffc54579280, __closure=<unavailable>)(mlir::PassManager &, mlir::ModuleOp &) at ir.cc:1625:19
    frame #8: 0x00007ffed3560108 libtriton.so`_FUN [inlined] operator(this=0x0000000000000000, call=0x00007ffcafdfe6e0) at cast.h:1480:37
    frame #9: 0x00007ffed35600f0 libtriton.so`_FUN((null)=0x00007ffcafdfe6e0) at pybind11.h:224:21
    frame #10: 0x00007ffed9ae5590 libtriton.so`typeinfo for pybind11::handle + 24
    frame #11: 0x00007ffed9ae5590 libtriton.so`typeinfo for pybind11::handle + 24
```

drisspg · 2024-08-15T03:09:00Z

triton-lang/triton#4521

foreverpiano · 2024-08-22T07:28:26Z

@drisspg has this been fixed?

foreverpiano · 2024-10-22T03:30:51Z

@drisspg pinging again: has this been fixed?

github-actions · 2024-12-21T03:35:32Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

oraluben · 2025-01-09T07:49:51Z

Hi @drisspg, thanks for this amazing improvement! It looks like the Triton fix has been merged into 3.1. Do you have a plan to merge this?

I also manually applied the patch and evaluated it a little (on torch 2.5), and have some feedback:

The backward pass (bwd) still seems unsupported.

The checks should be deleted in this PR:

pytorch/torch/nn/attention/flex_attention.py

Lines 859 to 865 in 067c895

    if not (
        _supported_head_dim(query.size(-1)) and _supported_head_dim(value.size(-1))
    ):
        raise ValueError(
            f"NYI: Currently non power of 2 embedding dimension are not supported. "
            f"Got E={query.size(-1)} and Ev={value.size(-1)}."
        )

When running on Hopper, I got a new Triton error that seems unrelated to triton-lang/triton#4247 ([BACKEND] Fix an issue with the pipeliner):

python: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.
Aborted (core dumped)

torch 2.6 + triton 3.2 hits a different Triton issue; nightly torch + triton works fine.
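
For readers unfamiliar with the guard quoted earlier in this comment, here is a minimal sketch of what a power-of-2 check of that shape could look like. The real body of `_supported_head_dim` is not shown in this thread, so `is_power_of_2` and `check_head_dims` are hypothetical stand-ins that only assume the behavior implied by the error message.

```python
def is_power_of_2(n: int) -> bool:
    """True when n is a positive power of 2 (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def check_head_dims(e: int, ev: int) -> None:
    """Hypothetical stand-in for the guard: reject non-power-of-2 head dims."""
    if not (is_power_of_2(e) and is_power_of_2(ev)):
        raise ValueError(
            f"Non power of 2 embedding dimensions are not supported. Got E={e} and Ev={ev}."
        )
```

With a guard like this in place, a call with E=96 would raise before any kernel is launched, which is why the check has to go once rounded-up block sizes are supported.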

drisspg · 2025-01-10T18:13:20Z

@oraluben This one has been in stasis for a while; I will update it.

Thanks to manman-ren who verified that triton-lang/triton#4247 fixes this issue as well. This is not currently cherry-picked into pytorch-triton.

drisspg · 2025-01-23T02:28:38Z

@pytorchbot merge

pytorchmergebot · 2025-01-23T02:31:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-01-23T02:36:33Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / win-vs2019-cpu-py3 / build

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

drisspg · 2025-01-23T17:03:58Z

@pytorchbot merge -f "unrelated failures"

pytorchmergebot · 2025-01-23T17:05:24Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

ZainRizvi · 2025-01-23T19:52:57Z

Note: failure on trunk is unrelated to this PR. It's caused by #131303

Enable non power of 2 head_dim for FlexAttention

e6563a7

[ghstack-poisoned]

pytorch-bot bot added ciflow/inductor module: inductor labels Aug 14, 2024

Update on "Enable non power of 2 head_dim for FlexAttention"

bcdbcfa

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

drisspg added a commit that referenced this pull request Aug 14, 2024

Enable non power of 2 head_dim for FlexAttention

05cda03

ghstack-source-id: f591ad7 Pull Request resolved: #133495

This was referenced Aug 14, 2024

FlexAttention Segmentation Fault triton-lang/triton#4521

Open

Flexattention: ValueError: Shape element 1 must be a power of 2 #133321

Closed

drisspg mentioned this pull request Aug 20, 2024

Supporting Different head dims in FlexAttention #133674

Closed

drisspg requested a review from Chillee September 7, 2024 16:32

drisspg mentioned this pull request Sep 13, 2024

Question about OOm on large sequences pytorch-labs/attention-gym#32

Closed

Update

19aee3d

[ghstack-poisoned]

drisspg added a commit that referenced this pull request Oct 2, 2024

Enable non power of 2 head_dim for FlexAttention

c890229

ghstack-source-id: d49cc53 Pull Request resolved: #133495

drisspg added the topic: not user facing topic category label Oct 2, 2024

github-actions bot added the Stale label Dec 21, 2024

Update

7b88d73

[ghstack-poisoned]

drisspg added a commit that referenced this pull request Dec 24, 2024

Enable non power of 2 head_dim for FlexAttention

499e3bd

ghstack-source-id: af8ccd0 Pull Request resolved: #133495

Update

64c5d61

[ghstack-poisoned]

drisspg added a commit that referenced this pull request Jan 10, 2025

Enable non power of 2 head_dim for FlexAttention

1a44d23

ghstack-source-id: c0fb26d Pull Request resolved: #133495

drisspg requested a review from albanD as a code owner January 15, 2025 23:46

Chillee approved these changes Jan 22, 2025

Update

055f65a

[ghstack-poisoned]

drisspg added a commit that referenced this pull request Jan 23, 2025

Enable non power of 2 head_dim for FlexAttention

9343c47

ghstack-source-id: 57ee478 Pull Request resolved: #133495

pytorchmergebot added the merging label Jan 23, 2025

pytorchmergebot removed the merging label Jan 23, 2025

pytorch-bot bot temporarily deployed to upload-benchmark-results January 23, 2025 03:00 Inactive

pytorch-bot bot temporarily deployed to upload-benchmark-results January 23, 2025 03:01 Inactive

pytorch-bot bot temporarily deployed to upload-benchmark-results January 23, 2025 03:02 Inactive

pytorchmergebot added the merging label Jan 23, 2025

pytorchmergebot closed this in c670773 Jan 23, 2025

pytorchmergebot added Merged and removed merging labels Jan 23, 2025

drisspg mentioned this pull request Feb 4, 2025

DeepSeek: MLA attention #146330

Open

github-actions bot deleted the gh/drisspg/32/head branch February 23, 2025 02:10

drisspg mentioned this pull request Mar 18, 2025

Support not-power-of-two embedding dimensions pytorch-labs/attention-gym#127

Closed
