[inductor] fix MA on poor gpu #145133

shunting314 · 2025-01-18T01:31:00Z


Found this bug while debugging a max-autotune (MA) issue in CI that cannot be reproduced on a devgpu.

On GPUs with fewer than 68 SMs (such as the NVIDIA L4 used in CI), running torch.compile in max-autotune mode may fail with the following confusing layout error (full log: https://gist.github.com/shunting314/370f42f547e3367a3773237942725a86):

torch._inductor.exc.InductorError: LoweringException: AssertionError: convert FlexibleLayout to FixedLayout first

The cause: even when no Triton template is picked, Inductor still returns a MultiTemplateBuffer for the tuned addmm. MultiTemplateBuffer.get_reads, called from Reduction.num_splits, may then index a FlexibleLayout, which triggers the error above.
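To make the failure mode concrete, here is a toy model. `ToyFlexibleLayout`, `ToyTemplateBuffer` and `num_splits_like` are hypothetical stand-ins, not Inductor's `FlexibleLayout`, `MultiTemplateBuffer` or `Reduction.num_splits`. The sketch only assumes that indexing a layout requires concrete strides, and that a reduction heuristic may compute the reads of its producer buffer. `freeze()` models fixing the layout before anything indexes it.

```python
# Toy model (hypothetical, NOT Inductor's real classes) of the failure mode:
# a layout whose strides stay undecided until it is frozen.


class ToyFlexibleLayout:
    def __init__(self, size):
        self.size = list(size)
        self.stride = None  # undecided while the layout is still flexible

    def freeze(self):
        # Pick contiguous (row-major) strides, standing in for a FixedLayout.
        stride, acc = [], 1
        for dim in reversed(self.size):
            stride.insert(0, acc)
            acc *= dim
        self.stride = stride
        return self

    def make_indexer(self):
        # Indexing needs concrete strides; this mirrors the assertion in the error above.
        if self.stride is None:
            raise AssertionError("convert FlexibleLayout to FixedLayout first")
        return lambda idx: sum(i * s for i, s in zip(idx, self.stride))


class ToyTemplateBuffer:
    """Stands in for a buffer whose inputs may still have flexible layouts."""

    def __init__(self, input_layouts):
        self.input_layouts = input_layouts

    def get_reads(self):
        # Computing reads indexes every input, so every input layout must be fixed.
        return [layout.make_indexer()([0] * len(layout.size)) for layout in self.input_layouts]


def num_splits_like(buf):
    # A reduction heuristic that inspects the producer's reads (hypothetical).
    return len(buf.get_reads())
```

Calling `num_splits_like(ToyTemplateBuffer([ToyFlexibleLayout([4, 8])]))` raises the layout assertion. The same call with `ToyFlexibleLayout([4, 8]).freeze()` succeeds, because the input layout was fixed before it was indexed.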

The issue does not appear on a devgpu because there the Triton templates are rendered, and rendering them freezes the layout of the addmm inputs.
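Whether the Triton templates are considered at all depends on the GPU. As far as we know, recent Inductor versions gate Triton matmul templates on an SM-count check (`is_big_gpu` in `torch/_inductor/utils.py`) that uses the same 68-SM threshold quoted above. The helpers below are hypothetical and only model that gate under this assumption. They do not call Inductor.

```python
# Hypothetical sketch of the "big GPU" gate; names are illustrative, not Inductor APIs.
MIN_SMS = 68  # threshold quoted in this PR


def triton_mm_templates_enabled(sm_count: int, min_sms: int = MIN_SMS) -> bool:
    """True if a GPU with `sm_count` SMs would pass the assumed SM-count gate."""
    return sm_count >= min_sms


def candidate_kinds(sm_count: int) -> list:
    """Illustrative candidate kinds that max-autotune might benchmark for addmm."""
    kinds = ["aten"]  # assumed: the ATen fallback is always a candidate
    if triton_mm_templates_enabled(sm_count):
        kinds.append("triton_template")
    return kinds


if __name__ == "__main__":
    # On a real CUDA machine the SM count can be read with:
    #   torch.cuda.get_device_properties(0).multi_processor_count
    for sms in (60, 68, 132):
        print(sms, candidate_kinds(sms))
```

Under this model, a small GPU never renders a Triton template, so nothing freezes the addmm input layouts before `get_reads` runs.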

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov


pytorch-bot · 2025-01-18T01:31:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145133


✅ You can merge normally! (3 Unrelated Failures)

As of commit 69006be with merge base 0f051ea:

FLAKY - The following jobs failed, but the failures were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed, but the failure was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge) (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 · 2025-01-21T07:18:42Z

@pytorchbot merge

pytorchmergebot · 2025-01-21T07:20:48Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Pull Request resolved: pytorch#145133
Approved by: https://github.com/jansel

[inductor] fix MA on poor gpu

69006be


shunting314 mentioned this pull request Jan 18, 2025

[Inductor] inplace padding #140249

Closed

pytorch-bot bot added ciflow/inductor module: inductor labels Jan 18, 2025

shunting314 requested review from eellison and jansel January 18, 2025 01:37

jansel approved these changes Jan 18, 2025


shunting314 added the topic: not user facing label Jan 21, 2025

pytorch-bot bot added the ciflow/trunk label Jan 21, 2025

pytorchmergebot added the merging label Jan 21, 2025

pytorchmergebot added the Merged label Jan 21, 2025

pytorchmergebot closed this in 803017f Jan 21, 2025

pytorchmergebot removed the merging label Jan 21, 2025

github-actions bot deleted the gh/shunting314/192/head branch February 21, 2025 02:05
