Reland '[Inductor] GEMM shape padding improvements (#118522)' #125773
Conversation
Relanding just the pad-in-a-single-pass portion of the PR. Not including the transpose logic. [ghstack-poisoned]
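For context, here is a minimal sketch of the "pad in a single pass" idea, assuming a fixed alignment and hypothetical helper names (`pad_len`, `pad_mm_single_pass`); it is illustrative only, not the pad_mm code from this PR. The point is to compute all padded lengths up front and pad each operand with a single call, rather than one pad per dimension.

```python
import torch
import torch.nn.functional as F

ALIGN = 8  # illustrative alignment; the real heuristic depends on dtype and hardware

def pad_len(dim: int, align: int = ALIGN) -> int:
    # Extra elements needed to round `dim` up to a multiple of `align`.
    return (align - dim % align) % align

def pad_mm_single_pass(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch: pad each operand once, covering both of its dims
    # in a single F.pad call, run the aligned matmul, then slice the result
    # back to the original output shape.
    m, k = a.shape
    _, n = b.shape
    m_pad, k_pad, n_pad = pad_len(m), pad_len(k), pad_len(n)
    a_padded = F.pad(a, (0, k_pad, 0, m_pad))  # pads k (last dim), then m
    b_padded = F.pad(b, (0, n_pad, 0, k_pad))  # pads n (last dim), then k
    return torch.mm(a_padded, b_padded)[:m, :n]
```

Zero-padding along k leaves the valid region of the product unchanged, so slicing back to `[:m, :n]` recovers the original result.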
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125773.
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 5d998e2 with merge base afda668. FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Test failure looks related. @kadeng, are you fine with the split?
@shunting314 I made a late-night change to the bias padding that fixes a benchmark regression, but it looks like it is causing an issue; will investigate. (Before the recent change I think we were padding the bias when it didn't need to be, causing us to miss padding an mm and regress.)
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@@ -163,19 +145,34 @@ def pad_addmm(
        input = pad_dim(input, n_padded_length, 1)
    elif input.dim() == 1 and input.shape[0] != 1:
        input = pad_dim(input, n_padded_length, 0)
-   elif m_padded_length != 0 and input.dim() == 2 and input.shape[0] != 1:
+   if m_padded_length != 0 and input.dim() == 2 and input.shape[0] != 1:
Now that I remember: I believe the regression was because we were expanding the 1-D bias in a way that meant we would no longer hit cuBLAS addmm. In any case, a multi-dimensional bias is extremely uncommon, and I've added tests here for all of the different ways the bias might be hit.
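To make the distinction concrete, here is a rough sketch with a hypothetical helper (`pad_addmm_bias`); it is not the pad_addmm code from this PR. A broadcastable 1-D bias is left unexpanded so the lowering can still reach the fused cuBLAS addmm path, while a genuinely 2-D bias is padded along the same dimensions as the matmul output.

```python
import torch
import torch.nn.functional as F

def pad_addmm_bias(bias: torch.Tensor, m_pad: int, n_pad: int) -> torch.Tensor:
    # Illustrative sketch only. F.pad takes pad widths starting from the last
    # dimension: (n_left, n_right, m_left, m_right) for a 2-D tensor.
    if bias.dim() == 2 and bias.shape[1] != 1:
        # Genuinely 2-D bias: pad n, and pad m too unless it broadcasts over m.
        m_extra = m_pad if bias.shape[0] != 1 else 0
        return F.pad(bias, (0, n_pad, 0, m_extra))
    if bias.dim() == 1 and bias.shape[0] != 1:
        # 1-D bias that broadcasts over m: pad only n, and do not expand it to
        # 2-D, so the op can still lower to the fused cuBLAS addmm.
        return F.pad(bias, (0, n_pad))
    # Scalar-like / fully broadcastable bias: nothing to pad.
    return bias
```

The padded matmul operands would then be combined with this bias via torch.addmm, and the output sliced back to the original m x n shape.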
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@pytorchbot merge
Merge failed. Reason: this PR is missing a required label. To add a label, you can comment to pytorchbot. For more information, see the linked documentation. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
For mm inputs which are not inputs of the graph, assume that we can memory-plan them into the aten.cat and exclude the padding cost from the benchmarking comparison. Technically we also have to write a small number of zeros, but that cost should be relatively small and is encompassed in the weighting of the padding time by `1.1`. Pull Request resolved: #125780 Approved by: https://github.com/shunting314 ghstack dependencies: #125772, #125773
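As a rough sketch of the comparison described in that commit message (the helpers `bench_ms` and `should_pad_mm` and the timing loop are assumptions of mine, not Inductor's benchmarking code): the padded path only pays for the pad copy when the operand is a real graph input, and the padded time is weighted by 1.1 to absorb costs the benchmark does not see, such as the small zero-fill.

```python
import torch
import torch.nn.functional as F

def bench_ms(fn, iters: int = 20) -> float:
    # Simple CUDA-event timing; Inductor uses its own benchmarking utilities.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(3):
        fn()  # warm-up
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

def should_pad_mm(mat1, mat2, mat1_padded, mat2_padded, is_graph_input: bool) -> bool:
    ori_time = bench_ms(lambda: torch.mm(mat1, mat2))
    pad_time = bench_ms(lambda: torch.mm(mat1_padded, mat2_padded))
    if is_graph_input:
        # A graph input really is padded at runtime, so the copy counts
        # (the same would apply to mat2 if it is also a graph input).
        k_pad = mat1_padded.shape[1] - mat1.shape[1]
        m_pad = mat1_padded.shape[0] - mat1.shape[0]
        pad_time += bench_ms(lambda: F.pad(mat1, (0, k_pad, 0, m_pad)))
    # Otherwise assume the padding is memory-planned into the producing
    # aten.cat; the leftover zero-fill is absorbed by the 1.1 factor.
    return pad_time * 1.1 < ori_time
```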
Otherwise you get an error in constant_pad_nd. Pull Request resolved: #126475 Approved by: https://github.com/huydhn ghstack dependencies: #125772, #125773, #125780
…ytorch#125773) Relanding just the pad-in-a-single-pass portion of [the PR](pytorch#118522), not including the transpose logic. This was previously accepted and reviewed. Pull Request resolved: pytorch#125773 Approved by: https://github.com/shunting314 ghstack dependencies: pytorch#125772
Stack from ghstack (oldest at bottom):
Relanding just the pad-in-a-single-pass portion of the PR, not including the transpose logic.
This was previously accepted and reviewed.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang