update pointwise cat heuristics by eellison · Pull Request #125772 · pytorch/pytorch

update pointwise cat heuristics #125772


Closed
wants to merge 5 commits

Conversation

@eellison (Contributor) commented May 8, 2024

Stack from ghstack (oldest at bottom):

Fix for #122871. There are two cases where we emit pointwise cat:

  • fusing into a pointwise use
  • horizontally fusing copy_ kernels

The regression I looked into previously was due to being overly aggressive in the latter case. I've updated the logic there so that we only emit the horizontal fusion when there are no reductions.
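For illustration only (this is not the heuristic code itself), here are two repro-style snippets corresponding to the two cases above; whether Inductor actually lowers either cat as a pointwise cat depends on its internal heuristics and may change between versions:

```python
import torch


@torch.compile
def cat_into_pointwise(a, b):
    # The cat output feeds a pointwise op (relu): the first case above,
    # a candidate for fusing the cat into its pointwise use.
    return torch.relu(torch.cat([a, b], dim=1))


@torch.compile
def cat_into_reduction(a, b):
    # The cat output feeds a reduction (sum). After this change, the
    # horizontal copy_ fusion is not emitted for cases like this one.
    return torch.cat([a, b], dim=1).sum(dim=1)


a = torch.randn(128, 64)
b = torch.randn(128, 32)
print(cat_into_pointwise(a, b).shape)  # torch.Size([128, 96])
print(cat_into_reduction(a, b).shape)  # torch.Size([128])
```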

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented May 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125772

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fb2a38f with merge base 3ccf107:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Chillee (Collaborator) left a comment


Yay!

Perhaps some more tests for the pointwise cat?

@eellison (Contributor, Author) commented May 8, 2024

There are already a bunch, but I can add a couple more.

eellison added 2 commits May 9, 2024 11:01
Fix for #122871. There are two cases where we emit pointwise cat:

- fusing into a pointwise use
- horizontally fusing copy_ kernels

The regression I looked into previously was due to being overly aggressive in the latter case. I've updated the logic there so that we only emit the horizontal fusion in the case that we would have to emit separate copy_ kernels anyway. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@eellison (Contributor, Author) commented May 9, 2024

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on May 9, 2024
@pytorchmergebot (Collaborator)

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@eellison added the topic: not user facing label on May 9, 2024
@eellison (Contributor, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@izaitsevfb (Contributor)

@pytorchbot revert -m 'Fails numerical stability test for aps model, see D57215900' -c ghfirst

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request May 11, 2024
This reverts commit d19d932.

Reverted #125772 on behalf of https://github.com/izaitsevfb due to: Fails numerical stability test for aps model, see D57215900.
@pytorchmergebot (Collaborator)

@eellison your PR has been successfully reverted.

eellison added 2 commits May 13, 2024 12:56
Fix for #122871. There are two cases where we emit pointwise cat:

- fusing into a pointwise use
- horizontally fusing copy_ kernels

The regression I looked into previously was due to being overly aggressive in the latter case. I've updated the logic there so that we only emit the horizontal fusion in the case that we would have to emit separate copy_ kernels anyway. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@eellison (Contributor, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

tinglvv pushed a commit to tinglvv/pytorch that referenced this pull request May 14, 2024
This reverts commit d19d932.

Reverted pytorch#125772 on behalf of https://github.com/izaitsevfb due to: Fails numerical stability test for aps model, see D57215900.
tinglvv pushed a commit to tinglvv/pytorch that referenced this pull request May 14, 2024
Fix for pytorch#122871. There are two cases where we emit pointwise cat:

- fusing into a pointwise use
- horizontally fusing copy_ kernels

The regression I looked into previously was due to being overly aggressive in the latter case. I've updated the logic there so that we only emit the horizontal fusion when there are no reductions.

Pull Request resolved: pytorch#125772
Approved by: https://github.com/Chillee
pytorchmergebot pushed a commit that referenced this pull request May 15, 2024
Relanding just the pad-in-a-single-pass portion of [the PR](#118522), not including the transpose logic. This was previously accepted and reviewed.

Pull Request resolved: #125773
Approved by: https://github.com/shunting314
ghstack dependencies: #125772
pytorchmergebot pushed a commit that referenced this pull request May 15, 2024
For mm inputs which are not inputs of the graph, assume that we can memory-plan them into the aten.cat and exclude the padding cost from the benchmarking comparison. Technically we also have to do a small amount of zero-writing, but that should be relatively small and covered by the weighting of the padding time by `1.1` (a rough sketch of this comparison follows this entry).

Pull Request resolved: #125780
Approved by: https://github.com/shunting314
ghstack dependencies: #125772, #125773
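A rough, hypothetical sketch of the comparison described in the commit message above (names and structure are mine; the real logic lives in Inductor's pad-mm heuristics and differs in detail):

```python
def should_pad_mm(ori_mm_time: float,
                  pad_mm_time: float,
                  pad_time: float,
                  input_is_graph_input: bool) -> bool:
    """Decide whether padding an mm input is expected to pay off.

    ori_mm_time: benchmarked matmul time on the original (unpadded) shapes.
    pad_mm_time: benchmarked matmul time on the padded shapes.
    pad_time:    benchmarked cost of materializing the padded input.
    input_is_graph_input: if False, assume the padded input can be
        memory-planned into the producing aten.cat, so its padding cost is
        excluded; the small remaining zero-writing is absorbed by the 1.1
        weighting below.
    """
    if not input_is_graph_input:
        pad_time = 0.0
    # Weight the padding cost by 1.1 to account for zero-writing and noise.
    return ori_mm_time > pad_mm_time + 1.1 * pad_time
```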
pytorchmergebot pushed a commit that referenced this pull request May 17, 2024
Otherwise you get an error in constant_pad_nd.

Pull Request resolved: #126475
Approved by: https://github.com/huydhn
ghstack dependencies: #125772, #125773, #125780
ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
Relanding just the pad-in-a-single-pass portion of [the PR](pytorch#118522), not including the transpose logic. This was previously accepted and reviewed.

Pull Request resolved: pytorch#125773
Approved by: https://github.com/shunting314
ghstack dependencies: pytorch#125772
ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
For mm inputs which are not inputs of the graph, assume that we can memory-plan them into the aten.cat and exclude the padding cost from the benchmarking comparison. Technically we also have to do a small amount of zero-writing, but that should be relatively small and covered by the weighting of the padding time by `1.1`.

Pull Request resolved: pytorch#125780
Approved by: https://github.com/shunting314
ghstack dependencies: pytorch#125772, pytorch#125773
ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
Otherwise you get an error in constant_pad_nd.

Pull Request resolved: pytorch#126475
Approved by: https://github.com/huydhn
ghstack dependencies: pytorch#125772, pytorch#125773, pytorch#125780