Reland '[Inductor] GEMM shape padding improvements (#118522)' #125773
Conversation
Relanding just the pad-in-a-single-pass portion of the PR. Not including the transpose logic. [ghstack-poisoned]
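For context, here is a minimal sketch of the "pad in a single pass" idea, assuming a fixed alignment and hypothetical helper names (`pad_len`, `pad_mm_single_pass`); it is illustrative only, not the pad_mm code from this PR. The point is to compute all padded lengths up front and pad each operand with a single call, rather than one pad per dimension.

```python
import torch
import torch.nn.functional as F

ALIGN = 8  # illustrative alignment; the real heuristic depends on dtype and hardware

def pad_len(dim: int, align: int = ALIGN) -> int:
    # Extra elements needed to round `dim` up to a multiple of `align`.
    return (align - dim % align) % align

def pad_mm_single_pass(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch: pad each operand once, covering both of its dims
    # in a single F.pad call, run the aligned matmul, then slice the result
    # back to the original output shape.
    m, k = a.shape
    _, n = b.shape
    m_pad, k_pad, n_pad = pad_len(m), pad_len(k), pad_len(n)
    a_padded = F.pad(a, (0, k_pad, 0, m_pad))  # pads k (last dim), then m
    b_padded = F.pad(b, (0, n_pad, 0, k_pad))  # pads n (last dim), then k
    return torch.mm(a_padded, b_padded)[:m, :n]
```

Zero-padding along k leaves the valid region of the product unchanged, so slicing back to `[:m, :n]` recovers the original result.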
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125773.
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 5d998e2 with merge base afda668. FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Test failure looks related. @kadeng, are you fine with the split?
@shunting314 I made a late-night change to the bias padding that fixes a benchmark regression, but it looks like it is causing an issue; will investigate. (Before the recent change I think we were padding the bias when it didn't need to be, causing us to miss padding an mm and regress.)
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@@ -163,19 +145,34 @@ def pad_addmm(
        input = pad_dim(input, n_padded_length, 1)
    elif input.dim() == 1 and input.shape[0] != 1:
        input = pad_dim(input, n_padded_length, 0)
-   elif m_padded_length != 0 and input.dim() == 2 and input.shape[0] != 1:
+   if m_padded_length != 0 and input.dim() == 2 and input.shape[0] != 1:
Now that I remember: I believe the regression was because we were expanding the 1-D bias in a way that meant we would no longer hit cuBLAS addmm. In any case, a multi-dimensional bias is extremely uncommon, and I've added tests here for all of the different ways the bias might be hit.
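To make the distinction concrete, here is a rough sketch with a hypothetical helper (`pad_addmm_bias`); it is not the pad_addmm code from this PR. A broadcastable 1-D bias is left unexpanded so the lowering can still reach the fused cuBLAS addmm path, while a genuinely 2-D bias is padded along the same dimensions as the matmul output.

```python
import torch
import torch.nn.functional as F

def pad_addmm_bias(bias: torch.Tensor, m_pad: int, n_pad: int) -> torch.Tensor:
    # Illustrative sketch only. F.pad takes pad widths starting from the last
    # dimension: (n_left, n_right, m_left, m_right) for a 2-D tensor.
    if bias.dim() == 2 and bias.shape[1] != 1:
        # Genuinely 2-D bias: pad n, and pad m too unless it broadcasts over m.
        m_extra = m_pad if bias.shape[0] != 1 else 0
        return F.pad(bias, (0, n_pad, 0, m_extra))
    if bias.dim() == 1 and bias.shape[0] != 1:
        # 1-D bias that broadcasts over m: pad only n, and do not expand it to
        # 2-D, so the op can still lower to the fused cuBLAS addmm.
        return F.pad(bias, (0, n_pad))
    # Scalar-like / fully broadcastable bias: nothing to pad.
    return bias
```

The padded matmul operands would then be combined with this bias via torch.addmm, and the output sliced back to the original m x n shape.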
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…'" Relanding just the pad in a single pass portion of [the pr](#118522). Not including the transpose logic: This was previously accepted and reviewed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@pytorchbot merge
Merge failed. Reason: this PR is missing a required label. To add a label, you can comment to pytorchbot. For more information, see the linked documentation. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
For mm inputs which are not inputs of the graph, assume that we can memory-plan them into the aten.cat and exclude the padding cost from the benchmarking comparison. Technically we also have to write a small number of zeros, but that cost should be relatively small and is encompassed in the weighting of the padding time by `1.1`. Pull Request resolved: #125780 Approved by: https://github.com/shunting314 ghstack dependencies: #125772, #125773
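As a rough sketch of the comparison described in that commit message (the helpers `bench_ms` and `should_pad_mm` and the timing loop are assumptions of mine, not Inductor's benchmarking code): the padded path only pays for the pad copy when the operand is a real graph input, and the padded time is weighted by 1.1 to absorb costs the benchmark does not see, such as the small zero-fill.

```python
import torch
import torch.nn.functional as F

def bench_ms(fn, iters: int = 20) -> float:
    # Simple CUDA-event timing; Inductor uses its own benchmarking utilities.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(3):
        fn()  # warm-up
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

def should_pad_mm(mat1, mat2, mat1_padded, mat2_padded, is_graph_input: bool) -> bool:
    ori_time = bench_ms(lambda: torch.mm(mat1, mat2))
    pad_time = bench_ms(lambda: torch.mm(mat1_padded, mat2_padded))
    if is_graph_input:
        # A graph input really is padded at runtime, so the copy counts
        # (the same would apply to mat2 if it is also a graph input).
        k_pad = mat1_padded.shape[1] - mat1.shape[1]
        m_pad = mat1_padded.shape[0] - mat1.shape[0]
        pad_time += bench_ms(lambda: F.pad(mat1, (0, k_pad, 0, m_pad)))
    # Otherwise assume the padding is memory-planned into the producing
    # aten.cat; the leftover zero-fill is absorbed by the 1.1 factor.
    return pad_time * 1.1 < ori_time
```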
Otherwise you get an error in constant_pad_nd. Pull Request resolved: #126475 Approved by: https://github.com/huydhn ghstack dependencies: #125772, #125773, #125780
…ytorch#125773) Relanding just the pad-in-a-single-pass portion of [the PR](pytorch#118522), not including the transpose logic. This was previously accepted and reviewed. Pull Request resolved: pytorch#125773 Approved by: https://github.com/shunting314 ghstack dependencies: pytorch#125772
Stack from ghstack (oldest at bottom):
Relanding just the pad-in-a-single-pass portion of the PR, not including the transpose logic.
This was previously accepted and reviewed.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang