[Inductor] Add fused_attention pattern matcher with additional clone #108141
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108141
✅ No failures as of commit 8eb66b8 with merge base 60bb02a.
Can we add a `remove_extra_clones` call to `joint_graph_passes`, similar to how it is invoked in the post-grad pass? That way we wouldn't need to add extra patterns to match the redundant clones.
@jgong5 The clone removal pass has subtle correctness conditions, so it might be better to do it in post-grad. We could look into doing it in the joint graph, but I don't know if that should block this PR.
Oh, I didn't realize it. Yes, let's get this PR in first then.
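For context on why that ordering matters, below is a minimal sketch of what a clone-removal pass over an FX graph might look like. This is not Inductor's actual `remove_extra_clones` pass; the function name and the mutation check are assumptions, and the point is only to show the kind of condition that makes such a pass subtle: a `clone()` is only safe to drop if neither its input nor its output is mutated afterwards and the memory format is unchanged.

```python
import torch
from torch.fx import GraphModule, Node

aten = torch.ops.aten

def remove_redundant_clones(gm: GraphModule) -> GraphModule:
    """Illustrative only: drop aten.clone nodes that are provably no-ops."""
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target is not aten.clone.default:
            continue
        src = node.args[0]
        if not isinstance(src, Node):
            continue
        # A clone with an explicit memory_format may change strides; keep it.
        if node.kwargs.get("memory_format") is not None:
            continue

        # Conservative mutation check (an assumption, not Inductor's real check):
        # if any user of the source or of the clone is a mutating op, removing
        # the clone could change semantics, so keep it.
        def mutates(user: Node) -> bool:
            schema = getattr(user.target, "_schema", None)
            return schema is not None and any(
                a.alias_info is not None and a.alias_info.is_write
                for a in schema.arguments
            )

        if any(mutates(u) for u in list(src.users) + list(node.users)):
            continue

        node.replace_all_uses_with(src)
        gm.graph.erase_node(node)

    gm.graph.lint()
    gm.recompile()
    return gm
```

The conditions in the real pass are stricter than this sketch; the takeaway is just that the check is non-trivial, which is why deferring it to the post-grad pass is the safer default here.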
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
Pull Request resolved: #108141
Approved by: https://github.com/jgong5, https://github.com/eellison
(The same change also landed as #108327.)
Follow-up (#109118): When dropout is traced in inference, it creates a `clone()` instead of the training pattern of `rand()` etc. This was partially addressed manually in #108141, but that did not cover all of the patterns that include dropout, and there is no reason we should have to specify them manually. That PR updates the generated inference patterns to trace with `dropout_p = 0.0`.

Pull Request resolved: #109118
Approved by: https://github.com/drisspg, https://github.com/Valentine233
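A rough repro sketch of the behavior described in these commits, assuming the dropout decomposition from #106274 is registered and reachable via `torch._decomp.get_decompositions` (adjust the table if your build registers it elsewhere). The expected output is indicative, not guaranteed across versions:

```python
import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

aten = torch.ops.aten

# Assumption: aten.dropout has a registered decomposition (per #106274).
decomps = get_decompositions([aten.dropout])

def train_fn(x):
    return aten.dropout(x, 0.5, True)    # training=True

def eval_fn(x):
    return aten.dropout(x, 0.5, False)   # training=False (inference)

x = torch.randn(2, 8, 16, 16)

# Expected: the training graph contains the rand()/mask/mul sequence, while the
# inference graph degenerates to a clone() of the input -- the extra node the
# #108141 patterns had to account for.
print(make_fx(train_fn, decomposition_table=decomps)(x).graph)
print(make_fx(eval_fn, decomposition_table=decomps)(x).graph)
```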
A previous PR #106274 decomposes `aten.dropout` and would create a `clone()` when `eval()` or `p=0`. This makes many SDPA-related models fail to match the fused_attention pattern matchers.

This PR adds new fused_attention pattern matchers with an additional clone to re-enable SDPA op matching.
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov
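To make the fix concrete, here is a schematic sketch of the kind of pattern/replacement pair this PR adds. The function names, shapes, and the `inv_scale` parameter are illustrative rather than the exact code from the PR, and the registration with Inductor's pattern matcher (in `torch/_inductor/fx_passes/fuse_attention.py`) is omitted. The only difference from the clone-free variants is the extra `clone()` in the search function, mirroring what the dropout decomposition leaves behind in `eval()` / `p=0`:

```python
import torch

aten = torch.ops.aten

# Hypothetical pattern variant: decomposed attention as it appears after
# aten.dropout(p=0 / eval) has been lowered to a clone() of the softmax output.
def _sfdp_pattern_with_clone(query, key, value, inv_scale):
    scores = torch.matmul(query, key.transpose(-2, -1)) / inv_scale
    attn = scores.softmax(dim=-1)
    attn = attn.clone()  # the extra node that broke the original patterns
    return torch.matmul(attn, value)

# Hypothetical replacement: map the whole subgraph to the fused SDPA op.
def _sfdp_replacement_with_clone(query, key, value, inv_scale):
    return aten.scaled_dot_product_attention(
        query.contiguous(),
        key.contiguous(),
        value.contiguous(),
        attn_mask=None,
        dropout_p=0.0,
        is_causal=False,
        scale=1.0 / inv_scale,
    )
```

If the `clone()` were not part of the search pattern, the matcher would see an extra node between the softmax and the final matmul and fail to rewrite the subgraph, which is exactly the failure mode described above.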