[AOTI] Fix an unaligned memory access issue in mm_template #146293
This pull request was exported from Phabricator. Differential Revision: D69034578
Summary: Fixes a corner case in the Triton MM template: when the dynamically sized dimension M is smaller than BLOCK_M (and similarly when N is smaller than BLOCK_N), the kernel can trigger an unaligned memory access error. Reviewed By: chenyang78 Differential Revision: D69034578
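To make the corner case concrete, here is a minimal pure-Python sketch of the indexing involved. It assumes the kernel wraps out-of-range row indices with a modulo by M, which this thread does not show; `wrapped_rows`, `is_contiguous`, and `may_hint_contiguous` are hypothetical helper names, and the guard is only modeled on the `M >= BLOCK_M` check mentioned in the template comment.

```python
def wrapped_rows(pid_m: int, block_m: int, m: int) -> list:
    """Emulate `(pid_m * BLOCK_M + arange(0, BLOCK_M)) % M` for one program.

    Assumption: the template wraps row indices this way; the thread does not show it.
    """
    start = pid_m * block_m
    return [(start + i) % m for i in range(block_m)]


def is_contiguous(rows: list) -> bool:
    """True when every index is exactly one more than the previous one."""
    return all(b == a + 1 for a, b in zip(rows, rows[1:]))


def may_hint_contiguous(block_m: int, m: int) -> bool:
    """Hypothetical guard, modeled on the `M >= BLOCK_M` check the PR describes."""
    return m >= block_m


if __name__ == "__main__":
    # M = 5 is smaller than BLOCK_M = 16, so the very first block already wraps.
    print(wrapped_rows(0, 16, 5))
    print(is_contiguous(wrapped_rows(0, 16, 5)), may_hint_contiguous(16, 5))
    # M = 64 covers the whole first block, so its rows are 0..15 in order.
    print(is_contiguous(wrapped_rows(0, 16, 64)), may_hint_contiguous(16, 64))
```

When M is smaller than BLOCK_M, the wrapped indices repeat inside a single block, so treating them as one contiguous, aligned run would presumably be incorrect. The same reasoning would apply to N and BLOCK_N.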
Hmm, I am not sure why the new test fails on ROCm.
Summary: Fixes a corner case in the Triton MM template: when the dynamically sized dimension M is smaller than BLOCK_M (and similarly when N is smaller than BLOCK_N), the kernel can trigger an unaligned memory access error. Reviewed By: frank-wei, shunting314, chenyang78 Differential Revision: D69034578
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Summary: Fixes a corner case in the Triton MM template: when the dynamically sized dimension M is smaller than BLOCK_M (and similarly when N is smaller than BLOCK_N), the kernel can trigger an unaligned memory access error. Differential Revision: D69034578 Pull Request resolved: #146293 Approved by: https://github.com/chenyang78, https://github.com/jansel
Summary: To follow up on #146293, add a JIT Inductor unit test. Other Triton templates may need similar fixes. Pull Request resolved: #146529 Approved by: https://github.com/eellison, https://github.com/shunting314
# The only difference between the two templates is M >= BLOCK_M and N >= BLOCK_N checking.
# See more details in https://github.com/pytorch/pytorch/pull/146293
else r"""
{{def_kernel("A", "B")}}
cc @jataylo, would you look into this?
@eellison Sure thing. It looks like it's failing in the Triton compilation stages. Would you mind creating an issue for this and assigning it to me?
Tried the unit test of this PR with latest pytorch/pytorch main and ROCm 6.3:
python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_linear_dynamic_maxautotune_cuda
MI300: PASS
MI200: FAIL : :0:rocdevice.cpp :3018: 5087874215430d us: Callback: Queue 0x7fdcb5400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 Aborted (core dumped)
Here is a minimal reproducer for the error.
However, when we update Triton from release/3.2.x to triton-3.2.0+git8759017a (aka TOT), the error goes away. :)
So for PyTorch 2.7, which requires TOT, this bug should disappear. The UT should start working once the PyTorch 2.7 <-> TOT API compatibility problems are ironed out.
Please see here for more context.
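The two Triton builds compared in this comment may share a release number, in which case a check on the release number alone would not tell them apart. A minimal sketch of separating the release part from the build label, assuming the build string has the `release+local` shape of `3.2.0+git8759017a` quoted here; `split_build_version` is a hypothetical helper, not a Triton or PyTorch API.

```python
def split_build_version(version: str):
    """Split a `release+local` version string into (release tuple, local label).

    Assumption: the release part is dot-separated integers, as in "3.2.0".
    The local label is None when the string has no "+" suffix.
    """
    public, _, local = version.partition("+")
    release = tuple(int(part) for part in public.split("."))
    return release, (local or None)


if __name__ == "__main__":
    # The TOT build string quoted in this thread carries a git label after "+".
    print(split_build_version("3.2.0+git8759017a"))
    # A plain release string has no local label.
    print(split_build_version("3.2.0"))
```

A check that needs to distinguish such builds would have to look at the local label, or at behavior, in addition to the release tuple.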
Summary: Fixes a corner case in the Triton MM template: when the dynamically sized dimension M is smaller than BLOCK_M (and similarly when N is smaller than BLOCK_N), the kernel can trigger an unaligned memory access error.
Differential Revision: D69034578
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov