8000 [AOTI] Fix an unaligned memory access issue in mm_template by desertfire · Pull Request #146293 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

[AOTI] Fix an unaligned memory access issue in mm_template #146293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

desertfire
Copy link
Contributor
@desertfire desertfire commented Feb 3, 2025

Summary: Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Differential Revision: D69034578

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Copy link
pytorch-bot bot commented Feb 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146293

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 37f366f with merge base f397c72 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

@desertfire desertfire requested a review from eellison February 3, 2025 14:22
facebook-github-bot pushed a commit that referenced this pull request Feb 3, 2025
Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: chenyang78

Differential Revision: D69034578
desertfire added a commit that referenced this pull request Feb 3, 2025
Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: chenyang78

Differential Revision: D69034578
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

1 similar comment
@facebook-github-bot
Copy link
Contributor
8000

This pull request was exported from Phabricator. Differential Revision: D69034578

@desertfire
Copy link
Contributor Author

Hmm, I am not sure why the new test fails on rocm.

@desertfire desertfire added the ciflow/inductor-rocm Trigger "inductor" config CI on ROCm label Feb 3, 2025
facebook-github-bot pushed a commit that referenced this pull request Feb 3, 2025
Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: frank-wei, shunting314, chenyang78

Differential Revision: D69034578
desertfire added a commit that referenced this pull request Feb 3, 2025
Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: frank-wei, shunting314, chenyang78

Differential Revision: D69034578
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

desertfire added a commit that referenced this pull request Feb 3, 2025
Summary:
Pull Request resolved: #146293

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: frank-wei, shunting314, chenyang78

Differential Revision: D69034578
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 3, 2025
facebook-github-bot pushed a commit that referenced this pull request Feb 3, 2025
Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: frank-wei, shunting314, chenyang78

Differential Revision: D69034578
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

desertfire added a commit that referenced this pull request Feb 4, 2025
Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: frank-wei, shunting314, chenyang78

Differential Revision: D69034578
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

Summary:

Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Reviewed By: frank-wei, shunting314, chenyang78

Differential Revision: D69034578
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69034578

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

desertfire added a commit that referenced this pull request Feb 5, 2025
Summary: To follow up #146293, add a JIT Inductor unit test. Other Triton template may need similar fixes.

ghstack-source-id: 407edac
Pull Request resolved: #146529
pytorch-bot bot pushed a commit that referenced this pull request Feb 6, 2025
Summary: Fixes a corner case in the Triton MM template, where the dimension M (dynamic size) can be smaller than BLOCK_M (similarly for the N dimenstion) can trigger unaligned memory access error.

Differential Revision: D69034578

Pull Request resolved: #146293
Approved by: https://github.com/chenyang78, https://github.com/jansel
desertfire added a commit that referenced this pull request Feb 6, 2025
…llowing #146293"

Summary: To follow up #146293, add a JIT Inductor unit test. Other Triton template may need similar fixes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Feb 6, 2025
Summary: To follow up #146293, add a JIT Inductor unit test. Other Triton template may need similar fixes.

ghstack-source-id: 6fdb192
Pull Request resolved: #146529
desertfire added a commit that referenced this pull request Feb 6, 2025
Summary: To follow up #146293, add a JIT Inductor unit test. Other Triton template may need similar fixes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Feb 6, 2025
Summary: To follow up #146293, add a JIT Inductor unit test. Other Triton template may need similar fixes.

Pull Request resolved: #146529
Approved by: https://github.com/eellison, https://github.com/shunting314
# The only difference between the two templates is M >= BLOCK_M and N >= BLOCK_N checking.
# See more details in https://github.com/pytorch/pytorch/pull/146293
else r"""
{{def_kernel("A", "B")}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jataylo would you look into this ?

Copy link
Collaborator
@jataylo jataylo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eellison Sure thing looks like its failing in the triton compilation stages. Would you mind creating an issue for this and assigning me?

Copy link
Collaborator
@AmdSampsa AmdSampsa Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried the unit test of this PR with latest pytorch/pytorch main and ROCm 6.3:

python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_linear_dynamic_maxautotune_cuda

MI300: PASS

MI200: FAIL : :0:rocdevice.cpp :3018: 5087874215430d us: Callback: Queue 0x7fdcb5400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 Aborted (core dumped)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a minimal reproducible for the error.

However, when we update from triton release/3.2.x to triton-3.2.0+git8759017a (aka TOT), the error goes away. :)

So for pytorch 2.7 - which requires TOT - this bug should disappear. The UT should start working once the pytorch 2.7 <-> TOT API compatibility problems are ironed out.

Please see here for more context.

Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Summary: To follow up #146293, add a JIT Inductor unit test. Other Triton template may need similar fixes.

Pull Request resolved: #146529
Approved by: https://github.com/eellison, https://github.com/shunting314
@github-actions github-actions bot deleted the export-D69034578 branch March 23, 2025 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0