8000 Incorporate coalesce analysis in codegen by eellison · Pull Request #153751 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Incorporate coalesce analysis in codegen #153751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: gh/eellison/793/base
Choose a base branch
from

Conversation

eellison
Copy link
Contributor
@eellison eellison commented May 16, 2025

Stack from ghstack (oldest at bottom):

This pr uses the coalescing information in generating a tiling. The previous tiling heuristic would have each dependency generate a tiling. Then, we sum up the score for each generated tiling, preferring any 2d tiling over the default. The new tiling heuristics scores each tiling by its global coalesced memory. This gives both a potentially better tiling (especially for more complicated, 3d patterns) as well as information we can use in generating block sizes.

In triton heuristics, for generating 3d tiled reductions, we take the same total block size that the 2d reduction would use, then distribute the block according to whichever block coalesces the most memory.

The motivating kernel is in #149982 which is a 32 element reduction. A smaller version of it is here. We need to run this kernel once in the forward per linear layer on a contiguous tensor, and once in the backward on a transposed tensor.

While the contiguous kernel has coalesced accesses, and is performant on master, the transposed version accesses uncoalesced memory on main and is ~2.8x slower. See, this full log from the above repro. Now, with this PR, it is only ~1.15x slower. See the updated log.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
Copy link
pytorch-bot bot commented May 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153751

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 17 New Failures, 2 Unrelated Failures

As of commit 948321e with merge base 5d316ce (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eellison added a commit that referenced this pull request May 16, 2025
ghstack-source-id: 56a5c66
Pull Request resolved: #153751
Copy link
Contributor

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]
eellison added a commit that referenced this pull request May 16, 2025
ghstack-source-id: a0985f1
Pull Request resolved: #153751
[ghstack-poisoned]
eellison added a commit that referenced this pull request May 16, 2025
ghstack-source-id: b6a6026
Pull Request resolved: #153751
@eellison eellison requested review from jansel and blaine-rister and removed request for jansel and blaine-rister May 16, 2025 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant
0