inductor change needed to update triton pin by shunting314 · Pull Request #107722 · pytorch/pytorch

inductor change needed to update triton pin #107722


Closed · wants to merge 5 commits

Conversation

@shunting314 (Contributor) commented Aug 22, 2023

@pytorch-bot (bot) commented Aug 22, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107722

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit f862382 with merge base 138e289:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: releng release notes category label Aug 22, 2023
@github-actions github-actions bot added ciflow/trunk Trigger trunk jobs on your pull request module: inductor ciflow/inductor labels Aug 22, 2023
shunting314 added a commit that referenced this pull request Aug 22, 2023
ghstack-source-id: 4972d83
Pull Request resolved: #107722
@shunting314 (Contributor, Author) commented:

The failed test:

python test/inductor/test_pattern_matcher.py -k test_mixed_mm

seems to be related to this upgrade. It triggers an error in triton's C++ code:

UNREACHABLE executed at /home/shunting/ws/triton/lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp:253!

@shunting314 (Contributor, Author) commented:

Cut a triton issue for the failure in mixed mm: triton-lang/triton#2156

@shunting314 (Contributor, Author) commented:

With the new triton pin,

python test/test_sparse_csr.py -k test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16

starts to fail with error:

RuntimeError: Triton Error [CUDA]: misaligned address

I guess it may be because the test uses sparse tensors and triton may have changed its alignment requirements.

@cpuhrsch I see the test was added by #102095. Do you think this is a blocking test failure?

@cpuhrsch (Contributor) commented:

@shunting314 - Given that it used to work and now fails with the moved pin, I'd consider this a blocking failure. It'd be good to figure out why this is breaking with the new version of Triton. cc @amjames @pearu

@shunting314 (Contributor, Author) commented:

> @shunting314 - Given that it used to work and now fails with the moved pin, I'd consider this a blocking failure. It'd be good to figure out why this is breaking with the new version of Triton. cc @amjames @pearu

By any chance does the test pass a view to triton, resulting in an unaligned address?
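
As an aside, a minimal sketch of how a sliced view can produce a misaligned pointer (the tensor names and sizes are made up for illustration, not taken from the test):

import torch

# A CUDA allocation starts at a well-aligned base address, but slicing
# returns a view whose data pointer is offset into that allocation.
base = torch.randn(64, device="cuda")   # base pointer: 16-byte aligned
view = base[1:]                         # offset by one float32 = 4 bytes
print(base.data_ptr() % 16)             # 0 -> aligned
print(view.data_ptr() % 16)             # 4 -> not 16-byte aligned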

@jansel (Contributor) left a comment:

failing tests?

@shunting314 (Contributor, Author) commented:

> failing tests?

There are 2 failed tests, mentioned above:

  1. test_mixed_mm: this should be a triton bug and I've put a standalone triton repro here: type conversion before tl.dot fails compilation triton-lang/triton#2156. I can dig further, but it may be much faster if the triton team can take a look. A sketch of the failing pattern follows this list.
  2. test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16

These are the only broken tests in CI.
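
For reference, a minimal sketch of the pattern behind failure 1 (an illustrative kernel, not the exact repro attached to triton-lang/triton#2156; names and shapes are made up):

import torch
import triton
import triton.language as tl

@triton.jit
def mixed_mm_kernel(a_ptr, b_ptr, c_ptr,
                    M: tl.constexpr, N: tl.constexpr, K: tl.constexpr):
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    offs_k = tl.arange(0, K)
    a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])  # fp16 operand
    b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])  # int8 operand
    # The type conversion right before tl.dot is the pattern that hit the
    # UNREACHABLE in ElementwiseOpToLLVM.cpp on the new pin.
    acc = tl.dot(a, b.to(tl.float16))
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc.to(tl.float16))

a = torch.randn(16, 16, device="cuda", dtype=torch.float16)
b = torch.randint(-128, 127, (16, 16), device="cuda", dtype=torch.int8)
c = torch.empty(16, 16, device="cuda", dtype=torch.float16)
mixed_mm_kernel[(1,)](a, b, c, M=16, N=16, K=16)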

shunting314 added a commit that referenced this pull request Aug 23, 2023
ghstack-source-id: 17cad06
Pull Request resolved: #107722
@shunting314 (Contributor, Author) commented:

For the test failure in test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16, I've found the offending kernel is _bsr_softmax_kernel in torch/sparse/_triton_ops.py. @cpuhrsch @amjames @pearu, can you help create a standalone script calling that kernel with random inputs? If it's a triton problem, we can cut a triton issue with your repro.

@cpuhrsch (Contributor) left a comment:

If this update breaks a kernel that worked previously, why is it ok to land it?

@cpuhrsch (Contributor) commented:

Regardless, we can create a standalone version of the kernel to more easily reproduce this error.

@shunting314 (Contributor, Author) commented Aug 24, 2023

There are 2 more test failures because the testing environment uses an old version of triton: https://github.com/pytorch/pytorch/actions/runs/5947833455/job/16130966172 , https://github.com/pytorch/pytorch/actions/runs/5947833455/job/16130966318 . If we want, we can still make inductor work with older versions of triton at the cost of slightly more complex code. I'm just not sure whether we should do that, or instead upgrade the triton version in those cases.

EDIT: I partially fixed the BC issue by checking whether CompiledKernel has a num_ctas attribute; see the sketch below. To fully fix it, we also need to check whether triton expects the new definition of instance_descriptor.
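
A minimal sketch of that partial fix (the helper name and placement are my illustration, not this PR's exact code; binary is a triton CompiledKernel instance):

def launcher_extra_args(binary):
    # Newer triton attaches num_ctas to compiled kernels, and the launcher
    # must pass it through; probing the attribute (instead of parsing
    # version strings) keeps the code working against older pins.
    if hasattr(binary, "num_ctas"):
        return (binary.num_ctas,)  # new triton: forward the extra field
    return ()                      # old triton: nothing extra to forward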

shunting314 added a commit that referenced this pull request Aug 24, 2023
ghstack-source-id: e8e4078
Pull Request resolved: #107722
@shunting314 (Contributor, Author) commented:

> For the test failure in test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16, I've found the offending kernel is _bsr_softmax_kernel in torch/sparse/_triton_ops.py.

I took a further look; the root cause is actually not _bsr_softmax_kernel but an upstream kernel, _sampled_addmm_kernel, which 'corrupts' the input tensors and then causes issues in the downstream _bsr_softmax_kernel.

To repro, run

python test/test_sparse_csr.py -k test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16

with the following breakpoints set:

diff --git a/torch/sparse/_triton_ops.py b/torch/sparse/_triton_ops.py
index 57c9ac0168a..80cf3ff6e0f 100644
--- a/torch/sparse/_triton_ops.py
+++ b/torch/sparse/_triton_ops.py
@@ -539,6 +539,7 @@ if _has_triton():
             allow_tf32 = False

         def kernel(grid, *sliced_tensors):
+            breakpoint() # TODO
             _sampled_addmm_kernel[grid](
                 alpha, beta, is_beta_zero,
                 *blocksize, k, tile_k,
@@ -548,6 +549,7 @@ if _has_triton():
                 num_stages=1,
                 num_warps=4
             )
+            breakpoint() # TODO

         launch_kernel(kernel, tensor_dims_map, full_grid, grid_blocks)

At the first breakpoint we are able to print sliced_tensors, but at the second breakpoint printing the same tensors (which have been corrupted) results in:

(Pdb) sliced_tensors
*** RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

@shunting314 (Contributor, Author) commented Aug 25, 2023

The perf test looks mostly neutral (link), although:

  • torchbench default slows down from 1.19x to 1.16x, while torchbench with cudagraphs is neutral.
  • 4 torchbench models hit network issues and fail to run.

I'll rerun the perf tests.

Edit:
New perf test link

  • Same conclusion as above, except one more timm model passes. It failed previously due to 'two_eager_run_differ'. I think that's unrelated to the upgrade (even though it's a nice thing to see) and just flakiness.

shunting314 added a commit that referenced this pull request Aug 28, 2023
ghstack-source-id: 5d421f7
Pull Request resolved: #107722
@shunting314 (Contributor, Author) commented Aug 28, 2023

Split the pin update into a separate PR per @shintaro-iwasaki's request, to make FBCode-side testing easier.

@shunting314 shunting314 mentioned this pull request Aug 28, 2023
@@ -49,8 +49,25 @@ def is_aligned(x):
return V.graph.sizevars.statically_known_multiple_of(x.expr, ALIGNMENT)
raise NotImplementedError(f"unhandled {type(x)}: {x}")

def is_aligned_8(x):
Contributor:

could this share more code with is_aligned?

Contributor Author:

Yeah, I'll do that in a follow-up PR so this one can be cherry-picked earlier. One possible shape for the shared helper is sketched below.
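
(Not the actual follow-up, just an illustration of the idea; _is_aligned_to is a hypothetical name, and V and ALIGNMENT come from the surrounding inductor module. Only the branch visible in the hunk above is reproduced:)

def _is_aligned_to(x, alignment):
    # Shared core: the dispatch from is_aligned, with the alignment
    # constant lifted into a parameter.
    return V.graph.sizevars.statically_known_multiple_of(x.expr, alignment)

def is_aligned(x):
    return _is_aligned_to(x, ALIGNMENT)

def is_aligned_8(x):
    return _is_aligned_to(x, 8)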

@shunting314 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

shunting314 added a commit that referenced this pull request Aug 29, 2023
ghstack-source-id: 5d421f7
Pull Request resolved: #107722
@shunting314 changed the title from "update triton pin with needed inductor change" to "inductor change needed to update triton pin" Aug 29, 2023
n = max(next_power_of_2(V.graph.sizevars.size_hint(n)), 16)
k = max(next_power_of_2(V.graph.sizevars.size_hint(k)), 16)

# According to https://github.com/openai/triton/issues/2156#issuecomment-1695897424
Contributor:

BTW, it could be 16x32 if you want to try to improve perf a bit :).

Contributor Author:

So that means m can be 16 but n and k have to be at least 32 for int8, since we have tl.tensors with shapes [m, k], [k, n], and [m, n] in the triton kernel?

Contributor:

> So that means m can be 16 but n and k have to be at least 32 for int8?

k must be >= 32, but n and m can be >= 16 if both a and b in a×b are not transposed. A sketch of the resulting padding rule follows.
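
(Summarizing that constraint as code; pad_mm_block_dims is a hypothetical helper, not the exact change that landed:)

from triton import next_power_of_2

def pad_mm_block_dims(m: int, n: int, k: int, is_int8: bool):
    # Per the discussion above: tl.dot needs m, n >= 16, and for int8
    # operands (with neither a nor b transposed) k must be >= 32.
    m = max(next_power_of_2(m), 16)
    n = max(next_power_of_2(n), 16)
    k = max(next_power_of_2(k), 32 if is_int8 else 16)
    return m, n, k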

shunting314 added a commit that referenced this pull request Aug 29, 2023
Resolve comment: #107722 (comment)  


pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2023
Resolve comment: #107722 (comment)

Pull Request resolved: #108135
Approved by: https://github.com/jansel
ghstack dependencies: #107722
@facebook-github-bot facebook-github-bot deleted the gh/shunting314/74/head branch September 1, 2023 14:24
atalman pushed a commit that referenced this pull request Sep 5, 2023
ghstack-source-id: 5d421f7
Pull Request resolved: #107722
pytorchmergebot pushed a commit that referenced this pull request Sep 8, 2023
Pull Request resolved: #108104
Approved by: https://github.com/desertfire
ghstack dependencies: #107722
chuanqi129 added a commit to chuanqi129/pytorch that referenced this pull request Sep 14, 2023
Labels

ciflow/inductor · ciflow/trunk (Trigger trunk jobs on your pull request) · Merged · module: inductor · release notes: releng (release notes category)