[Inductor] Adjust boundary checking of dimensions using YBLOCK #149504

kundaMwiza · 2025-03-19T11:46:50Z

Apply the same logic introduced in #139751 to Triton kernels that use block pointers. If ynumel / YBLOCK > get_max_y_grid(), dimensions that depend on YBLOCK must be boundary checked, even if the block shape in those dimensions is a multiple of an expression in YBLOCK. This is because (ynumel / YBLOCK) % get_max_y_grid() may be nonzero, in which case redundant programs are launched that attempt to read or write out of bounds (OOB).
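To make the overflow concrete, here is a minimal sketch in plain Python (no Triton involved). It assumes the launcher spreads an oversized y grid across a z grid using ceiling division; `split_y_grid`, `redundant_programs`, and `first_offset_of_last_program` are hypothetical names, and the real Inductor launch logic may differ in detail. The only point is that the number of launched programs can exceed the number of y blocks, even when ynumel is an exact multiple of YBLOCK.

```python
from typing import Tuple


def ceildiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def split_y_grid(ynumel: int, yblock: int, max_y_grid: int) -> Tuple[int, int, int]:
    """Return (y_blocks, y_grid, z_grid) under the ASSUMED splitting scheme."""
    y_blocks = ceildiv(ynumel, yblock)      # programs that are actually needed
    z_grid = ceildiv(y_blocks, max_y_grid)  # number of z slices to spill into
    y_grid = ceildiv(y_blocks, z_grid)      # programs per slice, <= max_y_grid
    return y_blocks, y_grid, z_grid


def redundant_programs(ynumel: int, yblock: int, max_y_grid: int) -> int:
    """Launched programs that have no valid block of work."""
    y_blocks, y_grid, z_grid = split_y_grid(ynumel, yblock, max_y_grid)
    return y_grid * z_grid - y_blocks


def first_offset_of_last_program(ynumel: int, yblock: int, max_y_grid: int) -> int:
    """First y index touched by the last launched program."""
    _, y_grid, z_grid = split_y_grid(ynumel, yblock, max_y_grid)
    return (y_grid * z_grid - 1) * yblock


# A deliberately tiny grid limit keeps the numbers readable.
ynumel, yblock, max_y_grid = 10, 2, 4
extra = redundant_programs(ynumel, yblock, max_y_grid)
last = first_offset_of_last_program(ynumel, yblock, max_y_grid)
print(extra, last, last >= ynumel)
```

With these numbers, 10 is divisible by 2 and only 5 programs are needed, but a 3 x 2 grid launches 6. The sixth program starts at offset 10, which is already past the end of the y range, so without a boundary check it would access memory out of bounds.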

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot · 2025-03-19T11:46:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149504

✅ You can merge normally! (1 Unrelated Failure)

As of commit 85c0178 with merge base 6c2c527:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4) (gh) (disabled by #139011 but the issue was closed recently and a rebase is needed to make it pass)
distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kundaMwiza · 2025-03-19T11:47:55Z

@pytorchbot label "topic: not user facing"

eellison · 2025-04-15T19:18:55Z

Sorry for the delay. cc @blaine-rister, who has added a lot of the block ptrs logic. Mind taking this one?

blaine-rister · 2025-04-22T22:52:06Z

Nit: Did you mean to leave "Fixes #ISSUE_NUMBER" in the PR description? You can delete that line if there's no issue to reference.

blaine-rister · 2025-04-22T22:57:38Z

test/inductor/test_torchinductor_strided_blocks.py

+            # if numel not a block multiple, must boundary check
+            # skip triton_cpu very slow test > 1000s
+            subtest([False, False], decorators=[test_torchinductor.skip_if_triton_cpu])
+            # TODO: test zdim too


Looks like you meant to add another test here.

blaine-rister · 2025-04-22T23:04:22Z

torch/_inductor/codegen/triton.py

+            # See Note: Constant mask optimisation
+            # if ynumel / YBLOCK > max_ygrid, then the z dimension is used to handle
+            # the remaining programs that cannot fit into the y dimension. This means
+            # its possible that some redundant programs are launched, so even if


nit: is redundancy the issue here, or is it that we could end up with out-of-bounds accesses?

It's the latter: OOB accesses due to additional programs being launched. I've edited the comment.

blaine-rister · 2025-04-22T23:17:31Z

torch/_inductor/codegen/triton.py

+        boundary_check = []
+        overflow_grid_check = None
+        for idx in range(len(self.shape)):
+            # See Note: Constant mask optimisation
+            # if ynumel / YBLOCK > max_ygrid, then the z dimension is used to handle
+            # the remaining programs that cannot fit into the y dimension. This means
+            # its possible that some redundant programs are launched, so even if
+            # ynumel divides YBLOCK, boundary checking is required in the relevant dimensions
+            if (
+                TritonSymbols.block_sizes[SymT.YBLOCK]
+                in self.block_shape[idx].free_symbols
+            ):
+                if overflow_grid_check is None:
+                    y_tree: IterationRangesRoot = next(
+                        t for t in range_trees if t.prefix == "y"
+                    )
+                    overflow_grid_check = (
+                        not y_tree.has_zdim
+                        and not V.graph.sizevars.statically_known_leq(
+                            y_tree.numel, get_max_y_grid()
+                        )
+                    )
+
            if (
                not sizevars.statically_known_equals(self.strides[idx], sympy.S.Zero)
-                and not sizevars.statically_known_multiple_of(
-                    self.shape[idx], self.block_shape[idx]
-                )
-                and not sizevars.statically_known_multiple_of(
-                    self.shape[idx], sympy_subs(self.block_shape[idx], block_to_max)
+                and (
+                    overflow_grid_check
+                    or (
+                        not sizevars.statically_known_multiple_of(
+                            self.shape[idx], self.block_shape[idx]
+                        )
+                        and not sizevars.statically_known_multiple_of(
+                            self.shape[idx],
+                            sympy_subs(self.block_shape[idx], block_to_max),
+                        )
+                    )


It looks like there's some existing logic to compute this. Since this is fairly complex, could we call the same helper here? Something like this should work.

    needs_overflow_grid = any(map(self.needs_yz_grid_overflow, self.range_trees))
    self._boundary_check = [
        idx
        for idx in range(len(self.shape))
        if (
            not sizevars.statically_known_equals(self.strides[idx], sympy.S.Zero)
            and not sizevars.statically_known_multiple_of(
                self.shape[idx], self.block_shape[idx]
            )
            and not sizevars.statically_known_multiple_of(
                self.shape[idx], sympy_subs(self.block_shape[idx], block_to_max)
            )
            and not (
                V.kernel.no_x_dim
                and self.block_shape[idx] == TritonSymbols.block_sizes[SymT.XBLOCK]
            )
            and not (
                needs_overflow_grid
                and self.block_shape[idx] != TritonSymbols.block_sizes[SymT.XBLOCK]
            )
        )
    ]

blaine-rister · 2025-04-22T23:26:52Z

torch/_inductor/codegen/triton.py

+                in self.block_shape[idx].free_symbols
+            ):
+                if overflow_grid_check is None:
+                    y_tree: IterationRangesRoot = next(


It seems like this is initialized to None, but then inside the loop, it becomes either True or False. Does the value depend on idx? Or would it be equivalent to compute this outside the loop? See the suggestion below.
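The question about hoisting can be illustrated with a toy example that is independent of Inductor. In this sketch, `lazy_in_loop` mirrors the None-then-bool pattern from the diff and `hoisted` computes the same flag once before the loop; both function names and the `compute_flag` callback are hypothetical stand-ins. Because the flag does not depend on idx, the two versions should select the same indices.

```python
from typing import Callable, List


def lazy_in_loop(uses_yblock: List[bool], compute_flag: Callable[[], bool]) -> List[int]:
    """Compute the flag on first use inside the loop (pattern from the diff)."""
    flag = None
    checked = []
    for idx, uses_y in enumerate(uses_yblock):
        if uses_y and flag is None:
            flag = compute_flag()  # evaluated at most once
        if uses_y and flag:
            checked.append(idx)
    return checked


def hoisted(uses_yblock: List[bool], compute_flag: Callable[[], bool]) -> List[int]:
    """Compute the loop-invariant flag once, before iterating."""
    flag = compute_flag() if any(uses_yblock) else False
    return [idx for idx, uses_y in enumerate(uses_yblock) if uses_y and flag]


dims = [False, True, True]
print(lazy_in_loop(dims, lambda: True), hoisted(dims, lambda: True))
```

The hoisted form keeps the one property the lazy form had, which is that the flag is never computed when no dimension uses YBLOCK, while making it obvious that the value is the same for every index.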

blaine-rister

This is a good fix and I think it's pretty close to being approved. A couple of things I'd like to see before approving:

- Resolving the TODO by adding a test for 3D tiling.
- I left a few comments about possibly cleaning up the boundary check logic.

kundaMwiza · 2025-04-28T17:26:21Z

torch/_inductor/codegen/triton.py

-                and not sizevars.statically_known_multiple_of(
-                    self.shape[idx], sympy_subs(self.block_shape[idx], block_to_max)
+                and (
+                    (


@blaine-rister I've updated the code here. If needs_overflow_grid is true and the current dimension is based on YBLOCK, there is no need to check divisibility of the input shape by the block shape: even if it is divisible, launching excess programs still reaches the OOB case.
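The decision described above can be sketched as a small predicate. This is a simplification under stated assumptions, not the code in torch/_inductor/codegen/triton.py: `needs_boundary_check` is a hypothetical helper, it takes concrete integers instead of symbolic expressions and `statically_known_*` queries, and it leaves out the `block_to_max` substitution and the no-x-dim case.

```python
def needs_boundary_check(
    stride: int,
    shape: int,
    block: int,
    uses_yblock: bool,
    needs_overflow_grid: bool,
) -> bool:
    """Decide whether one block-pointer dimension needs a boundary check.

    Hypothetical, integer-only model of the logic discussed in this thread.
    """
    if stride == 0:
        # Zero-stride dimensions are skipped, as in the diff.
        return False
    if needs_overflow_grid and uses_yblock:
        # Excess programs may be launched, so divisibility does not help.
        return True
    # Otherwise a check is needed only if the block does not tile the shape.
    return shape % block != 0


# Divisible shape, but the y grid overflows: the check is still required.
print(needs_boundary_check(stride=1, shape=10, block=2,
                           uses_yblock=True, needs_overflow_grid=True))
# Same shape without overflow: the check can be dropped.
print(needs_boundary_check(stride=1, shape=10, block=2,
                           uses_yblock=True, needs_overflow_grid=False))
```

Ordering the overflow test before the divisibility test reflects the point of the comment: once overflow is possible for a YBLOCK-based dimension, the divisibility result no longer matters.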

kundaMwiza · 2025-04-28T17:27:50Z

@blaine-rister Thank you for the review. I've updated the PR. Can you please have a look?

blaine-rister

LGTM! Thanks for the fix and very thorough testing.

kundaMwiza · 2025-05-15T13:35:36Z

@pytorchbot merge

pytorch-bot · 2025-05-15T13:35:41Z

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

kundaMwiza · 2025-05-21T20:58:34Z

@blaine-rister I think you need to approve the workflows / merge

kundaMwiza · 2025-06-20T14:37:11Z

@pytorchbot merge

pytorchmergebot · 2025-06-20T14:39:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorch-bot bot added the module: inductor label Mar 19, 2025

pytorch-bot bot added the topic: not user facing label Mar 19, 2025

pytorchbot added the open source label Mar 19, 2025

kundaMwiza force-pushed the mwizak/handle-ygrid-overflow-if-block-ptr branch 3 times, most recently from 5553e41 to f65cb36 March 20, 2025 10:59

bdhirsh requested review from eellison and shunting314 March 24, 2025 14:30

colesbury added the triaged label Mar 24, 2025

eellison requested a review from blaine-rister April 15, 2025 19:19

eellison removed their request for review April 22, 2025 19:25

blaine-rister reviewed Apr 22, 2025

test/inductor/test_torchinductor_strided_blocks.py (outdated)

blaine-rister reviewed Apr 22, 2025

test/inductor/test_torchinductor_strided_blocks.py (outdated)

blaine-rister reviewed Apr 22, 2025

test/inductor/test_torchinductor_strided_blocks.py (outdated)

blaine-rister reviewed Apr 22, 2025

torch/_inductor/codegen/triton.py (outdated)

blaine-rister reviewed Apr 22, 2025

blaine-rister requested changes Apr 22, 2025

kundaMwiza force-pushed the mwizak/handle-ygrid-overflow-if-block-ptr branch from 8b01e19 to 565b66a April 28, 2025 17:17

kundaMwiza commented Apr 28, 2025

kundaMwiza requested a review from blaine-rister April 28, 2025 17:27

blaine-rister approved these changes May 10, 2025

kundaMwiza and others added 5 commits May 21, 2025 20:55

Boundary check dimensions using YBLOCK if there is a chance of ygrid overflow

490580d

Add basic unit test for boundary checks. Add missing Note

b6e0dca

Use assert methods for consistency

339300c

Co-authored-by: blaine-rister <145300525+blaine-rister@users.noreply.github.com>

Address review comments

4c995d2

Comment on custom inductor choices class

85c0178

kundaMwiza force-pushed the mwizak/handle-ygrid-overflow-if-block-ptr branch from c639a7a to 85c0178 May 21, 2025 20:56

pytorch-bot bot added the ciflow/trunk label Jun 20, 2025

pytorchmergebot added the merging label Jun 20, 2025

pytorchmergebot added the Merged label Jun 20, 2025

pytorchmergebot closed this in e31f205 Jun 20, 2025

pytorchmergebot removed the merging label Jun 20, 2025
