You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[inductor][triton 3.3] Fix cpp_wrapper w/ TMA in triton 3.3 (#149973)
Fixes#148938
Context:
In triton 3.3, triton kernels expect a global scratch space arg to be passed in. This is fixed in #148051, which fixed most of the AOTI/cpp_wrapper failures; the fix is to inject a (null) global scratch space arg passed as an argument to all kernels.
But in the case of TMA, we need to call a non-triton-generated function - init1DTMADescriptor. The same `generate_args_decl` function used for calling triton kernels (and modified in #148051 to insert a global scratch space) is used to prepare the arguments to init1DTMADescriptor, and so it had an extra global scratch space arg. Then we'd get a null pointer passed into init1DTMADescriptor, resulting in an IMA later on when the TMA use kernel
This PR: adds an option to `generate_args_decl` to specify whether this is a triton kernel (in which case we should add the global scratch space arg) or not (when we shouldn't add the extra arg).
Note: this doesn't appear in CI because we don't run these tests with Hopper machines in CI.
Pull Request resolved: #149973
Approved by: https://github.com/drisspg
0 commit comments