[Cutlass] Import cutlass python API for EVT by mlazos · Pull Request #150344 · pytorch/pytorch
Closed
mlazos wants to merge 9 commits

Conversation

pytorch-bot bot commented Mar 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150344

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1164f59 with merge base c9aef50 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@henrylhtsang
Contributor

Can you insert a `pdb.set_trace()` in the except clause and then run

python -x test/inductor/test_cutlass_backend.py

to see if that works?

@henrylhtsang
Contributor

Also, can you add a unit test to check for the import? This will help when we update the cutlass version.

mlazos added a commit that referenced this pull request Apr 5, 2025
ghstack-source-id: 7d6b727
Pull Request resolved: #150344
mlazos added a commit that referenced this pull request Apr 9, 2025
ghstack-source-id: 7d6b727
Pull Request resolved: #150344
This imports the pieces of the cutlass Python API that are needed for Python EVT tracing. It builds on the existing import of cutlass_library. Once EVT tracing has been added to cutlass_library (expected later this year), this can be removed.




cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
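
A minimal sketch of the kind of guarded import described above, plus the sort of check a unit test could make. The helper name `try_import_from`, its signature, and the directory argument are hypothetical and are not taken from the PR; the real Inductor helper may be structured differently.

```python
import importlib
import sys
from pathlib import Path


def try_import_from(search_dir: str, module_names: list[str]) -> bool:
    """Return True if every module imports once search_dir is on sys.path.

    Hypothetical helper for illustration only; not the PR's actual code.
    """
    path = str(Path(search_dir).resolve())
    added = path not in sys.path
    if added:
        sys.path.insert(0, path)
    # Make sure freshly created files are visible to the import system.
    importlib.invalidate_caches()
    try:
        for name in module_names:
            importlib.import_module(name)
        return True
    except ImportError:
        # Undo the sys.path change so a failed attempt leaves no trace.
        if added:
            sys.path.remove(path)
        return False
```

A unit test along the lines requested in the review could simply assert that the helper returns `True` for the expected module names, so that a cutlass version bump which moves or renames a module fails loudly.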
@henrylhtsang
Contributor

Hmm, I checked out your commit and tested with

python test/inductor/test_cutlass_backend.py -k test_import_cutlass 

and it didn't work.

mlazos added a commit that referenced this pull request Apr 9, 2025
ghstack-source-id: f0f327a
Pull Request resolved: #150344
mlazos added a commit that referenced this pull request Apr 9, 2025
ghstack-source-id: c723dc8
Pull Request resolved: #150344
mlazos added a commit that referenced this pull request Apr 10, 2025
ghstack-source-id: 780a382
Pull Request resolved: #150344
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
mlazos added a commit that referenced this pull request Apr 17, 2025
…ter"

This implements epilogue visitor tree argument generation (example type [here](https://github.com/NVIDIA/cutlass/blob/3fe62887d8dd75700fdaf57f9c181878701b0802/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp#L332)).

Details:
The codegen task here is to implement a function that generates a tree of C++ structs, extracts the correct properties from Inductor buffers, and writes them to the correct locations in the generated struct. To implement this with the minimum amount of code, I generate the cutlass DAGIR (the EVT internal representation), which has a pass, [pass_argument_type.py](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L4), that generates a nested tree of custom argument types for each node in the DAGIR. This nested tree of constructors is then passed kwargs to fill in the proper values, where the node's name is used to differentiate between values in the kwarg dictionary. This, however, is not customizable: the nested tree of EVT args is a nested tree of ctypes that expects *actual values*, so that the object can be passed directly to the cutlass-python C++ runner. Inductor, on the other hand, needs to fill this struct with string C++ expressions representing the values (or extracting the values from kernel launcher args). `_render_argument_type` implements this: it iterates over the tree of types created by pass_argument_type.py and generates a string representing the nested structs, filling in C++ expressions for the different fields.
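
To illustrate the idea (not the PR's actual `_render_argument_type`), here is a small sketch that walks a nested tree of `ctypes.Structure` types and renders a C++ brace initializer, substituting a string expression for each leaf field. The struct classes, field names, and the `render_struct` helper are all hypothetical stand-ins for the types that pass_argument_type.py would produce.

```python
import ctypes
from typing import Callable


# Hypothetical stand-ins for the per-node argument types in the DAGIR.
class AuxLoadArgs(ctypes.Structure):
    _fields_ = [("ptr_aux", ctypes.c_void_p), ("null_default", ctypes.c_float)]


class ComputeArgs(ctypes.Structure):
    _fields_ = []  # compute nodes often carry no runtime arguments


class TreeArgs(ctypes.Structure):
    _fields_ = [("aux", AuxLoadArgs), ("compute", ComputeArgs)]


def render_struct(struct_type, leaf_expr: Callable[[str], str]) -> str:
    """Render struct_type as a nested C++ initializer string.

    Nested Structure fields recurse; every other field is replaced by the
    C++ expression that leaf_expr returns for the field's name.
    """
    parts = []
    for field_name, field_type in struct_type._fields_:
        if issubclass(field_type, ctypes.Structure):
            value = render_struct(field_type, leaf_expr)
        else:
            value = leaf_expr(field_name)
        parts.append(f"/* {field_name} */ {value}")
    return "{" + ", ".join(parts) + "}"


# Map leaf field names to C++ expressions (assumed names, for illustration).
exprs = {"ptr_aux": "(float*) aux_ptr", "null_default": "0.0f"}
rendered = render_struct(TreeArgs, exprs.__getitem__)
```

The point of the sketch is that the type tree is only inspected, never instantiated, which is what lets string expressions take the place of the actual values the ctypes constructors would otherwise require.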

Long term plan:
Long term, I will ask NVIDIA to provide an overridable [visitor_factory](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L82), which could allow us to override the behavior of pass_argument_type.py and generate the string we want during DAGIR generation.



Previously merged:
* #150346
* #150345
* #150344

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Apr 17, 2025
Pull Request resolved: #150903
Approved by: https://github.com/henrylhtsang, https://github.com/eellison
mlazos added a commit that referenced this pull request Apr 17, 2025
…ation"

This PR implements a translation layer from Inductor IR to "example tensors", the arguments the EVT tracer expects. These tensors store the name, shape, stride, and dtype of each tensor and allow an AST-based Python parser to generate the EVT C++.
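
As a rough sketch of what such an "example tensor" record could look like, the snippet below stores only the four pieces of metadata mentioned above. The `ExampleTensor` class, the `make_example_tensors` helper, and the assumption of contiguous row-major strides are all illustrative; real Inductor buffers carry their own layout, and the class the EVT tracer actually expects may differ.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides for shape (a simplifying assumption in this sketch)."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


@dataclass(frozen=True)
class ExampleTensor:
    """Metadata-only tensor description: no storage is allocated."""
    name: str
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    dtype: str


def make_example_tensors(
    buffers: Dict[str, Tuple[Sequence[int], str]]
) -> Dict[str, ExampleTensor]:
    """Translate {name: (shape, dtype)} descriptions into ExampleTensor records."""
    return {
        name: ExampleTensor(name, tuple(shape), contiguous_strides(shape), dtype)
        for name, (shape, dtype) in buffers.items()
    }


examples = make_example_tensors({"acc": ((4, 8), "float32"), "buf1": ((8,), "float32")})
```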


Updates to example tensor creation

Previously merged:
* #150903
* #150346
* #150345
* #150344

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
pytorchmergebot pushed a commit that referenced this pull request Apr 23, 2025
Pull Request resolved: #150904
Approved by: https://github.com/eellison
pytorchmergebot pushed a commit that referenced this pull request Apr 25, 2025
Pull Request resolved: #150904
Approved by: https://github.com/eellison
rec pushed a commit to rec/pytorch that referenced this pull request Apr 25, 2025
Pull Request resolved: pytorch#150904
Approved by: https://github.com/eellison
mlazos added a commit that referenced this pull request Apr 28, 2025
…or python codegen"

This PR implements the second codegen task of CUTLASS EVT: translating Inductor epilogue nodes into Python code that will be traced by the EVT infra.

Details:
The implementation uses a simple ops wrapper that currently supports only the add and mul pointwise ops (to be extended in the future). This ops wrapper generates Python code from the inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs must be named "D" and the accumulator input must be named "acc". Otherwise, reads and writes are named after the Inductor buffer names.
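
A toy version of this approach, under stated assumptions: the `PythonOpsEmitter` class and `render_epilogue` function below are hypothetical, the `inner_fn` here is a plain callable that receives the ops object, and only `add` and `mul` are handled. The sketch shows how calling ops methods can record lines of Python source instead of computing values, with the final result bound to `D` and the accumulator passed in as `acc`.

```python
from typing import Callable, List


class PythonOpsEmitter:
    """Records one line of Python source per op instead of computing values."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self._count = 0

    def _emit(self, expr: str) -> str:
        name = f"tmp_{self._count}"
        self._count += 1
        self.lines.append(f"{name} = {expr}")
        return name

    def add(self, a: str, b: str) -> str:
        return self._emit(f"{a} + {b}")

    def mul(self, a: str, b: str) -> str:
        return self._emit(f"{a} * {b}")


def render_epilogue(
    inner_fn: Callable[[PythonOpsEmitter], str],
    input_names: List[str],
    fn_name: str = "epilogue_fn",
) -> str:
    """Run inner_fn against the emitter and wrap the recorded lines in a def."""
    ops = PythonOpsEmitter()
    result = inner_fn(ops)
    # The final output is bound to "D", following the naming caveat above.
    body = ops.lines + [f"D = {result}", "return D"]
    header = f"def {fn_name}({', '.join(input_names)}):"
    return "\n".join([header] + ["    " + line for line in body])


# (acc + buf1) * buf2, with buffer names standing in for Inductor buffers.
source = render_epilogue(
    lambda ops: ops.mul(ops.add("acc", "buf1"), "buf2"),
    ["acc", "buf1", "buf2"],
)
```

Emitting source text rather than evaluating eagerly is what allows a downstream tracer to parse the function body; the generated text can also be `exec`'d for a quick sanity check.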


Previously merged:
* #150904
* #150903
* #150346
* #150345
* #150344

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2025
Pull Request resolved: #150905
Approved by: https://github.com/eellison
ghstack dependencies: #152305, #152306
@github-actions github-actions bot deleted the gh/mlazos/18/head branch May 25, 2025 02:22
4 participants