-
Notifications
You must be signed in to change notification settings - Fork 24.3k
[Cutlass] Import cutlass python API for EVT #150344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150344
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 1164f59 with merge base c9aef50 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
can you insert a pdb.set_trace in the except clause, and then run
to see if that works? |
Also can you add a unit test to check for the import? This can help when we update the cutlass version |
This imports the pieces of the cutlass python API that are needed for python EVT tracing. It builds on existing importing for cutlass_library. Once EVT tracing has been added to cutlass_library (should be later this year) this can be removed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
umm I checkout your commit, and test with
and it didn't work. |
This imports the pieces of the cutlass python API that are needed for python EVT tracing. It builds on existing importing for cutlass_library. Once EVT tracing has been added to cutlass_library (should be later this year) this can be removed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This imports the pieces of the cutlass python API that are needed for python EVT tracing. It builds on existing importing for cutlass_library. Once EVT tracing has been added to cutlass_library (should be later this year) this can be removed. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
Previously merged: * pytorch#150344 Pull Request resolved: pytorch#150345 Approved by: https://github.com/henrylhtsang, https://github.com/eellison ghstack dependencies: pytorch#150344
Previously merged: * pytorch#150345 * pytorch#150344 Pull Request resolved: pytorch#150346 Approved by: https://github.com/henrylhtsang ghstack dependencies: pytorch#150344, pytorch#150345
…ter" This implements epilogue visitor tree argument generation (example type [here](https://github.com/NVIDIA/cutlass/blob/3fe62887d8dd75700fdaf57f9c181878701b0802/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp#L332)). Details: The codegen task here is to implement a function which can generate a tree of C++ structs and properly extract the correct properties from Inductor buffers and write them to the correct locations in the generated struct. To implement this with the minimum amount of code, I generate the cutlass DAGIR (the EVT internal represenation) which specifically has a pass, [pass_argument_type.py ](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L4) which generates a nested tree of custom argument types for each node in the DAGIR. This nested tree of constructors is then passed kwargs to fill in the proper values, where the node's name is used to differentiate between different values in the kwarg dictionary. This however is non-customizable; the nested tree of EVT args is a nested tree of ctypes which looks for *actual values* so that this object can be passed directly to the cutlass-python C++ runner. Inductor on the other hand needs to fill this struct with string C++ expressions representing the values (or extracting the values from kernel launcher args). So `_render_argument_type` implements this: it iterates over the tree of types created by pass_argument_type.py and generates a string representing the nested structs, filling in C++ expressions representing the different fields. Long term plan: Long term, I will ask the nvidia to provide an overridable [visitor_factory](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L82) which could allow us to override the behavior of pass_argument_type.py to generate the string we would like during DAGIR generation. Previously merged: * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This implements epilogue visitor tree argument generation (example type [here](https://github.com/NVIDIA/cutlass/blob/3fe62887d8dd75700fdaf57f9c181878701b0802/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp#L332)). Details: The codegen task here is to implement a function which can generate a tree of C++ structs and properly extract the correct properties from Inductor buffers and write them to the correct locations in the generated struct. To implement this with the minimum amount of code, I generate the cutlass DAGIR (the EVT internal represenation) which specifically has a pass, [pass_argument_type.py ](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L4) which generates a nested tree of custom argument types for each node in the DAGIR. This nested tree of constructors is then passed kwargs to fill in the proper values, where the node's name is used to differentiate between different values in the kwarg dictionary. This however is non-customizable; the nested tree of EVT args is a nested tree of ctypes which looks for *actual values* so that this object can be passed directly to the cutlass-python C++ runner. Inductor on the other hand needs to fill this struct with string C++ expressions representing the values (or extracting the values from kernel launcher args). So `_render_argument_type` implements this: it iterates over the tree of types created by pass_argument_type.py and generates a string representing the nested structs, filling in C++ expressions representing the different fields. Long term plan: Long term, I will ask the nvidia to provide an overridable [visitor_factory](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L82) which could allow us to override the behavior of pass_argument_type.py to generate the string we would like during DAGIR generation. Previously merged: * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This implements epilogue visitor tree argument generation (example type [here](https://github.com/NVIDIA/cutlass/blob/3fe62887d8dd75700fdaf57f9c181878701b0802/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp#L332)). Details: The codegen task here is to implement a function which can generate a tree of C++ structs and properly extract the correct properties from Inductor buffers and write them to the correct locations in the generated struct. To implement this with the minimum amount of code, I generate the cutlass DAGIR (the EVT internal represenation) which specifically has a pass, [pass_argument_type.py ](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L4) which generates a nested tree of custom argument types for each node in the DAGIR. This nested tree of constructors is then passed kwargs to fill in the proper values, where the node's name is used to differentiate between different values in the kwarg dictionary. This however is non-customizable; the nested tree of EVT args is a nested tree of ctypes which looks for *actual values* so that this object can be passed directly to the cutlass-python C++ runner. Inductor on the other hand needs to fill this struct with string C++ expressions representing the values (or extracting the values from kernel launcher args). So `_render_argument_type` implements this: it iterates over the tree of types created by pass_argument_type.py and generates a string representing the nested structs, filling in C++ expressions representing the different fields. Long term plan: Long term, I will ask the nvidia to provide an overridable [visitor_factory](https://github.com/NVIDIA/cutlass/blob/5e497243f7ad13a2aa842143f9b10bbb23d98292/python/cutlass/backend/evt/passes/pass_argument_type.py#L82) which could allow us to override the behavior of pass_argument_type.py to generate the string we would like during DAGIR generation. Previously merged: * #150346 * #150345 * #150344 Pull Request resolved: #150903 Approved by: https://github.com/henrylhtsang, https://github.com/eellison
…ation" This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ation" This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
ghstack-source-id: 309a8ff Pull Request resolved: pytorch/pytorch#150344
…ation" This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 Pull Request resolved: #150904 Approved by: https://github.com/eellison
…ation" This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * #150903 * #150346 * #150345 * #150344 Pull Request resolved: #150904 Approved by: https://github.com/eellison
This PR implements a translation layer from inductor IR to "example tensors" the expected arguments of the EVT tracer. These tensors basically store the name, shape, stride, and dtype of the tensor and allow an ast-based python parse to generate the EVT C++. udpates to example tensor creation Previously merged: * pytorch#150903 * pytorch#150346 * pytorch#150345 * pytorch#150344 Pull Request resolved: pytorch#150904 Approved by: https://github.com/eellison
…or python codegen" This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…or python codegen" This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…or python codegen" This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…or python codegen" This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…or python codegen" This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This PR implements the second codegen task of CUTLASS EVT: translating inductor epilogue nodes into python code that will be traced by the EVT infra. Details: The implementation uses a simple ops wrapper which only supports add and mul pointwise ops today (to be extended in the future). This ops wrapper generates python code from inner_fn of the epilogue nodes in the format EVT expects. The main caveat is that one of the outputs needs to be named "D" and the accumulator input needs to be named "acc". Reads/writes are named according to the inductor buffer names otherwise. Previously merged: * #150904 * #150903 * #150346 * #150345 * #150344 Pull Request resolved: #150905 Approved by: https://github.com/eellison ghstack dependencies: #152305, #152306
This imports the pieces of the cutlass python API that are needed for python EVT tracing. It builds on existing importing for cutlass_library. Once EVT tracing has been added to cutlass_library (should be later this year) this can be removed.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov