Description
Background
This RFC formally describes and tracks the enabling of SYCL support in the PyTorch CPP Extension API. The initial enabling of this feature was done in response to #132944. This RFC follows the guideline process suggested in #152134 for PyTorch 2.8.
Motivation
PyTorch defines a range of standard operators to support a variety of deep learning models. However, as the field evolves rapidly, new operators and optimization techniques keep emerging, and their implementations need time to mature before being accepted into a standard PyTorch distribution. To equip developers with a way to implement new operators that work within the PyTorch ecosystem, and to mitigate the delay between finalizing an implementation and getting it accepted into the PyTorch code base, PyTorch provides the CPP Extension API. At the moment PyTorch defines that API to work with C++, CUDA and Metal sources.
This RFC proposes extending the PyTorch CPP Extension API to allow building new operators for Intel GPU platforms by supporting the SYCL standard and compilers. SYCL is an open standard developed by the Khronos Group that allows developers to program heterogeneous architectures in standard C++. The Intel GPU software stack supports this standard and provides the DPC++ compiler to build SYCL code.
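To illustrate the intended developer experience, below is a minimal sketch of JIT-compiling a SYCL kernel through `torch.utils.cpp_extension.load_inline()`. The `sycl_sources` argument (mirroring the existing `cuda_sources` one) and the use of `c10::xpu::getCurrentXPUStream()` to obtain a `sycl::queue` are assumptions of this sketch, not the authoritative API; an Intel GPU and the DPC++ toolchain are expected to be available in the environment.

```python
# Minimal sketch, assuming a `sycl_sources` argument in load_inline() and the
# c10::xpu::getCurrentXPUStream() helper; requires an Intel GPU and DPC++.
import torch
from torch.utils.cpp_extension import load_inline

sycl_source = r"""
#include <torch/extension.h>
#include <c10/xpu/XPUStream.h>
#include <sycl/sycl.hpp>

at::Tensor sycl_add(at::Tensor a, at::Tensor b) {
  TORCH_CHECK(a.is_xpu() && b.is_xpu(), "inputs must be XPU tensors");
  auto out = at::empty_like(a);
  // Submit the kernel to the SYCL queue backing the current XPU stream.
  auto& queue = c10::xpu::getCurrentXPUStream().queue();
  const float* pa = a.data_ptr<float>();
  const float* pb = b.data_ptr<float>();
  float* po = out.data_ptr<float>();
  queue.parallel_for(sycl::range<1>(a.numel()), [=](sycl::id<1> i) {
    po[i] = pa[i] + pb[i];
  });
  return out;
}
"""

module = load_inline(
    name="sycl_add_ext",
    cpp_sources="at::Tensor sycl_add(at::Tensor a, at::Tensor b);",
    sycl_sources=sycl_source,  # assumed parameter, analogous to cuda_sources
    functions=["sycl_add"],
    verbose=True,
)

a = torch.randn(1024, device="xpu", dtype=torch.float32)
b = torch.randn(1024, device="xpu", dtype=torch.float32)
print(module.sycl_add(a, b))
```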
Plan
Here is a checklist of items to complete in order to add SYCL support to the PyTorch CPP Extension API for Intel GPUs:
Support for Linux:
[API-Unstable]
- Add SYCL support to the `torch.utils.cpp_extension.load()` API on Linux
- Add SYCL support to the `torch.utils.cpp_extension.load_inline()` API on Linux
- Support the new `torch.utils.cpp_extension.SyclExtension` class API on Linux (see the setup.py sketch at the end of this section)
- Docstring documentation
- Add CI tests for SYCL support in CPP Extensions
[API-Stable]
- Create (or extend existing) tutorial
- Identify and enable 2-3 third-party projects with SYCL kernels (and the PyTorch CPP Extension API)
- Enable Huggingface Quanto with SYCL kernels (and the PyTorch CPP Extension API)
- Clarify, document and test the SYCL support story with respect to SYCL runtime and compiler versions
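For the ahead-of-time build path, a setup.py using the new `SyclExtension` class might look like the sketch below. The package name, the source layout, and the use of a `.sycl` suffix to route kernel sources to the SYCL compiler are assumptions made for illustration, not confirmed conventions.

```python
# setup.py sketch; "my_sycl_ops" and the listed source files are hypothetical,
# and the .sycl suffix for SYCL kernel sources is an assumption of this sketch.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, SyclExtension

setup(
    name="my_sycl_ops",
    ext_modules=[
        SyclExtension(
            name="my_sycl_ops._C",
            sources=[
                "my_sycl_ops/bindings.cpp",     # pybind/TORCH_LIBRARY bindings
                "my_sycl_ops/add_kernel.sycl",  # SYCL kernel implementation
            ],
        ),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```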
Support for Windows:
[API-Unstable]
- Add SYCL support to the `torch.utils.cpp_extension.load()` API on Windows (see the load() sketch at the end of this section)
- Add SYCL support to the `torch.utils.cpp_extension.load_inline()` API on Windows
- Support the new `torch.utils.cpp_extension.SyclExtension` class API on Windows
- Docstring documentation
- Add CI tests for SYCL support in CPP Extensions
[API-Stable]
- Create (or extend existing) tutorial
- Identify and enable 2-3 other third-party projects with SYCL kernels (and the PyTorch CPP Extension API)
- Clarify, document and test the SYCL support story with respect to SYCL runtime and compiler versions
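For completeness, below is a sketch of the file-based JIT path via `torch.utils.cpp_extension.load()`, which is expected to work the same way on Linux and on Windows once enabled. The file names are hypothetical, and the `.sycl` suffix for SYCL sources is an assumption of this sketch.

```python
# JIT compilation of on-disk sources; file names are hypothetical and the .sycl
# suffix for SYCL kernel sources is an assumption of this sketch.
import torch
from torch.utils.cpp_extension import load

module = load(
    name="my_sycl_ops",
    sources=["bindings.cpp", "add_kernel.sycl"],
    verbose=True,
)

a = torch.randn(1024, device="xpu")
b = torch.randn(1024, device="xpu")
print(module.sycl_add(a, b))  # sycl_add is assumed to be bound in bindings.cpp
```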
Relevant PRs
Here is a list of relevant PRs improving SYCL support in the PyTorch CPP Extension API:
- xpu: torch.xpu.get_arch_list() to return [] if xpu not compiled #147431
- xpu: test py_limited_api with SyclExtension #147984
- doc/xpu: align description of SyclExtension with CPP/CUDA #147988
- xpu: update filter out of dg2 AOT target #148677
- [xpu] set aot device flags in cpp_extension #149459
- xpu: get xpu arch flags at runtime in cpp_extensions #152192
- xpu: rely on sycl/sycl.hpp to include bfloat16.hpp #152562
Relevant issues
- redefinition error when build PyTorch release/2.7 from source with oneapi in conda env and then activate oneapi on Windows intel/torch-xpu-ops#1503
- [cpp extension] Provide a clear error message when using inconsistent oneapi versions. intel/torch-xpu-ops#1649
- xpu: AOT compilation does not happen with sycl extension (JIT fallback happens) #156249
Release information
PyTorch 2.8
Proposed marketing/blog post text is below:
[API-Unstable] SYCL support in PyTorch CPP Extension API
This feature allows users to implement new high-performance custom operators for Intel GPU platforms as SYCL kernels accessible via the PyTorch XPU device backend. SYCL is an open standard developed by the Khronos Group that allows developers to program heterogeneous architectures in standard C++. At the moment the feature is available for Linux users.
CC: @EikanWang @guangyey