xpu: support sycl with torch.utils.cpp_extension APIs #132945
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132945
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 70 Pending as of commit c43bf06 with merge base 1224765. UNSTABLE: the following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
FYI, here is the PR on the HF quanto side which is using this feature:
This commit implements an xpu extension with unpack kernels written in sycl. The PyTorch XPU backend provides hw acceleration on Intel GPUs. At the moment Meteor Lake (MTL) and Data Center Max (PVC) are supported. The provided sycl kernel was converted from the existing cuda kernel.

$ python bench/kernels/benchmark.py --it 1000
unpack_2bit[xpu]: python = 0.177 ms, ext = 0.033 ms, ratio = 5.4x
unpack_4bit[xpu]: python = 0.085 ms, ext = 0.026 ms, ratio = 3.3x

Note: without the extension the ratio is 0.8x.

At the moment there are a few features not yet implemented in the xpu backend which affect the implementation. These are:
* pytorch/pytorch#127929: some memory ops are not supported by the xpu backend. WA applied: calling these ops is commented out.
* pytorch/pytorch#131840: elapsed_time is not supported by XPUEvent. WA applied: calling these ops is commented out (CPU e2e time is measured).
* pytorch/pytorch#132947: some aten ops are not implemented for the xpu backend, falling back to cpu. WA required: set PYTORCH_ENABLE_XPU_FALLBACK=1 on the cmdline (see the sketch below).

Requires: pytorch/pytorch#132945
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
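For illustration, here is a minimal sketch of the fallback workaround from the last bullet above; the tensor op and shapes are illustrative and assume an XPU-enabled PyTorch build.

```python
# Minimal sketch: enable CPU fallback for aten ops not yet implemented on the
# XPU backend (the PYTORCH_ENABLE_XPU_FALLBACK=1 workaround). To be safe, set
# the variable before importing torch so the backend sees it at init time.
import os
os.environ.setdefault("PYTORCH_ENABLE_XPU_FALLBACK", "1")

import torch

if torch.xpu.is_available():
    x = torch.arange(16, dtype=torch.uint8, device="xpu")  # runs on the XPU
    y = x.bitwise_right_shift(2)  # falls back to CPU if unimplemented on XPU
    print(y.device)
```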
.sycl is not documented by the SYCL spec or the Intel SYCL compiler implementation. For now, I think it is not the proper time to deliver this usage to the community. We are following up on the feature with the compiler team. @EikanWang Please correct me. BTW, it is a good example to show the compiler team.
> .sycl is not documented by the SYCL spec or the Intel SYCL compiler implementation.

Actually, I used a documented feature to support files named with the .sycl extension: while this extension is not automatically recognized by the compiler, you can use the `-x <lang>` option to say what type of file is being compiled. I used `-x c++ file.sycl`.
$ icpx --help | grep "\-x "
  -x <language>           Treat subsequent input files as having type <language>
I agree that we should follow up with the dpc++ compiler team asking for automated support of the .sycl extension. I will file an issue for that tomorrow. But I believe we can proceed in the meanwhile with the approach I described above.
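To make the workaround concrete, here is a minimal sketch of such a compiler invocation driven from Python; the file names are illustrative, and icpx is assumed to be on PATH.

```python
# Minimal sketch: compile a .sycl file with icpx by forcing the input language
# via -x, since the compiler does not recognize the .sycl extension on its own.
# File names are illustrative; icpx is assumed to be on PATH.
import subprocess

subprocess.check_call([
    "icpx",
    "-fsycl",         # enable SYCL compilation mode
    "-x", "c++",      # treat the following input file as C++
    "my_kernel.sycl",
    "-c",
    "-o", "my_kernel.o",
])
```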
Filed intel/llvm#15015 with a request for the .sycl extension.
We'd prefer to leave the flexibility to the SYCL compiler community to provide the solution. If the SYCL compiler community decides to use a file extension to support this case, it is the freedom of the SYCL compiler community to decide what the file extension for SYCL source files should be.
Here is a summary of discussions with our compiler team and the compiler community. At the moment they oppose introducing a .sycl file extension into the compiler. They also encourage dealing with SYCL/C++ compilation differences at the build system level, using build-system-agreed custom file extensions or other methods to logically separate sources. This discussion needs to happen here for PyTorch. A similar discussion is ongoing around SYCL support in cmake.
Overall, for the PyTorch cpp_extension feature we have 2 options to proceed:
- Option 1. We agree on and introduce a custom file extension for SYCL sources in PyTorch. That's what this PR is currently doing. So, the proposal is to adopt `.sycl` as a file extension specific to the PyTorch ecosystem and to further influence other communities to align on that.
- Option 2. As an alternative, we can introduce another logical separation for SYCL sources. In particular we can:
  - Have a `sycl_sources = [ ... ]` variable to take sycl sources in `torch.utils.cpp_extension.load` (this will be a new addition, CUDA does not have that); `torch.utils.cpp_extension.load_inline` already has `cuda_sources` and this PR introduces `sycl_sources`
  - Have both `sources = [...]` and `sycl_sources = [ ... ]` variables on `class SyclExtension` (that will be a difference vs. how the CUDAExtension class is defined)

Currently the PR follows Option 1 (see the usage sketch below). Please let me know your opinions on the better option.
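For reference, here is a minimal sketch of what Option 1 looks like from the user side; the module and file names are illustrative.

```python
# Sketch of Option 1 as implemented by this PR: SYCL sources are listed in the
# regular ``sources`` list and recognized by their .sycl file extension.
from torch.utils.cpp_extension import load

module = load(
    name="my_xpu_ext",
    sources=["my_ext.cpp", "my_kernel.sycl"],  # .sycl files are built with icpx
    verbose=True,
)
```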
torch/utils/cpp_extension.py (outdated)
host_cxx = get_cxx_compiler()
host_cflags = cflags
# escaping quoted arguments to pass them thru icpx
host_cflags = [item.replace('\\"', '\\\\\\\\\\"') for item in host_cflags]
Filed intel/llvm#15016 to ease handling of escaped arguments.
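For the record, here is a small standalone sketch of what the escaping above does; the flag value is illustrative.

```python
# Standalone illustration of the quoting workaround above: each escaped quote
# (\") in a host flag gains extra backslashes so that it survives being
# forwarded through icpx to the host compiler. The flag value is illustrative.
cflags = ['-DTORCH_EXTENSION_NAME=\\"my_ext\\"']
host_cflags = [item.replace('\\"', '\\\\\\\\\\"') for item in cflags]
print(cflags[0])       # -DTORCH_EXTENSION_NAME=\"my_ext\"
print(host_cflags[0])  # -DTORCH_EXTENSION_NAME=\\\\\"my_ext\\\\\"
```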
Feature-wise, I think this PR duplicates #131276.
No, it does not. My PR adds support for the API not handled in #131276. Note that you did not handle this API in IPEX either. Please help review again.
It should be part of #131276, as #131276 is the initial support for Intel GPU.
@EikanWang: pytorch provides 3 APIs to build extensions. I think that of these APIs only 1 was actually implemented by IPEX (
Rebased and added support for
@EikanWang: following our offline discussion I've updated the PR to include the full story for
See 3 respective commits for each step. Note that I could not use #131276 as a base even for
Things to note:
I will add comments to #131276 and suggest helping review #132945 instead.
@pytorchbot merge -f "Lint is green, let's test in prod"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
I think the root cause is the change in the behavior of
Initially discussed here: pytorch#132945 (comment)

Previously `torch.xpu.get_arch_list()` got relaxed to work even if an XPU device is not available. However, we overlooked the case when pytorch is not compiled with XPU support. In such a case the function throws an exception. This commit adjusts this behavior and makes the function return `[]` even if pytorch is not compiled with XPU support.

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
This patch adds support for sycl kernels built via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and the (new) `class SyclExtension` APIs. Files having the `.sycl` extension are considered to contain sycl kernels and are compiled with `icpx` (the dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. The API supports building sycl along with other file types into a single extension.

Note that the `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with the compiler team to introduce such a file extension in the compiler, but they are opposed to this. At the same time, discussion around a sycl file extension and adding sycl language support into tools such as cmake is ongoing. Eventually cmake may also introduce some file extension convention for sycl. I hope we can further influence the cmake and compiler communities to adopt the `.sycl` file extension more broadly.

By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled, at the moment `pvc,xe-lpg`. This behavior can be overridden by setting the `TORCH_XPU_ARCH_LIST` environment variable to a comma-separated list of desired devices to compile for.

Fixes: #132944
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5
Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey
This reverts commit 6073799. Reverted #132945 on behalf of https://github.com/malfet due to It just broke all the tests, see https://hud.pytorch.org/hud/pytorch/pytorch/b16ae97ad03a6f376988e505fa23734523d0b4c5/1?per_page=50 ([comment](#132945 (comment)))
This patch adds support for sycl kernels built via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and the (new) `class SyclExtension` APIs. Files having the `.sycl` extension are considered to contain sycl kernels and are compiled with `icpx` (the dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. The API supports building sycl along with other file types into a single extension.

Note that the `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with the compiler team to introduce such a file extension in the compiler, but they are opposed to this. At the same time, discussion around a sycl file extension and adding sycl language support into tools such as cmake is ongoing. Eventually cmake may also introduce some file extension convention for sycl. I hope we can further influence the cmake and compiler communities to adopt the `.sycl` file extension more broadly.

By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled, at the moment `pvc,xe-lpg`. This behavior can be overridden by setting the `TORCH_XPU_ARCH_LIST` environment variable to a comma-separated list of desired devices to compile for.

Fixes: #132944
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5
Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Initially discussed here: #132945 (comment)

Previously `torch.xpu.get_arch_list()` got relaxed to work even if an XPU device is not available. However, we overlooked the case when pytorch is not compiled with XPU support. In such a case the function throws an exception. This commit adjusts this behavior and makes the function return `[]` even if pytorch is not compiled with XPU support.

CC: @EikanWang @fengyuan14 @guangyey @malfet @albanD
Pull Request resolved: #147431
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD
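A small sketch of the resulting behavior; the messages are illustrative.

```python
# Sketch of the relaxed behavior described above: after #147431,
# torch.xpu.get_arch_list() returns [] instead of raising when PyTorch is
# built without XPU support (or no XPU device is available).
import torch

arch_list = torch.xpu.get_arch_list()
if not arch_list:
    print("no XPU support (or no XPU device); arch list is empty")
else:
    print("native SYCL kernels were compiled for:", ", ".join(arch_list))
```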
Fix the error in test_sycl_queue.py: TypeError: ExtensionVersioner.bump_version_if_changed() missing 1 required positional argument: 'with_sycl'. PyTorch 2.7 changed the bump_version_if_changed API in torch/utils/_cpp_extension_versioner.py in pytorch/pytorch#132945.
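For downstream callers, here is a rough sketch of adapting to the changed signature; the parameter names other than with_sycl are assumed from the pre-2.7 API, so treat this as illustrative rather than authoritative.

```python
# Rough sketch of a caller updated for the PyTorch 2.7 signature change:
# bump_version_if_changed() now also requires ``with_sycl``. Parameter names
# besides with_sycl are assumed from the pre-2.7 API; values are illustrative.
from torch.utils._cpp_extension_versioner import ExtensionVersioner

versioner = ExtensionVersioner()
version = versioner.bump_version_if_changed(
    "my_ext",                 # extension name
    ["my_kernel.sycl"],       # source files
    build_arguments=[],
    build_directory="/tmp/my_ext_build",
    with_cuda=False,
    with_sycl=True,           # new argument added by this PR
    is_standalone=False,
)
```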
This commit implements an xpu extension with unpack kernels written in sycl. PyTorch supports XPU extensions on Linux starting from version 2.7.

$ python bench/kernels/benchmark.py --it 1000
unpack_2bit[xpu]: python = 0.177 ms, ext = 0.033 ms, ratio = 5.4x
unpack_4bit[xpu]: python = 0.085 ms, ext = 0.026 ms, ratio = 3.3x

Note: without the extension the ratio is 0.8x.

Requires: pytorch/pytorch#132945
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
This patch adds support for sycl kernels built via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and the (new) `class SyclExtension` APIs. Files having the `.sycl` extension are considered to contain sycl kernels and are compiled with `icpx` (the dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. The API supports building sycl along with other file types into a single extension.

Note that the `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with the compiler team to introduce such a file extension in the compiler, but they are opposed to this. At the same time, discussion around a sycl file extension and adding sycl language support into tools such as cmake is ongoing. Eventually cmake may also introduce some file extension convention for sycl. I hope we can further influence the cmake and compiler communities to adopt the `.sycl` file extension more broadly.

By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled, at the moment `pvc,xe-lpg`. This behavior can be overridden by setting the `TORCH_XPU_ARCH_LIST` environment variable to a comma-separated list of desired devices to compile for.

Fixes: #132944
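To make the new `class SyclExtension` API concrete, here is a hedged sketch of a setup.py following the familiar CUDAExtension pattern; the module name and source list are illustrative.

```python
# Sketch of building a SYCL extension ahead of time with the (new)
# SyclExtension class; module name and source files are illustrative.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, SyclExtension

setup(
    name="my_xpu_ext",
    ext_modules=[
        SyclExtension(
            "my_xpu_ext",
            ["my_ext.cpp", "my_kernel.sycl"],  # .sycl sources go through icpx
        ),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

Restricting the device list would then look like `TORCH_XPU_ARCH_LIST=pvc python setup.py build_ext`, per the override described above.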
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5