xpu: support sycl with torch.utils.cpp_extension APIs by dvrogozh · Pull Request #132945 · pytorch/pytorch · GitHub

xpu: support sycl with torch.utils.cpp_extension APIs #132945


Closed
wants to merge 13 commits into from

Conversation

dvrogozh
Contributor
@dvrogozh dvrogozh commented Aug 7, 2024

This patch adds support for SYCL kernels built via torch.utils.cpp_extension.load, torch.utils.cpp_extension.load_inline and the (new) class SyclExtension APIs. Files having the .sycl extension are considered to contain SYCL kernels and are compiled with icpx (Intel's DPC++ SYCL compiler). Files with other extensions, .cpp, .cu, are handled as before. The API supports building SYCL sources along with other file types into a single extension.

Note that the .sycl file extension is a PyTorch convention for files containing SYCL code which I propose to adopt. We did follow up with the compiler team to introduce such a file extension in the compiler, but they are opposed to this. At the same time, discussion around a SYCL file extension and adding SYCL language support to tools such as cmake is ongoing. Eventually cmake may also introduce some file extension convention for SYCL. I hope we can further influence the cmake and compiler communities to adopt the .sycl file extension more broadly.

By default, SYCL kernels are compiled for all Intel GPU devices for which pytorch's native aten SYCL kernels are compiled (at the moment pvc,xe-lpg). This behavior can be overridden by setting the TORCH_XPU_ARCH_LIST environment variable to a comma-separated list of desired devices to compile for.
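A hypothetical usage sketch of the API described above (module, file, and device names are made up for illustration; the torch import is done lazily so the sketch stands on its own):

```python
import os

def build_my_extension():
    # Lazy import so the sketch can be defined without torch installed.
    from torch.utils.cpp_extension import load

    # Optional: restrict target devices (default is the arch list the
    # native aten SYCL kernels were built for, e.g. "pvc,xe-lpg").
    os.environ.setdefault("TORCH_XPU_ARCH_LIST", "pvc")

    # .sycl files are compiled with icpx; .cpp files as before.
    return load(
        name="my_sycl_ext",
        sources=["bindings.cpp", "kernel.sycl"],
        verbose=True,
    )
```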

Fixes: #132944

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5


pytorch-bot bot commented Aug 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132945

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 70 Pending

As of commit c43bf06 with merge base 1224765:
💚 Looks good so far! There are no failures yet. 💚


This comment was automatically generated by Dr. CI and updates every 15 minutes.

dvrogozh added a commit to dvrogozh/optimum-quanto that referenced this pull request Aug 7, 2024
This commit implements xpu extension with unpack kernels written
in sycl. Pytorch XPU backend provides hw acceleration on Intel
GPUs. At the moment Meteor Lake (MTL) and Data Center Max (PVC)
are supported. Provided sycl kernel was converted from existing
cuda kernel.

$ python bench/kernels/benchmark.py --it 1000
unpack_2bit[xpu]: python = 0.177 ms, ext = 0.033 ms, ratio = 5.4x
unpack_4bit[xpu]: python = 0.085 ms, ext = 0.026 ms, ratio = 3.3x

note: without extension ratio is 0.8x.

At the moment there are a few unimplemented features in the xpu
backend which affect the implementation. These are:
* pytorch/pytorch#127929
  * Some memory ops not supported by xpu backend
  * WA applied: calling these ops is commented out
* pytorch/pytorch#131840
  * elapsed_time is not supported by XPUEvent
  * WA applied: calling these ops is commented out (CPU e2e time
    is measured)
* pytorch/pytorch#132947
  * Some aten ops are not implemented with xpu backend falling back to cpu
  * WA required: set PYTORCH_ENABLE_XPU_FALLBACK=1 on cmdline

Requires: pytorch/pytorch#132945
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@dvrogozh
Contributor Author
dvrogozh commented Aug 7, 2024

FYI, here is the PR on the HF quanto side which uses this feature:

Collaborator


.sycl is not documented by SYCL spec and Intel SYCL compiler implementation. For now, I think it is not the proper time to deliver this usage to the community. We are following up on the feature with the compiler team. @EikanWang please correct me. BTW, it is a good example to show the compiler team.

Contributor Author


.sycl is not documented by SYCL spec and Intel SYCL compiler implementation.

Actually, I used a documented feature to support files named with the .sycl extension: while this extension is not automatically recognized by the compiler, you can use the -x <lang> option to specify the type of the file being compiled. I used `-x c++ file.sycl`.

$ icpx --help | grep "\-x "
  -x <language>           Treat subsequent input files as having type <language>

I agree that we should follow up with the dpc++ compiler team asking for automatic support of the .sycl extension. I will file an issue for that tomorrow. But I believe we can proceed in the meantime with the approach described above.
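For illustration, the way a build system can hand a .sycl file to icpx despite the unrecognized extension can be sketched as an argument list (flags shown are illustrative, not the exact set cpp_extension passes):

```python
def sycl_compile_command(src: str, obj: str) -> list[str]:
    # `-x c++` tells icpx to treat the .sycl file as C++/SYCL source;
    # `-fsycl` enables SYCL compilation mode.
    return ["icpx", "-fsycl", "-x", "c++", "-c", src, "-o", obj]

cmd = sycl_compile_command("kernel.sycl", "kernel.o")
```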

Contributor Author


Filed intel/llvm#15015 with request for .sycl extension.

Collaborator


We'd prefer to leave the flexibility to the SYCL compiler community to provide the solution. If the SYCL compiler community decides to use a file extension to support this case, it is the freedom of the SYCL compiler community to decide what the file extension for SYCL source files should be.

Contributor Author


Here is a summary of discussions with our compiler team and the compiler community. At the moment they oppose introducing a .sycl file extension into the compiler. They also encourage dealing with SYCL/C++ compilation differences at the build system level, using build-system-agreed custom file extensions or other methods to logically separate sources. This discussion needs to happen here for PyTorch. A similar discussion is ongoing around SYCL support in cmake.

Overall, for the PyTorch cpp_extension feature we have 2 options to proceed:

  1. Option 1. We agree on and introduce a custom file extension for SYCL sources in PyTorch. That's what this PR is currently doing. So, the proposal is to adopt .sycl as a file extension specific to the PyTorch ecosystem and further influence other communities to align on that.
  2. Option 2. As an alternative, we can introduce another logical separation for SYCL sources. In particular we can:
    • Have a sycl_sources = [ ... ] argument to take SYCL sources in torch.utils.cpp_extension.load (this would be a new addition; CUDA does not have that)
    • torch.utils.cpp_extension.load_inline already has cuda_sources, and this PR introduces sycl_sources
    • Have both sources = [ ... ] and sycl_sources = [ ... ] arguments on class SyclExtension (that would differ from how the CUDAExtension class is defined)

Currently the PR follows Option 1. Please let me know your opinions on which option is better.
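For illustration, Option 2's load_inline variant could look like the following sketch (the extension name, kernel body, and argument defaults are placeholders; torch is imported lazily):

```python
# Placeholder SYCL source string; real kernel and binding code would go here.
sycl_src = r"""
#include <sycl/sycl.hpp>
// ... SYCL kernel and C++ binding code ...
"""

def build_inline_ext():
    # Lazy import so the sketch can be defined without torch installed.
    from torch.utils.cpp_extension import load_inline
    return load_inline(
        name="my_inline_sycl_ext",
        cpp_sources="",         # host-side declarations, if any
        sycl_sources=sycl_src,  # dedicated SYCL argument (Option 2)
        verbose=True,
    )
```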

@cpuhrsch added labels Aug 8, 2024: triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module), module: xpu (Intel XPU related issues)
host_cxx = get_cxx_compiler()
host_cflags = cflags
# escaping quoted arguments to pass them thru icpx
host_cflags = [item.replace('\\"', '\\\\\\\\\\"') for item in host_cflags]
Contributor Author


Filed intel/llvm#15016 to ease handling of escaped arguments.
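To illustrate the escaping workaround from the snippet above on a concrete flag (the define name is made up), each escaped quote gets its backslashes multiplied so the quote survives being passed through icpx to the host compiler:

```python
# A flag containing escaped quotes, e.g. from a -D define.
flag = r'-DNAME=\"value\"'

# Same transformation as in the patch: replace each backslash-quote
# pair with five backslashes followed by a quote.
escaped = flag.replace('\\"', '\\\\\\\\\\"')
```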

@ezyang ezyang requested a review from albanD August 9, 2024 02:05
@EikanWang EikanWang marked this pull request as draft August 12, 2024 06:44
@EikanWang EikanWang marked this pull request as ready for review August 12, 2024 06:45
@EikanWang
Collaborator

Feature-wise, I think this PR duplicates with #131276

@dvrogozh
Contributor Author

Feature-wise, I think this PR duplicates with #131276

No, it does not. My PR adds support for the APIs not handled in #131276. Note that you did not handle these APIs in IPEX either. Please help review again.

@EikanWang
Collaborator
EikanWang commented Aug 13, 2024

Feature-wise, I think this PR duplicates with #131276

No, it does not. My PR adds support for the APIs not handled in #131276. Note that you did not handle these APIs in IPEX either. Please help review again.

It should be part of #131276, as #131276 is the initial support for Intel GPU cpp_extension. And why is #131276 not a prerequisite PR of this PR? @uniartisan

@dvrogozh
Contributor Author

@EikanWang : pytorch provides 3 APIs to build extensions. Of these APIs, I think only 1 was actually implemented by IPEX (XPUExtension); the other 2 - load() and load_inline() - were not (correct me if I am wrong, please). Pay attention that the load* and XPUExtension paths are not actually based on each other - they follow separate implementation paths. There might be a few overlapping utility methods, but they are mostly generic without xpu specifics. In general these API paths are different. Thus, I can't consider mine or #131276 to be a prerequisite for the other. And that was the reason I did not base my PR on top of #131276. Plus, there were some changes I made vs. #131276:

| API          | IPEX            |
| ------------ | --------------- |
| XPUExtension | implemented     |
| load         | not implemented |
| load_inline  | not implemented |

@dvrogozh dvrogozh changed the title xpu: support sycl with torch.utils.cpp_extension.load xpu: support sycl with torch.utils.cpp_extension.load* Aug 15, 2024
@dvrogozh
Contributor Author

Rebased and added support for load_inline()

@dvrogozh dvrogozh changed the title xpu: support sycl with torch.utils.cpp_extension.load* xpu: support sycl with torch.utils.cpp_extension APIs Aug 16, 2024
@dvrogozh
Contributor Author

@EikanWang : following our offline discussion, I've updated the PR to include the full story for torch.utils.cpp_extension APIs. This includes implementations for:

  • torch.utils.cpp_extension.load
  • torch.utils.cpp_extension.load_inline
  • torch.utils.cpp_extension.SyclExtension (name to be discussed)

See the 3 respective commits for each step. Note that I could not use #131276 as a base even for SyclExtension - the implementation in this PR follows a different pattern and required significant changes.

Things to note:

  1. Name of torch.utils.cpp_extension.SyclExtension. Should this be XPUExtension as suggested in #131276 (Enhance XPU support and introduce Intel cppmodule)?
  2. SyclExtension can be built only with Ninja. This is because the current cpp_extension architecture supports the devlink step required for SYCL kernels only via Ninja. The non-ninja path (make?) supports only 2 steps: compile and link. Thus, to support SYCL in that case we would need to build everything with the icpx compiler, which might not be desirable for non-SYCL extensions that can be combined in the build process (see attached test extension architecture). The problem is that at the moment I don't know how to switch the build to icpx only for SyclExtension, hence this limitation.
  3. oneDNN, oneMKL and any oneAPI non-SYCL libs are out of the scope of this PR.

I will add comments to #131276 and suggest helping review #132945 instead.
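A setup.py-style usage of the new SyclExtension class might look like the following sketch (package and file names are hypothetical; torch is imported lazily):

```python
def make_setup_args():
    # Lazy import so the sketch can be defined without torch installed.
    from torch.utils.cpp_extension import SyclExtension, BuildExtension
    return dict(
        name="my_sycl_pkg",
        ext_modules=[
            # Mixed C++ and SYCL sources in one extension module;
            # requires a Ninja build, per the limitation noted above.
            SyclExtension("my_sycl_pkg._C", ["bindings.cpp", "kernel.sycl"]),
        ],
        cmdclass={"build_ext": BuildExtension},
    )
```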

@malfet
Contributor
malfet commented Feb 16, 2025

@pytorchbot merge -f "Lint is green, let's test in prod"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@guangyey
Collaborator

I think the root cause is the change in the behavior of torch.xpu.get_arch_list() introduced in PR #146966.

dvrogozh added a commit to dvrogozh/pytorch that referenced this pull request Feb 19, 2025
Initially discussed here: pytorch#132945 (comment)

Previously `torch.xpu.get_arch_list()` got relaxed to work even if
an XPU device is not available. However, we overlooked the case when
pytorch is not compiled with XPU support. In such a case the function
throws an exception. This commit adjusts this behavior and makes the
function return `[]` even if pytorch is not compiled with XPU support.

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
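The adjusted behavior described in the commit above can be sketched in plain Python (the helper name and arguments are illustrative, not the real implementation):

```python
def get_arch_list_sketch(compiled_with_xpu, arch_flags):
    # New behavior: return [] instead of raising when pytorch was
    # built without XPU support.
    if not compiled_with_xpu:
        return []
    # Otherwise split the comma-separated arch list, e.g. "pvc,xe-lpg".
    return arch_flags.split(",") if arch_flags else []
```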
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025

Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025

Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
pytorchmergebot pushed a commit that referenced this pull request Feb 24, 2025

Initially discussed here: #132945 (comment)


Pull Request resolved: #147431
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025

Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025

Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
aditew01 pushed a commit that referenced this pull request Feb 28, 2025

Initially discussed here: #132945 (comment)


Pull Request resolved: #147431
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025

Pull Request resolved: pytorch#132945
Approved by: https://github.com/albanD, https://github.com/guangyey
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025

Pull Request resolved: pytorch#132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025

Initially discussed here: pytorch#132945 (comment)


Pull Request resolved: pytorch#147431
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD
tye1 pushed a commit to intel/intel-extension-for-pytorch that referenced this pull request Mar 10, 2025
Fix the ERROR test_sycl_queue.py - TypeError: ExtensionVersioner.bump_version_if_changed() missing 1 required positional argument: 'with_sycl'. PyTorch 2.7 changed the API bump_version_if_changed in torch/utils/_cpp_extension_versioner.py in pytorch/pytorch#132945.
dvrogozh added a commit to dvrogozh/optimum-quanto that referenced this pull request Mar 20, 2025
This commit implements xpu extension with unpack kernels written
in sycl. Pytorch supports XPU extensions on Linux starting from
version 2.7.

$ python bench/kernels/benchmark.py --it 1000
unpack_2bit[xpu]: python = 0.177 ms, ext = 0.033 ms, ratio = 5.4x
unpack_4bit[xpu]: python = 0.085 ms, ext = 0.026 ms, ratio = 3.3x

note: without extension ratio is 0.8x.

Requires: pytorch/pytorch#132945
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
dacorvo pushed a commit to huggingface/optimum-quanto that referenced this pull request Apr 2, 2025
Labels
ci-no-td (Do not run TD on this PR), ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/xpu (Run XPU CI tasks), Merged, open source, release notes: xpu (release notes category), Reverted, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

xpu: support torch.utils.cpp_extension APIs to build SYCL kernels
10 participants