[Intel GPU] Enable fp64 GEMM #140677
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140677
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures) As of commit b358c61 with merge base 880e176.
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -8,6 +8,7 @@
 #include <Utils.h>

 #include <oneapi/dnnl/dnnl.hpp>
+#include "c10/core/ScalarType.h"
```
Suggested change:
```diff
-#include "c10/core/ScalarType.h"
+#include <c10/core/ScalarType.h>
```
Thanks for the suggestion; the code has been changed here.
```diff
-  TORCH_CHECK(
-      false, "Double and complex datatype matmul is not supported in oneDNN");
+  if (self.is_complex()) {
+    AT_ERROR("Complex datatype matmul is not supported in oneDNN");
```
`AT_ERROR` has been deprecated.
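For context, a minimal sketch of the replacement pattern; the helper function is illustrative, not code from this PR:

```cpp
#include <ATen/core/Tensor.h>
#include <c10/util/Exception.h>  // TORCH_CHECK; AT_ERROR is deprecated

// AT_ERROR("msg") threw unconditionally; TORCH_CHECK(false, "msg") is the
// drop-in replacement, and TORCH_CHECK(cond, "msg") folds the guard in.
void check_not_complex(const at::Tensor& t) {
  TORCH_CHECK(
      !t.is_complex(),
      "Complex datatype matmul is not supported in oneDNN");
}
```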
Thanks for the suggestion; `AT_ERROR` in this file has been changed to `TORCH_CHECK`.
```diff
-  if (self.is_complex() || self.scalar_type() == ScalarType::Double) {
-    TORCH_CHECK(
-        false, "Double and complex datatype matmul is not supported in oneDNN");
+  if (self.is_complex()) {
```
`TORCH_CHECK(!self.is_complex(), "error message");`
modified
```cpp
  // complex case
  if (mat1.is_complex()) {
    AT_ERROR("Complex datatype matmul is not supported in oneDNN");
  }
```
Suggested change:
```diff
 // complex case
-if (mat1.is_complex()) {
-  AT_ERROR("Complex datatype matmul is not supported in oneDNN");
-}
+TORCH_CHECK(!mat1.is_complex(), "Complex datatype matmul is not supported in oneDNN");
```
modified
```diff
@@ -277,73 +280,6 @@ Tensor baddbmm(
   return r;
 }

-Tensor& addbmm_out(
```
Does Intel GPU not support `addbmm_out`?
We do not need to write this glue code, as CUDA/CPU/XPU share an entry in `native_functions.yaml`. They share the same implementation (the `op_stub` or `composite` cases) in `at::native::addbmm_out`; the implementation of `addbmm` is generic, as it does the job by calling `addmm`, for which we have code.
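As a hedged illustration of why the composite path suffices (this is not the actual ATen implementation; the helper name and the batch-folding trick are just one way to express it), `addbmm` reduces to a single `addmm` once the batch dimension is folded in:

```cpp
#include <ATen/ATen.h>

// Illustrative sketch: addbmm(self, b1, b2) = beta*self + alpha*sum_i b1[i] @ b2[i].
// Folding the batch dimension turns the batched reduction into one GEMM,
// so a backend that implements addmm gets addbmm "for free".
at::Tensor addbmm_via_addmm(
    const at::Tensor& self,    // [n, p]
    const at::Tensor& batch1,  // [b, n, m]
    const at::Tensor& batch2,  // [b, m, p]
    const at::Scalar& beta,
    const at::Scalar& alpha) {
  // [b, n, m] -> [n, b*m] and [b, m, p] -> [b*m, p]; contracting over the
  // fused (b*m) axis sums the per-batch products exactly as addbmm does.
  auto m1 = batch1.transpose(0, 1).reshape({batch1.size(1), -1});
  auto m2 = batch2.reshape({-1, batch2.size(2)});
  return at::addmm(self, m1, m2, beta, alpha);
}
```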
```diff
@@ -93,9 +94,13 @@ Tensor& addmm_out(
     }
   } else {
     if (alpha.to<float>() == 1.f && beta_ == 1.f) {
-      bias = self;
+      bias = is_inplace ? self.clone() : self;
```
It would be better to add some comments to elaborate on why the clone is required here.
Sure, the comment has been added here.
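The added comment itself isn't quoted in this thread; a hedged sketch of the rationale (helper and variable names illustrative):

```cpp
#include <ATen/ATen.h>

// Hedged sketch: pick the bias for the GEMM epilogue. When addmm_out runs
// in-place, `result` and `self` alias: the GEMM overwrites `result` while
// the epilogue still needs the original `self` values as the bias, so the
// bias must be snapshotted with a clone first.
at::Tensor choose_bias(const at::Tensor& self, const at::Tensor& result) {
  const bool is_inplace = result.is_same(self);
  return is_inplace ? self.clone()  // in-place: preserve original values
                    : self;         // out-of-place: aliasing is harmless
}
```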
You won't need to clone once #144759 is merged. We should use a post-op sum instead of a post-op binary in this case.
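For reference, a hedged sketch of the distinction using oneDNN's post-op API (illustrative helpers, not code from either PR):

```cpp
#include <oneapi/dnnl/dnnl.hpp>

// A post-op *sum* accumulates into the destination buffer itself
// (dst = matmul(src, wei) + scale * dst_original), so bias values that
// already live in the output need no separate tensor and no clone.
dnnl::primitive_attr make_sum_attr(float scale = 1.0f) {
  dnnl::post_ops po;
  po.append_sum(scale);  // reuse dst as the accumulator
  dnnl::primitive_attr attr;
  attr.set_post_ops(po);
  return attr;
}

// A post-op *binary* add reads an extra src1 tensor instead, which is
// where the aliasing problem above comes from when src1 aliases dst.
dnnl::primitive_attr make_binary_add_attr(const dnnl::memory::desc& bias_md) {
  dnnl::post_ops po;
  po.append_binary(dnnl::algorithm::binary_add, bias_md);
  dnnl::primitive_attr attr;
  attr.set_post_ops(po);
  return attr;
}
```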
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, the first few of them being: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 2, 4, linux.idc.xpu). Details for Dev Infra team: raised by workflow job.
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here.
Successfully rebased.
@guangyey Some new skipped UTs should have been added, but this PR fixes them. I retriggered CI and am waiting for all the fixed UTs to show up in the CI results. After that, I will fix them in a single commit.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: #140677. Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/desertfire
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @gujinghui @fengyuan14 @guangyey
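A hedged usage sketch of what this PR enables on the XPU backend (illustrative shapes; device-availability checks omitted):

```cpp
#include <ATen/ATen.h>

int main() {
  // Before this PR, double-precision matmul on XPU raised
  // "Double and complex datatype matmul is not supported in oneDNN".
  auto a = at::randn({4, 8}, at::device(at::kXPU).dtype(at::kDouble));
  auto b = at::randn({8, 2}, at::device(at::kXPU).dtype(at::kDouble));
  auto c = at::matmul(a, b);  // fp64 GEMM now dispatches on XPU
  return 0;
}
```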