[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend by ZhiweiYan-96 · Pull Request #140365 · pytorch/pytorch · GitHub

[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend #140365


Open
wants to merge 17 commits into base: gh/ZhiweiYan-96/38/base

Conversation

@ZhiweiYan-96 (Collaborator) commented Nov 12, 2024

Motivation

This PR is intended to add post-op fusion support for Linear. The linear-pointwise fusion is expected to be used in graph mode, e.g. via torch.compile. The FusionUtils.cpp file defines utility APIs for generating primitive attributes; these APIs will also be used for the conv-pointwise fusion in #140372.
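For context, the "primitive attribute" mentioned here is oneDNN's mechanism for attaching fused post-ops to a primitive. Below is a minimal standalone sketch using the raw oneDNN v3 C++ API (not this PR's FusionUtils wrappers; the wrapper signatures appear in the review threads further down):

#include <oneapi/dnnl/dnnl.hpp>

// Build a primitive_attr that fuses a ReLU post-op into a primitive such as
// matmul; alpha/beta are the eltwise parameters (unused for plain ReLU).
dnnl::primitive_attr make_relu_post_op_attr() {
  dnnl::post_ops ops;
  ops.append_eltwise(dnnl::algorithm::eltwise_relu, /*alpha=*/0.f, /*beta=*/0.f);
  dnnl::primitive_attr attr;
  attr.set_post_ops(ops);
  return attr;
}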

Validation

   python test/xpu/test_fusion.py

Stack from ghstack (oldest at bottom):

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 12, 2024
pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140365

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9060933 with merge base 032ef48:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ZhiweiYan-96 added a commit that referenced this pull request Nov 12, 2024
@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as draft November 12, 2024 06:09
@ZhiweiYan-96 ZhiweiYan-96 added topic: not user facing topic category ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request module: xpu Intel XPU related issues labels Nov 12, 2024
ZhiweiYan-96 added a commit that referenced this pull request Nov 13, 2024
Tensor _bias = bias.has_value() ? bias.value() : at::Tensor();
Tensor _input = input.dim() <= 2 ? input : input.contiguous();
return impl::matmul_fusion_variants(
    result, _input, weight, /*trans*/ true, attr, is_fused_, _bias);
Collaborator:

Currently, mkldnn fusion only works on the freezing path, and the frozen weight from linear is always created with shape (out_features, in_features), i.e. (n, k). So here trans should be false; we need to transpose the weight.

Suggested change
- result, _input, weight, /*trans*/ true, attr, is_fused_, _bias);
+ result, _input, weight, /*trans*/ false, attr, is_fused_, _bias);

Collaborator Author:

Thanks for the reminder. I have changed the code and the UTs.

@etaf (Collaborator) left a comment:

Please check if the suggested change is reasonable.

@EikanWang (Collaborator):

Please ensure your PR is small enough for the sake of review.

onednn::Attr attr;
attr = construct_binary_attr(binary_attr, /*alpha*/ 1.f, other_t, attr);

Tensor _input = input_t.dim() <= 2 ? input_t : input_t.contiguous();

Collaborator:

The converter call already checks the input shape and layout; no need to do it here.

Collaborator Author:

Considering that Linear+fusion should not use the complex logic in the old LinearConverter, I removed all the logic in Linear.h and BlasImpl.h.

} else if (binary == "div") {
  attr.append_post_binary(attr.kind_with_binary_div, other);
} else if (binary == "add") {
  attr.append_post_binary(attr.kind_with_binary_add, other);

Collaborator:

Need to add a TORCH_CHECK if the binary op is not supported? E.g., &, |?
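A minimal sketch of the guard this comment asks for, extending the else-if chain quoted above (the wording and placement here are illustrative, not the code that actually landed):

} else if (binary == "add") {
  attr.append_post_binary(attr.kind_with_binary_add, other);
} else {
  // Fail loudly on post-ops we do not support instead of silently ignoring them.
  TORCH_CHECK(false, "linear_pointwise_binary: unsupported binary post-op: ", binary);
}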

if (simple_unary.find(unary) != simple_unary.end()) {
  return string_to_unary_attr(unary, attr);
} else {
  return unary_attr_with_arg(unary, scalars, algorithm, attr);

Collaborator:

The logic is too complex here. I suggest unifying the logic for the different activation ops.
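One standalone way to realize the suggested unification: a single table from op name to handler, so simple and parameterized unary ops share one code path. Attr below is a placeholder type standing in for onednn::Attr; this is a sketch of the pattern, not this PR's code:

#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct Attr {};  // placeholder for onednn::Attr

using Handler = std::function<Attr(const std::vector<float>&, Attr)>;

Attr apply_unary(const std::string& unary,
                 const std::vector<float>& scalars,
                 Attr attr) {
  // Every activation, with or without scalar arguments, takes the same path;
  // unknown names fail loudly instead of silently falling through.
  static const std::unordered_map<std::string, Handler> table = {
      {"relu",
       [](const std::vector<float>&, Attr a) { /* append eltwise_relu */ return a; }},
      {"leaky_relu",
       [](const std::vector<float>& s, Attr a) { /* append eltwise_relu, alpha = s[0] */ return a; }},
  };
  auto it = table.find(unary);
  if (it == table.end())
    throw std::invalid_argument("unsupported unary post-op: " + unary);
  return it->second(scalars, attr);
}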

@liangan1 commented Jan 10, 2025

> Please ensure your PR is small enough for the sake of review.

Most of the code in this PR is the BlasImpl to cover all kinds of matmul variants, e.g., scalar dot, mv, etc., while we only need to cover the simple Linear+post-ops case. @ZhiweiYan-96, I suggest simplifying this PR.

etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Mar 31, 2025
ghstack-source-id: 841ce09
Pull Request resolved: pytorch#140365

Signed-off-by: xinan.lin <xinan.lin@intel.com>
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
ZhiweiYan-96 added commits that referenced this pull request May 13, 2025
@ZhiweiYan-96 ZhiweiYan-96 requested review from liangan1 and etaf May 13, 2025 08:11
@etaf (Collaborator) left a comment:

Please fix the lint errors first.

mod = M(
    pointwise_info.pointwise_module, input_shape[-1], 10, bias
).eval()
mod = mod.to("xpu")
Collaborator:

This isn’t a big issue, but since you’ve already used
instantiate_device_type_tests(TestoneDNNFusion, globals(), only_for="xpu", allow_xpu=True),
it’s best not to hardcode "xpu" inside the test cases.

Collaborator Author:

Thanks for the reminder. I added a device argument in the tests and use model.to(device) instead.

const Tensor& input_t, // [M, K] or [B, M, K]
const Tensor& weight_t, // [N, K]
const c10::optional<Tensor>& bias_opt,
c10::string_view attr,
Collaborator:

c10::string_view is an alias of std::string_view, and the community is planning to deprecate it. Please replace all occurrences in this PR with std::string_view.

Collaborator Author:

modified

const Tensor& other_t,
const Tensor& weight_t,
const c10::optional<Tensor>& bias_opt,
c10::string_view binary_attr) {
Collaborator:

Suggested change
- c10::string_view binary_attr) {
+ std::string_view binary_attr) {

Collaborator Author:

modified

const Tensor& input_t, // [M, K] or [B, M, K]
const Tensor& weight_t, // [N, K]
const c10::optional<Tensor>& bias_opt,
c10::string_view attr,
Collaborator:

Suggested change
- c10::string_view attr,
+ std::string_view attr,

Collaborator Author:

modified

@etaf (Collaborator) commented May 13, 2025

Please rebase your branch to get green CI.

ZhiweiYan-96 added a commit that referenced this pull request May 14, 2025
@ZhiweiYan-96 ZhiweiYan-96 requested a review from etaf May 14, 2025 15:25
@EikanWang (Collaborator):

@ZhiweiYan-96, is this PR ready for pre-review?

@ZhiweiYan-96 (Collaborator Author):

@EikanWang Yes, please review the PR, thanks!

ZhiweiYan-96 added a commit that referenced this pull request May 15, 2025
@ZhiweiYan-96 ZhiweiYan-96 added the keep-going Don't stop on first failure, keep running tests until the end label May 15, 2025
@EikanWang EikanWang requested review from Copilot and liangan1 and removed request for liangan1 May 15, 2025 12:43
@EikanWang EikanWang marked this pull request as ready for review May 15, 2025 12:44
@EikanWang EikanWang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label May 15, 2025
@EikanWang EikanWang moved this from In Progress to Pre-Review Required in PyTorch Intel May 15, 2025
@Copilot (Copilot AI, Contributor) left a comment:

Pull Request Overview

This PR adds post-op fusion support for linear layers on the XPU backend via oneDNN, including both unary and binary pointwise fusions.

  • Introduces FusionUtils APIs to construct unary and binary oneDNN attributes.
  • Implements linear_pointwise and linear_pointwise_binary in the XPU MKLDNN backend and registers them (see the registration sketch after this list).
  • Adds Python tests to validate linear+unary and linear+binary fusion kernels on XPU.
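For readers unfamiliar with the registration step mentioned in the second bullet, a hedged sketch of what binding XPU implementations to the existing mkldnn ops typically looks like in ATen; the actual macro usage and schemas in this PR may differ, and linear_pointwise / linear_pointwise_binary are assumed to be the C++ functions defined in Linear.cpp:

#include <torch/library.h>

// Bind the XPU kernels to the ops under the mkldnn namespace.
TORCH_LIBRARY_IMPL(mkldnn, XPU, m) {
  m.impl("_linear_pointwise", linear_pointwise);
  m.impl("_linear_pointwise.binary", linear_pointwise_binary);
}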

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
test/xpu/test_fusion.py Adds tests for unary and binary linear+pointwise fusion
aten/src/ATen/native/mkldnn/xpu/FusionUtils.h Declares utilities for building post-op fusion attrs
aten/src/ATen/native/mkldnn/xpu/FusionUtils.cpp Defines unary/binary attribute constructors
aten/src/ATen/native/mkldnn/xpu/Linear.cpp Implements and registers linear_pointwise kernels
Comments suppressed due to low confidence (5)

test/xpu/test_fusion.py:22

  • [nitpick] Class name TestoneDNNFusion is inconsistent—consider renaming to TestOneDNNFusion for readability and to follow CamelCase conventions.
class TestoneDNNFusion(TestCase):

aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:11

  • The declaration of string_to_unary_attr in the header doesn't match its definition (which takes a c10::string_view unary parameter). Update the signature in the header to string_to_unary_attr(c10::string_view unary, onednn::Attr attr).
at::native::onednn::Attr string_to_unary_attr(onednn::Attr attr);

aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:19

  • The header declares only a templated construct_binary_attr, but the .cpp defines a non-template overload. Either add the matching non-template declaration to the header or consolidate into a single function signature to avoid linker errors.
template <bool is_matmul = false>

test/xpu/test_fusion.py:116

  • The test for binary fusion calls torch.ops.mkldnn._linear_pointwise instead of the registered binary op torch.ops.mkldnn._linear_pointwise.binary. Update the call to use the .binary suffix so it exercises the correct kernel.
fused = torch.ops.mkldnn._linear_pointwise(

aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:1

  • [nitpick] Public API in this header lacks a brief comment describing its purpose. Consider adding a file-level doc comment explaining the role of FusionUtils and its functions.
#pragma once
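An illustrative file-level comment along the lines Copilot suggests (the wording is hypothetical, not what the PR added):

// FusionUtils.h
//
// Helpers that translate string-encoded post-ops (e.g. "relu", "add") into
// onednn::Attr objects, so XPU mkldnn kernels such as _linear_pointwise can
// attach fused pointwise post-ops to their oneDNN primitives.
#pragma once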

@EikanWang (Collaborator):

@ZhiweiYan-96, please check Copilot's comments.

Labels
ciflow/trunk Trigger trunk jobs on your pull request
ciflow/xpu Run XPU CI tasks
keep-going Don't stop on first failure, keep running tests until the end
module: cpu CPU specific problem (e.g., perf, algorithm)
module: xpu Intel XPU related issues
open source
topic: not user facing topic category
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
Status: Pre-Review Required

5 participants