[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend by ZhiweiYan-96 · Pull Request #140365 · pytorch/pytorch · GitHub

[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend #140365


Open
wants to merge 17 commits into base: gh/ZhiweiYan-96/38/base

Conversation

@ZhiweiYan-96 (Collaborator) commented Nov 12, 2024

Motivation

This PR is intended to add post-op fusion support for Linear. The linear-pointwise fusion is expected to be used in graph mode, e.g. via torch.compile. The FusionUtils.cpp file defines utility APIs for generating primitive attributes; these APIs will also be used for the conv-pointwise fusion in #140372.
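For context, the "primitive attribute" mentioned here is oneDNN's mechanism for attaching fused post-ops to a primitive. Below is a minimal standalone sketch using the raw oneDNN v3 C++ API (not this PR's FusionUtils wrappers; the wrapper signatures appear in the review threads further down):

#include <oneapi/dnnl/dnnl.hpp>

// Build a primitive_attr that fuses a ReLU post-op into a primitive such as
// matmul; alpha/beta are the eltwise parameters (unused for plain ReLU).
dnnl::primitive_attr make_relu_post_op_attr() {
  dnnl::post_ops ops;
  ops.append_eltwise(dnnl::algorithm::eltwise_relu, /*alpha=*/0.f, /*beta=*/0.f);
  dnnl::primitive_attr attr;
  attr.set_post_ops(ops);
  return attr;
}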

Validation

   python test/xpu/test_fusion.py

Stack from ghstack (oldest at bottom):

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 12, 2024
pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140365

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9060933 with merge base 032ef48:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ZhiweiYan-96 added a commit that referenced this pull request Nov 12, 2024
@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as draft November 12, 2024 06:09
@ZhiweiYan-96 ZhiweiYan-96 added topic: not user facing topic category ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request module: xpu Intel XPU related issues labels Nov 12, 2024
ZhiweiYan-96 added a commit that referenced this pull request Nov 13, 2024
Tensor _bias = bias.has_value() ? bias.value() : at::Tensor();
Tensor _input = input.dim() <= 2 ? input : input.contiguous();
return impl::matmul_fusion_variants(
    result, _input, weight, /*trans*/ true, attr, is_fused_, _bias);
Collaborator:

Currently, mkldnn fusion only works on the freezing path, and the frozen weight from linear is always created with shape (out_features, in_features), i.e. (n, k). So here trans should be false; we need to transpose the weight.

Suggested change
- result, _input, weight, /*trans*/ true, attr, is_fused_, _bias);
+ result, _input, weight, /*trans*/ false, attr, is_fused_, _bias);

Collaborator Author:

Thanks for the reminder. I have changed the code and the UTs.

@etaf (Collaborator) left a comment:

Please check if the suggested change is reasonable.

@EikanWang (Collaborator):

Please ensure your PR is small enough for the sake of review.

onednn::Attr attr;
attr = construct_binary_attr(binary_attr, /*alpha*/ 1.f, other_t, attr);

Tensor _input = input_t.dim() <= 2 ? input_t : input_t.contiguous();

Collaborator:

The converter call already checks the input shape and layout; no need to do it here.

Collaborator Author:

Considering that Linear+fusion should not use the complex logic in the old LinearConverter, I removed all the logic in Linear.h and BlasImpl.h.

} else if (binary == "div") {
  attr.append_post_binary(attr.kind_with_binary_div, other);
} else if (binary == "add") {
  attr.append_post_binary(attr.kind_with_binary_add, other);

Collaborator:

Need to add a TORCH_CHECK if the binary op is not supported? E.g., &, |?
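A minimal sketch of the guard this comment asks for, extending the else-if chain quoted above (the wording and placement here are illustrative, not the code that actually landed):

} else if (binary == "add") {
  attr.append_post_binary(attr.kind_with_binary_add, other);
} else {
  // Fail loudly on post-ops we do not support instead of silently ignoring them.
  TORCH_CHECK(false, "linear_pointwise_binary: unsupported binary post-op: ", binary);
}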

if (simple_unary.find(unary) != simple_unary.end()) {
  return string_to_unary_attr(unary, attr);
} else {
  return unary_attr_with_arg(unary, scalars, algorithm, attr);

Collaborator:

The logic is too complex here. I suggest unifying the logic for the different activation ops.
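One standalone way to realize the suggested unification: a single table from op name to handler, so simple and parameterized unary ops share one code path. Attr below is a placeholder type standing in for onednn::Attr; this is a sketch of the pattern, not this PR's code:

#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct Attr {};  // placeholder for onednn::Attr

using Handler = std::function<Attr(const std::vector<float>&, Attr)>;

Attr apply_unary(const std::string& unary,
                 const std::vector<float>& scalars,
                 Attr attr) {
  // Every activation, with or without scalar arguments, takes the same path;
  // unknown names fail loudly instead of silently falling through.
  static const std::unordered_map<std::string, Handler> table = {
      {"relu",
       [](const std::vector<float>&, Attr a) { /* append eltwise_relu */ return a; }},
      {"leaky_relu",
       [](const std::vector<float>& s, Attr a) { /* append eltwise_relu, alpha = s[0] */ return a; }},
  };
  auto it = table.find(unary);
  if (it == table.end())
    throw std::invalid_argument("unsupported unary post-op: " + unary);
  return it->second(scalars, attr);
}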

@liangan1 commented Jan 10, 2025

> Please ensure your PR is small enough for the sake of review.

Most of the code in this PR is the BlasImpl to cover all kinds of matmul variants, e.g., scalar dot, mv, etc., while we only need to cover the simple Linear+post-ops case. @ZhiweiYan-96, I suggest simplifying this PR.

etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Mar 31, 2025
ghstack-source-id: 841ce09
Pull Request resolved: pytorch#140365

Signed-off-by: xinan.lin <xinan.lin@intel.com>
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
ZhiweiYan-96 added commits that referenced this pull request May 13, 2025
@ZhiweiYan-96 ZhiweiYan-96 requested review from liangan1 and etaf May 13, 2025 08:11
@etaf (Collaborator) left a comment:

Please fix the lint errors first.

mod = M(
    pointwise_info.pointwise_module, input_shape[-1], 10, bias
).eval()
mod = mod.to("xpu")
Collaborator:

This isn’t a big issue, but since you’ve already used
instantiate_device_type_tests(TestoneDNNFusion, globals(), only_for="xpu", allow_xpu=True),
it’s best not to hardcode "xpu" inside the test cases.

Collaborator Author:

Thanks for the reminder. I added a device argument in the tests and use model.to(device) instead.

const Tensor& input_t, // [M, K] or [B, M, K]
const Tensor& weight_t, // [N, K]
const c10::optional<Tensor>& bias_opt,
c10::string_view attr,
Collaborator:

c10::string_view is an alias of std::string_view, and the community is planning to deprecate it. Please replace all occurrences in this PR with std::string_view.

Collaborator Author:

modified

const Tensor& other_t,
const Tensor& weight_t,
const c10::optional<Tensor>& bias_opt,
c10::string_view binary_attr) {
Collaborator:

Suggested change
- c10::string_view binary_attr) {
+ std::string_view binary_attr) {

Collaborator Author:

modified

const Tensor& input_t, // [M, K] or [B, M, K]
const Tensor& weight_t, // [N, K]
const c10::optional<Tensor>& bias_opt,
c10::string_view attr,
Collaborator:

Suggested change
- c10::string_view attr,
+ std::string_view attr,

Collaborator Author:

modified

@etaf (Collaborator) commented May 13, 2025

Please rebase your branch to get green CI.

ZhiweiYan-96 added a commit that referenced this pull request May 14, 2025
@ZhiweiYan-96 ZhiweiYan-96 requested a review from etaf May 14, 2025 15:25
@EikanWang (Collaborator):

@ZhiweiYan-96, is this PR ready for pre-review?

@ZhiweiYan-96 (Collaborator Author):

@EikanWang Yes, please review the PR, thanks!

ZhiweiYan-96 added a commit that referenced this pull request May 15, 2025
@ZhiweiYan-96 ZhiweiYan-96 added the keep-going Don't stop on first failure, keep running tests until the end label May 15, 2025
@EikanWang EikanWang requested review from Copilot and liangan1 and removed request for liangan1 May 15, 2025 12:43
@EikanWang EikanWang marked this pull request as ready for review May 15, 2025 12:44
@EikanWang EikanWang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label May 15, 2025
@EikanWang EikanWang moved this from In Progress to Pre-Review Required in PyTorch Intel May 15, 2025
@Copilot (Copilot AI, Contributor) left a comment:

Pull Request Overview

This PR adds post-op fusion support for linear layers on the XPU backend via oneDNN, including both unary and binary pointwise fusions.

  • Introduces FusionUtils APIs to construct unary and binary oneDNN attributes.
  • Implements linear_pointwise and linear_pointwise_binary in the XPU MKLDNN backend and registers them (see the registration sketch after this list).
  • Adds Python tests to validate linear+unary and linear+binary fusion kernels on XPU.
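For readers unfamiliar with the registration step mentioned in the second bullet, a hedged sketch of what binding XPU implementations to the existing mkldnn ops typically looks like in ATen; the actual macro usage and schemas in this PR may differ, and linear_pointwise / linear_pointwise_binary are assumed to be the C++ functions defined in Linear.cpp:

#include <torch/library.h>

// Bind the XPU kernels to the ops under the mkldnn namespace.
TORCH_LIBRARY_IMPL(mkldnn, XPU, m) {
  m.impl("_linear_pointwise", linear_pointwise);
  m.impl("_linear_pointwise.binary", linear_pointwise_binary);
}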

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
test/xpu/test_fusion.py Adds tests for unary and binary linear+pointwise fusion
aten/src/ATen/native/mkldnn/xpu/FusionUtils.h Declares utilities for building post-op fusion attrs
aten/src/ATen/native/mkldnn/xpu/FusionUtils.cpp Defines unary/binary attribute constructors
aten/src/ATen/native/mkldnn/xpu/Linear.cpp Implements and registers linear_pointwise kernels
Comments suppressed due to low confidence (5)

test/xpu/test_fusion.py:22

  • [nitpick] Class name TestoneDNNFusion is inconsistent—consider renaming to TestOneDNNFusion for readability and to follow CamelCase conventions.
class TestoneDNNFusion(TestCase):

aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:11

  • The declaration of string_to_unary_attr in the header doesn't match its definition (which takes a c10::string_view unary parameter). Update the signature in the header to string_to_unary_attr(c10::string_view unary, onednn::Attr attr).
at::native::onednn::Attr string_to_unary_attr(onednn::Attr attr);

aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:19

  • The header declares only a templated construct_binary_attr, but the .cpp defines a non-template overload. Either add the matching non-template declaration to the header or consolidate into a single function signature to avoid linker errors.
template <bool is_matmul = false>

test/xpu/test_fusion.py:116

  • The test for binary fusion calls torch.ops.mkldnn._linear_pointwise instead of the registered binary op torch.ops.mkldnn._linear_pointwise.binary. Update the call to use the .binary suffix so it exercises the correct kernel.
fused = torch.ops.mkldnn._linear_pointwise(

aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:1

  • [nitpick] Public API in this header lacks a brief comment describing its purpose. Consider adding a file-level doc comment explaining the role of FusionUtils and its functions.
#pragma once
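An illustrative file-level comment along the lines Copilot suggests (the wording is hypothetical, not what the PR added):

// FusionUtils.h
//
// Helpers that translate string-encoded post-ops (e.g. "relu", "add") into
// onednn::Attr objects, so XPU mkldnn kernels such as _linear_pointwise can
// attach fused pointwise post-ops to their oneDNN primitives.
#pragma once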

@EikanWang (Collaborator):

@ZhiweiYan-96, please check Copilot's comments.

Labels
ciflow/trunk Trigger trunk jobs on your pull request
ciflow/xpu Run XPU CI tasks
keep-going Don't stop on first failure, keep running tests until the end
module: cpu CPU specific problem (e.g., perf, algorithm)
module: xpu Intel XPU related issues
open source
topic: not user facing topic category
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
Status: Pre-Review Required

5 participants