[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend #140365
base: gh/ZhiweiYan-96/38/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140365
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ You can merge normally! (1 unrelated failure.) As of commit 9060933 with merge base 032ef48, the following job failed but was likely due to flakiness present on trunk (FLAKY).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Tensor _bias = bias.has_value() ? bias.value() : at::Tensor();
Tensor _input = input.dim() <= 2 ? input : input.contiguous();
return impl::matmul_fusion_variants(
    result, _input, weight, /*trans*/ true, attr, is_fused_, _bias);
Currently mkldnn fusion only works on the freezing path, and the frozen weight from linear is always created with shape (out_features, in_features), which means (n, k). So here the trans should be false; we need to transpose the weight.
Suggested change:
- result, _input, weight, /*trans*/ true, attr, is_fused_, _bias);
+ result, _input, weight, /*trans*/ false, attr, is_fused_, _bias);
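For reference, nn.Linear stores its weight as (out_features, in_features), so the reference computation transposes it inside the matmul; a quick self-contained check:

```python
import torch

lin = torch.nn.Linear(in_features=4, out_features=3)
print(lin.weight.shape)  # torch.Size([3, 4]), i.e. (n, k)

x = torch.randn(2, 4)  # (m, k)
# nn.Linear computes x @ W.T + b, so the (n, k) weight must be
# transposed in the matmul, matching the reviewer's point above.
ref = x @ lin.weight.t() + lin.bias
assert torch.allclose(lin(x), ref, atol=1e-6)
```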
Thanks for the reminder. I have changed the code and UTs.
Please check if the suggested change is reasonable. Also, please keep your PR small enough for ease of review.
onednn::Attr attr;
attr = construct_binary_attr(binary_attr, /*alpha*/ 1.f, other_t, attr);

Tensor _input = input_t.dim() <= 2 ? input_t : input_t.contiguous();
The converter call already checks the input shape and layout; there is no need to do it here.
Considering that Linear+fusion should not use the complex logic in the old LinearConverter, I removed all the logic in Linear.h and BlasImpl.h.
} else if (binary == "div") {
  attr.append_post_binary(attr.kind_with_binary_div, other);
} else if (binary == "add") {
  attr.append_post_binary(attr.kind_with_binary_add, other);
Do we need to add a TORCH_CHECK for binary ops that are not supported, e.g., & or |?
if (simple_unary.find(unary) != simple_unary.end()) {
  return string_to_unary_attr(unary, attr);
} else {
  return unary_attr_with_arg(unary, scalars, algorithm, attr);
The logic here is too complex. I suggest unifying the handling of the different activation ops.
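One way to read this suggestion is a single table-driven dispatch instead of the simple_unary / with-arg split above. A rough sketch, in Python for brevity (the handler names and the attr representation are hypothetical, not the C++ API):

```python
# Each activation maps to one handler; simple and parameterized unary
# post-ops share a single lookup-and-dispatch path, and unsupported
# names fail loudly in one place.
def _relu(attr, scalars, algorithm):
    return attr + ["relu"]                      # no extra arguments

def _leaky_relu(attr, scalars, algorithm):
    return attr + [("leaky_relu", scalars[0])]  # consumes one scalar (alpha)

UNARY_HANDLERS = {"relu": _relu, "leaky_relu": _leaky_relu}

def construct_unary_attr(unary, scalars, algorithm, attr):
    handler = UNARY_HANDLERS.get(unary)
    if handler is None:
        raise ValueError(f"unsupported unary post-op: {unary}")
    return handler(attr, scalars, algorithm)
```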
ghstack-source-id: 841ce09 Pull Request resolved: pytorch#140365 Signed-off-by: xinan.lin <xinan.lin@intel.com>
Please fix the lint errors first.
test/xpu/test_fusion.py (Outdated)
mod = M(
    pointwise_info.pointwise_module, input_shape[-1], 10, bias
).eval()
mod = mod.to("xpu")
This isn’t a big issue, but since you’ve already used instantiate_device_type_tests(TestoneDNNFusion, globals(), only_for="xpu", allow_xpu=True), it’s best not to hardcode "xpu" inside the test cases.
Thanks for the reminder. I added a device argument to the tests and use model.to(device) instead.
const Tensor& input_t, // [M, K] or [B, M, K]
const Tensor& weight_t, // [N, K]
const c10::optional<Tensor>& bias_opt,
c10::string_view attr,
c10::string_view is an alias of std::string_view and the community is planning to deprecate it. Please replace all occurrences in this PR with std::string_view.
modified
const Tensor& other_t,
const Tensor& weight_t,
const c10::optional<Tensor>& bias_opt,
c10::string_view binary_attr) {
Suggested change:
- c10::string_view binary_attr) {
+ std::string_view binary_attr) {
modified
const Tensor& input_t, // [M, K] or [B, M, K]
const Tensor& weight_t, // [N, K]
const c10::optional<Tensor>& bias_opt,
c10::string_view attr,
Suggested change:
- c10::string_view attr,
+ std::string_view attr,
modified
Please rebase your branch to get green CI.
@ZhiweiYan-96, is this PR ready for pre-review?
@EikanWang Yes, please review the PR, thanks!
Pull Request Overview
This PR adds post-op fusion support for linear layers on the XPU backend via oneDNN, including both unary and binary pointwise fusions.
- Introduces FusionUtils APIs to construct unary and binary oneDNN attributes.
- Implements linear_pointwise and linear_pointwise_binary in the XPU MKLDNN backend and registers them (a rough usage sketch follows below).
- Adds Python tests to validate linear+unary and linear+binary fusion kernels on XPU.
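As a rough Python-level illustration of the two registered ops, a minimal sketch is below. The argument lists are inferred from the existing CPU mkldnn fusion ops and the test calls in this PR, so treat the exact signatures as assumptions rather than documentation:

```python
import torch

x = torch.randn(2, 4, device="xpu")
w = torch.randn(10, 4, device="xpu")  # (out_features, in_features)
b = torch.randn(10, device="xpu")

# linear + unary post-op (here relu) fused into one kernel;
# assumed signature: (input, weight, bias, attr, scalars, algorithm)
y = torch.ops.mkldnn._linear_pointwise(x, w, b, "relu", [], None)

# linear + binary post-op (elementwise add with a second tensor);
# assumed signature: (input, other, weight, bias, binary_attr)
other = torch.randn(2, 10, device="xpu")
z = torch.ops.mkldnn._linear_pointwise.binary(x, other, w, b, "add")
```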
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/xpu/test_fusion.py | Adds tests for unary and binary linear+pointwise fusion |
| aten/src/ATen/native/mkldnn/xpu/FusionUtils.h | Declares utilities for building post-op fusion attrs |
| aten/src/ATen/native/mkldnn/xpu/FusionUtils.cpp | Defines unary/binary attribute constructors |
| aten/src/ATen/native/mkldnn/xpu/Linear.cpp | Implements and registers linear_pointwise kernels |
Comments suppressed due to low confidence (5)
test/xpu/test_fusion.py:22
- [nitpick] Class name TestoneDNNFusion is inconsistent; consider renaming to TestOneDNNFusion for readability and to follow CamelCase conventions.
class TestoneDNNFusion(TestCase):
aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:11
- The declaration of string_to_unary_attr in the header doesn't match its definition (which takes a c10::string_view unary parameter). Update the signature in the header to string_to_unary_attr(c10::string_view unary, onednn::Attr attr).
at::native::onednn::Attr string_to_unary_attr(onednn::Attr attr);
aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:19
- The header declares only a templated construct_binary_attr, but the .cpp defines a non-template overload. Either add the matching non-template declaration to the header or consolidate into a single function signature to avoid linker errors.
template <bool is_matmul = false>
test/xpu/test_fusion.py:116
- The test for binary fusion calls torch.ops.mkldnn._linear_pointwise instead of the registered binary op torch.ops.mkldnn._linear_pointwise.binary. Update the call to use the .binary suffix so it exercises the correct kernel.
fused = torch.ops.mkldnn._linear_pointwise(
aten/src/ATen/native/mkldnn/xpu/FusionUtils.h:1
- [nitpick] Public API in this header lacks a brief comment describing its purpose. Consider adding a file-level doc comment explaining the role of FusionUtils and its functions.
#pragma once
@ZhiweiYan-96, please check Copilot's comments.
Motivation
This PR is intended to add post-op fusion support for Linear. The linear-pointwise fusion is expected to be used in graph mode, e.g., torch.compile. The FusionUtils.cpp file defines utility APIs for generating primitive attributes. These APIs will also be used for conv-pointwise fusion in #140372.
Validation
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @EikanWang @fengyuan14 @guangyey