
[Intel GPU][pt2e]: Collapse 3D input to 2D for matmul in qlinear_pointwise_binary fusion #148423


Closed · ZhiweiYan-96 wants to merge 3 commits

Conversation

ZhiweiYan-96 (Collaborator) commented Mar 4, 2025

Motivation

During the qlinear_pointwise_binary lowering pass, dimension collapsing only occurs when the post-op is add; it is the responsibility of the C++ kernel to handle dimensions for the post-op sum.

Details

This PR explicitly reshapes the input from 3D to 2D in the qlinear_pointwise_binary op. In addition, we refactor the qlinear_pointwise_binary.tensor implementation to call qlinear_pointwise_binary, removing duplicated code.
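
For illustration, here is a minimal sketch of the collapse-and-restore pattern (a plain linear + add stands in for the fused quantized kernel; the function name and shapes are hypothetical, not the actual op implementation):

```python
import torch

def linear_binary_collapsed(x, weight, bias, other):
    # Collapse a 3D activation [B, M, K] to 2D [B*M, K] so the fused op
    # runs as a plain 2D matmul, then restore the leading dims afterwards.
    orig_shape = x.shape
    if x.dim() == 3:
        x = x.reshape(-1, orig_shape[-1])           # [B*M, K]
        other = other.reshape(-1, other.shape[-1])  # match the 2D output
    out = torch.nn.functional.linear(x, weight, bias) + other
    if len(orig_shape) == 3:
        out = out.reshape(orig_shape[0], orig_shape[1], -1)  # back to [B, M, N]
    return out

# A 3D input goes through the same 2D matmul path and comes back 3D.
x, other = torch.randn(2, 4, 10), torch.randn(2, 4, 10)
out = linear_binary_collapsed(x, torch.randn(10, 10), torch.randn(10), other)
assert out.shape == (2, 4, 10)
```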

UT testing

`python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear_add_xpu`

Stack from ghstack (oldest at bottom):

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
pytorch-bot bot commented Mar 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148423

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9753ff4 with merge base b3bb73e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Mar 4, 2025
ZhiweiYan-96 added a commit that referenced this pull request Mar 4, 2025
…wise_binary fusion

ghstack-source-id: 58d1fe1
Pull Request resolved: #148423
@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as draft March 4, 2025 05:58
@ZhiweiYan-96 ZhiweiYan-96 added ciflow/xpu Run XPU CI tasks topic: not user facing topic category ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request labels Mar 4, 2025
```python
def __init__(self):
    super(Model, self).__init__()
    self.linear = torch.nn.Linear(10, 10)
    self.relu = torch.nn.ReLU()
```

Suggested change: remove the line `self.relu = torch.nn.ReLU()`.

ZhiweiYan-96 (Collaborator, Author)

Thanks for the suggestion; since the UT has been removed, we can resolve this issue.

[ghstack-poisoned]
ZhiweiYan-96 added a commit that referenced this pull request Mar 4, 2025
…wise_binary fusion

ghstack-source-id: 930d36e
Pull Request resolved: #148423
Collaborator

@ZhiweiYan-96, why do we need to add a dedicated test file? I suppose it should reuse the existing test files, right?

ZhiweiYan-96 (Collaborator, Author)

Thanks for the suggestion; this file is not necessary. I have removed it and added 3D cases to test_mkldnn_pattern_matcher.py.
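
For illustration, a 3D case of the kind added there might look like this minimal module (hypothetical shapes and names, not the actual test code):

```python
import torch

class LinearAdd(torch.nn.Module):
    # Minimal pattern: linear on a 3D activation followed by a binary add,
    # which exercises the qlinear_pointwise_binary fusion on 3D input.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x, other):
        return self.linear(x) + other  # x: [B, M, 10] stays 3D

mod = LinearAdd().eval()
x, other = torch.randn(2, 4, 10), torch.randn(2, 4, 10)
assert mod(x, other).shape == (2, 4, 10)
```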

@ZhiweiYan-96 ZhiweiYan-96 changed the title [PT2E Intel GPU]: Collapse 3D input to 2D for matmul in qlinear_pointwise_binary fusion [Intel GPU][pt2e]: Collapse 3D input to 2D for matmul in qlinear_pointwise_binary fusion Mar 5, 2025
[ghstack-poisoned]
@EikanWang EikanWang marked this pull request as ready for review March 5, 2025 08:17
@EikanWang EikanWang moved this to Review Required in PyTorch Intel Mar 5, 2025
@EikanWang EikanWang added the keep-going Don't stop on first failure, keep running tests until the end label Mar 5, 2025
@EikanWang
Collaborator

@pytorchbot merge


@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@EikanWang EikanWang requested review from desertfire and jansel March 5, 2025 13:50
@ZhiweiYan-96
Collaborator Author
ZhiweiYan-96 commented Mar 6, 2025

Hi @jansel @desertfire,
Would you mind reviewing the code changes on the inductor side in this PR and #148522? Your suggestions are greatly appreciated. Please note that we've already conducted an internal review of the changes in the XPU backend (aten/src/ATen/native/mkldnn/xpu/), so feel free to focus on the new additions, or review the entire change as you prefer.
Thanks again for your time 😄

@ZhiweiYan-96
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-project-automation github-project-automation bot moved this from Review Required to Done in PyTorch Intel Mar 7, 2025
pytorchmergebot pushed a commit that referenced this pull request Mar 7, 2025
# Motivation & Details
This PR fixes a bug that previously blocked quantized grouped convolution. The bug was caused by the fact that grouped convolution requires setting the weight scale mask on both the group dimension and the output-channel dimension. This PR fixes the wrong mask in the integration and adds grouped conv cases to the UT.

# UT
`python test/inductor/test_mkldnn_pattern_matcher.py -k test_qconv2d_xpu`

# Runtime exemplification
```
onednn_verbose,v1,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src:s8::blocked:acdb::f0 wei:s8::blocked:abcde::f0 bia:f32::blocked:a::f0 dst:f32::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:3:f32 attr-zero-points:src0:0:s32,alg:convolution_direct,g4mb1_ic128oc128_ih4oh2kh3sh1dh0ph0_iw4ow2kw3sw1dw0pw0,0.0529785
```
The verbose output shows that we successfully run the quantized convolution, where the weight is in `abcde` format (grouped conv).
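
Illustratively, the corrected mask can be read off that verbose line (a sketch under assumed conventions, not the actual integration code; oneDNN scale masks set bit i for each weight dimension i that carries its own scale, and the grouped weight layout here is [groups, oc_per_group, ic, kh, kw]):

```python
# Sketch: per-channel weight-scale mask for a grouped convolution.
# With weights laid out as [groups, oc_per_group, ic, kh, kw] (`abcde`),
# scales vary over both dim 0 (groups) and dim 1 (output channels),
# so both bits must be set in the mask.
GROUP_DIM, OC_DIM = 0, 1          # illustrative names, not oneDNN API
wei_scale_mask = (1 << GROUP_DIM) | (1 << OC_DIM)
assert wei_scale_mask == 3        # matches `wei:3:f32` in the verbose log
```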

Pull Request resolved: #148522
Approved by: https://github.com/EikanWang, https://github.com/liangan1, https://github.com/jansel
ghstack dependencies: #148423
@github-actions github-actions bot deleted the gh/ZhiweiYan-96/51/head branch April 11, 2025 02:30