[Intel GPU] qlinear.pointwise with mixed dtype support by ZhiweiYan-96 · Pull Request #136753 · pytorch/pytorch · GitHub

[Intel GPU] qlinear.pointwise with mixed dtype support #136753


Closed
wants to merge 58 commits

Conversation

ZhiweiYan-96
Collaborator
@ZhiweiYan-96 ZhiweiYan-96 commented Sep 26, 2024

Motivation

This PR adds mixed data type (AMP) support for the `qlinear_pointwise` op. With this PR, `qlinear` kernels can output a BF16 tensor rather than FP32/INT8.
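For illustration only, here is a minimal sketch of one way this path could be exercised from user code, assuming the PT2E quantization flow on XPU. The quantizer class, its import path, and the default-config helper below are assumptions about the surrounding stack rather than something introduced by this PR, and exact APIs vary across PyTorch versions:

```python
import torch
import torch.nn as nn
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
# Assumed import path and helpers for the XPU quantizer; adjust to your PyTorch build.
from torch.ao.quantization.quantizer.xpu_inductor_quantizer import (
    XPUInductorQuantizer,
    get_default_xpu_inductor_quantization_config,
)


class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = M().eval().to("xpu")
example_inputs = (torch.randn(2, 4, device="xpu"),)

# PT2E flow: export, annotate with the quantizer, calibrate, convert to an int8 graph.
exported = torch.export.export_for_training(model, example_inputs).module()
quantizer = XPUInductorQuantizer()
quantizer.set_global(get_default_xpu_inductor_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass
converted = convert_pt2e(prepared)

# Under autocast(bf16), Inductor can lower to qlinear kernels that emit BF16 directly
# instead of dequantizing to FP32 first.
with torch.no_grad(), torch.autocast("xpu", dtype=torch.bfloat16):
    compiled = torch.compile(converted)
    out = compiled(*example_inputs)

print(out.dtype)  # expected: torch.bfloat16
```

If the assumed quantizer helpers are not available in your build, the unit tests listed below are the authoritative way to exercise these kernels.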

UT verification

DNNL_VERBOSE=1 python test/inductor/test_mkldnn_pattern_matcher.py -v \
    -k test_qlinear_int8_mixed_bf16_xpu \
    -k test_qlinear_relu_int8_mixed_bf16_xpu \
    -k test_qlinear_add_int8_mixed_bf16_xpu

Runtime exemplification

# qlinear + bf16 output
onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 dst_bf16::blocked:ab::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:2:f32 attr-zero-points:src0:0:s32,,4x4:4x4,0.0698242
# qlinear_add + bf16 output
onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 dst_bf16::blocked:ab::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:2:f32 attr-zero-points:src0:0:s32 attr-post-ops:eltwise_linear:1:-0.677141+sum:0.0132773,,4x4:4x4,0.0419922
# qlinear_add_relu + bf16 output
onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 dst_bf16::blocked:ab::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:2:f32 attr-zero-points:src0:0:s32 attr-post-ops:eltwise_linear:1:0.533096+sum:0.00416481+eltwise_relu,,4x4:4x4,0.0759277

As shown in the oneDNN verbose output, the destination descriptor `dst_bf16::blocked:ab::f0` demonstrates that the INT8 GEMM successfully outputs a BF16 tensor.
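For completeness, a small throwaway helper (not part of this PR) that extracts the destination dtype from such a verbose line, which is one way to automate the check above:

```python
import re

# One of the verbose lines quoted above, truncated to the relevant fields.
verbose_line = (
    "onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,"
    "src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 "
    "dst_bf16::blocked:ab::f0,attr-scratchpad:user ...,,4x4:4x4,0.0698242"
)

# Extract the data type from the dst_* memory descriptor, e.g. "bf16".
match = re.search(r"dst_([a-z0-9]+)::", verbose_line)
print(match.group(1))  # -> bf16
```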

Stack from ghstack (oldest at bottom):

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

pytorch-bot bot commented Sep 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136753

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 6 Unrelated Failures

As of commit 41c7207 with merge base 3591657:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ZainRizvi
Contributor

This commit is based on a commit that's about a month old and is no longer compatible with our CI system. Please rebase this onto the latest viable/strict branch.

ZhiweiYan-96 added a commit that referenced this pull request Oct 9, 2024
@ZhiweiYan-96 ZhiweiYan-96 added the topic: not user facing label Oct 9, 2024
@ZhiweiYan-96
Collaborator Author

@ZainRizvi Thanks a lot for the reminder! I have rebased my PR stack.

ZhiweiYan-96 added a commit that referenced this pull request Oct 9, 2024
ZhiweiYan-96 added a commit that referenced this pull request Oct 9, 2024
ZhiweiYan-96 added a commit that referenced this pull request Oct 17, 2024
ZhiweiYan-96 added a commit that referenced this pull request Oct 21, 2024
ZhiweiYan-96 added a commit that referenced this pull request Oct 23, 2024
ZhiweiYan-96 added a commit that referenced this pull request Oct 23, 2024
ZhiweiYan-96 added a commit that referenced this pull request Oct 23, 2024
is_qat=is_qat,
is_dynamic=is_dynamic,
)

def _qlinear_dequant_promotion_cpu_test_helper(
Collaborator

Suggested change
def _qlinear_dequant_promotion_cpu_test_helper(
def _qlinear_dequant_promotion_test_helper(

Collaborator Author

Thanks for the reminder; I've changed all the naming in this file.

@@ -2375,7 +2461,6 @@ def test_qlinear_relu_xpu(self):
(torch.randn((2, 4)).to(device="xpu"),), device="xpu"
)

@skipIfNoDynamoSupport
Collaborator

typo?

Collaborator Author

Thanks for the reminder; it's been added back.

@ZhiweiYan-96 ZhiweiYan-96 added the ciflow/xpu and keep-going labels Feb 12, 2025
ZhiweiYan-96 added a commit that referenced this pull request Feb 12, 2025
@EikanWang EikanWang requested a review from jerryzh168 February 14, 2025 03:00
ZhiweiYan-96 added a commit that referenced this pull request Feb 17, 2025
@ZhiweiYan-96 ZhiweiYan-96 added the ciflow/binaries_wheel label Feb 20, 2025
@EikanWang
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Feb 24, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

@EikanWang
Collaborator

@pytorchbot merge -i

aditew01 pushed a commit that referenced this pull request Feb 28, 2025

Pull Request resolved: #136753
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire, https://github.com/jerryzh168
ghstack dependencies: #133307, #135189, #135337, #135465

Co-authored-by: guangyey <guangye.yu@intel.com>
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025
@github-actions github-actions bot deleted the gh/ZhiweiYan-96/31/head branch March 27, 2025 02:10
Labels
ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR)
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
ciflow/xpu (Run XPU CI tasks)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: cpu (CPU specific problem (e.g., perf, algorithm))
module: inductor
open source
topic: not user facing (topic category)
Projects
Status: Done

9 participants