[Intel GPU] qlinear.pointwise with mixed dtype support #136753
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136753.
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 6 Unrelated Failures as of commit 41c7207 with merge base 3591657. Some of the failing jobs were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This commit is based off of a commit that is about a month old and is no longer compatible with our CI system. Please rebase this onto the latest.
@ZainRizvi Many thanks for the reminder! I have rebased my PR stack.
```python
            is_qat=is_qat,
            is_dynamic=is_dynamic,
        )

    def _qlinear_dequant_promotion_cpu_test_helper(
```
Suggested change:
```diff
-    def _qlinear_dequant_promotion_cpu_test_helper(
+    def _qlinear_dequant_promotion_test_helper(
```
Thanks for the reminder; I have changed all the naming in this file.
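For context, here is a minimal hypothetical sketch of what the renamed, device-parameterized helper could look like. The class name, signature, and body below are illustrative assumptions rather than the PR's actual test code; only the `device="xpu"` call pattern mirrors the diff shown in this review.

```python
import torch


class QLinearPatternMatcherTests:
    """Hypothetical container standing in for the real test class."""

    def _qlinear_dequant_promotion_test_helper(
        self, inputs, device="cpu", int8_mixed_bf16=False, is_qat=False, is_dynamic=False
    ):
        # Sketch only: build the quantized model on `device`, compile it, and check
        # that a dequant node shared by several linear consumers is promoted
        # (duplicated) so each qlinear can fuse its own dequant. Body omitted.
        ...

    def test_qlinear_dequant_promotion_cpu(self):
        self._qlinear_dequant_promotion_test_helper((torch.randn(2, 4),), device="cpu")

    def test_qlinear_dequant_promotion_xpu(self):
        self._qlinear_dequant_promotion_test_helper(
            (torch.randn((2, 4)).to(device="xpu"),), device="xpu"
        )
```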
```diff
@@ -2375,7 +2461,6 @@ def test_qlinear_relu_xpu(self):
             (torch.randn((2, 4)).to(device="xpu"),), device="xpu"
         )

     @skipIfNoDynamoSupport
```
Typo? It looks like `@skipIfNoDynamoSupport` was dropped here.
Thanks for the reminder; it has been added back.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 4 jobs have failed; the first few of them are: windows-binary-wheel / wheel-py3_11-cuda11_8-build, windows-binary-wheel / wheel-py3_12-cuda11_8-build, windows-binary-wheel / wheel-py3_13t-cuda11_8-build, windows-binary-wheel / wheel-py3_13t-cuda12_4-build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 10 checks: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 2, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu), windows-binary-wheel / wheel-py3_11-cuda11_8-build, windows-binary-wheel / wheel-py3_13t-cuda12_6-build, windows-binary-wheel / wheel-py3_10-cuda11_8-build, windows-binary-wheel / wheel-py3_13-cuda11_8-build, windows-binary-wheel / wheel-py3_12-cuda11_8-build, windows-binary-wheel / wheel-py3_13t-cuda11_8-build, windows-binary-wheel / wheel-py3_13t-cuda12_4-build, windows-binary-wheel / wheel-py3_9-cuda11_8-build. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation

This PR adds mixed data type (AMP) support for the `qlinear_pointwise` op. With this PR, `qlinear` kernels can output a BF16 tensor rather than FP32/INT8.

# UT verification

```bash
DNNL_VERBOSE=1 python test/inductor/test_mkldnn_pattern_matcher.py -v \
  -k test_qlinear_int8_mixed_bf16_xpu \
  -k test_qlinear_relu_int8_mixed_bf16_xpu \
  -k test_qlinear_add_int8_mixed_bf16_xpu
```

# Runtime exemplification

```bash
# qlinear + bf16 output
onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 dst_bf16::blocked:ab::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:2:f32 attr-zero-points:src0:0:s32,,4x4:4x4,0.0698242
# qlinear_add + bf16 output
onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 dst_bf16::blocked:ab::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:2:f32 attr-zero-points:src0:0:s32 attr-post-ops:eltwise_linear:1:-0.677141+sum:0.0132773,,4x4:4x4,0.0419922
# qlinear_add_relu + bf16 output
onednn_verbose,primitive,exec,gpu:0,matmul,ocl:gemm_with_po:any,undef,src_s8::blocked:ab::f0 wei_s8::blocked:ab::f0 bia_bf16::blocked:ab::f0_mask2 dst_bf16::blocked:ab::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:2:f32 attr-zero-points:src0:0:s32 attr-post-ops:eltwise_linear:1:0.533096+sum:0.00416481+eltwise_relu,,4x4:4x4,0.0759277
```

As shown in the oneDNN verbose output, the field `dst_bf16::blocked:ab::f0` demonstrates that we can successfully output a BF16 tensor from the int8 GEMM.

Pull Request resolved: #136753
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire, https://github.com/jerryzh168
ghstack dependencies: #133307, #135189, #135337, #135465
Co-authored-by: guangyey <guangye.yu@intel.com>
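To make the mixed-dtype path concrete, below is a minimal sketch of how an int8-mixed-bf16 `qlinear` is typically reached through the PT2E quantization flow plus `torch.compile`. It is illustrative only and not taken from this PR: it uses the CPU `X86InductorQuantizer` and CPU autocast as stand-ins, since the XPU-side quantizer and device setup are not shown in the PR description; the XPU tests exercise the analogous path on the `xpu` device.

```python
import torch
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e


class SimpleLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = SimpleLinear().eval()
example_inputs = (torch.randn(2, 4),)

# Export the model and quantize it with the PT2E flow.
exported = torch.export.export_for_training(model, example_inputs).module()
quantizer = xiq.X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass
converted = convert_pt2e(prepared)

# Compiling under bf16 autocast is what selects the int8-mixed-bf16 variant:
# the Inductor pattern matcher fuses dequant + linear (+ relu) into a single
# qlinear_pointwise call whose destination tensor is bf16 instead of fp32/int8.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    compiled = torch.compile(converted)
    out = compiled(*example_inputs)

print(out.dtype)  # expected: torch.bfloat16 when the mixed-dtype path is taken
```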
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov