Support fp8 output of _scaled_mm for CPU #153600

yanbing-j · 2025-05-15T08:01:30Z

This PR adds support for fp8 output from torch._scaled_mm on CPU, and adds unit tests covering fp8 and bf16/fp16/fp32 output.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @Guobing-Chen @Xia-Weiwen @snadampal

pytorch-bot · 2025-05-15T08:01:34Z

✅ No Failures

As of commit d992029 with merge base 1a722f6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Xia-Weiwen · 2025-05-16T07:37:01Z

aten/src/ATen/native/mkldnn/Linear.cpp

+  ideep::tensor dst_scales_t = ideep::tensor(ideep::scale_t(1, output_scale));
+  args.insert({DNNL_ARG_ATTR_SCALES | DNNL_ARG_DST, dst_scales_t});


Do we need to check output_scale != 1.0f here?

OK, I will follow how output_scale is handled in qconv and add the check here.
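
For reference, a minimal sketch of the guard being discussed, assuming the check simply skips the destination scale when it equals 1.0f. The names here (`ExecArgs`, `ScaleTensor`, `kArgAttrScales`, `kArgDst`, `maybe_add_dst_scale`) are hypothetical stand-ins so the snippet compiles without oneDNN or ideep; the actual change in Linear.cpp may look different.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Placeholder keys for this sketch only; these are NOT the real
// DNNL_ARG_ATTR_SCALES / DNNL_ARG_DST values from oneDNN.
constexpr int kArgAttrScales = 1 << 12;
constexpr int kArgDst = 1 << 4;

// Stand-in for ideep::tensor holding a single per-tensor scale.
using ScaleTensor = std::vector<float>;
// Stand-in for the primitive's execution-argument map.
using ExecArgs = std::unordered_map<int, ScaleTensor>;

// Register a destination scale only when it is not the identity scale,
// mirroring the "output_scale != 1.0f" check raised in the review.
inline void maybe_add_dst_scale(ExecArgs& args, float output_scale) {
  if (output_scale != 1.0f) {
    args.insert({kArgAttrScales | kArgDst, ScaleTensor{output_scale}});
  }
}
```

With this shape, a scale of 1.0f leaves the argument map untouched, while any other value adds exactly one entry under the destination-scale key.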

leslie-fang-intel

LGTM

pytorch-bot bot added the ciflow/linux-aarch64, module: cpu, and module: mkldnn labels May 15, 2025

pytorchbot added the open source label May 15, 2025

yanbing-j added the topic: not user facing label May 15, 2025

yanbing-j force-pushed the yanbing/scaled_mm_fp8_output branch 2 times, most recently from 8876f82 to 75a6bba Compare May 16, 2025 07:20

yanbing-j marked this pull request as ready for review May 16, 2025 07:21

yanbing-j requested a review from leslie-fang-intel May 16, 2025 07:21

yanbing-j added the ciflow/trunk label May 16, 2025

yanbing-j requested a review from mingfeima May 16, 2025 07:23

Support fp8 output of _scaled_mm for CPU

75a6bba

Xia-Weiwen reviewed May 16, 2025

yanbing-j requested a review from Xia-Weiwen May 16, 2025 07:45

Update

d992029

leslie-fang-intel approved these changes May 16, 2025
