[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor #127294

zhuhaozhe · 2024-05-28T13:50:43Z

When torch.backends.mkldnn.matmul.fp32_precision == 'bf16', mkldnn linear is also enabled in the inductor path and is allowed to run with bf16 as the computation data type.

Test plan:

python test/inductor/test_mkldnn_pattern_matcher.py -k test_linear_unary
python test/inductor/test_mkldnn_pattern_matcher.py -k test_linear_fp32
python test/inductor/test_mkldnn_pattern_matcher.py -k test_multi_linear_share_same_input

Stack from ghstack (oldest at bottom):

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @peterbell10 @yf225 @ColinPeppler @desertfire

pytorch-bot · 2024-05-28T13:50:47Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127294

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (7 Unrelated Failures)

As of commit 63915d3 with merge base 2022588:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor / unit-test / cuda12.8-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #126867 but the issue was closed recently and a rebase is needed to make it pass)
inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_mm_plus_mm

BROKEN TRUNK - The following jobs failed but were also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 1, 6, linux.idc.xpu) (gh) (trunk failure)
ModuleNotFoundError: No module named 'boto3'
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 2, 6, linux.idc.xpu) (gh) (trunk failure)
ModuleNotFoundError: No module named 'boto3'
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 3, 6, linux.idc.xpu) (gh) (trunk failure)
ModuleNotFoundError: No module named 'boto3'
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 4, 6, linux.idc.xpu) (gh) (trunk failure)
ModuleNotFoundError: No module named 'boto3'
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 5, 6, linux.idc.xpu) (gh) (trunk failure)
ModuleNotFoundError: No module named 'boto3'
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 6, 6, linux.idc.xpu) (gh) (trunk failure)
ModuleNotFoundError: No module named 'boto3'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jgong5 · 2024-05-29T02:27:05Z

torch/_inductor/fx_passes/mkldnn_fusion.py

+            if not hasattr(add_node.args[1], "meta"):
+                # May add an "int"
+                # We meet this issue while enabling bf32 for test_linear_unary
+                # the hardsimoid case will add "3" here and cannot check meta
+                # TODO: Further investigate on this issue


Is it for the bias node or something else? Can you double check? Why didn't we get the problem in the past?

It should be another input node for "add".
We did not hit this in the past because we only enabled and tested low precision (lp) for mkldnn fusion. In the lp case, autocast inserts a "to_dtype" node, so this pattern does not match.

I am trying to find a case that satisfies this pattern by raising a runtime error whenever this function would return True.
#127597

Fixed in #127597
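
To make the failure mode in this thread concrete, here is a minimal sketch of the guard. It does not use the real inductor classes: `FakeNode` and `other_add_input_has_meta` are hypothetical stand-ins, and returning False for a scalar operand is an assumption, since the quoted hunk shows only the `hasattr` check and not what follows it.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a graph node: only `args` and `meta` matter here."""
    name: str
    args: tuple = ()
    meta: dict = field(default_factory=dict)


def other_add_input_has_meta(add_node: FakeNode) -> bool:
    """Return True only if the second operand of the add is a node carrying metadata.

    A plain Python scalar such as the int 3 has no `meta` attribute, so any
    code that reads `add_node.args[1].meta` must check for it first.
    Returning False in that case is an assumption made for this sketch.
    """
    other = add_node.args[1]
    if not hasattr(other, "meta"):
        # The operand is a scalar constant (e.g. "x + 3"), not a node.
        return False
    return True


linear = FakeNode("linear")
add_scalar = FakeNode("add", args=(linear, 3))                  # like "x + 3"
add_tensor = FakeNode("add", args=(linear, FakeNode("other")))  # like "x + y"

print(other_add_input_has_meta(add_scalar))
print(other_add_input_has_meta(add_tensor))
```

Without the `hasattr` check, the scalar case would raise an AttributeError on `3.meta` instead of simply not matching the pattern.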

yanbing-j · 2025-07-03T08:41:57Z

@pytorchbot merge

pytorchmergebot · 2025-07-03T08:43:49Z

Merge failed

Reason: Approvers from one of the following sets are needed:

superuser (pytorch/metamates)
Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

yanbing-j · 2025-07-04T01:19:34Z

Hi @jansel, could you please help review this PR as well? Thanks!

This PR is a follow-up to #125888, which refined the fp32 precision API and has been merged. This PR enables BF32 for mkldnn linear, that is, using BF16 as the internal precision for FP32 computation.
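
As a rough illustration of what "BF16 as FP32 internal precision" means numerically, the sketch below rounds fp32 inputs to bfloat16 before multiplying and keeps a wider accumulator. It is plain Python, not the mkldnn kernel: the helper names are hypothetical, NaN inputs are not handled, and the accumulator is a Python double standing in for fp32 accumulation.

```python
import struct


def round_to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even) and return it as a float.

    bfloat16 keeps the top 16 bits of the fp32 encoding, so the low 16 bits
    are rounded away. NaN inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias of 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_reference(a, b):
    """Dot product on the inputs as given (no reduced-precision rounding)."""
    return sum(x * y for x, y in zip(a, b))


def dot_bf32(a, b):
    """Dot product with each input rounded to bf16 first, accumulated at higher precision."""
    return sum(round_to_bf16(x) * round_to_bf16(y) for x, y in zip(a, b))


a = [0.1, 0.2, 0.3]
b = [1.5, 2.5, 3.5]
print(dot_reference(a, b), dot_bf32(a, b))
```

The inputs and outputs stay fp32 from the caller's point of view; only the multiplication operands lose mantissa bits, which is why results under this mode are compared with looser tolerances than a full fp32 run.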

[ghstack-poisoned]

yanbing-j · 2025-07-07T05:56:03Z

@pytorchbot merge

pytorchmergebot · 2025-07-07T05:57:56Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

zhuhaozhe mentioned this pull request May 28, 2024

refine fp32 precision api #125888

Closed

pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels May 28, 2024

This was referenced May 28, 2024

allow to use bf16 as fp32 internal precision for mkldnn conv #126050

Closed

allow to use bf16 as fp32 internal precision for mkldnn conv backward #126054

Closed

[inductor] enable bf32 test for mkldnn conv #127293

Closed

zhuhaozhe added a commit that referenced this pull request May 28, 2024

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

ab9b956

ghstack-source-id: 3582e56 Pull Request resolved: #127294

Update

e6d2625

[ghstack-poisoned]

zhuhaozhe marked this pull request as draft May 28, 2024 14:01

pytorchbot added the open source label May 28, 2024

zhuhaozhe requested a review from jgong5 May 28, 2024 14:15

jgong5 reviewed May 29, 2024

zhuhaozhe added a commit that referenced this pull request May 30, 2024

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

db42984

ghstack-source-id: 371129b Pull Request resolved: #127294

Update

a197dd0

[ghstack-poisoned]

Update

7e0bd49

[ghstack-poisoned]

zhuhaozhe added a commit that referenced this pull request Jun 4, 2024

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

bc94d01

ghstack-source-id: 5deb9a0 Pull Request resolved: #127294

Update

e01772b

[ghstack-poisoned]

zhuhaozhe added a commit that referenced this pull request Jun 5, 2024

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

b58e116

ghstack-source-id: 02c3c91 Pull Request resolved: #127294

Update

1dde237

[ghstack-poisoned]

zhuhaozhe requested a review from jgong5 June 5, 2024 08:33

zhuhaozhe added a commit that referenced this pull request Jun 5, 2024

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

4d90ff0

ghstack-source-id: 02c3c91 Pull Request resolved: #127294

Update

ed0c68f

[ghstack-poisoned]

zhuhaozhe added a commit that referenced this pull request Jun 7, 2024

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

ad9bcb8

ghstack-source-id: 02c3c91 Pull Request resolved: #127294

jgong5 approved these changes Jun 7, 2024

Update

ab13767

[ghstack-poisoned]

yanbing-j mentioned this pull request Jun 25, 2025

allow to use tf32 as fp32 internal precision for mkldnn conv/matmul/rnn #156802

Closed

yanbing-j added 3 commits June 25, 2025 07:33

Update

79e8dfd

[ghstack-poisoned]

Update

3776c31

[ghstack-poisoned]

Update

5761f5d

[ghstack-poisoned]

yanbing-j pushed a commit to yanbing-j/pytorch that referenced this pull request Jun 30, 2025

[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor

e389464

ghstack-source-id: 8ea2aaa Pull Request resolved: pytorch#127294

yanbing-j added 4 commits June 30, 2025 08:56

Update

2192f93

[ghstack-poisoned]

Update

682b6c5

[ghstack-poisoned]

Update

1d12fbb

[ghstack-poisoned]

Update

c7e8536

[ghstack-poisoned]

yanbing-j mentioned this pull request Jul 3, 2025

Enable TF32 as fp32 internal precision for matmul/linear/conv #157520

Closed

yanbing-j added 2 commits July 3, 2025 01:36

Update

2d2956b

[ghstack-poisoned]

Update

fc2e94e

[ghstack-poisoned]

yanbing-j marked this pull request as ready for review July 3, 2025 08:41

pytorchmergebot added the merging label Jul 3, 2025

pytorchmergebot removed the merging label Jul 3, 2025

yanbing-j requested a review from jansel July 3, 2025 09:49

jansel approved these changes Jul 5, 2025

Update

63915d3

[ghstack-poisoned]

pytorchmergebot added the merging label Jul 7, 2025

pytorchmergebot closed this in 815545f Jul 7, 2025

pytorchmergebot added Merged and removed merging labels Jul 7, 2025

github-actions bot deleted the gh/zhuhaozhe/33/head branch August 7, 2025 02:21
