[Inductor] Expand dtype aware codegen for libdevice and tl.math ops by blaine-rister · Pull Request #140864 · pytorch/pytorch · GitHub

[Inductor] Expand dtype aware codegen for libdevice and tl.math ops #140864


Closed
blaine-rister wants to merge 63 commits from the brister/dtype_codegen branch

Conversation

@blaine-rister (Contributor) commented Nov 16, 2024

Feature

Previously, only the codegen for `torch.sqrt` was dtype aware. This PR updates most of the `libdevice`/`tl.math` ops to support dtype-aware codegen as well. This is often necessary to get correct code when `config.triton.codegen_upcast_to_fp32=False`, as most Triton math ops do not support float16/bfloat16.
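
For context, this is roughly how one would exercise that configuration (a minimal sketch, assuming a CUDA device; the flag name is taken from this PR):

```python
import torch
import torch._inductor.config as inductor_config

# Keep fp16/bf16 values in their native precision instead of upcasting
# everything to fp32 at load time.
inductor_config.triton.codegen_upcast_to_fp32 = False

@torch.compile
def f(x):
    # tanh and sqrt lower to libdevice/tl.math calls, exercising the
    # dtype-aware codegen path described above.
    return torch.tanh(x) * torch.sqrt(x.abs())

out = f(torch.randn(1024, device="cuda", dtype=torch.float16))
```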

This PR enables dtype-aware codegen via the `maybe_upcast_float32` decorator. This wraps `TritonOverrides` macros to upcast arguments to float32, and downcast the result back to the original dtype. The exception is for ops that return booleans, in which case we set `convert_output=False` and skip the output cast.
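
The decorator's effect can be sketched as follows; everything here (the `TritonVar` helper, the dtype strings) is illustrative rather than the actual `TritonOverrides` internals:

```python
from dataclasses import dataclass
from functools import wraps

# Hypothetical stand-in for Inductor's CSE variables: an expression
# string plus the dtype it carries through codegen.
@dataclass
class TritonVar:
    expr: str
    dtype: str  # e.g. "tl.float16", "tl.bfloat16", "tl.float32"

LOW_PRECISION = {"tl.float16", "tl.bfloat16"}

def maybe_upcast_float32(convert_output: bool = True):
    """Wrap a codegen macro: upcast fp16/bf16 arguments to fp32 before
    the math op, then downcast the result unless convert_output=False
    (used for ops that return booleans)."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args: TritonVar) -> TritonVar:
            in_dtype = args[0].dtype
            exprs = [a.expr for a in args]
            if in_dtype in LOW_PRECISION:
                # Emit casts to fp32 around each argument expression.
                exprs = [f"{e}.to(tl.float32)" for e in exprs]
            result = fn(*exprs)
            if in_dtype in LOW_PRECISION and convert_output:
                # Downcast the fp32 result back to the original dtype.
                return TritonVar(f"({result}).to({in_dtype})", in_dtype)
            out_dtype = "tl.int1" if not convert_output else in_dtype
            return TritonVar(result, out_dtype)
        return wrapper
    return decorator

# Macros in the style of TritonOverrides:
@maybe_upcast_float32()
def sqrt(x: str) -> str:
    return f"libdevice.sqrt({x})"

@maybe_upcast_float32(convert_output=False)
def isinf(x: str) -> str:
    return f"libdevice.isinf({x})"

v = TritonVar("tmp0", "tl.float16")
print(sqrt(v).expr)   # (libdevice.sqrt(tmp0.to(tl.float32))).to(tl.float16)
print(isinf(v).expr)  # libdevice.isinf(tmp0.to(tl.float32))
```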

Test Plan

Added CI tests for all the new ops. The list of ops to test is automatically generated based on uses of the `maybe_upcast_float32` decorator, and stored in the new `OpDtypeSupport` class. In each new test, we search the generated code for upcasts/downcasts using a regex.
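
The regex check is in the spirit of the following sketch (the generated-code string here is illustrative, not real Inductor output):

```python
import re

# The generated Triton source should upcast fp16 inputs to fp32 and
# downcast the result back to fp16.
generated_code = "tmp1 = libdevice.sqrt(tmp0.to(tl.float32)).to(tl.float16)"

assert re.search(r"\.to\(tl\.float32\)", generated_code), "missing upcast"
assert re.search(r"\.to\(tl\.float16\)", generated_code), "missing downcast"
```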

Also added a unit test for `OpDtypeSupport` which checks that we have correct dtype info for ops that require upcasts.
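
A rough sketch of the kind of lookup such a class could expose (structure assumed, not the actual implementation):

```python
import torch

class OpDtypeSupport:
    # Dtypes each op's Triton lowering supports natively. In the real
    # code this is populated from maybe_upcast_float32 uses; the entries
    # here are hand-written purely for illustration.
    supported_dtypes = {
        "sqrt": {torch.float32, torch.float64},
        "isinf": {torch.float32, torch.float64},
    }

    @classmethod
    def needs_upcast(cls, op: str, dtype: torch.dtype) -> bool:
        return dtype not in cls.supported_dtypes.get(op, set())

assert OpDtypeSupport.needs_upcast("sqrt", torch.float16)
assert not OpDtypeSupport.needs_upcast("sqrt", torch.float32)
```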

This PR also moves some existing tests around, collecting all the dtype-aware codegen tests in one file.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

eellison and others added 9 commits November 6, 2024 16:27
pytorch-bot bot commented Nov 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140864

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 9c5d202 with merge base ed77901:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

blaine-rister and others added 10 commits November 15, 2024 18:50
Adds the remaining unimplemented ops, as well as an assertion failure if someone adds a new op without a dtype rule.

We test all unique pointwise operators that are registered as lowerings and have an OpInfo. There will be some follow-ups to make this work well with `codegen_upcast_to_fp32` set to both True and False.
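
A minimal sketch of that registration-time guard, with hypothetical names (the real registry lives in Inductor's lowering code):

```python
# Hypothetical dtype-rule registry: every pointwise op registered as a
# lowering must declare how its dtype behavior is handled.
DTYPE_RULES: dict[str, str] = {
    "sqrt": "upcast_to_fp32",
    "isinf": "returns_bool",
}

def register_pointwise_lowering(op_name: str) -> None:
    # Fail fast at registration time rather than producing wrong code later.
    assert op_name in DTYPE_RULES, (
        f"op '{op_name}' has no dtype rule; add one to DTYPE_RULES "
        "before registering it as a pointwise lowering"
    )

register_pointwise_lowering("sqrt")      # ok
# register_pointwise_lowering("erfinv")  # would raise AssertionError
```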


@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator)

@blaine-rister your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Dec 6, 2024
Revert "[Inductor] Expand dtype aware codegen for libdevice and tl.math ops (#140864)"

This reverts commit 80ca6dd.

Reverted #140864 on behalf of https://github.com/atalman due to failing internally.
pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels Dec 6, 2024
@blaine-rister (Contributor, Author)

@pytorchbot revert -m "Nondeterministic test is failing internally"

pytorch-bot bot commented Dec 6, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@blaine-rister (Contributor, Author)

@pytorchbot merge


@facebook-github-bot (Contributor)

@blaine-rister has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@blaine-rister (Contributor, Author)

Canceling merge until I can confirm that internal tests are fixed.

pytorchmergebot added a commit that referenced this pull request Dec 7, 2024
Revert "[Inductor] Expand dtype aware codegen for libdevice and tl.math ops (#140864)"

This reverts commit 80ca6dd.

Reverted #140864 on behalf of https://github.com/atalman due to failing internally.
@blaine-rister (Contributor, Author)

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-focal-py3.13-clang10 / build, pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@blaine-rister (Contributor, Author)

@pytocrhbot merge -i

@blaine-rister (Contributor, Author)

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-focal-py3.13-clang10 / build, pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorch-bot bot pushed a commit that referenced this pull request Dec 9, 2024
[Inductor] Expand dtype aware codegen for libdevice and tl.math ops (#140864)

Pull Request resolved: #140864
Approved by: https://github.com/eellison, https://github.com/arui-meta

Co-authored-by: eellison <elias.ellison@gmail.com>
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
github-actions bot deleted the brister/dtype_codegen branch January 8, 2025 02:04