POC for mixed prec optim frontend #146640
base: gh/janeyx99/222/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146640
Note: Links to docs will display an error until the docs builds have been completed.
❌ 10 New Failures
As of commit 59010b7 with merge base 83bb921:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR is a prototype of what a frontend for requesting mixed precision in torch.optim could look like, via set_dtype_policy in optimizer.py. It is not meant to be landable, but to start some discussion about what people want/would like to see, and to ask whether there are things I haven't considered yet.

This currently only works with Adam(W)!

A toy script showing how to use it:

```
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(2, 3),
    torch.nn.Sigmoid(),
    torch.nn.Linear(3, 1),
    torch.nn.Sigmoid(),
)
model.to("cuda")

optim = torch.optim.AdamW(model.named_parameters(), foreach=False)
mp_policy = {
    "exp_avg": lambda _: torch.bfloat16,
    "exp_avg_sq": lambda _: torch.bfloat16,
    "max_exp_avg_sq": lambda _: torch.bfloat16,
}
optim.set_dtype_policy(mp_policy)

i = torch.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], device="cuda").reshape(3, 2)
l = model(i).sum()
l.backward()
optim.step()
```

[ghstack-poisoned]
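Not part of the PR text, but as a quick sanity check on the toy script above, one could inspect the optimizer state after `optim.step()`. This is only a sketch under assumptions: it assumes the prototype keeps per-parameter state keyed by parameter tensor under the usual Adam(W) keys (`exp_avg`, `exp_avg_sq`), as stock `torch.optim` does, and that the policy's dtypes are applied when the state is created. `max_exp_avg_sq` only exists with `amsgrad=True`, so it is not checked here.

```
# Sketch only: assumes the prototype stores per-parameter state under the usual
# Adam(W) keys in optim.state, with the policy's dtypes applied after step().
for p in model.parameters():
    state = optim.state[p]
    assert state["exp_avg"].dtype == torch.bfloat16
    assert state["exp_avg_sq"].dtype == torch.bfloat16
    # The parameters themselves stay in their original dtype (float32 here);
    # only the optimizer state is affected by the policy.
    assert p.dtype == torch.float32
```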
torch/_meta_registrations.py
Outdated
    start.dtype == end.dtype,
    lambda: f"expected dtype {start.dtype} for `end`, but got dtype {end.dtype}",
)
# torch._check(
We should uncomment this once #146749 is fixed
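For readers less familiar with this internal API: `torch._check(cond, message)` raises when `cond` is false, with `message` supplied as a callable so the string is only built on failure. Below is a minimal, self-contained illustration of the same dtype check shown in the diff; the `_example_check` wrapper is hypothetical and not from the file.

```
import torch

# Hypothetical wrapper, for illustration only; mirrors the dtype check in the diff.
def _example_check(start: torch.Tensor, end: torch.Tensor) -> None:
    torch._check(
        start.dtype == end.dtype,
        lambda: f"expected dtype {start.dtype} for `end`, but got dtype {end.dtype}",
    )

_example_check(torch.zeros(2), torch.zeros(2))            # passes silently
# _example_check(torch.zeros(2), torch.zeros(2).half())   # would raise a RuntimeError
```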
ghstack-source-id: 85f4266
Pull Request resolved: pytorch/pytorch#146640
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as `Stale`.