Add support for differentiable LR in SGD + test v2.0 #143510
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143510
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit ae57bfe with merge base 80a4239.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Commits:
- Add support for differentiable LR in SGD + test
- removed default = False on differentiables
- lint
- Addressed several of Jane's comments, still not ready to merge
- Addressed more comments
- Renamed kwargs to inner_kwargs, changed x, y to be simpler, reordered test var definitions to be more logical, put lr back into inner_kwargs to make the function more adaptable for future enhancement
- Add differentiable flag to functional_sgd.py
- Streamlined tester function + easier support for more kwargs
- Renamed + fixed test_differentiable_lr
- Functional refactoring (last time I added a default param in sgd() it broke CI, fingers crossed!)
- lint
- Update test/optim/test_optim.py (Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>)
- Add comment + attempt to understand differentiable adding in torch/optim/sgd.py. Will revert this commit if we choose to do `if differentiable and isinstance(lr, Tensor):` instead of `if isinstance(lr, Tensor):`
- Add newline to revert earlier change
- updated to use cpu-scalar addcmul + made name more generalizable
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
torch/optim/sgd.py (Outdated)

```diff
@@ -354,6 +354,8 @@ def _single_tensor_sgd(
         if isinstance(lr, Tensor) and lr.requires_grad:
             param.addcmul_(grad, lr, value=-1)
         else:
+            # CPU scalar tensors w/out grad works but isn't supported in typehints
+            lr = cast(float, lr)
```
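For readers skimming the hunk above: the tensor-LR branch keeps the learning rate inside the autograd graph via addcmul, while the other branch is the ordinary scaled add. Below is a minimal, out-of-place sketch of that distinction (illustrative only, with a hypothetical helper name, not the optimizer source; the in-place variants in sgd.py behave analogously). The review comments that follow are about the `cast(float, lr)` line, i.e., what to do on the non-tensor branch when the type hints are stricter than the runtime behavior.

```python
import torch
from torch import Tensor

def apply_sgd_update(param: Tensor, grad: Tensor, lr) -> Tensor:
    # Hypothetical helper for illustration, not the real _single_tensor_sgd.
    if isinstance(lr, Tensor) and lr.requires_grad:
        # param - lr * grad, with lr staying in the autograd graph
        return param.addcmul(grad, lr, value=-1)
    # plain float (or non-differentiable scalar tensor) path
    return param.add(grad, alpha=-float(lr))

p = torch.zeros(3)
g = torch.ones(3)
lr_tensor = torch.tensor(0.1, requires_grad=True)

out = apply_sgd_update(p, g, lr_tensor)
out.sum().backward()
print(lr_tensor.grad)  # gradient with respect to the learning rate itself
```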
It should almost always already be a float; we don't want to do this in the sgd code, but we should enforce that it's always a float in the tests.
Okay! I added it because the CI failed, which I think is because the functional path uses really strict typing. That said, the cast felt pretty wrong when I added it, especially because I hadn't seen any casts in any of the code I've read. What do you think we should do about the strict typing?
The failed test is here: https://github.com/pytorch/pytorch/actions/runs/12403464090/job/34628316613?pr=143510
I was thinking there's a chance we could do just `if isinstance(lr, Tensor)`, because the previous error of `found at least two devices, mps:0 and cpu!` occurred on MPS because my previous PR wasn't merged into this branch yet. In the last PR I allowed addcmul to take in CPU scalars in the TensorIterator, which wasn't there before, so it used to flag any CPU scalar before it even got to the implementation. We also added testing of the scalar to the pre-existing addcmul test in the addcmul PR, and I believe it tested all the devices with the scalar and it seemed to work on all of them, but I could be terribly mistaken.

I'm also somewhat concerned that any tensor LRs would go into the addcmul, and there might be some behavior supported with add that isn't with addcmul, but it would fix this typing issue.
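To make the device point concrete, here is a hedged sketch of the cpu-scalar case being discussed. It assumes the earlier addcmul PR is present in the build (and, per the review below, that support is CUDA-only); without it, the `found at least two devices` error mentioned above is the expected outcome.

```python
import torch

if torch.cuda.is_available():
    param = torch.randn(4, device="cuda")
    grad = torch.randn(4, device="cuda")
    lr = torch.tensor(0.01)  # 0-dim CPU scalar tensor, requires_grad=False

    # param <- param - lr * grad; the CPU scalar is accepted by the CUDA kernel
    # only with the cpu-scalar addcmul support referenced in this thread.
    param.addcmul_(grad, lr, value=-1)
```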
What happens when you remove the type hint at the definition of this function?
Generally, jit script is too rigid with this; we shouldn't change our code semantics to make typing for jit script happy. Also, the addcmul path on MPS probably would not work even with your change, because your previous PR is CUDA only.
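As a toy illustration of the strict-typing point (a hypothetical function, not the real functional SGD): a jit-scripted function enforces its annotations at call time, so an lr annotated as float rejects a Tensor even when eager code would handle it fine.

```python
import torch

@torch.jit.script
def scaled_step(param: torch.Tensor, grad: torch.Tensor, lr: float) -> torch.Tensor:
    # Eager PyTorch would happily broadcast a 0-dim tensor here, but the
    # scripted signature only accepts a Python float.
    return param - lr * grad

p, g = torch.zeros(2), torch.ones(2)
scaled_step(p, g, 0.1)                  # OK: matches the annotation
# scaled_step(p, g, torch.tensor(0.1))  # RuntimeError: expected float, got Tensor
```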
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Woohoo!
Second PR in a larger project to broaden support for differentiable optimizers with @janeyx99! The first one had an issue near the end, so this is the second PR on that subject. See #143122 for the development up until this point.
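As a rough sketch of what the feature enables (hedged: this mirrors the cloned-parameter pattern used by the differentiable-optimizer tests and assumes a build containing this PR), gradients can flow back through an SGD step into the learning rate itself:

```python
import torch

lr = torch.tensor(0.1, requires_grad=True)      # learning rate as a differentiable tensor
p = torch.randn(3, requires_grad=True).clone()  # non-leaf copy so the in-place step stays in the graph
p.grad = torch.randn(3)                         # stand-in gradient from some inner loss

opt = torch.optim.SGD([p], lr=lr, differentiable=True)
opt.step()                                      # p <- p - lr * p.grad, tracked by autograd

meta_loss = p.sum()                             # any loss on the updated parameter
meta_loss.backward()
print(lr.grad)                                  # d(meta_loss)/d(lr), the capability this PR targets
```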