Optimize LRScheduler docs #146684


Open · wants to merge 3 commits into main from opt/docs/LRScheduler

Conversation

@zeshengzong (Contributor) commented Feb 7, 2025:

Fixes part of #120735 by adding more description to the LRScheduler docs.

Changes

  1. What the constructor's last_epoch argument is.
    And does epoch start from 0 or 1?
    Also there are two terms - "epoch", "step" - are they the same?

last_epoch is explained in the Args description; the difference between "epoch" and "step" should become clear when compared with the step method description.
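A minimal sketch of this (standard torch.optim calls; the toy model, lr, and step_size are chosen only for illustration) showing how "epoch" and "step" line up when the scheduler is stepped once per epoch, and how last_epoch counts those steps:

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

print(scheduler.last_epoch)  # 0: the constructor already performed the initial step
for epoch in range(3):
    # ... training for one epoch would go here ...
    optimizer.step()
    scheduler.step()         # one scheduler "step" per "epoch" in this pattern
print(scheduler.last_epoch)  # 3
```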

  2. That the constructor relies on/creates the 'initial_lr' property on the .optimizer.
    (By the way, is the Optimizer class up with it?)

initial_lr is set when the LRScheduler is initialized, using the optimizer's lr value, which is in turn set when the optimizer is created. But these are internal implementation details that users don't need to care about.
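A sketch of that internal detail (behavior assumed from the current implementation; values are arbitrary): with last_epoch=-1 the scheduler copies each param group's current lr into an initial_lr entry on the optimizer.

```python
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.05)
print("initial_lr" in optimizer.param_groups[0])   # False: the optimizer alone never sets it

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
print(optimizer.param_groups[0]["initial_lr"])     # 0.05, copied from the optimizer's lr
```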

  3. That .get_last_lr() and .get_lr() are totally different methods despite the naming.
    Wait, there are also ._get_closed_form_lr() methods, hm...

Added descriptions to get_last_lr and get_lr; the private method _get_closed_form_lr should not be exposed to users in the docs.
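A short sketch of the distinction (toy values; get_lr is deliberately not called, since it is the per-scheduler hook rather than a user-facing query):

```python
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

optimizer.step()
scheduler.step()
print(scheduler.get_last_lr())  # [0.05]: the rates most recently written into the optimizer
# scheduler.get_lr() is the hook step() uses internally; calling it directly
# typically warns and may not match what the optimizer is currently using.
```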

  4. That the constructor does .step() itself via ._initial_step() method.
  5. Which arguments the .step() method has.
    Or is it (the epoch argument) deprecated?

Added a note in the step method doc that the epoch argument is deprecated.
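A sketch of the deprecated form (the warning capture is only there to make the effect visible; the exact warning text is assumed, not quoted from the docs):

```python
import warnings
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)

optimizer.step()
scheduler.step()                   # preferred: no arguments
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scheduler.step(epoch=7)        # still accepted, but emits a deprecation warning
print(len(caught) > 0)             # True: the epoch argument is on its way out
```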

  6. What does .step() do?
    Does it modify (in some way) the .optimizer? (See also p.2.)

Updated the step method doc.
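A sketch of the effect being documented (assumed behavior, toy values): step() recomputes the learning rate and writes it back into optimizer.param_groups, which is how the scheduler modifies the optimizer.

```python
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)

print(optimizer.param_groups[0]["lr"])  # 0.1
optimizer.step()                        # normally preceded by a backward pass
scheduler.step()
print(optimizer.param_groups[0]["lr"])  # 0.05: the scheduler overwrote the optimizer's lr
```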

  7. Are the .last_epoch, .base_lrs attributes public?
    (Don't know, maybe it is not accepted to publish such information.)
    (For .last_epoch, see also p.1a.)

These attributes seem to be handled by LRScheduler itself, and no usage example is given here; I assume users can ignore them when using LRScheduler.
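For completeness, a tiny sketch reading those attributes (reading them is harmless even if the docs stop short of calling them public API):

```python
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

print(scheduler.base_lrs)    # [0.1], one entry per param group
print(scheduler.last_epoch)  # 0 right after construction
```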

Test Result

Before

[screenshot]

After

[screenshots]

cc @janeyx99

@pytorch-bot (bot) commented Feb 7, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146684

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit af38ae0 with merge base 13339ce:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zeshengzong zeshengzong marked this pull request as ready for review February 7, 2025 09:16
@albanD albanD removed their request for review February 7, 2025 10:15
@mikaylagawarecki mikaylagawarecki added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Feb 7, 2025
@janeyx99 (Contributor) left a comment:

Thanks for the attempt, but I don't think this PR addresses most of the concerns in the linked issue.

@zeshengzong zeshengzong marked this pull request as draft February 14, 2025 01:32
@zeshengzong (Contributor Author) commented:

Hi @janeyx99, please check this version; if anything needs to be added, please let me know, thanks!

@zeshengzong zeshengzong marked this pull request as ready for review February 17, 2025 07:23
@zeshengzong (Contributor Author) commented:

Hello @janeyx99, please check whether any improvements should be made when you get a chance, thanks!

@zeshengzong (Contributor Author) commented:

@pytorchbot rebase -b main

@pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased opt/docs/LRScheduler onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout opt/docs/LRScheduler && git pull --rebase)

@janeyx99 (Contributor) commented:

@zeshengzong sorry I haven't had the bandwidth to properly review this change; I want to make sure it actually addresses the concerns in the issue as clearly as possible. I'm hoping to get to it before the end of the week; feel free to ping me here as a reminder if that does not happen!

@zeshengzong zeshengzong force-pushed the opt/docs/LRScheduler branch from 9c82b7f to af38ae0 Compare April 23, 2025 02:27
@zeshengzong (Contributor Author) commented:

Hello @janeyx99, please help review this one when you're available, thanks!

@janeyx99 (Contributor) left a comment:

Thanks for taking a look. This PR would not fully fix the concerns in the issue. See my review below and #120735 (comment).

I do not expect fixing the issue to be trivial.

optimizer (Optimizer): Wrapped optimizer. (Find more about
`optimizer <https://pytorch.org/docs/stable/optim.html#base-class>`_) .
last_epoch (int, optional): The index of the last training epoch to resume.
Default: -1, starts from the optimizer lr.

Suggested change
Default: -1, starts from the optimizer lr.
Default: -1, starts from the optimizer lr.

The "starts from the optimizer lr" wording is still confusing, see #120735 (comment)

@@ -252,7 +277,7 @@ class LambdaLR(LRScheduler):
"""Sets the initial learning rate.

The learning rate of each parameter group is set to the initial lr
times a given function. When last_epoch=-1, sets initial lr as lr.
times a given function. When last_epoch=-1, use optimizer's lr as inital lr.

Suggested change
times a given function. When last_epoch=-1, use optimizer's lr as inital lr.
times a given function. When last_epoch=-1, use optimizer's lr as initial lr.

@@ -458,7 +483,7 @@ class StepLR(LRScheduler):
"""Decays the learning rate of each parameter group by gamma every step_size epochs.

Notice that such decay can happen simultaneously with other changes to the learning rate
from outside this scheduler. When last_epoch=-1, sets initial lr as lr.
from outside this scheduler. When last_epoch=-1, use optimizer's lr as inital lr.

Suggested change
from outside this scheduler. When last_epoch=-1, use optimizer's lr as inital lr.
from outside this scheduler. When last_epoch=-1, use optimizer's lr as initial lr.

What if last_epoch is not -1? Then what's the initial lr?
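For reference, a sketch of what currently appears to happen when last_epoch is not -1 (behavior assumed from the implementation, not from the docs): the scheduler does not copy the optimizer's lr but instead expects initial_lr to already be present in each param group, e.g. restored from a checkpoint.

```python
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
try:
    torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, last_epoch=3)
except KeyError as err:
    print(err)  # complains that 'initial_lr' is not specified in param_groups[0]

optimizer.param_groups[0]["initial_lr"] = 0.1  # what a resumed run would normally provide
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, last_epoch=3)
print(scheduler.last_epoch)  # 4: the constructor performs one step on top of last_epoch=3
```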

@@ -1010,7 +1035,7 @@ class CosineAnnealingLR(LRScheduler):
& T_{cur} = (2k+1)T_{max}.
\end{aligned}

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule
When last_epoch=-1, use optimizer's lr as inital lr. Notice that because the schedule

please spell initial correctly throughout this file

@zeshengzong (Contributor Author) commented:

Thanks for taking a look. This PR would not fully fix the concerns in the issue. See my review below and #120735 (comment).

I do not expect fixing the issue to be trivial.

Hi, thanks for your time! I was thinking about how much detail should be exposed to users in the public docs; from my point of view, something like #120735 (comment) describes so much internal detail that I may not actually need it when using the scheduler, for example:

So after the scheduler object, the "first" (i.e. <last_epoch argument>-th) step had been done.
Also the optimizer's 'lr's had been replaced).
None of this behavior is documented.

And only adding descriptions for last_epoch and the other params doesn't seem enough to explain "how it works". A better solution might be to draw a diagram of the interaction between lr_scheduler and optimizer in a training loop (also including user-passed params). WDYT? @janeyx99
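Not a diagram, but a minimal training-loop sketch of the interaction in question (standard torch.optim calls; the model, data, and hyperparameters are placeholders):

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)             # user-chosen lr
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
# constructor: records base_lrs / initial_lr and performs the initial step

for epoch in range(30):
    for x, y in [(torch.randn(4, 8), torch.randn(4, 1))]:           # stand-in for a DataLoader
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()       # updates parameters using param_groups[...]["lr"]
    scheduler.step()           # rewrites param_groups[...]["lr"] for the next epoch
```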

@janeyx99 (Contributor) commented May 2, 2025:

I'd be curious to see the diagram, and I agree that LRScheduler is not the easiest to document with its many intricacies. We can try to land some version of this PR but not close the original issue, as I think the original issue brings up valid points that should be addressed more (not just in docs, but perhaps in future design as well).

Labels: open source · release notes: optim · triaged
5 participants