-
Notifications
You must be signed in to change notification settings - Fork 25.8k
updated adafactor doc #154862 #155248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated adafactor doc #154862 #155248
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155248
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit c2c5e09 with merge base f5e1b24 ( UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
torch/optim/_adafactor.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, there is no mathematical deviation. We are computing the same thing, but using mean instead of sum to take advantage of being able to use smaller numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I’ve updated the wording to reflect that this is mathematically equivalent and that the use of mean is purely for numerical stability. Let me know if the new phrasing works for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2c91841 to
6e9c27a
Compare
91787ed to
0de4ade
Compare
|
Sorry I have been a little busy with getting things in 2.8 😛 Thanks for your phrasing--I've made more things concise but LGTM now! |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@janeyx99 All good, and thank you! |
updated adafactor doc to reflect difference in implementation vs original paper
Fixes #154862