8000 updated adafactor doc #154862 by morrison-turnansky · Pull Request #155248 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@morrison-turnansky
Copy link
Contributor

updated adafactor doc to reflect difference in implementation vs original paper

Fixes #154862

@pytorch-bot
Copy link
pytorch-bot bot commented Jun 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155248

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c2c5e09 with merge base f5e1b24 (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla
Copy link
linux-foundation-easycla bot commented Jun 5, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: morrison-turnansky / name: Morrison Turnansky (61dd92e, 519c017)
  • ✅ login: janeyx99 / name: Jane (Yuan) Xu (c2c5e09)

@morrison-turnansky morrison-turnansky changed the title updated adafactor doc to reflect difference in implementation vs orig… pr for issue #154862 Jun 5, 2025
@morrison-turnansky morrison-turnansky changed the title pr for issue #154862 updated adafactor doc #154862 Jun 5, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, there is no mathematical deviation. We are computing the same thing, but using mean instead of sum to take advantage of being able to use smaller numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I’ve updated the wording to reflect that this is mathematically equivalent and that the use of mean is purely for numerical stability. Let me know if the new phrasing works for you.

Copy link
Contributor Author
@morrison-turnansky morrison-turnansky Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if any additional changes are necessary. @albanD @janeyx99

@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 5, 2025
@morrison-turnansky morrison-turnansky force-pushed the main branch 2 times, most recently from 2c91841 to 6e9c27a Compare June 16, 2025 17:12
@morrison-turnansky morrison-turnansky force-pushed the main branch 2 times, most recently from 91787ed to 0de4ade Compare June 20, 2025 16:04
@janeyx99
Copy link
Contributor

Sorry I have been a little busy with getting things in 2.8 😛 Thanks for your phrasing--I've made more things concise but LGTM now!

@janeyx99
Copy link
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 27, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@morrison-turnansky
Copy link
Contributor Author

@janeyx99 All good, and thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: optim triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

documentation of Adafactor does not match the implementation

4 participants

0