[DSD] Fix the shared parameter mismatch for optimizer state_dict when flattening FQNs are used by fegin · Pull Request #148825 · pytorch/pytorch · GitHub
[DSD] Fix the shared parameter mismatch for optimizer state_dict when flattening FQNs are used #148825

Closed
wants to merge 3 commits into from

Contributor
@fegin fegin commented Mar 8, 2025

[ghstack-poisoned]
pytorch-bot bot commented Mar 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148825

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4537063 with merge base 915eb01:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (checkpoint) labels Mar 8, 2025
fegin added a commit that referenced this pull request Mar 8, 2025
… flattening FQNs are used

Summary:
As title.

ghstack-source-id: 3c05da1
Pull Request resolved: #148825
[ghstack-poisoned]
fegin added a commit that referenced this pull request Mar 8, 2025
… flattening FQNs are used

Summary:
As title.

ghstack-source-id: 73f670d
Pull Request resolved: #148825
@fegin fegin requested a review from mori360 March 8, 2025 22:39
@fegin fegin added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 10, 2025
Comment on lines +732 to +746
if fqn in info.shared_params_mapping:
    in_params = False
    for k in param_group.keys():
        if k == _PARAMS:
            continue
        flatten_key = f"{_PG}.{fqn}.{k}"
        if flatten_key in state_dict:
            in_params = True
            break
else:
    in_params = True

if not in_params:
    continue

Contributor
@kwen2501 kwen2501 Mar 10, 2025

nit: shall we add some comments to the code?
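
For readers following the thread, here is a commented, standalone sketch of what the check in the diff above appears to do. It is not the PR's code: the function name, the `shared_fqns` set (a stand-in for `info.shared_params_mapping`), the sample keys, and the values given to `_PG` and `_PARAMS` are all assumptions made for illustration.

```python
# Hypothetical stand-ins for the constants used in the diff; the real
# values live in the PyTorch source and are assumed here.
_PG = "param_groups"
_PARAMS = "params"


def fqns_present_in_flat_state_dict(fqns, shared_fqns, param_group, state_dict):
    """Return the FQNs that the flattened optimizer state_dict refers to.

    A shared parameter is reachable under several FQNs, but a flattened
    state_dict is expected to carry entries for only one of them. An FQN
    of a shared parameter is kept only if at least one flattened
    "<_PG>.<fqn>.<key>" entry exists; non-shared FQNs are always kept.
    """
    kept = []
    for fqn in fqns:
        if fqn in shared_fqns:
            # Probe every param-group key except the parameter list itself.
            in_params = any(
                f"{_PG}.{fqn}.{k}" in state_dict
                for k in param_group
                if k != _PARAMS
            )
        else:
            in_params = True
        if in_params:
            kept.append(fqn)
    return kept


# Made-up example: two FQNs alias one tensor, only one was saved.
param_group = {"params": [0], "lr": 0.01}
flat_sd = {"param_groups.embed.weight.lr": 0.01}
shared = {"embed.weight", "lm_head.weight"}
result = fqns_present_in_flat_state_dict(
    ["embed.weight", "lm_head.weight", "fc.bias"], shared, param_group, flat_sd
)
print(result)
```

In this sketch the alias `lm_head.weight` is skipped because no flattened key mentions it, which mirrors the `continue` in the diff.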

Contributor
@fduwjj fduwjj left a comment

unblock

[ghstack-poisoned]
fegin added a commit that referenced this pull request Mar 10, 2025
… flattening FQNs are used

Summary:
As title.

ghstack-source-id: 7f12689
Pull Request resolved: #148825
@fegin
Contributor Author
fegin commented Mar 10, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (checkpoint)
5 participants