fix `distributed.checkpoint.state_dict.set_model_state_dict` returned _IncompatibleKeys when `full_state_dict=True` by YassineYousfi · Pull Request #153351 · pytorch/pytorch · GitHub
fix distributed.checkpoint.state_dict.set_model_state_dict returned _IncompatibleKeys when full_state_dict=True #153351

Open
YassineYousfi wants to merge 3 commits into base: main

Conversation

YassineYousfi commented May 11, 2025

pytorch-bot bot commented May 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153351

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 7317ade with merge base 0104ac0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented May 11, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

pytorch-bot bot added the "oncall: distributed" and "release notes: distributed (checkpoint)" labels May 11, 2025
YassineYousfi changed the title from "fix set_model_state_dict _IncompatibleKeys when full_state_dict" to "fix distributed.checkpoint.state_dict.set_model_state_dict returned _IncompatibleKeys when full_state_dict=True" May 11, 2025
fegin added the "ciflow/trunk" label May 12, 2025
fegin (Contributor) commented May 12, 2025

@YassineYousfi Thanks for the fix. Would you mind updating the unit test to cover this case as well?

pytorch-bot bot removed the "ciflow/trunk" label May 12, 2025
YassineYousfi (Author) commented

Added coverage for this case to test_strict. The test passes with the change and fails without it.
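
The diff and the test body are not shown in this conversation. As a rough illustration of what a check like this looks at, the sketch below models in plain Python how missing and unexpected keys are usually reported when a state dict is loaded non-strictly. `IncompatibleKeys` and `diff_keys` are hypothetical names made up for this sketch; they are not PyTorch's implementation and not the code added to test_strict.

```python
from typing import Iterable, List, NamedTuple


class IncompatibleKeys(NamedTuple):
    """Hypothetical stand-in for the result of a non-strict load."""

    missing_keys: List[str]
    unexpected_keys: List[str]


def diff_keys(expected: Iterable[str], provided: Iterable[str]) -> IncompatibleKeys:
    """Compare the keys a model expects with the keys a state dict provides."""
    expected_list = list(expected)
    provided_list = list(provided)
    expected_set = set(expected_list)
    provided_set = set(provided_list)
    # Keys the model expects but the loaded state dict lacks.
    missing = [k for k in expected_list if k not in provided_set]
    # Keys the loaded state dict carries but the model does not know.
    unexpected = [k for k in provided_list if k not in expected_set]
    return IncompatibleKeys(missing, unexpected)


# Illustrative key names only; a real model would use its parameter FQNs.
model_keys = ["weight", "bias"]
loaded_keys = ["weight", "extra"]
result = diff_keys(model_keys, loaded_keys)
```

A test in this spirit would assert that `result.missing_keys` and `result.unexpected_keys` list exactly the mismatched keys, and that both are empty when the two key sets agree.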

colesbury requested a review from wconstab May 13, 2025 00:57
colesbury added the "triaged" label May 13, 2025
Labels
oncall: distributed, open source, release notes: distributed (checkpoint), triaged
4 participants