Add state to distributed composable API by mrshenli · Pull Request #87838 · pytorch/pytorch · GitHub
Add state to distributed composable API #87838

Closed
wants to merge 2 commits into from

Contributor
@mrshenli mrshenli commented Oct 27, 2022

pytorch-bot bot commented Oct 27, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87838

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures, 6 Pending

As of commit 835f540:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@mrshenli mrshenli added the topic: not user facing topic category label Oct 27, 2022
api.state(module).dummy_state = 8
return inp

# FIXME: circular reference looks a bit weird. Shall we make .state a
Contributor Author

Any thoughts on this?

pass


state_key = _StateKey()
Contributor

How about STATE_KEY, like a constant?
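
A minimal sketch of the pattern under discussion, assuming the per-API state is kept in the module's instance `__dict__` under a sentinel key (the hunks shown here do not confirm where it is stored). `FakeModule` is a hypothetical stand-in for `nn.Module` so the snippet runs without torch, and `STATE_KEY` uses the upper-case name suggested above.

```python
from typing import Callable, Dict


class _StateKey:
    """Sentinel type; its single instance serves as a private dict key."""

    def __repr__(self) -> str:
        return "STATE_KEY"


# Upper-case name, as suggested in the review, to signal a module-level constant.
STATE_KEY = _StateKey()


class FakeModule:
    """Hypothetical stand-in for nn.Module (assumption, not from the PR)."""


def get_all_state(module: object) -> Dict[Callable, dict]:
    # Assumption: state lives in the instance __dict__ under the sentinel.
    # A non-string key cannot collide with ordinary attribute names.
    d = module.__dict__.setdefault(STATE_KEY, {})
    assert isinstance(d, dict), "Distributed composable API states corrupted"
    return d
```

Repeated calls on the same module return the same dict, so each API can keep its own entry in it, while different modules get independent dicts.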

assert isinstance(d, dict), "Distributed composable API states corrupted"
return d

def wrapper(module: nn.Module, *args, **kwargs) -> Optional[nn.Module]:
Contributor

Do we want to make module -> *module, like in the design doc?

Contributor Author

Good point, will update. Is it OK to do that in a follow-up PR?

Contributor

Sure, not a blocker for me
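
For illustration only, one possible reading of the `module -> *module` suggestion is a wrapper that accepts any number of modules positionally. The design doc is not quoted here, so the semantics below (apply `func` to each module, return the modules as a tuple) are an assumption; `contract`, `tag`, and `FakeModule` are hypothetical names.

```python
from typing import Any, Callable, Tuple


class FakeModule:
    """Hypothetical stand-in for nn.Module (assumption, not from the PR)."""


def contract(func: Callable[..., None]) -> Callable[..., Tuple[FakeModule, ...]]:
    # Hypothetical variadic form: every positional argument is a module.
    def wrapper(*modules: FakeModule, **kwargs: Any) -> Tuple[FakeModule, ...]:
        assert modules, "expected at least one module"
        for m in modules:
            func(m, **kwargs)  # apply the wrapped API to each module in turn
        return modules

    return wrapper


@contract
def tag(module: FakeModule, label: str = "x") -> None:
    # Placeholder API that only records a label on the module.
    module.label = label
```

With this shape, non-module options have to be passed as keyword arguments, which is one trade-off of making the positional parameters variadic.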

@yhcharles yhcharles self-requested a review October 27, 2022 04:57
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 27, 2022
def wrapper(module: nn.Module, *args, **kwargs) -> Optional[nn.Module]:
# install states specific to the wrapped ``func``
all_state: Dict[Callable, dict] = get_all_state(module)
assert func not in all_state, (
Contributor

An open question we can resolve later: how to make sure some APIs are mutually exclusive, for example shard_embedding/replicate, while others are not, for example checkpoint/fsdp.

Contributor Author

One option could be to let the contract add info to the state, so that the contract can check 1) whether the same API is called twice and 2) whether there are conflicting APIs.

But we might need a way to allow new APIs to declare conflicts.
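
The option described above could be sketched roughly as follows. This is not the PR's implementation: the `conflicts_with` parameter, the `_CONFLICTS` registry, the `_applied_apis` attribute, and `FakeModule` are all hypothetical, and the four decorated functions are no-op placeholders that merely reuse the API names mentioned in this thread.

```python
import functools
from typing import Callable, Dict, FrozenSet, Iterable, Set


class FakeModule:
    """Hypothetical stand-in for nn.Module (assumption, not from the PR)."""


# Hypothetical registry: API name -> names of APIs it declares conflicts with.
_CONFLICTS: Dict[str, FrozenSet[str]] = {}


def contract(conflicts_with: Iterable[str] = ()) -> Callable:
    """Hypothetical decorator factory; not the signature used in the PR."""

    def decorator(func: Callable) -> Callable:
        name = func.__name__
        _CONFLICTS[name] = frozenset(conflicts_with)

        @functools.wraps(func)
        def wrapper(module: FakeModule, *args, **kwargs) -> FakeModule:
            applied: Set[str] = module.__dict__.setdefault("_applied_apis", set())
            # 1) Reject applying the same API twice to one module.
            if name in applied:
                raise RuntimeError(f"{name} was already applied to this module")
            # 2) Reject conflicts declared by either side.
            for other in applied:
                if other in _CONFLICTS[name] or name in _CONFLICTS.get(other, frozenset()):
                    raise RuntimeError(f"{name} conflicts with {other}")
            func(module, *args, **kwargs)
            applied.add(name)
            return module

        return wrapper

    return decorator


# No-op placeholders named after the APIs mentioned in the thread.
@contract(conflicts_with=("replicate",))
def shard_embedding(module: FakeModule) -> None:
    pass


@contract()
def replicate(module: FakeModule) -> None:
    pass


@contract()
def checkpoint(module: FakeModule) -> None:
    pass


@contract()
def fsdp(module: FakeModule) -> None:
    pass
```

Because the conflict check looks at declarations from both sides, a new API can declare a conflict with an existing one without the existing API having to be edited.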

@mrshenli
Contributor Author

@pytorchbot merge -g

@mrshenli mrshenli mentioned this pull request Oct 28, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed, first few of them are: trunk, trunk / macos-12-py3-arm64-mps / Run MPS tests

Details for Dev Infra team Raised by workflow job

@mrshenli
Contributor Author

@pytorchbot merge -f "test failure is irrelevant"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
@facebook-github-bot facebook-github-bot deleted the gh/mrshenli/338/head branch June 8, 2023 18:02
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category

3 participants