[tp] propagate src_data_rank kwarg in TP API by wanchaol · Pull Request #144005 · pytorch/pytorch · GitHub

[tp] propagate src_data_rank kwarg in TP API #144005


Closed
wants to merge 2 commits into from


wanchaol (Collaborator) commented Dec 30, 2024

Stack from ghstack (oldest at bottom):

As titled, this PR propagates the src_data_rank kwarg through the TP API, so that
module-level APIs can use the flexibility to choose src_data_rank and avoid
the communication (the scatter/broadcast from src_data_rank performed by
distribute_tensor) when it is not needed.

cc @H-Huang @awgu @kwen2501 @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o
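To make the trade-off concrete, here is a minimal pure-Python model of what each rank of a 1-D mesh ends up holding. It assumes, following our reading of the PyTorch docs for distribute_tensor and parallelize_module, that an integer `src_data_rank` (default 0) means that rank's data is scattered or broadcast to the others, and that `None` means every rank simply uses its own local data with no communication. The names `distribute_1d`, `chunk`, `SHARD_0`, and `REPLICATE` are hypothetical stand-ins for illustration, not PyTorch APIs, and "tensors" are plain lists.

```python
from typing import List, Optional

# Hypothetical placement names for this simulation only; they stand in for
# DTensor's Shard(0) and Replicate() placements and are not PyTorch objects.
SHARD_0 = "shard0"
REPLICATE = "replicate"


def chunk(data: List[int], num_chunks: int, index: int) -> List[int]:
    """Return the index-th of num_chunks contiguous chunks (assumes an even split)."""
    size = len(data) // num_chunks
    return data[index * size:(index + 1) * size]


def distribute_1d(
    local_tensors: List[List[int]], placement: str, src_data_rank: Optional[int]
) -> List[List[int]]:
    """Model what every rank of a 1-D mesh holds after distribution.

    local_tensors[r] is the full tensor rank r holds before distribution.
    An int src_data_rank means that rank's tensor is scattered (shard) or
    broadcast (replicate) to all ranks, which requires communication.
    None means each rank shards or keeps its own local tensor, with no
    communication.
    """
    world_size = len(local_tensors)
    result = []
    for rank in range(world_size):
        if src_data_rank is None:
            source = local_tensors[rank]  # local data only, no communication
        else:
            source = local_tensors[src_data_rank]  # data received from the source rank
        if placement == SHARD_0:
            result.append(chunk(source, world_size, rank))
        else:
            result.append(list(source))
    return result


# Two ranks whose local data differs (e.g. different random initialization).
diverged = [[0, 1, 2, 3], [10, 11, 12, 13]]
print(distribute_1d(diverged, SHARD_0, src_data_rank=0))     # [[0, 1], [2, 3]]
print(distribute_1d(diverged, SHARD_0, src_data_rank=None))  # [[0, 1], [12, 13]]

# Two ranks that already hold identical data (e.g. same seed or checkpoint).
identical = [[1, 2, 3, 4], [1, 2, 3, 4]]
print(distribute_1d(identical, SHARD_0, 0) == distribute_1d(identical, SHARD_0, None))  # True
```

The model shows why exposing the kwarg at the module level is useful: when ranks already agree on the data, `src_data_rank=None` produces the same shards while skipping the scatter/broadcast; when they may disagree, an integer source rank keeps single-device semantics.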

pytorch-bot (bot) commented Dec 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144005

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit cd708c2 with merge base d88a8c4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the oncall: distributed label Dec 30, 2024
wanchaol added a commit that referenced this pull request Dec 30, 2024
ghstack-source-id: c675839
Pull Request resolved: #144005
wanchaol added a commit that referenced this pull request Dec 30, 2024
ghstack-source-id: 868d7c6
Pull Request resolved: #144005
wanchaol added the release notes: distributed (dtensor) and ciflow/trunk labels Dec 30, 2024
tianyu-l (Contributor) left a comment


LGTM. One nit is that the test only covers the 1D device mesh case, but the code is supposed to work under n-D.

wanchaol (Collaborator, Author) commented Jan 2, 2025

> LGTM. One nit is that the test only covers the 1D device mesh case, but the code is supposed to work under n-D.

The TP API is supposed to work with a 1D DeviceMesh, though. But I think we can test the n-D case with the distribute_tensor update; I'll submit a follow-up patch later.
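The review nit concerns meshes with more than one dimension. Below is a minimal pure-Python model of that case. It assumes, based on our understanding of the distribute_tensor documentation, that an integer `src_data_rank` is interpreted as the group rank along each mesh dimension and that placements are applied one mesh dimension at a time. `distribute_nd`, the string placements, and the list-based "tensors" are hypothetical simplifications, not PyTorch APIs.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[int, ...]


def chunk(data: List[int], num_chunks: int, index: int) -> List[int]:
    """Return the index-th of num_chunks contiguous chunks (assumes an even split)."""
    size = len(data) // num_chunks
    return data[index * size:(index + 1) * size]


def distribute_nd(
    local: Dict[Coord, List[int]],
    mesh_shape: Sequence[int],
    placements: Sequence[str],
    src_data_rank: Optional[int],
) -> Dict[Coord, List[int]]:
    """Model distribution over an n-D mesh, one mesh dimension at a time.

    local maps each mesh coordinate to the full 1-D tensor that rank holds.
    placements[d] is "shard" (shard tensor dim 0) or "replicate" for mesh dim d.
    An int src_data_rank is read as the group rank along *each* mesh dimension;
    None means every rank uses its own data and nothing is communicated.
    """
    state = {coord: list(data) for coord, data in local.items()}
    coords = list(product(*(range(n) for n in mesh_shape)))
    for dim, placement in enumerate(placements):
        new_state = {}
        for coord in coords:
            if src_data_rank is None:
                source = state[coord]  # local data only
            else:
                # The source is the rank in this coordinate's group along `dim`
                # whose index on that dimension equals src_data_rank.
                src_coord = coord[:dim] + (src_data_rank,) + coord[dim + 1:]
                source = state[src_coord]
            if placement == "shard":
                new_state[coord] = chunk(source, mesh_shape[dim], coord[dim])
            else:
                new_state[coord] = list(source)
        state = new_state
    return state


# 2x2 mesh; rank (i, j) starts with distinct data so we can see where shards come from.
mesh_shape = (2, 2)
local = {(i, j): [100 * (2 * i + j) + k for k in range(4)] for i in range(2) for j in range(2)}
print(distribute_nd(local, mesh_shape, ["shard", "replicate"], src_data_rank=0))
# {(0, 0): [0, 1], (0, 1): [0, 1], (1, 0): [2, 3], (1, 1): [2, 3]}
print(distribute_nd(local, mesh_shape, ["shard", "replicate"], src_data_rank=None))
# {(0, 0): [0, 1], (0, 1): [100, 101], (1, 0): [202, 203], (1, 1): [302, 303]}
```

Under these assumptions, with `src_data_rank=0` every shard traces back to the rank at mesh coordinate (0, 0), while with `None` each rank slices its own data, which matches the default only when all ranks already agree.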

wanchaol (Collaborator, Author) commented Jan 2, 2025

@pytorchbot merge

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/trunk, Merged, oncall: distributed, release notes: distributed (dtensor)

3 participants