[tp] refactor and fix PrepareModuleInput for DTensor inputs by wanchaol · Pull Request #128431 · pytorch/pytorch · GitHub

[tp] refactor and fix PrepareModuleInput for DTensor inputs #128431

Closed
wants to merge 6 commits into from

@wanchaol (Collaborator) commented Jun 11, 2024

Stack from ghstack (oldest at bottom):

As titled, this PR refactors the PrepareModuleInput style to have a common
method, prepare_input_arg, so that both args and kwargs can reuse this logic.

This also fixes #128365
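
For readers skimming the thread, the shape of this refactor can be sketched without PyTorch. The snippet below is a hypothetical, simplified illustration and not the code in this PR: `FakeDTensor`, `PrepareInputSketch`, the string layout tags, and the `kwarg_layouts` mapping are stand-ins invented here, and only the helper name prepare_input_arg comes from the description. It assumes the helper passes unannotated inputs through, wraps plain inputs, and does not re-wrap inputs that are already distributed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class FakeDTensor:
    """Hypothetical stand-in for a distributed tensor: a payload plus a layout tag."""

    data: Any
    layout: str

    def redistribute(self, layout: str) -> "FakeDTensor":
        # A real redistribute would communicate across ranks; this only retags.
        return FakeDTensor(self.data, layout)


class PrepareInputSketch:
    """Hypothetical sketch (not the PyTorch class): one helper serves args and kwargs."""

    def __init__(
        self,
        input_layouts: Tuple[Optional[str], ...] = (),
        desired_layouts: Tuple[Optional[str], ...] = (),
        kwarg_layouts: Optional[Dict[str, Tuple[str, str]]] = None,
    ) -> None:
        if len(input_layouts) != len(desired_layouts):
            raise ValueError("input_layouts and desired_layouts must have the same length")
        self.input_layouts = input_layouts
        self.desired_layouts = desired_layouts
        # Assumed shape: keyword name -> (input layout, desired layout).
        self.kwarg_layouts = kwarg_layouts or {}

    def prepare_input_arg(
        self, value: Any, input_layout: Optional[str], desired_layout: Optional[str]
    ) -> Any:
        """Shared logic for a single input, positional or keyword."""
        if input_layout is None:
            return value  # not annotated: pass through unchanged
        if isinstance(value, FakeDTensor):
            wrapped = value  # already distributed: do not wrap a second time
        else:
            wrapped = FakeDTensor(value, input_layout)
        if desired_layout is not None and wrapped.layout != desired_layout:
            wrapped = wrapped.redistribute(desired_layout)
        return wrapped

    def prepare_args(self, args: Tuple[Any, ...]) -> Tuple[Any, ...]:
        if len(args) != len(self.input_layouts):
            raise ValueError("number of args must match number of input_layouts")
        return tuple(
            self.prepare_input_arg(value, input_layout, desired_layout)
            for value, input_layout, desired_layout in zip(
                args, self.input_layouts, self.desired_layouts
            )
        )

    def prepare_kwargs(self, kwargs: Dict[str, Any]) -> Dict[str, Any]:
        prepared = {}
        for name, value in kwargs.items():
            # Keywords without an annotation fall back to pass-through.
            input_layout, desired_layout = self.kwarg_layouts.get(name, (None, None))
            prepared[name] = self.prepare_input_arg(value, input_layout, desired_layout)
        return prepared


style = PrepareInputSketch(
    input_layouts=("shard", None),
    desired_layouts=("replicate", None),
    kwarg_layouts={"mask": ("replicate", "shard")},
)
args = style.prepare_args(([1, 2], "untouched"))
kwargs = style.prepare_kwargs({"mask": FakeDTensor([0, 1], "replicate"), "flag": True})
```

In this sketch the positional and keyword paths differ only in how they look up the layout pair, so a fix to the per-input logic lands in one place.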

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k

@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jun 11, 2024
pytorch-bot bot commented Jun 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128431

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 41c34a6 with merge base 4345d98:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wanchaol changed the title from "[tp] refactor PrepareModuleInput to reuse code" to "[tp] refactor and fix PrepareModuleInput for DTensor inputs" Jun 11, 2024
@wanchaol
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 11, 2024
@wanchaol
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/wanchaol/485/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/128431)

@wanchaol
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@wanchaol
Collaborator Author

@pytorchbot merge -f "macos job seems taking forever to build.."

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

wanchaol added a commit that referenced this pull request Jun 12, 2024
As titled, this PR refactors the PrepareModuleInput style to have a common
method, prepare_input_arg, so that both args and kwargs can reuse this logic.

This also fixes #128365

ghstack-source-id: 037bb6d
Pull Request resolved: #128431
@wanchaol
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@wanchaol
Collaborator Author

@pytorchbot merge -f "lint fixing only, no need to re-run all CI jobs"

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

wanchaol added a commit to pytorch/torchtitan that referenced this pull request Jun 12, 2024
This PR is a follow-up to enable fp8 all-gather in TP now that these PRs have
landed:
* pytorch/pytorch#128431
* pytorch-labs/float8_experimental#275

To train with fp8, one needs to update pytorch/float8_experimental to a
version that includes those changes.

Since fp8 is not enabled as part of our integration tests yet, there
should be no issues on CI.
wanchaol added a commit to pytorch/torchtitan that referenced this pull request Jun 13, 2024
This PR is a follow-up to enable fp8 all-gather in TP now that these PRs have
landed:
* pytorch/pytorch#128431
* pytorch-labs/float8_experimental#275

To train with fp8, one needs to update pytorch/float8_experimental to a
version that includes those changes.

Since fp8 is not enabled as part of our integration tests yet, there
should be no issues on CI or for training runs that do not use fp8.
TharinduRusira pushed a commit to TharinduRusira/pytorch that referenced this pull request Jun 14, 2024
…128431)

As titled, this PR refactors the PrepareModuleInput style to have a common
method, prepare_input_arg, so that both args and kwargs can reuse this logic.

This also fixes pytorch#128365

Pull Request resolved: pytorch#128431
Approved by: https://github.com/awgu
wanchaol added a commit that referenced this pull request Jun 14, 2024
As titled, this PR refactors the PrepareModuleInput style to have a common
method, prepare_input_arg, so that both args and kwargs can reuse this logic.

This also fixes #128365

Pull Request resolved: #128431
Approved by: https://github.com/awgu

(cherry picked from commit 7775fee)
ignaciobartol pushed a commit to ignaciobartol/pytorch that referenced this pull request Jun 14, 2024
…128431)

As titled, this PR refactors the PrepareModuleInput style to have a common
method, prepare_input_arg, so that both args and kwargs can reuse this logic.

This also fixes pytorch#128365

Pull Request resolved: pytorch#128431
Approved by: https://github.com/awgu
atalman pushed a commit that referenced this pull request Jun 19, 2024
…#128719)

As titled, this PR refactors the PrepareModuleInput style to have a common
method, prepare_input_arg, so that both args and kwargs can reuse this logic.

This also fixes #128365

Pull Request resolved: #128431
Approved by: https://github.com/awgu

(cherry picked from commit 7775fee)
@github-actions github-actions bot deleted the gh/wanchaol/485/head branch July 14, 2024 02:02

tianyu-l pushed a commit to tianyu-l/torchtitan_intern24 that referenced this pull request Aug 16, 2024
tianyu-l pushed a commit to pytorch/torchtitan that referenced this pull request Aug 16, 2024
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024
Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
release notes: distributed (dtensor) (release notes category)
Reverted
5 participants