[inductor] parallel compile: Create new pipes for subproc communication by pytorchbot · Pull Request #133590 · pytorch/pytorch · GitHub

[inductor] parallel compile: Create new pipes for subproc communication #133590

Closed

Conversation

pytorchbot
Collaborator
@pytorchbot pytorchbot commented Aug 15, 2024

Stack from ghstack (oldest at bottom):

Summary: Rather than using stdin/stdout for IPC, we can create new pipes and pass the descriptors to the subproc via the command line. #131070 reports an issue where the combination of deepspeed and onnxruntime-training causes something in the subproc to write to stdout and corrupt the IPC. The current implementation was already brittle; we can just create new pipes specifically for the IPC.
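
As a rough illustration of the approach described in the summary (not the code from this PR), the sketch below creates two dedicated pipes with `os.pipe()`, passes the child's descriptor numbers on the command line, and keeps them open in the child via `pass_fds`. The names `CHILD_SRC` and `roundtrip` and the line-based message format are hypothetical, chosen only for this example; it assumes a POSIX system, since `pass_fds` is not supported on Windows.

```python
import os
import subprocess
import sys

# Hypothetical child program (an assumption for this sketch): it receives two
# descriptor numbers as argv[1] and argv[2], writes noise to stdout, and then
# answers one request over the dedicated pipes.
CHILD_SRC = r"""
import os, sys
read_fd, write_fd = int(sys.argv[1]), int(sys.argv[2])
print("stray library output")  # lands on stdout, not on the IPC channel
with os.fdopen(read_fd, "rb") as rx, os.fdopen(write_fd, "wb") as tx:
    msg = rx.readline()
    tx.write(b"echo:" + msg)
"""


def roundtrip(payload: bytes) -> bytes:
    """Send one line to a subprocess over dedicated pipes and return its reply."""
    # Two one-way pipes: parent -> child and child -> parent.
    child_read, parent_write = os.pipe()
    parent_read, child_write = os.pipe()

    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC, str(child_read), str(child_write)],
        pass_fds=(child_read, child_write),  # keep only these fds open in the child
        stdout=subprocess.DEVNULL,  # stdout is no longer part of the protocol
    )

    # The parent does not use the child's ends; close them so EOF propagates.
    os.close(child_read)
    os.close(child_write)

    with os.fdopen(parent_write, "wb") as tx, os.fdopen(parent_read, "rb") as rx:
        tx.write(payload + b"\n")
        tx.flush()
        reply = rx.readline()

    proc.wait()
    return reply


if __name__ == "__main__":
    print(roundtrip(b"hello"))
```

Because the request and reply travel over their own descriptors, anything a third-party library prints to stdout in the child cannot end up in the message stream.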

Test Plan: I was able to repro the MemoryError in #131070 by installing deepspeed and onnxruntime-training. Verified that this PR fixes it.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

Differential Revision: D59968362

…on (#131194)

Summary: Rather than using stdin/stdout for IPC, we can create new pipes and pass the descriptors to the subproc via the command line. #131070 reports an issue where the combination of deepspeed and onnxruntime-training causes _something_ in the subproc to write to stdout and corrupt the IPC. The current implementation was already brittle; we can just create new pipes specifically for the IPC.

Test Plan: I was able to repro the MemoryError in #131070 by installing deepspeed and onnxruntime-training. Verified that this PR fixes it.

Differential Revision: [D59968362](https://our.internmc.facebook.com/intern/diff/D59968362)
Pull Request resolved: #131194
Approved by: https://github.com/malfet, https://github.com/eellison, https://github.com/atalman

(cherry picked from commit 3c43fe0)
pytorch-bot bot commented Aug 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133590

Note: Links to docs will display an error until the docs builds have been completed.

❌ 37 New Failures, 5 Unrelated Failures

As of commit bebb111 with merge base b66e3f0 (image):

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk, and the jobs have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchmergebot pushed a commit that referenced this pull request Aug 20, 2024
During cherry-picking we want to use the default setting and fail if there is a merge conflict.
Here is an example of invalid conflict resolution:
#131194
and cherry-pick
#133590

Pull Request resolved: #134047
Approved by: https://github.com/kit1980
malfet pushed a commit to aditew01/pytorch that referenced this pull request Sep 13, 2024
During cherry-picking we want to use the default setting and fail if there is a merge conflict.
Here is an example of invalid conflict resolution:
pytorch#131194
and cherry-pick
pytorch#133590

Pull Request resolved: pytorch#134047
Approved by: https://github.com/kit1980
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Oct 20, 2024
@github-actions github-actions bot closed this Nov 19, 2024
@github-actions github-actions bot deleted the cherry-pick-131194-by-pytorch_bot_bot_ branch December 20, 2024 02:04
2 participants