[4/N] Test NaN checker against broadcast by kwen2501 · Pull Request #134701 · pytorch/pytorch · GitHub

[4/N] Test NaN checker against broadcast #134701

Closed
kwen2501 wants to merge 6 commits

pytorch-bot bot commented Aug 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134701

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit edeb15e with merge base 0dbc728:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the labels oncall: distributed (Add this issue/PR to distributed oncall triage queue) and topic: not user facing (topic category) Aug 28, 2024
kwen2501 added a commit that referenced this pull request Aug 28, 2024
ghstack-source-id: b64448a
Pull Request resolved: #134701
cc XilunWu H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
@kwen2501
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Aug 28, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team Raised by workflow job

@kwen2501
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch rebase origin/main returned non-zero exit code 1

Rebasing (1/1)
Auto-merging test/distributed/test_c10d_nccl.py
CONFLICT (content): Merge conflict in test/distributed/test_c10d_nccl.py
error: could not apply 05bf13af32... [4/N] Test NaN checker against broadcast (#134701)
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 05bf13af32... [4/N] Test NaN checker against broadcast (#134701)

@kwen2501
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 79ad95b954943b0af3d1416a0b500ebb83724b9a returned non-zero exit code 1

Auto-merging torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp
CONFLICT (content): Merge conflict in torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp
error: could not apply 79ad95b954... [2/N] Add flag to control which rank should perform NaN check (#134345)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"

@kwen2501
Contributor Author

@pytorchbot merge -f "Re-merge after rebase"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

pytorchmergebot pushed a commit that referenced this pull request Aug 29, 2024
By using a zeros() tensor instead of empty() tensor.

Pull Request resolved: #134707
Approved by: https://github.com/shuqiangzhang, https://github.com/wconstab
ghstack dependencies: #134345, #134357, #134701
@github-actions github-actions bot deleted the gh/kwen2501/55/head branch October 3, 2024 02:05
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue), topic: not user facing (topic category)