[PP] Fix disabled flaky tests by H-Huang · Pull Request #154856 · pytorch/pytorch · GitHub

H-Huang (Member) commented Jun 2, 2025

Stack from ghstack (oldest at bottom):

Fix #154373, #154391, #154408, #154443, #154481

Because MultiProcContinousTest now runs the tests with 8 GPUs instead of 2, our pipeline-parallel (PP) tests that compare gradients have become flakier: with the longer pipeline, the gradients deviate slightly more from the reference. They are still close, but the comparison tolerance needs to be relaxed.
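As an illustration of what "relaxing the tolerance" means, here is a minimal sketch assuming the tests use a mixed absolute/relative closeness rule of the kind torch.testing.assert_close applies (|actual - expected| <= atol + rtol * |expected|). The helper grads_close, the gradient values, and the tolerances are all hypothetical; they are not the values this PR uses.

```python
def grads_close(actual, expected, rtol, atol):
    """Hypothetical helper: elementwise check |a - e| <= atol + rtol * |e|,
    the same closeness rule torch.testing.assert_close uses."""
    if len(actual) != len(expected):
        raise ValueError("gradient lists must have the same length")
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Hypothetical gradients from a reference run and a pipelined run that
# differ only by small floating-point drift.
reference = [0.5000, -1.2000, 0.0300]
pipelined = [0.5004, -1.2010, 0.0301]

# Tight (hypothetical) tolerances reject the drift...
strict = grads_close(pipelined, reference, rtol=1e-5, atol=1e-5)
# ...while looser (hypothetical) tolerances accept it.
relaxed = grads_close(pipelined, reference, rtol=1e-3, atol=1e-4)
print(strict, relaxed)  # False True
```

In an actual PyTorch test one would pass looser rtol/atol values to torch.testing.assert_close instead; the specific values chosen in this PR are not shown here.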

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

[ghstack-poisoned]
pytorch-bot commented Jun 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154856

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8ac6883 with merge base 0d0058d:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the "oncall: distributed" and "topic: not user facing" labels Jun 2, 2025
H-Huang added a commit that referenced this pull request Jun 2, 2025
ghstack-source-id: e127a12
Pull Request resolved: #154856
H-Huang requested a review from kwen2501 Jun 2, 2025, 17:25
H-Huang added the "release notes: distributed (pipeline)" label Jun 2, 2025
H-Huang changed the title from "[PP] Fix disable flaky tests" to "[PP] Fix disabled flaky tests" Jun 2, 2025
H-Huang (Member, Author) commented Jun 3, 2025

@pytorchbot merge

pytorch-bot added the "ciflow/trunk" label Jun 3, 2025
pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request Jun 4, 2025
Fix pytorch#154373, pytorch#154391, pytorch#154408, pytorch#154443, pytorch#154481

Because MultiProcContinousTest [now executes the tests with 8 GPUs instead of 2](pytorch#153653), our PP tests comparing gradients have become flakier due to the longer pipeline. The gradients are still close but we need to relax the tolerance.

Pull Request resolved: pytorch#154856
Approved by: https://github.com/Skylion007
angelayi pushed a commit to angelayi/pytorch that referenced this pull request Jun 5, 2025
github-actions deleted the gh/H-Huang/186/head branch Jul 4, 2025, 02:20

Labels: ciflow/trunk, Merged, oncall: distributed, release notes: distributed (pipeline), topic: not user facing

4 participants