Add device agnostic support for distributed tests by Alokksinha00 · Pull Request #151560 · pytorch/pytorch · GitHub

Add device agnostic support for distributed tests #151560


Open. Alokksinha00 wants to merge 2 commits into main.

Conversation

Alokksinha00 commented Apr 17, 2025

/_composable/test_replicate.py and /algorithm/ddp_comm_hooks are modified to run on a generic device:

  1. The create_pg() method from DistributedTestBase is used for process group creation wherever an accelerator/CUDA device is used.
  2. instantiate_device_type_tests is imported, and the device is passed into the respective tests.
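
The pattern behind these two changes can be illustrated with a small standard-library sketch. This is not PyTorch's implementation: `pick_backend`, `instantiate_for_devices`, `ReplicateSketch`, and the device-to-backend table are hypothetical stand-ins, assumed here only to show how a generic test template can receive a `device` argument and how a backend can be derived from that device instead of being hard-coded.

```python
import unittest

# Hypothetical device-to-backend table, assumed for illustration only;
# the real mapping lives in PyTorch's own test utilities.
BACKEND_FOR_DEVICE = {"cuda": "nccl", "cpu": "gloo"}


def pick_backend(device: str) -> str:
    """Return a backend name for a device string such as 'cuda:0'."""
    device_type = device.split(":")[0]
    # Unknown device types fall back to "gloo" in this sketch.
    return BACKEND_FOR_DEVICE.get(device_type, "gloo")


def instantiate_for_devices(template, scope, devices):
    """Create one concrete TestCase per device from a generic template.

    Each test_* method on the template takes an extra `device` argument;
    the generated class binds that argument to a fixed device string.
    """
    for device in devices:
        attrs = {}
        for name, fn in vars(template).items():
            if name.startswith("test_") and callable(fn):
                # Default arguments capture the current fn and device.
                def bound(self, _fn=fn, _device=device):
                    return _fn(self, _device)

                attrs[f"{name}_{device}"] = bound
        cls_name = f"{template.__name__}{device.upper()}"
        scope[cls_name] = type(cls_name, (unittest.TestCase,), attrs)
    return scope


class ReplicateSketch:
    """Generic template; not a TestCase, so it is never collected directly."""

    def test_backend_is_chosen_from_device(self, device):
        expected = "nccl" if device == "cuda" else "gloo"
        self.assertEqual(pick_backend(device), expected)


# Generate ReplicateSketchCPU and ReplicateSketchCUDA into a private scope.
generated = instantiate_for_devices(ReplicateSketch, {}, ["cpu", "cuda"])
```

In a real test file the scope would typically be the module's `globals()`, so that the test runner discovers the generated per-device classes; a plain dict is used here to keep the sketch side-effect free.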

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

/_composable/test_replicate.py and /algorithm/ddp_comm_hooks
are modified for generic device
pytorch-bot bot commented Apr 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151560

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures

As of commit aa03a85 with merge base bb11122 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the labels "oncall: distributed" (Add this issue/PR to distributed oncall triage queue) and "topic: not user facing" (topic category) Apr 17, 2025
albanD requested a review from wconstab April 17, 2025 13:44
albanD added the label "triaged" (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 17, 2025
etaf added the label "ciflow/xpu" (Run XPU CI tasks) Apr 25, 2025
/_composable/test_replicate.py, /checkpoint/test_fsspec.py
and /algorithm/ddp_comm_hooks are modified for generic supported device
Alokksinha00 (Author) commented:

@wconstab @H-Huang @wanchaol @fegin @kwen2501 @d4l3k
Could you please help with this review?
Thanks

Alokksinha00 (Author) commented:

@wconstab @H-Huang @wanchaol @fegin @kwen2501 @d4l3k
Gentle reminder.

Labels
  oncall: distributed (Add this issue/PR to distributed oncall triage queue)
  open source
  release notes: distributed (checkpoint)
  topic: not user facing (topic category)
  triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
4 participants