Fix multiprocessing with CUDA_VISIBLE_DEVICES seems to give the wrong device #149248
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149248
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 98aef5c with merge base 1e37e5b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: distributed (miscellaneous)"
Let's discuss this on the issue.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Hi, is it possible for this to be merged?
@fzyzcjy I'm afraid it is not, as this would very much break the current behavior, in particular for the many distributed users that rely on always using device 0 within each process.
@albanD I see, thank you! I do find it a bit odd, though: when we see a tensor with "device=0", it does not actually mean that it is on the 0th device, but rather on another device :/
(I also replied in sgl-project/sglang#4565) |
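For illustration, a minimal sketch of the behavior discussed above (not the exact reproducer from #149196): with the spawn start method, each child process can set CUDA_VISIBLE_DEVICES before touching CUDA, after which every tensor in that child reports `cuda:0` even though it lives on a different physical GPU. The two-GPU assumption and the worker layout are made up for the example.

```python
import os
import torch
import torch.multiprocessing as mp

def worker(physical_gpu: int):
    # Mask this process down to a single physical GPU before CUDA is initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    t = torch.zeros(1, device="cuda")
    # Prints device='cuda:0' for every worker, regardless of which physical
    # GPU actually holds the tensor, because CUDA renumbers visible devices from 0.
    print(f"physical GPU {physical_gpu}: tensor reports {t.device}")

if __name__ == "__main__":
    # Assumes a machine with at least two GPUs.
    mp.set_start_method("spawn")
    procs = [mp.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```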
Fixes #149196
This is a proof-of-concept PR. I would like to hear some feedback on whether the direction is acceptable before working on it further.
Things that will be added if the direction of this PR looks acceptable: unit tests, caching, a C++ implementation (for speed), etc.
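For context only, here is a hypothetical sketch of the kind of index translation a fix in this direction might involve. This is not the PR's actual implementation; the helper name `visible_to_physical` is invented for illustration, and it assumes CUDA_VISIBLE_DEVICES holds integer indices rather than GPU UUIDs.

```python
import os

def visible_to_physical(visible_index: int) -> int:
    """Translate a device index as seen by the current process into the
    machine-level (physical) index, using CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only; assumes the variable holds
    comma-separated integer indices, not GPU UUIDs.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if not visible:
        # No masking: the visible index is already the physical index.
        return visible_index
    mapping = [int(x) for x in visible.split(",") if x.strip()]
    return mapping[visible_index]

# Example: with CUDA_VISIBLE_DEVICES="2,3", the device this process calls 0
# is physical GPU 2, so visible_to_physical(0) == 2.
```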