Fix multiprocessing with CUDA_VISIBLE_DEVICES seems to give the wrong device #149248
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149248
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 98aef5c with merge base 1e37e5b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: distributed (miscellaneous)"
Let's discuss this on the issue.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Hi, is it possible for this to be merged?
@fzyzcjy I'm afraid it is not, as this would very much break the current behavior, in particular for the many distributed users that rely on always using device 0 within each process.
@albanD I see, thank you! I do find it a bit odd, though: when we see a tensor with "device=0", it does not actually mean that it is on the 0th device, but rather on another device :/
(I also replied in sgl-project/sglang#4565) |
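For illustration, a minimal sketch of the behavior discussed above (not the exact reproducer from #149196): with the spawn start method, each child process can set CUDA_VISIBLE_DEVICES before touching CUDA, after which every tensor in that child reports `cuda:0` even though it lives on a different physical GPU. The two-GPU assumption and the worker layout are made up for the example.

```python
import os
import torch
import torch.multiprocessing as mp

def worker(physical_gpu: int):
    # Mask this process down to a single physical GPU before CUDA is initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    t = torch.zeros(1, device="cuda")
    # Prints device='cuda:0' for every worker, regardless of which physical
    # GPU actually holds the tensor, because CUDA renumbers visible devices from 0.
    print(f"physical GPU {physical_gpu}: tensor reports {t.device}")

if __name__ == "__main__":
    # Assumes a machine with at least two GPUs.
    mp.set_start_method("spawn")
    procs = [mp.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```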
Fixes #149196
This is a proof-of-concept PR. I would like to hear some feedback on whether the direction is acceptable before working on it further.
Things that will be added if the direction of this PR looks acceptable: unit tests, caching, a C++ implementation (for speed), etc.
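For context only, here is a hypothetical sketch of the kind of index translation a fix in this direction might involve. This is not the PR's actual implementation; the helper name `visible_to_physical` is invented for illustration, and it assumes CUDA_VISIBLE_DEVICES holds integer indices rather than GPU UUIDs.

```python
import os

def visible_to_physical(visible_index: int) -> int:
    """Translate a device index as seen by the current process into the
    machine-level (physical) index, using CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only; assumes the variable holds
    comma-separated integer indices, not GPU UUIDs.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if not visible:
        # No masking: the visible index is already the physical index.
        return visible_index
    mapping = [int(x) for x in visible.split(",") if x.strip()]
    return mapping[visible_index]

# Example: with CUDA_VISIBLE_DEVICES="2,3", the device this process calls 0
# is physical GPU 2, so visible_to_physical(0) == 2.
```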