It is an intentional change in torch 2.8 :) -- collective kernels are now treated the same as other kernels when async_op=False, i.e. they are launched on the "current stream."
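A minimal sketch of the two launch modes under the new semantics (assuming a 2-GPU node launched with `torchrun --nproc_per_node=2`; the script is illustrative, not from this issue):

```python
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets LOCAL_RANK and the rendezvous env vars for us.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group("nccl")

    x = torch.ones(1 << 20, device="cuda")

    # async_op=False: under the torch 2.8 semantics the collective is
    # enqueued on the *current* stream, so it is ordered with respect to
    # other kernels on that stream like any ordinary kernel launch.
    dist.all_reduce(x, async_op=False)

    # async_op=True: the collective may run on a separate internal stream;
    # the returned Work handle must be waited on before consuming `x`.
    work = dist.all_reduce(x, async_op=True)
    work.wait()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```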
🚀 The feature, motivation and pitch
ProcessGroupNCCL uses independent CUDA streams in PyTorch 2.7.0, but uses the merged (current) stream in the PyTorch nightly builds.
PyTorch 2.7.0:
```
pip3 install torch torchvision torchaudio
```
PyTorch nightly:
```
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
```
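One hedged way to observe the difference under each install (the script below is an illustrative assumption, not from the original report) is to profile a single all_reduce and check which CUDA stream the NCCL kernel lands on in the exported trace:

```python
import os
import torch
import torch.distributed as dist
from torch.profiler import profile, ProfilerActivity

def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group("nccl")

    x = torch.ones(1 << 20, device="cuda")
    dist.all_reduce(x)  # warm up the NCCL communicator outside the trace

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        dist.all_reduce(x)
        torch.cuda.synchronize()

    # Open the trace in chrome://tracing or Perfetto and look at the stream
    # of the nccl all-reduce kernel: a dedicated internal stream on 2.7.0
    # vs. the current (default) stream on the nightly build.
    prof.export_chrome_trace(f"trace_rank{local_rank}.json")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```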
Alternatives
No response
Additional context
No response
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k