torch.multiprocessing.Queue Zeroes Out Tensors on Retrieval · Issue #149155 · pytorch/pytorch · GitHub

torch.multiprocessing.Queue Zeroes Out Tensors on Retrieval #149155

Open
ManuelZ opened this issue Mar 13, 2025 · 5 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
module: multiprocessing (Related to torch.multiprocessing)
module: windows (Windows support for PyTorch)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@ManuelZ
ManuelZ commented Mar 13, 2025

🐛 Describe the bug

When sending a CUDA tensor through a torch.multiprocessing.Queue, the received tensor contains only zeros instead of the expected values.

I reproduced it on Windows 10 with PyTorch 2.5.1 and 2.6.0.
I couldn't reproduce it in Colab with PyTorch 2.5.1.

Minimal reproducible example:

# Uncomment to test it in Colab
# %%writefile bug_report.py

import torch
import torch.multiprocessing as mp


def f1(shared_queue):
    """Send a CUDA tensor through the multiprocessing queue."""
    t = torch.tensor((1, 2), device="cuda:0")
    print("Tensor sent: ", t)
    shared_queue.put(t)


def f2(shared_queue):
    """Retrieve the tensor from the queue and print it."""
    while True:
        if shared_queue.empty():
            continue
        t = shared_queue.get()
        print(f"Tensor received: {t}")
        break


if __name__ == "__main__":

    mp.set_start_method("spawn", True)

    shared_queue = torch.multiprocessing.Queue()
    
    p1 = mp.Process(target=f1, args=(shared_queue,))
    p2 = mp.Process(target=f2, args=(shared_queue,))
    
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()

# Uncomment to test it in Colab, in a new cell
# !python bug_report.py

Output:

Tensor sent:  tensor([1, 2], device='cuda:0')
Tensor received: tensor([0, 0], device='cuda:0')
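
For what it's worth, the torch.multiprocessing documentation notes that a CUDA tensor sent through a queue is backed by memory owned by the sending process, which must keep the tensor alive for as long as the receiving process uses it. Whether that caveat explains the zeros seen here is only a guess, but a minimal sketch of the reproducer with an mp.Event handshake that keeps f1 alive until f2 confirms receipt would look like this:

import torch
import torch.multiprocessing as mp


def f1(shared_queue, received):
    """Send a CUDA tensor and stay alive until the consumer confirms receipt."""
    t = torch.tensor((1, 2), device="cuda:0")
    print("Tensor sent: ", t)
    shared_queue.put(t)
    received.wait()  # keep the producer (and its CUDA allocation) alive


def f2(shared_queue, received):
    """Retrieve the tensor, print it, then signal the producer."""
    t = shared_queue.get()
    print(f"Tensor received: {t}")
    received.set()


if __name__ == "__main__":

    mp.set_start_method("spawn", True)

    shared_queue = mp.Queue()
    received = mp.Event()

    p1 = mp.Process(target=f1, args=(shared_queue, received))
    p2 = mp.Process(target=f2, args=(shared_queue, received))

    p1.start()
    p2.start()

    p1.join()
    p2.join()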

Versions

PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home (10.0.19045 64-bit)
GCC version: (Rev6, Built by MSYS2 project) 13.1.0
Clang version: Could not collect
CMake version: version 3.31.0
Libc version: N/A

Python version: 3.11.11 | packaged by conda-forge | (main, Dec  5 2024, 14:06:23) [MSC v.1942 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.6.77
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1050
Nvidia driver version: 560.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 2800
MaxClockSpeed: 2801
L2CacheSize: 1024
L2CacheSpeed: None
Revision: None

Versions of relevant libraries:
[pip3] efficientnet_pytorch==0.7.1
[pip3] numpy==1.26.4
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] onnx==1.17.0
[pip3] onnxruntime-gpu==1.21.0
[pip3] onnxslim==0.1.48
[pip3] pytorch_toolbelt==0.8.0
[pip3] segmentation_models_pytorch==0.4.0
[pip3] torch==2.6.0+cu126
[pip3] torch-lr-finder==0.2.2
[pip3] torchaudio==2.6.0+cu126
[pip3] torcheval==0.0.7
[pip3] torchinfo==1.8.0
[pip3] torchvision==0.21.0+cu126
[conda] efficientnet-pytorch      0.7.1                    pypi_0    pypi
[conda] libblas                   3.9.0           31_h641d27c_mkl    conda-forge
[conda] libcblas                  3.9.0           31_h5e41251_mkl    conda-forge
[conda] liblapack                 3.9.0           31_h1aa476e_mkl    conda-forge
[conda] mkl                       2024.2.2            h66d3029_15    conda-forge
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.8.90                  pypi_0    pypi
[conda] pytorch-toolbelt          0.8.0                    pypi_0    pypi
[conda] segmentation-models-pytorch 0.4.0                    pypi_0    pypi
[conda] torch                     2.6.0+cu126              pypi_0    pypi
[conda] torch-lr-finder           0.2.2                    pypi_0    pypi
[conda] torchaudio                2.6.0+cu126              pypi_0    pypi
[conda] torcheval                 0.0.7                    pypi_0    pypi
[conda] torchinfo                 1.8.0                    pypi_0    pypi
[conda] torchvision               0.21.0+cu126             pypi_0    pypi

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @VitalyFedyunin @albanD @ptrblck @msaroufim @eqy

@zou3519 added the module: multiprocessing and triaged labels on Mar 17, 2025
@albanD added the module: cuda and module: windows labels on Apr 9, 2025
@albanD
Collaborator
albanD commented Apr 9, 2025

Cannot repro on Linux either.
@ptrblck is that something people on your end could take a look at?

@neurochen

I encountered this issue as well. Usually the data is zeroed out for the first several queue.put calls. It happens under both Windows 11 and Ubuntu 24.04.

@albanD
Collaborator
albanD commented May 12, 2025

Can you give more details on the Ubuntu setup where you were able to reproduce this (running collect_env if possible)? I was not able to reproduce it on my end.

@neurochen
neurochen commented May 12, 2025

> Can you give more details on the Ubuntu setup where you were able to reproduce this (running collect_env if possible)? I was not able to reproduce it on my end.

Thanks for checking on this. I tested Ubuntu 24.04 under WSL2 with the above script shared by @ManuelZ, with the CUDA toolkit 12.8 for WSL2 installed. It's a clean setup: Python 3.11.7, torch 2.0 or later versions. The common observation across both Windows and WSL2, and across all the PyTorch versions I tried, is that the queue works fine for CPU tensors but zeroes out CUDA tensors.
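
A hedged workaround sketch based on that observation (an illustration only, not an official fix): send a CPU copy across the queue and move it back to the GPU on the receiving side. The helper names below are made up for this example.

import torch


def put_via_cpu(shared_queue, t):
    """Put a CPU copy of the tensor on the queue instead of the CUDA tensor."""
    shared_queue.put(t.detach().cpu())


def get_to_cuda(shared_queue, device="cuda:0"):
    """Get the CPU tensor from the queue and move it back to the GPU."""
    return shared_queue.get().to(device)

This trades the zero-copy CUDA IPC path for a device-to-host and host-to-device copy per tensor, but it matches the observation above that CPU tensors survive the queue.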

Actually, this zeroing out happens for the first several attempts to send CUDA tensors. One way to work around the bug is to resend the CUDA tensor if it arrives zeroed, as sketched below. Incorporating this idea into dataloader.py and worker.py, however, sometimes trips another error here:

assert not self._shutdown and self._tasks_outstanding > 0

This makes things more complicated, so ultimately I gave up patching it myself and will wait for an official fix.
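
For illustration, a hedged sketch of that resend idea at the plain queue level; the two-queue acknowledgment protocol and the non-zero check are assumptions made for this example, not the dataloader.py / worker.py patch described above.

import torch
import torch.multiprocessing as mp


def producer(data_queue, ack_queue):
    """Resend the CUDA tensor until the consumer reports a non-zeroed copy."""
    t = torch.tensor((1, 2), device="cuda:0")
    while True:
        data_queue.put(t)
        if ack_queue.get() == "ok":
            break  # consumer got real data; otherwise loop and resend


def consumer(data_queue, ack_queue):
    """Request a resend while the received tensor arrives as all zeros."""
    while True:
        t = data_queue.get()
        if torch.count_nonzero(t) > 0:  # crude check; assumes real data is non-zero
            print("Tensor received:", t)
            ack_queue.put("ok")
            break
        ack_queue.put("resend")


if __name__ == "__main__":

    mp.set_start_method("spawn", True)

    data_queue = mp.Queue()
    ack_queue = mp.Queue()

    p1 = mp.Process(target=producer, args=(data_queue, ack_queue))
    p2 = mp.Process(target=consumer, args=(data_queue, ack_queue))

    p1.start()
    p2.start()

    p1.join()
    p2.join()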

@neurochen

> Can you give more details on the Ubuntu setup where you were able to reproduce this (running collect_env if possible)? I was not able to reproduce it on my end.

Just found a native Ubuntu machine to test this; it works fine there. Thanks for working on this Windows-specific issue.
