🐛 Bug
torch.Tensor.repeat_interleave is roughly 3.5x slower than an equivalent expand + flatten construction.
To Reproduce
Steps to reproduce the behavior:
import torch
import timeit

# Time repeat_interleave along the last dimension
timeit.timeit('torch.randn(10000).repeat_interleave(100, -1)', 'import torch', number=100)

# Time the equivalent expand + flatten construction
timeit.timeit('torch.randn(10000)[..., None].expand(-1, 100).flatten(-2, -1)', 'import torch', number=100)
Outputs:
0.37211750000000166
0.10460659999995414
Expected behavior
For a scalar repeat count, Tensor.repeat_interleave is equivalent to Tensor.expand() followed by Tensor.flatten(), so the two should run in comparable time; instead, repeat_interleave is roughly 3.5x slower.
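For reference, a minimal sketch (not part of the original report) checking that the two constructions produce identical results in this case:

import torch

x = torch.randn(10000)
# repeat each element 100 times along the last dimension
a = x.repeat_interleave(100, -1)
# same values and ordering via expand + flatten
b = x[..., None].expand(-1, 100).flatten(-2, -1)
assert torch.equal(a, b)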
Environment
PyTorch version: 1.3.1
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Microsoft Windows 10 Home
GCC version: Could not collect
CMake version: Could not collect
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudnn64_7.dll
Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.5.3.2
[pip3] torch==1.3.1
[pip3] torchvision==0.4.2
[conda] Could not collect