repeat_interleave Performance Issue #31980
@Rick-McCoy

Description


🐛 Bug

repeat_interleave is ~3.5x slower than an equivalent expand-and-flatten implementation.

To Reproduce

Steps to reproduce the behavior:

import torch
import timeit

# built-in kernel: repeat each element 100 times along the last dim
timeit.timeit('torch.randn(10000).repeat_interleave(100, -1)', 'import torch', number=100)

# same result via unsqueeze + expand + flatten
timeit.timeit('torch.randn(10000)[..., None].expand(-1, 100).flatten(-2, -1)', 'import torch', number=100)

Outputs:

0.37211750000000166
0.10460659999995414
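
For reference, the two expressions compute the same tensor, so the timings compare like for like. A minimal sketch to verify this (variable names are illustrative):

import torch

x = torch.randn(10000)
a = x.repeat_interleave(100, -1)                  # each element repeated 100 times
b = x[..., None].expand(-1, 100).flatten(-2, -1)  # same values via a broadcasted view
assert torch.equal(a, b)                          # identical shape (1000000,) and values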

Expected behavior

Tensor.repeat_interleave with a scalar repeat count is equivalent to Tensor.expand() followed by Tensor.flatten().
The two should therefore run in comparable time, yet the performance difference is drastic.
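
Until the built-in kernel catches up, a workaround sketch that generalizes the expand-and-flatten trick to any dimension (repeat_interleave_scalar is a hypothetical helper, not a PyTorch API, and it assumes a scalar repeat count):

import torch

def repeat_interleave_scalar(x, repeats, dim=-1):
    # Hypothetical helper, not part of PyTorch: handles only a scalar `repeats`.
    dim = dim % x.dim()
    x = x.unsqueeze(dim + 1)            # insert a size-1 axis right after `dim`
    sizes = [-1] * x.dim()
    sizes[dim + 1] = repeats            # broadcast the new axis to `repeats` copies
    return x.expand(*sizes).flatten(dim, dim + 1)  # merge the two axes back into one

Applied to the benchmark above, repeat_interleave_scalar(x, 100) matches x.repeat_interleave(100, -1) for any x.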

Environment

PyTorch version: 1.3.1
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Microsoft Windows 10 Home
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudnn64_7.dll

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.5.3.2
[pip3] torch==1.3.1
[pip3] torchvision==0.4.2
[conda] Could not collect

cc @VitalyFedyunin @ngimel @gchanan @mruberry


Labels

good first issue
module: performance (issues related to performance, either of kernel code or framework glue)
module: tensor creation
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
