repeat_interleave Performance Issue #31980
@Rick-McCoy

Description


🐛 Bug

repeat_interleave is ~3.5x slower than an equivalent expand-and-flatten implementation.

To Reproduce

Steps to reproduce the behavior:

import torch
import timeit

# built-in kernel: repeat each element 100 times along the last dim
timeit.timeit('torch.randn(10000).repeat_interleave(100, -1)', 'import torch', number=100)

# same result via unsqueeze + expand + flatten
timeit.timeit('torch.randn(10000)[..., None].expand(-1, 100).flatten(-2, -1)', 'import torch', number=100)

Outputs:

0.37211750000000166
0.10460659999995414
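
For reference, the two expressions compute the same tensor, so the timings compare like for like. A minimal sketch to verify this (variable names are illustrative):

import torch

x = torch.randn(10000)
a = x.repeat_interleave(100, -1)                  # each element repeated 100 times
b = x[..., None].expand(-1, 100).flatten(-2, -1)  # same values via a broadcasted view
assert torch.equal(a, b)                          # identical shape (1000000,) and values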

Expected behavior

Tensor.repeat_interleave with a scalar repeat count is equivalent to Tensor.expand() followed by Tensor.flatten().
The two should therefore run in comparable time, yet the performance difference is drastic.
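
Until the built-in kernel catches up, a workaround sketch that generalizes the expand-and-flatten trick to any dimension (repeat_interleave_scalar is a hypothetical helper, not a PyTorch API, and it assumes a scalar repeat count):

import torch

def repeat_interleave_scalar(x, repeats, dim=-1):
    # Hypothetical helper, not part of PyTorch: handles only a scalar `repeats`.
    dim = dim % x.dim()
    x = x.unsqueeze(dim + 1)            # insert a size-1 axis right after `dim`
    sizes = [-1] * x.dim()
    sizes[dim + 1] = repeats            # broadcast the new axis to `repeats` copies
    return x.expand(*sizes).flatten(dim, dim + 1)  # merge the two axes back into one

Applied to the benchmark above, repeat_interleave_scalar(x, 100) matches x.repeat_interleave(100, -1) for any x.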

Environment

PyTorch version: 1.3.1
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Microsoft Windows 10 Home
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudnn64_7.dll

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.5.3.2
[pip3] torch==1.3.1
[pip3] torchvision==0.4.2
[conda] Could not collect

cc @VitalyFedyunin @ngimel @gchanan @mruberry


Labels

good first issue
module: performance (issues related to performance, either of kernel code or framework glue)
module: tensor creation
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
