Fancy indexing of CS* matrices with a column vector is very slow

When indexing a 2-D array X with integer vectors rows and cols, the following two expressions should be equivalent:

X[rows[:, np.newaxis], cols] === X[rows][:, cols]
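As a small illustration of the equivalence on a dense NumPy array (the array contents and index vectors here are made up for the example, not taken from the report):

```python
import numpy as np

# Small dense array and illustrative index vectors (assumed values).
X = np.arange(20).reshape(4, 5)
rows = np.array([0, 2, 3])
cols = np.array([1, 4])

# Column-vector form: rows[:, np.newaxis] has shape (3, 1) and broadcasts
# against cols of shape (2,), selecting the 3 x 2 submatrix in one step.
outer = X[rows[:, np.newaxis], cols]

# Chained form: select the rows first, then the columns.
chained = X[rows][:, cols]

same = np.array_equal(outer, chained)
```

Both forms select the submatrix at the intersection of the chosen rows and columns; on dense arrays `np.ix_(rows, cols)` builds the same broadcastable index pair.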

However, on a CSR matrix the first form (the one using the column-vector index) is currently far slower to compute than the chained form:

In [1]: from scipy import sparse
In [2]: import numpy as np
In [3]: R = sparse.rand(1000, 100000, density=.001).tocsr()
In [4]: sel_rows = np.flatnonzero(np.random.randint(2, size=R.shape[0]))
In [5]: sel_cols = np.flatnonzero(np.random.randint(2, size=R.shape[1]))
In [6]: %time X1 = R[sel_rows][:, sel_cols]
CPU times: user 6.71 ms, sys: 2.05 ms, total: 8.75 ms
Wall time: 8.03 ms
In [7]: %time X2 = R[sel_rows[:, None], sel_cols]
CPU times: user 3min 20s, sys: 2.62 s, total: 3min 22s
Wall time: 3min 25s
In [8]: X1.sum_duplicates()
In [9]: X2.sum_duplicates()
In [10]: np.all(X1.indices == X2.indices), np.all(X1.indptr == X2.indptr), np.all(X1.data == X2.data)
Out[10]: (True, True, True)
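Until the column-vector path is faster, one possible workaround is to route such selections through the chained form. The sketch below assumes a hypothetical helper named `select_submatrix` (not part of SciPy) and uses a much smaller matrix than the one timed above, so it only checks that both forms agree and does not reproduce the slowdown:

```python
import numpy as np
from scipy import sparse


def select_submatrix(R, rows, cols):
    """Hypothetical helper: select rows, then columns, in two steps.

    Equivalent in result to R[rows[:, None], cols], but uses the chained
    form that the timings above show to be fast on CSR matrices.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    return R[rows][:, cols]


# Small sparse test matrix (illustrative size, roughly 5% non-zeros).
rng = np.random.default_rng(0)
dense = rng.random((50, 200))
dense[dense < 0.95] = 0.0
R = sparse.csr_matrix(dense)

sel_rows = np.flatnonzero(rng.integers(2, size=R.shape[0]))
sel_cols = np.flatnonzero(rng.integers(2, size=R.shape[1]))

X1 = select_submatrix(R, sel_rows, sel_cols)
X2 = R[sel_rows[:, None], sel_cols]

# Compare through dense arrays; this is cheap at this size.
matches = np.array_equal(X1.toarray(), X2.toarray())
```

Comparing via `toarray()` avoids depending on the internal ordering of `indices`, which is why the session above calls `sum_duplicates()` before comparing the raw CSR arrays.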
