Closed
Description
When a 2-D array X is indexed with integer vectors rows and cols, we should have:
X[rows[:, np.newaxis], cols] === X[rows][:, cols]
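A small check of this identity, first on a dense NumPy array and then on the same data as a SciPy CSR matrix. The array contents and index vectors are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy import sparse

# Arbitrary 5x6 array and integer index vectors, for illustration only.
X = np.arange(30).reshape(5, 6)
rows = np.array([0, 2, 4])
cols = np.array([1, 3])

# rows[:, np.newaxis] has shape (3, 1) and broadcasts against cols (shape (2,)),
# so a single fancy-indexing call selects the 3x2 submatrix.
one_step = X[rows[:, np.newaxis], cols]

# Two-step form: select the rows first, then the columns of that result.
two_step = X[rows][:, cols]

# The same two forms applied to a CSR copy of X, densified for comparison.
S = sparse.csr_matrix(X)
sparse_one = S[rows[:, np.newaxis], cols].toarray()
sparse_two = S[rows][:, cols].toarray()
```

Both forms should yield the same 3x2 result, for the dense array and for the sparse matrix alike.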
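However, on a CSR matrix the left-hand (single-step, broadcast) form is currently far slower to compute. In the session below it takes over three minutes, versus about 8 ms for the two-step form: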
In [1]: from scipy import sparse
In [2]: import numpy as np
In [3]: R = sparse.rand(1000, 100000, density=.001).tocsr()
In [4]: sel_rows = np.flatnonzero(np.random.randint(2, size=R.shape[0]))
In [5]: sel_cols = np.flatnonzero(np.random.randint(2, size=R.shape[1]))
In [6]: %time X1 = R[sel_rows][:, sel_cols]
CPU times: user 6.71 ms, sys: 2.05 ms, total: 8.75 ms
Wall time: 8.03 ms
In [7]: %time X2 = R[sel_rows[:, None], sel_cols]
CPU times: user 3min 20s, sys: 2.62 s, total: 3min 22s
Wall time: 3min 25s
In [8]: X1.sum_duplicates()
In [9]: X2.sum_duplicates()
In [10]: np.all(X1.indices == X2.indices), np.all(X1.indptr == X2.indptr), np.all(X1.data == X2.data)
Out[10]: (True, True, True)
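Until the single-step form is fast on the SciPy version in use, one possible workaround is to perform the two-step indexing explicitly. The sketch below wraps it in a hypothetical helper, select_submatrix (not a SciPy function), and checks it against the broadcast form on a smaller random matrix than the one above so that it finishes quickly. No timings are shown, because they depend on the SciPy version.

```python
import numpy as np
from scipy import sparse

def select_submatrix(M, rows, cols):
    """Hypothetical helper: return M[rows][:, cols] as a CSR matrix.

    Rows and columns are selected in two separate indexing steps instead of
    one broadcast fancy-indexing call (the slow path in the report above).
    """
    M = sparse.csr_matrix(M)  # CSR makes row selection cheap
    return M[rows][:, cols]

# Scaled-down version of the reported setup (100 x 1000 instead of 1000 x 100000).
rng = np.random.default_rng(0)
R = sparse.rand(100, 1000, density=0.01, format="csr")
sel_rows = np.flatnonzero(rng.integers(2, size=R.shape[0]))
sel_cols = np.flatnonzero(rng.integers(2, size=R.shape[1]))

X1 = select_submatrix(R, sel_rows, sel_cols)
X2 = R[sel_rows[:, None], sel_cols]

# Put both results in canonical form, as in the session above.
X1.sum_duplicates()
X2.sum_duplicates()
```

Selecting rows first keeps the intermediate result in CSR order, and the column selection then only touches the surviving rows.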