-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] ENH: Swap rows in sparsefuncs #3104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ | |
|
||
# License: BSD 3 clause | ||
import scipy.sparse as sp | ||
import numpy as np | ||
|
||
from .sparsefuncs_fast import (csr_mean_variance_axis0, | ||
csc_mean_variance_axis0, | ||
|
@@ -56,3 +57,140 @@ def inplace_column_scale(X, scale): | |
else: | ||
raise TypeError( | ||
"Unsupported type; expected a CSR or CSC sparse matrix.") | ||
|
||
|
||
def inplace_swap_row_csc(X, m, n): | ||
""" | ||
Swaps two rows of a CSC matrix in-place. | ||
|
||
Parameters | ||
---------- | ||
X: scipy.sparse.csc_matrix, shape=(n_samples, n_features) | ||
Matrix whose two rows are to be swapped. | ||
|
||
m: int | ||
Index of the row of X to be swapped. | ||
|
||
n: int | ||
Index of the row of X to be swapped. | ||
""" | ||
for t in [m, n]: | ||
if isinstance(t, np.ndarray): | ||
raise TypeError("m and n should be valid integers") | ||
|
||
if m < 0: | ||
m += X.shape[0] | ||
if n < 0: | ||
n += X.shape[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same remark. |
||
|
||
m_mask = X.indices == m | ||
X.indices[X.indices == n] = m | ||
X.indices[m_mask] = n | ||
|
||
|
||
def inplace_swap_row_csr(X, m, n): | ||
""" | ||
Swaps two rows of a CSR matrix in-place. | ||
|
||
Parameters | ||
---------- | ||
X: scipy.sparse.csr_matrix, shape=(n_samples, n_features) | ||
Matrix whose two rows are to be swapped. | ||
|
||
m: int | ||
Index of the row of X to be swapped. | ||
|
||
n: int | ||
Index of the row of X to be swapped. | ||
""" | ||
for t in [m, n]: | ||
if isinstance(t, np.ndarray): | ||
raise TypeError("m and n should be valid integers") | ||
|
||
if m < 0: | ||
m += X.shape[0] | ||
if n < 0: | ||
n += X.shape[0] | ||
|
||
# The following swapping makes life easier since m is assumed to be the | ||
# smaller integer below. | ||
if m > n: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you tell us what you are doing here (a comment in the code) as I find this surprising. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If m > n, then concatenate , won't work the way it is supposed to below. (Numpy slicing) |
||
m, n = n, m | ||
|
||
indptr = X.indptr | ||
m_start = indptr[m] | ||
m_stop = indptr[m + 1] | ||
n_start = indptr[n] | ||
n_stop = indptr[n + 1] | ||
nz_m = m_stop - m_start | ||
nz_n = n_stop - n_start | ||
|
||
|
||
if nz_m != nz_n: | ||
# Modify indptr first | ||
X.indptr[m + 2:n] += nz_n - nz_m | ||
X.indptr[m + 1] = m_start + nz_n | ||
X.indptr[n] = n_stop - nz_m | ||
|
||
X.indices = np.concatenate([X.indices[:m_start], | ||
X.indices[n_start:n_stop], | ||
X.indices[m_stop:n_start], | ||
X.indices[m_start:m_stop], | ||
X.indices[n_stop:]]) | ||
X.data = np.concatenate([X.data[:m_start], | ||
X.data[n_start:n_stop], | ||
X.data[m_stop:n_start], | ||
X.data[m_start:m_stop], | ||
X.data[n_stop:]]) | ||
|
||
|
||
def inplace_swap_row(X, m, n): | ||
""" | ||
Swaps two rows of a CSC/CSR matrix in-place. | ||
|
||
Parameters | ||
---------- | ||
X : CSR or CSC sparse matrix, shape=(n_samples, n_features) | ||
Matrix whose two rows are to be swapped. | ||
|
||
m: int | ||
Index of the row of X to be swapped. | ||
|
||
n: int | ||
Index of the row of X to be swapped. | ||
""" | ||
if isinstance(X, sp.csc_matrix): | ||
return inplace_swap_row_csc(X, m, n) | ||
elif isinstance(X, sp.csr_matrix): | ||
return inplace_swap_row_csr(X, m, n) | ||
else: | ||
raise TypeError( | ||
"Unsupported type; expected a CSR or CSC sparse matrix.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Coming late to the party but it would be nice to report the actual type of sparse matrix with |
||
|
||
|
||
def inplace_swap_column(X, m, n): | ||
""" | ||
Swaps two columns of a CSC/CSR matrix in-place. | ||
|
||
Parameters | ||
---------- | ||
X : CSR or CSC sparse matrix, shape=(n_samples, n_features) | ||
Matrix whose two columns are to be swapped. | ||
|
||
m: int | ||
Index of the column of X to be swapped. | ||
|
||
n : int | ||
Index of the column of X to be swapped. | ||
""" | ||
if m < 0: | ||
m += X.shape[1] | ||
if n < 0: | ||
n += X.shape[1] | ||
if isinstance(X, sp.csc_matrix): | ||
return inplace_swap_row_csr(X, m, n) | ||
elif isinstance(X, sp.csr_matrix): | ||
return inplace_swap_row_csc(X, m, n) | ||
else: | ||
raise TypeError( | ||
"Unsupported type; expected a CSR or CSC sparse matrix.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment as above here. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,13 @@ | ||
import numpy as np | ||
import scipy.sparse as sp | ||
|
||
from scipy import linalg | ||
from numpy.testing import assert_array_almost_equal, assert_array_equal | ||
|
||
from sklearn.datasets import make_classification | ||
from sklearn.utils.sparsefuncs import (mean_variance_axis0, | ||
inplace_column_scale) | ||
inplace_column_scale, | ||
inplace_swap_row, inplace_swap_column) | ||
from sklearn.utils.sparsefuncs_fast import assign_rows_csr | ||
from sklearn.utils.testing import assert_raises | ||
|
||
|
@@ -60,3 +63,55 @@ def test_inplace_column_scale(): | |
assert_array_almost_equal(XA, Xc.toarray()) | ||
assert_array_almost_equal(XA, Xr.toarray()) | ||
assert_raises(TypeError, inplace_column_scale, X.tolil(), scale) | ||
|
||
|
||
def test_inplace_swap_row(): | ||
X = np.array([[0, 3, 0], | ||
[2, 4, 0], | ||
[0, 0, 0], | ||
[9, 8, 7], | ||
[4, 0, 5]], dtype=np.float64) | ||
X_csr = sp.csr_matrix(X) | ||
X_csc = sp.csc_matrix(X) | ||
|
||
swap = linalg.get_blas_funcs(('swap',), (X,)) | ||
swap = swap[0] | ||
X[0], X[-1] = swap(X[0], X[-1]) | ||
inplace_swap_row(X_csr, 0, -1) | ||
inplace_swap_row(X_csc, 0, -1) | ||
assert_array_equal(X_csr.toarray(), X_csc.toarray()) | ||
assert_array_equal(X, X_csc.toarray()) | ||
assert_array_equal(X, X_csr.toarray()) | ||
|
||
X[2], X[3] = swap(X[2], X[3]) | ||
inplace_swap_row(X_csr, 2, 3) | ||
inplace_swap_row(X_csc, 2, 3) | ||
assert_array_equal(X_csr.toarray(), X_csc.toarray()) | ||
assert_array_equal(X, X_csc.toarray()) | ||
assert_array_equal(X, X_csr.toarray()) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You should also add a test for the unsupported formats: assert_raises(TypeError, inplace_swap_row, X_csr.tolil()) |
||
|
||
def test_inplace_swap_column(): | ||
X = np.array([[0, 3, 0], | ||
[2, 4, 0], | ||
[0, 0, 0], | ||
[9, 8, 7], | ||
[4, 0, 5]], dtype=np.float64) | ||
X_csr = sp.csr_matrix(X) | ||
X_csc = sp.csc_matrix(X) | ||
|
||
swap = linalg.get_blas_funcs(('swap',), (X,)) | ||
swap = swap[0] | ||
X[:, 0], X[:, -1] = swap(X[:, 0], X[:, -1]) | ||
inplace_swap_column(X_csr, 0, -1) | ||
inplace_swap_column(X_csc, 0, -1) | ||
assert_array_equal(X_csr.toarray(), X_csc.toarray()) | ||
assert_array_equal(X, X_csc.toarray()) | ||
assert_array_equal(X, X_csr.toarray()) | ||
|
||
X[:, 0], X[:, 1] = swap(X[:, 0], X[:, 1]) | ||
inplace_swap_column(X_csr, 0, 1) | ||
inplace_swap_column(X_csc, 0, 1) | ||
assert_array_equal(X_csr.toarray(), X_csc.toarray()) | ||
assert_array_equal(X, X_csc.toarray()) | ||
assert_array_equal(X, X_csr.toarray()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please also check for unsupported input formats here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case someone passes in a mutable (for instance a numpy int instead of a Python int), I would do 'm = m + X.shape[0]' here to avoid side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiousity (and for learning), could you please tell me how would the behaviour of the code that I've written, change for a numpy int ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get that:
Gael, where you thinking of 0-d arrays?
I'm not sure if guarding for this is worthwhile. It's very likely that a later refactoring round undoes this; callers should take care not to produce these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was.
Maybe I am paranoid but each time I see an in place modification I check that it does not apply to input arguments. Thus I think that l might see the potential problem also in a later code review.
I think that it is good practice not to let such a code. If m is an intégré there is no computational benefit to doing the +=.
—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very Pythonic shorthand, though. I think it would be cleaner to typecheck and raise a
TypeError
for a 0-d array:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GaelVaroquaux Is it just 0-D numpy arrays, or are there other cases also? Then it might be better to do
m = m + X.shape[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you whish.