-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
TST Extend tests for scipy.sparse.*array
in sklearn/svm/tests/test_sparse.py
#27511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST Extend tests for scipy.sparse.*array
in sklearn/svm/tests/test_sparse.py
#27511
Conversation
@Charlie-XIAO would love to take any help from you whenever you feel free. |
@work-mohit Personally I would suggest refactoring these two tests a bit because I 8000 find them hard to read (not sure if maintainers agree). First add Then, I think we can refactor def check_svm_model_equal(svm_model, X_train, y_train, X_test):
# Use the original svm model for dense fit and clone an exactly same
# svm model for sparse fit
sparse_svm = base.clone(svm_model)
svm_model.fit(X_train.toarray(), y_train)
if sparse.issparse(X_test):
X_test_dense = X_test.toarray()
else:
X_test_dense = X_test
sparse_svm.fit(X_train, y_train)
assert sparse.issparse(sparse_svm.support_vectors_)
assert sparse.issparse(sparse_svm.dual_coef_)
assert_array_almost_equal(
svm_model.support_vectors_, sparse_svm.support_vectors_.toarray()
)
assert_array_almost_equal(svm_model.dual_coef_, sparse_svm.dual_coef_.toarray())
if svm_model.kernel == "linear":
assert sparse.issparse(sparse_svm.coef_)
assert_array_almost_equal(svm_model.coef_, sparse_svm.coef_.toarray())
assert_array_almost_equal(svm_model.support_, sparse_svm.support_)
assert_array_almost_equal(
svm_model.predict(X_test_dense), sparse_svm.predict(X_test)
)
assert_array_almost_equal(
svm_model.decision_function(X_test_dense), sparse_svm.decision_function(X_test)
)
assert_array_almost_equal(
svm_model.decision_function(X_test_dense),
sparse_svm.decision_function(X_test_dense),
)
if isinstance(svm_model, svm.OneClassSVM):
msg = "cannot use sparse input in 'OneClassSVM' trained on dense data"
else:
assert_array_almost_equal(
svm_model.predict_proba(X_test_dense), sparse_svm.predict_proba(X_test), 4
)
msg = "cannot use sparse input in 'SVC' trained on dense data"
if sparse.issparse(X_test):
with pytest.raises(ValueError, match=msg):
svm_model.predict(X_test) Then @skip_if_32bit
@pytest.mark.parametrize(
"X_train, y_train, X_test",
[
[X, Y, T],
[X2, Y2, T2],
[X_blobs[:80], y_blobs[:80], X_blobs[80:]],
[iris.data, iris.target, iris.data],
],
)
@pytest.mark.parametrize("kernel", ["linear", "poly", "rbf", "sigmoid"])
@pytest.mark.parametrize("sparse_container", CSR_CONTAINERS + LIL_CONTAINERS)
def test_svc(X_train, y_train, X_test, kernel, sparse_container):
"""Check that sparse SVC gives the same result as SVC."""
X_train = sparse_container(X_train)
clf = svm.SVC(
gamma=1,
kernel=kernel,
probability=True,
random_state=0,
decision_function_shape="ovo",
)
check_svm_model_equal(clf, X_train, y_train, X_test) You can then do something similar to @pytest.mark.parametrize(
"X_train, y_train, X_test",
[
[X, None, T],
[X2, None, T2],
[X_blobs[:80], None, X_blobs[80:]],
[iris.data, None, iris.data],
],
)
@pytest.mark.parametrize("kernel", ["linear", "poly", "rbf", "sigmoid"])
@pytest.mark.parametrize("sparse_container", CSR_CONTAINERS + LIL_CONTAINERS)
@skip_if_32bit
def test_sparse_oneclasssvm(X_train, y_train, X_test, kernel, sparse_container):
# Check that sparse OneClassSVM gives the same result as dense OneClassSVM
X_train = sparse_container(X_train)
clf = svm.OneClassSVM(gamma=1, kernel=kernel)
check_svm_model_equal(clf, X_train, y_train, X_test) |
I have also seen you getting |
Intersting ! |
Yes sure, I have checked test yet, just raised as a draft so that I can take help with others. |
from ._radius_neighbors_classmode import ( facing this issue, any help ? |
Please recompile @work-mohit. This Cython module is created a few months ago (if I remembered correctly).
|
C:\Users\Mohit\anaconda3\envs\sklearndev\lib\site-packages\scipy\sparse_compressed.py:415: ValueError below is the code at 415 at def multiply(self, other):
"""Point-wise multiplication by another matrix, vector, or
scalar.
"""
# Scalar multiplication.
if isscalarlike(other):
return self._mul_scalar(other)
# Sparse matrix or vector.
if issparse(other):
if self.shape == other.shape:
other = self.__class__(other)
return self._binopt(other, '_elmul_')
# Single element.
elif other.shape == (1, 1):
return self._mul_scalar(other.toarray()[0, 0])
elif self.shape == (1, 1):
return other._mul_scalar(self.toarray()[0, 0])
# A row times a column.
elif self.shape[1] == 1 and other.shape[0] == 1:
return self._mul_sparse_matrix(other.tocsc())
elif self.shape[0] == 1 and other.shape[1] == 1:
return other._mul_sparse_matrix(self.tocsc())
# Row vector times matrix. other is a row.
elif other.shape[0] == 1 and self.shape[1] == other.shape[1]:
other = self._dia_container(
(other.toarray().ravel(), [0]),
shape=(other.shape[1], other.shape[1])
)
return self._mul_sparse_matrix(other)
# self is a row.
elif self.shape[0] == 1 and self.shape[1] == other.shape[1]:
copy = self._dia_container(
(self.toarray().ravel(), [0]),
shape=(self.shape[1], self.shape[1])
)`
` return other._mul_sparse_matrix(copy)
# Column vector times matrix. other is a column.
elif other.shape[1] == 1 and self.shape[0] == other.shape[0]:
other = self._dia_container(
(other.toarray().ravel(), [0]),
shape=(other.shape[0], other.shape[0])
)
return other._mul_sparse_matrix(self)
# self is a column.
elif self.shape[1] == 1 and self.shape[0] == other.shape[0]:
copy = self._dia_container(
(self.toarray().ravel(), [0]),
shape=(self.shape[0], self.shape[0])
)
return copy._mul_sparse_matrix(other)
else:
raise ValueError("inconsistent shapes")`` |
@work-mohit I think I've already told you how to debug this: #27511 (comment). I've checked that this is indeed the case, and it appears in the test instead of in the module. The thing is, if you have |
…tst/svm/test_sparse.py
…mohit/scikit-learn into fix/tst/svm/test_sparse.py
@ogrisel I'm getting
should this be reported? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please quote the full traceback of the exception to understand the context of the problem?
@@ -53,77 +56,71 @@ def check_svm_model_equal(dense_svm, sparse_svm, X_train, y_train, X_test): | |||
assert sparse.issparse(sparse_svm.support_vectors_) | |||
assert sparse.issparse(sparse_svm.dual_coef_) | |||
assert_array_almost_equal( | |||
dense_svm.support_vectors_, sparse_svm.support_vectors_.toarray() | |||
svm_model.support_vectors_, sparse_svm.support_vectors_.toarray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather not rename this variable. The previous name was more informative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to point out that, instead of taking two params (dense_svm
& sparse_svm
) separately, we're combining both. So I guess only naming dense_svm
would not be sufficient. You can suggest some more meaningful name.
csr_container = <class 'scipy.sparse._csr.csr_array'>
|
Hi @work-mohit, are you still working on this PR? |
I got stuck somewhere in the issue, now I'm not tracking any more. |
@Charlie-XIAO: are you interested to continue this PR? |
Yes, I can continue this PR if needed. However, if I need to contribute to this branch I will need access granted by @work-mohit since I am not a maintainer. If @work-mohit no longer decides to continue, I would personally prefer to open another PR and mention him as collaborator in a commit. Would that be okay or do you suggest an alternative approach @jjerphan? |
Yes, this is usually the approach we take, giving credits to the original authors. |
@Charlie-XIAO this PR can be closed, right? |
I confirm it can be closed since it was superseded by #27723. |
Refer Issue #27090
currently a draft, would love to take any help.