TST Extend tests for `scipy.sparse.*array` in `sklearn/svm/tests/test_sparse.py` #27511

work-mohit · 2023-10-01T12:20:39Z

Refer Issue #27090

currently a draft, would love to take any help.

github-actions · 2023-10-01T12:22:11Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: 52222e7. Link to the linter CI: here}

work-mohit · 2023-10-01T12:27:09Z

@Charlie-XIAO would love to take any help from you whenever you feel free.
I'm confused in two functions test_svc() and test_sparse_oneclasssvm .

Charlie-XIAO · 2023-10-01T13:27:56Z

@work-mohit Personally I would suggest refactoring these two tests a bit because I 8000 find them hard to read (not sure if maintainers agree).

First add X_blobs, y_blobs = make_blobs(n_samples=100, centers=10, random_state=0) to be beginning because I think it should be at the same level as the other datasets (X, X2, iris).

Then, I think we can refactor check_svm_model_equal because dense_svm and sparse_svm seems to be exactly the same model (though different copies). For instance:

def check_svm_model_equal(svm_model, X_train, y_train, X_test):
    # Use the original svm model for dense fit and clone an exactly same
    # svm model for sparse fit
    sparse_svm = base.clone(svm_model)

    svm_model.fit(X_train.toarray(), y_train)
    if sparse.issparse(X_test):
        X_test_dense = X_test.toarray()
    else:
        X_test_dense = X_test
    sparse_svm.fit(X_train, y_train)
    assert sparse.issparse(sparse_svm.support_vectors_)
    assert sparse.issparse(sparse_svm.dual_coef_)
    assert_array_almost_equal(
        svm_model.support_vectors_, sparse_svm.support_vectors_.toarray()
    )
    assert_array_almost_equal(svm_model.dual_coef_, sparse_svm.dual_coef_.toarray())
    if svm_model.kernel == "linear":
        assert sparse.issparse(sparse_svm.coef_)
        assert_array_almost_equal(svm_model.coef_, sparse_svm.coef_.toarray())
    assert_array_almost_equal(svm_model.support_, sparse_svm.support_)
    assert_array_almost_equal(
        svm_model.predict(X_test_dense), sparse_svm.predict(X_test)
    )
    assert_array_almost_equal(
        svm_model.decision_function(X_test_dense), sparse_svm.decision_function(X_test)
    )
    assert_array_almost_equal(
        svm_model.decision_function(X_test_dense),
        sparse_svm.decision_function(X_test_dense),
    )
    if isinstance(svm_model, svm.OneClassSVM):
        msg = "cannot use sparse input in 'OneClassSVM' trained on dense data"
    else:
        assert_array_almost_equal(
            svm_model.predict_proba(X_test_dense), sparse_svm.predict_proba(X_test), 4
        )
        msg = "cannot use sparse input in 'SVC' trained on dense data"
    if sparse.issparse(X_test):
        with pytest.raises(ValueError, match=msg):
            svm_model.predict(X_test)

Then test_svc can be simplified as follows (I used both LIL and CSR because X and X2 used the former while iris and blob used the latter; this is subject to further discussion):

@skip_if_32bit
@pytest.mark.parametrize(
    "X_train, y_train, X_test",
    [
        [X, Y, T],
        [X2, Y2, T2],
        [X_blobs[:80], y_blobs[:80], X_blobs[80:]],
        [iris.data, iris.target, iris.data],
    ],
)
@pytest.mark.parametrize("kernel", ["linear", "poly", "rbf", "sigmoid"])
@pytest.mark.parametrize("sparse_container", CSR_CONTAINERS + LIL_CONTAINERS)
def test_svc(X_train, y_train, X_test, kernel, sparse_container):
    """Check that sparse SVC gives the same result as SVC."""
    X_train = sparse_container(X_train)

    clf = svm.SVC(
        gamma=1,
        kernel=kernel,
        probability=True,
        random_state=0,
        decision_function_shape="ovo",
    )
    check_svm_model_equal(clf, X_train, y_train, X_test)

You can then do something similar to test_sparse_oneclasssvm:

@pytest.mark.parametrize(
    "X_train, y_train, X_test",
    [
        [X, None, T],
        [X2, None, T2],
        [X_blobs[:80], None, X_blobs[80:]],
        [iris.data, None, iris.data],
    ],
)
@pytest.mark.parametrize("kernel", ["linear", "poly", "rbf", "sigmoid"])
@pytest.mark.parametrize("sparse_container", CSR_CONTAINERS + LIL_CONTAINERS)
@skip_if_32bit
def test_sparse_oneclasssvm(X_train, y_train, X_test, kernel, sparse_container):
    # Check that sparse OneClassSVM gives the same result as dense OneClassSVM
    X_train = sparse_container(X_train)

    clf = svm.OneClassSVM(gamma=1, kernel=kernel)
    check_svm_model_equal(clf, X_train, y_train, X_test)

Charlie-XIAO · 2023-10-01T13:38:40Z

I have also seen you getting ValueError: inconsistent shapes in some other tests. Try using --full-trace when doing pytest and try to find out the last stack frame within scikit-learn. Not 100% sure but I doubt there is somewhere that we used * for matrix multiplication. If that's the case, replacing it with @ may solve the issue.

work-mohit · 2023-10-01T14:50:11Z

X_train

Intersting !

work-mohit · 2023-10-01T14:51:24Z

I have also seen you getting ValueError: inconsistent shapes in some other tests. I'm not a 100% sure but try using --full-trace when doing pytest and try to find out the last stack frame within scikit-learn. Did not look into it but I believe there is somewhere that we used * for matrix multiplication. Replacing it with @ may solve the issue.

Yes sure, I have checked test yet, just raised as a draft so that I can take help with others.

…mohit/scikit-learn into fix/tst/svm/test_sparse.py

work-mohit · 2023-10-03T15:20:18Z

from ._radius_neighbors_classmode import (
E ModuleNotFoundError: No module named 'sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode'
========================================================== short test summary info ==========================================================
ERROR sklearn/svm/tests/test_sparse.py - ModuleNotFoundError: No module named 'sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode'
ERROR sklearn/svm/tests/test_sparse.py

facing this issue, any help ?

Charlie-XIAO · 2023-10-03T15:21:26Z

Please recompile @work-mohit. This Cython module is created a few months ago (if I remembered correctly).

pip install -v --no-use-pep517 --no-build-isolation -e .

work-mohit · 2023-10-03T17:31:43Z

C:\Users\Mohit\anaconda3\envs\sklearndev\lib\site-packages\scipy\sparse_compressed.py:415: ValueError
========================================================== short test summary info ==========================================================
FAILED sklearn/svm/tests/test_sparse.py::test_sparse_svc_clone_with_callable_kernel[lil_array] - ValueError: inconsistent shapes
FAILED sklearn/svm/tests/test_sparse.py::test_timeout[lil_array] - ValueError: inconsistent shapes

below is the code at 415 at _commpressed.py
help needed! @Charlie-XIAO

def multiply(self, other):
        """Point-wise multiplication by another matrix, vector, or
        scalar.
        """
        # Scalar multiplication.
        if isscalarlike(other):
            return self._mul_scalar(other)
        # Sparse matrix or vector.
        if issparse(other):
            if self.shape == other.shape:
                other = self.__class__(other)
                return self._binopt(other, '_elmul_')
            # Single element.
            elif other.shape == (1, 1):
                return self._mul_scalar(other.toarray()[0, 0])
            elif self.shape == (1, 1):
                return other._mul_scalar(self.toarray()[0, 0])
            # A row times a column.
            elif self.shape[1] == 1 and other.shape[0] == 1:
                return self._mul_sparse_matrix(other.tocsc())
            elif self.shape[0] == 1 and other.shape[1] == 1:
                return other._mul_sparse_matrix(self.tocsc())
            # Row vector times matrix. other is a row.
            elif other.shape[0] == 1 and self.shape[1] == other.shape[1]:
                other = self._dia_container(
                    (other.toarray().ravel(), [0]),
                    shape=(other.shape[1], other.shape[1])
                )
                return self._mul_sparse_matrix(other)
            # self is a row.
            elif self.shape[0] == 1 and self.shape[1] == other.shape[1]:
                copy = self._dia_container(
                    (self.toarray().ravel(), [0]),
                    shape=(self.shape[1], self.shape[1])
                )`
`                return other._mul_sparse_matrix(copy)
            # Column vector times matrix. other is a column.
            elif other.shape[1] == 1 and self.shape[0] == other.shape[0]:
                other = self._dia_container(
                    (other.toarray().ravel(), [0]),
                    shape=(other.shape[0], other.shape[0])
                )
                return other._mul_sparse_matrix(self)
            # self is a column.
            elif self.shape[1] == 1 and self.shape[0] == other.shape[0]:
                copy = self._dia_container(
                    (self.toarray().ravel(), [0]),
                    shape=(self.shape[0], self.shape[0])
                )
                return copy._mul_sparse_matrix(other)
            else:
                raise ValueError("inconsistent shapes")``

Charlie-XIAO · 2023-10-04T02:32:13Z

@work-mohit I think I've already told you how to debug this: #27511 (comment). I've checked that this is indeed the case, and it appears in the test instead of in the module.

The thing is, if you have matrix * matrix, this is automatically matrix multiplication, but if you have array * array this is not, thus inconsistent shapes. You should use array @ array (and matrix @ matrix` so that in both cases you will have matrix multiplication.

…tst/svm/test_sparse.py

…mohit/scikit-learn into fix/tst/svm/test_sparse.py

work-mohit · 2023-10-10T05:29:56Z

@ogrisel I'm getting

ValueError: Only sparse matrices with 32-bit integer indices are accepted. Got int64 indices. Please do report a minimal reproducer on scikit-learn issue tracker so that support for your use-case can be studied by maintainers. See: https://scikit-learn.org/dev/developers/minimal_reproducer.html

should this be reported?

ogrisel

Could you please quote the full traceback of the exception to understand the context of the problem?

ogrisel · 2023-10-10T05:56:41Z

sklearn/svm/tests/test_sparse.py

@@ -53,77 +56,71 @@ def check_svm_model_equal(dense_svm, sparse_svm, X_train, y_train, X_test):
    assert sparse.issparse(sparse_svm.support_vectors_)
    assert sparse.issparse(sparse_svm.dual_coef_)
    assert_array_almost_equal(
-        dense_svm.support_vectors_, sparse_svm.support_vectors_.toarray()
+        svm_model.support_vectors_, sparse_svm.support_vectors_.toarray()


I would rather not rename this variable. The previous name was more informative.

Just to point out that, instead of taking two params (dense_svm & sparse_svm) separately, we're combining both. So I guess only naming dense_svm would not be sufficient. You can suggest some more meaningful name.

work-mohit · 2023-10-10T06:09:09Z

csr_container = <class 'scipy.sparse._csr.csr_array'>

@pytest.mark.parametrize("csr_container", CSR_CONTAINERS)
def test_sparse_realdata(csr_container):
    # Test on a subset from the 20newsgroups dataset.
    # This catches some bugs if input is not correctly converted into
    # sparse format or weights are not correctly initialized.

    data = np.array([0.03771744, 0.1003567, 0.01174647, 0.027069])
    indices = np.array([6, 5, 35, 31])
    indptr = np.array(
        [
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            1,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            2,
            4,
            4,
            4,
        ]
    )
    X = csr_container((data, indices, indptr))
    y = np.array(
        [
            1.0,
            0.0,
            2.0,
            2.0,
            1.0,
            1.0,
            1.0,
            2.0,
            2.0,
            0.0,
            1.0,
            2.0,
            2.0,
            0.0,
            2.0,
            0.0,
            3.0,
            0.0,
            3.0,
            0.0,
            1.0,
            1.0,
            3.0,
            2.0,
            3.0,
            2.0,
            0.0,
            3.0,
            1.0,
            0.0,
            2.0,
            1.0,
            2.0,
            0.0,
            1.0,
            0.0,
            2.0,
            3.0,
            1.0,
            3.0,
            0.0,
            1.0,
            0.0,
            0.0,
            2.0,
            0.0,
            1.0,
            2.0,
            2.0,
            2.0,
            3.0,
            2.0,
            0.0,
            3.0,
            2.0,
            1.0,
            2.0,
            3.0,
            2.0,
            2.0,
            0.0,
            1.0,
            0.0,
            1.0,
            2.0,
            3.0,
            0.0,
            0.0,
            2.0,
            2.0,
            1.0,
            3.0,
            1.0,
            1.0,
            0.0,
            1.0,
            2.0,
            1.0,
            1.0,
            3.0,
        ]
    )

    clf = svm.SVC(kernel="linear").fit(X.toarray(), y)
    sp_clf = svm.SVC(kernel="linear").fit(X.tocoo(), y)  <===== this line giving check failure.

ValueError: Only sparse matrices with 32-bit integer indices are accepted. Got int64 indices. Please do report a minimal reproducer on scikit-learn issue tracker so that support for your use-case can be studied by maintainers. See: https://scikit-learn.org/dev/developers/minimal_reproducer.html

jjerphan · 2023-11-01T20:59:12Z

Hi @work-mohit, are you still working on this PR?

work-mohit · 2023-11-02T01:31:52Z

Hi @work-mohit, are you still working on this PR?

I got stuck somewhere in the issue, now I'm not tracking any more.

jjerphan · 2023-11-02T08:05:04Z

@Charlie-XIAO: are you interested to continue this PR?

Charlie-XIAO · 2023-11-02T08:14:52Z

Yes, I can continue this PR if needed. However, if I need to contribute to this branch I will need access granted by @work-mohit since I am not a maintainer. If @work-mohit no longer decides to continue, I would personally prefer to open another PR and mention him as collaborator in a commit. Would that be okay or do you suggest an alternative approach @jjerphan?

jjerphan · 2023-11-02T12:19:07Z

Yes, this is usually the approach we take, giving credits to the original authors.

Charlie-XIAO · 2023-11-04T16:03:57Z

@jjerphan Sorry for the delay, you may refer to #27723.

marenwestermann · 2024-03-01T21:11:54Z

@Charlie-XIAO this PR can be closed, right?

jjerphan · 2024-03-01T21:22:25Z

I confirm it can be closed since it was superseded by #27723.

fixed some funtions

8aef042

github-actions bot added the module:svm label Oct 1, 2023

work-mohit mentioned this pull request Oct 1, 2023

TST Extend tests for scipy.sparse.*array #27090

Closed

work-mohit and others added 3 commits October 3, 2023 20:31

Merge branch 'main' into fix/tst/svm/test_sparse.py

2374202

refactor the test cases

7737323

Merge branch 'fix/tst/svm/test_sparse.py' of https://github.com/work-…

a4dab09

…mohit/scikit-learn into fix/tst/svm/test_sparse.py

fixed - errors

51e9369

work-mohit marked this pull request as ready for review October 3, 2023 16:06

work-mohit and others added 7 commits October 4, 2023 10:42

fixed inconsistent shapes

e63e3e7

Merge branch 'main' into fix/tst/svm/test_sparse.py

623f188

Merge branch 'main' of github.com:scikit-learn/scikit-learn into fix/…

a08bd1a

…tst/svm/test_sparse.py

fix

dbd3cc3

Empty-Commit

2efb7c2

Empty-Commit

e1f61f3

Merge branch 'fix/tst/svm/test_sparse.py' of https://github.com/work-…

97df380

…mohit/scikit-learn into fix/tst/svm/test_sparse.py

ogrisel reviewed Oct 10, 2023

View reviewed changes

Merge branch 'main' into fix/tst/svm/test_sparse.py

52222e7

glemaitre added the No Changelog Needed label Oct 31, 2023

Charlie-XIAO mentioned this pull request Nov 4, 2023

TST Extend tests for scipy.sparse.*array in sklearn/svm/tests/test_sparse #27723

Merged

jjerphan closed this Mar 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

TST Extend tests for `scipy.sparse.*array` in `sklearn/svm/tests/test_sparse.py` #27511

TST Extend tests for `scipy.sparse.*array` in `sklearn/svm/tests/test_sparse.py` #27511

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

TST Extend tests for scipy.sparse.*array in sklearn/svm/tests/test_sparse.py #27511

TST Extend tests for scipy.sparse.*array in sklearn/svm/tests/test_sparse.py #27511

Uh oh!

Conversation

Uh oh!

Uh oh!

✔️ Linting Passed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

TST Extend tests for `scipy.sparse.*array` in `sklearn/svm/tests/test_sparse.py` #27511

TST Extend tests for `scipy.sparse.*array` in `sklearn/svm/tests/test_sparse.py` #27511