TST Extend tests for `scipy.sparse.*array` in `test_pairwise.py` #27288

StefanieSenger · 2023-09-04T12:08:30Z

Reference Issues/PRs

Towards #27090.

What does this implement/fix? Explain your changes.

This PR replaces the scipy sparse matrices in the sklearn/metrics/tests/test_pairwise.py test file with the scipy containers introduced in #27095.

github-actions · 2023-09-04T12:10:06Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 90c1c0b. Link to the linter CI: here

StefanieSenger · 2023-09-04T12:57:24Z

 ERROR collecting metrics/tests/test_pairwise.py ________________
In test_euclidean_distances_known_result: 2 parameter sets specified, with different number of ids: 3
----------------- generated xml file: /temp_dir/test-data.xml ------------------

The failing test from the CI (test_euclidean_distances_known_result) passes locally with pytest, with the correct number of test cases (9). Do I need to do something?

Charlie-XIAO · 2023-09-04T15:25:23Z

I'm not a maintainer, but I think you can simply remove the ids. The thing is that XXX_CONTAINERS may have 1 or 2 elements depending on the scipy version, so when XXX_CONTAINERS has only 1 element, the failing test case ends up with 2 parameter sets but 3 ids. (Well, unless you set the ids based on the length of XXX_CONTAINERS, but I don't think that is necessary since, as far as I know, ids are mostly used to select specific test cases via the -k option.)
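To make the failure concrete, here is a minimal stdlib sketch. It assumes stand-in lists of container names in place of the real `CSR_CONTAINERS` (which holds scipy classes), and `check_ids` is a hypothetical helper that reproduces the length check behind the collection error quoted above; it is not pytest's own code.

```python
# Hypothetical stand-ins for CSR_CONTAINERS: older SciPy only provides the
# sparse matrix class, newer SciPy also provides the sparse array class.
OLD_SCIPY_CSR_CONTAINERS = ["csr_matrix"]
NEW_SCIPY_CSR_CONTAINERS = ["csr_matrix", "csr_array"]

# Ids hard-coded for the newer SciPy case.
HARDCODED_IDS = ["dense", "sparse_matrix", "sparse_array"]


def parameter_sets(csr_containers):
    """Values passed to parametrize: a dense array plus each CSR container."""
    return ["array", *csr_containers]


def check_ids(params, ids):
    """Hypothetical helper mimicking the length check done at collection."""
    if len(params) != len(ids):
        raise ValueError(
            f"{len(params)} parameter sets specified, "
            f"with different number of ids: {len(ids)}"
        )


check_ids(parameter_sets(NEW_SCIPY_CSR_CONTAINERS), HARDCODED_IDS)  # fine
try:
    check_ids(parameter_sets(OLD_SCIPY_CSR_CONTAINERS), HARDCODED_IDS)
except ValueError as exc:
    print(exc)  # 2 parameter sets specified, with different number of ids: 3
```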

StefanieSenger · 2023-09-06T08:43:10Z

Thank you, @Charlie-XIAO, I will try this.

OmarManzoor · 2023-09-06T09:57:42Z

Hi @StefanieSenger,

I think the ids are basically used for identification purposes; without them, pytest falls back to default names derived from the parameter values. For example, with ids the results look like:

test_pairwise.py::test_euclidean_distances_known_result[dense-dense] PASSED [ 11%]
test_pairwise.py::test_euclidean_distances_known_result[dense-sparse_matrix] PASSED [ 22%]
test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-dense] PASSED [ 44%]
test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-sparse_array] PASSED [ 66%]

Without ids, the results are:

test_pairwise.py::test_euclidean_distances_known_result[array-array] PASSED [ 11%]
test_pairwise.py::test_euclidean_distances_known_result[array-csr_matrix] PASSED [ 22%]
test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-array] PASSED [ 44%]
test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-csr_array] PASSED [ 66%]

So we can either remove these ids or set them conditionally based on the container length.
Since they were present originally, I think we should also get another maintainer's opinion before removing them.
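The naming difference can be reproduced with a small stdlib sketch. It assumes, as the output above suggests, that pytest falls back to an object's `__name__` when no explicit id is given (pytest's real id generation covers more cases than this); the two functions are dummy stand-ins named after `numpy.array` and `scipy.sparse.csr_matrix`, and `label` is a hypothetical helper.

```python
# Dummy stand-ins named like numpy.array and scipy.sparse.csr_matrix;
# only their __name__ matters here.
def array(x):
    return x


def csr_matrix(x):
    return x


def label(value, explicit_id=None):
    """Return the explicit id if given, else fall back to __name__.

    Simplified: pytest's real fallback also handles numbers, strings, etc.
    """
    return explicit_id if explicit_id is not None else value.__name__


containers = [array, csr_matrix]
print([label(c) for c in containers])  # ['array', 'csr_matrix']
print([label(c, i) for c, i in zip(containers, ["dense", "sparse_matrix"])])
# ['dense', 'sparse_matrix']
```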

StefanieSenger · 2023-09-07T19:58:14Z

Hi @OmarManzoor, I see. To be on the safe side, I now set the ids conditionally, depending on whether the container has more than one element.
That said, I'd prefer the simplest possible solution (deleting the ids) if another maintainer confirms it's okay.

It seems that without passing the ids, these are named differently, but the combinations tested are the same.

=====================with passing IDs
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[dense-dense] PASSED                                                              [ 11%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[dense-sparse_matrix] PASSED                                                      [ 22%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[dense-sparse_array] PASSED                                                       [ 33%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-dense] PASSED                                                      [ 44%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-sparse_matrix] PASSED                                              [ 55%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-sparse_array] PASSED                                               [ 66%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[sparse_array-dense] PASSED                                                       [ 77%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[sparse_array-sparse_matrix] PASSED                                               [ 88%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[sparse_array-sparse_array] PASSED  


==================without passing IDs

sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[array-array] PASSED                                                              [ 11%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[array-csr_matrix] PASSED                                                         [ 22%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[array-csr_array] PASSED                                                          [ 33%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-array] PASSED                                                         [ 44%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-csr_matrix] PASSED                                                    [ 55%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-csr_array] PASSED                                                     [ 66%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[csr_array-array] PASSED                                                          [ 77%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[csr_array-csr_matrix] PASSED                                                     [ 88%]
sklearn/metrics/tests/test_pairwise.py::test_euclidean_distances_known_result[csr_array-csr_array] PASSED                                                      [100%]

"dense", "sparse_matrix", and "sparse_array" translate directly to "array", "csr_matrix", and "csr_array".

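The claim that both runs cover the same nine combinations can be checked with a short stdlib sketch. The two label lists are copied from the outputs above; the helper name `combination_ids` is hypothetical, and the sketch only compares id strings rather than running any tests.

```python
from itertools import product

# Explicit ids and the default names, taken from the two outputs above.
EXPLICIT = ["dense", "sparse_matrix", "sparse_array"]
DEFAULT = ["array", "csr_matrix", "csr_array"]
TO_DEFAULT = dict(zip(EXPLICIT, DEFAULT))


def combination_ids(labels):
    """One id per (X, Y) pair: 3 labels give 3 x 3 = 9 test cases."""
    return [f"{x}-{y}" for x, y in product(labels, repeat=2)]


explicit_ids = combination_ids(EXPLICIT)
default_ids = combination_ids(DEFAULT)

# Translating each part of an explicit id yields the default id for the
# same combination, so both runs exercise identical cases.
translated = ["-".join(TO_DEFAULT[p] for p in i.split("-")) for i in explicit_ids]
print(len(explicit_ids))  # 9
print(translated == default_ids)  # True
```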
OmarManzoor

Thanks @StefanieSenger. LGTM.

CI errors

OmarManzoor

I think there are a number of other instances as well where we might need to set ids conditionally. Since they all involve CSR_CONTAINERS, we can simply define a variable at the top like

container_ids = (
    ["dense", "sparse_matrix"]
    if len(CSR_CONTAINERS) == 1
    else ["dense", "sparse_matrix", "sparse_array"]
)

and then use it in all the required places.

Or you can remove the ids if you prefer. Since I added the second reviewer tag, if another maintainer feels we need them, then they can be added.
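A minimal stdlib sketch of the suggested `container_ids` logic, next to an alternative that derives each label from the container's class name. The classes below are dummy stand-ins for the scipy ones, both helper functions are hypothetical, and the derived variant is an assumption for comparison, not what this PR does.

```python
# Dummy stand-ins for the scipy classes; CSR_CONTAINERS holds csr_matrix
# and, on newer SciPy, csr_array.
class csr_matrix:
    pass


class csr_array:
    pass


def conditional_ids(csr_containers):
    """The module-level variable suggested above, wrapped in a function."""
    return (
        ["dense", "sparse_matrix"]
        if len(csr_containers) == 1
        else ["dense", "sparse_matrix", "sparse_array"]
    )


def derived_ids(csr_containers):
    """Alternative: derive labels from class names so they cannot drift."""
    return ["dense"] + [
        "sparse_matrix" if c.__name__.endswith("_matrix") else "sparse_array"
        for c in csr_containers
    ]


for containers in ([csr_matrix], [csr_matrix, csr_array]):
    assert conditional_ids(containers) == derived_ids(containers)
    # One id per parameter: a dense placeholder plus each container.
    assert len(conditional_ids(containers)) == len(containers) + 1
```

The conditional version is simpler to read; the derived one keeps working if the container list changes shape, at the cost of relying on class naming.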

sklearn/metrics/tests/test_pairwise.py

StefanieSenger · 2023-09-14T08:41:20Z

Thank you, @OmarManzoor.

I've taken a look at the pytest documentation. It seems that ids are strings that pytest automatically assigns to identify each test case, but we can also name them explicitly for clarity or debugging.

Since the ids are not further used in the code, I will omit them.

Charlie-XIAO · 2023-09-14T08:47:19Z

@StefanieSenger You may want to look at this one: #27262 (comment)

StefanieSenger · 2023-09-14T08:52:30Z

Ah, thank you, @Charlie-XIAO. I will keep them then. Do you know why?

Charlie-XIAO · 2023-09-14T09:00:12Z

I think it's still for compatibility: for instance, someone may have a script that selects tests to run based on their names. But I'm not sure; maybe @glemaitre knows?
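The compatibility concern can be illustrated with a simplified name filter in the spirit of `pytest -k`. This is an assumption-laden sketch: the real `-k` option accepts boolean expressions, while the hypothetical `select` helper only does a case-insensitive substring match. The ids are a subset copied from the outputs earlier in this thread.

```python
# Test ids copied from the two outputs earlier in this thread (a subset).
WITH_IDS = [
    "test_euclidean_distances_known_result[dense-sparse_array]",
    "test_euclidean_distances_known_result[sparse_matrix-dense]",
    "test_euclidean_distances_known_result[sparse_array-sparse_array]",
]
WITHOUT_IDS = [
    "test_euclidean_distances_known_result[array-csr_array]",
    "test_euclidean_distances_known_result[csr_matrix-array]",
    "test_euclidean_distances_known_result[csr_array-csr_array]",
]


def select(test_ids, keyword):
    """Simplified keyword selection: case-insensitive substring match."""
    return [t for t in test_ids if keyword.lower() in t.lower()]


# A script keyed on "sparse_array" silently selects nothing once ids change.
print(len(select(WITH_IDS, "sparse_array")))  # 2
print(len(select(WITHOUT_IDS, "sparse_array")))  # 0
```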

glemaitre · 2023-09-14T09:10:11Z

The ids make the parameter used during the parametrization explicit in case the default names are cryptic. I would say that they are not necessary if the parameters already produce explicit, readable names in the pytest output.

StefanieSenger · 2023-09-14T09:13:14Z

I see, so it's easier to trace back failing tests. Thank you, @glemaitre.

OmarManzoor

LGTM. Thanks @StefanieSenger

glemaitre

Otherwise LGTM for the other tests.

glemaitre · 2023-09-14T16:20:51Z

sklearn/metrics/tests/test_pairwise.py

@@ -146,8 +152,8 @@ def test_pairwise_distances(global_dtype):

    # Test with sparse X and Y,
    # currently only supported for Euclidean, L1 and cosine.
-    X_sparse = csr_matrix(X)
-    Y_sparse = csr_matrix(Y)
+    X_sparse = csr_container(X)


Could we isolate the code that tests with sparse matrices?
It would avoid repeating the tests for the dense case multiple times.

@glemaitre, do you mean to turn this test into two separate tests?

Yes exactly.

Okay, I will try that. :)

Edit: It's done. :)
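A minimal sketch of the split glemaitre suggests, assuming recent numpy, scipy, scikit-learn and pytest. The test names with a `_sketch` suffix, the `_make_data` helper, and the random data are hypothetical; they are not the tests that were actually committed. `CSR_CONTAINERS` here is a local stand-in built the same way the thread describes.

```python
import numpy as np
import pytest
import scipy.sparse as sp
from numpy.testing import assert_allclose
from sklearn.metrics.pairwise import euclidean_distances, pairwise_distances

# Local stand-in for CSR_CONTAINERS: csr_array only exists on newer SciPy.
CSR_CONTAINERS = [sp.csr_matrix]
if hasattr(sp, "csr_array"):
    CSR_CONTAINERS.append(sp.csr_array)


def _make_data():
    rng = np.random.RandomState(0)
    return rng.random_sample((5, 4)), rng.random_sample((2, 4))


def test_pairwise_distances_dense_sketch():
    # Dense checks live here, so they run once instead of once per container.
    X, Y = _make_data()
    S = pairwise_distances(X, Y, metric="euclidean")
    assert_allclose(S, euclidean_distances(X, Y))


@pytest.mark.parametrize("csr_container", CSR_CONTAINERS)
def test_pairwise_distances_sparse_sketch(csr_container):
    # Only the sparse checks are repeated for every CSR container.
    X, Y = _make_data()
    X_sparse, Y_sparse = csr_container(X), csr_container(Y)
    S = pairwise_distances(X_sparse, Y_sparse, metric="euclidean")
    S2 = euclidean_distances(X_sparse, Y_sparse)
    assert_allclose(S, S2)
    # Sparse input should match the dense result.
    assert_allclose(S, euclidean_distances(X, Y))
```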

glemaitre · 2023-09-14T16:22:50Z

sklearn/metrics/tests/test_pairwise.py

-    X_sparse = csr_matrix(X)
-    Y_sparse = csr_matrix(Y)
+    X_sparse = csr_container(X)
+    Y_sparse = csr_container(Y)

    S = pairwise_distances(X_sparse, Y_sparse, metric="euclidean")
    S2 = euclidean_distances(X_sparse, Y_sparse)


In l. 168-169, I see the following:

S = pairwise_distances(X_sparse, Y_sparse.tocsc(), metric="manhattan")
S2 = manhattan_distances(X_sparse.tobsr(), Y_sparse.tocoo())

It means that we somehow expect to test other containers as well: BSR, COO, and CSC.
I assume we can just decorate the new test to parametrize over all of them. I don't think we need to mix the types of X and Y; using the same type for both should be enough.
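A sketch of that parametrization, assuming recent numpy, scipy, scikit-learn and pytest. `FORMATS`, `SPARSE_CONTAINERS` and the `_sketch` test name are hypothetical; the list covers the four formats named in the review plus the matching sparse array classes when SciPy provides them, and X and Y always share one container type as suggested.

```python
import numpy as np
import pytest
import scipy.sparse as sp
from numpy.testing import assert_allclose
from sklearn.metrics.pairwise import manhattan_distances, pairwise_distances

# Hypothetical container list: the matrix formats mentioned in the review,
# plus the matching sparse array classes when SciPy provides them.
FORMATS = ["csr", "csc", "bsr", "coo"]
SPARSE_CONTAINERS = [getattr(sp, f"{fmt}_matrix") for fmt in FORMATS] + [
    getattr(sp, f"{fmt}_array") for fmt in FORMATS if hasattr(sp, f"{fmt}_array")
]


@pytest.mark.parametrize("sparse_container", SPARSE_CONTAINERS)
def test_manhattan_sparse_formats_sketch(sparse_container):
    rng = np.random.RandomState(0)
    X, Y = rng.random_sample((5, 4)), rng.random_sample((2, 4))
    # Same container type for X and Y; no cross-format combinations.
    X_sparse, Y_sparse = sparse_container(X), sparse_container(Y)
    S = pairwise_distances(X_sparse, Y_sparse, metric="manhattan")
    S2 = manhattan_distances(X_sparse, Y_sparse)
    assert_allclose(S, S2)
    # Every sparse format should agree with the dense computation.
    assert_allclose(S, manhattan_distances(X, Y))
```

Keeping X and Y on the same type gives one test case per container rather than one per pair, which keeps the test count linear in the number of formats.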

Okay, I think I know what you mean here.

glemaitre

LGTM. Thanks @StefanieSenger


added scipy sparse type containers to tests

920cb4c

github-actions bot added the module:metrics label Sep 4, 2023

OmarManzoor added the No Changelog Needed label Sep 6, 2023

set ids conditionally on container length

d11a5c0

OmarManzoor previously approved these changes Sep 8, 2023

OmarManzoor added the Waiting for Second Reviewer label Sep 8, 2023

OmarManzoor reviewed Sep 8, 2023

Merge branch 'main' into sparse_array_test_pairwise

c934898

omitted unused ids

30e878a

ids added after discussion

455a880

OmarManzoor approved these changes Sep 14, 2023

glemaitre self-requested a review September 14, 2023 16:17

glemaitre reviewed Sep 14, 2023

changes after review

90c1c0b

OmarManzoor removed the Waiting for Second Reviewer label Sep 18, 2023

Tialo mentioned this pull request Sep 18, 2023

TST Extend tests for scipy.sparse.*array #27090

Closed

OmarManzoor mentioned this pull request Sep 27, 2023

TST Extend tests for scipy.sparse.*array in sklearn/metrics/tests/test_pairwise.py #27415

Closed

glemaitre approved these changes Sep 29, 2023

glemaitre merged commit df60c75 into scikit-learn:main Sep 29, 2023

StefanieSenger deleted the sparse_array_test_pairwise branch September 30, 2023 05:29

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

TST Extend tests for scipy.sparse.*array in test_pairwise.py (scikit-learn#27288)

d578d8d
