-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
TST Extend tests for scipy.sparse.*array
in test_pairwise.py
#27288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST Extend tests for scipy.sparse.*array
in test_pairwise.py
#27288
Conversation
The failing test from the CI ( |
I'm not maintainer but I think you can simply remove |
Thank you, @Charlie-XIAO I will try this. |
Hi @StefanieSenger, I think the ids are basically used for identification purposes, otherwise some defaults will be used depending on the data structure. For example with ids the results are like: test_pairwise.py::test_euclidean_distances_known_result[dense-dense] PASSED [ 11%]
test_pairwise.py::test_euclidean_distances_known_result[dense-sparse_matrix] PASSED [ 22%]
test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-dense] PASSED [ 44%]
test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-sparse_array] PASSED [ 66%] without ids the results are test_pairwise.py::test_euclidean_distances_known_result[array-array] PASSED [ 11%]
test_pairwise.py::test_euclidean_distances_known_result[array-csr_matrix] PASSED [ 22%]
test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-array] PASSED [ 44%]
test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-csr_array] PASSED [ 66%] So we can either remove these ids or conditionally set the ids based on the container length. |
Hi, @OmarManzoor, I see and to be on the secure side I did now set the ids conditionally based on whether the container contained more than one element. It seems that without passing the ids, these are named differently, but the combinations tested are the same.
"dense", "sparse_matrix", "sparse_array" translates directly to , "array", "csr_matrix", "csr_array". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @StefanieSenger. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a number of other instances as well where we might needs to set ids conditionally. Since they all involve CSR_CONTAINERS we can simply have a variable at the top like
container_ids = (
["dense", "sparse_matrix"] if len(CSR_CONTAINERS) == 1
else ["dense", "sparse_matrix", "sparse_array"]
)
and then use this in all the required places.
Or you can remove the ids if you prefer. Since I added the second reviewer tag, if another maintainer feels we need them, then they can be added.
Thank you, @OmarManzoor. I've taken a look into the pytests documentation. It seems that ids are strings, that pytest automatically assigns for identifying each test case, but we can also explicitly name them, if we want, for clarity or debugging. Since the ids are not further used in the code, I will omit them. |
@StefanieSenger You may want to look at this one: #27262 (comment) |
Ah, thank you, @Charlie-XIAO. I will keep them then. Do you know why? |
I think it's still for compatibility, for instance, one may have a script that selects tests to run based on names. But not sure, maybe @glemaitre? |
The |
It's easier to trace back failing tests then, I see. Thank you @glemaitre . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @StefanieSenger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM for the other tests.
@@ -146,8 +152,8 @@ def test_pairwise_distances(global_dtype): | |||
|
|||
# Test with sparse X and Y, | |||
# currently only supported for Euclidean, L1 and cosine. | |||
X_sparse = csr_matrix(X) | |||
Y_sparse = csr_matrix(Y) | |||
X_sparse = csr_container(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we isolate the code that test with the sparse matrices.
It would avoid to repeat the tests for the dense case multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre, do you mean to turn this test into two separate tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will try that. :)
Edit: It's done. :)
X_sparse = csr_matrix(X) | ||
Y_sparse = csr_matrix(Y) | ||
X_sparse = csr_container(X) | ||
Y_sparse = csr_container(Y) | ||
|
||
S = pairwise_distances(X_sparse, Y_sparse, metric="euclidean") | ||
S2 = euclidean_distances(X_sparse, Y_sparse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In l. 168-169, I see the following:
S = pairwise_distances(X_sparse, Y_sparse.tocsc(), metric="manhattan")
S2 = manhattan_distances(X_sparse.tobsr(), Y_sparse.tocoo())
It means that somehow, we expect to try with other containers as well, BSR, COO, and CSC.
I assume that we just decorate the new test by parametrizing for all combinations. I don't think that we need to mix the type of X and y, but try the same type for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I know what you mean here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @StefanieSenger
Reference Issues/PRs
Towards #27090.
What does this implement/fix? Explain your changes.
This PR substitutes scipy sparse matrices with the scipy containers introduced in #27095 in the
sklearn/metrics/tests/test_pairwise.py
test file.