8000 TST Extend tests for `scipy.sparse.*array` · Issue #27090 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

TST Extend tests for scipy.sparse.*array #27090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jjerphan opened this issue Aug 18, 2023 · 177 comments
Closed

TST Extend tests for scipy.sparse.*array #27090

jjerphan opened this issue Aug 18, 2023 · 177 comments
Labels
good first PR to review Simple atomic PR to review Meta-issue General issue associated to an identified list of tasks module:test-suite everything related to our tests Sprint

Comments

@jjerphan
Copy link
Member
jjerphan commented Aug 18, 2023

SciPy sparse matrices (i.e. scipy.sparse.*matrix) are tested but their sparse arrays counterpart (i.e. scipy.sparse.*array) aren't yet will become ubiquitous (see #26418).

Tests and their parameterizations (when they exist) must be adapted to include scipy.sparse.*array conditionally to versions of SciPy that support them (i.e. scipy>=1.8).

Steps

ℹ️ You can take #27095 as an example for your PRs.

1. Choose one of the files to adapt

  • Indicate the file you are adapting (later referred to as <filename>) on this issue so no-one ends up duplicating work. E.g.
Hi, I am starting working on `<filename>`.

Please double check as this list might be outdated, but candidate files might be:

2. Perform the following (non-exhaustive) changes for <filename>

  • Add test parametrization for tests using *_matrix (for instance see eda7b16)
  • Adapt existing test parametrization to test against sparse array (for instance see 6717f99)
  • Extend or add test checking that type of sparse containers are preserve by functions. E.g. adding tests like (as proposed by TST Extend tests for scipy.sparse.*array #27090 (comment)):
    sparse_input = sparse_container(...)
    sparse_output = process_sparse_container(sparse_input)
    
    assert isinstance(sparse_output, type(sparse_input))
  • Run tests for this file:
    # Run all the test in <filename>
    pytest <filename> -k test_to_run
    
    # Run a specific test in this file
    pytest <filename> -k test_to_run
    
  • If tests fail, the tested implementations has to be adapted (let us know if you need help)

3. Create a pull request

  • Use this title:
TST Extend tests for `scipy.sparse.*array` in `<filename>`
  • Mention this issue in its description:
Towards #27090.

4. Once the pull request is created

  • Ignore the error on the CI related the changelog: those PR need not update the changelog since nothing has changed for the users (until this issue is resolved), and maintainers must label PRs as "No Changelog Needed" for the check to be deactivated.
@jjerphan jjerphan added Meta-issue General issue associated to an identified list of tasks good first issue Easy with clear instructions to resolve labels Aug 18, 2023
@jjerphan
Copy link
Member Author

Hi, I am starting working on sklearn/cluster/tests/test_affinity_propagation.py.

@jjerphan jjerphan changed the title TST Extend test for scipy.sparse.*array TST Extend tests for scipy.sparse.*array Aug 18, 2023
@jjerphan jjerphan added the module:test-suite everything related to our tests label Aug 18, 2023
@msgomez06
Copy link
Contributor

Hi! I'm starting work on sklearn/cluster/tests/test_dbscan.py

@msgomez06
Copy link
Contributor

Hi! I'm starting work on sklearn/cluster/tests/test_birch.py

@Kislovskiy
Copy link
Contributor
Kislovskiy commented Aug 18, 2023

Hi, I am starting working on sklearn/cluster/tests/test_bisect_k_means.py.

@marenwestermann
Copy link
Member

Working on sklearn/cluster/tests/test_optics.py

@vvkrddy
Copy link
Contributor
vvkrddy commented Aug 18, 2023

I am working on sklearn/cluster/tests/test_hierarchical.py.

@Kislovskiy
Copy link
Contributor

I am working on sklearn/cluster/tests/test_k_means.py.

@ivirshup
Copy link
Contributor

@jjerphan I think we should add a step for testing that types are maintained in cases where a sparse array can be returned. E.g.:

f(X: *_array, ...) -> *_array
f(X: *_matrix, ...) -> *_matrix

@jjerphan
Copy link
Member Author
jjerphan commented Aug 18, 2023

Indeed; thanks for mentioning that, @ivirshup. I updated the issue's description accordingly.

@ivirshup
Copy link
Contributor

If we have to modify the code to support the *_array classes, is this considered a fix or a feature? Wondering where this goes in the changelog.

@jjerphan
Copy link
Member Author

I would consider this support and an enhancement, I think.

@andyscanzio
Copy link
Contributor

Hi, I am starting working on sklearn/cluster/tests/test_hdbscan.py.

@naman1608
Copy link
Contributor

Hi, I am starting working on sklearn/svm/tests/test_bounds.py.

@msgomez06
Copy link
Contributor

Hi! I'm starting work on sklearn/compose/tests/test_column_transformer.py

@work-mohit
Copy link
Contributor

Hey ! Can anyone help me with the setup of the project. When I run the pytest command I'm facing import error conflict between the sklearn that I installed and the one I'm working with.

ImportError while loading conftest
'G:<some folder>\scikit-learn\sklearn\conftest.py'.
_pytest.pathlib.ImportPathMismatchError: ('sklearn.conftest', 'C:\Users\\anaconda3\lib\site-packages\sklearn\conftest.py', WindowsPath('G://scikit-learn/sklearn/conftest.py'))

@jjerphan
Copy link
Member Author
jjerphan commented Oct 1, 2023

I think there is nothing to do for test_samples_generator, is this correct, @jjerphan?

In test_make_multilabel_classification_return_indicator_sparse make_multilabel_classification returns Y as a sparse matrix of type csr_matrix. The only part in this test that relies on Y's sparse type is when assert sp.issparse(Y) is checked, but it will also return True if Y later becomes of type csr_array.

The same applies for test_make_sparse_spd_matrix, which is in the making(#27438). assert sp.issparse(X) here also works with changing sparse data type.

And we won't parametrize any sparse container here, because sp.issparse works either way. Is this right?

@StefanieSenger: I just had a look, and your analysis is right: nothing to be done for test_samples_generator.py; I will remove it from the list. This might be the case for other files.

@StefanieSenger
Copy link
Contributor

Thank you, @jjerphan :)

@jjerphan
Copy link
Member Author
jjerphan commented Oct 1, 2023

This also was the case for other files, and I removed their items from the list.

@KartikeyBartwal
Copy link

Hi, I am starting working on sklearn/svm/tests/test_sparse.py.

@work-mohit
Copy link
Contributor

Hi, I am starting working on sklearn/svm/tests/test_sparse.py.

Hey @KartikeyBartwal , I already did some work on this previously.
If you wanna help please let me know, I'm making a draft PR , will mention you.

@itrajkov
Copy link
itrajkov commented Oct 1, 2023

Hey, I'm starting work on sklearn/manifold/tests/test_spectral_embedding.

@Charlie-XIAO
Copy link
Contributor

@itrajkov I'm sorry but I've already have an open PR for that, see #27240.

@itrajkov
Copy link
itrajkov commented Oct 1, 2023

Oh my bad!

@KartikeyBartwal
Copy link

Hi, I am starting working on sklearn/svm/tests/test_sparse.py.

Hey @KartikeyBartwal , I already did some work on this previously. If you wanna help please let me know, I'm making a draft PR , will mention you.

Please share the updates from your side. I'll try to contribute to the PR

@work-mohit
Copy link
Contributor

Hi, I am starting working on sklearn/svm/tests/test_sparse.py.

Hey @KartikeyBartwal , I already did some work on this previously. If you wanna help please let me know, I'm making a draft PR , will mention you.

Please share the updates from your side. I'll try to contribute to the PR
#27511

@jjerphan
Copy link
Member Author

After a brief discussion with @Charlie-XIAO in #27723 (comment), we might want to clarify the changelog entry which has been introduced to clarify explicitly that all the interfaces supporting sparse matrices now explicitly support sparse areays.

What do you think?

@StefanieSenger
Copy link
Contributor

After a brief discussion with @Charlie-XIAO in #27723 (comment), we might want to clarify the changelog entry which has been introduced to clarify explicitly that all the interfaces supporting sparse matrices now explicitly support sparse areays.
What do you think?

I agree, with modifying the tests, the support for sparse arrays is made sure, but not introduced. Changing this phrase from the changelog maybe, into "now tested to support SciPy sparse array"?

Several estimators are now supporting SciPy sparse arrays. The following functions and classes are impacted:

@jjerphan
Copy link
Member Author

When this issue is closed, I think we can adapt the changelog entry to mention that all interfaces supporting sparse matrices now support sparse arrays (potentially removing the enumeration of interfaces if it is too verbose).

What do you think? Is this sufficient?

@y-vectorfield
Copy link

Hello, I will be starting working on sklearn/utils/_testing.py.
@jjerphan, CC) @norbusan, this is my first contribution for sklearn.
If I meet with trouble, can you advise me??

@jjerphan
Copy link
Member Author

Hello @y-vectorfield,

Thank you. Yes, we will be there to help you.

@y-vectorfield
Copy link

Hello, @jjerphan,
I modified tests in sklearn/utils/_testing.py and submitted a draft version of PR(#27847).
I have two questions.
First: Is my modification correct?
I modified the codes as follows.
https://github.com/scikit-learn/scikit-learn/pull/27847/files

Second: I could not test using pytest.
I implemented the modified codes, however, I could not run the tests...

$ pytest sklearn/utils/_testing.py 
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.8.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/skuser/dev/scikit-learn
configfile: setup.cfg
plugins: cov-4.1.0
collected 2 items                                                                                                                                                                                    

sklearn/utils/_testing.py ss                                                                                                                                                                   [100%]

========================================================================================= 2 skipped in 0.07s =========================================================================================

@wmin401
Copy link
wmin401 commented Feb 19, 2024

I made PRs on sklearn/utils/tests/test_param_validation.py #27317.
I am first-timer and it is my first PRs. I would appreciate it if you give me a advice with my PRs.
I extended the existing test to cover scipy.sparse.*array objects and changed the parameter validation from sparse container where necessary.

Here is my PRs below:
#28456

@Charlie-XIAO
Copy link
Contributor

@wmin401 I'm sorry but there is #27950 open for that file already. I'm afraid there are no more tasks left for this issue.

Perhaps @jjerphan the "good first issue" tag should be removed now?

@jjerphan jjerphan removed the good first issue Easy with clear instructions to resolve label Feb 21, 2024
@Charlie-XIAO
Copy link
Contributor
Charlie-XIAO commented Feb 28, 2024

The last PRs have been merged. Thanks a lot everyone!

@jjerphan I think all tasks here are completed. Should we close now?

@jjerphan
Copy link
Member Author

Thank you for the heads-up, @Charlie-XIAO !

I will take some time to double-check it before we close it as completed.

@jjerphan
Copy link
Member Author
jjerphan commented Mar 1, 2024

Everything looks good to me.

Thank you everyone for your involvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first PR to review Simple atomic PR to review Meta-issue General issue associated to an identified list of tasks module:test-suite everything related to our tests Sprint
Projects
None yet
Development

No branches or pull requests

0