TST Fix unreachable code in tests by VarIr · Pull Request #16110 · scikit-learn/scikit-learn · GitHub

TST Fix unreachable code in tests #16110


Merged
20 commits merged into scikit-learn:master on Feb 16, 2020

Conversation

@VarIr (Contributor) commented Jan 13, 2020

Reference Issues/PRs

Fixes #14231

What does this implement/fix? Explain your changes.

Tests can contain unreachable code, which then produces false positives.
This PR aims to fix as many of these issues as possible.

Any other comments?

Since this PR covers nearly all subpackages of sklearn, I will add fixes for one subpackage at a time, starting with cluster.
Primarily, I aim at red lines in the codecov report. Occasionally, I might fix additional issues, e.g. when loop variables are not actually used (for example, in test_project_and_cluster).

@VarIr (Contributor, Author) commented Jan 13, 2020

@rth Ping, as requested in #14231. It took me some time to start working on this, but I will now try to incrementally fix these issues.

@jnothman (Member)

Thanks for this @VarIr. Let us know when you want review.

@VarIr (Contributor, Author) commented Jan 21, 2020

Sure, I'll first go through all the packages. There will certainly be a number of things to discuss, then.

@VarIr (Contributor, Author) commented Jan 27, 2020

Only a handful of "red lines" actually seem to risk false-positive tests (e.g. in test_bicluster.py, test_fastica.py, test_text.test_countvectorizer_custom_vocabulary_repeated_indices, test_dist_metrics.py, test_reingold_tilford.py). I think it's still reasonable to fix the other red lines, because they might hint at tests that are not easily comprehensible, or would at least make it harder to find the true false-positive tests in the future.

A number of changes are debatable.

  • Added # pragma: no cover for lines that are expected never to be reached (say, for example, predict on a MockClassifier that is tested to raise an error upon incorrect input to fit). A minimal sketch of this pattern follows this list.
  • Left several # XXX unreached code as of v0.22 comments. These mostly indicate that I could not deduce what should be done. Any advice on individual occurrences is highly appreciated.
  • Removed module-level pytest.importorskip. These can have side effects on other tests that import from the module (e.g. test_docstring_parameters.test_tabs) when lightgbm or cython are not available.
  • Removed behavior specific to old versions of some requirements, whenever they are below the now required version (e.g. scipy < 0.17 in test_impute.py and others).
  • test_label_propagation.test_distribution: I think the "correct" figures are wrong. Please double-check.
  • test_extmath.py: the test for unstable variance might relate to ENH: implement pairwise summation numpy/numpy#3685, which was addressed in NumPy 1.9. We already require 1.11+.
  • Marked duplicated helper functions in multiple neighbors test modules. Better to import them from one common module?
  • Many tests in datasets are not covered, because they require downloading the data. I don't know if there is any mechanism in place to run these semi-automatically/manually from time to time. Perhaps a CI job could be created that specifically runs these and caches the data?
  • Fixed several unused imports/variables, typos, PEP8 issues, etc. Please indicate if this would better be done in a separate PR. For now, I consider this PR general maintenance of the tests for higher code quality, in which case those fit.
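
For context, a minimal sketch of the pragma pattern from the first bullet above (the class and test names here are illustrative, not the exact code from this PR):

import pytest


class MockClassifier:
    """Minimal stand-in estimator used only to test error handling."""

    def fit(self, X, y):
        # The test feeds deliberately invalid input, so fit always raises.
        raise ValueError("Invalid input")

    def predict(self, X):
        # Never reached in the test suite, because fit raises before any
        # predict call; excluded from coverage so it is not a red line.
        return X  # pragma: no cover


def test_fit_raises_on_invalid_input():
    clf = MockClassifier()
    with pytest.raises(ValueError, match="Invalid input"):
        clf.fit([[0.0]], [0])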

@VarIr (Contributor, Author) commented Jan 27, 2020

Ready for review @jnothman @rth

@rth (Member) left a comment

Thanks for this PR @VarIr! This indeed addresses a number of genuine issues in tests. Comments are below. I think you can change the label from WIP to MRG, as this PR is long enough and we probably shouldn't add more things to it even if we could.

For the pragma: no cover comments, I would personally remove them, but I can be convinced otherwise.

.coveragerc (outdated)

[report]
exclude_lines =
    pragma: no cover
Member

Not fully sure how it works, but I think it's fine to count uncovered lines in the coverage report. There are a lot of other lines that could be excluded from coverage, and I don't think we should go down that road.

Contributor Author

The main idea behind it is to make a future second pass of this kind much faster (such a pass involves opening all modules that show >0 uncovered lines in the coverage report, and there are quite a few). Granted, this may be too little benefit to justify the change.

Member

OK, I have mixed feelings about it. Let's leave pragma: no cover as is for now, and see what the second reviewer thinks about it.


     def predict(self, X):
-        return X
+        return X  # pragma: no cover
Member

I find the pragma: no cover confusing here (and above). It looks like it's never called, granted, but there is no reason to explicitly exclude it from coverage as that sends a confusing message.

Contributor Author

I'm wondering, would it be acceptable if it had a more explicit form? Something along the lines of # pragma: unreachable but required test statement (or, preferably, some shorter version)?

@@ -55,7 +55,7 @@ def decode_column(data_bunch, col_idx):
     sparse = data_description['format'].lower() == 'sparse_arff'
     if sparse is True:
         raise ValueError('This test is not intended for sparse data, to keep '
-                         'code relatively simple')
+                         'code relatively simple')  # pragma: no cover
Member

Let's not use pragma: no cover in tests in general.

@@ -1586,7 +1582,8 @@ def test_SGDClassifier_fit_for_all_backends(backend):
     # buffer.

     if joblib.__version__ < LooseVersion('0.12') and backend == 'loky':
-        pytest.skip('loky backend does not exist in joblib <0.12')
+        pytest.skip(
+            'loky backend does not exist in joblib <0.12')  # pragma: no cover
Member

Please revert, and remove the pragma.

@rth (Member) commented Jan 29, 2020

Removed module-level pytest.importorskip. These can have side effects on other tests that import from the module (e.g. test_docstring_parameters.test_tabs) when lightgbm or cython are not available.

Good point! I would still expect that we shouldn't import from test modules. In the particular case of test_docstring_parameters and test_tabs, I think we should explicitly exclude test modules.
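
For illustration, a minimal sketch of the two patterns (the module and test names here are hypothetical):

import pytest

# Module-level skip (the pattern removed in this PR): if lightgbm is missing,
# pytest raises Skipped at import time, so the whole module fails to import --
# and so does any other test module that imports a helper from this file.
#
#     lightgbm = pytest.importorskip("lightgbm")


def test_lightgbm_interop():
    # Per-test skip: only this test is skipped when lightgbm is missing,
    # while helpers in this module stay importable from other test modules.
    lightgbm = pytest.importorskip("lightgbm")
    assert lightgbm is not None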

test_label_propagation.test_distribution: I think the "correct" figures are wrong. Please double-check.

The change you proposed looks good to me.

Marked duplicated helper functions in multiple neighbors test modules. Better to import them from one common module?

Yes, some of it was done in #15148, but some of the redundancy remains. You would be very welcome to open another PR for that refactoring.

Many tests in datasets are not covered, because they require downloading the data. I don't know if there is any mechanism in place to run these semi-automatically/manually from time to time. Perhaps a CI job could be created that specifically runs these and caches the data?

We could run them in the daily cron job. Please open an issue about it if you have double-checked that they are not run in any CI at the moment.

Fixed several unused imports/variables, typos, PEP8 issues, etc. Please indicate if this would better be done in a separate PR. For now, I consider this PR general maintenance of the tests for higher code quality, in which case those fit.

Yes, it's better to avoid cosmetic changes in general (see #11336). The changes you have already made in this PR we can keep.

@VarIr changed the title from [WIP] Fix unreachable code in tests to [MRG] Fix unreachable code in tests on Jan 29, 2020
@VarIr (Contributor, Author) commented Jan 31, 2020

Good point! I would still expect that we shouldn't import from test modules. In the particular case of test_docstring_parameters and test_tabs, I think we should explicitly exclude test modules.

Wouldn't we still want to enforce a common code style throughout the library and the tests? I think it's cleaner to always avoid module-level importorskip instead. Btw, test_tabs seems out of place in test_docstring_parameters.py. I wonder whether it should live in its own module, like test_source_files.py or similar.

@rth (Member) left a comment


LGTM overall, with a +0 on the particular aspect of adding pragma: no cover: I'm concerned about its long-term maintainability (i.e. the pragmas would tend to get outdated as the code changes, and incorrect comments are worse than no comments).

When there is a second approval we can merge. Thanks @VarIr !

Wouldn't we still want to enforce a common code style throughout the library and the tests? I think it's cleaner to always avoid module-level importorskip instead.

OK, I don't mind either way.

@thomasjpfan (Member) commented Feb 2, 2020

It looks like all the use cases for # pragma: no cover are for test code that we never expect to run, and never in non-testing code. Is this correct?

If so, we should document how we expect to use # pragma: no cover for future contributors.

@VarIr (Contributor, Author) commented Feb 3, 2020

@thomasjpfan Yes, that was exactly my intention. I would use the pragma in tests only. Used otherwise, it would most likely shadow missing test scenarios.

In tests, however, the pragma can help contributors focus on actual issues, for example when following the workflow of improving unit-test coverage. I'll add docs for this.

@thomasjpfan (Member)

Thinking it over, I am also concerned about the maintainability of # pragma: no cover. Keeping codecov happy is something that seems likely to slip over time.

This is specifically an issue because we run coverage on the tests. For reference, pandas omits the test files from coverage. We discussed removing test coverage in #14066 and in the end decided to keep it.

I am a +0.5 on this PR with the #pragmas included. Without the #pragmas, I would be +1.
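
For reference, a minimal sketch of the pandas-style alternative mentioned above, i.e. omitting test files from coverage measurement entirely (not what scikit-learn does; the glob pattern here is illustrative):

# .coveragerc -- exclude test files from coverage measurement
[run]
omit =
    */tests/*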

@VarIr (Contributor, Author) commented Feb 10, 2020

In this case, I'll remove the pragmas for now, to make the rest of this PR happen.

@jnothman (Member) left a comment

Nice work. Partial review, up to sklearn/tests/test_base.py.

@@ -102,6 +102,7 @@ def _fetch_dataset_from_openml(data_id, data_name, data_version,
     assert data_by_id.target.shape == (expected_observations,
                                        len(target_column))
     assert data_by_id.target_names == target_column
+    # TODO Use expected_data_dtype here?
Member

It seems to otherwise go unused? So either that, or expected_data_dtype should be removed...?

Contributor Author

Yes, let's use it.
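
For concreteness, a hypothetical one-liner along those lines, mirroring the surrounding asserts (not necessarily the exact change that landed):

    assert data_by_id.data.dtype == expected_data_dtype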

@@ -42,7 +42,7 @@ def _check_with_col_sign_flipping(A, B, tol=0.0):
                         (((A[:, column_idx] +
                            B[:, column_idx]) ** 2).mean() <= tol ** 2))
        if not sign:
            return False
Member

Better to change this function to be assert_equal_with_sign_flipping.

The function is awkward because it keeps sign from iteration to iteration, but we know at the start of each iteration that sign is True.

So it would be sufficient to have something like

def assert_equal_with_sign_flipping(A, B, tol=0.0):
    for A_col, B_col in zip(A.T, B.T):
        assert (np.mean((A_col - B_col) ** 2) <= tol ** 2
                or np.mean((A_col + B_col) ** 2) <= tol ** 2)

Contributor Author

Nice. I'll make it np.max though, as we might miss outliers otherwise.
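
A sketch of that variant (the function name follows the suggestion above; swapping np.mean for np.max makes the check sensitive to the single worst element rather than the column average):

import numpy as np


def assert_equal_with_sign_flipping(A, B, tol=0.0):
    # Each column of A must match the corresponding column of B up to a
    # global sign flip of that column, within tolerance tol.
    for A_col, B_col in zip(A.T, B.T):
        assert (np.max((A_col - B_col) ** 2) <= tol ** 2
                or np.max((A_col + B_col) ** 2) <= tol ** 2)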

@@ -43,9 +43,8 @@ def test_distribution():
    for estimator, parameters in ESTIMATORS:
        clf = estimator(**parameters).fit(samples, labels)
        if parameters['kernel'] == 'knn':
            continue  # unstable test; changes in k-NN ordering break it
Member

I'm a little worried that changes like this might break on platforms checked in the release process.

Contributor Author

I'll revert these changes. I'm not confident about them anyway, since I don't really know the algorithm well.

@rth (Member) commented Feb 16, 2020

I am a +0.5 on this PR with the #pragmas included. Without the #pragmas, I would be +1.

The #pragmas were removed, merging with +2 and a partial positive review by Joel.


@rth changed the title from [MRG] Fix unreachable code in tests to TST Fix unreachable code in tests on Feb 16, 2020
@rth merged commit 9b39c4c into scikit-learn:master on Feb 16, 2020
@rth (Member) commented Feb 16, 2020

Thanks @VarIr !

@VarIr deleted the unreachable branch on February 16, 2020 at 15:29
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020