TST Fix unreachable code in tests #16110
Conversation
…al), manifold, metrics, mixture
Thanks for this @VarIr. Let us know when you want a review.
Sure, I'll first go through all the packages. There will certainly be a number of things to discuss then.
Only a handful of "red lines" actually seem to risk false-positive tests (e.g. in …). A number of changes are debatable.
Thanks for this PR @VarIr! This indeed addresses a number of genuine issues in tests. Comments are below. I think you can replace the WIP label with MRG, as this PR is long enough and we probably shouldn't add more things to it even if we could.
Regarding the pragma: no cover, I would personally remove it, but I can be convinced otherwise.
.coveragerc (outdated diff)

    [report]
    exclude_lines =
        pragma: no cover
I'm not fully sure how it works, but I think it's fine to count uncovered lines in the coverage report. There are a lot of other lines that could be excluded from coverage, and I don't think we should go down that road.
The main idea behind it is to make a second pass over this PR in the future much faster (that involves opening all modules that show >0 uncovered lines in the coverage report, which are quite a few). Granted, this may be too little benefit to justify the change.
OK, I have mixed feelings about it. Let's leave pragma: no cover as is for now, and see what the second reviewer thinks about it.
        def predict(self, X):
    -        return X
    +        return X  # pragma: no cover
I find the pragma: no cover confusing here (and above). Granted, it looks like it's never called, but there is no reason to explicitly exclude it from coverage, as that sends a confusing message.
I'm wondering, would it be acceptable if it had a more explicit form, along the lines of # pragma: unreachable but required test statement (or, preferably, some shorter version)?
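
For illustration, a minimal sketch of how such a marker could work; the stub class and the marker wording below are placeholders, not code from this PR. coverage.py matches each exclude_lines entry as a regex against source lines, so the custom marker would also need a matching entry in .coveragerc alongside the default pragma.

    # Hypothetical .coveragerc entries (shown as a comment, not actual config):
    #     [report]
    #     exclude_lines =
    #         pragma: no cover
    #         pragma: unreachable but required test statement

    class StubEstimator:
        """Hypothetical stub used only to satisfy an estimator API check in a test."""

        def fit(self, X, y=None):
            return self

        def predict(self, X):
            # Required by the estimator API, but never actually called by the test.
            return X  # pragma: unreachable but required test statement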
    @@ -55,7 +55,7 @@ def decode_column(data_bunch, col_idx):
         sparse = data_description['format'].lower() == 'sparse_arff'
         if sparse is True:
             raise ValueError('This test is not intended for sparse data, to keep '
    -                         'code relatively simple')
    +                         'code relatively simple')  # pragma: no cover
Let's not use pragma: no cover in tests in general.
    @@ -1586,7 +1582,8 @@ def test_SGDClassifier_fit_for_all_backends(backend):
         # buffer.

         if joblib.__version__ < LooseVersion('0.12') and backend == 'loky':
    -        pytest.skip('loky backend does not exist in joblib <0.12')
    +        pytest.skip(
    +            'loky backend does not exist in joblib <0.12')  # pragma: no cover
Please revert, and remove the pragma.
Good point! I would still expect that we shouldn't import from test modules. In the particular case of test_docstring_parameters and test_tabs, I think we should explicitly exclude test modules.
The change you proposed looks good to me.
Yes, some of it was done in #15148, but some of the redundancy remains. You would be very welcome to open another PR for that refactoring.
We could run them in the daily cron job. Please open an issue about it if you have double-checked that they are not run in any CI at the moment.
Yes, it's better to avoid cosmetic changes in general (see #11336). We can keep the changes you have already made in this PR.
Wouldn't we still want to enforce a common code style throughout the library and the tests? I think it's cleaner to always avoid module-level importorskip instead. Btw, …
LGTM overall, with a +0 for the particular aspect of adding pragma: no cover: I'm concerned about its long-term maintainability (i.e. they would tend to get outdated with code changes, and incorrect comments are worse than no comments).
When there is a second approval, we can merge. Thanks @VarIr!
Wouldn't we still want to enforce a common code style throughout the library and the tests? I think it's cleaner to always avoid module-level importorskip instead.
OK, I don't mind either way.
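
For context, a small sketch of the two styles under discussion (the module names are only examples): pytest.importorskip returns the imported module, and when used at module level it skips every test in the file if the import fails.

    import pytest

    # Module-level use: all tests in this file are skipped if pandas is missing.
    pd = pytest.importorskip("pandas")


    def test_needs_matplotlib():
        # Function-level use: only this test is skipped if matplotlib is missing.
        plt = pytest.importorskip("matplotlib.pyplot")
        assert plt is not None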
It looks like all the use cases for pragma: no cover are in tests. If so, we should document how we expect to use it.
@thomasjpfan Yes, that was exactly my intention. I would use the pragma in tests only. Used otherwise, it would most likely shadow missing test scenarios. In tests, however, the pragma can help contributors focus on actual issues, for example when following the workflow to improve test coverage for unit tests. I'll add docs for this.
Thinking it over, I am also concerned with the maintainability of pragma: no cover. This is specifically an issue because we run coverage on the tests. For reference, pandas omits the test files from coverage. We have discussed removing test coverage in #14066, and in the end we decided to keep it. I am a +0.5 on this PR with the pragma: no cover comments included.
In this case, I'll remove the pragmas for now, to make the rest of this PR happen.
Nice work. Partial review up to sklearn/tests/test_base.py
    @@ -102,6 +102,7 @@ def _fetch_dataset_from_openml(data_id, data_name, data_version,
         assert data_by_id.target.shape == (expected_observations,
                                            len(target_column))
         assert data_by_id.target_names == target_column
    +    # TODO Use expected_data_dtype here?
It seems to go unused otherwise. So either that, or expected_data_dtype should be removed?
Yes, let's use it.
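
Presumably something along these lines; the .data attribute and the dtype comparison are an assumption based on the Bunch returned by the OpenML fetcher, not the exact line from the PR.

    # Check the dtype of the fetched data with the previously unused parameter.
    assert data_by_id.data.dtype == expected_data_dtype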
    @@ -42,7 +42,7 @@ def _check_with_col_sign_flipping(A, B, tol=0.0):
                    (((A[:, column_idx] +
                       B[:, column_idx]) ** 2).mean() <= tol ** 2))
            if not sign:
                return False
Better to change this function to assert_equal_with_sign_flipping. The function is awkward because it keeps sign from iteration to iteration, but we know at the start of each iteration that sign is True. So it would be sufficient to have something like:

    def assert_equal_with_sign_flipping(A, B, tol=0.0):
        for A_col, B_col in zip(A.T, B.T):
            assert (np.mean((A_col - B_col) ** 2) <= tol ** 2 or
                    np.mean((A_col + B_col) ** 2) <= tol ** 2)
Nice. I'll make it np.max though, as we might miss outliers otherwise.
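
For concreteness, a sketch of what that np.max variant might look like (the function name and exact tolerance handling in the final PR may differ):

    import numpy as np


    def assert_equal_with_sign_flipping(A, B, tol=0.0):
        """Assert that columns of A and B are equal up to a per-column sign flip."""
        for A_col, B_col in zip(A.T, B.T):
            assert (np.max((A_col - B_col) ** 2) <= tol ** 2 or
                    np.max((A_col + B_col) ** 2) <= tol ** 2)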
    @@ -43,9 +43,8 @@ def test_distribution():
        for estimator, parameters in ESTIMATORS:
            clf = estimator(**parameters).fit(samples, labels)
            if parameters['kernel'] == 'knn':
                continue  # unstable test; changes in k-NN ordering break it
I'm a little worried that changes like this might break on platforms checked in the release process.
I'll revert these changes. I'm not confident about them anyway, since I don't really know the algorithm well.
The …
Thanks @VarIr!
Reference Issues/PRs
Fixes #14231
What does this implement/fix? Explain your changes.
Tests can contain unreachable code, which then produces false positives: a test appears to pass although some of its checks are never actually executed.
This PR aims at fixing as many of these issues as possible.
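
A hypothetical illustration of the kind of unreachable code meant here (not taken from scikit-learn): inside a pytest.raises block, any statement after the raising call never executes, so the test passes without ever running that check.

    import pytest


    def divide(a, b):
        return a / b


    def test_divide_by_zero():
        with pytest.raises(ZeroDivisionError):
            divide(1, 0)
            # Unreachable: the line above already raised, so this assertion can
            # never fail and gives a false sense of test coverage.
            assert divide(4, 2) == 2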
Any other comments?
Since this PR covers nearly all subpackages of sklearn, I will add fixes for one subpackage at a time, starting with cluster. Primarily, I aim at red lines in the codecov report. Occasionally, I might fix additional issues, e.g. when variables in loops are not actually used (for example, in test_project_and_cluster).