[MRG+2] catch more expected warnings in common tests by ShangwuYao · Pull Request #11151 · scikit-learn/scikit-learn

[MRG+2] catch more expected warnings in common tests #11151


Merged
merged 18 commits into scikit-learn:master on Jun 11, 2018

Conversation

ShangwuYao
Contributor
@ShangwuYao ShangwuYao commented May 27, 2018

Reference Issues/PRs

Fixes #11109. See also #7006.

What does this implement/fix? Explain your changes.

  1. Ignore expected warnings in the common tests, including DeprecationWarning, ConvergenceWarning, UserWarning and FutureWarning.
  2. Remove the unnecessary warnings.resetwarnings() call in clean_warning_registry().
  3. Change ignore_warnings to ignore both local and external warnings, like numpy's suppress_warnings.
  4. Fix a bug when using ignore_warnings as a context manager: restore the warning filters upon exiting so that non-local filters are not affected.

Any other comments?

@ShangwuYao changed the title from "ignoring expected warnings" to "catch more expected warnings in common tests" on May 27, 2018
@rth
Member
rth commented May 27, 2018

Thanks for this PR @ShangwuYao !

ignore_warnings will not work as a decorator in combination with pytest parametrization on Python 2, I think.

You could use it as a context manager (with ignore_warnings():) inside the test, though.
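For illustration, a minimal sketch of that usage; the estimator and the toy data here are placeholders, not from this PR:

from sklearn.linear_model import LogisticRegression
from sklearn.utils.testing import ignore_warnings  # its location at the time of this PR

def test_some_check():
    # expected warnings raised inside the block are suppressed locally
    with ignore_warnings(category=(DeprecationWarning, UserWarning)):
        LogisticRegression().fit([[0.], [1.], [2.], [3.]], [0, 0, 1, 1])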

@ShangwuYao
Contributor Author
ShangwuYao commented May 27, 2018

Thanks for the clarification @rth. I have tried several different ways of using ignore_warnings, including as a decorator and as a context manager, but whichever way I do it, I still get those warnings when running pytest on my local machine. Do you know why that is?

@rth
Member
rth commented May 27, 2018

Yes, I wonder if pytest warning capture interferes with ignore_warnings.

You could also try to use the pytest warnings filter mechanism, approximately,

ignore_common_warnings = pytest.mark.filterwarnings(
     'ignore:.*:UserWarning',
     'ignore:.*:ConvergenceWarning',
     ...)


@ignore_common_warnings
def some_test_function():
    pass

(I think, pytest filterwarnings arguments are not too well documented)
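For reference, a sketch of the same idea using the filter-string syntax pytest documents ("action:message:category:module:lineno"); the sklearn.exceptions import path for ConvergenceWarning is an assumption here:

import pytest

ignore_common_warnings = pytest.mark.filterwarnings(
    'ignore::UserWarning',
    'ignore::DeprecationWarning',
    'ignore::sklearn.exceptions.ConvergenceWarning',  # assumed import path
)

@ignore_common_warnings
def test_some_function():
    pass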

@ShangwuYao
Contributor Author
ShangwuYao commented May 28, 2018

Yeah, I tried that; it didn't work either.

The next thing I tried was comparing this file (test_common.py) with other test files. I found that ignore_warnings works everywhere except in test_non_meta_estimators and test_no_attributes_set_in_init, and those two both use pytest.mark.parametrize, so I think that might be why ignore_warnings is not working. So I went one level lower in test_non_meta_estimators, for example to check_classifier_data_not_an_array in estimator_checks.py, and added this decorator to the method:

@ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                           UserWarning, FutureWarning))

And it did help, but I think I should only ignore those warnings in the common tests, so I am still looking into how to deal with the pytest.mark.parametrize problem.

@ShangwuYao
Contributor Author

Can you help me reopen it, @rth? I was trying to roll back and commit. Thanks!

@jnothman
Member

I'm not sure what's happened, but the Reopen button is disabled.

@ShangwuYao ShangwuYao reopened this May 30, 2018
@ShangwuYao changed the title from "catch more expected warnings in common tests" to "[WIP] catch more expected warnings in common tests" on Jun 5, 2018
@ShangwuYao
Contributor Author
import warnings
from sklearn.utils.testing import ignore_warnings

def controlled_warning_context_manager():
    with ignore_warnings():
        warnings.warn('yo', UserWarning)
        print(warnings.filters)

@ignore_warnings
def controlled_warning_decorator():
    warnings.warn('yo', UserWarning)
    print(warnings.filters)

Using ignore_warnings as a context manager:

warnings.resetwarnings()
warnings.simplefilter('ignore', UserWarning)
print(warnings.filters)
controlled_warning_context_manager()
print(warnings.filters)

Output:

[('ignore', None, <class 'UserWarning'>, None, 0)]
[('ignore', None, <class 'Warning'>, None, 0), ('ignore', None, <class 'UserWarning'>, None, 0)]
[('ignore', None, <class 'UserWarning'>, None, 0)]

Using ignore_warnings as a decorator:

warnings.resetwarnings()
warnings.simplefilter('ignore', UserWarning)
print(warnings.filters)
controlled_warning_decorator()
print(warnings.filters)

Output:

[('ignore', None, <class 'UserWarning'>, None, 0)]
[('ignore', None, <class 'Warning'>, None, 0), ('ignore', None, <class 'UserWarning'>, None, 0)]
[('ignore', None, <class 'UserWarning'>, None, 0)]

Both now work as expected; the old erroneous behavior can be seen in #11109.

@ShangwuYao
Contributor Author
ShangwuYao commented Jun 5, 2018

This work is done, could someone review it? Thanks! @rth @jnothman

Member
@jnothman jnothman left a comment

In travis Python 2.7: there are 762 warnings in master, but 5555 warnings in this PR
In travis Python 3.4: there are 8718 warnings in master, but 6743 in this PR.
In travis Python 3.6: there are 8105 warnings in master, but 6004 in this PR.

So something is going wrong in Python 2.7. Perhaps this is one of the Python bugs that I noted was motivation for resetting warning filters in some contexts...

Search for "warnings in" in each of the respective logs to confirm these results. (Please report similar results in future updates to this PR.)

@@ -137,6 +137,7 @@ def assert_warns(warning_class, func, *args, **kw):

    """
    # very important to avoid uncontrolled state propagation
    warnings.resetwarnings()
Member

Elsewhere there was a suggestion that we should never need to resetwarnings.

Member

Yes, please see #11109 (comment) (if you have not seen it yet).

Member

Ah, actually, looking at the diff, warnings.resetwarnings has been removed from clean_warning_registry, which is the change I was advocating.

It would be great if you could give us some reason why you need warnings.resetwarnings here. I am hoping we don't need to use it anywhere. In case we do need to use warnings.resetwarnings, it should be used inside a warnings.catch_warnings context manager so that the global warnings.filters is not affected.
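For illustration, a minimal sketch of that pattern; the warning raised here is a placeholder:

import warnings

with warnings.catch_warnings():
    warnings.resetwarnings()  # filters are cleared only inside this block
    warnings.warn('visible under the default filters', UserWarning)
# on exit, the previous global warnings.filters is restored automatically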

Contributor Author

I had looked at #11109 before I did this. After I removed warnings.resetwarnings() from clean_warning_registry, I couldn't pass one test case, test_warn in test_testing:

# This class is inspired from numpy 1.7 with an alteration to check
# the reset warning filters after calls to assert_warns.
# This assert_warns behavior is specific to scikit-learn because
# `clean_warning_registry()` is called internally by assert_warns
# and clears all previous filters.
class TestWarns(unittest.TestCase):
    def test_warn(self):
        def f():
            warnings.warn("yo")
            return 3

        # Test that assert_warns is not impacted by externally set
        # filters and is reset internally.
        # This is because `clean_warning_registry()` is called internally by
        # assert_warns and clears all previous filters.
        warnings.simplefilter("ignore", UserWarning)
        assert_equal(assert_warns(UserWarning, f), 3)

        # Test that the warning registry is empty after assert_warns
        assert_equal(sys.modules['warnings'].filters, [])

        assert_raises(AssertionError, assert_no_warnings, f)
        assert_equal(assert_no_warnings(lambda x: x, 1), 1)

It looks like the former developers expected warnings.resetwarnings to be used in assert_warns, while I think we should store the filters and restore them upon exiting; but that seems like a separate issue, and I intended to create another PR for it. Right now I am just making it behave the same as if I hadn't removed warnings.resetwarnings from clean_warning_registry. If you tell me what you want for assert_warns, I can do it.
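For illustration, a minimal sketch (not the scikit-learn implementation) of an assert_warns that captures warnings without mutating the global filters:

import warnings

def assert_warns_sketch(warning_class, func, *args, **kw):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # record every warning locally
        result = func(*args, **kw)
    # the global warnings.filters is back to its previous state here
    if not any(issubclass(w.category, warning_class) for w in caught):
        raise AssertionError('%s not raised' % warning_class.__name__)
    return result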

Member

I commented somewhere else that you should remove these two lines:

# Test that the warning registry is empty after assert_warns
assert_equal(sys.modules['warnings'].filters, [])

An assert_warns call affecting the global warnings.filters is not something I would expect. I can't see a good reason for it.

@@ -304,7 +304,7 @@ def check_estimator(Estimator):
    except SkipTest as message:
        # the only SkipTest thrown currently results from not
        # being able to import pandas.
-        warnings.warn(message, SkipTestWarning)
+        warnings.warn(str(message), SkipTestWarning)
Member
@lesteve lesteve Jun 6, 2018

For the record, I bumped into the same problem. This is clearly a bug in our code: passing an exception object to warnings.warn only works if none of the warning filters uses a regex. Here is a snippet to show what I mean:

import warnings

exc = ValueError()
warnings.resetwarnings()
warnings.warn(exc)  # works fine

warnings.filterwarnings('always', message='regex_goes_here')
warnings.warn(exc)  # raises TypeError: expected string or bytes-like object

Member

While you are at it, you may as well rename message -> exception to improve readability.
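For illustration, a self-contained sketch combining the suggested rename with the str() fix; the try/except here is a stand-in for the real check_estimator code, not a quote of it:

import warnings
from unittest import SkipTest
from sklearn.exceptions import SkipTestWarning

try:
    raise SkipTest('pandas is not importable')  # placeholder for the real skip
except SkipTest as exception:  # renamed from `message` for readability
    warnings.warn(str(exception), SkipTestWarning)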

@jnothman
Member
jnothman commented Jun 6, 2018

Does anyone have an idea why warnings would increase on py2?

@lesteve
Member
lesteve commented Jun 6, 2018

Not off the top of my head, no. To be perfectly honest, I would be quite happy with just reducing the warnings on Python 3 to a manageable number.

More generally, something quite useful for reducing the number of warnings would be to break them down by warning class, as I did in #10158 (comment) for example.
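For illustration, a minimal sketch of such a breakdown from a saved pytest log; the file name is a placeholder:

import collections
import re

with open('pytest_output.txt') as f:  # placeholder log file
    counts = collections.Counter(re.findall(r'\b\w+Warning\b', f.read()))

for warning_class, n in counts.most_common():
    print(n, warning_class)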

@ShangwuYao
Contributor Author
ShangwuYao commented Jun 7, 2018

Hmm, I compared the warning logs before and after the change, and it seems the new warnings after my change are pretty legitimate. Maybe because I fixed those bugs, we actually get to see the warnings we are supposed to see. :trollface:
And I don't even understand why Python 2.7 had so many fewer warnings in the first place; this might be the reason?
newpy27log.txt
oldpy27log.txt

@jnothman
Member
jnothman commented Jun 7, 2018 via email

@ShangwuYao
Contributor Author

Thank you all a lot! Should I do anything to merge?

@jnothman
Member
jnothman commented Jun 7, 2018 via email

@lesteve
Member
lesteve commented Jun 7, 2018

Can I have a breakdown of warnings by warning class just for test_common.py, as I asked above? Without this I feel this is a bit of a shot in the dark...

I have looked at this closely and, based on what I found, I think these changes make sense to me, but:
1. I am not 100% sure of the subtle impacts of the changes, and the Py27 increase is a bit weird.
2. To be reassured, it would be nice to check that we can reduce the warnings in Py3 to a manageable number.

@jnothman
Member
jnothman commented Jun 7, 2018 via email

@ShangwuYao
Contributor Author

This PR was initially intended to catch warnings in the common tests, which are now reduced from thousands to 5-6. In the process, I fixed the bugs in ignore_warnings, assert_warns and clean_warning_registry. If you intend to clear all expected warnings for the entire library, I can add filters to those tests, but that was not the intention of this PR.

@jnothman
Member
jnothman commented Jun 7, 2018

I agree.

Why is "4403 /home/lesteve/dev/alt-scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function squared_exponential is deprecated; The function squared_exponential of correlation_models is deprecated in version 0.19.1 and will be removed in 0.22." apparently appearing after this PR and not before??

@ShangwuYao
Contributor Author
ShangwuYao commented Jun 7, 2018

Well, if it is deprecated and no filters are set, it should give the warnings; as far as I checked, that is the case.
The external filters were messed up before, so these warnings somehow got ignored; I didn't figure out exactly why either.
The deprecation warnings you are talking about come from test_gaussian_process.py; this one file alone generates 4752 warnings. After I set the warning filters, all those warnings disappeared when I tested locally. Shall I commit this change as well?

if self._entered:
    raise RuntimeError("Cannot enter %r twice" % self)
self._entered = True
self._filters = self._module.filters
self._module.filters = self._filters[:]
self._showwarning = self._module.showwarning
warnings.simplefilter("ignore", self.category)
Member
@lesteve lesteve Jun 7, 2018

Can you explain why you moved it here?

Should there not be an equivalent warnings.filters.pop() in __exit__? Sorry for the noise; this is not needed, as warnings.filters is restored in __exit__.

Contributor Author
@ShangwuYao ShangwuYao Jun 7, 2018

Suppose there are some external warning filters and some local warning filters. With this change, inside the local scope both the external and the local filters are active, and upon exiting the local scope the external filters are restored to the way they were before entering. This is intended to make the context manager match the behavior of the decorator, since the decorator uses with warnings.catch_warnings(): to do this.
I have proposed this in previous posts several times; I think it makes sense, but no one confirmed it, and then I found that numpy's suppress_warnings has the same behavior.

Without moving it here, warnings.filters would restore both the local filters and the external filters upon exiting.
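For illustration, a minimal sketch of the save-and-restore pattern being discussed; the names follow the snippet above, and the __exit__ body is an assumption rather than the exact scikit-learn code:

import warnings

class ignore_warnings_sketch(object):
    def __init__(self, category=Warning):
        self.category = category
        self._module = warnings

    def __enter__(self):
        self._filters = self._module.filters  # keep a reference to the external filters
        self._module.filters = self._filters[:]  # work on a copy
        self._showwarning = self._module.showwarning
        warnings.simplefilter('ignore', self.category)  # local filter goes into the copy

    def __exit__(self, *exc_info):
        self._module.filters = self._filters  # external filters restored unchanged
        self._module.showwarning = self._showwarning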

@lesteve
Member
lesteve commented Jun 7, 2018

While you are at it, I think you can remove this warnings.simplefilter in estimator_checks.py:

warnings.simplefilter("ignore", ConvergenceWarning)

@ShangwuYao
Contributor Author

I don't quite understand why the build failed.

@lesteve
Member
lesteve commented Jun 8, 2018

I don't quite understand why the build failed.

Those were due to flake8 errors that you introduced.

I pushed some tweaks in the last commit; let me know if you have comments about them. I think this is looking good: being able to remove 4k warnings in test_gaussian_process.py in a straightforward manner makes me confident that the way we are handling warnings is a lot less hacky than before.

@lesteve
Member
lesteve commented Jun 8, 2018

LGTM. The numbers of warnings are:
Python 2.7 build: 797
Python 3.4 build: 2028
Python 3.6 build: 1448

I am strongly in favour of creating separate PRs to investigate how to reduce the number of warnings further.

FYI there were quirks on Python 2.7 due to the interaction of pytest.mark.parametrize and ignore_warnings/pytest.mark.filterwarnings, which I did not investigate further. This is why my last commit reverts to using an ignore_warnings context manager, as was the case originally.
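For illustration, a sketch of that pattern: a pytest-parametrized common test wrapping its body in the ignore_warnings context manager. The parametrization and the check call here are placeholders, not the exact test_common.py code:

import pytest
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression  # placeholder estimator
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.testing import ignore_warnings

@pytest.mark.parametrize('Estimator', [LogisticRegression])
def test_estimator_common_checks(Estimator):
    with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                                   UserWarning, FutureWarning)):
        check_estimator(Estimator)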

@lesteve changed the title from "[WIP] catch more expected warnings in common tests" to "[MRG+1] catch more expected warnings in common tests" on Jun 8, 2018
Member
@rth rth left a comment

LGTM (apart from the minor comment below), thanks for working on this!

I am strongly in favour of creating separate PRs to investigate how to reduce the number of warnings further.

+1

@@ -143,7 +144,6 @@ def test_batch_size():


def test_random_starts():
    # Test that an increasing number of random-starts of GP fitting only
Member

Is this removal intentional?

@rth changed the title from "[MRG+1] catch more expected warnings in common tests" to "[MRG+2] catch more expected warnings in common tests" on Jun 9, 2018
@ShangwuYao
Contributor Author

Alright, then I will create another PR to further reduce warnings.

@lesteve
Member
lesteve commented Jun 11, 2018

Merging this one, thanks a lot for working on this @ShangwuYao!

@lesteve lesteve merged commit cb1dbc2 into scikit-learn:master Jun 11, 2018
@ShangwuYao
Contributor Author

This is exciting, thank you all for reviewing!!

jorisvandenbossche
@@ -15,6 +15,7 @@
from sklearn.datasets import make_regression
from sklearn.utils.testing import assert_greater, assert_true, assert_raises

pytestmark = pytest.mark.filterwarnings('ignore', category=DeprecationWarning)
Member

What is the reason that DeprecationWarnings are ignored here? (this is not part of the common tests)

Contributor Author

This PR fixed bugs in ignore_warnings and clean_warning_registry (see here), not just the common tests. Before this, some warnings that should be raised in test_gaussian_process were somehow suppressed; after this change they are correctly raised, and this file alone raises more than 4000 new DeprecationWarnings, so we ignored them to confirm the fix is really working (see here). We probably shouldn't ignore this warning, as you said.

Member
@lesteve lesteve Jun 13, 2018

What is the reason that DeprecationWarnings are ignored here? (this is not part of the common tests)

Unless I am missing something, sklearn/gaussian_process/gaussian_process.py is deprecated and scheduled to be removed in 0.20.

Member

Ah, that's a good reason :-)
Ideally I would say that deprecation warnings should be asserted once and then explicitly ignored in the other tests, but since this deprecates a full submodule and not just a method, it probably makes sense to simply ignore them for the whole submodule.
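For illustration, a sketch of that "assert once, ignore elsewhere" pattern; the deprecation test shown is hypothetical, not part of this PR:

import pytest
from sklearn.gaussian_process import GaussianProcess  # deprecated at the time of this PR

# module-level mark: silence the deprecation noise in every other test of the file
pytestmark = pytest.mark.filterwarnings('ignore::DeprecationWarning')

def test_gaussian_process_is_deprecated():
    # pytest.warns installs its own recording filter, so the deprecation is
    # still asserted here despite the module-level ignore
    with pytest.warns(DeprecationWarning):
        GaussianProcess()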
