-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Add common test for set_params behavior #7760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] Add common test for set_params behavior #7760
Conversation
@@ -66,6 +75,9 @@ def test_check_estimator(): | |||
# check that we have a set_params and can clone | |||
msg = "it does not implement a 'get_params' methods" | |||
assert_raises_regex(TypeError, msg, check_estimator, object) | |||
# check that properties can be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow I kinda forgot that I wrote this test. This looks great!
Maybe add a stronger test that ensures idempotence as you pointed out? You could have a getter and setter but they are not inverse of each other, like a |
Is it against sklearn's conventions if a set_param fails due to TypeError, etc? If the getter and setter do processing, then they can fail for all kinds of reasons. |
There is no conventions on what errors can be raised where, I think. I'm not quite sure what you're getting at. |
If Currently the contributor guidelines say:
What I am wondering is if property-params are subject to this rule, or not. Finally, the reason I ask this is because I'm not sure how to test invertibliilty/idempotence in a general way except to Something like
If property-setters raise Exceptions then that wouldn't work. |
sklearn/utils/estimator_checks.py
Outdated
@@ -1457,6 +1457,7 @@ def param_filter(p): | |||
# true for mixins | |||
return | |||
params = estimator.get_params() | |||
estimator.set_params(**params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put a comment in here to describe what this is testing.
Also: Should we do further checking that get_params()
returns the same thing before and after this statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@absolutelyNoWarranty Sorry I don't get your point about raising errors.
Again, I'm not sure what you're asking. The rule says that you shouldn't do validation in |
f865379
to
96958a2
Compare
Can you please fix the pep8 error? |
The linter still says no:
|
sklearn/utils/estimator_checks.py
Outdated
estimator.set_params(**{param_name: value}) | ||
get_value = estimator.get_params()[param_name] | ||
except: | ||
# triggered some parameter validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait I don't get this. Shouldn't we error on this? Or is that legal as long as we don't touch it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't realize there had been comments.
Here we don't error because we just want to check that get_value
is value
if value
is successfully set.
sklearn/utils/estimator_checks.py
Outdated
pass | ||
else: | ||
errmsg = ("get_params does not match set_params: " | ||
"called set_params of {0} with {1}={2} " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're gonna use explicit numbers you could have not passed param_name twice ;)
sklearn/utils/estimator_checks.py
Outdated
param_name, get_value) | ||
assert_equal(value, get_value, errmsg) | ||
|
||
with ignore_warnings(category=DeprecationWarning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to re-make estimator
as in lines 1637-1641.
@jnothman are we now doing "approve changes" instead of "[MRG + 1]"? |
No, I currently haven't found an easy way to say "I don't want a cross next to my review so far because the changes I requested have been made, but I haven't done a full review of the code either". Perhaps I should just dismiss reviews more often. |
@jnothman Can't you change it to "comment"? (I don't actually know). I only use "comment" so far, though that is clearly also not ideal |
@absolutelyNoWarranty you need to rebase. |
5a89deb
to
85da7cd
Compare
Updated and rebased. |
85da7cd
to
7a7743a
Compare
7a7743a
to
f66f1cf
Compare
sklearn/utils/estimator_checks.py
Outdated
@ignore_warnings(category=(DeprecationWarning, FutureWarning)) | ||
def check_set_params(name, estimator_orig): | ||
if name in META_ESTIMATORS: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this never get executed? And why is it necessary? Surely as long as the object has been constructed, this should work fine.
sklearn/utils/estimator_checks.py
Outdated
try: | ||
estimator.set_params(**{param_name: value}) | ||
get_value = estimator.get_params()[param_name] | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use except Exception
. bare except
catches KeyboardInterrupt, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps except (ValueError, TypeError)
?
sklearn/utils/estimator_checks.py
Outdated
"but get_params returns {1}={3}") | ||
errmsg = errmsg.format(name, param_name, value, | ||
get_value) | ||
assert_equal(value, get_value, errmsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be asserting equality (==
) or identity (is
)?
sklearn/utils/estimator_checks.py
Outdated
get_value) | ||
assert_equal(value, get_value, errmsg) | ||
|
||
estimator = clone(estimator_orig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not put this at the beginning of the loop, rather than doing one more clone than necessary per estimator?
The reduced time is probably because you're no longer calling clone and get_params so often also... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting, there, I think... But I'd like @amueller to have another go now. This feels like it's a lot of red tape.
Or perhaps we need to run it with scikit-learn-contrib estimators to check what it breaks...
sklearn/utils/estimator_checks.py
Outdated
pass | ||
else: | ||
new_params = estimator.get_params() | ||
assert_equal(params.keys(), new_params.keys(), msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not assert_is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing for identity doesn't work or keys even if the dictionary really is the same dictionary.
>>> d = {'a':1, 'b':2}
>>> d is d
True
>>> d.keys() is d.keys()
False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I misread. But actually, I think that we are not assured that this equality check will work. In py2 this value is a list, in iteration order which is not to be assumed deterministic
sklearn/utils/testing.py
Outdated
@@ -106,6 +106,7 @@ | |||
assert_greater = _dummy.assertGreater | |||
assert_less_equal = _dummy.assertLessEqual | |||
assert_greater_equal = _dummy.assertGreaterEqual | |||
assert_is = _dummy.assertIs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we no longer use nose,
assert X is Y, msg
is actually sufficient without a helper.
sklearn/utils/estimator_checks.py
Outdated
# before and after set_params() with some fuzz | ||
estimator = clone(estimator_orig) | ||
|
||
params = estimator.get_params() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be using deep=False, as we will test each estimator separately ...
sklearn/utils/estimator_checks.py
Outdated
try: | ||
estimator.set_params(**params) | ||
except Exception: | ||
# triggered some parameter validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ideal world, we might not permit this validation outside of fit either. I think we should consider raising a warning of some kind where this case is over-used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point but when is a developer expected to see the warning? As part of the (massive) output of the test suite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the output of the test-suite should not be massive. There are a lot of issues in master because we were not careful about handling warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for raising a warning (or even error?) here. Did this error for all_estimators? where? Usually each new test in estimator_checks
requires fixes to the codebase. This one here is probably the one that should have been fixed. Not having any errors on the codebase is suspicious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a second I thought running this on a deep estimator would fail, but I guess we avoid this by doing deep=False
. It would be really nice to run estimator_checks
on some deep estimators though (I tried and ran into a whole host of problems)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Please add an entry to the change log at |
The idea is that developers run check_estimator on a smaller set of objects
than we do here, and that they may be more diligent than we are about
responding to warnings!
|
Always the optimist. Gonna do another round now, but maybe point out what the red tape is you're concerned about? |
@@ -149,7 +149,7 @@ def __exit__(self, exc_type, exc_value, tb): | |||
|
|||
|
|||
class TestCase(unittest.TestCase): | |||
longMessage = False | |||
longMessage = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your comment. If it's true by default, why would someone set it to True?
sklearn/utils/estimator_checks.py
Outdated
try: | ||
estimator.set_params(**params) | ||
except Exception: | ||
# triggered some parameter validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the output of the test-suite should not be massive. There are a lot of issues in master because we were not careful about handling warnings.
sklearn/utils/estimator_checks.py
Outdated
try: | ||
estimator.set_params(**params) | ||
except Exception: | ||
# triggered some parameter validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for raising a warning (or even error?) here. Did this error for all_estimators? where? Usually each new test in estimator_checks
requires fixes to the codebase. This one here is probably the one that should have been fixed. Not having any errors on the codebase is suspicious to me.
sklearn/utils/estimator_checks.py
Outdated
try: | ||
estimator.set_params(**params) | ||
except Exception: | ||
# triggered some parameter validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a second I thought running this on a deep estimator would fail, but I guess we avoid this by doing deep=False
. It would be really nice to run estimator_checks
on some deep estimators though (I tried and ran into a whole host of problems)
assert_raises_regex(AssertionError, msg, check_estimator, | ||
ModifiesValueInsteadOfRaisingError()) | ||
check_estimator(RaisesError()) | ||
assert_raises_regex(AssertionError, msg, check_estimator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you maybe run the test on a pipeline to make sure it works on deep estimators?
@jnothman @amueller
|
yes, for good or bad, some estimators currently validate upon setting. I
don't think we need to be bureaucratic/frameworkish about this - such
exceptions are pretty harmless - and should focus on the essence of the PR:
testing that set_params can be used.
I'd be tempted, however, to make it only catch ValueErrors, not all
Exceptions.
|
doc/whats_new/v0.20.rst
Outdated
@@ -303,3 +303,8 @@ Changes to estimator checks | |||
- Allow :func:`estimator_checks.check_estimator` to check that there is no | |||
private settings apart from parameters during estimator initialization. | |||
:issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>` | |||
|
|||
- Added :func:`estimator_checks.check_set_params` test which checks that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link won't work as we don't generate docs for estimator checks (though perhaps we should)
doc/whats_new/v0.20.rst
Outdated
@@ -303,3 +303,8 @@ Changes to estimator checks | |||
- Allow :func:`estimator_checks.check_estimator` to check that there is no | |||
private settings apart from parameters during estimator initialization. | |||
:issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>` | |||
|
|||
- Added :func:`estimator_checks.check_set_params` test which checks that | |||
`set_params` is equivalent to passing parameters in `__init__` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double backticks please.
Should I squash and rebase? |
Don't squash and rebase. We'll do that upon merge. Otherwise it makes it
hard to track what's changed.
|
No longer needed as we are using pytests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I pushed a change remove assert_is. Once the CI are green, I will merge.
Merged. Thank you! |
Reference Issue
Fix #7738
What does this implement/fix? Explain your changes.
Testset_params
by addingestimator.set_params(**params)
tocheck_parameters_default_constructible
Test
set_params
by adding acheck_set_params
to theestimator_checks
module.