[MRG+2] Add common test for set_params behavior by absolutelyNoWarranty · Pull Request #7760 · scikit-learn/scikit-learn

[MRG+2] Add common test for set_params behavior #7760


Merged
merged 20 commits into scikit-learn:master from test_set_param on Jul 16, 2018

Conversation

absolutelyNoWarranty
Contributor
@absolutelyNoWarranty commented Oct 26, 2016

Reference Issue

Fix #7738

What does this implement/fix? Explain your changes.

Tests set_params by adding an estimator.set_params(**params) call to check_parameters_default_constructible.
Tests set_params by adding a new check_set_params check to the estimator_checks module (see the sketch below).
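For illustration, a minimal sketch of the kind of common check being proposed (the helper name check_set_params matches the PR, but the body and the LogisticRegression example are illustrative, not the merged code):

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def check_set_params(name, estimator_orig):
    # set_params(**get_params()) should be a no-op: the estimator must
    # accept its own parameters back and report the same parameter names.
    estimator = clone(estimator_orig)
    params = estimator.get_params()
    estimator.set_params(**params)
    new_params = estimator.get_params()
    assert sorted(params) == sorted(new_params), (
        "get_params keys changed after set_params for %s" % name)


check_set_params("LogisticRegression", LogisticRegression())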

@@ -66,6 +75,9 @@ def test_check_estimator():
# check that we have a set_params and can clone
msg = "it does not implement a 'get_params' methods"
assert_raises_regex(TypeError, msg, check_estimator, object)
# check that properties can be set
Member

wow I kinda forgot that I wrote this test. This looks great!

@amueller
Member

Maybe add a stronger test that ensures idempotence, as you pointed out? You could have a getter and setter that are not inverses of each other, like a +1 in one or both of them.

@absolutelyNoWarranty
Contributor Author

Is it against sklearn's conventions if set_params fails due to a TypeError, etc.? If the getter and setter do processing, then they can fail for all kinds of reasons.

@amueller
Member

There are no conventions on what errors can be raised where, I think. I'm not quite sure what you're getting at.

@absolutelyNoWarranty
Contributor Author

If p is a property setter with logic (e.g. an addition operation), it could raise exceptions. Wouldn't that be accidentally doing input validation at init time? In this case, non-numeric values would raise a TypeError at init.

Currently the contributor guidelines say:

The reason for postponing the validation is that the same validation would have to be performed in set_params, which is used in algorithms like GridSearchCV.

What I am wondering is whether property-params are subject to this rule or not.

Finally, the reason I ask is that I'm not sure how to test invertibility/idempotence in a general way, except by round-tripping set_params/get_params with a range of different kinds of values.

Something like

for value in [None, -1, 1, "abc"]:
    est = est.set_params(**{'alpha': value})
    alpha_get_value = est.alpha
    # setting the retrieved value back should give the same value from get_params
    assert alpha_get_value == est.set_params(**{'alpha': alpha_get_value}).get_params()['alpha']

If property-setters raise Exceptions then that wouldn't work.
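For concreteness, a sketch of such a fuzz round-trip that simply skips values an estimator refuses (Ridge and the value list are illustrative; the diff later in this PR takes a similar shape):

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge

estimator_orig = Ridge()
test_values = [-np.inf, np.inf, None, -100, 100, -0.5, 0.5, 0, "", "value"]

for param_name in estimator_orig.get_params(deep=False):
    for value in test_values:
        estimator = clone(estimator_orig)
        try:
            estimator.set_params(**{param_name: value})
        except Exception:
            # the estimator validated the value inside set_params; skip it
            continue
        # if the value was accepted, get_params must report the same object back
        assert estimator.get_params()[param_name] is value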

@@ -1457,6 +1457,7 @@ def param_filter(p):
# true for mixins
return
params = estimator.get_params()
estimator.set_params(**params)
Member

Please put a comment in here to describe what this is testing.

Also: should we further check that get_params() returns the same thing before and after this statement?

Member

yes

@amueller
Member
amueller commented Nov 1, 2016

@absolutelyNoWarranty Sorry I don't get your point about raising errors.

What I am wondering is if property-params are subject to this rule, or not.

Again, I'm not sure what you're asking. The rule says that you shouldn't do validation in __init__ because then set_params wouldn't do the validation and you'd get wrong results in GridSearchCV.
Doing validation in set_params is permissible (though we usually avoid it).
The more general rule is that calling __init__ and calling set_params should result in the same estimator. An easy way to achieve that is to do all validation, checking and computation in fit.
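A minimal sketch of that convention with a made-up estimator (ToyEstimator is purely illustrative): store parameters verbatim in __init__, postpone validation to fit, and __init__/set_params then stay equivalent.

from sklearn.base import BaseEstimator


class ToyEstimator(BaseEstimator):
    def __init__(self, alpha=1.0):
        # store the parameter verbatim; no validation here, so __init__
        # and set_params leave the object in exactly the same state
        self.alpha = alpha

    def fit(self, X, y=None):
        # all validation is postponed to fit
        if not isinstance(self.alpha, (int, float)) or self.alpha < 0:
            raise ValueError("alpha must be a non-negative number, got %r"
                             % self.alpha)
        self.alpha_ = float(self.alpha)  # placeholder "fitted" attribute
        return self


# constructing with a parameter and setting it afterwards are equivalent
assert (ToyEstimator(alpha=2.0).get_params()
        == ToyEstimator().set_params(alpha=2.0).get_params())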

@amueller
Member

Can you please fix the pep8 error?
Also, it would be great if you can check that get_params returns the same before and after the call to set_params.

@jnothman
Member

The linter still says no:


./sklearn/utils/estimator_checks.py:1648:1: W293 blank line contains whitespace
^
./sklearn/utils/estimator_checks.py:1649:80: E501 line too long (108 > 79 characters)
    test_values = [-np.inf, np.inf, None, -100, 100, -0.5, 0.5, 0, "", "value", ('a', 'b'), {'key':'value'}]
                                                                               ^
./sklearn/utils/estimator_checks.py:1649:99: E231 missing whitespace after ':'
    test_values = [-np.inf, np.inf, None, -100, 100, -0.5, 0.5, 0, "", "value", ('a', 'b'), {'key':'value'}]
                                                                                                  ^
./sklearn/utils/estimator_checks.py:1663:71: E502 the backslash is redundant between brackets
                             "get_params does not match set_params: " \
                                                                      ^
./sklearn/utils/estimator_checks.py:1664:80: E501 line too long (155 > 79 characters)
                             "called set_params of {0} with {1}={2} but get_params returns {3}={4}".format(name, param_name, value, param_name, get_value))

    estimator.set_params(**{param_name: value})
    get_value = estimator.get_params()[param_name]
except:
    # triggered some parameter validation
Member

Wait I don't get this. Shouldn't we error on this? Or is that legal as long as we don't touch it?

Contributor Author

Sorry, I didn't realize there had been comments.

Here we don't error because we just want to check that get_value equals value when value is successfully set.

pass
else:
errmsg = ("get_params does not match set_params: "
"called set_params of {0} with {1}={2} "
Member

if you're gonna use explicit numbers you could have not passed param_name twice ;)

param_name, get_value)
assert_equal(value, get_value, errmsg)

with ignore_warnings(category=DeprecationWarning):
Member

What is this doing here?

Contributor Author

Just to re-make estimator as in lines 1637-1641.

@amueller
Member

@jnothman are we now doing "approve changes" instead of "[MRG + 1]"?

@jnothman
Member

No, I currently haven't found an easy way to say "I don't want a cross next to my review so far because the changes I requested have been made, but I haven't done a full review of the code either". Perhaps I should just dismiss reviews more often.

@amueller
Member
amueller commented Dec 1, 2016

@jnothman Can't you change it to "comment"? (I don't actually know.) I only use "comment" so far, though that is clearly also not ideal.

@agramfort
Member

@absolutelyNoWarranty you need to rebase.

@absolutelyNoWarranty force-pushed the test_set_param branch 3 times, most recently from 5a89deb to 85da7cd on June 10, 2017 17:12
@absolutelyNoWarranty
Contributor Author

Updated and rebased.

@jnothman jnothman modified the milestones: 0.19, 0.20 Jun 18, 2017
@jnothman jnothman changed the title Add set_param check and test [MRG] Add set_param check and test Dec 12, 2017
@jnothman jnothman changed the title [MRG] Add set_param check and test [MRG] Add common test for set_params behavior Dec 12, 2017
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_set_params(name, estimator_orig):
    if name in META_ESTIMATORS:
        return
Member
@jnothman Dec 12, 2017

Why does this never get executed? And why is it necessary? Surely as long as the object has been constructed, this should work fine.

try:
    estimator.set_params(**{param_name: value})
    get_value = estimator.get_params()[param_name]
except:
Member

Please use except Exception. A bare except catches KeyboardInterrupt, for instance.

Member

Or perhaps except (ValueError, TypeError)?
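For illustration only (not the PR's code), the difference being pointed out: except Exception still lets KeyboardInterrupt and SystemExit propagate, whereas a bare except swallows them too.

from sklearn.linear_model import Ridge


def try_set(estimator, param_name, value):
    # tolerate ordinary validation errors, but do not swallow
    # KeyboardInterrupt / SystemExit the way a bare 'except:' would
    try:
        estimator.set_params(**{param_name: value})
    except Exception:
        return False
    return True


print(try_set(Ridge(), "alpha", "not-a-number"))  # True here, since Ridge does not validate in set_params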

"but get_params returns {1}={3}")
errmsg = errmsg.format(name, param_name, value,
get_value)
assert_equal(value, get_value, errmsg)
Member

should we be asserting equality (==) or identity (is)?

get_value)
assert_equal(value, get_value, errmsg)

estimator = clone(estimator_orig)
Member
@jnothman Dec 12, 2017

why not put this at the beginning of the loop, rather than doing one more clone than necessary per estimator?

@jnothman
Member

The reduced time is probably because you're no longer calling clone and get_params so often also...

Member
@jnothman left a comment

Getting there, I think... But I'd like @amueller to have another go now. This feels like it's a lot of red tape.

Or perhaps we need to run it with scikit-learn-contrib estimators to check what it breaks...

pass
else:
new_params = estimator.get_params()
assert_equal(params.keys(), new_params.keys(), msg)
Member

Not assert_is?

Contributor Author

Testing for identity doesn't work for keys even if the dictionary really is the same dictionary.

>>> d = {'a':1, 'b':2}
>>> d is d
True
>>> d.keys() is d.keys()
False

Member

Sorry, I misread. But actually, I think we are not assured that this equality check will work: in py2 this value is a list, in an iteration order that should not be assumed deterministic.
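One way to make that comparison independent of dict ordering on both Python 2 and 3 (a sketch, not necessarily what the PR ended up doing):

from sklearn.linear_model import Ridge

estimator = Ridge()
params = estimator.get_params()
estimator.set_params(**params)
new_params = estimator.get_params()
# sorting the keys (or comparing sets) avoids relying on iteration order
assert sorted(params) == sorted(new_params)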

@@ -106,6 +106,7 @@
assert_greater = _dummy.assertGreater
assert_less_equal = _dummy.assertLessEqual
assert_greater_equal = _dummy.assertGreaterEqual
assert_is = _dummy.assertIs
Member

Since we no longer use nose,

assert X is Y, msg

is actually sufficient without a helper.

# before and after set_params() with some fuzz
estimator = clone(estimator_orig)

params = estimator.get_params()
Member

I think we should be using deep=False, as we will test each estimator separately ...
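For context on the deep=False suggestion (an illustrative snippet, not part of the PR's diff): with deep=True a meta-estimator's get_params also returns its sub-estimators' parameters under double-underscore names, which a per-estimator check does not need to touch.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
# top-level parameters only, e.g. 'steps' and 'memory'
print(sorted(pipe.get_params(deep=False)))
# additionally includes nested ones such as 'clf__C' and 'scale__with_mean'
print(sorted(pipe.get_params(deep=True)))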

try:
    estimator.set_params(**params)
except Exception:
    # triggered some parameter validation
Member

In the ideal world, we might not permit this validation outside of fit either. I think we should consider raising a warning of some kind where this case is over-used.

Contributor Author
@absolutelyNoWarranty Dec 19, 2017

I see your point but when is a developer expected to see the warning? As part of the (massive) output of the test suite?

Member

the output of the test-suite should not be massive. There are a lot of issues in master because we were not careful about handling warnings.

Member

+1 for raising a warning (or even an error?) here. Did this error for any of all_estimators? Where? Usually each new test in estimator_checks requires fixes to the codebase. This one here is probably the one that should have been fixed. Not having any errors on the codebase is suspicious to me.

Member

For a second I thought running this on a deep estimator would fail, but I guess we avoid this by doing deep=False. It would be really nice to run estimator_checks on some deep estimators though (I tried and ran into a whole host of problems)

Member
@jnothman left a comment

LGTM!

@jnothman
Member

Please add an entry to the change log at doc/whats_new/v0.20.rst. There is a section for changes to estimator checks. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

@jnothman
Member
jnothman commented Dec 19, 2017 via email

@amueller
Member

The idea is that developers run check_estimator on a smaller set of objects than we do here, and that they may be more diligent than we are about responding to warnings!

Always the optimist. Gonna do another round now, but maybe point out what the red tape is you're concerned about?

@@ -149,7 +149,7 @@ def __exit__(self, exc_type, exc_value, tb):


class TestCase(unittest.TestCase):
-    longMessage = False
+    longMessage = True
Member

I don't understand your comment. If it's true by default, why would someone set it to True?


assert_raises_regex(AssertionError, msg, check_estimator,
                    ModifiesValueInsteadOfRaisingError())
check_estimator(RaisesError())
assert_raises_regex(AssertionError, msg, check_estimator,
Member

can you maybe run the test on a pipeline to make sure it works on deep estimators?
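A rough sketch of what exercising the behaviour on a deep estimator could look like (illustrative; the PR's own test uses purpose-built mock estimators such as ModifiesValueInsteadOfRaisingError above):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# shallow round trip: feeding get_params(deep=False) back in should be a no-op
shallow = pipe.get_params(deep=False)
pipe.set_params(**shallow)
assert sorted(pipe.get_params(deep=False)) == sorted(shallow)

# nested parameters round-trip through the double-underscore syntax as well
pipe.set_params(clf__C=10.0)
assert pipe.get_params()["clf__C"] == 10.0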


@absolutelyNoWarranty
Contributor Author
absolutelyNoWarranty commented Dec 20, 2017

@jnothman @amueller
I just realized that what should happen to params after an Exception is currently not checked. If parameter validation raises an Exception, should the params change still go through?
Currently this happens on some estimators:

>>> from sklearn.linear_model import SGDRegressor
>>> est = SGDRegressor()
>>> est.get_params()['loss']
'squared_loss'
>>> est.set_params(loss='thisdoesnotexist')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/stochastic_gradient.py", line 78, in set_params
    self._validate_params(set_max_iter=False)
  File "sklearn/linear_model/stochastic_gradient.py", line 108, in _validate_params
    raise ValueError("The loss %s is not supported. " % self.loss)
ValueError: The loss thisdoesnotexist is not supported. 
>>> est.get_params()['loss']
'thisdoesnotexist'
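A small sketch of the extra assertion being discussed (illustrative; as the traceback above shows, the stricter expectation did not hold for SGDRegressor at the time):

from sklearn.linear_model import SGDRegressor

est = SGDRegressor()
old_loss = est.get_params()["loss"]
try:
    est.set_params(loss="thisdoesnotexist")
except Exception:
    # set_params validated and rejected the value, as in the traceback above
    pass
new_loss = est.get_params()["loss"]
# the open question: should new_loss still equal old_loss after a failed set_params?
print(old_loss, "->", new_loss)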

@jnothman
Member
jnothman commented Dec 20, 2017 via email

@@ -303,3 +303,8 @@ Changes to estimator checks
- Allow :func:`estimator_checks.check_estimator` to check that there is no
private settings apart from parameters during estimator initialization.
:issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>`

- Added :func:`estimator_checks.check_set_params` test which checks that
Member

This link won't work as we don't generate docs for estimator checks (though perhaps we should)

@@ -303,3 +303,8 @@ Changes to estimator checks
- Allow :func:`estimator_checks.check_estimator` to check that there is no
private settings apart from parameters during estimator initialization.
:issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>`

- Added :func:`estimator_checks.check_set_params` test which checks that
`set_params` is equivalent to passing parameters in `__init__` and
Member

Double backticks please.

@absolutelyNoWarranty
Contributor Author
absolutelyNoWarranty commented Dec 24, 2017

Should I squash and rebase?

@jnothman
Member
jnothman commented Dec 24, 2017 via email

@jnothman jnothman changed the title [MRG] Add common test for set_params behavior [MRG+1] Add common test for set_params behavior Mar 20, 2018
@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Add common test for set_params behavior [MRG+2] Add common test for set_params behavior Jul 16, 2018
Member
@GaelVaroquaux left a comment

LGTM.

I pushed a change to remove assert_is. Once the CI is green, I will merge.

@GaelVaroquaux GaelVaroquaux merged commit 46913ad into scikit-learn:master Jul 16, 2018
@GaelVaroquaux
Member

Merged. Thank you!


Successfully merging this pull request may close these issues.

Stronger common tests for setting init params? / check_estimator
5 participants