[MRG+2] Add common test for set_params behavior by absolutelyNoWarranty · Pull Request #7760 · scikit-learn/scikit-learn

[MRG+2] Add common test for set_params behavior #7760


Merged
merged 20 commits into scikit-learn:master from test_set_param on Jul 16, 2018

Conversation

absolutelyNoWarranty
Contributor
@absolutelyNoWarranty commented Oct 26, 2016

Reference Issue

Fix #7738

What does this implement/fix? Explain your changes.

Tests set_params by adding an estimator.set_params(**params) call to check_parameters_default_constructible.
Tests set_params by adding a new check_set_params check to the estimator_checks module (see the sketch below).
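For illustration, a minimal sketch of the kind of common check being proposed (the helper name check_set_params matches the PR, but the body and the LogisticRegression example are illustrative, not the merged code):

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def check_set_params(name, estimator_orig):
    # set_params(**get_params()) should be a no-op: the estimator must
    # accept its own parameters back and report the same parameter names.
    estimator = clone(estimator_orig)
    params = estimator.get_params()
    estimator.set_params(**params)
    new_params = estimator.get_params()
    assert sorted(params) == sorted(new_params), (
        "get_params keys changed after set_params for %s" % name)


check_set_params("LogisticRegression", LogisticRegression())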

@@ -66,6 +75,9 @@ def test_check_estimator():
# check that we have a set_params and can clone
msg = "it does not implement a 'get_params' methods"
assert_raises_regex(TypeError, msg, check_estimator, object)
# check that properties can be set
Member

wow I kinda forgot that I wrote this test. This looks great!

@amueller
Member

Maybe add a stronger test that ensures idempotence, as you pointed out? You could have a getter and setter that are not inverses of each other, like a +1 in one or both of them.

@absolutelyNoWarranty
Contributor Author

Is it against sklearn's conventions if set_params fails due to a TypeError, etc.? If the getter and setter do processing, then they can fail for all kinds of reasons.

@amueller
Member

There are no conventions on what errors can be raised where, I think. I'm not quite sure what you're getting at.

@absolutelyNoWarranty
Contributor Author

If p is a property setter with logic (e.g. an addition operation), it could raise exceptions. Wouldn't that be accidentally doing input validation at init time? In this case, non-numeric values would raise a TypeError at init.

Currently the contributor guidelines say:

The reason for postponing the validation is that the same validation would have to be performed in set_params, which is used in algorithms like GridSearchCV.

What I am wondering is whether property-params are subject to this rule or not.

Finally, the reason I ask is that I'm not sure how to test invertibility/idempotence in a general way, except by round-tripping set_params/get_params with a range of different kinds of values.

Something like

for value in [None, -1, 1, "abc"]:
    est = est.set_params(**{'alpha': value})
    alpha_get_value = est.alpha
    # setting the retrieved value back should give the same value from get_params
    assert alpha_get_value == est.set_params(**{'alpha': alpha_get_value}).get_params()['alpha']

If property-setters raise Exceptions then that wouldn't work.
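For concreteness, a sketch of such a fuzz round-trip that simply skips values an estimator refuses (Ridge and the value list are illustrative; the diff later in this PR takes a similar shape):

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge

estimator_orig = Ridge()
test_values = [-np.inf, np.inf, None, -100, 100, -0.5, 0.5, 0, "", "value"]

for param_name in estimator_orig.get_params(deep=False):
    for value in test_values:
        estimator = clone(estimator_orig)
        try:
            estimator.set_params(**{param_name: value})
        except Exception:
            # the estimator validated the value inside set_params; skip it
            continue
        # if the value was accepted, get_params must report the same object back
        assert estimator.get_params()[param_name] is value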

@@ -1457,6 +1457,7 @@ def param_filter(p):
# true for mixins
return
params = estimator.get_params()
estimator.set_params(**params)
Member

Please put a comment in here to describe what this is testing.

Also: should we further check that get_params() returns the same thing before and after this statement?

Member

yes

@amueller
Member
amueller commented Nov 1, 2016

@absolutelyNoWarranty Sorry I don't get your point about raising errors.

What I am wondering is if property-params are subject to this rule, or not.

Again, I'm not sure what you're asking. The rule says that you shouldn't do validation in __init__ because then set_params wouldn't do the validation and you'd get wrong results in GridSearchCV.
Doing validation in set_params is permissible (though we usually avoid it).
The more general rule is that calling __init__ and calling set_params should result in the same estimator. An easy way to achieve that is to do all validation, checking and computation in fit.
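A minimal sketch of that convention with a made-up estimator (ToyEstimator is purely illustrative): store parameters verbatim in __init__, postpone validation to fit, and __init__/set_params then stay equivalent.

from sklearn.base import BaseEstimator


class ToyEstimator(BaseEstimator):
    def __init__(self, alpha=1.0):
        # store the parameter verbatim; no validation here, so __init__
        # and set_params leave the object in exactly the same state
        self.alpha = alpha

    def fit(self, X, y=None):
        # all validation is postponed to fit
        if not isinstance(self.alpha, (int, float)) or self.alpha < 0:
            raise ValueError("alpha must be a non-negative number, got %r"
                             % self.alpha)
        self.alpha_ = float(self.alpha)  # placeholder "fitted" attribute
        return self


# constructing with a parameter and setting it afterwards are equivalent
assert (ToyEstimator(alpha=2.0).get_params()
        == ToyEstimator().set_params(alpha=2.0).get_params())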

@amueller
Member

Can you please fix the pep8 error?
Also, it would be great if you can check that get_params returns the same before and after the call to set_params.

@jnothman
Member

The linter still says no:


./sklearn/utils/estimator_checks.py:1648:1: W293 blank line contains whitespace
^
./sklearn/utils/estimator_checks.py:1649:80: E501 line too long (108 > 79 characters)
    test_values = [-np.inf, np.inf, None, -100, 100, -0.5, 0.5, 0, "", "value", ('a', 'b'), {'key':'value'}]
                                                                               ^
./sklearn/utils/estimator_checks.py:1649:99: E231 missing whitespace after ':'
    test_values = [-np.inf, np.inf, None, -100, 100, -0.5, 0.5, 0, "", "value", ('a', 'b'), {'key':'value'}]
                                                                                                  ^
./sklearn/utils/estimator_checks.py:1663:71: E502 the backslash is redundant between brackets
                             "get_params does not match set_params: " \
                                                                      ^
./sklearn/utils/estimator_checks.py:1664:80: E501 line too long (155 > 79 characters)
                             "called set_params of {0} with {1}={2} but get_params returns {3}={4}".format(name, param_name, value, param_name, get_value))

    estimator.set_params(**{param_name: value})
    get_value = estimator.get_params()[param_name]
except:
    # triggered some parameter validation
Member

Wait I don't get this. Shouldn't we error on this? Or is that legal as long as we don't touch it?

Contributor Author

Sorry, I didn't realize there had been comments.

Here we don't error because we just want to check that get_value equals value when value is successfully set.

pass
else:
errmsg = ("get_params does not match set_params: "
"called set_params of {0} with {1}={2} "
Member

if you're gonna use explicit numbers you could have not passed param_name twice ;)

param_name, get_value)
assert_equal(value, get_value, errmsg)

with ignore_warnings(category=DeprecationWarning):
Member

What is this doing here?

Contributor Author

Just to re-make estimator as in lines 1637-1641.

@amueller
Member

@jnothman are we now doing "approve changes" instead of "[MRG + 1]"?

@jnothman
Member

No, I currently haven't found an easy way to say "I don't want a cross next to my review so far because the changes I requested have been made, but I haven't done a full review of the code either". Perhaps I should just dismiss reviews more often.

@amueller
Member
amueller commented Dec 1, 2016

@jnothman Can't you change it to "comment"? (I don't actually know.) I only use "comment" so far, though that is clearly also not ideal.

@agramfort
Member

@absolutelyNoWarranty you need to rebase.

@absolutelyNoWarranty force-pushed the test_set_param branch 3 times, most recently from 5a89deb to 85da7cd on June 10, 2017 17:12
@absolutelyNoWarranty
Contributor Author

Updated and rebased.

@jnothman jnothman modified the milestones: 0.19, 0.20 Jun 18, 2017
@jnothman jnothman changed the title Add set_param check and test [MRG] Add set_param check and test Dec 12, 2017
@jnothman jnothman changed the title [MRG] Add set_param check and test [MRG] Add common test for set_params behavior Dec 12, 2017
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_set_params(name, estimator_orig):
    if name in META_ESTIMATORS:
        return
Member
@jnothman Dec 12, 2017

Why does this never get executed? And why is it necessary? Surely as long as the object has been constructed, this should work fine.

try:
    estimator.set_params(**{param_name: value})
    get_value = estimator.get_params()[param_name]
except:
Member

Please use except Exception. A bare except catches KeyboardInterrupt, for instance.

Member

Or perhaps except (ValueError, TypeError)?
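For illustration only (not the PR's code), the difference being pointed out: except Exception still lets KeyboardInterrupt and SystemExit propagate, whereas a bare except swallows them too.

from sklearn.linear_model import Ridge


def try_set(estimator, param_name, value):
    # tolerate ordinary validation errors, but do not swallow
    # KeyboardInterrupt / SystemExit the way a bare 'except:' would
    try:
        estimator.set_params(**{param_name: value})
    except Exception:
        return False
    return True


print(try_set(Ridge(), "alpha", "not-a-number"))  # True here, since Ridge does not validate in set_params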

"but get_params returns {1}={3}")
errmsg = errmsg.format(name, param_name, value,
get_value)
assert_equal(value, get_value, errmsg)
Member

should we be asserting equality (==) or identity (is)?

get_value)
assert_equal(value, get_value, errmsg)

estimator = clone(estimator_orig)
Member
@jnothman Dec 12, 2017

why not put this at the beginning of the loop, rather than doing one more clone than necessary per estimator?

@jnothman
Member

The reduced time is probably because you're no longer calling clone and get_params so often also...

Member
@jnothman left a comment

Getting there, I think... But I'd like @amueller to have another go now. This feels like it's a lot of red tape.

Or perhaps we need to run it with scikit-learn-contrib estimators to check what it breaks...

pass
else:
new_params = estimator.get_params()
assert_equal(params.keys(), new_params.keys(), msg)
Member

Not assert_is?

Contributor Author

Testing for identity doesn't work for keys even if the dictionary really is the same dictionary.

>>> d = {'a':1, 'b':2}
>>> d is d
True
>>> d.keys() is d.keys()
False

Member

Sorry, I misread. But actually, I think we are not assured that this equality check will work: in py2 this value is a list, in an iteration order that should not be assumed deterministic.
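One way to make that comparison independent of dict ordering on both Python 2 and 3 (a sketch, not necessarily what the PR ended up doing):

from sklearn.linear_model import Ridge

estimator = Ridge()
params = estimator.get_params()
estimator.set_params(**params)
new_params = estimator.get_params()
# sorting the keys (or comparing sets) avoids relying on iteration order
assert sorted(params) == sorted(new_params)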

@@ -106,6 +106,7 @@
assert_greater = _dummy.assertGreater
assert_less_equal = _dummy.assertLessEqual
assert_greater_equal = _dummy.assertGreaterEqual
assert_is = _dummy.assertIs
Member

Since we no longer use nose,

assert X is Y, msg

is actually sufficient without a helper.

# before and after set_params() with some fuzz
estimator = clone(estimator_orig)

params = estimator.get_params()
Member

I think we should be using deep=False, as we will test each estimator separately ...
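For context on the deep=False suggestion (an illustrative snippet, not part of the PR's diff): with deep=True a meta-estimator's get_params also returns its sub-estimators' parameters under double-underscore names, which a per-estimator check does not need to touch.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
# top-level parameters only, e.g. 'steps' and 'memory'
print(sorted(pipe.get_params(deep=False)))
# additionally includes nested ones such as 'clf__C' and 'scale__with_mean'
print(sorted(pipe.get_params(deep=True)))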

try:
    estimator.set_params(**params)
except Exception:
    # triggered some parameter validation
Member

In the ideal world, we might not permit this validation outside of fit either. I think we should consider raising a warning of some kind where this case is over-used.

Contributor Author
@absolutelyNoWarranty Dec 19, 2017

I see your point but when is a developer expected to see the warning? As part of the (massive) output of the test suite?

Member

the output of the test-suite should not be massive. There are a lot of issues in master because we were not careful about handling warnings.

Member

+1 for raising a warning (or even an error?) here. Did this error for any of all_estimators? Where? Usually each new test in estimator_checks requires fixes to the codebase. This one here is probably the one that should have been fixed. Not having any errors on the codebase is suspicious to me.

Member

For a second I thought running this on a deep estimator would fail, but I guess we avoid this by doing deep=False. It would be really nice to run estimator_checks on some deep estimators though (I tried and ran into a whole host of problems)

Member
@jnothman left a comment

LGTM!

@jnothman
Member

Please add an entry to the change log at doc/whats_new/v0.20.rst. There is a section for changes to estimator checks. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

@jnothman
Member
jnothman commented Dec 19, 2017 via email

@amueller
Member

The idea is that developers run check_estimator on a smaller set of objects than we do here, and that they may be more diligent than we are about responding to warnings!

Always the optimist. Gonna do another round now, but maybe point out what the red tape is you're concerned about?

@@ -149,7 +149,7 @@ def __exit__(self, exc_type, exc_value, tb):


class TestCase(unittest.TestCase):
-    longMessage = False
+    longMessage = True
Member

I don't understand your comment. If it's true by default, why would someone set it to True?


assert_raises_regex(AssertionError, msg, check_estimator,
                    ModifiesValueInsteadOfRaisingError())
check_estimator(RaisesError())
assert_raises_regex(AssertionError, msg, check_estimator,
Member

can you maybe run the test on a pipeline to make sure it works on deep estimators?
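A rough sketch of what exercising the behaviour on a deep estimator could look like (illustrative; the PR's own test uses purpose-built mock estimators such as ModifiesValueInsteadOfRaisingError above):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# shallow round trip: feeding get_params(deep=False) back in should be a no-op
shallow = pipe.get_params(deep=False)
pipe.set_params(**shallow)
assert sorted(pipe.get_params(deep=False)) == sorted(shallow)

# nested parameters round-trip through the double-underscore syntax as well
pipe.set_params(clf__C=10.0)
assert pipe.get_params()["clf__C"] == 10.0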


@absolutelyNoWarranty
Contributor Author
absolutelyNoWarranty commented Dec 20, 2017

@jnothman @amueller
I just realized that what should happen to params after an Exception is currently not checked. If parameter validation raises an Exception, should the params change still go through?
Currently this happens on some estimators:

>>> from sklearn.linear_model import SGDRegressor
>>> est = SGDRegressor()
>>> est.get_params()['loss']
'squared_loss'
>>> est.set_params(loss='thisdoesnotexist')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/stochastic_gradient.py", line 78, in set_params
    self._validate_params(set_max_iter=False)
  File "sklearn/linear_model/stochastic_gradient.py", line 108, in _validate_params
    raise ValueError("The loss %s is not supported. " % self.loss)
ValueError: The loss thisdoesnotexist is not supported. 
>>> est.get_params()['loss']
'thisdoesnotexist'
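A small sketch of the extra assertion being discussed (illustrative; as the traceback above shows, the stricter expectation did not hold for SGDRegressor at the time):

from sklearn.linear_model import SGDRegressor

est = SGDRegressor()
old_loss = est.get_params()["loss"]
try:
    est.set_params(loss="thisdoesnotexist")
except Exception:
    # set_params validated and rejected the value, as in the traceback above
    pass
new_loss = est.get_params()["loss"]
# the open question: should new_loss still equal old_loss after a failed set_params?
print(old_loss, "->", new_loss)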

@jnothman
Member
jnothman commented Dec 20, 2017 via email

@@ -303,3 +303,8 @@ Changes to estimator checks
- Allow :func:`estimator_checks.check_estimator` to check that there is no
private settings apart from parameters during estimator initialization.
:issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>`

- Added :func:`estimator_checks.check_set_params` test which checks that
Member

This link won't work as we don't generate docs for estimator checks (though perhaps we should)

@@ -303,3 +303,8 @@ Changes to estimator checks
- Allow :func:`estimator_checks.check_estimator` to check that there is no
private settings apart from parameters during estimator initialization.
:issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>`

- Added :func:`estimator_checks.check_set_params` test which checks that
`set_params` is equivalent to passing parameters in `__init__` and
Member

Double backticks please.

@absolutelyNoWarranty
Contributor Author
absolutelyNoWarranty commented Dec 24, 2017

Should I squash and rebase?

@jnothman
Member
jnothman commented Dec 24, 2017 via email

@jnothman jnothman changed the title [MRG] Add common test for set_params behavior [MRG+1] Add common test for set_params behavior Mar 20, 2018
@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Add common test for set_params behavior [MRG+2] Add common test for set_params behavior Jul 16, 2018
Member
@GaelVaroquaux left a comment

LGTM.

I pushed a change to remove assert_is. Once the CI is green, I will merge.

@GaelVaroquaux GaelVaroquaux merged commit 46913ad into scikit-learn:master Jul 16, 2018
@GaelVaroquaux
Member

Merged. Thank you!


Successfully merging this pull request may close these issues.

Stronger common tests for setting init params? / check_estimator
5 participants