[MRG+2] Adds helpful messages in all error assertions in estimator_checks #9588

thechargedneutron · 2017-08-20T21:54:59Z

Reference Issue

What does this implement/fix? Explain your changes.

Converted each occurrence of assert_raises to its context-manager form.
The msg arguments have not yet been changed to return nice error messages.
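For readers unfamiliar with the pattern, here is a minimal sketch of the context-manager form, assuming assert_raises behaves like unittest's assertRaises. NoCheckTransformer and check_unfitted are hypothetical names used only for illustration; they are not part of estimator_checks.py.

```python
import unittest

# Borrow assertRaises from a throwaway TestCase; this is a stand-in for the
# assert_raises helper used in estimator_checks.py (an assumption).
assert_raises = unittest.TestCase().assertRaises


class NoCheckTransformer:
    """Hypothetical transformer that forgets to check it was fitted."""

    def transform(self, X):
        return X


def check_unfitted(name, transformer):
    # The msg keyword is only accepted in the context-manager form.
    with assert_raises((AttributeError, ValueError),
                       msg="The unfitted transformer {} does not raise an "
                           "error when transform is called.".format(name)):
        transformer.transform([[1.0, 2.0]])


failure_text = None
try:
    check_unfitted("NoCheckTransformer", NoCheckTransformer())
except AssertionError as exc:
    # The custom msg is appended to the standard "not raised" text.
    failure_text = str(exc)
    print(failure_text)
```

When the wrapped call fails to raise, the AssertionError text carries the msg, which is what lets the check explain what the estimator should have done.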

thechargedneutron · 2017-08-21T18:02:52Z

Kindly review this Pull Request. Need reviews on how the nice error messages should look.

jnothman

Your error messages should be telling the person who wrote some new estimator what the estimator should be doing but did not.

jnothman · 2017-08-21T22:54:45Z

sklearn/utils/estimator_checks.py

@@ -688,7 +688,9 @@ def check_transformers_unfitted(name, transformer):
    X, y = _boston_subset()

    transformer = clone(transformer)
-    assert_raises((AttributeError, ValueError), transformer.transform, X)
+    with assert_raises((AttributeError, ValueError),
+                       msg="Transformers unfitted"):


Message should be more like: "The unfitted transformer {name} does not raise an error when transform is called. Perhaps use check_is_fitted"
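A rough sketch of the kind of guard this message points towards, written without scikit-learn so it runs on its own. Scaler and the local NotFittedError are hypothetical stand-ins; the exception inherits from both ValueError and AttributeError so that either branch of the assertion is satisfied, which, as far as I know, mirrors scikit-learn's own NotFittedError.

```python
class NotFittedError(ValueError, AttributeError):
    """Local stand-in for an 'estimator not fitted' exception."""


class Scaler:
    """Hypothetical transformer used only for illustration."""

    def fit(self, X):
        # Largest absolute value in X; fall back to 1.0 for all-zero data.
        self.scale_ = max(abs(v) for row in X for v in row) or 1.0
        return self

    def transform(self, X):
        # Guard: refuse to transform before fit has set scale_.
        if not hasattr(self, "scale_"):
            raise NotFittedError(
                "This Scaler instance is not fitted yet. Call fit first.")
        return [[v / self.scale_ for v in row] for row in X]
```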

thechargedneutron · 2017-08-24T14:25:05Z

Why do we have two identical assert_raises statements in estimator_checks.py?

# raises error on malformed input
assert_raises(ValueError, classifier.decision_function, X.T)
# raises error on malformed input for decision_function
assert_raises(ValueError, classifier.decision_function, X.T)

thechargedneutron · 2017-08-27T12:30:39Z

I am not sure why the same statement is used twice (assert_raises(ValueError, classifier.decision_function, X.T) appears twice). I guess the rest is fine. WDYT?

jnothman · 2017-08-27T22:34:58Z

The duplicated assertion is probably an error. Is it possible one of those should have been a method other than decision_function?

jnothman

This is a great start, thanks, but I think we can afford to be more specific and hold the hands of developers.

jnothman · 2017-08-27T22:38:08Z

sklearn/utils/estimator_checks.py

@@ -760,7 +763,10 @@ def _check_transformer(name, transformer_orig, X, y):
        # raises error on malformed input for transform
        if hasattr(X, 'T'):
            # If it's not an array, it does not have a 'T' property
-            assert_raises(ValueError, transformer.transform, X.T)
+            with assert_raises(ValueError, msg="The transformer {name} does "


By malformed here do we mean "has a different number of features to when fitting"?
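To make the "different number of features" reading concrete, here is a small pure-Python sketch. MeanCenterer and transpose are hypothetical helpers, and nested lists stand in for arrays; transposing a 2x3 dataset yields 3 samples with 2 features, which is the mismatch that X.T produces in the check.

```python
class MeanCenterer:
    """Hypothetical transformer that validates the feature count."""

    def fit(self, X):
        self.n_features_ = len(X[0])
        self.means_ = [sum(col) / len(X) for col in zip(*X)]
        return self

    def transform(self, X):
        # Reject data whose width differs from what was seen in fit.
        if any(len(row) != self.n_features_ for row in X):
            raise ValueError(
                "X has a different number of features than during fit "
                "(expected {}).".format(self.n_features_))
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


def transpose(X):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 samples, 3 features
centerer = MeanCenterer().fit(X)
X_T = transpose(X)                      # 3 samples, 2 features
```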

jnothman · 2017-08-27T22:38:55Z

sklearn/utils/estimator_checks.py

@@ -853,7 +859,10 @@ def check_estimators_empty_data_messages(name, estimator_orig):
    X_zero_samples = np.empty(0).reshape(0, 3)
    # The precise message can change depending on whether X or y is
    # validated first. Let us test the type of exception only:
-    assert_raises(ValueError, e.fit, X_zero_samples, [])
+    with assert_raises(ValueError, msg="The estimators {name} does not"


Estimators -> estimator

jnothman · 2017-08-27T22:39:17Z

sklearn/utils/estimator_checks.py

@@ -988,7 +997,12 @@ def check_estimators_partial_fit_n_features(name, estimator_orig):
    except NotImplementedError:
        return

-    assert_raises(ValueError, estimator.partial_fit, X[:, :-1], y)
+    with assert_raises(ValueError,
+                       msg="The estimators {name} does not raise an"


Estimators -> estimator

jnothman · 2017-08-27T22:39:45Z

sklearn/utils/estimator_checks.py

@@ -1092,7 +1106,10 @@ def check_classifiers_train(name, classifier_orig):
            X -= X.min()
        set_random_state(classifier)
        # raises error on malformed input for fit
-        assert_raises(ValueError, classifier.fit, X, y[:-1])
+        with assert_raises(ValueError, msg="The classifers {name} does not"
+                           " raise an error when incorrect/malformed input "


Be more specific about the malformation

jnothman · 2017-08-27T22:40:05Z

sklearn/utils/estimator_checks.py

@@ -1092,7 +1106,10 @@ def check_classifiers_train(name, classifier_orig):
            X -= X.min()
        set_random_state(classifier)
        # raises error on malformed input for fit
-        assert_raises(ValueError, classifier.fit, X, y[:-1])
+        with assert_raises(ValueError, msg="The classifers {name} does not"


Classifiers -> classifier

jnothman · 2017-08-27T22:40:43Z

sklearn/utils/estimator_checks.py

@@ -1106,7 +1123,10 @@ def check_classifiers_train(name, classifier_orig):
            assert_greater(accuracy_score(y, y_pred), 0.83)

        # raises error on malformed input for predict
-        assert_raises(ValueError, classifier.predict, X.T)
+        with assert_raises(ValueError, msg="The classifers {name} does not"
+                           " raise an error when incorrect/malformed input "


Be more specific about the malformation

jnothman · 2017-08-27T22:40:52Z

sklearn/utils/estimator_checks.py

@@ -1122,11 +1142,14 @@ def check_classifiers_train(name, classifier_orig):
                    assert_array_equal(np.argmax(decision, axis=1), y_pred)

                # raises error on malformed input
-                assert_raises(ValueError,
-                              classifier.decision_function, X.T)
+                with assert_raises(ValueError, msg="Malformed inputs"):


not addressed yet

thechargedneutron · 2017-08-28T11:59:26Z

Added the suggested changes.
Also, I don't think we need to use some other method in place of the duplicated assert statement. We already test a malformed X, so there is no need to test another malformed version of X. WDYT?

thechargedneutron · 2017-08-28T13:00:24Z

Changes done. @jnothman Thanks a lot!! :)

amueller · 2017-08-28T20:16:43Z

sklearn/utils/estimator_checks.py

@@ -760,7 +763,13 @@ def _check_transformer(name, transformer_orig, X, y):
        # raises error on malformed input for transform
        if hasattr(X, 'T'):
            # If it's not an array, it does not have a 'T' property
-            assert_raises(ValueError, transformer.transform, X.T)
+            with assert_raises(ValueError, msg="The classifer {name} does not "


*classifier

But should be "transformer" right? "The transformer {} does not raise an error when the number of features in transform is different from the number of features in fit"?

amueller · 2017-08-28T20:19:12Z

sklearn/utils/estimator_checks.py

+                       msg="The estimator {name} does not raise an"
+                           " error when number of features changes "
+                           "between calls to partial_fit. Perhaps"
+                           " use check_X_y"):


How does check_X_y relate to this?
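One way to see the concern: a validator that only inspects the X and y of the current call cannot know what earlier calls looked like, so the estimator has to remember the feature count itself. A minimal sketch under that assumption, with RunningMean as a hypothetical incremental estimator:

```python
class RunningMean:
    """Hypothetical incremental estimator used only for illustration."""

    def partial_fit(self, X, y=None):
        n_features = len(X[0])
        if not hasattr(self, "n_features_"):
            # First call: remember the width of the data.
            self.n_features_ = n_features
            self.sums_ = [0.0] * n_features
            self.n_samples_ = 0
        elif n_features != self.n_features_:
            # Later calls: the width must match what was seen before.
            raise ValueError(
                "Number of features {} does not match previous data "
                "({}).".format(n_features, self.n_features_))
        for row in X:
            self.sums_ = [s + v for s, v in zip(self.sums_, row)]
        self.n_samples_ += len(X)
        return self
```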

amueller · 2017-08-28T20:19:27Z

sklearn/utils/estimator_checks.py

+                           " raise an error when incorrect/malformed input "
+                           "data for fit is passed. Number of training exam"
+                           "ples is not the same as the number of "
+                           "labels. Perhapse use check_array"):


check_X_y

amueller · 2017-08-28T20:20:00Z

sklearn/utils/estimator_checks.py

+                           " raise an error when incorrect/malformed input "
+                           " for predict is passed. Number of features in p"
+                           "redict dataset does not match the number of fea"
+                           "tures in fit dataset. Perhaps use check_array"):


I don't think check_array helps here.

amueller · 2017-08-28T20:20:19Z

sklearn/utils/estimator_checks.py

+                               " for predict is passed. Number of features in "
+                               "predict dataset does not match the number of f"
+                               "eatures in fit dataset."
+                               " Perhaps use check_array"):


I don't think check_array helps here.

Would it be fine if I only replace this statement with the one you mentioned "The transformer {} does not raise an error when the number of features in transform is different from the number of features in fit" ?

sure (though you want to keep the name).

amueller · 2017-08-28T20:21:40Z

sklearn/utils/estimator_checks.py

-                              classifier.decision_function, X.T)
+                with assert_raises(ValueError, msg="The classifer {name} does "
+                                   "not raise an error when incorrect/malforme"
+                                   "d input for decision_function is passed. N"


I would really try to make this a bit shorter and maybe make it one sentence. The context of the second sentence is slightly confusing to me. Again, I don't think check_array helps here.

" The classifier {} does not raise an error when number of features is inconsistent "
How about this? I am not very familiar with error messages, so sorry for this :(

amueller · 2017-08-28T20:21:53Z

sklearn/utils/estimator_checks.py

+                               " raise an error when incorrect/malformed inpu"
+                               "t  for predict_proba is passed. Number of fea"
+                               "tures in predict dataset does not match the n"
+                               "umber of features in fit dataset. "


amueller · 2017-08-28T20:22:03Z

sklearn/utils/estimator_checks.py

+                       " raise an error when incorrect/malformed input "
+                       "data for fit is passed. Number of training exam"
+                       "ples is not the same as the number of "
+                       "labels. Perhapse use check_array"):


thechargedneutron · 2017-08-28T21:05:32Z

@amueller Changes added. Kindly suggest if any more change is required.

amueller · 2017-08-28T21:08:59Z

sklearn/utils/estimator_checks.py

+    with assert_raises(ValueError,
+                       msg="The estimator {name} does not raise an"
+                           " error when number of features changes "
+                           "between calls to partial_fit. Perhaps"):


amueller · 2017-08-28T21:09:15Z

sklearn/utils/estimator_checks.py

+        with assert_raises(ValueError, msg="The classifier {name} does not"
+                           " raise an error when the number of features "
+                           "in predict is different from the number of"
+                           " features in fit"):


All sentences should end with a full stop.

thechargedneutron · 2017-08-28T21:14:56Z

@amueller Also, we have redundant assert_raises statements like:

with assert_raises(ValueError, msg="Malformed inputs"):
    classifier.decision_function(X.T)
# raises error on malformed input for decision_function
with assert_raises(ValueError, msg="The classifier {name} does"
                   " not raise an error when the number of fea"
                   "tures in decision_function is different "
                   "from the number of features in fit."):
    classifier.decision_function(X.T)

Should I remove the duplicate statement? I don't think it serves any special purpose.

amueller · 2017-08-28T21:16:13Z

yes. they were not redundant in master, right?

thechargedneutron · 2017-08-28T21:18:24Z

Yes, they were. Here's the code copied from master

# raises error on malformed input
assert_raises(ValueError, classifier.predict_proba, X.T)
# raises error on malformed input for predict_proba
assert_raises(ValueError, classifier.predict_proba, X.T)

thechargedneutron · 2017-08-28T21:22:43Z

@amueller Hope this works!!

amueller · 2017-08-28T21:40:28Z

looks good :)
you could add tests to test_check_estimator but maybe that's overkill.

amueller · 2017-08-28T21:40:51Z

+1 from me but maybe @jnothman wants to have another look.

jnothman

Otherwise looks good

jnothman · 2017-08-29T00:01:41Z

sklearn/utils/estimator_checks.py

@@ -688,7 +688,10 @@ def check_transformers_unfitted(name, transformer):
    X, y = _boston_subset()

    transformer = clone(transformer)
-    assert_raises((AttributeError, ValueError), transformer.transform, X)
+    with assert_raises((AttributeError, ValueError), msg="The unfitted "
+                       "transformer {name} does not raise an error when "


Perhaps I missed it. Does this {name} thing work??? Surely we need to format the string...
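A quick illustration of the point: a plain string literal keeps {name} verbatim until str.format is called on it. The estimator name below is a made-up value.

```python
name = "MyTransformer"  # hypothetical estimator name

# A plain literal keeps the placeholder verbatim.
unformatted = "The unfitted transformer {name} does not raise an error."

# Named and positional placeholders, both filled in with str.format.
by_keyword = unformatted.format(name=name)
by_position = "The unfitted transformer {} does not raise an error.".format(name)

print(unformatted)
print(by_keyword)
```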

thechargedneutron · 2017-08-29T08:54:01Z

@jnothman Changes added. Kindly review. Hope this works!!

lesteve · 2017-08-29T14:48:45Z

sklearn/utils/estimator_checks.py

@@ -688,7 +688,11 @@ def check_transformers_unfitted(name, transformer):
    X, y = _boston_subset()

    transformer = clone(transformer)
-    assert_raises((AttributeError, ValueError), transformer.transform, X)
+    with assert_raises((AttributeError, ValueError), msg="The unfitted "
+                       "transformer {} does not raise an error when tra"


Please please please, why would you break the line in the middle of words 🙃 ? Can you fix that everywhere?

In order to minimise the number of lines.. sorry :( .. will be done in 5 minutes.

@lesteve Added the changes. Kindly review.

Well that was rather creative but there is nothing to be sorry about ;-).
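For reference, adjacent string literals are joined before .format is applied, so breaking at a space gives the same message as breaking mid-word while staying readable and searchable. A small sketch with a made-up name:

```python
name = "MyTransformer"  # hypothetical estimator name

# Broken in the middle of "transform": valid, but hard to read or grep.
mid_word = ("The unfitted transformer {} does not raise an error when tra"
            "nsform is called.".format(name))

# Broken at a space: same resulting string, whole words on each line.
at_spaces = ("The unfitted transformer {} does not raise an error when "
             "transform is called.".format(name))
```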

lesteve

Some minor comments with improvements to the error messages.

lesteve · 2017-08-29T15:05:12Z

sklearn/utils/estimator_checks.py

+    with assert_raises((AttributeError, ValueError), msg="The unfitted "
+                       "transformer {} does not raise an error when "
+                       "transform is called. Perhaps use "
+                       "check_is_fitted.".format(name)):


I would be more explicit: "use check_is_fitted in transform"

lesteve · 2017-08-29T15:05:58Z

sklearn/utils/estimator_checks.py

+    with assert_raises(ValueError, msg="The estimator {} does not"
+                       " raise an error when an empty data is used "
+                       "to train. Perhaps use "
+                       "check_array.".format(name)):


use check_array in train.

lesteve · 2017-08-29T15:06:31Z

sklearn/utils/estimator_checks.py

-    assert_raises(ValueError, estimator.partial_fit, X[:, :-1], y)
+    with assert_raises(ValueError,
+                       msg="The estimator {} does not raise an"
+                           " error when number of features changes "


when the number of features

lesteve · 2017-08-29T15:07:04Z

sklearn/utils/estimator_checks.py

-        assert_raises(ValueError, classifier.fit, X, y[:-1])
+        with assert_raises(ValueError, msg="The classifer {} does not"
+                           " raise an error when incorrect/malformed input "
+                           "data for fit is passed. Number of training "


The number of training examples

lesteve · 2017-08-29T15:07:37Z

sklearn/utils/estimator_checks.py

+                           " raise an error when incorrect/malformed input "
+                           "data for fit is passed. Number of training "
+                           "examples is not the same as the number of "
+                           "labels. Perhapse use check_X_y.".format(name)):


Perhaps (without the e at the end) use check_X_y in fit.
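The situation this message describes, fit called with X and y[:-1], can be sketched without scikit-learn as follows. _require_same_length and MajorityClassifier are hypothetical names; the helper is only a stand-in for the sample-count validation the message recommends, not scikit-learn's implementation.

```python
def _require_same_length(X, y):
    """Hypothetical stand-in for a sample-count consistency check."""
    if len(X) != len(y):
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: "
            "{} and {}.".format(len(X), len(y)))


class MajorityClassifier:
    """Hypothetical classifier that predicts the most frequent label."""

    def fit(self, X, y):
        # Validate before learning anything from the data.
        _require_same_length(X, y)
        self.majority_ = max(set(y), key=list(y).count)
        return self


X = [[0.0], [1.0], [2.0]]
y = [0, 1, 1]
```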

lesteve · 2017-08-29T15:08:10Z

sklearn/utils/estimator_checks.py

-    assert_raises(ValueError, regressor.fit, X, y[:-1])
+    with assert_raises(ValueError, msg="The classifer {} does not"
+                       " raise an error when incorrect/malformed input "
+                       "data for fit is passed. Number of training "


The number of training

lesteve · 2017-08-29T15:08:35Z

sklearn/utils/estimator_checks.py

+                       " raise an error when incorrect/malformed input "
+                       "data for fit is passed. Number of training "
+                       "examples is not the same as the number of "
+                       "labels. Perhapse use check_X_y.".format(name)):


Typo in Perhaps
use check_X_y in fit

thechargedneutron · 2017-08-29T15:39:49Z

@lesteve changes added.

lesteve · 2017-08-29T16:38:56Z

LGTM, I'll leave a few hours before merging in case @jnothman or @amueller want to have a quick look.

lesteve · 2017-08-29T20:40:31Z

Thanks a lot, merging!

amueller · 2017-08-29T20:57:21Z

thanks!

thechargedneutron · 2017-08-29T21:04:44Z

Thanks a lot @lesteve @amueller. Feels good to contribute with such great mentor support. :)

jnothman · 2017-08-29T23:53:11Z

You wanted me to have a look between 2:30am and 6:30am?? :P All good! Thanks, @thechargedneutron

thechargedneutron added 2 commits August 21, 2017 04:12

Changes added

1847a04

pep8 errors removed

7046d97

thechargedneutron force-pushed the fourth branch from 2b05c74 to 7046d97 Compare August 20, 2017 22:53

jnothman reviewed Aug 21, 2017

thechargedneutron added 2 commits August 24, 2017 11:20

msg statements to be improved

1115680

msg added

439310c

thechargedneutron added 3 commits August 25, 2017 16:31

helpful messages added

c6025e7

Chanes to estimator_check added

8929748

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

548fb81

…into fourth

thechargedneutron changed the title ~~[WIP] Adds helpful messages in all Assertions in estimator_checks~~ [MRG] Adds helpful messages in all Assertions in estimator_checks Aug 27, 2017

jnothman reviewed Aug 27, 2017

thechargedneutron added 2 commits August 28, 2017 13:23

changes added

402ce7f

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

fb12e85

…into fourth

jnothman changed the title ~~[MRG] Adds helpful messages in all Assertions in estimator_checks~~ [MRG+1] Adds helpful messages in all error assertions in estimator_checks Aug 28, 2017

added helpful messages

411e3db

amueller reviewed Aug 28, 2017

thechargedneutron added 3 commits August 29, 2017 02:30

adds helpful messages

465535b

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

def51c3

…into fourth

pep8 errors removed

b19dd4c

amueller reviewed Aug 28, 2017

Redundant code removed

4a70926

jnothman reviewed Aug 29, 2017

thechargedneutron added 2 commits August 29, 2017 14:19

changes added

b50a81b

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

c286c8a

…into fourth

lesteve reviewed Aug 29, 2017

message statement changed

f40b6aa

lesteve reviewed Aug 29, 2017

typos removed

93fcdbd

lesteve changed the title ~~[MRG+1] Adds helpful messages in all error assertions in estimator_checks~~ [MRG+2] Adds helpful messages in all error assertions in estimator_checks Aug 29, 2017

lesteve merged commit 9e606bf into scikit-learn:master Aug 29, 2017

thechargedneutron deleted the fourth branch August 30, 2017 07:59

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+2] Adds helpful messages in all error assertions in estimator_ch…

ed98ca3

…ecks (scikit-learn#9588)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+2] Adds helpful messages in all error assertions in estimator_ch…

0b58c83

…ecks (scikit-learn#9588)

rth mentioned this pull request Aug 3, 2018

[MRG+1] DOC Cleaning up what's new for 0.20 #11734

Merged

4 tasks
