-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Adds helpful messages in all error assertions in estimator_checks #9588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2b05c74
to
7046d97
Compare
Kindly review this Pull Request. Need reviews on how the nice error messages should look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your error messages should be telling the person who wrote some new estimator what the estimator should be doing but did not.
sklearn/utils/estimator_checks.py
Outdated
@@ -688,7 +688,9 @@ def check_transformers_unfitted(name, transformer): | |||
X, y = _boston_subset() | |||
|
|||
transformer = clone(transformer) | |||
assert_raises((AttributeError, ValueError), transformer.transform, X) | |||
with assert_raises((AttributeError, ValueError), | |||
msg="Transformers unfitted"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Message should be more like: "The unfitted transformer {name} does not raise an error when transform
is called. Perhaps use check_is_fitted"
Why do we have two
|
I am not sure about why same statement is used twice ( |
The duplicated assertion is probably an error. Is it possible one of those
should have been a method other than decision_function?
On 27 Aug 2017 10:30 pm, "Kumar Ashutosh" <notifications@github.com> wrote:
I am not sure about why same statement is used twice (assert_raises(ValueError,
classifier.decision_function, X.T) is stated twice). I guess the rest is
fine. WDYT?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#9588 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6xyYoRIgiAS-6Mbwl4Rc68urDaqlks5scWFygaJpZM4O8vdR>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start, thanks, but I think we can afford to be more specific and hold the hands of developers.
sklearn/utils/estimator_checks.py
Outdated
@@ -760,7 +763,10 @@ def _check_transformer(name, transformer_orig, X, y): | |||
# raises error on malformed input for transform | |||
if hasattr(X, 'T'): | |||
# If it's not an array, it does not have a 'T' property | |||
assert_raises(ValueError, transformer.transform, X.T) | |||
with assert_raises(ValueError, msg="The transformer {name} does " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By malformed here do we mean "has a different number of features to when fitting"?
sklearn/utils/estimator_checks.py
Outdated
@@ -853,7 +859,10 @@ def check_estimators_empty_data_messages(name, estimator_orig): | |||
X_zero_samples = np.empty(0).reshape(0, 3) | |||
# The precise message can change depending on whether X or y is | |||
# validated first. Let us test the type of exception only: | |||
assert_raises(ValueError, e.fit, X_zero_samples, []) | |||
with assert_raises(ValueError, msg="The estimators {name} does not" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimators -> estimator
sklearn/utils/estimator_checks.py
Outdated
@@ -988,7 +997,12 @@ def check_estimators_partial_fit_n_features(name, estimator_orig): | |||
except NotImplementedError: | |||
return | |||
|
|||
assert_raises(ValueError, estimator.partial_fit, X[:, :-1], y) | |||
with assert_raises(ValueError, | |||
msg="The estimators {name} does not raise an" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimators -> estimator
sklearn/utils/estimator_checks.py
Outdated
@@ -1092,7 +1106,10 @@ def check_classifiers_train(name, classifier_orig): | |||
X -= X.min() | |||
set_random_state(classifier) | |||
# raises error on malformed input for fit | |||
assert_raises(ValueError, classifier.fit, X, y[:-1]) | |||
with assert_raises(ValueError, msg="The classifers {name} does not" | |||
" raise an error when incorrect/malformed input " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be more specific about the malformation
sklearn/utils/estimator_checks.py
Outdated
@@ -1092,7 +1106,10 @@ def check_classifiers_train(name, classifier_orig): | |||
X -= X.min() | |||
set_random_state(classifier) | |||
# raises error on malformed input for fit | |||
assert_raises(ValueError, classifier.fit, X, y[:-1]) | |||
with assert_raises(ValueError, msg="The classifers {name} does not" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classifiers -> classifier
sklearn/utils/estimator_checks.py
Outdated
@@ -1106,7 +1123,10 @@ def check_classifiers_train(name, classifier_orig): | |||
assert_greater(accuracy_score(y, y_pred), 0.83) | |||
|
|||
# raises error on malformed input for predict | |||
assert_raises(ValueError, classifier.predict, X.T) | |||
with assert_raises(ValueError, msg="The classifers {name} does not" | |||
" raise an error when incorrect/malformed input " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be more specific about the malformation
sklearn/utils/estimator_checks.py
Outdated
@@ -1122,11 +1142,14 @@ def check_classifiers_train(name, classifier_orig): | |||
assert_array_equal(np.argmax(decision, axis=1), y_pred) | |||
|
|||
# raises error on malformed input | |||
assert_raises(ValueError, | |||
classifier.decision_function, X.T) | |||
with assert_raises(ValueError, msg="Malformed inputs"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not addressed yet
Addded the suggested changes. |
Changes done. @jnothman Thanks a lot!! :) |
sklearn/utils/estimator_checks.py
Outdated
@@ -760,7 +763,13 @@ def _check_transformer(name, transformer_orig, X, y): | |||
# raises error on malformed input for transform | |||
if hasattr(X, 'T'): | |||
# If it's not an array, it does not have a 'T' property | |||
assert_raises(ValueError, transformer.transform, X.T) | |||
with assert_raises(ValueError, msg="The classifer {name} does not " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*classifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But should be "transformer" right? "The transformer {} does not raise an error when the number of features in transform
is different from the number of features in fit
"?
sklearn/utils/estimator_checks.py
Outdated
msg="The estimator {name} does not raise an" | ||
" error when number of features changes " | ||
"between calls to partial_fit. Perhaps" | ||
" use check_X_y"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does check_X_y
relate to this?
sklearn/utils/estimator_checks.py
Outdated
" raise an error when incorrect/malformed input " | ||
"data for fit is passed. Number of training exam" | ||
"ples is not the same as the number of " | ||
"labels. Perhapse use check_array"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_X_y
sklearn/utils/estimator_checks.py
Outdated
" raise an error when incorrect/malformed input " | ||
" for predict is passed. Number of features in p" | ||
"redict dataset does not match the number of fea" | ||
"tures in fit dataset. Perhaps use check_array"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think check_array
helps here.
sklearn/utils/estimator_checks.py
Outdated
" for predict is passed. Number of features in " | ||
"predict dataset does not match the number of f" | ||
"eatures in fit dataset." | ||
" Perhaps use check_array"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think check_array
helps here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be fine if I only replace this statement with the one you mentioned "The transformer {} does not raise an error when the number of features in transform is different from the number of features in fit" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure (though you want to keep the name).
sklearn/utils/estimator_checks.py
Outdated
classifier.decision_function, X.T) | ||
with assert_raises(ValueError, msg="The classifer {name} does " | ||
"not raise an error when incorrect/malforme" | ||
"d input for decision_function is passed. N" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really try to make this a bit shorter and maybe make it one sentence. The context of the second sentence is slightly confusing to me. Again, I don't think check_array helps here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" The classifier {} does not raise an error when number of features is inconsistent "
How about this? I am not much familiar with error messages, so sorry for this :(
sklearn/utils/estimator_checks.py
Outdated
" raise an error when incorrect/malformed inpu" | ||
"t for predict_proba is passed. Number of fea" | ||
"tures in predict dataset does not match the n" | ||
"umber of features in fit dataset. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
sklearn/utils/estimator_checks.py
Outdated
" raise an error when incorrect/malformed input " | ||
"data for fit is passed. Number of training exam" | ||
"ples is not the same as the number of " | ||
"labels. Perhapse use check_array"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_X_y
@amueller Changes added. Kindly suggest if any more change is required. |
sklearn/utils/estimator_checks.py
Outdated
with assert_raises(ValueError, | ||
msg="The estimator {name} does not raise an" | ||
" error when number of features changes " | ||
"between calls to partial_fit. Perhaps"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps?
sklearn/utils/estimator_checks.py
Outdated
with assert_raises(ValueError, msg="The classifier {name} does not" | ||
" raise an error when the number of features " | ||
"in predict is different from the number of" | ||
" features in fit"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all sentences should end with full stop
@amueller Also, we have redundant assert_raises statement like:
Should I remove the extra same statement? I don't think they serve any special purpose. |
yes. they were not redundant in master, right? |
Yes, they were. Here's the code copied from master
|
@amueller Hope this works!! |
looks good :) |
+1 from me but maybe @jnothman wants to have another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise looks good
sklearn/utils/estimator_checks.py
Outdated
@@ -688,7 +688,10 @@ def check_transformers_unfitted(name, transformer): | |||
X, y = _boston_subset() | |||
|
|||
transformer = clone(transformer) | |||
assert_raises((AttributeError, ValueError), transformer.transform, X) | |||
with assert_raises((AttributeError, ValueError), msg="The unfitted " | |||
"transformer {name} does not raise an error when " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I missed it. Does this {name} thing work??? Surely we need to format the string...
@jnothman Changes added. Kindly review. Hope this works!! |
sklearn/utils/estimator_checks.py
Outdated
@@ -688,7 +688,11 @@ def check_transformers_unfitted(name, transformer): | |||
X, y = _boston_subset() | |||
|
|||
transformer = clone(transformer) | |||
assert_raises((AttributeError, ValueError), transformer.transform, X) | |||
with assert_raises((AttributeError, ValueError), msg="The unfitted " | |||
"transformer {} does not raise an error when tra" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please please please, why would you break the line in the middle of words 🙃 ? Can you fix that everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to minimise the number of lines.. sorry :( .. will be done in 5 minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lesteve Added the changes. Kindly review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that was rather creative but there is nothing to be sorry about ;-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments with improvements to the error messages.
sklearn/utils/estimator_checks.py
Outdated
with assert_raises((AttributeError, ValueError), msg="The unfitted " | ||
"transformer {} does not raise an error when " | ||
"transform is called. Perhaps use " | ||
"check_is_fitted.".format(name)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more explicit: "use check_is_fitted in transform"
sklearn/utils/estimator_checks.py
Outdated
with assert_raises(ValueError, msg="The estimator {} does not" | ||
" raise an error when an empty data is used " | ||
"to train. Perhaps use " | ||
"check_array.".format(name)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use check_array in train.
sklearn/utils/estimator_checks.py
Outdated
assert_raises(ValueError, estimator.partial_fit, X[:, :-1], y) | ||
with assert_raises(ValueError, | ||
msg="The estimator {} does not raise an" | ||
" error when number of features changes " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the number of features
sklearn/utils/estimator_checks.py
Outdated
assert_raises(ValueError, classifier.fit, X, y[:-1]) | ||
with assert_raises(ValueError, msg="The classifer {} does not" | ||
" raise an error when incorrect/malformed input " | ||
"data for fit is passed. Number of training " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of training examples
sklearn/utils/estimator_checks.py
Outdated
" raise an error when incorrect/malformed input " | ||
"data for fit is passed. Number of training " | ||
"examples is not the same as the number of " | ||
"labels. Perhapse use check_X_y.".format(name)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps (without the e at the end) use check_X_y in fit.
sklearn/utils/estimator_checks.py
Outdated
assert_raises(ValueError, regressor.fit, X, y[:-1]) | ||
with assert_raises(ValueError, msg="The classifer {} does not" | ||
" raise an error when incorrect/malformed input " | ||
"data for fit is passed. Number of training " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of training
sklearn/utils/estimator_checks.py
Outdated
" raise an error when incorrect/malformed input " | ||
"data for fit is passed. Number of training " | ||
"examples is not the same as the number of " | ||
"labels. Perhapse use check_X_y.".format(name)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in Perhaps
use check_X_y in fit
@lesteve changes added. |
Thanks a lot, merging! |
thanks! |
You wanted me to have a look between 2:30am and 6:30am?? :P
All good! Thanks, @thechargedneutron
…On 30 August 2017 at 07:04, Kumar Ashutosh ***@***.***> wrote:
Thanks a lot @lesteve <https://github.com/lesteve> @amueller
<https://github.com/amueller> . Feels good to contribute with such a
great mentor support. :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9588 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz69wD6EBUC53v3DaP-uCOStIKXFRpks5sdHzugaJpZM4O8vdR>
.
|
Reference Issue
Fixes #9572
What does this implement/fix? Explain your changes.
Added a context manager for each occurrence of
assert_raises
.The
msg
arguments have not been changed which returns nice error messages.