8000 [MRG+2] Adds helpful messages in all error assertions in estimator_checks by thechargedneutron · Pull Request #9588 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+2] Adds helpful messages in all error assertions in estimator_checks #9588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Aug 29, 2017

Conversation

thechargedneutron
Copy link
Contributor

Reference Issue

Fixes #9572

What does this implement/fix? Explain your changes.

Added a context manager for each occurrence of assert_raises.
The msg arguments have not been changed which returns nice error messages.

@thechargedneutron
Copy link
Contributor Author

Kindly review this Pull Request. Need reviews on how the nice error messages should look.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your error messages should be telling the person who wrote some new estimator what the estimator should be doing but did not.

@@ -688,7 +688,9 @@ def check_transformers_unfitted(name, transformer):
X, y = _boston_subset()

transformer = clone(transformer)
assert_raises((AttributeError, ValueError), transformer.transform, X)
with assert_raises((AttributeError, ValueError),
msg="Transformers unfitted"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Message should be more like: "The unfitted transformer {name} does not raise an error when transform is called. Perhaps use check_is_fitted"

@thechargedneutron
Copy link
Contributor Author

Why do we have two assert_raises having the same statements in estimators_ckeck.py:

# raises error on malformed input
assert_raises(ValueError, classifier.decision_function, X.T)
 # raises error on malformed input for decision_function
 assert_raises(ValueError, classifier.decision_function, X.T)

@thechargedneutron thechargedneutron changed the title [WIP] Adds helpful messages in all Assertions in estimator_checks [MRG] Adds helpful messages in all Assertions in estimator_checks Aug 27, 2017
@thechargedneutron
Copy link
Contributor Author

I am not sure about why same statement is used twice (assert_raises(ValueError, classifier.decision_function, X.T) is stated twice). I guess the rest is fine. WDYT?

@jnothman
Copy link
Member
jnothman commented Aug 27, 2017 via email

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start, thanks, but I think we can afford to be more specific and hold the hands of developers.

@@ -760,7 +763,10 @@ def _check_transformer(name, transformer_orig, X, y):
# raises error on malformed input for transform
if hasattr(X, 'T'):
# If it's not an array, it does not have a 'T' property
assert_raises(ValueError, transformer.transform, X.T)
with assert_raises(ValueError, msg="The transformer {name} does "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By malformed here do we mean "has a different number of features to when fitting"?

@@ -853,7 +859,10 @@ def check_estimators_empty_data_messages(name, estimator_orig):
X_zero_samples = np.empty(0).reshape(0, 3)
# The precise message can change depending on whether X or y is
# validated first. Let us test the type of exception only:
assert_raises(ValueError, e.fit, X_zero_samples, [])
with assert_raises(ValueError, msg="The estimators {name} does not"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimators -> estimator

@@ -988,7 +997,12 @@ def check_estimators_partial_fit_n_features(name, estimator_orig):
except NotImplementedError:
return

assert_raises(ValueError, estimator.partial_fit, X[:, :-1], y)
with assert_raises(ValueError,
msg="The estimators {name} does not raise an"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimators -> estimator

@@ -1092,7 +1106,10 @@ def check_classifiers_train(name, classifier_orig):
X -= X.min()
set_random_state(classifier)
# raises error on malformed input for fit
assert_raises(ValueError, classifier.fit, X, y[:-1])
with assert_raises(ValueError, msg="The classifers {name} does not"
" raise an error when incorrect/malformed input "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be more specific about the malformation

@@ -1092,7 +1106,10 @@ def check_classifiers_train(name, classifier_orig):
X -= X.min()
set_random_state(classifier)
# raises error on malformed input for fit
assert_raises(ValueError, classifier.fit, X, y[:-1])
with assert_raises(ValueError, msg="The classifers {name} does not"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classifiers -> classifier

@@ -1106,7 +1123,10 @@ def check_classifiers_train(name, classifier_orig):
assert_greater(accuracy_score(y, y_pred), 0.83)

# raises error on malformed input for predict
assert_raises(ValueError, classifier.predict, X.T)
with assert_raises(ValueError, msg="The classifers {name} does not"
" raise an error when incorrect/malformed input "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be more specific about the malformation

@@ -1122,11 +1142,14 @@ def check_classifiers_train(name, classifier_orig):
assert_array_equal(np.argmax(decision, axis=1), y_pred)

# raises error on malformed input
assert_raises(ValueError,
classifier.decision_function, X.T)
with assert_raises(ValueError, msg="Malformed inputs"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not addressed yet

@thechargedneutron
Copy link
Contributor Author

Addded the suggested changes.
Also, I don't think we can use some other method in place of the duplicate assert statements. We already tested for a deformed X, no need to test it for other malformed version of X. WDYT?

@jnothman jnothman changed the title [MRG] Adds helpful messages in all Assertions in estimator_checks [MRG+1] Adds helpful messages in all error assertions in estimator_checks Aug 28, 2017
@thechargedneutron
Copy link
Contributor Author

Changes done. @jnothman Thanks a lot!! :)

@@ -760,7 +763,13 @@ def _check_transformer(name, transformer_orig, X, y):
# raises error on malformed input for transform
if hasattr(X, 'T'):
# If it's not an array, it does not have a 'T' property
assert_raises(ValueError, transformer.transform, X.T)
with assert_raises(ValueError, msg="The classifer {name} does not "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*classifier

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But should be "transformer" right? "The transformer {} does not raise an error when the number of features in transform is different from the number of features in fit"?

msg="The estimator {name} does not raise an"
" error when number of features changes "
"between calls to partial_fit. Perhaps"
" use check_X_y"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does check_X_y relate to this?

" raise an error when incorrect/malformed input "
"data for fit is passed. Number of training exam"
"ples is not the same as the number of "
"labels. Perhapse use check_array"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_X_y

" raise an error when incorrect/malformed input "
" for predict is passed. Number of features in p"
"redict dataset does not match the number of fea"
"tures in fit dataset. Perhaps use check_array"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think check_array helps here.

" for predict is passed. Number of features in "
"predict dataset does not match the number of f"
"eatures in fit dataset."
" Perhaps use check_array"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think check_array helps here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be fine if I only replace this statement with the one you mentioned "The transformer {} does not raise an error when the number of features in transform is different from the number of features in fit" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure (though you want to keep the name).

classifier.decision_function, X.T)
with assert_raises(ValueError, msg="The classifer {name} does "
"not raise an error when incorrect/malforme"
"d input for decision_function is passed. N"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really try to make this a bit shorter and maybe make it one sentence. The context of the second sentence is slightly confusing to me. Again, I don't think check_array helps here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" The classifier {} does not raise an error when number of features is inconsistent "
How about this? I am not much familiar with error messages, so sorry for this :(

" raise an error when incorrect/malformed inpu"
"t for predict_proba is passed. Number of fea"
"tures in predict dataset does not match the n"
"umber of features in fit dataset. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

" raise an error when incorrect/malformed input "
"data for fit is passed. Number of training exam"
"ples is not the same as the number of "
"labels. Perhapse use check_array"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_X_y

@thechargedneutron
Copy link
Contributor Author

@amueller Changes added. Kindly suggest if any more change is required.

with assert_raises(ValueError,
msg="The estimator {name} does not raise an"
" error when number of features changes "
"between calls to partial_fit. Perhaps"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps?

with assert_raises(ValueError, msg="The classifier {name} does not"
" raise an error when the number of features "
"in predict is different from the number of"
" features in fit"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all sentences should end with full stop

@thechargedneutron
Copy link
Contributor Author

@amueller Also, we have redundant assert_raises statement like:

with assert_raises(ValueError, msg="Malformed inputs"):
                    classifier.decision_function(X.T)
# raises error on malformed input for decision_function
with assert_raises(ValueError, msg="The classifier {name} does"
                               " not raise an error when the number of fea"
                               "tures in decision_function is different "
                               "from the number of features in fit."):
               classifier.decision_function(X.T)

Should I remove the extra same statement? I don't think they serve any special purpose.

@amueller
Copy link
Member

yes. they were not redundant in master, right?

@thechargedneutron
Copy link
Contributor Author

Yes, they were. Here's the code copied from master

 # raises error on malformed input
assert_raises(ValueError, classifier.predict_proba, X.T)
# raises error on malformed input for predict_proba
assert_raises(ValueError, classifier.predict_proba, X.T)

@thechargedneutron
Copy link
Contributor Author

@amueller Hope this works!!

@amueller
Copy link
Member

looks good :)
you could add tests to test_check_estimator but maybe that's overkill.

@amueller
Copy link
Member

+1 from me but maybe @jnothman wants to have another look.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks good

@@ -688,7 +688,10 @@ def check_transformers_unfitted(name, transformer):
X, y = _boston_subset()

transformer = clone(transformer)
assert_raises((AttributeError, ValueError), transformer.transform, X)
with assert_raises((AttributeError, ValueError), msg="The unfitted "
"transformer {name} does not raise an error when "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I missed it. Does this {name} thing work??? Surely we need to format the string...

@thechargedneutron
Copy link
Contributor Author

@jnothman Changes added. Kindly review. Hope this works!!

@@ -688,7 +688,11 @@ def check_transformers_unfitted(name, transformer):
X, y = _boston_subset()

transformer = clone(transformer)
assert_raises((AttributeError, ValueError), transformer.transform, X)
with assert_raises((AttributeError, ValueError), msg="The unfitted "
"transformer {} does not raise an error when tra"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please please please, why would you break the line in the middle of words 🙃 ? Can you fix that everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to minimise the number of lines.. sorry :( .. will be done in 5 minutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lesteve Added the changes. Kindly review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that was rather creative but there is nothing to be sorry about ;-).

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments with improvements to the error messages.

with assert_raises((AttributeError, ValueError), msg="The unfitted "
"transformer {} does not raise an error when "
"transform is called. Perhaps use "
"check_is_fitted.".format(name)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more explicit: "use check_is_fitted in transform"

with assert_raises(ValueError, msg="The estimator {} does not"
" raise an error when an empty data is used "
"to train. Perhaps use "
"check_array.".format(name)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use check_array in train.

assert_raises(ValueError, estimator.partial_fit, X[:, :-1], y)
with assert_raises(ValueError,
msg="The estimator {} does not raise an"
" error when number of features changes "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the number of features

assert_raises(ValueError, classifier.fit, X, y[:-1])
with assert_raises(ValueError, msg="The classifer {} does not"
" raise an error when incorrect/malformed input "
"data for fit is passed. Number of training "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of training examples

" raise an error when incorrect/malformed input "
"data for fit is passed. Number of training "
"examples is not the same as the number of "
"labels. Perhapse use check_X_y.".format(name)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps (without the e at the end) use check_X_y in fit.

assert_raises(ValueError, regressor.fit, X, y[:-1])
with assert_raises(ValueError, msg="The classifer {} does not"
" raise an error when incorrect/malformed input "
"data for fit is passed. Number of training "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of training

" raise an error when incorrect/malformed input "
"data for fit is passed. Number of training "
"examples is not the same as the number of "
"labels. Perhapse use check_X_y.".format(name)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in Perhaps
use check_X_y in fit

@thechargedneutron
Copy link
Contributor Author

@lesteve changes added.

@lesteve
Copy link
Member
lesteve commented Aug 29, 2017

LGTM, I'll leave a few hours before merging in case @jnothman or @amueller want to have a quick look.

@lesteve lesteve changed the title [MRG+1] Adds helpful messages in all error assertions in estimator_checks [MRG+2] Adds helpful messages in all error assertions in estimator_checks Aug 29, 2017
@lesteve
Copy link
Member
lesteve commented Aug 29, 2017

Thanks a lot, merging!

@lesteve lesteve merged commit 9e606bf into scikit-learn:master Aug 29, 2017
@amueller
Copy link
Member
amueller commented Aug 29, 2017 10000

thanks!

@thechargedneutron
Copy link
Contributor Author

Thanks a lot @lesteve @amueller . Feels good to contribute with such a great mentor support. :)

@jnothman
Copy link
Member
jnothman commented Aug 29, 2017 via email

@thechargedneutron thechargedneutron deleted the fourth branch August 30, 2017 07:59
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All assertions in estimator_checks should have a helpful message
4 participants
0