MAINT Allow partial param validation for functions #25087
Conversation
use non_negative_factorization as example
Thank you for the PR!
I am okay with using regex to replace the estimator name with the function name.
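A rough sketch of that regex idea (the helper name and the exact pattern here are illustrative assumptions, not the PR's actual code):

```python
import re


def rename_estimator_in_message(message, func_name):
    # Rewrite "parameter of <Estimator> must be" so that errors raised while
    # delegating validation to the wrapped estimator mention the public
    # function instead. The pattern is a guess at the message shape.
    return re.sub(
        r"parameter of \w+ must be",
        f"parameter of {func_name} must be",
        message,
    )


# Example: an error produced while validating NMF parameters, surfaced
# through the non_negative_factorization function.
err = "The 'beta_loss' parameter of NMF must be a str among {...}."
print(rename_estimator_in_message(err, "non_negative_factorization"))
```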
sklearn/utils/_param_validation.py (outdated diff)
@@ -16,6 +17,12 @@
 from .validation import _is_arraylike_not_scalar

 class InvalidParameterError(Exception):
Changing to InvalidParameterError may break existing code that is catching ValueError. To be backward compatible, I think this needs to inherit from ValueError:

- class InvalidParameterError(Exception):
+ class InvalidParameterError(ValueError):

With this inheritance, a lot of the tests can remain the same and look for ValueError.
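A minimal sketch of why this inheritance preserves backward compatibility (the message text is made up):

```python
class InvalidParameterError(ValueError):
    """Custom validation exception that still behaves as a ValueError."""


try:
    raise InvalidParameterError("The 'n_components' parameter must be an int.")
except ValueError as e:
    # Pre-existing user code catching ValueError keeps working unchanged.
    print(f"caught: {e}")
```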
There are two aspects here:
- we introduced the parameter validation only in 1.2, so we did not break the code of people using the framework
- we already broke the code of people if we look at the equivalence between the framework and the manual checks. In imbalanced-learn we were checking for error message strings, and sometimes we switched from ValueError to TypeError.

I am wondering if inheriting from both ValueError and TypeError would be better then?
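For illustration, inheriting from both is valid in Python, since ValueError and TypeError are sibling subclasses of Exception; either except clause then catches the error (the check function below is a hypothetical stand-in for the validation framework):

```python
class InvalidParameterError(ValueError, TypeError):
    """Inherits from both, so either `except` clause catches it."""


def check(value):
    # Hypothetical validator: reject anything that is not an int.
    if not isinstance(value, int):
        raise InvalidParameterError(f"expected an int, got {type(value).__name__}")


for exc_type in (ValueError, TypeError):
    try:
        check("not an int")
    except exc_type as e:
        print(f"caught as {exc_type.__name__}: {e}")
```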
I'm OK to inherit from TypeError and ValueError for backward compat. I made this change. Then I think we need to backport this into 1.2.X.
However, I prefer to keep looking for InvalidParameterError in our test suite.
I am +1 with this change. I don't have a strong opinion regarding the type of error raised.
I left a minor comment, otherwise this PR LGTM. I agree that the exception type makes this nice to have for 1.2.X.
As we add more parameter validation to functions, we will run into the reverse situation, where the estimator validates a parameter and calls a function that validates the same parameter. This will also end up with duplicate parameter constraints between the estimator and function. (This point is not a blocker for this PR)
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py (resolved)
LGTM. Thanks @jeremiedbb
Fixes #25058

We have public functions that are just wrappers around estimators, e.g. k_means, fast_ica, non_negative_factorization, dict_learning_online, ... Adding full param validation for these functions is not satisfying because it means that we need to keep the dict of constraints in the class and in the function in sync.

Current common tests ensure that we have as many constraints as parameters. To allow partial validation, this PR proposes to split the test for public functions in two: one for normal functions and one for functions that are wrappers around estimators. For the latter, we don't enforce that the constraint dict of the function has as many entries as the number of params. Instead, we join the dict of the class and the dict of the function and make sure that at least one of them raises the appropriate error. This way, the function can delegate validation to the class for some parameters.

The issue with delegating validation to the class is that the error message mentions the name of the estimator instead of the name of the function, which can be confusing. To avoid that, we catch the raised error and replace the estimator name with the function name in the error message. To make sure that we only catch errors coming from param validation, I think it's better to change the ValueError that we used until now to a custom exception, InvalidParameterError.

This PR also implements this for non_negative_factorization to show how it works. It already had the double validation but was not in the list of tested functions.

cc/ @glemaitre @thomasjpfan @adrinjalali