Init parameter check of estimator_checks overly restrictive #17756
Comments
The way to go would be:

```python
from keras.optimizers import SGD
from sklearn.base import BaseEstimator, clone
from sklearn.utils.estimator_checks import check_parameters_default_constructible

class MyEstimator(BaseEstimator):
    def __init__(self, optimizer=None, **kwargs):
        self.optimizer = optimizer

    def fit(self, X, y):
        if self.optimizer is None:
            self.optimizer_ = SGD()
        else:
            # or make a deep copy to not change the passed optimizer
            self.optimizer_ = clone(self.optimizer)
        return self

check_parameters_default_constructible("default_constructible", MyEstimator)
```
That would be a workaround for the test, but why would that be better? It is less readable (implicit …
I agree that the validation list seems a bit arbitrary. Currently it would pass with int but fail with np.int32, and pass with np.float64 but fail with np.float32. Maybe we could check for …
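A relaxed check along those lines could accept any numeric scalar via the `numbers` ABCs (which NumPy scalar types register with) instead of an explicit allow-list. This is only a sketch; the function name and exact rules here are hypothetical, not the sklearn implementation:

```python
# Hypothetical sketch of a relaxed default-value check: accept None, strings,
# bools, uninitialized classes, and any Python or NumPy numeric scalar.
import numbers
import numpy as np

def is_acceptable_default(value):
    """Return True if `value` looks like a safe __init__ default (sketch)."""
    if value is None or isinstance(value, (str, bool)):
        return True
    if isinstance(value, type):            # uninitialized classes are immutable
        return True
    if isinstance(value, numbers.Number):  # covers int, np.int32, np.float32, ...
        return True
    return False

# int and np.int32 are now treated the same:
print(is_acceptable_default(3), is_acceptable_default(np.int32(3)))
```

With this, the int/np.int32 asymmetry above disappears, while mutable defaults such as lists or initialized objects are still rejected.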
I have already made some changes locally to produce better error messages and make the test failures a bit more actionable and explanatory. Should I add those relaxations to it and make a PR?
Let's wait for one more opinion to confirm (@NicolasHug?) (maybe I'm forgetting some use case), and then a PR would be very welcome.
This part of the check feels very restrictive. I believe the point is to make sure that the estimator can be constructed completely based on the defaults. This feels like it can get more complicated if the parameters start to depend on each other. For example, in the original post:

my_estimator = MyEstimator(optimizer=MySGD, optimizer__lr=1e-5)

What if …
It's also to make sure all parameters are immutable, which is why it makes sense to forbid arbitrary objects such as an initialized …
Why does that matter? As long as …
In this context of optimizers, there are optimizers that require some parameters during construction:

MyEstimator(optimizer=CyclicLR, optimizer__base_lr=1e-5, optimizer__max_lr=1e-4)

MyEstimator can construct …
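The pattern under discussion can be sketched in plain Python (all class and parameter names here are illustrative, not the thread's actual code): the estimator stores the uninitialized optimizer class plus its keyword arguments, and only instantiates the optimizer inside fit.

```python
# Illustrative sketch of the "uninitialized optimizer + separate kwargs" pattern.
# CyclicLRLike stands in for an optimizer that requires parameters at init time.
class CyclicLRLike:
    def __init__(self, base_lr, max_lr):  # no defaults: must be configured
        self.base_lr = base_lr
        self.max_lr = max_lr

class MyEstimator:
    def __init__(self, optimizer=None, **optimizer_kwargs):
        # store the class itself (immutable) and its kwargs; no instantiation yet
        self.optimizer = optimizer
        self.optimizer_kwargs = optimizer_kwargs

    def fit(self, X, y):
        if self.optimizer is not None:
            # strip the "optimizer__" prefix and instantiate only now
            params = {k.replace("optimizer__", ""): v
                      for k, v in self.optimizer_kwargs.items()}
            self.optimizer_ = self.optimizer(**params)
        return self

est = MyEstimator(optimizer=CyclicLRLike,
                  optimizer__base_lr=1e-5, optimizer__max_lr=1e-4).fit(None, None)
```

The estimator itself remains default constructible; the required optimizer parameters are only needed once fit is called with a non-None optimizer.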
But that means …
How would a check know that …
That is a valid point, but I would argue that this goes beyond default constructibility. The estimator is default constructible. It may break at …
Okay, let's move forward with the suggestion by @rth in #17756 (comment) to be more accepting of defaults.
We now allow any "type" (uninitialized classes) and all numeric numpy types. See scikit-learn#17756 for a discussion.
The PR is now ready for review: #17936
Describe the bug
I'm working on a custom estimator that needs an optimizer as one of its hyperparameters. I'm using the pattern of passing in an uninitialized optimizer and its parameters separately, i.e.
which will then initialize the optimizer as needed. I'm attempting to verify this with the check_parameters_default_constructible test, but it fails due to this assertion in sklearn/utils/estimator_checks.py (line 2624 at cf4fa5c),
which restricts type arguments to numpy floats or numpy ints. I don't understand why this restriction is necessary. With some digging through the history, I found that the general type checks were added in 90d5ef1 to verify that all parameters are of immutable types. This was then extended by the restriction on type parameters in ab2f539, as a drive-by change without explanation. Since types are immutable, I don't understand why this was necessary.
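The immutability argument can be made concrete with a hypothetical sketch (not sklearn code): a shared default *instance* leaks state between objects, whereas using the *class itself* as a default cannot be mutated that way.

```python
# Hypothetical illustration: why a default instance is risky while the type is safe.
class Optimizer:
    def __init__(self, lr=0.01):
        self.lr = lr

SHARED = Optimizer()  # one instance reused as a default value

class Est:
    def __init__(self, optimizer=SHARED):  # default is a mutable object
        self.optimizer = optimizer

a, b = Est(), Est()
a.optimizer.lr = 99   # mutating through one estimator silently affects the other

class SafeEst:
    def __init__(self, optimizer=Optimizer):  # default is the class itself
        self.optimizer = optimizer
```

This motivates forbidding initialized objects as defaults, but not forbidding types, which is exactly the distinction the issue is about.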
I'm not sure if this is a bug or I am missing something. CC @amueller who added that check originally.
Steps/Code to Reproduce
Define an estimator such as
Expected Results
The test passes since the estimator is default constructible and all arguments are of immutable type (the optimizer itself is of course not immutable, but the type is).
Actual Results
Versions
(the relevant code hasn't changed in current sklearn master though).