API split "model" parameters from "config" params. #17566
Comments
I agree, having an estimator tag,
Can you provide a code snippet of what the configure args will look like?
It's really the same as `clf = SVC(kernel='rbf').configure(verbose=2, cache_size=300)`. We could potentially have a decorator which sets the
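A minimal sketch of what such a `configure` method might look like; the `configure` name comes from the proposal above, but the `_config_params` registry and everything else here is hypothetical, not scikit-learn's actual API:

```python
class BaseEstimatorSketch:
    """Toy base class: hyperparameters live in __init__,
    config params are set separately via configure()."""

    # hypothetical registry of config-only parameter names
    _config_params = {"verbose", "cache_size"}

    def __init__(self, **hyperparams):
        self.__dict__.update(hyperparams)
        # config params get defaults until configure() is called
        self.verbose = 0
        self.cache_size = 200

    def configure(self, **config):
        unknown = set(config) - self._config_params
        if unknown:
            raise ValueError(f"not config params: {sorted(unknown)}")
        self.__dict__.update(config)
        return self  # allow chaining: SVC(...).configure(...)


class SVC(BaseEstimatorSketch):
    def __init__(self, kernel="rbf"):
        super().__init__(kernel=kernel)


clf = SVC(kernel="rbf").configure(verbose=2, cache_size=300)
```

Returning `self` from `configure` is what makes the one-liner in the comment above possible.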
It seems to me that this adds user-facing complexity, while its original motivation (#17441 (comment)) is really related to solving an internal problem.
I fear that it will decrease usability for little gain for the user. The loss of usability will come from making things more complex (two methods instead of one), and from having to keep in mind a method that is not a universal standard (configure instead of __init__).
I have the feeling that the distinction between config and hyper params will have a grey zone. For instance, for the
I feel like we might have additional motivations beyond #17441 (comment) for having model/config params. Distinguishing those parameters that may impact a model's predictions from those that can't can be valuable to users who don't really know where to start. Also, we might want to consider merging this discussion with that of having automatic parameter search spaces.
I agree that having a way to distinguish between hyperparams and config params is valuable. I think we can have that without removing them from the init, keeping the current simplicity of the API. We could have a dict as a class attribute on all estimators which stores, for each param, what type and values it can take and whether it's a hyper param or not. This way we can easily:
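A sketch of the class-attribute idea described above; the name `_param_spec` and the exact keys are made up for illustration:

```python
# Hypothetical machine-readable parameter description stored on the class.
# "hyper" marks parameters that can affect the learned model; "config" ones can't.
class LogisticRegressionSketch:
    _param_spec = {
        "C":       {"type": float, "values": "(0, inf)",    "kind": "hyper"},
        "penalty": {"type": str,   "values": ["l1", "l2"],  "kind": "hyper"},
        "n_jobs":  {"type": int,   "values": "any",         "kind": "config"},
        "verbose": {"type": int,   "values": ">= 0",        "kind": "config"},
    }

    @classmethod
    def hyper_params(cls):
        return [k for k, v in cls._param_spec.items() if v["kind"] == "hyper"]

    @classmethod
    def config_params(cls):
        return [k for k, v in cls._param_spec.items() if v["kind"] == "config"]
```

With such a spec, tooling (docs, AutoML search spaces, validation) could introspect the split without any change to `__init__`.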
Some of that can probably be done with type annotations, as well.
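For example, `typing.Annotated` (Python 3.9+) could carry the hyper/config distinction in a machine-readable way. Everything below is a sketch of that idea, not scikit-learn code; the marker strings are arbitrary:

```python
from typing import Annotated, get_type_hints

# hypothetical category markers attached to each parameter annotation
HYPER = "hyper"    # affects the learned model
CONFIG = "config"  # runtime configuration only


class SVCSketch:
    C: Annotated[float, HYPER] = 1.0
    kernel: Annotated[str, HYPER] = "rbf"
    verbose: Annotated[int, CONFIG] = 0
    cache_size: Annotated[float, CONFIG] = 200.0


def params_of_kind(cls, kind):
    """List parameter names whose annotation carries the given marker."""
    hints = get_type_hints(cls, include_extras=True)
    return [name for name, hint in hints.items() if kind in hint.__metadata__]
```

`include_extras=True` is what preserves the `Annotated` metadata so it can be read back at runtime.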
This is a documentation issue, not an API issue. Now that we are using kwargs, I think it would be good to change numpydoc so that a project can have custom "* Parameters" sections. The "Parameters" section made sense when everything was positional.
I think it will be quite hard to agree on a definition. @thomasjpfan and I discussed this at some length because of the hyper-parameter specification project he's working on. I think having better documentation and/or machine-readable versions while still keeping it in
Another way to look at it is common config vs advanced config. We're increasingly adding functionalities which would be easier to implement if we didn't have to make the choice of adding an

It would also make it easier for users to start with the library, since they wouldn't be flooded with information on

Another way it would benefit users would be that we'd be implementing those features faster, and therefore users would have access to them sooner; so it's not just us not putting the effort of adding them to every single

It would also give third party developers more flexibility, since this is an area where many external estimators don't comply with our API. Having a way for them to comply would make the library more extendable.

As a start, we don't have to remove anything from our
Thank you for opening this issue. I can share a bit of our experience in the D3M program where we are working on AutoML tools, including use of sklearn for AutoML. I think distinguishing model and config parameters is useful, but as already noted in other comments, it is hard to draw a line and some parameters are both. To address all that, we allow parameters to belong to more than one category. Moreover, we in fact have multiple categories of parameters:
We have a few more, but they are specific to our needs (like the metafeature parameter, which controls which metafeatures we compute while running the program). Each parameter then has a list of categories provided by the author of the model.

And I think this is an important point/realization here: it is OK if things are hard to define. The categories just tell us what the author of the model thinks is the main category or categories this parameter will be used as. It does not have to be perfect. Users can ignore it, or determine their own interpretation. But it is also useful to capture the author's intention/assumption.

Which leads to my last point: personally, I would keep all of them inside

So, all for having this extra metadata, even if imperfect. But make it optional. Maybe it can just be rendered in documentation as tags next to the parameter name.
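The multi-category scheme described above, where a single parameter can belong to several categories at once, could be sketched with flag enums; the category names here are illustrative, not the actual D3M vocabulary:

```python
from enum import Flag, auto


class ParamKind(Flag):
    """A parameter may belong to several categories at once."""
    HYPER = auto()   # affects the learned model
    CONFIG = auto()  # runtime/resource configuration
    TUNING = auto()  # worth exposing to an AutoML search


# Author-provided categories; imperfect by design, users may ignore them.
PARAM_CATEGORIES = {
    "C":        ParamKind.HYPER | ParamKind.TUNING,
    "max_iter": ParamKind.HYPER | ParamKind.CONFIG,  # grey zone: both
    "n_jobs":   ParamKind.CONFIG,
}


def params_in(kind):
    """All parameter names whose category set includes `kind`."""
    return sorted(k for k, v in PARAM_CATEGORIES.items() if kind in v)
```

Because categories combine with `|`, the grey-zone cases mentioned earlier (like `max_iter`) simply get both tags instead of forcing a single choice.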
Really helpful input. I would still like them to be grouped in numpydoc sections if possible... Or maybe we just need to be able to display them with bootstrap-style labels, as we have done with our change log. Maybe it's easy to just add such tags to each parameter docstring and then be able to extract such annotations... Or make it machine readable first.
Hm, grouping is hard if a parameter belongs to multiple categories. :-) And yes, let's make it machine readable first.
If you had sphinx-friendly shorthand in docstrings, like
Well, I'm okay with including the categorisation of parameters as annotations on the params, but that makes the annotations very heavy...
We can think of our estimators' parameters as two different types: configuration parameters, which control how the estimator runs but don't change the fitted model, and model hyperparameters, such as `nu` or the number of trees in a random forest.

We could split these two types of parameters and have model hyperparams as `__init__` args and the rest as `set_params` or new `configure` args. This would also allow us to introduce new configuration parameters to all estimators w/o having to add another arg to `__init__`. An example would be the routing parameter required for the sample props implementation, or potentially controlling the log level at the estimator level.

My proposal would be to have a `config_params` estimator tag which includes the names of those parameters, and have `clone`, `get_params` and `set_params` understand them, and potentially introduce a new `configure` method to set those parameters.

This would also mean partially relaxing the "set only `__init__` params in `__init__`" requirement, since `__init__` would be allowed to set those extra params.

cc: @scikit-learn/core-devs
also related to: #17441 (comment)
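A rough sketch of how the proposed `config_params` tag could interact with `clone`- and `get_params`-style introspection. All names and behavior here are hypothetical, paraphrasing the proposal rather than real scikit-learn internals:

```python
class EstimatorSketch:
    # the proposed estimator tag listing config-only parameters
    _config_params = frozenset({"verbose", "routing"})

    def __init__(self, C=1.0, verbose=0, routing=None):
        self.C, self.verbose, self.routing = C, verbose, routing

    def get_params(self, include_config=True):
        params = {"C": self.C, "verbose": self.verbose, "routing": self.routing}
        if not include_config:
            params = {k: v for k, v in params.items()
                      if k not in self._config_params}
        return params

    def configure(self, **config):
        for name, value in config.items():
            if name not in self._config_params:
                raise ValueError(f"{name!r} is not a config param")
            setattr(self, name, value)
        return self


def clone_sketch(est):
    # a clone rebuilds from hyperparams only, then re-applies config params,
    # so clone "understands" the config_params tag
    new = type(est)(**est.get_params(include_config=False))
    return new.configure(**{k: getattr(est, k) for k in est._config_params})
```

The point of the split is visible in `clone_sketch`: only hyperparams go through `__init__`, while config params travel through `configure`, which is also where a new cross-cutting parameter like routing could be attached without touching every estimator's signature.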