Initial implementation of GridFactory by jnothman · Pull Request #21784 · scikit-learn/scikit-learn
Status: Open · wants to merge 2 commits into main
Conversation

@jnothman (Member) commented Nov 25, 2021

Reference Issues/PRs

Fixes #19045

What does this implement/fix? Explain your changes.

This implements a GridFactory with functionality comparable to searchgrid to allow easier specification of parameter search spaces, especially as a composite estimator is built.
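As a rough illustration of the idea (the names `GridFactory`, `set`, and `build_grid` here are hypothetical, not the PR's actual API), the factory's core job is to attach a search space to each estimator and later flatten those spaces into the `step__param` grid that `GridSearchCV` expects:

```python
# Hypothetical sketch of the searchgrid-style pattern: per-estimator
# search spaces, flattened into a "step__param" grid. Not the PR's code.

class GridFactory:
    def __init__(self):
        self._spaces = {}  # maps id(estimator) -> {param: candidate values}

    def set(self, estimator, **space):
        """Record a search space for this estimator and return it unchanged."""
        self._spaces[id(estimator)] = space
        return estimator

    def build_grid(self, named_steps):
        """Flatten recorded spaces into a GridSearchCV-style param grid."""
        grid = {}
        for name, est in named_steps:
            for param, values in self._spaces.get(id(est), {}).items():
                grid[name + "__" + param] = values
        return grid

# Stand-ins for real estimators, to keep the sketch self-contained.
class FakePCA: pass
class FakeLogReg: pass

factory = GridFactory()
pca = factory.set(FakePCA(), n_components=[2, 5, 10])
clf = factory.set(FakeLogReg(), C=[0.1, 1.0, 10.0])
grid = factory.build_grid([("pca", pca), ("logreg", clf)])
# grid == {"pca__n_components": [2, 5, 10], "logreg__C": [0.1, 1.0, 10.0]}
```

The user never types a `__`-prefixed name; the prefixes are derived from the step names when the grid is built.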

Any other comments?

A complete version of this will:

  • import tests from searchgrid
  • adopt GridFactory in some examples
  • add a DistributionFactory for use with Randomized searches (this is tricky because currently Randomized searches do not support conditional search spaces, unless I'm much mistaken)
  • add factory support to *SearchCV (currently only GridSearchCV)
  • maybe add a function to interpret the parameter names in cv_results_ in a __-free way

I note that supporting a factory to be passed to GridSearchCV instead of a parameter grid doesn't save the user much code. Maybe we don't need to support it?

@glemaitre (Member)

I don't think I commented on the issue, but I really like the cognitive overhead this factory removes for the user. The __ is problematic to explain when teaching.

I would be happy to review and give feedback whenever you want. Just ping me directly.

@ogrisel (Member) commented Nov 25, 2021

Wouldn't it make sense to add methods directly to the estimators themselves?

>>> pipeline = make_pipeline(
...     PCA().with_search_space(n_components=[2, 5, 10, 20, 50, 100]),
...     LogisticRegression().with_search_space(C=np.logspace(-6, 6, num=13)),
... )
>>> GridSearchCV(pipeline, cv=5).fit(X, y).best_params_
{"pca__n_components": 100, "logisticregression__C": 1e-3}

EDIT: One could even special-case pipelines to accept tuples of estimators, to express a disjunction in the search space:

>>> pipeline = make_pipeline(
...     (
...         PCA().with_search_space(n_components=[2, 5, 10, 20, 50, 100]),
...         NMF().with_search_space(n_components=[2, 5, 10, 20, 50, 100]),
...     ),
...     LogisticRegression().with_search_space(C=np.logspace(-6, 6, num=13)),
... )
>>> GridSearchCV(pipeline, cv=5).fit(X, y).best_params_
{"nmf__n_components": 100, "logisticregression__C": 1e-3}

If you call pipeline.fit(X, y) it would just use the first element of each tuple by default, but such disjunctive pipeline would primarily be meant to be tuned by grid search or random search.

This is a bit implicit, but given the gain in readability that might be worth it.
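The "first element of each tuple" default described above can be sketched in a few lines (names here are illustrative, not a proposed API): a disjunctive step list is resolved to a concrete one before a plain `fit` call.

```python
# Minimal sketch of resolving disjunctive (tuple) pipeline steps:
# a plain fit() would use the first candidate of each tuple, while a
# search would enumerate all of them.
def resolve_steps(steps):
    """Pick the first candidate from each disjunctive (tuple) step."""
    return [step[0] if isinstance(step, tuple) else step for step in steps]

steps = [("pca", "nmf"), "logisticregression"]
assert resolve_steps(steps) == ["pca", "logisticregression"]
```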

@ogrisel (Member) commented Nov 25, 2021

BTW, another source of inspiration to tackle this problem:

@jnothman (Member, Author)

Thanks for those references, I'll enjoy reading them.

I don't mind setting directly on the estimators, either with a method on the estimators or with an external function that just assumes it can write to a certain attribute of the estimator. Do we think this would be better?

@glemaitre (Member) commented Nov 25, 2021 via email

@jnothman (Member, Author)

I agree it would probably be more intuitive to use.
It would require a SLEP.

@jnothman (Member, Author)

Neuraxle has some nice solutions, recognising some of the longstanding questionable design choices in scikit-learn. One of our troubles is again going to be clone, although perhaps here the whole parameter space can be calculated without cloning first.

@ogrisel (Member) commented Nov 26, 2021

I agree. Let's put this API design on the agenda of the next dev meeting if you plan to attend @jnothman.

@thomasjpfan (Member) commented Nov 26, 2021

The interaction between attributes not set in __init__ and clone was observed in the metadata routing PR. A possible solution is to introduce an attrs attribute that clone will always copy over. This is similar to the attrs attribute on pandas DataFrames or xarray Datasets.

For this PR, we could use the attrs attribute to store the hyperparameter space.
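A toy sketch of that idea (not scikit-learn's actual clone implementation): a clone that reconstructs an estimator from its constructor parameters, as scikit-learn's clone does, but additionally carries over an `attrs` dict by convention.

```python
# Hypothetical sketch: clone() rebuilds the estimator from its
# __init__ params but also copies an `attrs` dict, mirroring the
# pandas.DataFrame.attrs idea. Names are illustrative only.

class Estimator:
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.attrs = {}  # survives cloning by convention

    def get_params(self):
        return {"alpha": self.alpha}

def clone(estimator):
    """Rebuild from constructor params, then copy attrs verbatim."""
    new = type(estimator)(**estimator.get_params())
    new.attrs = dict(estimator.attrs)
    return new

est = Estimator(alpha=0.5)
est.attrs["search_space"] = {"alpha": [0.1, 1.0, 10.0]}
cloned = clone(est)
# cloned.alpha == 0.5 and cloned.attrs carries the search space along
```

The search space then survives the clone calls that cross-validation performs, without being an __init__ parameter itself.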

@amueller (Member)

add a DistributionFactory for use with Randomized searches (this is tricky because currently Randomized searches do not support conditional search spaces, unless I'm much mistaken)

They support lists of search spaces, which is the same as GridSearchCV, right?
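For reference, a list of dicts is how GridSearchCV's param_grid expresses a conditional ("disjunctive") space: each dict is an independent sub-grid, and the candidates are the union of the sub-grids' expansions. A rough pure-Python sketch of that expansion (not scikit-learn's ParameterGrid implementation):

```python
# Sketch of expanding a list-of-dicts search space into individual
# candidate settings, as GridSearchCV's param_grid=[...] form does.
from itertools import product

def expand(param_grid):
    """Yield one dict per parameter combination, per sub-grid."""
    for sub_grid in param_grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[k] for k in keys)):
            yield dict(zip(keys, values))

grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
]
candidates = list(expand(grid))
# 2 "linear" settings + 4 "rbf" settings = 6 candidates in total
```

Parameters like gamma only appear in the sub-grid where they are relevant, which is the conditional structure being discussed.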

@amueller (Member)

Wouldn't it make sense to add methods directly to the estimators themselves?

I think what I find slightly counter-intuitive about that is that with_search_space doesn't change the behavior of fit. But I think that's probably fine?
Overall, something like this plus built-in search spaces is something I've wanted for years.

Successfully merging this pull request may close these issues: Better (__-free) ways to specify grid search hyperparameters
5 participants