Setting search parameters on estimators · Issue #5082 · scikit-learn/scikit-learn · GitHub
Setting search parameters on estimators #5082
Open
@jnothman

Description

The double-underscore notation for specifying grid search parameters (e.g. linearsvc__C) is unwieldy. Adding a layer of indirection to the model, such as a Pipeline wrapping the estimator whose parameters you want to search, means every corresponding parameter name must gain an extra prefix.
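For reference, a minimal sketch of the current notation: the same LinearSVC parameter C needs a different key at each level of nesting. The step names linearsvc and pipeline are the ones make_pipeline derives from lowercased class names; the grid values are illustrative.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

svc = LinearSVC()
pipe = make_pipeline(StandardScaler(), svc)  # step names: "standardscaler", "linearsvc"
outer = make_pipeline(pipe)                  # step name: "pipeline"

# The same parameter (svc.C) needs a different key at each level of nesting.
values = [0.001, 0.1, 10, 100]
grid_for_svc = {"C": values}
grid_for_pipe = {"linearsvc__C": values}
grid_for_outer = {"pipeline__linearsvc__C": values}

for model, grid in [(svc, grid_for_svc), (pipe, grid_for_pipe), (outer, grid_for_outer)]:
    # Every key must name a parameter reachable via get_params(deep=True).
    assert set(grid) <= set(model.get_params(deep=True))
```

Wrapping the model once more would force every key in every existing grid to be rewritten again, which is the maintenance burden described above.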

We should be able to specify parameter searches using the estimator instances. The interface proposed by @amueller at #4949 (comment) (and elsewhere) suggests a syntax like:

# proposed syntax: search_params() records a search space on the estimator instance
char_vec = CountVectorizer(analyzer="char").search_params(ngram_range=[(3, 3), (3, 5), (5, 5)])
word_vec = CountVectorizer().search_params(ngram_range=[(1, 1), (1, 2), (2, 2)])
svc = LinearSVC().search_params(C=[0.001, 0.1, 10, 100])
GridSearchCV(make_pipeline(make_union(char_vec, word_vec), svc), cv=..., scoring=...).fit(X, y)

Calling search_params would presumably set an instance attribute on the estimator to record the search information.
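As a rough sketch of how such an attribute could feed into the existing machinery, the code below emulates the proposal with a plain function and translates the recorded spaces into today's double-underscore keys. search_params, collect_param_grid and the _search_space attribute are hypothetical names, not scikit-learn API; the walk relies only on get_params(deep=True).

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC


def search_params(estimator, **space):
    """Hypothetical stand-in for the proposed method: record the search
    space on the instance and return it, so calls can be chained."""
    estimator._search_space = dict(space)  # hypothetical attribute name
    return estimator


def collect_param_grid(model):
    """Hypothetical helper: translate recorded search spaces into the
    double-underscore keys GridSearchCV accepts today."""
    grid = dict(getattr(model, "_search_space", {}))  # the model itself: no prefix
    for name, value in model.get_params(deep=True).items():
        if isinstance(value, BaseEstimator):
            # name is already the full double-underscore path to this estimator
            for param, values in getattr(value, "_search_space", {}).items():
                grid[f"{name}__{param}"] = values
    return grid


char_vec = search_params(CountVectorizer(analyzer="char"),
                         ngram_range=[(3, 3), (3, 5), (5, 5)])
word_vec = search_params(CountVectorizer(), ngram_range=[(1, 1), (1, 2), (2, 2)])
svc = search_params(LinearSVC(), C=[0.001, 0.1, 10, 100])
pipe = make_pipeline(make_union(char_vec, word_vec), svc)

param_grid = collect_param_grid(pipe)
for key in sorted(param_grid):
    print(key)
```

Note that make_union names the two vectorizers countvectorizer-1 and countvectorizer-2, exactly the kind of generated prefix this proposal would spare users from writing by hand. In this sketch the translation happens before any cloning, so the resulting dict could be passed as param_grid to GridSearchCV(pipe, param_grid, ...).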

Questions of fine semantics that need to be clarified for this approach include:

  1. does a call to search_params overwrite all previous settings for that estimator?
  2. does clone maintain the prior search_params?
  3. should this affect the search space of specialised CV objects (e.g. LassoCV)?
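Question 2 has a concrete answer under the existing clone: it rebuilds an estimator from its constructor parameters (get_params(deep=False)), so an attribute set outside __init__ does not carry over. Since GridSearchCV clones the estimator before fitting each candidate, search_params information stored this way would have to be read before cloning, or clone would need to change. A minimal check, using the hypothetical _search_space attribute:

```python
from sklearn.base import clone
from sklearn.svm import LinearSVC

svc = LinearSVC(C=10)
svc._search_space = {"C": [0.001, 0.1, 10, 100]}  # hypothetical attribute, as in the proposal

copied = clone(svc)
print(copied.C)                          # constructor parameters survive: 10
print(hasattr(copied, "_search_space"))  # the ad hoc attribute does not: False
```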

Questions of functionality include:

a) is RandomizedSearchCV supported merely by making one of the search spaces a scipy.stats distribution, which would make some searches incompatible with GridSearchCV?
b) is there any way to support multiple grids, as is currently allowed in GridSearchCV?
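Both points can be seen with the existing search-space helpers: ParameterGrid (used by GridSearchCV) already accepts a list of dicts and searches their union, while ParameterSampler (used by RandomizedSearchCV) samples from scipy.stats objects that an exhaustive grid cannot enumerate. A sketch, with illustrative keys and values:

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Multiple grids: the union of the sub-grids is searched.
grids = [
    {"linearsvc__C": [0.1, 10], "linearsvc__loss": ["hinge"]},
    {"linearsvc__C": [0.001, 0.1, 10, 100], "linearsvc__loss": ["squared_hinge"]},
]
n_candidates = len(ParameterGrid(grids))  # 2 + 4 = 6 candidates

# Distributions: random search draws values from a scipy.stats object...
space = {"linearsvc__C": loguniform(1e-3, 1e2)}
samples = list(ParameterSampler(space, n_iter=5, random_state=0))

# ...but an exhaustive grid cannot enumerate a continuous distribution.
# Recent scikit-learn versions reject it when the grid is constructed.
try:
    ParameterGrid(space)
    grid_rejected = False
except TypeError:
    grid_rejected = True
```

A per-estimator search_params would need some way to group values into sub-grids like the list above, and a rule for what GridSearchCV does when it meets a distribution.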

I have proposed an alternative syntax that also avoids the problems of the underscore notation and does not raise the questions above, though it is less user-friendly than search_params:

char_vec = CountVectorizer(analyzer="char")
word_vec = CountVectorizer()
svc = LinearSVC()
# proposed: grid keys are (estimator instance, parameter name) pairs
param_grid = {(char_vec, 'ngram_range'): [(3, 3), (3, 5), (5, 5)],
              (word_vec, 'ngram_range'): [(1, 1), (1, 2), (2, 2)],
              (svc, 'C'): [0.001, 0.1, 10, 100]}
GridSearchCV(make_pipeline(make_union(char_vec, word_vec), svc),
             param_grid,
             cv=..., scoring=...).fit(X, y)

Here, each parameter is specified as an (estimator, parameter name) pair, and the pairs are assembled directly into a grid that is passed to GridSearchCV/RandomizedSearchCV.
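One way such a grid could be supported without changing how parameters are set is to resolve each (estimator, name) pair to its double-underscore key by looking the instance up inside the model. A minimal sketch; resolve_param_grid is a hypothetical helper, not scikit-learn API:

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC


def resolve_param_grid(model, grid):
    """Hypothetical helper: rewrite {(estimator, param): values} into the
    double-underscore keys GridSearchCV understands today."""
    # Map each estimator instance (by identity) to its key prefix in model.
    prefixes = {id(model): ""}
    for name, value in model.get_params(deep=True).items():
        if isinstance(value, BaseEstimator):
            prefixes[id(value)] = name + "__"
    resolved = {}
    for (estimator, param), values in grid.items():
        if id(estimator) not in prefixes:
            raise ValueError(f"{estimator!r} is not part of the model")
        resolved[prefixes[id(estimator)] + param] = values
    return resolved


char_vec = CountVectorizer(analyzer="char")
word_vec = CountVectorizer()
svc = LinearSVC()
pipe = make_pipeline(make_union(char_vec, word_vec), svc)
param_grid = {(char_vec, "ngram_range"): [(3, 3), (3, 5), (5, 5)],
              (word_vec, "ngram_range"): [(1, 1), (1, 2), (2, 2)],
              (svc, "C"): [0.001, 0.1, 10, 100]}
resolved = resolve_param_grid(pipe, param_grid)
```

Looking instances up by identity is what lets two otherwise identical CountVectorizer objects be addressed separately, and it means an estimator that is not part of the model fails loudly rather than silently producing an unused key.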
