Description
The underscore notation (e.g. svc__C) for specifying grid search parameters is unwieldy: adding a layer of indirection to the model (e.g. a Pipeline wrapping the estimator whose parameters you want to search) means prefixing all of the corresponding parameter names.
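To make the indirection concrete, here is a stdlib-only sketch of how nested parameters end up with prefixed names. The Estimator class and flat_params function are hypothetical stand-ins, not scikit-learn's API; they only imitate the step__param naming scheme used by scikit-learn's deep get_params.

```python
class Estimator:
    """Hypothetical stand-in: its own params plus named sub-estimators."""

    def __init__(self, params=None, steps=None):
        self.params = dict(params or {})
        self.steps = list(steps or [])  # (step_name, Estimator) pairs


def flat_params(est, prefix=""):
    """Flatten nested parameters into 'outer__inner__param' names."""
    out = {prefix + key: value for key, value in est.params.items()}
    for step_name, child in est.steps:
        out.update(flat_params(child, prefix + step_name + "__"))
    return out


svc = Estimator(params={"C": 1.0})
print(flat_params(svc))    # {'C': 1.0}

pipe = Estimator(steps=[("linearsvc", svc)])
print(flat_params(pipe))   # {'linearsvc__C': 1.0}

outer = Estimator(steps=[("pipeline", pipe)])
print(flat_params(outer))  # {'pipeline__linearsvc__C': 1.0}
```

Each extra level of wrapping changes the key a grid must use, even though the estimator being searched is the same object.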
We should be able to specify parameter searches on the estimator instances themselves. The interface proposed by @amueller in #4949 (comment) (and elsewhere) suggests a syntax like:
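from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC
char_vec = CountVectorizer(analyzer="char").search_params(ngram_range=[(3, 3), (3, 5), (5, 5)])
word_vec = CountVectorizer().search_params(ngram_range=[(1, 1), (1, 2), (2, 2)])
svc = LinearSVC().search_params(C=[0.001, 0.1, 10, 100])
GridSearchCV(make_pipeline(make_union(char_vec, word_vec), svc), cv=..., scoring=...).fit(X, y)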
Calling search_params would presumably set an instance attribute on the estimator to record the search information.
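A minimal sketch of that idea, assuming (hypothetically) that search_params stores its keyword arguments in a private attribute named _search_space and returns self, so it can be chained onto the constructor as in the example above. The collect_grid helper shows one way a search object could turn those per-estimator spaces back into today's underscore-keyed param_grid. None of these names exist in scikit-learn.

```python
class Estimator:
    """Hypothetical estimator supporting the proposed search_params call."""

    def __init__(self, steps=None, **params):
        self.steps = list(steps or [])  # (step_name, Estimator) pairs
        self.params = params
        self._search_space = {}

    def search_params(self, **space):
        # Record candidate values (replacing any earlier call) and return
        # self so the call can be chained onto the constructor.
        self._search_space = dict(space)
        return self


def collect_grid(est, prefix=""):
    """Gather every nested search space into one underscore-keyed grid."""
    grid = {prefix + key: values for key, values in est._search_space.items()}
    for step_name, child in est.steps:
        grid.update(collect_grid(child, prefix + step_name + "__"))
    return grid


char_vec = Estimator(analyzer="char").search_params(ngram_range=[(3, 3), (3, 5)])
svc = Estimator().search_params(C=[0.1, 10])
pipe = Estimator(steps=[("vec", char_vec), ("svc", svc)])
print(collect_grid(pipe))
# {'vec__ngram_range': [(3, 3), (3, 5)], 'svc__C': [0.1, 10]}
```

This version chooses "overwrite" semantics for repeated calls; merging into the existing dict would be an equally plausible choice.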
Questions of fine semantics that need to be clarified for this approach include:
- Does a call to search_params overwrite all previous settings for that estimator?
- Does clone maintain the prior search_params?
- Should this affect the search space of specialised CV objects (e.g. LassoCV)?
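The clone question matters because a clone that, like scikit-learn's clone, rebuilds an estimator from its constructor parameters will silently drop any attribute set after construction. The stdlib-only sketch below contrasts that behaviour with a variant that also copies the recorded search space; the Estimator class, both clone functions and the _search_space attribute are hypothetical.

```python
import copy


class Estimator:
    """Hypothetical estimator whose constructor args are its parameters."""

    def __init__(self, C=1.0):
        self.C = C
        self._search_space = {}

    def get_params(self):
        return {"C": self.C}

    def search_params(self, **space):
        self._search_space = dict(space)
        return self


def clone_params_only(est):
    # Rebuild from constructor parameters only; _search_space is lost.
    return type(est)(**copy.deepcopy(est.get_params()))


def clone_with_search(est):
    # Alternative semantics: carry a copy of the search space over too.
    new = clone_params_only(est)
    new._search_space = copy.deepcopy(est._search_space)
    return new


svc = Estimator(C=0.5).search_params(C=[0.001, 0.1, 10, 100])
print(clone_params_only(svc)._search_space)  # {}
print(clone_with_search(svc)._search_space)  # {'C': [0.001, 0.1, 10, 100]}
```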
Questions of functionality include:
a) Is RandomizedSearchCV supported merely by making one of the search spaces a scipy.stats distribution, which would make some searches GridSearchCV-incompatible?
b) Is there any way to support multiple grids, as GridSearchCV currently allows?
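Both questions can be made concrete with a small stdlib-only sketch (all names hypothetical). A value exposing an rvs method, as scipy.stats distributions do, can be sampled but not enumerated, so an exhaustive search cannot use a grid containing one; and a list of grids, which GridSearchCV's param_grid accepts today, is simply the union of each grid's Cartesian product. The Uniform class stands in for a scipy.stats distribution.

```python
import itertools
import random


class Uniform:
    """Stand-in for a scipy.stats distribution: exposes rvs()."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, random_state=None):
        return random.Random(random_state).uniform(self.low, self.high)


def is_grid_compatible(grid):
    # A value with an rvs method can be sampled but not enumerated.
    return not any(hasattr(v, "rvs") for v in grid.values())


def expand(grids):
    """Enumerate candidates from one grid or a list of grids."""
    if isinstance(grids, dict):
        grids = [grids]
    for grid in grids:
        if not is_grid_compatible(grid):
            raise ValueError("grid contains a distribution; use a random search")
        keys = sorted(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            yield dict(zip(keys, values))


grids = [{"C": [0.1, 10]}, {"C": [1], "penalty": ["l1", "l2"]}]
print(list(expand(grids)))
# [{'C': 0.1}, {'C': 10}, {'C': 1, 'penalty': 'l1'}, {'C': 1, 'penalty': 'l2'}]
print(is_grid_compatible({"C": Uniform(0.001, 100)}))  # False
```

With search spaces attached to individual estimators, it is not obvious where such a list of alternative grids would be declared.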
I have proposed an alternative syntax that still avoids the problems of the underscore notation and does not raise the questions above, but is less user-friendly than search_params:
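from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC
char_vec = CountVectorizer(analyzer="char")
word_vec = CountVectorizer()
svc = LinearSVC()
param_grid = {(char_vec, 'ngram_range'): [(3, 3), (3, 5), (5, 5)],
              (word_vec, 'ngram_range'): [(1, 1), (1, 2), (2, 2)],
              (svc, 'C'): [0.001, 0.1, 10, 100]}
GridSearchCV(make_pipeline(make_union(char_vec, word_vec), svc),
             param_grid,
             cv=..., scoring=...).fit(X, y)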
Here, each parameter is specified as an (estimator, parameter name) pair, but rather than being attached to the estimators, the pairs are collected directly into a grid that is passed to GridSearchCV/RandomizedSearchCV.
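A search object could support this form by translating each (estimator, name) key back into underscore notation. The sketch below assumes a hypothetical Estimator class with named steps and locates each keyed estimator by identity; find_path and to_underscore_grid are illustrative names, not scikit-learn functions.

```python
class Estimator:
    """Hypothetical stand-in: named sub-estimators in `steps`."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])  # (step_name, Estimator) pairs


def find_path(root, target, prefix=""):
    """Return the underscore prefix leading to `target`, matched by identity."""
    if root is target:
        return prefix
    for step_name, child in root.steps:
        path = find_path(child, target, prefix + step_name + "__")
        if path is not None:
            return path
    return None


def to_underscore_grid(root, param_grid):
    """Rewrite {(estimator, name): values} into {'a__b__name': values}."""
    grid = {}
    for (est, name), values in param_grid.items():
        path = find_path(root, est)
        if path is None:
            raise ValueError(f"{est!r} is not part of the model")
        grid[path + name] = values
    return grid


char_vec, word_vec, svc = Estimator(), Estimator(), Estimator()
union = Estimator(steps=[("char", char_vec), ("word", word_vec)])
pipe = Estimator(steps=[("union", union), ("svc", svc)])
param_grid = {(char_vec, "ngram_range"): [(3, 3), (3, 5), (5, 5)],
              (word_vec, "ngram_range"): [(1, 1), (1, 2), (2, 2)],
              (svc, "C"): [0.001, 0.1, 10, 100]}
print(to_underscore_grid(pipe, param_grid))
# {'union__char__ngram_range': [(3, 3), (3, 5), (5, 5)],
#  'union__word__ngram_range': [(1, 1), (1, 2), (2, 2)],
#  'svc__C': [0.001, 0.1, 10, 100]}
```

Matching by identity rather than equality is what lets two identically configured vectorizers receive different grids; in this sketch it also means the translation has to run against the original model, before any copy or clone is made.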