## Description
Throwing some thoughts on warm starting, grid search and successive halving (#13900). In particular: warm-starting in grid search is very different from warm-starting in SH. Details below.
I think this could be of interest to @jnothman, @amueller, @adrinjalali and @ogrisel.
## Warm start in `GridSearchCV`
`GridSearchCV` currently does not support warm-starting, which is a shame since it wastes resources. Supporting it would allow us to get rid of the `EstimatorCV` objects.
Consider the following `param_grid`, where `b` is a warm-startable parameter:
```python
param_grid = {
    'a': [1, 2],
    'b': [3, 4],
}
```
The trick proposed by @jnothman in #8230 is to transform the list generated by `ParameterGrid` from

```python
[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]
```

to

```python
[[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}],
 [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]
```
This way, in `evaluate_candidates()`, instead of cloning the estimator 4 times (once per dict), we only clone it twice (once per sublist, where `warm_start` can be leveraged).
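To make the idea concrete, here is a minimal sketch of what that fitting loop could look like (`fit_candidate_groups` is a made-up helper, not scikit-learn API; CV splitting and proper scoring are ignored for brevity):

```python
from sklearn.base import clone

def fit_candidate_groups(estimator, candidate_groups, X, y):
    # Made-up sketch: each inner list only varies warm-startable parameters,
    # so a single (warm-started) estimator can be reused within a group.
    results = []
    for group in candidate_groups:
        est = clone(estimator).set_params(warm_start=True)  # one clone per group
        for params in group:
            est.set_params(**params)  # only warm-startable params change here
            est.fit(X, y)             # reuses the state of the previous fit
            results.append((params, est.score(X, y)))
    return results
```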
The transformation isn't necessarily obvious, especially considering that the values of `'a'` aren't hashable in general.
From a (private) API point of view there are many nasty ways of doing it; I haven't come up with a clean version so far.
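One way to sidestep the hashability issue is to build the groups positionally from the grid specification itself, rather than by grouping the already-generated dicts. `grouped_parameter_grid` below is a made-up sketch of that idea (it mimics `ParameterGrid`'s sorted-keys iteration order):

```python
from itertools import product

def grouped_parameter_grid(param_grid, warm_startable):
    # Made-up sketch, not scikit-learn API. Groups are built positionally
    # from the grid specification, so the values of the non-warm-startable
    # parameters (here 'a') never need to be hashable.
    items = sorted(param_grid.items())
    cold = [(key, values) for key, values in items if key not in warm_startable]
    warm = [(key, values) for key, values in items if key in warm_startable]
    for cold_combo in product(*(values for _, values in cold)):
        base = dict(zip((key for key, _ in cold), cold_combo))
        yield [dict(base, **dict(zip((key for key, _ in warm), warm_combo)))
               for warm_combo in product(*(values for _, values in warm))]

print(list(grouped_parameter_grid({'a': [1, 2], 'b': [3, 4]},
                                  warm_startable={'b'})))
# [[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}], [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]
```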
## Warm start in `RandomizedSearchCV`
`RandomizedSearchCV` simply cannot support warm-starting. It doesn't make sense in this case since the non-warm-startable parameters are sampled at random, and there is no way to construct groups of parameters that can be warm-started together.
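For instance, with `ParameterSampler` each candidate draws a fresh value of `'a'`, so no two candidates share their non-warm-startable parameters and there is nothing to group on:

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

sampler = ParameterSampler({'a': uniform(0, 1), 'b': [3, 4]},
                           n_iter=4, random_state=0)
for candidate in sampler:
    print(candidate)  # every dict gets its own random 'a'
```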
## Warm start in `SuccessiveHalving`
Warm start in SH is possible (for both the grid and random versions), but it is completely different from the way we can warm-start a `GridSearchCV`.
Let's say we budget on the number of trees of a GBDT, and `n_trees` is a warm-startable parameter (I'm using `n_trees` instead of `max_iter` to avoid confusion with the iterations of the SH process).
Warm-starting here consists of re-using at SH iteration `i + 1` the estimators that were run at SH iteration `i`, since the only parameter that differs between them is `n_trees`:
The dashed lines represent the re-use of an estimator from one SH iteration to the next.
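A rough sketch of the idea, using `GradientBoostingClassifier` and its `n_estimators` as the `n_trees` budget (the function, the `budgets` schedule and the keep-the-best-half rule are simplified assumptions, not actual SH code; scoring is done on the training data for brevity):

```python
from sklearn.ensemble import GradientBoostingClassifier

def sh_with_warm_start(candidates, X, y, budgets=(10, 30, 90)):
    # candidates: dicts of non-budget parameters; budgets: the n_trees schedule
    models = [GradientBoostingClassifier(warm_start=True, **params)
              for params in candidates]
    for n_trees in budgets:
        for model in models:
            model.set_params(n_estimators=n_trees)
            model.fit(X, y)  # warm_start=True: only the missing trees are fit
        # keep the best half; survivors carry their fitted trees over to the
        # next SH iteration (the dashed lines in the figure)
        models.sort(key=lambda model: model.score(X, y), reverse=True)
        models = models[:max(1, len(models) // 2)]
    return models[0]
```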
I hope it's clear that this kind of warm-starting is very different from the one that `GridSearchCV` can leverage.
Technically:

- `GridSearchCV` supports warm-starting within a single call to `evaluate_candidates(candidate_params)`. The warm-starting is done from one candidate to another.
- SH supports warm-starting across multiple successive calls to `evaluate_candidates(candidate_params)` (see the sketch below).
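In terms of `BaseSearchCV`'s `_run_search(evaluate_candidates)` hook, the two shapes could look roughly like this (`grouped_candidates`, `with_budget` and `top_half` are made-up placeholders, not actual code):

```python
def _run_search_grid(self, evaluate_candidates):
    # GridSearchCV: a single call; warm-starting happens *inside* it,
    # from one candidate of a group to the next.
    evaluate_candidates(self.grouped_candidates)

def _run_search_sh(self, evaluate_candidates):
    # SH: several successive calls; estimators fitted during one call
    # must survive and be reused by the next call.
    candidates = self.initial_candidates
    for budget in self.budgets:
        results = evaluate_candidates(with_budget(candidates, budget))
        candidates = top_half(results)
```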
That version should be relatively easy to implement in a very hacky, non-backward-compatible way. But a clean version would require much more work, and possibly a whole new API for `BaseSearchCV`.
## Supporting both kinds of warm-starting
(this is a digression)
Since the two kinds of warm-starting are so different in nature, I think it should be possible to support both (i.e. both GS and SH).
However, if we ever support both, we should definitely deactivate the GS warm-start when doing the SH warm-start: doing a GS warm-start at a given SH iteration would mean that an estimator `x` "becomes" another estimator `y` (meaning it isn't cloned) because of the GS warm-start. But if that estimator `x` is one of the survivors for iteration `i + 1`, tough luck: it's lost.
In any case, that's not even an issue unless multiple parameters can be warm-started on the same estimator, which is rare.
## New warm-start API???
It looks like it was decided during the sprint to support a new warm-start API for estimators, introducing a new `fit` parameter `warm_start_with` (#8230 (comment)).
The transition from the old API to the new one should be pretty straightforward; I implemented a basic version for GBDTs in #15105.
This new API allows tools like GS and SH to automatically leverage warm-starting, without the user needing to explicitly ask for it.
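I'm not reproducing the exact signature from #8230 / #15105 here, but a toy estimator illustrating the general idea could look like the following (all names are illustrative, under the assumed semantics that `warm_start_with` updates some parameters and continues training from the current fitted state):

```python
from sklearn.base import BaseEstimator

class ToyGBDT(BaseEstimator):
    # Purely illustrative, not the actual proposal from #8230 / #15105.
    def __init__(self, n_trees=100):
        self.n_trees = n_trees

    def fit(self, X, y, warm_start_with=None):
        if warm_start_with is not None and hasattr(self, 'trees_'):
            # continue from the current fitted state with updated parameters
            self.set_params(**warm_start_with)
        else:
            self.trees_ = []  # cold start
        while len(self.trees_) < self.n_trees:
            self.trees_.append(self._fit_one_tree(X, y))
        return self

    def _fit_one_tree(self, X, y):
        ...  # stand-in for the actual boosting step
```

A search tool could then decide by itself to call e.g. `est.fit(X, y, warm_start_with={'n_trees': 200})` when it knows the previous fit can be reused, without the user ever setting a `warm_start` flag in the constructor.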