Notes on warmstarting GridSearchCV and SuccessiveHalving #15125
Comments
This is a complex issue... I'm happy to chat if anyone wants to discuss it.
That's not true. Especially since we sample the parameters at the beginning, we can take the sample, sort it according to what's needed for warm starting, and then continue.
Could you please elaborate? I think it shouldn't be too hard to handle both cases if we support something like …
Why would this make it more tricky?
We still probably want the user to be able to enable/disable it. Thinking about this one, and the issue of the pipeline refitting the whole pipeline when not necessary, it increasingly seems to me that some tweaks to the …
I don't think we need to change anything about `clone` regarding the issues discussed here.
Supporting warm-start in SH requires … This way, the SH class can pass in the survivors for the next iteration. That's what I mean by "hacky and not backward compatible."
What I mean is that unless an estimator can warm-start at least two parameters, the combination of SH warm-start + GS warm-start is not an issue, since the GS has nothing left to warm-start. But I'm actually wrong... when running SH (regardless of warm-starting SH), we still don't want GS to do any warm-start.
I agree that warm starting with … I checked the SH PR out, and I understand the issue better now.
I'm not sure why we allow users to have their own … Generally, in order to support … Therefore, we either need to not clone the estimator before `fit` in GS and follow a warm-startable path of parameters, or to change the cloning mechanism there.

My understanding is that the above concerns don't apply to SH, since you do/can follow a warm-startable path of parameters and keep only the last one, but you need to tell the …

Now, do you think this statement is correct?
We could also have a call and discuss the issues.
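To make the cloning concern concrete, here is a minimal sketch (using `GradientBoostingClassifier` as an example of a `warm_start`-capable estimator):

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(random_state=0)

est = GradientBoostingClassifier(n_estimators=10, warm_start=True)
est.fit(X, y)                    # fits 10 trees

est.set_params(n_estimators=20)
est.fit(X, y)                    # warm start: fits only the 10 *additional* trees

fresh = clone(est)               # clone returns an unfitted copy with the same params
fresh.fit(X, y)                  # all 20 trees are fit from scratch
```

Since `BaseSearchCV` clones the estimator for each candidate, the fitted state that warm starting relies on is discarded between candidates.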
Indeed, the main constraints are imposed by …
Throwing some thoughts on warm starting, grid search and successive halving (#13900). In particular: warm-starting in grid search is very different from warm-starting in SH. Details below.
I think this could be of interest to @jnothman, @amueller, @adrinjalali and @ogrisel.
Warm start in GridSearchCV
`GridSearchCV` currently does not support warm-starting, which is a shame since it wastes resources. Supporting it would allow us to get rid of the `EstimatorCV` objects.
Consider the following `param_grid` (illustrative values), where `b` is a warm-startable parameter such as an iteration count:
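```python
param_grid = {
    'a': [1, 2],    # 'a' is not warm-startable
    'b': [1, 2],    # 'b' is warm-startable
}
```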
The trick proposed by @jnothman in #8230 is to transform the flat list of candidate dicts generated by `ParameterGrid` into a list of sublists within which only the warm-startable parameter varies. With the illustrative grid above, that means going:
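```python
# from the flat ParameterGrid output ...
[{'a': 1, 'b': 1}, {'a': 1, 'b': 2}, {'a': 2, 'b': 1}, {'a': 2, 'b': 2}]

# ... to sublists where only the warm-startable 'b' varies:
[[{'a': 1, 'b': 1}, {'a': 1, 'b': 2}],
 [{'a': 2, 'b': 1}, {'a': 2, 'b': 2}]]
```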
This way, in `evaluate_candidates()`, instead of cloning the estimator 4 times (once per dict), we only clone it twice (once per sublist, where warm start can be leveraged).

The transformation isn't necessarily obvious, especially considering that the values of 'a' aren't hashable in general. From a (private) API point of view, there are many nasty ways of doing it; I couldn't come up with a clean version so far.
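For illustration, here is one possible shape for that grouping step (a rough sketch, not the private API under discussion; it matches groups by identity to sidestep the hashability issue):

```python
from sklearn.model_selection import ParameterGrid

def group_candidates(param_grid, warm_startable=('b',)):
    """Group candidates so that only warm-startable parameters vary within
    a group. The other values may be unhashable, so groups are matched by
    identity rather than used as dict keys."""
    groups = []  # list of (fixed_params, candidates) pairs
    for cand in ParameterGrid(param_grid):
        fixed = {k: v for k, v in cand.items() if k not in warm_startable}
        for key, members in groups:
            if key.keys() == fixed.keys() and all(
                key[k] is fixed[k] for k in fixed
            ):
                members.append(cand)
                break
        else:
            groups.append((fixed, [cand]))
    return [members for _, members in groups]

print(group_candidates({'a': [1, 2], 'b': [1, 2]}))
# [[{'a': 1, 'b': 1}, {'a': 1, 'b': 2}], [{'a': 2, 'b': 1}, {'a': 2, 'b': 2}]]
```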
Warm start in RandomizedSearchCV

`RandomizedSearchCV` simply cannot support warm starting. It doesn't make sense in this case, since the non-warm-startable parameters are sampled at random, and there is no way to construct groups of parameters that can be warm-started together.

Warm start in SuccessiveHalving
Warm start in SH is possible (for both the grid and the random version), but it is completely different from the way we can warm-start a `GridSearchCV`.

Let's say we budget on the number of trees of a GBDT, and `n_trees` is a warm-startable parameter (I'm using `n_trees` instead of `max_iter` to avoid confusion with the iterations of the SH process). Warm-starting here consists of re-using at SH iteration `i + 1` the estimators that were run at SH iteration `i`, since the only parameter that differs between them is `n_trees`.

(Figure: the dashed lines represent the re-use of an estimator from one SH iteration to the next.)
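As a minimal hand-rolled sketch of this kind of warm starting (illustrative only; it uses `GradientBoostingClassifier`'s existing `warm_start` flag rather than the SH implementation from #13900):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# 4 candidates that differ only on a non-budget parameter.
candidates = [{'learning_rate': lr} for lr in (1.0, 0.3, 0.1, 0.03)]
models = [GradientBoostingClassifier(warm_start=True, **p) for p in candidates]

n_trees = 10
while len(models) > 1:
    for m in models:
        m.set_params(n_estimators=n_trees)
        m.fit(X_tr, y_tr)            # warm start: only the *new* trees are fit
    scores = [m.score(X_val, y_val) for m in models]
    top = np.argsort(scores)[::-1][:len(models) // 2]
    models = [models[i] for i in top]  # survivors keep their fitted trees
    n_trees *= 2                       # double the budget at the next SH iteration

best = models[0]
```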
I hope it's clear that this kind of warm-starting is very different from the one that `GridSearchCV` can leverage. Technically:

- `GridSearchCV` supports warm-starting within a single call to `evaluate_candidates(candidate_params)`. The warm-starting is done from one candidate to another.
- SH supports warm-starting across multiple successive calls to `evaluate_candidates(candidate_params)`.

That version should be relatively easy to implement in a very hacky and non-backward-compatible way, but a clean version would require much more work, and possibly a whole new API for `BaseSearchCV`.
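Schematically, the difference between the two drivers looks like this (toy functions, not sklearn's internals; `keep_best` is a hypothetical helper, and `evaluate_candidates` is assumed to return one score per candidate as a simplification):

```python
def grid_search_run(evaluate_candidates, candidates):
    # A single call: any warm starting must happen between candidates
    # *inside* this one call.
    evaluate_candidates(candidates)

def successive_halving_run(evaluate_candidates, candidates, keep_best):
    # Successive calls: warm starting means carrying fitted estimators
    # from one call to the next.
    while len(candidates) > 1:
        scores = evaluate_candidates(candidates)
        candidates = keep_best(candidates, scores)
```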
Supporting both kinds of warm-starting
(this is a digression)
Since the nature of the warm-starting is so different, I think it should be possible to support both (i.e. both the GS kind and the SH kind).
However, if we ever support both, we should definitely deactivate the GS warm-start when doing SH warm-start: doing a GS warm-start at a given SH iteration would mean that an estimator `x` would "become" another estimator `y` (meaning, not being cloned) because of the GS warm-start. But if that estimator `x` is one of the survivors for iteration `i + 1`, tough luck: it's lost.

In any case, that's not even an issue unless multiple parameters can be warm-started on the same estimator, which is rare.
New warm-start API???

It looks like it was decided during the sprint to support a new warm-start API for estimators, introducing a new `fit` parameter, `warm_start_with` (#8230 (comment)). The transition from the old API to the new one should be pretty straightforward; I implemented a basic version for GBDTs in #15105.

This new API allows tools like GS and SH to automatically leverage warm-starting, without needing the user to explicitly ask for it.
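As a toy sketch of what such a `fit` parameter could look like (my reading of the proposal; the exact names and semantics in #8230 / #15105 may differ):

```python
import numpy as np

class ToyGBDT:
    """Stand-in estimator whose `n_trees` parameter grows additively."""

    def __init__(self, n_trees=100):
        self.n_trees = n_trees
        self.trees_ = []

    def fit(self, X, y, warm_start_with=None):
        if warm_start_with is None:
            self.trees_ = []  # a plain fit starts from scratch
        else:
            # Continue from the current fitted state, updating only the
            # warm-startable parameters that were passed in.
            for param, value in warm_start_with.items():
                setattr(self, param, value)
        while len(self.trees_) < self.n_trees:
            self.trees_.append(object())  # placeholder for a real tree
        return self

X, y = np.zeros((10, 3)), np.zeros(10)
est = ToyGBDT(n_trees=10).fit(X, y)             # fits 10 trees
est.fit(X, y, warm_start_with={'n_trees': 20})  # fits only 10 more
```

With a `fit`-level switch like this, a search tool can decide candidate-by-candidate whether to warm-start, without mutating the estimator's constructor parameters.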