## Description
Throwing some thoughts on warm starting, grid search and successive halving (#13900). In particular: warm-starting in grid search is very different from warm-starting in SH. Details below.
I think this could be of interest to @jnothman, @amueller, @adrinjalali and @ogrisel.
## Warm start in `GridSearchCV`
`GridSearchCV` currently does not support warm-starting, which is a shame since it wastes resources. Supporting it would allow us to get rid of the `EstimatorCV` objects.
Consider the following `param_grid`, where `b` is a warm-startable parameter:
```python
param_grid = {
    'a': [1, 2],
    'b': [3, 4],
}
```
The trick proposed by @jnothman in #8230 is to transform the list generated by `ParameterGrid` from

```python
[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]
```

to

```python
[[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}],
 [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]
```
This way, in `evaluate_candidates()`, instead of cloning the estimator 4 times (once per dict), we only clone it twice (once per sublist, where `warm_start` can be leveraged).
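To make the idea concrete, here is a minimal sketch of what that fitting loop could look like (`fit_candidate_groups` is a made-up helper, not scikit-learn API; CV splitting and proper scoring are ignored for brevity):

```python
from sklearn.base import clone

def fit_candidate_groups(estimator, candidate_groups, X, y):
    # Made-up sketch: each inner list only varies warm-startable parameters,
    # so a single (warm-started) estimator can be reused within a group.
    results = []
    for group in candidate_groups:
        est = clone(estimator).set_params(warm_start=True)  # one clone per group
        for params in group:
            est.set_params(**params)  # only warm-startable params change here
            est.fit(X, y)             # reuses the state of the previous fit
            results.append((params, est.score(X, y)))
    return results
```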
The transformation isn't necessarily obvious, especially considering that the values of `'a'` aren't hashable in general.
From a (private) API point of view there are many nasty ways of doing it; I haven't come up with a clean version so far.
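One way to sidestep the hashability issue is to build the groups positionally from the grid specification itself, rather than by grouping the already-generated dicts. `grouped_parameter_grid` below is a made-up sketch of that idea (it mimics `ParameterGrid`'s sorted-keys iteration order):

```python
from itertools import product

def grouped_parameter_grid(param_grid, warm_startable):
    # Made-up sketch, not scikit-learn API. Groups are built positionally
    # from the grid specification, so the values of the non-warm-startable
    # parameters (here 'a') never need to be hashable.
    items = sorted(param_grid.items())
    cold = [(key, values) for key, values in items if key not in warm_startable]
    warm = [(key, values) for key, values in items if key in warm_startable]
    for cold_combo in product(*(values for _, values in cold)):
        base = dict(zip((key for key, _ in cold), cold_combo))
        yield [dict(base, **dict(zip((key for key, _ in warm), warm_combo)))
               for warm_combo in product(*(values for _, values in warm))]

print(list(grouped_parameter_grid({'a': [1, 2], 'b': [3, 4]},
                                  warm_startable={'b'})))
# [[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}], [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]
```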
## Warm start in `RandomizedSearchCV`
`RandomizedSearchCV` simply cannot support warm-starting. It doesn't make sense in this case since the non-warm-startable parameters are sampled at random, and there is no way to construct groups of parameters that can be warm-started together.
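For instance, with `ParameterSampler` each candidate draws a fresh value of `'a'`, so no two candidates share their non-warm-startable parameters and there is nothing to group on:

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

sampler = ParameterSampler({'a': uniform(0, 1), 'b': [3, 4]},
                           n_iter=4, random_state=0)
for candidate in sampler:
    print(candidate)  # every dict gets its own random 'a'
```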
## Warm start in `SuccessiveHalving`
Warm start in SH is possible (for both the grid and random versions), but it is completely different from the way we can warm-start a `GridSearchCV`.
Let's say we budget on the number of trees of a GBDT, and `n_trees` is a warm-startable parameter (I'm using `n_trees` instead of `max_iter` to avoid confusion with the iterations of the SH process).
Warm-starting here consists of re-using at SH iteration `i + 1` the estimators that were run at SH iteration `i`, since the only parameter that differs between them is `n_trees`:
The dashed lines represent the re-use of an estimator from one SH iteration to the next.
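A rough sketch of the idea, using `GradientBoostingClassifier` and its `n_estimators` as the `n_trees` budget (the function, the `budgets` schedule and the keep-the-best-half rule are simplified assumptions, not actual SH code; scoring is done on the training data for brevity):

```python
from sklearn.ensemble import GradientBoostingClassifier

def sh_with_warm_start(candidates, X, y, budgets=(10, 30, 90)):
    # candidates: dicts of non-budget parameters; budgets: the n_trees schedule
    models = [GradientBoostingClassifier(warm_start=True, **params)
              for params in candidates]
    for n_trees in budgets:
        for model in models:
            model.set_params(n_estimators=n_trees)
            model.fit(X, y)  # warm_start=True: only the missing trees are fit
        # keep the best half; survivors carry their fitted trees over to the
        # next SH iteration (the dashed lines in the figure)
        models.sort(key=lambda model: model.score(X, y), reverse=True)
        models = models[:max(1, len(models) // 2)]
    return models[0]
```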
I hope it's clear that this kind of warm-starting is very different from the one that `GridSearchCV` can leverage.
Technically:

- `GridSearchCV` supports warm-starting within a single call to `evaluate_candidates(candidate_params)`. The warm-starting is done from one candidate to another.
- SH supports warm-starting across multiple successive calls to `evaluate_candidates(candidate_params)` (see the sketch below).
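In terms of `BaseSearchCV`'s `_run_search(evaluate_candidates)` hook, the two shapes could look roughly like this (`grouped_candidates`, `with_budget` and `top_half` are made-up placeholders, not actual code):

```python
def _run_search_grid(self, evaluate_candidates):
    # GridSearchCV: a single call; warm-starting happens *inside* it,
    # from one candidate of a group to the next.
    evaluate_candidates(self.grouped_candidates)

def _run_search_sh(self, evaluate_candidates):
    # SH: several successive calls; estimators fitted during one call
    # must survive and be reused by the next call.
    candidates = self.initial_candidates
    for budget in self.budgets:
        results = evaluate_candidates(with_budget(candidates, budget))
        candidates = top_half(results)
```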
That version should be relatively easy to implement in a very hacky, non-backward-compatible way. But a clean version would require much more work, and possibly a whole new API for `BaseSearchCV`.
## Supporting both kinds of warm-starting
(this is a digression)
Since the two kinds of warm-starting are so different in nature, I think it should be possible to support both (i.e. both GS and SH).
However, if we ever support both, we should definitely deactivate the GS warm-start when doing the SH warm-start: doing a GS warm-start at a given SH iteration would mean that an estimator `x` "becomes" another estimator `y` (meaning it isn't cloned) because of the GS warm-start. But if that estimator `x` is one of the survivors for iteration `i + 1`, tough luck: it's lost.
In any case, that's not even an issue unless multiple parameters can be warm-started on the same estimator, which is rare.
## New warm-start API???
It looks like it was decided during the sprint to support a new warm-start API for estimators, introducing a new `fit` parameter `warm_start_with` (#8230 (comment)).
The transition from the old API to the new one should be pretty straightforward; I implemented a basic version for GBDTs in #15105.
This new API allows tools like GS and SH to automatically leverage warm-starting, without the user needing to explicitly ask for it.
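I'm not reproducing the exact signature from #8230 / #15105 here, but a toy estimator illustrating the general idea could look like the following (all names are illustrative, under the assumed semantics that `warm_start_with` updates some parameters and continues training from the current fitted state):

```python
from sklearn.base import BaseEstimator

class ToyGBDT(BaseEstimator):
    # Purely illustrative, not the actual proposal from #8230 / #15105.
    def __init__(self, n_trees=100):
        self.n_trees = n_trees

    def fit(self, X, y, warm_start_with=None):
        if warm_start_with is not None and hasattr(self, 'trees_'):
            # continue from the current fitted state with updated parameters
            self.set_params(**warm_start_with)
        else:
            self.trees_ = []  # cold start
        while len(self.trees_) < self.n_trees:
            self.trees_.append(self._fit_one_tree(X, y))
        return self

    def _fit_one_tree(self, X, y):
        ...  # stand-in for the actual boosting step
```

A search tool could then decide by itself to call e.g. `est.fit(X, y, warm_start_with={'n_trees': 200})` when it knows the previous fit can be reused, without the user ever setting a `warm_start` flag in the constructor.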