Notes on warmstarting GridSearchCV and SuccessiveHalving · Issue #15125 · scikit-learn/scikit-learn · GitHub
Open
NicolasHug opened this issue Oct 3, 2019 · 6 comments

@NicolasHug (Member)

Throwing out some thoughts on warm starting, grid search, and successive halving (#13900). In particular: warm-starting in grid search is very different from warm-starting in SH. Details below.

I think this could be of interest to @jnothman , @amueller, @adrinjalali and @ogrisel .

Warm start in GridSearchCV

GridSearchCV currently does not support warm-starting, which is a shame since it wastes resources. Supporting it would allow us to get rid of the EstimatorCV objects.

Consider the following param_grid, where b is a warmstartable parameter:

param_grid = {
    'a': [1, 2],
    'b': [3, 4]
}

The trick proposed by @jnothman in #8230 is to transform the list generated by ParameterGrid from

[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]

to

[[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}],
 [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]

This way, in evaluate_candidates(), instead of cloning the estimator 4 times (once per dict), we only clone it twice (once per sublist, where warm_start can be leveraged).

The transformation isn't necessarily obvious, especially considering that the values of 'a' aren't hashable in general.

From a (private) API point of view, there are many nasty ways of doing it; I haven't come up with a clean version so far.
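For illustration, the grouping itself can be sketched in plain Python. This is not scikit-learn code — `group_for_warm_start` is a made-up name — and it deliberately groups by equality rather than hashing, since the values of 'a' aren't hashable in general:

```python
# Illustrative sketch (not scikit-learn code): group ParameterGrid candidates
# so that dicts differing only in the warm-startable parameter end up in the
# same sublist.  Grouping uses equality checks, not hashing, because
# parameter values (e.g. estimator instances) aren't hashable in general.

def group_for_warm_start(candidates, warm_param):
    groups = []  # list of (key_dict, sublist) pairs
    for cand in candidates:
        key = {k: v for k, v in cand.items() if k != warm_param}
        for existing_key, sublist in groups:
            if existing_key == key:  # equality, no hashing needed
                sublist.append(cand)
                break
        else:
            groups.append((key, [cand]))
    return [sublist for _, sublist in groups]

candidates = [{'a': 1, 'b': 3}, {'a': 1, 'b': 4},
              {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]
print(group_for_warm_start(candidates, 'b'))
# [[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}],
#  [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]
```

The quadratic scan over groups is the price of not being able to hash the keys; for realistic grid sizes that's negligible.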

Warm start in RandomizedSearchCV

RandomizedSearchCV simply cannot support warm starting. It doesn't make sense in this case since the non-warmstartable parameters are sampled at random, and there is no way to construct groups of parameters that can be warmstarted together.

Warm start in SuccessiveHalving

Warm start in SH is possible (for both the grid and random version), but it is completely different from the way we can warm-start a GridSearchCV.

Let's say we budget on the number of trees of a GBDT and n_trees is a warmstartable parameter (I'm using n_trees instead of max_iter to avoid confusion with the iterations of the SH process).

Warm-starting here consists of re-using, at SH iteration i + 1, the estimators that were run at SH iteration i, since the only parameter that differs between them is n_trees:

[Figure IMG_20191003_173112: diagram of estimators across SH iterations]

The dashed lines represent the re-use of an estimator from one SH iteration to the next.

I hope it's clear that this kind of warm-starting is very different from the one that GridSearchCV can leverage.
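A toy sketch of this SH warm-start, with a mock estimator standing in for a real GBDT — `MockGBDT`, its parameters, and the scoring rule are all made up for illustration, not scikit-learn API:

```python
# Toy sketch of warm-started successive halving.  MockGBDT stands in for a
# real GBDT: with warm_start=True, fitting only builds the missing trees.

class MockGBDT:
    def __init__(self, lr, n_trees=0, warm_start=False):
        self.lr, self.n_trees, self.warm_start = lr, n_trees, warm_start
        self.trees_built = 0

    def set_params(self, **params):
        for k, v in params.items():
            setattr(self, k, v)
        return self

    def fit(self, X, y):
        start = self.trees_built if self.warm_start else 0
        self.new_trees = self.n_trees - start  # only the missing trees are built
        self.trees_built = self.n_trees
        return self

    def score(self):
        return -abs(self.lr - 0.1)  # pretend lr=0.1 is the best setting

candidates = [MockGBDT(lr) for lr in (0.01, 0.05, 0.1, 0.5)]
n_trees = 10
while len(candidates) > 1:
    for est in candidates:
        est.set_params(n_trees=n_trees, warm_start=True).fit(None, None)
    # keep the best half *without cloning*: survivors carry their trees over
    candidates = sorted(candidates, key=lambda e: e.score())[len(candidates) // 2:]
    n_trees *= 2
print(candidates[0].lr, candidates[0].trees_built)  # 0.1 20
```

At the last fit, the winner only had to build 10 new trees on top of the 10 it already had — that saving is exactly what the dashed lines in the figure represent.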

Technically:

  • GridSearchCV supports warm-starting for a single call to evaluate_candidates(candidate_params). The warm-starting is done from one candidate to another.
  • SH supports warm-starting for multiple successive calls to evaluate_candidates(candidate_params).

That SH version should be relatively easy to implement in a very hacky, non-backward-compatible way. But a clean version would require much more work, and possibly a whole new API for BaseSearchCV.

Supporting both kinds of warm-starting

(this is a digression)

Since the nature of the warmstarting is so different, I think it should be possible to support both (i.e. both GS and SH).

However, if we ever support both, we should definitely deactivate the GS warmstart when doing SH warmstart: doing GS warm-start at a given SH iteration would mean that an estimator x "becomes" another estimator y (i.e., is not cloned) because of the GS warm-start. But if that estimator x is one of the survivors for iteration i + 1, tough luck: it's lost.

In any case, that's not even an issue unless multiple parameters can be warm-started on the same estimator which is rare.

New warm-start API???

It looks like it was decided during the sprint to support a new warm-start API for estimators, introducing a new fit parameter warm_start_with (#8230 (comment))

The transition from the old API to the new one should be pretty straightforward. I implemented a basic version for GBDTs in #15105

This new API allows tools like GS and SH to automatically leverage warm-starting, without needing the user to explicitly ask for it.
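For illustration, a mock of what such a warm_start_with fit parameter could look like — this is not the actual implementation in #15105, just a sketch of the idea with made-up internals:

```python
# Illustrative mock of the proposed `warm_start_with` fit parameter
# (#8230 / #15105).  Passing a previously fitted estimator lets the new
# fit resume from its state instead of starting from scratch.

class MockEstimator:
    def __init__(self, n_iter):
        self.n_iter = n_iter
        self.state_ = None  # stand-in for fitted state (e.g. the trees)

    def fit(self, X, y, warm_start_with=None):
        if warm_start_with is not None and warm_start_with.state_ is not None:
            start = warm_start_with.n_iter          # resume from previous fit
            self.state_ = list(warm_start_with.state_)
        else:
            start = 0
            self.state_ = []
        # only the iterations start..n_iter actually run
        self.state_ = self.state_ + list(range(start, self.n_iter))
        return self

prev = MockEstimator(n_iter=5).fit(None, None)
nxt = MockEstimator(n_iter=8).fit(None, None, warm_start_with=prev)
print(len(nxt.state_))  # 8: only iterations 5..7 ran on top of prev
```

The point of the design is visible here: the caller (a search tool) decides when to warm-start by passing the previous estimator, so the user never has to set a warm_start flag themselves.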

@NicolasHug (Member Author)

This is a complex issue... I'm happy to chat if anyone wants to discuss it.

@adrinjalali (Member)

RandomizedSearchCV simply cannot support warm starting. It doesn't make sense in this case since the non-warmstartable parameters are sampled at random, and there is no way to construct groups of parameters that can be warmstarted together.

That's not true. Especially since we sample the parameters at the beginning, we can take the sample, sort it according to what's needed for warm starting, and then continue.
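A sketch of that suggestion, under stated assumptions: the sort key below is made up, and sorting on `repr()` of the non-warmstartable parameters is a crude stand-in for values that aren't orderable in general:

```python
# Sketch: RandomizedSearchCV samples all candidates up front, so they can be
# sorted so that candidates agreeing on the non-warmstartable parameters
# become adjacent, with the warmstartable parameter 'b' increasing within
# each run.  repr() is a crude stand-in sort key for non-orderable values.
import random

rng = random.Random(0)
candidates = [{'a': rng.choice([1, 2]), 'b': rng.randint(1, 10)}
              for _ in range(8)]

ordered = sorted(
    candidates,
    key=lambda c: (repr({k: v for k, v in c.items() if k != 'b'}), c['b']),
)
print(ordered)  # same-'a' candidates are now contiguous, 'b' ascending
```

Within each contiguous run the estimator can then be warm-started exactly as in the grid case; with genuinely continuous random parameters, of course, most runs will have length 1.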

That version should be relatively easy to implement in a very hacky and non-backward compatible way. But a clean version would require much more work and possibly require a whole new API to BaseSearchCV.

could you please elaborate? I think it shouldn't be too hard to handle both cases if we support something like estimator.clone(for_warm_start=True), which could be used in certain places in SH. Or possibly I'm not understanding what you mean.

In any case, that's not even an issue unless multiple parameters can be warm-started on the same estimator which is rare.

Why would this make it more tricky?

This new API allows tools like GS and SH to automatically leverage warm-starting, without needing the user to explicitly ask for it.

we still probably want the user to be able to enable/disable it.

Thinking about this one, and the issue of pipeline refitting the whole pipeline when not necessary, it increasingly seems to me that some tweaks to the clone API may help both issues.

@NicolasHug (Member Author)

About warmstarting a Random search:

This is hard to explain succinctly... Hope this picture will make it clearer. To warmstart a parameter b, you need some structure/regularity on the rest of the parameters. Such structure doesn't exist in general in RandomizedSearchCV.

[Figure IMG_20191004_100449: hand-drawn sketch of the parameter structure needed to warm-start b]

@NicolasHug (Member Author)

it increasingly seems to me that some tweaks to the clone API may help both issues.

I don't think we need to change anything in clone regarding the issues discussed here.

could you please elaborate? ... Or possibly I'm not understanding what you mean.

Supporting warm-start in SH requires evaluate_candidates() to:

  • accept estimators as input (and not clone them)
  • return the fitted estimators

This way, the SH class can pass in the survivors for the next iteration.

That's what I mean by "hacky and not backward compatible."
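A hypothetical sketch of that change — none of these signatures exist in scikit-learn, and `make_fresh_estimator` stands in for the usual clone:

```python
# Hypothetical sketch of the evaluate_candidates() change described above:
# accept pre-built estimators (skipping the clone) and return the fitted
# ones, so SH can feed the survivors back in at the next iteration.

class _Stub:
    """Minimal estimator stand-in."""
    def __init__(self):
        self.params = {}

def make_fresh_estimator():
    return _Stub()  # stands in for clone(base_estimator)

def evaluate_candidates(candidate_params, estimators=None):
    if estimators is None:
        # the current behavior: a fresh clone per candidate
        estimators = [make_fresh_estimator() for _ in candidate_params]
    fitted = []
    for est, params in zip(estimators, candidate_params):
        est.params = dict(est.params, **params)  # stand-in for set_params + fit
        fitted.append(est)
    return fitted  # returned so survivors can be passed back in

# SH iteration i fits fresh estimators; iteration i+1 reuses the survivors:
round1 = evaluate_candidates([{'n_trees': 10}, {'n_trees': 10}])
survivors = round1[:1]
round2 = evaluate_candidates([{'n_trees': 20}], estimators=survivors)
print(round2[0] is survivors[0])  # True: no clone between SH iterations
```

The backward-compatibility problem is visible in the signature: today's evaluate_candidates neither accepts estimators nor returns them, so third-party subclasses of BaseSearchCV would break.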

Why would this make it more tricky?

What I mean is that unless an estimator can warm-start at least 2 things, the combination of SH warmstart + GS warmstart is not an issue, since the GS has nothing left to warm-start.

But I'm actually wrong... when running SH (regardless of warm-starting SH), we still don't want GS to do any warmstart.

@adrinjalali (Member)

I agree that warm starting with RandomizedSearchCV has a lower probability of being useful, but you can still sort the parameter sets and use warm start where it applies.

I checked the SH PR out, and I understand the issue better now.

accept estimators as input (and not clone them)

I'm not sure why we allow users to have their own _run_search but not their own evaluate_candidates.

Generally, in order to support warm_start in GS and still keep the fitted estimators, we need to clone the estimator, and yet do warm_start. If I understand correctly, at the moment that's not possible, since clone clears the estimator's state.

Therefore we either need to not clone the estimator before fit in GS and follow a warm startable path of parameters, or change the cloning mechanism there.

My understanding is that the above concerns don't apply to SH since you do/can follow a warm startable path of parameters and keep the last one only, but you need to tell the evaluate_candidates not to clone the estimator.

Now, do you think this statement is correct?

  • We can move evaluate_candidates out of fit and let users have their own version of it, and that would fix the issue in SH since it can have its own version of it.

We could also have a call and discuss the issues.

@NicolasHug (Member Author)

We can move evaluate_candidates out of fit and let users have their own version of it, and that would fix the issue in SH since it can have its own version of it.

Indeed, the main constraints are imposed by evaluate_candidates. But changing the nature of evaluate_candidates also comes with other issues, see #9499 (comment)
