[WIP] Gaussian Process-based hyper-parameter optimization #5491
Conversation
sklearn/gp_search.py
```python
def compute_expected_improvement(predictions, sigma, y_best):
    ei_array = np.zeros(predictions.shape[0])
```
This could be easily vectorized.

It would make sense to follow the input format of RandomizedSearchCV, which accepts both distributions and lists.
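A minimal sketch of what a vectorized version could look like, assuming a minimization setting where `y_best` is the lowest score observed so far (not the PR's actual implementation):

```python
import numpy as np
from scipy.stats import norm

def compute_expected_improvement(predictions, sigma, y_best):
    """Vectorized expected improvement, assuming the objective is minimized."""
    predictions = np.asarray(predictions, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (y_best - predictions) / sigma
        ei = sigma * (z * norm.cdf(z) + norm.pdf(z))
    ei[sigma == 0.0] = 0.0  # no predictive uncertainty -> no expected improvement
    return ei
```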
sklearn/gp_search.py
```
acquisition function : string, optional
    Function to maximize in order to choose the next parameter to test.
    - Simple : maximize the predicted output
```
"Greedy" might be a better name than "simple" as it is commonly used to denote strategies which purely exploit the current knowledge without any exploration
The general framework is called "bayesian optimization". Maybe something along the lines of |
sklearn/gp_search.py
```
n_init : int, optional
    Number of random iterations to perform before the smart search.
    Default is 30.
```
30 seems relatively large IMO. I'm not sure if there are any established heuristics for choosing this value, but I would personally set it to no more than `2 * self.n_parameters`.

@fabianp No, that was not intended.
sklearn/gp_search.py
```
Parameters
----------
estimator : 1) sklearn estimator or 2) callable
```
I assume we'll drop callable support
This could be useful if we wanted a generic Bayesian optimization algorithm, but indeed, this is certainly out of scope.

Thanks all for your comments! This kind of thing keeps me going :-)
sklearn/gp_search.py
```python
def compute_expected_improvement(predictions, sigma, y_best):
    ei_array = np.zeros(predictions.shape[0])
    for i in range(ei_array.shape[0]):
        z = (y_best - predictions[i]) / sigma[i]
```
Shouldn't it be `(predictions[i] - y_best) / sigma[i]` instead? (See equation 4 of http://arxiv.org/pdf/1012.2599v1.)
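For reference, the standard expected-improvement formula (as in the tutorial linked above) is written for maximization; the sign in the numerator flips when the search minimizes the score, which is presumably why the code above uses `y_best - predictions[i]`:

```latex
\mathrm{EI}(x) =
\begin{cases}
\bigl(\mu(x) - f(x^{+})\bigr)\,\Phi(Z) + \sigma(x)\,\phi(Z) & \text{if } \sigma(x) > 0,\\
0 & \text{if } \sigma(x) = 0,
\end{cases}
\qquad
Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)}
```

where $f(x^{+})$ is the best observation so far, and $\Phi$, $\phi$ are the standard normal CDF and PDF.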
It is possible to rewrite this. For example, we could support the search method being a generator/coroutine that repeatedly yields a list of parameter settings to try, and is sent (with `send`) the resulting scores.
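A rough sketch of the kind of generator-driven protocol suggested here; the function names and the driver loop are hypothetical, not an existing scikit-learn API:

```python
import random

def random_search_strategy(param_bounds, batch_size=5, n_batches=10):
    """Toy strategy: yields batches of random candidates, receives scores via .send()."""
    for _ in range(n_batches):
        batch = [
            {name: random.uniform(lo, hi) for name, (lo, hi) in param_bounds.items()}
            for _ in range(batch_size)
        ]
        scores = yield batch  # the driver sends back one score per candidate
        # A smarter (e.g. GP-based) strategy would update its model with `scores` here.

def run_search(strategy, evaluate):
    """Drive a strategy generator: evaluate each yielded batch and send scores back."""
    batch = next(strategy)
    try:
        while True:
            scores = [evaluate(params) for params in batch]
            batch = strategy.send(scores)
    except StopIteration:
        pass

# Usage (with a hypothetical scoring function):
# run_search(random_search_strategy({"C": (0.01, 100.0)}), evaluate=my_cv_score)
```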
@glouppe, the problem with following the same input as RandomizedSearchCV is that we need a way to specify either a discrete list or a range of possible values. As far as I know, RandomizedSearchCV allows specifying either a discrete list or a distribution, but not a range of possible values (i.e. an interval). Right now I don't know what the most intuitive way to do this is.
@fabianp An interval could be specified by a uniform distribution (with a uniform prior).
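Concretely, in RandomizedSearchCV terms an interval can already be written as a frozen scipy distribution, so the same convention could carry over here (the parameter names below are just examples):

```python
from scipy.stats import uniform, randint

param_distributions = {
    "C": uniform(loc=1e-3, scale=1e3 - 1e-3),  # continuous interval [1e-3, 1e3]
    "max_iter": randint(50, 500),              # integer range [50, 500)
    "penalty": ["l1", "l2"],                   # discrete list of choices
}
```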
@jnothman I like the idea, but that would involve a refactoring of the grid_search objects... Since my goal is to have a usable GPSearchCV object by the end of the week, I would leave that for a future PR.
@glouppe So what you are proposing is to get the bounds from the distribution and ignore the distribution? Or to optimize the expected improvement by sampling from this distribution?
The inconvenience I see is that it forbids you from using standard derivative-free optimizers (e.g. scipy's cobyla) on the acquisition function.

Hmm, I am not sure I see the connection. Why would it prevent you from using cobyla?
I'm benching on the MNIST data with LR and leaving it overnight. Will let you know the results tomorrow.

Yay! I fixed the joblib parallel handling in the last commit using the context manager. I was creating and destroying the forks for every iteration, which led to significant overhead. There is still quite a bit to do, but we are getting there. These are the new graphs. Note that I have set …
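For illustration, the kind of fix being described: entering `Parallel` as a context manager keeps the worker pool alive across iterations instead of re-spawning it for every batch. The candidate batches and scoring function below are toy stand-ins:

```python
from joblib import Parallel, delayed

def fit_and_score(params):
    # stand-in for cross-validating one candidate setting (hypothetical helper)
    return -(params["C"] - 1.0) ** 2

candidate_batches = [
    [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}],
    [{"C": 0.5}, {"C": 2.0}],
]

# The worker pool is created once and reused for every iteration of the search.
with Parallel(n_jobs=2) as parallel:
    for batch in candidate_batches:
        scores = parallel(delayed(fit_and_score)(p) for p in batch)
        # a GP-based search would refit its surrogate on `scores` here
```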
I wanted to suggest that a few days ago, and looked at the code to notice …
By the way, with regards to optimization: do you have constraints? Because cobyla is slow, and therefore useful only if you have constraints. If those constraints are bound constraints, I would expect "l_bfgs_b" to be faster. If you don't have constraints, I would still investigate l_bfgs_b, or Nelder-Mead (optimize.fmin). For all options, I would maybe limit the number of iterations of the optimizer, as you don't want a very good optimization. By the way, a hopefully useful resource on choosing an optimization method: http://www.scipy-lectures.org/advanced/mathematical_optimization/index.html#practical-guide-to-optimization-with-scipy
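A small sketch of what an iteration-limited, bounded inner optimization could look like; the toy `neg_acquisition` is a hypothetical stand-in for the negated expected improvement under the current GP:

```python
import numpy as np
from scipy.optimize import minimize

def neg_acquisition(x):
    # toy surrogate for -EI(x); in the real search this would query the GP
    return -np.exp(-np.sum((x - 0.3) ** 2))

bounds = [(0.0, 1.0), (0.0, 1.0)]
x0 = np.array([0.5, 0.5])

# L-BFGS-B handles bound constraints; maxiter is kept small since a very
# precise optimum of the acquisition function is not needed.
result = minimize(neg_acquisition, x0, method="L-BFGS-B",
                  bounds=bounds, options={"maxiter": 20})
next_candidate = result.x
```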
I am sorry if I am missing an important detail. The graphs that @MechCoder has made are awesome, but... is it possible to find examples where there is a really significant advantage to hyper-parameter optimization? Some of the earlier graphs seem to show this, but in the later ones the advantage looks marginal. I would have thought that there were cases where making a significant change in the hyper-parameters of a classifier/regressor would really make a huge difference. Off the top of my head, I would guess that MLP might be a good classifier to look at for this, as well as the regularization parameters of our standard classifiers/regressors. Also, are there interesting/surprising results from the searches that have been tried, or do we just find that the higher/lower some parameters go (e.g. the number of trees for random forests) the better? Finally, on a related but slightly different topic, maybe there are other sorts of hyper-parameters, such as how data is preprocessed, what sort of scaling is used, which features are treated as numerical vs categorical, which interaction terms are introduced, etc., that would be of interest as well.
@lesshaste Thanks for raising some very valid points. I am still at the point of convincing myself, but I'll still try to put forward a case for it.

However, all this is in theory, and I hope we can find cases where this beats the existing searches convincingly. I do not have the resources currently to bench on huge problems, but feel free to try yourself. I can however bench for simple problems with different estimators. Regarding your last question(s), they are all API issues. I wanted to bench with the MLP, but it takes the number of layers as a list, so I need to tweak the code for this particular use case. I could not, off the top of my head, recall another estimator that allows parameters as a list. How would you handle different normalizations with the present grid search API? The most straightforward way is to run different searches across different pipelines and compare your best results. Also, we can handle categorical parameters, but that involves changing the API to also allow a dict-list format in addition to the already convoluted dict-dict format.
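For what it's worth, the existing API can also fold a preprocessing choice into a single search by treating a whole pipeline step as a parameter. A hedged sketch (step names are illustrative; `sklearn.grid_search` later moved to `sklearn.model_selection`):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in later releases

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

param_grid = {
    "scale": [StandardScaler(), MinMaxScaler()],  # the preprocessing step itself is searched
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
# search.fit(X, y) then compares every (scaler, C) combination in one run
```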
Thanks a lot for the link, Gael, but I'm afraid we are still in the "show me it is useful" stage. Just FYI, I have used …
I was told this would be the right way to deal with the log vs linear parameters, and that's what's implemented in spearmint: http://www.cs.toronto.edu/~zemel/documents/bayesopt-warping.pdf

They integrate out the parameters of the warping, which is easy for them, I guess, as they already do that with the kernel parameters. We maximize the kernel parameters IIRC, and that happens in the GP. I'm not sure how simple it is for us to integrate out the warping parameters.
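A much simpler, hand-rolled alternative to learned input warping (and not what the linked paper does) is to fix a log transform for parameters that naturally live on a log scale and let the GP work in the transformed space. Parameter names here are illustrative:

```python
import numpy as np

def to_gp_space(params):
    # the GP models the score as a function of (log10(C), max_depth),
    # so uniform exploration in GP space is log-uniform in C
    return np.array([np.log10(params["C"]), float(params["max_depth"])])

def from_gp_space(x):
    return {"C": 10.0 ** x[0], "max_depth": int(round(x[1]))}
```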
I see that there is currently no low-level interface to run tests on Branin or some other function. That wouldn't be hard to refactor, though, right?
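Such a low-level interface would mostly need a callable to optimize; for reference, the Branin function commonly used in Bayesian-optimization benchmarks (global minimum ≈ 0.397887 on [-5, 10] x [0, 15]):

```python
import numpy as np

def branin(x1, x2):
    """Branin-Hoo test function, usually evaluated on x1 in [-5, 10], x2 in [0, 15]."""
    a = 1.0
    b = 5.1 / (4.0 * np.pi ** 2)
    c = 5.0 / np.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * np.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * np.cos(x1) + s
```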
Thanks, Andy, for the comments. I will follow up in a while.

Hi all, there's a package already out there called Optunity for this kind of optimisation. Might be worth checking it out: #6662
Hello Everyone,

This appears to be the main thread on hyper-parameter optimization in sklearn. I've had some experience in a related field called Meta-Optimization, so I thought I might share some of my thoughts in case they are useful to you. Please note that I'm fairly new to Python and I'm completely new to sklearn.

**Random and Grid Search**

First off, Random Search on any kind of optimization problem will only find near-optimal solutions if the search-space is very smooth. This is because the probability of finding near-optimal parameters decreases exponentially towards zero as the number of parameters increases. This is the so-called Curse of Dimensionality in the search-space of hyper-parameters. If n is the number of parameters and near-optimal parameter choices only occupy 10% of the range of each parameter, then the probability of finding near-optimal parameters by completely random sampling is p = 0.1^n. For n = 10 parameters this probability is p = 1e-10, and it will hence require 1e+10 random samples on average to find near-optimal hyper-parameters. I've sometimes seen people claim that Random Search eventually converges to the optimal parameters, and I also saw this claim in a recent lecture on hyper-parameter tuning, but the claim is a bit of a stretch - an infinite stretch to be exact.

**Practical Use**

You may object that both GridSearch and RandomSearch in sklearn seem to work in practice. Well, they may find acceptable hyper-parameters, but it is highly unlikely that they find near-optimal parameters. Unless the search-space of hyper-parameters is very smooth, there may be a very large difference in performance between acceptable and near-optimal parameters. However, I do agree that it is much preferable to automate either grid- or random-search instead of doing it manually. But even better would of course be to have a proper optimizer to find near-optimal hyper-parameters.

**Meta-Optimization**

The problem I was working on was the tuning of control-parameters (aka behavioural parameters, aka hyper-parameters) for heuristic optimizers. I prefer to call this Meta-Optimization, although it is conceptually very similar to hyper-parameter optimization. The Wikipedia page has a short description of the concept, which was apparently first explored back in the 1970's: https://en.wikipedia.org/wiki/Meta-optimization

The main problem when I worked on this was the massive computational needs. The popular heuristic optimizers of the day, such as Particle Swarm Optimization (PSO), typically required many thousand evaluations of the cost-function. This was intractable when the meta-cost-function consisted of another 50 optimization runs that might take hours to complete. I found a system that was simple, worked well and was quite fast. It had two main components: (1) a meta-optimizer that only required few iterations, and (2) a measure of performance whose calculation could be aborted pre-emptively in many cases. These two things in combination made it perform reasonably fast.

I made a short YouTube video on how it works: https://www.youtube.com/watch?v=O6OQPpzVHBc
And another short video with a demonstration: https://www.youtube.com/watch?v=6cM-e10YRdI

**Python Source-Code**

I recently implemented this form of meta-optimization in Python as an exercise for myself to get better at Python programming (it is surprising how many language constructs you get to exercise for such a small project). You can download it here if you want to see the actual source-code and try running it. See the file demo-meta-optimize.py: https://github.com/Hvass-Labs/swarmops

**Meta-Optimization in sklearn?**

I considered proposing to implement LUS for hyper-parameter tuning in sklearn, but I see two major problems with that approach: (1) LUS only works for real-valued parameters and would therefore have to be combined with another heuristic, or grid-search, or random-search for the discrete hyper-parameters. I suppose you could hack it by mapping continuous numbers to discrete parameters, but I'm not sure it would work, as LUS assumes the search-space is numerically ordered and reasonably smooth. (2) LUS typically requires about 5 runs because it has a tendency to become stuck in local optima, and each run requires a number of iterations equal to about 30 x the number of hyper-parameters to be tuned. Because of how the performance measure was designed for meta-optimization, it is possible to abort the computation pre-emptively in many cases, thus making the tuning process reasonably fast, only taking a small fraction of the full computation time. An example is shown in the following image, where all the crosses are hyper-parameters that were contemplated but whose calculation was aborted pre-emptively because the parameters were found to be inferior (note the meta-optimization is doing minimization in this image). I'm not sure this would work when tuning the hyper-parameters of Machine Learning methods, so this method may not be fast enough.

**Bayesian Optimization**

I've looked at some of the other proposals for hyper-parameter tuning in Machine Learning. I like the idea of Bayesian Optimization (BO). As far as I can tell from having just studied it briefly, it is also a heuristic optimizer with some of the strengths and weaknesses of heuristics. However, BO works quite differently than e.g. LUS or PSO, because BO builds an approximation to the search-space in a clever way and improves the approximation with each evaluation of the cost-function. The main drawback seems to be that BO is more complicated to implement (certainly compared to LUS), and it should only be used when the cost-function is very time-consuming to compute, because BO has significant computational overhead of its own. I'm not quite sure if the standard BO variants support a combination of continuous and discrete hyper-parameters, because it is always demonstrated on continuous parameters. Can someone elaborate? A recent paper by Wang et al. (@ziyuw) proposed a BO variant which supports both types of parameters and claims to have other advantages as well. (I also find it amusing that they call their variant REMBO, while LUS means louse in Danish :-) http://arxiv.org/abs/1301.1942

**Please Update Wikipedia**

If some of the experts on Bayesian Optimization should read this, then please consider updating the Wikipedia page so the rest of us can get a quick overview. The page receives more than 60 views per day, so it is certainly worth your time - does any of your academic papers get this many views? :-) https://en.wikipedia.org/wiki/Bayesian_optimization

**Meta-Meta-Optimization**

In my opinion, it makes sense for the research community to look for a good method for doing meta-optimization / hyper-parameter optimization. If we think about it a bit, we see that this process can of course also be automated with meta-meta-optimization, aka hyper-hyper-parameter optimization. But is it worth the trouble to build such a system? This is what I wrote on the subject in my thesis: http://www.hvass-labs.org/people/magnus/thesis/pedersen08thesis.pdf

**Meta-Meta Implementation**

SwarmOps for C# supports meta-meta-optimization with any meta-nesting-depth you may like. You may also be able to get this working in SwarmOps for Python. I can't quite remember if the Java and C versions of SwarmOps also support this. http://www.hvass-labs.org/projects/swarmops/

**Questions?**

I'm sorry to hi-jack your thread with such a long post, but perhaps some of these thoughts might be useful to you when implementing hyper-parameter optimization in sklearn. If you have any questions then please tag the comment with @Hvass-Labs so I see it. I'll look forward to seeing your solution in one of the next versions of sklearn.
What's the status of this PR? It seems to be stalled on the fact that the internal optimization was taking a lot of time, but with LBFGS and early stopping, I was hoping for an improvement in time (I didn't see a bench). Any hope of getting this merged or mergeable?
@GaelVaroquaux I'm not really a part of the effort, and it has been a month since I looked at this, so I don't know if the following suggestion is helpful. If the problem is that the inner loop of the Bayes Optimizer is too slow because it needs too many random samples, or because L-BFGS-B is too slow for some reason, then how about trying a heuristic optimizer for the inner loop of the Bayes Optimizer? LUS and PS in my SwarmOps library were specifically made for semi-smooth search-spaces that had to be searched in few iterations (typically something like 20 x the dimensionality of the search-space). It may be worth a try to plug them into the Bayes Optimizer code you guys have already made - or perhaps try and let them optimize the real-valued hyper-parameters directly without using the Gaussian approximation. Perhaps it'll work. https://github.com/Hvass-Labs/swarmops

I also tried TPOT a while ago. It is a great idea, although it still seemed to be under development. Perhaps the author @rhiever has some insights to share in this thread.
Since this was opened, @MechCoder and @glouppe have made a very nice start on scikit-optimize.
From the TPOT end: currently, genetic programming (also shortened as GP) optimization is likely not better than Bayesian optimization for tuning real-valued hyperparameters. TPOT specializes more in optimizing a sequence of preprocessors, models, etc. and tuning the parameters of those operators at a granular level. You might look into some of the tricks and tips used in auto-sklearn. They use meta-learning to choose a good model/parameters to start with (which may be useful), and their paper reports that Bayesian optimization works quite well in the hyperparameter optimization domain. I don't think it would be possible to take their code and port it right into sklearn (as it's quite heavy on the dependencies), but it could be useful if you gutted it for the optimization process. /cc @mfeurer
I took a quick look at the source-code for scikit-optimize and I like that you can pass a parameter for the type of search. So far it supports either random sampling or L-BFGS-B for searching the Gaussian approximation. This leaves the door open for using other search methods and heuristics, as suggested above. The source-code for skopt looks to be reasonably easy to modify, although both the variable names and comments could perhaps be made a bit more descriptive so it is easier for others to understand and modify (please consider this, @glouppe and @MechCoder). https://github.com/scikit-optimize/scikit-optimize/blob/master/skopt/gp_opt.py

Both TPOT by @rhiever et al. and the work by @mfeurer et al. look to be fascinating contributions to the field. Perhaps something like that can be added to sklearn when they're ready? It would certainly be useful to the end-users if they could automatically discover and tune the entire pipeline without having to import other libraries. I think it would be well worth the effort to consider how it could be integrated directly into sklearn.
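For readers landing here later, the scikit-optimize entry point being discussed looks roughly like this; the exact signature has changed across releases, so treat this as an approximation of the later documented API rather than of the gp_opt.py linked above:

```python
from skopt import gp_minimize

def objective(params):
    # stand-in for e.g. a negated cross-validation score
    x, y = params
    return (x - 2.0) ** 2 + (y + 1.0) ** 2

result = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0), (-5.0, 5.0)],  # one (low, high) bound per parameter
    n_calls=30,
    random_state=0,
)
print(result.x, result.fun)  # best parameters found and their objective value
```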
@Hvass-Labs Thanks for your enthusiasm and comments! Scikit-optimize is still very much in its infancy and much remains to be done before a first usable release, but hopefully we will get there by this summer. In the meantime, PRs are welcome if you feel like helping us :)
Hey all! It's been a long time; what's the status of the PR?

There's scikit-optimize now: https://scikit-optimize.github.io/ I'm in favor of closing this PR and deferring to scikit-optimize. @fabianp ok with you?

Absolutely. Closing.
Based on @sds-dubois's code (PR #5185), this PR adds an API that is more compatible with the GridSearchCV interface, plus some tests (a rough usage sketch is given after the TODO list below).
Things still TODO are:

- … (`test_n_iter_smaller_n_iter`).
- `GPSearchCV().scores_` should have the same structure as `GridSearchCV().grid_scores_`.
- `_sample_candidates` …
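A hypothetical usage sketch of the GPSearchCV described in this PR, mirroring the GridSearchCV interface; the import path, constructor arguments, and attributes are illustrative and may not match the final code:

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.gp_search import GPSearchCV  # module added by this PR (name under discussion)

digits = load_digits()
X, y = digits.data, digits.target

param_space = {
    "C": (1e-3, 1e3),           # a continuous interval
    "kernel": ["rbf", "poly"],  # a discrete list of choices
}

search = GPSearchCV(SVC(), param_space, n_init=10, n_iter=40, cv=3)
search.fit(X, y)
print(search.best_params_)
print(search.scores_)  # intended to mirror GridSearchCV().grid_scores_
```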
I think the module name is not ideal, since in the future we might have other *SearchCV objects (e.g. based on trees). Any suggestions?
CC @Djabbz