[MRG] Add noise attribute to skopt.GaussianProcessRegressor by MechCoder · Pull Request #225 · scikit-optimize/scikit-optimize · GitHub
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

[MRG] Add noise attribute to skopt.GaussianProcessRegressor #225

Merged · 9 commits into scikit-optimize:master from fix_noise · Sep 26, 2016

Conversation

@MechCoder (Member) commented Sep 16, 2016
  • Adds a noise attribute directly to skopt.learning.GaussianProcessRegressor
  • At prediction time, to predict P(f(x) | y), the kernels K(X_train, X_new) and K(X_new, X_new) should not include the noise (white kernel) term; this is described in Eq. 2.24 of http://www.gaussianprocess.org/gpml/chapters/RW2.pdf . Note that K(X_train, X_new) and K(X_new, X_new) do not have the sigma**2 term (see the equations reproduced below).
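
For reference, the predictive equations from GPML (Eqs. 2.23–2.24): the noise variance \sigma_n^2 enters only through K(X, X), never through the cross- or test-kernel terms.

\bar{f}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} y

\operatorname{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*)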

@MechCoder MechCoder mentioned this pull request Sep 16, 2016
for param, value in self.kernel_.get_params().items():
    # XXX: Should return this only in the case where a
    # WhiteKernel is added.
    if param.endswith('noise_level'):
@MechCoder (Member Author):

This seems a bit hacky to me. What do you think are the other alternatives?

@MechCoder (Member Author) commented Sep 16, 2016

What are your thoughts about decoupling the noise argument from the base_estimator argument (by having a noise=auto argument)? This would make it explicit that a WhiteKernel has been added, whose noise component I can then set to zero while estimating K(X_new, X_new) and K(X_train, X_new).
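
A minimal sketch of the proposed usage (illustrative of the proposal only; the argument name and its accepted values are not final here):

from sklearn.gaussian_process.kernels import Matern
from skopt.learning import GaussianProcessRegressor

# noise="auto" would append a WhiteKernel internally; at predict time its
# noise_level would be temporarily zeroed so that K(X_train, X_new) and
# K(X_new, X_new) exclude the sigma**2 term.
gpr = GaussianProcessRegressor(kernel=Matern(), noise="auto")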

@betatim (Member) commented Sep 16, 2016

Where would you want to introduce the noise= keyword? As an argument to gp_minimize? As a naive user, I think I'd prefer that either I give a kernel that I've designed myself and it gets used, or that skopt uses its default. Having a parameter that then (maybe) modifies the kernel that I passed seems like it would make things more complicated.

Not knowledgeable enough about this whole magic of brewing a kernel :(

cc @glouppe

@glouppe (Member) commented Sep 16, 2016

or simply use the alpha parameter from the original scikit-learn interface? (instead of adding the WhiteKernel component)

@MechCoder (Member Author):

> or simply use the alpha parameter from the original scikit-learn interface?

We can do that, but that assumes I need to know the amount of noise beforehand, right?

The alpha parameter does the right thing, but using a WhiteKernel means I need to set the noise component of the kernel to zero while computing the kernel products here (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L285) and here (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L298). What are some other ways to achieve that?
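
For context, a minimal sketch of the two options under discussion, using plain scikit-learn:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Option 1: alpha adds a fixed, user-supplied value to the diagonal of
# K(X_train, X_train); the noise level must be known beforehand.
gpr_alpha = GaussianProcessRegressor(kernel=RBF(), alpha=0.1)

# Option 2: WhiteKernel's noise_level is estimated by maximizing the
# log-marginal likelihood, but it then also enters the predict-time
# kernels unless explicitly zeroed out.
gpr_white = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())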

@MechCoder MechCoder changed the title [WIP] Add noise attribute to skopt.GaussianProcessRegressor [MRG] Add noise attribute to skopt.GaussianProcessRegressor Sep 18, 2016
@MechCoder (Member Author):

Should be ready for reviews. I would consider this as a bug.

Before this PR (n_calls=50)

[figure: gp_before]

After this PR (n_calls=50)

[figure: gp_after]

# The noise component of this kernel should be set to zero
# while estimating K(X, X_test) and K(X_test, X_test)
# Note that the term K(X, X) should include the noise but
# this (K(X, X))^-1y is precomputed as the attribute alpha.
@glouppe (Member) commented Sep 19, 2016:

What alpha attribute are you referring to?

Member:

Can you add the reference to Eqn. 2.24 in the comment?

@glouppe (Member) left a comment:

I am fine with these changes, but we should be aware that this trick only holds for identically distributed noise, right?

@glouppe (Member) commented Sep 19, 2016

Also, not convinced by the before/after figure. Both look fine to me.

@MechCoder (Member Author):

> but we should be aware this trick only holds for identically distributed noise, right?

Yeah, :/. I had added that point in the initial PR (https://github.com/scikit-optimize/scikit-optimize/blob/master/skopt/optimizer/gp.py#L69)

> Also, not convinced by the before/after figure. Both look fine to me.

Why do you think so? Given a good number of function calls, the mean prediction should approximate the actual function, right? And the first one seems a bit off.

@glouppe (Member) commented Sep 19, 2016

> Why do you think so? Given a good number of function calls, the mean prediction should approximate the actual function, right? And the first one seems a bit off.

This would be true for uniformly distributed samples, but this assumption does not hold in BO. We should only expect the model to be accurate in regions close to candidate optima. In the first figure, the region of high uncertainty is never sampled again because its LCB is larger than in other parts of the input space. This is expected behaviour of the algorithm.
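
For reference, the lower confidence bound (LCB) acquisition picks the next point by minimizing

\mathrm{LCB}(x) = \mu(x) - \kappa\,\sigma(x)

where \mu and \sigma are the GP posterior mean and standard deviation and \kappa trades off exploration against exploitation; a high-uncertainty region is revisited only if its LCB undercuts the rest of the input space.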

@@ -0,0 +1,41 @@
from sklearn.gaussian_process import GaussianProcessRegressor as sk_GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel

Member:

PEP8 police: double blank line, please.

Member Author:

done

@MechCoder (Member Author):

Yes, that's true. Sorry about the "noise"

@MechCoder (Member Author):

But I think the second cluster of points should be closer to the second local minimum; it seems displaced a bit in the first graph. But that might just be me nitpicking.

@MechCoder (Member Author):

I can haz merge?

@codecov-io commented Sep 19, 2016

Current coverage is 82.90% (diff: 96.00%)

Merging #225 into master will increase coverage by 1.06%

@@             master       #225   diff @@
==========================================
  Files            18         20     +2   
  Lines           892        965    +73   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            730        800    +70   
- Misses          162        165     +3   
  Partials          0          0          

Last update 3066a09...cfc9aa2

@MechCoder (Member Author):

anything else?

@betatim (Member) commented Sep 20, 2016

I'm happy, but I don't really understand much of the GP mumbo jumbo, so deferring to @glouppe 😃

    self.kernel_.set_params(noise_level=0.0)
else:
    for param, value in self.kernel_.get_params().items():
        if isinstance(value, WhiteKernel):
Member:

Isn't this only correct when WhiteKernel is a term in a sum kernel? (E.g. in the case where kernel = K*WhiteKernel, I don't believe this trick holds, but your if-statement will still be true.)
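
A minimal sketch of the failure mode being pointed out here, using plain scikit-learn kernels: for a product kernel, get_params() still exposes the WhiteKernel instance, so an isinstance check alone cannot distinguish a Sum from a Product.

from sklearn.gaussian_process.kernels import RBF, WhiteKernel

kernel = RBF() * WhiteKernel()  # noise enters multiplicatively here

# get_params() exposes the WhiteKernel as a sub-kernel ('k2'), so an
# isinstance check fires even though zeroing it is not valid for a Product.
matches = [name for name, value in kernel.get_params().items()
           if isinstance(value, WhiteKernel)]
print(matches)  # ['k2']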

Member Author:

I hadn't realized that operations like + return Sum kernel objects, so I pushed a workaround for that and added a test.

for param, value in self.kernel_.get_params().items():
    if isinstance(value, WhiteKernel):
        self.kernel_.set_params(
            **{param: WhiteKernel(noise_level=0.0)})
Member:

Sorry for nitpicking again, but this still does not completely solve the issue. If you have kernel = K*WhiteKernel + WhiteKernel, you would change both of them.

The cleanest solution might be to instantiate the kernel yourself in this new class, adding in the WhiteKernel term to the user-provided kernel, and then keeping a reference to this term.

Member Author:

> adding in the WhiteKernel term to the user-provided kernel,

This goes against the sklearn principle of not putting logic in __init__, right?

Member:

Not if you have something like:

def __init__(self, ..., kernel, ...):
    self.kernel = kernel

def fit(self, X, y):
    self.noise_ = WhiteKernel()
    self.kernel_ = self.kernel + self.noise_
    ...
    self.gp_ = sklearn.GaussianProcessRegressor(self.kernel_).fit(X, y)
    ...
    self.noise_.set_params(noise_level=0.0)
    ...

@MechCoder (Member Author) commented Sep 23, 2016:

But you wouldn't always want to have the "noise" option on, no? Which is why I suggested the noise="auto" option, to decide whether we want to add the WhiteKernel or not.

Member Author:

noise="auto" -> noise="gaussian" (to be more accurate)

Member Author:

fixing this now

@MechCoder (Member Author) commented Sep 24, 2016

@glouppe Your solution will not work because it is not the provided kernel that is modified at fit time, but rather a clone of it (so the noise_level of self.noise_ would remain at its default value). I have pushed another fix (with tests). Let me know what you think.
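
A minimal sketch of the cloning behaviour in question, on toy data (scikit-learn's fit() optimizes a clone of the passed kernel):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.rand(10, 1)
y = np.sin(3 * X).ravel()

noise = WhiteKernel()
gpr = GaussianProcessRegressor(kernel=RBF() + noise).fit(X, y)

# fit() stores an optimized *clone* of the kernel as gpr.kernel_, so
# mutating the original `noise` object has no effect on predictions:
noise.set_params(noise_level=0.0)
print(gpr.kernel_.k2.noise_level)  # still the fitted value, not 0.0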

def __init__(self, kernel=None, alpha=1e-10,
             optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
             normalize_y=False, copy_X_train=True, random_state=None,
             noise=None):
Member:

So by default, noise=None, which amounts to reusing scikit-learn's GaussianProcessRegressor. Sorry to be blunt, but in the end, what is the added advantage of this PR? Simply providing a shortcut for adding a WhiteKernel component, i.e. noise="gaussian" versus kernel=K+WhiteKernel()?

Member:

... oh no, forget that comment. It also deals with disabling that WhiteKernel part when making predictions. Accordingly, it might be good to add a test highlighting the difference between skopt.GP(kernel=K, noise="gaussian").predict and sklearn.GP(kernel=K+WhiteKernel()).predict.
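
A rough sketch of such a test, assuming the noise="gaussian" behaviour introduced in this PR; since the skopt variant zeroes the WhiteKernel at predict time, its predictive standard deviation should be no larger than scikit-learn's, provided both models converge to the same hyperparameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from skopt.learning import GaussianProcessRegressor as skopt_GPR

rng = np.random.RandomState(0)
X = rng.rand(20, 1)
y = np.sin(3 * X).ravel() + 0.1 * rng.randn(20)

_, std_sklearn = GaussianProcessRegressor(
    kernel=RBF() + WhiteKernel()).fit(X, y).predict(X, return_std=True)
_, std_skopt = skopt_GPR(
    kernel=RBF(), noise="gaussian").fit(X, y).predict(X, return_std=True)

# The skopt std reflects uncertainty in f alone, without the noise term.
assert np.all(std_skopt <= std_sklearn + 1e-8)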

@glouppe (Member) commented Sep 26, 2016

LGTM!

@MechCoder (Member Author):

@glouppe Anything else?

@MechCoder (Member Author):

Phew, this took more time than I had expected.

@MechCoder MechCoder merged commit 31b7c84 into scikit-optimize:master Sep 26, 2016
@MechCoder MechCoder deleted the fix_noise branch September 26, 2016 15:43
@MechCoder (Member Author):

Squashed and merged.

@betatim (Member) commented Sep 26, 2016

:shipit: good job!

MechCoder added a commit that referenced this pull request Sep 27, 2016