[MRG] Add noise attribute to skopt.GaussianProcessRegressor by MechCoder · Pull Request #225 · scikit-optimize/scikit-optimize · GitHub
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

[MRG] Add noise attribute to skopt.GaussianProcessRegressor #225

Merged · 9 commits into scikit-optimize:master from fix_noise · Sep 26, 2016

Conversation

@MechCoder (Member) commented Sep 16, 2016
  • Adds a noise attribute directly to skopt.learning.GaussianProcessRegressor
  • At prediction time, to predict P(f(x) | y), the kernels K(X_train, X_new) and K(X_new, X_new) should not include the noise (white kernel) term; this is described in Eq. 2.24 of http://www.gaussianprocess.org/gpml/chapters/RW2.pdf . Note that K(X_train, X_new) and K(X_new, X_new) do not have the sigma**2 term (see the equations reproduced below).
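
For reference, the predictive equations from GPML (Eqs. 2.23–2.24): the noise variance \sigma_n^2 enters only through K(X, X), never through the cross- or test-kernel terms.

\bar{f}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} y

\operatorname{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*)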

@MechCoder MechCoder mentioned this pull request Sep 16, 2016
for param, value in self.kernel_.get_params().items():
    # XXX: Should return this only in the case where a
    # WhiteKernel is added.
    if param.endswith('noise_level'):
@MechCoder (Member Author):

This seems a bit hacky to me. What do you think are the other alternatives?

@MechCoder (Member Author) commented Sep 16, 2016

What are your thoughts about decoupling the noise argument from the base_estimator argument (by having a noise=auto argument)? This would make it explicit that a WhiteKernel has been added, whose noise component I can then set to zero while estimating K(X_new, X_new) and K(X_train, X_new).
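
A minimal sketch of the proposed usage (illustrative of the proposal only; the argument name and its accepted values are not final here):

from sklearn.gaussian_process.kernels import Matern
from skopt.learning import GaussianProcessRegressor

# noise="auto" would append a WhiteKernel internally; at predict time its
# noise_level would be temporarily zeroed so that K(X_train, X_new) and
# K(X_new, X_new) exclude the sigma**2 term.
gpr = GaussianProcessRegressor(kernel=Matern(), noise="auto")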

@betatim (Member) commented Sep 16, 2016

Where would you want to introduce the noise= keyword? As an argument to gp_minimize? As a naive user, I think I'd prefer that either I give a kernel that I've designed myself and it gets used, or that skopt uses its default. Having a parameter that then (maybe) modifies the kernel that I passed seems like it would make things more complicated.

Not knowledgeable enough about this whole magic of brewing a kernel :(

cc @glouppe

@glouppe (Member) commented Sep 16, 2016

or simply use the alpha parameter from the original scikit-learn interface? (instead of adding the WhiteKernel component)

@MechCoder (Member Author):

> or simply use the alpha parameter from the original scikit-learn interface?

We can do that, but that assumes I need to know the amount of noise beforehand, right?

The alpha parameter does the right thing, but using a WhiteKernel means I need to set the noise component of the kernel to zero while computing the kernel products here (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L285) and here (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L298). What are some other ways to achieve that?
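
For context, a minimal sketch of the two options under discussion, using plain scikit-learn:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Option 1: alpha adds a fixed, user-supplied value to the diagonal of
# K(X_train, X_train); the noise level must be known beforehand.
gpr_alpha = GaussianProcessRegressor(kernel=RBF(), alpha=0.1)

# Option 2: WhiteKernel's noise_level is estimated by maximizing the
# log-marginal likelihood, but it then also enters the predict-time
# kernels unless explicitly zeroed out.
gpr_white = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())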

@MechCoder MechCoder changed the title [WIP] Add noise attribute to skopt.GaussianProcessRegressor [MRG] Add noise attribute to skopt.GaussianProcessRegressor Sep 18, 2016
@MechCoder (Member Author):

Should be ready for reviews. I would consider this as a bug.

Before this PR (n_calls=50)

[figure: gp_before]

After this PR (n_calls=50)

[figure: gp_after]

# The noise component of this kernel should be set to zero
# while estimating K(X, X_test) and K(X_test, X_test)
# Note that the term K(X, X) should include the noise but
# this (K(X, X))^-1y is precomputed as the attribute alpha.
@glouppe (Member) commented Sep 19, 2016:

What alpha attribute are you referring to?

Member:

Can you add the reference to Eqn. 2.24 in the comment?

@glouppe (Member) left a comment:

I am fine with these changes, but we should be aware that this trick only holds for identically distributed noise, right?

@glouppe (Member) commented Sep 19, 2016

Also, not convinced by the before/after figure. Both look fine to me.

@MechCoder (Member Author):

> but we should be aware this trick only holds for identically distributed noise, right?

Yeah, :/. I had added that point in the initial PR (https://github.com/scikit-optimize/scikit-optimize/blob/master/skopt/optimizer/gp.py#L69)

> Also, not convinced by the before/after figure. Both look fine to me.

Why do you think so? Given a good number of function calls, the mean prediction should approximate the actual function, right? And the first one seems a bit off.

@glouppe (Member) commented Sep 19, 2016

> Why do you think so? Given a good number of function calls, the mean prediction should approximate the actual function, right? And the first one seems a bit off.

This would be true for uniformly distributed samples, but this assumption does not hold in BO. We should only expect the model to be accurate in regions close to candidate optima. In the first figure, the region of high uncertainty is never sampled again because its LCB is larger than in other parts of the input space. This is expected behaviour of the algorithm.
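
For reference, the lower confidence bound (LCB) acquisition picks the next point by minimizing

\mathrm{LCB}(x) = \mu(x) - \kappa\,\sigma(x)

where \mu and \sigma are the GP posterior mean and standard deviation and \kappa trades off exploration against exploitation; a high-uncertainty region is revisited only if its LCB undercuts the rest of the input space.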

@@ -0,0 +1,41 @@
from sklearn.gaussian_process import GaussianProcessRegressor as sk_GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel

Member:

PEP8 police: double blank line, please.

Member Author:

done

@MechCoder (Member Author):

Yes, that's true. Sorry about the "noise"

@MechCoder (Member Author):

But I think the second cluster of points should be closer to the second local minimum; it seems displaced a bit in the first graph. But that might just be me nitpicking.

@MechCoder (Member Author):

I can haz merge?

@codecov-io commented Sep 19, 2016

Current coverage is 82.90% (diff: 96.00%)

Merging #225 into master will increase coverage by 1.06%

@@             master       #225   diff @@
==========================================
  Files            18         20     +2   
  Lines           892        965    +73   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            730        800    +70   
- Misses          162        165     +3   
  Partials          0          0          

Last update 3066a09...cfc9aa2

@MechCoder (Member Author):

anything else?

@betatim (Member) commented Sep 20, 2016

I'm happy, but I don't really understand much of the GP mumbo jumbo, so deferring to @glouppe 😃

    self.kernel_.set_params(noise_level=0.0)
else:
    for param, value in self.kernel_.get_params().items():
        if isinstance(value, WhiteKernel):
Member:

Isn't this only correct when WhiteKernel is a term in a sum kernel? (E.g. in the case where kernel = K*WhiteKernel, I don't believe this trick holds, but your if-statement will still be true.)
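
A minimal sketch of the failure mode being pointed out here, using plain scikit-learn kernels: for a product kernel, get_params() still exposes the WhiteKernel instance, so an isinstance check alone cannot distinguish a Sum from a Product.

from sklearn.gaussian_process.kernels import RBF, WhiteKernel

kernel = RBF() * WhiteKernel()  # noise enters multiplicatively here

# get_params() exposes the WhiteKernel as a sub-kernel ('k2'), so an
# isinstance check fires even though zeroing it is not valid for a Product.
matches = [name for name, value in kernel.get_params().items()
           if isinstance(value, WhiteKernel)]
print(matches)  # ['k2']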

Member Author:

I hadn't realized that operations like + return Sum kernel objects, so I pushed a workaround for that and added a test.

for param, value in self.kernel_.get_params().items():
    if isinstance(value, WhiteKernel):
        self.kernel_.set_params(
            **{param: WhiteKernel(noise_level=0.0)})
Member:

Sorry for nitpicking again, but this still does not completely solve the issue. If you have kernel = K*WhiteKernel + WhiteKernel, you would change both of them.

The cleanest solution might be to instantiate the kernel yourself in this new class, adding in the WhiteKernel term to the user-provided kernel, and then keeping a reference to this term.

Member Author:

> adding in the WhiteKernel term to the user-provided kernel,

This goes against the sklearn principle of not putting logic in __init__, right?

Member:

Not if you have something like:

def __init__(self, ..., kernel, ...):
    self.kernel = kernel

def fit(self, X, y):
    self.noise_ = WhiteKernel()
    self.kernel_ = self.kernel + self.noise_
    ...
    self.gp_ = sklearn.GaussianProcessRegressor(self.kernel_).fit(X, y)
    ...
    self.noise_.set_params(noise_level=0.0)
    ...

@MechCoder (Member Author) commented Sep 23, 2016:

But you wouldn't always want to have the "noise" option on, no? Which is why I suggested the noise="auto" option, to decide whether we want to add the WhiteKernel or not.

Member Author:

noise="auto" -> noise="gaussian" (to be more accurate)

Member Author:

fixing this now

@MechCoder (Member Author) commented Sep 24, 2016

@glouppe Your solution will not work because it is not the provided kernel that is modified at fit time, but rather a clone of it (so the noise_level of self.noise_ would remain at its default value). I have pushed another fix (with tests). Let me know what you think.
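
A minimal sketch of the cloning behaviour in question, on toy data (scikit-learn's fit() optimizes a clone of the passed kernel):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.rand(10, 1)
y = np.sin(3 * X).ravel()

noise = WhiteKernel()
gpr = GaussianProcessRegressor(kernel=RBF() + noise).fit(X, y)

# fit() stores an optimized *clone* of the kernel as gpr.kernel_, so
# mutating the original `noise` object has no effect on predictions:
noise.set_params(noise_level=0.0)
print(gpr.kernel_.k2.noise_level)  # still the fitted value, not 0.0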

def __init__(self, kernel=None, alpha=1e-10,
             optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
             normalize_y=False, copy_X_train=True, random_state=None,
             noise=None):
Member:

So by default, noise=None, which amounts to reusing scikit-learn's GaussianProcessRegressor. Sorry to be blunt, but in the end, what is the added advantage of this PR? Simply providing a shortcut for adding a WhiteKernel component, i.e. noise="gaussian" versus kernel=K+WhiteKernel()?

Member:

... oh no, forget that comment. It also deals with disabling that WhiteKernel part when making predictions. Accordingly, it might be good to add a test highlighting the difference between skopt.GP(kernel=K, noise="gaussian").predict and sklearn.GP(kernel=K+WhiteKernel()).predict.
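
A rough sketch of such a test, assuming the noise="gaussian" behaviour introduced in this PR; since the skopt variant zeroes the WhiteKernel at predict time, its predictive standard deviation should be no larger than scikit-learn's, provided both models converge to the same hyperparameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from skopt.learning import GaussianProcessRegressor as skopt_GPR

rng = np.random.RandomState(0)
X = rng.rand(20, 1)
y = np.sin(3 * X).ravel() + 0.1 * rng.randn(20)

_, std_sklearn = GaussianProcessRegressor(
    kernel=RBF() + WhiteKernel()).fit(X, y).predict(X, return_std=True)
_, std_skopt = skopt_GPR(
    kernel=RBF(), noise="gaussian").fit(X, y).predict(X, return_std=True)

# The skopt std reflects uncertainty in f alone, without the noise term.
assert np.all(std_skopt <= std_sklearn + 1e-8)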

@glouppe (Member) commented Sep 26, 2016

LGTM!

@MechCoder (Member Author):

@glouppe Anything else?

@MechCoder (Member Author):

Phew, this took more time than I had expected.

@MechCoder MechCoder merged commit 31b7c84 into scikit-optimize:master Sep 26, 2016
@MechCoder MechCoder deleted the fix_noise branch September 26, 2016 15:43
@MechCoder (Member Author):

Squashed and merged.

@betatim (Member) commented Sep 26, 2016

:shipit: good job!

MechCoder added a commit that referenced this pull request Sep 27, 2016