[MRG] Add noise attribute to skopt.GaussianProcessRegressor #225
Conversation
for param, value in self.kernel_.get_params().items():
    # XXX: Should return this only in the case where a
    # WhiteKernel is added.
    if param.endswith('noise_level'):
This seems a bit hacky to me. What do you think the other alternatives are?
What are your thoughts about decoupling the noise argument from the …
Where would you want to introduce the …
Not knowledgeable enough about this whole magic of brewing a kernel :( cc @glouppe
or simply use the …
We can do that, but that assumes I need to know the amount of noise beforehand, right? The …
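For illustration of why the endswith('noise_level') check above is fragile, here is a sketch using sklearn's composite-kernel API, where parameter names are just prefixed attribute paths (the exact keys depend on how the kernel is nested):

from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# A sum kernel exposes its WhiteKernel's noise as 'k2__noise_level' ...
kernel_sum = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
print(sorted(kernel_sum.get_params().keys()))

# ... but a product kernel also exposes a '...__noise_level' key, so an
# endswith() test cannot distinguish K + WhiteKernel() from
# K * WhiteKernel(), where zeroing the noise term is not valid.
kernel_prod = RBF(length_scale=1.0) * WhiteKernel(noise_level=0.1)
print(sorted(kernel_prod.get_params().keys()))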
Force-pushed from ccb8458 to fdc5b19
# The noise component of this kernel should be set to zero
# while estimating K(X, X_test) and K(X_test, X_test).
# Note that the term K(X, X) should include the noise, but
# this (K(X, X))^-1 y is precomputed as the attribute alpha.
What alpha attribute are you referring to?
Can you add the reference to Eqn. 2.24 in the comment?
Sorry, I meant this alpha_ (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L229)
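For reference, a sketch of the predictive equations being discussed, in the notation of Rasmussen and Williams (Eqs. 2.23 and 2.24, with sigma_n^2 the i.i.d. noise variance):

\bar{f}_* = K(X_*, X) [K(X, X) + \sigma_n^2 I]^{-1} y
\mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X) [K(X, X) + \sigma_n^2 I]^{-1} K(X, X_*)

The noise enters only through the inverted Gram matrix, and [K(X, X) + \sigma_n^2 I]^{-1} y is exactly what sklearn caches as alpha_, so the cross terms K(X_*, X) and K(X_*, X_*) can be computed with the noise component switched off.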
I am fine with these changes, but we should be aware that this trick only holds for identically distributed noise, right?
Also, I am not convinced by the before/after figure. Both look fine to me.
Yeah, :/. I had added that point in the initial PR (https://github.com/scikit-optimize/scikit-optimize/blob/master/skopt/optimizer/gp.py#L69)
Why do you think so? Given a good number of function calls, the mean prediction should approximate the actual function, right? And the first one seems a bit off.
This would be true for uniformly distributed samples, but this assumption does not hold in BO. We should only expect to be accurate in regions close to candidate optima. In the first figure, the region of high uncertainty is never sampled again because its LCB is larger than in other parts of the input space. This is expected behaviour of the algorithm.
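To make the acquisition argument concrete, a sketch of the lower confidence bound used when minimizing (kappa controls exploration; mu and sigma are the GP posterior mean and standard deviation):

\mathrm{LCB}(x) = \mu(x) - \kappa \, \sigma(x), \qquad x_{\mathrm{next}} = \arg\min_x \mathrm{LCB}(x)

A region whose LCB stays above the LCB elsewhere is never proposed again, even if its posterior uncertainty remains high.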
@@ -0,0 +1,41 @@
from sklearn.gaussian_process import GaussianProcessRegressor as sk_GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel
PEP8 police: double blank please
done
Yes, that's true. Sorry about the "noise"
But I think the second cluster of points should be closer to the second local minimum; it seems displaced by a bit in the first graph. But that might just be me nitpicking.
I can haz merge?
Current coverage is 82.90% (diff: 96.00%)

@@            master    #225   diff @@
======================================
  Files           18      20     +2
  Lines          892     965    +73
  Methods          0       0
  Messages         0       0
  Branches         0       0
======================================
+ Hits           730     800    +70
- Misses         162     165     +3
  Partials         0       0
Anything else?
I'm happy, but I don't really understand much of the GP mumbo jumbo, so deferring to @glouppe 😃
    self.kernel_.set_params(noise_level=0.0)
else:
    for param, value in self.kernel_.get_params().items():
        if isinstance(value, WhiteKernel):
Isn't this only correct when WhiteKernel is a term in a sum kernel? E.g. in the case where kernel = K * WhiteKernel, I don't believe this trick holds, but your if-statement will still be true.
I never saw that operations like + are inherited from Sum, so I pushed a workaround for that and added a test.
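A sketch of a structural check along these lines, assuming sklearn's Sum kernel type (the helper name white_kernel_is_summand is hypothetical): it returns True only when a WhiteKernel appears as a direct summand, so kernel = K * WhiteKernel() is rejected:

from sklearn.gaussian_process.kernels import RBF, Sum, WhiteKernel

def white_kernel_is_summand(kernel):
    # A bare WhiteKernel, or one reachable through Sum nodes only.
    if isinstance(kernel, WhiteKernel):
        return True
    if isinstance(kernel, Sum):
        return (white_kernel_is_summand(kernel.k1)
                or white_kernel_is_summand(kernel.k2))
    return False

assert white_kernel_is_summand(RBF() + WhiteKernel())
assert not white_kernel_is_summand(RBF() * WhiteKernel())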
for param, value in self.kernel_.get_params().items():
    if isinstance(value, WhiteKernel):
        self.kernel_.set_params(
            **{param: WhiteKernel(noise_level=0.0)})
Sorry for nitpicking again, but this still does not completely solve the issue. If you have kernel = K * WhiteKernel + WhiteKernel, you would change both of them.
The cleanest solution might be to instantiate the kernel yourself in this new class, adding the WhiteKernel term to the user-provided kernel, and then keep a reference to this term.
"adding in the WhiteKernel term to the user-provided kernel"
This goes against the sklearn principle of not setting logic at init time, right?
Not if you have something like:
def __init__(self, ..., kernel, ...):
    self.kernel = kernel

def fit(self, X, y):
    self.noise_ = WhiteKernel()
    self.kernel_ = self.kernel + self.noise_
    ...
    self.gp_ = sklearn.GaussianProcessRegressor(self.kernel_).fit(X, y)
    ...
    self.noise_.set_params(noise_level=0.0)
    ...
But you wouldn't always want to set the "noise" option on, no? Which is why I suggested the noise="auto" option, to figure out whether we want to add the WhiteKernel or not.
noise="auto"
-> noise="gaussian"
(to be more accurate)
fixing this now
Force-pushed from ad82d8b to c6ffe6c
@glouppe Your solution will not work because it is not the provided kernel that is modified at fit time, but rather a clone of it. (So the …)
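A minimal check of the clone behaviour being referred to, assuming sklearn's GaussianProcessRegressor (which clones self.kernel into self.kernel_ inside fit):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(3.0 * X).ravel()

noise = WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=RBF() + noise).fit(X, y)

# fit() worked on a clone, so the fitted kernel holds a *copy* of
# `noise`; mutating the original does not affect gp.kernel_.
noise.set_params(noise_level=0.0)
assert gp.kernel_.k2 is not noise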
def __init__(self, kernel=None, alpha=1e-10,
             optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
             normalize_y=False, copy_X_train=True, random_state=None,
             noise=None):
So by default noise=None, which amounts to reusing scikit-learn's GaussianProcessRegressor. Sorry to be blunt, but in the end, what is the added advantage of this PR? Simply providing a shortcut for adding a WhiteKernel component, i.e. noise="gaussian" versus kernel=K + WhiteKernel()?
... oh no, forget that comment. It also deals with disabling that WhiteKernel part when making predictions. Accordingly, it might be good to add a test highlighting the difference between skopt.GP(kernel=K, noise="gaussian").predict and sklearn.GP(kernel=K + WhiteKernel()).predict.
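A sketch of such a test, assuming the noise="gaussian" API introduced in this PR: the means should agree, while the skopt predictive standard deviation should be no larger, since the white-kernel variance is dropped from K(X_test, X_test):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from skopt.learning import GaussianProcessRegressor as skopt_GPR

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * X).ravel() + 0.1 * rng.randn(20)

skopt_gp = skopt_GPR(kernel=RBF(), noise="gaussian").fit(X, y)
sk_gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
mean_skopt, std_skopt = skopt_gp.predict(X_test, return_std=True)
mean_sk, std_sk = sk_gp.predict(X_test, return_std=True)

# Only the predictive uncertainty differs: the noise term is excluded
# from K(X_test, X_test) in the skopt version.
assert np.allclose(mean_skopt, mean_sk)
assert np.all(std_skopt <= std_sk + 1e-12)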
LGTM!
@glouppe Anything else?
Phew, this took more time than I had expected.
Squashed and merged.
skopt.learning.GaussianProcessRegressor

When computing P(f(x) | y), the kernels K(X_train, X_new) and K(X_new, X_new) should not include the noise (white kernel) term; this is described in Eq. 2.24 of http://www.gaussianprocess.org/gpml/chapters/RW2.pdf. Note that K(X_train, X_new) and K(X_new, X_new) do not have the sigma**2 term.