-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Allow multiple input feature with the same value for gaussian processes #3163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple input feature with the same value for gaussian processes #3163
Conversation
To consider merging this, we need to have a test added. |
I added a unit test for nosiy data and used here duplicate input feature. I also changed the code slightly so it will only allow duplicate input feature for noisy data (unit test added for this as well). |
for corr in all_corr: | ||
test_1d(regr='constant', corr=corr, random_start=random_start) | ||
test_2d(regr='constant', corr=corr, random_start=random_start) | ||
test_2d_2d(regr='constant', corr=corr, random_start=random_start) | ||
check2d(X, regr='constant', corr=corr, random_start=random_start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be appropriate to use yield
to run check2d
and check2d_2d
@filthysocks, one reason this PR hasn't gained much attention is that I don't think "multiple features" could possibly be what you mean. Of course GP allows multiple predictors in X (i.e. X.shape[1] > 1). By "multiple features" do you mean repeated samples? repeated features? It looks like the former, but I'm not familiar with the GP implementation, so it's hard to know.. |
"multiple input feature with the same value" so with this i mean repeated samples. So if i would try to estimate the function sin(x)*x with a GP (as in the test), currently all my samples for x would need to be different (except I choose the cov. function 'pure_nugget'). However, there is no need for this limitation. |
@@ -525,6 +525,15 @@ def predict(self, X, eval_MSE=False, batch_size=None): | |||
|
|||
return y | |||
|
|||
def _prohibitMultipleInputFeature(self, D): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @jnothman said this name is confusing. Can you please give an example of invalid input? Also function names are lower case in Python. A better name would be _has_repeated_samples(self, pairwise_distances)
.
Thanks for your feedback, it's very much appreciated.
There is a unit test that triggers a value error @raises(ValueError)
def test_no_multiple_feature():
'''
check that multiple features are not allowed for non-noisy data
'''
X = np.array([[-4.61611719, -6.00099547],
[4.10469096, 5.32782448],
[0.00000000, -0.50000000],
[-6.17289014, -4.6984743],
[-6.17289014, -4.6984743],
[1.3109306, -6.93271427],
[-5.03823144, 3.10584743],
[-2.87600388, 6.74310541],
[5.21301203, 4.26386883]])
check2d(X)
For sure, but how useful is a regression model that does not allow for repeated samples? That's kind of the point of regression models. |
This looks reasonable to me, and doesn't appear to break the current API since our default is (apparently) to already add diagonal loading (I really hate the term nugget). However, I still don't understand why it was broken in the first place if the loading was being added - can you clarify this @filthysocks? |
Well, there is just an if condition in there saying 'you are not allowed to have repeated samples even if you use a nugget'. This is reasonable if you assume no noise (except for machine imprecisions). It would be allowed if you use a specific kernel (self.corr != correlation.pure_nugget) ... However, there is just no need to limit it to just this one. So it is not really broken in some sense. It is just setting a limitation that is just unnecessary. |
Closing. We now have a new implementation of gaussian processes (see #4270). Thanks for you contribution. |
Fix for issue #3162. Allows to have multiple input feature with the same values for Gaussian Processes