-
-
Notifications
You must be signed in to change notification settings - Fork 26k
MAINT Parameters validation for chi2_kernel with gamma #26153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ling parameter gamma
…heck for X and Y as duplicate
…heck cannot be skipped as it also checks for sparse matrix
@@ -242,6 +242,7 @@ def _check_function_param_validation( | |||
"sklearn.tree.export_text", | |||
"sklearn.tree.plot_tree", | |||
"sklearn.utils.gen_batches", | |||
"sklearn.metrics.pairwise.chi2_kernel", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this line in alphabetic order
sklearn/metrics/pairwise.py
Outdated
{ | ||
"X": ["array-like", "sparse matrix"], | ||
"Y": ["array-like", "sparse matrix", None], | ||
"gamma": [Interval(Real, None, None, closed="neither")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also surprised here that it does fail because we don't allow ndarray
which would be required by Gaussian processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, here we would trigger a regression:
In [2]: from sklearn.datasets import make_friedman2
...: from sklearn.gaussian_process import GaussianProcessRegressor
...: from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel, PairwiseKernel
...: X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
...: kernel = DotProduct() + WhiteKernel() + PairwiseKernel(gamma=1.0, metric="chi2")
...: gpr = GaussianProcessRegressor(kernel=kernel,
...: random_state=0).fit(X, y)
InvalidParameterError: The 'gamma' parameter of chi2_kernel must be a float in the range (-inf, inf). Got array([1.]) instead.
So it would be useful to add in test_gpr.py
this minimal example to be sure that we have a test triggering this issue.
Here, it means that we need:
"gamma": [Interval(Real, None, None, closed="neither")], | |
"gamma": [Interval(Real, None, None, closed="neither"), Hidden(np.ndarray)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rand0wn please also add a test with what @glemaitre points out. Also, why don't we want to document gamma
as an array publicly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is an array with unique values. We found out that the only case that this happens is internal to the GP models that provide such entry.
We therefore should still accept this type of entry to avoid regression but we should not document it since we don't support gamma
being several values in an array.
…, added ndarray in gamma for gaussian processes
Added changes for review again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @rand0wn
Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr>
Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr>
Reference Issues/PRs
Towards #24862
What does this implement/fix? Explain your changes.
Added automatic parameter validation for "sklearn.metrics.pairwise.chi2_kernel".
Any other comments?
The validation of "sklearn.metrics.pairwise.additive_chi2_kernel" doesn't include validation of gamma, so even through X and Y are revalidated, gamm 8000 a isn't, additive_chi2_kernel validation can be removed as it may not be needed.