-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
ENH Allow 0 < p < 1 for Minkowski distance for algorithm="brute"
in NeighborsBase
#24750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Allow 0 < p < 1 for Minkowski distance for algorithm="brute"
in NeighborsBase
#24750
Conversation
…_calculation_exception
@jjerphan I've added the basic functionality, let me know what else changes do I need to do... |
algo="brute"
in NeighborsBase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for opening this PR, @RudreshVeerkhare.
You are on a good start!
I adapted the title to match the convention used within scikit-learn and turned your PR as a Draft (now preferred over prefixing PRs with [WIP]
).
Here are some guidelines about what is to be tested for this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding tests, @RudreshVeerkhare.
Here are some more comments for precision. When all are addressed and a whats_new
entry is added for 1.2 changelog, I think this PR will be mergeable.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan I've incorporated your suggested changes and also added the changelog to v1.2, let me know your suggestions on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few last remarks before this LGTM.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan modified the files as per latest suggestions... |
Thank you. We need another maintainer's approval before merging it — I labeled the PR accordingly. |
y = np.ones(10) | ||
|
||
model = Estimator(p=0) | ||
with pytest.raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should match the error message using the match
keyword in pytest.raises
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre error message for Exception raised by _validate_params
is dynamic, so I'm not sure how to handle it🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if this is raised because of _valid_params
because p=0
is not in the range, you don't need to test it because it will be tested by the common test for parameter validation. So you can remove this case from your test.
model = Estimator(algorithm="kd_tree", p=0.1) | ||
msg = ( | ||
'algo="kd_tree" does not support 0 < p < 1 for ' | ||
"the Minkowski metric. To resolve this problem either " | ||
'set p >= 1 or algo="brute".' | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
model.fit(X, y) | ||
|
||
model = Estimator(algorithm="ball_tree", p=0.1) | ||
msg = ( | ||
'algo="ball_tree" does not support 0 < p < 1 for ' | ||
"the Minkowski metric. To resolve this problem either " | ||
'set p >= 1 or algo="brute".' | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
model.fit(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should make a separate test. It will allow to parameterized it and to have less redundant code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre can you elaborate more on this?🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that:
@pytest.mark.parametrize(
"Estimator",
[
neighbors.KNeighborsClassifier,
neighbors.RadiusNeighborsClassifier,
neighbors.KNeighborsRegressor,
neighbors.RadiusNeighborsRegressor,
],
)
@pytest.mark.parametrize("algorithm", ["kd_tree", "ball_tree"])
def test_neighbors_minkowski_algo_error(Estimator, algorithm):
"""Check that we raise a proper error if `algorithm!='brute'` and `p<1`."""
X = rng.random_sample((10, 2))
y = np.ones(10)
model = Estimator(algorithm=algorithm, p=0.1)
msg = (
f'algorithm="{algorithm}" does not support 0 < p < 1 for '
"the Minkowski metric. To resolve this problem either "
'set p >= 1 or algorithm="brute".'
)
with pytest.raises(ValueError, match=msg):
model.fit(X, y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the idea is to do something similar for code that is repeated in the test.
model = Estimator(p=0.1, algorithm="auto") | ||
msg = ( | ||
"Mind that for 0 < p < 1, Minkowski metrics are not distance metric. Continuing" | ||
" the execution with `algo='brute'`." | ||
) | ||
with pytest.warns(UserWarning, match=msg): | ||
model.fit(X, y) | ||
|
||
assert model._fit_method == "brute" | ||
|
||
model = Estimator(p=0.1, algorithm="brute") | ||
msg = ( | ||
"Mind that for 0 < p < 1, Minkowski metrics are not distance metric. Continuing" | ||
" the execution with `algo='brute'`." | ||
) | ||
with pytest.warns(UserWarning, match=msg): | ||
model.fit(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark regarding the parametrization
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of more suggestions.
model = Estimator(algorithm="kd_tree", p=0.1) | ||
msg = ( | ||
'algo="kd_tree" does not support 0 < p < 1 for ' | ||
"the Minkowski metric. To resolve this problem either " | ||
'set p >= 1 or algo="brute".' | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
model.fit(X, y) | ||
|
||
model = Estimator(algorithm="ball_tree", p=0.1) | ||
msg = ( | ||
'algo="ball_tree" does not support 0 < p < 1 for ' | ||
"the Minkowski metric. To resolve this problem either " | ||
'set p >= 1 or algo="brute".' | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
model.fit(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that:
@pytest.mark.parametrize(
"Estimator",
[
neighbors.KNeighborsClassifier,
neighbors.RadiusNeighborsClassifier,
neighbors.KNeighborsRegressor,
neighbors.RadiusNeighborsRegressor,
],
)
@pytest.mark.parametrize("algorithm", ["kd_tree", "ball_tree"])
def test_neighbors_minkowski_algo_error(Estimator, algorithm):
"""Check that we raise a proper error if `algorithm!='brute'` and `p<1`."""
X = rng.random_sample((10, 2))
y = np.ones(10)
model = Estimator(algorithm=algorithm, p=0.1)
msg = (
f'algorithm="{algorithm}" does not support 0 < p < 1 for '
"the Minkowski metric. To resolve this problem either "
'set p >= 1 or algorithm="brute".'
)
with pytest.raises(ValueError, match=msg):
model.fit(X, y)
y = np.ones(10) | ||
|
||
model = Estimator(p=0) | ||
with pytest.raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if this is raised because of _valid_params
because p=0
is not in the range, you don't need to test it because it will be tested by the common test for parameter validation. So you can remove this case from your test.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre I've incorporated your review 😄, let me know if anything else is needed... |
algo="brute"
in NeighborsBase
algorithm="brute"
in NeighborsBase
I just fixed two small details in the docstring. Otherwise LGTM. |
Thanks @RudreshVeerkhare Merging. |
Reference Issues/PRs
Fixes #22811
What does this implement/fix? Explain your changes.
_parameter_constraints
forNeighborsBase
to allow p < 1 for Minkowski calculationalgorithm="auto"
to assign_fit_method="brute
if0 < p < 1
for Minkowski metic.Any other comments?
If anything extra needed to be done please let me know.