-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
API Deprecate metrics other than euclidean and manhattan for NearestCentroid #24083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Deprecate metrics other than euclidean and manhattan for NearestCentroid #24083
Conversation
For info, this was introduced in #3772. IMO it is ok to depreciate them, but this is open to discussion. |
Thank you for your PR. Your points are valid to me: theoretically, the current implementation is false for other distance metrics than euclidean an Manhattan. Moreover, like you, I do not see why would people choose to use other distance metrics even if the implementations were valid for them. I think the general support of distances in scikit-learn made all distance metrics usable in most of estimators exposing a |
Makes sense :) I'll be happy to work on this later on if deprecation is the chosen way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Valentin-Laurent. Looking at the discussions in the linked issue and in the PR, it seems that everyone involved is in favor of the deprecation.
Are you still interested/available in continuing this PR ? If so, here are some comments. In particular, it should now target 1.3. Can you also add an entry in the what's new ?
Hello @jeremiedbb, |
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Quick update: been quite busy lately, but this is still on my radar :) |
@Valentin-Laurent I directly pushed the remaining changes since we'd like to release 1.3 soon-ish and I'd like to have this deprecation in 1.3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM up to a few minor comments.
Thank you, @Valentin-Laurent.
): | ||
warnings.warn( | ||
( | ||
"Support for metrics other than euclidean and " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Distance metrics and metrics are two different concepts. I think we need to be clear despite the potentially misleading name of this parameter.
"Support for metrics other than euclidean and " | |
"Support for other distance metrics than euclidean and " |
clf = NearestCentroid(metric=metric) | ||
with pytest.warns( | ||
FutureWarning, | ||
match="Support for metrics other than euclidean and manhattan", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Companion suggestion.
match="Support for metrics other than euclidean and manhattan", | |
match="Support for other distance metrics than euclidean and manhattan", |
sklearn/neighbors/tests/test_nearest_centroid.py
Outdated
8000
for metric in {"manhattan", "euclidean"}: | ||
clf = NearestCentroid(metric=metric) | ||
with warnings.catch_warnings(): | ||
warnings.simplefilter("error") | ||
clf.fit(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually do not test absence of warnings.
Removing this part allows parameterizing the tests on the other metrics, allowing testing the behavior for deprecating their support (currently a first failure for the test does not allow testing against other metrics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed :)
@jeremiedbb Sure thing, thank you! Sorry for the bad planning on my side. I let you take care of Julien last round of comments if that's ok. |
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…entroid (scikit-learn#24083) Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Follow up on #23874. See also #23890.
What does this implement/fix? Explain your changes.
As discussed in #23874, this deprecates the use of metrics other than
"euclidean"
and"manhattan"
forNearestCentroid
.To add a bit more context to why it doesn't seem to make sense to use any other metric:
Any other comments?
NearestCentroid
except intest_nearest_centroid.py
.metric="precomputed"
, even though this is not exactly related to this PR.whats_new
yet.