8000 API Deprecate metrics other than euclidean and manhattan for NearestCentroid by Valentin-Laurent · Pull Request #24083 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

API Deprecate metrics other than euclidean and manhattan for NearestCentroid #24083

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Valentin-Laurent
Copy link
Contributor

Reference Issues/PRs

Follow up on #23874. See also #23890.

What does this implement/fix? Explain your changes.

As discussed in #23874, this deprecates the use of metrics other than "euclidean" and "manhattan" for NearestCentroid.
To add a bit more context to why it doesn't seem to make sense to use any other metric:

  • I haven't found any mention of NearestCentroid using other metrics in the litterature
  • This is probably related to the fact that NearestCentroid is based on K-means, and K-means is not about distance but rather variance (see the second answer on this question), so it doesn't makes sense to speak of a "distance".
  • NearestCentroid can also be based on K-medians rather than K-means, that's why "using the manhattan distance" instead of euclidean is legit
  • Disclaimer: I'm not a specialist, so the statements above may be innacurate or even wrong :)

Any other comments?

  • I checked the examples (everything is OK).
  • I haven't found any test using NearestCentroid except in test_nearest_centroid.py.
  • I updated an obsolete docstring about metric="precomputed", even though this is not exactly related to this PR.
  • I haven't updated the whats_new yet.

@ArturoAmorQ
Copy link
Member

For info, this was introduced in #3772. IMO it is ok to depreciate them, but this is open to discussion.

@jjerphan
Copy link
Member

Hi @Valentin-Laurent,

Thank you for your PR. Your points are valid to me: theoretically, the current implementation is false for other distance metrics than euclidean an Manhattan. Moreover, like you, I do not see why would people choose to use other distance metrics even if the implementations were valid for them.

I think the general support of distances in scikit-learn made all distance metrics usable in most of estimators exposing a metric parameters, but its relevance for each estimator maybe has not been thought in details. This relates to other similar issues (like #22348) I think maintainers need to agree on their resolution (deprecation cycle or continued support). Hence, I think would recommend putting this PR on hold until a decision has been taken. Thank you for your patience.

@Valentin-Laurent
Copy link
Contributor Author

Makes sense :) I'll be happy to work on this later on if deprecation is the chosen way

Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Valentin-Laurent. Looking at the discussions in the linked issue and in the PR, it seems that everyone involved is in favor of the deprecation.

Are you still interested/available in continuing this PR ? If so, here are some comments. In particular, it should now target 1.3. Can you also add an entry in the what's new ?

@Valentin-Laurent
Copy link
Contributor Author

Hello @jeremiedbb,
Sure, let me get back to it.

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
@Valentin-Laurent
Copy link
Contributor Author

Quick update: been quite busy lately, but this is still on my radar :)

@jeremiedbb
Copy link
Member

@Valentin-Laurent I directly pushed the remaining changes since we'd like to release 1.3 soon-ish and I'd like to have this deprecation in 1.3.

Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM up to a few minor comments.

Thank you, @Valentin-Laurent.

):
warnings.warn(
(
"Support for metrics other than euclidean and "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distance metrics and metrics are two different concepts. I think we need to be clear despite the potentially misleading name of this parameter.

Suggested change
"Support for metrics other than euclidean and "
"Support for other distance metrics than euclidean and "

clf = NearestCentroid(metric=metric)
with pytest.warns(
FutureWarning,
match="Support for metrics other than euclidean and manhattan",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Companion suggestion.

Suggested change
match="Support for metrics other than euclidean and manhattan",
match="Support for other distance metrics than euclidean and manhattan",

Comment on lines 155 to 159
for metric in {"manhattan", "euclidean"}:
clf = NearestCentroid(metric=metric)
with warnings.catch_warnings():
warnings.simplefilter("error")
clf.fit(X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually do not test absence of warnings.

Removing this part allows parameterizing the tests on the other metrics, allowing testing the behavior for deprecating their support (currently a first failure for the test does not allow testing against other metrics).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed :)

@Valentin-Laurent
Copy link
Contributor Author

@Valentin-Laurent I directly pushed the remaining changes since we'd like to release 1.3 soon-ish and I'd like to have this deprecation in 1.3.

@jeremiedbb Sure thing, thank you! Sorry for the bad planning on my side. I let you take care of Julien last round of comments if that's ok.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan jjerphan enabled auto-merge (squash) May 24, 2023 14:02
@jjerphan jjerphan merged commit 689efe2 into scikit-learn:main May 24, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…entroid (scikit-learn#24083)

Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0