8000 DOC Improve doc of Nearest Neighbors metric param by Valentin-Laurent · Pull Request #23806 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

DOC Improve doc of Nearest Neighbors metric param #23806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Valentin-Laurent
Copy link
Contributor
@Valentin-Laurent Valentin-Laurent commented Jun 30, 2022

Reference Issues/PRs

Fixes #22348.

What does this implement/fix? Explain your changes.

The original goal of this PR was to clarify that a DistanceMetric instance is not a valid metric parameter for most neighbors algorithms (see #22348).

While working on it, I realised that the docstrings about the distance metric parameter are inconsistent across the neighbors classes. That's why I tried to standardise the documentation, while also improving readability and correcting mistakes. I tried to make sure everything was correct by reading the code and experimenting.

I understand that I modified a lot of docstrings at once, and that it is not very readable on a GitHub diff. If needed, I can provide a summary of the changes, in the form of a table for ex.

I wonder if we should factorise this documentation to avoid inconsistencies to happen in the first place. I standardised the docstrings as much as possible, but I also kept class-specific details. If we agree on what should and shouldn't be in the doc, we could easily have a single source of truth for all neighbors classes (keeping of course necessary variations for some of them).

Also, I ran into issues while experimenting with the neighbors.NearestCentroid class (see #23890), so I specified in the doc that some metrics are not supported.

Any other comments?

I'm a first time contributor, so please let me know what I could have done better! My code is a suggestion of improvement, but I can work on a simpler PR to fix solely the original issue if needed.

@jjerphan jjerphan self-requested a review July 4, 2022 07:49
@jjerphan jjerphan changed the title [MRG] Improve doc of Nearest Neighbors metric param DOC Improve doc of Nearest Neighbors metric param Jul 4, 2022
@Valentin-Laurent
Copy link
Contributor Author

@jjerphan I agree with your comment to #22348. I can change metric : str or DistanceMetric object to metric : str, default='minkowski' in _binary_tree.pxi.

Let me know if you see other changes to be made to the PR (and I'd be curious to know your thoughts about factorising documentation).

Copy link
Member
@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @Valentin-Laurent. Here is a first pass of comments.

@Valentin-Laurent
Copy link
Contributor Author
Valentin-Laurent commented Jul 8, 2022

I updated my branch according to the comments of @ArturoAmorQ.

However, I just realized two things regarding sklearn/neighbors/_binary_tree.pxi.

  • Maybe we shouldn't remove DistanceMetric from the documentation as suggested by jjerphan in KNeighborsClassifier not accepting a DistanceMetric #22348, because using an instance of this class is the only way to pass additional metric parameters (for example p for a Minkowski distance)
  • Therefore, it is weird to indicate "minkowski" as the default value, because using metric="minkowski" is a bit confusing since we can't specify p. Maybe we can indicate "euclidean" as the default (which is actually true).

@ogrisel
Copy link
Member
ogrisel commented Jul 13, 2022

Maybe we shouldn't remove DistanceMetric from the documentation as suggested by jjerphan in #22348, because using an instance of this class is the only way to pass additional metric parameters (for example p for a Minkowski distance)

This is not true: you can pass algorithm="ball_tree", metric="minkowski", metric_params={"p": 1} to KNeighborsClassifier for instance.

@Valentin-Laurent
Copy link
Contributor Author

Maybe we shouldn't remove DistanceMetric from the documentation as suggested by jjerphan in #22348, because using an instance of this class is the only way to pass additional metric parameters (for example p for a Minkowski distance)

This is not true: you can pass algorithm="ball_tree", metric="minkowski", metric_params={"p": 1} to KNeighborsClassifier for instance.

I meant we shouldn't remove DistanceMetric from the documentation of the BallTreeand KDTree classes, because the following code won't work:

tree = BallTree(X, metric='minkowski', metric_params={"p": 1})

and you have to do:

metric = DistanceMetric.get_metric(metric="minkowski", p=1)
tree = BallTree(X, metric=metric)

@ogrisel
Copy link
Member
ogrisel commented Jul 13, 2022

Alright.

Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few last remarks.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise.

Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thank you, @Valentin-Laurent.

@jjerphan jjerphan merged commit 3a792b6 into scikit-learn:main Jul 19, 2022
@Valentin-Laurent Valentin-Laurent deleted the distance_metric_docstring branch August 1, 2022 22:00
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KNeighborsClassifier not accepting a DistanceMetric
4 participants
0