-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
DOC Improve doc of Nearest Neighbors metric param #23806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Improve doc of Nearest Neighbors metric param #23806
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @Valentin-Laurent. Here is a first pass of comments.
…nt/scikit-learn into distance_metric_docstring
I updated my branch according to the comments of @ArturoAmorQ. However, I just realized two things regarding
|
This is not true: you can pass |
I meant we shouldn't remove DistanceMetric from the documentation of the tree = BallTree(X, metric='minkowski', metric_params={"p": 1}) and you have to do: metric = DistanceMetric.get_metric(metric="minkowski", p=1)
tree = BallTree(X, metric=metric) |
Alright. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few last remarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thank you, @Valentin-Laurent.
Reference Issues/PRs
Fixes #22348.
What does this implement/fix? Explain your changes.
The original goal of this PR was to clarify that a
DistanceMetric
instance is not a validmetric
parameter for most neighbors algorithms (see #22348).While working on it, I realised that the docstrings about the distance metric parameter are inconsistent across the
neighbors
classes. That's why I tried to standardise the documentation, while also improving readability and correcting mistakes. I tried to make sure everything was correct by reading the code and experimenting.I understand that I modified a lot of docstrings at once, and that it is not very readable on a GitHub diff. If needed, I can provide a summary of the changes, in the form of a table for ex.
I wonder if we should factorise this documentation to avoid inconsistencies to happen in the first place. I standardised the docstrings as much as possible, but I also kept class-specific details. If we agree on what should and shouldn't be in the doc, we could easily have a single source of truth for all
neighbors
classes (keeping of course necessary variations for some of them).Also, I ran into issues while experimenting with the neighbors.NearestCentroid class (see #23890), so I specified in the doc that some metrics are not supported.
Any other comments?
I'm a first time contributor, so please let me know what I could have done better! My code is a suggestion of improvement, but I can work on a simpler PR to fix solely the original issue if needed.