DOC Improve doc of Nearest Neighbors metric param#23806
DOC Improve doc of Nearest Neighbors metric param#23806jjerphan merged 8 commits intoscikit-learn:mainfrom
Conversation
There was a problem hiding this comment.
Thanks for the PR, @Valentin-Laurent. Here is a first pass of comments.
…nt/scikit-learn into distance_metric_docstring
|
I updated my branch according to the comments of @ArturoAmorQ. However, I just realized two things regarding
|
This is not true: you can pass |
I meant we shouldn't remove DistanceMetric from the documentation of the tree = BallTree(X, metric='minkowski', metric_params={"p": 1})and you have to do: metric = DistanceMetric.get_metric(metric="minkowski", p=1)
tree = BallTree(X, metric=metric) |
|
Alright. |
There was a problem hiding this comment.
LGTM.
Thank you, @Valentin-Laurent.
Reference Issues/PRs
Fixes #22348.
What does this implement/fix? Explain your changes.
The original goal of this PR was to clarify that a
DistanceMetricinstance is not a validmetricparameter for most neighbors algorithms (see #22348).While working on it, I realised that the docstrings about the distance metric parameter are inconsistent across the
neighborsclasses. That's why I tried to standardise the documentation, while also improving readability and correcting mistakes. I tried to make sure everything was correct by reading the code and experimenting.I understand that I modified a lot of docstrings at once, and that it is not very readable on a GitHub diff. If needed, I can provide a summary of the changes, in the form of a table for ex.
I wonder if we should factorise this documentation to avoid inconsistencies to happen in the first place. I standardised the docstrings as much as possible, but I also kept class-specific details. If we agree on what should and shouldn't be in the doc, we could easily have a single source of truth for all
neighborsclasses (keeping of course necessary variations for some of them).Also, I ran into issues while experimenting with the neighbors.NearestCentroid class (see #23890), so I specified in the doc that some metrics are not supported.
Any other comments?
I'm a first time contributor, so please let me know what I could have done better! My code is a suggestion of improvement, but I can work on a simpler PR to fix solely the original issue if needed.