DOC Improve doc of Nearest Neighbors metric param (#23806) · glemaitre/scikit-learn@504f3d8 · GitHub

Commit 504f3d8

Valentin-Laurent authored and glemaitre committed
DOC Improve doc of Nearest Neighbors metric param (scikit-learn#23806)
1 parent 633c9fa commit 504f3d8

8 files changed, +152 -132 lines changed

sklearn/neighbors/_binary_tree.pxi

Lines changed: 8 additions & 6 deletions

```diff
@@ -230,12 +230,14 @@ leaf_size : positive int, default=40
     satisfy ``leaf_size <= n_points <= 2 * leaf_size``, except in
     the case that ``n_samples < leaf_size``.

-metric : str or DistanceMetric object
-    The distance metric to use for the tree. Default='minkowski'
-    with p=2 (that is, a euclidean metric). See the documentation
-    of the DistanceMetric class for a list of available metrics.
-    {binary_tree}.valid_metrics gives a list of the metrics which
-    are valid for {BinaryTree}.
+metric : str or DistanceMetric object, default='minkowski'
+    Metric to use for distance computation. Default is "minkowski", which
+    results in the standard Euclidean distance when p = 2.
+    {binary_tree}.valid_metrics gives a list of the metrics which are valid for
+    {BinaryTree}. See the documentation of `scipy.spatial.distance
+    <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and the
+    metrics listed in :class:`~sklearn.metrics.pairwise.distance_metrics` for
+    more information.

     Additional keywords are passed to the distance metric class.
     Note: Callable functions in the metric parameter are NOT supported for KDTree
```
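
A minimal sketch of the documented default and of the `valid_metrics` lookup, assuming scikit-learn and NumPy are installed. The helper `list_valid_metrics` is hypothetical: depending on the scikit-learn release, `valid_metrics` is either a plain list attribute or a class method, so the sketch accepts both forms.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree


def list_valid_metrics(tree_cls):
    """Hypothetical helper: metric names accepted by a tree class.

    Older scikit-learn releases expose ``valid_metrics`` as a list,
    newer ones as a class method; both cases are handled.
    """
    metrics = tree_cls.valid_metrics
    return sorted(metrics() if callable(metrics) else metrics)


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Default metric is "minkowski" with p=2, i.e. the Euclidean distance.
dist_default, _ = KDTree(X).query(X[:1], k=2)

# Extra keywords are forwarded to the distance metric: p=1 gives Manhattan.
dist_p1, _ = KDTree(X, metric="minkowski", p=1).query(X[:1], k=2)

kd_metrics = list_valid_metrics(KDTree)
ball_metrics = list_valid_metrics(BallTree)

print(dist_default)  # [[0. 5.]]
print(dist_p1)       # [[0. 7.]]
print("haversine" in ball_metrics, "haversine" in kd_metrics)  # True False
```

The last line illustrates why `{binary_tree}.valid_metrics` is worth checking: the two tree classes do not accept the same set of metrics.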

sklearn/neighbors/_classification.py

Lines changed: 30 additions & 14 deletions

```diff
@@ -65,15 +65,22 @@ class KNeighborsClassifier(KNeighborsMixin, ClassifierMixin, NeighborsBase):
         (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

     metric : str or callable, default='minkowski'
-        The distance metric to use for the tree. The default metric is
-        minkowski, and with p=2 is equivalent to the standard Euclidean
-        metric. For a list of available metrics, see the documentation of
-        :class:`~sklearn.metrics.DistanceMetric` and the metrics listed in
-        `sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS`. Note that the
-        "cosine" metric uses :func:`~sklearn.metrics.pairwise.cosine_distances`.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
         If metric is "precomputed", X is assumed to be a distance matrix and
-        must be square during fit. X may be a :term:`sparse graph`,
-        in which case only "nonzero" elements may be considered neighbors.
+        must be square during fit. X may be a :term:`sparse graph`, in which
+        case only "nonzero" elements may be considered neighbors.
+
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
+        efficient than passing the metric name as a string.

     metric_params : dict, default=None
         Additional keyword arguments for the metric function.
```
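
To illustrate the callable contract described in the new text: the function receives two 1D arrays (one sample each) and returns a single number. This is a sketch assuming scikit-learn and NumPy; `l1_distance` is a hypothetical user function that re-implements the built-in "manhattan" metric, so both classifiers should agree, with the string version avoiding a Python call per pair of samples.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def l1_distance(a, b):
    """Hypothetical user metric: a and b are single samples (1D arrays)."""
    return float(np.abs(a - b).sum())


rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(10, 2))

# Same distance, passed once as a Python callable and once by name.
clf_callable = KNeighborsClassifier(n_neighbors=3, metric=l1_distance)
clf_string = KNeighborsClassifier(n_neighbors=3, metric="manhattan")

pred_callable = clf_callable.fit(X_train, y_train).predict(X_test)
pred_string = clf_string.fit(X_train, y_train).predict(X_test)

print((pred_callable == pred_string).all())  # True
```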
```diff
@@ -357,13 +364,22 @@ class RadiusNeighborsClassifier(RadiusNeighborsMixin, ClassifierMixin, Neighbors
         (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

     metric : str or callable, default='minkowski'
-        Distance metric to use for the tree. The default metric is
-        minkowski, and with p=2 is equivalent to the standard Euclidean
-        metric. For a list of available metrics, see the documentation of
-        :class:`~sklearn.metrics.DistanceMetric`.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
         If metric is "precomputed", X is assumed to be a distance matrix and
-        must be square during fit. X may be a :term:`sparse graph`,
-        in which case only "nonzero" elements may be considered neighbors.
+        must be square during fit. X may be a :term:`sparse graph`, in which
+        case only "nonzero" elements may be considered neighbors.
+
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
+        efficient than passing the metric name as a string.

     outlier_label : {manual label, 'most_frequent'}, default=None
         Label for outlier samples (samples with no neighbors in given radius).
```
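
A small sketch of the "precomputed" case, assuming scikit-learn and NumPy: `fit` receives the square matrix of training distances, while `predict` receives the (n_test, n_train) distances from each query to each training sample. The toy data is chosen so that every test point has neighbors within the radius, which avoids needing `outlier_label`.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import RadiusNeighborsClassifier

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.0, 0.5], [5.0, 5.5]])

# fit() gets the square (n_train, n_train) matrix of training distances.
D_train = pairwise_distances(X_train)
# predict() gets distances from each test sample to each training sample.
D_test = pairwise_distances(X_test, X_train)

clf_precomputed = RadiusNeighborsClassifier(radius=1.5, metric="precomputed")
pred_precomputed = clf_precomputed.fit(D_train, y_train).predict(D_test)

# Same model on the raw features with the default (Euclidean) metric.
clf_features = RadiusNeighborsClassifier(radius=1.5)
pred_features = clf_features.fit(X_train, y_train).predict(X_test)

print(pred_precomputed, pred_features)  # [0 1] [0 1]
```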

sklearn/neighbors/_graph.py

Lines changed: 36 additions & 56 deletions

```diff
@@ -65,13 +65,13 @@ def kneighbors_graph(
         between neighbors according to the given metric.

     metric : str, default='minkowski'
-        The distance metric to use for the tree. The default metric is
-        minkowski, and with p=2 is equivalent to the standard Euclidean
-        metric.
-        For a list of available metrics, see the documentation of
-        :class:`~sklearn.metrics.DistanceMetric` and the metrics listed in
-        `sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS`. Note that the
-        "cosine" metric uses :func:`~sklearn.metrics.pairwise.cosine_distances`.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.

     p : int, default=2
         Power parameter for the Minkowski metric. When p = 1, this is
```
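
A sketch of how the `metric` string changes the edge weights of `kneighbors_graph`, assuming scikit-learn and NumPy. It also checks that "manhattan" appears among the names returned by `distance_metrics()`, which the new docstring points to.

```python
import numpy as np
from sklearn.metrics.pairwise import distance_metrics
from sklearn.neighbors import kneighbors_graph

X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])

print("manhattan" in distance_metrics())  # True

# Default metric: "minkowski" with p=2 (Euclidean).
A_euclidean = kneighbors_graph(X, n_neighbors=1, mode="distance").toarray()
# Same graph construction with the Manhattan (L1) distance.
A_manhattan = kneighbors_graph(
    X, n_neighbors=1, mode="distance", metric="manhattan"
).toarray()

# The nearest neighbor of [3, 4] is [0, 1] in both cases, but the stored
# edge weight is sqrt(18) for Euclidean and 6 for Manhattan.
print(A_euclidean[1, 2], A_manhattan[1, 2])
```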
```diff
@@ -160,13 +160,13 @@ def radius_neighbors_graph(
         between neighbors according to the given metric.

     metric : str, default='minkowski'
-        The distance metric to use for the tree. The default metric is
-        minkowski, and with p=2 is equivalent to the standard Euclidean
-        metric.
-        For a list of available metrics, see the documentation of
-        :class:`~sklearn.metrics.DistanceMetric` and the metrics listed in
-        `sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS`. Note that the
-        "cosine" metric uses :func:`~sklearn.metrics.pairwise.cosine_distances`.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.

     p : int, default=2
         Power parameter for the Minkowski metric. When p = 1, this is
```
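
With a fixed radius, the choice of metric can change which edges exist at all. A minimal sketch assuming scikit-learn and NumPy: the two points are 5 apart in Euclidean distance and 7 apart in Manhattan distance, so a radius of 5.5 connects them only under the default metric.

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

X = np.array([[0.0, 0.0], [3.0, 4.0]])

# Euclidean distance is 5 (<= 5.5), so the two points are connected.
G_default = radius_neighbors_graph(X, radius=5.5, mode="connectivity").toarray()
# Manhattan distance is 7 (> 5.5), so the graph has no edges.
G_manhattan = radius_neighbors_graph(
    X, radius=5.5, mode="connectivity", metric="manhattan"
).toarray()

print(G_default[0, 1], G_manhattan[0, 1])  # 1.0 0.0
```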
```diff
@@ -266,31 +266,21 @@ class KNeighborsTransformer(
         nature of the problem.

     metric : str or callable, default='minkowski'
-        Metric to use for distance computation. Any metric from scikit-learn
-        or scipy.spatial.distance can be used.
-
-        If metric is a callable function, it is called on each
-        pair of instances (rows) and the resulting value recorded. The callable
-        should take two arrays as input and return one value indicating the
-        distance between them. This works for Scipy's metrics, but is less
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
         efficient than passing the metric name as a string.

         Distance matrices are not supported.

-        Valid values for metric are:
-
-        - from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2',
-          'manhattan']
-
-        - from scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev',
-          'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski',
-          'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao',
-          'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean',
-          'yule']
-
-        See the documentation for scipy.spatial.distance for details on these
-        metrics.
-
     p : int, default=2
         Parameter for the Minkowski metric from
         sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is
```
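
A sketch of the callable form applied to `KNeighborsTransformer`, assuming scikit-learn and NumPy. The function `chebyshev` is a hypothetical user metric that re-implements the built-in "chebyshev" name, so the two sparse neighbor graphs should hold the same distances.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer


def chebyshev(a, b):
    """Hypothetical user metric on two 1D samples: max absolute difference."""
    return float(np.max(np.abs(a - b)))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Sparse (n_samples, n_samples) graphs of neighbor distances.
graph_callable = KNeighborsTransformer(
    n_neighbors=3, mode="distance", metric=chebyshev
).fit_transform(X)
graph_string = KNeighborsTransformer(
    n_neighbors=3, mode="distance", metric="chebyshev"
).fit_transform(X)

print(np.allclose(graph_callable.toarray(), graph_string.toarray()))  # True
```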
```diff
@@ -493,31 +483,21 @@ class RadiusNeighborsTransformer(
         nature of the problem.

     metric : str or callable, default='minkowski'
-        Metric to use for distance computation. Any metric from scikit-learn
-        or scipy.spatial.distance can be used.
-
-        If metric is a callable function, it is called on each
-        pair of instances (rows) and the resulting value recorded. The callable
-        should take two arrays as input and return one value indicating the
-        distance between them. This works for Scipy's metrics, but is less
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
         efficient than passing the metric name as a string.

         Distance matrices are not supported.

-        Valid values for metric are:
-
-        - from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2',
-          'manhattan']
-
-        - from scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev',
-          'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski',
-          'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao',
-          'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean',
-          'yule']
-
-        See the documentation for scipy.spatial.distance for details on these
-        metrics.
-
     p : int, default=2
         Parameter for the Minkowski metric from
         sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is
```
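
Metrics from the scikit-learn list returned by `distance_metrics()`, such as "cosine", are accepted by name as well. A sketch assuming scikit-learn and NumPy: `[1, 0]` and `[2, 0]` point in the same direction (cosine distance 0), while `[0, 1]` is orthogonal to both (cosine distance 1), so with `radius=0.5` only the first two samples are linked. Each sample is also linked to itself because `transform` queries the training points.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsTransformer

X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])

transformer = RadiusNeighborsTransformer(
    radius=0.5, mode="connectivity", metric="cosine"
)
# Dense view of the sparse connectivity graph (1.0 = within the radius).
G = transformer.fit_transform(X).toarray()
print(G)
```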

sklearn/neighbors/_kde.py

Lines changed: 11 additions & 6 deletions

```diff
@@ -47,12 +47,17 @@ class KernelDensity(BaseEstimator):
         The kernel to use.

     metric : str, default='euclidean'
-        The distance metric to use. Note that not all metrics are
-        valid with all algorithms. Refer to the documentation of
-        :class:`BallTree` and :class:`KDTree` for a description of
-        available algorithms. Note that the normalization of the density
-        output is correct only for the Euclidean distance metric. Default
-        is 'euclidean'.
+        Metric to use for distance computation. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
+        Not all metrics are valid with all algorithms: refer to the
+        documentation of :class:`BallTree` and :class:`KDTree`. Note that the
+        normalization of the density output is correct only for the Euclidean
+        distance metric.

     atol : float, default=0
         The desired absolute tolerance of the result. A larger tolerance will
```
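
The normalization caveat can be checked numerically. This sketch, assuming scikit-learn and NumPy, integrates the estimated 2D density of a single sample on a grid. With the Euclidean metric the total mass should be close to 1; with "manhattan" the same Gaussian normalization constant is applied to L1 distances, and a short calculation suggests the mass drops to about 2/pi (roughly 0.637).

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# One training sample at the origin of a 2D space.
X = np.zeros((1, 2))

# Regular grid on [-6, 6]^2; the Gaussian kernel is negligible outside it.
axis = np.linspace(-6.0, 6.0, 241)
step = axis[1] - axis[0]
xx, yy = np.meshgrid(axis, axis)
grid = np.column_stack([xx.ravel(), yy.ravel()])


def total_mass(metric):
    """Riemann-sum approximation of the integral of the estimated density."""
    kde = KernelDensity(kernel="gaussian", bandwidth=1.0, metric=metric).fit(X)
    density = np.exp(kde.score_samples(grid))  # score_samples returns log-density
    return density.sum() * step**2


mass_euclidean = total_mass("euclidean")
mass_manhattan = total_mass("manhattan")
print(round(mass_euclidean, 3), round(mass_manhattan, 3))
```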

sklearn/neighbors/_lof.py

Lines changed: 12 additions & 23 deletions

```diff
@@ -58,34 +58,23 @@ class LocalOutlierFactor(KNeighborsMixin, OutlierMixin, NeighborsBase):
         nature of the problem.

     metric : str or callable, default='minkowski'
-        The metric is used for distance computation. Any metric from scikit-learn
-        or scipy.spatial.distance can be used.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.

         If metric is "precomputed", X is assumed to be a distance matrix and
-        must be square. X may be a sparse matrix, in which case only "nonzero"
-        elements may be considered neighbors.
+        must be square during fit. X may be a :term:`sparse graph`, in which
+        case only "nonzero" elements may be considered neighbors.

-        If metric is a callable function, it is called on each
-        pair of instances (rows) and the resulting value recorded. The callable
-        should take two arrays as input and return one value indicating the
-        distance between them. This works for Scipy's metrics, but is less
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
         efficient than passing the metric name as a string.

-        Valid values for metric are:
-
-        - from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2',
-          'manhattan']
-
-        - from scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev',
-          'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski',
-          'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao',
-          'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean',
-          'yule']
-
-        See the documentation for scipy.spatial.distance for details on these
-        metrics:
-        https://docs.scipy.org/doc/scipy/reference/spatial.distance.html.
-
     p : int, default=2
         Parameter for the Minkowski metric from
         :func:`sklearn.metrics.pairwise.pairwise_distances`. When p = 1, this
```
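
A sketch of `LocalOutlierFactor` with `metric="precomputed"`, assuming scikit-learn and NumPy. Passing the square Euclidean distance matrix should give the same outlier scores as fitting on the raw features with the default metric; the appended point `[8, 8]` is placed far from the random cluster so that it is flagged as an outlier (-1).

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# 30 clustered samples plus one far-away point in the last row.
X = np.vstack([rng.normal(size=(30, 2)), [[8.0, 8.0]]])

lof_features = LocalOutlierFactor(n_neighbors=5)
labels_features = lof_features.fit_predict(X)

# Same model, fitted on the square matrix of pairwise Euclidean distances.
lof_precomputed = LocalOutlierFactor(n_neighbors=5, metric="precomputed")
labels_precomputed = lof_precomputed.fit_predict(pairwise_distances(X))

print(np.allclose(lof_features.negative_outlier_factor_,
                  lof_precomputed.negative_outlier_factor_))  # True
print(labels_features[-1], labels_precomputed[-1])  # -1 -1
```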

sklearn/neighbors/_nearest_centroid.py

Lines changed: 10 additions & 5 deletions

```diff
@@ -30,11 +30,16 @@ class NearestCentroid(ClassifierMixin, BaseEstimator):
     Parameters
     ----------
     metric : str or callable, default="euclidean"
-        The metric to use when calculating distance between instances in a
-        feature array. If metric is a string or callable, it must be one of
-        the options allowed by
-        :func:`~sklearn.metrics.pairwise_distances` for its metric
-        parameter. The centroids for the samples corresponding to each class is
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values. Note that "wminkowski", "seuclidean" and "mahalanobis" are not
+        supported.
+
+        The centroids for the samples corresponding to each class is
         the point from which the sum of the distances (according to the metric)
         of all samples that belong to that particular class are minimized.
         If the `"manhattan"` metric is provided, this centroid is the median
```
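
The context lines above state that the "manhattan" metric turns each class centroid into a median. A sketch assuming scikit-learn and NumPy, with one far-off sample in each class so that means and medians differ visibly.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0],
              [20.0, 20.0], [21.0, 21.0], [30.0, 22.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf_euclidean = NearestCentroid(metric="euclidean").fit(X, y)
clf_manhattan = NearestCentroid(metric="manhattan").fit(X, y)

print(clf_euclidean.centroids_)  # per-class means: [11/3, 0] and [71/3, 21]
print(clf_manhattan.centroids_)  # per-class medians: [1, 0] and [21, 21]
```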

sklearn/neighbors/_regression.py

Lines changed: 30 additions & 14 deletions

```diff
@@ -72,15 +72,22 @@ class KNeighborsRegressor(KNeighborsMixin, RegressorMixin, NeighborsBase):
         (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

     metric : str or callable, default='minkowski'
-        The distance metric to use for the tree. The default metric is
-        minkowski, and with p=2 is equivalent to the standard Euclidean
-        metric. For a list of available metrics, see the documentation of
-        :class:`~sklearn.metrics.DistanceMetric` and the metrics listed in
-        `sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS`. Note that the
-        "cosine" metric uses :func:`~sklearn.metrics.pairwise.cosine_distances`.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
         If metric is "precomputed", X is assumed to be a distance matrix and
-        must be square during fit. X may be a :term:`sparse graph`,
-        in which case only "nonzero" elements may be considered neighbors.
+        must be square during fit. X may be a :term:`sparse graph`, in which
+        case only "nonzero" elements may be considered neighbors.
+
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
+        efficient than passing the metric name as a string.

     metric_params : dict, default=None
         Additional keyword arguments for the metric function.
```
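
A sketch of how `metric_params` pairs with a callable metric, assuming scikit-learn and NumPy. Both `weighted_l1` and its keyword `feature_weights` are hypothetical names chosen for illustration; the dictionary passed as `metric_params` is forwarded as keyword arguments to the callable. Setting the second weight to 0 makes the metric ignore that feature, so predictions should match a Manhattan regressor fitted on the first feature only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def weighted_l1(a, b, feature_weights):
    """Hypothetical callable metric: L1 distance with one weight per feature.

    ``feature_weights`` is not a scikit-learn parameter; it arrives here
    through ``metric_params``.
    """
    return float(np.sum(feature_weights * np.abs(a - b)))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=50)
X_query = rng.normal(size=(5, 2))

# A zero weight on the second feature removes it from the distance.
reg_weighted = KNeighborsRegressor(
    n_neighbors=4,
    metric=weighted_l1,
    metric_params={"feature_weights": np.array([1.0, 0.0])},
).fit(X, y)

# Reference model: plain Manhattan distance on the first feature only.
reg_first = KNeighborsRegressor(n_neighbors=4, metric="manhattan").fit(X[:, :1], y)

pred_weighted = reg_weighted.predict(X_query)
pred_first = reg_first.predict(X_query[:, :1])
print(np.allclose(pred_weighted, pred_first))  # True
```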
```diff
@@ -300,13 +307,22 @@ class RadiusNeighborsRegressor(RadiusNeighborsMixin, RegressorMixin, NeighborsBa
         (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

     metric : str or callable, default='minkowski'
-        The distance metric to use for the tree. The default metric is
-        minkowski, and with p=2 is equivalent to the standard Euclidean
-        metric. See the documentation of :class:`DistanceMetric` for a
-        list of available metrics.
+        Metric to use for distance computation. Default is "minkowski", which
+        results in the standard Euclidean distance when p = 2. See the
+        documentation of `scipy.spatial.distance
+        <https://docs.scipy.org/doc/scipy/reference/spatial.distance.html>`_ and
+        the metrics listed in
+        :class:`~sklearn.metrics.pairwise.distance_metrics` for valid metric
+        values.
+
         If metric is "precomputed", X is assumed to be a distance matrix and
-        must be square during fit. X may be a :term:`sparse graph`,
-        in which case only "nonzero" elements may be considered neighbors.
+        must be square during fit. X may be a :term:`sparse graph`, in which
+        case only "nonzero" elements may be considered neighbors.
+
+        If metric is a callable function, it takes two arrays representing 1D
+        vectors as inputs and must return one value indicating the distance
+        between those vectors. This works for Scipy's metrics, but is less
+        efficient than passing the metric name as a string.

     metric_params : dict, default=None
         Additional keyword arguments for the metric function.
```
