Commit 3926ed2 · scikit-learn/scikit-learn

Listed valid metrics in neighbors.rst
Reordered algorithms

1 parent 9206a78 commit 3926ed2

1 file changed
doc/modules/neighbors.rst

Lines changed: 78 additions & 54 deletions
@@ -252,6 +252,61 @@ the lower half of those faces.
 multi-output regression using nearest neighbors.
 
 
+Nearest Centroid Classifier
+===========================
+
+The :class:`NearestCentroid` classifier is a simple algorithm that represents
+each class by the centroid of its members. In effect, this makes it
+similar to the label updating phase of the :class:`sklearn.cluster.KMeans` algorithm.
+It also has no parameters to choose, making it a good baseline classifier. It
+does, however, suffer on non-convex classes, as well as when classes have
+drastically different variances, as equal variance in all dimensions is
+assumed. See Linear Discriminant Analysis (:class:`sklearn.lda.LDA`) and
+Quadratic Discriminant Analysis (:class:`sklearn.qda.QDA`) for more complex
+methods that do not make this assumption. Usage of the default
+:class:`NearestCentroid` is simple:
+
+>>> from sklearn.neighbors.nearest_centroid import NearestCentroid
+>>> import numpy as np
+>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
+>>> y = np.array([1, 1, 1, 2, 2, 2])
+>>> clf = NearestCentroid()
+>>> clf.fit(X, y)
+NearestCentroid(metric='euclidean', shrink_threshold=None)
+>>> print(clf.predict([[-0.8, -1]]))
+[1]
+
+
+Nearest Shrunken Centroid
+-------------------------
+
+The :class:`NearestCentroid` classifier has a ``shrink_threshold`` parameter,
+which implements the nearest shrunken centroid classifier. In effect, the value
+of each feature for each centroid is divided by the within-class variance of
+that feature. The feature values are then reduced by ``shrink_threshold``. Most
+notably, if a particular feature value crosses zero, it is set
+to zero. In effect, this removes the feature from affecting the classification.
+This is useful, for example, for removing noisy features.
+
+In the example below, using a small shrink threshold increases the accuracy of
+the model from 0.81 to 0.82.
+
+.. |nearest_centroid_1| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_001.png
+   :target: ../auto_examples/neighbors/plot_classification.html
+   :scale: 50
+
+.. |nearest_centroid_2| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_002.png
+   :target: ../auto_examples/neighbors/plot_classification.html
+   :scale: 50
+
+.. centered:: |nearest_centroid_1| |nearest_centroid_2|
+
+.. topic:: Examples:
+
+  * :ref:`example_neighbors_plot_nearest_centroid.py`: an example of
+    classification using nearest centroid with different shrink thresholds.
+
+
 Nearest Neighbor Algorithms
 ===========================
 
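The ``shrink_threshold`` behaviour described in the hunk above can be sketched
on the same toy data as its doctest; the threshold value ``0.2`` below is an
arbitrary illustrative choice, and on such well-separated points it leaves the
prediction unchanged:

>>> from sklearn.neighbors.nearest_centroid import NearestCentroid
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> # shrink_threshold=0.2 is an arbitrary value chosen for illustration; each
>>> # centroid is shrunk toward the overall data mean, and any feature whose
>>> # shrunken value crosses zero stops contributing to the classification.
>>> clf = NearestCentroid(shrink_threshold=0.2)
>>> clf.fit(X, y)
NearestCentroid(metric='euclidean', shrink_threshold=0.2)
>>> print(clf.predict([[-0.8, -1]]))
[1]

The effect of shrinkage only becomes visible when there are noisy or
uninformative features, which is what the referenced
``plot_nearest_centroid`` example demonstrates.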
@@ -427,6 +482,29 @@ and the ``'effective_metric_'`` is in the ``'VALID_METRICS'`` list of
 same order as the number of training points, and that ``leaf_size`` is
 close to its default value of ``30``.
 
+Valid Metrics for Nearest Neighbor Algorithms
+---------------------------------------------
+
+======================== =================================================================
+Algorithm                Valid Metrics
+======================== =================================================================
+**Brute Force**          'euclidean', 'l2', 'l1', 'manhattan', 'cityblock',
+                         'braycurtis', 'canberra', 'chebyshev', 'correlation',
+                         'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski',
+                         'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
+                         'russellrao', 'seuclidean', 'sokalmichener',
+                         'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski'
+
+**K-D Tree**             'chebyshev', 'euclidean', 'cityblock', 'manhattan', 'infinity',
+                         'minkowski', 'p', 'l2', 'l1'
+
+**Ball Tree**            'chebyshev', 'sokalmichener', 'canberra', 'haversine',
+                         'rogerstanimoto', 'matching', 'dice', 'euclidean', 'braycurtis',
+                         'russellrao', 'cityblock', 'manhattan', 'infinity', 'jaccard',
+                         'seuclidean', 'sokalsneath', 'kulsinski', 'minkowski',
+                         'mahalanobis', 'p', 'l2', 'hamming', 'l1', 'wminkowski', 'pyfunc'
+======================== =================================================================
+
 Effect of ``leaf_size``
 -----------------------
 As noted above, for small sample sizes a brute force search can be more
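The same information is also available programmatically; a minimal sketch,
assuming the ``valid_metrics`` class attribute that :class:`KDTree` and
:class:`BallTree` expose (output omitted, since the exact lists depend on the
installed scikit-learn version):

>>> from sklearn.neighbors import KDTree, BallTree
>>> # Each tree class advertises the metric names it supports; these lists
>>> # correspond to the K-D Tree and Ball Tree rows of the table above.
>>> sorted(KDTree.valid_metrics)    # doctest: +SKIP
>>> sorted(BallTree.valid_metrics)  # doctest: +SKIP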
@@ -457,60 +535,6 @@ leaf nodes. The level of this switch can be specified with the parameter
 ``leaf_size`` is not referenced for brute force queries.
 
 
-Nearest Centroid Classifier
-===========================
-
-The :class:`NearestCentroid` classifier is a simple algorithm that represents
-each class by the centroid of its members. In effect, this makes it
-similar to the label updating phase of the :class:`sklearn.KMeans` algorithm.
-It also has no parameters to choose, making it a good baseline classifier. It
-does, however, suffer on non-convex classes, as well as when classes have
-drastically different variances, as equal variance in all dimensions is
-assumed. See Linear Discriminant Analysis (:class:`sklearn.lda.LDA`) and
-Quadratic Discriminant Analysis (:class:`sklearn.qda.QDA`) for more complex
-methods that do not make this assumption. Usage of the default
-:class:`NearestCentroid` is simple:
-
->>> from sklearn.neighbors.nearest_centroid import NearestCentroid
->>> import numpy as np
->>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
->>> y = np.array([1, 1, 1, 2, 2, 2])
->>> clf = NearestCentroid()
->>> clf.fit(X, y)
-NearestCentroid(metric='euclidean', shrink_threshold=None)
->>> print(clf.predict([[-0.8, -1]]))
-[1]
-
-
-Nearest Shrunken Centroid
--------------------------
-
-The :class:`NearestCentroid` classifier has a ``shrink_threshold`` parameter,
-which implements the nearest shrunken centroid classifier. In effect, the value
-of each feature for each centroid is divided by the within-class variance of
-that feature. The feature values are then reduced by ``shrink_threshold``. Most
-notably, if a particular feature value crosses zero, it is set
-to zero. In effect, this removes the feature from affecting the classification.
-This is useful, for example, for removing noisy features.
-
-In the example below, using a small shrink threshold increases the accuracy of
-the model from 0.81 to 0.82.
-
-.. |nearest_centroid_1| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_001.png
-   :target: ../auto_examples/neighbors/plot_classification.html
-   :scale: 50
-
-.. |nearest_centroid_2| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_002.png
-   :target: ../auto_examples/neighbors/plot_classification.html
-   :scale: 50
-
-.. centered:: |nearest_centroid_1| |nearest_centroid_2|
-
-.. topic:: Examples:
-
-  * :ref:`example_neighbors_plot_nearest_centroid.py`: an example of
-    classification using nearest centroid with different shrink thresholds.
-
 Approximate Nearest Neighbors
 =============================
 