@@ -252,6 +252,61 @@ the lower half of those faces.
 multi-output regression using nearest neighbors.
 
 
+Nearest Centroid Classifier
+===========================
+
+The :class:`NearestCentroid` classifier is a simple algorithm that represents
+each class by the centroid of its members. In effect, this makes it
+similar to the label updating phase of the :class:`sklearn.cluster.KMeans`
+algorithm. It also has no parameters to choose, making it a good baseline
+classifier. It does, however, suffer on non-convex classes, as well as when
+classes have drastically different variances, as equal variance in all
+dimensions is assumed. See Linear Discriminant Analysis
+(:class:`sklearn.lda.LDA`) and Quadratic Discriminant Analysis
+(:class:`sklearn.qda.QDA`) for more complex methods that do not make this
+assumption. Usage of the default :class:`NearestCentroid` is simple:
+
+    >>> from sklearn.neighbors.nearest_centroid import NearestCentroid
+    >>> import numpy as np
+    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
+    >>> y = np.array([1, 1, 1, 2, 2, 2])
+    >>> clf = NearestCentroid()
+    >>> clf.fit(X, y)
+    NearestCentroid(metric='euclidean', shrink_threshold=None)
+    >>> print(clf.predict([[-0.8, -1]]))
+    [1]
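+
+As an illustrative sketch (not part of the original example), the same
+prediction can be reproduced by hand: each class is summarized by the mean
+of its training points, and the query is assigned to the closest centroid:
+
+    >>> # one centroid per class: the mean of that class's training points
+    >>> centroids = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
+    >>> # assign the query point to the class with the nearest centroid
+    >>> dists = np.linalg.norm(centroids - np.array([-0.8, -1]), axis=1)
+    >>> int(np.unique(y)[dists.argmin()])
+    1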
+
+
+Nearest Shrunken Centroid
+-------------------------
+
+The :class:`NearestCentroid` classifier has a ``shrink_threshold`` parameter,
+which implements the nearest shrunken centroid classifier. In effect, the value
+of each feature for each centroid is divided by the within-class variance of
+that feature. The feature values are then reduced by ``shrink_threshold``. Most
+notably, if a particular feature value crosses zero, it is set
+to zero. In effect, this removes the feature from affecting the classification.
+This is useful, for example, for removing noisy features.
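+
+As a minimal sketch of enabling shrinkage (the threshold value of 0.2 here is
+illustrative, not a tuned choice), reusing ``X`` and ``y`` from above:
+
+    >>> shrunken_clf = NearestCentroid(shrink_threshold=0.2)
+    >>> shrunken_clf.fit(X, y)
+    NearestCentroid(metric='euclidean', shrink_threshold=0.2)
+    >>> print(shrunken_clf.predict([[-0.8, -1]]))
+    [1]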
+
+In the example below, using a small shrink threshold increases the accuracy of
+the model from 0.81 to 0.82.
+
+.. |nearest_centroid_1| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_001.png
+   :target: ../auto_examples/neighbors/plot_classification.html
+   :scale: 50
+
+.. |nearest_centroid_2| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_002.png
+   :target: ../auto_examples/neighbors/plot_classification.html
+   :scale: 50
+
+.. centered:: |nearest_centroid_1| |nearest_centroid_2|
+
+.. topic:: Examples:
+
+  * :ref:`example_neighbors_plot_nearest_centroid.py`: an example of
+    classification using nearest centroid with different shrink thresholds.
+
+
 Nearest Neighbor Algorithms
 ===========================
 
@@ -427,6 +482,29 @@ and the ``'effective_metric_'`` is in the ``'VALID_METRICS'`` list of
 same order as the number of training points, and that ``leaf_size`` is
 close to its default value of ``30``.
 
+Valid Metrics for Nearest Neighbor Algorithms
+---------------------------------------------
+
+======================== =================================================================
+Algorithm                Valid Metrics
+======================== =================================================================
+**Brute Force**          'euclidean', 'l2', 'l1', 'manhattan', 'cityblock',
+                         'braycurtis', 'canberra', 'chebyshev', 'correlation',
+                         'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski',
+                         'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
+                         'russellrao', 'seuclidean', 'sokalmichener',
+                         'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski'
+
+**K-D Tree**             'chebyshev', 'euclidean', 'cityblock', 'manhattan', 'infinity',
+                         'minkowski', 'p', 'l2', 'l1'
+
+**Ball Tree**            'chebyshev', 'sokalmichener', 'canberra', 'haversine',
+                         'rogerstanimoto', 'matching', 'dice', 'euclidean', 'braycurtis',
+                         'russellrao', 'cityblock', 'manhattan', 'infinity', 'jaccard',
+                         'seuclidean', 'sokalsneath', 'kulsinski', 'minkowski',
+                         'mahalanobis', 'p', 'l2', 'hamming', 'l1', 'wminkowski', 'pyfunc'
+======================== =================================================================
+
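+The tree classes also expose their supported metrics at runtime through a
+``valid_metrics`` attribute; a quick sketch (output omitted, since the exact
+lists can vary between releases):
+
+    >>> from sklearn.neighbors import KDTree, BallTree
+    >>> print(sorted(KDTree.valid_metrics))    # doctest: +SKIP
+    >>> print(sorted(BallTree.valid_metrics))  # doctest: +SKIP
+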
 Effect of ``leaf_size``
 -----------------------
 As noted above, for small sample sizes a brute force search can be more
@@ -457,60 +535,6 @@ leaf nodes. The level of this switch can be specified with the parameter
 ``leaf_size`` is not referenced for brute force queries.
 
 
-Nearest Centroid Classifier
-===========================
-
-The :class:`NearestCentroid` classifier is a simple algorithm that represents
-each class by the centroid of its members. In effect, this makes it
-similar to the label updating phase of the :class:`sklearn.cluster.KMeans`
-algorithm. It also has no parameters to choose, making it a good baseline
-classifier. It does, however, suffer on non-convex classes, as well as when
-classes have drastically different variances, as equal variance in all
-dimensions is assumed. See Linear Discriminant Analysis
-(:class:`sklearn.lda.LDA`) and Quadratic Discriminant Analysis
-(:class:`sklearn.qda.QDA`) for more complex methods that do not make this
-assumption. Usage of the default :class:`NearestCentroid` is simple:
-
-    >>> from sklearn.neighbors.nearest_centroid import NearestCentroid
-    >>> import numpy as np
-    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
-    >>> y = np.array([1, 1, 1, 2, 2, 2])
-    >>> clf = NearestCentroid()
-    >>> clf.fit(X, y)
-    NearestCentroid(metric='euclidean', shrink_threshold=None)
-    >>> print(clf.predict([[-0.8, -1]]))
-    [1]
-
-
-Nearest Shrunken Centroid
--------------------------
-
-The :class:`NearestCentroid` classifier has a ``shrink_threshold`` parameter,
-which implements the nearest shrunken centroid classifier. In effect, the value
-of each feature for each centroid is divided by the within-class variance of
-that feature. The feature values are then reduced by ``shrink_threshold``. Most
-notably, if a particular feature value crosses zero, it is set
-to zero. In effect, this removes the feature from affecting the classification.
-This is useful, for example, for removing noisy features.
-
-In the example below, using a small shrink threshold increases the accuracy of
-the model from 0.81 to 0.82.
-
-.. |nearest_centroid_1| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_001.png
-   :target: ../auto_examples/neighbors/plot_classification.html
-   :scale: 50
-
-.. |nearest_centroid_2| image:: ../auto_examples/neighbors/images/plot_nearest_centroid_002.png
-   :target: ../auto_examples/neighbors/plot_classification.html
-   :scale: 50
-
-.. centered:: |nearest_centroid_1| |nearest_centroid_2|
-
-.. topic:: Examples:
-
-  * :ref:`example_neighbors_plot_nearest_centroid.py`: an example of
-    classification using nearest centroid with different shrink thresholds.
-
 Approximate Nearest Neighbors
 =============================
 