@@ -89,14 +89,15 @@ Overview of clustering methods
    * - :ref:`DBSCAN <dbscan>`
      - neighborhood size
      - Very large ``n_samples``, medium ``n_clusters``
-     - Non-flat geometry, uneven cluster sizes, transductive
+     - Non-flat geometry, uneven cluster sizes, outlier removal,
+       transductive
      - Distances between nearest points
 
    * - :ref:`OPTICS <optics>`
      - minimum cluster membership
      - Very large ``n_samples``, large ``n_clusters``
      - Non-flat geometry, uneven cluster sizes, variable cluster density,
-       transductive
+       outlier removal, transductive
      - Distances between points
 
    * - :ref:`Gaussian mixtures <mixture>`
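The "outlier removal" use case added to the DBSCAN and OPTICS rows above refers to these estimators labeling points that belong to no dense region as noise. A minimal sketch, assuming scikit-learn and NumPy are installed (the data points are made up for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of points plus one isolated point far from both.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # dense group A
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # dense group B
              [20.0, 20.0]])                        # isolated outlier

# Points with no dense neighborhood receive the noise label -1.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)  # the isolated last point is labeled -1 (noise)
```

Filtering on `labels != -1` then drops the detected outliers, which is the "outlier removal" behavior the table describes.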
@@ -203,9 +204,9 @@ initializations of the centroids. One method to help address this issue is the
 k-means++ initialization scheme, which has been implemented in scikit-learn
 (use the ``init='k-means++'`` parameter). This initializes the centroids to be
 (generally) distant from each other, leading to probably better results than
-random initialization, as shown in the reference.
+random initialization, as shown in the reference.
 
-K-means++ can also be called independently to select seeds for other
+K-means++ can also be called independently to select seeds for other
 clustering algorithms, see :func:`sklearn.cluster.kmeans_plusplus` for details
 and example usage.
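The standalone `sklearn.cluster.kmeans_plusplus` call the hunk above points to can be sketched as follows, assuming scikit-learn >= 0.24 (where the function was introduced); the blob data is synthetic:

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

rng = np.random.RandomState(0)
# Two well-separated blobs of 2-D points.
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(10, 1, (50, 2))])

# k-means++ picks seeds that are (generally) distant from each other,
# so the two seeds tend to land in different blobs.
centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)
print(centers.shape)  # (2, 2): one seed per requested cluster
```

The returned `indices` give the positions of the chosen seeds in `X`, so the seeds can be fed to another clustering algorithm as initial centers.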
@@ -1383,7 +1384,7 @@ more broadly common names.
 
 * `Wikipedia entry for the Adjusted Mutual Information
   <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_
-
+
 .. [VEB2009] Vinh, Epps, and Bailey, (2009). "Information theoretic measures
    for clusterings comparison". Proceedings of the 26th Annual International
    Conference on Machine Learning - ICML '09.
@@ -1394,13 +1395,13 @@ more broadly common names.
    Clusterings Comparison: Variants, Properties, Normalization and
    Correction for Chance". JMLR
    <http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf>
-
+
 .. [YAT2016] Yang, Algesheimer, and Tessone, (2016). "A comparative analysis of
    community
    detection algorithms on artificial networks". Scientific Reports 6: 30750.
    `doi:10.1038/srep30750 <https://www.nature.com/articles/srep30750>`_.
-
-
+
+
 
 .. _homogeneity_completeness:
@@ -1738,8 +1739,8 @@ Calinski-Harabasz Index
 
 If the ground truth labels are not known, the Calinski-Harabasz index
-(:func:`sklearn.metrics.calinski_harabasz_score`) - also known as the Variance
-Ratio Criterion - can be used to evaluate the model, where a higher
+(:func:`sklearn.metrics.calinski_harabasz_score`) - also known as the Variance
+Ratio Criterion - can be used to evaluate the model, where a higher
 Calinski-Harabasz score relates to a model with better defined clusters.
 
 The index is the ratio of the sum of between-clusters dispersion and of
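Computing the Calinski-Harabasz score described in the hunk above only needs the data and fitted labels. A minimal sketch, assuming scikit-learn is installed (the blob dataset and cluster count are made up for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Synthetic data with three well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit any clustering model; the score needs no ground-truth labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Higher values indicate better-separated, denser clusters.
score = calinski_harabasz_score(X, labels)
print(score)
```

Because the score rewards dense, well-separated clusters, comparing it across several candidate values of `n_clusters` is a common way to pick one when labels are unavailable.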