DOC update documentation for DBSCAN and OPTICS (#21343) · scikit-learn/scikit-learn@958ccc5 · GitHub

Commit 958ccc5

DOC update documentation for DBSCAN and OPTICS (#21343)
1 parent a3f09ea commit 958ccc5

File tree

1 file changed: +11 -10 lines changed


doc/modules/clustering.rst

Lines changed: 11 additions & 10 deletions
@@ -89,14 +89,15 @@ Overview of clustering methods
    * - :ref:`DBSCAN <dbscan>`
      - neighborhood size
      - Very large ``n_samples``, medium ``n_clusters``
-     - Non-flat geometry, uneven cluster sizes, transductive
+     - Non-flat geometry, uneven cluster sizes, outlier removal,
+       transductive
      - Distances between nearest points
 
    * - :ref:`OPTICS <optics>`
      - minimum cluster membership
      - Very large ``n_samples``, large ``n_clusters``
      - Non-flat geometry, uneven cluster sizes, variable cluster density,
-       transductive
+       outlier removal, transductive
      - Distances between points
 
    * - :ref:`Gaussian mixtures <mixture>`
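
The "outlier removal" use case added in this hunk refers to both estimators labeling noise samples with ``-1``. A minimal sketch, not part of the commit; the toy data and parameter values are illustrative::

    import numpy as np
    from sklearn.cluster import DBSCAN, OPTICS

    # Two tight blobs plus one far-away point that should be flagged as noise.
    X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
                  [8.0, 8.0], [8.1, 7.9], [7.9, 8.1],
                  [50.0, 50.0]])

    db = DBSCAN(eps=0.5, min_samples=2).fit(X)
    # OPTICS with its DBSCAN-style label extraction, so the two runs are comparable.
    opt = OPTICS(min_samples=2, cluster_method="dbscan", eps=0.5).fit(X)

    # Noise points (the outliers being "removed") receive the label -1.
    print(db.labels_)   # expected: [ 0  0  0  1  1  1 -1]
    print(opt.labels_)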
@@ -203,9 +204,9 @@ initializations of the centroids. One method to help address this issue is the
 k-means++ initialization scheme, which has been implemented in scikit-learn
 (use the ``init='k-means++'`` parameter). This initializes the centroids to be
 (generally) distant from each other, leading to probably better results than
-random initialization, as shown in the reference.
+random initialization, as shown in the reference.
 
-K-means++ can also be called independently to select seeds for other
+K-means++ can also be called independently to select seeds for other
 clustering algorithms, see :func:`sklearn.cluster.kmeans_plusplus` for details
 and example usage.
 
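For context on the standalone usage this hunk mentions, here is a short sketch of :func:`sklearn.cluster.kmeans_plusplus`; the blob data is made up for illustration, not taken from the commit::

    import numpy as np
    from sklearn.cluster import kmeans_plusplus

    rng = np.random.RandomState(0)
    # 100 made-up samples drawn around two offset centers.
    X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                   rng.normal(5.0, 1.0, size=(50, 2))])

    # Select 2 well-spread seeds; returns the seed coordinates and their row indices.
    centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)
    print(centers)   # two rows of X, generally far apart
    print(indices)   # positions of those rows in X

The returned seeds can then be handed to another clustering algorithm, e.g. as ``init=centers`` for :class:`~sklearn.cluster.KMeans`.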
@@ -1383,7 +1384,7 @@ more broadly common names.
 
 * `Wikipedia entry for the Adjusted Mutual Information
   <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_
-
+
 .. [VEB2009] Vinh, Epps, and Bailey, (2009). "Information theoretic measures
    for clusterings comparison". Proceedings of the 26th Annual International
    Conference on Machine Learning - ICML '09.
@@ -1394,13 +1395,13 @@ more broadly common names.
    Clusterings Comparison: Variants, Properties, Normalization and
    Correction for Chance". JMLR
    <http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf>
-
+
 .. [YAT2016] Yang, Algesheimer, and Tessone, (2016). "A comparative analysis of
    community
    detection algorithms on artificial networks". Scientific Reports 6: 30750.
    `doi:10.1038/srep30750 <https://www.nature.com/articles/srep30750>`_.
-
-
+
+
 
 .. _homogeneity_completeness:
 

@@ -1738,8 +1739,8 @@ Calinski-Harabasz Index
 
 
 If the ground truth labels are not known, the Calinski-Harabasz index
-(:func:`sklearn.metrics.calinski_harabasz_score`) - also known as the Variance
-Ratio Criterion - can be used to evaluate the model, where a higher
+(:func:`sklearn.metrics.calinski_harabasz_score`) - also known as the Variance
+Ratio Criterion - can be used to evaluate the model, where a higher
 Calinski-Harabasz score relates to a model with better defined clusters.
 
 The index is the ratio of the sum of between-clusters dispersion and of
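
As a toy illustration of the score this last hunk touches (the data and parameters are invented for the sketch, not part of the commit)::

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import calinski_harabasz_score

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                   rng.normal(5.0, 0.5, size=(50, 2))])

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # Higher score = denser, better-separated clusters (the Variance Ratio Criterion).
    print(calinski_harabasz_score(X, labels))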

0 commit comments
