@@ -1344,18 +1344,20 @@ mean of homogeneity and completeness**:
Fowlkes-Mallows scores
----------------------

- The Fowlkes-Mallows index (FMI) is defined as the geometric mean of
- the pairwise precision and recall::
+ The Fowlkes-Mallows index (:func:`sklearn.metrics.fowlkes_mallows_score`) can be
+ used when the ground truth class assignments of the samples are known. The
+ Fowlkes-Mallows score FMI is defined as the geometric mean of the
+ pairwise precision and recall:

-   FMI = TP / sqrt((TP + FP) * (TP + FN))
+ .. math:: \text{FMI} = \frac{\text{TP}}{\sqrt{(\text{TP} + \text{FP}) (\text{TP} + \text{FN})}}

- Where :math:`TP` is the number of **True Positive** (i.e. the number of pair
- of points that belong to the same clusters in both labels_true and
- labels_pred), :math:`FP` is the number of **False Positive** (i.e. the number
- of pair of points that belong to the same clusters in labels_true and not
- in labels_pred) and :math:`FN` is the number of **False Negative** (i.e the
- number of pair of points that belongs in the same clusters in labels_pred
- and not in labels_True).
+ Where ``TP`` is the number of **True Positives** (i.e. the number of pairs
+ of points that belong to the same cluster in both the true labels and the
+ predicted labels), ``FP`` is the number of **False Positives** (i.e. the number
+ of pairs of points that belong to the same cluster in the true labels and not
+ in the predicted labels) and ``FN`` is the number of **False Negatives** (i.e. the
+ number of pairs of points that belong to the same cluster in the predicted
+ labels and not in the true labels).

The score ranges from 0 to 1. A high value indicates a good similarity
between two clusters.
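The pair-counting definition added in this hunk can be sanity-checked by brute force over all pairs of samples. A minimal sketch, assuming a scikit-learn version that ships `fowlkes_mallows_score` (introduced around 0.18); the two labelings are toy data chosen for illustration, not from the source:

```python
# Brute-force check of the FMI definition above.
# Assumes scikit-learn provides fowlkes_mallows_score (added ~0.18);
# the labelings are made-up toy data for illustration.
from itertools import combinations
from math import sqrt

from sklearn.metrics import fowlkes_mallows_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

tp = fp = fn = 0
for i, j in combinations(range(len(labels_true)), 2):
    same_true = labels_true[i] == labels_true[j]
    same_pred = labels_pred[i] == labels_pred[j]
    if same_true and same_pred:
        tp += 1  # pair grouped together in both labelings
    elif same_pred:
        fp += 1  # grouped together in the prediction only
    elif same_true:
        fn += 1  # grouped together in the ground truth only

fmi = tp / sqrt((tp + fp) * (tp + fn))
print(round(fmi, 4))  # 0.4714
assert abs(fmi - fowlkes_mallows_score(labels_true, labels_pred)) < 1e-12
```

Here TP = 2, FP = 1 and FN = 4, giving FMI = 2 / sqrt(3 * 6), which matches the library's contingency-matrix implementation.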
@@ -1505,24 +1507,28 @@ Calinski-Harabaz Index
----------------------

If the ground truth labels are not known, the Calinski-Harabaz index
- (:func:'sklearn.metrics.calinski_harabaz_score') can be used to evaluate the
+ (:func:`sklearn.metrics.calinski_harabaz_score`) can be used to evaluate the
model, where a higher Calinski-Harabaz score relates to a model with better
defined clusters.

- For :math:`k` clusters, the Calinski-Harabaz :math:`ch` is given as the ratio
- of the between-clusters dispersion mean and the within-cluster dispersion:
+ For :math:`k` clusters, the Calinski-Harabaz score :math:`s` is given as the
+ ratio of the between-clusters dispersion mean and the within-cluster
+ dispersion:

.. math::
-   ch(k) = \frac{trace(B_k)}{trace(W_k)} \times \frac{N - k}{k - 1}
-   W_k = \sum_{q=1}^k \sum_{x \in C_q} (x - c_q) (x - c_q)^T \\
-   B_k = \sum_q n_q (c_q - c) (c_q - c)^T \\
+   s(k) = \frac{\mathrm{Tr}(B_k)}{\mathrm{Tr}(W_k)} \times \frac{N - k}{k - 1}

- where:
- - :math:`N` be the number of points in our data,
- - :math:`C_q` be the set of points in cluster :math:`q`,
- - :math:`c_q` be the center of cluster :math:`q`,
- - :math:`c` be the center of :math:`E`,
- - :math:`n_q` be the number of points in cluster :math:`q`:
+ where :math:`B_k` is the between-group dispersion matrix and :math:`W_k`
+ is the within-cluster dispersion matrix defined by:
+
+ .. math:: W_k = \sum_{q=1}^k \sum_{x \in C_q} (x - c_q) (x - c_q)^T
+
+ .. math:: B_k = \sum_q n_q (c_q - c) (c_q - c)^T
+
+ with :math:`N` the number of points in our data, :math:`C_q` the set of
+ points in cluster :math:`q`, :math:`c_q` the center of cluster
+ :math:`q`, :math:`c` the center of :math:`E`, and :math:`n_q` the number of
+ points in cluster :math:`q`.

>>> from sklearn import metrics
@@ -1539,8 +1545,7 @@ cluster analysis.
>>> from sklearn.cluster import KMeans
>>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
>>> labels = kmeans_model.labels_
- >>> metrics.calinski_harabaz_score(X, labels)
- ... # doctest: +ELLIPSIS
+ >>> metrics.calinski_harabaz_score(X, labels)  # doctest: +ELLIPSIS
560.39...
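The :math:`s(k)` formula introduced in the second hunk can be spelled out directly against scikit-learn's implementation. A minimal sketch, assuming NumPy and scikit-learn are installed; note the function debuted as ``calinski_harabaz_score`` and was renamed ``calinski_harabasz_score`` in later releases, so both spellings are tried. The two well-separated blobs are toy data, not from the source:

```python
# Sketch of s(k) = (Tr(B_k) / Tr(W_k)) * (N - k) / (k - 1),
# checked against scikit-learn. The metric was introduced as
# calinski_harabaz_score and renamed calinski_harabasz_score later,
# so we try both import spellings.
import numpy as np

try:
    from sklearn.metrics import calinski_harabasz_score as ch_score
except ImportError:
    from sklearn.metrics import calinski_harabaz_score as ch_score

# Toy data: two tight, well-separated clusters of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

N, k = len(X), len(np.unique(labels))
c = X.mean(axis=0)  # center of the whole data set E
tr_W = tr_B = 0.0
for q in np.unique(labels):
    X_q = X[labels == q]
    c_q = X_q.mean(axis=0)                      # center of cluster q
    tr_W += ((X_q - c_q) ** 2).sum()            # Tr(W_k) contribution
    tr_B += len(X_q) * ((c_q - c) ** 2).sum()   # Tr(B_k) contribution

s = (tr_B / tr_W) * (N - k) / (k - 1)
print(s)  # 450.0
assert np.isclose(s, ch_score(X, labels))
```

Only the traces of :math:`W_k` and :math:`B_k` are needed, so the loop accumulates squared deviations directly instead of forming the full dispersion matrices.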