@@ -12,16 +12,16 @@ Clustering: grouping observations together
 **clustering task**: split the observations into well-separated groups
 called *clusters*.

-..
-   >>> # Set the PRNG
+..
+   >>> # Set the PRNG
    >>> import numpy as np
    >>> np.random.seed(1)

 K-means clustering
 -------------------

 Note that there exist a lot of different clustering criteria and associated
-algorithms. The simplest clustering algorithm is
+algorithms. The simplest clustering algorithm is
 :ref:`k_means`.

 .. image:: ../../auto_examples/cluster/images/plot_cluster_iris_002.png
@@ -30,7 +30,7 @@ algorithms. The simplest clustering algorithm is
    :align: right


-::
+::

     >>> from sklearn import cluster, datasets
     >>> iris = datasets.load_iris()
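The hunk ends before the estimator is actually fitted. A minimal sketch of how the example presumably continues; the `n_clusters=3` value (one cluster per iris species) is an assumption, not shown in the diff::

    >>> X_iris = iris.data                       # (150, 4) measurements
    >>> k_means = cluster.KMeans(n_clusters=3)   # assumed cluster count
    >>> k_means.fit(X_iris)                      # doctest: +ELLIPSIS
    KMeans(...)
    >>> k_means.labels_[::10]                    # doctest: +SKIP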
@@ -57,30 +57,30 @@ algorithms. The simplest clustering algorithm is
    :target: ../../auto_examples/cluster/plot_cluster_iris.html
    :scale: 63

-.. warning::
-
+.. warning::
+
     There is absolutely no guarantee of recovering a ground truth. First,
     choosing the right number of clusters is hard. Second, the algorithm
     is sensitive to initialization, and can fall into local minima,
     although scikit-learn employs several tricks to mitigate this issue.

 .. list-table::
     :class: centered
-
-    *
-
+
+    *
+
         - |k_means_iris_bad_init|

         - |k_means_iris_8|

         - |cluster_iris_truth|

-    *
-
+    *
+
         - **Bad initialization**
-
+
         - **8 clusters**
-
+
         - **Ground truth**

 **Don't over-interpret clustering results**
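As a note on the "several tricks" the warning mentions: the standard mitigation in scikit-learn's `KMeans` is to rerun the algorithm from several initializations and keep the lowest-inertia solution. A hedged one-liner, with illustrative parameter values not taken from the diff::

    >>> # n_init restarts guard against bad local minima; init='k-means++'
    >>> # spreads the initial centers (this is the default strategy).
    >>> k_means = cluster.KMeans(n_clusters=3, init='k-means++', n_init=10)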
@@ -105,8 +105,8 @@ algorithms. The simplest clustering algorithm is

 Clustering in general and KMeans, in particular, can be seen as a way
 of choosing a small number of exemplars to compress the information.
-The problem is sometimes known as
-`vector quantization <http://en.wikipedia.org/wiki/Vector_quantization>`_.
+The problem is sometimes known as
+`vector quantization <http://en.wikipedia.org/wiki/Vector_quantization>`_.
 For instance, this can be used to posterize an image::

     >>> import scipy as sp
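The diff skips from this import straight to `lena_compressed.shape = lena.shape` in the next hunk. A sketch of the elided quantization steps, assuming a 2-D grayscale array `lena` (`scipy.lena()` has since been removed from SciPy, so a random stand-in is used here) and an assumed palette of 5 clusters::

    >>> lena = np.random.randint(0, 256, (64, 64)).astype(float)  # stand-in image
    >>> X = lena.reshape((-1, 1))                  # one sample per pixel value
    >>> k_means = cluster.KMeans(n_clusters=5, n_init=1)
    >>> k_means.fit(X)                             # doctest: +ELLIPSIS
    KMeans(...)
    >>> values = k_means.cluster_centers_.squeeze()
    >>> labels = k_means.labels_
    >>> lena_compressed = np.choose(labels, values)  # pixel -> its cluster centre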
@@ -125,7 +125,7 @@ algorithms. The simplest clustering algorithm is
     >>> lena_compressed.shape = lena.shape

 .. list-table::
-    :class: centered
+    :class: centered

     *
         - |lena|
@@ -275,8 +275,7 @@ data by projecting on a principal subspace.
     >>> from sklearn import decomposition
     >>> pca = decomposition.PCA()
     >>> pca.fit(X)
-    PCA(copy=True, iterated_power=3, n_components=None, random_state=None,
-      svd_solver='auto', tol=0.0, whiten=False)
+    PCA(copy=True, n_components=None, whiten=False)
     >>> print(pca.explained_variance_)  # doctest: +SKIP
     [  2.18565811e+00   1.19346747e+00   8.43026679e-32]

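`X` itself is defined outside this hunk. In the surrounding tutorial it is presumably a 3-feature dataset in which one feature is a linear combination of the other two, which is exactly why the third explained variance above is numerically zero. A sketch under that assumption::

    >>> x1 = np.random.normal(size=100)
    >>> x2 = np.random.normal(size=100)
    >>> x3 = x1 + x2                      # redundant: lies in the span of x1, x2
    >>> X = np.c_[x1, x2, x3]
    >>> # Only two directions carry variance, so project to a 2-D subspace:
    >>> pca.n_components = 2
    >>> X_reduced = pca.fit_transform(X)
    >>> X_reduced.shape
    (100, 2)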
@@ -321,3 +320,4 @@ a maximum amount of independent information. It is able to recover
     >>> A_ = ica.mixing_.T
     >>> np.allclose(X, np.dot(S_, A_) + ica.mean_)
     True
+
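For context, `ica`, `S_`, and `X` come from the part of the ICA example this hunk omits. A minimal sketch of the usual source-separation setup; the sources and mixing matrix are illustrative assumptions::

    >>> # Two independent sources, mixed by an assumed matrix A
    >>> time = np.linspace(0, 10, 2000)
    >>> s1 = np.sin(2 * time)                   # sinusoidal source
    >>> s2 = np.sign(np.sin(3 * time))          # square-wave source
    >>> S = np.c_[s1, s2]
    >>> A = np.array([[1.0, 1.0], [0.5, 2.0]])  # mixing matrix (assumed)
    >>> X = np.dot(S, A.T)                      # observed mixtures
    >>> ica = decomposition.FastICA()
    >>> S_ = ica.fit_transform(X)               # estimated independent sources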