DOC Speed up plot_digits_linkage.py example #21598 (#21678) · samronsin/scikit-learn@85cd33f · GitHub

Commit 85cd33f

yarkhinephyo authored and samronsin committed
DOC Speed up plot_digits_linkage.py example scikit-learn#21598 (scikit-learn#21678)
* Reduce num of samples in plot-digit-linkage example
* Remove unnecessary random_state
* Remove nudge_images
* Address PR comment, elaborate analysis
1 parent 27735b4 commit 85cd33f

File tree

1 file changed (+11, -20 lines)

examples/cluster/plot_digits_linkage.py

Lines changed: 11 additions & 20 deletions
@@ -12,10 +12,18 @@
 
 What this example shows us is the behavior "rich getting richer" of
 agglomerative clustering that tends to create uneven cluster sizes.
+
 This behavior is pronounced for the average linkage strategy,
-that ends up with a couple of singleton clusters, while in the case
-of single linkage we get a single central cluster with all other clusters
-being drawn from noise points around the fringes.
+that ends up with a couple of clusters with few datapoints.
+
+The case of single linkage is even more pathologic with a very
+large cluster covering most digits, an intermediate size (clean)
+cluster with most zero digits and all other clusters being drawn
+from noise points around the fringes.
+
+The other linkage strategies lead to more evenly distributed
+clusters that are therefore likely to be less sensible to a
+random resampling of the dataset.
 
 """
 
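The reworded docstring in the hunk above makes empirical claims about cluster size imbalance under the different linkage strategies. As a side note, here is a minimal sketch (not part of this commit) of how one could inspect those size distributions; the embedding and parameter choices below are assumptions for illustration, not values taken from this diff.

# Illustrative sketch (not from the commit): compare cluster size
# distributions for the linkage strategies discussed in the docstring above.
# The embedding and parameters here are assumptions, not the example's
# actual values.
import numpy as np
from sklearn import datasets, manifold
from sklearn.cluster import AgglomerativeClustering

X, _ = datasets.load_digits(return_X_y=True)

# 2D spectral embedding: the example clusters an embedding, not raw pixels.
X_red = manifold.SpectralEmbedding(n_components=2).fit_transform(X)

for linkage in ("ward", "average", "complete", "single"):
    labels = AgglomerativeClustering(linkage=linkage, n_clusters=10).fit_predict(X_red)
    # Sorted cluster sizes make the "rich getting richer" imbalance visible.
    sizes = np.sort(np.bincount(labels))[::-1]
    print(f"{linkage:>8}: {sizes}")

Per the docstring, average and single linkage tend to produce one dominant cluster plus a few tiny ones, while the other strategies give a flatter size distribution.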

@@ -25,7 +33,6 @@
 from time import time
 
 import numpy as np
-from scipy import ndimage
 from matplotlib import pyplot as plt
 
 from sklearn import manifold, datasets
@@ -36,22 +43,6 @@
 np.random.seed(0)
 
 
-def nudge_images(X, y):
-    # Having a larger dataset shows more clearly the behavior of the
-    # methods, but we multiply the size of the dataset only by 2, as the
-    # cost of the hierarchical clustering methods are strongly
-    # super-linear in n_samples
-    shift = lambda x: ndimage.shift(
-        x.reshape((8, 8)), 0.3 * np.random.normal(size=2), mode="constant"
-    ).ravel()
-    X = np.concatenate([X, np.apply_along_axis(shift, 1, X)])
-    Y = np.concatenate([y, y], axis=0)
-    return X, Y
-
-
-X, y = nudge_images(X, y)
-
-
 # ----------------------------------------------------------------------
 # Visualize the clustering
 def plot_clustering(X_red, labels, title=None):
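The removed nudge_images helper doubled the dataset by appending a randomly shifted copy of each 8x8 digit image, and its own comment notes that hierarchical clustering cost is strongly super-linear in n_samples; dropping it (together with the now-unused scipy.ndimage import above) is where the speedup comes from. Below is a rough sketch, not taken from the commit, of how one might observe that scaling; the sample counts and linkage choice are assumptions.

# Illustrative sketch (not from the commit): time AgglomerativeClustering on
# n and 2n samples to see the super-linear growth mentioned in the removed
# comment. Sample counts and linkage are assumptions.
from time import time

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.RandomState(0)

for n_samples in (1_000, 2_000):
    X = rng.rand(n_samples, 2)
    t0 = time()
    AgglomerativeClustering(linkage="average", n_clusters=10).fit(X)
    # Doubling n_samples more than doubles the fit time, so halving the
    # dataset (by removing nudge_images) speeds up the example noticeably.
    print(f"n_samples={n_samples}: {time() - t0:.2f}s")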

0 commit comments
