How to run cluster an
K-means cluste
Geoff Fripp
Marketing Lecturer, The Universi
STEP ONE - Start with your data set STEP TWO - If two var
Case X Y Z
1 4.40 4.57 2.29
2 3.25 3.92 2.17
3 3.10 4.25 2.40
4 4.83 4.31 2.16
5 3.63 3.60 1.67
Start1 6 3.26 1.64 1.48
7 4.89 1.33 1.04
8 4.50 2.01 1.28
Start2 9 4.99 2.47 2.60
10 4.12 2.12 1.70
11 2.21 2.51 4.17
12 2.97 4.10 3.92
13 2.40 2.45 4.46
14 2.10 3.30 4.90
Start3 15 1.13 2.05 3.28
Min 1.13 1.33 1.04
Max 4.99 4.57 4.90
Median 3.26 2.51 2.29
STEP THREE - Work out sum of squares distance and first allocation
1 2 3
Case Start1 Start2 Start3 Min Initial Choice
1 10.54 4.84 18.07 4.84 2
2 5.64 5.30 9.19 5.30 2
3 7.65 6.75 9.50 6.75 2
4 10.01 3.58 20.05 3.58 2
5 4.00 3.97 11.25 3.97 2
6 - 4.91 7.95 - 1
7 2.94 3.75 19.67 2.94 1
8 1.72 2.18 15.40 1.72 1
9 4.91 - 15.54 - 2
10 1.02 1.67 11.47 1.02 1
11 9.09 10.19 2.18 2.18 3
12 12.10 8.50 8.02 8.02 3
13 10.28 10.19 3.17 3.17 3
14 15.79 14.31 5.15 5.15 3
15 7.95 15.54 - - 3
SSE 48.64
STEP FOUR - Calculate means of each current cluster
Cluster 1 Cluster 2
Current Case X Y Z X Y Z
2 1 4.40 4.57 2.29
2 2 3.25 3.92 2.17
2 3 3.10 4.25 2.40
2 4 4.83 4.31 2.16
2 5 3.63 3.60 1.67
1 6 3.26 1.64 1.48
1 7 4.89 1.33 1.04
1 8 4.50 2.01 1.28
2 9 4.99 2.47 2.60
1 10 4.12 2.12 1.70
3 11
3 12
3 13
3 14
3 15
Mean 4.19 1.77 1.37 4.03 3.85 2.21
Set as named new ranges
STEP FIVE - Repeat step three - with new cluster means above
1 2 3
Case Cluster 1 Cluster 2 Cluster 3 Min Revised Choice
1 8.71 0.66 11.33 0.66 2
2 6.12 0.63 6.14 0.63 2
3 8.35 1.06 5.82 1.06 2
4 7.42 0.84 13.11 0.84 2
5 3.73 0.53 8.82 0.53 2
6 0.90 6.01 9.85 0.90 1
7 0.79 8.49 19.52 0.79 1
8 0.16 4.50 14.48 0.16 1
9 2.61 2.96 10.56 2.61 1
10 0.23 3.26 10.40 0.23 1
11 12.29 8.96 0.14 0.14 3
12 13.42 4.12 2.19 2.19 3
13 13.21 9.70 0.34 0.34 3
14 19.14 11.26 0.75 0.75 3
15 13.10 12.82 2.52 2.52 3
SSE 14.35
STEP SIX+ - Repeat steps four and five - until SSE only has minor impro
Mean/Centroid
Segment 1
Segment 2
Segment 3
AVERAGE
Respondents
Segment 1
Segment 2
Segment 3
TOTAL
uster analysis in Excel
eans clustering
Geoff Fripp
cturer, The University of Sydney
STEP TWO - If two variables, run a visual check with a scatter graph
e and first allocation
Use =SUMXMY2
Case X Y Z
1 4.40 4.57 2.29
Start1 6 3.26 1.64 1.48
Difference 1.14 2.93 0.81
Sqaured 1.31 8.58 0.66 10.54
Case 1 =IF(G45=D45,D$43,IF(G45=E45,E$43,IF(G45=F45,F$43,"")))
Check
Cluster 1 4
Cluster 2 6
Cluster 3 5
TOTAL 15
Cluster 3
X Y Z
Case 1
Cluster 1 =IF($B66=1,D22,"")
Cluster 2 =IF($B66=2,D22,"")
Cluster 3 =IF($B66=3,D22,"")
2.21 2.51 4.17
2.97 4.10 3.92
2.40 2.45 4.46
2.10 3.30 4.90
1.13 2.05 3.28
2.16 2.88 4.15
r means above
Use =SUMXMY2
Case X Y Z
1 4.40 4.57 2.29
Cluster 1 6 4.19 1.77 1.37
Difference 0.21 2.80 0.92
Sqaured 0.04 7.82 0.84 8.71
Case 1 =IF(G45=D45,D$43,IF(G45=E45,E$43,IF(G45=F45,F$43,"")))
Check
Cluster 1 5
Cluster 2 5
Cluster 3 5
TOTAL 15
E only has minor improvement
Output for THREE Clusters/Segments
Mean/Centroid X Y Z Variable 4 Variable 5
Segment 1 4.35 1.91 1.62
Segment 2 3.84 4.13 2.14
Segment 3 2.16 2.88 4.15
AVERAGE 3.45 2.97 2.63
Respondents Number % SSE/Segment
Segment 1 5 33.3% 4.2
Segment 2 5 33.3% 3.1 SSE Total 13.2
Segment 3 5 33.3% 5.9
TOTAL 15 100.0%
ters/Segments
Variable 5 Variable 6 Variable 7 Variable 8
otal 13.2