[go: up one dir, main page]

0% found this document useful (0 votes)
81 views9 pages

How To Run Cluster Analysis in Excel

The document outlines the steps for performing K-means clustering analysis, starting from data preparation to calculating cluster means and repeating the process until minimal improvement in SSE is achieved. It includes specific case data and calculations for cluster assignments and distances. The final output presents the means for three segments along with the number of respondents and their respective SSE values.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
81 views9 pages

How To Run Cluster Analysis in Excel

The document outlines the steps for performing K-means clustering analysis, starting from data preparation to calculating cluster means and repeating the process until minimal improvement in SSE is achieved. It includes specific case data and calculations for cluster assignments and distances. The final output presents the means for three segments along with the number of respondents and their respective SSE values.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
You are on page 1/ 9

How to run cluster an

K-means cluste
Geoff Fripp
Marketing Lecturer, The Universi

STEP ONE - Start with your data set STEP TWO - If two var

Case X Y Z
1 4.40 4.57 2.29
2 3.25 3.92 2.17
3 3.10 4.25 2.40
4 4.83 4.31 2.16
5 3.63 3.60 1.67
Start1 6 3.26 1.64 1.48
7 4.89 1.33 1.04
8 4.50 2.01 1.28
Start2 9 4.99 2.47 2.60
10 4.12 2.12 1.70
11 2.21 2.51 4.17
12 2.97 4.10 3.92
13 2.40 2.45 4.46
14 2.10 3.30 4.90
Start3 15 1.13 2.05 3.28

Min 1.13 1.33 1.04


Max 4.99 4.57 4.90
Median 3.26 2.51 2.29
STEP THREE - Work out sum of squares distance and first allocation
1 2 3
Case Start1 Start2 Start3 Min Initial Choice
1 10.54 4.84 18.07 4.84 2
2 5.64 5.30 9.19 5.30 2
3 7.65 6.75 9.50 6.75 2
4 10.01 3.58 20.05 3.58 2
5 4.00 3.97 11.25 3.97 2
6 - 4.91 7.95 - 1
7 2.94 3.75 19.67 2.94 1
8 1.72 2.18 15.40 1.72 1
9 4.91 - 15.54 - 2
10 1.02 1.67 11.47 1.02 1
11 9.09 10.19 2.18 2.18 3
12 12.10 8.50 8.02 8.02 3
13 10.28 10.19 3.17 3.17 3
14 15.79 14.31 5.15 5.15 3
15 7.95 15.54 - - 3

SSE 48.64

STEP FOUR - Calculate means of each current cluster

Cluster 1 Cluster 2
Current Case X Y Z X Y Z
2 1 4.40 4.57 2.29
2 2 3.25 3.92 2.17
2 3 3.10 4.25 2.40
2 4 4.83 4.31 2.16
2 5 3.63 3.60 1.67
1 6 3.26 1.64 1.48
1 7 4.89 1.33 1.04
1 8 4.50 2.01 1.28
2 9 4.99 2.47 2.60
1 10 4.12 2.12 1.70
3 11
3 12
3 13
3 14
3 15

Mean 4.19 1.77 1.37 4.03 3.85 2.21

Set as named new ranges

STEP FIVE - Repeat step three - with new cluster means above
1 2 3

Case Cluster 1 Cluster 2 Cluster 3 Min Revised Choice


1 8.71 0.66 11.33 0.66 2
2 6.12 0.63 6.14 0.63 2
3 8.35 1.06 5.82 1.06 2
4 7.42 0.84 13.11 0.84 2
5 3.73 0.53 8.82 0.53 2
6 0.90 6.01 9.85 0.90 1
7 0.79 8.49 19.52 0.79 1
8 0.16 4.50 14.48 0.16 1
9 2.61 2.96 10.56 2.61 1
10 0.23 3.26 10.40 0.23 1
11 12.29 8.96 0.14 0.14 3
12 13.42 4.12 2.19 2.19 3
13 13.21 9.70 0.34 0.34 3
14 19.14 11.26 0.75 0.75 3
15 13.10 12.82 2.52 2.52 3

SSE 14.35

STEP SIX+ - Repeat steps four and five - until SSE only has minor impro
Mean/Centroid

Segment 1
Segment 2
Segment 3
AVERAGE
Respondents
Segment 1
Segment 2
Segment 3
TOTAL
uster analysis in Excel
eans clustering
Geoff Fripp
cturer, The University of Sydney

STEP TWO - If two variables, run a visual check with a scatter graph
e and first allocation

Use =SUMXMY2

Case X Y Z
1 4.40 4.57 2.29
Start1 6 3.26 1.64 1.48
Difference 1.14 2.93 0.81
Sqaured 1.31 8.58 0.66 10.54

Case 1 =IF(G45=D45,D$43,IF(G45=E45,E$43,IF(G45=F45,F$43,"")))

Check
Cluster 1 4
Cluster 2 6
Cluster 3 5
TOTAL 15

Cluster 3
X Y Z
Case 1
Cluster 1 =IF($B66=1,D22,"")
Cluster 2 =IF($B66=2,D22,"")
Cluster 3 =IF($B66=3,D22,"")

2.21 2.51 4.17


2.97 4.10 3.92
2.40 2.45 4.46
2.10 3.30 4.90
1.13 2.05 3.28

2.16 2.88 4.15

r means above
Use =SUMXMY2

Case X Y Z
1 4.40 4.57 2.29
Cluster 1 6 4.19 1.77 1.37
Difference 0.21 2.80 0.92
Sqaured 0.04 7.82 0.84 8.71

Case 1 =IF(G45=D45,D$43,IF(G45=E45,E$43,IF(G45=F45,F$43,"")))

Check
Cluster 1 5
Cluster 2 5
Cluster 3 5
TOTAL 15

E only has minor improvement


Output for THREE Clusters/Segments
Mean/Centroid X Y Z Variable 4 Variable 5

Segment 1 4.35 1.91 1.62


Segment 2 3.84 4.13 2.14
Segment 3 2.16 2.88 4.15
AVERAGE 3.45 2.97 2.63
Respondents Number % SSE/Segment
Segment 1 5 33.3% 4.2
Segment 2 5 33.3% 3.1 SSE Total 13.2
Segment 3 5 33.3% 5.9
TOTAL 15 100.0%
ters/Segments
Variable 5 Variable 6 Variable 7 Variable 8

otal 13.2

You might also like