
2/14/23, 12:29 PM Hierarchical Clustering

Hierarchical Clustering

Hierarchical clustering builds clusters with a predetermined ordering from top to bottom. For example, all files and folders on a hard disk are organized in a hierarchy. There are two types of hierarchical clustering: divisive and agglomerative.

Divisive method
In the divisive, or top-down, clustering method we assign all of the observations to a single cluster and then partition that cluster into the two least similar clusters using a flat clustering method (e.g., k-means). We proceed recursively on each cluster until there is one cluster per observation. There is evidence that divisive algorithms produce more accurate hierarchies than agglomerative algorithms in some circumstances, but they are conceptually more complex.
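The recursion above can be sketched with a small helper; the function and sample points below are hypothetical, and scikit-learn's KMeans stands in for the flat clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(points):
    """Repeatedly split the largest cluster in two until all are singletons."""
    clusters = [list(range(len(points)))]  # start: one cluster holding every index
    splits = []
    while any(len(c) > 1 for c in clusters):
        c = max(clusters, key=len)         # pick the largest splittable cluster
        clusters.remove(c)
        # Flat clustering step: 2-means on the members of this cluster.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points[c])
        left = [i for i, l in zip(c, labels) if l == 0]
        right = [i for i, l in zip(c, labels) if l == 1]
        clusters += [left, right]
        splits.append((left, right))
    return splits

pts = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], float)
history = divisive(pts)
```

Each entry of `history` records one top-down split, so reading it in order reproduces the hierarchy from the root down.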
 
Agglomerative method
In the agglomerative, or bottom-up, clustering method we:
1. Assign each observation to its own cluster.
2. Compute the similarity (e.g., distance) between each pair of clusters.
3. Join the two most similar clusters.
4. Repeat steps 2 and 3 until only a single cluster remains.
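The steps above can be sketched with SciPy's `linkage` function; the four sample points are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four hypothetical points: two tight pairs far apart.
points = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0], [5.5, 5.5]])

# Each row of Z records one merge: the two cluster indices joined,
# the distance between them, and the size of the new cluster.
Z = linkage(points, method="average", metric="euclidean")
```

With n observations there are always n - 1 merges, so Z has n - 1 rows; the last row is the merge that leaves a single cluster.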

https://www.saedsayad.com/clustering_hierarchical.htm 1/8

Before any clustering is performed, we must compute the proximity matrix, which contains the distance between every pair of points according to a distance function. As clusters are merged, the matrix is updated to hold the distance between each pair of clusters. The following three methods differ in how this inter-cluster distance is measured.
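As a sketch, the proximity matrix can be built with SciPy; the three sample points are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# pdist returns the condensed upper triangle; squareform expands it
# into the full symmetric 3x3 proximity matrix.
D = squareform(pdist(pts, metric="euclidean"))
```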
 
Single Linkage
In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points, one in each cluster. For example, the distance between clusters “r” and “s” is the distance between their two closest points.

Complete Linkage
In complete linkage hierarchical clustering, the distance between two clusters is defined as the longest distance between two points, one in each cluster. For example, the distance between clusters “r” and “s” is the distance between their two furthest points.

Average Linkage
In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance from each point in one cluster to every point in the other cluster. For example, the distance between clusters “r” and “s” is the average length of the lines connecting each point of one cluster to each point of the other.
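The three definitions can be compared directly in code; the clusters "r" and "s" below are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

r = np.array([[0.0, 0.0], [1.0, 0.0]])  # cluster "r"
s = np.array([[4.0, 0.0], [6.0, 0.0]])  # cluster "s"

D = cdist(r, s)        # all pairwise distances between the two clusters
single = D.min()       # single linkage: closest pair
complete = D.max()     # complete linkage: furthest pair
average = D.mean()     # average linkage: mean over all pairs
```

For these points the pairwise distances are 4, 6, 3, and 5, so single, complete, and average linkage give 3, 6, and 4.5 respectively.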

Example: Clustering the following 7 data points.
 
     X1   X2
A    10    5
B     1    4
C     5    8
D     9    2
E    12   10
F    15    8
G     7    7

 
Step 1: Calculate the distance between every pair of data points using the Euclidean distance function. The shortest distance is between data points C and G.

      A      B      C      D      E      F
B    9.06
C    5.83   5.66
D    3.16   8.25   7.21
E    5.39  12.53   7.28   8.54
F    5.83  14.56  10.00   8.49   3.61
G    3.61   6.71   2.24   5.39   5.83   8.06
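Step 1 can be checked with SciPy using the seven points from the table:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

labels = list("ABCDEFG")
pts = np.array([[10, 5], [1, 4], [5, 8], [9, 2], [12, 10], [15, 8], [7, 7]], float)

D = squareform(pdist(pts))      # 7x7 Euclidean distance matrix
np.fill_diagonal(D, np.inf)     # mask the zero self-distances
i, j = np.unravel_index(np.argmin(D), D.shape)
closest = {labels[i], labels[j]}   # the pair with the shortest distance
```

The minimum is d(C, G) = √5 ≈ 2.24, matching the table.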

 
Step 2: Merge C and G into a single cluster, then use "average linkage" to measure the distance between the new "C,G" cluster and the remaining data points.

       A      B     C,G     D      E
B     9.06
C,G   4.72   6.10
D     3.16   8.25   6.26
E     5.39  12.53   6.50   8.54
F     5.83  14.56   9.01   8.49   3.61
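The first entry of the updated matrix can be verified by hand: under average linkage, the distance from {C, G} to A is the mean of d(C, A) and d(G, A).

```python
import math

A, C, G = (10, 5), (5, 8), (7, 7)

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Average linkage: mean of the point-to-point distances.
d_CG_A = (dist(C, A) + dist(G, A)) / 2
```

This gives (5.83 + 3.61) / 2 ≈ 4.72, the value in the table.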
Step 3:

       A,D     B     C,G     E
B     8.51
C,G   5.32   6.10
E     6.96  12.53   6.50
F     7.11  14.56   9.01   3.61

 
Step 4:

       A,D     B     C,G
B     8.51
C,G   5.32   6.10
E,F   6.80  13.46   7.65

 
Step 5:

      A,D,C,G     B
B        6.91
E,F      6.73   13.46
 
Step 6:

     A,D,C,G,E,F
B        9.07

 
Final dendrogram:
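The whole run can be reproduced with SciPy as a sketch; note that small rounding differences in the hand-computed tables above may shift the later merge distances slightly.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

pts = np.array([[10, 5], [1, 4], [5, 8], [9, 2], [12, 10], [15, 8], [7, 7]], float)

# Rows of Z: (cluster i, cluster j, merge distance, new cluster size).
Z = linkage(pts, method="average")

# dendrogram(Z, labels=list("ABCDEFG")) draws the tree with matplotlib.
```

The first three merges match the worked example: C with G, then A with D, then E with F.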

 
 

Exercise
 
