Clustering

Unsupervised Learning
Clustering is the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.

Training set: {x(1), x(2), …, x(m)} (unlabeled examples; no labels y)
Different Types of clustering:
1. Hierarchical algorithms: these find successive clusters using previously established clusters.
   a. Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
   b. Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
2. Partitional clustering: partitional algorithms determine all clusters at once. They include:
   • K-means
   • Fuzzy c-means clustering
   • QT clustering algorithm
Common Distance measures:
• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
1. The Euclidean distance (also called 2-norm distance) is given by:
   d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )
2. The Manhattan distance (also called taxicab norm or 1-norm) is given by:
   d(x, y) = Σᵢ |xᵢ − yᵢ|
3. The maximum norm is given by:
   d(x, y) = maxᵢ |xᵢ − yᵢ|
4. The Mahalanobis distance corrects data for different scales and correlations in the variables:
   d(x, y) = √( (x − y)ᵀ S⁻¹ (x − y) ), where S is the covariance matrix of the data.
5. Inner product space: the angle between two vectors can be used as a distance measure when clustering high-dimensional data.
6. Hamming distance (sometimes edit distance) measures the minimum number of substitutions required to change one member into another.
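As a concrete illustration, the sketch below computes each of these measures for a pair of vectors with NumPy. The function names, example vectors, and covariance matrix are our own choices for illustration, not part of the slides.

```python
import numpy as np

def euclidean(x, y):
    # 2-norm distance
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # 1-norm (taxicab) distance
    return np.sum(np.abs(x - y))

def max_norm(x, y):
    # infinity-norm distance
    return np.max(np.abs(x - y))

def mahalanobis(x, y, S):
    # S is the covariance matrix of the data
    diff = x - y
    return np.sqrt(diff @ np.linalg.inv(S) @ diff)

def cosine_distance(x, y):
    # 1 minus the cosine of the angle between the vectors
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def hamming(a, b):
    # number of positions at which two equal-length sequences differ
    return int(np.sum(np.asarray(a) != np.asarray(b)))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
S = np.array([[2.0, 0.3], [0.3, 1.0]])             # example covariance matrix (made up)

print(euclidean(x, y))                              # 5.0
print(manhattan(x, y))                              # 7.0
print(max_norm(x, y))                               # 4.0
print(mahalanobis(x, y, S))
print(cosine_distance(x, y))
print(hamming(list("karolin"), list("kathrin")))    # 3 substitutions
```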
Cluster Centroids
[Figures: unlabeled data points and their cluster centroids]
K-means algorithm: inputs

Input:
- K (number of clusters)
- Training set {x(1), x(2), …, x(m)}

x(i) ∈ ℝⁿ (drop the x₀ = 1 convention)
K-means algorithm

Randomly initialize K cluster centroids μ1, μ2, …, μK ∈ ℝⁿ

Repeat {
    for i = 1 to m
        c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
    for k = 1 to K
        μk := average (mean) of the points assigned to cluster k
}
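A minimal NumPy sketch of the loop above (random initialization, cluster-assignment step, move-centroid step). The function name, the data matrix X, and the fixed number of iterations are illustrative assumptions rather than part of the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=10, seed=0):
    """Minimal K-means sketch: X is an (m, n) data matrix, K the number of clusters."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Randomly initialize the K cluster centroids with K distinct training examples
    mu = X[rng.choice(m, size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to X[i]
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = np.argmin(dists, axis=1)
        # Move centroid step: mu[k] = mean of the points assigned to cluster k
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    return c, mu

# Example usage with made-up data:
# c, mu = kmeans(np.random.default_rng(1).normal(size=(300, 2)), K=3)
```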
K-means optimization objective
c(i) = index of the cluster (1, 2, …, K) to which example x(i) is currently assigned
μk = cluster centroid k (μk ∈ ℝⁿ)
μc(i) = cluster centroid of the cluster to which example x(i) has been assigned
Optimization objective (distortion):
J(c(1), …, c(m), μ1, …, μK) = (1/m) Σᵢ ‖x(i) − μc(i)‖²
The algorithm minimizes J over c(1), …, c(m) and μ1, …, μK.
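For reference, the distortion J can be computed from the assignments c and centroids μ returned by the sketch above (again an illustrative helper, not taken from the slides):

```python
import numpy as np

def distortion(X, c, mu):
    # J = (1/m) * sum_i ||x(i) - mu_{c(i)}||^2
    diffs = X - mu[c]
    return np.mean(np.sum(diffs ** 2, axis=1))
```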
K-means clustering: random initialization
• Should have K < m.
• Randomly pick K training examples.
• Set μ1, …, μK equal to these K examples.
Local optima

Depending on the initialization of the cluster centroids, K-means can converge to different results (different local optima of the distortion J); this is known as the random initialization trap.
Random initialization (multiple restarts)
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c(1), …, c(m), μ1, …, μK.
    Compute the cost function (distortion) J(c(1), …, c(m), μ1, …, μK).
}
Pick the clustering that gave the lowest cost J.
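As a sketch, the restart loop above could look like the following with scikit-learn; the dataset X, the choice K = 3, and the 100 restarts are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data

best_model, best_cost = None, np.inf
for i in range(100):
    # One K-means run from a single random initialization
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=i).fit(X)
    if km.inertia_ < best_cost:            # inertia_ is the distortion (without the 1/m factor)
        best_model, best_cost = km, km.inertia_

labels = best_model.labels_
centroids = best_model.cluster_centers_
```

In practice, KMeans(n_clusters=3, n_init=100) performs these restarts internally and keeps the run with the lowest inertia.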


What is the right value of K?
Choosing the value of K
Elbow method: plot the cost function J against the number of clusters K (e.g. K = 1 to 8). As K increases, J decreases; the "elbow" of the curve, where the rate of decrease drops sharply, suggests a reasonable value of K. Often the curve has no clear elbow, in which case the method is inconclusive.

[Figures: cost function J versus K (no. of clusters), one curve with a clear elbow and one without]
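A sketch of the elbow plot using scikit-learn and matplotlib; the dataset and the range of K are assumed for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data

Ks = range(1, 9)
costs = []
for K in Ks:
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    costs.append(km.inertia_)              # distortion for this K

plt.plot(list(Ks), costs, marker="o")
plt.xlabel("K (no. of clusters)")
plt.ylabel("Cost function J")
plt.show()
```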


Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some
later/downstream purpose. Evaluate K-means based on a metric for how
well it performs for that later purpose.

E.g. T-shirt sizing: cluster customers by height and weight, and pick K according to how many sizes you want to offer (for instance K = 3 versus K = 5 sizes).

[Figures: weight versus height scatter plots, clustered for the two choices of K]
A simple example showing the implementation of the k-means algorithm (using K = 2)
Step 1:
Initialization: we randomly choose the following two centroids (K = 2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
Step 2:
• Computing the distance of each object to m1 and m2 and assigning it to the closer centroid, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}.
• Their new centroids are the means of the points assigned to each cluster.
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object, as shown in the table.
• Therefore, the new clusters are {1, 2} and {3, 4, 5, 6, 7}.
• The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).
Step 4:
• The clusters obtained are again {1, 2} and {3, 4, 5, 6, 7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm comes to a halt here, and the final result consists of 2 clusters: {1, 2} and {3, 4, 5, 6, 7}.
[Plot: the seven data points with the final two clusters and their centroids]
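The coordinates of the seven objects appeared only in a table and plot on the original slides. The sketch below re-runs the example with a commonly used set of seven points that reproduces the clusters and centroids quoted above; treat these coordinates as an assumption.

```python
import numpy as np

# Assumed coordinates for the seven objects (not present in the extracted text)
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
              [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
mu = np.array([[1.0, 1.0], [5.0, 7.0]])    # initial centroids m1, m2 (Step 1)

for step in range(3):
    # Assign each object to the nearest centroid (Euclidean distance)
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    c = np.argmin(d, axis=1)
    # Recompute the centroids as the cluster means
    mu = np.array([X[c == k].mean(axis=0) for k in range(2)])
    clusters = [(np.where(c == k)[0] + 1).tolist() for k in range(2)]
    print("iteration", step + 1, clusters, mu.round(2))

# After two iterations the clusters settle at {1, 2} and {3, 4, 5, 6, 7},
# with centroids approximately (1.25, 1.5) and (3.9, 5.1), matching Steps 3-4.
```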
Weaknesses of K-means clustering
1. When the data set is small, the initial grouping largely determines the final clusters.
2. The number of clusters, K, must be determined beforehand. The algorithm also does not yield the same result on each run, since the resulting clusters depend on the initial random assignments.
3. We never know the "true" clusters: for the same data, presenting the points in a different order may produce different clusters when the data set is small.
4. It is sensitive to the initial conditions. Different initial conditions may produce different clusterings, and the algorithm may be trapped in a local optimum.
Applications of K-means clustering
• It is relatively efficient and fast: it runs in O(tkn) time, where n is the number of objects or points, k is the number of clusters, and t is the number of iterations.
• K-means clustering is widely applied in machine learning and data mining.
• It is used on acoustic data in speech understanding to convert waveforms into one of k categories (known as vector quantization), and for image segmentation.
• It is also used for choosing color palettes on old-fashioned graphical display devices and for image quantization.
CONCLUSION
• The K-means algorithm is useful for undirected knowledge discovery and is relatively simple. K-means has found widespread use in many fields, ranging from unsupervised learning of neural networks, pattern recognition, classification analysis, and artificial intelligence to image processing, machine vision, and many others.
Hierarchical Clustering
• Uses the distance matrix as the clustering criterion. This method does not require the number of clusters K as an input, but it needs a termination condition. Let's say we have five data points {a, b, c, d, e}.

[Figure: agglomerative clustering (AGNES) runs left to right over Steps 0-4, merging a, b, c, d, e into ab, de, cde, and finally abcde; divisive clustering (DIANA) runs right to left, splitting abcde back down into the individual points.]

• Agglomerative (AGNES): begins with each element as a separate cluster, and at each iteration the most similar clusters are merged with each other until one cluster (or K clusters) is formed.
• Divisive (DIANA): considers all the data points as a single cluster and, in each iteration, separates from the cluster the data points which are not similar. Each data point which is separated is considered an individual cluster. In the end, we are left with n clusters.
Dendrogram: Shows How the Clusters are Merged

Decompose the data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.

A clustering of the data objects is obtained by cutting the dendrogram at the desired
level, then each connected component forms a cluster.
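A sketch of building and cutting a dendrogram with SciPy; the random data and the cut level are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))    # placeholder data

# Agglomerative clustering; 'single' linkage corresponds to the MIN criterion below
Z = linkage(X, method="single", metric="euclidean")

dendrogram(Z)                                        # shows how the clusters are merged
plt.show()

# Cutting the dendrogram at the desired level yields a flat clustering
labels = fcluster(Z, t=3, criterion="maxclust")      # e.g. cut so that 3 clusters remain
```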
“Calculation of the Similarity Between Two Clusters?”
Calculating the similarity between two clusters is important to merge or divide the clusters.

• MIN
• MAX
• Group Average
• Distance Between Centroids
• Ward’s Method
“Calculation of the Similarity Between Two Clusters?”
• MIN
Sim(C1,C2) = Min Sim(Pi, Pj) such that Pi ∈ C1 & Pj ∈ C2.
Also known as single-linkage algorithm. The similarity of two clusters C1
and C2 is equal to the minimum of the similarity between points Pi and Pj
such that Pi belongs to C1 and Pj belongs to C2.

It can separate non-elliptical shapes as long as the gap between the two clusters is not small, but it cannot separate clusters properly if there is noise between the clusters.
“Calculation of the Similarity Between Two Clusters?”
• MAX
Sim(C1,C2) = Max Sim(Pi, Pj) such that Pi ∈ C1 & Pj ∈ C2
Pick the two farthest points such that one point lies in cluster C1 and the other point lies in cluster C2.

Also known as the complete-linkage algorithm; it does well in separating clusters if there is noise between the clusters, but it is biased towards globular clusters and tends to break large clusters.
“Calculation of the Similarity Between Two Clusters?”
• Group Average
sim(C1,C2) = ∑ sim(Pi, Pj) / (|C1|*|C2|), where Pi ∈ C1 & Pj ∈ C2
Take all pairs of points (one from each cluster), compute their similarities, and calculate the average of these similarities.

It does well in separating clusters if there is noise between the clusters, but it is biased towards globular clusters.
“Calculation of the Similarity Between Two Clusters?”
• Distance between centroids
Compute the centroids of two clusters C1 & C2 and take the similarity
between the two centroids as the similarity between two clusters. This is
a less popular technique in the real world.
“Calculation of the Similarity Between Two Clusters?”
• Ward’s Method

sim(C1,C2) = ∑ (dist(Pi, Pj))² / (|C1|*|C2|), where Pi ∈ C1 & Pj ∈ C2

It calculates the similarity between two clusters in the same way as Group Average, except that it sums the squares of the distances between Pi and Pj.

It also does well in separating clusters if there is noise between the clusters, but it is likewise biased towards globular clusters.
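These criteria map roughly onto SciPy's linkage methods: MIN → 'single', MAX → 'complete', Group Average → 'average', distance between centroids → 'centroid', and Ward's method → 'ward' (SciPy's Ward criterion is the variance-minimizing variant). The data and the cut into two clusters below are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(30, 2))    # placeholder data

# One linkage matrix per similarity criterion discussed above
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)                    # Euclidean distance by default
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
    print(method, np.bincount(labels)[1:])           # cluster sizes under each criterion
```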
Space and Time Complexity of the Hierarchical Clustering Technique
• The space required is very high, because the similarity matrix must be stored in RAM; this becomes a problem for a large number of data points.

Space complexity = O(m²), where m is the number of data points.

• The algorithm performs about m iterations, and in each iteration the similarity matrix must be updated and stored again, so the time complexity is also very high.

Time complexity = O(m³), where m is the number of data points.
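As a rough illustration of these costs (our own numbers, not from the slides): with m = 100,000 data points, the similarity matrix has m² = 10¹⁰ entries; stored as 8-byte floats that is about 80 GB (roughly 40 GB even if only the upper triangle is kept), which far exceeds typical RAM.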


Limitations of the Hierarchical Clustering Technique

• There is no global mathematical objective being optimized in hierarchical clustering (unlike the distortion J in K-means).

• Every approach to calculating the similarity between clusters has its own disadvantages.

• Hierarchical clustering has high space and time complexity, so this clustering algorithm cannot be used when we have huge amounts of data.
