Joint Faculty Development Programme on
ML Application in Signal Processing
and Communication Engineering
Lecture 07: Bayesian Learning
Presented by
Dr. Irshad Ansari
(Indian Institute of Information Technology, Design and Manufacturing, Jabalpur)
Machine Learning Application in Signal Processing and Communication Engineering
Clustering
By- Dr. Irshad Ahmad Ansari
Email: irshad@iiitdmj.ac.in
Outline
• Unsupervised Learning
• Introduction to Clustering
• K-means
• Competitive Learning (SOM):
Winner takes all*
*No error correction, unlike in backpropagation (BP)
What is clustering?
• The organization of unlabeled data into
similarity groups is called clustering.
• A cluster is a collection of data items which
are “similar*” to one another and “dissimilar” to data items
in other clusters.
*in color, approach, group, etc.
A real-world application of clustering
• Covid-19 clusters in
India
https://howindialives.com/gram/coronadistricts/
Computer vision application:
Image segmentation
Cluster evaluation (a hard problem)
• Intra-cluster cohesion (compactness):
– Cohesion measures how near the data points
in a cluster are to the cluster centroid.
– Sum of squared error (SSE) is a commonly
used measure.
• Inter-cluster separation (isolation):
– Separation means that different cluster
centroids should be far away from one
another.
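As a minimal sketch of the separation criterion (assuming Euclidean distance and a `centroids` array of shape (k, r); the function name is illustrative, not from the lecture), separation can be measured as the minimum pairwise distance between cluster centroids:

```python
import numpy as np
from itertools import combinations

def separation(centroids):
    """Inter-cluster separation: the minimum pairwise
    Euclidean distance between cluster centroids."""
    return min(np.linalg.norm(a - b) for a, b in combinations(centroids, 2))

# Example: three centroids in 2-D; the closest pair is 5 apart
centroids = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 10.0]])
print(separation(centroids))  # 5.0
```

Larger values indicate better-isolated clusters; cohesion (e.g. SSE) must be checked alongside it, since either measure alone can be gamed by trivial clusterings.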
How many clusters?
K-Means clustering
• K-means (MacQueen, 1967) is a partitional
clustering algorithm
• Let the set of data points D be {x1, x2, …, xn},
where xi = (xi1, xi2, …, xir) is a vector in X ⊆ R^r, and r is
the number of dimensions (attributes).
• The k-means algorithm partitions the given data
into k clusters:
– Each cluster has a cluster center, called centroid.
– k is specified by the user
K-means algorithm
• Given k, the k-means algorithm works as follows:
1. Choose k (random) data points (seeds) to be the
initial centroids, cluster centers
2. Assign each data point to the closest centroid
3. Re-compute the centroids using the current
cluster memberships
4. If a convergence criterion is not met, repeat steps 2
and 3
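The four steps above can be sketched in a few lines of NumPy (a minimal illustration, not an optimized implementation; the empty-cluster guard is one of several possible design choices):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal k-means sketch: X is an (n, r) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k random data points (seeds) as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its current members
        # (keep the old centroid if a cluster became empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

At convergence, each returned label is the index of that point's nearest returned centroid, which matches the assignment step of the algorithm.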
K-means convergence (stopping)
criterion
• no (or minimum) re-assignments of data
points to different clusters, or
• no (or minimum) change of centroids, or
• minimum decrease in the sum of squared error
(SSE),

SSE = Σ_{j=1}^{k} Σ_{x ∈ Cj} d(x, mj)²

– Cj is the jth cluster,
– mj is the centroid of cluster Cj (the mean vector of all
the data points in Cj),
– d(x, mj) is the (Euclidean) distance between data
point x and centroid mj.
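The SSE above can be computed directly (a small sketch; `X`, `labels`, and `centroids` follow the shapes used in the k-means description, and the function name is illustrative):

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared Euclidean distances from each point
    to the centroid mj of its assigned cluster Cj."""
    total = 0.0
    for j, m_j in enumerate(centroids):
        diffs = X[labels == j] - m_j        # x - mj for every x in Cj
        total += np.sum(diffs ** 2)         # d(x, mj)^2 summed over Cj
    return total

# Two points in one cluster, each at distance 1 from the centroid: SSE = 2
print(sse(np.array([[0., 0.], [2., 0.]]),
          np.array([0, 0]),
          np.array([[1., 0.]])))  # 2.0
```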
K-means clustering example: step 1
K-means clustering example – step 2
K-means clustering example – step 3
K-means clustering example
Why use K-means?
• Strengths:
– Simple: easy to understand and to implement
– Efficient: time complexity is O(tkn), where n is the
number of data points, k is the number of clusters,
and t is the number of iterations.
– Since both k and t are usually small, k-means is
considered a linear-time algorithm.
• K-means is the most popular clustering algorithm.
• Note: if SSE is used, it terminates at a local
optimum. Finding the global optimum is
computationally hard.
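Because k-means only reaches a local optimum, a common remedy (sketched here under the same assumptions as the earlier NumPy sketch; names are illustrative) is to run it several times from different random initializations and keep the run with the lowest SSE:

```python
import numpy as np

def kmeans_once(X, k, rng, max_iters=100):
    # One k-means run from a random initialization; returns (centroids, labels, SSE).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    sse = sum(np.sum((X[labels == j] - centroids[j]) ** 2) for j in range(k))
    return centroids, labels, sse

def kmeans_restarts(X, k, n_init=10, seed=0):
    # Keep the restart with the lowest SSE (different runs reach different local optima).
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, k, rng) for _ in range(n_init)),
               key=lambda run: run[2])
```

By construction, the best of n_init restarts can never have a higher SSE than a single run started from the same random state.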
Outliers
Competitive learning