Clustering and Dimensionality Reduction
1. Introduction to Clustering
2. K-means Clustering Algorithm
3. Cost Function
4. Applications
5. Dimensionality Reduction
6. PCA - Principal Component Analysis
7. Applications
8. Clustering Data and PCA
❑ The quality of a clustering result depends on both the similarity measure used
by the method and its implementation.
❑ Partitioning algorithms: Construct various partitions and then evaluate them by some
criterion
❑ Hierarchical algorithms: Create a hierarchical decomposition of the set of data (or objects)
using some criterion
❑ Density-based: based on connectivity and density functions
❑ Grid-based: based on a multiple-level granularity structure
❑ Model-based: A model is hypothesized for each of the clusters and the idea is to find
the best fit of that model to the data
❑ Given k, find a partition into k clusters that optimizes the chosen partitioning criterion
2. The K-Means Clustering Algorithm
1. Choose the number of clusters k: The first step in k-means is to pick the
number of clusters, k.
2. Select k random points from the data as centroids:
➢ Randomly select the centroid for each cluster.
➢ Let’s say we want to have 2 clusters, so k is equal to 2 here.
➢ We then randomly select the centroid:
3. Assign each point to the closest cluster centroid:
➢ Here you can see that the points closer to the red centroid are assigned to
the red cluster, whereas the points closer to the green centroid are
assigned to the green cluster.
2. The K-Means Clustering Algorithm
4. Recompute the centroids of the newly formed clusters:
➢ Here, the red and green crosses are the new centroids.
➢ Recomputing the centroids and assigning all the points to a cluster
based on their distance from the centroids is a single iteration.
❖ But when should we stop this process? It can’t run till eternity, right? We stop
iterating once the centroids (and hence the cluster assignments) no longer change,
or after a maximum number of iterations, as in the sketch below.
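A minimal sketch of this loop in Python, assuming numpy and the usual Euclidean distance (the function name kmeans and all variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain K-means sketch; assumes no cluster ever becomes empty."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```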
2. The K-Means Clustering Algorithm
[Figure: side-by-side scatter plots (axes 0-10) illustrating successive K-means iterations.]
◼ Example
http://shabal.in/visuals/kmeans/6.html
❑ Strength
▪ Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n.
▪ Often terminates at a local optimum. The global optimum may be found using
techniques such as: deterministic annealing and genetic algorithms
❑ Weakness
▪ Applicable only when a mean is defined (so what about categorical data?)
▪ Need to specify k, the number of clusters, in advance
▪ Unable to handle noisy data and outliers
▪ Not suitable to discover clusters with non-convex shapes
❑ Problem Statement
▪ Assume that K=3 and initially the points are assigned to clusters as follows:
C1={x1,x2,x3}, C2={x4,x5,x6} and C3={x7,x8}. Apply the K-means algorithm until
convergence (i.e., until the clusters do not change), using the Manhattan distance.
2. The K-Means Clustering Algorithm
❑ Given data
◼ X1=(1,9)
◼ X2=(1,2)
◼ X3=(9,5)
◼ X4=(4,7)
◼ X5=(10,6)
◼ X6=(7,5)
◼ X7=(2,3)
◼ X8=(4,8)
2. The K-Means Clustering Algorithm
❑ Initial centroids (mean of the points in each cluster)
▪ C1 = {x1, x2, x3} = ((1+1+9)/3, (9+2+5)/3) = (3.67, 5.33)
▪ C2 = {x4, x5, x6} = ((4+10+7)/3, (7+6+5)/3) = (7, 6)
▪ C3 = {x7, x8} = ((2+4)/2, (3+8)/2) = (3, 5.5)
2. The K-Means Clustering Algorithm
❑ Manhattan distance
▪ Distance calculation:
d(C2, x1) = |7 − 1| + |6 − 9| = 6 + 3 = 9
d(C3, x1) = |3 − 1| + |5.5 − 9| = 2 + 3.5 = 5.5
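As a quick check of the arithmetic above, here is a short Python sketch (the helper name manhattan is illustrative, not from the slides):

```python
# Manhattan (L1) distance between two 2-D points
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

x1 = (1, 9)
c2 = (7, 6)      # centroid of {x4, x5, x6}
c3 = (3, 5.5)    # centroid of {x7, x8}

print(manhattan(c2, x1))  # 9
print(manhattan(c3, x1))  # 5.5
```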
2. The K-Means Clustering Algorithm
❑ Iteration 1
▪ The table shows the Manhattan distances between each object and the centroids
          x1     x2     x3     x4     x5     x6     x7     x8
C1        6.34   6      5.66   2      7      3.66   4      3
C2        9      10     3      4      3      1      8      5
C3        5.5    5.5    6.5    2.5    7.5    4.5    3.5    3.5
Cluster   C3     C3     C2     C1     C2     C2     C3     C1
2. The K-Means Clustering Algorithm
❑ Iteration 1
▪ New centroids
▪ C1 = {x4, x8} = ((4+4)/2, (7+8)/2) = (4, 7.5)
▪ C2 = {x3, x5, x6} = ((9+10+7)/3, (5+6+5)/3) = (8.67, 5.33)
▪ C3 = {x1, x2, x7} = ((1+1+2)/3, (9+2+3)/3) = (1.33, 4.67)
2. The K-Means Clustering Algorithm
❑ Iteration 2
▪ The table shows the Manhattan distances between each object and the new centroids
          x1      x2     x3     x4     x5     x6     x7     x8
C1        4.5     8.5    7.5    0.5    7.5    5.5    6.5    0.5
C2        11.34   11     0.66   6.34   2      2      9      7.34
C3        4.66    3      8      5      10     6      2.34   6
Cluster   C1      C3     C2     C1     C2     C2     C3     C1
2. The K-Means Clustering Algorithm
❑ Iteration 2
▪ New centroids
▪ C1 = {x1, x4, x8} = ((1+4+4)/3, (9+7+8)/3) = (3, 8)
▪ C2 = {x3, x5, x6} = ((9+10+7)/3, (5+6+5)/3) = (8.67, 5.33)
▪ C3 = {x2, x7} = ((1+2)/2, (2+3)/2) = (1.5, 2.5)
2. The K-Means Clustering Algorithm
❑ Iteration 3
▪ The table shows the Manhattan distances between each object and the new centroids
          x1      x2     x3     x4     x5     x6     x7     x8
C1        3       8      9      2      9      7      6      1
C2        11.34   11     0.66   6.34   2      2      9      7.34
C3        7       1      10     7      12     8      1      8
Cluster   C1      C3     C2     C1     C2     C2     C3     C1
2. The K-Means Clustering Algorithm
❑ Iteration 3
▪ New centroids
▪ C1 = {x1, x4, x8} = ((1+4+4)/3, (9+7+8)/3) = (3, 8)
▪ C2 = {x3, x5, x6} = ((9+10+7)/3, (5+6+5)/3) = (8.67, 5.33)
▪ C3 = {x2, x7} = ((1+2)/2, (2+3)/2) = (1.5, 2.5)
2. The K-Means Clustering Algorithm
❑ Results
▪ At this stage the centroids remain the same.
▪ The K-means algorithm converges here.
▪ Thus the final centroids are
➢ C1=(3,8)
➢ C2=(8.67,5.33)
➢ C3=(1.5,2.5)
2. The K-Means Clustering Algorithm
❑ Results
[Figure: scatter plot of the eight data points with the final centroids (3, 8), (8.67, 5.33), and (1.5, 2.5) marked.]
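The whole worked example can be reproduced with a minimal plain-Python sketch, starting from the same initial assignment C1={x1,x2,x3}, C2={x4,x5,x6}, C3={x7,x8} and using the Manhattan distance (all names are illustrative, not from the slides):

```python
# Minimal K-means with Manhattan distance, reproducing the worked example.
points = [(1, 9), (1, 2), (9, 5), (4, 7), (10, 6), (7, 5), (2, 3), (4, 8)]  # x1..x8
assign = [0, 0, 0, 1, 1, 1, 2, 2]  # initial clusters: C1={x1,x2,x3}, C2={x4,x5,x6}, C3={x7,x8}
k = 3

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

while True:
    # Recompute each centroid as the mean of its assigned points
    centroids = []
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        centroids.append((sum(p[0] for p in members) / len(members),
                          sum(p[1] for p in members) / len(members)))
    # Reassign every point to the nearest centroid
    new_assign = [min(range(k), key=lambda c: manhattan(p, centroids[c])) for p in points]
    if new_assign == assign:   # converged: assignments no longer change
        break
    assign = new_assign

print(centroids)  # final centroids: approximately (3, 8), (8.67, 5.33), (1.5, 2.5)
```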
2. The K-Means Clustering Algorithm- Problem
❑ Problem
▪ Assume that K=2 and initially the points are assigned to clusters as
follows: C1={x1,x3,x5} and C2={x2,x4}. Apply the K-means algorithm until
convergence (i.e., until the clusters do not change), using the Manhattan
distance. Data points are given below.
X1=(3,11)
X2=(5,10)
X3=(3,4)
X4=(6,9)
X5=(2,2)
3. Cost Function
▪ The cost function measures the "goodness" of a particular clustering: a lower cost
indicates a better clustering, in which data points within a cluster are more tightly grouped
around their centroid.
▪ The K-means cost function, also known as the Within-Cluster Sum of Squares (WCSS), aims to
minimize the sum of squared distances between each data point and its assigned cluster
centroid.
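Written out, WCSS is J = Σᵢ Σ_{x ∈ Cᵢ} ‖x − μᵢ‖², where μᵢ is the centroid of cluster Cᵢ. Below is a minimal numpy sketch of this cost, evaluated on the final clustering from the worked example above (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-Cluster Sum of Squares: sum of squared (Euclidean) distances
    from each point to the centroid of its assigned cluster."""
    diffs = X - centroids[labels]      # vector from each point to its own centroid
    return float(np.sum(diffs ** 2))

# Final clustering from the worked example (x1..x8 -> C1, C3, C2, C1, C2, C2, C3, C1)
X = np.array([[1, 9], [1, 2], [9, 5], [4, 7], [10, 6], [7, 5], [2, 3], [4, 8]])
labels = np.array([0, 2, 1, 0, 1, 1, 2, 0])
centroids = np.array([[3, 8], [8.67, 5.33], [1.5, 2.5]])
print(wcss(X, labels, centroids))
```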
4. Applications - Examples of Clustering Applications
◼ Pattern Recognition
◼ Image Processing
◼ WWW
◼ Document classification
➢ This is another common application of clustering. Let’s say you have multiple
documents and you need to cluster similar documents together.
➢ Clustering helps us group these documents such that similar documents are in
the same clusters.
4. Applications - Examples of Clustering Applications
◼ Image Segmentation
➢ We can also use clustering to perform image segmentation: similar pixels in the
image are grouped into the same cluster.
◼ Recommendation Engines
➢ Clustering can also be used in recommendation engines. Let’s say you want to
recommend songs to your friends. You can look at the songs liked by that
person and then use clustering to find similar songs and finally recommend the
most similar songs.
◼ Marketing: Help marketers discover distinct groups in their customer bases, and then
use this knowledge to develop targeted marketing programs
◼ Land use: Identification of areas of similar land use in an earth observation database
◼ Insurance: Identifying groups of motor insurance policy holders with a high average
claim cost
◼ City-planning: Identifying groups of houses according to their house type, value, and
geographical location
5. Dimensionality Reduction
❑ Curse of dimensionality:
◼ As the number of features grows, the amount of data needed for reliable results grows
exponentially.
◼ Having too many features can cause problems such as overfitting (good performance on
training data but poor performance on new data), slower computation, and lower accuracy.
◼ The explosion of feature combinations makes sampling harder in high-dimensional data, and
tasks like clustering or classification become more complex and slow.
❑ Curse of dimensionality (continued):
◼ To tackle this problem, feature engineering techniques such as feature
selection (choosing the most important features) and feature extraction (creating new
features from the original ones) are used.
◼ One popular feature extraction method is dimensionality reduction, which reduces the
number of features while keeping as much important information as possible.
◼ One of the most widely used dimensionality reduction techniques is Principal Component
Analysis (PCA).
◼ PCA helps preserve the most important patterns and relationships in the data.
◼ It prioritizes the directions in which the data varies the most (because more variation means
more useful information).
❑ Why is PCA often used as a preprocessing step before applying K-means? How does it help with
clustering performance and interpretability?
❑ Answer:
Benefit                      Description
Dimensionality reduction     Avoids the curse of dimensionality and improves distance calculations
Noise filtering              Eliminates low-variance noise
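A minimal sketch of this preprocessing pipeline, assuming scikit-learn is available; the random data, the 95% variance threshold, and k=3 are placeholder choices for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                 # placeholder data: 200 samples, 20 features

X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=0.95)                   # keep enough components for ~95% of the variance
X_reduced = pca.fit_transform(X_scaled)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
print(X_reduced.shape, labels[:10])
```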
➢ PCA identifies new axes (like rotating a camera) where the data spreads out the most:
o 1st Principal Component (PC1): The direction of maximum variance (most spread).
o 2nd Principal Component (PC2): The next best direction, perpendicular to PC1, and so
on.
➢ These directions are calculated using eigenvalues and eigenvectors:
o the eigenvectors (of the data's covariance matrix) define the new axes, and their
importance is ranked by the eigenvalues (how much variance each axis captures).
➢ Keep only the top 2–3 directions (or enough to capture ~95% of the variance).
➢ Project the data onto these directions to get a simplified, lower-dimensional version.
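A minimal numpy sketch of these steps, assuming a data matrix X with one sample per row (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # placeholder data with correlated features

# 1. Center the data (PCA is defined on mean-centered data)
Xc = X - X.mean(axis=0)

# 2. Eigen-decompose the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)            # eigh: the covariance matrix is symmetric

# 3. Sort directions by variance captured (largest eigenvalue first)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Keep enough components to capture ~95% of the variance
explained = np.cumsum(eigvals) / eigvals.sum()
n = int(np.searchsorted(explained, 0.95)) + 1

# 5. Project the data onto the top-n principal components
X_reduced = Xc @ eigvecs[:, :n]
print(n, X_reduced.shape)
```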
❑ Visualizing PCA: x-axis (Radius) and y-axis (Area) represent two original features in the dataset.
➢ PC₁: The direction of maximum variance (where the data is most spread out).
➢ PC₂: The direction orthogonal (perpendicular) to PC₁; it captures the
remaining variance but is less significant.
◼ https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
◼ https://www.geeksforgeeks.org/principal-component-analysis-pca/