
Ch-4

Clustering and Dimensionality Reduction



Ch-4 Clustering and Dimensionality Reduction

1. Introduction to Clustering
2. K-means Clustering Algorithm
3. Cost function
4. Applications
5. Dimensionality reduction
6. Principal Component Analysis (PCA)
7. Applications
8. Clustering data and PCA


1. Introduction to Clustering - What Is Good Clustering?

❑ A good clustering method will produce high quality clusters with


▪ high intra-class similarity
▪ low inter-class similarity

❑ The quality of a clustering result depends on both the similarity measure used
by the method and its implementation.

❑ The quality of a clustering method is also measured by its ability to discover
some or all of the hidden patterns.


1. Introduction to Clustering - Requirements of Clustering

❑ Requirements of Clustering for Machine Learning


◼ Scalability
◼ Ability to deal with different types of attributes
◼ Discovery of clusters with arbitrary shape
◼ Minimal requirements for domain knowledge to determine input parameters
◼ Able to deal with noise and outliers
◼ Insensitive to order of input records
◼ Ability to handle high dimensionality
◼ Incorporation of user-specified constraints
◼ Interpretability and usability



1. Introduction to Clustering - Major Clustering Approaches

❑ Partitioning algorithms: Construct various partitions and then evaluate them by some
criterion
❑ Hierarchy algorithms: Create a hierarchical decomposition of the set of data (or objects)
using some criterion
❑ Density-based: based on connectivity and density functions
❑ Grid-based: based on a multiple-level granularity structure
❑ Model-based: A model is hypothesized for each of the clusters and the idea is to find
the best fit of the data to that model


1. Introduction to Clustering - Partitioning Algorithms: Basic Concept

❑ Partitioning method: Construct a partition of a database D of n objects into a set of k


clusters

❑ Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion

▪ k-means (MacQueen’67): Each cluster is represented by the center of the cluster

▪ k-medoids or PAM (Partition Around Medoids) (Kaufman & Rousseeuw’87): Each


cluster is represented by one of the objects in the cluster



Ch-4 Clustering and Dimensionality Reduction

1. Introduction to Clustering
2. K-means Clustering Algorithm
3. Cost function
4. Applications
5. Dimensionality reduction
6. Principal Component Analysis (PCA)
7. Applications
8. Clustering data and PCA


2. The K-Means Clustering Algorithm

❑ Given k, the k-means algorithm is implemented in 5 steps:

1. Choose the number of clusters k

2. Select k random points from the data as centroids

3. Assign all the points to the closest cluster centroid

4. Recompute the centroids of newly formed clusters

5. Repeat steps 3 and 4



2. The K-Means Clustering Algorithm

❑ Given k, the k-means algorithm is implemented in 5 steps: . . .

1. Choose the number of clusters k : The first step in k-means is to pick the
number of clusters, k.
2. Select k random points from the data as centroids:
➢ Randomly select the centroid for each cluster.
➢ Let’s say we want to have 2 clusters, so k is equal to 2 here.
➢ We then randomly select the centroid:



2. The K-Means Clustering Algorithm

❑ Given k, the k-means algorithm is implemented in 5 steps: . . .


3. Assign all the points to the closest cluster centroid
➢ Once we have initialized the centroids, we assign each point to the
closest cluster centroid:

➢ Here you can see that the points closer to the red point are assigned to
the red cluster, whereas the points closer to the green point are
assigned to the green cluster.
2. The K-Means Clustering Algorithm

❑ Given k, the k-means algorithm is implemented in 5 steps: . . .


4. Recompute the centroids of newly formed clusters
➢ Once we have assigned all of the points to either cluster, the next step
is to compute the centroids of newly formed clusters:

➢ Here, the red and green crosses are the new centroids.



2. The K-Means Clustering Algorithm

❑ Given k, the k-means algorithm is implemented in 5 steps: . . .


5. Repeat steps 3 and 4

➢ Computing the centroids and assigning all the points to clusters based on
their distance from the centroids constitutes a single iteration.

❖ But when should we stop this process? It can't run forever, right?
2. The K-Means Clustering Algorithm

❑ Stopping Criteria for K-Means Clustering

▪ There are essentially three stopping criteria that can be adopted to


stop the K-means algorithm:

1. Centroids of newly formed clusters do not change


2. Points remain in the same cluster
3. Maximum number of iterations is reached
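To make these steps and stopping criteria concrete, here is a minimal NumPy sketch (not part of the original slides). The function name kmeans and the argument max_iters are illustrative, and the sketch assumes no cluster becomes empty during the iterations.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Basic K-means on an (n, d) array X with k clusters (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Step 2: select k random points from the data as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):                        # stopping criterion 3: max iterations
        # Step 3: assign every point to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute the centroid of each newly formed cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stopping criterion 1: centroids of newly formed clusters do not change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```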



2. The K-Means Clustering Algorithm
◼ Example
[Figure: four scatter plots (axes 0 to 10) illustrating successive K-means iterations on a small 2-D dataset.]


2. The K-Means Clustering Algorithm

◼ Example

http://shabal.in/visuals/kmeans/6.html



2. The K-Means Clustering Algorithm

❑ Strength
▪ Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n.
▪ Often terminates at a local optimum. The global optimum may be found using
techniques such as deterministic annealing and genetic algorithms

❑ Weakness
▪ Applicable only when a mean is defined (so what about categorical data?)
▪ Need to specify k, the number of clusters, in advance
▪ Unable to handle noisy data and outliers
▪ Not suitable to discover clusters with non-convex shapes
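In practice, scikit-learn provides an implementation of this algorithm. The snippet below is an illustrative usage sketch (the eight 2-D points are the ones used in the worked example that follows); note that n_clusters (k) must be specified in advance, which is one of the weaknesses listed above.

```python
import numpy as np
from sklearn.cluster import KMeans

# The eight 2-D points used in the worked example that follows
X = np.array([[1, 9], [1, 2], [9, 5], [4, 7],
              [10, 6], [7, 5], [2, 3], [4, 8]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # k must be given in advance
print(km.cluster_centers_)  # the final centroids
print(km.labels_)           # the cluster assigned to each point
print(km.inertia_)          # within-cluster sum of squares (see the cost function section)
```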



2. The K-Means Clustering Algorithm

❑ Problem Statement
▪ Assume that K=3 and initially the points are assigned to clusters as follows:
C1={x1,x2,x3}, C2={x4,x5,x6} and C3={x7,x8}. Apply the K-means algorithm until
convergence (i.e. until the clusters do not change), using the Manhattan distance.
2. The K-Means Clustering Algorithm

❑ Given data
◼ X1=(1,9)
◼ X2=(1,2)
◼ X3=(9,5)
◼ X4=(4,7)
◼ X5=(10,6)
◼ X6=(7,5)
◼ X7=(2,3)
◼ X8=(4,8)
2. The K-Means Clustering Algorithm

❑ The initial clusters and their centroids are


▪ C1={x1,x2,x3}
▪ C2={x4,x5,x6}
▪ C3={x7,x8}

C1 = ((1+1+9)/3, (9+2+5)/3) = (3.67, 5.33)

C2 = ((4+10+7)/3, (7+6+5)/3) = (7, 6)

C3 = ((2+4)/2, (3+8)/2) = (3, 5.5)
2. The K-Means Clustering Algorithm

❑ Manhattan distance

▪ Distance calculation (shown here for x1; the other points are handled the same way):

d(C1, x1) = |3.67 − 1| + |5.33 − 9| = 2.67 + 3.67 = 6.34

d(C2, x1) = |7 − 1| + |6 − 9| = 6 + 3 = 9

d(C3, x1) = |3 − 1| + |5.5 − 9| = 2 + 3.5 = 5.5
2. The K-Means Clustering Algorithm

❑ Iteration 1
▪ The table shows the Manhattan distance between each object and the centroids,
and the resulting cluster assignment

            x1     x2     x3     x4     x5     x6     x7     x8
C1          6.34   6      5.66   2      7      3.66   4      3
C2          9      10     3      4      3      1      8      5
C3          5.5    5.5    6.5    2.5    7.5    4.5    3.5    3.5
Assigned    C3     C3     C2     C1     C2     C2     C3     C1
2. The K-Means Clustering Algorithm

❑ Iteration 1
▪ New centroids

C1 = {x4, x8} = ((4+4)/2, (7+8)/2) = (4, 7.5)

C2 = {x3, x5, x6} = ((9+10+7)/3, (5+6+5)/3) = (8.67, 5.33)

C3 = {x1, x2, x7} = ((1+1+2)/3, (9+2+3)/3) = (1.33, 4.67)
2. The K-Means Clustering Algorithm

❑ Iteration 2
▪ The table shows the Manhattan distance between each object and the new centroids,
and the resulting cluster assignment

            x1     x2     x3     x4     x5     x6     x7     x8
C1          4.5    8.5    7.5    0.5    7.5    5.5    6.5    0.5
C2          11.34  11     0.66   6.34   2      2      9      7.34
C3          4.66   3      8      5      10     6      2.34   6
Assigned    C1     C3     C2     C1     C2     C2     C3     C1
2. The K-Means Clustering Algorithm

❑ Iteration 2

▪ New centroids

C1 = {x1, x4, x8} = ((1+4+4)/3, (9+7+8)/3) = (3, 8)

C2 = {x3, x5, x6} = ((9+10+7)/3, (5+6+5)/3) = (8.67, 5.33)

C3 = {x2, x7} = ((1+2)/2, (2+3)/2) = (1.5, 2.5)
2. The K-Means Clustering Algorithm

❑ Iteration 3
▪ The table shows the Manhattan distance between each object and the new centroids,
and the resulting cluster assignment

            x1     x2     x3     x4     x5     x6     x7     x8
C1          3      8      9      2      9      7      6      1
C2          11.34  11     0.66   6.34   2      2      9      7.34
C3          7      1      10     7      12     8      1      8
Assigned    C1     C3     C2     C1     C2     C2     C3     C1
2. The K-Means Clustering Algorithm

❑ Iteration 3
▪ New centroids

C1 = {x1, x4, x8} = ((1+4+4)/3, (9+7+8)/3) = (3, 8)

C2 = {x3, x5, x6} = ((9+10+7)/3, (5+6+5)/3) = (8.67, 5.33)

C3 = {x2, x7} = ((1+2)/2, (2+3)/2) = (1.5, 2.5)
2. The K-Means Clustering Algorithm

❑ Results
▪ At this stage the centroids remain the same.
▪ The K-means algorithm converges here.
▪ Thus the final centroids are
➢ C1=(3,8)
➢ C2=(8.67,5.33)
➢ C3=(1.5,2.5)
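This result can be checked with a short sketch (not from the slides) that repeats the same procedure: assignment by Manhattan (L1) distance, with centroids recomputed as coordinate-wise means, starting from the initial clusters given in the problem statement.

```python
import numpy as np

X = np.array([[1, 9], [1, 2], [9, 5], [4, 7],
              [10, 6], [7, 5], [2, 3], [4, 8]], dtype=float)

# Initial clusters from the problem statement: C1={x1,x2,x3}, C2={x4,x5,x6}, C3={x7,x8}
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])

while True:
    # Recompute centroids as the mean of each cluster
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(3)])
    # Reassign points using Manhattan (L1) distance
    dists = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)
    new_labels = dists.argmin(axis=1)
    if np.array_equal(new_labels, labels):   # clusters unchanged -> converged
        break
    labels = new_labels

print(np.round(centroids, 2))  # expected: (3, 8), (8.67, 5.33), (1.5, 2.5)
print(labels)                  # expected: [0 2 1 0 1 1 2 0]
```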
2. The K-Means Clustering Algorithm

❑ Results
[Figure: scatter plot of the eight data points with the final centroids (3, 8), (8.67, 5.33) and (1.5, 2.5) marked.]
2. The K-Means Clustering Algorithm- Problem

❑ Problem
▪ Assume that K=2 and initially the points are assigned to clusters as
follows: C1={x1,x3,x5} and C2={x2,x4}. Apply the K-means algorithm until
convergence (i.e. until the clusters do not change), using the Manhattan
distance. Data points are given below.
X1=(3,11)
X2=(5,10)
X3=(3,4)
X4=(6,9)
X5=(2,2)
Ch-4 Clustering and Dimensionality Reduction

1. Introduction to Clustering
2. K-means Clustering Algorithm
3. Cost function
4. Applications
5. Dimensionality reduction
6. Principal Component Analysis (PCA)
7. Applications
8. Clustering data and PCA


3. Cost Function

❑ Cost Function for K- Means:


▪ The K-means algorithm aims to partition data points into k clusters, where each data point
belongs to the cluster with the nearest centroid.

▪ The cost function measures the "goodness" of a particular clustering: a lower cost value
indicates a better clustering, in which data points within a cluster are more tightly grouped
around their centroid.

▪ The K-means cost function, also known as the Within-Cluster Sum of Squares (WCSS), aims to
minimize the sum of squared distances between each data point and its assigned cluster
centroid.
3. Cost Function

❑ Cost Function for K- Means: . . .


▪ After each iteration we get k centroids, with an assignment of training examples to their
respective clusters.
▪ The objective function is therefore defined as the sum of the squared (Euclidean) distances of
each training example from its cluster centroid, summed over the k clusters:

J = Σ (j = 1..k) Σ (xi ∈ Cj) ||xi − μj||², where μj is the centroid of cluster Cj
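A small sketch (not in the slides) of computing this cost for a given clustering; X, labels and centroids follow the naming used in the earlier worked example, and the value can be compared with scikit-learn's KMeans inertia_ attribute.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-Cluster Sum of Squares: sum of squared distances of each
    point to the centroid of the cluster it is assigned to."""
    return float(((X - centroids[labels]) ** 2).sum())

# The eight points and the final clustering from the worked example above
X = np.array([[1, 9], [1, 2], [9, 5], [4, 7],
              [10, 6], [7, 5], [2, 3], [4, 8]], dtype=float)
labels = np.array([0, 2, 1, 0, 1, 1, 2, 0])
centroids = np.array([[3, 8], [8.67, 5.33], [1.5, 2.5]])
print(wcss(X, labels, centroids))  # comparable to KMeans(...).inertia_ up to rounding
```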
Ch-4 Clustering and Dimensionality Reduction

1. Introduction to Clustering
2. K-means Clustering Algorithm
3. Cost function
4. Applications
5. Dimensionality reduction
6. Principal Component Analysis (PCA)
7. Applications
8. Clustering data and PCA


4. Applications - General Applications of Clustering

◼ Pattern Recognition

◼ Spatial Data Analysis


◼ create thematic maps in GIS by clustering feature spaces

◼ detect spatial clusters and explain them in spatial data mining

◼ Image Processing

◼ Economic Science (especially market research)

◼ WWW
◼ Document classification

◼ Cluster Weblog data to discover groups of similar access patterns



4. Applications - Examples of Clustering Applications

❑ Applications of Clustering in Real-World Scenarios


Clustering is a widely used technique in the industry. It is being used in almost every
domain, from banking and recommendation engines to document clustering and
image segmentation.
◼ Customer Segmentation

➢ One of the most common applications of clustering is customer segmentation,
and it isn't just limited to banking. This strategy is used across functions,
including telecom, e-commerce, sports, advertising, sales, etc.
◼ Document Clustering

➢ This is another common application of clustering. Let’s say you have multiple
documents and you need to cluster similar documents together.
➢ Clustering helps us group these documents such that similar documents are in
the same clusters.
4. Applications - Examples of Clustering Applications

❑ Applications of Clustering in Real-World Scenarios . . .


◼ Image Segmentation

➢ We can also use clustering to perform image segmentation. Here, we group
similar pixels in the image together, so that pixels with similar values end up
in the same segment.

◼ Recommendation Engines
➢ Clustering can also be used in recommendation engines. Let's say you want to
recommend songs to a friend. You can look at the songs liked by that person,
use clustering to find similar songs, and finally recommend the most similar
ones.


4. Applications - Examples of Clustering Applications

◼ Marketing: Help marketers discover distinct groups in their customer bases, and then
use this knowledge to develop targeted marketing programs

◼ Land use: Identification of areas of similar land use in an earth observation database

◼ Insurance: Identifying groups of motor insurance policy holders with a high average
claim cost

◼ City-planning: Identifying groups of houses according to their house type, value, and
geographical location

◼ Earthquake studies: Observed earthquake epicenters should be clustered along
continent faults


Ch-4 Clustering and Dimensionality Reduction

1. Introduction to Clustering
2. K-means Clustering Algorithm
3. Cost function
4. Applications
5. Dimensionality reduction
6. Principal Component Analysis (PCA)
7. Applications
8. Clustering data and PCA


5. Dimensionality reduction

❑ Curse of dimensionality:
◼ More features exponentially increase the amount of data needed for reliable results.

◼ Having too many features in data can cause problems like overfitting (good on training data
but poor on new data), slower computation, and lower accuracy.

◼ The explosion of feature combinations makes sampling harder in high-dimensional data, and
tasks like clustering or classification become more complex and slow.

5. Dimensionality reduction

❑ Curse of dimensionality: . . .
◼ To tackle this problem, feature engineering techniques such as feature
selection (choosing the most important features) and feature extraction (creating new
features from the original ones) are used.

◼ One popular feature extraction method is dimensionality reduction, which reduces the
number of features while keeping as much important information as possible.

◼ One of the most widely used dimensionality reduction techniques is Principal Component
Analysis (PCA).

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) :


◼ PCA is a statistical technique introduced by mathematician Karl Pearson in 1901.

◼ It works by transforming high-dimensional data into a lower-dimensional space while


maximizing the variance (or spread) of the data in the new space.

◼ Helps preserve the most important patterns and relationships in the data.

◼ It prioritizes the directions along which the data varies the most (because more variation = more
useful information).

5. Dimensionality reduction

Why is PCA often used as a preprocessing step before applying K-means? How does it help with
clustering performance and interpretability?
❑ Answer:
Benefit                        Description
Dimensionality Reduction       Avoids the curse of dimensionality, improves distance calculations
Noise Filtering                Eliminates low-variance noise
Faster Computation             Lower dimensions = faster K-means iterations
Better Cluster Separation      Projects data into spaces with clearer groupings
Improved Visualization         Enables 2D/3D plotting of clusters
Handles Correlated Features    Converts correlated features into orthogonal principal components
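As an illustration of this combination (a sketch, not from the slides), a typical scikit-learn pipeline standardizes the data, reduces it with PCA, and then clusters it with K-means. The synthetic data and the choices n_components=2 and n_clusters=3 are assumptions made for the example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))               # synthetic high-dimensional data

pipeline = make_pipeline(
    StandardScaler(),                         # put all features on the same scale
    PCA(n_components=2),                      # project onto the top 2 principal components
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)              # cluster labels in the reduced space
print(labels[:10])
```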

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm

Step 1: Standardize the Data.

Step 2: Find Relationships

Step 3: Find the “Magic Directions” (Principal Components)

Step 4: Pick the Top Directions & Transform Data

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm . . .


▪ Step 1: Standardize the Data -
➢ Make sure all features (e.g., height, weight, age) are on the same scale.
o Why? - A feature like “salary” (ranging 0–100,000) could dominate “age” (0–100)
otherwise.
➢ Standardizing our dataset ensures that each variable has a mean of 0 and a standard
deviation of 1.
Z = (X − μ) / σ
o where μ is the vector of feature means, μ = {μ1, μ2, ⋯, μm}
o and σ is the vector of feature standard deviations, σ = {σ1, σ2, ⋯, σm}
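A minimal sketch of this step (not from the slides); in practice sklearn.preprocessing.StandardScaler does the same thing.

```python
import numpy as np

def standardize(X):
    """Z-score each feature (column): subtract the mean, divide by the standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```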

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm . . .


▪ Step 2: Find Relationships -
➢ Calculate how features move together using a covariance matrix.
➢ Covariance measures the strength of joint variability between two or more variables,
indicating how much they change in relation to each other.
➢ To find the covariance between two features x1 and x2 we can use the formula:

cov(x1, x2) = Σ (x1i − mean(x1)) (x2i − mean(x2)) / (n − 1), summed over the n samples i

➢ The value of the covariance can be positive, negative, or zero:

o Positive - as x1 increases, x2 also increases.
o Negative - as x1 increases, x2 decreases.
o Zero - no direct (linear) relation.
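A small sketch of this step (not from the slides): the covariance matrix of the standardized features can be computed with NumPy. The data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # synthetic data: 100 samples, 3 features
X = (X - X.mean(axis=0)) / X.std(axis=0)      # Step 1: standardize

cov_matrix = np.cov(X, rowvar=False)          # Step 2: 3 x 3 covariance matrix of the features
print(cov_matrix)
```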

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm . . .


▪ Step 3: Find the “Magic Directions” (Principal Components) -

➢ PCA identifies new axes (like rotating a camera) where the data spreads out the most:
o 1st Principal Component (PC1): The direction of maximum variance (most spread).
o 2nd Principal Component (PC2): The next best direction, perpendicular to PC1, and so
on.
➢ These directions are calculated using eigenvalues and eigenvectors:
o the eigenvectors define the new axes, and their importance is
ranked by the eigenvalues (how much variance each direction captures).

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm . . .


▪ Step 3: Find the “Magic Directions” (Principal Components) - . . .
➢ For a square matrix A, an eigenvector X (a non-zero vector) and its corresponding eigenvalue
λ (a scalar) satisfy: AX=λX
o When A acts on X, it only stretches or shrinks X by the scalar λ.
o The direction of X remains unchanged (hence, eigenvectors define “stable directions” of A).
➢ It can also be written as:
AX − λX = 0
(A − λI)X = 0
o where I is the identity matrix of the same shape as matrix A.
o The above condition holds for a non-zero X only if (A − λI) is non-invertible (i.e. a singular
matrix), that is, |A − λI| = 0

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm . . .


▪ Step 3: Find the “Magic Directions” (Principal Components) - . . .
➢ This determinant equation is called the characteristic equation.

➢ Solving it gives the eigenvalues λ.

➢ The corresponding eigenvectors can then be found using the equation AX = λX.
➢ How This Connects to PCA?
o In PCA, the covariance matrix C (from Step 2) acts as matrix A.
o Eigenvectors of C are the principal components (PCs).
o Eigenvalues represent the variance captured by each PC.
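A small sketch of Steps 2 and 3 with NumPy on synthetic data (not from the slides); np.linalg.eigh is used because the covariance matrix is symmetric, and the mixing matrix applied to X is an arbitrary choice to make the features correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])   # synthetic, correlated features
X = (X - X.mean(axis=0)) / X.std(axis=0)       # Step 1: standardize

C = np.cov(X, rowvar=False)                    # Step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)           # Step 3: eigenvalues / eigenvectors (C is symmetric)

order = np.argsort(eigvals)[::-1]              # rank directions by captured variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals / eigvals.sum())                 # fraction of variance captured by each PC
```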

5. Dimensionality reduction

❑ Principal Component Analysis (PCA) – Algorithm . . .


▪ Step 4: Pick the Top Directions & Transform Data-

➢ Keep only the top 2–3 directions (or enough to capture ~95% of the variance).

➢ Project the data onto these directions to get a simplified, lower-dimensional version.

➢ PCA is an unsupervised learning algorithm - it doesn’t require prior knowledge of


target variables.
➢ It’s commonly used in exploratory data analysis and machine learning to simplify
datasets without losing critical information.
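Putting the four steps together, the sketch below (not from the slides; the synthetic data and the function name pca_fit_transform are illustrative) keeps enough components to explain roughly 95% of the variance and projects the data onto them. scikit-learn's sklearn.decomposition.PCA offers the same functionality.

```python
import numpy as np

def pca_fit_transform(X, var_threshold=0.95):
    X = (X - X.mean(axis=0)) / X.std(axis=0)           # Step 1: standardize
    C = np.cov(X, rowvar=False)                         # Step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)                # Step 3: eigen decomposition
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: keep the smallest number of PCs that explain >= var_threshold of the variance
    explained = np.cumsum(eigvals) / eigvals.sum()
    n_components = int(np.searchsorted(explained, var_threshold) + 1)
    return X @ eigvecs[:, :n_components]                # project onto the top PCs

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # synthetic, correlated data
print(pca_fit_transform(X).shape)                         # (100, n_components) with n_components <= 5
```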

5. Dimensionality reduction

❑ Visualizing PCA: the x-axis (Radius) and y-axis (Area) represent two original features in the dataset.

▪ Principal Components (PCs):
➢ PC₁: The direction along which the data has the maximum variance. It captures the most
important information.
➢ PC₂: The direction orthogonal (perpendicular) to PC₁. It captures the remaining variance but
is less significant.

o The red dashed lines indicate the spread (variance) of data along different directions.
o The variance along PC₁ is greater than along PC₂, which means that PC₁ carries more useful
information about the dataset.
o The data points (blue dots) are projected onto PC₁, effectively reducing the dataset from two
dimensions (Radius, Area) to one dimension (PC₁).
o This transformation simplifies the dataset while retaining most of the original variability.
5. Dimensionality reduction

❑ Visualizing PCA: . . .

▪ The image visually explains why


PCA selects the direction with the
highest variance (PC₁).

▪ By removing PC₂, we reduce


redundancy while keeping essential
information.

▪ The transformation helps in data


compression, visualization, and
improved model performance.



References

◼ https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
◼ https://www.geeksforgeeks.org/principal-component-analysis-pca/



Thank You!
