Clustering (Unsupervised Learning)
Partitioned Clustering
• The data set is divided into a pre-defined number of non-overlapping subsets or clusters
• Each data point belongs to exactly one cluster
• One of the most common partitioned clustering algorithms is K-means
1.Initialization:
• Choose the number of clusters, K, and randomly initialize K cluster centroids. These centroids
can be randomly selected data points from the dataset.
2.Assign Data Points to Nearest Cluster:
• For each data point, calculate the distance to each cluster centroid.
• Assign each data point to the cluster whose centroid is the closest (usually based on Euclidean
distance).
3.Update Cluster Centroids:
• Recalculate the centroids of the clusters based on the mean of all data points assigned to each
cluster.
4.Repeat:
• Repeat steps 2 and 3 until convergence, i.e., until the cluster assignments no longer change or a
specified number of iterations is reached.
5.Convergence Criteria:
• Convergence can be determined by checking whether the cluster centroids are no longer
changing significantly between iterations or whether the assignments of data points to clusters
remain unchanged.
6.Final Output:
• Once the algorithm converges, the final output is a set of K clusters, each represented by its
centroid, and the data points assigned to each cluster.
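A minimal NumPy sketch of these six steps. The randomly generated 2-D dataset and the choice K = 3 are illustrative assumptions, not part of the original material:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))          # hypothetical 2-D dataset
K, max_iters = 3, 100                  # assumed number of clusters and iteration cap

# 1. Initialization: pick K random data points as the initial centroids
centroids = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(max_iters):
    # 2. Assignment: label each point with the index of its nearest centroid (Euclidean distance)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)

    # 3. Update: move each centroid to the mean of the points assigned to it
    #    (if a cluster is empty, keep its old centroid)
    new_centroids = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(K)
    ])

    # 4./5. Convergence: stop when the centroids no longer change significantly
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

# 6. Final output: K centroids and the cluster label of every data point
print(centroids)
print(labels[:10])
```

In practice a library implementation such as scikit-learn's KMeans would be used instead; the sketch above only mirrors the numbered steps.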
Partitioned Clustering (K-Means Clustering)
Partitional clustering assigns a set of n data points into k clusters through an iterative process.
Elbow Method
The elbow method is a graphical method for finding the optimal K value in a k-means
clustering algorithm.
It works by computing the WCSS (Within-Cluster Sum of Squares), i.e. the sum of the squared distances between the points in a cluster and that cluster's centroid.
Step 1: Run K-means for a range of K values (e.g., K = 1 to 10).
Step 2: For each K, calculate the within-cluster sum of squares (WCSS).
Step 3: Plot WCSS against K; the "elbow" point, where the curve stops dropping sharply, indicates the optimal K.
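A sketch of these steps, assuming scikit-learn and a synthetic blob dataset; KMeans exposes the WCSS as its inertia_ attribute, and the K range of 1 to 10 is an arbitrary illustrative choice:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical dataset with an underlying structure of 4 blobs
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

wcss = []
k_values = range(1, 11)
for k in k_values:
    # Step 1: run K-means for each candidate K
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # Step 2: record the within-cluster sum of squares (inertia_ in scikit-learn)
    wcss.append(km.inertia_)

# Step 3: plot WCSS vs K and look for the "elbow" where the curve flattens
plt.plot(list(k_values), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()
```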
Hierarchical Clustering
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to
group the unlabeled datasets into a cluster and also known as hierarchical cluster analysis or
HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-
shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two methods differ in how they work: unlike the K-means algorithm, hierarchical clustering does not require the number of clusters to be predetermined.
The hierarchical clustering technique has two approaches:
1.Agglomerative: A bottom-up approach, in which the algorithm starts by treating each data point as a single cluster and keeps merging the closest clusters until only one cluster is left.
2.Divisive: The divisive algorithm is the reverse of the agglomerative one; it is a top-down approach that starts with all data points in a single cluster and recursively splits it.
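A minimal scikit-learn sketch of the agglomerative (bottom-up) approach; the blob dataset and the choices of 3 clusters and average linkage are assumptions for illustration only:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Hypothetical dataset; in practice you would use your own unlabeled data
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Bottom-up clustering: each point starts as its own cluster and the closest
# pairs of clusters are merged until 3 clusters remain
agg = AgglomerativeClustering(n_clusters=3, linkage="average").fit(X)
print(agg.labels_[:20])
```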
Agglomerative Hierarchical clustering
There are various ways to calculate the distance between two clusters, and these ways decide the
rule for clustering. These measures are called Linkage methods.
1. Single Linkage
2. Complete Linkage
3. Average Linkage
4. Centroid Linkage
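A SciPy sketch comparing these four linkage methods on a small, assumed random dataset; linkage builds the merge hierarchy and dendrogram draws the tree-shaped structure described earlier:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(1)
X = rng.random((8, 2))                       # hypothetical small dataset

methods = ["single", "complete", "average", "centroid"]
fig, axes = plt.subplots(1, len(methods), figsize=(16, 4))
for ax, method in zip(axes, methods):
    Z = linkage(X, method=method)            # pairwise merges under the chosen linkage rule
    dendrogram(Z, ax=ax)                     # tree-shaped structure (dendrogram)
    ax.set_title(f"{method} linkage")
plt.show()
```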
Single Linkage
Problem: Assume that the database D is given by the table below. Follow the single-link technique to find clusters in D. Use the Euclidean distance measure.
Step 1. Plot the objects in n-dimensional space (where n is the number of attributes). In
our case we have 2 attributes – x and y, so we plot the objects p1, p2, … p6 in 2-
dimensional space:
Step 2. Calculate the distance from each object (point) to all other points, using Euclidean
distance measure, and place the numbers in a distance matrix.
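A small SciPy sketch of this step. The coordinates of p1 … p6 are not reproduced in these notes, so the points below are placeholders rather than the original data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Placeholder coordinates standing in for p1 ... p6 (not the original table)
points = np.array([
    [0.1, 0.6], [0.2, 0.3], [0.4, 0.4],
    [0.3, 0.1], [0.1, 0.4], [0.5, 0.3],
])

# Pairwise Euclidean distances arranged as a symmetric distance matrix
dist_matrix = squareform(pdist(points, metric="euclidean"))
print(np.round(dist_matrix, 2))
```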
Step 3. Identify the two clusters with the shortest distance in the matrix and merge them. Re-compute the distance matrix, since those two clusters now form a single cluster and no longer exist by themselves.
By looking at the distance matrix above, we see that p3 and p6 have the smallest distance of all: 0.11. So we merge those two into a single cluster and re-compute the distance matrix.
dist( (p3, p6), p1 ) = MIN( dist(p3, p1), dist(p6, p1) )
                     = MIN( 0.22, 0.23 )    // from the original matrix
                     = 0.22
Step 4. Repeat Step 3 until all clusters are merged.
Since we have merged (p2, p5) into one cluster, we now have a single entry for (p2, p5) in the table and no longer have p2 or p5 separately. Therefore, we need to re-compute the distance from all other points / clusters to our new cluster (p2, p5). The distance between (p3, p6) and (p2, p5) is calculated as follows:
dist( (p3, p6), (p2, p5) ) = MIN( dist(p3, p2), dist(p6, p2), dist(p3, p5), dist(p6, p5) )
                           = MIN( 0.15, 0.25, 0.28, 0.39 )    // from the original matrix
                           = 0.15
Since we have more clusters to merge, we continue to repeat Step 3.
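The merge-and-update loop of Steps 3 and 4 can be sketched directly on a distance matrix. This is a naive illustration of the MIN (single-linkage) update rule, not an efficient implementation, and the 4-point matrix below is hypothetical rather than the matrix from the example:

```python
import numpy as np

def single_linkage_merge(dist, clusters):
    """Merge the two closest clusters and update the matrix with the MIN rule."""
    # Mask the diagonal so we only compare distances between different clusters
    masked = dist + np.diag([np.inf] * len(dist))
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    i, j = min(i, j), max(i, j)

    # Single linkage: the distance from the merged cluster to every other
    # cluster is the MINIMUM of the two old distances
    merged_row = np.minimum(dist[i], dist[j])
    dist[i], dist[:, i] = merged_row, merged_row
    dist[i, i] = 0.0

    # Drop row/column j and record the merge
    dist = np.delete(np.delete(dist, j, axis=0), j, axis=1)
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    return dist, clusters

# Hypothetical symmetric distance matrix for 4 points
dist = np.array([[0.00, 0.24, 0.22, 0.37],
                 [0.24, 0.00, 0.15, 0.20],
                 [0.22, 0.15, 0.00, 0.15],
                 [0.37, 0.20, 0.15, 0.00]])
clusters = [["p1"], ["p2"], ["p3"], ["p4"]]

while len(clusters) > 1:            # repeat Step 3 until all clusters are merged
    dist, clusters = single_linkage_merge(dist, clusters)
    print(clusters)
```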
Stopping condition
The user would like the data partitioned into several clusters for unsupervised
learning purposes.
The algorithm has to stop clustering at some point: either the user specifies the number of clusters they would like to have, or the algorithm has to make that decision on its own (for example, by cutting the dendrogram at a chosen distance threshold).
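A sketch of these two stopping options with SciPy; the random dataset and the threshold value are assumptions. fcluster can cut the merge hierarchy either at a user-chosen number of clusters or at a distance threshold:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
X = rng.random((10, 2))                      # hypothetical data
Z = linkage(X, method="single")              # single-linkage merge history

# Option 1: the user specifies the number of clusters they would like
labels_by_k = fcluster(Z, t=3, criterion="maxclust")

# Option 2: cut the tree at a distance threshold instead of a fixed K
labels_by_dist = fcluster(Z, t=0.25, criterion="distance")

print(labels_by_k)
print(labels_by_dist)
```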
Density based clustering
Density-based clustering is an unsupervised, non-parametric approach: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (points whose nearest neighbors are too far away).
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is the most widely used density-based clustering algorithm.
In the worked example, epsilon = 1.9 and minPts = 4:
• For each point, find all points whose distance to it is less than epsilon (1.9 in this case).
• If a point has fewer than 4 points in its epsilon-neighborhood, it is provisionally labeled noise.
• If a point has 4 or more points in its epsilon-neighborhood, it is a core point.
• A point that is not core but lies in the epsilon-neighborhood of a core point is a border point; a point that is neither core nor border remains noise (an outlier).
• Each core point, together with the core and border points reachable from it, forms a group (cluster).
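A scikit-learn sketch using the same parameter values as this example (eps = 1.9, minPts = 4); the points themselves are assumed, since the original figure is not reproduced here:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D points (two dense groups plus one far-away outlier)
X = np.array([[1, 2], [2, 2], [2, 3], [3, 2], [8, 7],
              [8, 8], [7, 7], [7, 8], [25, 80]])

# eps = 1.9 and min_samples = 4 mirror the epsilon / "at least 4 points" rule above
# (note: min_samples counts the point itself in its neighborhood)
db = DBSCAN(eps=1.9, min_samples=4).fit(X)

print(db.labels_)                 # cluster index per point; -1 marks noise/outliers
print(db.core_sample_indices_)    # indices of the core points
```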