
CLUSTERING

Clustering:
The process of grouping a set of abstract objects into classes of similar objects is known as clustering.
Points to Remember:
 A group of data objects is treated as one cluster.
 In cluster analysis, the set of data is first partitioned into groups on the basis of data similarity, and then labels are assigned to the groups.
 The biggest advantage of clustering over classification is that it can adapt to changes and helps single out the useful features that distinguish different groups.

Cluster Analysis
Cluster analysis, also known as clustering, groups similar data points into clusters. The goal is to ensure that data points within a cluster are more similar to each other than to those in other clusters. For example, e-commerce retailers use clustering to group customers based on their purchasing habits: one group may frequently buy fitness gear while another prefers electronics. This helps companies give personalized recommendations and improve the customer experience.

It is useful for:
 Scalability: It can efficiently handle large volumes of data.
 High Dimensionality: It can handle high-dimensional data.
 Adaptability to Different Data Types: It can work with numerical data like age and salary as well as categorical data like gender and occupation.
 Handling Noisy and Missing Data: Datasets often contain missing values or inconsistencies, and clustering can manage them.
 Interpretability: The output of clustering is easy to understand and apply in real-world scenarios.

Types of Clustering Techniques


Clustering can be broadly classified into several methods. The choice of
method depends on the type of data and the problem you're solving.


Partitioning Methods
Partitioning methods divide the data into K groups (clusters) where each data point belongs to only one group. These methods are used when you already know how many clusters you want to create. A common example is K-means clustering.
In K-means, the algorithm assigns each data point to the nearest centre and then updates the centre based on the average of all points in that group. This process repeats until the centres stop changing. It is used in real-life applications such as streaming platforms like Spotify grouping users based on their listening habits.

This clustering method classifies the information into multiple groups based on the characteristics and similarity of the data. It requires the data analyst to specify the number of clusters that have to be generated.

In the partitioning method, given a database (D) that contains multiple (N) objects, the method constructs a user-specified number (K) of partitions of the data, in which each partition represents a cluster and a particular region. There are many algorithms that come under the partitioning method; some of the popular ones are K-Means, PAM (K-Medoids), CLARA (Clustering Large Applications), etc. Here we will see the working of the K-Means algorithm in detail.

K-Means (A Centroid-Based Technique): The K-means algorithm takes the input parameter K from the user and partitions the dataset containing N objects into K clusters so that the resulting similarity among data objects inside a group (intra-cluster) is high while the similarity of data objects with objects outside the cluster (inter-cluster) is low. The similarity of a cluster is determined with respect to the mean value of the cluster. It is a type of squared-error algorithm. At the start, K objects are chosen randomly from the dataset, each representing a cluster mean (centre). Each of the remaining data objects is assigned to the nearest cluster based on its distance from the cluster mean. The new mean of each cluster is then calculated with the added data objects.

Algorithm:
K-Means:
Input:
K: The number of clusters into which the dataset has to be divided

D: A dataset containing N number of objects

Output:
A set of K clusters
Method:
1. Randomly select K objects from the dataset (D) as the initial cluster centres (C).
2. (Re)assign each object to the cluster whose mean it is most similar to, based on the mean values.
3. Update the cluster means, i.e., recalculate the mean of each cluster with the updated assignments.
4. Repeat steps 2 and 3 until no change occurs.

Figure - K-Means Clustering

For a worked example, refer to the running notes; a minimal code sketch of the algorithm is given below.
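
The steps above can be sketched in Python as follows (a minimal illustration using NumPy; the function name and the handling of details such as empty clusters are our own simplifications, not part of the original algorithm description):

import numpy as np

def k_means(data, k, max_iters=100, seed=0):
    """Minimal K-Means sketch: data is an (N, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly select K objects as the initial cluster centres
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: (re)assign each object to its nearest centre
        distances = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recalculate each cluster mean from its assigned points
        # (an empty cluster would need special handling; ignored in this sketch)
        new_centres = np.array([data[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centres no longer change
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres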

Hierarchical Clustering
A hierarchical clustering method works by grouping data into a tree of clusters. Agglomerative hierarchical clustering begins by treating every data point as a separate cluster. Then, it repeatedly executes the following steps:
 Identify the two clusters that are closest together, and
 Merge these two most similar clusters. We continue these steps until all the clusters are merged together.

In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters. A dendrogram (a tree-like diagram that records the sequence of merges or splits) graphically represents this hierarchy; it is an inverted tree that describes the order in which points are merged (bottom-up view) or clusters are broken up (top-down view).

Let's say we have six data points A, B, C, D, E, and F.

 Step-1: Consider each letter as a single cluster and calculate the distance of each cluster from all the other clusters.
 Step-2: Comparable clusters are merged together to form a single cluster. Let's say cluster (B) and cluster (C) are very similar to each other, so we merge them; similarly for clusters (D) and (E). We are left with the clusters [(A), (BC), (DE), (F)].
 Step-3: We recalculate the proximities according to the algorithm and merge the two nearest clusters ((DE) and (F)) together to form the new clusters [(A), (BC), (DEF)].
 Step-4: Repeating the same process, the clusters DEF and BC are comparable and are merged together. We're now left with the clusters [(A), (BCDEF)].
 Step-5: At last, the two remaining clusters are merged together to form a single cluster [(ABCDEF)].
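
A brief sketch of this bottom-up merging in Python, using SciPy (the six one-dimensional coordinates standing in for A-F are made up for illustration):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical coordinates for the six points A, B, C, D, E and F
points = np.array([[1.0], [4.0], [4.5], [9.0], [9.5], [12.0]])

# Agglomerative (bottom-up) merging: the closest pair of clusters is merged first
Z = linkage(points, method="single")

# Cut the resulting tree into, say, 3 clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# dendrogram(Z) draws the tree of merges (requires matplotlib)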

Divisive Hierarchical Clustering
Divisive hierarchical clustering is precisely the opposite of agglomerative hierarchical clustering. In divisive hierarchical clustering, we start with all of the data points in a single cluster and, in every iteration, separate out the data points that are not comparable with the rest of their cluster. In the end, we are left with N clusters.


Figure - Divisive Hierarchical Clustering

DBSCAN (Density-Based Clustering)
DBSCAN is a density-based clustering algorithm that groups data points that are closely packed together and marks outliers as noise based on their density in the feature space. It identifies clusters as dense regions in the data space separated by areas of lower density. Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical, DBSCAN performs well in handling real-world data irregularities such as:
1. Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular or convex.
2. Noise and Outliers: It effectively identifies and handles noise points without assigning them to any cluster.

Figure - Comparison of K-Means, hierarchical clustering and DBSCAN
The figure shows a data set where K-Means and hierarchical clustering handle compact, spherical clusters with varying noise tolerance, while DBSCAN manages arbitrary-shaped clusters and noise.
Key Parameters in DBSCAN


1. eps: This defines the radius of the neighborhood around a data point. If the distance between two points is less than or equal to eps, they are considered neighbors. A common method to determine eps is by analyzing the k-distance graph. Choosing the right eps is important:
 If eps is too small, most points will be classified as noise.
 If eps is too large, clusters may merge and the algorithm may fail to distinguish between them.
2. MinPts: This is the minimum number of points required within the eps radius to form a dense region. A general rule of thumb is to set MinPts >= D+1, where D is the number of dimensions in the dataset. For most cases a minimum value of MinPts = 3 is recommended. A short usage sketch in code is given below.
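
A minimal usage sketch with scikit-learn (the two-moons data and the eps and min_samples values are illustrative choices, not recommendations from these notes):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: arbitrary-shaped clusters that K-Means struggles with
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps is the neighborhood radius; min_samples plays the role of MinPts (here D = 2)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_                     # cluster index per point; -1 marks noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", list(labels).count(-1))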

Figure - K-Means Clustering (flowchart)
Example: Suppose we want to group the visitors to a website using just
their age as follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16 [16]
Centroid(C2) = 22 [22]
Note: These two points are chosen randomly from the dataset.


Iteration-1:
C1 = 16.33 [16, 16, 17]
C2 = 37.25 [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-2:
C1 = 19.55 [16, 16, 17, 20, 20, 21, 21, 22, 23]
C2 = 46.90 [29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-3:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-4:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
There is no change between Iteration 3 and Iteration 4, so we stop. Therefore, we get (16-29) and (36-66) as the 2 clusters obtained using the K-Means algorithm.
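
The same example can be reproduced with scikit-learn (a hedged sketch; with k-means++ initialisation the result usually, though not necessarily, matches the hand computation above):

import numpy as np
from sklearn.cluster import KMeans

ages = np.array([16, 16, 17, 20, 20, 21, 21, 22, 23, 29,
                 36, 41, 42, 43, 44, 45, 61, 62, 66], dtype=float).reshape(-1, 1)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ages)
print(km.cluster_centers_.ravel())   # expected to come out near 20.5 and 48.89
print(km.labels_)                    # cluster assignment for each age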

Grid-Based Methods
Grid-based clustering methods are a category of clustering algorithms that quantize the data space into a finite number of cells or grids, and then perform clustering operations on the grid structure. The key idea is to reduce the vast number of data objects to a much smaller number of grid cells, which can make the clustering process more efficient, especially for large datasets.
Grid-based clustering offers several advantages:
 Efficiency: By reducing the problem space to a finite number of cells, grid-based methods can significantly speed up the clustering process, especially for large datasets.
 Scalability: These methods can handle high-dimensional data more effectively by working with grids in subspaces.
 Flexibility: Grid-based methods can easily adapt to different data distributions and can be combined with other clustering techniques.


Process:
 Grid Creation: The data space is partitioned into a grid structure.
 Cell Assignment: Each data point is assigned to the corresponding grid
cell.
 Clustering Operations: Clustering algorithms are applied on the grid cells,
rather than individual data points.
 Cluster Refinement (Optional): In some methods, clusters can be refined further by examining the relationships between cells. (A toy sketch of these steps is given after this list.)
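
A toy sketch of this process in Python (the 2-D blob data, the 20x20 grid and the density threshold are illustrative assumptions; real grid-based algorithms such as STING or CLIQUE are considerably more refined):

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# Two synthetic blobs of 2-D points
X = np.vstack([rng.normal([2.0, 2.0], 0.3, (200, 2)),
               rng.normal([6.0, 5.0], 0.3, (200, 2))])

# Grid creation + cell assignment: count the points falling in each cell of a 20x20 grid
counts, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=20)

# Clustering operation on cells: keep "dense" cells and join adjacent dense cells
dense = counts >= 5
cell_labels, n_clusters = ndimage.label(dense)
print("grid clusters found:", n_clusters)

# Cluster refinement: map each point back to the cluster of its grid cell
ix = np.clip(np.digitize(X[:, 0], xedges) - 1, 0, counts.shape[0] - 1)
iy = np.clip(np.digitize(X[:, 1], yedges) - 1, 0, counts.shape[1] - 1)
point_labels = cell_labels[ix, iy]      # label 0 means the point fell in a sparse cell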
Advantages:
 Fast Processing:
Grid-based clustering often has a faster processing time, especially for
large datasets, as the number of calculations is reduced.
 Scalability:
These methods can be more scalable for large datasets, as the clustering
operations are performed on the grid cells rather than individual data
points.
 Computational Complexity:
Grid-based methods can have lower computational complexity, making
them suitable for analyzing large and complex datasets.
Examples:
 STING (Statistical Information Grid):
A grid-based clustering algorithm that uses statistical information within the
grid cells to identify clusters.
 CLIQUE (CLustering In QUEst):
Another grid-based clustering algorithm that identifies clusters by
examining the density of grid cells.
 WaveCluster:
A grid-based clustering algorithm that uses a wavelet transform to analyze the data.

Applications:
Grid-based clustering is used in various applications, including:
 Spatial Data Mining: Analyzing spatial datasets, such as location data or
geographic information systems.
 Web Data Mining: Analyzing web usage patterns and identifying user
clusters.
 Image Processing: Segmenting images and identifying regions of interest.
 Data Mining: General clustering tasks, particularly for large and complex
datasets.

Clustering Evaluation Metrics
Clustering is a technique in machine learning that is used to group similar data points. While the algorithm does its job of uncovering patterns and structures in the data, it is important to judge how well it functions. Several metrics have been designed to evaluate the performance of these clustering algorithms.
Clustering is an unsupervised machine-learning approach that is used
to group comparable data points based on specific traits or attributes.
Clustering algorithms do not require labelled data, which makes them ideal
for finding patterns in large datasets. It is a widely used technique in
applications like customer segmentation, image recognition, anomaly
detection, etc.
There are multiple clustering algorithms, and each has its way of grouping
data points. Clustering metrics are used to evaluate all these algorithms.
Let us take a look at some of the most commonly used clustering metrics:
1. Silhouette Score
The Silhouette Score is a way to measure how good the clusters are in a
dataset. It helps us understand how well the data points have been
grouped. The score ranges from -1 to 1.
 A score close to 1 means a point fits really well in its group (cluster) and
is far from other groups.
 A score close to 0 means the point is on the border between two
clusters.
 A score close to -1 means the point might be in the wrong cluster.
Silhouette Score (S) for a data point i is calculated as:


S(i) = (b(i) − a(i)) / max(a(i), b(i))
where,
 a(i) is the average distance from i to other data points in the same
cluster.
 b(i) is the smallest average distance from i to data points in a different
cluster.
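A small sketch using scikit-learn's implementation (the blob data and k = 3 are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Mean silhouette over all points: close to 1 is good, near 0 means overlapping clusters
print("Silhouette Score:", silhouette_score(X, labels))
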
2. Davies-Bouldin Index
The Davies-Bouldin Index (DBI) helps us measure how good the
clustering is in a dataset. It looks at how tight each cluster is
(compactness), and how far apart the clusters are (separation).
 Lower DBI = better, clearer clusters
 Higher DBI = messy, overlapping clusters
A lower score is better, because it means:
 Points in the same cluster are close to each other.
 Different clusters are far apart from one another.
Davies-Bouldin Index (DB) is calculated as:
DB = (1/k) Σ_{i=1}^{k} max_{j≠i} ( (R_ii + R_jj) / R_ij )
where,
 k is the total number of clusters.
 R_ii is the compactness of cluster i.
 R_jj is the compactness of cluster j.
 R_ij is the dissimilarity (distance) between cluster i and cluster j.
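
A corresponding sketch with scikit-learn (the same illustrative blob data as in the silhouette example):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Lower DBI means tighter and better-separated clusters
print("Davies-Bouldin Index:", davies_bouldin_score(X, labels))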

3. Calinski-Harabasz Index (Variance Ratio Criterion)
The Calinski-Harabasz Index measures how good the clusters are in a dataset.
It looks at:
 How close the points are inside each cluster.
 How far apart the clusters are.
A higher score is better, as it means the clusters are tight and well-
separated. It helps determine the ideal number of clusters.
Calinski-Harabasz Index (CH) is calculated as:
CH = (B / W) × ( (N − K) / (K − 1) )
where,
 B is the sum of squares between clusters.
 W is the sum of squares within clusters.
 N is the total number of data points.
 K is the number of clusters.
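
A sketch that also illustrates using the index to pick the number of clusters (the blob data and candidate k values are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Higher CH is better; comparing a few k values hints at the ideal cluster count
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, calinski_harabasz_score(X, labels))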

4. Adjusted Rand Index (ARI)


The Adjusted Rand Index (ARI) helps us measure how accurate a clustering result is by comparing it to the true labels (ground truth).
It checks how well the pairs of points are grouped:
 Are the same pairs together in both the real and predicted clusters?
 Are different pairs also kept apart correctly?
The score ranges from -1 to 1:
 1 means perfect match - the clustering is exactly right.
 0 means random guess - no better than chance.
 Below 0 means worse than random - very poor clustering.
Adjusted Rand Index (ARI) is calculated as:
ARI = (RI − Expected RI) / (max(RI) − Expected RI)
where,
 RI is the Rand Index.
 Expected RI is the expected value of the Rand Index.
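
A tiny sketch with scikit-learn (the two label lists are hypothetical):

from sklearn.metrics import adjusted_rand_score

true_labels      = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical ground truth
predicted_labels = [1, 1, 1, 0, 0, 2, 2, 2, 2]   # hypothetical clustering output

# ARI ignores the label names and only compares how pairs of points are grouped
print("ARI:", adjusted_rand_score(true_labels, predicted_labels))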

5. Mutual Information (MI)


Mutual Information measures how much two variables are related or
connected. In clustering, it compares how much the true cluster labels
match with the predicted labels. It shows how much knowing about one
variable helps us predict the other. The more agreement there is, the
higher the score.
 Higher values mean better agreement between the clusters.
Zero means no agreement at all.

MI between true labels Y and predicted labels Z is calculated as:
MI(Y, Z) = Σ_i Σ_j p(y_i, z_j) · log( p(y_i, z_j) / (p(y_i) · p(z_j)) )
where,
 y_i is a true label.
 z_j is a predicted label.
 p(y_i, z_j) is the joint probability of y_i and z_j.
 p(y_i) and p(z_j) are the marginal probabilities.
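
A tiny sketch with scikit-learn (the same hypothetical label lists as in the ARI example; the normalised variant is shown as well because raw MI is not bounded by 1):

from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

true_labels      = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical ground truth
predicted_labels = [1, 1, 1, 0, 0, 2, 2, 2, 2]   # hypothetical clustering output

print("MI :", mutual_info_score(true_labels, predicted_labels))
# NMI rescales MI to [0, 1], which makes results easier to compare
print("NMI:", normalized_mutual_info_score(true_labels, predicted_labels))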
These clustering metrics help in evaluating the quality and performance of
clustering algorithms, allowing for informed decisions when selecting the
most suitable clustering solution for a given dataset.

**************************************************
