Machine Learning
Sunbeam Infotech www.sunbeaminfo.com
Clustering
Finding groups (clusters) in unlabelled data: an exploratory data analysis technique.
Overview
§ Clustering is one of the most common exploratory data analysis techniques, used to get an intuition about the structure of unlabelled data
§ It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar, while data points in different clusters are very different
§ In other words, we try to find homogeneous subgroups within the data such that data points in each cluster are as similar as possible according to a similarity measure, such as Euclidean-based distance or correlation-based distance
§ The decision of which similarity measure to use is application-specific
§ Unlike supervised learning, clustering is considered an unsupervised learning method, since we don't have the ground truth (an output/dependent variable Y) against which to compare the output of the clustering algorithm and evaluate its performance
Overview
§ A typical clustering workflow has four steps; a sketch in code follows below
① Preprocess data (data cleaning, scaling)
② Select similarity measure
③ Cluster
④ Analyze (convert the clusters to classes, visualize the clusters)
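A minimal sketch of this workflow in Python with scikit-learn. The file name, column names, and k=3 are illustrative assumptions, not part of the slides.

# Hypothetical four-step workflow with scikit-learn
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# 1. Preprocess: data cleaning and scaling
df = pd.read_csv("customers.csv").dropna()                 # cleaning (assumed file)
X = StandardScaler().fit_transform(df[["age", "spend"]])   # scaling (assumed columns)

# 2./3. Similarity measure and clustering: KMeans uses Euclidean distance
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# 4. Analyze: convert clusters to classes and inspect them
df["cluster"] = model.labels_
print(df.groupby("cluster")[["age", "spend"]].mean())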
Use Cases
§ Marketing
§ Discovering groups in customer databases, e.g. who makes long-distance calls, who earns more, or who spends more
§ Insurance
§ Identifying groups of insurance policy holders with a high claim rate
§ Land use
§ Identification of areas of similar land use in a GIS database (at city, state, or country level)
Types
§ Exclusive clustering
§ An item belongs exclusively to one cluster, not to several
§ E.g. K-Means clustering
(Figure: items split into clusters C1 and C2 by age < 50 vs. age ≥ 50)
Types
§ Overlapping clustering
§ An item can belong to multiple clusters
§ Its degree of association with each cluster is known
§ E.g. Fuzzy C-means clustering
(Figure: two overlapping clusters, C1 and C2)
Types
§ Hierarchical clustering
§ When two clusters have a parent-child relationship
§ It forms a tree-like structure called a dendrogram
§ E.g. Hierarchical clustering
(Figure: dendrogram; the root covers all items, internal nodes such as C1, C2, C3 are clusters made of smaller clusters, e.g. a cluster with two smaller clusters or with four small clusters, and each leaf I1-I8 is a cluster with one item)
KMeans
Overview
§ KMeans is an iterative algorithm (it performs its steps multiple times) that tries to partition the dataset into K distinct non-overlapping subgroups (clusters), where each data point belongs to only one group
§ It tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different (far apart) as possible
§ It assigns data points to a cluster such that the sum of the squared distance between the data points and the cluster's centroid (arithmetic mean of all the data points that belong to that cluster) is at the minimum
§ The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster
§ The similarity measure here is distance
How does it work?
§ Specify the number of clusters K
§ Initialize centroids by first shuffling the dataset and then randomly selecting K data points for the centroids, without replacement
§ Keep iterating until there is no change to the centroids, i.e. the assignment of data points to clusters isn't changing:
§ Compute the sum of the squared distance between data points and all centroids
§ Assign each data point to the closest cluster (centroid)
§ Compute the centroid of each cluster by taking the average of all the data points that belong to it
§ A from-scratch sketch of these steps follows below
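A compact from-scratch sketch of these steps in Python with NumPy; the function and variable names are my own, and it assumes no cluster ever ends up empty.

import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # initialize: pick k distinct data points as centroids (no replacement)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # squared distance of every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each point to its closest centroid
        # move each centroid to the mean of the points assigned to it
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):  # stop when the centroids settle
            break
        centroids = new
    return labels, centroids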
(Figure: scatter plot of data points assigned to two centroids, C1 and C2)
K-Means Clustering - Algorithm
§ Initialization
§ Randomly initialise two points, called the cluster centroids
§ Note that the initial centroids are not required to be points from the dataset; you may select centroids outside of the dataset as well
(Figure: dataset with two initial centroids, C1 and C2)
K-Means Clustering - Algorithm
§ Cluster Assignment
§ Compute the distance between each of the points and the centroids
§ Depending on the minimum distance from the centroids, divide the points into two clusters
(Figure: points split between the two centroids, C1 and C2)
K-Means Clustering - Algorithm
§ Move Centroid
§ Consider the older centroids as data points
§ Take the older centroids and iteratively reposition them for optimization
§ Optimization
§ Repeat the steps until the cluster centroids stop changing position
(Figure: centroids C1 and C2 repositioned toward the centres of their clusters)
K-Means Clustering - Algorithm
§ Convergence
§ Finally, the k-means clustering algorithm converges and divides the data points into clusters that are clearly visible
(Figure: the converged clusters)
K-Means Clustering - Example
§ Suppose we want to group the visitors to a website using just their age (one-dimensional space), as follows:
15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65
N = 19
K-Means Clustering - Example
§ Initial clusters (random centroids or average)
k = 2
c1 = 16
c2 = 22
K-Means Clustering - Example
age lizei-cil (n 181
=
§ Iteration I
Before:
-
W
c1 = 16
-
c2 = 22
-
After:
c1 = 15.33
-
c2 = 36.25
--
K-Means Clustering - Example
§ Iteration II
Before: c1 = 15.33, c2 = 36.25
After: c1 = 18.56, c2 = 45.9
K-Means Clustering - Example
§ Iteration III
Before: c1 = 18.56, c2 = 45.9
After: c1 = 19.50, c2 = 47.89
K-Means Clustering - Example
§ Iteration IV
Before: c1 = 19.50, c2 = 47.89
After: c1 = 19.50, c2 = 47.89
§ The centroids no longer change, so the algorithm has converged; the short script below reproduces the iterations
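A short script to reproduce the worked example. Ties are broken toward c2, which matches the numbers on these slides (note that age 19 is equidistant from the initial centroids 16 and 22).

ages = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]
c1, c2 = 16.0, 22.0
for it in range(1, 6):
    g1 = [x for x in ages if abs(x - c1) < abs(x - c2)]   # strictly closer to c1
    g2 = [x for x in ages if abs(x - c1) >= abs(x - c2)]  # ties go to c2
    n1, n2 = sum(g1) / len(g1), sum(g2) / len(g2)
    print(f"Iteration {it}: c1 = {n1:.2f}, c2 = {n2:.2f}")
    if (n1, n2) == (c1, c2):  # converged: centroids unchanged
        break
    c1, c2 = n1, n2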
K-Means Clustering
§ How to find the optimum number of clusters?
§ Elbow Method
§ Purpose Method (intuition-based)
Elbow Method
§ Total within-cluster variation: the sum of squared distances between every point and its cluster's centroid
§ Also known as Within Sum of Squares (WSS)
§ The sum of squared (Euclidean) distances between the items and the corresponding centroid, calculated over all clusters:
WSS = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
where C_j is the j-th cluster, \mu_j is its centroid, and x_i ranges over the points within the cluster
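A small sketch of this formula in NumPy (the function name is my own):

import numpy as np

def wss(X, labels, centroids):
    # sum, over all clusters, of the squared Euclidean distance
    # of each point to its own cluster's centroid
    return sum(((X[labels == j] - c) ** 2).sum()
               for j, c in enumerate(centroids))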
Elbow Method
§ Draw a curve of WSS (within sum of squares) against the number of clusters
§ It is called the elbow method because the curve looks like a human arm, and the elbow point gives us the optimum number of clusters
(Figure: WSS vs. number of clusters, with the elbow marked)
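A sketch of the elbow curve for the ages from the worked example, using scikit-learn's inertia_ attribute (which is exactly WSS):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# the ages from the worked example, as an (n_samples, 1) feature matrix
X = np.array([15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35,
              40, 41, 42, 43, 44, 60, 61, 65]).reshape(-1, 1)

ks = range(1, 8)
wss_values = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
              for k in ks]
plt.plot(ks, wss_values, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("WSS")
plt.show()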
Purpose Method
§ Get different clusters based on a variety of purposes
§ Partition the data on different metrics and see how well it performs for that particular case
§ Example: choosing how many garment sizes to offer
§ K=3: If you want to provide only 3 sizes (S, M, L) so that prices are cheaper, you will divide the data set into 3 clusters
§ K=5: If you want to provide more comfort and variety to your customers with more sizes (XS, S, M, L, XL), then you will divide the data set into 5 clusters
Applications
§ K-means is very popular and used in a variety of applications such as market segmentation, document clustering, image segmentation and image compression, etc.
Disadvantages
§ It always tries to construct a nice spherical shape around the centroid. That means that the minute the clusters have complicated geometric shapes, k-means does a poor job of clustering the data
§ It doesn't let data points that are far away from each other share the same cluster, even when they obviously belong to the same cluster
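A quick sketch of the first limitation using scikit-learn's make_moons, whose two true clusters are crescent-shaped rather than spherical:

from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
y_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# k-means cuts the moons with a straight centroid boundary,
# so its agreement with the true crescents stays well below 1.0
print(adjusted_rand_score(y_true, y_pred))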
Hierarchical Clustering
Hierarchical Clustering
§ Separating data into different groups based on some measure of similarity, e.g. distance
§ Types
§ Agglomerative
§ Divisive
(Figure: bottom-up linkage of items A-E into a dendrogram; the biggest cluster, at the root, contains all the items)
Hierarchical Clustering
§ Dendrogram
§ A diagram that shows the hierarchical relationship between objects
(Figure: dendrogram with a root at the top and each parent node above its children)
Agglomerative Clustering
§ Also called bottom-up clustering, as it builds clusters from the bottom up
§ Each data point starts in its own cluster
§ These clusters are then joined greedily, by taking the two most similar clusters together
Agglomerative Clustering
§ Start by assigning each item to a cluster
§ If you have N items, you now have N clusters, each containing just one item
§ Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one cluster less
§ Compute distances (similarities) between the new cluster and each of the old clusters
§ Repeat steps 2 and 3 until all items are clustered into a single root cluster of size N; a code sketch follows below
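A minimal sketch of these steps with SciPy (the sample points are made up):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[15], [16], [19], [22], [40], [43], [60], [65]])

# 'single' linkage merges the pair of clusters whose closest
# members are nearest; the linkage variants follow on the next slides
Z = linkage(X, method="single")
dendrogram(Z, labels=[str(int(v)) for v in X.ravel()])
plt.show()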
Agglomerative Clustering
§ Step 3 can be done in different ways; the choice determines how the clusters are merged
§ Single-linkage
§ Complete-linkage
§ Average-linkage
Agglomerative Clustering
§ Single linkage
§ Also known as nearest neighbour clustering
§ The distance between two groups is defined as the distance between their two closest members
§ It often yields clusters in which individuals are added sequentially to a single group
(Figure: the distance between two clusters measured between their closest members)
Agglomerative Clustering
§ Complete linkage
§ Also known as furthest neighbour clustering
§ The distance between two groups is defined as the distance between their two farthest-apart members
(Figure: the distance between two clusters measured between their farthest members)
Agglomerative Clustering
§ Average linkage
§ Also referred to as the unweighted pair-group method
§ The distance between two groups is defined as the average distance between each of their members
(Figure: the distance between two clusters as the average of the pairwise distances between their members)
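A sketch comparing the three linkage choices on the same made-up data with scikit-learn:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[15], [16], [19], [22], [40], [43], [60], [65]])

for link in ("single", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    print(link, labels)  # cluster assignment of each point under this linkage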
Divisive Clustering
§ Also called top-down clustering, as it splits clusters from the top down
§ All data points start in the same cluster (the root)
§ Then, using a parametric clustering algorithm like k-means, divide the cluster into multiple clusters
§ For each cluster, repeat the process to find sub-clusters until the desired number of clusters is found; a rough sketch follows below
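A rough sketch of the idea, repeatedly splitting the largest cluster in two with k-means (the function name and sample data are my own):

import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters, seed=0):
    clusters = [np.arange(len(X))]  # start with one root cluster of all points
    while len(clusters) < n_clusters:
        # split the largest cluster into two with k-means
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)
        labels = KMeans(n_clusters=2, n_init=10,
                        random_state=seed).fit_predict(X[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

X = np.array([15, 16, 19, 22, 40, 43, 60, 65], dtype=float).reshape(-1, 1)
for c in divisive(X, 3):
    print(X[c].ravel())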