Clustering
Machine Learning
Data we will work with
• Customer Spend Data
• AVG_Mthly_Spend: the average monthly amount spent by a customer
• No_of_Visits: the number of times the customer visited in a month
• Item Counts: counts of Apparel, Fruits and Vegetables, and Staple items purchased
• Can we cluster similar customers together?
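Below is a purely illustrative sketch of what such a dataset might look like in pandas; the column names and values are hypothetical stand-ins for the fields described above, not the actual course data.

```python
import pandas as pd

# Hypothetical sample mirroring the fields above; values are made up.
spend = pd.DataFrame({
    "AVG_Mthly_Spend": [250.0, 1200.5, 85.0, 640.0, 310.0],
    "No_of_Visits":    [3, 12, 1, 6, 4],
    "Apparel":         [1, 4, 0, 2, 1],    # item counts
    "Fruits_Veg":      [5, 20, 2, 10, 7],
    "Staples":         [3, 15, 1, 8, 5],
})
print(spend.describe())
```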
Connectivity Based: Hierarchical Clustering
• Hierarchical clustering techniques create clusters in a hierarchical, tree-like structure
• Any distance measure can be used as the measure of similarity
• The tree-like output is called a dendrogram
• Techniques either start with individual objects and sequentially combine them (agglomerative), or start from one cluster of all objects and sequentially divide it (divisive)
Agglomerative
• Starts with each object as its own cluster of one record
• Sequentially merges the two closest clusters, using distance as the measure of similarity (see the sketch below)
• How would we measure the distance between two clusters?
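A minimal sketch with scikit-learn, reusing the hypothetical `spend` frame from earlier; standardizing first matters because distance-based methods are otherwise dominated by the feature with the largest units.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Standardize so no single feature dominates the distance measure.
X = StandardScaler().fit_transform(spend)

# Bottom-up (agglomerative) clustering; Ward linkage is the default.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)
print(labels)  # cluster id for each customer
```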
Distance between clusters
• Single linkage – minimum distance, or nearest neighbor
• Complete linkage – maximum distance, or farthest neighbor
• Average linkage – average of the distances between all pairs of points across the two clusters
• Centroid method – merge the clusters with the minimum distance between their centroids
• Ward’s method – merge the clusters whose merger gives the smallest increase in within-cluster variance
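These linkage criteria can be compared directly in SciPy; a sketch, reusing the scaled matrix `X` from the agglomerative example:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Build and plot the merge tree (dendrogram) for each linkage criterion.
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)
    plt.figure()
    dendrogram(Z)
    plt.title(f"{method} linkage")
plt.show()
```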
Distance between objects
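The slide's figure is not reproduced here. For reference, the two most common per-object distance measures are Euclidean and Manhattan distance; a small numeric check:

```python
import numpy as np

a = np.array([2.0, 3.0])
b = np.array([5.0, 7.0])

euclidean = np.sqrt(((a - b) ** 2).sum())  # sqrt(3^2 + 4^2) = 5.0
manhattan = np.abs(a - b).sum()            # 3 + 4 = 7.0
print(euclidean, manhattan)
```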
Centroid based: K-Means Clustering
• K-Means is probably the most widely used clustering technique
• It aims to partition the n observations into k clusters so as to minimize the within-cluster sum of squares (i.e. variance)
• Computationally less expensive than hierarchical techniques
• K, the number of clusters, must be specified in advance
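A minimal scikit-learn sketch, again assuming the scaled matrix `X` from earlier:

```python
from sklearn.cluster import KMeans

# k must be fixed up front; n_init reruns from several random
# initializations and keeps the lowest-WSS result.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)
print(km.cluster_centers_)  # one centroid per cluster
print(km.inertia_)          # within-cluster sum of squares (WSS)
```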
Lloyd’s algorithm
1. Assume K centroids
2. Compute the squared Euclidean distance of each object to these K centroids, and assign each object to the closest centroid, forming clusters
3. Compute the new centroid (mean) of each cluster based on the objects assigned to it
4. Repeat steps 2 and 3 until convergence, usually defined as the point at which no objects move between clusters
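A bare-bones NumPy sketch of the four steps above (empty clusters are not handled; a real implementation would reseed them):

```python
import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: assume K centroids (k distinct points drawn from X).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Step 2: squared Euclidean distance of every object to every
        # centroid; assign each object to its closest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 4: converged once no object changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its cluster.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids
```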
Choosing the optimal K
• Usually subjective, based on striking a good balance between compression and accuracy
• The “elbow” method is commonly used (see the sketch below)
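A sketch of the elbow plot with scikit-learn, using the WSS (exposed as `inertia_`):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
wss = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
       for k in ks]
plt.plot(ks, wss, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("within-cluster sum of squares")
plt.show()  # look for the "elbow" where the curve flattens
```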
K-Means
Strengths:
• Uses simple principles without the need for complex statistical terms
• Once clusters and their associated centroids are identified, it is easy to assign new objects (for example, new customers) to a cluster based on the object's distance from the closest centroid
• Because the method is unsupervised, using k-means helps to eliminate subjectivity from the analysis

Weaknesses:
• How to choose K?
• The k-means algorithm is sensitive to the starting positions of the initial centroids, so it is important to rerun the analysis several times for a given value of k to ensure the cluster results achieve the overall minimum WSS
• Susceptible to the curse of dimensionality
Visual Analysis of Clustering
• Visual analysis of the attributes selected for clustering may give an idea of the range of values over which K should be evaluated (see the pair-plot sketch below)
• Identifying the attributes on which clusters are clearly demarcated, and adding them in incremental order to build multi-dimensional clusters, is likely to give much better clusters than using all the attributes at one go
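One common way to do this visual pass is a pair plot of the attributes; a sketch with seaborn on the hypothetical `spend` frame:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise scatter plots: visibly separated groups suggest which
# attributes demarcate clusters and a plausible range for K.
sns.pairplot(spend)
plt.show()
```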
Dynamic Clustering
• Clustering on the correct attributes is the key to good clustering results
• We can also consider attributes whose values change with time, e.g. age, income category, years of work experience
• We can use sequential k-means clustering over time to track individual clusters (how they change in size, shape, and position); a warm-start sketch follows this list
• We can also understand how individual data points move across clusters, form new clusters, etc.
• By analyzing changes in the clusters over time, using metrics such as cluster size, new entries, and exits, one can assess the impact of strategies designed from earlier clustering analysis
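A minimal warm-start sketch of sequential k-means, assuming `snapshots` is a hypothetical list of per-period feature matrices with identical columns:

```python
from sklearn.cluster import KMeans

prev_centroids = None
for t, X_t in enumerate(snapshots):
    if prev_centroids is None:
        km = KMeans(n_clusters=3, n_init=10, random_state=42)
    else:
        # Seed with last period's centroids so cluster identities
        # can be tracked as they drift over time.
        km = KMeans(n_clusters=3, init=prev_centroids, n_init=1)
    km.fit(X_t)
    prev_centroids = km.cluster_centers_
    print(t, km.cluster_centers_)  # watch size/position drift per period
```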