
Data Mining: CLUSTERING
Antonius Reynard Affandi 14416018
Chintia Diah Permata 14416045
WHAT TASKS CAN DATA MINING ACCOMPLISH?
● Description
● Estimation
● Prediction
● Classification
● Clustering
● Association
Our Discussion
● What is Clustering
● Clustering Task
● Hierarchical Clustering
● K-Means Clustering
● Examples of Clustering Application
What is Clustering?
Grouping of records, observations, or cases into classes of similar objects.
A cluster is a collection of records that are similar to one another and dissimilar to records in other clusters.
WHAT TASKS ARE NEEDED TO PERFORM CLUSTERING?
Clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters, where the similarity of the records within a cluster is maximized and the similarity to records outside the cluster is minimized.
Similarity Measurement
How do we measure similarity?
Clustering groups records based on similarity, so we need a way to measure the relationship between objects. Similarity is measured by calculating the distance between objects (for metric data) or by determining association (for nonmetric data).
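For metric data, the usual choice is the Euclidean distance. A minimal sketch in plain Python (the two example records are illustrative, not taken from the slides):

    import math

    def euclidean_distance(x, y):
        # Square root of the summed squared coordinate differences.
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    # Example: distance between records (1, 3) and (2, 1).
    print(euclidean_distance((1, 3), (2, 1)))  # 2.236...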
Data Standardization
Why do we need to standardize the data? (Luai et al., 2016)
● Normalization before clustering is specifically needed for distance metrics, such as the Euclidean distance, that are sensitive to differences in the magnitude or scale of the attributes.
● In real applications, because attribute values vary widely in scale, one attribute might overpower another.
● Normalization prevents features with large values from outweighing features with smaller values. The aim is to equalize the magnitude and the variability of the features.
How to standardize the data
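The slide's formulas are not reproduced in this extract; two standard approaches are min-max normalization, X* = (X − min X) / (max X − min X), and z-score standardization, X* = (X − mean X) / sd X. A minimal numpy sketch (the ages variable is illustrative, not from the slides):

    import numpy as np

    def min_max_normalize(x):
        # Rescale values linearly onto the [0, 1] range.
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min())

    def z_score_standardize(x):
        # Center on the mean and scale by the standard deviation.
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    ages = [25, 30, 45, 50, 60]
    print(min_max_normalize(ages))   # ≈ [0, 0.143, 0.571, 0.714, 1]
    print(z_score_standardize(ages)) # mean 0, standard deviation 1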
Cluster Analysis Methods and Algorithms

Clustering Methods
● Hierarchical
● Non-Hierarchical

Hierarchical
A grouping technique that builds a hierarchical construction in a treelike structure; the grouping process is done gradually. A treelike cluster structure (dendrogram) is created through recursive partitioning (divisive methods) or combining (agglomerative methods) of existing clusters.
Several criteria exist for determining the distance between arbitrary clusters A and B (defined precisely below):
● Single linkage / nearest-neighbor approach
● Complete linkage / farthest-neighbor approach
● Average linkage
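These three criteria have standard definitions, where d(x, y) denotes the distance between a record x in cluster A and a record y in cluster B (in LaTeX notation):

    d_{\mathrm{single}}(A, B)   = \min_{x \in A,\, y \in B} d(x, y)
    d_{\mathrm{complete}}(A, B) = \max_{x \in A,\, y \in B} d(x, y)
    d_{\mathrm{average}}(A, B)  = \frac{1}{|A|\,|B|} \sum_{x \in A} \sum_{y \in B} d(x, y)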


Single linkage / nearest-neighbor approach [two slides of worked illustrations]
Complete linkage / farthest-neighbor approach [worked illustration slide]
How many clusters? The Stopping Rule (see the sketch below):
● Low within-cluster variability
● High between-cluster variability
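One way to apply this rule is to compute both quantities for each candidate number of clusters and stop where within-cluster variability is low relative to between-cluster variability (the idea behind criteria such as the pseudo-F statistic, which the slides do not spell out). A minimal numpy sketch, not taken from the slides:

    import numpy as np

    def cluster_variability(points, labels):
        # Within-cluster sum of squares (lower is better) and
        # between-cluster sum of squares (higher is better).
        points = np.asarray(points, dtype=float)
        labels = np.asarray(labels)
        overall_mean = points.mean(axis=0)
        wcss = bcss = 0.0
        for k in np.unique(labels):
            members = points[labels == k]
            centroid = members.mean(axis=0)
            wcss += ((members - centroid) ** 2).sum()
            bcss += len(members) * ((centroid - overall_mean) ** 2).sum()
        return wcss, bcss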
Example: Hierarchical clustering using MINITAB [four slides of Minitab screenshots]
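The Minitab session survives only as screenshots. For readers without Minitab, an equivalent agglomerative run can be sketched in Python with scipy (assumed available); the eight records are the same ones used in the k-means example later in this deck:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Records a-h, as in the k-means example below.
    data = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
                     [1, 2], [4, 2], [1, 1], [2, 1]])

    # Agglomerative clustering with single linkage on Euclidean distances.
    merges = linkage(data, method="single", metric="euclidean")

    # Cut the dendrogram into two clusters (Minitab's "final partition").
    print(fcluster(merges, t=2, criterion="maxclust"))
    # Groups {a, e, g, h} and {b, c, d, f}.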
Nonhierarchical: K-Means
● This method uses nearest-centroid sorting: records are grouped by the least distance between each record and the cluster center locations (using Euclidean distance).
● Suited to large numbers of records; easy to group.

The algorithm (a runnable sketch follows these steps):
Step 1: Ask the user how many clusters k the data set should be partitioned into.
Step 2: Randomly assign k records to be the initial cluster center locations.
Step 3: For each record, find the nearest cluster center. Thus, in a sense, each cluster center "owns" a subset of the records, thereby representing a partition of the data set. We therefore have k clusters, C1, C2, ..., Ck.
Step 4: For each of the k clusters, find the cluster centroid, and update the location of each cluster center to the new value of the centroid.
Step 5: Repeat steps 3 and 4 until convergence or termination (the centroids no longer change).
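A minimal numpy sketch of steps 3-5 (an illustration, not the deck's Minitab procedure); the initial centers are passed in explicitly so the worked example below can be reproduced:

    import numpy as np

    def k_means(points, centers, max_iter=100):
        # Assumes no cluster ever becomes empty.
        points = np.asarray(points, dtype=float)
        centers = np.asarray(centers, dtype=float)
        for _ in range(max_iter):
            # Step 3: assign each record to its nearest center (Euclidean).
            distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # Step 4: move each center to the centroid of its records.
            new_centers = np.array([points[labels == k].mean(axis=0)
                                    for k in range(len(centers))])
            # Step 5: stop once the centroids no longer change.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers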
Example: Nonhierarchical clustering

The eight records, with the coordinates used in the centroid calculations below, are: a = (1, 3), b = (3, 3), c = (4, 3), d = (5, 3), e = (1, 2), f = (4, 2), g = (1, 1), h = (2, 1).

Step 1: There are k clusters, where k = 2.

(Round one)
Step 2: The initial cluster centroids are m1 = (1, 1) and m2 = (2, 1).
Step 3: Find the nearest centroid for each record:
C1 = (a, e, g), with m1 as its cluster centroid
C2 = (b, c, d, f, h), with m2 as its cluster centroid
Step 4: Find the cluster centroids, update them, and begin round two:
C1 centroid = ((1+1+1)/3, (3+2+1)/3) = (1, 2)
C2 centroid = ((3+4+5+4+2)/5, (3+3+3+2+1)/5) = (3.6, 2.4)

(Round two)
Step 2: The cluster centroids are now m1 = (1, 2) and m2 = (3.6, 2.4).
Step 3: Find the nearest centroid for each record; record h now moves to C1.
Step 4: Find the cluster centroids, update them, and begin round three:
New m1 = ((1+1+1+2)/4, (3+2+1+1)/4) = (1.25, 1.75)
New m2 = ((3+4+5+4)/4, (3+3+3+2)/4) = (4, 2.75)

The iterations continue until the cluster centroids no longer change. In round three no record changes cluster, so the process terminates with the centroids above and the final grouping:
C1 = (a, e, g, h) and C2 = (b, c, d, f)
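As a check, the same example can be run with scikit-learn (assumed available, not used in the deck), seeding K-means with the round-one centroids:

    import numpy as np
    from sklearn.cluster import KMeans

    # Records a-h from the worked example above.
    points = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
                       [1, 2], [4, 2], [1, 1], [2, 1]])

    # Seed with the initial centroids (1, 1) and (2, 1); n_init=1
    # because the starting centers are given explicitly.
    km = KMeans(n_clusters=2, init=np.array([[1.0, 1.0], [2.0, 1.0]]), n_init=1)
    km.fit(points)
    print(km.labels_)           # a, e, g, h together; b, c, d, f together
    print(km.cluster_centers_)  # [[1.25 1.75] [4.   2.75]]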
Application of K-Means Clustering using Minitab [two slides of Minitab screenshots]