
Data Mining: CLUSTERING
Antonius Reynard Affandi 14416018
Chintia Diah Permata 14416045
WHAT TASKS CAN DATA MINING ACCOMPLISH?
● Description
● Estimation
● Prediction
● Classification
● Clustering
● Association
Our Discussion
● What is Clustering
● Clustering Task
● Hierarchical Clustering
● K-Means Clustering
● Examples of Clustering Application
What is Clustering?
Grouping of records, observations, or cases into classes of similar objects.
A cluster is a collection of records that are similar to one another and dissimilar to records in other clusters.
WHAT TASKS ARE NEEDED TO PERFORM CLUSTERING?
Clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters, where the similarity of the records within a cluster is maximized and the similarity to records outside the cluster is minimized.
Similarity Measurement
How do we measure similarity?
Clustering groups records based on similarity, so we need a way to measure the relationship between objects. Similarity is measured by calculating the distance between objects (for metric data) or by determining association (for nonmetric data).
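For metric data, the usual choice is the Euclidean distance. A minimal sketch in plain Python (the two example records are illustrative, not taken from the slides):

    import math

    def euclidean_distance(x, y):
        # Square root of the summed squared coordinate differences.
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    # Example: distance between records (1, 3) and (2, 1).
    print(euclidean_distance((1, 3), (2, 1)))  # 2.236...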
Data Standardization
Why do we need to standardize the data? (Luai et al., 2016)
● Normalization before clustering is specifically needed for distance metrics, such as the Euclidean distance, that are sensitive to differences in the magnitude or scale of the attributes.
● In real applications, because attribute values vary widely in scale, one attribute might overpower another.
● Normalization prevents features with large values from outweighing features with smaller values. The aim is to equalize the magnitude and the variability of the features.
How to standardize the data
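The slide's formulas are not reproduced in this extract; two standard approaches are min-max normalization, X* = (X − min X) / (max X − min X), and z-score standardization, X* = (X − mean X) / sd X. A minimal numpy sketch (the ages variable is illustrative, not from the slides):

    import numpy as np

    def min_max_normalize(x):
        # Rescale values linearly onto the [0, 1] range.
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min())

    def z_score_standardize(x):
        # Center on the mean and scale by the standard deviation.
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    ages = [25, 30, 45, 50, 60]
    print(min_max_normalize(ages))   # ≈ [0, 0.143, 0.571, 0.714, 1]
    print(z_score_standardize(ages)) # mean 0, standard deviation 1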
Cluster Analysis Methods and Algorithms

Clustering Methods
● Hierarchical
● Non-Hierarchical

Hierarchical
A grouping technique that builds a hierarchical construction in a treelike structure; the grouping process is done gradually. A treelike cluster structure (dendrogram) is created through recursive partitioning (divisive methods) or combining (agglomerative methods) of existing clusters.
Several criteria exist for determining the distance between arbitrary clusters A and B (defined precisely below):
● Single linkage / nearest-neighbor approach
● Complete linkage / farthest-neighbor approach
● Average linkage
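These three criteria have standard definitions, where d(x, y) denotes the distance between a record x in cluster A and a record y in cluster B (in LaTeX notation):

    d_{\mathrm{single}}(A, B)   = \min_{x \in A,\, y \in B} d(x, y)
    d_{\mathrm{complete}}(A, B) = \max_{x \in A,\, y \in B} d(x, y)
    d_{\mathrm{average}}(A, B)  = \frac{1}{|A|\,|B|} \sum_{x \in A} \sum_{y \in B} d(x, y)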


Single linkage / nearest-neighbor approach [two slides of worked illustrations]
Complete linkage / farthest-neighbor approach [worked illustration slide]
How many clusters? The Stopping Rule (see the sketch below):
● Low within-cluster variability
● High between-cluster variability
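One way to apply this rule is to compute both quantities for each candidate number of clusters and stop where within-cluster variability is low relative to between-cluster variability (the idea behind criteria such as the pseudo-F statistic, which the slides do not spell out). A minimal numpy sketch, not taken from the slides:

    import numpy as np

    def cluster_variability(points, labels):
        # Within-cluster sum of squares (lower is better) and
        # between-cluster sum of squares (higher is better).
        points = np.asarray(points, dtype=float)
        labels = np.asarray(labels)
        overall_mean = points.mean(axis=0)
        wcss = bcss = 0.0
        for k in np.unique(labels):
            members = points[labels == k]
            centroid = members.mean(axis=0)
            wcss += ((members - centroid) ** 2).sum()
            bcss += len(members) * ((centroid - overall_mean) ** 2).sum()
        return wcss, bcss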
Example: Hierarchical clustering using MINITAB [four slides of Minitab screenshots]
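The Minitab session survives only as screenshots. For readers without Minitab, an equivalent agglomerative run can be sketched in Python with scipy (assumed available); the eight records are the same ones used in the k-means example later in this deck:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Records a-h, as in the k-means example below.
    data = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
                     [1, 2], [4, 2], [1, 1], [2, 1]])

    # Agglomerative clustering with single linkage on Euclidean distances.
    merges = linkage(data, method="single", metric="euclidean")

    # Cut the dendrogram into two clusters (Minitab's "final partition").
    print(fcluster(merges, t=2, criterion="maxclust"))
    # Groups {a, e, g, h} and {b, c, d, f}.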
Nonhierarchical: K-Means
● This method uses nearest-centroid sorting: records are grouped by the least distance between each record and the cluster center locations (using Euclidean distance).
● Suited to large numbers of records; easy to group.

The algorithm (a runnable sketch follows these steps):
Step 1: Ask the user how many clusters k the data set should be partitioned into.
Step 2: Randomly assign k records to be the initial cluster center locations.
Step 3: For each record, find the nearest cluster center. Thus, in a sense, each cluster center "owns" a subset of the records, thereby representing a partition of the data set. We therefore have k clusters, C1, C2, ..., Ck.
Step 4: For each of the k clusters, find the cluster centroid, and update the location of each cluster center to the new value of the centroid.
Step 5: Repeat steps 3 and 4 until convergence or termination (the centroids no longer change).
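A minimal numpy sketch of steps 3-5 (an illustration, not the deck's Minitab procedure); the initial centers are passed in explicitly so the worked example below can be reproduced:

    import numpy as np

    def k_means(points, centers, max_iter=100):
        # Assumes no cluster ever becomes empty.
        points = np.asarray(points, dtype=float)
        centers = np.asarray(centers, dtype=float)
        for _ in range(max_iter):
            # Step 3: assign each record to its nearest center (Euclidean).
            distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # Step 4: move each center to the centroid of its records.
            new_centers = np.array([points[labels == k].mean(axis=0)
                                    for k in range(len(centers))])
            # Step 5: stop once the centroids no longer change.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers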
Example: Nonhierarchical clustering

The eight records, with the coordinates used in the centroid calculations below, are: a = (1, 3), b = (3, 3), c = (4, 3), d = (5, 3), e = (1, 2), f = (4, 2), g = (1, 1), h = (2, 1).

Step 1: There are k clusters, where k = 2.

(Round one)
Step 2: The initial cluster centroids are m1 = (1, 1) and m2 = (2, 1).
Step 3: Find the nearest centroid for each record:
C1 = (a, e, g), with m1 as its cluster centroid
C2 = (b, c, d, f, h), with m2 as its cluster centroid
Step 4: Find the cluster centroids, update them, and begin round two:
C1 centroid = ((1+1+1)/3, (3+2+1)/3) = (1, 2)
C2 centroid = ((3+4+5+4+2)/5, (3+3+3+2+1)/5) = (3.6, 2.4)

(Round two)
Step 2: The cluster centroids are now m1 = (1, 2) and m2 = (3.6, 2.4).
Step 3: Find the nearest centroid for each record; record h now moves to C1.
Step 4: Find the cluster centroids, update them, and begin round three:
New m1 = ((1+1+1+2)/4, (3+2+1+1)/4) = (1.25, 1.75)
New m2 = ((3+4+5+4)/4, (3+3+3+2)/4) = (4, 2.75)

The iterations continue until the cluster centroids no longer change. In round three no record changes cluster, so the process terminates with the centroids above and the final grouping:
C1 = (a, e, g, h) and C2 = (b, c, d, f)
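As a check, the same example can be run with scikit-learn (assumed available, not used in the deck), seeding K-means with the round-one centroids:

    import numpy as np
    from sklearn.cluster import KMeans

    # Records a-h from the worked example above.
    points = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
                       [1, 2], [4, 2], [1, 1], [2, 1]])

    # Seed with the initial centroids (1, 1) and (2, 1); n_init=1
    # because the starting centers are given explicitly.
    km = KMeans(n_clusters=2, init=np.array([[1.0, 1.0], [2.0, 1.0]]), n_init=1)
    km.fit(points)
    print(km.labels_)           # a, e, g, h together; b, c, d, f together
    print(km.cluster_centers_)  # [[1.25 1.75] [4.   2.75]]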
Application of K-Means Clustering using Minitab [two slides of Minitab screenshots]