
ML-12


Lecture No. 12, 08-01-2025

Course: Machine Learning.

Instructor:
Nazish Bashir
Lecturer (Computer Science)

1/8/2025
K-Means

• It is an iterative algorithm that divides an unlabeled dataset into K different clusters in such a way that each data point belongs to exactly one group of points with similar properties.
• K-means clustering is an unsupervised learning technique that groups unlabeled data by their features rather than by predefined categories.
• The variable K is the number of groups (clusters) to be created.

K-Means

• The goal is to split the data into K different clusters and report the location of the center of mass (centroid) of each cluster. A new data point can then be assigned to a cluster (class) based on the closest center of mass.
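As a minimal sketch of assigning a new point to the closest center of mass, the Python snippet below assumes Euclidean distance and uses made-up centroid values; `nearest_cluster` is a hypothetical helper name, not part of any library.

```python
import math

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance assumed)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids for K = 3 clusters in 2-D (illustrative values only)
centroids = [(2.0, 9.0), (6.0, 4.5), (1.5, 3.5)]
print(nearest_cluster((5.0, 5.0), centroids))  # → 1
```

The new point (5, 5) is about 1.12 units from the second centroid, closer than to the other two, so it receives cluster index 1.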
• A major advantage of this approach is that it reduces human bias in defining the groups.
• Instead of a researcher creating classification groups, the algorithm forms its own clusters from the data itself rather than from assumptions (although K and the distance function are still chosen by the user).
How does K-Means Clustering Work?

Each cluster's centroid is a collection of feature values that defines the resulting group. Examining the centroid's feature values can help to interpret qualitatively what kind of group each cluster represents [2].
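To illustrate reading a centroid's feature values, the sketch below computes the mean of each feature for the members of one cluster. The feature names `annual_spend` and `visits_per_month` and the data points are hypothetical, chosen only to show how a centroid can be read as a profile of its group.

```python
from statistics import mean

# Hypothetical feature names, for illustration only
FEATURES = ["annual_spend", "visits_per_month"]

def describe_centroid(members):
    """Return a cluster's centroid as {feature name: mean value over the members}."""
    return {name: mean(row[i] for row in members) for i, name in enumerate(FEATURES)}

# Hypothetical members of one cluster (toy data)
cluster = [(1200, 2), (1000, 4), (1400, 3)]
print(describe_centroid(cluster))
```

Here the centroid has an average spend of 1200 and about 3 visits per month, which a person could then label, for example, as "regular moderate spenders".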
Data assignment: Each cluster is defined by its centroid (the central collection of feature values). Each data point is then assigned to its nearest centroid, based on some choice of distance function (for example, Euclidean distance).
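A minimal sketch of the data-assignment step, assuming points and centroids are tuples of numbers. The distance function is a parameter (Euclidean by default via `math.dist`), reflecting the "some choice of distance function" in the text; `assign_points` and `manhattan` are hypothetical names.

```python
import math

def manhattan(a, b):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign_points(points, centroids, distance=math.dist):
    """Assignment step: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centroids = [(1, 1), (8, 8)]
print(assign_points(points, centroids))                      # → [0, 0, 1, 1]
print(assign_points(points, centroids, distance=manhattan))  # → [0, 0, 1, 1]
```

Passing the distance as an argument keeps the assignment logic unchanged when a different metric is required, as in the worked example at the end of this lecture.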
Cont…

Centroid update: After all data points are assigned, each centroid is recalculated as the mean of all data points assigned to its cluster.
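The centroid update can be sketched as a per-cluster mean. This is a minimal version assuming integer cluster labels `0..K-1`; keeping the previous centroid when a cluster ends up empty is one simple convention assumed here, not something the lecture specifies.

```python
def update_centroids(points, labels, old_centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    k, dims = len(old_centroids), len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += p[d]
    new_centroids = []
    for j in range(k):
        if counts[j] == 0:
            # Empty cluster: keep its previous centroid (an assumed convention)
            new_centroids.append(tuple(old_centroids[j]))
        else:
            new_centroids.append(tuple(s / counts[j] for s in sums[j]))
    return new_centroids

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
print(update_centroids(points, [0, 0, 1, 1], [(1, 1), (8, 8)]))
# → [(1.0, 1.5), (8.5, 8.0)]
```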
Repeat: The assignment and update steps repeat until a stopping criterion is met, such as no change in the clusters, no further decrease in the sum of the distances, or a maximum number of iterations being reached.
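The three stopping criteria listed above can be combined into one check. This is a sketch under assumptions: `should_stop`, `max_iter`, and the tolerance `tol` are hypothetical names and defaults, and "sum of distances" is tracked as a single number per iteration (pass `float("inf")` as the old value on the first iteration).

```python
def should_stop(old_labels, new_labels, old_total, new_total, iteration,
                max_iter=100, tol=1e-9):
    """Return True if any of the stopping criteria is met."""
    if new_labels == old_labels:          # no change to the clusters
        return True
    if old_total - new_total <= tol:      # sum of distances no longer decreasing
        return True
    return iteration >= max_iter          # maximum iteration threshold reached

print(should_stop([0, 1], [0, 1], 5.0, 5.0, 3))    # True: clusters unchanged
print(should_stop([0, 0], [0, 1], 9.0, 4.0, 3))    # False: still improving
print(should_stop([0, 0], [0, 1], 9.0, 4.0, 100))  # True: iteration limit hit
```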
❑The k-means clustering algorithm mainly performs two tasks:
❑Determines the best values (positions) for the K center points, or centroids, through an iterative process.
❑Assigns each data point to its closest center. The data points nearest to a particular center form a cluster.
❑Hence the data points in each cluster share some commonalities, and each cluster is separated from the other clusters.
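One common way to measure how good the centroid positions are is the within-cluster sum of squared Euclidean distances, which standard K-means reduces at every assignment and update step. The sketch below assumes that objective; `sse` is a hypothetical helper name.

```python
import math

def sse(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances to the assigned centroids."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels = [0, 0, 1, 1]
print(sse(points, labels, [(1, 1), (8, 8)]))          # → 2.0 (initial centroids)
print(sse(points, labels, [(1.0, 1.5), (8.5, 8.0)]))  # → 1.0 (after moving to the means)
```

Moving each centroid to the mean of its points lowers the total from 2.0 to 1.0, which is why the update step uses the mean.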
❑The diagram below illustrates how the K-means clustering algorithm works:
How does the K-Means Algorithm Work?

The K-Means algorithm works in the following steps:
Step-1: Select the number K, which decides the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)
Step-3: Calculate the distance from every data point to each of the K centroids.
Step-4: Based on these distance values, assign each point to the cluster of its nearest centroid.
Cont…

Step-5: Update each cluster's centroid position by taking the mean of the locations of the points assigned to it.
Step-6: If any centroid location changed, repeat from Step-3. When the newly calculated centroids stay the same, the cluster memberships and centroids are final.
Step-7: The model is ready.
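Steps 1 to 7 can be sketched end to end as follows. This is a minimal plain-Python version, assuming Euclidean distance and, for Step-2, picking K distinct data points at random with a fixed seed (one common choice; the steps also allow centroids that are not data points). `k_means`, `max_iter`, and `seed` are hypothetical names.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means: random initial centroids, then alternate assignment
    and mean-update until the centroids stop moving (or max_iter is hit)."""
    rng = random.Random(seed)
    # Step-2: choose K distinct data points as the initial centroids
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Steps 3-4: assign every point to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Step-5: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Step-6: stop once no centroid moves
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Step-7: the model (centroids plus cluster labels) is ready
    return centroids, labels

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

For these two well-separated groups the sketch ends with one centroid near (1.33, 1.33) and the other near (8.33, 8.33); which group gets label 0 depends on the random initialization.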
Flow chart of K-means Clustering

START → Number of clusters K → Centroids → Distance of objects to centroids → Grouping based on minimum distance → No object moves group? Yes: END. No: return to Centroids.
Example 2 [5,6]
❑Cluster the following eight points (with (x, y) representing locations) into
three clusters:
❑A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
❑Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
❑The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as:
❑ρ(a, b) = |x2 - x1| + |y2 - y1| (the Manhattan distance)
❑Use the K-Means algorithm to find the three cluster centers after the second iteration.
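The exercise can be checked with a short script. It uses the eight points, the initial centers A1, A4, and A7, and the Manhattan distance exactly as given, and it recomputes each center as the mean of its points, following the centroid-update step in this lecture. `iterate` is a hypothetical helper name. Ties are broken in favor of the first center, although none affect this example.

```python
# Points and initial centers from the exercise
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centers = [(2, 10), (5, 8), (1, 2)]  # A1, A4, A7

def manhattan(a, b):
    """rho(a, b) = |x2 - x1| + |y2 - y1|"""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def iterate(centers):
    """One K-means iteration: assign by Manhattan distance, then take the means."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda j: manhattan(p, centers[j]))
        clusters[nearest].append(name)
    new_centers = []
    for members in clusters:  # no cluster becomes empty in this example
        xs = [points[n][0] for n in members]
        ys = [points[n][1] for n in members]
        new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centers

for it in (1, 2):
    clusters, centers = iterate(centers)
    print(f"Iteration {it}: clusters={clusters} centers={centers}")
```

After the first iteration the clusters are {A1}, {A3, A4, A5, A6, A8}, {A2, A7} with centers (2, 10), (6, 6), (1.5, 3.5). In the second iteration A8 moves to the first cluster, giving centers (3, 9.5), (6.5, 5.25), and (1.5, 3.5).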
Thank You
