Computer Applications in Engineering Design
Assignment 02
Abdur Rahman 180380
December 2, 2019
Implementation of K-mean Clustering Algorithm in Python
Submitted To: Dr. Habib
Contents
1 What is K-mean Clustering?
1.1 K-mean Algorithm:
2 Implementation of K-mean Algorithm in Python
2.1 Flow Chart
2.2 Pseudo code
1 What is K-mean Clustering?
Clustering is a technique for finding sub-groups within a given set of
data. The K-means algorithm is an iterative algorithm that tries to partition
the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters),
where each data point belongs to exactly one group.
1.1 K-mean Algorithm:
• Specify the number of clusters K.
• Initialize the centroids by first shuffling the dataset and then randomly
selecting K data points as the centroids, without replacement.
• Keep iterating the following three steps until the centroids no longer
change, i.e. until the assignment of data points to clusters stops changing.
• Compute the squared distance between each data point and every
centroid.
• Assign each data point to the closest cluster (centroid).
• Recompute the centroid of each cluster by taking the average of all
data points that belong to that cluster.
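The steps above can be sketched in plain Python as follows. This is only a minimal illustration, not the submitted implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the choice to leave a centroid in place when its cluster becomes empty are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Shuffling the data and taking K points without replacement is what
    # random.sample does in a single call.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None

    for _ in range(max_iter):
        # Assign each point to the centroid at the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]) ** 2)
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels

        # Recompute each centroid as the average of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels


# Small made-up data set with two visibly separate groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2, seed=0)
print(centroids, labels)
```

Because the initial centroids are chosen at random, different seeds can lead to different final clusters; the loop only guarantees that every point ends up assigned to its nearest centroid.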
2 Implementation of K-mean Algorithm in Python
2.1 Flow Chart
2.2 Pseudo code
• Read data from text files into arrays.
• Concatenate the above arrays into one array variable.
• Get the maximum and minimum values of the given points.
• Generate k random clustering points within the range between the minimum
and maximum values found in the previous step.
• Compute the distance from every data point to every clustering point.
• Compare these distances and assign each data point to the cluster of the
clustering point nearest to it.
• Plot the points and pause for a few seconds for visualization.
• Move each clustering point to its new position, the mean of the points in
its cluster.
• Measure the difference between the old and new positions of the clustering
points.
• Repeat the above process until this difference becomes very small.
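One possible reading of this pseudo code is sketched below in plain Python. Several details are assumptions rather than part of the assignment: the names `load_points`, `show` and `cluster`, the file format (one point per line, whitespace-separated coordinates), taking the minimum and maximum per coordinate, the tolerance `tol`, and leaving a clustering point where it is when no data point is assigned to it. Plotting uses matplotlib and is skipped unless `pause` is greater than zero.

```python
import math
import random


def load_points(paths):
    """Read one point per line (whitespace-separated numbers) from each file."""
    points = []
    for path in paths:
        with open(path) as fh:
            # Extending a single list plays the role of concatenating the arrays.
            points.extend(tuple(map(float, line.split())) for line in fh if line.strip())
    return points


def show(points, labels, centers, pause):
    """Scatter-plot the first two coordinates, then pause (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so clustering works without it

    plt.clf()
    plt.scatter([p[0] for p in points], [p[1] for p in points], c=labels)
    plt.scatter([c[0] for c in centers], [c[1] for c in centers], c="red", marker="x")
    plt.pause(pause)


def cluster(points, k, tol=1e-6, max_iter=100, seed=None, pause=0.0):
    rng = random.Random(seed)
    dims = range(len(points[0]))
    # Minimum and maximum of the data along each coordinate (an assumption).
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    # k random clustering points inside that range.
    centers = [tuple(rng.uniform(lo[d], hi[d]) for d in dims) for _ in range(k)]

    labels = []
    for _ in range(max_iter):
        # Distance from every data point to every clustering point; keep the nearest.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]

        if pause > 0:
            show(points, labels, centers, pause)

        # Move each clustering point to the mean of the points in its cluster.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            )

        # Largest distance any clustering point moved in this pass.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels
```

With two hypothetical input files, a call such as `cluster(load_points(["a.txt", "b.txt"]), k=3, pause=0.5)` would redraw the plot after every pass. Because the starting positions are drawn from the data range rather than from the data itself, a clustering point can start far from every data point and end up with an empty cluster, which is why the sketch leaves such a point unchanged.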