Unsupervised Learning
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda
1 Why do we need Machine Learning?
2 What is Machine Learning?
3 Life cycle to build a model with ML
4 What is unsupervised Learning?
5 What is Clustering?
6 Types of clustering
7 K-Means Clustering
8 Demo on K-Means Clustering
Why do we need Machine Learning?
Why is machine learning becoming more popular these days?
• In the past, most data was available in a structured format. As the volume of data grows, the share of structured data keeps shrinking, so we need data science techniques to handle this massive amount of data.
• This data can be used to extract useful business insights and hidden trends.
• These insights help the organization predict the future.
• They help reduce production costs.
• Models built from the data give the machine the ability to make predictions on its own.
What is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that allows a system to automatically learn and improve from experience without being explicitly programmed.
Training Data → Model Building → Testing Data
Traditional vs Machine Learning
Traditional Programming: Data + Program → Output
Machine Learning: Data + Output → Model
Machine Learning
Process to train a Machine Learning model
Life cycle of Machine Learning
1 Understand the business problem
2 Data Acquisition
3 Data Cleaning
4 Exploratory Data Analysis
5 Machine Learning Algorithm
6 Predict your model accuracy
7 Deploy the model
Types of Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Unsupervised Learning
Problem to be resolved: Classifying unstructured and unlabeled data into different categories, or predicting the class of unlabeled and unstructured data
Solution: This is where supervised learning fails and unsupervised learning algorithms come into the picture. The unsupervised learning algorithm clusters the input data into different classes on the basis of their statistical properties.
Example: Cluster different bikes based upon their speed limit, acceleration, and average
Training data for unsupervised learning is a collection of information without any labels
Unsupervised Learning - Example
• A set of relevant information is fed into the system
• The system identifies different types of bikes using features like color, size, speed limit, average, etc., and categorizes them
• When a new bike is shown, the system analyses its features and puts it into the category with similar items
The groups depend on the attributes used
What Is Clustering?
Clustering is the process of dividing a dataset into groups consisting of similar data points
NOTE: Points within the same cluster are similar to each other but different from points in other clusters
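The note above can be checked numerically. The following is a minimal sketch, assuming hypothetical 2-D points and Euclidean distance as the similarity measure (the slides do not fix either): the average distance between points in the same group should come out smaller than the average distance across groups.

```python
import math

# Hypothetical 2-D points, already split into two groups for illustration.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_pairwise_distance(points_1, points_2):
    """Average Euclidean distance over all pairs of distinct points."""
    distances = [math.dist(p, q) for p in points_1 for q in points_2 if p != q]
    return sum(distances) / len(distances)


within_a = mean_pairwise_distance(cluster_a, cluster_a)
between = mean_pairwise_distance(cluster_a, cluster_b)
print(f"within cluster A: {within_a:.2f}, between A and B: {between:.2f}")
```

For a good clustering, `within_a` is clearly smaller than `between`.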
Example Of Clustering
Example 1: Cluster of different colors of fruits
Example 2: Cluster of different types of garbage
Why Is Clustering Needed?
1 Determine the intrinsic grouping in a set of unlabeled data
2 Organize data into clusters, thereby showing the internal structure of the data
3 Create partitions in the dataset
Types Of Clustering
Clustering
Exclusive Clustering Overlapping Clustering Hierarchical Clustering
Exclusive Clustering
An item belongs exclusively to one cluster, not to several.
Ex: K-means clustering
(Figure: clusters labeled Cluster 0, Cluster 1, and Cluster 2)
Overlapping Clustering
An item can belong to multiple clusters.
Ex: Fuzzy C-means clustering
(Figure: overlapping clusters labeled Cluster 1 and Cluster 2)
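The difference between exclusive and overlapping clustering can be sketched for a single point. This is an illustration only, assuming hypothetical cluster centers, Euclidean distance, and the usual fuzzy C-means membership formula with fuzzifier `m = 2`; none of these values come from the slides.

```python
import math


def hard_assignment(point, centers):
    """Exclusive clustering: the point gets exactly one cluster index."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))


def fuzzy_memberships(point, centers, m=2.0):
    """Overlapping clustering: one membership degree per cluster, summing to 1.

    Uses the fuzzy C-means membership formula; m must be greater than 1.
    """
    distances = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        # The point sits exactly on one or more centers: share membership there.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical centers and point, chosen only for illustration.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
label = hard_assignment(point, centers)
memberships = fuzzy_memberships(point, centers)
print(label, [round(u, 3) for u in memberships])
```

The hard assignment returns a single cluster index, while the fuzzy memberships give the point a degree of belonging to every cluster.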
Hierarchical Clustering
Clusters have a parent-child relationship, forming a tree-like structure.
Ex: Hierarchical clustering
(Figure: tree of clusters with Cluster 0, Cluster 1, and Cluster 2 and nodes C0, C1, and C2)
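One way to obtain such a tree is bottom-up (agglomerative) merging. The sketch below is an assumption-laden illustration, not the method shown on the slide: it uses hypothetical points, Euclidean distance, and single linkage (distance between the closest pair of points), and records each merge so the parent-child structure can be read from the history.

```python
import math


def single_linkage(cluster_1, cluster_2):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_1 for q in cluster_2)


def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]  # the parent of clusters i and j
        history.append(merged)
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history


# Hypothetical data for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
history = agglomerate(points)
for step, merged in enumerate(history, start=1):
    print(step, merged)
```

Each entry in `history` is a parent cluster formed from two children; the last entry contains every point and is the root of the tree.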
What Is K-Means Clustering?
K-Means is a clustering algorithm that groups similar elements or data points into clusters
NOTE: ‘K’ in K-Means represents the number of clusters
Business Application Of K-means
Behavioural Segmentation
Inventory Categorization
Sorting sensor measurements
Detecting bots or anomalies
Understanding K - Means Algorithm
Number of Clusters = 3
This is our final output… Let’s see how to do it!
Step 1: Identify the number of clusters (K = 3 in this case)
Step 2: Randomly select 3 distinct data points as the initial clusters
Step 3: Measure the distance between the 1st point and each of the 3 selected clusters
• Distance from point 1 to the orange cluster
• Distance from point 1 to the blue cluster
• Distance from point 1 to the green cluster
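Steps 1 to 3 can be sketched in a few lines. The coordinates below are hypothetical (the slides show the points only as a picture), Euclidean distance is assumed, and the random seed is fixed only so the sketch is repeatable.

```python
import math
import random

# Hypothetical 2-D data; the slides do not list the actual coordinates.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5), (9.0, 9.0)]

k = 3                             # Step 1: number of clusters
rng = random.Random(0)            # fixed seed, an assumption for repeatability
centers = rng.sample(points, k)   # Step 2: k distinct data points as initial clusters

# Step 3: distance from the first point to each selected cluster.
first_point = points[0]
distances = [math.dist(first_point, center) for center in centers]
for center, distance in zip(centers, distances):
    print(f"distance from {first_point} to {center}: {distance:.2f}")
```

The smallest of these distances decides which cluster the first point joins.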
Step 4: Assign the 1st point to the nearest cluster (orange in this case)
Repeat the process
Step 5: Calculate the mean value including the new point for the orange cluster
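Steps 4 and 5 can be sketched as one small function. This follows the point-by-point walkthrough of the slides and assumes that each initial cluster starts with the selected data point as its only member; the function name `assign_and_update` and the coordinates are hypothetical.

```python
import math


def assign_and_update(point, centers, members):
    """Step 4: put `point` in the nearest cluster. Step 5: recompute that cluster's mean."""
    distances = [math.dist(point, center) for center in centers]
    nearest = distances.index(min(distances))
    members[nearest].append(point)
    count = len(members[nearest])
    # The new mean is the coordinate-wise average of every point in the cluster.
    centers[nearest] = tuple(sum(coords) / count for coords in zip(*members[nearest]))
    return nearest


# Hypothetical initial clusters: each one starts with a single data point.
centers = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
members = [[center] for center in centers]

chosen = assign_and_update((2.0, 1.0), centers, members)
print(chosen, centers[chosen])
```

Calling the function again for the next point measures distances against the updated mean, which is what the following slides do for points 2 and 3.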
Which cluster does point 2 belong to? How do we find out?
Repeat the same procedure, but measure the distance to the orange mean
• Distance from point 2 to the orange cluster
• Distance from point 2 to the blue cluster
• Distance from point 2 to the green cluster
Point 2 belongs to the orange cluster
Calculate the new cluster mean including the new point
Which cluster does point 3 belong to? How do we find out?
Repeat the same procedure, but measure the distance to the new orange mean
• Distance from point 3 to the new orange mean
• Distance from point 3 to the blue cluster
• Distance from point 3 to the green cluster
Point 3 belongs to the orange cluster
Measure the distances, add the 3rd point to the cluster with the minimum distance (orange), and calculate the new cluster mean including the new point
Which cluster does point 5 belong to? How do we find out?
REPEAT THE SAME STEPS AGAIN…
Since the rest of the points lie closest to the green cluster, they all belong to the green cluster
Result from 1st iteration
Original/Expected Result
Total variation within the clusters (Iteration 1)
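The slides show the variation only as a picture. A minimal sketch of how it could be computed follows, assuming "variation" means the sum of squared Euclidean distances from each point to its cluster mean; the clusters and the function name are hypothetical.

```python
import math


def total_within_cluster_variation(clusters):
    """Sum of squared distances from each point to the mean of its own cluster."""
    total = 0.0
    for members in clusters:
        mean = tuple(sum(coords) / len(members) for coords in zip(*members))
        total += sum(math.dist(point, mean) ** 2 for point in members)
    return total


# Hypothetical clustering result: one cluster of two points, one of a single point.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
variation = total_within_cluster_variation(clusters)
print(variation)
```

A lower value means the points sit closer to their cluster means, so the clustering is more compact.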
Iteration 2: Start again from the beginning, but with different initial random points than in the 1st iteration
Step 1: Select the number of clusters, i.e. K = 3
Step 2: Randomly select 3 distinct data points
Step 3: Measure the distance between the points and the 3 selected clusters
Iteration 3: Restart from scratch with different initial random points than in the 2nd iteration
Step 1: Select the number of clusters, i.e. K = 3
Step 2: Randomly select 3 distinct data points
Step 3: Measure the distance between the points and the 3 selected clusters
Finally, the iteration with the minimum variation is selected
Iteration 1:
Iteration 2:
Iteration 3:
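The restart-and-select idea can be sketched as follows. Each "iteration" in the slides corresponds to one call of the hypothetical `run_kmeans` function below, started from different random points. Inside a run, this sketch uses the common batch form of K-means (assign all points, then recompute all means, until the assignments stop changing), which differs slightly from the point-by-point walkthrough above. The data and seed are assumptions.

```python
import math
import random


def mean_of(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def run_kmeans(points, k, rng, max_rounds=100):
    """One run from a random start; returns (total_variation, centers, labels)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_rounds):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean_of(members)
    variation = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return variation, centers, labels


# Hypothetical data: three loose groups of four points each.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.0, 2.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.0, 9.0),
          (1.0, 8.0), (1.5, 9.0), (2.0, 8.0), (1.0, 9.0)]

rng = random.Random(42)  # fixed seed, an assumption for repeatability
runs = [run_kmeans(points, 3, rng) for _ in range(3)]  # three restarts
best_variation, best_centers, best_labels = min(runs, key=lambda run: run[0])
print(round(best_variation, 2), best_centers)
```

Because the starting points are random, different runs can end in different clusterings; keeping the run with the smallest total variation guards against a poor start.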
How To Decide The Value Of K?
This time k = 3 was known, but what if the exact value of k is unknown?
The idea behind partitioning is to define clusters such that the total intra-cluster variation, or total within-cluster sum of squares (WSS), is minimized.
NOTE: The total WSS measures the compactness of the clustering
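The slides do not spell out the formula. Under the usual definition, and assuming squared Euclidean distance, the total WSS for k clusters can be written as below, where the symbols C_j (the set of points in cluster j), x_i (a data point), and mu_j (the mean of cluster j) are notation introduced here for illustration.

```latex
\mathrm{WSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

The inner sum is the variation within one cluster; the outer sum adds it up over all k clusters.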
How To Decide The Value Of K: Elbow Method
The Elbow method looks at the total WSS as a function of the number of clusters
The number of clusters should be chosen so that adding another cluster does not substantially improve the total WSS.
(Plot: intra-cluster variance against the number of clusters)
Run the clustering algorithm for different values of k, for example varying k from 1 to 10 clusters
For each k, calculate the total within-cluster sum of squares (WSS)
Plot the curve of WSS against the number of clusters k
The location of a bend (knee) in the plot indicates the appropriate number of clusters
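The procedure above can be sketched without any plotting library by printing the WSS for each k. The helper `kmeans_wss`, the data, the seed, the fixed number of rounds, and the choice of 5 restarts per k are all assumptions made for this illustration; the K-means inside is a simple batch version.

```python
import math
import random


def kmeans_wss(points, k, rng, rounds=50):
    """Total within-cluster sum of squares after a simple batch K-means run."""
    centers = rng.sample(points, k)
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each mean; an empty cluster keeps its previous center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    # Squared distance from each point to its nearest final center.
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


# Hypothetical data: three loose groups of four points each.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.0, 2.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.0, 9.0),
          (1.0, 8.0), (1.5, 9.0), (2.0, 8.0), (1.0, 9.0)]

rng = random.Random(0)  # fixed seed, an assumption for repeatability
wss_by_k = {
    k: min(kmeans_wss(points, k, rng) for _ in range(5))  # best of 5 restarts
    for k in range(1, 11)
}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

In the printed values, the WSS should fall sharply up to the number of natural groups in the data and only slowly after that; the k where the drop flattens is the elbow.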
Thank You