Introduction to Machine Learning
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use
or distribution prohibited
Learning from Data
• Can we learn about the world around us using data?
• Model building from data
– Take data as input
– Find patterns in the data
– Summarize the pattern in a mathematically precise way
• Machine learning automates this model building.
The Challenge
• Unfortunately, data contains noise. If it did not, machine learning
would be trivial!
• Think of Data = Information + Noise
• The challenge is to identify the information content and
distill away the noise.
• To help do this, machine learning uses a train-and-test
approach: the model is built on one part of the data (training) and
evaluated on a separate, held-out part (testing).
Overfitting vs. Underfitting
• If the model we finish with ends up
– modeling the noise as well, we call it “overfitting”: bad for
prediction!
– not modeling all the information, we call it “underfitting”: bad for
prediction!
• The hope is that the model that does best on the testing
data captures all the information but leaves
out all the noise.
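A minimal sketch of the idea, not taken from the slides: the data, the three toy models, and all function names below are assumptions made for illustration. The data follows y = 2x plus a small hand-picked perturbation standing in for noise. A constant model ignores the trend (underfitting), a "memorizer" that copies the nearest training point reproduces the noise (overfitting), and a straight line sits in between.

```python
# Illustrative data: y = 2x plus a hand-picked perturbation that stands in for noise.
xs = list(range(12))
noise = [1, -1, -1, 1] * 3
ys = [2 * x + n for x, n in zip(xs, noise)]

# Train on the even x values, test on the odd ones.
train = [(x, y) for x, y in zip(xs, ys) if x % 2 == 0]
test = [(x, y) for x, y in zip(xs, ys) if x % 2 == 1]

def fit_constant(data):
    """Underfits: always predicts the mean of the training outputs."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """Overfits: returns the output of the nearest training point, noise included."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

models = {
    "constant": fit_constant(train),
    "line": fit_line(train),
    "memorizer": fit_memorizer(train),
}
for name, model in models.items():
    print(f"{name:10s} train MSE = {mse(model, train):7.3f}   test MSE = {mse(model, test):7.3f}")
```

On this toy data the memorizer has zero error on the training set but a larger test error than the line, while the constant model is poor on both. Only the test error reveals which model kept the information and dropped the noise.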
Machine Learning tasks
1. Supervised learning: Building a mathematical model using
data that contains both the inputs and the desired outputs
(ground truth).
– Examples:
• Determining if an image contains a horse. The data would include images with
and without a horse (the input), and for each image we would have a
label (the output) indicating if there is a horse in that image.
• Determining if a client might default on a loan
• Determining if a call center employee is likely to quit
– Since we have desired outputs, model performance can be
evaluated by comparisons.
Machine Learning Tasks
2. Unsupervised learning: Building a mathematical model
using data that contains only inputs and no desired outputs.
– Used to find structure in the data, such as groupings or clusters of
data points: it discovers patterns and groups the inputs into
categories.
– Example: an advertising platform segments the population into
smaller groups with similar demographics and purchasing habits,
helping advertisers reach their target market with relevant ads.
– Since no labels are provided, most unsupervised learning methods
have no direct way to compare model performance against a ground truth.
Tools and techniques
• Supervised learning
– Regression: desired output is a continuous number
– Classification: desired output is a category
• Unsupervised learning
– Clustering: Grouping data
– Dimensionality reduction: Compressing data
– Association rule learning: If X then Y
Intro to Clustering
Clustering
• Clustering is an Unsupervised Learning Technique
• A Cluster: collection of objects that are similar
• Objective is to group similar data points into a group
– Segmenting customers into similar groups
– Automatically organizing similar files/emails into folders
• Simplifies data by reducing many data points into a few
clusters
Distance
• To define “similarity” you need a measure of distance
• Examples of common distance measures
– Manhattan Distance
– Euclidean Distance
– Chebyshev Distance
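The three measures listed above can be written in a few lines each. This is a small sketch using the standard definitions; the function names and the example points are chosen here for illustration.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chebyshev(p, q):
    """Largest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
print(manhattan(p, q))  # 7
print(euclidean(p, q))  # 5.0
print(chebyshev(p, q))  # 4
```

The same pair of points is 7, 5 or 4 units apart depending on the measure, so the choice of distance can change which points count as "similar".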
Types of Clustering
1. Connectivity-based clustering (hierarchical clustering): based on the idea that related
objects are closer to each other. We can then create a hierarchy of clusters/groups.
– Useful when you want flexibility in how many clusters you ultimately want. For
example, imagine grouping items on an online marketplace like Etsy or Amazon.
– In terms of outputs from the algorithm, in addition to cluster assignments you
also build a nice tree (dendrogram) that tells you about the hierarchies between
the clusters. You can then pick the number of clusters you want from this tree.
– In a dendrogram, the y-axis marks the distance at which the clusters merge,
while the objects are placed along the x-axis.
– Algorithms can be agglomerative (start with each object as its own cluster and
aggregate them into larger clusters) or divisive (start with the complete data and divide it into partitions).
Types of Clustering
2. Centroid-based clustering (e.g. K-Means clustering):
The objective is to find K clusters/groups. These
groups are defined by creating a centroid for
each group. The centroids are like the heart of the
cluster: they “capture” the points closest to them
and add them to the cluster.
– A large K produces smaller groups and a small K produces
larger groups
– K-Means uses Euclidean distances and is the most popular variant
– Other variants like K-medians and K-medoids use other
distance measures
Clustering
Data we will work with
– Customer Spend Data
• AVG_Mthly_Spend: The average monthly amount spent by customer
• No_of_Visits: The number of times a customer visited in a month
• Item Counts: Count of Apparel, Fruits and Vegetable, Staple Items purchased
• Can we cluster similar customers together?
Connectivity Based: Hierarchical Clustering
• Hierarchical clustering techniques create clusters in a
hierarchical, tree-like structure
• Any type of distance measure can be used as a
measure of similarity
• The tree-like output is called a dendrogram
• Techniques either start with individual objects and
sequentially combine them (agglomerative), or start
from one cluster of all objects and sequentially divide
it (divisive)
Agglomerative
• Starts with each object as its own cluster of one record
• Sequentially merges the two closest clusters, using distance as the
measure of similarity, to form a larger cluster.
• How would we measure distance between two
clusters?
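The merge procedure above can be sketched in plain Python. This is an illustrative sketch, not the course's implementation: it assumes single linkage (the smallest point-to-point distance) as one possible answer to the question just posed, and the function names and sample points are made up for the example.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = distance between their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]   # every object starts as its own cluster
    merge_heights = []                 # distance at which each merge happened
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
        del clusters[j]
        merge_heights.append(d)
    return clusters, merge_heights

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merge_heights = agglomerate(points, n_clusters=3)
print(clusters)        # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
print(merge_heights)   # [1.0, 1.0]
```

The recorded merge heights are the values a dendrogram would show on its y-axis. This brute-force search recomputes every cluster-to-cluster distance at each step, which is fine for a handful of points but slow for large data sets.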
Distance between clusters
• Single linkage – minimum distance between two points, one from
each cluster (nearest neighbor)
• Complete linkage – maximum distance between two points, one from
each cluster (farthest neighbor)
• Average linkage – average of the distances between all
pairs of points, one from each cluster
• Centroid method – combine the clusters with the minimum
distance between their centroids
• Ward’s method – combine the clusters whose merger gives the
smallest increase in within-cluster variance
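Each criterion above reduces to a short function of two clusters. The sketch below is illustrative only: the function names and the two sample clusters are assumptions. For Ward's method it computes the increase in the within-cluster sum of squared deviations, which is how the "increase in within-cluster variance" is usually made concrete; that reading is an assumption, not something stated on the slide.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def single_linkage(a, b):
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    return max(math.dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid_distance(a, b):
    return math.dist(centroid(a), centroid(b))

def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * math.dist(centroid(a), centroid(b)) ** 2

cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]
for measure in (single_linkage, complete_linkage, average_linkage,
                centroid_distance, ward_increase):
    print(f"{measure.__name__:18s} {measure(cluster_a, cluster_b):.3f}")
```

In an agglomerative run, whichever function is chosen is evaluated for every pair of current clusters, and the pair with the smallest value is merged.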
Distance between objects
Centroid based: K-Means Clustering
• K-Means is probably the most used clustering technique
• Aims to partition the n observations into k clusters so as to
minimize the within-cluster sum of squares (i.e. variance).
• Computationally less expensive compared to hierarchical
techniques.
• K, the number of clusters, has to be defined in advance
Lloyd’s algorithm
1. Assume K centroids (initial guesses)
2. Compute the squared Euclidean distance of each object to
these K centroids. Assign each object to its closest centroid, forming
clusters.
3. Compute the new centroid (mean) of each cluster from
the objects assigned to it.
4. Repeat steps 2 and 3 until convergence, usually defined as the point
at which no object moves between clusters
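The four steps can be sketched directly in plain Python. This is a minimal illustration under stated assumptions rather than a production implementation: the starting centroids are passed in by the caller, an empty cluster simply keeps its previous centroid, and the function names and sample points are invented for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd_kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)              # step 1: assumed starting centroids
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:                     # step 4: no point changed cluster
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                              # assumption: empty clusters keep their centroid
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = lloyd_kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print(labels)      # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The result depends on the starting centroids, which is why practical implementations usually try several initializations and keep the best run.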
Choosing the optimal K
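• Usually subjective, based on striking a good balance between
compression (few clusters) and accuracy (tight clusters)
• The “elbow” method is commonly used: plot the within-cluster sum of
squares against K and pick the K beyond which adding clusters brings little improvement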
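A small sketch of the elbow idea on made-up one-dimensional data with three obvious groups. Everything here is an assumption for illustration: the data, the function name, and the deterministic choice of starting centroids (evenly spaced picks from the sorted values) used to keep the example reproducible.

```python
def kmeans_wcss_1d(values, k, max_iter=100):
    """Run a simple 1-D K-means and return the within-cluster sum of squares."""
    data = sorted(values)
    n = len(data)
    centroids = [data[i * n // k] for i in range(k)]   # assumed deterministic start
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: (x - centroids[j]) ** 2) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(data, labels))

values = [1, 2, 3, 11, 12, 13, 21, 22, 23]
wcss = {k: kmeans_wcss_1d(values, k) for k in range(1, 5)}
for k, value in wcss.items():
    print(k, value)
# 1 606.0
# 2 156.0
# 3 6.0
# 4 4.5
```

The sum of squares falls sharply up to K = 3 and barely changes afterwards; that bend in the curve is the "elbow", and it matches the three groups built into the data.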
Market Basket Analysis (or) Association Rules
Market Baskets
– Transactions/Baskets
• Is it true that {Breakfast Cereals}->{Bread}?
• How sure are you?
• Other patterns like, If {A,B,…} then {C,…}?
Association Rules Learning
• Rule-based unsupervised learning:
– If X then Y. Written as X -> Y.
– X and Y can be sets of multiple items
• Market basket analysis is the term usually used
when the context is transactions in retail/e-commerce.
• The rule X -> Y indicates that if you have all the items in X
then you are more likely to have the items in Y as well. Of
course, each rule might or might not hold in a given
data set and hence has to be appropriately qualified.
• Other Applications
– web usage mining
– intrusion detection, network traffic analysis
– bioinformatics, protein sequencing
– medical diagnosis
How good is a given Rule?
• {Breakfast Cereals}->{Bread}?
• If you think this is true
– Does it apply to a large number of transactions?
– Is it often correct?
– Are you sure it is not just a coincidence?
• Let’s say, for example, the transactions looked like this
– Total transactions: 415
– Containing Breakfast Cereals (BC): 54
– Containing Bread: 90
– Containing both Bread and BC: 44
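The three questions above map onto the standard rule measures: support (how many transactions the rule applies to), confidence (how often it is correct) and lift (how much better it does than coincidence). The sketch below applies the usual definitions to the counts given; the variable names are chosen here for illustration.

```python
total = 415      # all transactions
n_bc = 54        # transactions containing Breakfast Cereals (BC)
n_bread = 90     # transactions containing Bread
n_both = 44      # transactions containing both

# Rule under test: {Breakfast Cereals} -> {Bread}
support = n_both / total                  # share of all transactions with both items
confidence = n_both / n_bc                # share of BC transactions that also have Bread
lift = confidence / (n_bread / total)     # confidence relative to Bread's overall share

print(f"support    = {support:.3f}")      # 0.106
print(f"confidence = {confidence:.3f}")   # 0.815
print(f"lift       = {lift:.2f}")         # 3.76
```

A lift well above 1 says that Bread appears with Breakfast Cereals far more often than its overall share of transactions would suggest, so the pattern is unlikely to be a coincidence.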
Support, Confidence and Lift
• Results of an actual analysis would look like this: