Introduction To Machine Learning-Presentation

This is an introduction to Machine Learning and its importance in our everyday lives.

Uploaded by

mofolukeakintayojo

Introduction to Machine Learning

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited

Learning from Data

• Can we learn about the world around us using data?

• Model building from data
– Take data as input
– Find patterns in the data
– Summarize the pattern in a mathematically precise way
• Machine learning automates this model building.

The Challenge

• Unfortunately, data contains noise. If it did not, machine learning would be trivial!
• Think of Data = Information + Noise
• The challenge is to identify the information content and distill away the noise.
• To help do this, machine learning uses a train-and-test approach: fit the model on one part of the data (training) and check it on a held-out part (testing).

Overfitting vs. Underfitting

• If the model we finish with ends up
– modeling the noise as well, we call it “overfitting”: bad for prediction!
– not modeling all the information, we call it “underfitting”: bad for prediction!
• The hope is that the model that does best on the testing data captures all the information but leaves out all the noise.
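
To make the train/test idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration, not something from the slides: the "information" is the line y = 2x + 1, the noise is Gaussian with standard deviation 2, and the three models (a constant, a least-squares line, and a memorising 1-nearest-neighbour lookup) stand in for underfitting, a reasonable fit, and overfitting.

```python
import random

random.seed(0)

def make_data(n):
    """Data = information (y = 2x + 1) + noise (Gaussian, sd = 2)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 2) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(1000)
test_x, test_y = make_data(1000)

# Underfit: ignore x and always predict the training mean.
mean_y = sum(train_y) / len(train_y)
def underfit(x):
    return mean_y

# Reasonable fit: least-squares straight line.
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x
def line(x):
    return slope * x + intercept

# Overfit: memorise the training set (1-nearest neighbour), noise included.
def memorise(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

results = {}
for name, model in [("underfit", underfit), ("line", line), ("overfit", memorise)]:
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:9s} train MSE = {results[name][0]:7.2f}  test MSE = {results[name][1]:7.2f}")
```

The memorising model reproduces the training data perfectly, yet on the test data the straight line should come out ahead of both other models, which is the pattern the train-and-test approach is meant to expose.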

Machine Learning Tasks

1. Supervised learning: Building a mathematical model using data that contains both the inputs and the desired outputs (ground truth).
– Examples:
• Determining if an image has a horse. The data would include images with and without a horse (the input), and for each image we would have a label (the output) indicating if there is a horse in that image.
• Determining if a client might default on a loan
• Determining if a call center employee is likely to quit
– Since we have desired outputs, model performance can be evaluated by comparing the model's predictions with them.
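
As a small illustration of "evaluation by comparison", the sketch below computes accuracy, one common comparison measure. The labels and predictions are made up for the example and do not come from the slides.

```python
# Hypothetical ground truth and model predictions for six loan clients
# (1 = defaulted, 0 = repaid); the values are invented for illustration.
actual = [1, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 1, 0, 0]

# Accuracy: the fraction of predictions that match the desired output.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"{correct} of {len(actual)} correct, accuracy = {accuracy:.2f}")
```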

Machine Learning Tasks

2. Unsupervised learning: Building a mathematical model using data that contains only inputs and no desired outputs.
– Used to find structure in the data, such as groupings or clusters of data points, and so to discover patterns and group the inputs into categories.
– Example: an advertising platform segments the population into smaller groups with similar demographics and purchasing habits, helping advertisers reach their target market with relevant ads.
– Since no labels are provided, most unsupervised learning methods have no direct way to compare model performance.

Tools and techniques

• Supervised learning
– Regression: desired output is a continuous number
– Classification: desired output is a category
• Unsupervised learning
– Clustering: Grouping data
– Dimensionality reduction: Compressing data
– Association rule learning: If X then Y

Intro to Clustering

Clustering

• Clustering is an Unsupervised Learning Technique

• A Cluster: collection of objects that are similar
• Objective is to group similar data points into a group
– Segmenting customers into similar groups
– Automatically organizing similar files/emails into folders
• Simplifies data by reducing many data points into a few
clusters

Distance

• To define “similarity” you need a measure of distance

• Examples of common distance measures
– Manhattan Distance
– Euclidean Distance
– Chebyshev Distance
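
The three distance measures listed above can be sketched in a few lines of plain Python. The function names and the two example points are illustrative choices, not part of the slides.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

p, q = (1, 2), (4, 6)
print(manhattan(p, q), euclidean(p, q), chebyshev(p, q))  # prints: 7 5.0 4
```

For the same pair of points the three measures give different values (7, 5.0 and 4 here), so the choice of distance changes which objects count as "similar".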

Types of Clustering

1. Connectivity based clustering (Hierarchical clustering): based on the idea that related objects are closer to each other. We can then create a hierarchy of clusters/groups.

– Useful when you want flexibility in how many clusters you ultimately want. For example, imagine grouping items on an online marketplace like Etsy or Amazon.

– In terms of outputs from the algorithm, in addition to cluster assignments you also build a tree (dendrogram) that tells you about the hierarchies between the clusters. You can then pick the number of clusters you want from this tree.

– In a dendrogram, the y-axis marks the distance at which the clusters merge, while the objects are placed along the x-axis.

– Algorithms can be agglomerative (start with individual objects and aggregate them into clusters) or divisive (start with the complete data and divide it into partitions).

Types of Clustering
2. Centroid based clustering (e.g. K-Means clustering): The objective is to find K clusters/groups. These groups are defined by creating a centroid for each group. The centroids are like the heart of the cluster: they “capture” the points closest to them and add them to the cluster.
– A large K produces smaller groups and a small K produces larger groups
– K-Means uses Euclidean distances and is the most popular
– Other variants like K-medians and K-medoids use other distance measures

Clustering

Data we will work with
– Customer Spend Data
• AVG_Mthly_Spend: The average monthly amount spent by customer
• No_of_Visits: The number of times a customer visited in a month
• Item Counts: Count of Apparel, Fruits and Vegetable, Staple Items purchased

• Can we cluster similar customers together?

Connectivity Based: Hierarchical Clustering

• Hierarchical clustering techniques create clusters in a hierarchical tree-like structure
• Any type of distance measure can be used as a measure of similarity
• The tree-like output is called a dendrogram
• Techniques either start with individual objects and sequentially combine them (agglomerative), or start from one cluster of all objects and sequentially divide them (divisive)

Agglomerative
• Starts with each object as its own cluster of one record
• Sequentially merges the two closest clusters, using distance as the measure of similarity, to form a larger cluster
• How would we measure distance between two clusters?

Distance between clusters
• Single linkage – minimum distance (nearest neighbor)
• Complete linkage – maximum distance (farthest neighbor)
• Average linkage – average of the distances between all pairs
• Centroid method – combine the clusters with the minimum distance between their centroids
• Ward’s method – combine the clusters whose merger gives the smallest increase in within-cluster variance
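
The sketch below shows how a linkage rule plugs into agglomerative clustering. It is a deliberately naive illustration under stated assumptions: Euclidean distance between objects, only the single, complete and average linkages (the centroid and Ward's methods are left out), six made-up 2-D points, and merging that stops at a requested number of clusters instead of building the full dendrogram.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2):
    """Distance between the closest pair of objects, one from each cluster."""
    return min(euclidean(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2):
    """Distance between the farthest pair of objects, one from each cluster."""
    return max(euclidean(a, b) for a in c1 for b in c2)

def average_linkage(c1, c2):
    """Average distance over all pairs of objects, one from each cluster."""
    return sum(euclidean(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerate(points, n_clusters, linkage):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start: every object is its own cluster
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
for linkage in (single_linkage, complete_linkage, average_linkage):
    print(linkage.__name__, agglomerate(points, 2, linkage))
```

On well-separated data like this, all three linkages recover the same two groups. They tend to differ on elongated or overlapping clusters, which is why the linkage choice matters in practice.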
Distance between objects

Centroid based: K-Means Clustering

• K-Means is probably the most used clustering technique
• Aims to partition the n observations into K clusters so as to minimize the within-cluster sum of squares (i.e. variance).
• Computationally less expensive than hierarchical techniques.
• K, the number of clusters, has to be defined in advance

Lloyd’s algorithm

1. Assume K centroids (initial guesses)
2. Compute the squared Euclidean distance of each object to these K centroids. Assign each object to the closest centroid, forming clusters.
3. Compute the new centroid (mean) of each cluster based on the objects assigned to it.
4. Repeat steps 2 and 3 until convergence, usually defined as the point at which no objects move between clusters
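
The four steps above can be sketched in plain Python as follows. This is one possible reading under stated assumptions: the initial centroids are K randomly chosen data points, and a cluster that ends up empty keeps its previous centroid. The slides specify neither choice, and the six example points are made up.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def lloyd(points, k, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: assume K centroids (assumption: random data points)
    assignment = None
    while True:
        # Step 2: assign each object to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop when no object changed cluster.
        if new_assignment == assignment:
            return centroids, assignment
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean(members)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = lloyd(points, k=2)
print(centroids)
print(assignment)
```

Cluster labels (0 or 1) depend on the random initialisation, so only the grouping is meaningful, not the label numbers.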

Choosing the optimal K

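• Usually subjective, based on striking a good balance between compression and accuracy
• The “elbow” method is commonly used: plot the within-cluster sum of squares against K and choose the K beyond which it stops dropping sharply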
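
A rough sketch of the elbow method: run K-Means for several values of K and look at how the within-cluster sum of squares (WCSS) falls. The data set (three small, well-separated blobs) is invented for the example, and the deterministic "farthest-first" initialisation is an assumption made here for reproducibility. It is not part of the slides or of standard K-Means.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Assumed initialisation: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans_wcss(points, k):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    assignment = None
    while True:
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:  # converged: no object moved
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))

# Three made-up blobs of five points each.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
centres = [(0, 0), (10, 0), (5, 9)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(f"K = {k}: WCSS = {value:.2f}")
```

On this data the WCSS falls from 532 at K = 1 to 262 at K = 2 and 12 at K = 3, then changes very little, so the "elbow" sits at K = 3, matching the three blobs that were generated.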

Market Basket Analysis (or) Association Rules

Market Baskets
– Transactions/Baskets

• Is it true that {Breakfast Cereals}->{Bread}?

• How sure are you?

• Other patterns like, If {A,B,…} then {C,…}?

Association Rules Learning
• Rule-based unsupervised learning:
– If X then Y. Written as X -> Y.
– X and Y can be sets of multiple items
• Market basket analysis is the term usually used when the context is transactions in retail/e-commerce.
• The rule X -> Y indicates that if you have all the items in X then you are more likely to have the items in Y as well. Of course, each rule might or might not hold in a given data set and hence has to be appropriately qualified.
• Other applications
– web usage mining
– intrusion detection, network traffic analysis
– bioinformatics, protein sequencing
– medical diagnosis
How good is a given Rule?
• {Breakfast Cereals}->{Bread}?
• If you think this is true
– Does it apply to a large number of transactions?
– Is it often correct?
– Are you sure it is not just a coincidence?
• Let's say, for example, the transactions looked like this
– Total: 415
– Breakfast Cereals (BC): 54
– Bread: 90
– Bread and BC: 44
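
The three questions above correspond to the standard rule measures support, confidence and lift. The sketch below applies their usual definitions to the counts on this slide. The variable names are illustrative.

```python
total = 415   # all transactions
bc = 54       # transactions containing Breakfast Cereals
bread = 90    # transactions containing Bread
both = 44     # transactions containing both

# Support: how often does the rule apply? P(BC and Bread)
support = both / total
# Confidence: how often is the rule correct? P(Bread | BC)
confidence = both / bc
# Lift: is it more than coincidence? P(Bread | BC) / P(Bread)
lift = confidence / (bread / total)

print(f"support    = {support:.3f}")
print(f"confidence = {confidence:.3f}")
print(f"lift       = {lift:.2f}")
```

With these counts the rule {Breakfast Cereals}->{Bread} has support of about 0.106, confidence of about 0.815 and lift of about 3.76. A lift well above 1 suggests that buying breakfast cereals and buying bread occur together more often than they would if the two were independent.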

Support, Confidence and Lift

• Results of an actual analysis would look like this:
