Chapter 4 Machine Learning
COMP 472 Artificial Intelligence
Russell & Norvig – Section 18.1 & 18.2
2 Why Machine Learning?
Over 2.5 quintillion bytes of data are created every single day, and it is only
going to grow from there. It is estimated that 1.7MB of data will be created
every second for every person on earth.
4 What is Machine Learning
´ In 1959, Arthur Samuel coined the term
"Machine Learning".
´ Machine Learning is a subset of Artificial Intelligence which
provides machines the ability to learn automatically &
improve from experience without being explicitly
programmed.
5 What is Machine Learning
´ “A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P
if its performance at tasks in T, as measured by P, improves
with experience E.” (Tom Mitchell, 1997)
6 Definitions
´ Algorithm: A set of rules and statistical techniques used to learn
patterns from data.
´ Model: The result of training a machine learning algorithm on data.
´ Predictor Variable: A feature (or features) of the data that can be used to
predict the output.
´ Response Variable: The feature or output variable that
needs to be predicted by using the predictor variable(s).
´ Training Data: The data used to build the Machine Learning model.
´ Testing Data: The data used to evaluate the Machine Learning model.
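The terms above can be sketched in a few lines of Python. This is only an illustration: the weather records, the field names (`humidity`, `pressure`, `rain`) and the 75/25 split ratio are assumptions made up for the example, not part of the course material.

```python
import random

# Hypothetical dataset: humidity and pressure are predictor variables,
# rain (1 = yes, 0 = no) is the response variable. All values are invented.
rows = [
    {"humidity": 85, "pressure": 1002, "rain": 1},
    {"humidity": 40, "pressure": 1020, "rain": 0},
    {"humidity": 78, "pressure": 1005, "rain": 1},
    {"humidity": 35, "pressure": 1023, "rain": 0},
    {"humidity": 90, "pressure": 998, "rain": 1},
    {"humidity": 50, "pressure": 1015, "rain": 0},
    {"humidity": 70, "pressure": 1008, "rain": 1},
    {"humidity": 45, "pressure": 1018, "rain": 0},
]

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into (training, testing)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

training_data, testing_data = train_test_split(rows)

# Separate the predictor variables (X) from the response variable (y).
X_train = [(r["humidity"], r["pressure"]) for r in training_data]
y_train = [r["rain"] for r in training_data]
```

The model would be built from `X_train` and `y_train` only; `testing_data` is held back so that the evaluation uses examples the model has never seen.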
7 Machine Learning Process
´ The Machine Learning process involves building a predictive model that
can be used to find a solution for a problem statement.
Define Objective → Data Gathering → Preparing Data → Data Exploration →
Building a Model → Model Evaluation → Predictions
8 Machine Learning Process
´ Step 1: Define the objective of the problem
To predict the possibility of rain by studying the weather
conditions
(Figure: Weather Forecast using Machine Learning)
9 Machine Learning Process
´ What are we trying to predict?
´ What are the target features?
´ What is the input data?
´ What kind of problem are we facing? Binary classification?
Clustering?
10 Machine Learning Process
´ Step 2: Data Gathering
Data such as weather conditions, humidity level, temperature,
pressure, etc. are either collected manually or scraped from the
web.
11 Machine Learning Process
´ Data Open Sources
´ Google Public Data Explorer
https://www.google.com/publicdata/directory
´ Registry of Open Data on AWS (RODA)
https://registry.opendata.aws/
´ Kaggle
https://www.kaggle.com/datasets
´ DBpedia
https://wiki.dbpedia.org/
12 Machine Learning Process
´ Step 3: Preparing Data
Data Cleaning involves getting rid of inconsistencies in data
such as missing values or redundant variables.
´ Transform data into desired format
´ Data Cleaning
Missing values
Corrupted data
Remove unnecessary data
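A minimal sketch of the cleaning step, assuming a small list of made-up weather records. Filling a missing value with the column mean is just one possible strategy chosen for the example, and `station_id` is a hypothetical column standing in for "unnecessary data".

```python
# Hypothetical raw records; None marks a missing value and "station_id"
# stands in for a variable we assume is not needed for the prediction.
raw = [
    {"station_id": "A1", "temperature": 21.0, "humidity": 60.0},
    {"station_id": "A1", "temperature": None, "humidity": 65.0},
    {"station_id": "A1", "temperature": 25.0, "humidity": None},
    {"station_id": "A1", "temperature": 23.0, "humidity": 70.0},
]

def mean_of_present(records, key):
    """Average of the non-missing values of one column."""
    values = [r[key] for r in records if r[key] is not None]
    return sum(values) / len(values)

def clean(records, drop=("station_id",)):
    """Drop unneeded columns and fill missing values with the column mean."""
    keep = [k for k in records[0] if k not in drop]
    fill = {k: mean_of_present(records, k) for k in keep}
    return [
        {k: (r[k] if r[k] is not None else fill[k]) for k in keep}
        for r in records
    ]

cleaned = clean(raw)
```

Other choices are equally valid depending on the data, for example dropping incomplete rows instead of filling them.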
13 Machine Learning Process
´ Step 4: Exploratory Data Analysis (EDA)
Data Exploration involves understanding the patterns and trends
in the data. At this stage all the useful insights are drawn and
correlations between the variables are understood.
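One simple way to look at correlations between variables is the Pearson correlation coefficient. The sketch below computes it by hand; the three columns of weather readings are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical daily readings.
humidity = [30, 45, 60, 75, 90]
rainfall = [0.0, 0.5, 2.0, 2.5, 4.5]
temperature = [31, 27, 25, 20, 18]

r_humidity_rainfall = pearson(humidity, rainfall)
r_humidity_temperature = pearson(humidity, temperature)
```

A value close to +1 suggests the two variables rise together, a value close to -1 suggests one falls as the other rises, and a value near 0 suggests no linear relationship.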
14 Machine Learning Process
´ Step 5: Building a Machine Learning Model
At this stage a Predictive Model is built by using Machine Learning
Algorithms such as Linear Regression, Decision Tree, etc.
´ The Machine Learning model is built by using the training data set.
´ The model is the trained Machine Learning algorithm that predicts the
output by using the data fed to it.
Training Data → Machine Learning Model
15 Machine Learning Process
´ Step 6: Model Evaluation & Optimization
The efficiency of the model is evaluated and any further
improvements to the model are implemented.
´ The Machine Learning model is evaluated by using the testing
data set.
´ The accuracy of the model is calculated.
´ Further improvements to the model are made by using
techniques like parameter tuning.
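Accuracy is the fraction of testing examples the model labels correctly. A minimal sketch, assuming hypothetical true labels and predictions for eight testing examples (1 = rain, 0 = no rain):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels from the testing data and the model's predictions.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = accuracy(y_test, y_pred)  # 6 of the 8 predictions match
```

Parameter tuning then amounts to repeating this measurement for different model settings and keeping the setting with the best score.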
16 Machine Learning Process
´ Step 7: Predictions
The final outcome is predicted after performing parameter
tuning and improving the accuracy of the model.
17 Types of Machine Learning
´ Supervised Learning is a technique in which we teach or train the
machine using data which is well labelled.
18 Types of Machine Learning
´ Unsupervised Learning is the training of a machine using information
that is unlabelled, allowing the algorithm to act on that
information without guidance.
19 Types of Machine Learning
´ Reinforcement Learning is a part of Machine Learning where an
agent is put in an environment and learns to behave in this
environment by performing certain actions and observing the
rewards which it gets from those actions.
´ e.g., self-driving cars, AlphaGo
20 Types of Machine Learning
Machine Learning
  Supervised Learning (task-driven): Classification, Regression
  Unsupervised Learning (data analytics): Association, Clustering
  Reinforcement Learning (learns by reacting to environment): Reward Based
21 Types of Machine Learning
´ In Supervised learning
´ We are given a training set of (X, f(X)) pairs
nose  | teeth | eyes  | moustache    | f(X)
big   | big   | big   | no moustache | not person
small | small | small | no moustache | person
small | big   | small | moustache    | ?
22 Types of Machine Learning
´ In Unsupervised learning
´ We are only given the Xs - not the corresponding f(X)
nose  | teeth | eyes  | moustache    | f(X)
big   | big   | big   | no moustache | not given
small | small | small | no moustache | not given
small | big   | small | moustache    | ?
´ No teacher involved
´ Goal: find regularities among the Xs (clustering)
´ Data mining
23 Note on Data Mining
´ Other names:
´ Unsupervised Machine Learning
´ Clustering
´ Knowledge Discovery
´ Example: predict if a customer is likely to purchase certain
goods according to history of shopping activities.
24 Types of Machine Learning
´ In Reinforcement learning
´ We are not given the (X, f(X)) pairs
small nose big teeth small eyes moustache f(X) = ?
´ But somehow we are told whether our learned f(X) is right or
wrong
´ Goal: maximize the number of right answers
25 Types of Machine Learning
| | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Definition | The machine learns by using labelled data | The machine is trained on unlabelled data without any guidance | An agent interacts with its environment by producing actions & discovers errors and rewards |
| Types of problems | Regression & Classification | Association & Clustering | Reward based |
| Type of data | Labelled data | Unlabelled data | No pre-defined data |
| Training | External supervision | No supervision | No supervision |
| Approach | Map labelled input to known output | Understand patterns and discover output | Follow trial and error method |
| Popular Algorithms | Linear Regression, Logistic Regression, KNN, etc. | K-means, C-means, etc. | Q-learning, etc. |
26 Types of Problems
27 Example 0
Real ML applications typically require hundreds, thousands or millions of examples
28 Example 1
´ Problem Statement: To study the House Sales dataset and build
a Machine Learning model that predicts the house pricing
index.
Linear Regression Algorithm → Predict the house pricing index (Regression)
29 Example 2
´ Problem Statement: To study a bank credit dataset and make a
decision about whether to approve the loan of an applicant
based on their profile.
KNN Algorithm → Approve / Reject (Classification)
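A minimal sketch of how a K-nearest-neighbours (KNN) classifier could make the approve/reject decision. The two features (income and existing debt, in thousands), the six labelled applicants and the choice k = 3 are all assumptions made up for the example.

```python
from collections import Counter
from math import dist

# Hypothetical labelled applicants: (income, existing debt) -> decision.
training = [
    ((80, 5), "approve"),
    ((95, 10), "approve"),
    ((70, 8), "approve"),
    ((30, 25), "reject"),
    ((25, 30), "reject"),
    ((40, 35), "reject"),
]

def knn_predict(train, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

decision = knn_predict(training, (75, 12))
```

In practice the features would usually be rescaled first, since Euclidean distance is dominated by whichever feature has the largest numeric range.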
30 Example 3
´ Problem Statement: To cluster a set of movies as either good or
average based on their social media outreach.
K-means Algorithm → Popular / Unpopular (Clustering)
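A minimal sketch of K-means with two clusters on a single feature. The outreach numbers (for instance, mentions per day) are invented, and starting the centroids at the smallest and largest value is a simplification chosen to keep the example deterministic.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain K-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical social media outreach per movie.
outreach = [120, 150, 90, 2000, 2300, 1800]
final_centroids, final_clusters = kmeans_1d(outreach, [min(outreach), max(outreach)])
```

No labels are supplied: the algorithm only groups similar values, and it is up to us to name the low-outreach group "unpopular" and the high-outreach group "popular".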
31 Supervised Learning Algorithms
´ Linear Regression
´ Logistic Regression
´ Naïve Bayes Classifier
´ Decision Tree
´ Random Forest
32 Linear Regression
´ Linear Regression is a method to predict a dependent variable (Y)
based on the values of independent variables (X). It can be used in
cases where we want to predict a continuous quantity.
´ Dependent variable (Y)
The response variable whose value needs to be predicted.
´ Independent variable (X)
The predictor variable used to predict the response variable.
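´ The following equation is used to represent a linear regression model:
$Y = \beta_0 + \beta_1 X + \varepsilon$
where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon$ is the error term.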
33 Linear Regression
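A minimal sketch of fitting a straight line Y = b0 + b1·X with ordinary least squares on a single predictor. The names `fit_line`, `area` and `price` and the data values are assumptions for illustration; the data were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical house data: area (predictor X) and price index (response Y).
area = [50, 80, 100, 120, 150]
price = [110, 170, 210, 250, 310]

intercept, slope = fit_line(area, price)

# Predict the response for a new, unseen value of the predictor.
predicted = intercept + slope * 90
```

With real data the points do not fall exactly on the line, and the fitted line is the one that minimizes the sum of squared differences between the observed and predicted Y.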
34 Supervised Learning Algorithms
´ Linear Regression
´ Logistic Regression
´ Decision Tree
´ Random Forest
´ Naïve Bayes Classifier
35 Logistic Regression
´ Spam Detection : Predicting if an email is Spam or not
´ Credit Card Fraud : Predicting if a given credit card transaction is fraud or
not
´ Health : Predicting if a given mass of tissue is benign or malignant
´ Marketing : Predicting if a given user will buy an insurance product or not
´ Banking : Predicting if a customer will default on a loan.
36 Logistic Regression
´ Logistic Regression is a method used to predict a dependent
variable, given a set of independent variables, such that the
dependent variable is categorical.
´ Logistic Regression is used for classification.
37 Logistic Regression
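´ Linear Regression equation: $Y = \beta_0 + \beta_1 X + \varepsilon$
How do we represent the relationship between p(X) = P(Y=1|X) and X?
´ Take the exponent of the linear part of the equation, since the exponential of any
value is a positive number: $e^{\beta_0 + \beta_1 X} > 0$
´ Secondly, a positive number divided by itself + 1 will always be less than 1.
Hence, the formula: $p(X) = \dfrac{e^{\beta_0 + \beta_1 X}}{e^{\beta_0 + \beta_1 X} + 1}$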
38 Logistic Regression
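A minimal sketch of the logistic function, which turns any real number z into a value strictly between 0 and 1 by dividing the positive number e^z by itself plus 1. The coefficient names `beta0` and `beta1` and the values used in the usage lines are assumptions for illustration, not fitted parameters.

```python
from math import exp

def logistic(z):
    """e^z / (e^z + 1): maps any real z to a value strictly between 0 and 1."""
    return exp(z) / (exp(z) + 1)

def predict_proba(x, beta0, beta1):
    """Estimated P(Y=1 | X=x) for a one-predictor logistic regression model."""
    return logistic(beta0 + beta1 * x)

# Hypothetical coefficients: the probability is 0.5 where beta0 + beta1 * x = 0.
p_boundary = predict_proba(2, beta0=-4, beta1=2)
p_high = predict_proba(5, beta0=-4, beta1=2)
label = 1 if p_high >= 0.5 else 0
```

Thresholding the probability at 0.5 is a common convention for turning it into a class label. Note that this direct form overflows for very large z, so production code usually relies on a numerically stable variant.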
39 The End