MACHINE LEARNING AND PREDICTIVE MODELING
Rabi Kulshi
Introduction
What is Machine Learning?
As defined by Tom M. Mitchell (as quoted on Wikipedia): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
[Diagram: successfully labeled observations form the data set (experience E); training on them builds the model, which then classifies incoming new events (task T); performance P is measured as the percentage of correct classifications]
The Learning Model?
with continuous knowledge update
[Diagram: the same training loop with continuous knowledge update: the data set of observations (experience E) trains a model f(x) mapping inputs X = {x1, x2, …, xn} to an output y; incoming new events (task T) are classified, and performance P is the percentage of correct classifications]
Types of Machine Learning?
• Supervised Learning
In supervised learning, a supervisor provides a set of labeled data, also known as experience, that is used to train the machine to build knowledge (i.e., a model/function).
• Unsupervised Learning
Find some structure or groups in the data; also known as a clustering technique.
Examples: news groups, cohesive groups on Facebook, customer groups
Examples of Supervised Learning
where the outcome is a continuous variable

$y = f(x_1, x_2, \ldots, x_m)$
Examples of Supervised Learning
where the outcome is a discrete variable
Regression Model, Visual Analysis
[Scatter plot: online purchases of images per year, y (dependent variable, yield, result; 1k–4k), versus passion for photography, x (feature, independent variable; 0–15), with a fitted regression line]

The fitted line: $y = \beta_0 + \beta_1 x_1 = \beta_0 x_0 + \beta_1 x_1$ with $x_0 = 1$ and parameters $(\beta_0, \beta_1)$; in compact form, $Y = \beta^T X$.
Classification Model, Visual Analysis
[Scatter plot: purchase amount in $ (feature; 50–200) versus time spent to search and make a decision, in minutes (feature; 0–30), with a decision boundary separating the two classes; each point's distance from the decision boundary is marked]
Clustering Model, Visual Analysis
[Scatter plot: type of image purchased versus customer age (20–70), showing cohesive customer clusters; a K-Means sketch follows]
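The deck lists K-Means as a clustering technique later on; here is a minimal sketch of how such groups might be found, in Python with NumPy. The data, the choice of k, and all names are illustrative assumptions rather than anything from the slides.

```python
import numpy as np

# Minimal K-Means sketch (K-Means appears later in the techniques list);
# data, k, and names are illustrative assumptions.
def k_means(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen points as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels

# Hypothetical customer ages grouped into three clusters.
ages = np.array([[22.0], [25.0], [41.0], [44.0], [63.0], [66.0]])
centers, labels = k_means(ages, k=3)
```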
Key considerations and concepts for the implementation and development of Machine Learning models
• Visualize your data to identify features, understand the problem, and determine the model
• Carefully analyze the standard deviation and the percentage of false positives or false negatives
• Set users' expectations about the probability of a false conclusion; this is very important
• Keep a balance between bias, caused by underfitting, and variance, caused by overfitting
• Consider using Training, Validation, and Test steps (60%, 20%, 20%), as sketched below
• Apply a regularization process and select the regularization parameter
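A minimal sketch of the 60%/20%/20% Training/Validation/Test split mentioned above, in Python with NumPy; the function name and the fixed seed are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a 60/20/20 train/validation/test split
# (proportions from the slide; names and seed are illustrative).
def split_60_20_20(X, y, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))  # shuffle once
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```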
A Few Machine Learning Techniques
Machine Learning
• Supervised Learning
  - Regression Model: Normal Equation; Gradient Descent
  - Classification Model: Logistic Regression (Gradient Descent); Naive Bayesian; Support Vector Machine
• Unsupervised Learning
  - Clustering Model: K-Means; Bisecting K-Means
Let’s take a break for questions
Linear Regression Model
Introduction to Linear model
$f(x) = a + bx \qquad f(x) = a + b_1 x_1 \qquad f(x) = a + b_1 x_1 + b_2 x_2$

$f(x) = b_0 + b_1 x_1 + b_2 x_2 \qquad f(x) = b_0 x_0 + b_1 x_1 + b_2 x_2$

where $b_0 = a$ and $x_0 = 1$.

$f(x) = b_0 x_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$
Linear Regression Model
Compact and Matrix representation of Linear model
$f(x) = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n = \sum_{i=0}^{n} \beta_i x_i$

$f(x) = \beta^T X$

$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}, \qquad x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$
Linear Regression
Find beta values using the Normal Equation

Solve analytically using the well-known Normal Equation method:

$\beta = (X^T X)^{-1} X^T y$

• Well-known methodology
• Easy to implement
• Computationally inexpensive for a small to medium number of features (<1000)
• No need to select the learning rate that Gradient Descent requires
• Pay attention to matrix-inversion issues in case of linear dependency among features (see the sketch below)
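Below is a minimal sketch of the Normal Equation in Python with NumPy; the pseudo-inverse is one way to sidestep the matrix-inversion issue noted above, and all names and data are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the Normal Equation: beta = (X^T X)^(-1) X^T y.
def normal_equation(X, y):
    # Prepend a column of ones so x0 = 1 absorbs the intercept beta_0.
    Xb = np.column_stack([np.ones(len(X)), X])
    # pinv handles the singular case caused by linearly dependent
    # features, where a plain inverse of X^T X would fail.
    return np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y

# Hypothetical data on the line y = 1 + 2x.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(normal_equation(X, y))  # -> approximately [1.0, 2.0]
```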
Linear Regression
Find beta values using Gradient Descent

Cost function with m observations:

$C(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_\beta(x^{(i)}) - y^{(i)} \right)^2$

Let's understand the gradient descent technique using the example of an ant that wants to reach the bottom of a hat.

$\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} C(\beta) \quad \Longrightarrow \quad \beta_j := \beta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_\beta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

for $j = 0, 1, \ldots, n$, where $\alpha$ is the learning rate.

Now iterate the update above to optimize the cost function; stop iterating when the cost function does not change significantly between iterations (see the sketch below).
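A minimal sketch of this iteration in Python with NumPy; the learning rate, stopping tolerance, and all names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of batch gradient descent for linear regression;
# alpha, tol, and max_iters are illustrative choices.
def gradient_descent(X, y, alpha=0.1, tol=1e-9, max_iters=10000):
    Xb = np.column_stack([np.ones(len(X)), X])  # x0 = 1 for the intercept
    m = len(y)
    beta = np.zeros(Xb.shape[1])
    prev_cost = np.inf
    for _ in range(max_iters):
        error = Xb @ beta - y               # f_beta(x^(i)) - y^(i)
        cost = (error @ error) / (2 * m)    # C(beta) at the current beta
        if abs(prev_cost - cost) < tol:     # stop: cost barely changes
            break
        prev_cost = cost
        beta -= alpha * (Xb.T @ error) / m  # simultaneous update of all beta_j
    return beta
```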
Classification Model
Solution using Logistic Regression
Classification Model
• Y, the outcome, can take one of two values: 0 = negative (No), 1 = positive (Yes)
• Examples: email spam, email threat, fraudulent transactions
• Multi-class classification problem: the tone of a message

The approach
• Compute the decision boundary from the training data; that boundary is then used to classify an observation as YES or NO depending on where (on which side of the decision boundary) the observation falls
• Depending on the dimension (i.e., the number of features), the decision boundary will be a line, a plane, or a hyperplane (one degree less than the number of features)
Logistic Regression Model

[Scatter plot: purchase amount in $ (feature) versus time spent to search and make a decision, in minutes (feature; 0–30), with a decision boundary; each point's distance from the decision boundary is marked]
Logistic Regression Model
The Model:

$f_\beta(x) = \frac{1}{1 + e^{-\beta^T x}}$

$f_\beta(x) = g(\beta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$

Predict Y = 1 if $g(z) \ge 0.5$, i.e. $z = \beta^T x \ge 0$
Predict Y = 0 if $g(z) < 0.5$, i.e. $z = \beta^T x < 0$

• $0 \le f_\beta(x) \le 1$
• g is known as the sigmoid function
• The cost function becomes a convex function
• The decision boundary is where $\beta^T x = 0$

Now we need to optimize the cost function for logistic regression, estimating beta using gradient descent (see the sketch below).
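A minimal sketch of that optimization in Python with NumPy, assuming the standard convex cross-entropy cost, whose gradient takes the same form as the linear-regression update; the learning rate, iteration count, and all names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of logistic regression fit by gradient descent,
# assuming the convex cross-entropy cost; names are illustrative.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=5000):
    Xb = np.column_stack([np.ones(len(X)), X])  # x0 = 1 for the intercept
    beta = np.zeros(Xb.shape[1])
    m = len(y)
    for _ in range(iters):
        error = sigmoid(Xb @ beta) - y          # f_beta(x^(i)) - y^(i)
        beta -= alpha * (Xb.T @ error) / m
    return beta

def predict(beta, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ beta >= 0).astype(int)         # Y = 1 iff beta^T x >= 0
```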
Classification Model
Solution using Bayesian Model
• Based on conditional probability
• It assumes the features are statistically independent variables
• We find the probability that a given observation should be classified as class $c_i$, for $i = 1, 2, \ldots, n$
• The observation is classified as the type/category for which this conditional probability is maximum (a sketch follows the plot below)
[Scatter plot: type of image purchased versus customer age (20–70); the visible groups serve as the classes for the Bayesian model]
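A minimal sketch of this idea in Python with NumPy for a single continuous feature such as customer age, assuming a Gaussian class-conditional likelihood (the slides do not specify one); the data and all names are illustrative assumptions.

```python
import numpy as np

# Minimal Gaussian Naive Bayes sketch for one continuous feature;
# the Gaussian likelihood, data, and names are illustrative assumptions.
def fit_bayes(x, y):
    # Per class c: prior P(c), feature mean, feature variance.
    return {c: (np.mean(y == c), x[y == c].mean(), x[y == c].var() + 1e-9)
            for c in np.unique(y)}

def classify(model, x_new):
    def log_posterior(c):
        prior, mu, var = model[c]
        # log P(c) + log P(x | c); the class maximizing this wins.
        return (np.log(prior)
                - 0.5 * np.log(2 * np.pi * var)
                - (x_new - mu) ** 2 / (2 * var))
    return max(model, key=log_posterior)

# Hypothetical customers: age -> type of image purchased.
ages = np.array([22, 25, 41, 44, 63, 66], dtype=float)
img_type = np.array(["sports", "sports", "family", "family", "nature", "nature"])
model = fit_bayes(ages, img_type)
print(classify(model, 45.0))  # -> "family"
```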
Let’s take some questions