Lesson 12: Logistic Regression

Outline

• Classification
• Logistic Regression
  • Hypothesis Function
  • Cost Function
  • Gradient Descent
Binary outcomes are common and important
• The patient survives the operation, or does not.
• The accused is convicted, or is not.
• The customer makes a purchase, or does not.
• The marriage lasts at least five years, or does not.
• The student graduates, or does not.

Examples: Categorical Response Variables

• Whether or not a person smokes (binary response): Y ∈ {Smoker, Non-smoker}
• Success of a medical treatment (binary response): Y ∈ {Survives, Dies}
• Opinion poll responses (ordinal response): Y ∈ {Agree, Neutral, Disagree}

Difference between linear regression and
logistic regression

[Figure comparing linear regression and logistic regression; source: https://www.kaggle.com/]
Sigmoid Function

[Figure: a single neuron; the “input” x, together with a “bias unit” x₀ = 1, is
weighted by the “parameters” (“weights”) θ, and the “output” is produced by a
sigmoid (logistic) activation function]

h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))

Slide credit: Andrew Ng
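To make the hypothesis concrete, here is a minimal Python sketch of the sigmoid hypothesis (the helper names are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        # Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def hypothesis(theta, x):
        # h_theta(x) = g(theta^T x); x[0] is the bias unit and is always 1
        return sigmoid(np.dot(theta, x))

    # With all-zero parameters the model outputs 0.5 for any input:
    print(hypothesis(np.zeros(3), np.array([1.0, 2.0, 3.0])))  # 0.5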
Learning a Logistic Regression Model

• How to learn a logistic regression model h_θ(x), where θ = [θ₀, …, θ_m]ᵀ and
  x = [x₀, …, x_m]ᵀ?
• By minimizing the following cost function:

  Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))

• That is:

  minimize (1/n) Σᵢ₌₁ⁿ Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾)

  minimize (1/n) Σᵢ₌₁ⁿ [ −y⁽ⁱ⁾ log(1 / (1 + e^(−θᵀx⁽ⁱ⁾)))
                         − (1 − y⁽ⁱ⁾) log(1 − 1 / (1 + e^(−θᵀx⁽ⁱ⁾))) ]
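As a sketch, this cost can be written directly in Python, reusing the hypothetical sigmoid helper above:

    def cost(theta, X, y):
        # Average cross-entropy cost over the n training examples.
        # X is an (n, m+1) matrix whose first column is the bias unit;
        # y is a length-n vector of 0/1 labels.
        h = sigmoid(X @ theta)  # h_theta(x^(i)) for every row of X
        return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))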
Gradient Descent For Logistic Regression

• Outline:
  • Have cost function J(θ), where θ = [θ₀, …, θ_m]
  • Start off with some guesses for θ
    • It does not really matter what values you start off with, but a common
      choice is to set them all initially to zero
  • Repeat until convergence {
        θ_j := θ_j − α (∂/∂θ_j) J(θ)      ← partial derivative
    }
    Note: Update all θ_j simultaneously.
    α is the learning rate, which controls how big a step we take when we
    update θ.
Gradient Descent For Logistic Regression

• Outline:
  • Have cost function J(θ), where θ = [θ₀, …, θ_m]
  • Start off with some guesses for θ
    • It does not really matter what values you start off with, but a common
      choice is to set them all initially to zero
  • Repeat until convergence {
        θ_j := θ_j − α Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾
    }
    This is the final formula after applying the partial derivatives (the
    constant factor 1/n from the cost is absorbed into the learning rate α,
    which matches the worked example later in this lesson).
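A minimal batch gradient descent sketch in Python, reusing the hypothetical sigmoid helper above (the 1/n factor is absorbed into α, as in the worked example below):

    def gradient_step(theta, X, y, alpha):
        # One simultaneous update of all parameters theta_j
        h = sigmoid(X @ theta)    # h_theta(x^(i)) for every example
        grad = X.T @ (h - y)      # sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        return theta - alpha * grad

    def train(X, y, alpha=0.5, iters=100):
        theta = np.zeros(X.shape[1])  # common choice: start at all zeros
        for _ in range(iters):
            theta = gradient_step(theta, X, y, alpha)
        return theta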
Inference After Learning

• After learning the parameters θ = [θ₀, …, θ_m], we can predict the output of
  any new, unseen x as follows:

  h_θ(x) = 1 / (1 + e^(−θᵀx))

  If h_θ(x) ≥ 0.5, predict y = 1; otherwise, predict y = 0.
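The same decision rule as a one-line Python sketch, continuing the hypothetical helpers above:

    def predict(theta, x):
        # Predicted class (0 or 1) for a new, unseen example x
        return 1 if hypothesis(theta, x) >= 0.5 else 0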
Visualization of weights, bias, activation function

[Figure: the output range of the neuron is determined by g(·); the bias b only
changes the position of the hyperplane]

Slide credit: Hugo Larochelle


Activation - sigmoid

• Squashes the neuron’s pre-activation between 0 and 1
• Always positive
• Bounded
• Strictly increasing

Slide credit: Hugo Larochelle
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

            and   vaccine   the   of   nigeria   y
  Email a    1       1       0     1      1      1
  Email b    0       0       1     1      0      0
  Email c    0       1       1     0      0      1
  Email d    1       0       0     1      0      0
  Email e    1       0       1     0      1      1
  Email f    1       0       1     1      0      0

  A Training Dataset
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

            and   vaccine   the   of   nigeria   y
  Email a    1       1       0     1      1      1
  Email b    0       0       1     1      0      0
  Email c    0       1       1     0      0      1
  Email d    1       0       0     1      0      0
  Email e    1       0       1     0      1      1
  Email f    1       0       1     1      0      0

  A 1 entails that a word (e.g., “and”) is present in an email (e.g., “Email a”);
  a 0 entails that the word is absent (e.g., “and” in “Email b”)
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]
• There are 5 words (or features): x = [x₁, x₂, x₃, x₄, x₅]
• We define 6 parameters (the first one, i.e., θ₀, is the intercept)

            x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
  Email a       1           1             0          1           1        1
  Email b       0           0             1          1           0        0
  Email c       0           1             1          0           0        1
  Email d       1           0             0          1           0        0
  Email e       1           0             1          0           1        1
  Email f       1           0             1          1           0        0
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]
• The parameter vector: θ = [θ₀, θ₁, θ₂, θ₃, θ₄, θ₅]
• The feature vector: x = [x₀, x₁, x₂, x₃, x₄, x₅], where x₀ = 1 to account
  for the intercept

            x₀ = 1   x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
  Email a      1         1           1             0          1           1        1
  Email b      1         0           0             1          1           0        0
  Email c      1         0           1             1          0           0        1
  Email d      1         1           0             0          1           0        0
  Email e      1         1           0             1          0           1        1
  Email f      1         1           0             1          1           0        0
Recap: Gradient Descent For Logistic Regression

• Outline:
  • Have cost function J(θ), where θ = [θ₀, …, θ_m]
  • Start off with some guesses for θ
    • It does not really matter what values you start off with, but a common
      choice is to set them all initially to zero
  • Repeat until convergence {
        θ_j := θ_j − α Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾
    }

  First, let us calculate the factor (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) for every example in
  our training dataset.
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₀
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0.5
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0   −0.5
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0.5
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0   −0.5
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0.5
Recap: Gradient Descent For Logistic Regression

• Outline:
  • Have cost function J(θ), where θ = [θ₀, …, θ_m]
  • Start off with some guesses for θ
    • It does not really matter what values you start off with, but a common
      choice is to set them all initially to zero
  • Repeat until convergence {
        θ_j := θ_j − α Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾
    }

  Second, let us calculate (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾ for every example in
  our training dataset and for every θ_j, where j is between 0 and m.
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]
• Since θᵀx = 0 for every example, h_θ(x) = 1/(1 + e⁰) = 0.5 throughout:

  x              y   θᵀx                               (h_θ(x) − y) · x₀
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   (0.5 − 1) × 1 = −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0   (0.5 − 0) × 1 =  0.5
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0   (0.5 − 1) × 1 = −0.5
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0   (0.5 − 0) × 1 =  0.5
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0   (0.5 − 1) × 1 = −0.5
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0   (0.5 − 0) × 1 =  0.5
Recap: Gradient Descent For Logistic Regression

• Outline:
  • Have cost function J(θ), where θ = [θ₀, …, θ_m]
  • Start off with some guesses for θ
    • It does not really matter what values you start off with, but a common
      choice is to set them all initially to zero
  • Repeat until convergence {
        θ_j := θ_j − α Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾
    }

  Third, let us compute every θ_j.
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₀
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0.5
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0   −0.5
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0.5
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0   −0.5
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0.5

  Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x₀⁽ⁱ⁾ = 0

  Then, θ₀ (new) = θ₀ (old) − α × 0 = 0 − 0.5 × 0 = 0

  New parameter vector: θ = [0, θ₁, θ₂, θ₃, θ₄, θ₅]
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₁
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0    0
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0.5
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0   −0.5
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0.5

  Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x₁⁽ⁱ⁾ = 0

  Then, θ₁ = θ₁ − α × 0 = 0 − 0.5 × 0 = 0

  New parameter vector: θ = [0, 0, θ₂, θ₃, θ₄, θ₅]
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₂
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0   −0.5
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0    0
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0

  Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x₂⁽ⁱ⁾ = −1

  Then, θ₂ = θ₂ − α × (−1) = 0 − 0.5 × (−1) = 0.5

  New parameter vector: θ = [0, 0, 0.5, θ₃, θ₄, θ₅]
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₃
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0    0
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0.5
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0   −0.5
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0   −0.5
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0.5

  Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x₃⁽ⁱ⁾ = 0

  Then, θ₃ = θ₃ − α × 0 = 0 − 0.5 × 0 = 0

  New parameter vector: θ = [0, 0, 0.5, 0, θ₄, θ₅]
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₄
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0.5
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0    0
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0.5
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0    0
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0.5

  Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x₄⁽ⁱ⁾ = 1

  Then, θ₄ = θ₄ − α × 1 = 0 − 0.5 × 1 = −0.5

  New parameter vector: θ = [0, 0, 0.5, 0, −0.5, θ₅]
A Concrete Example: The Training Phase

• Let us apply logistic regression on the spam email recognition problem,
  assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

  x              y   θᵀx                               (h_θ(x) − y) · x₅
  [1,1,1,0,1,1]  1   [0,0,0,0,0,0]·[1,1,1,0,1,1] = 0   −0.5
  [1,0,0,1,1,0]  0   [0,0,0,0,0,0]·[1,0,0,1,1,0] = 0    0
  [1,0,1,1,0,0]  1   [0,0,0,0,0,0]·[1,0,1,1,0,0] = 0    0
  [1,1,0,0,1,0]  0   [0,0,0,0,0,0]·[1,1,0,0,1,0] = 0    0
  [1,1,0,1,0,1]  1   [0,0,0,0,0,0]·[1,1,0,1,0,1] = 0   −0.5
  [1,1,0,1,1,0]  0   [0,0,0,0,0,0]·[1,1,0,1,1,0] = 0    0

  Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x₅⁽ⁱ⁾ = −1

  Then, θ₅ = θ₅ − α × (−1) = 0 − 0.5 × (−1) = 0.5

  New parameter vector: θ = [0, 0, 0.5, 0, −0.5, 0.5]
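The entire gradient step above can be reproduced with a short, self-contained Python sketch; running it prints the same parameter vector the slides arrive at:

    import numpy as np

    # Training set from the slides: columns are x0 (bias), "and", "vaccine",
    # "the", "of", "nigeria"; y = 1 means spam.
    X = np.array([
        [1, 1, 1, 0, 1, 1],   # Email a
        [1, 0, 0, 1, 1, 0],   # Email b
        [1, 0, 1, 1, 0, 0],   # Email c
        [1, 1, 0, 0, 1, 0],   # Email d
        [1, 1, 0, 1, 0, 1],   # Email e
        [1, 1, 0, 1, 1, 0],   # Email f
    ], dtype=float)
    y = np.array([1, 0, 1, 0, 1, 0], dtype=float)

    alpha = 0.5
    theta = np.zeros(6)                      # start with all parameters at zero

    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # all 0.5 when theta = 0
    theta = theta - alpha * (X.T @ (h - y))  # one simultaneous gradient step

    print(theta)                             # [ 0.  0.  0.5  0. -0.5  0.5]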
A Concrete Example: Testing

• Let us now test logistic regression on the spam email recognition problem,
  using the just-learnt θ = [0, 0, 0.5, 0, −0.5, 0.5]
  • Note: Testing is typically done over a portion of the dataset that is not
    used during training, but rather kept aside for testing the accuracy of
    the algorithm’s predictions
  • In this example, we will test over all the examples that we used during
    training, just for illustrative purposes
A Concrete Example: Testing

• Let us test logistic regression on the spam email recognition problem, using
  the just-learnt θ = [0, 0, 0.5, 0, −0.5, 0.5]

  x              y   θᵀx                                         h_θ(x) = 1/(1 + e^(−θᵀx))
  [1,1,1,0,1,1]  1   [0,0,0.5,0,−0.5,0.5]·[1,1,1,0,1,1] =  0.5   0.622459331
  [1,0,0,1,1,0]  0   [0,0,0.5,0,−0.5,0.5]·[1,0,0,1,1,0] = −0.5   0.377540669
  [1,0,1,1,0,0]  1   [0,0,0.5,0,−0.5,0.5]·[1,0,1,1,0,0] =  0.5   0.622459331
  [1,1,0,0,1,0]  0   [0,0,0.5,0,−0.5,0.5]·[1,1,0,0,1,0] = −0.5   0.377540669
  [1,1,0,1,0,1]  1   [0,0,0.5,0,−0.5,0.5]·[1,1,0,1,0,1] =  0.5   0.622459331
  [1,1,0,1,1,0]  0   [0,0,0.5,0,−0.5,0.5]·[1,1,0,1,1,0] = −0.5   0.377540669
A Concrete Example: Testing

• Let us test logistic regression on the spam email recognition problem, using
  the just-learnt θ = [0, 0, 0.5, 0, −0.5, 0.5]
• Decision rule: if h_θ(x) ≥ 0.5, y′ = 1; else y′ = 0

  x              y   θᵀx     h_θ(x)        Predicted class (y′)
  [1,1,1,0,1,1]  1    0.5    0.622459331   1
  [1,0,0,1,1,0]  0   −0.5    0.377540669   0
  [1,0,1,1,0,0]  1    0.5    0.622459331   1
  [1,1,0,0,1,0]  0   −0.5    0.377540669   0
  [1,1,0,1,0,1]  1    0.5    0.622459331   1
  [1,1,0,1,1,0]  0   −0.5    0.377540669   0

  No mispredictions!
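These test predictions can be re-derived in a couple of lines (a sketch continuing the training script above, where X, y, and theta are already defined):

    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # [0.622, 0.378, 0.622, 0.378, 0.622, 0.378]
    y_pred = (h >= 0.5).astype(int)         # [1, 0, 1, 0, 1, 0]
    print((y_pred == y).all())              # True: no mispredictions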
A Concrete Example: Inference

• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1], is spam
  or not, using logistic regression with the just-learnt parameter vector
  θ = [0, 0, 0.5, 0, −0.5, 0.5]

            x₀ = 1   x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
  Email a      1         1           1             0          1           1        1
  Email b      1         0           0             1          1           0        0
  Email c      1         0           1             1          0           0        1
  Email d      1         1           0             0          1           0        0
  Email e      1         1           0             1          0           1        1
  Email f      1         1           0             1          1           0        0
  Email k      1         0           1             0          0           1        ?

  Our Training Dataset, plus the new Email k
A Concrete Example: Inference

• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1], is spam
  or not, using logistic regression with the just-learnt parameter vector
  θ = [0, 0, 0.5, 0, −0.5, 0.5]

  θᵀk = [0, 0, 0.5, 0, −0.5, 0.5] · [1, 0, 1, 0, 0, 1] = 0.5 × 1 + 0.5 × 1 = 1

  h_θ(k) = 1 / (1 + e^(−1)) ≈ 0.73 ≥ 0.5

  → Class 1 (i.e., Spam)


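The same inference as a short sketch (continuing the script above):

    k = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # Email k's feature vector
    z = theta @ k                                   # 0.5 + 0.5 = 1.0
    h_k = 1.0 / (1.0 + np.exp(-z))                  # ~0.731
    print("spam" if h_k >= 0.5 else "not spam")     # spam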
A Concrete Example: Inference

• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1], is spam
  or not, using logistic regression with the just-learnt parameter vector
  θ = [0, 0, 0.5, 0, −0.5, 0.5]

            x₀ = 1   x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
  Email a      1         1           1             0          1           1        1
  Email b      1         0           0             1          1           0        0
  Email c      1         0           1             1          0           0        1
  Email d      1         1           0             0          1           0        0
  Email e      1         1           0             1          0           1        1
  Email f      1         1           0             1          1           0        0
  Email k      1         0           1             0          0           1        1

  Somewhat interesting, since the model considered “vaccine” and “nigeria”
  indicative of spam!


Logistic Regression

Sources:
• https://www.kaggle.com/
• http://research.cs.tamu.edu
• http://web.iitd.ac.in
• https://www3.nd.edu/
