Logistic Regression
Outline
• Logistic Regression Classification
• Hypothesis Function
• Cost Function
• Gradient Descent
Binary outcomes are common and important
• The patient survives the operation, or does not.
• The accused is convicted, or is not.
• The customer makes a purchase, or does not.
• The marriage lasts at least five years, or does not.
• The student graduates, or does not.
Examples: Categorical Response Variables
• Whether or not a person smokes (response Y is a Binary Response): Smoker / Non-smoker
• Success of a medical treatment (response Y is a Binary Response): Survives / Dies
• Opinion poll responses (response Y is an Ordinal Response): Agree / Neutral / Disagree
Difference between linear regression and logistic regression
[Figure: linear regression fits a straight line to the data, while logistic regression fits an S-shaped sigmoid curve; source: https://www.kaggle.com/]
Sigmoid Function
• Sigmoid (logistic) activation function: $g(z) = \frac{1}{1+e^{-z}}$
• [Figure: a single neuron maps the input $x$ through the weights ("parameters" $\theta$) and a bias unit to the output $h_\theta(x) = g(\theta^T x)$]
Slide credit: Andrew Ng
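As a minimal Python sketch of this activation (the names `sigmoid` and `h` are ours, not from the slide):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x): sigmoid of the pre-activation."""
    return sigmoid(np.dot(theta, x))
```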
Learning a Logistic Regression Model
• How to learn a logistic regression model $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$, where $\theta = [\theta_0, \dots, \theta_m]^T$ and $x = [x_0, \dots, x_m]^T$?
• By minimizing the following cost function:
$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1-y)\log(1 - h_\theta(x))$$
• That is:
$$\min_\theta \; \frac{1}{n} \sum_{i=1}^{n} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$
$$\equiv \; \min_\theta \; \frac{1}{n} \sum_{i=1}^{n} \left[ -y^{(i)} \log \frac{1}{1+e^{-\theta^T x^{(i)}}} - (1 - y^{(i)}) \log\left(1 - \frac{1}{1+e^{-\theta^T x^{(i)}}}\right) \right] \quad \text{(cost function)}$$
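A hedged sketch of this cost in Python (the names `cost`, `X`, `y` are illustrative; `X` is assumed to carry the constant feature x₀ = 1 in its first column):

```python
import numpy as np

def cost(theta, X, y):
    """Average cross-entropy cost (1/n) * sum_i Cost(h_theta(x_i), y_i)."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x) for every row of X
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))
```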
Gradient Descent For Logistic Regression
• Outline:
  • Have cost function $J(\theta)$, where $\theta = [\theta_0, \dots, \theta_m]^T$
  • Start off with some guesses for $\theta$
    • It does not really matter what values you start off with, but a common choice is to set them all initially to zero
  • Repeat until convergence {
      $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$   (the last term is the partial derivative)
    }   Note: update all $\theta_j$ simultaneously
  • $\alpha$ is the learning rate, which controls how big a step we take when we update $\theta$
Gradient Descent For Logistic Regression
• Applying the partial derivative to the cost function gives the final update formula:
• Repeat until convergence {
    $\theta_j := \theta_j - \alpha \sum_{i=1}^{n} \left( \frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_j^{(i)}$
  }
• Note: here the $\frac{1}{n}$ averaging factor from the cost function is absorbed into the learning rate $\alpha$, matching the worked example below
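A minimal sketch of this update rule (the function `gradient_descent` and its arguments are our naming; like the slides, it uses the plain sum rather than the 1/n average, and a fixed iteration count stands in for a convergence test):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, iters=100):
    """Repeat: theta_j := theta_j - alpha * sum_i (h_theta(x_i) - y_i) * x_ij."""
    theta = np.zeros(X.shape[1])                 # common choice: start at all zeros
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))     # predictions for all n examples
        theta = theta - alpha * (X.T @ (h - y))  # updates all theta_j simultaneously
    return theta
```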
Inference After Learning
• After learning the parameters $\theta = [\theta_0, \dots, \theta_m]^T$, we can predict the output of any new unseen $x$ as follows:
  • Predict $y = 1$ if $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}} \ge 0.5$
  • Predict $y = 0$ if $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}} < 0.5$
Visualization of weights, bias, activation function
• [Figure: the output range is determined by g(·); the bias b only changes the position of the hyperplane]
Slide credit: Hugo Larochelle
Activation - sigmoid
• Squashes the neuron's pre-activation between 0 and 1: $g(z) = \frac{1}{1+e^{-z}}$
• Always positive
• Bounded
• Strictly increasing
Slide credit: Hugo Larochelle
A Concrete Example: The Training Phase
• Let us apply logistic regression to the spam email recognition problem, assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]
• There are 5 words (or features), and we define 6 parameters, θ = [θ₀, θ₁, θ₂, θ₃, θ₄, θ₅]; the first one, θ₀, is the intercept
• To account for the intercept, every email also gets a constant feature x₀ = 1, so the feature vector is x = [x₀, x₁, x₂, x₃, x₄, x₅]

          x₀ = 1   x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
Email a   1        1          1              0          1         1              1
Email b   1        0          0              1          1         0              0
Email c   1        0          1              1          0         0              1
Email d   1        1          0              0          1         0              0
Email e   1        1          0              1          0         1              1
Email f   1        1          0              1          1         0              0

A Training Dataset
• A 1 entails that a word (e.g., "and") is present in an email (e.g., "Email a"); a 0 entails that a word is absent (e.g., "and" in "Email b")
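To make the walkthrough reproducible, here is the same table as NumPy arrays (a sketch; the names `X`, `y`, `theta`, `alpha` are ours). Later snippets reuse these definitions:

```python
import numpy as np

# Columns: x0 (intercept), and, vaccine, the, of, nigeria
X = np.array([
    [1, 1, 1, 0, 1, 1],   # Email a
    [1, 0, 0, 1, 1, 0],   # Email b
    [1, 0, 1, 1, 0, 0],   # Email c
    [1, 1, 0, 0, 1, 0],   # Email d
    [1, 1, 0, 1, 0, 1],   # Email e
    [1, 1, 0, 1, 1, 0],   # Email f
])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = spam, 0 = not spam
theta = np.zeros(6)                # starting point [0, 0, 0, 0, 0, 0]
alpha = 0.5                        # learning rate from the slides
```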
Recap: Gradient Descent For Logistic Regression
• Repeat until convergence {
    $\theta_j := \theta_j - \alpha \sum_{i=1}^{n} \left( \frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_j^{(i)}$
  }
• First, let us calculate the factor $\frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)}$ for every example in our training dataset
A Concrete Example: The Training Phase
• Let us apply logistic regression to the spam email recognition problem, assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

x              y   θᵀx                             h_θ(x) - y
[1,1,1,0,1,1]  1   [0,0,0,0,0,0]×[1,1,1,0,1,1]=0   -0.5
[1,0,0,1,1,0]  0   [0,0,0,0,0,0]×[1,0,0,1,1,0]=0   0.5
[1,0,1,1,0,0]  1   [0,0,0,0,0,0]×[1,0,1,1,0,0]=0   -0.5
[1,1,0,0,1,0]  0   [0,0,0,0,0,0]×[1,1,0,0,1,0]=0   0.5
[1,1,0,1,0,1]  1   [0,0,0,0,0,0]×[1,1,0,1,0,1]=0   -0.5
[1,1,0,1,1,0]  0   [0,0,0,0,0,0]×[1,1,0,1,1,0]=0   0.5

(Since θ = 0, θᵀx = 0 and h_θ(x) = 1/(1+e⁰) = 0.5 for every example.)
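Reusing `X`, `y`, and `theta` from the sketch above, this factor can be checked for all six emails at once:

```python
h = 1.0 / (1.0 + np.exp(-X @ theta))   # theta = 0, so theta^T x = 0 and h = 0.5 everywhere
print(h - y)                           # [-0.5  0.5 -0.5  0.5 -0.5  0.5]
```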
Recap: Gradient Descent For Logistic Regression
• Second, let us calculate $\left( \frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_j^{(i)}$ for every example in our training dataset and for every θⱼ, where j is between 0 and m
A Concrete Example: The Training Phase
• Let us apply logistic regression to the spam email recognition problem, assuming α = 0.5 and starting with θ = [0, 0, 0, 0, 0, 0]

x              y   θᵀx                             (h_θ(x) - y)·x₀
[1,1,1,0,1,1]  1   [0,0,0,0,0,0]×[1,1,1,0,1,1]=0   (0.5 - 1) × 1 = -0.5
[1,0,0,1,1,0]  0   [0,0,0,0,0,0]×[1,0,0,1,1,0]=0   (0.5 - 0) × 1 = 0.5
[1,0,1,1,0,0]  1   [0,0,0,0,0,0]×[1,0,1,1,0,0]=0   (0.5 - 1) × 1 = -0.5
[1,1,0,0,1,0]  0   [0,0,0,0,0,0]×[1,1,0,0,1,0]=0   (0.5 - 0) × 1 = 0.5
[1,1,0,1,0,1]  1   [0,0,0,0,0,0]×[1,1,0,1,0,1]=0   (0.5 - 1) × 1 = -0.5
[1,1,0,1,1,0]  0   [0,0,0,0,0,0]×[1,1,0,1,1,0]=0   (0.5 - 0) × 1 = 0.5

(x₀ = 1 for every email, so the last column starts with the j = 0 case.)
Recap: Gradient Descent For Logistic Regression
• Third, let us compute every new θⱼ using $\theta_j := \theta_j - \alpha \sum_{i=1}^{n} \left( \frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_j^{(i)}$
A Concrete Example: The Training Phase
• From the table above, $\sum_{i=1}^{6} \left( \frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_0^{(i)} = -0.5 + 0.5 - 0.5 + 0.5 - 0.5 + 0.5 = 0$
• Then, new θ₀ = old θ₀ − α × 0 = 0 − 0.5 × 0 = 0
• New parameter vector: θ = [0, θ₁, θ₂, θ₃, θ₄, θ₅]
A Concrete Example: The Training Phase
• Now for θ₁ (as before, θᵀx = 0, so h_θ(x) = 0.5 for every example):

x              y   (h_θ(x) - y)·x₁
[1,1,1,0,1,1]  1   (0.5 - 1) × 1 = -0.5
[1,0,0,1,1,0]  0   (0.5 - 0) × 0 = 0
[1,0,1,1,0,0]  1   (0.5 - 1) × 0 = 0
[1,1,0,0,1,0]  0   (0.5 - 0) × 1 = 0.5
[1,1,0,1,0,1]  1   (0.5 - 1) × 1 = -0.5
[1,1,0,1,1,0]  0   (0.5 - 0) × 1 = 0.5

• The sum is 0, so new θ₁ = 0 − 0.5 × 0 = 0
• New parameter vector: θ = [0, 0, θ₂, θ₃, θ₄, θ₅]
A Concrete Example: The Training Phase
• Now for θ₂:

x              y   (h_θ(x) - y)·x₂
[1,1,1,0,1,1]  1   (0.5 - 1) × 1 = -0.5
[1,0,0,1,1,0]  0   (0.5 - 0) × 0 = 0
[1,0,1,1,0,0]  1   (0.5 - 1) × 1 = -0.5
[1,1,0,0,1,0]  0   (0.5 - 0) × 0 = 0
[1,1,0,1,0,1]  1   (0.5 - 1) × 0 = 0
[1,1,0,1,1,0]  0   (0.5 - 0) × 0 = 0

• The sum is −1, so new θ₂ = 0 − 0.5 × (−1) = 0.5
• New parameter vector: θ = [0, 0, 0.5, θ₃, θ₄, θ₅]
A Concrete Example: The Training Phase
• Now for θ₃:

x              y   (h_θ(x) - y)·x₃
[1,1,1,0,1,1]  1   (0.5 - 1) × 0 = 0
[1,0,0,1,1,0]  0   (0.5 - 0) × 1 = 0.5
[1,0,1,1,0,0]  1   (0.5 - 1) × 1 = -0.5
[1,1,0,0,1,0]  0   (0.5 - 0) × 0 = 0
[1,1,0,1,0,1]  1   (0.5 - 1) × 1 = -0.5
[1,1,0,1,1,0]  0   (0.5 - 0) × 1 = 0.5

• The sum is 0, so new θ₃ = 0 − 0.5 × 0 = 0
• New parameter vector: θ = [0, 0, 0.5, 0, θ₄, θ₅]
A Concrete Example: The Training Phase
• Now for θ₄:

x              y   (h_θ(x) - y)·x₄
[1,1,1,0,1,1]  1   (0.5 - 1) × 1 = -0.5
[1,0,0,1,1,0]  0   (0.5 - 0) × 1 = 0.5
[1,0,1,1,0,0]  1   (0.5 - 1) × 0 = 0
[1,1,0,0,1,0]  0   (0.5 - 0) × 1 = 0.5
[1,1,0,1,0,1]  1   (0.5 - 1) × 0 = 0
[1,1,0,1,1,0]  0   (0.5 - 0) × 1 = 0.5

• The sum is 1, so new θ₄ = 0 − 0.5 × 1 = −0.5
• New parameter vector: θ = [0, 0, 0.5, 0, −0.5, θ₅]
A Concrete Example: The Training Phase
• Finally, for θ₅:

x              y   (h_θ(x) - y)·x₅
[1,1,1,0,1,1]  1   (0.5 - 1) × 1 = -0.5
[1,0,0,1,1,0]  0   (0.5 - 0) × 0 = 0
[1,0,1,1,0,0]  1   (0.5 - 1) × 0 = 0
[1,1,0,0,1,0]  0   (0.5 - 0) × 0 = 0
[1,1,0,1,0,1]  1   (0.5 - 1) × 1 = -0.5
[1,1,0,1,1,0]  0   (0.5 - 0) × 0 = 0

• The sum is −1, so new θ₅ = 0 − 0.5 × (−1) = 0.5
• New parameter vector: θ = [0, 0, 0.5, 0, −0.5, 0.5]
• This completes the first iteration of gradient descent
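Combining the three steps, a single vectorized gradient descent step (reusing `X`, `y`, `theta`, `alpha` from the earlier sketch) reproduces this parameter vector:

```python
h = 1.0 / (1.0 + np.exp(-X @ theta))   # step 1: h_theta(x) = 0.5 for every example
grad = X.T @ (h - y)                   # step 2: per-parameter sums [0, 0, -1, 0, 1, -1]
theta = theta - alpha * grad           # step 3: update all theta_j simultaneously
print(theta)                           # [ 0.   0.   0.5  0.  -0.5  0.5]
```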
A Concrete Example: Testing
• Let us now test logistic regression on the spam email recognition problem, using the just-learnt θ = [0, 0, 0.5, 0, −0.5, 0.5]
• Note: Testing is typically done over a portion of the dataset that is not used during training, but rather kept aside to assess the accuracy of the algorithm's predictions
• In this example, we will test over all the examples that we used during training, just for illustrative purposes
A Concrete Example: Testing
• Let us test logistic regression on the spam email recognition problem, using the just-learnt θ = [0, 0, 0.5, 0, −0.5, 0.5]
• Decision rule: if h_θ(x) ≥ 0.5, y′ = 1; else y′ = 0

x              y   θᵀx                                       h_θ(x) = 1/(1+e^(−θᵀx))   Predicted class (y′)
[1,1,1,0,1,1]  1   [0,0,0.5,0,-0.5,0.5]×[1,1,1,0,1,1]=0.5    0.622459331               1
[1,0,0,1,1,0]  0   [0,0,0.5,0,-0.5,0.5]×[1,0,0,1,1,0]=-0.5   0.377540669               0
[1,0,1,1,0,0]  1   [0,0,0.5,0,-0.5,0.5]×[1,0,1,1,0,0]=0.5    0.622459331               1
[1,1,0,0,1,0]  0   [0,0,0.5,0,-0.5,0.5]×[1,1,0,0,1,0]=-0.5   0.377540669               0
[1,1,0,1,0,1]  1   [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,0,1]=0.5    0.622459331               1
[1,1,0,1,1,0]  0   [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,1,0]=-0.5   0.377540669               0

• No mispredictions!
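The same check in code, reusing `X` and `y` from the earlier sketch:

```python
theta = np.array([0, 0, 0.5, 0, -0.5, 0.5])   # the just-learnt parameters
h = 1.0 / (1.0 + np.exp(-X @ theta))          # [0.6225 0.3775 0.6225 0.3775 0.6225 0.3775]
y_pred = (h >= 0.5).astype(int)               # decision rule: y' = 1 iff h >= 0.5
print(np.array_equal(y_pred, y))              # True -> no mispredictions
```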
A Concrete Example: Inference
• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1], is spam or not, using logistic regression with the just-learnt parameter vector θ = [0, 0, 0.5, 0, −0.5, 0.5]

          x₀ = 1   x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
Email a   1        1          1              0          1         1              1
Email b   1        0          0              1          1         0              0
Email c   1        0          1              1          0         0              1
Email d   1        1          0              0          1         0              0
Email e   1        1          0              1          0         1              1
Email f   1        1          0              1          1         0              0
Email k   1        0          1              0          0         1              ?

Our Training Dataset, with the new Email k appended
A Concrete Example: Inference
• Computing the score for Email k:
$$\theta^T x = [0, 0, 0.5, 0, -0.5, 0.5] \times [1, 0, 1, 0, 0, 1] = 0.5 \times 1 + 0.5 \times 1 = 1$$
• Applying the sigmoid: $h_\theta(x) = \frac{1}{1+e^{-1}} \approx 0.73 \ge 0.5$
⇒ Class 1 (i.e., Spam)
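The same inference in code (a sketch; `x_k`, `z`, and `h_k` are our names):

```python
x_k = np.array([1, 0, 1, 0, 0, 1])                       # Email k, with the constant x0 = 1
z = np.dot(np.array([0, 0, 0.5, 0, -0.5, 0.5]), x_k)     # theta^T x = 0.5 + 0.5 = 1
h_k = 1.0 / (1.0 + np.exp(-z))                           # sigmoid(1) ~ 0.731 >= 0.5
print(1 if h_k >= 0.5 else 0)                            # 1 -> Class 1 (spam)
```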
A Concrete Example: Inference
• Email k is therefore labeled as spam (y = 1):

          x₀ = 1   x₁ = and   x₂ = vaccine   x₃ = the   x₄ = of   x₅ = nigeria   y
Email a   1        1          1              0          1         1              1
Email b   1        0          0              1          1         0              0
Email c   1        0          1              1          0         0              1
Email d   1        1          0              0          1         0              0
Email e   1        1          0              1          0         1              1
Email f   1        1          0              1          1         0              0
Email k   1        0          1              0          0         1              1

• Somewhat interesting, since the model considered "vaccine" and "nigeria" indicative of spam!
Logistic Regression
Sources:
https://www.kaggle.com/
http://research.cs.tamu.edu
http://web.iitd.ac.in
https://www3.nd.edu/