Logistic Regression
Part 1
Manoj Kumar
Youtube
18th October, 2024
Manoj Kumar Youtube
Logistic Regression 1 / 36
Given data
Blood Pressure Level (mm Hg)    Diabetes Status
80                              Not Diabetic
90                              Not Diabetic
100                             Not Diabetic
110                             Not Diabetic
120                             Diabetic
130                             Diabetic
140                             Diabetic
150                             Diabetic
Table 1: Blood Pressure Levels and Diabetes Status
Why is Linear Regression not good for classification?
• The output ŷ can fall outside the label range {0, 1}.
• What would a response value of −2 mean?
Possible classification: assign 1 if ŷ > 0.5, and 0 otherwise.
• The resulting boundary is very sensitive to outliers.
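To make these points concrete, here is a minimal sketch that fits an ordinary least-squares line to the data in Table 1, with labels encoded as 0 = Not Diabetic and 1 = Diabetic. The helper names `fit_line` and `boundary` and the extra patient at 400 mm Hg are assumptions added for illustration; they are not part of the slides.

```python
import numpy as np

# Data from Table 1; labels encoded as 0 = Not Diabetic, 1 = Diabetic.
bp = np.array([80, 90, 100, 110, 120, 130, 140, 150], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

def fit_line(x, t):
    """Least-squares line t ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, t, deg=1)
    return slope, intercept

def boundary(slope, intercept, threshold=0.5):
    """Input value at which the fitted line crosses the threshold."""
    return (threshold - intercept) / slope

slope, intercept = fit_line(bp, y)
y_hat = slope * bp + intercept
print("predictions:", np.round(y_hat, 3))  # the extremes leave the range [0, 1]
print("boundary:", boundary(slope, intercept))

# Hypothetical outlier (not in the slides): one diabetic patient at 400 mm Hg.
bp_out = np.append(bp, 400.0)
y_out = np.append(y, 1.0)
slope_o, intercept_o = fit_line(bp_out, y_out)
print("boundary with outlier:", boundary(slope_o, intercept_o))
```

With the single extra point the threshold crossing moves to the right, so a patient at 120 mm Hg would no longer be assigned to class 1.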
Sigmoid/Logistic Function
• Bounded:
$\sigma(a) = \frac{1}{1 + \exp(-a)} \in (0, 1)$
• Symmetric:
$1 - \sigma(a) = \frac{\exp(-a)}{1 + \exp(-a)} = \frac{1}{\exp(a) + 1} = \sigma(-a)$
• Gradient:
$\sigma'(a) = \frac{\exp(-a)}{(1 + \exp(-a))^2} = \sigma(a)\,(1 - \sigma(a))$
[Figure: Sigmoid Function, σ(a) plotted against a, with the value 0.5 marked on the vertical axis]
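The three properties can be checked numerically. The sketch below is one possible check, not part of the slides: the function names `sigmoid` and `sigmoid_grad`, the test points, and the step size `h` are all choices made here for illustration.

```python
import numpy as np

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    """Closed-form derivative sigma(a) * (1 - sigma(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)

a = np.linspace(-5.0, 5.0, 11)  # arbitrary test points

# Bounded: every value lies strictly between 0 and 1.
is_bounded = bool(np.all((sigmoid(a) > 0) & (sigmoid(a) < 1)))

# Symmetric: 1 - sigma(a) equals sigma(-a).
is_symmetric = bool(np.allclose(1.0 - sigmoid(a), sigmoid(-a)))

# Gradient: compare the closed form with a central finite difference.
h = 1e-5
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
grad_matches = bool(np.allclose(numeric, sigmoid_grad(a)))

print(is_bounded, is_symmetric, grad_matches)
```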
Interpretation of the Sigmoid Function
Note
For a given input x, the probability that the class label Y equals 1 given x is represented by:
$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}}$
• $P(Y = 0 \mid x) + P(Y = 1 \mid x) = 1$
• $P(Y = 0 \mid x) = 1 - \frac{1}{1 + e^{-(w_0 + w_1 x)}}$
By default, we take threshold = 0.5:
• Y = 1 if P(Y = 1|x) ≥ 0.5
• Y = 0 if P(Y = 1|x) < 0.5
[Figure: Sigmoid Function, σ(x) plotted for x from −5 to 5, with values between 0 and 1]
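As a small illustration of this interpretation, the sketch below turns a single input into the two class probabilities and a label using the default threshold. The weights `w0` and `w1` are hypothetical values picked for the example; they are not fitted to any data and do not come from the slides.

```python
import math

def prob_y1(x, w0, w1):
    """P(Y = 1 | x) under the one-feature logistic model."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

def predict(x, w0, w1, threshold=0.5):
    """Label 1 if P(Y = 1 | x) reaches the threshold, else 0."""
    return 1 if prob_y1(x, w0, w1) >= threshold else 0

# Hypothetical weights, for illustration only.
w0, w1 = -11.5, 0.1

for x in (100, 120, 130):
    p1 = prob_y1(x, w0, w1)
    p0 = 1.0 - p1  # the two class probabilities sum to 1
    print(x, round(p1, 3), round(p0, 3), predict(x, w0, w1))
```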
Question
Given a logistic regression model with a parameter vector $\theta = (3, 4, 1)^\top$ and a test data point $x = (1, 1, 1)^\top$, what is the probability of Y = 1 for the given data point x?
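One way to check an answer to this kind of question is to compute $\sigma(\theta^\top x)$ directly. The sketch below assumes the vectors read as θ = (3, 4, 1) and x = (1, 1, 1), and that the model has no bias term beyond what θ already contains; both are assumptions about the intended setup.

```python
import numpy as np

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

theta = np.array([3.0, 4.0, 1.0])  # parameter vector as read from the question
x = np.array([1.0, 1.0, 1.0])      # test point as read from the question

z = theta @ x    # linear score theta^T x
p = sigmoid(z)   # estimated P(Y = 1 | x)
print(z, p)
```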
Question
Suppose that you have trained a logistic regression classifier hθ(x) = σ(1 − x), where σ(·) is the
logistic/sigmoid function. What does its output on a new example x = 2 mean? Check all that
apply.
• □ Your estimate for P(y = 1|x; θ) is about 0.73.
• □ Your estimate for P(y = 0|x; θ) is about 0.27.
• □ Your estimate for P(y = 1|x; θ) is about 0.27.
• □ Your estimate for P(y = 0|x; θ) is about 0.73.
Question
Consider the sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$. The derivative $f'(x)$ is:
• f(x) ln f(x) + (1 − f(x)) ln(1 − f(x))
• f(x)(1 − f(x))
• f(x) ln(1 − f(x))
• f(x)(1 + f(x))
Question
Let $\sigma(a) = \frac{1}{3}$. Using the properties of the sigmoid function, calculate the value of the expression
$\sigma'(-a)$, where ′ denotes the derivative.
1 $\frac{2}{9}$
2 $-\frac{2}{9}$
3 $\frac{1}{9}$
4 $-\frac{1}{9}$
Question
Q3-2: Which of the following statements is true about outliers in linear regression?
1 Linear regression is sensitive to outliers
2 Linear regression is NOT sensitive to outliers
3 Can't say
4 None of these
Question
Suppose we have trained a logistic regression classifier for a binary classification task. The table
below provides the true labels y and the predicted probabilities P(Y = 1 | x ) for a set of data
points. We want to evaluate the accuracy of the classifier for the following thresholds:
• Model A: T = 0.25
• Model B: T = 0.5
• Model C: T = 0.75
Calculate the accuracy for each model and determine which threshold results in the highest
accuracy.
True Label y    P(Y = 1 | x)
1               0.9
1               0.6
0               0.6
0               0.55
1               0.4
0               0.3
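The calculation can be organized as in the sketch below. It assumes the same rule as the default threshold earlier in the deck, namely predict 1 when P(Y = 1 | x) ≥ T; the function name `accuracy` is a helper introduced here, not something from the slides.

```python
# True labels and predicted probabilities from the table in the question.
y_true = [1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.6, 0.6, 0.55, 0.4, 0.3]

def accuracy(y_true, p_hat, threshold):
    """Fraction of points whose thresholded prediction matches the label."""
    y_pred = [1 if p >= threshold else 0 for p in p_hat]
    correct = sum(int(yp == yt) for yp, yt in zip(y_pred, y_true))
    return correct / len(y_true)

for name, t in [("A", 0.25), ("B", 0.5), ("C", 0.75)]:
    print(f"Model {name}: T = {t}, accuracy = {accuracy(y_true, p_hat, t):.3f}")
```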
Decision Boundary (Using Logistic)
• Y = 1 if
$P(Y = 1 \mid X) \ge 0.5$
• For 1 input feature (X):
$z = w_0 + w_1 X$
$\frac{1}{1 + e^{-z}} \ge \frac{1}{2}$
$1 + e^{-(w_0 + w_1 X)} \le 2$
$w_0 + w_1 X \ge 0$
Y = 1 if $X \ge -\frac{w_0}{w_1}$ (assuming $w_1 > 0$; the inequality flips if $w_1 < 0$)
[Figure: regions Y = 0 and Y = 1 on either side of the Decision Boundary]
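The derivation above can be sketched in code as follows. The weights are hypothetical values chosen so the boundary lands at a round number, and `boundary_1d` and `predict` are helper names introduced here for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def boundary_1d(w0, w1):
    """Value of X where w0 + w1 * X = 0, i.e. where P(Y = 1 | X) = 0.5."""
    if w1 == 0:
        raise ValueError("w1 must be non-zero for a boundary to exist")
    return -w0 / w1

def predict(x, w0, w1):
    """Label 1 exactly when the linear score is non-negative."""
    return 1 if w0 + w1 * x >= 0 else 0

# Hypothetical weights with w1 > 0, for illustration only.
w0, w1 = -23.0, 0.2
print("boundary:", boundary_1d(w0, w1))

for x in (110.0, 120.0):
    # Thresholding the probability at 0.5 agrees with checking the sign of z.
    by_probability = 1 if sigmoid(w0 + w1 * x) >= 0.5 else 0
    print(x, predict(x, w0, w1), by_probability)
```

Working with the sign of the linear score avoids evaluating the exponential at all when only the label is needed.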
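• For 2 input features (X1, X2):
$z = w_0 + w_1 X_1 + w_2 X_2$
$P(Y = 1 \mid X_1, X_2) \ge 0.5$
$\frac{1}{1 + e^{-z}} \ge 0.5$
Decision Boundary
Y = 1 if $w_0 + w_1 X_1 + w_2 X_2 \ge 0$
[Figure: the (X1, X2) plane split by the decision boundary into regions Y = 1 and Y = 0]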
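For two features the boundary is a straight line in the (X1, X2) plane. The sketch below classifies a few points and traces that line; the weight vector `w`, the sample points, and the helper names are assumptions made for the example, not values from the slides.

```python
import numpy as np

def predict(X, w):
    """X: (n, 2) array of points; w: (w0, w1, w2). Returns 0/1 labels."""
    z = w[0] + X @ w[1:]
    return (z >= 0).astype(int)

def boundary_x2(x1, w):
    """X2 coordinate on the line w0 + w1*X1 + w2*X2 = 0 (requires w2 != 0)."""
    return -(w[0] + w[1] * x1) / w[2]

# Hypothetical weights: the boundary is the line X1 + X2 = 3.
w = np.array([-3.0, 1.0, 1.0])

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [3.0, 3.0]])
labels = predict(X, w)
print(labels)

# Two points that lie on the boundary line.
line_x2 = boundary_x2(np.array([0.0, 3.0]), w)
print(line_x2)
```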
Question
Suppose you train a logistic regression classifier and the learned hypothesis function is:
$h_\theta(x) = \sigma(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$,
where θ0 = 6, θ1 = 0, θ2 = −1. Which of the following represents the decision boundary for
hθ(x)?
[Figure: four candidate plots labeled A, B, C, D, each with horizontal axis x1 and vertical axis x2, ticks from 2 to 10. A: region y = 1 above region y = 0. B: region y = 1 to the left of region y = 0. C: region y = 0 to the left of region y = 1. D: region y = 0 above region y = 1.]