INTRODUCTION
`Winter is here`. Let’s welcome winter with a warm data science problem.
Let’s take the case study of a clothing company that manufactures jackets and cardigans. They want a model that can predict whether a customer will buy a jacket (class 1) or a cardigan (class 0) from their historical behavioral patterns, so that they can give specific offers according to the customer’s needs. As a data scientist, you need to help them build a predictive model.
When we start with Machine Learning algorithms, the first algorithm we learn about is `Linear Regression`, in which we predict a continuous target variable.
If we use Linear Regression for our classification problem, we will get a best-fit line like this:

Z = βX + b
Problem with the linear line:
When you extend this line, you will get values greater than 1 and less than 0, which do not make much sense in our classification problem. It makes model interpretation a challenge. That is where `Logistic Regression` comes in. If we needed to predict sales for an outlet, then this model could be helpful, but here we need to classify customers.
We need a function that transforms this straight line so that the values lie between 0 and 1:

Ŷ = Q(Z)

Q(Z) = 1 / (1 + e^(−Z))   (Sigmoid Function)

Ŷ = 1 / (1 + e^(−Z))
After the transformation, we get a curve that stays between 0 and 1. Another advantage of this function is that all the continuous values it produces lie between 0 and 1, so we can use them as probabilities when making predictions. For example, if the predicted value is on the extreme right, the probability will be close to 1, and if the predicted value is on the extreme left, the probability will be close to 0.
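To make this concrete, here is a minimal NumPy sketch (not from the original article) that applies the sigmoid to a few hypothetical linear scores Z and shows that every output lands strictly between 0 and 1:

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores Z = beta * X + b for a few customers
z = np.array([-4.0, -1.0, 0.0, 2.5, 6.0])
print(sigmoid(z))  # ≈ [0.018 0.269 0.5 0.924 0.998] -- all between 0 and 1
```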
Selecting the right model is not enough. You also need a function that measures the performance of a Machine Learning model for given data. The cost function quantifies the error between predicted values and expected values.
`If you can’t measure it, you can’t improve it.`
Another thing that changes with this transformation is the cost function. In Linear Regression, we use `Mean Squared Error` as the cost function, given by:

MSE = (1/n) Σᵢ (Yᵢ − Ŷᵢ)²
When this error function is plotted with respect to the weight parameters of the Linear Regression model, it forms a convex curve, which makes it possible to apply the Gradient Descent optimization algorithm to minimize the error by finding the global minimum and adjusting the weights.
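As a rough illustration, the sketch below (with made-up toy data) computes the MSE of a one-parameter linear model and takes a single gradient-descent step; because the error surface is convex in the weight, the step reduces the error:

```python
import numpy as np

# Toy 1-D linear regression y ≈ w * x (illustrative data, not from the article)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def mse(w):
    return np.mean((y - w * x) ** 2)

# One gradient-descent step on the convex MSE surface
w, lr = 0.0, 0.05
grad = -2 * np.mean((y - w * x) * x)   # d(MSE)/dw
w -= lr * grad
print(mse(0.0), mse(w))  # the error drops after the step
```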
Why don’t we use `Mean Squared Error` as a cost function in Logistic Regression?
In Logistic Regression, Ŷᵢ is a nonlinear function (Ŷ = 1 / (1 + e^(−Z))). If we substitute this into the MSE equation above, the result is a non-convex function of the model parameters.
When we try to optimize the values using gradient descent, this creates complications in finding the global minimum.
Another reason is that in classification problems the target values are 0/1, so (Ŷ − Y)² will always be between 0 and 1, which can make it very difficult to keep track of the errors and to store such high-precision floating-point numbers.
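One quick way to see the non-convexity is to plug the sigmoid into the squared error for a single toy sample and check the sign of the (numerical) second derivative; a convex function never has a negative second derivative. This is only an illustrative sketch with an assumed sample (x = 1, y = 0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Squared error as a function of the weight w for one sample with x = 1, y = 0:
# loss(w) = (sigmoid(w) - 0) ** 2
w = np.linspace(-8, 8, 1001)
loss = sigmoid(w) ** 2

# The numerical second derivative changes sign, so the curve is not convex.
second_deriv = np.gradient(np.gradient(loss, w), w)
print(second_deriv.min() < 0 < second_deriv.max())  # True
```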
The cost function used in Logistic Regression is Log Loss.
What is Log Loss?
Log loss, also known as logarithmic loss or cross-entropy loss, is a
common evaluation metric for binary classification models. It measures the
performance of a model by quantifying the difference between predicted
probabilities and actual values. Log-loss is indicative of how close the
prediction probability is to the corresponding actual/true value (0 or 1 in
case of binary classification), penalizing inaccurate predictions with higher
values. Lower log-loss indicates better model performance.
Log Loss is the most important classification metric based on probabilities.
It’s hard to interpret raw log-loss values, but log-loss is still a good metric
for comparing models. For any given problem, a lower log loss value
means better predictions.
Mathematical interpretation:
Log Loss is the negative average of the log of corrected predicted
probabilities for each instance.
Let us understand it with an example:
Suppose the model outputs a predicted probability of buying a jacket for each customer ID, as shown above.
What are the corrected probabilities?
By default, the output of the logistic regression model is the probability of the sample being positive (indicated by 1), i.e. if a logistic regression model is trained to classify a `company dataset`, then the predicted probability column answers: what is the probability that the person has bought a jacket? Here, in the above dataset, the probability that a person with ID6 will buy a jacket is 0.94.
In the same way, the probability that a person with ID5 will buy a jacket (i.e. belong to class 1) is 0.1, but the actual class for ID5 is 0, so the probability for the actual class is (1 − 0.1) = 0.9. 0.9 is the corrected probability for ID5.
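As a quick illustration of this “corrected probability” step, here is a small NumPy sketch. The labels and probabilities are hypothetical stand-ins for the article’s table; only the two values the text actually mentions (0.94 for ID6 and 0.1 for ID5) are taken from it:

```python
import numpy as np

# Hypothetical actual classes and predicted probabilities of class 1 (buying a jacket)
# for customers ID1..ID6; only ID5 (0.10, class 0) and ID6 (0.94, class 1) come from the text.
y_true = np.array([1, 0, 1, 0, 0, 1])
p_pred = np.array([0.80, 0.30, 0.70, 0.20, 0.10, 0.94])

# Corrected probability: the predicted probability of the class that actually occurred.
p_corrected = np.where(y_true == 1, p_pred, 1 - p_pred)
print(p_corrected)  # [0.8  0.7  0.7  0.8  0.9  0.94]
```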
We then find the log of the corrected probabilities for each instance.
As you can see, these log values are negative. To deal with the negative sign, we take the negative average of these values, to maintain the common convention that lower loss scores are better.
In short, there are three steps to find Log Loss:
1. Find the corrected probabilities.
2. Take the log of the corrected probabilities.
3. Take the negative average of the values from step 2.
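Putting these steps together, a minimal hand-rolled sketch (using the same hypothetical labels and probabilities as above) might look like this:

```python
import numpy as np

def log_loss_manual(y_true, p_pred):
    """Log loss via the three steps described above (illustrative sketch)."""
    y_true = np.asarray(y_true)
    p_pred = np.asarray(p_pred)
    # Step 1: corrected probabilities -- probability assigned to the class that occurred.
    p_corrected = np.where(y_true == 1, p_pred, 1 - p_pred)
    # Step 2: take the log of the corrected probabilities.
    log_p = np.log(p_corrected)
    # Step 3: take the negative average.
    return -np.mean(log_p)

print(log_loss_manual([1, 0, 1, 0, 0, 1], [0.80, 0.30, 0.70, 0.20, 0.10, 0.94]))  # ≈ 0.22
```

On the same inputs, this should agree with `sklearn.metrics.log_loss(y_true, p_pred)`.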
If we summarize all the above steps, we can use the formula:

Log Loss = −(1/N) · Σᵢ [ yᵢ · log(p(yᵢ)) + (1 − yᵢ) · log(1 − p(yᵢ)) ]

Here yᵢ represents the actual class and p(yᵢ) is the predicted probability of that class:
p(yᵢ) is the probability of 1.
1 − p(yᵢ) is the probability of 0.
Now let’s see how the above formula works in two cases:
1. When the actual class is 1: the second term becomes (1 − 1) · log(1 − p(yᵢ)) = 0, and we are left with the first term, yᵢ · log(p(yᵢ)).
2. When the actual class is 0: the first term becomes 0 · log(p(yᵢ)) = 0, and we are left with the second term, (1 − yᵢ) · log(1 − p(yᵢ)).
Wow!! We got back to the original formula for binary cross-entropy/log loss.
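We can sanity-check this equivalence numerically. The snippet below (with the same hypothetical labels and probabilities as before) compares the summarized formula with the three-step “corrected probability” computation:

```python
import numpy as np

y_true = np.array([1, 0, 1, 0, 0, 1])
p_pred = np.array([0.80, 0.30, 0.70, 0.20, 0.10, 0.94])

# Summarized formula: for each sample only one of the two terms is non-zero.
formula = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# Three-step "corrected probability" version from earlier.
three_steps = -np.mean(np.log(np.where(y_true == 1, p_pred, 1 - p_pred)))

print(np.isclose(formula, three_steps))  # True
```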
The benefits of taking the logarithm reveal themselves when you look at the cost function graphs for actual classes 1 and 0:
The red line represents class 1. As we can see, when the predicted probability (x-axis) is close to 1, the loss is small, and when the predicted probability is close to 0, the loss approaches infinity.
The black line represents class 0. As we can see, when the predicted probability (x-axis) is close to 0, the loss is small, and when the predicted probability is close to 1, the loss approaches infinity.
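The curves described above can be reproduced with a few lines of matplotlib; the colors and labels below follow the description in the text rather than the article’s original figure:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 500)  # predicted probability of class 1

# Loss contributed by a sample whose actual class is 1 (red) or 0 (black)
plt.plot(p, -np.log(p), color="red", label="actual class 1: -log(p)")
plt.plot(p, -np.log(1 - p), color="black", label="actual class 0: -log(1 - p)")
plt.xlabel("predicted probability of class 1")
plt.ylabel("log loss")
plt.legend()
plt.show()
```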
Frequently Asked Questions
Q1. Why do we use log loss?
A. Log loss is commonly used as an evaluation metric for binary
classification tasks for several reasons. Firstly, it provides a continuous and
differentiable measure of the model’s performance, making it suitable for
optimization algorithms. Secondly, log loss penalizes confident and
incorrect predictions more heavily, incentivizing calibrated probability
estimates. Finally, log loss can be interpreted as the logarithmic measure of
the likelihood of the predicted probabilities aligning with the true labels.
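In practice you would rarely compute this by hand; scikit-learn, for example, provides it as `sklearn.metrics.log_loss`. The sketch below uses synthetic data, since the article’s jacket/cardigan dataset is not available:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the jacket (1) vs. cardigan (0) data
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
print(log_loss(y_test, probs))             # lower is better
```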
Q2. What is a good log loss?
A. The interpretation of a “good” log loss value depends on the specific
context and problem domain. In general, a lower log loss indicates better
model performance. However, what constitutes a good log loss can vary
depending on the complexity of the problem, the availability of data, and
the desired level of accuracy. It is often useful to compare the log loss of
different models or benchmarks to assess their relative performance.