Linear Regression

The document outlines a session on Linear Regression, led by Dr. Abhinanda Sarkar, focusing on understanding linear relationships, building regression models, and evaluating their performance. It includes learning objectives, common business questions, and the need for regression in predicting outcomes based on input variables. The content emphasizes the distinction between correlation and causation, and introduces simple and multiple linear regression models for various business applications.

Uploaded by

balajiagasthya

Linear Regression

This file is meant for personal use by balajiagasthya@gmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Meet Your Speaker

Dr. Abhinanda Sarkar


Academic Director at Great Learning

Alumnus - Indian Statistical Institute, Stanford University

Faculty - MIT, Indian Institute of Management, Indian Institute of Science

Experienced in applying probabilistic models, statistical analysis and machine learning to diverse areas

Certified Master Black Belt in Lean Six Sigma and Design for Six Sigma in GE

Learning Objectives
By the end of this session, you should be able to:
Relate correlation and simple linear regression in the context of understanding linear relationships.

Explore simple linear regression models to capture the linear relationship between a pair of attributes.

Build multiple linear regression to model relationships between two or more input attributes and the output, to predict business outcomes.

Evaluate linear regression models and identify the levers to improve their performance.

Discover the applications of linear regression to solve a variety of business problems.

Agenda
In this session, we’ll discuss:

Business Problem and Solution Space

Correlation and Linear Relationships

Simple Linear Regression


Multiple Linear Regression

Categorical Variables in Regression

Evaluation Metrics for Regression

Common Business Questions

How can we forecast sales based on historical sales data and marketing expenditure?

How do we determine medical insurance premiums for customers based on attributes like blood pressure, blood sugar level, and smoking habits?

How do we determine the credit card limit to be assigned to customers based on their past spending behavior, demographic information, etc.?

How can we predict future power load requirements to ensure reliable grid operation and prevent outages?

Problem Space
Predicting a Number

Retail: Sales Forecasting and Optimization, Price Optimization, Customer Lifetime Value Prediction

Banking and Financial Services: Investment Return Forecasting, Credit Scoring, Interest Rate Modeling

Energy: Load Forecasting, Carbon Emission Estimation, Renewable Energy Production Estimation

Healthcare: Medical Insurance Premium Prediction, Drug Dosage Optimization, Patient Length of Stay Estimation

Problem Statement

Consider an online retailer of mobiles and tablets

Crucial to stay ahead of market trends and consumer preferences to maximize sales

Need to effectively manage inventory and marketing efforts to attract and retain customers

Objectives

Accurately forecast sales to make informed decisions

Identify the key levers that can influence sales

Sales Forecasting and Optimization
Current State: Unable to estimate the sales of a particular gadget for the next six months
Gap / Key Question: How do we predict the number of units of iPhones that will sell in the next quarter?
Desired State: Developed a sales forecasting mechanism to estimate revenue for the next six months

Current State: Difficulty in allocating funds for marketing as we cannot identify factors driving the sales of a particular gadget
Gap / Key Question: What are the factors that affect the sales of iPhones?
Desired State: One unit increase in marketing spending will result in 20 units increase in iPhone sales

Visualizing Relationships

[Figure: three scatter plots]

What happens to Sales as Advertising Expenditure increases? Positive relationship

What happens to Sales as Product Price increases? Negative relationship

No relationship

Visualizing Relationships

(I) In both cases, we observe a positive relationship between sales and advertising expenditure

(II) In both cases, we observe a negative relationship between sales and product price

What is the difference?

Visualizing Relationships

The cases on the left - in both (I) and (II) - exhibit a stronger relationship (positive or negative) than the ones on the right

[Figure: panels (I) and (II); the left column is labelled Strong and the right column Weak]

Correlation

We have seen how to visually identify relationships between a pair of variables from two aspects - direction and strength

But we need a quantitative measure of the relationship

Correlation is a statistical measure that describes the strength and direction of a relationship between two variables.

Indicates the degree to which two variables tend to change together

Quantifies both the direction and strength of the relationship

Correlation

Correlation ranges between -1 and +1.

-1: Perfect negative correlation. One variable decreases as the other increases.

0: No correlation. The variables show no linear relationship (this alone does not guarantee that they are independent of each other).

+1: Perfect positive correlation. Both variables increase together.

Pearson’s Correlation Coefficient

One of the most commonly used measures of correlation.

A statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables.

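As a minimal sketch of how this coefficient can be computed, the function below uses the standard formula r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). The function name `pearson_r` and the advertising and sales figures are invented for illustration and do not come from the session's dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (co-movement of x and y).
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (spread of each variable on its own).
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, invented for illustration only.
advertising = [100, 150, 200, 250, 300]
sales = [260, 370, 480, 640, 730]

r = pearson_r(advertising, sales)
print(f"Pearson r = {r:.3f}")  # close to +1: strong positive linear relationship
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.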
Correlation vs. Causation

We observed that advertising expenditure exhibits a strong positive correlation with sales.

As advertising expenditure increased, sales increased.

Does it mean advertising expenditure causes an increase in sales?

Not necessarily true!

There might be other factors at play.

Correlation vs. Causation

[Figure: sales against advertising expenditure, with the points grouped into Economic Zone 1 and Economic Zone 2]

Let’s split the data with respect to another factor - economic zone.

Economic Zone 1 has a booming economy - sales will be higher here even if we don’t spend as much on marketing.

Economic Zone 2 has a stagnant economy - sales might have been higher due to data collected in a festive season.

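One way to see how a third factor can drive a correlation is to compare the pooled correlation with the correlation inside each group. The sketch below uses numbers invented for illustration (they are not the session's data): within each hypothetical zone, advertising and sales are uncorrelated, yet pooling the zones produces a strong positive correlation because one zone has both higher spending and higher sales.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: (advertising, sales) observations per economic zone.
zone_1_ads, zone_1_sales = [200, 220, 240, 260], [900, 880, 910, 890]
zone_2_ads, zone_2_sales = [100, 120, 140, 160], [400, 420, 390, 410]

r_zone_1 = pearson_r(zone_1_ads, zone_1_sales)
r_zone_2 = pearson_r(zone_2_ads, zone_2_sales)
# Pooling ignores the zone, so the gap between zones shows up as correlation.
r_pooled = pearson_r(zone_1_ads + zone_2_ads, zone_1_sales + zone_2_sales)

print(f"zone 1: {r_zone_1:.2f}, zone 2: {r_zone_2:.2f}, pooled: {r_pooled:.2f}")
```

In this made-up example the pooled correlation says more about the difference between zones than about any effect of advertising on sales.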
Correlation vs. Causation

Correlation ≠ Causation

Variable 1 and Variable 2 are highly correlated ≠ Variable 1 causes a change in Variable 2

The Need for Regression

We observed that advertising expenditure exhibits a strong positive correlation with sales.

Let’s say we now decide to spend $310 for the marketing campaign of the latest iPhone.

How much in sales should we expect?

We don’t know!

Historically, we’ve had different sales for similar marketing spending.

Correlation measures the strength and direction of the relationship, but doesn’t provide a way to predict the output given an input.

The Need for Regression

It is important for us to be able to determine the output (sales).

It is also important to identify the lever(s) that drive the output (sales).

Hence, the need for a mathematical model.

Input value ($310 marketing spend) -> Mathematical model -> Output value ($13040.50 sales)

Simple Linear Regression

The simplest mathematical model is linear - a straight line.

Linear Regression is a statistical model that estimates the linear relationship between a response and one or more explanatory variables.

Simple Linear Regression - one explanatory and one response variable.

Assumes that there is a linear relationship between the explanatory (independent) variable and the response (dependent) variable.

Here, the explanatory variable is advertising expenditure and the response variable is sales.

Simple Linear Regression
The equation of the line is represented by:

y = b0 + b1 x

where y is sales (the response) and x is advertising expenditure (the explanatory variable).

Coefficients of Simple Linear Regression

y-intercept (b0): The average value of the response when the explanatory variable is zero

Slope (b1): The average change in the response for a one-unit increase in the explanatory variable

Coefficient Interpretation

Consider the following model for our context:

sales = 1.01 + 2.45 * advertising expenditure

For a unit increase in advertising expenditure, the sales will increase by 2.45 units on average.

This interpretation is valid ONLY IF the assumptions of linear regression hold true.

Coefficient Interpretation

Consider the following model for our context:

sales = 1.01 + 2.45 * advertising expenditure

If we have zero marketing expenditure:

sales = 1.01 + 2.45 * 0 = 1.01

Makes business sense - we can have organic sales.

What if the business context changes?

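The two coefficient readings above can be checked with a few lines of code. This is only a sketch: the coefficients 1.01 and 2.45 are the ones from the example model, while the function name `predict_sales` and the spending values 100 and 101 are chosen here for illustration.

```python
B0 = 1.01  # y-intercept of the example model
B1 = 2.45  # slope of the example model


def predict_sales(advertising_expenditure):
    """Predicted sales from the example model: sales = b0 + b1 * expenditure."""
    return B0 + B1 * advertising_expenditure


# Intercept: predicted sales when nothing is spent on advertising.
organic_sales = predict_sales(0)

# Slope: change in predicted sales for a one-unit increase in expenditure.
change_per_unit = predict_sales(101) - predict_sales(100)

print(f"organic sales: {organic_sales}")
print(f"change per extra unit of advertising: {change_per_unit:.2f}")
```

The same one-unit difference is obtained at any starting level of expenditure, which is what a constant slope means.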
Coefficient Interpretation

Consider the case of predicting the price of a house using the following model:

house price = 291.07 + 105.45 * square footage

For a unit increase in square footage, the price of the house increases by 105.45 units.

This interpretation is valid ONLY IF the assumptions of linear regression hold true.

Coefficient Interpretation

Consider the case of predicting the price of a house using the following model:

house price = 291.07 + 105.45 * square footage

In the case of zero square footage:

house price = 291.07 + 105.45 * 0 = 291.07

Doesn’t make business sense! A house cannot have zero square footage, yet the model still assigns it a price.

The y-intercept doesn’t always make business sense.

Best-Fit Line

We observed one line that described the relationship between sales and advertising expenditure.

But we can draw multiple lines!

Which line do we choose?

Best-Fit Line

We first need to understand the difference between these lines.

We have actual data points (actual sales) and predicted data points (model’s predicted sales).

Prediction Error = Actual Value - Predicted Value

Best-Fit Line

There are multiple data points to consider.

Take the aggregate of the errors across the data points.

Best-Fit Line

The line with the least aggregate error across all data points is the one we want.

This is called the best-fit line.

Best-Fit Line Computation
How to find the error?

Error = Actual Value - Predicted Value

The difference between actual and predicted values can be positive or negative

Adding the raw differences lets positive and negative errors cancel out, which gives a false picture of low overall error

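The cancellation problem is easy to reproduce. In the sketch below, the three actual values and the flat prediction of 20 are invented for illustration: the raw errors sum to zero even though two of the three predictions are off by 10, while the absolute and squared versions do register the misses.

```python
# Hypothetical actual values and a deliberately poor model that always predicts 20.
actual = [10, 20, 30]
predicted = [20, 20, 20]

# Prediction error = actual value - predicted value, one per data point.
errors = [a - p for a, p in zip(actual, predicted)]

sum_raw = sum(errors)                        # positive and negative errors cancel
sum_absolute = sum(abs(e) for e in errors)   # sign removed by absolute value
sum_squared = sum(e ** 2 for e in errors)    # sign removed by squaring

print(errors, sum_raw, sum_absolute, sum_squared)
```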
Best-Fit Line Computation

Take absolute values of the errors before adding them

How to minimize the error?

Best-Fit Line Computation

Need to find the values of the coefficients (b0 and b1) that yield the minimum error

Use differentiation

Differentiate the error with respect to the coefficients (b0 and b1)

Differentiating absolute values is mathematically inconvenient - the absolute value function is not differentiable at zero

[Figure: two curves, one labelled Differentiable and one labelled Not differentiable]

Best-Fit Line Computation

Use squared values instead

Accommodates both positive and negative errors

Mathematically convenient - differentiable

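For reference, the differentiation step can be written out for the simple linear regression model y = b0 + b1 x. The notation below (n data points (x_i, y_i), means written with bars, SSE for the sum of squared errors) is introduced here for the derivation and is not taken from the slides.

```latex
\begin{aligned}
\mathrm{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\[4pt]
\frac{\partial\, \mathrm{SSE}}{\partial b_0} &= -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0 \\[4pt]
\frac{\partial\, \mathrm{SSE}}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0 \\[4pt]
\Rightarrow\quad b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```

Setting both partial derivatives to zero gives two linear equations in b0 and b1, which is why the squared error leads to a closed-form solution.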
Best-Fit Line Computation
Use squared values instead

Choosing the coefficients that minimize the sum of squared errors is known as the Method of Least Squares

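A minimal sketch of the Method of Least Squares for one explanatory variable is shown below. It uses the standard closed-form result b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and b0 = mean_y - b1 * mean_x. The function names and the five data points are invented for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) that minimize the sum of squared errors for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx            # slope
    b0 = mean_y - b1 * mean_x   # y-intercept
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """Aggregate squared prediction error of the line y = b0 + b1 * x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_linear_regression(x, y)
best_sse = sum_squared_errors(x, y, b0, b1)
# Any other line, such as one with a slightly different slope, has a larger error.
nudged_sse = sum_squared_errors(x, y, b0, b1 + 0.1)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {best_sse:.3f}, nudged SSE = {nudged_sse:.3f}")
```

Comparing `best_sse` with `nudged_sse` illustrates the "least aggregate error" idea behind the best-fit line.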
Multiple Linear Regression

We have checked the relationship between sales and advertising expenditure

What if there is another variable which can be used to predict the sales?

Input 1 ($310 advertising expenditure) and Input 2 (3.88% discount percentage) -> Mathematical model -> Output value ($13341.70 sales)

Multiple Linear Regression

Multiple Linear Regression - two or more explanatory and one response variable

Extension of Simple Linear Regression

Input 1 ($310 advertising expenditure) and Input 2 (3.88% discount percentage) -> Multiple Linear Regression -> Output value ($13341.70 sales)

Multiple Linear Regression

Multiple Linear Regression equation - two explanatory variables

y = b0 + b1x1 + b2x2

where y is sales, x1 is advertising expenditure, and x2 is discount percentage.

b0, b1, and b2 are the coefficients of Multiple Linear Regression

Multiple Linear Regression

y = b0 + b1x1 (one explanatory variable)

y = b0 + b1x1 + b2x2 (two explanatory variables)

For one explanatory variable, the equation was that of a line

For two explanatory variables, the equation will be that of a plane

Coefficient Interpretation

Consider the following model for our context

sales = 1.01 + 2.45 * advertising expenditure + 7.88 * discount percentage

For a unit increase in advertising expenditure, the sales will increase by 2.45 units, provided all other variables are held constant

For a unit increase in discount percentage, the sales will increase by 7.88 units, provided all other variables are held constant

These interpretations are valid ONLY IF the assumptions of linear regression hold true

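The "held constant" reading can be illustrated by changing one input at a time. The sketch below uses the coefficients from the example model (1.01, 2.45, 7.88); the helper `predict` and the input values 100 and 5 are chosen here for illustration.

```python
def predict(intercept, coefficients, features):
    """Linear model prediction: intercept + b1*x1 + b2*x2 + ..."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


INTERCEPT = 1.01
COEFFICIENTS = [2.45, 7.88]  # advertising expenditure, discount percentage

# Baseline: hypothetical inputs of 100 advertising units and a 5% discount.
base = predict(INTERCEPT, COEFFICIENTS, [100, 5])

# Raise advertising by one unit while the discount stays at 5.
more_advertising = predict(INTERCEPT, COEFFICIENTS, [101, 5])

# Raise the discount by one unit while advertising stays at 100.
more_discount = predict(INTERCEPT, COEFFICIENTS, [100, 6])

print(f"effect of +1 advertising: {more_advertising - base:.2f}")
print(f"effect of +1 discount:    {more_discount - base:.2f}")
```

Because `predict` accepts any number of coefficients, the same helper also covers models with three or more explanatory variables.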
Multiple Linear Regression

Multiple Linear Regression equation - more than two explanatory variables

y = b0 + b1x1 + b2x2 + b3x3

where y is sales, x1 is advertising expenditure, x2 is discount percentage, and x3 is product price.

b0, b1, b2, and b3 are the coefficients of Multiple Linear Regression

Categorical Variables in Regression

So far we’ve worked with numerical variables

But real-world data often contains categorical variables

Consider the following case

sales: y = b0 + b1x1 + b2x2 + b3x3 + b4x4

Explanatory variables: advertising expenditure (numerical), popularity (categorical), discount percentage (numerical), product price (numerical)

Categorical Variables in Regression
Categorical variables are not numbers - even if they might be represented by numbers

Can’t be utilized directly in a linear regression model

Need to be converted into a numerical format

Encoding

Label Encoding: Used when the categories have an inherent sense of order

One-hot Encoding: Used when the categories have no inherent sense of order

Label Encoding

Assigns a unique integer to each category

Order of the integers represents the order of the categories

Popularity -> Label Encoding -> Popularity (encoded)

Very Low -> 1
Low -> 2
Moderate -> 3
High -> 4
Very High -> 5

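A minimal sketch of label encoding for the Popularity example follows. The ordered list reproduces the mapping shown above; the names `POPULARITY_ORDER`, `LABELS`, and `label_encode` are chosen here for illustration.

```python
# Categories listed from lowest to highest, so position reflects their order.
POPULARITY_ORDER = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Map each category to its rank, starting at 1 as in the example above.
LABELS = {category: rank for rank, category in enumerate(POPULARITY_ORDER, start=1)}


def label_encode(values):
    """Replace each category with its integer label."""
    return [LABELS[value] for value in values]


encoded = label_encode(["Low", "Very High", "Moderate"])
print(encoded)
```

Writing the order out explicitly matters: sorting the category names alphabetically would not preserve the intended ranking.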
One-hot Encoding

A new column is created for each category

If the data point contains the category, corresponding column has value 1

If the data point doesn’t contain the category, corresponding column has value 0

Region   Region_North   Region_East   Region_West   Region_South
East     0              1             0             0
South    0              0             0             1
West     0              0             1             0
East     0              1             0             0
North    1              0             0             0

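The Region table above can be reproduced with a short sketch. The column order (North, East, West, South) follows the table; the names `CATEGORIES` and `one_hot_encode` are chosen here for illustration.

```python
# Column order taken from the Region example: North, East, West, South.
CATEGORIES = ["North", "East", "West", "South"]


def one_hot_encode(values, categories=CATEGORIES):
    """Return one row per value with a 1 in the matching category column, else 0."""
    return [[1 if value == category else 0 for category in categories]
            for value in values]


regions = ["East", "South", "West", "East", "North"]
encoded = one_hot_encode(regions)

for region, row in zip(regions, encoded):
    print(f"{region:<6} {row}")
```

When such columns feed a regression model that has an intercept, one of them is commonly dropped, because the full set always sums to 1 and is therefore perfectly collinear with the intercept.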
Evaluation Metrics

We have seen multiple models so far

We don’t know ‘how well’ these models are performing

Need to evaluate the models to gauge if they’re performing ‘well’

Model performance is measured using metrics

Quantify how well the model predictions align with the actual values

Evaluation Metrics

Mean Absolute Error (MAE)

An intuitive metric

Gives an idea of how much the model predictions deviate from the actual observations

Interpreted relative to the range of the response

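As a sketch, MAE is the average of the absolute differences between actual and predicted values. The function name and the four data points below are invented for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted sales, invented for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]

mae = mean_absolute_error(actual, predicted)
print(f"MAE = {mae}")  # in the same units as the response
```

An MAE of 15 is small if sales range over hundreds of units and large if they range over tens, which is why it is read against the range of the response.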
Evaluation Metrics

Mean Absolute Error (MAE)

The problem with taking the absolute value of the errors is that larger errors are not penalized more heavily

Penalizing larger errors is needed to ensure that the model learns to do better when encountering edge cases

Evaluation Metrics

Root Mean Squared Error (RMSE)

Squares the errors before averaging, so larger errors are penalized more heavily than with MAE

Needed to ensure that the model learns to do better when encountering edge cases

Interpreted relative to the range of the response

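The difference between the two metrics shows up when the same total error is spread evenly versus concentrated in one prediction. In the sketch below (function names and data invented for illustration), both sets of predictions have the same MAE, but RMSE is twice as large for the set containing one big miss.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical data, invented for illustration only.
actual = [100, 200, 300, 400]
even_errors = [115, 215, 315, 415]   # every prediction is off by 15
one_big_miss = [100, 200, 300, 460]  # three perfect predictions, one off by 60

for name, predicted in [("even errors", even_errors), ("one big miss", one_big_miss)]:
    mae = mean_absolute_error(actual, predicted)
    rmse = root_mean_squared_error(actual, predicted)
    print(f"{name}: MAE = {mae}, RMSE = {rmse}")
```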
Evaluation Metrics

MAE and RMSE are relative to the scale of the response

They cannot be used to compare models across different datasets or across responses measured on different scales

Evaluation Metrics

Mean Absolute Percentage Error (MAPE)

Independent of the range of the response

Needs to be adjusted when the actual value of the response is zero

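A minimal sketch of MAPE follows. Each error is divided by the actual value, so an actual value of zero would cause a division by zero; this sketch simply raises an error in that case, which is one possible choice and not necessarily the adjustment intended in the session. The function name and data are invented for illustration.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if any(a == 0 for a in actual):
        # Assumption of this sketch: refuse zero actuals instead of adjusting them.
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical data, invented for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]

mape = mean_absolute_percentage_error(actual, predicted)
print(f"MAPE = {mape:.2f}%")

# Rescaling the response (for example, units to thousands) leaves MAPE unchanged.
scaled = mean_absolute_percentage_error([a / 1000 for a in actual],
                                        [p / 1000 for p in predicted])
print(f"MAPE after rescaling = {scaled:.2f}%")
```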
Evaluation Metrics

The previous metrics (MAE, RMSE, MAPE) do not clearly quantify how well the model explains the variability in the data

Evaluation Metrics

R-squared
Generally ranges between 0 and 1
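
As a rough illustration, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses invented numbers and an illustrative helper name `r_squared`. It also shows why the range is only "generally" 0 to 1: predictions that do worse than always guessing the mean give a negative value.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = list(actual)               # predictions match exactly
mean_only = [3.0] * 5                # always predict the mean of the response
reversed_trend = [5.0, 4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

print(r_squared(actual, perfect))         # 1.0
print(r_squared(actual, mean_only))       # 0.0
print(r_squared(actual, reversed_trend))  # -3.0
```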

Evaluation Metrics

R-squared
Tends to increase when more explanatory variables are added
Does not account for how much value the added explanatory variables actually contribute
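
The effect of adding a variable can be sketched without any libraries. The code below fits ordinary least squares through the normal equations, first with one predictor and then with an extra column of unrelated numbers. The data and the helper names `fit_ols`, `predict`, and `r_squared` are all made up for this illustration; in practice a regression library would normally be used.

```python
def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(rows, coef):
    """Apply intercept plus one slope per column."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]

def r_squared(actual, predicted):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: y is roughly 2 * x1; `noise` is unrelated to y.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

rows_one = [[a] for a in x1]
rows_two = [[a, b] for a, b in zip(x1, noise)]

r2_one = r_squared(y, predict(rows_one, fit_ols(rows_one, y)))
r2_two = r_squared(y, predict(rows_two, fit_ols(rows_two, y)))
print(r2_one, r2_two)  # the second value is not smaller than the first
```

On the data used for fitting, the larger model can always reproduce the smaller one by setting the new coefficient to zero, which is why R-squared alone cannot show whether the extra column was worth adding.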

Evaluation Metrics

Adjusted R-squared
Accounts for the number of explanatory variables in the model

Evaluation Metrics

Adjusted R-squared
Gives a sense of which variables actually help in prediction and which ones do not
Provides a balance between model fit and complexity (number of explanatory variables)
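
The usual formula for adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory variables. The sketch below applies it to two hypothetical models; the R-squared values and the function name `adjusted_r_squared` are invented for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical R-squared values for two models fitted on the same 50 rows.
n = 50
small_model = adjusted_r_squared(0.800, n, n_predictors=3)
large_model = adjusted_r_squared(0.802, n, n_predictors=10)

print(round(small_model, 3))  # about 0.787
print(round(large_model, 3))  # about 0.751
```

The larger model has the slightly higher R-squared, yet its adjusted value is lower: seven extra variables bought almost no additional fit, which is the fit-versus-complexity balance described above.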

Summary
Here's a quick recap of what we've learned:

Business Problem and Solution Space: Identifies the specific problem that
linear regression aims to solve and defines the scope of its application in business
contexts.

Correlation and Linear Relationships: Explores how correlation measures the
strength and direction of linear relationships between variables, laying the
foundation for understanding linear regression.

Simple Linear Regression: Introduces the basic concept of simple linear
regression, which models the relationship between a dependent variable and one
independent variable using a straight line.

Summary (continued)

Multiple Linear Regression: Expands on the concept of simple linear regression
by incorporating multiple independent variables to predict a dependent variable,
accommodating more complex relationships.

Categorical Variables in Regression: Discusses strategies for encoding
categorical variables in regression models to include qualitative data effectively in
predictive analysis.

Evaluation Metrics for Regression: Covers key metrics such as Mean Squared
Error (MSE), R-squared, and others used to assess the accuracy and performance
of regression models in predicting outcomes.

Learning Outcomes
You should now be able to:

Explain how correlation measures the strength and direction of linear
relationships, and apply this understanding to build simple linear regression
models effectively.

Construct and interpret simple linear regression models to analyze and
predict relationships between two variables.

Develop multiple linear regression models to enable the prediction of business
outcomes using multiple input variables.

Learning Outcomes (continued)

Evaluate linear regression models using key metrics and implement strategies to
enhance model performance and accuracy.

Identify and apply linear regression techniques to solve various real-world
business problems, leveraging their predictive capabilities across different domains.

Happy Learning!