Linear Regression
Evaluate linear regression models and identify the levers to improve their
performance.
How can we forecast sales based on historical sales data and marketing
expenditure?
How can we predict future power load requirements to ensure reliable grid
operation and prevent outages?
Retail: Price Optimization, Customer Lifetime Value Prediction
Banking and Financial Services: Interest Rate Modeling, Medical Insurance Premium Prediction
Energy: Load Forecasting, Renewable Energy Production Estimation
Healthcare: Drug Dosage Optimization, Patient Length of Stay Estimation
Crucial to stay ahead of market trends and consumer preferences to maximize sales
Need to effectively manage inventory and marketing efforts to attract and retain
customers
Objectives
Accurately forecast sales to make informed decisions
Identify the key levers that can influence sales
Problem: Unable to estimate the sales of a particular gadget for the next six months
Question: How do we predict the number of units of iPhones that will sell in the next quarter?
Solution: Developed a sales forecasting mechanism to estimate revenue for the next six months

Problem: Difficulty in allocating funds for marketing as we cannot identify factors driving the sales of a particular gadget
Question: What are the factors that affect the sales of iPhones?
Solution: A one-unit increase in marketing spending will result in a 20-unit increase in iPhone sales
[Figure: two scatter plots, (I) Advertising Expenditure vs. Sales and (II) Product Price vs. Sales, contrasting a strong and a weak relationship]
Correlation is a statistical measure that describes the strength and direction of a
relationship between two variables.
-1: Perfect negative correlation
0: No correlation
+1: Perfect positive correlation
A statistical measure that quantifies the strength and direction of the linear
relationship between two continuous variables.
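As a rough illustration of the definition above, the sketch below computes the Pearson correlation coefficient using only the Python standard library. The function name and the advertising and sales figures are made up for illustration and do not come from the slides.

```python
import math


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # How the two variables move together around their means.
    co_movement = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Spread of each variable around its own mean.
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_movement / math.sqrt(spread_x * spread_y)


# Hypothetical data, for illustration only.
advertising = [100, 150, 200, 250, 300]
sales = [1200, 1500, 2100, 2400, 3000]
r = pearson_correlation(advertising, sales)
print(round(r, 3))
```

A value close to +1 indicates a strong positive linear relationship. On its own it says nothing about whether one variable causes the other.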
Does it mean advertising expenditure causes an increase in sales?
Let’s split the data with respect to another factor - economic zone.
[Figure: the data for Economic Zone 1]
Correlation ≠ Causation
Variable 1 and Variable 2 are highly correlated ≠ Variable 1 causes a change in Variable 2
How much sales should we expect at an advertising expenditure of $310?
We don’t know!
Simple Linear Regression - one explanatory and one response
variable.
Response (y): sales
Explanatory variable (x): advertising expenditure
y = b0 + b1 x
y-intercept (b0): The average value of the response when the explanatory variable is zero
Slope (b1): The average change in the response for a one-unit increase in the explanatory variable
If we have zero marketing expenditure: y = b0 + b1 (0) = b0, so the expected sales equal the y-intercept.
Consider the case of predicting the price of a house using the following model:
price = b0 + 105.45 (square footage)
For a unit increase in square footage, the price of the house increases by 105.45 units on average.
In the case of zero square footage, the predicted price equals the y-intercept b0.
Which line do we choose?
The line with the least aggregate error across all data points is the
one we want.
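To make the idea of "least aggregate error" concrete, the sketch below scores a few candidate lines against the same data points and picks the one with the smallest total. It assumes the sum of squared errors as the aggregate. The data and the candidate coefficients are hypothetical.

```python
def sum_squared_error(b0, b1, xs, ys):
    """Aggregate error of the line y = b0 + b1 * x over all data points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points, for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

# Candidate lines as (b0, b1) pairs; the names are placeholders.
candidates = {
    "line A": (1.0, 2.0),
    "line B": (0.0, 2.5),
    "line C": (2.0, 1.5),
}

errors = {
    name: sum_squared_error(b0, b1, xs, ys)
    for name, (b0, b1) in candidates.items()
}
best = min(errors, key=errors.get)
print(errors, best)
```

Trying candidates by hand does not scale, which is why the coefficients are found analytically instead.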
[Figure: the error for a data point is the gap between its Actual Value and its Predicted Value on the line]
Use differentiation
Differentiate the error with respect to the coefficients (b0 and b1)
Sum of absolute errors: Not differentiable
Sum of squared errors: Differentiable; accommodates both positive and negative errors
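Setting the derivatives of the sum of squared errors with respect to b0 and b1 to zero gives the standard closed-form result: b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x), where Sxy is the sum of products of deviations from the means and Sxx is the sum of squared deviations of x. The sketch below is one minimal way to code that result. The function name and the data are illustrative assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: spread of x around its mean; Sxy: joint movement of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(round(b0, 2), round(b1, 2))
```

The fitted line always passes through the point (mean of x, mean of y), which follows directly from the expression for b0.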
What if there is another variable which can be used to predict the sales?
Input 1: $310 advertising expenditure
Input 2: 3.88% discount percentage
Mathematical model: Multiple Linear Regression
Output value: $13341.70 sales
Response (y): sales
Explanatory variables: advertising expenditure (x1), discount percentage (x2)
y = b0 + b1 x1 + b2 x2
Simple Linear Regression: y = b0 + b1 x1
Multiple Linear Regression: y = b0 + b1 x1 + b2 x2
For a unit increase in advertising expenditure, the sales will increase by 2.45
units, provided all other variables are held constant
For a unit increase in discount percentage, the sales will increase by 7.88 units,
provided all other variables are held constant
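The sketch below shows one way to fit y = b0 + b1 x1 + b2 x2 by least squares, solving the normal equations with a small Gaussian elimination routine so that it needs no third-party library. The data are synthetic: they are generated from the slopes quoted above (2.45 and 7.88) plus a hypothetical intercept of 50, so the fit should recover those values. All function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic rows of (advertising expenditure, discount percentage).
rows = [(100, 2.0), (150, 5.0), (200, 3.0), (250, 6.0), (300, 4.0)]
# Slopes taken from the text; the intercept 50 is a hypothetical value.
ys = [50 + 2.45 * x1 + 7.88 * x2 for x1, x2 in rows]

b0, b1, b2 = fit_multiple_linear_regression(rows, ys)
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

Each slope is read with the other variable held constant: b1 is the change in sales per unit of advertising expenditure at a fixed discount percentage, and b2 is the reverse.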
discount percentage: numerical
product price: numerical
This file is meant for personal use by balajiagasthya@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Categorical Variables in Regression
Categorical variables are not numbers - even if they might be represented by numbers
Encoding
Label Encoding: Used when the categories have an inherent sense of order
One-hot Encoding: Used when the categories have no inherent sense of order
Label Encoding
Popularity    Popularity (encoded)
Very Low      1
Low           2
Moderate      3
High          4
Very High     5
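The mapping in the table above can be sketched with a plain dictionary. The constant and function names are illustrative, and the ordering is the one shown in the table.

```python
# Categories listed from lowest to highest, as in the table above.
POPULARITY_ORDER = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Assign 1..5 so that a larger number means higher popularity.
POPULARITY_TO_LABEL = {
    category: rank for rank, category in enumerate(POPULARITY_ORDER, start=1)
}


def label_encode(values):
    """Replace each ordered category with its integer rank."""
    return [POPULARITY_TO_LABEL[value] for value in values]


encoded = label_encode(["Low", "Very High", "Moderate"])
print(encoded)
```

Because the integers carry an order, this encoding is only appropriate when the categories themselves are ordered.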
If the data point contains the category, corresponding column has value 1
If the data point doesn’t contain the category, corresponding column has value 0
One-hot Encoding
Region    Region_North    Region_East    Region_West    Region_South
East      0               1              0              0
South     0               0              0              1
West      0               0              1              0
East      0               1              0              0
North     1               0              0              0
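A minimal sketch of one-hot encoding for the Region example, written in plain Python. The function name is an assumption. The column names follow the Region_North, Region_East, Region_West, Region_South pattern from the example.

```python
REGIONS = ["North", "East", "West", "South"]


def one_hot_encode(values, categories=REGIONS):
    """Return one dict per value, with a 1 in the matching Region_<name> column."""
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        # 1 if the data point contains the category, 0 otherwise.
        rows.append({f"Region_{c}": int(value == c) for c in categories})
    return rows


encoded = one_hot_encode(["East", "South", "West", "East", "North"])
for row in encoded:
    print(row)
```

Every row contains exactly one 1, so no artificial ordering is imposed on the regions.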
Evaluation Metrics
Quantify how well the model predictions align with the actual values
Mean Absolute Error (MAE)
An intuitive metric
Gives an idea of how much the model predictions deviate from the actual observations
Problem with considering the absolute value of
Root Mean Squared Error (RMSE)
MAE and RMSE are relative to the scale of the response
Cannot compare models across different data and scale of response value
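The sketch below computes MAE and RMSE from their standard definitions and then rescales the same data by 1000 to show why both metrics depend on the scale of the response. The actual and predicted values are made up for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical values, for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 400]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)

# The same predictions expressed in units 1000 times smaller.
scaled_mae = mean_absolute_error(
    [a * 1000 for a in actual], [p * 1000 for p in predicted]
)
print(mae, round(rmse, 3), scaled_mae)
```

The model is equally good in both cases, yet the rescaled MAE is 1000 times larger, so these numbers cannot be compared across responses measured on different scales.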
Mean Absolute Percentage Error (MAPE)
Previous metrics do not clearly quantify how well the
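A minimal sketch of MAPE under its usual definition: the average of the absolute errors expressed as a percentage of the actual values. The data are hypothetical, and the function rejects actual values of zero because the percentage is undefined there.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual values."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical values, for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 400]

mape = mean_absolute_percentage_error(actual, predicted)

# Rescaling both series leaves the percentage unchanged.
scaled_mape = mean_absolute_percentage_error(
    [a * 1000 for a in actual], [p * 1000 for p in predicted]
)
print(round(mape, 2), round(scaled_mape, 2))
```

Because it is a percentage, MAPE does not change when the response is rescaled, unlike MAE and RMSE.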
R Squared
Generally ranges between 0 and 1
Tends to increase when adding more explanatory variables
Does not account for the value addition from the added explanatory variables
Adjusted R Squared
Gives a sense of which variables actually help in
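The sketch below uses the standard definitions: R squared = 1 - (sum of squared residuals) / (total sum of squares), and adjusted R squared = 1 - (1 - R squared)(n - 1) / (n - p - 1), where n is the number of observations and p the number of explanatory variables. The data and function names are illustrative assumptions.

```python
def r_squared(actual, predicted):
    """Share of the variation in the response explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when the response is constant")
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_observations, n_explanatory):
    """Penalize R squared for the number of explanatory variables used."""
    dof = n_observations - n_explanatory - 1
    if dof <= 0:
        raise ValueError("need more observations than explanatory variables + 1")
    return 1 - (1 - r2) * (n_observations - 1) / dof


# Hypothetical values, for illustration only.
actual = [3, 5, 7, 9, 11]
predicted = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(actual, predicted)
adj_one_variable = adjusted_r_squared(r2, n_observations=5, n_explanatory=1)
adj_two_variables = adjusted_r_squared(r2, n_observations=5, n_explanatory=2)
print(round(r2, 4), round(adj_one_variable, 4), round(adj_two_variables, 4))
```

For the same R squared, the adjusted value drops as the number of explanatory variables grows, so an added variable has to improve the fit enough to offset the penalty.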
Business Problem and Solution Space: Identifies the specific problem that
linear regression aims to solve and defines the scope of its application in business
contexts.
Evaluation Metrics for Regression: Covers key metrics such as Mean Squared
Error (MSE), R-squared, and others used to assess the accuracy and performance
of regression models in predicting outcomes.
Evaluate linear regression models using key metrics and implement strategies to
enhance model performance and accuracy.