Applied Business Forecasting
and Planning
Simple Linear Regression
Simple Regression
Simple regression analysis is a statistical tool that
gives us the ability to estimate the mathematical
relationship between a dependent variable (usually
called y) and an independent variable (usually
called x).
The dependent variable is the variable for which
we want to make a prediction.
While various non-linear forms may be used,
simple linear regression models are the most
common.
Introduction
• The primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior.
• Current information is usually in the form of a set of data.
• In a simple case, when the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (or predictor) variable X and a dependent (or response) variable Y.

Lot size   Man-hours
30         73
20         50
60         128
80         170
40         87
50         108
60         135
30         69
70         148
60         132
Introduction
The goal of the analyst who studies the data is to find a functional relation $y = f(x)$ between the response variable y and the predictor variable x.
[Figure: Statistical relation between Lot size and Man-Hour (x-axis: Lot size; y-axis: Man-Hour)]
Regression Function
The statement that the relation
between X and Y is statistical
should be interpreted as providing
the following guidelines:
1. Regard Y as a random variable.
2. For each X, take $f(x)$ to be the expected value (i.e., mean value) of Y.
3. Given that $E(Y)$ denotes the expected value of Y, call the equation
   $E(Y) = f(x)$
   the regression function.
Pictorial Presentation of Linear Regression Model
Historical Origin of Regression
Regression Analysis was
first developed by Sir
Francis Galton, who
studied the relation
between heights of sons
and fathers.
Heights of sons of both
tall and short fathers
appeared to “revert” or
“regress” to the mean of
the group.
Construction of Regression Models
Selection of independent variables
• Since reality must be reduced to manageable proportions whenever we
construct models, only a limited number of independent or predictor
variables can or should be included in a regression model. Therefore, a
central problem is that of choosing the most important predictor variables.
Functional form of regression relation
• Sometimes, relevant theory may indicate the appropriate functional form.
More frequently, however, the functional form is not known in advance
and must be decided once the data have been collected and analyzed.
Scope of model
• In formulating a regression model, we usually need to restrict the coverage of the model to some interval or region of values of the independent variables.
Uses of Regression Analysis
Regression analysis serves three major purposes:
1. Description
2. Control
3. Prediction
The several purposes of regression analysis frequently overlap in practice.
Formal Statement of the Model
General regression model
$Y = \beta_0 + \beta_1 X + \varepsilon$
1. $\beta_0$ and $\beta_1$ are parameters.
2. X is a known constant.
3. The deviations $\varepsilon$ are independent $N(0, \sigma^2)$.
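To make the three assumptions concrete, the sketch below generates responses from the model: each X is a fixed, known value and each deviation is an independent Normal draw with mean 0 and standard deviation σ. It is only an illustration; the function name `simulate_simple_regression` and the parameter values are hypothetical choices, not values from the notes.

```python
import random


def simulate_simple_regression(xs, beta0, beta1, sigma, seed=None):
    """Draw one response for each x from Y = beta0 + beta1*x + eps.

    The x values are treated as known constants, and each error eps is an
    independent draw from a Normal distribution with mean 0 and sd sigma.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, chosen only for illustration.
xs = [20, 30, 40, 50, 60, 70, 80]
ys = simulate_simple_regression(xs, beta0=10.0, beta1=2.0, sigma=5.0, seed=1)
for x, y in zip(xs, ys):
    print(x, round(y, 1))
```

Setting `sigma=0.0` removes the random deviations, so every simulated point falls exactly on the line $\beta_0 + \beta_1 x$.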
Meaning of Regression Coefficients
The values of the regression parameters $\beta_0$ and $\beta_1$ are not known. We estimate them from data.
$\beta_1$ indicates the change in the mean response per unit increase in X.
Regression Line
If the scatter plot of our sample data suggests a linear relationship between two variables, i.e.,
$y = \beta_0 + \beta_1 x$,
we can summarize the relationship by drawing a straight line on the plot.
The least squares method gives us the “best” estimated line for our set of sample data.
Regression Line
We will write an estimated regression line
based on sample data as
$\hat{y} = b_0 + b_1 x$
The method of least squares chooses the values for $b_0$ and $b_1$ to minimize the sum of squared errors
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
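As a small illustration of the quantity being minimized, the sketch below evaluates SSE for a few candidate lines on hypothetical toy data that lie exactly on $y = 1 + 2x$. The helper name `sse` is an assumption for this example; least squares picks the $(b_0, b_1)$ that makes this value as small as possible.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors of the candidate line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

print(sse(xs, ys, 1, 2))  # 0: the line passes through every point
print(sse(xs, ys, 0, 2))  # 4: every point misses by 1, and 1**2 summed 4 times is 4
print(sse(xs, ys, 1, 3))  # 14: residuals 0, -1, -2, -3 give 0 + 1 + 4 + 9
```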
Regression Line
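Using calculus, we obtain the estimating formulas:
$b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$
or
$b_1 = r \dfrac{S_y}{S_x}$
where r is the sample correlation coefficient between x and y, and $S_x$ and $S_y$ are the sample standard deviations of x and y.
$b_0 = \bar{y} - b_1 \bar{x}$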
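The three slope formulas above are algebraically equivalent. The sketch below implements each one and applies them to the lot size and man-hours data listed in the Introduction; all three give the same slope. The function names are hypothetical, chosen for this illustration.

```python
import math


def least_squares(xs, ys):
    """Return (b0, b1) using the deviation form of the slope formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


def slope_from_raw_sums(xs, ys):
    """Equivalent 'computational' form that uses only raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def slope_from_correlation(xs, ys):
    """Equivalent form b1 = r * (S_y / S_x) with sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient
    s_x = math.sqrt(sxx / (n - 1))  # sample standard deviation of x
    s_y = math.sqrt(syy / (n - 1))  # sample standard deviation of y
    return r * s_y / s_x


# Lot size (x) and man-hours (y) from the Introduction.
lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

print(least_squares(lot_size, man_hours))        # (10.0, 2.0)
print(slope_from_raw_sums(lot_size, man_hours))  # 2.0
print(round(slope_from_correlation(lot_size, man_hours), 10))  # 2.0
```

For these data the fitted line is $\hat{y} = 10 + 2x$: each additional unit of lot size adds about 2 man-hours on average.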
Estimation of Mean Response
The fitted regression line can be used to estimate the
mean value of y for a given value of x.
Example
The weekly advertising expenditure (x) and weekly
sales (y) are presented in the following table.
y x
1250 41
1380 54
1425 63
1425 54
1450 48
1300 46
1400 62
1510 61
1575 64
1650 71
Point Estimation of Mean Response
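From the previous table we have:
$n = 10, \quad \sum x = 564, \quad \sum x^2 = 32604, \quad \sum y = 14365, \quad \sum xy = 818755$
The least squares estimates of the regression coefficients are:
$b_1 = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \dfrac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = \dfrac{85690}{7944} \approx 10.8$
$b_0 = \bar{y} - b_1 \bar{x} = 1436.5 - (10.787)(56.4) \approx 828$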
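The hand calculation above can be checked with a few lines of code. This sketch recomputes the five sums from the example table and then applies the computational slope formula; note that $b_0$ is computed from the unrounded slope, which is why it comes out near 828.

```python
# Weekly sales (y) and weekly advertising expenditure (x) from the example table.
ys = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
xs = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Computational form of the least squares slope, then b0 = y_bar - b1 * x_bar.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(n, sum_x, sum_x2, sum_y, sum_xy)  # 10 564 32604 14365 818755
print(round(b1, 3), round(b0, 1))       # 10.787 828.1
```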
Point Estimation of Mean Response
The estimated regression function is:
$\hat{y} = 828 + 10.8x$
Sales = 828 + 10.8 × Expenditure
This means that if the weekly advertising
expenditure is increased by $1 we would expect
the weekly sales to increase by $10.8.
Point Estimation of Mean Response
Fitted values for the sample data are
obtained by substituting the x value into the
estimated regression function.
For example, if the advertising expenditure
is $50, then the estimated sales are:
Sales = 828 + 10.8(50) = 1368
This is called the point estimate (forecast)
of the mean response (sales).
Example: Retail sales and floor space
It is customary in retail operations to assess the performance of stores partly in terms of their annual sales relative to their floor area (square feet). We might expect sales to increase linearly as stores get larger, with, of course, individual variation among stores of the same size. The regression model for a population of stores says that
$\text{SALES} = \beta_0 + \beta_1 \, \text{AREA} + \varepsilon$
Example: Retail sales and floor space
The slope $\beta_1$ is, as usual, a rate of change: it is the expected increase in annual sales associated with each additional square foot of floor space.
The intercept $\beta_0$ is needed to describe the line but has no statistical importance because no stores have area close to zero.
Floor space does not completely determine sales. The $\varepsilon$ term in the model accounts for differences among individual stores with the same floor space. A store’s location, for example, is important.
Residual
A residual is the difference between the observed value $y_i$ and the corresponding fitted value $\hat{y}_i$:
$e_i = y_i - \hat{y}_i$
Residuals are highly useful for studying
whether a given regression model is
appropriate for the data at hand.
Example: weekly advertising expenditure
y x y-hat Residual (e)
1250 41 1270.8 -20.8
1380 54 1411.2 -31.2
1425 63 1508.4 -83.4
1425 54 1411.2 13.8
1450 48 1346.4 103.6
1300 46 1324.8 -24.8
1400 62 1497.6 -97.6
1510 61 1486.8 23.2
1575 64 1519.2 55.8
1650 71 1594.8 55.2
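The residual table can be reproduced directly from the definition $e_i = y_i - \hat{y}_i$. This sketch uses the rounded coefficients $b_0 = 828$ and $b_1 = 10.8$, as the table does.

```python
# Data from the weekly advertising example.
ys = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
xs = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
b0, b1 = 828, 10.8  # rounded coefficients of the fitted line

fitted = [b0 + b1 * x for x in xs]                     # y-hat_i
residuals = [y - yhat for y, yhat in zip(ys, fitted)]  # e_i = y_i - y-hat_i

print(f"{'y':>6} {'x':>4} {'y-hat':>8} {'e':>8}")
for y, x, yhat, e in zip(ys, xs, fitted, residuals):
    print(f"{y:>6} {x:>4} {yhat:>8.1f} {e:>8.1f}")
```

Because the coefficients are rounded, these residuals sum to about -6.2 rather than exactly 0, which is what the unrounded least squares line would give.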
Estimation of the variance of the error terms, $\sigma^2$
The variance $\sigma^2$ of the error terms $\varepsilon_i$ in the regression model needs to be estimated for a variety of purposes.
It gives an indication of the variability of the probability distributions of y.
It is needed for making inferences concerning the regression function and the prediction of y.
Regression Standard Error
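To estimate $\sigma$, we work with the variance and take the square root to obtain the standard deviation.
For simple linear regression, the estimate of $\sigma^2$ is the average squared residual, computed with $n - 2$ rather than $n$ in the denominator because two degrees of freedom are used up in estimating $b_0$ and $b_1$:
$s_{y.x}^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} e_i^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
To estimate $\sigma$, use
$s_{y.x} = \sqrt{s_{y.x}^2}$
$s_{y.x}$ estimates the standard deviation $\sigma$ of the error term $\varepsilon$ in the statistical model for simple linear regression.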
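A minimal sketch of the formula for $s_{y.x}$, applied to the residuals from the weekly advertising example (fitted line $\hat{y} = 828 + 10.8x$). The function name `regression_standard_error` is a hypothetical label for this illustration.

```python
import math


def regression_standard_error(residuals):
    """Return s_{y.x} = sqrt(SSE / (n - 2)) for simple linear regression."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than two observations")
    sse = sum(e * e for e in residuals)  # sum of squared residuals
    return math.sqrt(sse / (n - 2))      # n - 2 degrees of freedom


# Residuals from the weekly advertising example (fitted line y-hat = 828 + 10.8x).
residuals = [-20.8, -31.2, -83.4, 13.8, 103.6, -24.8, -97.6, 23.2, 55.8, 55.2]

print(round(sum(e * e for e in residuals), 2))         # 36124.76
print(round(regression_standard_error(residuals), 5))  # 67.19818
```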
Regression Standard Error
y x y-hat Residual (e) square(e)
1250 41 1270.8 -20.8 432.64
1380 54 1411.2 -31.2 973.44
1425 63 1508.4 -83.4 6955.56
1425 54 1411.2 13.8 190.44
1450 48 1346.4 103.6 10732.96
1300 46 1324.8 -24.8 615.04
1400 62 1497.6 -97.6 9525.76
1510 61 1486.8 23.2 538.24
1575 64 1519.2 55.8 3113.64
1650 71 1594.8 55.2 3047.04
Fitted line: $\hat{y} = 828 + 10.8x$; total of squared residuals (SSE) = 36124.76
$s_{y.x} = \sqrt{36124.76 / 8} \approx 67.19818$
Basic Assumptions of a Regression Model
A regression model is based on the following
assumptions:
1. There is a probability distribution of Y for each
level of X.
2. Given that $\mu_y$ is the mean value of Y, the standard form of the model is
   $y = f(x) + \varepsilon$
   where $\varepsilon$ is a random variable with a normal distribution with mean 0 and standard deviation $\sigma$.
Conditions for Regression Inference
You can fit a least-squares line to any set of
explanatory-response data when both variables are
quantitative.
If the scatter plot doesn’t show an approximately
linear pattern, the fitted line may be almost
useless.
Conditions for Regression Inference
The simple linear regression model, which
is the basis for inference, imposes several
conditions.
We should verify these conditions before
proceeding with inference.
The conditions concern the population, but
we can observe only our sample.
Conditions for Regression Inference
In doing inference, we assume:
1. The sample is an SRS (simple random sample) from the population.
2. There is a linear relationship in the population.
   • We cannot observe the population, so we check the scatter plot of the sample data.
3. The standard deviation of the responses about the population line is the same for all values of the explanatory variable.
   • The spread of observations above and below the least-squares line should be roughly uniform as x varies.
Conditions for Regression Inference
Plotting the residuals against the
explanatory variable is helpful in checking
these conditions because a residual plot
magnifies patterns.
Analysis of Residual
To examine whether the regression model is
appropriate for the data being analyzed, we can
check the residual plots.
Useful residual plots:
• A histogram of the residuals.
• Residuals against the fitted values.
• Residuals against the independent variable.
• Residuals over time, if the data are chronological.
Analysis of Residual
A histogram of the residuals provides a check on the normality assumption. A Normal quantile plot of the residuals can also be used to check the Normality assumption.
Regression inference is robust against moderate lack of Normality. On the other hand, outliers and influential observations can invalidate the results of inference for regression.
A plot of residuals against the fitted values or against the independent variable can be used to check the assumption of constant variance and the aptness of the model.
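The points behind a Normal quantile plot can be computed without any plotting library. This sketch pairs each sorted residual from the advertising example with a standard Normal quantile using Python's `statistics.NormalDist`; the plotting position $(i - 0.5)/n$ is one common convention and is an assumption here, not something the notes specify. If the residuals are roughly Normal, plotting these pairs gives points close to a straight line.

```python
from statistics import NormalDist


def normal_quantile_pairs(residuals):
    """Pair each sorted residual with a standard Normal quantile.

    Plotting position (i - 0.5) / n is one common convention, assumed here.
    If the residuals are roughly Normal, the pairs lie close to a straight line.
    """
    n = len(residuals)
    z = NormalDist()  # standard Normal: mean 0, standard deviation 1
    ordered = sorted(residuals)
    return [(z.inv_cdf((i - 0.5) / n), e) for i, e in enumerate(ordered, start=1)]


# Residuals from the weekly advertising example.
residuals = [-20.8, -31.2, -83.4, 13.8, 103.6, -24.8, -97.6, 23.2, 55.8, 55.2]

for q, e in normal_quantile_pairs(residuals):
    print(f"{q:6.2f} {e:7.1f}")
```

With only ten residuals, such a plot gives a rough check at best, which fits the point above that inference is robust to moderate departures from Normality.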
Analysis of Residual
A plot of residuals against time provides a check on the assumption that the error terms are independent.
The assumption of independence is the most critical one.
Residual plots
The residuals should have no systematic pattern.
The Degree Days residual plot shows a scatter of the points with no unusual individual observations or systematic change as x increases.
[Figure: Degree Days Residual Plot (x-axis: Degree Days; y-axis: Residuals)]
Residual plots
The points in this residual plot have a curved pattern, so a straight line fits poorly.
Residual plots
The points in this plot
show more spread for
larger values of the
explanatory variable x,
so prediction will be
less accurate when x is
large.