
Simple Linear Regression

The document discusses simple linear regression analysis, a statistical method used to estimate the relationship between a dependent variable and an independent variable. It covers the construction of regression models, the interpretation of regression coefficients, and the use of regression for description, control, and prediction. Additionally, it addresses the assumptions necessary for regression inference and the analysis of residuals to validate the model's appropriateness.

Applied Business Forecasting and Planning

Simple Linear Regression


Simple Regression
▪ Simple regression analysis is a statistical tool that gives us the ability to estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x).
▪ The dependent variable is the variable for which we want to make a prediction.
▪ While various non-linear forms may be used, simple linear regression models are the most common.
Introduction
• The primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior.
• Current information is usually in the form of a set of data.
• In a simple case, when the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (or predictor) variable X and a dependent (or response) variable Y.

Lot size   Man-hours
30         73
20         50
60         128
80         170
40         87
50         108
60         135
30         69
70         148
60         132
Introduction
▪ The goal of the analyst who studies the data is to find a functional relation \(y = f(x)\) between the response variable y and the predictor variable x.

[Figure: Statistical relation between Lot size and Man-Hour (scatter plot; x-axis: Lot size, y-axis: Man-Hour)]
Regression Function
▪ The statement that the relation between X and Y is statistical should be interpreted as providing the following guidelines:
1. Regard Y as a random variable.
2. For each X, take f(x) to be the expected value (i.e., mean value) of Y.
3. Given that E(Y) denotes the expected value of Y, call the equation
\[ E(Y) = f(x) \]
the regression function.
Pictorial Presentation of Linear Regression Model
Historical Origin of Regression
▪ Regression analysis was first developed by Sir Francis Galton, who studied the relation between the heights of sons and fathers.
▪ Heights of sons of both tall and short fathers appeared to “revert” or “regress” to the mean of the group.
Construction of Regression Models
▪ Selection of independent variables
  • Since reality must be reduced to manageable proportions whenever we construct models, only a limited number of independent or predictor variables can or should be included in a regression model. Therefore, a central problem is that of choosing the most important predictor variables.
▪ Functional form of regression relation
  • Sometimes, relevant theory may indicate the appropriate functional form. More frequently, however, the functional form is not known in advance and must be decided once the data have been collected and analyzed.
▪ Scope of model
  • In formulating a regression model, we usually need to restrict the coverage of the model to some interval or region of values of the independent variables.
Uses of Regression Analysis
▪ Regression analysis serves three major purposes:
1. Description
2. Control
3. Prediction
▪ The several purposes of regression analysis frequently overlap in practice.
Formal Statement of the Model
▪ General regression model
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
1. \(\beta_0\) and \(\beta_1\) are parameters.
2. X is a known constant.
3. The deviations \(\varepsilon\) are independent \(N(0, \sigma^2)\).
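A small simulation can make the three assumptions concrete. The sketch below is an illustration, not part of the original slides: it treats the X values as fixed known constants, adds independent \(N(0, \sigma^2)\) errors, and then refits the line by least squares. The parameter values (\(\beta_0 = 10\), \(\beta_1 = 2\), \(\sigma = 10\)), the X grid and the helper names `simulate` and `fit_line` are all illustrative assumptions.

```python
import random


def simulate(beta0, beta1, sigma, xs, seed=0):
    """Hypothetical helper: draw Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2), independent."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


def fit_line(xs, ys):
    """Hypothetical helper: least-squares estimates (b0, b1) in deviation form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# X values are fixed constants (20, 21, ..., 80 repeated); only the errors are random.
xs = [20 + (i % 61) for i in range(2000)]
ys = simulate(beta0=10.0, beta1=2.0, sigma=10.0, xs=xs, seed=42)
b0, b1 = fit_line(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}")  # close to, but not exactly, 10 and 2
```

Changing `seed` gives slightly different estimates each time; that sampling variability is what regression inference has to account for.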
Meaning of Regression Coefficients
▪ The values of the regression parameters \(\beta_0\) and \(\beta_1\) are not known. We estimate them from data.
▪ \(\beta_1\) indicates the change in the mean response per unit increase in X.
Regression Line
▪ If the scatter plot of our sample data suggests a linear relationship between two variables, i.e.,
\[ y = \beta_0 + \beta_1 x, \]
we can summarize the relationship by drawing a straight line on the plot.
▪ The least squares method gives us the “best” estimated line for our set of sample data.
Regression Line
▪ We will write an estimated regression line based on sample data as
\[ \hat{y} = b_0 + b_1 x \]
▪ The method of least squares chooses the values for \(b_0\) and \(b_1\) to minimize the sum of squared errors
\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \]
Regression Line
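▪ Using calculus, we obtain the estimating formulas:
\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \]
or
\[ b_1 = r \, \frac{S_y}{S_x} \]
where r is the sample correlation between x and y, and \(S_x\) and \(S_y\) are the sample standard deviations of x and y. The intercept is
\[ b_0 = \bar{y} - b_1 \bar{x} \]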
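The three expressions for \(b_1\) are algebraically equivalent. The sketch below, which assumes only the Python standard library (Python 3.8+ for `statistics.fmean`), computes the slope each way for the lot-size and man-hours data from the Introduction and then applies \(b_0 = \bar{y} - b_1 \bar{x}\). The function names are hypothetical.

```python
import math
import statistics

lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]          # x
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]   # y


def centered_sums(xs, ys):
    """Hypothetical helper: Sxy, Sxx, Syy (sums of deviation products and squares)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy


def slope_deviation_form(xs, ys):
    sxy, sxx, _ = centered_sums(xs, ys)
    return sxy / sxx


def slope_computational_form(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def slope_from_correlation(xs, ys):
    sxy, sxx, syy = centered_sums(xs, ys)
    r = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient
    return r * statistics.stdev(ys) / statistics.stdev(xs)


b1 = slope_deviation_form(lot_size, man_hours)
b0 = statistics.fmean(man_hours) - b1 * statistics.fmean(lot_size)
print(b1, slope_computational_form(lot_size, man_hours), slope_from_correlation(lot_size, man_hours))
print(f"y-hat = {b0:.1f} + {b1:.1f}x")  # y-hat = 10.0 + 2.0x
```

With other data the three results can differ in the last few floating-point digits; the deviation form is usually the numerically safer choice, because the computational form subtracts two large, nearly equal quantities.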
Estimation of Mean Response
▪ The fitted regression line can be used to estimate the mean value of y for a given value of x.
▪ Example: The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

Weekly sales (y)   Advertising expenditure (x)
1250               41
1380               54
1425               63
1425               54
1450               48
1300               46
1400               62
1510               61
1575               64
1650               71
Point Estimation of Mean Response
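▪ From the previous table we have:
\(n = 10\), \(\sum x = 564\), \(\sum x^2 = 32604\), \(\sum y = 14365\), \(\sum xy = 818755\)
▪ The least squares estimates of the regression coefficients are:
\[ b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = \frac{85690}{7944} \approx 10.8 \]
\[ b_0 = \bar{y} - b_1 \bar{x} = 1436.5 - 10.787(56.4) \approx 828 \]
(using the unrounded slope \(b_1 = 10.787\)).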


Point Estimation of Mean Response
▪ The estimated regression function is:
\[ \hat{y} = 828 + 10.8x \]
\[ \text{Sales} = 828 + 10.8 \times \text{Expenditure} \]
▪ This means that if the weekly advertising expenditure is increased by $1, we would expect the mean weekly sales to increase by $10.80.
Point Estimation of Mean Response
▪ Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.
▪ For example, if the advertising expenditure is $50, then the estimated sales are:
\[ \text{Sales} = 828 + 10.8(50) = 1368 \]
▪ This is called the point estimate (forecast) of the mean response (sales).
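Computing a point estimate is just substitution into the fitted line. The sketch below wraps that in a small function. The default coefficients are the rounded values from the slides; the optional range check (41 to 71, the smallest and largest expenditures in the sample) is an added assumption reflecting the earlier point about restricting a model's scope. The function name `mean_response` is hypothetical.

```python
import warnings


def mean_response(x, b0=828.0, b1=10.8, x_range=(41, 71)):
    """Hypothetical helper: point estimate of mean weekly sales at expenditure x.

    x_range is assumed to be the span of x observed in the sample;
    estimates outside it are extrapolations, so a warning is issued.
    """
    low, high = x_range
    if not low <= x <= high:
        warnings.warn(f"x = {x} is outside the observed range {x_range}; extrapolating")
    return b0 + b1 * x


print(mean_response(50))  # 1368.0
```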
Example: Retail sales and floor space
▪ It is customary in retail operations to assess the performance of stores partly in terms of their annual sales relative to their floor area (square feet). We might expect sales to increase linearly as stores get larger, with, of course, individual variation among stores of the same size. The regression model for a population of stores says that
\[ \text{SALES} = \beta_0 + \beta_1 \, \text{AREA} + \varepsilon \]
Example: Retail sales and floor space
▪ The slope \(\beta_1\) is, as usual, a rate of change: it is the expected increase in annual sales associated with each additional square foot of floor space.
▪ The intercept \(\beta_0\) is needed to describe the line but has no practical interpretation here, because no stores have area close to zero.
▪ Floor space does not completely determine sales. The term \(\varepsilon\) in the model accounts for differences among individual stores with the same floor space. A store’s location, for example, is important.
Residual
▪ A residual is the difference between the observed value \(y_i\) and the corresponding fitted value \(\hat{y}_i\):
\[ e_i = y_i - \hat{y}_i \]
▪ Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
Example: weekly advertising expenditure
y x y-hat Residual (e)
1250 41 1270.8 -20.8
1380 54 1411.2 -31.2
1425 63 1508.4 -83.4
1425 54 1411.2 13.8
1450 48 1346.4 103.6
1300 46 1324.8 -24.8
1400 62 1497.6 -97.6
1510 61 1486.8 23.2
1575 64 1519.2 55.8
1650 71 1594.8 55.2
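The residual column can be regenerated from the data. The sketch below uses the rounded coefficients \(b_0 = 828\) and \(b_1 = 10.8\), as the slides do. Because of that rounding the residuals sum to about -6.2 rather than exactly zero, which is what the unrounded least-squares coefficients would give.

```python
sales = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]   # y
advert = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]                      # x
b0, b1 = 828.0, 10.8   # rounded coefficients used on the slides

residuals = []
print(f"{'y':>6} {'x':>4} {'y-hat':>8} {'e':>8}")
for x, y in zip(advert, sales):
    y_hat = b0 + b1 * x   # fitted value
    e = y - y_hat         # residual = observed - fitted
    residuals.append(e)
    print(f"{y:>6} {x:>4} {y_hat:>8.1f} {e:>8.1f}")

print(f"sum of residuals: {sum(residuals):.1f}")  # sum of residuals: -6.2
```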
Estimation of the variance of the error terms, \(\sigma^2\)
▪ The variance \(\sigma^2\) of the error terms \(\varepsilon_i\) in the regression model needs to be estimated for a variety of purposes.
▪ It gives an indication of the variability of the probability distributions of y.
▪ It is needed for making inferences concerning the regression function and the prediction of y.
Regression Standard Error
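▪ To estimate \(\sigma\), we work with the variance and take the square root to obtain the standard deviation.
▪ For simple linear regression, the estimate of \(\sigma^2\) is the average squared residual, computed with divisor \(n - 2\) (the degrees of freedom left after estimating \(b_0\) and \(b_1\)) rather than n:
\[ s_{y.x}^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
▪ To estimate \(\sigma\), use
\[ s_{y.x} = \sqrt{s_{y.x}^2} \]
▪ \(s_{y.x}\) estimates the standard deviation \(\sigma\) of the error term \(\varepsilon\) in the statistical model for simple linear regression.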
Regression Standard Error
y x y-hat Residual (e) square(e)
1250 41 1270.8 -20.8 432.64
1380 54 1411.2 -31.2 973.44
1425 63 1508.4 -83.4 6955.56
1425 54 1411.2 13.8 190.44
1450 48 1346.4 103.6 10732.96
1300 46 1324.8 -24.8 615.04
1400 62 1497.6 -97.6 9525.76
1510 61 1486.8 23.2 538.24
1575 64 1519.2 55.8 3113.64
1650 71 1594.8 55.2 3047.04

Fitted line: \(\hat{y} = 828 + 10.8x\)
Total of squared residuals (SSE): 36124.76
\(s_{y.x} = \sqrt{36124.76 / 8} \approx 67.19818\)
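The same arithmetic in code: the sketch below squares the residuals from the fitted line \(\hat{y} = 828 + 10.8x\), sums them to get SSE, divides by \(n - 2\) and takes the square root. It is a minimal standard-library illustration, not a general regression routine.

```python
import math

sales = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]   # y
advert = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]                      # x
b0, b1 = 828.0, 10.8   # rounded coefficients from the slides

residuals = [y - (b0 + b1 * x) for x, y in zip(advert, sales)]
n = len(residuals)
sse = sum(e ** 2 for e in residuals)   # sum of squared residuals
s2_yx = sse / (n - 2)                  # estimate of sigma^2 with n - 2 degrees of freedom
s_yx = math.sqrt(s2_yx)                # regression standard error, estimate of sigma
print(round(sse, 2), round(s_yx, 5))   # 36124.76 67.19818
```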
Basic Assumptions of a Regression Model
▪ A regression model is based on the following assumptions:
1. There is a probability distribution of Y for each level of X.
2. Given that \(\mu_y\) is the mean value of Y, the standard form of the model is
\[ Y = f(x) + \varepsilon, \quad \text{with } \mu_y = f(x), \]
where \(\varepsilon\) is a random variable with a normal distribution with mean 0 and standard deviation \(\sigma\).
Conditions for Regression Inference
▪ You can fit a least-squares line to any set of explanatory-response data when both variables are quantitative.
▪ If the scatter plot doesn’t show an approximately linear pattern, the fitted line may be almost useless.
Conditions for Regression Inference
▪ The simple linear regression model, which is the basis for inference, imposes several conditions.
▪ We should verify these conditions before proceeding with inference.
▪ The conditions concern the population, but we can observe only our sample.
Conditions for Regression Inference
▪ In doing inference, we assume:
1. The sample is a simple random sample (SRS) from the population.
2. There is a linear relationship in the population.
  • We cannot observe the population, so we check the scatter plot of the sample data.
3. The standard deviation of the responses about the population line is the same for all values of the explanatory variable.
  • The spread of observations above and below the least-squares line should be roughly uniform as x varies.
Conditions for Regression Inference
▪ Plotting the residuals against the explanatory variable is helpful in checking these conditions because a residual plot magnifies patterns.
Analysis of Residuals
▪ To examine whether the regression model is appropriate for the data being analyzed, we can check residual plots.
▪ Useful residual plots include:
  • A histogram of the residuals.
  • Residuals plotted against the fitted values.
  • Residuals plotted against the independent variable.
  • Residuals plotted over time, if the data are chronological.
Analysis of Residuals
▪ A histogram of the residuals provides a check on the Normality assumption. A Normal quantile plot of the residuals can also be used to check the Normality assumption.
▪ Regression inference is robust against moderate lack of Normality. On the other hand, outliers and influential observations can invalidate the results of inference for regression.
▪ A plot of residuals against the fitted values or the independent variable can be used to check the assumption of constant variance and the aptness of the model.
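A Normal quantile plot pairs each ordered residual with the quantile that a standard Normal distribution would put in that position. The sketch below builds those pairs with `statistics.NormalDist` (Python 3.8+) for the advertising residuals. The plotting positions \((i - 0.5)/n\) are one common convention among several, assumed here for simplicity, and the helper name `normal_quantile_pairs` is hypothetical. Plotting the pairs with any plotting tool and looking for a roughly straight line is the visual check described above.

```python
from statistics import NormalDist

# Residuals from the advertising example (fitted line y-hat = 828 + 10.8x)
residuals = [-20.8, -31.2, -83.4, 13.8, 103.6, -24.8, -97.6, 23.2, 55.8, 55.2]


def normal_quantile_pairs(values):
    """Hypothetical helper: (standard Normal quantile, ordered value) pairs."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(values)
    # Plotting position (i - 0.5) / n for the i-th smallest value (assumed convention)
    return [(std_normal.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(ordered, start=1)]


for z, e in normal_quantile_pairs(residuals):
    print(f"{z:6.2f}  {e:7.1f}")
```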
Analysis of Residuals
▪ A plot of residuals against time provides a check on the assumption that the error terms are independent.
▪ The assumption of independence is the most critical one.
Residual plots
▪ The residuals should have no systematic pattern.
▪ The Degree Days residual plot shows a scatter of the points with no unusual individual observations and no systematic change as x increases.

[Figure: Degree Days Residual Plot (x-axis: Degree Days, y-axis: Residuals)]
Residual plots
▪ The points in this residual plot have a curved pattern, so a straight line fits poorly.
Residual plots
▪ The points in this plot show more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.
