10 11 Simple Linear Regression
WEEK10&11-2
Introduction:
Correlation vs. Regression
❑ Correlation analysis is used to measure the strength of the
linear association between any two variables
▪ Correlation can only show the strength of the linear
association between any two variables
▪ No causal effect is implied by correlation analysis, i.e. it
does not tell us whether X affects Y or Y affects X
▪ Interpretation of the correlation coefficient is purely statistical
▪ Correlation was first presented in Chapter 1, then in Chapter 8
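A minimal Python sketch of why a correlation coefficient carries no direction: swapping the two variables gives the same value. The income and consumption figures and the helper name `pearson_r` are invented for illustration and do not come from the course data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
income = [1500, 1800, 2100, 2400, 3000]
consumption = [900, 1000, 1250, 1300, 1700]

r_xy = pearson_r(income, consumption)
r_yx = pearson_r(consumption, income)

# Both orderings give the same coefficient, so correlation alone
# cannot say which variable explains the other.
print(round(r_xy, 4), round(r_yx, 4))
```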
Dependent variable, Y:
▪ the variable we wish to explain
▪ E.g. Consumption
Independent variable, X:
▪ the variable used to explain the dependent variable
▪ E.g. Income
Simple Linear Regression
Model
❑ The simplest form of regression model
❑ Only ONE (1) independent variable, X
❑ The relationship between X and Y is described by a linear
function
❑ Changes in Y are assumed to be caused by changes in
X
❑ We always assume that X (the independent variable)
affects Y (the dependent variable) and not the other
way around, i.e. we do not assume that Y causes X
Simple Linear Regression
Model
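The population regression model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where:
▪ $Y_i$ = dependent variable
▪ $\beta_0$ = population Y-intercept
▪ $\beta_1$ = population slope coefficient
▪ $X_i$ = independent variable
▪ $\varepsilon_i$ = random error term
▪ $\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component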
Simple Linear Regression Model:
Graphical Illustration
Estimated Simple Linear
Regression Model
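The simple linear regression equation provides an estimate
of the population regression line:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
where $\hat{Y}_i$ is the estimated (predicted) value of Y for observation i, $\hat{\beta}_0$ is the estimated intercept, $\hat{\beta}_1$ is the estimated slope, and $X_i$ is the value of X for observation i.
Note that the individual random error terms $\varepsilon_i$ do not appear in the
estimated equation because they have a mean of zero, $E(\varepsilon_i) = 0$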
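The slides do not state how $\hat{\beta}_0$ and $\hat{\beta}_1$ are computed. Assuming the usual ordinary least squares estimates, $\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, a small Python sketch looks like this. The function name `fit_simple_ols` and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (b0_hat, b1_hat) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1_hat = sxy / sxx                  # estimated slope
    b0_hat = mean_y - b1_hat * mean_x   # estimated intercept
    return b0_hat, b1_hat


# Hypothetical data lying exactly on y = 1 + 2x (not the course data set).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b0_hat, b1_hat = fit_simple_ols(x, y)
print(b0_hat, b1_hat)  # 1.0 2.0
```

Because the made-up points sit exactly on a line, the estimates recover the intercept 1 and the slope 2; with real data the points scatter around the fitted line.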
Estimated Simple Linear
Regression Model:
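Interpretation of $\hat{\beta}_0$ and $\hat{\beta}_1$
▪ $\hat{\beta}_0$ is the estimated average value of Y when the
value of X is zero.
▪ $\hat{\beta}_1$ is the estimated change in the average value of Y
as a result of a one-unit change in X.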
Estimated Simple Linear
Regression Model (continued)
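[Figure: the line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ drawn on X and Y axes, marking the observed value of Y for $X_i$, the error $\varepsilon_i$ between the observed value and the line, the slope ($\beta_1$) and the intercept ($\beta_0$).]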
Types of Relationships
between X and Y
[Figure: four scatter plots of Y against X. Linear relationships (✓): positive (+ve) and negative (-ve). Curvilinear relationships: positive (+ve) and negative (-ve).]
Types of Relationships
between X and Y
[Figure: scatter plots of Y against X comparing strong relationships with weak relationships.]
Types of Relationships
between X and Y
No relationship
Residual Analysis
▪ Residual analysis helps you determine whether the selected
regression model is appropriate by assessing the model
assumptions visually.
▪ The residual for observation i is the difference between the
observed value and the predicted value of Y:
$\hat{\varepsilon}_i = Y_i - \hat{Y}_i$
▪ Check the assumptions of regression by examining the residuals
▪ Examine the linearity assumption
▪ Examine constant variance for all levels of X (homoscedasticity)
▪ Evaluate the normal distribution assumption
▪ Evaluate the independence assumption
▪ Use graphical analysis of residuals to check each of the
assumptions
▪ Can plot residuals vs. X
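The residual definition can be sketched in Python as follows. The data are invented, the least-squares helper `fit_simple_ols` is an assumed name, and the printed residuals are what one would plot against X to inspect the assumptions.

```python
def fit_simple_ols(x, y):
    """Return (b0_hat, b1_hat) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1_hat = sxy / sxx
    return mean_y - b1_hat * mean_x, b1_hat


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0_hat, b1_hat = fit_simple_ols(x, y)
fitted = [b0_hat + b1_hat * xi for xi in x]

# Residual = observed value of Y minus predicted value of Y.
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, ei in zip(x, residuals):
    print(f"X = {xi}: residual = {ei:+.4f}")

# For a least-squares line with an intercept, the residuals sum to zero
# and are uncorrelated with X (up to floating-point rounding).
residual_sum = sum(residuals)
cross_sum = sum(xi * ei for xi, ei in zip(x, residuals))
```

A residual plot with no visible pattern and roughly even spread across X is what the linearity and constant-variance checks look for.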
Assumptions of Simple Linear
Regression Model
After we obtain the estimated regression equation and
before we analyze the result, we must make sure that
the following assumptions hold:
❑ Normality of Error
▪ Error is normally distributed for any given value of X
❑ Zero mean value of Error
▪ The average value of the error terms is zero
❑ Homoscedasticity
▪ The probability distribution of the errors has constant variance
❑ Independence of Error
▪ Error values are statistically independent
❑ Independence of Error and independent variable
▪ There is no relationship between the error and the independent variable
(X).
Normality of Error
Error is normally distributed with mean equal to zero and
constant variance (homoscedasticity)
[Figure: two plots of residuals against x. One shows non-constant variance; the other (✓) shows constant variance.]
Independence of Errors
[Figure: plots of residuals against observation order, used to check whether the errors are independent.]
Independence of Error and
X variable
[Figure: plots of residuals against X, one labelled Not Independent and one labelled Independent (✓).]
Simple Linear Regression
Analysis: An Example
Run Regression Model Using Excel
Data / Data Analysis / Regression
Simple Linear Regression
Analysis: Excel Output
The regression equation is:
Food Expenditure = −723.4509 + 0.4627(Income)

Regression Statistics
Multiple R          0.8531
R Square            0.7278
Adjusted R Square   0.7181
Standard Error      65.0494
Observations        30

ANOVA
            df   SS            MS         F         Significance F
Regression  1    316755.4866   316755.5   74.85783  2.12377E-09
Residual    28   118479.9801   4231.428
Total       29   435235.4667
Simple Linear Regression Equation:
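Interpretation of the Intercept, $\hat{\beta}_0$
Food Expenditure = −723.4509 + 0.4627(Income)
$\hat{\beta}_0$ is the estimated average value of Y when the value of X is
zero (if X = 0 is in the range of observed X values)
Here $\hat{\beta}_0 = -723.4509$. A negative average food expenditure has no
practical meaning, so this value should be read literally only if X = 0
lies within the observed range of income.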
Simple Linear Regression Equation:
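Estimated Slope Coefficient, $\hat{\beta}_1$
Food Expenditure = −723.4509 + 0.4627(Income)
$\hat{\beta}_1$ measures the estimated change in the average value of Y as a
result of a one-unit change in X
Here $\hat{\beta}_1 = 0.4627$: for each additional RM1 of monthly income,
average monthly food expenditure is estimated to increase by RM0.4627.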
Simple Linear Regression Equation:
Prediction on Y
For a monthly income of RM2000:
Food Expenditure = −723.4509 + 0.4627(Income)
= −723.4509 + 0.4627(2000)
= 201.9491
The predicted monthly food expenditure is about RM201.95.
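The prediction arithmetic can be checked with a short Python sketch. The coefficients are the ones reported in the Excel output; the function name `predict_food_expenditure` is a hypothetical label for illustration.

```python
B0_HAT = -723.4509  # estimated intercept from the Excel output
B1_HAT = 0.4627     # estimated slope from the Excel output


def predict_food_expenditure(income):
    """Predicted food expenditure for a given income, using the fitted line."""
    return B0_HAT + B1_HAT * income


prediction = predict_food_expenditure(2000)
print(round(prediction, 4))  # 201.9491
```

Predictions of this kind are most trustworthy for income values inside the range that was used to fit the line.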
Measures of Variation for Estimated
Simple Linear Regression Model
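Total variation is made up of two parts:
SST = SSR + SSE
▪ Total sum of squares: $SST = \sum (Y_i - \bar{Y})^2$
▪ Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
▪ Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = average value of the dependent variable
$Y_i$ = observed values of the dependent variable
$\hat{Y}_i$ = predicted value of Y for the given $X_i$ value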
Measures of Variation for Estimated
Simple Linear Regression Model
Excel Output:
ANOVA
            df   SS
Regression  1    316755.4866   (SSR)
Residual    28   118479.9801   (SSE)
Total       29   435235.4667   (SST)
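As a quick check that total variation splits into the regression and residual parts, the sums of squares and degrees of freedom from the ANOVA table can be added up in Python. The variable names are chosen here for illustration; the numbers are copied from the Excel output.

```python
# Values copied from the ANOVA table in the Excel output.
SSR = 316755.4866   # regression sum of squares
SSE = 118479.9801   # residual (error) sum of squares
SST = 435235.4667   # total sum of squares

DF_REGRESSION, DF_RESIDUAL, DF_TOTAL = 1, 28, 29

# Total variation = explained variation + unexplained variation.
gap = abs((SSR + SSE) - SST)
print(gap < 1e-3)                                  # True
print(DF_REGRESSION + DF_RESIDUAL == DF_TOTAL)     # True
```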
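Coefficient of Determination, $r^2$
▪ The coefficient of determination is the proportion of the total
variation in Y that is explained by variation in X:
$r^2 = \frac{SSR}{SST}$
$0 \le r^2 \le 1$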
Examples of Approximate $r^2$ Values
$r^2 = 1$
Perfect linear relationship between X and Y:
100% of the variation in Y is explained by variation in X
[Figure: scatter plots of Y against X, each labelled $r^2 = 1$.]
Examples of Approximate $r^2$ Values
$0 < r^2 < 1$
Weaker linear relationships between X and Y:
Some but not all of the variation in Y is explained
by variation in X
[Figure: scatter plots of Y against X for weaker linear relationships.]
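Examples of Approximate $r^2$ Values
$r^2 = 0$
No linear relationship between X and Y:
None of the variation in Y is explained by variation in X
[Figure: scatter plot of Y against X with no linear pattern.]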
Coefficient of Determination, $r^2$:
Excel output
$r^2 = \frac{SSR}{SST} = \frac{316755.4866}{435235.4667} = 0.7278$

Regression Statistics
Multiple R          0.8531
R Square            0.7278
Adjusted R Square   0.7181
Standard Error      65.0494
Observations        30

ANOVA
            df   SS            MS         F         Significance F
Regression  1    316755.4866   316755.5   74.85783  2.12377E-09
Residual    28   118479.9801   4231.428
Total       29   435235.4667
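The R Square figure can be reproduced from the sums of squares with a few lines of Python. Two extra lines are assumptions beyond what the slide states: Multiple R is taken as the square root of $r^2$, and Adjusted R Square uses the standard formula $1 - (1 - r^2)(n - 1)/(n - k - 1)$.

```python
import math

# Values copied from the Excel output.
SSR = 316755.4866   # regression sum of squares
SST = 435235.4667   # total sum of squares
n = 30              # observations
k = 1               # number of independent variables

r_squared = SSR / SST
multiple_r = math.sqrt(r_squared)

# Standard adjusted r-squared formula (an assumption; not shown on the slide).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 4), round(multiple_r, 4), round(adjusted_r_squared, 4))
# 0.7278 0.8531 0.7181
```

All three values agree with the Regression Statistics block of the Excel output.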
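Coefficient of Determination, $r^2$:
Interpretation
$r^2 = 0.7278$
72.78% of the variation in monthly food expenditure is explained by
variation in monthly income.
OR
The remaining 27.22% of the variation in monthly food expenditure is
not explained by monthly income.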
Standard Error of Estimate
$S_{YX} = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{MSE}$
where
SSE = error sum of squares
n = sample size
k = number of independent variables (X)
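Plugging the Excel figures into this formula is a useful sanity check. The sketch below uses SSE, n and k from the regression output; the variable names are chosen here for illustration.

```python
import math

# Values copied from the Excel output.
SSE = 118479.9801   # error (residual) sum of squares
n = 30              # sample size
k = 1               # number of independent variables

mse = SSE / (n - k - 1)   # mean square error, with 28 degrees of freedom
s_yx = math.sqrt(mse)     # standard error of the estimate

print(round(mse, 3), round(s_yx, 4))  # 4231.428 65.0494
```

The result matches both the residual MS in the ANOVA table and the Standard Error line in the Regression Statistics block.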
Standard Error of Estimate:
Excel output
$S_{YX} = 65.0494$

Regression Statistics
Multiple R          0.8531
R Square            0.7278
Adjusted R Square   0.7181
Standard Error      65.0494
Observations        30

ANOVA
            df   SS            MS         F         Significance F
Regression  1    316755.4866   316755.5   74.85783  2.12377E-09
Residual    28   118479.9801   4231.428
Total       29   435235.4667
Comparing Standard Errors of
Estimate
[Figure: two scatter plots against X, one with a small $S_{YX}$ and one with a large $S_{YX}$.]
Inference about the Slope:
Individual t-test
Is there a linear relationship between X and Y?
❑ t-test for a population slope
❑ Null and alternative hypotheses:
▪ H0: β1 = 0 (no linear relationship)
▪ H1: β1 ≠ 0 (linear relationship does exist)
❑ Test statistic:
$t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}}$, with d.f. = n − k − 1
where:
$\hat{\beta}_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$S_{\hat{\beta}_1}$ = standard error of the slope
t Test for Significance of
Independent variable: Example
Is there evidence of a linear relationship between
monthly income and monthly food expenditure at the
0.05 level of significance?
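Step 1
H0: β1 = 0
H1: β1 ≠ 0
Step 2
Significance level: α = 0.05
Step 5
Test statistic:
$t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \frac{0.4627 - 0}{0.0535} = 8.652$
(8.652 is the t Stat reported by Excel from unrounded values; the rounded figures shown here give about 8.649.)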
t Test for Significance of
Independent variable: Example
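Step 6
Decision making:
d.f. = 30 − 2 = 28, with α/2 = 0.025 in each tail, so the critical values are ±2.048
Reject H0 since t = 8.652 > 2.048. There is sufficient evidence of a
linear relationship between monthly income and monthly food
expenditure at the 0.05 level of significance.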
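The test can be retraced in Python with the rounded figures from the Excel output. The critical value 2.048 is copied from the slide rather than computed, and the line using the F statistic relies on an assumption not stated in the slides: in a regression with one independent variable, the square of the slope's t statistic equals the ANOVA F statistic.

```python
import math

# Values copied from the Excel output and the slide.
B1_HAT = 0.4627       # estimated slope
SE_B1 = 0.0535        # standard error of the slope
BETA1_H0 = 0.0        # hypothesized slope under H0
T_CRITICAL = 2.048    # two-tailed critical value, alpha = 0.05, d.f. = 28
F_STAT = 74.85783     # F statistic from the ANOVA table

t_stat = (B1_HAT - BETA1_H0) / SE_B1   # uses rounded inputs
t_from_f = math.sqrt(F_STAT)           # t^2 = F with one independent variable
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(t_stat, 3), round(t_from_f, 3), reject_h0)  # 8.649 8.652 True
```

The small gap between 8.649 and 8.652 comes from rounding the slope and its standard error to four decimals; the decision to reject H0 is the same either way.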
t Test for Significance of
Independent variable: Example
Additional Step: p-value = 2.12E-09
                     Coefficients  Standard Error  t Stat   P-value   Lower 95%  Upper 95%
Intercept            -723.4509     120.7403        -5.9918  1.87E-06  -970.7762  -476.1256
Monthly Income (RM)  0.4627        0.0535          8.6520   2.12E-09  0.3531     0.5722
Decision making:
Since the p-value ($2.12 \times 10^{-9}$) < α (0.05), we
reject H0.
Confidence Interval Estimation
for the Slope, $\beta_1$
From Excel output:
                     Coefficients  Standard Error  t Stat   P-value   Lower 95%  Upper 95%
Intercept            -723.4509     120.7403        -5.9918  1.87E-06  -970.7762  -476.1256
Monthly Income (RM)  0.4627        0.0535          8.6520   2.12E-09  0.3531     0.5722
Interpretation:
We are 95% confident that for an additional RM1 in
monthly income, the monthly food expenditure will increase, on
average, by between RM0.3531 and RM0.5722.
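The Lower 95% and Upper 95% columns can be approximated by hand. The sketch below assumes the standard interval $\hat{\beta}_1 \pm t_{\alpha/2,\, n-k-1} \, S_{\hat{\beta}_1}$ (the formula itself is not shown on the slide) and reuses the critical value 2.048 for 28 degrees of freedom from the t test.

```python
# Values copied from the Excel output and the t-test slide.
B1_HAT = 0.4627      # estimated slope
SE_B1 = 0.0535       # standard error of the slope
T_CRITICAL = 2.048   # t value for 95% confidence, d.f. = 28

margin = T_CRITICAL * SE_B1
lower = B1_HAT - margin
upper = B1_HAT + margin

print(round(lower, 4), round(upper, 4))  # 0.3531 0.5723

# The interval excludes zero, which agrees with rejecting H0: beta1 = 0.
excludes_zero = lower > 0 or upper < 0
```

The upper limit differs from Excel's 0.5722 in the last digit, most likely because the inputs used here are rounded to four decimals.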
Summary