Regression Modelling
Week 2
1. Estimated regression function (Ch 1.6)
2. Estimation of error terms variance σ² (Ch 1.7)
3. Normal error regression model (Ch 1.8)
4. Analysis of variance approach (ANOVA) (Ch 2.7)
5. An example in R
Section 1
Estimated regression function (Ch 1.6)
Estimated regression function
Ŷ = b0 + b1 X, where

    b1 = Sxy / Sxx,    b0 = Ȳ − b1 X̄
The fitted value for the ith case: Ŷi
The observed value for the ith case: Yi
Residuals
The ith residual is the difference between the observed value Yi and
the corresponding fitted value Ŷi .
ei = Yi − Ŷi
For our model, the residuals become
ei = Yi − (b0 + b1 Xi)
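As a quick illustration, here is a minimal R sketch on simulated data (every value below is made up for illustration):

# Simulate a small data set (illustrative values only)
set.seed(1)
X <- runif(30, 20, 120)
Y <- 60 + 3.5 * X + rnorm(30, sd = 50)

# Least squares estimates: b1 = Sxy/Sxx, b0 = Ybar - b1*Xbar
b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0 <- mean(Y) - b1 * mean(X)

# Fitted values and residuals
Yhat <- b0 + b1 * X
e <- Y - Yhat
head(round(e, 2))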
Residuals
Do not confuse
εi = Yi − E(Yi)   "Model error"
ei = Yi − Ŷi      "Residual"

εi: deviation from the unknown true regression line
ei: deviation from the estimated regression line
Residuals

[Figure: illustration of the residuals around the fitted regression line]
Properties of fitted regression line
1. The sum of residuals is zero: ∑ ei = 0 (all sums here run over i = 1, …, n).
Properties of fitted regression line
2. The sum of squared residuals, ∑ ei², is a minimum.
Properties of fitted regression line
3. The sum of the observed values Yi equals the sum of the fitted values Ŷi:

    ∑ Yi = ∑ Ŷi
Properties of fitted regression line
4. The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the predictor variable in the ith trial:

    ∑ Xi ei = 0
Properties of fitted regression line
5. The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the fitted value of the response variable for the ith trial:

    ∑ Ŷi ei = 0
Properties of fitted regression line
6. The regression line always goes through the point (X̄, Ȳ).
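All of these properties can be checked numerically. A minimal sketch, reusing the simulated X and Y from the earlier snippet (properties 1 and 3–5 reduce to sums that should vanish up to floating-point error; property 2 is the least squares criterion itself):

fit <- lm(Y ~ X)      # least squares fit
e <- resid(fit)       # residuals
Yhat <- fitted(fit)   # fitted values

# Properties 1, 3, 4, 5: each of these should be numerically zero
c(sum(e), sum(Y) - sum(Yhat), sum(X * e), sum(Yhat * e))

# Property 6: the fitted line passes through (Xbar, Ybar)
predict(fit, newdata = data.frame(X = mean(X))) - mean(Y)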
Section 2
Estimation of error terms variance σ² (Ch 1.7)
Estimation of error terms variance σ² (Ch 1.7)
Recall: the estimate of σ² for a single population is

    s² = ∑ (Yi − Ȳ)² / (n − 1)
Estimation of error terms variance σ²
Estimate σ² for the regression model, where

    Var(Yi) = Var(εi) = σ²

Use the residuals ei = Yi − Ŷi:

    Var(ei) = ∑ ei² / DF
Estimation of error terms variance σ²
SSE: residual sum of squares
MSE: residual mean square

    s² = MSE = SSE / (n − 2) = ∑ (Yi − Ŷi)² / (n − 2)

    s = √MSE

It can be shown that MSE is an unbiased estimator of σ²:

    E(MSE) = σ²
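A simulation sketch of this unbiasedness (the model parameters below are made up): the average of MSE over many simulated samples should be close to the true σ².

set.seed(42)
n <- 25
sigma2 <- 2500   # true error variance (illustrative)
X <- seq(20, 120, length.out = n)

mse <- replicate(5000, {
  Y <- 60 + 3.5 * X + rnorm(n, sd = sqrt(sigma2))
  sum(resid(lm(Y ~ X))^2) / (n - 2)   # MSE = SSE / (n - 2)
})
mean(mse)   # should be close to sigma2 = 2500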
Section 3
Normal error regression model (Ch 1.8)
Normal error regression model (Ch 1.8)
Yi = β0 + β1 Xi + εi

Whatever the form of the distribution of εi, the least squares method provides unbiased point estimators of β0 and β1 that have minimum variance among all unbiased linear estimators.

To set up interval estimates and make tests, we assume the errors are εi ~ iid N(0, σ²).
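A minimal sketch of simulating data that satisfy this model and informally checking the normality assumption (all parameter values are made up):

set.seed(7)
n <- 50
X <- runif(n, 0, 10)
Y <- 2 + 0.5 * X + rnorm(n, sd = 1)   # errors iid N(0, 1)

fit <- lm(Y ~ X)
# Under the normal error model the residuals should look roughly normal
qqnorm(resid(fit)); qqline(resid(fit))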
Section 4
Analysis of variance approach (ANOVA) (Ch 2.7)
Analysis of variance approach (ANOVA) (Ch 2.7)
We consider regression analysis from the perspective of analysis of variance.
Useful in multiple regression
Partitioning of total sum of squares
The analysis of variance approach is based on the partitioning of sums of
squares and degrees of freedom associated with the response variable Y .
Consider first a single random variable Y.
Partitioning of total sum of squares
Now consider a linear regression model, where Y is related to X.
Partitioning of total sum of squares
Total sum of squares:

    SSTO = ∑ (Yi − Ȳ)²

Regression sum of squares:

    SSR = ∑ (Ŷi − Ȳ)²

Error (residual) sum of squares:

    SSE = ∑ (Yi − Ŷi)²
Formal development of partitioning
We can decompose each total deviation:

    Yi − Ȳ = (Ŷi − Ȳ) + (Yi − Ŷi)

1. The deviation of the fitted value Ŷi around the mean Ȳ
2. The deviation of the observation Yi around the fitted value Ŷi

The sums of squares satisfy the same relationship, because the cross-product term vanishes by the residual properties above:

    ∑ (Yi − Ȳ)² = ∑ (Ŷi − Ȳ)² + ∑ (Yi − Ŷi)²,

or

    SSTO = SSR + SSE
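A quick numerical check of this partition (reusing any simulated X and Y from the sketches above):

fit <- lm(Y ~ X)
SSTO <- sum((Y - mean(Y))^2)
SSR  <- sum((fitted(fit) - mean(Y))^2)
SSE  <- sum(resid(fit)^2)
c(SSTO, SSR + SSE)   # the two totals agree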
Formal development of partitioning

[Figure: the total deviation Yi − Ȳ split into Ŷi − Ȳ and Yi − Ŷi on a scatter plot]
Breakdown of degrees of freedom
SSTO has n − 1 degrees of freedom
SSE has n − 2 degrees of freedom
SSR has 1 degree of freedom
n − 1 = (n − 2) + 1
Mean squares
A sum of squares divided by its associated degrees of freedom is called a mean square.

The sample variance of Y is a mean square: s² = SSTO / (n − 1).

Regression mean square:

    MSR = SSR / 1 = SSR

Error (residual) mean square:

    MSE = SSE / (n − 2)

Note: mean squares are not additive.
ANOVA table
Source of variation   SS                    df      MS
Regression            SSR = ∑ (Ŷi − Ȳ)²     1       MSR = SSR / 1
Error                 SSE = ∑ (Yi − Ŷi)²    n − 2   MSE = SSE / (n − 2)
Total                 SSTO = ∑ (Yi − Ȳ)²    n − 1
Expected mean squares
MSE and MSR are random variables, and

    E(MSE) = σ²
    E(MSR) = σ² + β1² Sxx

When β1 = 0, the means of the sampling distributions of MSE and MSR are the same;
when β1 ≠ 0, the mean of the sampling distribution of MSR is larger than that of MSE.
Comparing MSR with MSE should therefore be useful for testing whether β1 = 0.
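A simulation sketch of these two expectations (β1, σ², and the design points are all made up):

set.seed(123)
n <- 25; sigma2 <- 2500; beta1 <- 3.5
X <- seq(20, 120, length.out = n)
Sxx <- sum((X - mean(X))^2)

ms <- replicate(5000, {
  Y <- 60 + beta1 * X + rnorm(n, sd = sqrt(sigma2))
  fit <- lm(Y ~ X)
  c(MSE = sum(resid(fit)^2) / (n - 2),
    MSR = sum((fitted(fit) - mean(Y))^2) / 1)
})
rowMeans(ms)                        # simulated E(MSE), E(MSR)
c(sigma2, sigma2 + beta1^2 * Sxx)   # theoretical values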
F test
To test

    H0: β1 = 0  vs.  Ha: β1 ≠ 0,

we can use the test statistic

    F* = MSR / MSE

What's the distribution of F* under the null hypothesis?
F test
It can be proved that when β1 = 0:

    SSE/σ² is distributed as χ² with n − 2 degrees of freedom;
    SSR/σ² is distributed as χ² with 1 degree of freedom.

We also know that for two independent χ²-distributed random variables Z1 and Z2, with degrees of freedom df1 and df2, the ratio

    (Z1/df1) / (Z2/df2)

follows an F distribution with (df1, df2) degrees of freedom.
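A simulation sketch of this result (simulate under H0: β1 = 0 with made-up settings; the histogram of the simulated F* statistics should match the F(1, n − 2) density):

set.seed(99)
n <- 25
X <- seq(20, 120, length.out = n)

Fstar <- replicate(5000, {
  Y <- 60 + rnorm(n, sd = 50)     # beta1 = 0, so H0 holds
  anova(lm(Y ~ X))[1, "F value"]  # F* = MSR / MSE
})
hist(Fstar, breaks = 50, freq = FALSE, main = "F* under H0")
curve(df(x, 1, n - 2), add = TRUE, col = "red")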
F test - decision rule
This is an upper-tail test. (why?)
With significance level α, we reject H0 when

    F* > F(1 − α; 1, n − 2),

where F(1 − α; 1, n − 2) is the 100(1 − α)th percentile of the F distribution with (1, n − 2) degrees of freedom.
Coefficient of determination (R²)
The coefficient of determination

    R² = SSR / SSTO

measures the proportion of the total variation in Y that is explained by the fitted regression model.

    0 ≤ R² ≤ 1

In SLR, R² = r², where r is the coefficient of correlation.
Section 5
An example in R
Toluca Company example from the textbook
Table 1.1 page 19
Use dataset from the R package “ALSM”
Or download from Wattle “Kutner Textbook Datasets”, file named
“CH01TA01.txt”
# install.packages("ALSM")
library("ALSM")
mydata <- TolucaCompany
# mydata <- read.table("CH01TA01.txt")
# need to put the data file into your working directory first
X <- mydata[,1]
Y <- mydata[,2]
X = “Lot Size” and Y = “Hours Worked”
Scatter plot
#plot(mydata)
plot(X,Y, col="red", pch=17, xlab="Lot size", cex.lab=1.5,
ylab = "Work hours", main = "Toluca Company")
[Figure: "Toluca Company" scatter plot of work hours against lot size]
Summary statistics
summary(mydata)
## x y
## Min. : 20 Min. :113.0
## 1st Qu.: 50 1st Qu.:224.0
## Median : 70 Median :342.0
## Mean : 70 Mean :312.3
## 3rd Qu.: 90 3rd Qu.:389.0
## Max. :120 Max. :546.0
Summary statistics
boxplot(mydata)
[Figure: side-by-side boxplots of x (lot size) and y (work hours)]
Fit the SLM manually
Recall we have

    b1 = Sxy / Sxx = ∑ (Xi − X̄)(Yi − Ȳ) / ∑ (Xi − X̄)²

    b0 = Ȳ − b1 X̄
Xbar <- mean(X)
Ybar <- mean(Y)
Fit the SLM manually
Xcenter <- X - Xbar
Ycenter <- Y - Ybar
Sxy <- crossprod(Xcenter, Ycenter)
# can also use
# Sxy <- sum(Xcenter*Ycenter)
# Sxy <- t(Xcenter)%*%Ycenter
Xcenter
## [1] 10 -40 -20 20 0 -10 50 10 30 -20 -30 0 20 -50 40 30 -40
## [18] -20 20 40 -40 20 -30 10 0
Sxy
## [,1]
## [1,] 70690
# You can calculate Sxx similarly
Fit the SLM manually
Sxx <- crossprod(Xcenter)
# Sxx <- sum(Xcenter^2)
Sxx
## [,1]
## [1,] 19800
b1 <- Sxy/Sxx
b0 <- Ybar - b1*Xbar
b0
## [,1]
## [1,] 62.36586
b1
## [,1]
## [1,] 3.570202
Fit the SLM manually
Another way to calculate b1:

    b1 = rxy sy / sx,

where rxy is the sample correlation between X and Y, and sx, sy are the sample standard deviations of X and Y.
Fit the SLM manually
b1 <- cor(X, Y)*sd(Y)/sd(X)
b1
## [1] 3.570202
Fitting with “lm” function
mymodel <- lm(Y ~ X)
# without intercept: lm(Y ~ X -1)
# without slope: lm(Y ~ 1)
summary(mymodel)
Fitting with “lm” function
##
## Call:
## lm(formula = Y ~ X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -83.876 -34.088 -5.982 38.826 103.528
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.366 26.177 2.382 0.0259 *
## X 3.570 0.347 10.290 4.45e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 48.82 on 23 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8138
## F-statistic: 105.9 on 1 and 23 DF, p-value: 4.449e-10
The estimated regression line
Ŷi = 62.366 + 3.570Xi
plot(X,Y, pch = 16)
abline(mymodel, col="purple", lty=2, cex=1.5, lwd=2)
[Figure: scatter plot of Y against X with the fitted regression line overlaid]
Fitted Y values
Yhat <- b0 + b1*X
Yfit <- mymodel$fitted.values
round(Yhat)
## [1] 348 169 241 384 312 277 491 348 419 241 205 312 384 134 455 419 169
## [18] 241 384 455 169 384 205 348 312
round(Yfit)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 348 169 241 384 312 277 491 348 419 241 205 312 384 134 455 419 169 241
## 19 20 21 22 23 24 25
## 384 455 169 384 205 348 312
Residuals
Res <- Y - Yhat
# or equivalently, from the lm object:
Res <- mymodel$residuals
SSE <- sum(Res^2)
SSE
## [1] 54825.46
n = length(Y)
MSE <- SSE/(n-2)
MSE
## [1] 2383.716
# Estimate sigma
sigma_hat = sqrt(MSE)
sigma_hat
## [1] 48.82331
# This is also called "Residual standard error"
ANOVA - manually
# Total sum of squares
SSTO <- sum((Y - Ybar)^2)
SSTO
## [1] 307203
# Regression sum of squares
SSR <- sum((Yhat - Ybar)^2)
SSR
## [1] 252377.6
SSTO - SSR   # equals SSE
## [1] 54825.46
# Regression mean square
MSR <- SSR/1
ANOVA - F test
Fstat <- MSR/MSE
Fstat
## [1] 105.8757
critical <- qf(0.95, 1, n-2)
critical
## [1] 4.279344
pvalue <- 1 - pf(Fstat, 1, n-2)
pvalue
## [1] 4.448828e-10
ANOVA - by R function
anova(mymodel)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X 1 252378 252378 105.88 4.449e-10 ***
## Residuals 23 54825 2384
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficient of Determination
R² = SSR / SSTO
Rsqr <- SSR/SSTO
Rsqr
## [1] 0.8215335
Compare with "Multiple R-squared" in the summary output above.
Check against the coefficient of correlation:
cor(X, Y)
## [1] 0.9063848
cor(X, Y)^2
## [1] 0.8215335