Regression Modelling

Week 3

Week 3 Regression Modelling 1 / 73


1 Inferences concerning β1 (CH 2.1)

2 Inferences concerning β0 (CH 2.2)

3 Interval estimation of E(Yh) (CH 2.4)

4 Prediction of new observation (CH 2.5)

5 An example with R


Section 1

Inferences concerning β1 (CH 2.1)

(Annotation: recall the model assumptions: εi ~ N(0, σ²) and Yi = β0 + β1 Xi + εi, so E(Yi) = β0 + β1 Xi and Yi ~ N(β0 + β1 Xi, σ²).)
Inferences concerning β1

We are interested in the slope parameter β1: its point estimator b1, confidence intervals, and hypothesis tests, e.g.

H0: β1 = 0
Ha: β1 ≠ 0

Under SLR settings, when β1 = 0, the model becomes

E(Y) = β0

Before discussing inferences concerning β1, we need the sampling distribution of b1.


Sampling distribution of b1

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

The sampling distribution of b1 refers to the different values of b1 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample. (Shown in simulation.)

The sampling distribution of b1 is normal with mean and variance:

E(b1) = β1
V(b1) = σ²(b1) = σ² / Sxx


Sampling distribution of b1

b1 is a linear combination of Yi:

b1 = Σ_{i=1}^n ki Yi,

where

ki = (Xi − X̄) / Σ(Xi − X̄)²

(Annotation: Sxy = Σ(Xi − X̄)(Yi − Ȳ) = Σ(Xi − X̄)Yi, since Ȳ Σ(Xi − X̄) = 0; hence b1 = Sxy/Sxx = Σ ki Yi.)
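The ki weights satisfy Σ ki = 0, Σ ki Xi = 1 and Σ ki² = 1/Sxx, which drive the mean and variance results on the next slides. A quick numeric sketch of these identities (in Python/NumPy rather than R, purely for illustration; the X levels are those of the simulation later in the slides):

```python
import numpy as np

# The k_i weights behind b1 = sum(k_i * Y_i), using X = 1, ..., 25
X = np.arange(1, 26, dtype=float)
Sxx = np.sum((X - X.mean())**2)
k = (X - X.mean()) / Sxx

# Identities used in the mean/variance derivations that follow:
print(np.isclose(k.sum(), 0.0))            # sum k_i = 0
print(np.isclose((k * X).sum(), 1.0))      # sum k_i X_i = 1
print(np.isclose((k**2).sum(), 1.0/Sxx))   # sum k_i^2 = 1/Sxx

# b1 as a linear combination of Y equals the least squares slope
rng = np.random.default_rng(1)
Y = 2 + X + rng.normal(0, 8, size=X.size)
print(np.isclose((k * Y).sum(), np.polyfit(X, Y, 1)[0]))
```

All four checks print True: b1 really is the weighted sum Σ ki Yi, which is what makes its normality, mean and variance straightforward to derive.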
Sampling distribution of b1

Normality: A linear combination of independent normal random variables is normally distributed.
Mean: E(b1) = β1

(Annotation: E(b1) = E(Σ ki Yi) = Σ ki E(Yi) = Σ ki (β0 + β1 Xi) = β0 Σ ki + β1 Σ ki Xi = β0 · 0 + β1 · 1 = β1.)
Sampling distribution of b1

Variance: σ²(b1) = V(b1) = σ²/Sxx

(Annotation: using V(aX) = a²V(X) and, for independent Yi, V(Σ ai Yi) = Σ ai² V(Yi), we get V(b1) = V(Σ ki Yi) = Σ ki² V(Yi) = σ² Σ ki² = σ²/Sxx.)


Sampling distribution of b1

Estimated variance:

s²(b1) = MSE / Sxx

This is an unbiased estimator of σ²(b1).

b1 has minimum variance among all unbiased linear estimators of the form:

β̂1 = Σ ci Yi.

(Proved P43)

We call s(b1) = √(MSE/Sxx) the standard error of b1.


Simulation

In practical scenarios, we typically have access to only a single sample, making it impossible to directly observe the sampling distribution. However, through simulation, we can emulate the process of repeated sampling. This approach allows us to observe the distribution of sample statistics, calculate their mean and variance, and compare these empirical results with theoretical expectations. Such simulations are invaluable for understanding the behavior of estimators and for validating statistical methods.


Sampling distribution of b1 by simulation

Assume a true model

Yi = 2 + Xi + εi, where εi ~ N(0, 8²)

So β0 = 2, β1 = 1 and σ = 8.
Assume the X values are 1, 2, . . . , 25, so the sample size is 25.
We learnt from theoretical results that the sampling distribution of b1 is normal with mean 1 and variance 64/Sxx. To construct a simulated sampling distribution of b1, we follow the steps below:
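As a quick check of the theoretical values above (a Python sketch, for illustration only): with X = 1, . . . , 25 we get Sxx = 1300, so sd(b1) = 8/√1300 ≈ 0.2219, which is the theoretical value the simulation output below should approximate.

```python
import math

# Theoretical sampling sd of b1 for the setup above:
# X = 1..25, sigma = 8, V(b1) = sigma^2 / Sxx
X = list(range(1, 26))
xbar = sum(X) / len(X)                  # 13.0
Sxx = sum((x - xbar)**2 for x in X)     # 1300.0
sd_b1 = 8 / math.sqrt(Sxx)

print(Sxx)               # 1300.0
print(round(sd_b1, 7))   # 0.2218801
```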


Sampling distribution of b1 by simulation

1. Generate a random sample of Yi from the true model.
2. Pretend not to know the values of the parameters and use least squares estimation to get estimated values for the parameters. In this case, record the b1 value.
3. Repeat the first two steps nrep times, where nrep is a large number, so that we have many random samples and their corresponding estimates: {b1(1), b1(2), . . . , b1(nrep)}.
4. Plot the distribution of all the b1's and calculate their mean and variance. Compare with the theoretical results.


Sampling distribution of b1 by simulation

X = 1:25
n = length(X) # n = 25
nrep = 5000
Sxx = sum((X - mean(X))^2)
beta0 = 2; beta1 = 1; esigma = 8
b0 = b1 = MSE = s_b1 = s_b0 = array(NA, dim = nrep)
set.seed(7038)
for(i in 1:nrep)
{
  Y = beta0 + beta1*X + rnorm(n, 0, esigma)
  mod = lm(Y ~ X)
  b0[i] = mod$coefficients[1]
  b1[i] = mod$coefficients[2]
  MSE[i] = sum(mod$residuals^2)/(n-2)
  s_b1[i] = sqrt(MSE[i]/Sxx)
  s_b0[i] = sqrt(MSE[i]*(1/n + mean(X)^2/Sxx))
}
Sampling distribution of b1 by simulation

mean(b1) # empirical mean of b1

## [1] 0.9983658

sdb1 = esigma/sqrt(Sxx) # theoretical sd
sdb1

## [1] 0.2218801

sd(b1) # empirical sd of b1

## [1] 0.2230607


Sampling distribution of b1 by simulation

hist(b1, prob = TRUE, breaks = 50, ylim = c(0, 1.8))
myfun = function(x) dnorm(x, beta1, sdb1)
plot(myfun, 0.3, 1.7, col = "red", add = TRUE)

[Figure: histogram of the 5000 simulated b1 values with the theoretical N(1, 64/Sxx) density overlaid.]
Sampling distribution of (b1 − β1)/s(b1)

We have:

b1 ~ N(β1, σ²(b1))

Standardize:

(b1 − β1)/σ(b1) ~ N(0, 1)

Substituting σ(b1) with s(b1), we get a studentized statistic:

(b1 − β1)/s(b1) ~ t(n−2),

where s(b1) = √(MSE/Sxx).


T distribution and Normal distribution

plot(function(x) dnorm(x), -4, 4, lwd = 2, ylab = "density")
plot(function(x) dt(x, df=2), -4, 4, ylab = "", add = T,
     col = 2, lty = 2, lwd = 2)
plot(function(x) dt(x, df=6), -4, 4, ylab = "", add = T,
     col = 3, lty = 2, lwd = 2)
plot(function(x) dt(x, df=n-2), -4, 4, ylab = "", add = T,
     col = 4, lty = 2, lwd = 2)
legend(x = 2.5, y = 0.35, legend = c("normal", "t2", "t6", "t23"),
       lty = c(1,2,2,2), col = c(1,2,3,4), lwd = c(2,2,2,2))


T distribution and Normal distribution

[Figure: standard normal density with t densities for df = 2, 6 and 23 overlaid; the t curves have heavier tails and approach the normal as the degrees of freedom grow.]


Sampling distribution of (b1 − β1)/s(b1) by simulation

hist((b1-beta1)/s_b1, prob = TRUE, breaks = 20, ylim = c(0, 0.4))
myfun = function(x) dt(x, df = n-2)
plot(myfun, -4, 4, col = "red", add = TRUE)

[Figure: histogram of (b1 − beta1)/s_b1 with the t(23) density overlaid.]


Confidence interval for β1

Since (b1 − β1)/s(b1) follows a t(n−2) distribution, we have

P{ t(α/2; n−2) ≤ (b1 − β1)/s(b1) ≤ t(1−α/2; n−2) } = 1 − α,

where t(α/2; n−2) denotes the 100(α/2) percentile of the t distribution with n − 2 degrees of freedom.
t is symmetric, so t(α/2; n−2) = −t(1−α/2; n−2).
Rearranging terms, we have

P{ b1 − t(1−α/2; n−2) s(b1) ≤ β1 ≤ b1 + t(1−α/2; n−2) s(b1) } = 1 − α


Confidence interval for β1

The 100(1 − α)% confidence interval for β1 is:

[ b1 − t(1−α/2; n−2) s(b1), b1 + t(1−α/2; n−2) s(b1) ]

where s(b1) = √(MSE/Sxx).

(Annotation: e.g. α = 0.05 gives a 95% interval; with n = 25, t(0.975; 23) = 2.069.)


100(1 − α)% Confidence interval

[Figure: b1 estimates with 95% confidence intervals from 50 simulated samples, plotted against sample index; a horizontal line marks the true value β1 = 1. Most, but not all, intervals cover the true value.]


Code

X = 1:25
n = length(X)
beta0 = 2; beta1 = 1; esigma = 8
b0 = b1 = ci1 = ci2 = array(NA, dim = 50)
set.seed(7038)
for(i in 1:50)
{
  Y = beta0 + beta1*X + rnorm(n, 0, esigma)
  mod = lm(Y ~ X)
  b0[i] = mod$coefficients[1]
  b1[i] = mod$coefficients[2]
  ci1[i] = confint(mod)[2,1] # lower limits of 95% CI for beta1
  ci2[i] = confint(mod)[2,2] # upper limits
}
library(plotrix) # need to install this package beforehand
plot(b1, ylim = c(0, 2), ylab = "b1")
plotCI(b1, y = NULL, uiw = ci2-b1, liw = b1-ci1, err = "y", add = TRUE)
abline(h = 1, col = "red")


Homework

1. Design a simulation study to find the sampling distributions of MSE and MSR.
2. Run the simulation in the previous slide a large number of times. Calculate the percentage of confidence intervals covering the true parameter value.


Tests concerning β1

Using the result

(b1 − β1)/s(b1) ~ t(n−2),

we can conduct hypothesis tests for β1.

Two-sided test: whether there is a linear relationship

H0: β1 = 0
Ha: β1 ≠ 0

Test statistic

t* = b1 / s(b1)

Reject if |t*| > t(1−α/2; n−2).

(Annotation: at α = 0.05, the rejection region lies in both tails, beyond t(0.025; n−2) and t(0.975; n−2).)
Tests concerning β1

One-sided test: Is there a positive (negative) relationship?

H0: β1 ≤ 0 (β1 ≥ 0)
Ha: β1 > 0 (β1 < 0)

Test statistic

t* = b1 / s(b1)

Reject if t* > t(1−α; n−2) (t* < t(α; n−2)).


Equivalence of F test and t test

For a given α level, the F test of β1 = 0 versus β1 ≠ 0 is equivalent algebraically to the two-sided t test on β1 (same p-value).

F* = MSR/MSE
t* = b1/s(b1)

We can show that (t*)² = F*.

(Annotation: since Ŷi = b0 + b1 Xi = Ȳ + b1(Xi − X̄), we have SSR = Σ(Ŷi − Ȳ)² = b1² Σ(Xi − X̄)² = b1² Sxx, so F* = MSR/MSE = b1² Sxx / MSE = b1² / s²(b1) = (t*)².)
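The algebra above can also be checked numerically. A small sketch (Python/NumPy here rather than R, and with simulated data, not the Toluca data used later):

```python
import numpy as np

# Numerical check that (t*)^2 = F* for the slope test in simple
# linear regression, on simulated (illustrative) data.
rng = np.random.default_rng(0)
X = np.arange(1.0, 26.0)
Y = 2 + X + rng.normal(0, 8, size=X.size)
n = X.size

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

SSE = np.sum((Y - b0 - b1 * X)**2)
MSE = SSE / (n - 2)
SSR = b1**2 * Sxx            # = sum((Yhat_i - Ybar)^2), as derived above
F_star = (SSR / 1) / MSE     # MSR = SSR/1
t_star = b1 / np.sqrt(MSE / Sxx)

print(np.isclose(t_star**2, F_star))   # True
```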
Hypothesis testing with β1 at a general value

Two-sided test

H0: β1 = c v.s. Ha: β1 ≠ c

Test statistic

t* = (b1 − c)/s(b1) ~ t(n−2) under H0

Reject H0 if |t*| > t(1−α/2; n−2).


Hypothesis testing with β1 at a general value

One-sided (upper-tail) test

H0: β1 ≤ c v.s. Ha: β1 > c

Test statistic

t* = (b1 − c)/s(b1) ~ t(n−2) under H0

Reject H0 if t* > t(1−α; n−2).


Section 2

Inferences concerning β0 (CH 2.2)


Sampling distribution of b0

The sampling distribution of b0 is normal with mean and variance:

E(b0) = β0
σ²(b0) = V(b0) = σ² [ 1/n + X̄²/Sxx ]


Sampling distribution of b0

Normality: b0 is also a linear combination of Yi.

(Annotation: b0 = Ȳ − b1 X̄; both Ȳ and b1 are linear combinations of the Yi.)


Sampling distribution of b0

Mean: E(b0) = β0

(Annotation: E(b0) = E(Ȳ) − X̄ E(b1) = (1/n) Σ E(Yi) − X̄ β1 = (1/n) Σ (β0 + β1 Xi) − β1 X̄ = β0 + β1 X̄ − β1 X̄ = β0.)
Sampling distribution of b0

Variance: σ²(b0) = σ² [ 1/n + X̄²/Sxx ]

(Annotation: V(b0) = V(Ȳ − b1 X̄) = V(Ȳ) + X̄² V(b1) − 2X̄ Cov(Ȳ, b1); Cov(Ȳ, b1) = 0 (see proof on Wattle), so V(b0) = σ²/n + X̄² σ²/Sxx.)
Sampling distribution of b0

mean(b0) # empirical mean of b0

## [1] 2.014146

sd(b0) # empirical sd of b0

## [1] 3.275723

sdb0 = esigma*sqrt(1/n + mean(X)^2/Sxx)
sdb0 # theoretical sd of b0

## [1] 3.298485


Sampling distribution of b0

[Figure: histogram of the simulated b0 values with the theoretical N(β0, σ²(b0)) density overlaid.]


Sampling distribution of (b0 − β0)/s(b0)

An estimator of σ²(b0):

s²(b0) = MSE [ 1/n + X̄²/Sxx ]

We call s(b0) = √(MSE [ 1/n + X̄²/Sxx ]) the standard error of b0.

The sampling distribution of the studentised statistic:

(b0 − β0)/s(b0) ~ t(n−2)


Sampling distribution of (b0 − β0)/s(b0)

hist((b0-beta0)/s_b0, prob = TRUE)
myfun = function(x) dt(x, df = n-2)
plot(myfun, -3, 3, col = "red", add = TRUE)

[Figure: histogram of (b0 − beta0)/s_b0 with the t(23) density overlaid.]


Confidence interval for β0

The 100(1 − α)% confidence interval for β0 is:

[ b0 − t(1−α/2; n−2) s(b0), b0 + t(1−α/2; n−2) s(b0) ]

where s(b0) = √(MSE [ 1/n + X̄²/Sxx ]).

(Annotation: point estimate ± t × standard error.)


Hypothesis Testing with β0 at a General Value

Two-sided test

H0: β0 = c
Ha: β0 ≠ c

Test statistic

t* = (b0 − c)/s(b0) ~ t(n−2) under H0

Reject H0 if |t*| > t(1−α/2; n−2).


Hypothesis Testing with β0 at a General Value

One-sided test

H0: β0 ≤ c (β0 ≥ c)
Ha: β0 > c (β0 < c)

Test statistic

t* = (b0 − c)/s(b0)

Reject H0 if t* > t(1−α; n−2) (t* < t(α; n−2)).


Section 3

Interval estimation of E(Yh) (CH 2.4)


Interval estimation of E(Yh)

Objective: estimate the mean for one or more probability distributions of Y.
Example: the Toluca Company is interested in the mean number of work hours for a range of lot sizes for the purposes of finding the optimum lot size.

(Annotation: the estimator is b0 + b1 Xh.)


Interval estimation of E(Yh)

Xh denotes the level of X for which we wish to estimate the mean response. Xh may be a value in the sample or another value of the predictor within the scope of the model.
We already know Ŷh = b0 + b1 Xh is a point estimator for E(Yh).
To get the interval estimate, we need to know the sampling distribution of Ŷh.


The sampling distribution of Ŷh

The sampling distribution of Ŷh is normal, with mean and variance:

E{Ŷh} = E(Yh)
V{Ŷh} = σ² [ 1/n + (Xh − X̄)²/Sxx ]

Normality: Ŷh = b0 + b1 Xh is a linear combination of the observations Yi.
The sampling distribution of Ŷh

Mean: E{Ŷh} = E(b0 + b1 Xh) = E(b0) + Xh E(b1) = β0 + β1 Xh = E{Yh}

Variance:

(Annotation: write Ŷh = Ȳ − b1 X̄ + b1 Xh = Ȳ + b1(Xh − X̄). Then V(Ŷh) = V(Ȳ) + (Xh − X̄)² V(b1) + 2(Xh − X̄) Cov(Ȳ, b1) = σ²/n + (Xh − X̄)² σ²/Sxx, since Cov(Ȳ, b1) = 0.)
The sampling distribution of Ŷh

Figure 1: The further Xh is from X̄, the greater is the quantity (Xh − X̄)² and the larger is the variance of Ŷh.


The sampling distribution of Ŷh

Estimated variance of Ŷh:

s²{Ŷh} = MSE [ 1/n + (Xh − X̄)²/Sxx ]


Sampling distribution of (Ŷh − E(Yh))/s{Ŷh}

(Ŷh − E(Yh))/s{Ŷh} is distributed as t(n−2).

The 100(1 − α)% confidence interval for E(Yh) is:

[ Ŷh − t(1−α/2; n−2) s{Ŷh}, Ŷh + t(1−α/2; n−2) s{Ŷh} ]

where s{Ŷh} = √(MSE [ 1/n + (Xh − X̄)²/Sxx ]).


Section 4

Prediction of new observation (CH 2.5)



Prediction of new observation

Objective: predict a new observation Y corresponding to a given level X.
Example: with the Toluca Company example, the next lot to be produced consists of 100 units and management wishes to predict the number of work hours for this particular lot.


Prediction of new observation

Denote the level of X for the new trial as Xh and the new observation on Y as Yh(new).
Difference between estimation of the mean response E(Y) and prediction of a new response Yh(new):
The former estimates the mean of the distribution of Y.
The latter predicts an individual outcome drawn from the distribution of Y.


Prediction interval for Yh(new)

The point prediction for Yh(new) is

Ŷh = b0 + b1 Xh

Compared with constructing the confidence interval for E(Yh) = β0 + β1 Xh, we now want a prediction interval for

Yh(new) = β0 + β1 Xh + εh(new)

To quantify the variation induced by the extra error term, we have

V{Yh(new)} = V{Ŷh} + σ²


Prediction interval for Yh(new)

Total variation of prediction, σ²{pred}:

σ²{pred} = σ²{Ŷh} + σ² = σ² [ 1 + 1/n + (Xh − X̄)²/Sxx ]

An unbiased estimator of σ²{pred} is

s²{pred} = s²{Ŷh} + MSE = MSE [ 1 + 1/n + (Xh − X̄)²/Sxx ]


Prediction interval for Yh(new)

It can be shown that

(Yh(new) − Ŷh)/s{pred} ~ t(n−2)

The 100(1 − α)% prediction interval for Yh(new) is

[ Ŷh − t(1−α/2; n−2) s{pred}, Ŷh + t(1−α/2; n−2) s{pred} ]

where s{pred} = √(MSE [ 1 + 1/n + (Xh − X̄)²/Sxx ]).


Comparison

Mean response:
  E(Yh) = β0 + β1 Xh
  Point estimator: Ŷh = b0 + b1 Xh
  Confidence interval: Ŷh ± t(1−α/2; n−2) s{Ŷh}
  s{Ŷh} = √(MSE [ 1/n + (Xh − X̄)²/Sxx ])

Individual outcome:
  Yh(new) = β0 + β1 Xh + εh(new)
  Point prediction: Ŷh = b0 + b1 Xh
  Prediction interval: Ŷh ± t(1−α/2; n−2) s{pred}
  s{pred} = √(MSE [ 1 + 1/n + (Xh − X̄)²/Sxx ])
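Since s²{pred} = s²{Ŷh} + MSE, the prediction interval is always wider than the confidence interval at the same Xh. A quick numeric sketch (Python/NumPy, with simulated data; illustrative, not the Toluca data):

```python
import numpy as np

# Check that s^2{pred} = s^2{Yhat} + MSE at a given X_h, so the
# prediction interval is wider than the confidence interval.
rng = np.random.default_rng(7)
X = np.arange(1.0, 26.0)
Y = 2 + X + rng.normal(0, 8, size=X.size)
n = X.size

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
MSE = np.sum((Y - b0 - b1 * X)**2) / (n - 2)

X_h = 13.0  # an arbitrary level within the model's scope
s_yhat = np.sqrt(MSE * (1/n + (X_h - X.mean())**2 / Sxx))
s_pred = np.sqrt(MSE * (1 + 1/n + (X_h - X.mean())**2 / Sxx))

print(np.isclose(s_pred**2, s_yhat**2 + MSE))  # True
print(bool(s_pred > s_yhat))                   # True
```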


Section 5

An example with R



Toluca company example

Toluca <- read.table("CH01TA01.txt")


# need to put the data file into your working directory first
X <- Toluca[,1]
Y <- Toluca[,2]
n <- length(Y)

X = “Lot Size” and Y = “Hours Worked”



Fitting the model

mymodel <- lm(Y ~ X)

coef(mymodel)

## (Intercept) X
## 62.365859 3.570202

b0 = coef(mymodel)[1]
b1 = coef(mymodel)[2]
summary(mymodel)$coefficients

## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.365859 26.1774339 2.382428 2.585094e-02
## X 3.570202 0.3469722 10.289592 4.448828e-10
Interpretation of the coefficients

Ŷ = 62.37 + 3.5702X

We estimate that the mean number of work hours increases by 3.57 hours for each additional unit produced in the lot.
We estimate that the mean number of work hours is 62.37 when the number of lot units is 0. (meaningless)


Fitting the model

[Figure: scatter plot of hours worked against lot sizes with the fitted regression line.]


Confidence intervals of coefficients (manually)

alpha = 0.05
s_b0 = summary(mymodel)$coefficients[1,2]
s_b1 = summary(mymodel)$coefficients[2,2]
b0_ci = c(b0 + qt(alpha/2, df = n-2)*s_b0,
b0 + qt(1-alpha/2, df = n-2)*s_b0)
b1_ci = c(b1 + qt(alpha/2, df = n-2)*s_b1,
b1 + qt(1-alpha/2, df = n-2)*s_b1)
b0_ci

## (Intercept) (Intercept)
## 8.213711 116.518006
b1_ci

## X X
## 2.852435 4.287969



Confidence intervals of coefficients (by function)

confint(mymodel)

## 2.5 % 97.5 %
## (Intercept) 8.213711 116.518006
## X 2.852435 4.287969
With confidence coefficient .95, we estimate that the mean number of
work hours increases by somewhere between 2.85 and 4.29 hours for
each additional unit in the lot.
confint(mymodel, level=0.90)

## 5 % 95 %
## (Intercept) 17.501100 107.230617
## X 2.975536 4.164868



Hypothesis testing with β

The t statistics and p-values refer to two-tail tests at zero.

summary(mymodel)$coefficients

## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.365859 26.1774339 2.382428 2.585094e-02
## X 3.570202 0.3469722 10.289592 4.448828e-10

(Annotation: two-tail test of H0: β1 = 0 vs Ha: β1 ≠ 0; t* = 10.289, p-value = 4.4 × 10⁻¹⁰, so reject H0.)
Equivalence of F Test and t Test

anova(mymodel)

## Analysis of Variance Table
##
## Response: Y
##           Df Sum Sq Mean Sq F value    Pr(>F)
## X          1 252378  252378  105.88 4.449e-10 ***
## Residuals 23  54825    2384
## ---
## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1

(Annotation: F* = 105.88 = 10.2896² = (t*)², and the p-values agree.)


Hypothesis testing with β

(Annotation: using the critical value approach at α = 0.05. Two-tail test: the critical value is t(0.975; 23) = 2.069; |t*| = 10.289 > 2.069, so reject H0. Upper-tail test (Ha: β1 > 0): the critical value is t(0.95; 23) = 1.714; t* = 10.289 > 1.714, so reject H0. Lower-tail test (Ha: β1 < 0): the critical value is t(0.05; 23) = −1.714; t* = 10.289 is not below −1.714, so fail to reject H0.)
Hypothesis testing with β

Test at a non-zero value

H0: β1 = 2
Ha: β1 ≠ 2

ts = (b1-2)/s_b1
ts

## X
## 4.525441

pvalue = 2*(1-pt(ts, df = n-2))
pvalue

## X
## 0.0001519178

(Annotation: the p-value is below 0.05, so reject H0: β1 = 2.)
Confidence interval for E(Yh)

Let's try to find a 90% confidence interval for E(Yh) when Xh = 65:

Ŷh ± t(1−α/2; n−2) √(MSE [ 1/n + (Xh − X̄)²/Sxx ])

X_h = 65
Y_hat = b0 + b1*X_h
Y_hat

## (Intercept)
## 294.429

MSE = sum(residuals(mymodel)^2) / df.residual(mymodel)
s_Yhat = sqrt(MSE*(1/n + (X_h-mean(X))^2/sum((X-mean(X))^2)))
alpha = 0.1
Y_hat_ci = c(Y_hat - qt(1-alpha/2, n-2)*s_Yhat,
             Y_hat + qt(1-alpha/2, n-2)*s_Yhat)
names(Y_hat_ci) = c("lower", "upper")
Y_hat_ci

## lower upper
## 277.4315 311.4264
Prediction interval for Yh(new)

A 90% prediction interval when Xh = 65:

Ŷh ± t(1−α/2; n−2) √(MSE [ 1 + 1/n + (Xh − X̄)²/Sxx ])

s_Yhnew = sqrt(MSE*(1 + 1/n +
               (X_h-mean(X))^2/sum((X-mean(X))^2)))
Y_hat_pi = c(Y_hat - qt(1-alpha/2, n-2)*s_Yhnew,
             Y_hat + qt(1-alpha/2, n-2)*s_Yhnew)
names(Y_hat_pi) = c("lower", "upper")
Y_hat_pi

## lower upper
## 209.0432 379.8148


Confidence interval and prediction interval

predict(mymodel, newdata = data.frame(X = X_h),
        interval = "confidence", level = .90)

## fit lwr upr
## 1 294.429 277.4315 311.4264

predict(mymodel, newdata = data.frame(X = X_h),
        interval = "prediction", level = .90)

## fit lwr upr
## 1 294.429 209.0432 379.8148

With confidence coefficient 90%, the mean work hours required when lots of 65 units are produced is between 277 and 311.
With confidence coefficient 90%, we predict that the number of work hours required for any lot of 65 units is between 209 and 380.


Confidence interval and prediction interval

For a series of Xh values:

X_h = seq(20, 120, by = 2)
ci = predict(mymodel, newdata = data.frame(X = X_h),
             interval = "confidence")
pi = predict(mymodel, newdata = data.frame(X = X_h),
             interval = "prediction")
plot(X, Y, ylim = c(0, 600))
abline(mymodel, lwd = 2)
lines(X_h, ci[,2], lty = 2, col = 2, lwd = 2)
lines(X_h, ci[,3], lty = 2, col = 2, lwd = 2)
lines(X_h, pi[,2], lty = 2, col = 3, lwd = 2)
lines(X_h, pi[,3], lty = 2, col = 3, lwd = 2)
legend(20, 550, c("95CI", "95PI"), col = c(2,3),
       lty = c(2,2), lwd = c(2,2))


Confidence interval and prediction interval

[Figure: the fitted line with 95% confidence bands (red) and 95% prediction bands (green); the prediction bands are wider, and both widen as Xh moves away from X̄.]


Read textbook Ch 2.1-2.5.
