Lecture Notes Week One
Ken Langat
Lecture 1 Notes
Contents
1 Introduction
  1.1 Definition of Econometrics
  1.2 Types of Econometrics
    1.2.1 Theoretical Econometrics
    1.2.2 Applied Econometrics
  1.3 Methodology of Econometrics
  1.4 Structure of Economic Data
    1.4.1 Cross sectional Data
    1.4.2 Time series data
    1.4.3 Panel data
2 Regression Analysis with Cross Sectional Data
  2.1 Simple Regression Model
    2.1.1 Deriving the OLS
    2.1.2 Goodness of Fit
  2.2 Gauss Markov Theorem
  2.3 Linear Estimators
    2.3.1 Unbiasedness of estimates
    2.3.2 Variances
    2.3.3 Covariances
  2.4 Inference for β0 and β1
    2.4.1 Inference for β1
    2.4.2 Inference for β0
  2.5 Practical in R
    2.5.1 Example 1: US Consumption Expenditure
    2.5.2 Example 2: Sales data
  2.6 Multiple Linear Regression Analysis
    2.6.1 Matrix Formulation of MLR
    2.6.2 OLS Estimation
    2.6.3 Fitted Values and Residuals
    2.6.4 Inference in MLR
    2.6.5 Coefficient of Multiple Determination
    2.6.6 Hypothesis testing for individual regressors
    2.6.7 Global test
    2.6.8 Assumptions of Multiple Linear Regression
    2.6.9 Example in R
1 Introduction
1.1 Definition of Econometrics
Econometrics is the social science in which the tools of economic theory, mathematics and statistical
inference are applied to the analysis of economic phenomena.
Obs Year GDP
2 2007 14873.8
3 2008 14830.4
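$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \tag{1}$$

The least squares criterion is the sum of squared errors:

$$Q = \sum \epsilon_i^2 = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$$

Differentiating with respect to $\beta_0$ and $\beta_1$:

$$\frac{\partial Q}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i)$$

$$\frac{\partial Q}{\partial \beta_1} = -2 \sum X_i (Y_i - \beta_0 - \beta_1 X_i)$$

Setting both derivatives to zero gives the normal equations:

$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i \tag{2}$$

$$\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 \tag{3}$$

Solving the normal equations gives:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\hat{\beta}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}$$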
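The closed-form solutions of the normal equations can be checked numerically. The following is a minimal sketch in base R, assuming simulated data (the true values 2 and 0.5, the sample size and the seed are illustrative choices, not part of the lecture). It computes the two estimates from the sums and compares them with `lm()`.

```r
# Simulated data: the true coefficients (2 and 0.5) are illustrative assumptions
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n)

# Closed-form solutions of the normal equations
b1 <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
b0 <- mean(y) - b1 * mean(x)

# Reference fit using R's built-in least squares routine
fit <- lm(y ~ x)

print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both printed vectors should agree up to rounding, since `lm()` minimises the same criterion Q.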
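The residual for observation $i$ is the difference between the actual value and its fitted value.

$$e_i = Y_i - \hat{Y}_i$$

The total, regression and error sums of squares are:

$$SST = \sum (Y_i - \bar{Y})^2$$

$$SSR = \sum (\hat{Y}_i - \bar{Y})^2$$

$$SSE = \sum e_i^2$$

To prove that $SST = SSR + SSE$, expand $(Y_i - \bar{Y})^2$ by adding and subtracting $\hat{Y}_i$.

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$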
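The expansion used to show that the total sum of squares splits into a regression part and an error part can be written out as follows. This is a sketch of the standard argument, writing $e_i = Y_i - \hat{Y}_i$ and using the two normal equations in the form $\sum e_i = 0$ and $\sum X_i e_i = 0$.

```latex
\begin{aligned}
\sum (Y_i - \bar{Y})^2
  &= \sum \left[ (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \right]^2 \\
  &= \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2
     + 2 \sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) \\
  &= SSR + SSE + 2 \sum (\hat{Y}_i - \bar{Y}) e_i \\[4pt]
\sum (\hat{Y}_i - \bar{Y}) e_i
  &= \sum (\hat{\beta}_0 + \hat{\beta}_1 X_i - \bar{Y}) e_i \\
  &= (\hat{\beta}_0 - \bar{Y}) \sum e_i + \hat{\beta}_1 \sum X_i e_i = 0
\end{aligned}
```

The cross term vanishes, so $SST = SSR + SSE$, which is why the two expressions for $R^2$ coincide.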
2.2 Gauss Markov Theorem
Under the assumptions of simple linear regression, the least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are
unbiased and have minimum variance among all linear unbiased estimators of $\beta_0$ and $\beta_1$. Thus
$\hat{\beta}_0$ and $\hat{\beta}_1$ are said to be BLUE (best linear unbiased estimators).
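The slope estimator

$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

can be written as:

$$\hat{\beta}_1 = \sum m_i Y_i$$

where $m_i = \dfrac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$, with $\sum m_i = 0$ and $\sum m_i X_i = 1$.

$$
\begin{aligned}
E[\hat{\beta}_1] &= \sum m_i E[Y_i] \\
&= \sum m_i (\beta_0 + \beta_1 X_i) \\
&= \beta_0 \sum m_i + \beta_1 \sum m_i X_i \\
&= \beta_1
\end{aligned} \tag{5}
$$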
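The two properties of the weights, and the fact that the slope estimate is a weighted sum of the responses, can be verified numerically. This is a small sketch on simulated data (the coefficients 1 and 2 and the seed are illustrative assumptions).

```r
# Simulated data: coefficients 1 and 2 are illustrative assumptions
set.seed(2)
x <- runif(30, 0, 10)
y <- 1 + 2 * x + rnorm(30)

# Weights m_i = (X_i - Xbar) / sum((X_i - Xbar)^2)
m <- (x - mean(x)) / sum((x - mean(x))^2)

sum_m  <- sum(m)       # should be 0 up to rounding error
sum_mx <- sum(m * x)   # should be 1 up to rounding error

# Slope as a linear combination of the Y_i, compared with lm()
b1_weights <- sum(m * y)
b1_lm <- unname(coef(lm(y ~ x))["x"])

print(c(sum_m = sum_m, sum_mx = sum_mx, b1_weights = b1_weights, b1_lm = b1_lm))
```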
2.3.2 Variances
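$$
\begin{aligned}
Var(\hat{\beta}_1) &= Var\left(\sum m_i Y_i\right) \\
&= \sum m_i^2 Var(Y_i) + \sum_{i \neq j} \sum m_i m_j Cov(Y_i, Y_j) \\
&= \sigma^2 \frac{\sum (X_i - \bar{X})^2}{\left[\sum (X_i - \bar{X})^2\right]^2} \\
&= \frac{\sigma^2}{\sum (X_i - \bar{X})^2}
\end{aligned} \tag{6}
$$

The covariance terms vanish because the $Y_i$ are uncorrelated, and $Var(Y_i) = \sigma^2$ for every $i$.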
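The variance formula for the slope can be illustrated by simulation: keep the X values fixed, draw many samples of Y, and compare the spread of the resulting slope estimates with $\sigma^2 / \sum (X_i - \bar{X})^2$. The sketch below assumes illustrative values (true intercept 1, slope 0.5, error standard deviation 2, 5000 replications).

```r
set.seed(3)
n <- 40
sigma <- 2                       # assumed error standard deviation
x <- seq(1, 10, length.out = n)  # X held fixed from sample to sample
sxx <- sum((x - mean(x))^2)

# Slope estimate from one simulated sample (true beta0 = 1, beta1 = 0.5 assumed)
one_slope <- function() {
  y <- 1 + 0.5 * x + rnorm(n, sd = sigma)
  sum((x - mean(x)) * y) / sxx
}
slopes <- replicate(5000, one_slope())

theoretical_var <- sigma^2 / sxx
empirical_var <- var(slopes)

print(c(mean_slope = mean(slopes),
        theoretical_var = theoretical_var,
        empirical_var = empirical_var))
```

The average of the simulated slopes should sit close to the assumed true slope (unbiasedness), and the empirical variance should be close to the theoretical one.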
2.3.3 Covariances
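The covariance between $\hat{\beta}_0$ and $\hat{\beta}_1$ is

$$Cov(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{X}\sigma^2}{\sum (X_i - \bar{X})^2}$$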
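2.4 Inference for β0 and β1
2.4.1 Inference for β1
The hypotheses for the slope are:

$$H_0: \beta_1 = 0 \quad \text{vs} \quad H_1: \beta_1 \neq 0$$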
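The sampling distribution of $\hat{\beta}_1$ refers to the different values of $\hat{\beta}_1$ that would be obtained with
repeated sampling when the levels of the predictor $X$ are held constant from sample to sample.

$$E[\hat{\beta}_1] = \beta_1$$

and

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$

Since $\sigma^2$ is unknown, it is estimated by the mean squared error:

$$\hat{\sigma}^2 = MSE = \frac{SSE}{n - 2}$$

thus

$$S^2(\hat{\beta}_1) = \frac{MSE}{\sum (X_i - \bar{X})^2}$$

If the $Y_i$ are normally distributed, then $\hat{\beta}_1$ is also normally distributed, since $\hat{\beta}_1 = \sum m_i Y_i$ and a
linear combination of independent normal random variables is itself normally distributed. Then:

$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum (X_i - \bar{X})^2}\right)$$

To test the hypothesis $H_0: \beta_1 = c$, the test statistic is:

$$t = \frac{\hat{\beta}_1 - c}{\sqrt{\dfrac{MSE}{\sum (X_i - \bar{X})^2}}}$$

which follows a $t$ distribution with $n - 2$ degrees of freedom under $H_0$.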
2.4.2 Inference for β0
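The sampling distribution of $\hat{\beta}_0$ is:

$$\hat{\beta}_0 \sim N\left(\beta_0, \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)\right)$$

To test the hypothesis $H_0: \beta_0 = c$, the test statistic is:

$$t = \frac{\hat{\beta}_0 - c}{\sqrt{S^2(\hat{\beta}_0)}}$$

where $S^2(\hat{\beta}_0) = MSE\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$ is the variance above with $\sigma^2$ replaced by $MSE$.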
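The test statistics for the intercept and the slope can be computed by hand and compared with the coefficient table that R reports. The sketch below uses simulated data (coefficients 3 and 0.8, error standard deviation 1.5 and the seed are illustrative assumptions) and tests each coefficient against c = 0.

```r
set.seed(4)
n <- 60
x <- runif(n, 0, 10)
y <- 3 + 0.8 * x + rnorm(n, sd = 1.5)   # illustrative data-generating values

fit <- lm(y ~ x)
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])

sxx <- sum((x - mean(x))^2)
mse <- sum(resid(fit)^2) / (n - 2)      # estimate of sigma^2

# Estimated standard errors of the two estimators
se_b1 <- sqrt(mse / sxx)
se_b0 <- sqrt(mse * (1 / n + mean(x)^2 / sxx))

# t statistics for H0: coefficient = 0, and a two-sided p-value for the slope
t_b0 <- b0 / se_b0
t_b1 <- b1 / se_b1
p_b1 <- 2 * pt(abs(t_b1), df = n - 2, lower.tail = FALSE)

print(c(t_b0 = t_b0, t_b1 = t_b1, p_b1 = p_b1))
print(summary(fit)$coefficients)
```

The hand-computed values should match the `t value` and `Pr(>|t|)` columns of the summary table.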
2.5 Practical in R
2.5.1 Example 1: US Consumption Expenditure
The fpp3 package in R includes a data set named us_change, a time series of quarterly percentage changes
(growth rates) of real personal consumption expenditure, y, and real personal disposable income, x, for the
US from 1970 Q1 to 2019 Q2.
library(fpp3)  # provides us_change and the plotting and reshaping tools used below

# Reshape the two series into long format and plot them against time
us_change %>%
  pivot_longer(c(Consumption, Income), names_to = "Series") %>%
  autoplot(value) +
  theme_bw() +
  labs(y = "% Change", x = "Time")
The fitted equation is:
$$\hat{Y} = 0.545 + 0.272X$$

              2.5 %   97.5 %
(Intercept)   0.438   0.6511
Income        0.1797  0.364
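$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \epsilon$$

In matrix form:

$$Y = X\beta + \epsilon$$

$$\sum \epsilon_i^2 = \epsilon'\epsilon = (Y - X\beta)'(Y - X\beta) \tag{9}$$

$$Q = (Y - X\beta)'(Y - X\beta) = Y'Y - 2Y'X\beta + \beta'X'X\beta$$

$$\frac{\partial Q}{\partial \beta} = 2X'X\beta - 2X'Y$$

Setting the derivative to zero gives the normal equations:

$$X'X\hat{\beta} = X'Y$$

$$\hat{\beta} = (X'X)^{-1}X'Y$$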
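The matrix solution can be reproduced directly in base R. The sketch below assumes simulated data with two regressors (the coefficients, sample size and seed are illustrative). It builds the design matrix with a leading column of ones and solves the normal equations with `solve()` rather than forming the inverse explicitly, which is a numerical convenience and not a change to the estimator.

```r
set.seed(5)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)   # illustrative coefficients

# Design matrix: a column of ones for the intercept plus the regressors
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Solve the normal equations X'X b = X'Y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Reference fit
fit <- lm(y ~ x1 + x2)

print(drop(beta_hat))
print(coef(fit))
```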
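$$e_i = Y_i - \hat{Y}_i$$

In matrix formulation:

$$
\begin{aligned}
\hat{Y} &= X\hat{\beta} \\
&= X(X'X)^{-1}X'Y \\
&= HY
\end{aligned} \tag{10}
$$

where $H = X(X'X)^{-1}X'$ is the hat matrix. In other words:

$$
\begin{aligned}
e &= Y - \hat{Y} \\
&= Y - X\hat{\beta} \\
&= Y - X(X'X)^{-1}X'Y \\
&= Y - HY \\
&= (I - H)Y
\end{aligned} \tag{11}
$$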
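The hat matrix and its role in producing fitted values and residuals can be checked on a small example. This sketch assumes simulated data (coefficients, sample size and seed are illustrative) and also checks two standard properties of H: it is idempotent, and its trace equals the number of estimated parameters.

```r
set.seed(6)
n <- 25
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 + x1 + 2 * x2 + rnorm(n)   # illustrative coefficients

X <- cbind(1, x1, x2)               # design matrix with intercept column

# Hat matrix H = X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)

y_hat <- drop(H %*% y)              # fitted values HY
e <- drop((diag(n) - H) %*% y)      # residuals (I - H)Y

fit <- lm(y ~ x1 + x2)

idempotent <- isTRUE(all.equal(H %*% H, H))  # HH = H
trace_H <- sum(diag(H))                      # equals the number of parameters (3 here)

print(c(idempotent = idempotent, trace_H = trace_H))
print(max(abs(y_hat - fitted(fit))))
print(max(abs(e - resid(fit))))
```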
The variance of $\hat{\beta}$ is

$$Var(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$
Source       df      SS    MSS
Regression   p - 1   SSR   MSR
Error        n - p   SSE   MSE
Total        n - 1   SST

The adjusted coefficient of multiple determination is:

$$R_a^2 = 1 - \frac{(n - 1)SSE}{(n - p)SST}$$
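The entries of the analysis of variance table and the adjusted coefficient of determination can be computed from a fitted model. The sketch below assumes simulated data with three regressors, one of which has no true effect (all values are illustrative), and takes p to be the number of estimated parameters including the intercept, as in the table above.

```r
set.seed(7)
n <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y <- 1 + 0.7 * x1 - 1.2 * x2 + rnorm(n)   # x3 is irrelevant by construction

fit <- lm(y ~ x1 + x2 + x3)
p <- length(coef(fit))                    # parameters, including the intercept

sse <- sum(resid(fit)^2)
sst <- sum((y - mean(y))^2)
ssr <- sst - sse
msr <- ssr / (p - 1)
mse <- sse / (n - p)

anova_table <- data.frame(
  Source = c("Regression", "Error", "Total"),
  df = c(p - 1, n - p, n - 1),
  SS = c(ssr, sse, sst),
  MSS = c(msr, mse, NA)
)

r2 <- 1 - sse / sst
r2_adj <- 1 - (n - 1) * sse / ((n - p) * sst)

print(anova_table)
print(c(r2 = r2, r2_adj = r2_adj))
```

The adjusted value is never larger than the unadjusted one, because it penalises each additional parameter.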
2.6.6 Hypothesis testing for individual regressors
• Determine the null and alternative hypotheses
• Specify the test statistic and its distribution if H0 is true
• Select α and determine the rejection region
• Calculate the sample value of the test statistic and the p-value
• State your conclusion
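The hypotheses for an individual regressor $X_j$ are:

$$H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0$$

$$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-p}$$

2.6.7 Global test
The global test checks whether at least one regressor is related to the response:

$$H_0: \beta_1 = \dots = \beta_k = 0 \quad \text{vs} \quad H_a: \beta_j \neq 0 \text{ for at least one } j$$

$$F = \frac{MSR}{MSE}$$

Under $H_0$, $F$ follows an $F$ distribution with $p - 1$ and $n - p$ degrees of freedom.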
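Both kinds of test can be reproduced by hand. The sketch below assumes simulated data in which the second regressor has no true effect (all values are illustrative), and it takes the standard errors from the diagonal of $MSE \cdot (X'X)^{-1}$, the usual estimate of the covariance matrix of the coefficients.

```r
set.seed(8)
n <- 70
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 1.5 * x1 + rnorm(n)        # x2 has no true effect by construction

fit <- lm(y ~ x1 + x2)
X <- model.matrix(fit)
p <- ncol(X)                        # parameters, including the intercept
mse <- sum(resid(fit)^2) / (n - p)

# Individual t tests: estimate divided by its standard error
se <- sqrt(diag(mse * solve(crossprod(X))))
t_stats <- coef(fit) / se
p_values <- 2 * pt(abs(t_stats), df = n - p, lower.tail = FALSE)

# Global F test: MSR / MSE with p - 1 and n - p degrees of freedom
msr <- sum((fitted(fit) - mean(y))^2) / (p - 1)
f_stat <- msr / mse
f_p_value <- pf(f_stat, p - 1, n - p, lower.tail = FALSE)

print(cbind(t_stats, p_values))
print(c(f_stat = f_stat, f_p_value = f_p_value))
```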
2.6.8.2 Homoscedasticity The variation in the residuals is the same for all fitted values of Y. The
formal test for homoscedasticity is the Breusch-Pagan test, and the hypotheses are:
H0 : Constant variance
Ha : Heteroscedasticity
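One way to see what the Breusch-Pagan test measures is to compute a version of it by hand. The sketch below implements the studentized form (regress the squared residuals on the regressor and use n times the R-squared of that auxiliary regression as a chi-squared statistic); which variant the lecture intends is an assumption here. The helper name `bp_test` and the simulated data are hypothetical. In practice the `bptest()` function from the lmtest package is the usual tool.

```r
set.seed(9)
n <- 200
x <- runif(n, 1, 10)
y_const <- 1 + 0.5 * x + rnorm(n, sd = 1)        # constant error variance
y_het   <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)  # error spread grows with x

# Hypothetical helper: studentized Breusch-Pagan statistic for one regressor
bp_test <- function(y, x) {
  e2 <- resid(lm(y ~ x))^2                 # squared residuals of the main model
  aux_r2 <- summary(lm(e2 ~ x))$r.squared  # auxiliary regression of e^2 on x
  stat <- length(y) * aux_r2               # LM statistic, chi-squared with 1 df
  c(statistic = stat,
    p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

bp_const <- bp_test(y_const, x)
bp_het <- bp_test(y_het, x)

print(rbind(constant = bp_const, heteroscedastic = bp_het))
```

A small p-value leads to rejecting H0 of constant variance, which is what the second data set is built to produce.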
2.6.8.3 Normality of residuals Residuals are normally distributed with a mean of zero. This assumption
is necessary for the validity of the inferences based on the global and individual hypothesis tests.
The formal test for the normality of residuals is the Shapiro-Wilk test. The hypotheses tested are:
H0 : Normality of residuals
Ha : Residuals not normally distributed
2.6.8.4 Multicollinearity This exists when the independent variables are correlated. If an independent
variable is highly correlated with other variables in the model, it should be removed.
To assess the degree to which independent variables are correlated, we compute the variance inflation factor
(VIF). A VIF greater than 10 is considered unsatisfactory.
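The VIF of a regressor is commonly defined as $1 / (1 - R_j^2)$, where $R_j^2$ is the R-squared from regressing that regressor on all the others. The sketch below computes it in base R on simulated predictors, where `x2` is built to be nearly collinear with `x1` (all values are illustrative assumptions).

```r
set.seed(10)
n <- 150
x1 <- rnorm(n)
x2 <- 0.95 * x1 + rnorm(n, sd = 0.2)  # nearly collinear with x1 by construction
x3 <- rnorm(n)                        # unrelated to the other two
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the rest
vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2)
})

print(round(vif, 2))
```

Here `x1` and `x2` should show values above the threshold of 10, while `x3` should stay close to 1.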
2.6.8.5 Autocorrelation Successive residuals should be independent, implying that there is no pattern
in the residuals.
When successive residuals are correlated, we refer to the condition as autocorrelation.
The formal test is the Durbin-Watson test.
H0 : No Autocorrelation
Ha : Autocorrelation
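The Durbin-Watson statistic is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Values near 2 are consistent with no first-order autocorrelation, and values well below 2 point to positive autocorrelation. The sketch below computes d by hand for two simulated series, one with independent errors and one with AR(1) errors (the coefficient 0.7 and all other values are illustrative assumptions, and `durbin_watson` is a hypothetical helper name). A p-value needs tables or a package function such as `dwtest()` from lmtest.

```r
set.seed(11)
n <- 200
x <- rnorm(n)

e_indep <- rnorm(n)
# AR(1) errors: e_t = 0.7 * e_{t-1} + u_t
e_ar <- as.numeric(stats::filter(rnorm(n), filter = 0.7, method = "recursive"))

y_indep <- 1 + 0.5 * x + e_indep
y_ar <- 1 + 0.5 * x + e_ar

# Hypothetical helper: Durbin-Watson statistic from a fitted model
durbin_watson <- function(fit) {
  e <- resid(fit)
  sum(diff(e)^2) / sum(e^2)
}

d_indep <- durbin_watson(lm(y_indep ~ x))
d_ar <- durbin_watson(lm(y_ar ~ x))

print(c(d_indep = d_indep, d_ar = d_ar))
```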
2.6.9 Example in R
We fit a multiple linear regression for US consumption given by:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon$$

Where:
• Y is the percentage change in real personal consumption expenditure
• X1 is the percentage change in real personal disposable income
• X2 is the percentage change in industrial production
• X3 is the percentage change in personal savings
• X4 is the change in unemployment rate
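# Fit a time series linear model (TSLM) of consumption on the four predictors
fit_model <- us_change |>
  model(tslm = TSLM(Consumption ~ Income + Production +
                      Unemployment + Savings))
# Print the coefficient table and the fit statistics
report(fit_model)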
## Series: Consumption
## Model: TSLM
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.90555 -0.15821 -0.03608  0.13618  1.15471
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.253105   0.034470   7.343 5.71e-12 ***
## Income        0.740583   0.040115  18.461  < 2e-16 ***
## Production    0.047173   0.023142   2.038   0.0429 *
## Unemployment -0.174685   0.095511  -1.829   0.0689 .
## Savings      -0.052890   0.002924 -18.088  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3102 on 193 degrees of freedom
## Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
## F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
The fitted values can be plotted against the observed data as:
augment(fit_model) |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Consumption, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = NULL,
       title = "Percent change in US consumption expenditure") +
  scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00")) +
  guides(colour = guide_legend(title = NULL))
[Figure: Percent change in US consumption expenditure by quarter, showing the data and the fitted values]
[Figure: Residual diagnostics in three panels: innovation residuals against Quarter, ACF of the residuals against lag [1Q], and a histogram (count) of .resid]
Table 11: Analysis of Variance Table
[Figure: Scatterplot matrix of Consumption, Income, Production, Savings and Unemployment, with the pairwise correlations below]

              Income     Production  Savings     Unemployment
Consumption   0.384***   0.529***    −0.257***   −0.527***
Income                   0.269***    0.720***    −0.224**
Production                           −0.059      −0.768***
Savings                                          0.106