Econometrics
Nguyen Van Quy
Data Science program - NEU
February 11, 2025
Multiple Regression Model
The need for multiple regressors
Studying an economic relationship usually requires several independent variables.
More flexible and more suitable functional forms.
Better fit and better prediction.
Examples: demand/supply, the Phillips curve, etc.
Model
Model with k explanatory variables
In Population (PRF):
E(yi) = β0 + β1 x1i + · · · + βk xki
yi = β0 + β1 x1i + · · · + βk xki + ui
In Sample (SRF):
ŷi = β̂0 + β̂1 x1i + · · · + β̂k xki
yi = β̂0 + β̂1 x1i + · · · + β̂k xki + ei
Intercept: β0 = E(y | x1 = 0, . . . , xk = 0)
Slope: βj = ∂E(y)/∂xj
If β1 = · · · = βk = 0: model is overall insignificant
Matrix form
y1 = β0 + β1 x11 + · · · + βk xk1 + ε1
⋮
yn = β0 + β1 x1n + · · · + βk xkn + εn

In matrix notation,

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
y = Xβ + u
ŷ = X β̂
y = X β̂ + e
Interpret the coefficients
How do we interpret the equations below?
colGPÂ = 1.29 + 0.453 hsGPA + 0.0094 ACT
colGPÂ = 2.40 + 0.0271 ACT
Ceteris paribus interpretations
Changing more than one independent variable simultaneously
OLS estimation
OLS estimation
Find β̂j, j = 0, . . . , k, that minimize

RSS = Σᵢ₌₁ⁿ ei² = Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²

In matrix form,

min over β̂:  S(β̂) = ‖y − X β̂‖²

We have S(β̂) = (y − X β̂)′(y − X β̂) = y′y − 2β̂′X′y + β̂′X′X β̂. Taking the first-order condition, one gets

X′X β̂ = X′y

If X has full column rank (no perfect multicollinearity among x1, . . . , xk), then

β̂OLS = (X′X)⁻¹X′y    (1)
OLS fitted values
The sample average of the residuals is zero, and so ȳ equals the sample average of the fitted values ŷi.
The sample covariance between each independent variable and the
OLS residuals is zero. Consequently, the sample covariance between
the OLS fitted values and the OLS residuals is zero.
The point (x̄1 , x̄2 , . . . , x̄k , ȳ ) is always on the OLS regression line.
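These properties are easy to verify numerically. A minimal R check, reusing the 12-employee data from the example slides at the end of this deck (the object name reg is ours):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
reg <- lm(wage ~ exp + edu + male)
mean(resid(reg)) # numerically zero
all.equal(mean(fitted(reg)), mean(wage)) # TRUE: mean of fitted values is ybar
cov(fitted(reg), resid(reg)) # numerically zero
predict(reg, newdata = data.frame(exp = mean(exp), edu = mean(edu),
male = mean(male))) # equals mean(wage): (xbar, ybar) is on the line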
Geometric interpretation of OLS
We have

X′e = X′(y − X β̂) = X′y − X′X β̂ = 0

which means that e is perpendicular to every column vector of the matrix X, i.e. perpendicular to the vector space spanned by the columns of X.
The condition X′e = 0 is called the system of normal equations.
Notice that ŷ = X β̂ = PX y, where PX = X(X′X)⁻¹X′ is the orthogonal projector onto the space spanned by the columns of X.
Let MX = I − PX be the orthogonal projector onto the orthogonal complement of that space; then e = MX y.
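A sketch of these projectors in R, again reusing the 12-observation data from the later slides (X, P, M are our names for this illustration):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
X <- cbind(1, exp, edu, male) # n x (k+1) design matrix
P <- X %*% solve(t(X) %*% X) %*% t(X) # orthogonal projector P_X
M <- diag(nrow(X)) - P # projector on the orthogonal complement, M_X
yhat <- P %*% wage # fitted values, P_X y
e <- M %*% wage # residuals, M_X y
max(abs(t(X) %*% e)) # ~ 0: the normal equations X'e = 0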
OLS properties
Gauss - Markov assumptions
Gauss - Markov assumptions
1. (Linearity) y = β0 + β1 x1 + · · · + βk xk + u
2. (Zero mean) E(u|X) = 0
3. (Random sampling) We have a random sample of n observations, {(xi1, xi2, . . . , xik, yi) : i = 1, . . . , n}.
4. (No perfect collinearity) Rank(X) = k + 1.
5. (Homoskedasticity) Var(ui) = σ² for all i = 1, . . . , n.
6. (No autocorrelation) Cov(ui, uj) = 0 for all i ≠ j.
7. (Normality) u ∼ N(0, σ²I).
Properties of OLS estimator
OLS estimator
Under Assumptions 1-4, the OLS estimator is unbiased: E(β̂OLS) = β.
Under Assumptions 1-6, Var(β̂OLS) = σ²(X′X)⁻¹.
Moreover,

Var(β̂j) = σ² / (TSSj (1 − Rj²))

where TSSj = Σᵢ (xij − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other independent variables (including an intercept). The sketch below checks this formula numerically.
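A minimal R check of the variance formula with the deck's 12-observation data, replacing the unknown σ² by s² (variable names s2, TSSj, R2j are ours):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
reg <- lm(wage ~ exp + edu + male)
s2 <- sum(resid(reg)^2) / df.residual(reg) # s^2 estimates sigma^2
TSSj <- sum((exp - mean(exp))^2) # total sample variation in exp
R2j <- summary(lm(exp ~ edu + male))$r.squared # R_j^2 of exp on the others
s2 / (TSSj * (1 - R2j)) # equals vcov(reg)["exp","exp"]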
Properties of OLS estimator
BLUE
Under Assumptions 1-6, the OLS estimator, β̂ OLS , is a best linear
unbiased estimator (BLUE) of β .
When n → ∞, β̂OLS converges in probability to β (consistency).
Moreover, under Assumptions 1-6, the estimators are asymptotically normal.
MVUE
Under Assumptions 1-7, the OLS estimator, β̂ OLS , is also the minimum
variance unbiased estimator of β .
Estimator of σ²
In Var(β̂), the variance of the random error, σ², is unknown; it is estimated by

s² = e′e / (n − (k + 1)) = Σ ei² / (n − k − 1)

Estimated variance-covariance matrix:

V̂ar(β̂) = s²(X′X)⁻¹

Standard error of an estimated coefficient:

Se(β̂j) = √(V̂ar(β̂j))
Partialling out effect
Partialling Out interpretation
We focus on β̂1 .
We have

β̂1 = (Σᵢ₌₁ⁿ r̂i1 yi) / (Σᵢ₌₁ⁿ r̂i1²)

The residuals r̂i1 come from the regression of x1 on x2, . . . , xk.
So we can run a simple regression of y on r̂1 to obtain β̂1, as the sketch below illustrates.
β̂1 then measures the effect of x1 on y after x2, . . . , xk have been partialled or netted out.
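A short R illustration of partialling out (the Frisch-Waugh result), reusing the deck's data; r1 is our name for the residuals:

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
r1 <- resid(lm(exp ~ edu + male)) # part of exp orthogonal to the other regressors
coef(lm(wage ~ r1))["r1"] # simple regression of y on r1 ...
coef(lm(wage ~ exp + edu + male))["exp"] # ... reproduces the multiple-regression slope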
Simple and Multiple Regression Estimates
Consider k = 2. In the simple regression of y on x1, we have ỹ = β̃0 + β̃1 x1. In the multiple regression, ŷ = β̂0 + β̂1 x1 + β̂2 x2.
We have β̃1 = β̂1 + β̂2 δ̃1, where δ̃1 is the slope coefficient from the simple regression of x2 on x1.
The simple and multiple regression estimates are equal if
the partial effect of x2 on ŷ is zero, i.e., β̂2 = 0, or
x1 and x2 are uncorrelated, i.e., δ̃1 = 0.
Simple and multiple regression estimates are almost never identical, but we can use the above formula to characterize why they might be either very different or quite similar (see the sketch below).
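The identity β̃1 = β̂1 + β̂2 δ̃1 holds exactly in any sample; a check in R with wage, exp and edu from the deck's data (b_tilde, b_hat, delta1 are our names):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
b_tilde <- coef(lm(wage ~ exp))["exp"] # simple-regression slope
b_hat <- coef(lm(wage ~ exp + edu)) # multiple-regression coefficients
delta1 <- coef(lm(edu ~ exp))["exp"] # slope of x2 on x1
unname(b_hat["exp"] + b_hat["edu"] * delta1) # equals b_tilde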
Including Irrelevant Variables
One (or more) of the independent variables is included in the model
even though it has no partial effect on y in the population
Suppose we specify the model as

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

and this model satisfies Assumptions 1-4. However, x3 has no effect on y after x1 and x2 have been controlled for: β3 = 0.
Including x3 has no effect on the unbiasedness of the coefficient estimators.
However, if x1 and x3 are highly correlated then R1² is high, which leads to a large variance of β̂1.
Omitted Variables
For example, we should regress y on x1 and x2, but instead we regress y on x1 only. The coefficient of x1 is then generally biased.
The omitted variable bias is β2 δ̃1, where δ̃1 is the slope coefficient from the simple regression of x2 on x1.
The direction of the bias depends on the signs of β2 and δ̃1 (see the simulation sketch below).
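A small simulation in R (the data-generating process is made up purely for illustration) showing the bias β2 δ̃1 and its sign:

set.seed(1)
n <- 10000
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n) # x1 positively correlated with the omitted x2
y <- 1 + 2 * x1 + 3 * x2 + rnorm(n) # true beta1 = 2, beta2 = 3
coef(lm(y ~ x1))["x1"] # upward-biased: well above 2
delta1 <- coef(lm(x2 ~ x1))["x1"] # slope of x2 on x1
2 + 3 * delta1 # beta1 + beta2*delta1, close to the biased estimate above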
Omitted Variables
With more than two independent variables, omitting one is more problematic. For example, suppose the population model

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

satisfies Assumptions 1-4, but we omit x3 and estimate the model as

y = β̃0 + β̃1 x1 + β̃2 x2 + error

Suppose that x1 is correlated with x3.
Clearly, β̃1 is generally biased (for the same reason as before).
Moreover, β̃2 is also biased, even if x2 is uncorrelated with x3 (as long as x1 and x2 are correlated).
It is usually difficult to obtain the direction of the bias in β̃1 and β̃2.
Nevertheless, if we assume that x1 and x2 are uncorrelated, then we can determine the direction of the bias.
Omitted Variables
Now we compare two estimators of β1. One comes from

ŷ = β̂0 + β̂1 x1 + β̂2 x2

and the other comes from

ỹ = β̃0 + β̃1 x1

When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and Var(β̃1) < Var(β̂1).
When β2 = 0, both β̃1 and β̂1 are unbiased, and Var(β̃1) < Var(β̂1).
Which should we choose, β̃1 or β̂1?
Goodness-of-fit
Sum of squares
Let ȳ = (Σ yi)/n.
Total sum of squares: TSS = Σᵢ₌₁ⁿ (yi − ȳ)², df = n − 1.
Explained/Regression sum of squares: ESS = Σᵢ₌₁ⁿ (ŷi − ȳ)², df = k.
Residual sum of squares: RSS = Σᵢ₌₁ⁿ (yi − ŷi)² = Σᵢ₌₁ⁿ ei², df = n − 1 − k.
TSS = ESS + RSS.
Goodness-of-fit
The R-squared is the squared correlation between y and ŷ:

R² = ESS/TSS = 1 − RSS/TSS

It is interpreted as the proportion of the sample variation in y that is explained by the OLS regression line.
Adding a new explanatory variable, even an irrelevant one, artificially increases R² (see the sketch below).
Adjusted R-squared:

Ra² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (RSS/(n − k − 1)) / (TSS/(n − 1)) = 1 − s²/sy²

For models with different numbers of explanatory variables, only Ra² can be compared.
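A quick R illustration with the deck's data: adding an irrelevant regressor (noise, simulated by us purely for illustration) mechanically raises R² but can lower Ra²:

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
set.seed(2)
noise <- rnorm(12) # an irrelevant regressor
f1 <- summary(lm(wage ~ exp + edu)); f2 <- summary(lm(wage ~ exp + edu + noise))
c(f1$r.squared, f2$r.squared) # R^2 never decreases
c(f1$adj.r.squared, f2$adj.r.squared) # adjusted R^2 may fall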
Remark
If there is no constant in the model, R 2 has no meaning because the
way it is computed requires a constant term.
R² and adjusted R² are valid only for comparing models that have the same dependent variable, so they are inappropriate for comparing two models with y and log(y) as the dependent variable.
The (adjusted)-R 2 is not enough to assess the relevance of a
regression: we’ll need statistical tests.
Inference statistics
Inference with T-distribution
We know the distribution of each β̂j, but one quantity is still unknown: σ.
The unbiased estimator of σ² in the general multiple regression case is

σ̂² = RSS / (n − k − 1)

The standard error of β̂j is

Se(β̂j) = σ̂ / √(TSSj (1 − Rj²))

Using what we know about the distributions of β̂j (normal) and σ̂² (χ²), we get:

(β̂j − βj) / Se(β̂j) ∼ tn−k−1
Inference with T-distribution
For the test statistic t = (β̂j − βj*) / Se(β̂j):

Hypothesis pair                   Reject H0 if              P-value
H0: βj = βj*  vs  H1: βj ≠ βj*    |t| > t(n−k−1)α/2         2P(T > |tobs|)
H0: βj = βj*  vs  H1: βj > βj*    t > t(n−k−1)α             P(T > tobs)
H0: βj = βj*  vs  H1: βj < βj*    t < −t(n−k−1)α            P(T < tobs)

Important t-test
H0: βj = 0 vs H1: βj ≠ 0, j = 1, . . . , k
If |t| = |β̂j / Se(β̂j)| > t(n−k−1)α/2, reject H0: the coefficient is significant.
Inference of Coefficients
Confidence interval for a single coefficient:

β̂j ± t(n−k−1)α/2 Se(β̂j)

Inference on two coefficients, say β1 ± β2:
Testing H0: β1 ± β2 = β*:

t = ((β̂1 ± β̂2) − β*) / Se(β̂1 ± β̂2)

Confidence interval: (β̂1 ± β̂2) ± t(n−k−1)α/2 Se(β̂1 ± β̂2),
where Se(β̂1 ± β̂2) = √(Se²(β̂1) + Se²(β̂2) ± 2 Cov(β̂1, β̂2)) (see the R sketch below).
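In R, Cov(β̂1, β̂2) comes from vcov(). A sketch for the difference β̂exp − β̂edu with the deck's data (cf. part (e) of the example later on; V, d, se_d are our names):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
reg <- lm(wage ~ exp + edu + male)
V <- vcov(reg) # estimated variance-covariance matrix of beta-hat
d <- coef(reg)["exp"] - coef(reg)["edu"] # estimate of beta1 - beta2
se_d <- sqrt(V["exp","exp"] + V["edu","edu"] - 2 * V["exp","edu"])
unname(d) + c(-1, 1) * qt(0.975, df.residual(reg)) * se_d # 95% CI for the difference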
The test procedure: an example
Consider the following model:
income = a + b × height + c × education
estimated over N individuals
Suppose that the estimated parameter b̂ is close to zero.
We may then suspect that the variable height is irrelevant: its partial effect on income could (should) be zero.
The "true" b should be zero.
But even if that is the case, it is very unlikely that we get exactly b̂ = 0 (because of sampling variability).
Given the computed b̂, there should be a way to assess whether the "true" b is in fact zero or not.
The test procedure: an example
Let us call H0 the hypothesis b = 0 and H1 the hypothesis b ≠ 0.
Should we consider H0 as true?
We know that for this model,

(b̂ − b) / Se(b̂) ∼ tN−3

Is the latter still plausible if we take H0 for granted?
Taking H0 for granted means that we assume b = 0, so the statistic becomes

t = b̂ / Se(b̂)

If, under H0, we find this value t unlikely to come from a tN−3 distribution, then we say that H0 was wrong.
Rejecting H0 ⇐⇒ parameter b is significant.
Not rejecting H0 ⇐⇒ parameter b is not significant.
Example
Regression results from a sample of 12 employees, where wage depends on experience (exp: years), education (edu: years), and a male dummy:

wagêi = −4.9 + 0.41 expi + 0.83 edui + 1.2 malei,   R² = 0.7575
        (4.38) (0.098)    (0.299)     (1.125)

(a) Interpret the estimated slopes and the coefficient of determination.
(b) At the 5% level, test the significance of each slope (a worked sketch follows below).
(c) Construct 95% confidence intervals for the significant slopes.
(d) Test the hypothesis that the slope of experience equals one.
(e) Test the hypothesis that the slope of experience is less than the slope of education, and construct a 95% confidence interval for the difference, knowing that the covariance of the two estimated slopes is 0.001.
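One way parts (b) and (d) might be worked for the exp slope (df = n − k − 1 = 8), sketched in R:

t_b <- 0.41 / 0.098 # (b) H0: beta_exp = 0 -> t ~ 4.18
qt(0.975, df = 8) # critical value ~ 2.306; |t| > 2.306, so exp is significant
t_d <- (0.41 - 1) / 0.098 # (d) H0: beta_exp = 1 -> t ~ -6.02
abs(t_d) > qt(0.975, df = 8) # TRUE: reject beta_exp = 1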
Some remarks
α = P(Type I error) = P(reject H0 | H0 is true)
β = P(Type II error) = P(fail to reject H0 | H1 is true)
α is the significance level; what is the intuition behind α?
1 − β is the power of a test: it indicates how powerful the test is at detecting deviations from the null hypothesis H0.
Lowering α =⇒ increasing β. Why?
Since we cannot minimize both, we set α as fixed (e.g. 5%) and try
to find the test that minimizes β for this given α
Some remarks
Dropping a useful variable can lead to inconsistent estimates, while keeping an unimportant variable only leads to a loss in precision.
Say we set α = 0.01 with a small sample size: then estimates are
likely to have a large variance
So even if the true parameter is not zero, its t-statistic is likely to be
small, thus failing to reject H0 although it is false
In that case, we might remove from the analysis a relevant variable
simply because we’ve been too stringent about the size of the test
Example
Suppose we are testing the hypothesis b = 0, while the true value is
b = 0.1
The probability that we reject the null (H0 ) depends on the standard
error of b̂, thus on the sample size
The larger the sample, the smaller the standard error so the more
likely we are to reject H0 .
Type II errors thus become increasingly unlikely when sample size
increases
In large samples we can thus decrease the size of the test α to, e.g., 1%.
Similarly, we can choose a size of 10% in small samples.
Correlation and Estimated Coefficient
Model: y = β0 + β1 x1 + · · · + βk xk + u
The correlation between xk and y and the estimated β̂k may have different signs.
Added-variable plot:
Regress y = β0 + β1 x1 + · · · + βk−1 xk−1 + u1 and keep the residuals e1.
Regress xk = α0 + α1 x1 + · · · + αk−1 xk−1 + u2 and keep the residuals e2.
Plot e1 against e2 → the added-variable plot, which shows the relationship of y versus xk after controlling for the other regressors.
Partial correlation (computed in the sketch below):

ry,xk|x≠k = t(β̂k) / √(t(β̂k)² + n − k − 1)
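A sketch checking this formula in R with the deck's data: the t-based expression matches the correlation of the added-variable residuals (tk, e1, e2 are our names):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
reg <- lm(wage ~ exp + edu + male)
tk <- summary(reg)$coefficients["exp", "t value"] # t-statistic of beta_exp
tk / sqrt(tk^2 + 12 - 3 - 1) # partial correlation from the formula
cor(resid(lm(wage ~ edu + male)), resid(lm(exp ~ edu + male))) # same value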
Prediction Interval
Forecast at x1 = x1*, . . . , xk = xk*, i.e., at the vector x* = (1, x1*, . . . , xk*)′.

Point estimate: ŷ* = β̂0 + β̂1 x1* + · · · + β̂k xk* = x*′β̂

Standard error: Se(pred) = s √(1 + x*′(X′X)⁻¹ x*)

Confidence interval:

ŷ* ± t(n−k−1)α/2 Se(pred)
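In R, predict() produces this interval directly. A sketch with the deck's data and a made-up forecast point x* (the values in xstar are hypothetical):

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16); edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0); wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
reg <- lm(wage ~ exp + edu + male)
xstar <- data.frame(exp = 5, edu = 14, male = 1) # hypothetical x* values
predict(reg, newdata = xstar, interval = "prediction", level = 0.95)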
Inference with F-distribution
Testing for reducing model
Full model
y = β0 + β1 x1 + · · · + βk xk + u
Reduced model, after removing p explanatory variables:

y = β0 + β1 x1 + · · · + βk−p xk−p + u

Hypotheses:
H0: βk−p+1 = · · · = βk = 0: the reduced model is correct
H1: not H0: the reduced model is not correct
Statistic:

Fstat = ((RSSReduced − RSSFull)/p) / (RSSFull/(n − k − 1)) = (RSSReduced − RSSFull) / (p × s²Full)
If Fstat > f(p,n−k−1)α then reject H0 .
Overall Significance Test
An alternative formula for the reduction test (valid if the dependent variable is unchanged):

Fstat = ((R²Full − R²Reduced)/p) / ((1 − R²Full)/(n − k − 1))

The most important F-test is the one on all slopes, i.e., p = k:
H0: β1 = · · · = βk = 0: the model is overall insignificant
H1: not H0: the model is overall significant

Fstat = (R²Full/k) / ((1 − R²Full)/(n − k − 1))

If Fstat > f(k,n−k−1)α then reject H0.
Linear Hypothesis Testing
A joint hypothesis, e.g. H0: (β1 = 2 and β2 = 3), cannot be tested with a t-test.
This is called 2 restrictions; in matrix form:

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}

General linear hypothesis (restrictions) on the coefficients: Cβ = d.
Hypothesis pair:
H0: Cβ = d
H1: Cβ ≠ d
The number of "=" signs is the number of restrictions, p.
Linear Hypothesis Testing
Under H0, the full model → a reduced model.
Full model:

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

Under the hypothesis β1 = 2 and β2 = 3, the reduced model is

y − 2x1 − 3x2 = β0 + β3 x3 + u

Fstat = ((RSSReduced − RSSFull)/p) / (RSSFull/(n − k − 1)), with critical value f(p,n−k−1)α.
T-test and F-test
T-test: for one restriction only;
H0 contains one "=",
H1 can be ≠, >, <.
F-test: for p restrictions, where p can be larger than 1;
H1 contains ≠ only.
If a T-test and an F-test apply to the same hypothesis, then
Fstat = (tstat)²,
fcrit = (tcrit)²,
and the T-test and F-test have the same P-value.
Example
Regression results from a sample of 12 observations:

wagêi = −4.9 + 0.41 expi + 0.83 edui + 1.2 malei
R² = 0.7575, RSS = 22.95, s = 1.694

At the 5% significance level:
1. Test the overall significance of the model.
2. After removing the variable male, R² = 0.723 and RSS = 26.202. Test whether male can be removed.
3. Regressing wage on exp alone gives R² = 0.52 and RSS = 45.423. Test this reduction of the model.
4. Test the hypothesis that the sum of the coefficients of exp and edu equals 1, given that the reduced model has R² = 0.6883 and RSS = 24.597. (A worked sketch of parts 1-2 follows below.)
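A possible worked sketch of parts 1-2 in R (n = 12, k = 3, so n − k − 1 = 8):

F1 <- (0.7575/3) / ((1 - 0.7575)/8) # part 1: overall F ~ 8.33
qf(0.95, 3, 8) # critical value ~ 4.07 -> reject H0: the model is significant
F2 <- ((26.202 - 22.95)/1) / (22.95/8) # part 2: removing male, p = 1, F ~ 1.13
qf(0.95, 1, 8) # critical value ~ 5.32 -> cannot reject: male can be removed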
Example in R
Data on 12 employees: exp: experience (years); edu: education (years); male = 1 for male, = 0 otherwise; wage.
exp 1 2 2 3 4 5 7 10 10 12 15 16
edu 13 12 16 11 15 15 10 15 13 11 13 15
male 1 1 0 0 1 0 1 0 0 1 1 0
wage 6 6 12 6 11 8 8 10 11 10 15 13
Matrix calculation
exp <-c(1,2,2,3,4,5,7,10,10,12,15,16)
edu <-c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
intercept <- c(rep(1,12))
explanatory<-data.frame(intercept, exp, edu, male)
X <-data.matrix(explanatory)
y <-data.matrix(wage)
beta <- solve(t(X) %*% X) %*% (t(X) %*% y)
beta
Matrix calculation
fitted <- X %*% beta # fitted value vector
resid <- y - fitted # residual vector
resid.SS <- t(resid) %*% resid # residual SS matrix
resid.SS <- as.vector(resid.SS) # convert to value
s.sq <- resid.SS/8 # regression variance
cov.beta <- s.sq* solve(t(X)%*% X) # covariance matrix
var.beta <- diag(cov.beta) # variance of coef
var.beta <- data.matrix(var.beta) # convert into matrix
se.beta <- sqrt(var.beta) # standard error
t.beta <- beta/se.beta # t-statistic
p.beta <- 2*(1-pt(abs(t.beta),8)) # P-value of t-test
TSS <- sum((wage - mean(wage))^2) # Total SS
R2 <- 1 - resid.SS/TSS # R-square
f <- (R2/3)/((1-R2)/8) # F-statistic
p.ftest <- 1 - pf(f,3,8) # P-value of F-test
Output
#output
reg1 <-lm(wage ~ exp + edu + male)
summary(reg1)
#variance-covariance matrix
round(vcov(reg1),4)
Linear hypothesis testing
Install the AER package
install.packages("AER")
library(AER)
Test hypothesis: βexp = 1
linearHypothesis(reg1,"exp = 1")
Test hypothesis: βexp + βedu = 1
linearHypothesis(reg1, "exp + edu = 1")
Test for deleting the two variables edu and male
reg1 <- lm(wage ~ exp + edu + male)
reg2 <- lm(wage ~ exp)
anova(reg1, reg2)