

17-Econometrics-Linear Regression

The document discusses the concepts of endogenous regressors and instrumental variables in econometrics, focusing on the implications of heteroskedasticity and the use of Two Stage Least Squares (2SLS) for estimation. It explains the conditions under which an instrumental variable can be used to address endogeneity and provides examples, including the use of parental education as an instrument for estimating returns to education. Additionally, it includes problems and solutions related to the application of these concepts in regression analysis.

Uploaded by Lorenzo Lucchesi
Copyright © All Rights Reserved

Econometrics

University of Milan-Bicocca

Course lecturer:
Maryam Ahmadi
maryam.ahmadi@unimib.it

Endogenous Regressors and Instrumental Variables

Problem 16 & Answer.

a) Why might the suspicion about heteroskedasticity be reasonable?
It is reasonable to expect more variation in expenditure on education among high-GDP countries than among low-GDP countries, because high-GDP countries have more options in deciding how much to spend on education.

b) Test for the existence of heteroskedasticity using the White test.


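Estimate $y = \beta_0 + \beta_1 x_1 + u$ by OLS, obtain the residuals $\hat{u}$, and perform the White test on these residuals by estimating the auxiliary regression
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_1^2 + \nu$$
$H_0: \delta_1 = \delta_2 = 0$ (homoskedasticity)
$H_1$: $H_0$ is not true (heteroskedasticity)
$N \cdot R^2 = 34 \times 0.2930 = 9.962 > 9.21$, where $R^2$ is from the auxiliary regression and 9.21 is the 1% critical value of the $\chi^2$ distribution with 2 degrees of freedom. Therefore the null of homoskedasticity is rejected at the 1% significance level, and the error terms are heteroskedastic.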
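The decision rule in (b) can be checked without a statistics library, because a $\chi^2$ variable with 2 degrees of freedom has survival function $e^{-x/2}$. The function below is a minimal sketch that relies on this closed form and therefore only handles the two-regressor auxiliary regression used here; the name `white_test_decision` is hypothetical.

```python
import math

def white_test_decision(n, r2_aux, alpha=0.01):
    """LM version of the White test with one regressor (auxiliary regressors x1, x1^2).

    LM = n * R^2 of the auxiliary regression, compared with a chi-square(2).
    For 2 degrees of freedom the chi-square survival function is exp(-x/2),
    so the critical value and p-value have closed forms (valid for df = 2 only).
    """
    lm = n * r2_aux
    crit = -2.0 * math.log(alpha)      # solves exp(-crit / 2) = alpha
    p_value = math.exp(-lm / 2.0)
    return lm, crit, p_value, lm > crit

lm, crit, p_value, reject = white_test_decision(34, 0.2930)
print(round(lm, 3), round(crit, 2), round(p_value, 4), reject)  # → 9.962 9.21 0.0069 True
```

The p-value of about 0.007 is below 0.01, which is consistent with rejecting homoskedasticity at the 1% level.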
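c) What is the 95% confidence interval for $\beta_1$ when one uses the usual OLS standard error?
The 95% confidence interval using the usual standard error is [0.062, 0.083],
i.e. $\hat{\beta}_1 \pm c \cdot se(\hat{\beta}_1) = 0.073 \pm 2.04 \times 0.0052$, where $c = 2.04$ is the 5% two-sided critical value of the $t$ distribution with 32 degrees of freedom.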

d) What is the 95% confidence interval for $\beta_1$ when one uses the heteroskedasticity-robust standard error?
The 95% confidence interval using the heteroskedasticity-robust standard error is [0.060, 0.085],
i.e. $\hat{\beta}_1 \pm c \cdot se(\hat{\beta}_1) = 0.073 \pm 2.04 \times 0.0062$.
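The interval arithmetic in (c) and (d) can be reproduced in a few lines, taking the rounded values reported on the slide ($\hat{\beta}_1 = 0.073$, $c = 2.04$) as given. With these rounded inputs the upper limits come out as 0.084 and 0.086, slightly above the reported 0.083 and 0.085; the reported intervals were presumably computed from the unrounded estimate. The helper name `t_interval` is hypothetical.

```python
def t_interval(beta_hat, se, c=2.04):
    """Return (lower, upper) = beta_hat -/+ c * se."""
    half_width = c * se
    return beta_hat - half_width, beta_hat + half_width

# Rounded values from the slide; c = 2.04 is the t(32) critical value
usual = t_interval(0.073, 0.0052)
robust = t_interval(0.073, 0.0062)
print([round(v, 3) for v in usual], [round(v, 3) for v in robust])
```

Since $c$ and $\hat{\beta}_1$ are the same in both cases, the only source of difference is the standard error, so the robust interval is wider by $2 \times 2.04 \times (0.0062 - 0.0052) \approx 0.004$.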

e) Comment on the differences between the confidence intervals found in parts (c) and (d). Explain why they differ.
The usual and the robust standard errors use different formulas for $se(\hat{\beta}_1)$, which is why they give different numbers and therefore different confidence intervals; here the robust standard error is larger (0.0062 vs 0.0052), so the robust interval is wider.
In the presence of heteroskedasticity, however, the usual OLS standard errors are not valid, and the robust standard errors are the correct ones to use.
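The difference between the two formulas can be made explicit for the simple regression. The sketch below implements the usual formula, White's HC0 robust formula, and its HC1 rescaling by n/(n - 2); which variant the slides used is not stated, so treat that choice as an assumption (Stata's `vce(robust)` option applies the HC1-type rescaling). The four data points are made up so that the output can be checked by hand, and the function names are hypothetical.

```python
import math

def ols_simple(x, y):
    """OLS for y = b0 + b1*x + u; returns (b0, b1, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid

def slope_standard_errors(x, y):
    """Return (usual, HC0 robust, HC1 robust) standard errors of the slope."""
    n = len(x)
    xbar = sum(x) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    _, _, u = ols_simple(x, y)
    # Usual formula: assumes Var(u|x) = sigma^2 for every observation
    sigma2_hat = sum(ui ** 2 for ui in u) / (n - 2)
    se_usual = math.sqrt(sigma2_hat / sst_x)
    # White (HC0) formula: each squared residual weighted by its own (x_i - xbar)^2
    var_hc0 = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / sst_x ** 2
    # HC1 rescales HC0 by n / (n - k), with k = 2 estimated coefficients
    var_hc1 = var_hc0 * n / (n - 2)
    return se_usual, math.sqrt(var_hc0), math.sqrt(var_hc1)

# Tiny made-up data set: residuals are larger where x = 1
x = [0, 0, 1, 1]
y = [0, 2, 0, 4]
print(tuple(round(s, 3) for s in slope_standard_errors(x, y)))  # → (2.236, 1.581, 2.236)
```

The usual formula pools all squared residuals into one $\hat{\sigma}^2$, while the robust formulas let each observation contribute its own squared residual, which is what keeps them valid when $Var(u \mid x)$ changes with $x$.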

𝑦 = 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑘 𝑥k + 𝑢

Assumption A2 → conditional mean independence assumption, E(u|x)=E(u).

When there is one (or more) explanatory variable that is correlated with the error term, the
conditional mean independence assumption, A2, is violated.

• Explanatory variables that are uncorrelated with the error term are called exogenous
• A2 holds if all explanatory variables are exogenous.

• Explanatory variables that are correlated with the error term are called endogenous.
• Endogeneity is a violation of assumption A2.

Omitted variable bias is a typical case of endogeneity:


• There is a factor in u (omitted from the model's regressors) that is correlated with x, which makes the OLS estimator biased. We would need to include it in the model, but we cannot, because it is not observed and/or there are no data for it.
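A small simulation can make this bias concrete. It is a sketch under made-up assumptions: `ability` is unobserved and pushes up both schooling and the error term, and the true return to education is set to 0.08. Under these assumptions the OLS slope converges to $0.08 + 0.2 \cdot Cov(educ, ability)/Var(educ) = 0.08 + 0.2 \cdot 1.5/6.25 = 0.128$, so OLS overstates the return. All names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

def simulate_ols_with_omitted_ability(n=50_000, seed=0):
    """Simulate a hypothetical wage equation where ability is omitted; return the OLS slope."""
    rng = random.Random(seed)
    educ, lwage = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                  # unobserved, so it ends up in u
        s = 12 + 1.5 * ability + rng.gauss(0, 2)   # more able people get more schooling
        u = 0.2 * ability + rng.gauss(0, 0.4)      # the error term contains ability
        educ.append(s)
        lwage.append(0.5 + 0.08 * s + u)           # true return to education = 0.08
    return ols_slope(educ, lwage)

b1_ols = simulate_ols_with_omitted_ability()
print(round(b1_ols, 3))   # close to 0.128 rather than the true 0.08
```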
Instrumental Variable (IV) is the most well-known method to address endogeneity problem.

When we think that $x_1$ and $u$ are correlated (have nonzero covariance),

$Cov(x_1, u) \neq 0$ (endogeneity),

we can use $z$ as an instrumental variable for $x_1$ only if $z$ satisfies the following conditions:

(1) $Cov(z, u) = 0$: $z$ is uncorrelated with $u$ (exogeneity)
(2) $Cov(z, x_1) \neq 0$: $z$ is correlated with $x_1$ (relevance)

• z can be used as an instrument only if these two assumptions hold.
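Why both conditions are needed can be seen from a short moment derivation, sketched here for the simple regression $y = \beta_0 + \beta_1 x_1 + u$ (the single-regressor case is an assumption made for brevity):

```latex
\begin{aligned}
\operatorname{Cov}(z, y) &= \operatorname{Cov}(z, \beta_0 + \beta_1 x_1 + u)
  = \beta_1 \operatorname{Cov}(z, x_1) + \operatorname{Cov}(z, u) \\
&= \beta_1 \operatorname{Cov}(z, x_1) && \text{by (1), exogeneity} \\
\Rightarrow \quad \beta_1 &= \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x_1)} && \text{dividing is allowed by (2), relevance}
\end{aligned}
```

Replacing the population covariances by sample covariances gives the IV estimator. If (1) fails, the ratio picks up $\operatorname{Cov}(z, u)/\operatorname{Cov}(z, x_1)$ as an extra term; if (2) fails, the denominator is zero and $\beta_1$ is not identified.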


Finding instruments: often this is hard.

▪ Instruments need to be uncorrelated with the unobservables affecting y.

▪ E.g., suppose we want to estimate a wage equation explaining earnings from schooling and other variables.

▪ The error term contains factors such as ability/intelligence, which are correlated with education.

▪ We should find an instrumental variable (z) that is correlated with x (education) but uncorrelated with u (ability).

▪ Which factor (z) affects schooling but not unobserved ability that is determining wages?
Parents’ education? Distance to school? Quarter of birth???
In a simple regression model:

IV regression breaks x into two parts:

• a part that is correlated with u, and
• a part that is not correlated with u.

By isolating the part that is not correlated with u, it is possible to estimate $\beta_1$ consistently even if $E(u \mid x) \neq 0$.

• This is done using an instrumental variable, $z_i$, which is uncorrelated with $u_i$.

• The instrumental variable detects movements in $x_i$ that are uncorrelated with $u_i$ and uses these to obtain a consistent estimate of $\beta_1$.
The IV estimator when we have one x and one z

Two Stage Least Squares (2SLS)

As its name suggests, 2SLS has two stages, i.e. two regressions:

Step one.
Isolate the part of $x_i$ that is uncorrelated with $u$: regress $x_i$ on $z_i$ using OLS, $x_i = \pi_0 + \pi_1 z_i + v_i$.
Compute the predicted values of $x_i$, that is $\hat{x}_i = \hat{\pi}_0 + \hat{\pi}_1 z_i$.

Step two.
Replace $x_i$ by $\hat{x}_i$ in the original regression and regress $y_i$ on $\hat{x}_i$ using OLS: $y_i = \beta_0 + \beta_1 \hat{x}_i + u_i$.

Thus the model parameters can be estimated by OLS, and $\hat{\beta}_{1,IV}$ is a consistent (though, in finite samples, generally biased) estimator of $\beta_1$.
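The two steps can be sketched in Python on simulated data. Everything about the data-generating process (a true return of 0.08, an unobserved `ability` term that enters both schooling and the error, and an instrument `z` drawn independently of ability) is a made-up assumption for illustration, and the helper names are hypothetical; in practice a dedicated routine such as Stata's `ivregress 2sls` would be used.

```python
import random

def ols(x, y):
    """Intercept and slope from an OLS regression of y on x (with a constant)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

def two_stage_least_squares(y, x, z):
    """Return (intercept, slope) from the two-step 2SLS procedure with one x and one z."""
    # Step one: regress x on z and form the fitted values x_hat = pi0_hat + pi1_hat * z
    pi0_hat, pi1_hat = ols(z, x)
    x_hat = [pi0_hat + pi1_hat * zi for zi in z]
    # Step two: regress y on x_hat. The point estimates are the 2SLS estimates, but the
    # OLS standard errors of this second regression are not valid (they use residuals
    # y - b0 - b1 * x_hat instead of y - b0 - b1 * x).
    return ols(x_hat, y)

def simulate(n=50_000, seed=1):
    """Hypothetical data: ability is unobserved, z shifts schooling but is independent of ability."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        zi = rng.gauss(0, 3)
        xi = 12 + 0.3 * zi + 1.5 * ability + rng.gauss(0, 2)
        yi = 0.5 + 0.08 * xi + 0.2 * ability + rng.gauss(0, 0.4)   # true beta1 = 0.08
        y.append(yi)
        x.append(xi)
        z.append(zi)
    return y, x, z

y, x, z = simulate()
_, b1_ols = ols(x, y)
_, b1_2sls = two_stage_least_squares(y, x, z)
print(round(b1_ols, 3), round(b1_2sls, 3))
```

Under these assumptions OLS should land near $0.08 + 0.3/7.06 \approx 0.12$, while 2SLS should land near the true 0.08. The second-stage slope also coincides exactly with the sample ratio $\widehat{Cov}(z, y)/\widehat{Cov}(z, x)$, since $\hat{x}$ is a linear function of $z$.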

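Consider a simple linear regression model, $y = \beta_1 + \beta_2 x + u$, and assume that an instrumental variable ($z$) exists for the endogenous variable ($x$).

The instrumental variable is uncorrelated with the error term (instrument exogeneity) but correlated with $x$ (instrument relevance).

IV estimator:
$$\hat{\beta}_{2,IV} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})}, \qquad \hat{\beta}_{1,IV} = \bar{y} - \hat{\beta}_{2,IV}\,\bar{x}$$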

Example: Education in a wage equation

The error term contains factors such as innate ability, which are correlated with education.

• Individual ability is an unobservable variable.

• We should find an instrumental variable (z) that is correlated with x (education) but uncorrelated with u (ability).

• We choose father's education as the instrument (z).

The error term contains factors such as innate ability, which are correlated with education.

OLS: the return to education is probably overestimated.

Is the education of the father a good IV?
1) It doesn't appear as a regressor
2) It is significantly correlated with educ
3) It is uncorrelated with the error (?)

IV: the estimated return to education decreases (which is to be expected). It is also much less precisely estimated.

Use MROZ.dta

• An OLS estimation
. reg lwage educ

      Source |       SS        df       MS         Number of obs =    428
-------------+------------------------------       F(1, 426)     =  56.93
       Model |  26.3264193      1  26.3264193      Prob > F      = 0.0000
    Residual |  197.001022    426  .462443713      R-squared     = 0.1179
-------------+------------------------------       Adj R-squared = 0.1158
       Total |  223.327441    427  .523015084      Root MSE      = .68003

       lwage |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  .1086487   .0143998     7.55   0.000     .0803451    .1369523
       _cons | -.1851968   .1852259    -1.00   0.318    -.5492673    .1788736

• An IV regression, using fatheduc as the instrumental variable for educ
. ivregress 2sls lwage (educ=fatheduc), first

First-stage regressions

                                               Number of obs =    428
                                               F(1, 426)     =  88.84
                                               Prob > F      = 0.0000
                                               R-squared     = 0.1726
                                               Adj R-squared = 0.1706
                                               Root MSE      = 2.0813

        educ |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    fatheduc |  .2694416   .0285863     9.43   0.000     .2132538    .3256295
       _cons |  10.23705   .2759363    37.10   0.000     9.694685    10.77942

Instrumental variables (2SLS) regression       Number of obs =    428
                                               Wald chi2(1)  =   2.85
                                               Prob > chi2   = 0.0914
                                               R-squared     = 0.0934
                                               Root MSE      = .68778

       lwage |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  .0591735   .0350596     1.69   0.091     -.009542     .127889
       _cons |  .4411034   .4450583     0.99   0.322    -.4311947    1.313402

Instrumented:  educ
Instruments:   fatheduc
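Several figures in the output above can be reproduced from one another, which is a useful check when reading Stata tables. This sketch only reuses the printed coefficients and standard errors; the 1.959964 normal quantile and the helper name `z_stats` are our own choices, not part of the Stata output.

```python
import math

def z_stats(coef, se):
    """z statistic, two-sided normal p-value and 95% CI from a coefficient and its se."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # = 2 * (1 - Phi(|z|))
    half_width = 1.959964 * se                     # 97.5th percentile of N(0, 1)
    return z, p_value, (coef - half_width, coef + half_width)

# Coefficients and standard errors copied from the output above
t_first = 0.2694416 / 0.0285863    # t statistic of fatheduc in the first stage
f_first = t_first ** 2             # with one instrument, first-stage F = t^2
# A common rule of thumb treats a first-stage F below about 10 as a weak instrument.

z, p_value, ci = z_stats(0.0591735, 0.0350596)   # 2SLS coefficient on educ
wald = z ** 2                                    # Wald chi2(1) = z^2
se_ratio = 0.0350596 / 0.0143998                 # IV se relative to OLS se

print(round(f_first, 2), round(z, 2), round(p_value, 4), round(wald, 2))
print([round(v, 6) for v in ci], round(se_ratio, 2))
```

With a single instrument the first-stage F is the squared t statistic of fatheduc (about 88.84), the Wald chi2(1) is the squared z statistic (about 2.85), and the IV standard error is roughly 2.4 times the OLS one (0.0351 vs 0.0144), which is the loss of precision noted above.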

Other IVs for education that have been used in the literature:

• The number of siblings

1) Not directly related to wages

2) Correlated with education because of resource constraints in the family

3) Uncorrelated with the error term (innate ability)

Problem 17
1- Consider a linear model to explain monthly beer consumption:
$beer = \beta_0 + \beta_1 inc + \beta_2 price + \beta_3 educ + \beta_4 female + u$
$E(u \mid inc, price, educ, female) = 0$
$Var(u \mid inc, price, educ, female) = \sigma^2 inc^2$
Write the transformed equation that has a homoskedastic error term.

2- Consider the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, and suppose that $Cov(u, x_2) \neq 0$.

a) Is it still possible to make appropriate inferences based on the OLS estimator, while adjusting the standard errors appropriately?
b) Explain how an instrumental variable, $z_i$, leads to a new moment condition and, consequently, an alternative estimator for $\beta$.
c) Why does this alternative estimator lead to a smaller $R^2$ than the OLS one? What does this say about the $R^2$ as a measure of the adequacy of the model?
d) Why can we not choose $z = x_1$ as an instrument for $x_2$, even if $E(x_1 u) = 0$? Would it be possible to use $x_1^2$ as an instrument for $x_2$?