Econometrics - Exercise set 5
Exercise 1
Remember exercise 2 in exercise set 4. Load the same dataset (E42.RData) and estimate the following
model again.
𝑔𝑑𝑝𝑖 = 𝛽0 + 𝛽1 𝑙𝑎𝑏𝑜𝑢𝑟𝑖 + 𝛽2 𝑒𝑑𝑢𝑖 + 𝛽3 𝑒𝑛𝑒𝑟𝑔𝑦𝑖 + 𝛽4 𝑐𝑎𝑝𝑖𝑡𝑎𝑙𝑖 + 𝜀𝑖
1.1
Would quantile regression be appropriate to use with this dataset? If yes, why? If no, why not?
Yes, since the effects of the regressors may differ across levels of GDP.
E.g., the role of education may be more or less important at different stages of economic development.
A simple QQ-plot of the least-squares residuals indicates that they are not normally distributed,
and there are a few outliers for which the least-squares fit may not be representative.
Least-squares method
Least squares estimates the conditional mean of 𝑦 given the regressors by minimizing the sum of
squared residuals:
\[ S(\beta) = \sum_{i=1}^{n} (y_i - x_i'\beta)^2 \]
Quantile regression
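Quantile regression estimates conditional quantiles of the dependent variable; the conditional median
is the special case 𝜏 = 0.5. Minimizing the expression below for a given quantile 𝜏 yields the 𝜏th
sample quantile 𝜉. In the regression setting, 𝜉 is replaced by the linear predictor 𝑥′𝑖 𝛽.
\[ S(\xi) = \tau \sum_{y_i \ge \xi} |y_i - \xi| + (1 - \tau) \sum_{y_i < \xi} |y_i - \xi| \]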
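The objective S(𝜉) can be checked numerically. The following is a minimal sketch in Python (standard library only, chosen for illustration; the exercise itself is solved in R). The data and the helper names `check_loss` and `best_quantile` are made up for this example and are not part of the exercise.

```python
def check_loss(xi, ys, tau):
    """S(xi) = tau * sum_{y >= xi} |y - xi| + (1 - tau) * sum_{y < xi} |y - xi|."""
    upper = sum(abs(y - xi) for y in ys if y >= xi)
    lower = sum(abs(y - xi) for y in ys if y < xi)
    return tau * upper + (1 - tau) * lower


def best_quantile(ys, tau):
    """Return the sample value that minimizes the check loss.

    A minimizer always exists among the observed values, so searching
    over the sample is enough for this illustration.
    """
    return min(sorted(ys), key=lambda xi: check_loss(xi, ys, tau))


# Toy data (made up for illustration, not from E42.RData); note the outlier.
ys = [1, 2, 3, 4, 5, 6, 7, 8, 100]
q25 = best_quantile(ys, 0.25)
q50 = best_quantile(ys, 0.50)
q75 = best_quantile(ys, 0.75)
mean_y = sum(ys) / len(ys)
print(q25, q50, q75, mean_y)
```

The minimizers are the usual order statistics, and none of them moves if the outlier is made larger, whereas the sample mean (the least-squares solution) does.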
1.2
Conduct Quantile Regression on the model with 𝜏 = [0.25, 0.5, 0.75]
[Figure: quantile regression coefficients. Y-axis: effect on GDP; X-axis: quantiles.]
The graphs above plot the marginal effects for all deciles. The effects of education and capital are
roughly constant across quantiles, while the effect of labour decreases and the effect of energy increases.
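To make concrete what a quantile regression fit computes, here is a small sketch with one regressor. It is not the routine used by R; it relies on the fact that, with an intercept and one slope, some optimal line passes through two data points, so a brute-force search over all pairs finds a minimizer. The data and the function names are hypothetical.

```python
from itertools import combinations


def check_loss(residuals, tau):
    """Weight positive residuals by tau and negative ones by (1 - tau)."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(xs, ys, tau):
    """Fit y = a + b * x at quantile tau by exhaustive search over point pairs.

    Only practical for tiny samples; real software solves a linear program.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, skip
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = check_loss([y - a - b * x for x, y in zip(xs, ys)], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Toy data (made up): four points on y = 1 + 2x and one outlier.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 30]
a_med, b_med = quantile_line(xs, ys, 0.5)
print(a_med, b_med)
```

At 𝜏 = 0.5 the fitted line follows the four collinear points and ignores the outlier, which illustrates why the median regression is less sensitive to outliers than least squares.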
1.3
Comment on your findings in 1.2
Labour: The marginal effect decreases substantially when focusing on larger values of GDP. That
is, an increase in labour has a smaller marginal effect when GDP is large. This could indicate the
importance of labour for developing countries. Notice that labour is barely significant in the third
specification.
Education: Positive and (highly) significant across all quantiles.
Energy: Significant at the 1% level across all specifications. The effect increases gradually
as we consider higher quantiles.
Capital: Capital appears to have a (marginally) significant effect only for countries with low GDP,
where an increase in the capital stock is expected to decrease GDP.
Exercise 2
The dataset E52.RData contains information on net financial wealth (nettfa), age of survey respondent
(age), annual family income (inc), family size (fsize) and information on participation in certain pension
plans for people in the United States. The wealth and income variables are both recorded in thousands
of euros.
For this problem, use only data for married people without children living at home (marr = 1, fsize =
2) and consider the following model,
𝑛𝑒𝑡𝑡𝑓𝑎 = 𝛽0 + 𝛽1 𝑖𝑛𝑐 + 𝛽2 𝑎𝑔𝑒 + 𝜀
2.1
Estimate the model above by OLS, and comment on the results. Do not forget to comment on
the intercept.
Income: A one-unit (one thousand) increase in income is expected to increase net financial wealth by
1.399 (thousand), holding age fixed. The coefficient is significant at the 1% level.
Age: A one-year increase in age is expected to increase net financial wealth by 1.840 (thousand),
holding income fixed. The coefficient is significant at the 1% level.
Intercept: This is the predicted net financial wealth for an individual for whom age = 0 and income
= 0. In this setting, the intercept carries little useful information, as we are not interested in
predicting the wealth of infants.
2.2
Reformulate the model such that the intercept fits the data. Please comment on your findings
Note: There are multiple ways of reformulating the model. This can be done by demeaning the
variables, applying a log-transformation, or including additional regressors.
Demeaned model
To demean the model, we subtract from each regressor its sample mean:
\[ \mathit{nettfa} = \beta_0 + \beta_1 (\mathit{inc} - \overline{\mathit{inc}}) + \beta_2 (\mathit{age} - \overline{\mathit{age}}) + \varepsilon \]
In this case, the intercept can be interpreted as the average net financial wealth when all the regressors
are set to their means. From the output, we see that when income and age are evaluated at their
average values, net financial wealth is expected to be positive (48.035).
The interpretation of the coefficients does not change.
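The claim that demeaning changes only the intercept can be verified on a small example. The sketch below uses a single regressor for brevity and made-up numbers (not the E52.RData sample); `simple_ols` is a hypothetical helper written for this illustration.

```python
from statistics import mean


def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data (made up for illustration).
inc = [20.0, 35.0, 50.0, 65.0, 80.0]
nettfa = [-5.0, 10.0, 30.0, 45.0, 90.0]

a_raw, b_raw = simple_ols(inc, nettfa)           # intercept = prediction at inc = 0
inc_centered = [x - mean(inc) for x in inc]
a_dm, b_dm = simple_ols(inc_centered, nettfa)    # intercept = prediction at mean income

print(a_raw, b_raw)
print(a_dm, b_dm)
```

The slope is identical in both fits, while the intercept of the demeaned fit equals the sample mean of the dependent variable.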
Additional regressors
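Here, I add male, pira, p401k, incsq, and agesq as regressors. Thus, I will be estimating:
\[ \mathit{nettfa} = \beta_0 + \beta_1 \mathit{inc} + \beta_2 \mathit{age} + \beta_3 \mathit{male} + \beta_4 \mathit{pira} + \beta_5 \mathit{p401k} + \beta_6 \mathit{inc}^2 + \beta_7 \mathit{age}^2 + \varepsilon \]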
Male: Insignificant effect.
IRA: Significant at the 1% level. Having an “individual retirement account” increases net financial
wealth by 35.832.
P401k: Significant at the 1% level. Having an “employer-sponsored defined-contribution pension
account” increases net financial wealth by 15.894.
Income (sq.): Significant at the 1% level. Interestingly, the effect of income is negative for low
values of income but turns positive at higher values.
Age (sq.): Insignificant effect.
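The sign change in the income effect follows from the quadratic term. As a short worked derivation (the turning point inc* is a symbol introduced here, and no numerical value is implied since the coefficient estimates are not reproduced in this text):

```latex
\[
\frac{\partial\, \mathit{nettfa}}{\partial\, \mathit{inc}} = \beta_1 + 2 \beta_6\, \mathit{inc},
\qquad
\mathit{inc}^{*} = -\frac{\beta_1}{2 \beta_6}
\]
```

With a negative coefficient on income and a positive coefficient on income squared, the marginal effect is negative for inc < inc* and positive for inc > inc*.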
Log-transformed model
Here, I will be estimating:
log(𝑛𝑒𝑡𝑡𝑓𝑎) = 𝛽0 + 𝛽1 𝑖𝑛𝑐 + 𝛽2 𝑎𝑔𝑒 + 𝜖
Income: A one-unit increase in income is expected to increase net financial wealth by 3.1%. This
effect is significant at a 1% significance level.
Age: A one-year increase in age is expected to increase net financial wealth by 4.4%.
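The percentage interpretation uses the approximation 100·𝛽 for a log-level model. A minimal sketch of the exact calculation, 100·(exp(𝛽) − 1), is given below; the coefficients 0.031 and 0.044 are backed out from the percentages quoted above, and `pct_effect` is a hypothetical helper.

```python
import math


def pct_effect(beta, delta=1.0):
    """Exact % change in y for a change `delta` in x when log(y) = ... + beta * x."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


# Coefficients implied by the text: 3.1% for income and 4.4% for age.
inc_approx, inc_exact = 100 * 0.031, pct_effect(0.031)
age_approx, age_exact = 100 * 0.044, pct_effect(0.044)
print(inc_approx, inc_exact)
print(age_approx, age_exact)
```

For coefficients this small the approximate and exact figures differ only in the second decimal, so the interpretation in the text is adequate.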
2.3
Test the following hypothesis, 𝑯𝟎 : 𝜷𝟏 = 𝟏 (coefficient on income), on one or more of your
reformulated models and compare to the original model
Given the extremely low p-values, we can reject 𝐻0 : 𝛽1 = 1. Hence, the income coefficient is
significantly different from 1; the point estimate is 1.399 in the original and demeaned models.
Note that the test is not directly comparable for the third and fourth models, as they include squared
covariates and use a log-transformation, respectively, so 𝛽1 no longer measures the same marginal effect.
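The test statistic behind this conclusion is t = (estimate − 1) / standard error. The sketch below shows the calculation with a normal approximation for the p-value. The standard error used here is HYPOTHETICAL, since the regression output is not reproduced in this text; only the estimate 1.399 comes from 2.1.

```python
from statistics import NormalDist


def t_test_against(beta_hat, se, null_value):
    """Return (t statistic, two-sided p-value) using a normal approximation."""
    t_stat = (beta_hat - null_value) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


beta_hat = 1.399   # income coefficient reported in 2.1
se_hyp = 0.10      # HYPOTHETICAL standard error, for illustration only
t_stat, p_value = t_test_against(beta_hat, se_hyp, null_value=1.0)
print(t_stat, p_value)
```

Note that the default regression output tests 𝛽1 = 0, so the statistic for 𝛽1 = 1 has to be computed separately, as done here.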
2.4
Remove age and agesq from the models and comment on your findings
Every regressor turns out to be highly significant. For the original and demeaned model, the coefficient
on income decreases slightly compared to the regressions including age.
The magnitude of the income, male, IRA, and income squared coefficients increases in Model 3 (with
additional regressors), while the 401(k) variable decreases in magnitude.
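In the log-transformed model, the effect of a one-unit increase in income decreases from 3.1% to 2.9%.
This may reflect omitted-variable bias: age has a positive effect, so if income and age are negatively
correlated, dropping age pushes the income coefficient downwards.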
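The direction of this change can be written out as a worked equation. This is a sketch for a regression with income and age as the only regressors; the symbols with tildes and 𝛿 are introduced here for illustration and do not appear in the exercise.

```latex
\[
\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\, \hat{\delta}_1,
\qquad
\hat{\delta}_1 = \frac{\widehat{\mathrm{Cov}}(\mathit{inc}, \mathit{age})}{\widehat{\mathrm{Var}}(\mathit{inc})}
\]
```

Here the tilde denotes the income coefficient from the regression without age, and the hats denote the coefficients from the regression with age. If the age coefficient is positive and income and age are negatively correlated, the second term is negative and the income coefficient falls when age is dropped.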
2.5
Estimate the model(s) without age and agesq by Quantile Regression for
𝜏 = [0.05, 0.25, 0.5, 0.75, 0.95], and compare to OLS
Model 1 and 2
Interestingly, in our original and demeaned models, the effect of income becomes stronger as net
financial wealth increases. The OLS estimate is closest to the estimate at the 75% quantile.
Model 3
In model 3, where additional regressors are included, the effect from income is negative and increasing
in magnitude as we consider higher quantiles of net financial wealth.
The effect of being male is relatively constant across specifications, except at 𝜏 = 0.95,
where the magnitude increases substantially.
Having an “individual retirement account” seems to have a larger effect for individuals with higher
net financial wealth. The same can be said for having an “employer-sponsored defined-contribution
pension account”.
The positive coefficient on 𝑖𝑛𝑐2 suggests that the effect from income follows a U-shape for all quantiles.
Model 4
In the log-transformed model, the effect of income is roughly constant at the 𝜏 = 0.05 and 𝜏 = 0.25
quantiles (approx. 3.4%). From then on, the effect decreases as net financial wealth increases.
2.6
How are the Standard Errors estimated in the Quantile Regression? And under which assumptions
are they valid?
Hint: Look in “summary.rq” in the help section in R to find the different estimation methods for the
standard errors.
With the bootstrap option (se = "boot" in summary.rq), the standard errors are computed as:
\[ SE(\hat{\theta}) = \sqrt{\frac{\sum_{j=1}^{J} (\hat{\theta} - \hat{\theta}_j)^2}{J - 1}} \]
The reason for this formula is that analytical standard errors are cumbersome to obtain, so we draw J
random samples with replacement from the data. Here θ̂ is the estimate based on the original sample
(for each quantile), and θ̂_j denotes the estimate from bootstrap sample number j.
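The bootstrap formula can be sketched in a few lines. For brevity the example below bootstraps an unconditional sample quantile rather than a regression coefficient; in the regression case whole observations (rows) would be resampled and the model re-estimated on each resample. The data are simulated and the function names are hypothetical.

```python
import math
import random


def sample_quantile(ys, tau):
    """Order statistic at position ceil(tau * n), a simple quantile estimate."""
    ordered = sorted(ys)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def bootstrap_se(ys, tau, n_boot=500, seed=0):
    """Bootstrap SE as in the formula above: squared deviations of the
    bootstrap estimates from the original-sample estimate, over J - 1."""
    rng = random.Random(seed)
    theta_hat = sample_quantile(ys, tau)
    total = 0.0
    for _ in range(n_boot):
        resample = rng.choices(ys, k=len(ys))  # draw with replacement
        total += (theta_hat - sample_quantile(resample, tau)) ** 2
    return math.sqrt(total / (n_boot - 1))


# Toy data (simulated, not from E52.RData).
rng = random.Random(42)
ys = [rng.gauss(50.0, 10.0) for _ in range(200)]
se_median = bootstrap_se(ys, 0.5)
print(se_median)
```

The number of bootstrap samples (`n_boot`, the J in the formula) is a choice of the analyst; larger values reduce the simulation noise in the reported standard error.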
Exercise 3
Perform an analysis of the factors affecting birthweight similar to the one in Koenker and Hallock (2001),
Quantile Regression, but using a dataset from the MASS package in R. Be sure to install the
MASS package, load it with library(MASS), and load the dataset using the following code: data=birthwt
This data frame contains the following columns:
• low: indicator of birth weight less than 2.5 kg.
• age: mother’s age in years.
• lwt: mother’s weight in pounds at last menstrual period.
• race: mother’s race (1 = white, 2 = black, 3 = other).
• smoke: smoking status during pregnancy.
• ptl: number of previous premature labours.
• ht: history of hypertension.
• ui: presence of uterine irritability.
• ftv: number of physician visits during the first trimester.
• bwt: birth weight in grams.
3.1
The dataset is substantially smaller than that of Koenker and Hallock (2001), p. 6. Is that an issue?
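Yes. Inference in quantile regression relies on large-sample (asymptotic) results, so with few
observations the coefficients are estimated less precisely and the standard errors are less reliable.
This is most severe in the tails (e.g. 𝜏 = 0.05 and 𝜏 = 0.95), where few observations lie.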
3.2
Estimate the relationship between birthweight and other relevant factors by OLS and Quantile
Regression for 𝜏 = [0.05, 0.25, 0.5, 0.75, 0.95]. Argue for the inclusion of each variable and
whether it should be transformed in any way
Mother’s weight: Mother’s weight seems to positively impact the birth weight of a child. The
effect seems to be largest around the middle quantiles, and is only statistically significant at 𝜏 = 0.75.
Race:
• Black: Increasing negative effects, with a sudden drop at the 95% quantile.
• Other: We only estimate significant effects (at a 10% significance level) at 𝜏 = 0.25 and 𝜏 = 0.50.
Age: Largest negative effect at the 75% quantile.
Premature labour: Insignificant.
Smoker: Negative effect, which seems to be most prominent at 𝜏 = 0.5 (the LAD regression).
Hypertension: Largest negative effect at 𝜏 = 0.25, which is the only significant effect (at a 10%
significance level).
Uterine irritability: Negative effect, which is significant across all quantiles except the 95% quantile.
In the tails of the distribution of children’s birthweight, the coefficients are insignificant, with
the exception of “smoker” and “uterine irritability”.
3.3
How are the standard errors calculated, and under which assumptions are they valid?
\[ SE(\hat{\theta}) = \sqrt{\frac{\sum_{j=1}^{J} (\hat{\theta} - \hat{\theta}_j)^2}{J - 1}} \]
The standard errors are obtained by bootstrapping. One assumption is that the number of bootstrap
samples J must be large.
The method remains valid even if the errors are not normally distributed and are heteroskedastic.