PRACTICAL EXERCISE 11: SOLUTION
1. Start a log file in your folder (call it prac11.log)
2. Open the dataset hetero1.dta.
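For example (a sketch, assuming the log file and hetero1.dta live in your current working folder; the replace option overwrites any earlier prac11.log):
. log using prac11.log, replace
. use hetero1.dta, clear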
3. Regress y on x1 and x2 and then predict the fitted y-values and the residuals.
. regress y x1 x2
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 2, 82) = 140.53
Model | 7237.7469 2 3618.87345 Prob > F = 0.0000
Residual | 2111.58429 82 25.7510279 R-squared = 0.7741
-------------+------------------------------ Adj R-squared = 0.7686
Total | 9349.33118 84 111.301562 Root MSE = 5.0745
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .000542 .000122 4.44 0.000 .0002992 .0007847
x2 | .2833033 .0284442 9.96 0.000 .2267186 .3398879
_cons | 39.43803 1.948595 20.24 0.000 35.56165 43.3144
------------------------------------------------------------------------------
predict yfit
predict e,resid
A TESTING PROCEDURES
4. Graph the residuals against yfit:
scatter e yfit
scatter e yfit, yline(0)
[Figure: scatter plot of the residuals against the fitted values, with a horizontal reference line at zero.]
Do you think there is heteroscedasticity? - There is evidence of heteroscedasticity,
since the spread of the residuals varies systematically with the fitted values,
and hence with the explanatory variables. The variance of the residuals appears
largest in the middle of the scatter plot and smaller for the lowest and much
higher fitted values.
5. Conduct the PARK test for heteroscedasticity (follow in the notes):
Step 1: You have already carried this out (i.e. regressed y on x1 and x2).
Step 2: Enter the commands:
g esq=e^2
g lnesq=log(esq)
Step 3: Generate logged values of all your other variables.
g lnx1=log(x1)
g lnx2=log(x2)
g lnyfit=log(yfit)
Step 4: Regress ‘lnesq’ on each of the logs of your explanatory variables (separately) as well as on
the log of the estimated Y variable.
. reg lnesq lnx1
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 13.81
Model | 47.3903009 1 47.3903009 Prob > F = 0.0004
Residual | 284.885186 83 3.43235164 R-squared = 0.1426
-------------+------------------------------ Adj R-squared = 0.1323
Total | 332.275487 84 3.95566056 Root MSE = 1.8527
------------------------------------------------------------------------------
lnesq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnx1 | -.4951857 .133266 -3.72 0.000 -.7602464 -.2301251
_cons | 5.536888 .9948185 5.57 0.000 3.558234 7.515542
------------------------------------------------------------------------------
. reg lnesq lnx2
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 4.86
Model | 18.3831962 1 18.3831962 Prob > F = 0.0302
Residual | 313.892291 83 3.78183483 R-squared = 0.0553
-------------+------------------------------ Adj R-squared = 0.0439
Total | 332.275487 84 3.95566056 Root MSE = 1.9447
------------------------------------------------------------------------------
lnesq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnx2 | -1.075221 .4876841 -2.20 0.030 -2.045205 -.1052372
_cons | 6.483508 2.082121 3.11 0.003 2.342253 10.62476
------------------------------------------------------------------------------
. reg lnesq lnyfit
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 11.48
Model | 40.3828102 1 40.3828102 Prob > F = 0.0011
Residual | 291.892677 83 3.51677924 R-squared = 0.1215
-------------+------------------------------ Adj R-squared = 0.1110
Total | 332.275487 84 3.95566056 Root MSE = 1.8753
------------------------------------------------------------------------------
lnesq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnyfit | -4.572106 1.349244 -3.39 0.001 -7.255699 -1.888514
_cons | 20.81832 5.58168 3.73 0.000 9.716587 31.92006
------------------------------------------------------------------------------
Step 5: Follow the rest of your notes (page 9) to establish whether your data exhibit heteroscedasticity.
Is heteroscedasticity present? Which of the variables do you think is ‘responsible’ for the
heteroscedasticity?
H0: No heteroscedasticity (or disturbance terms are homoscedastic), i.e. B2 = 0.
H1: Evidence for heteroscedasticity, i.e. B2 ≠ 0.
We test whether the slope coefficient in each regression is statistically
significant. The easiest way to do this is to look at the relevant p-value.
The first and third regressions confirm that we have heteroscedasticity, since
we reject the null hypothesis that B2 = 0 (i.e. the null hypothesis of
homoscedasticity) at all conventional significance levels. For the variable lnx2
(i.e. the second regression) we fail to reject the null of homoscedasticity at
the 1% significance level (p = 0.030), so heteroscedasticity appears to be a
greater problem in x1 than in x2.
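As a quick check, each t-statistic is simply the estimated slope divided by its
standard error: in the first regression, for example, -.4951857/.133266 ≈ -3.72,
which matches the reported value.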
6. Conduct the GLEJSER test for heteroscedasticity (as outlined in the notes):
Step 1: Generate absolute values of the residuals from the original regression as follows:
gen eabs=abs(e)
Step 2: Generate the square root and inverse values of x1 and x2 as follows:
gen x1sqrt=sqrt(x1)
gen x2sqrt=sqrt(x2)
gen x2inv = 1/x2
gen x1inv = 1/x1
Step 3: Then run the regressions as outlined on page 10 of your notes (i.e. equations (2), (3) and (4)).
. reg eabs x1
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 11.61
Model | 104.759459 1 104.759459 Prob > F = 0.0010
Residual | 748.86638 83 9.02248651 R-squared = 0.1227
-------------+------------------------------ Adj R-squared = 0.1122
Total | 853.625839 84 10.1622124 Root MSE = 3.0037
------------------------------------------------------------------------------
eabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | -.0001934 .0000568 -3.41 0.001 -.0003063 -.0000805
_cons | 4.674458 .4063425 11.50 0.000 3.866259 5.482657
------------------------------------------------------------------------------
. reg eabs x2
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 5.40
Model | 52.1880898 1 52.1880898 Prob > F = 0.0225
Residual | 801.43775 83 9.6558765 R-squared = 0.0611
-------------+------------------------------ Adj R-squared = 0.0498
Total | 853.625839 84 10.1622124 Root MSE = 3.1074
------------------------------------------------------------------------------
eabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x2 | -.0318268 .01369 -2.32 0.023 -.0590557 -.004598
_cons | 6.248627 1.086624 5.75 0.000 4.087376 8.409878
------------------------------------------------------------------------------
. reg eabs x1sqrt
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 10.89
Model | 99.0394774 1 99.0394774 Prob > F = 0.0014
Residual | 754.586362 83 9.09140195 R-squared = 0.1160
-------------+------------------------------ Adj R-squared = 0.1054
Total | 853.625839 84 10.1622124 Root MSE = 3.0152
------------------------------------------------------------------------------
eabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1sqrt | -.0269136 .0081542 -3.30 0.001 -.0431321 -.0106952
_cons | 5.237445 .5333163 9.82 0.000 4.1767 6.298189
------------------------------------------------------------------------------
. reg eabs x2sqrt
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 3.80
Model | 37.3360267 1 37.3360267 Prob > F = 0.0547
Residual | 816.289813 83 9.83481702 R-squared = 0.0437
-------------+------------------------------ Adj R-squared = 0.0322
Total | 853.625839 84 10.1622124 Root MSE = 3.1361
------------------------------------------------------------------------------
eabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x2sqrt | -.4170348 .2140382 -1.95 0.055 -.8427482 .0086787
_cons | 7.40853 1.859287 3.98 0.000 3.710483 11.10658
------------------------------------------------------------------------------
. reg eabs x1inv
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 1.80
Model | 18.1080855 1 18.1080855 Prob > F = 0.1835
Residual | 835.517754 83 10.066479 R-squared = 0.0212
-------------+------------------------------ Adj R-squared = 0.0094
Total | 853.625839 84 10.1622124 Root MSE = 3.1728
------------------------------------------------------------------------------
eabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1inv | 274.5804 204.7255 1.34 0.184 -132.6105 681.7713
_cons | 3.422781 .4674167 7.32 0.000 2.493108 4.352453
------------------------------------------------------------------------------
. reg eabs x2inv
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 1, 83) = 0.58
Model | 5.90363082 1 5.90363082 Prob > F = 0.4492
Residual | 847.722209 83 10.2135206 R-squared = 0.0069
-------------+------------------------------ Adj R-squared = -0.0050
Total | 853.625839 84 10.1622124 Root MSE = 3.1959
------------------------------------------------------------------------------
eabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x2inv | 26.73777 35.16846 0.76 0.449 -43.21087 96.68641
_cons | 3.417822 .6624514 5.16 0.000 2.100232 4.735411
------------------------------------------------------------------------------
What do you conclude? Do your results support the conclusions arrived at from the Park test?
H0: No heteroscedasticity (or disturbance terms are homoscedastic), i.e. B2 = 0.
H1: Evidence for heteroscedasticity, i.e. B2 ≠ 0.
We test whether the slope coefficient in each regression is statistically
significant. The easiest way to do this is to look at the relevant p-value.
The regression results again tend to confirm that we have heteroscedasticity
in the data, especially with regard to the x1 variable (per capita income) and
to a lesser degree with the x2 variable (access to healthcare). Yes, the results
very much confirm those obtained with the Park test.
7. Conduct the White test by running the regression outlined in the notes (page 11). You will first need
to generate some variables in addition to those already created above.
gen x1sq= x1^2
gen x2sq= x2^2
gen x1x2= x1*x2
reg esq x1 x2 x1sq x2sq x1x2
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 5, 79) = 3.64
Model | 24237.9326 5 4847.58651 Prob > F = 0.0052
Residual | 105353.083 79 1333.58333 R-squared = 0.1870
-------------+------------------------------ Adj R-squared = 0.1356
Total | 129591.016 84 1542.75019 Root MSE = 36.518
------------------------------------------------------------------------------
esq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .0407372 .0179974 2.26 0.026 .0049143 .0765602
x2 | 2.029915 1.156166 1.76 0.083 -.2713762 4.331206
x1sq | -1.94e-07 2.09e-07 -0.93 0.356 -6.09e-07 2.22e-07
x2sq | -.0157391 .0097046 -1.62 0.109 -.0350557 .0035775
x1x2 | -.000385 .0001772 -2.17 0.033 -.0007378 -.0000323
_cons | -35.58042 31.98197 -1.11 0.269 -99.23894 28.07809
------------------------------------------------------------------------------
Thereafter, you also need to calculate the chi-squared statistic, nR2 (the number of observations times
the R-squared), and its p-value. Type:
. di 85*0.1870
15.895
. di chiprob(5, 15.895)
.00715034
This displays the p-value for the hypothesis test with H0: homoscedasticity. What is your conclusion at
the conventional significance levels of 1%, 5% and 10%?
H0: No heteroscedasticity (or disturbance terms are homoscedastic)
H1: Evidence for heteroscedasticity
The p-value is close to zero (about 0.7%), hence we reject the null hypothesis
of homoscedasticity at all conventional significance levels. This clearly agrees
with the results of the previous two tests.
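A small refinement: rather than retyping the rounded R-squared by hand, you can use the results Stata
stores after a regression (e(N) holds the number of observations and e(r2) the R-squared), provided the
White auxiliary regression was the last model estimated:
. di e(N)*e(r2)
. di chiprob(5, e(N)*e(r2))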
8. Now conduct STATA’s automatic White test:
reg y x1 x2
imtest, white
Looking at the first part of the output, do the results agree with your manual test above?
. reg y x1 x2
-- Output omitted --
. imtest, white
White's test for Ho: homoskedasticity
against Ha: unrestricted heteroskedasticity
chi2(5) = 15.90
Prob > chi2 = 0.0071
Cameron & Trivedi's decomposition of IM-test
---------------------------------------------------
Source | chi2 df p
---------------------+-----------------------------
Heteroskedasticity | 15.90 5 0.0071
Skewness | 3.86 2 0.1451
Kurtosis | 0.92 1 0.3378
---------------------+-----------------------------
Total | 20.68 8 0.0081
---------------------------------------------------
H0: No heteroscedasticity (or disturbance terms are homoscedastic)
H1: Evidence for heteroscedasticity
The chi-squared and p-values are the same as before, thus this command
carries out the same test that you did manually in the previous question.
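(In newer versions of Stata this is run as a postestimation command, estat imtest, white; similarly, the
hettest command used in the next question corresponds to estat hettest.)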
9. Conduct the Cook-Weisberg test as follows:
Repeat the regression of y on x1 and x2.
Type: hettest
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of y
chi2(1) = 4.29
Prob > chi2 = 0.0382
What is your conclusion?
H0: No heteroscedasticity (or disturbance terms are homoscedastic)
H1: Evidence for heteroscedasticity
We reject the null hypothesis of homoscedasticity at the 5% and 10%
significance levels. This agrees with the earlier results.
Now type:
hettest x1
hettest x2
. hettest x1
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: x1
chi2(1) = 7.39
Prob > chi2 = 0.0066
. hettest x2
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: x2
chi2(1) = 2.34
Prob > chi2 = 0.1262
Is the conclusion the same? Can you understand why you get different results?
Instead of using the fitted value of y (i.e. a linear combination of x1 and x2) as
the variable explaining the heteroscedasticity, we now home in on the individual
sources.
H0: No heteroscedasticity (or disturbance terms are homoscedastic)
H1: Evidence for heteroscedasticity
We reject the null hypothesis in the first test (p = 0.0066) at all conventional
significance levels, but fail to reject it in the second (p = 0.1262). Clearly,
the heteroscedasticity is related to x1 rather than x2.
10. Examine the pattern of heteroscedasticity by graphing the residuals (e) against x1 and x2
separately.
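Following the same pattern as in question 4, for example:
scatter e x1, yline(0)
scatter e x2, yline(0)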
[Figure: scatter plots of the residuals e against x1 (Per capita income in US Dollars) and against x2 (Index of access to healthcare, 0 = lowest, 100 = highest).]
What do you observe?
- This confirms the conclusion in question 9 that the heteroscedasticity is
related to x1 rather than x2.
B REMEDIAL MEASURES
11. Carry out a weighted least squares regression as described below. Recall that you need to make an
assumption about the pattern of heteroscedasticity. First try the weighting suggested in the notes
(p. 14), i.e. assume that the error variance is proportional to x1. We transform the variables by
dividing each one of them by the square root of x1. We also generate a new variable x0t = 1/x1sqrt.
. g yt=y/x1sqrt
. g x1t=x1/ x1sqrt
. g x2t=x2/ x1sqrt
. g x0t=1/ x1sqrt
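To see why this transformation works (a sketch, using the B notation from above): dividing the original
model y = B0 + B1*x1 + B2*x2 + u through by the square root of x1 gives
yt = B0*x0t + B1*x1t + B2*x2t + u/sqrt(x1)
If the error variance is proportional to x1, say Var(u) = sigma^2*x1, the transformed error u/sqrt(x1)
has constant variance sigma^2. Note that B0 now appears as the coefficient on x0t rather than as a free
intercept, which is why the regression below is run with the nocons option.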
Now regress the transformed variables without an intercept. [Note that to run a regression without
an intercept, you add the nocons option after the variable list, i.e. “reg …, nocons”.]
. regress yt x0t x1t x2t, nocons
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 3, 82) = 3621.72
Model | 384.26462 3 128.088207 Prob > F = 0.0000
Residual | 2.90007062 82 .035366715 R-squared = 0.9925
-------------+------------------------------ Adj R-squared = 0.9922
Total | 387.164691 85 4.55487872 Root MSE = .18806
------------------------------------------------------------------------------
yt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x0t | 39.5879 1.370164 28.89 0.000 36.86221 42.31359
x1t | .0012724 .0003639 3.50 0.001 .0005486 .0019962
x2t | .2399105 .0241113 9.95 0.000 .1919454 .2878756
------------------------------------------------------------------------------
12. Retrieve the residuals. (You will have to use a different name, e.g. et rather than e.) Plot
them against x1 and x2 separately. Do you think you have solved the heteroscedasticity problem?
. predict et,resid
. scatter et x1, yline(0)
. scatter et x2, yline(0)
[Figure: scatter plots of the WLS residuals et against x1 (Per capita income in US Dollars) and against x2 (Index of access to healthcare), each with a horizontal reference line at zero.]
- The first graph suggests we have not got rid of the heteroscedasticity. Since
the heteroscedasticity did not appear to be related to x2, the second graph is
no surprise.
13. We now assume that the error variance is proportional to x1^2. Referring to page 16 of the notes, try
the following weighting. Generate a transformed y and a transformed x2 by dividing each of them by
x1, and generate a new variable x0w = 1/x1.
. g yw=y/x1
. g x2w=x2/x1
. g x0w=1/x1
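The algebra here (a sketch): dividing y = B0 + B1*x1 + B2*x2 + u through by x1 gives
yw = B0*x0w + B1 + B2*x2w + u/x1
so the original slope B1 becomes the intercept of the transformed regression (the _cons term in the
output below), which is why this regression is run with an intercept. If Var(u) = sigma^2*x1^2, the
transformed error u/x1 has constant variance sigma^2.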
Then regress the transformed y on x0w and the transformed x2 with an intercept.
. regress yw x2w x0w
Source | SS df MS Number of obs = 85
-------------+------------------------------ F( 2, 82) = 4118.37
Model | .546954436 2 .273477218 Prob > F = 0.0000
Residual | .005445151 82 .000066404 R-squared = 0.9901
-------------+------------------------------ Adj R-squared = 0.9899
Total | .552399587 84 .006576186 Root MSE = .00815
------------------------------------------------------------------------------
yw | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x2w | .1737642 .0202111 8.60 0.000 .1335579 .2139706
x0w | 40.48373 .9714293 41.67 0.000 38.55125 42.41622
_cons | .0055482 .0012883 4.31 0.000 .0029854 .0081109
------------------------------------------------------------------------------
Retrieve the residuals (again, you will have to use a different name, e.g. ew) and plot them against x1
and x2 separately. Do you think that you have now solved the heteroscedasticity problem?
. predict ew,resid
. scatter ew x1, yline(0)
. scatter ew x2, yline(0)
[Figure: scatter plots of the residuals ew against x1 (Per capita income in US Dollars) and against x2 (Index of access to healthcare), each with a horizontal reference line at zero.]
- Unfortunately, we still haven’t solved the problem, as the first graph
indicates the presence of heteroscedasticity.
14. Perform a heteroscedasticity test:
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of yw
chi2(1) = 30.52
Prob > chi2 = 0.0000
What do you conclude?
H0: No heteroscedasticity (or disturbance terms are homoscedastic)
H1: Evidence for heteroscedasticity
We can see that we have not solved the heteroscedasticity problem, since we
reject the null hypothesis of homoscedasticity at all conventional levels of
significance.
15. Finally, deal with the heteroscedasticity by using robust estimation:
regress y x1 x2, robust
Linear regression Number of obs = 85
F( 2, 82) = 251.81
Prob > F = 0.0000
R-squared = 0.7741
Root MSE = 5.0745
------------------------------------------------------------------------------
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .000542 .0000952 5.70 0.000 .0003527 .0007313
x2 | .2833033 .0261322 10.84 0.000 .2313181 .3352884
_cons | 39.43803 1.823039 21.63 0.000 35.81142 43.06463
------------------------------------------------------------------------------
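(In newer versions of Stata the same estimator is requested with the vce(robust) option:
regress y x1 x2, vce(robust).)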
Note the differences in your results (comparing them with your original regression results), particularly
the sizes of the coefficient standard errors. Are the changes in these as expected?
- The coefficients are exactly the same as in the original regression; only the
standard errors change. For the “main culprit” variable (x1), the robust standard
error (.0000952) is in fact somewhat smaller than the conventional one (.000122).
This is not unexpected: under heteroscedasticity the OLS coefficients remain
unbiased (though OLS is less efficient than GLS), but the conventional OLS
standard errors are invalid; robust estimation leaves the coefficients unchanged
and replaces those standard errors with valid ones, which can turn out either
larger or smaller than the conventional estimates.