Panel Cookbook
Spring 2004
REGRESSION (Revision)
Cross section: $i = 1,\dots,N$ (NON-ORDERED)
Individuals, firms, countries etc.
Time series: $t = 1,\dots,T$ (ORDERED)
Variables:
$y_i$ ($y_t$): DEPENDENT (Endogenous)
$x_{ki}$ ($x_{kt}$): INDEPENDENT (Exogenous), $k = 1,\dots,K$
MODEL
$$y_i = \alpha + \sum_{j=1}^{K} \beta_j x_{ji} + e_i$$
$\alpha$ and $\beta$ are parameters. $e$ is a stochastic error.
NOTE: The model can be written in matrix form $y = X\beta + e$, where $y = (y_1,\dots,y_N)'$, $\beta = (\beta_1,\dots,\beta_K)'$, etc.
The simple regression model
$$y_i = \alpha + \beta x_i + e_i$$
is used as an example in this course. We always write
"K" as the number of exogenous variables, however
ASSUMPTIONS
1) Correct model: $\mathrm{E}(e_i) = 0$
2) Exogeneity: $\mathrm{Cor}(x_i, e_i) = 0$
3) Homoscedasticity: $\mathrm{Var}(e_i) = \sigma^2$, constant
4) Serial independence: $\mathrm{Cor}(e_i, e_j) = 0,\ i \neq j$
5) Normality: $e_i$ is Normal
6) No incidental parameters (K does not grow with N)
(1), (2) and (6) are needed for CONSISTENCY (i.e.,
in large samples OLS parameter estimates will be
correct "on the average")
(3) and (4) are needed for EFFICIENCY (i.e., in
large samples OLS yields "best" estimates,
significance tests are correct, etc.)
(5) is needed for small sample properties
Problem when $\mathrm{Cor}(x, e) \neq 0$
[Figure: y plotted against X, showing the true line and the observations with positive and negative errors.]
FORMULAE for OLS (Ordinary Least Squares)
$$\hat\beta = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}, \qquad \hat\alpha = \bar y - \hat\beta \bar x$$
Residual: $\hat e_i = y_i - \hat\alpha - \hat\beta x_i$
Error variance: $\hat\sigma^2 = \frac{1}{\nu}\sum_i \hat e_i^2$, where $\nu = N - K - 1$, the degrees of freedom
Variance: $\mathrm{Var}(\hat\beta) = \dfrac{\hat\sigma^2}{\sum_i (x_i - \bar x)^2}$
Standard error: $\mathrm{se}(\hat\beta) = \sqrt{\dfrac{\hat\sigma^2}{\sum_i (x_i - \bar x)^2}}$
t-value: $t(\hat\beta) = \dfrac{\hat\beta}{\mathrm{se}(\hat\beta)}$
MATRIX FORMULAE
$$\hat\beta = (X'X)^{-1}X'y, \qquad \mathrm{Var}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}, \text{ etc.}$$
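As a concrete illustration of the formulae above, here is a minimal NumPy sketch of OLS in the simple regression; the variable names and the simulated data are mine, not part of the course material.

import numpy as np

rng = np.random.default_rng(0)
N = 200
x = rng.normal(size=N)
y = 1.0 + 0.5 * x + rng.normal(size=N)      # alpha = 1, beta = 0.5

K = 1                                        # one exogenous variable
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
alpha_hat = y.mean() - beta_hat * x.mean()

e_hat = y - alpha_hat - beta_hat * x         # residuals
nu = N - K - 1                               # degrees of freedom
sigma2_hat = np.sum(e_hat**2) / nu           # error variance
se_beta = np.sqrt(sigma2_hat / np.sum((x - x.mean())**2))
t_beta = beta_hat / se_beta
print(beta_hat, se_beta, t_beta)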
PANEL DATA MODELS
$$\{y_{it}, x_{it}\}, \qquad i = 1,\dots,N,\ t = 1,\dots,T$$
THE POOLED MODEL
$$y_{it} = \alpha + \beta x_{it} + e_{it}$$
[Figure: pooled scatter of y against X with a single fitted line.]
Here we are NOT using any Panel information.
The data are treated as if there were only a single index.
TRADITIONAL PANEL MODEL
$$y_{it} = \alpha_i + \beta x_{it} + e_{it}$$
[Figure: y against X with parallel fitted lines for individuals 1, 2 and 3 (common slope, different intercepts).]
The constant terms, $\alpha_i$, vary from individual to individual.
This is called INDIVIDUAL (UNOBSERVED) HETEROGENEITY
The slopes are, however, the same for all individuals.
In both the Pooled and Panel Models we assume that
the errors are homoscedastic and serially
independent both within and between individuals
$$\mathrm{Var}(e_{it}) = \sigma^2$$
$$\mathrm{Cor}(e_{it}, e_{js}) = 0 \text{ when } i \neq j \text{ and/or } t \neq s$$
SUR MODEL
SEEMINGLY UNRELATED REGRESSIONS
$$y_{it} = \alpha_i + \beta_i x_{it} + e_{it}$$
[Figure: y against X with separate fitted lines for individuals 1, 2 and 3 (different intercepts and slopes).]
The constant terms, $\alpha_i$, and slopes, $\beta_i$, vary from individual to individual.
In SUR models the errors are allowed to be contemporaneously correlated and heteroscedastic between individuals. We still assume serial independence as well as homoscedasticity within individuals
$$\mathrm{Var}(e_{it}) = \sigma_i^2$$
$$\mathrm{Cor}(e_{it}, e_{jt}) = \sigma_{ij}$$
$$\mathrm{Cor}(e_{it}, e_{js}) = 0 \text{ when } t \neq s$$
TWO COMMON SITUATIONS
1) There are a LARGE number of independent
individuals observed for a FEW time periods.
$N \gg T$
N is often in the range 500 - 20,000, while T lies
between 2 and 10. In this case it is not possible to
estimate different individual slopes for all the
exogenous variables.
The PANEL DATA MODEL is most appropriate.
2) There are some MEDIUM length time series for
RELATIVELY FEW, possibly dependent,
equations (countries, firms, sectors etc)
$T > N$
T is usually in the range 30 - 150, while N often
lies between 2 and 15.
In this case the SUR MODEL is appropriate.
Efficient (SUR) estimation is used when $T - K \ge N$.
Equation-by-equation OLS is used if $T - K < N$.
Panel models are MORE general than Pooled models
but LESS general than SUR models.
FIXED EFFECTS MODELS
Here we treat the individual heterogeneity as N
parameters that are to be estimated
$$y_{it} = \alpha_i + \beta x_{it} + e_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T$$
N is large (and can often be increased). T is small
and fixed.
WHY CAN'T WE USE OLS?
The individual heterogeneity can be considered as N
dummy variables. A regression with $N + K$
variables (so called Least Squares Dummy Variables
(LSDV) regression) must therefore be estimated.
There are two problems with LSDV regression.
1) There are INCIDENTAL PARAMETERS
The number of $\alpha_i$ grows as N increases. The usual proof of consistency therefore does not hold for LSDV.
2) Inverting an $(N+K) \times (N+K)$ matrix can be impossible if N is very large. Even when possible it can be impracticable and/or inaccurate.
WE NEED A "TRICK" TO REMOVE
THE INCIDENTAL PARAMETERS!
The original model is
$$y_{it} = \alpha_i + \beta x_{it} + e_{it} \qquad (1)$$
Averaging over the T observations for each individual yields
$$\bar y_{i.} = \alpha_i + \beta \bar x_{i.} + \bar e_{i.} \qquad (2)$$
where the "dot" notation is simply
$$\bar y_{i.} = \frac{1}{T}\sum_t y_{it}, \text{ etc.}$$
Subtracting (2) from (1) gives
$$(y_{it} - \bar y_{i.}) = \beta (x_{it} - \bar x_{i.}) + (e_{it} - \bar e_{i.}) \qquad (3)$$
This is called the WITHIN REGRESSION. There are no incidental parameters and the errors still satisfy the usual assumptions. We can therefore use LS on (3) to obtain consistent estimates.
The Within Regression Estimates
To simplify the notation we define
$$\tilde y_{it} = y_{it} - \bar y_{i.}, \qquad \tilde x_{it} = x_{it} - \bar x_{i.}, \text{ etc.}$$
The within regression can thus be written
$$\tilde y_{it} = \beta \tilde x_{it} + \tilde e_{it}$$
The estimates can thus be written
$$\hat\beta_w = \frac{\sum_{it} \tilde x_{it} \tilde y_{it}}{\sum_{it} \tilde x_{it}^2} = \frac{\sum_{it} (x_{it} - \bar x_{i.})(y_{it} - \bar y_{i.})}{\sum_{it} (x_{it} - \bar x_{i.})^2}$$
and the individual effects can be estimated as
$$\hat\alpha_{w,i} = \bar y_{i.} - \hat\beta_w \bar x_{i.}$$
PROPERTIES OF THE WITHIN (FE)
ESTIMATES
$\hat\beta_w$ is consistent if either N or T becomes large.
$\hat\alpha_{w,i}$ is only consistent when T becomes large.
The number of degrees of freedom must be adjusted:
Degrees of freedom = #obs − #pars, i.e.
$$\nu = NT - N - K = N(T-1) - K$$
Usual OLS programs, that are not explicitly
designed for panel data, assume that the degrees of
freedom are $NT - K$. Their standard errors, test
statistics and P-values must therefore be
corrected.
The parameter estimates from LSDV are the same
as from the within regression!
This is NOT a general result (incidental para-
meters do cause inconsistencies in many models)
THE FIXED EFFECTS MODEL:
A SUMMARY
1) Calculate the within averages: $\bar y_{i.}$ and $\bar x_{i.}$
2) Calculate the differences from the within averages: $\tilde y_{it} = y_{it} - \bar y_{i.}$ and $\tilde x_{it} = x_{it} - \bar x_{i.}$
3) Regress $\tilde y_{it}$ on $\tilde x_{it}$ to obtain $\hat\beta_w$ and $\mathrm{se}(\hat\beta_w)$
4) Estimate the individual effects (if required): $\hat\alpha_{w,i} = \bar y_{i.} - \hat\beta_w \bar x_{i.}$
5) If the regression has been performed with an ordinary least squares program, then the degrees of freedom etc. must be adjusted:
$$\nu_a = \nu_u - N$$
$$\mathrm{se}_a(\hat\beta_w) = \sqrt{\frac{\nu_u}{\nu_a}}\;\mathrm{se}_u(\hat\beta_w), \qquad t_a(\hat\beta_w) = \sqrt{\frac{\nu_a}{\nu_u}}\;t_u(\hat\beta_w)$$
which is distributed $t_{\nu_a}$ under H$_0$.
"a" denotes ADJUSTED and "u" UNADJUSTED
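A minimal sketch of steps 1)-5), assuming a balanced panel stored as (N, T) NumPy arrays; the array names and simulated data are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
N, T, K = 100, 5, 1
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + alpha[:, None]      # x correlated with alpha_i
y = alpha[:, None] + 0.5 * x + rng.normal(size=(N, T))

# 1)-2) within averages and differences
y_til = y - y.mean(axis=1, keepdims=True)
x_til = x - x.mean(axis=1, keepdims=True)

# 3) within (FE) estimate of beta
beta_w = np.sum(x_til * y_til) / np.sum(x_til**2)

# 4) individual effects
alpha_w = y.mean(axis=1) - beta_w * x.mean(axis=1)

# 5) degrees-of-freedom adjusted standard error
e_til = y_til - beta_w * x_til
nu = N * (T - 1) - K                              # = NT - N - K
sigma2 = np.sum(e_til**2) / nu
se_w = np.sqrt(sigma2 / np.sum(x_til**2))
print(beta_w, se_w)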
RANDOM EFFECTS MODELS
In Fixed Effects models:
We aren't interested in the individual effects
We can't estimate them consistently
WHY BOTHER WITH THEM?
The individual effects have an "empirical"
distribution
[Figure: histogram (frequency) of the individual effects $\alpha_i$.]
which has certain characteristics, e.g.
$$\text{average} = \frac{1}{N}\sum_i \alpha_i = \bar\alpha$$
$$\text{variance of } \alpha = \sigma_\alpha^2$$
We can use these definitions to rewrite the panel
data model
$$y_{it} = \bar\alpha + \beta x_{it} + (\alpha_i - \bar\alpha) + e_{it}$$
Defining the new error
$$u_{it} = (\alpha_i - \bar\alpha) + e_{it}$$
we can write
$$y_{it} = \bar\alpha + \beta x_{it} + u_{it}$$
This is the RANDOM EFFECTS MODEL.
This looks almost the same as the POOLED model,
but note two differences
The constant term can be interpreted as the
average individual effect
The error term now has a special form
We can obviously estimate the RE model using OLS to obtain estimates of $\bar\alpha$ and $\beta$.
When is this consistent?
If consistent, is it efficient?
WHEN IS THE RANDOM EFFECTS
MODEL CONSISTENT?
Two conditions must be fulfilled
$$\mathrm{E}(u_{it}) = \mathrm{E}(\alpha_i - \bar\alpha) + \mathrm{E}(e_{it}) = 0$$
$$\mathrm{Cov}(u_{it}, x_{it}) = \mathrm{Cov}(\alpha_i, x_{it}) + \mathrm{Cov}(e_{it}, x_{it}) = 0$$
The first condition is OK as long as the original errors are unbiased.
The second condition needs $x_{it}$ to be independent of $e_{it}$ (which has already been assumed) and of $\alpha_i$.
IS IT REASONABLE TO ASSUME THAT THE
INDIVIDUAL EFFECTS ARE INDEPENDENT OF
THE EXOGENOUS VARIABLES?
EXAMPLE:
$y_{it}$ = # days unemployed in year t
$x_{it}$ = income
$\alpha_i$ = unmeasured individual propensity to be unemployed (depends on such factors as Education, Health Status etc.)
THIS ASSUMPTION MUST BE TESTED!
IS OLS EFFICIENT IN THE RANDOM
EFFECTS MODEL?
Efficient OLS needs homoscedasticity and serial independence in the errors, $u_{it}$.
Remember that $u_{it} = (\alpha_i - \bar\alpha) + e_{it}$. We obtain
$$\mathrm{Var}(u_{it}) = \sigma_\alpha^2 + \sigma_e^2 \qquad \text{(assuming that $\alpha_i$ and $e_{it}$ are independent)}$$
$$\mathrm{Cov}(u_{it}, u_{js}) = 0,\ j \neq i \qquad \text{(can be assumed if all individuals are independent)}$$
$$\mathrm{Cov}(u_{it}, u_{is}) = \sigma_\alpha^2 \neq 0 \qquad \text{(since $\alpha_i$ is the same for all t within the same individual)}$$
The last condition violates the "serial independence" assumption.
OLS is thus INEFFICIENT in the random effects
model, and yields INCORRECT standard errors and
tests.
EFFICIENT ESTIMATION IN THE
RANDOM EFFECTS MODEL
The Random Effects Model can be efficiently
estimated using GLS (Generalised Least Squares)
1) Define
$$\theta = 1 - \frac{\sigma_e}{\sigma_1}, \qquad \text{where } \sigma_1^2 = T\sigma_\alpha^2 + \sigma_e^2$$
2) Calculate the "pseudo within differences"
$$y_{it}^* = y_{it} - \theta \bar y_{i.}, \qquad x_{it}^* = x_{it} - \theta \bar x_{i.}$$
3) Perform an OLS regression on
$$y_{it}^* = \bar\alpha^* + \beta x_{it}^* + u_{it}^*$$
where $\bar\alpha^* = (1-\theta)\bar\alpha$ and $u_{it}^*$ satisfies the LS assumptions
4) The Random Effects estimate of $\beta$ is given by
$$\hat\beta_{re} = \frac{\sum_{it} (x_{it}^* - \bar x^*)(y_{it}^* - \bar y^*)}{\sum_{it} (x_{it}^* - \bar x^*)^2}$$
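A minimal sketch of the GLS steps above for the same (N, T) arrays, taking the variance components as given (they are in fact unknown; see below). The function and variable names are illustrative.

import numpy as np

def re_gls(y, x, sigma_e2, sigma_a2):
    """Random-effects GLS via the pseudo within differences y* and x*."""
    N, T = y.shape
    sigma1_2 = T * sigma_a2 + sigma_e2
    theta = 1.0 - np.sqrt(sigma_e2 / sigma1_2)        # theta = 1 - sigma_e / sigma_1
    y_star = y - theta * y.mean(axis=1, keepdims=True)
    x_star = x - theta * x.mean(axis=1, keepdims=True)
    # OLS of y* on a constant and x*
    X = np.column_stack([np.ones(N * T), x_star.ravel()])
    coef, *_ = np.linalg.lstsq(X, y_star.ravel(), rcond=None)
    return theta, coef[0], coef[1]                    # theta, alpha*, beta_re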
PROBLEM: $\theta$ is not known
Unfortunately $\sigma_e^2$ and $\sigma_\alpha^2$ are unknown.
If the errors u and e (or $\alpha$) were known we could estimate the variances using
$$\sigma_1^2 = \frac{T}{N}\sum_i \bar u_{i.}^2 \qquad (4)$$
$$\sigma_e^2 = \frac{1}{N(T-1)}\sum_{it} (u_{it} - \bar u_{i.})^2 = \frac{1}{N(T-1)}\sum_{it} (e_{it} - \bar e_{i.})^2 \qquad (5)$$
$$\sigma_\alpha^2 = \frac{1}{N-1}\sum_i (\alpha_i - \bar\alpha)^2 \qquad (6)$$
Since u, e and $\alpha$ are unknown there are a number of suggestions for how they can be estimated.
These methods use various residuals instead of the unknown errors:
$\hat u_{ols}$ = RE residuals from the POOLED regression $y_{it} = \alpha + \beta x_{it} + u_{it}$ (#obs = NT)
$\hat u_{b}$ = RE residuals from the BETWEEN regression $\bar y_{i.} = \alpha + \beta \bar x_{i.} + \bar u_{i.}$ (#obs = N)
$\hat e_{w}$ = FE residuals from the WITHIN regression $\tilde y_{it} = \beta \tilde x_{it} + \tilde e_{it}$ (#obs = NT)
$\hat u_{w}$ = RE residuals from the WITHIN regression $= \hat e_{w,it} + (\hat\alpha_{w,i} - \bar{\hat\alpha}_w)$
$\hat u_{re}$ = residuals from the RE regression $y_{it}^* = \bar\alpha^* + \beta x_{it}^* + u_{it}^*$ (#obs = NT)
SOME DIFFERENT METHODS OF ESTIMATING $\theta$
I WALLACE and HUSSAIN
Use $\hat u_{ols}$ instead of u in (4) and (5)
II AMEMIYA
Use $\hat u_{w}$ in (4) and $\hat e_{w}$ in (5)
III SWAMY and ARORA
Use $\hat u_{b}$ in (4) and $\hat e_{w}$ in (5)
IV NERLOVE
Use $\hat\alpha_{w}$ in (6) and $\hat e_{w}$ in (5)
V MAXIMUM LIKELIHOOD
Start with one of the previous methods, estimate the RE parameters and then use $\hat u_{re}$ to calculate a new $\theta$. Iterate.
Different authors suggest different degrees-of-
freedom corrections in the variance formulae. For
example (5) is often calculated as
$$\hat\sigma_e^2 = \frac{1}{N(T-1) - K}\sum_{it} \hat e_{w,it}^2,$$
where we have also used the fact that $\bar{\hat e}_{w,i.} = 0$.
PROPERTIES
Research has established the following:
There is not much difference between I - V when
the Random Effects model is correct.
Only NERLOVE guarantees that $\hat\sigma_\alpha^2 > 0$. Many users of the other methods set $\theta = 1$ (Fixed Effects) if a negative value of $\hat\sigma_\alpha^2$ is obtained.
It is difficult to give any general rules as to which
method to use. SWAMY/ARORA is probably the
most common.
The Random Effects estimates are more efficient than the Fixed Effects estimates when the RE model is correct. They are inconsistent, however, when the RE model is incorrect.
It is important to test which model is correct.
INDIVIDUAL SPECIFIC VARIABLES
In many cases we have some exogenous variables
that vary between individuals, but which do not vary
over time within a given individual (e.g., gender,
race, nationality).
Denote such an individual specific variable as $q_i$.
In a FIXED EFFECTS Model we will thus write
$$y_{it} = \alpha_i + \varphi q_i + \beta x_{it} + e_{it}$$
The term $(\alpha_i + \varphi q_i)$ does not vary over time, and will thus be removed by the within transformation, i.e.,
$$(y_{it} - \bar y_{i.}) = \beta (x_{it} - \bar x_{i.}) + (e_{it} - \bar e_{i.})$$
The parameters of the individual specific variables ($\varphi$) cannot be estimated in the Fixed Effects model (that is, we cannot distinguish between observed and unobserved heterogeneity).
If $q_i$ only varies slightly over time, and only for a few individuals, then $\varphi$ will be estimated with poor precision (for example, Education, Marital Status).
In a RANDOM EFFECTS model we will write
$$y_{it} = \bar\alpha + \varphi q_i + \beta x_{it} + u_{it}$$
in which case $\varphi$ can be estimated (although not when using the NERLOVE method).
For the Random Effects model to be appropriate, however, the observed heterogeneity (q) must be independent of the unobserved heterogeneity ($\alpha$).
The Random Effects model therefore has the added advantage of allowing us to estimate parameters in which we are probably interested.
TESTING
Hypothesis testing is central to statistical inference.
In econometric modelling we often distinguish
between three types of tests
1) SPECIFICATION TESTS
Is the model correct? (e.g., POOLED, RE, FE,
SUR)
2) MISSPECIFICATION TESTS
Are any of the statistical assumptions violated?
(e.g., Serial independence, Homoscedasticity)
3) PARAMETER TESTS
Do the parameters have specified values?
(e.g., is a parameter "significant")
(1) and (2) are really two aspects of the same
question - we are asking "Can the model be
estimated efficiently using Least Squares?". They
should be answered together
(3) can only be addressed after (1) and (2).
PARAMETER TESTS
There are no new problems with panel data models.
The same principles that apply to ordinary regression can be applied here.
The usual way to test hypotheses concerning the
parameters in regression models is to use t-tests (one
parameter) and F-tests (several parameters).
These tests can be calculated in two ways, which give
identical results in linear models. We use the method
of calculation that is easiest.
Sum of Squares Tests
We have
an UNRESTRICTED model, where all the
parameters are estimated, and
a RESTRICTED model, where the parameters
satisfy a number of restrictions.
The null hypothesis (H$_0$) is that the RESTRICTED model is true, while the alternative hypothesis (H$_1$) is that the UNRESTRICTED model holds.
The restrictions are usually of the form that certain
parameters are zero - i.e., some variables are not
important.
If the hypothesis that a single parameter is zero is
rejected, then we say that this parameter is
significant (more correctly: "the parameter is significantly different from zero at the given significance level $\alpha$").
The unrestricted model, containing $p_u$ parameters, is estimated using LS. The residual sum of squares is calculated:
$$URSS = \sum_i \hat e_{u,i}^2$$
The restricted model, which has $p_r$ ($< p_u$) parameters, is also estimated using LS. The residual sum of squares is also calculated here:
$$RRSS = \sum_i \hat e_{r,i}^2$$
Defining
$$\nu_1 = p_u - p_r = \#\text{ restrictions}, \qquad \nu_2 = \#\text{ observations} - p_u$$
we calculate the test statistic
$$F = \frac{(RRSS - URSS)/\nu_1}{URSS/\nu_2} \sim F_{\nu_1,\nu_2} \text{ under } H_0$$
The SS test is derived through a small sample adjustment of the Likelihood Ratio Test.
If $\nu_1 = 1$ then the square root of F is t-distributed under the null.
Standard Error Tests
The t-test for a single parameter can also be written
$$t = \frac{\hat\beta}{\mathrm{se}(\hat\beta)} \sim t_\nu \text{ under } H_0: \beta = 0$$
This is a small sample adjustment of the Wald Test.
The F-test for multiple parameters can also be
derived from the equivalent Wald test, but is now
expressed in matrix terms.
In general
tests for single parameters are easiest to use in
standard error form, while
tests for several parameters are calculated easiest
in sum-of-squares form.
CRITICAL VALUES AND P-VALUES
The traditional way to test a hypothesis is to see
whether the t-statistic is greater than the critical
value for a given significance level
[Figure: density of the t-statistic with 2.5% rejection regions in each tail; the observed t(b) lies inside the acceptance region.]
The null hypothesis is rejected if $|t(\hat\beta)| > c_\alpha$.
In the above diagram $\alpha$ = 5%, $c_\alpha$ = 1.96 and t = 1.51, i.e. the hypothesis is not rejected.
This method of describing test results is not very
informative, however.
The reader is dependent on what significance level
the author considers to be interesting. At best the
author will use the "star" convention (one, two or
three stars to show significance at the 5%, 1% and
0.1% levels)
An alternative, which is gaining more and more
popularity, is to quote P-VALUES
[Figure: density of the t-statistic with the tail areas beyond ±|t(b)| each equal to 6.3%.]
The P-Value is $P(t > |t(\hat\beta)|) + P(t < -|t(\hat\beta)|)$.
The null hypothesis is rejected if the P-value is less than the significance level.
In the above diagram the P-value is 12.6%, which is greater than 5%. The hypothesis would therefore not be rejected at the 5% level.
SPECIFICATION TESTS
So far we have considered four models for panel
data:
SUR: $y_{it} = \alpha_i + \beta_i x_{it} + e_{it}$
FIXED EFFECTS: $\tilde y_{it} = \beta \tilde x_{it} + \tilde e_{it}$
RANDOM EFFECTS: $y_{it}^* = \bar\alpha^* + \beta x_{it}^* + u_{it}^*$
POOLED: $y_{it} = \alpha + \beta x_{it} + e_{it}$
The error terms in these models satisfy the OLS
assumptions IF the respective model is correct (this
is why we have expressed the FE and RE models in
their transformed form).
We will not be considering the SUR model in detail,
since this is not possible to estimate if T is small.
TESTS FOR VARIOUS MODELS
[Diagram: POOLED vs. FIXED EFFECTS is tested with the Chow Test, RANDOM vs. FIXED EFFECTS with the Hausman Test, and POOLED vs. RANDOM EFFECTS with the LM Test.]
A TEST FOR UNOBSERVED
HETEROGENEITY
The CHOW TEST of the POOLED MODEL against the FIXED EFFECTS MODEL.
H$_0$: POOLED MODEL (Restricted)
H$_1$: FIXED EFFECTS MODEL (Unrestricted)
The URSS is calculated using the residuals from the Within Regression ($\hat e_w$) and the RRSS using the residuals from the Pooled regression ($\hat e_{ols}$). The number of parameters is $p_u = N + K$ and $p_r = K + 1$.
The number of observations is NT in both cases.
The Sum-of-Squares test of H$_0$ is thus
$$CHOW = \frac{(RRSS - URSS)/(N-1)}{URSS/(NT - N - K)}$$
which is distributed $F_{N-1,\,NT-N-K}$ under H$_0$.
This test is called a CHOW test because of its similarity to the well known CHOW test for parameter stability.
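A minimal sketch of this Chow test for the one-regressor balanced panel used in the earlier sketches; scipy is used only for the P-value, and all names are illustrative.

import numpy as np
from scipy import stats

def chow_pooled_vs_fe(y, x):
    """Chow test of the POOLED model (H0) against the FIXED EFFECTS model."""
    N, T = y.shape
    K = 1
    # restricted (pooled) residual sum of squares
    Xp = np.column_stack([np.ones(N * T), x.ravel()])
    bp, *_ = np.linalg.lstsq(Xp, y.ravel(), rcond=None)
    rrss = np.sum((y.ravel() - Xp @ bp) ** 2)
    # unrestricted (within) residual sum of squares
    y_til = y - y.mean(axis=1, keepdims=True)
    x_til = x - x.mean(axis=1, keepdims=True)
    beta_w = np.sum(x_til * y_til) / np.sum(x_til ** 2)
    urss = np.sum((y_til - beta_w * x_til) ** 2)
    F = ((rrss - urss) / (N - 1)) / (urss / (N * T - N - K))
    return F, stats.f.sf(F, N - 1, N * T - N - K)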
INDIVIDUAL SPECIFIC VARIABLES
If there are $p_q$ individual specific variables in the
model, then these are INCLUDED in the POOLED
model, but EXCLUDED from the FIXED EFFECTS
model.
This is reasonable, since we want to test for
unobserved heterogeneity, not observed
heterogeneity!
In this case we must use $p_r = K + p_q + 1$ and $\nu_1 = N - 1 - p_q$ in the Chow test.
RANDOM or FIXED EFFECTS?
The HAUSMAN TEST
The Hausman test is a general test procedure which
is used when we want to test the validity of an
assumption that is necessary for efficient estimation.
For the test to work we need two estimation
methods:
METHOD 1, called $\hat\beta_a$, is both consistent and efficient under H$_0$, but is inconsistent under H$_1$.
METHOD 2, called $\hat\beta_b$, is consistent under both H$_0$ and H$_1$, but is inefficient under H$_0$.
If there is only one parameter to be tested, then the
test statistic is very simple
$$h = \frac{(\hat\beta_b - \hat\beta_a)^2}{s_b^2 - s_a^2} \sim \chi_1^2 \text{ under } H_0$$
where $s_a$ and $s_b$ are the standard errors of the parameter estimates.
Although $\sigma_b^2 > \sigma_a^2$ under the null, this relation need not hold in small samples for the standard errors. If $s_b^2 < s_a^2$ the test is not applicable.
If there are J > 1 parameters to be compared, the Hausman test statistic must be expressed in matrix terms and is distributed $\chi_J^2$.
There often exists an "omitted variables" version
of the Hausman test, which has the same
asymptotic properties and which is never negative
The Hausman test for RE vs. FE
In the case of testing for random effects we have the
following situation
H$_0$: Random Effects model [$\mathrm{Cor}(\alpha_i, x_{it}) = 0$]
H$_1$: Fixed Effects model [$\mathrm{Cor}(\alpha_i, x_{it}) \neq 0$]
Our estimates satisfy the Hausman conditions:
$\hat\beta_{re}$ is consistent and efficient under H$_0$, but inconsistent under H$_1$.
$\hat\beta_w$ is consistent under H$_0$ and H$_1$, but inefficient under H$_0$.
The Hausman test can now be calculated in matrix
terms or through an omitted variables procedure
OMITTED VARIABLES VERSION
Define as before $\tilde x_{it} = x_{it} - \bar x_{i.}$ and run the regression
$$y_{it}^* = \bar\alpha^* + \beta x_{it}^* + \varphi \tilde x_{it} + w_{it}$$
The alternative Hausman test is a simple F-test that $\varphi$ is zero.
This is appropriate since
$$H_0: \varphi = 0 \iff H_0: \mathrm{Cor}(\alpha_i, x_{it}) = 0$$
If there are individual specific variables we simply test H$_0$: $\varphi = 0$ in the regression
$$y_{it}^* = \bar\alpha^* + \varphi_q q_i^* + \beta x_{it}^* + \varphi \tilde x_{it} + w_{it}$$
Note now that we must assume that the $q_i$ are independent of the $\alpha_i$ (this is not testable).
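A minimal sketch of the omitted-variables (variable-addition) version: add the within-transformed x to the quasi-demeaned RE regression and F-test its coefficient. Theta is taken as given, and all names are illustrative.

import numpy as np
from scipy import stats

def hausman_omitted_variable(y, x, theta):
    """F-test of phi = 0 in  y* = a* + beta x* + phi x_tilde + w."""
    N, T = y.shape
    y_star = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
    x_star = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
    x_til = (x - x.mean(axis=1, keepdims=True)).ravel()

    def rss(X):
        b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        return np.sum((y_star - X @ b) ** 2)

    ones = np.ones(N * T)
    urss = rss(np.column_stack([ones, x_star, x_til]))   # unrestricted
    rrss = rss(np.column_stack([ones, x_star]))          # restricted (phi = 0)
    nu2 = N * T - 3
    F = (rrss - urss) / (urss / nu2)
    return F, stats.f.sf(F, 1, nu2)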
POOLED or RANDOM EFFECTS?
The BREUSCH-PAGAN LM TEST
The RE model reduces to the POOLED model if the
variance of the individual effects becomes zero. The
hypothesis we wish to test is thus
H$_0$: $\sigma_\alpha^2 = 0$
H$_1$: $\sigma_\alpha^2 > 0$
LM tests are useful when it is easy to estimate the model under the null (here the POOLED model) and more complicated under the alternative (here the RE model).
The Breusch-Pagan statistic is calculated using the OLS residuals from the pooled model ($\hat e = \hat e_{ols}$):
$$LM = \frac{NT}{2(T-1)}\left[\frac{T^2 \sum_i \bar e_{i.}^2}{\sum_{it} e_{it}^2} - 1\right]^2 \sim \chi_1^2 \text{ under } H_0$$
Unfortunately, the Breusch-Pagan test is two-sided against the alternative $\sigma_\alpha^2 \neq 0$, in spite of the fact that we know that variances cannot be negative.
An improvement suggested by HONDA is to use the one-sided test
$$HONDA = \sqrt{\frac{NT}{2(T-1)}}\left[\frac{T^2 \sum_i \bar e_{i.}^2}{\sum_{it} e_{it}^2} - 1\right] \sim N(0,1) \text{ under } H_0$$
A one-sided P-value is calculated: $P(x > HONDA)$.
Another problem is that LM tests often have low
power
Experiments have shown that in many cases it is
better to use the CHOW test for FE against
POOLED even if we suspect that RE is the correct
alternative
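A minimal sketch of the Breusch-Pagan and Honda statistics computed from pooled OLS residuals of the one-regressor panel; the names and layout are illustrative only.

import numpy as np
from scipy import stats

def bp_honda(y, x):
    """Breusch-Pagan and Honda tests of sigma_alpha^2 = 0 from pooled OLS residuals."""
    N, T = y.shape
    X = np.column_stack([np.ones(N * T), x.ravel()])
    b, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    e = (y.ravel() - X @ b).reshape(N, T)
    A = (T ** 2) * np.sum(e.mean(axis=1) ** 2) / np.sum(e ** 2)
    honda = np.sqrt(N * T / (2.0 * (T - 1))) * (A - 1.0)
    lm = honda ** 2                                  # two-sided Breusch-Pagan statistic
    return lm, stats.chi2.sf(lm, 1), honda, stats.norm.sf(honda)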
SUR MODELS
SUR models can easily be tested against FE and
POOLED models using Chow tests, if we assume
homoscedasticity and independence between and
within individuals.
SUR models can be estimated when there is
heteroscedasticity and/or correlation between
individuals. In this case we must adapt the Chow
tests.
Testing SUR against RE always needs generalised
Chow tests.
MISSPECIFICATION TESTS
When T is small it is very difficult to investigate the time series properties of a panel data model. This is
quite possible when T gets larger, however.
Misspecification testing should be performed on the
most general model being considered
SMALL T
Autocorrelation can only be tested with great
difficulty
Heteroscedasticity can be tested, but it is difficult to
distinguish within individual from between
individual differences.
TEST FOR HETEROSCEDASTICITY
The proposed test is the Bickel version of the
Breusch-Pagan test. This tests for both within and
between heteroscedasticity, and is performed in
three steps
1) Estimate the within regression. Obtain the residuals ($\tilde e_{it}$)
2) Calculate the total residual variance
$$s^2 = \frac{1}{NT - N - K}\sum_{it} \tilde e_{it}^2$$
Remembering that $\bar{\tilde e}_{i.} = 0$, calculate the within individual variances
$$s_i^2 = \frac{1}{T-1}\sum_t \tilde e_{it}^2$$
3) Calculate the Bartlett statistic
$$B = \frac{(T-1)\left[N \ln s^2 - \sum_i \ln s_i^2\right]}{1 + (N+1)/\{3N(T-1)\}} \sim \chi_{N-1}^2 \text{ under } H_0$$
Bickel's test can also be used if we suspect heteroscedasticity within individuals.
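A minimal sketch of the three steps above for the one-regressor balanced panel; the function name and layout are illustrative, not LIMDEP output.

import numpy as np
from scipy import stats

def bartlett_between(y, x):
    """Bartlett test for between-individual heteroscedasticity of within residuals."""
    N, T = y.shape
    K = 1
    y_til = y - y.mean(axis=1, keepdims=True)
    x_til = x - x.mean(axis=1, keepdims=True)
    beta_w = np.sum(x_til * y_til) / np.sum(x_til ** 2)
    e = y_til - beta_w * x_til                          # within residuals, zero mean per individual
    s2 = np.sum(e ** 2) / (N * T - N - K)               # total residual variance
    s2_i = np.sum(e ** 2, axis=1) / (T - 1)             # within-individual variances
    B = (T - 1) * (N * np.log(s2) - np.sum(np.log(s2_i)))
    B /= 1.0 + (N + 1) / (3.0 * N * (T - 1))
    return B, stats.chi2.sf(B, N - 1)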
TESTS FOR AUTOCORRELATION
The first order within individual autocorrelation
coefficient is calculated from the within regression
residuals
$$r = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} \tilde e_{it}\,\tilde e_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T} \tilde e_{it}^2}$$
The simplest test is the LM test due to Breusch and Godfrey
$$LM = r\sqrt{\frac{NT^2}{T-1}} \sim N(0,1) \text{ under } H_0$$
The autocorrelation coefficient is known to have a slow convergence to normality, however, so a superior alternative is probably a test due to Fisher
$$z = \sqrt{NT - N - K}\;\frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) \sim N(0,1) \text{ under } H_0$$
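A minimal sketch of both autocorrelation statistics, taking the (N, T) matrix of within residuals as input and assuming one regressor; names are illustrative.

import numpy as np
from scipy import stats

def panel_autocorr_tests(e, K=1):
    """LM and Fisher tests of first-order autocorrelation in within residuals e (N x T)."""
    N, T = e.shape
    r = np.sum(e[:, 1:] * e[:, :-1]) / np.sum(e[:, 1:] ** 2)
    lm = r * np.sqrt(N * T ** 2 / (T - 1.0))                     # Breusch-Godfrey style statistic
    z = np.sqrt(N * T - N - K) * 0.5 * np.log((1 + r) / (1 - r)) # Fisher z transform
    return r, lm, 2 * stats.norm.sf(abs(lm)), z, 2 * stats.norm.sf(abs(z))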
ROBUST STANDARD ERRORS
If we discover (or even suspect) heteroscedasticity or
serial autocorrelation we must decide what to do.
One approach is to try and model these variances
and/or correlations. This can be difficult even for
large T, and is generally impossible for small T.
An alternative approach is to accept the usual
estimates, but to calculate their so called Robust
Standard Errors.
If we only suspect heteroscedasticity then we can
use WHITE'S ROBUST ERRORS.
If we suspect heteroscedasticity and/or within
individual autocorrelation we can use
ARELLANO'S ROBUST ERRORS
WHITE'S method is a standard approach performed
by most econometric software.
The robust variances estimate is given for a fixed
effects model with one exogenous variable as
$$\mathrm{Var}(\hat\beta) = \frac{\sum_{it} \tilde x_{it}^2\, \tilde e_{it}^2}{\left(\sum_{it} \tilde x_{it}^2\right)^2}$$
where the residuals and variables are from the within regression.
In the general case with K exogenous variables the variance-covariance matrix is given by
$$\mathrm{Var}(\hat\beta) = (\tilde X'\tilde X)^{-1}\left(\sum_{it} \tilde e_{it}^2\, \tilde X_{it}'\tilde X_{it}\right)(\tilde X'\tilde X)^{-1}$$
where $\tilde X$ is the $(NT \times K)$ "difference-from-mean" matrix of all the exogenous variables and $\tilde X_{it}$ is the $(1 \times K)$ row vector of variables for a given observation.
ARELLANO'S method is not standard.
For one variable we obtain
$$\mathrm{Var}(\hat\beta) = \frac{\sum_i \left(\sum_t \tilde x_{it}\, \tilde e_{it}\right)^2}{\left(\sum_{it} \tilde x_{it}^2\right)^2}$$
while in the general case we have
$$\mathrm{Var}(\hat\beta) = (\tilde X'\tilde X)^{-1}\left(\sum_i \tilde X_i'\,\tilde e_i \tilde e_i'\,\tilde X_i\right)(\tilde X'\tilde X)^{-1}$$
where $\tilde X_i$ is the $(T \times K)$ "difference-from-mean" matrix of exogenous variables, and $\tilde e_i$ is the $(T \times 1)$ vector of residuals, for the i-th individual.
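A minimal sketch of both robust variances for the one-regressor case, taking the within-transformed x and the within residuals as (N, T) arrays; names are illustrative.

import numpy as np

def robust_se(x_til, e_til):
    """White and Arellano robust standard errors for the within estimate (one regressor)."""
    denom = np.sum(x_til ** 2) ** 2
    var_white = np.sum((x_til ** 2) * (e_til ** 2)) / denom
    var_arellano = np.sum(np.sum(x_til * e_til, axis=1) ** 2) / denom
    return np.sqrt(var_white), np.sqrt(var_arellano)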
STRATEGY
1) Test for Heteroscedasticity and Serial
Correlation in the most general model available
(SUR if possible, FE otherwise)
2) If there is no violation of the assumptions we test
a) RE vs. FE (Hausman)
b) POOLED vs. FE (Chow)
If (b) is not significant: use the POOLED model
If (b) is significant but (a) is not: use the RE model
If both tests are significant: use the FE model
If T is large we can also test the FE model against SUR using a generalised Chow test or a Wald test of $\beta_i = \beta\ \forall i$.
3) If the assumptions are violated
a) For small T estimate the FE model with
Arellano standard errors
b) For medium T use the following strategy in the
FE model. Test against pooled after
i) Adjusting for autocorrelation by making
the model dynamic
ii) Adjusting for heteroscedasticity between
individuals by using weighted least squares
c) For large T estimate the SUR model. Test
against FE and pooled after making the model
dynamic
Estimating SUR with the restriction $\beta_i = \beta\ \forall i$ is sometimes called Park's model.
ONE-WAY PANEL MODEL
We have written the one-way panel model as
$$y_{it} = \alpha_i + \beta x_{it} + e_{it} \qquad (1)$$
This is often rewritten as
$$y_{it} = \mu + \alpha_i + \beta x_{it} + e_{it} \qquad (2a)$$
$$\text{where } \mu = \frac{1}{N}\sum_i \alpha_i \text{ in (1), so that the } \alpha_i \text{ in (2a) satisfy } \sum_i \alpha_i = 0 \qquad (2b)$$
$\mu$ is the AVERAGE individual effect, while $\alpha_i$ is the individual DEVIATION FROM AVERAGE.
(2) seems to be just a complicated way of writing (1).
BUT it has the advantage that it can easily be
extended to two-way models.
TWO-WAY PANEL MODELS
In the one-way model we assume that there exists an
unobserved individual heterogeneity, but that the
model is homogeneous over time.
Is it reasonable to assume that all time heterogeneity
can be captured using observed explanatory
variables?
Assume that the individual and time effects are
additive, i.e. there is no interaction,
[Figure: y against X with four parallel lines for the combinations (i=1,t=1), (i=2,t=1), (i=1,t=2) and (i=2,t=2), illustrating additive individual and time shifts.]
This is the Two-Way Panel Model
The Two-Way Panel Model is written
$$y_{it} = \mu + \alpha_i + \lambda_t + \beta x_{it} + e_{it} \qquad (3a)$$
$$\text{where } \sum_i \alpha_i = 0, \quad \sum_t \lambda_t = 0 \qquad (3b)$$
We can define the individual/time effect as
$$\alpha_{it} = \mu + \alpha_i + \lambda_t \qquad (4)$$
Using the usual "dot" notation we obtain
$$\mu = \frac{1}{NT}\sum_i\sum_t \alpha_{it} \qquad \text{average effect} \qquad (5a)$$
$$\mu + \alpha_i = \frac{1}{T}\sum_t \alpha_{it} \qquad \text{individual effect} \qquad (5b)$$
$$\mu + \lambda_t = \frac{1}{N}\sum_i \alpha_{it} \qquad \text{time effect} \qquad (5c)$$
Note that some programmes report the individual effects as $\mu + \alpha_i$, while others report $\alpha_i$.
Note also that we can substitute (5) into (4) to obtain
$$\alpha_{it} - (\mu + \alpha_i) - (\mu + \lambda_t) + \mu = 0 \qquad (6)$$
THE TWO-WAY MODEL WITH
FIXED EFFECTS
The Two-Way model (3) has incidental parameters
as either N or T go to infinity.
We need a new "within" transformation to remove
these. We can see from (6) how this can be done
$$\tilde y_{it} = y_{it} - \bar y_{i.} - \bar y_{.t} + \bar y_{..}$$
The Two-Way Within Model can thus be written
$$\tilde y_{it} = \beta \tilde x_{it} + \tilde e_{it}$$
and OLS on this regression gives the estimate $\hat\beta_w$.
The average, individual and time effects can now be estimated:
$$\hat\mu_w = \bar y_{..} - \hat\beta_w \bar x_{..}, \qquad \hat\alpha_{w,i} = \bar y_{i.} - \hat\beta_w \bar x_{i.}, \qquad \hat\lambda_{w,t} = \bar y_{.t} - \hat\beta_w \bar x_{.t}$$
$\hat\mu_w$ and $\hat\beta_w$ are consistent as either N or T $\to \infty$.
$\hat\alpha_{w,i}$ is only T-consistent.
$\hat\lambda_{w,t}$ is only N-consistent.
The Two-Way within transformation removes both observed and unobserved heterogeneity, for both individual and time effects.
A dummy for an "oil-shock" or a "flu epidemic" will disappear in a FE estimation.
If T is small then the 2-Way FE model can easily be estimated using a 1-Way program. We write
$$y_{it} = \alpha_i + \beta x_{it} + \sum_{s=1}^{T-1} \lambda_s D_{st} + e_{it}, \qquad (7)$$
where the $D_s$ are dummies for year s. We can simply treat these dummies as explanatory variables.
(Note that $\lambda$ is now defined as the difference from year T, not the difference from average.)
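A minimal sketch of the two-way within transformation for the one-regressor balanced panel (it gives the same $\hat\beta_w$ as the dummy-variable approach in (7)); names are illustrative.

import numpy as np

def twoway_within(y, x):
    """Two-way within (FE) estimate: demean over individuals, time periods and overall."""
    y_til = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()
    x_til = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()
    beta_w = np.sum(x_til * y_til) / np.sum(x_til ** 2)
    mu_w = y.mean() - beta_w * x.mean()
    alpha_w = y.mean(axis=1) - beta_w * x.mean(axis=1)    # estimates of mu + alpha_i
    lam_w = y.mean(axis=0) - beta_w * x.mean(axis=0)      # estimates of mu + lambda_t
    return beta_w, mu_w, alpha_w, lam_w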
ONE and TWO-WAY MODELS with
FIXED and RANDOM EFFECTS
A One-Way model has either fixed or random
effects. Let
Let {α_F}, {α_R}, {λ_F} and {λ_R} denote the One-Way fixed and random models for individual and time effects.
In a Two-Way model both the individual effects and the time effects can be fixed or random. Let
{α_F, λ_F} denote the fully FE model,
{α_R, λ_F} and {α_F, λ_R} denote the mixed FE/RE models, and
{α_R, λ_R} denote the fully RE model.
Estimation of the One-Way and fully FE Two-Way models has been described earlier.
THE FULLY RANDOM EFFECTS
TWO-WAY MODEL
The model can be written
$$y_{it} = \alpha + \beta x_{it} + u_{it}, \qquad \text{with } u_{it} = \alpha_i + \lambda_t + e_{it},$$
where $\alpha$, $\lambda$, $e$ and $x$ are independent.
OLS will be consistent but inefficient. The efficient estimate is obtained by regressing $y^{**}$ on $x^{**}$, where
$$y_{it}^{**} = y_{it} - \theta_1 \bar y_{i.} - \theta_2 \bar y_{.t} + \theta_3 \bar y_{..}$$
and where
$$\theta_1 = 1 - \frac{\sigma_e}{\sigma_1} \quad \text{with } \sigma_1^2 = T\sigma_\alpha^2 + \sigma_e^2, \qquad (8a)$$
$$\theta_2 = 1 - \frac{\sigma_e}{\sigma_2} \quad \text{with } \sigma_2^2 = N\sigma_\lambda^2 + \sigma_e^2, \qquad (8b)$$
$$\theta_3 = \theta_1 + \theta_2 + \frac{\sigma_e}{\sigma_3} - 1 \quad \text{with } \sigma_3^2 = \sigma_1^2 + \sigma_2^2 - \sigma_e^2. \qquad (8c)$$
PROBLEM: The $\theta$'s are unknown.
If the two-way errors u and e were known then
$$\sigma_1^2 = \frac{T}{N-1}\sum_i \bar u_{i.}^2 \qquad (9a)$$
$$\sigma_2^2 = \frac{N}{T-1}\sum_t \bar u_{.t}^2 \qquad (9b)$$
$$\sigma_e^2 = \frac{1}{(N-1)(T-1)}\sum_i\sum_t e_{it}^2 \qquad (9c)$$
Similar alternatives as in the One-Way RE model are available:
WALLACE uses OLS residuals
AMEMIYA uses within residuals
SWAMY/ARORA uses the between-individual and between-time residuals for (9a) and (9b) and within residuals for (9c)
NERLOVE estimates $\sigma_\alpha^2$ and $\sigma_\lambda^2$ directly from the FE model, and uses within residuals for (9c)
+ more complicated alternatives
It is common to adjust the denominators of (9) for degrees of freedom.
Negative estimates of $\sigma_\alpha^2$ and $\sigma_\lambda^2$ are possible for all methods except NERLOVE. A common procedure is to use $\max(\hat\sigma^2, 0)$.
MIXED FE/RE TWO-WAY MODELS
If the number of time periods (or individuals) is
small then a mixed model can be estimated by using
RE on a One-Way model with dummies (as in (7))
Otherwise we proceed as follows
1) Adjust the model for the fixed effects
2) Adjust for the random effects
3) Regress adjusted y on adjusted x (no constant)
Step                          {α_R, λ_F}                                {α_F, λ_R}
1) Within transformation      $\tilde y_{it} = y_{it} - \bar y_{.t}$    $\tilde y_{it} = y_{it} - \bar y_{i.}$
2) RE transformation          $y_{it}^* = \tilde y_{it} - \theta_1 \tilde y_{i.}$    $y_{it}^* = \tilde y_{it} - \theta_2 \tilde y_{.t}$
   Theta estimation           $\theta_1$ from (8a) and (9a)             $\theta_2$ from (8b) and (9b)
3) RE regression (no const.)  $y_{it}^*$ on $x_{it}^*$                  $y_{it}^*$ on $x_{it}^*$
Note that $\tilde y_{i.} = \bar y_{i.} - \bar y_{..}$ and $\tilde y_{.t} = \bar y_{.t} - \bar y_{..}$
PARAMETER TESTS
There are 9 different models when we allow for the
possibility of both individual and time effects
              α fixed          α random         no α
λ fixed       {α_F, λ_F}       {α_R, λ_F}       {λ_F}
λ random      {α_F, λ_R}       {α_R, λ_R}       {λ_R}
no λ          {α_F}            {α_R}            POOLED
We must therefore choose:
The level (2-way, 1-way, pooled).
CHOW tests for FE
LM tests for RE
The type of effects (FE, RE)
HAUSMAN tests for given level
This is a "chicken-egg" problem. But
LM tests have poor power in small samples and
are complicated to adjust.
Chow tests have good power even in RE models
65
TWO-WAY CHOW TESTS
The best power is obtained by always testing against
the unrestricted two-way model
Model                  # Parameters (p)     RSS
IT: {α_F, λ_F}         $N + T + K - 1$      $RSS_{IT}$
I:  {α_F}              $N + K$              $RSS_{I}$
T:  {λ_F}              $T + K$              $RSS_{T}$
0:  POOLED             $K + 1$              $RSS_{0}$
We perform three Chow tests, for m = 0, T, I:
$$CHOW_m = \frac{(RSS_m - RSS_{IT})/(N + T + K - 1 - p_m)}{RSS_{IT}/[(N-1)(T-1) - K]}$$
Reject 0/IT     Reject T/IT     Reject I/IT     Conclusion
(something)     (individual)    (time)
YES             YES             YES             2-way
YES             YES             NO              Individual
YES             NO              YES             Time
NO              NO              NO              Pooled
YES             NO              NO              ?? (2-way)
NO              YES             NO              ? (Individ.)
NO              NO              YES             ? (Time)
NO              YES             YES             ?? (2-way)
TWO-WAY HAUSMAN TESTS
The best power is obtained by always testing against
the fully FE model. The omitted variables variants of the tests are as follows:
1) To test {α_R, λ_R} against {α_F, λ_F}: regress $y^{**}$ on $x^{**}$ and $\tilde x$
2) To test {α_R, λ_F} against {α_F, λ_F}: regress $y^{*}$ on $x^{*}$ and $\tilde x$
3) To test {α_F, λ_R} against {α_F, λ_F}: regress $y^{*}$ on $x^{*}$ and $\tilde x$
Reject (1)      Reject (2)      Reject (3)      Conclusion
(some FE)       (ind. FE)       (time FE)
YES             YES             YES             {α_F, λ_F}
YES             YES             NO              {α_F, λ_R}
YES             NO              YES             {α_R, λ_F}
NO              NO              NO              {α_R, λ_R}
YES             NO              NO              ?? {α_F, λ_F}
NO              YES             NO              ? {α_F, λ_R}
NO              NO              YES             ? {α_R, λ_F}
NO              YES             YES             ?? {α_F, λ_F}
STRATEGY
1) Test for Heteroscedasticity and Serial
Correlation in Two-Way FE model
2) If there is no violation of the assumptions we
a) First choose level with CHOW tests
b) Then decide RE/FE with Hausman tests
3) If the assumptions are violated: follow the strategy given for the one-way case above.
INCOMPLETE PANELS
Panel data studies where all individuals are observed
at each time period are called COMPLETE.
INCOMPLETE surveys are those with missing data.
These can occur for several reasons
1) We can plan our survey so that it is incomplete.
We have DETERMINISTIC missing data
2) The missing data is unplanned, but the selection
rule is independent of the data (observed and
unobserved). We have RANDOMLY missing data.
3) There is a correlation between the selection rule
and the data. There is a SELECTION BIAS
Complete surveys are BALANCED, i.e. each
individual and each time period is observed equally
often (N and T respectively).
Stochastic missing data is UNBALANCED, while
deterministic missing data can be either.
DETERMINISTIC and RANDOM missing values
are methodologically equivalent
UNBALANCED models without selection bias
only cause technical problems
The data for unbalanced panels are written
$$\{y_{it}, x_{it}\} \quad \text{for } i = 1,\dots,N,\ t = 1,\dots,T_i$$
SELECTION BIAS is a serious problem that
needs complicated estimation methods in panel
data models
Missing data is often caused by ATTRITION; the
tendency of individuals to drop out of surveys that
stretch over many periods. We often suspect that
the causes of attrition are correlated with the data.
UNBALANCED PANELS
Assumption: There is no selection bias
ONE-WAY FIXED EFFECTS
The individual means are redefined:
$$\bar y_{i.} = \frac{1}{T_i}\sum_t y_{it}$$
As in the balanced model we regress $\tilde y$ on $\tilde x$.
ONE-WAY RANDOM EFFECTS
The GLS transformation is now
$$y_{it}^* = y_{it} - \theta_i \bar y_{i.}$$
with $\theta_i = 1 - \dfrac{\sigma_e}{\omega_i}$ and $\omega_i^2 = T_i \sigma_\alpha^2 + \sigma_e^2$
The variances can be estimated consistently as
$$\hat\sigma_e^2 = \frac{1}{N(\bar T - 1) - K}\sum_{it} \hat e_{it}^2, \qquad \text{where } \bar T = \frac{1}{N}\sum_i T_i$$
$$\hat\sigma_b^2 = \frac{RSS_b}{N - K} \qquad \text{from regressing } \sqrt{T_i}\,\bar y_{i.} \text{ on } \sqrt{T_i}\,\bar x_{i.}$$
$$\hat\sigma_\alpha^2 = \frac{\hat\sigma_b^2 - \hat\sigma_e^2}{\bar T}$$
The estimates are obtained by regressing $y^*$ on $x^*$.
TESTING Chow and Hausman (omitted variables)
tests as before. LM tests must be adjusted slightly
TWO-WAY MODELS are messy, but not difficult
SELECTION BIAS models are difficult to estimate
(we need numerical integration). Some simple
specification tests exist, however
Hausman-type tests are available if we estimate
the model in the full (unbalanced) sample and a
balanced sub-sample. The two methods should
give the same results if there is no selection bias
Omitted variable tests can be used with such extra variables as
# times the i-th individual is in the sample
a dummy for whether the i-th individual is in the whole sample
a dummy for whether the i-th individual was present in the previous period
ROTATING PANELS
Surveyors are wary of designs where individuals
have to answer questions many times over a long
period of time. This often leads to a large degree of
attrition, which can very well include selection bias
One method of avoiding this is to introduce a
deterministic attrition. By only interviewing each
individual a few times we hope to reduce the
stochastic attrition.
The most common deterministic design is the
method of ROTATING PANELS.
          Period 1    Period 2    Period 3    Period 4
Wave 1       N           N/2
Wave 2                   N/2         N/2
Wave 3                               N/2         N/2
Wave 4                                            N/2
DYNAMIC MODELS
Dynamic models include lagged values of the
endogenous variable on the RHS (they can also
include lagged exogenous variables)
$$y_{it} = \alpha_i + \gamma y_{i,t-1} + \beta x_{it} + e_{it}$$
$e_{it}$ is assumed independent of $x_{it}$ and $y_{i,t-1}$
$e_{it}$ and $y_{it}$ are dependent
$e_{it}$ is therefore correlated with $\bar y_{i.}$ (and hence with the within-transformed lag), so
$\hat\beta_w$ is only T-consistent, not N-consistent.
The standard way of estimating models with
correlation between the errors and the RHS
variables is to use the INSTRUMENTAL
VARIABLES method
INSTRUMENTAL VARIABLES (IV)
Consider a simple linear regression
$$y = \alpha + \beta x + e$$
where $\mathrm{E}(xe) \neq 0$.
A variable z is called an instrumental variable if $\mathrm{E}(ze) = 0$ and $\mathrm{Cov}(x, z) \neq 0$.
The IV estimate of $\beta$ is given by
$$\hat\beta_{IV} = \frac{\sum (y_i - \bar y)(z_i - \bar z)}{\sum (x_i - \bar x)(z_i - \bar z)}$$
In matrix terms $\hat\beta_{IV} = (Z'X)^{-1}Z'y$ if there are as many instruments as RHS variables.
1
( ' ) '
IV
X X X y
-
= , where
BINARY CHOICE MODELS
Explanatory variables x: $1 \times K$
Model: $P(y=1) = F(x'\beta) = F\!\left(\sum_j \beta_j x_j\right)$, where $x_1$ is usually the constant.
Note that y is binomially distributed for given x with $P = F(x'\beta)$:
$$\mathrm{E}(y\,|\,x) = P \quad \text{and} \quad \mathrm{Var}(y\,|\,x) = P(1-P)$$
MARGINAL EFFECTS
The change in the dependent variable (y) for a given
change in an explanatory variable ($x_j$) is called the
MARGINAL EFFECT of that variable.
If $x_j$ is continuous:
$$ME_c = \frac{\partial\, \mathrm{E}(y)}{\partial x_j}$$
If $x_j$ is a dummy:
$$ME_d = \mathrm{E}(y\,|\,x_j = 1) - \mathrm{E}(y\,|\,x_j = 0)$$
In the LINEAR model $ME_c = ME_d = \beta_j$.
In NONLINEAR models
ME is a function of x
$ME_c \neq ME_d$, but often $ME_c \approx ME_d$
We are not always so interested in $\beta$ itself as in the marginal effects.
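A minimal sketch of both marginal effects, using the logit F defined a couple of pages below as the concrete case; the function name and arguments are illustrative.

import numpy as np

def logit_marginal_effects(beta, x_bar, j, is_dummy=False):
    """Marginal effect of variable j in a logit model, evaluated at x_bar (vector incl. constant)."""
    F = lambda z: 1.0 / (1.0 + np.exp(-z))
    if not is_dummy:
        p = F(x_bar @ beta)
        return p * (1 - p) * beta[j]                   # ME_c = f(x'b) * beta_j
    x1, x0 = x_bar.copy(), x_bar.copy()
    x1[j], x0[j] = 1.0, 0.0
    return F(x1 @ beta) - F(x0 @ beta)                 # ME_d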
LINEAR PROBABILITY MODEL
(LPM)
The simplest binary choice model is to assume that F
is linear
$$F(x'\beta) = x'\beta, \quad \text{i.e. } \mathrm{E}(y\,|\,x) = x'\beta$$
$$y_i = x_i'\beta + e_i$$
We can fit a linear regression, treating y as an ordinary variable.
A technical problem is that e is heteroscedastic, since
$$\mathrm{Var}(e_i) = P_i(1 - P_i) = (x_i'\beta)(1 - x_i'\beta) \neq \text{constant}$$
LPM can be estimated using
OLS, with White's robust variance estimates
WLS
ML
A more serious problem is the assumption behind the model.
Plotting y against x in a model with only one explanatory variable yields
[Figure: observations at y = 0 and y = 1 plotted against x, with the fitted OLS line.]
The OLS line estimates $P(y=1) < 0$ for some values of x, and $P(y=1) > 1$ for some others!!
PROBIT and LOGIT MODELS
Two commonly used functions that always lie in the
interval (0,1) are
PROBIT: $F(x'\beta) = \Phi(x'\beta) = \int_{-\infty}^{x'\beta} \phi(t)\,dt$, the standard normal distribution function
LOGIT: $F(x'\beta) = \dfrac{e^{x'\beta}}{1 + e^{x'\beta}}$
The LOGIT function was first proposed as an
approximation to the PROBIT. In most cases they
give very similar results.
LATENT VARIABLE INTERPRETATION
There is an alternative interpretation to these binary
choice models, which is in some ways more attractive
Assume that there is a "true", but unobservable, variable $y^*$, e.g. the propensity to be sick. There is also an observed variable y, the incidence of being sick.
The latent variable is explained in an ordinary linear regression
$$y^* = x'\beta + e, \qquad (1)$$
and the observed variable is given by
$$y = \begin{cases} 0 & y^* < 0 \\ 1 & y^* \geq 0 \end{cases} \qquad (2)$$
PROBIT: e is normally distributed
LOGIT: e is logistically distributed
IDENTIFICATION PROBLEMS
Multiplying (1) by a positive constant:
the sign of y* is unchanged
y is unchanged
$\beta$ is unidentified
However
The sign of $\beta_j$ and the ratios $\beta_j/\beta_k$ are identified.
The marginal effects are also identified.
The probit model is normalised by letting e be standard normally distributed ($\sigma^2 = 1$).
Imposing the logistic distribution normalises the model (the logistic error has variance $\pi^2/3$).
These normalised parameters are related by
$$\beta_{logit} \approx 1.6\,\beta_{probit} \approx 4\,\beta_{LPM}, \text{ except for the constant term, where } \beta_{logit} \approx 1.6\,\beta_{probit} \approx 4\,\beta_{LPM} - 2.$$
ML ESTIMATION
The nonlinear probit and logit models are estimated
using maximum likelihood
$$\text{Likelihood} = \text{Joint probability of the sample} = \prod_{y_i=1} P(y_i = 1) \prod_{y_i=0} P(y_i = 0) = \prod_i [F(x_i'\beta)]^{y_i}\,[1 - F(x_i'\beta)]^{1-y_i}$$
and thus
$$\text{log likelihood} = \sum_i \left\{ y_i \ln F(x_i'\beta) + (1 - y_i)\ln[1 - F(x_i'\beta)] \right\}$$
This is easy to maximise iteratively for LOGIT and PROBIT models.
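A minimal sketch of the iterative maximisation for the logit case, by Newton-Raphson on the log likelihood above; the function name and stopping rule are illustrative, not LIMDEP's routine.

import numpy as np

def logit_ml(y, X, iters=25):
    """Maximum likelihood for the logit model by Newton-Raphson."""
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                         # score of the log likelihood
        H = -(X * (p * (1 - p))[:, None]).T @ X      # Hessian
        beta = beta - np.linalg.solve(H, grad)       # Newton step
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik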
PREDICTIONS
What do we mean by a prediction from a binary choice model? The intuitive $F(x'\hat\beta)$ is a prediction of $P(y=1)$, not of y.
The standard definition is
$$\hat y = \begin{cases} 0 & \hat y^* < 0 \\ 1 & \hat y^* \geq 0 \end{cases}$$
where $\hat y^* = x'\hat\beta$. Note that $\hat y^* \geq 0 \iff F(\hat y^*) \geq 0.5$.
This rule seems reasonable if $P_{obs} \approx 0.5$, where $P_{obs}$ is the observed proportion of ones amongst the y's. It will however lead to nearly all the predictions being zeroes or ones if $P_{obs}$ is small (large).
An alternative definition is therefore
$$\hat y = \begin{cases} 0 & F(\hat y^*) < P_{obs} \\ 1 & F(\hat y^*) \geq P_{obs} \end{cases}$$
The proportion of predicted ones is then close to $P_{obs}$.
MEASURES OF FIT
1) The correlation $\mathrm{Cor}(y, \hat y)$
2) Effron's
$$R^2 = 1 - \frac{\sum (y - \hat y)^2}{\sum (y - \bar y)^2}$$
3) McFadden's
$$R^2 = 1 - \frac{\ln L}{\ln L_0},$$
where L is the likelihood from the estimated model and L$_0$ is from the model with only a constant
4) The proportion of correct predictions
$$\frac{\#\{\hat y_i = y_i\}}{N}$$
The first two measures reduce to the ordinary R$^2$ measure in a linear model.
We can of course replace $\hat y$ with $\hat P = F(\hat y^*)$ in all except (3).
RESULTS IN LIMDEP
LIMDEP includes the following output when using
the PROBIT/LOGIT commands
1) LPM (start values)
2) PROBIT/LOGIT model
3) Measure of fit (4)
4) LogL and LogL0 Measure of fit (3)
One can request the marginal effects
1) $ME_c(\bar x)$; evaluated at the average of the x's
2) $ME_c(\bar x_s)$; evaluated at strata averages
Note that $ME_c(x_i)$ and $ME_d(x_i)$ must be explicitly calculated.
One can also save
1) The predictions $\hat y$
2) The residuals $y - \hat y$
3) The probabilities $F(\hat y^*)$
PANEL DATA LOGIT/PROBIT
A panel data binary choice model can be written
$$y_{it}^* = \alpha_i + \beta x_{it} + e_{it},$$
where the observed variable is given as usual by
$$y_{it} = \begin{cases} 0 & y_{it}^* < 0 \\ 1 & y_{it}^* \geq 0 \end{cases}$$
There are two problems here
1) FIXED EFFECTS: The incidental parameters
cannot be swept away by a simple transformation
of the data
2) RANDOM EFFECTS: Maximising the likelihood
involves numerical integration over T-dimensions
Fixed Effect Panel Logit - Chamberlain's Approach
Chamberlain has shown that the incidental
parameters are removed if the likelihood is
conditioned on
$\sum_t y_{it}$.
The number of observations and regressors available
are substantially reduced in the FE logit model
Individuals that have the same y for all time
periods don't contribute to the likelihood and can
be removed.
As for all fixed effects models we cannot include
individual specific regressors.
As for all unbalanced panels we can remove all
individuals that are only observed once
Other "problems" are
There is no obvious way of estimating the
individual effects or the marginal effects
It would be possible to estimate the marginal effects for the "average" person, i.e., when $\alpha_i = 0$.
LIMDEP doesn't do this, however.
No "conditional ML" Probit is available
Random Effect Panel Probit and Logit
The full RE model is not possible to estimate. The
usual approach is to impose the "equi-correlation"
restriction
$$\mathrm{Cor}(u_{it}, u_{is}) = \rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_e^2}$$
The Probit model also assumes that $e \sim N(0,1)$
The Logit model assumes that e is logistically distributed
Both assume that $\alpha_i$ is normally distributed
The marginal effects are difficult to estimate since $\mathrm{E}(y)$ is now highly nonlinear. Replacing $\alpha_i$ with its expected value (zero) in $\mathrm{E}(y)$ leads to the usual probit/logit formulae, however.
TESTS
Testing FE vs. RE vs. POOLED can be performed
with Hausman tests. In the Probit model a Wald test is also available for RE vs. POOLED ($\rho = 0$).
LRT or F-tests are used for parameter testing.
OTHER LIMDEP MODELS
Multinomial Logit (MNL)
Each individual "chooses" between alternatives $0, 1, \dots, J$. Thus $y_i = j$ if alternative j is chosen.
The explanatory variables $x_{ij}$ are of two types:
$z_{ij}$: the choice specific attributes
$w_i$: the individual specific characteristics
The multinomial logit model is written
$$P(y_i = j) = \frac{e^{z_{ij}'\gamma + w_i'\delta_j}}{\sum_{k=0}^{J} e^{z_{ik}'\gamma + w_i'\delta_k}}$$
where $\delta_0$ is normalised to 0.
Note that the attributes have choice independent parameters, while the characteristics can have choice dependent parameters.
A property of the MNL model is the so called
Independence of Irrelevant Alternatives (IIA). When
choosing between alternatives 1 and 2 it does not
matter if alternative 3 exists or not.
The multinomial Probit model (MNP) does not
impose IIA automatically. Estimating MNP needs
Monte Carlo integration, however.
In LIMDEP we estimate MNL models using
LOGIT if there are no attributes ($z_{ij} = 0$)
NLOGIT if the characteristics have $\delta_j = \delta$ for all j
NLOGIT + choice dummies/interactions for the general model
MNP models can also be estimated in LIMDEP
Ordered Logit/Probit
In these models we assume that there is a strict
ranking between the alternatives (the classic example
is school grades).
The model is given by
1
( ) ( )
i j i j
P y j F y r r
*
-
= = <
where the r's are to be estimated. F is logistic or
normal depending on whether an Ordered Logit or
Ordered Probit is used.
Random Effect versions of the ordered models are
also available in LIMDEP
Count Models
If we are modelling, for example, the number of sick days, it may not seem unreasonable to assume that $y \sim \text{Poisson}(\lambda)$, where $\ln\lambda = x'\beta$.
A problem with a Poisson regression is that it forces
the mean and variance of y (given x) to be equal. An
alternative which allows for "overdispersion" is the
Negative Binomial regression
In many cases the Poisson and Neg.Bin. models
underestimate the number of zeroes. This may be due to the fact that there are two processes at work
1) A binary choice model, which determines if we
report sick
2) A count model, which determines how many days
we are absent if we report sick
Such models are called Zero-Inflated.
LIMDEP estimates Poisson, NegBin, ZIP, ZINB,
Fixed and Random Effect Poisson and NegBin., and
some sample selection count models
Truncated and Censored Models
In binary choice models we only observe, for
example, if consumption occurs. In a censored model
we observe the amount of consumption, but only if it
is non-negative. The model is
$$y_i^* = x_i'\beta + e_i$$
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > L_i \\ L_i & \text{if } y_i^* \leq L_i \end{cases}$$
This model is left censored if $x_i$ is observed for all observations and left truncated if $x_i$ is only observed when $y_i = y_i^*$. Right censoring and truncation are also possible.
also possible.
The limit, L
i
, can be a constant or a variable. If y is
observed consumption, then
i
y
*
is the propensity to
consume and the limit value is zero.
Censored and Truncated models usually assume that
e is normally distributed. In this case the censored
model is usually called a TOBIT model
Random Effects, nested, bivariate and sample
selection forms of the TOBIT model are available in
LIMDEP.
Sample Selection
In many situations (more than we like to think) the
data we have available has not been obtained
randomly from the population of interest.
In addition there may well be a correlation between
the mechanism that determines what data is
observed and the process we are interested in.
For example, physicians may tend to "deselect"
patients considered too ill to take part in a clinical
trial. Too few of these patients will therefore
participate, which will obviously bias our results.
Sample selection models consist of two parts
1) A selection model (probit, logit, etc)
2) An explanatory model (regression, tobit, etc).
These models are very easily estimated in LIMDEP.
Note that the main problem consists of formulating a reasonable selection model!