
ECONOMETRICS II

TUTORIAL III
The third tutorial deals with the estimation of binary dependent variable models by Maximum Likelihood (Logit and Probit models) and by OLS (the linear probability model). We will analyse the interpretation of coefficients and of marginal effects, some goodness-of-fit measures, and we will program a heteroskedasticity test based on the Lagrange Multiplier principle.

To this end we will use the data contained in the benefit.wf1 workfile (the example is taken from Verbeek, chap. 7, which is based on McCall, 1995). The variables refer to 4,877 blue-collar workers who became unemployed between 1982 and 1991. Only 68% of them applied for unemployment benefits, so an interesting question is what determines the choice of whether or not to apply for the benefit as a function of individual characteristics.

The variables in the dataset are:

• y binary variable (dummy), 1 if applied for (and received) UI benefits, 0 otherwise;

• rr replacement rate, ratio between benefit and last wage;

• rr2 replacement rate squared;

• age age of the worker, in years;

• age2 age of the worker squared and divided by 10;

• tenure years of tenure in job lost;

• slack dummy, 1 if job lost due to slack work, 0 otherwise;

• abol dummy, 1 if job lost because position abolished, 0 otherwise;

• seasonal dummy, 1 if job lost because seasonal job ended, 0 otherwise;

• head dummy, 1 if head of household, 0 otherwise;

• married dummy, 1 if married, 0 otherwise;

• dkids dummy, 1 if kids, 0 otherwise;

• dykids dummy, 1 if young kids (0–5 yrs), 0 otherwise;

• smsa dummy, 1 if lives in an SMSA, 0 otherwise;

• nwhite dummy, 1 if nonwhite, 0 otherwise;

• yrdispl year of job displacement (1 = 1982, ..., 10 = 1991);

• school12 dummy, 1 if more than 12 years of school, 0 otherwise;

• male dummy, 1 if male, 0 otherwise;

• statemb state maximum benefit level;

• stateur state unemployment rate (in %).

The workfile also contains the variables blucollar (dummy, 1 for all
the observations) and state (51 different values, grouped in 9 categories).

1 MODEL ESTIMATION WITH LPM, LOGIT AND PROBIT
Estimate the model with y as the dependent variable and the other variables (including a constant, in the order above) as regressors. Use first a Linear Probability Model (i.e. Ordinary Least Squares as the estimation method), then a Logit and a Probit model (name the equations you estimate lpm, logistic and probit).
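Outside EViews, the Maximum Likelihood estimation behind the Logit model can be sketched in a few lines of Python using Newton-Raphson iterations on the log-likelihood. The data below are simulated for illustration only, not the benefit.wf1 data:

```python
# Sketch of ML estimation of a logit model via Newton-Raphson.
# The data-generating process below is invented for the example.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])      # constant + one regressor
beta_true = np.array([0.5, 1.0])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

beta = np.zeros(2)
for _ in range(25):                        # Newton-Raphson iterations
    F = 1.0 / (1.0 + np.exp(-X @ beta))   # logistic CDF at the index
    grad = X.T @ (y - F)                   # score vector
    W = F * (1.0 - F)                      # logistic density at the index
    H = -(X * W[:, None]).T @ X            # Hessian (negative definite)
    step = np.linalg.solve(H, grad)
    beta = beta - step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)   # should be close to the true (0.5, 1.0)
```

The log-likelihood of the logit model is globally concave, so Newton-Raphson converges quickly from any starting point; EViews' convergence report in the output corresponds to this iterative process.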

To estimate a model with a binary dependent variable you first have to select:

BINARY - Binary choice (logit, probit, extreme value)

among the estimation methods of the equation, and then select the specific model (Probit, Logit, Extreme value).

Notice how the logit and probit outputs contain:

1) Information on the convergence process and on the computation method of the coefficients' variance/covariance matrix;

2) Some statistics specific to these models:

Log likelihood
Restr. log likelihood
LR statistic (# df)

Probability(LR stat)

McFadden R-squared

The outputs of the three equations are, respectively:

1.1 Computation of the marginal effects
In a non-linear model the objects of interest are the marginal effects, not the coefficients. In the binary dependent variable case we have:

∂P(yi = 1 | xi)/∂xki = f(x′iβ) βk    (1)
Notice however that the sign of the marginal effect is the same as the sign of the coefficient. Notice also that the marginal effect is individual-specific (it depends on x′i) and that it can also be evaluated at a specific sample point (average, median, minimum, maximum, etc.).

EViews does not directly provide the marginal effects. However, they can be easily computed.

As a first step, estimate the value of x′iβ (the so-called index function) for each observation. It is sufficient to select:

Procs
Forecast (Fitted probability/index)

and then select "Index".

Once the index function is computed for each observation, it is sufficient to multiply the density function evaluated at the index function by the coefficient itself.

Example 1. Let yplogit and ypprobit be the names given to the index functions and logistic and probit the names given to the logit and probit equations; if we want to compute the marginal effect of the variable tenure, the commands to generate the series are, for the probit and logit models respectively:

series marg_ten_p = @dnorm(-ypprobit)*probit.@coef(6)

series marg_ten_l = @dlogistic(-yplogit)*logistic.@coef(6)
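The computation above amounts to evaluating a density at the index and multiplying by the coefficient, as in equation (1). A minimal Python sketch with made-up index values and a made-up coefficient (only the densities and the formula mirror the text):

```python
# Sketch of equation (1): marginal effect = density at the index times b_k.
# The index values and the coefficient below are hypothetical.
import numpy as np

def norm_pdf(z):                      # standard normal density
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def logistic_pdf(z):                  # logistic density (symmetric in z)
    e = np.exp(-z)
    return e / (1.0 + e)**2

index = np.array([-0.3, 0.1, 0.8])    # hypothetical values of x'b
b_tenure = -0.02                      # hypothetical coefficient on tenure

marg_probit = norm_pdf(index) * b_tenure    # one effect per observation
marg_logit = logistic_pdf(index) * b_tenure

# The sign of each marginal effect matches the sign of the coefficient:
print(np.all(marg_probit < 0), np.all(marg_logit < 0))
```

Because both densities are symmetric around zero, evaluating them at -x′iβ (as the EViews commands do) or at x′iβ gives the same result.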


If we want to measure the marginal effect of the variable age, the commands to generate the series are, for the probit and logit models respectively (notice that the variable appears both linearly and as a squared term, and that the squared term is divided by 10):

series marg_age_p = @dnorm(-ypprobit)*(probit.@coef(4) + (2/10)*probit.@coef(5)*age)

series marg_age_l = @dlogistic(-yplogit)*(logistic.@coef(4) + (2/10)*logistic.@coef(5)*age)

Marginal effects can also be evaluated at specific sample points (e.g. the average). For instance, if we want to measure the marginal effect of the replacement ratio variable observation by observation in the probit model:

series marg_rr_p = @dnorm(-ypprobit)*(probit.@coef(2) + 2*probit.@coef(3)*rr)

whereas if we want to measure it at the sample average:

scalar marg_rrm_p = @dnorm(-@mean(ypprobit))*(probit.@coef(2) + 2*probit.@coef(3)*@mean(rr))

Notice that if the regressor is discrete (i.e. it is a dummy) it is more meaningful to refer to discrete changes in probability rather than to marginal effects:

∆P(yi = 1 | xi) = F(x′iβ | xk = 1) − F(x′iβ | xk = 0)
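A short Python illustration of this discrete change for a hypothetical probit, with an invented index value and dummy coefficient, compared against the (inappropriate) marginal-effect approximation:

```python
# Sketch of the discrete change for a dummy in a probit:
# Delta P = F(x'b | d=1) - F(x'b | d=0). All numbers are hypothetical.
import math

def norm_cdf(z):                      # standard normal CDF via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

index_without = 0.2                   # x'b with the dummy set to 0 (made up)
b_dummy = 0.5                         # hypothetical dummy coefficient

delta_p = norm_cdf(index_without + b_dummy) - norm_cdf(index_without)
# Marginal-effect approximation f(x'b)*b, for comparison:
approx = math.exp(-0.5 * index_without**2) / math.sqrt(2 * math.pi) * b_dummy

print(round(delta_p, 4), round(approx, 4))
```

The two numbers generally differ (the density-based approximation is only accurate for small coefficients), which is why the discrete change is the preferred measure for a dummy.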

1.2 Goodness-of-fit measures

For these models, two types of goodness-of-fit measures exist: i) those based on the comparison between the log-likelihood of the model and the log-likelihood of a restricted model with only a constant as regressor, and ii) those based on the correct classification of the observations.

Pseudo-R2 and McFadden-R2. The Pseudo-R2 statistic is:

Pseudo-R2 = 1 − 1 / (1 + 2 × (log L1 − log L0)/n)    (2)

The McFadden-R2 statistic is:

McFadden-R2 = 1 − log L1/log L0 = (log L0 − log L1)/log L0    (3)

EViews shows the McFadden statistic in the output:

Probit: McFadden-R2 = 1 − log L1/log L0 = 1 − (−2874.071)/(−3043.028) = 0.0555    (4)

Logit: McFadden-R2 = 1 − log L1/log L0 = 1 − (−2873.197)/(−3043.028) = 0.0558    (5)

The Pseudo-R2 can instead be easily computed. For instance, the command to use for the probit model is:

scalar pseudor2p = 1 - (1/(1 + 2*(probit.@logl - (-3043.028))/probit.@regobs))    (6)
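Both measures can be reproduced from the log-likelihoods reported in the probit output; a quick Python check of equations (2) and (4):

```python
# Recomputing the goodness-of-fit measures from the probit log-likelihoods
# reported in the EViews output.
logL1 = -2874.071   # unrestricted log-likelihood
logL0 = -3043.028   # restricted (constant-only) log-likelihood
n = 4877            # number of observations

mcfadden = 1 - logL1 / logL0                    # equation (3): gives 0.0555
pseudo = 1 - 1 / (1 + 2 * (logL1 - logL0) / n)  # equation (2)
print(round(mcfadden, 4), round(pseudo, 4))
```

The McFadden value matches the 0.0555 shown in equation (4); the Pseudo-R2 is somewhat larger, as is typical, since the two statistics rescale the likelihood gain differently.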

Statistics computed from predictions. Idea: compare the correct and incorrect predictions. To this end, it is necessary to compute the predictions for yi:

ŷi = 1 if F(x′i β̂ML) > 1/2    (7)
ŷi = 0 if F(x′i β̂ML) ≤ 1/2

which allow us to construct a two-way table according to the observed values yi and the predicted ones ŷi:

             ŷi = 0   ŷi = 1   Tot
    yi = 0    n00      n01      n0
    yi = 1    n10      n11      n1      (8)
    Tot       N0       N1       n
Using (8) it is possible to derive several goodness-of-fit measures. Define

wr1 = (n01 + n10)/n    (9)

as the proportion of incorrect predictions for the general model. The corresponding statistic for the model in which all the parameters are set equal to 0 can be easily computed. In fact, if p̂ML = n1/n > 1/2, then

wr0 = 1 − n1/n    (10)

Instead, if p̂ML = n1/n ≤ 1/2, then

wr0 = n1/n    (11)

The goodness-of-fit measure can then be computed as

Rp2 = 1 − wr1/wr0 = (wr0 − wr1)/wr0    (12)
which shows that it is a measure of the proportional decrease in incorrect classifications when using the estimated model instead of the restricted one (with all the parameters set equal to 0), standardised by the proportion of incorrect classifications in the restricted model.

To obtain the elements for the computation of this statistic it is sufficient to select:

View
Prediction-Expectation Evaluation

specifying a threshold for the estimated probability (0.5).

In the probit case:

Rp2 = 1 − wr1/wr0 = 1 − [(1,311 + 162)/4,877] / [1,542/4,877] = 1 − 1,473/1,542 = 0.045    (13)

whereas in the logit:

Rp2 = 1 − wr1/wr0 = 1 − [(1,300 + 171)/4,877] / [1,542/4,877] = 1 − 1,471/1,542 = 0.046    (14)

The table shown by EViews is, in the logit case:

1.3 LM heteroskedasticity test

Consider the unrestricted model in latent form:

yi* = x′iβ + εi    (15)

εi | xi, zi ∼ NID(0, 1·h(z′iα))  or  ∼ LID(0, (π²/3)·h(z′iα))

yi = 1 if yi* > 0
yi = 0 if yi* ≤ 0

where h is a differentiable function such that h(·) > 0, h′(·) ≠ 0 and h(0) = 1. The restriction we want to test is

α = 0    (16)

The log-likelihood function of the unrestricted model is

log L(θ) = Σi yi log F( x′iβ / [κ h(z′iα)] ) + Σi (1 − yi) log[ 1 − F( x′iβ / [κ h(z′iα)] ) ]    (17)

where θ = (β′, α′)′ and κ = 1 for the Probit model and κ = π²/3 for the Logit model. The first order conditions wrt θ of the corresponding Lagrangean function for the restricted model can be written as

∂Σi log Hi(θ, λ)/∂β |θ̃ML = (1/κ) Σi [ (yi − F(x′iβ̃ML)) / ( F(x′iβ̃ML) [1 − F(x′iβ̃ML)] ) ] f(x′iβ̃ML) xi = 0

∂Σi log Hi(θ, λ)/∂α |θ̃ML = κ Σi [ (yi − F(x′iβ̃ML)) / ( F(x′iβ̃ML) [1 − F(x′iβ̃ML)] ) ] f(x′iβ̃ML) (x′iβ̃ML) zi − λ̃ = 0    (18)

where the terms in square brackets are the generalised residuals of the restricted model. If we neglect irrelevant constants, the S matrix in this case is

S = | ε̃1G x′1    ε̃1G (x′1 β̂ML) z′1 |
    | ε̃2G x′2    ε̃2G (x′2 β̂ML) z′2 |
    | ...        ...              |    (19)
    | ε̃nG x′n    ε̃nG (x′n β̂ML) z′n |

Also in this case the null hypothesis can be tested by using the test statistic

ξLM = n · (b′S′Sb)/(i′i) = nR²    (20)
    = b′S′Sb    (21)

which follows a χ² distribution with J degrees of freedom, and where b is the OLS estimator of an auxiliary regression with a column of 1s as dependent variable and the elements of the S matrix as independent variables.

Example 2. Test whether the variance of the error is a function of age in the probit model.

To save the generalised residuals select:

Procs
Make residual series

and choose Generalized.


Notice that EViews does not report the R² among the statistics of the regression. However, in this case (i.e. with a vector i of ones as dependent variable):

b′S′Sb = i′S(S′S)⁻¹ S′S (S′S)⁻¹ S′i
       = i′S(S′S)⁻¹ S′i
       = n − [i − S(S′S)⁻¹S′i]′ [i − S(S′S)⁻¹S′i]
       = n − RSS    (22)
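The identity in (22) is easy to verify numerically; in the Python sketch below S is a random stand-in matrix, not the generalised-residual matrix built from the actual data:

```python
# Numerical check of identity (22): with a vector of ones as the dependent
# variable of the auxiliary regression, b'S'Sb equals n - RSS.
import numpy as np

rng = np.random.default_rng(1)
n, J = 200, 3
S = rng.normal(size=(n, J))   # random stand-in for the S matrix
i = np.ones(n)                # column of ones as dependent variable

b, *_ = np.linalg.lstsq(S, i, rcond=None)   # OLS of ones on S
rss = np.sum((i - S @ b)**2)                # residual sum of squares

lhs = b @ (S.T @ S) @ b       # b'S'Sb
rhs = n - rss                 # n - RSS
print(np.isclose(lhs, rhs))   # True
```

This is exactly why the etero_stat command below works: the LM statistic can be read off the auxiliary regression as the number of observations minus its residual sum of squares.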

The commands to use are then (save the auxiliary regression as "etero"):

scalar etero_stat = etero.@regobs - etero.@ssr

scalar etero_p = 1 - @cchisq(etero_stat, 1)
