
Exam Solution Guide

Econometrics II
December 2016
Econometrics II, Fall 2016
Department of Economics, University of Copenhagen

Part 1
What is Your Forecast
of GDP Growth?

The Case The goal of this part of the exam is to estimate a univariate time series model
for the growth in nominal GDP for Denmark over the period from 1980 to 2015 and use
the estimated model to forecast the growth rate until 2025.

The Data Graphs of the data and relevant transformations must be shown in the exam.
It must be noted that the level of nominal GDP is clearly non-stationary, but the log growth
rate appears stationary. Moreover, there appear to be level shifts in the (log) growth rate
around 1986-1987 and 2009-2010, while the negative growth rates in 2008-2009 appear to
be extreme events.

Econometric Theory The econometric theory must include the following:

(1) A precise definition and interpretation of the model considered and its proper-
ties. Specifically, a univariate autoregressive (AR) or autoregressive moving average
(ARMA) model must be presented. Furthermore, a precise definition of the out-of-
sample forecasts and the forecast variance must be given.

(2) A precise description of the estimator used, in particular a precise account of the
assumptions used to derive the estimator. Specifically, the method of moments (MM)
or the maximum likelihood (ML) estimators can be used dependent on the model
considered.

(3) A precise account of the necessary assumptions for consistent estimation and valid
inference. This includes a precise definition of the null hypotheses, test statistics,
and asymptotic distributions used to test relevant hypotheses.

(4) The theory must be presented in a logical order and with a consistent and correct
notation.
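As an illustrative sketch of the machinery described above (not part of the exam material), an AR(1) model can be fitted by OLS and used to produce multi-step out-of-sample forecasts together with their forecast variances. All data and parameter values below are simulated assumptions:

```python
# Sketch: AR(1) fit by OLS and multi-step forecasts with forecast variances.
# The DGP (rho, delta, sigma) is an assumption chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
T, rho, delta, sigma = 200, 0.5, 1.0, 0.1
y = np.empty(T)
y[0] = delta / (1 - rho)                     # start at the unconditional mean
for t in range(1, T):
    y[t] = delta + rho * y[t - 1] + sigma * rng.standard_normal()

# OLS on y_t = delta + rho*y_{t-1} + eps_t
X = np.column_stack([np.ones(T - 1), y[:-1]])
delta_hat, rho_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
resid = y[1:] - X @ np.array([delta_hat, rho_hat])
sigma2_hat = resid @ resid / (T - 1 - 2)

# h-step forecast: y_{T+h|T} = delta_hat + rho_hat * y_{T+h-1|T}
# forecast variance: sigma^2 * (1 + rho^2 + ... + rho^{2(h-1)})
forecasts, fvars = [], []
f, fv = y[-1], 0.0
for h in range(1, 11):
    f = delta_hat + rho_hat * f
    fv = sigma2_hat + rho_hat**2 * fv
    forecasts.append(f)
    fvars.append(fv)
```

The forecasts converge to the estimated unconditional mean, and the forecast variance increases monotonically toward the unconditional variance of the process.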

Empirical Results The empirical results must include the following:

(1) A presentation and discussion of the relevant empirical results, so that the reader
is able to understand the steps carried out in the process as well as the conclusions
made.

(2) A description of the model selection process based on a general-to-specific approach,
information criteria, or both.

(3) A discussion of whether the assumptions for consistent estimation and valid inference
are satisfied for the estimated models. Specifically, this includes misspecification
testing, which must be presented and discussed before statistical testing is carried
out.

(4) A clear conclusion to the main question and a discussion of the limitations of the
approach used to reach the conclusion. Specifically, the conclusion regarding the out-of-sample
forecasts and the forecast variance must be presented and the limitations
of the estimated models must be discussed in relation to the forecasts.


Part 2
Money Demand and Interest Rates

The Case The goal of this part of the exam is to use cointegration techniques to test
the empirical validity of a theoretical equilibrium relation between money velocity and
interest rates.

The Data Graphs of the data and relevant transformations must be shown in the exam.
It must be noted that the time series for Danish money velocity and interest rates appear to
be unit root processes with some long-run co-movements indicating cointegration between
them.

Econometric Theory The econometric theory must include the following:

(1) A precise definition and interpretation of the models considered and their proper-
ties. Specifically, an interpretation of cointegration must be presented along with
a presentation of univariate autoregressive (AR) models used to test for unit roots
and a single equation cointegration approach based on the Engle-Granger two-step
procedure or the autoregressive distributed lag (ADL) and error-correction models
(ECM).

(2) A precise description of the estimator used, in particular a precise account of the
assumptions used to derive an estimator.

(3) A precise account of the necessary assumptions for consistent estimation and valid
inference. This includes a precise definition of the null hypotheses, test statistics,
and asymptotic distributions used to test relevant hypotheses.

(4) The theory must be presented in a logical order and with a consistent and correct
notation.
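As an illustrative sketch of the single-equation approach named above (not the exam data), the Engle-Granger two-step procedure can be run on a simulated cointegrated pair. The DGP and variable names are assumptions:

```python
# Sketch of the Engle-Granger two-step procedure on simulated data.
# Step 1: OLS on the long-run relation; Step 2: Dickey-Fuller regression
# on the residuals. The DGP (beta = 0.5, intercept 2.0) is an assumption.
import numpy as np

rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.standard_normal(T))        # I(1) regressor (random walk)
y = 2.0 + 0.5 * x + rng.standard_normal(T)   # cointegrated with x

# Step 1: estimate y_t = a + b*x_t + u_t by OLS (b is superconsistent).
Z = np.column_stack([np.ones(T), x])
a_hat, b_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
u = y - Z @ np.array([a_hat, b_hat])

# Step 2: Dickey-Fuller regression on the residuals, du_t = pi*u_{t-1} + e_t.
du, ul = np.diff(u), u[:-1]
pi_hat = (ul @ du) / (ul @ ul)
e = du - pi_hat * ul
se = np.sqrt(e @ e / (len(du) - 1) / (ul @ ul))
t_stat = pi_hat / se
# t_stat must be compared with Engle-Granger (not standard DF) critical
# values, because the cointegration vector was estimated in step 1.
```

With a genuinely cointegrated pair the residual-based t-statistic is strongly negative, rejecting the null of no cointegration.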

Empirical Results The empirical results must include the following:

(1) A presentation and discussion of the relevant empirical results, so that the reader
is able to understand the steps carried out in the process as well as the conclusions
made.

(2) A description of the model selection process based on a general-to-specific approach,
information criteria, or both.

(3) A discussion of whether the assumptions for consistent estimation and valid inference
are satisfied for the estimated models. Specifically, this includes misspecification
testing, which must be presented and discussed before statistical testing is carried
out.

(4) A clear conclusion to the main question and a discussion of the limitations of the
approach used to reach the conclusion. Specifically, the conclusion regarding cointe-
gration between money velocity and interest rates must be presented and the limi-
tations of the single-equation cointegration approach must be discussed in relation
to the conclusion.


Part 3
Is there Empirical Evidence of a
Taylor Rule with Interest Smoothing
under Different Fed Chairs?

The Case The goal of this part of the exam is to use generalized method of moments
to test if there is empirical evidence of the Federal Reserve following a Taylor rule with
interest rate smoothing under three different chairs.

The Data Graphs of the data and relevant transformations must be shown in the
exam. It must be noted that the time series appear quite persistent and potentially
non-stationary. Moreover, there seem to be some long-run co-movements between the
Federal Funds rate, the inflation rates, and proxies for the output gap.

Econometric Theory The econometric theory must include the following:

(1) A precise definition and interpretation of the model considered and its properties.
Specifically, the generalized method of moments (GMM) estimator used to estimate
the parameters of the economic theory must be presented.

(2) A precise description of the estimator used, in particular a precise account of the
assumptions used to derive an estimator.

(3) A precise account of the necessary assumptions for consistent estimation and valid
inference. This includes a precise definition of the null hypotheses, test statistics,
and asymptotic distributions used to test relevant hypotheses.

(4) The theory must be presented in a logical order and with a consistent and correct
notation.
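As an illustrative sketch (not the exam data), a Taylor rule with interest rate smoothing can be estimated by linear GMM using lagged instruments; in the just-identified case below this reduces to solving the sample moment conditions directly. The DGP, instrument choice, and variable names are assumptions:

```python
# Sketch: just-identified linear GMM for a Taylor rule with smoothing,
#   ffr_t = c + rho*ffr_{t-1} + b_pi*pi_t + b_gap*gap_t + e_t,
# instrumented with variables dated t-1. The DGP is an assumption.
import numpy as np

rng = np.random.default_rng(2)
T = 5000
pi = np.zeros(T); gap = np.zeros(T); ffr = np.zeros(T)
for t in range(1, T):
    pi[t] = 0.7 * pi[t - 1] + rng.standard_normal()     # inflation, AR(1)
    gap[t] = 0.5 * gap[t - 1] + rng.standard_normal()   # output gap, AR(1)
    # smoothing rho = 0.8; long-run responses 1.5 (inflation), 0.5 (gap)
    ffr[t] = 0.8 * ffr[t - 1] + 0.2 * (1.5 * pi[t] + 0.5 * gap[t]) \
             + 0.1 * rng.standard_normal()

# Regressors dated t; instruments dated t-1 (valid: the policy shock is iid).
X = np.column_stack([np.ones(T - 2), ffr[1:-1], pi[2:], gap[2:]])
Z = np.column_stack([np.ones(T - 2), ffr[1:-1], pi[1:-1], gap[1:-1]])
y = ffr[2:]

# Just-identified GMM: solve the sample moment conditions Z'(y - X b) = 0.
b = np.linalg.solve(Z.T @ X, Z.T @ y)
rho_hat, bpi_hat, bgap_hat = b[1], b[2], b[3]
lr_pi = bpi_hat / (1 - rho_hat)   # implied long-run inflation response
```

With more instruments than parameters, the analogous two-step GMM estimator minimizes a quadratic form in the moments, and the Hansen J-test of overidentifying restrictions becomes available.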

Empirical Results The empirical results must include the following:

(1) A presentation and discussion of the relevant empirical results, so that the reader
is able to understand the steps carried out in the process as well as the conclusions
made.

(2) A discussion of whether the assumptions for consistent estimation and valid inference
are satisfied for the estimated models.

(3) A robustness analysis of the estimated model.

(4) A conclusion to the main question. Specifically, if there is empirical evidence of a
Taylor rule with interest rate smoothing.

(5) A clear conclusion to the main question and a discussion of the limitations of the
approach used to reach the conclusion. Specifically, the conclusion regarding empirical
evidence of a Taylor rule with interest rate smoothing and the limitations of
the GMM approach used must be discussed in relation to the conclusion drawn.


Part 4
Theoretical Problems

#4.1 Cointegration and Error-Correction


Consider the system for xt = (x1t, x2t, x3t)′, given by,

x1t = ρx1t−1 + δ + ε1t,   (4.1)
x2t = b1x1t−1 + b3x3t−1 + ε2t,   (4.2)
x3t = x3t−1 + ε3t,   (4.3)

for t = 1, 2, ..., T, where |ρ| < 1, the error terms are uncorrelated, independent over
time and normally distributed, i.e. εit ∼ N(0, σi²) for i = 1, 2, 3, and the initial values
x0 = (x10, x20, x30)′ are given.

Question 1
By recursive substitution the moving average representation for x1t is found as,

x1t = ρ^t x10 + (1 + ρ + ρ² + ... + ρ^(t−1))δ + ε1t + ρε1t−1 + ρ²ε1t−2 + ... + ρ^(t−1)ε11.

As |ρ| < 1 the process for x1t is stationary, so x1t ∼ I(0). First, the effect of the initial
value is given by ρ^t x10, which converges to zero as t increases, ρ^t x10 → 0 for t → ∞.
Second, the effect of the constant term is given by (1 + ρ + ρ² + ... + ρ^(t−1))δ, which
converges to µ = δ/(1 − ρ) for t → ∞. Finally, the effect of shocks is temporary as ρ^i → 0
for i → ∞. The expectation of x1t conditional on the initial value, x10, is given by
E[x1t|x10] = ρ^t x10 + (1 + ρ + ρ² + ... + ρ^(t−1))δ, which converges to the unconditional
mean, µ = E[x1t] = δ/(1 − ρ), as t increases.
Next, the moving average representation for x3t is found as,

x3t = x30 + ε3t + ε3t−1 + ... + ε31 = x30 + Σ_{i=1}^{t} ε3i,

which is a unit root process without a drift (or a pure random walk), so x3t ∼ I(1) as
∆x3t = ε3t ∼ I(0). First, the initial value stays in the process for x3t. Second, the shocks
ε3t accumulate into a stochastic trend, Σ_{i=1}^{t} ε3i, and they have permanent effects on x3t
as ∂x3t/∂ε3t−k = ∂x3t+k/∂ε3t = 1 for all k ≥ 0. The expectation of x3t conditional on the
initial value, x30, is given by E[x3t|x30] = x30, which does not converge as t increases.

Finally, the moving average representation for x2t is found as,

x2t = b1(ρ^(t−1)x10 + Σ_{i=0}^{t−2} ρ^i δ + Σ_{i=0}^{t−2} ρ^i ε1t−1−i) + b3(x30 + Σ_{i=1}^{t−1} ε3i) + ε2t
    = (b1ρ^(t−1)x10 + b3x30) + (b1 Σ_{i=0}^{t−2} ρ^i δ + b1 Σ_{i=0}^{t−2} ρ^i ε1t−1−i + ε2t) + b3 Σ_{i=1}^{t−1} ε3i,

which is a unit root process as it depends on x3t−1, so x2t ∼ I(1). The first parenthesis is
the effect of the initial values. The effect of the initial value x10 goes to zero as t increases,
the effect of the initial value x30 stays in the process for x2t, and the initial value x20 has
no impact on x2t. The second parenthesis is a stationary component, which consists of the
effect from the constant term in x1t given by δ, the shocks to x1t given by ε11, ..., ε1t−1,
and the shock ε2t. It can be noted that the past shocks ε21, ..., ε2t−1 do not have an impact
on x2t, so a shock to x2t only has a temporary effect at time t. The final parenthesis
is a stochastic trend given by the accumulated shocks to x3t. The expectation of x2t
conditional on the initial values x10 and x30 is E[x2t|x0] = b1ρ^(t−1)x10 + b3x30 + b1 Σ_{i=0}^{t−2} ρ^i δ.

Question 2
Two or more unit root processes are cointegrated if there exists a linear combination of
the variables which is stationary. Formally, let xt be a unit root process of dimension
k, xt ∼ I(1). The variables in xt are cointegrated if a k-dimensional vector β exists,
β ≠ 0, such that the linear combination β′xt is stationary, β′xt ∼ I(0). Thus, the common
stochastic trends in the unit root processes cancel out in the linear combination β′xt. We
refer to β as a cointegration vector.
The cointegration relation β can be interpreted as defining a long-run equilibrium.
The variables themselves wander arbitrarily up and down due to the presence of stochastic
trends, but they never deviate too much from equilibrium.
Above we showed that both x2t and x3t are unit root processes, i.e. x2t ∼ I(1) and
x3t ∼ I(1), while x1t is a stationary process. Hence, x2t and x3t are cointegrated if
there exists a linear combination between them which is stationary. Consider the linear
combination,

β′xt = (0  1  −b3)(x1t, x2t, x3t)′ = x2t − b3x3t
     = b1x1t−1 + b3x3t−1 + ε2t − b3x3t
     = b1x1t−1 + ε2t − b3∆x3t
     = b1x1t−1 + ε2t − b3ε3t,

where we have plugged in from (4.2) and (4.3). As x1t is a stationary process (and ε2t and
ε3t are stationary by definition) the linear combination β′xt is stationary, β′xt ∼ I(0), and
the variables are cointegrated. The cointegration relation β′xt defines the deviation from
the long-run equilibrium between x2t and x3t.
It can be noted that the cointegration vector β is only unique up to a constant factor.
If β′xt ∼ I(0) then it also holds that B′xt ∼ I(0), where B = a · β for any non-zero
constant a.
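The derivation above can be verified on simulated data: the sketch below (with assumed parameter values) simulates the system (4.1)-(4.3) and checks that x2t − b3x3t equals the stationary expression b1x1t−1 + ε2t − b3ε3t exactly:

```python
# Illustrative check that the linear combination x2_t - b3*x3_t reduces to
# the stationary expression b1*x1_{t-1} + e2_t - b3*e3_t derived above.
# Parameter values (rho, delta, b1, b3) are assumptions.
import numpy as np

rng = np.random.default_rng(4)
T, rho, delta, b1, b3 = 300, 0.5, 0.2, 0.8, 1.5
e1, e2, e3 = (rng.standard_normal(T + 1) for _ in range(3))

x1 = np.zeros(T + 1); x2 = np.zeros(T + 1); x3 = np.zeros(T + 1)
for t in range(1, T + 1):
    x1[t] = rho * x1[t - 1] + delta + e1[t]        # (4.1)
    x2[t] = b1 * x1[t - 1] + b3 * x3[t - 1] + e2[t]  # (4.2)
    x3[t] = x3[t - 1] + e3[t]                      # (4.3)

lincomb = x2[1:] - b3 * x3[1:]                  # beta'x_t
stationary_form = b1 * x1[:-1] + e2[1:] - b3 * e3[1:]
```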

Question 3
The representation theorem by Engle and Granger (1987) states that the variables x2t
and x3t cointegrate if and only if there exists an error correction model for either x2t, x3t,
or both. Formally a variable is error correcting if its first-difference reacts to the lagged
cointegration relation given by β′xt−1, so that whenever the variables are away from their
long-run equilibrium values there are forces pulling them back towards the equilibrium.
As x3t is a pure random walk it is not error correcting. By rewriting (4.2) we get,

x2t = b1x1t−1 + b3x3t−1 + ε2t
∆x2t = −1 · (x2t−1 − b3x3t−1) + b1x1t−1 + ε2t,

which shows that x2t is error-correcting as ∆x2t reacts to the lagged deviation from the
long-run equilibrium given by β′xt−1 = x2t−1 − b3x3t−1, as found above. The coefficient
in front of the parenthesis is the error-correction coefficient (often referred to as α), which
must satisfy −1 ≤ α < 0 for x2t to error-correct. We note that x2t is instantly
error-correcting as α = −1.

Question 4
We rewrite the autoregressive distributed lag (ADL) model in (4.4) into an error correction
model as,

x2t = δ2 + θ1x2t−1 + θ2x2t−2 + φ0x1t + φ1x1t−1 + φ2x1t−2
      + ψ0x3t + ψ1x3t−1 + ψ2x3t−2 + εt   (4.4)
∆x2t = δ2 + (θ1 + θ2 − 1)x2t−1 − θ2∆x2t−1
      + φ0∆x1t + (φ0 + φ1 + φ2)x1t−1 − φ2∆x1t−1
      + ψ0∆x3t + (ψ0 + ψ1 + ψ2)x3t−1 − ψ2∆x3t−1 + εt   (4.5)
∆x2t = δ2 + γ1x1t−1 + γ2x2t−1 + γ3x3t−1
      − θ2∆x2t−1 + φ0∆x1t − φ2∆x1t−1 + ψ0∆x3t − ψ2∆x3t−1 + εt   (4.6)
∆x2t = δ2 + γ1x1t−1 + α(x2t−1 − β3x3t−1)
      − θ2∆x2t−1 + φ0∆x1t − φ2∆x1t−1 + ψ0∆x3t − ψ2∆x3t−1 + εt,   (4.7)

where

γ1 = φ0 + φ1 + φ2,  γ2 = θ1 + θ2 − 1,  γ3 = ψ0 + ψ1 + ψ2,  α = γ2,  β3 = −γ3/γ2.

The representation in (4.6) is the linear error correction model and the model in (4.7) is
the non-linear error correction model. In (4.7), the parenthesis is the cointegration relation
between x2t and x3t, while α is the error-correction coefficient for x2t. It should be noted
that (4.4)-(4.7) are different representations of the same model.
The ADL model in (4.2) is a restricted version of the ADL model in (4.4). By imposing
the restrictions,

δ2 = θ1 = θ2 = φ0 = φ2 = ψ0 = ψ2 = 0,  φ1 = b1,  ψ1 = b3,

the ADL model in (4.4) reduces to (4.2), and likewise the error correction model in (4.7)
reduces to the error correction model for x2t found in Question 3.
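That (4.4) and (4.7) are the same model can be checked by direct computation: for arbitrary data points and parameter values (the numbers below are illustrative assumptions), the ADL right-hand side minus x2t−1 must equal the ECM right-hand side exactly:

```python
# Illustrative algebra check: the ADL form (4.4) and the non-linear ECM
# form (4.7) are the same model, so their right-hand sides agree for any
# data and parameters. All values below are arbitrary assumptions.
d2, th1, th2 = 0.3, 0.6, -0.2
f0, f1, f2 = 0.5, -0.4, 0.1
p0, p1, p2 = 0.2, 0.7, -0.3
# index 0 = date t, 1 = date t-1, 2 = date t-2
x1 = (1.1, -0.4, 0.8)
x2 = (0.5, 1.2, -0.7)
x3 = (2.0, 1.5, 0.9)

# (4.4) right-hand side without the error term
adl = (d2 + th1 * x2[1] + th2 * x2[2] + f0 * x1[0] + f1 * x1[1] + f2 * x1[2]
       + p0 * x3[0] + p1 * x3[1] + p2 * x3[2])

# reparameterization: alpha = gamma2, beta3 = -gamma3/gamma2
g1, g2, g3 = f0 + f1 + f2, th1 + th2 - 1, p0 + p1 + p2
alpha, beta3 = g2, -g3 / g2

# (4.7) right-hand side without the error term
ecm = (d2 + g1 * x1[1] + alpha * (x2[1] - beta3 * x3[1])
       - th2 * (x2[1] - x2[2]) + f0 * (x1[0] - x1[1]) - f2 * (x1[1] - x1[2])
       + p0 * (x3[0] - x3[1]) - p2 * (x3[1] - x3[2]))
```

Since ∆x2t = x2t − x2t−1, the ADL right-hand side minus x2t−1 coincides with the ECM right-hand side.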

Question 5
The PcGive test for no error-correction/no cointegration is a test of the null H0 : γ2 = 0
in the linear ECM in (4.6) against the alternative HA : γ2 < 0. Under the null x2t is
not error-correcting, so the null corresponds to no cointegration. Under the alternative x2t
is error-correcting, so that the variables are cointegrated. The test statistic is given by the
usual t-ratio, t_{γ2=0} = γ̂2/s.e.(γ̂2), but the test statistic asymptotically follows a
Dickey-Fuller-type distribution which depends on the number of I(1) variables and the
deterministic terms in the model.

Question 6
Consider first the test of the null hypothesis H0 : φ1 = 0 against the alternative HA : φ1 ≠ 0
in the ADL model (4.4). As φ1 is the coefficient on the stationary variable x1t−1, the
null hypothesis can be tested with a t-test and standard inference applies in the sense that
the test statistic asymptotically follows a standard normal distribution under the null.
Next, the null hypothesis H0 : ψ2 = 0 is a test on a coefficient on the unit-root process
x3t−2. But since ψ2 appears in (4.6) as the coefficient on a mean-zero stationary variable,
∆x3t−1, we can test the null hypothesis with a t-test in (4.4) and standard inference applies
in the sense that the test statistic asymptotically follows a standard normal distribution
under the null. This is an implication of the general result by Sims, Stock, and Watson
(1990) that the estimated parameter follows a normal distribution asymptotically if the
parameter is a coefficient on a mean-zero stationary variable, possibly after a linear
transformation of the model.

Question 7
There are three major limitations of using the single-equation cointegration approach
based on the ADL model in (4.4).
First, there could in principle exist an error-correction model for all the variables
considered, but we only considered the equation for x2t. Thus, the single-equation ADL
model is, in general, inefficient as the cointegration parameters also enter in the other
equations which are not considered. Only in the special case where only x2t is
error-correcting is it sufficient to consider only the ADL/ECM model for x2t. But the
assumption that only x2t error-corrects can only be empirically tested in a vector
error-correction model for all variables. It can be noted that in the model given by
(4.1)-(4.3), only x2t is error-correcting, so in that case it is sufficient to estimate a
single-equation ADL/ECM model for x2t.
Second, the single-equation ADL/ECM approach implicitly assumes that only one
cointegration relation exists between the variables in the model. In general, p unit root
variables can have up to p − 1 cointegration relations between them. If there are more
cointegration relations the single-equation ADL/ECM model for x2t is not able to separately
identify the cointegration relations.
Third, in the single-equation ADL/ECM model we condition on x1t and x3t, so we
assume that they are predetermined, E[x1tε2t] = 0 and E[x3tε2t] = 0, which rules out
contemporaneous feedback from x2t to x1t and x3t.

#4.2 Forecasting Volatility
Consider the GARCH-X model for yt given the stationary exogenous variable xt,

yt = δ + εt   (4.8)
εt = σtzt   (4.9)
σt² = ϖ + αεt−1² + βσt−1² + φxt−1²,   (4.10)

for t = 1, 2, ..., T, where ϖ > 0, α ≥ 0, and β ≥ 0, the innovation zt is assumed independent
over time and standard normally distributed, i.e. zt ∼ N(0, 1), and the initial values are
given. The exogenous variable xt influences the conditional variance of yt in (4.10). Assume
that xt is given by a stationary first-order autoregressive process,

xt = ρxt−1 + ηt,   (4.11)

where |ρ| < 1 and the error term ηt is independent of zt, independent over time and
normally distributed, i.e. ηt ∼ N(0, σx²).

Question 1
The process for yt is weakly stationary when 0 ≤ α + β < 1, given that |ρ| < 1 as stated
in the exam (so that xt is a mean-zero stationary process).
As E[yt] = δ is constant for all t and cov(yt, yt−k) = 0 for all k ≠ 0, the process yt is
weakly stationary if the unconditional variance of εt, σ², is finite, which is fulfilled if εt²
has a stationary solution.
First, decompose εt² into the conditional expectation and a surprise in the squared
innovation:

εt² = E[εt²|It−1] + vt = E[σt²zt²|It−1] + vt = σt²E[zt²|It−1] + vt = σt² + vt,

where E[vt|It−1] = 0, so vt is uncorrelated over time. The last steps follow as σt² is in the
information set It−1 and zt ∼ N(0, 1). Plugging into (4.10) we get,

σt² = ϖ + αεt−1² + βσt−1² + φxt−1²
εt² − vt = ϖ + αεt−1² + β(εt−1² − vt−1) + φxt−1²
εt² = ϖ + (α + β)εt−1² + φxt−1² + vt − βvt−1.

This is an ARMA(1,1) model extended with a squared stationary variable xt, so the process
for εt² is stationary when |α + β| < 1. As α ≥ 0, β ≥ 0, and φ ≥ 0 are required for a
positive conditional variance, the stationarity condition becomes 0 ≤ α + β < 1.
Given that the stationarity condition holds, we use E[εt²] = E[εt−1²] to derive the
unconditional variance,

σ² = E[εt²] = E[ϖ + (α + β)εt−1² + φxt−1² + vt − βvt−1]
   = ϖ + (α + β)E[εt−1²] + φE[xt−1²] + E[vt] − βE[vt−1]
   = ϖ + (α + β)E[εt²] + φE[xt−1²],

so that

σ² = ϖ/(1 − α − β) + (φ/(1 − α − β))E[xt−1²].

Next, we use (4.11) to find an expression for E[xt−1²], which we note is equal to the
unconditional variance of xt as it is a mean-zero stationary process. By recursive
substitution we find,

E[xt−1²] = E[(ρxt−2 + ηt−1)²] = ... = E[(Σ_{i=0}^{∞} ρ^i ηt−1−i)²] = (1 + ρ² + ρ⁴ + ...)σx² = σx²/(1 − ρ²),

as ηt is assumed to be independent over time and normally distributed with variance σx²
(so that E[ηt²] = σx² and E[ηtηt−k] = 0 for all k ≠ 0).
That gives the unconditional variance of the innovation (and hence yt),

σ² = ϖ/(1 − α − β) + (φ/(1 − α − β))E[xt−1²] = ϖ/(1 − α − β) + φσx²/((1 − α − β)(1 − ρ²)).

It should be noted that the unconditional variance of yt depends on the unconditional
variance of xt.
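The unconditional-variance formula can be checked by simulation. The sketch below (with assumed parameter values satisfying the stationarity conditions) simulates a long GARCH-X sample and compares the sample variance of εt with the formula:

```python
# Illustrative simulation check of the unconditional variance of the
# GARCH-X innovation. Parameter values are assumptions chosen to satisfy
# omega > 0, alpha, beta, phi >= 0, alpha + beta < 1, |rho| < 1.
import numpy as np

rng = np.random.default_rng(6)
T = 200_000
w, a, b, phi, rho, sx = 0.1, 0.05, 0.80, 0.1, 0.5, 1.0

x = np.zeros(T); eps = np.zeros(T)
sig2 = np.full(T, w / (1 - a - b))   # assumed initial value
for t in range(1, T):
    x[t] = rho * x[t - 1] + sx * rng.standard_normal()
    sig2[t] = w + a * eps[t - 1] ** 2 + b * sig2[t - 1] + phi * x[t - 1] ** 2
    eps[t] = np.sqrt(sig2[t]) * rng.standard_normal()

implied = w / (1 - a - b) + phi * sx**2 / ((1 - a - b) * (1 - rho**2))
sample = eps[1000:].var()   # drop a burn-in period
```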

Question 2
To derive the forecasts it is useful, but not necessary, to express the constant term ϖ
in terms of the unconditional variance,

σ² = ϖ/(1 − α − β) + φσx²/((1 − α − β)(1 − ρ²))
ϖ = (1 − α − β)σ² − φσx²/(1 − ρ²).

The conditional variance can then be written as,

σt² = (1 − α − β)σ² − φσx²/(1 − ρ²) + αεt−1² + βσt−1² + φxt−1²
    = σ² + α(εt−1² − σ²) + β(σt−1² − σ²) + φ(xt−1² − σx²/(1 − ρ²)).

The volatility forecast for T + 1 conditional on the information set IT is given by,

σ²_{T+1|T} = E[ε²_{T+1}|IT] = E[σ²_{T+1}|IT]
          = E[σ² + α(εT² − σ²) + β(σT² − σ²) + φ(xT² − σx²/(1 − ρ²)) | IT]
          = σ² + α(E[εT²|IT] − σ²) + β(E[σT²|IT] − σ²) + φ(E[xT²|IT] − σx²/(1 − ρ²))
          = σ² + α(εT² − σ²) + β(σT² − σ²) + φ(xT² − σx²/(1 − ρ²)),

where the last equality holds as εT², σT², and xT² are all in the information set IT.
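The one-step forecast in deviations from σ² is algebraically identical to the direct recursion ϖ + αεT² + βσT² + φxT², which can be confirmed numerically (the parameter and end-of-sample values below are arbitrary assumptions):

```python
# Illustrative check: the one-step volatility forecast written in deviations
# from the unconditional variance equals the direct GARCH-X recursion.
# All parameter and end-of-sample values are arbitrary assumptions.
w, a, b, phi, rho, sx = 0.1, 0.05, 0.80, 0.1, 0.5, 1.0
sigma2 = w / (1 - a - b) + phi * sx**2 / ((1 - a - b) * (1 - rho**2))

eps_T, sig2_T, x_T = 0.7, 1.3, -0.4   # arbitrary end-of-sample values

direct = w + a * eps_T**2 + b * sig2_T + phi * x_T**2
deviation_form = (sigma2 + a * (eps_T**2 - sigma2) + b * (sig2_T - sigma2)
                  + phi * (x_T**2 - sx**2 / (1 - rho**2)))
```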

The volatility forecast for T + 2 conditional on the information set IT is given by,

σ²_{T+2|T} = E[ε²_{T+2}|IT] = E[σ²_{T+2}|IT]
          = E[σ² + α(ε²_{T+1} − σ²) + β(σ²_{T+1} − σ²) + φ(x²_{T+1} − σx²/(1 − ρ²)) | IT]
          = σ² + α(E[ε²_{T+1}|IT] − σ²) + β(E[σ²_{T+1}|IT] − σ²) + φ(E[x²_{T+1}|IT] − σx²/(1 − ρ²))
          = σ² + (α + β)(σ²_{T+1|T} − σ²) + φ(E[x²_{T+1}|IT] − σx²/(1 − ρ²))
          = σ² + (α + β)(σ²_{T+1|T} − σ²) + φ(E[(ρxT + ηT+1)²|IT] − σx²/(1 − ρ²))
          = σ² + (α + β)(σ²_{T+1|T} − σ²) + φ(ρ²E[xT²|IT] + E[η²_{T+1}|IT] + 2ρE[xTηT+1|IT] − σx²/(1 − ρ²))
          = σ² + (α + β)(σ²_{T+1|T} − σ²) + φ(ρ²xT² + σx² − σx²/(1 − ρ²)),

where it has been used that E[η²_{T+1}|IT] = σx², E[xTηT+1|IT] = xTE[ηT+1|IT] = 0, and
that xT is in the information set IT. Compared to the standard GARCH(1,1) model, the
forecasts also depend on the level of xT.

Question 3
The volatility forecast for T + k conditional on the information set IT is given by,

σ²_{T+k|T} = σ² + (α + β)(σ²_{T+k−1|T} − σ²) + φ(E[x²_{T+k−1}|IT] − σx²/(1 − ρ²)),

which converges towards the unconditional variance σ² for k → ∞.


It can be noted that,

E[x²_{T+k−1}|IT] = E[(ρx_{T+k−2} + η_{T+k−1})²|IT]
               = E[(ρ^(k−1)xT + Σ_{i=1}^{k−1} ρ^(k−1−i)η_{T+i})²|IT]
               = E[ρ^(2(k−1))xT² + Σ_{i=1}^{k−1} ρ^(2(k−1−i))η²_{T+i}|IT]
               = ρ^(2(k−1))E[xT²|IT] + Σ_{i=1}^{k−1} ρ^(2(k−1−i))E[η²_{T+i}|IT]
               = ρ^(2(k−1))xT² + Σ_{i=1}^{k−1} ρ^(2(k−1−i))σx²
               → σx²/(1 − ρ²) for k → ∞ as |ρ| < 1,

where all cross-products with conditional expectation of zero have been left out in the
third expression. Hence, E[x²_{T+k−1}|IT] − σx²/(1 − ρ²) → 0, and as 0 ≤ α + β < 1 the
volatility forecast for T + k conditional on the information set IT converges towards the
unconditional variance, σ². The intuition is that the information included in the information
set IT becomes less and less relevant as the forecasting horizon increases and therefore the
volatility forecast converges towards the unconditional variance, which can be interpreted
as the volatility forecast based on the empty information set.
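The k-step recursion and its convergence can be traced numerically. The sketch below (with assumed parameter and end-of-sample values) iterates the forecast recursion and confirms convergence to σ²:

```python
# Illustrative iteration of the k-step volatility forecast recursion;
# it converges to the unconditional variance sigma^2 as the horizon grows.
# All parameter and end-of-sample values are arbitrary assumptions.
w, a, b, phi, rho, sx = 0.1, 0.05, 0.80, 0.1, 0.5, 1.0
sigma2 = w / (1 - a - b) + phi * sx**2 / ((1 - a - b) * (1 - rho**2))

eps_T, sig2_T, x_T = 0.7, 1.3, -0.4   # arbitrary end-of-sample values

# k = 1: uses eps_T^2, sig2_T, and x_T^2 directly
Ex2 = x_T**2                          # E[x_{T+k-1}^2 | I_T] for k = 1
fc = (sigma2 + a * (eps_T**2 - sigma2) + b * (sig2_T - sigma2)
      + phi * (Ex2 - sx**2 / (1 - rho**2)))

for k in range(2, 200):
    Ex2 = rho**2 * Ex2 + sx**2        # E[x_{T+k-1}^2 | I_T] recursion
    fc = sigma2 + (a + b) * (fc - sigma2) + phi * (Ex2 - sx**2 / (1 - rho**2))
```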

Question 4
To estimate the parameters of the model (4.8)-(4.10), θ = (δ, ϖ, α, β, φ)′, by maximum
likelihood conditional on x0, x1, ..., xT, we use the assumption of conditional normality of
εt,

εt = σtzt,  zt ∼ N(0, 1),

or alternatively that εt|It−1 ∼ N(0, σt²).
The likelihood contribution can be written in terms of the observed data as,

Lt(δ, ϖ, α, β, φ | yt, xt, yt−1, xt−1, ..., y1, x1, y0, x0)
  = (1/√(2πσt²)) exp(−εt²/(2σt²))
  = (1/√(2π(ϖ + α(yt−1 − δ)² + βσt−1² + φxt−1²))) exp(−(yt − δ)²/(2(ϖ + α(yt−1 − δ)² + βσt−1² + φxt−1²))),

for t = 1, 2, ..., T, where σt² can be calculated recursively for a given set of parameters θ
and given assumed initial values for σ0² and ε0².
The likelihood function is given by the product of the likelihood contributions over t =
1, ..., T. The maximum likelihood estimator is found by maximizing the (log-)likelihood
function with respect to the parameters θ. As we cannot solve the likelihood equations
analytically, the maximum likelihood estimator is found by numerical optimization.
It can be noted that the maximum likelihood estimator can be based on other
conditional distributions of εt. For example, a fat-tailed distribution can be used, such as a
Student t(v) distribution where the degrees of freedom, v, can be treated as a parameter
that can be estimated from the data.
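The numerical optimization described above can be sketched as follows. The data are simulated, the starting values and the initialization σ0² = var(y) are assumptions, and a derivative-free optimizer stands in for whatever routine a software package would use:

```python
# Sketch of ML estimation of the GARCH-X parameters by numerical
# optimization of the Gaussian log-likelihood. The DGP, starting values,
# and the initialization sigma_0^2 = var(y) are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
T = 1000
delta, w, a, b, phi, rho, sx = 0.0, 0.1, 0.05, 0.8, 0.1, 0.5, 1.0
x = np.zeros(T); eps = np.zeros(T)
sig2 = np.full(T, w / (1 - a - b))
for t in range(1, T):
    x[t] = rho * x[t - 1] + sx * rng.standard_normal()
    sig2[t] = w + a * eps[t - 1] ** 2 + b * sig2[t - 1] + phi * x[t - 1] ** 2
    eps[t] = np.sqrt(sig2[t]) * rng.standard_normal()
y = delta + eps

def negloglik(theta, y, x):
    d, w_, a_, b_, p_ = theta
    if w_ <= 0 or a_ < 0 or b_ < 0 or p_ < 0 or a_ + b_ >= 1:
        return np.inf                # rule out infeasible parameter values
    e = y - d
    s2 = np.var(y)                   # assumed initial value for sigma_0^2
    nll = 0.0
    for t in range(1, len(y)):       # recursive conditional variance
        s2 = w_ + a_ * e[t - 1] ** 2 + b_ * s2 + p_ * x[t - 1] ** 2
        nll += 0.5 * (np.log(2 * np.pi) + np.log(s2) + e[t] ** 2 / s2)
    return nll

start = np.array([0.0, 0.2, 0.10, 0.60, 0.05])
res = minimize(negloglik, start, args=(y, x), method="Nelder-Mead",
               options={"maxiter": 500})
```

In practice one would also report standard errors from the (numerical) Hessian or a sandwich estimator, and check that the estimated α̂ + β̂ satisfies the stationarity condition.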
