
Dynamic Linear Models With Switching

Author(s): R. H. Shumway and D. S. Stoffer


Source: Journal of the American Statistical Association, Sep., 1991, Vol. 86, No. 415, pp. 763-769
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association

Stable URL: https://www.jstor.org/stable/2290410

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to
digitize, preserve and extend access to Journal of the American Statistical Association

This content downloaded from 3.104.43.49 on Tue, 05 Dec 2023 12:21:41 +00:00
All use subject to https://about.jstor.org/terms
Dynamic Linear Models With Switching
R. H. SHUMWAY and D. S. STOFFER*

The problem of modeling change in a vector time series is studied using a dynamic linear model with measurement matrices
that switch according to a time-varying independent random process. We derive filtered estimators for the usual state vectors
and also for the state occupancy probabilities of the underlying nonstationary measurement process. A maximum likelihood
estimation procedure is given that uses a pseudo-expectation-maximization algorithm in the initial stages and nonlinear opti-
mization. We relate the models to those considered previously in the literature and give an application involving the tracking
of multiple targets.

KEY WORDS: Change points; Expectation-maximization algorithm; Nonlinear models; State-space; Target tracking.

1. INTRODUCTION

One way of modeling change in an evolving time series is by assuming that the dynamics of some underlying model change discontinuously at certain undetermined points in time. In this article we will be concerned primarily with modeling change in the dynamic linear model, a general form that includes autoregressive integrated moving average (ARIMA) and classical regression models as special cases.

In a stable dynamic linear model, one assumes that some underlying q x 1 observation vectors y_t are connected to unobserved p x 1 signal vectors of interest x_t through the observation equation

y_t = A_t x_t + v_t   (1)

for the time points t = 1, ..., n, where A_t are q x p measurement matrices that convert the unobserved signal measurements into the data vectors y_t. The vectors v_t are independent zero-mean white-noise vectors with common q x q covariance matrix R. The measurement or design matrices A_t are usually regarded as specified and may be used to model situations involving structured multiple signals or where there are missing observations (see, for example, Shumway 1988, sec. 3.3). The description of the model is completed by noting that the signal process x_t is generated from a starting point x_0 with mean μ and covariance matrix Σ using the state equations

x_t = Φ x_{t-1} + w_t,   (2)

where Φ is a p x p transition matrix that models the evolution of the signal vector x_t through time and w_t is a p x 1 white-noise process, assumed to be independent of v_t, with covariance matrix Q. The parameters of the model are the initial mean and covariance μ and Σ plus the state transition matrix Φ and covariance Q along with the measurement covariance matrix R. The problems of estimating the parameters by maximum likelihood and the signal vector x_t by Kalman filtering and smoothing techniques have been exhaustively treated in the literature [see Shumway (1988) for a review of some of them]. For purposes of maximum likelihood estimation it is convenient to assume that the initial vector x_0 along with the errors v_t and w_t have multivariate normal distributions. It should be recognized that the model defined in (1) and (2) usually is applied with a considerably reduced parameter space. For example, structural models are often applied that represent the observed series as the sum of unobserved trend and seasonal components (see, for example, Kitagawa and Gersch 1984; Harvey and Todd 1983). The two examples considered here also involve specializing (1) and (2) by reducing the number of nonzero parameters considered.

Generalizations of the preceding model to include the possibility of changes occurring over time have been approached by allowing changes in the error covariances (see Harrison and Stevens 1976 or Gordon and Smith 1988, 1990) or by assigning mixture distributions to the observation errors v_t (see Peña and Guttman 1988). Approximations to filtering were derived in all of the articles just cited. An application to monitoring renal transplants was described in Smith and West (1983) and in Gordon and Smith (1990). Changes can also be modeled in the classical regression case by allowing switches in the design matrices, as in Quandt (1972). Applications of the switching approach to modeling changes in econometric time series have been reviewed by Tsurumi (1988). Switching via a stationary Markov chain with independent observations has been developed by Lindgren (1978) and Goldfeld and Quandt (1973). Markov switching for dependent data has been applied by Hamilton (1989) to detect changes between positive and negative growth periods in the economy. Applications to speech recognition have been considered by Juang and Rabiner (1985); hidden Markov models are summarized by Rabiner and Juang (1986). An application of the idea of switching to the tracking of multiple targets has been considered in Bar-Shalom and Tse (1975) and in Bar-Shalom (1978), who obtained approximations to Kalman filtering in terms of weighted averages of the innovations. They

* R. H. Shumway is Professor, Division of Statistics, University of California, Davis, CA 95616. D. S. Stoffer is Associate Professor, Department of Mathematics and Statistics, University of Pittsburgh, Pittsburgh, PA 15260. R. H. Shumway's time was supported in part by the Science and Technology Division, Institute for Defense Analysis. D. S. Stoffer's time was supported in part by National Science Foundation Grant #DMS-9000522 and by the Centers for Disease Control through a cooperative agreement with the Association of Schools of Public Health (ASPH). We are indebted to an anonymous referee who pointed out a flaw in our original attempt at treating the case where the A_t's are Markov dependent and to Gabriel Frankel of the Institute for Defense Analysis and Mohsen Pourahmadi of Northern Illinois University for a number of helpful suggestions.

© 1991 American Statistical Association
Journal of the American Statistical Association
September 1991, Vol. 86, No. 415, Theory and Methods

call their approximations "probabilistic data association filters."

2. DYNAMIC LINEAR MODELS WITH SWITCHING

Our approach is motivated primarily by the problem of tracking a large number of moving targets using a vector y_t of sensors. In this problem one does not know at any given time point which target any given sensor has detected. Hence it is the structure of the measurement matrix A_t in (1) that is changing and not the dynamics of the underlying signal x_t or the noises v_t or w_t. To illustrate, assume a 3 x 1 vector of satellite measurements y_t = (y_t1, y_t2, y_t3)' is observing some combination of tracks or signals x_t' = (x_t1, x_t2, x_t3). For the measurement matrix

A_t = [ 1 0 0
        1 0 0
        1 0 0 ]

in the defining model (1), it is clear that all sensors are observing target x_t1, whereas for the measurement matrix

A_t = [ 0 1 0
        1 0 0
        0 0 1 ],

the first sensor y_t1 observes the second target x_t2, the second sensor y_t2 observes target x_t1, and the third sensor y_t3 observes the third target x_t3. All possible detection configurations will define a set of possible values for A_t, say M_1, M_2, ..., M_m, as a collection of plausible measurement matrices for p = 3 sensors tracking one, two, or three targets.

As a second example of the switching model to be considered here, consider the case where the dynamics of the linear model change suddenly over the history of a given single realization. For example, Lam (1990) has given the following generalization of Hamilton's (1989) model for detecting positive and negative growth periods in the economy. Assume the representation

y_t = z_t + n_t,   (3)

where z_t is an autoregressive series and n_t is a random walk with a drift that switches between two values α_0 and α_0 + α_1. That is,

n_t = n_{t-1} + α_0 + α_1 S_t

with S_t = 0 or 1 depending on whether we are in state 1 or state 2. Suppose, for purposes of illustration, that

z_t = φ_1 z_{t-1} + φ_2 z_{t-2} + w_t

is a second-order autoregressive series with var(w_t) = σ_w². Lam (1990) wrote Equation (3) in differenced form

∇y_t = z_t - z_{t-1} + α_0 + α_1 S_t,   (4)

which we may take as the observation equation (1) with state vector

x_t = (z_t, z_{t-1}, α_0t, α_1t)'

and M_1 = (1, -1, 1, 0) and M_2 = (1, -1, 1, 1) determining the two possible economic conditions. The state equation (2) is of the form

[ z_t     ]   [ φ_1 φ_2 0 0 ] [ z_{t-1}   ]   [ w_t ]
[ z_{t-1} ] = [ 1   0   0 0 ] [ z_{t-2}   ] + [ 0   ]
[ α_0t    ]   [ 0   0   1 0 ] [ α_0,t-1   ]   [ 0   ]
[ α_1t    ]   [ 0   0   0 1 ] [ α_1,t-1   ]   [ 0   ],   (5)

which models the constant drift in terms of two constant processes, α_0t and α_1t. In this case a rising economy might be characterized by drift α_0t + α_1t and a falling economy by α_0t.

To incorporate a reasonable switching structure for the measurement matrix into the dynamic linear model (1) and (2) that is compatible with both the practical situations described above, we assume that the m possible configurations are states in a nonstationary independent process defined by the time-varying probabilities

π_j(t) = Pr{A_t = M_j},   (6)

for j = 1, ..., m and t = 1, 2, ..., n, independent of past measurement matrices A_1, ..., A_{t-1} and of past data y_1, y_2, ..., y_{t-1}. Important information about the current state of the measurement process is given by the filtered probabilities for being in state j, defined as the conditional probabilities

π_j(t | t) = Pr{A_t = M_j | Y_t},   (7)

which also vary as a function of time. We will use the notation Y_s = {y_1, y_2, ..., y_s} (s = 1, 2, ..., n) to denote the space spanned by the observations y_1, y_2, ..., y_s. The filtered probabilities (7) give time-varying estimates of the probability of being in state j given the past and present Y_t. The most important estimators for purposes of tracking the process are the filtered estimators for the state vectors x_t, say

x_t^t = E(x_t | Y_t),   (8)

and the filter covariances defined by

P_t^t = cov(x_t | Y_t),   (9)

since these are the best current estimators of position.

It is clear that if the parameters of the dynamic linear model are known, we would like to be able to compute quickly both an estimator for the smoothed trajectory and its variance given by Equations (8) and (9) and an estimator for the probabilities of each of the current measurement configurations given by Equation (7). In many cases we will not know the parameters of the dynamic linear model, given by the initial means and by the transition and covariance matrices mentioned in the discussion following (1) and (2). In these cases a procedure for estimating such parameters by maximum likelihood will be of interest. A Bayesian direction can also be taken that assigns prior distributions to the unknown parameters (see Gordon and Smith 1988, 1990, or Peña and Guttman 1988). Our approach here will be in terms of classical maximum likelihood estimation using an approximation to the Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin 1977) in combination with a standard nonlinear optimization procedure.
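A stable model of the form (1) and (2) is straightforward to simulate directly from its two defining equations, which is useful for testing any of the procedures that follow. The sketch below (Python with NumPy; the dimensions and parameter values are purely illustrative choices, not taken from the examples in this article) generates one realization of the signal x_t and the observations y_t with a fixed measurement matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and parameters (not from the paper):
# p = 2 state components, q = 2 sensors, n = 50 time points.
p, q, n = 2, 2, 50
Phi = np.array([[1.0, 0.1],
                [0.0, 1.0]])        # state transition matrix (p x p)
A = np.eye(q, p)                    # fixed measurement matrix (q x p)
Q = 0.01 * np.eye(p)                # state noise covariance
R = 0.25 * np.eye(q)                # observation noise covariance
mu, Sigma = np.zeros(p), np.eye(p)  # mean and covariance of x0

x = rng.multivariate_normal(mu, Sigma)   # starting point x0
xs, ys = [], []
for t in range(n):
    # state equation (2), then observation equation (1)
    x = Phi @ x + rng.multivariate_normal(np.zeros(p), Q)
    y = A @ x + rng.multivariate_normal(np.zeros(q), R)
    xs.append(x)
    ys.append(y)

xs, ys = np.array(xs), np.array(ys)
print(xs.shape, ys.shape)
```

Replacing the fixed matrix A by a draw from a finite set of candidate matrices at each time point gives the switching model described above.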
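Because the set of candidate matrices M_1, ..., M_m is finite, the filtered quantities (7), (8), and (9) can be propagated exactly from one time point to the next, anticipating the recursions derived in Section 3. The following single-step sketch (Python with NumPy; the matrices, priors, and values are illustrative and are not the tracking example of Section 5) updates the state estimate and the state-occupancy probabilities:

```python
import numpy as np

def switching_filter_step(x, P, y, Phi, Q, R, Ms, pi_t):
    """One exact recursion of the switching filter: predict the state,
    weight each candidate measurement matrix M_j by its posterior
    probability, and combine the gain-adjusted innovations."""
    xp = Phi @ x              # one-step prediction of the state
    Pp = Phi @ P @ Phi.T + Q  # and of its covariance
    m, q = len(Ms), len(y)
    f = np.empty(m)           # normal densities of y under each M_j
    gains, innovs = [], []
    for j, M in enumerate(Ms):
        S = M @ Pp @ M.T + R  # innovation covariance for state j
        e = y - M @ xp        # innovation for state j
        f[j] = np.exp(-0.5 * e @ np.linalg.solve(S, e)) / \
               np.sqrt((2 * np.pi) ** q * np.linalg.det(S))
        gains.append(Pp @ M.T @ np.linalg.inv(S))
        innovs.append(e)
    w = pi_t * f              # prior probability times density
    pi_filt = w / w.sum()     # filtered state-occupancy probabilities
    x_new = xp + sum(pj * K @ e for pj, K, e in zip(pi_filt, gains, innovs))
    P_new = sum(pj * (np.eye(len(x)) - K @ M) @ Pp
                for pj, K, M in zip(pi_filt, gains, Ms))
    return pi_filt, x_new, P_new

# Illustrative two-state call: either both sensors keep their targets,
# or the two sensors swap targets.
Ms = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
pi_filt, x1, P1 = switching_filter_step(
    x=np.zeros(2), P=np.eye(2), y=np.array([1.0, 0.5]),
    Phi=np.eye(2), Q=0.01 * np.eye(2), R=0.1 * np.eye(2),
    Ms=Ms, pi_t=np.array([0.5, 0.5]))
```

Note that the update is a single weighted combination rather than a mixture over histories, so the cost per step grows only linearly in the number of candidate matrices instead of geometrically in time.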

To summarize, we derive exact equations for the filtered probabilities and state vectors in Section 3. The estimators given for the state vectors do not involve mixtures of the normal distribution as in Gordon and Smith (1988, 1990) or Peña and Guttman (1988), so that no approximations are needed to avoid the geometric increase in computations. In Section 4 we derive a procedure for estimating the parameters using a pseudo-EM algorithm to get close to the final values before changing to a classical nonlinear optimization algorithm.

3. FILTERING

In this section, we establish the recursions for the filters associated with the state process x_t and the switching process A_t. The filters in this section are an essential part of the maximum likelihood estimation procedure which will be derived in Section 4.

First, consider the derivation of filters for the measurement matrices A_t. Let f_j(t | t - 1) denote the conditional density of y_t, given A_t = M_j and the past; we know immediately that, under the multivariate normal assumptions, the conditional density f_j(t | t - 1) is that of a normal with mean M_j x_t^{t-1} and covariance matrix

Σ_tj = M_j P_t^{t-1} M_j' + R,   (10)

where

x_t^{t-1} = E(x_t | Y_{t-1})   (11)

is the one-step filtered estimator for x_t. Then the updating equation to get Pr{A_t = M_j | Y_t} is given by

π_j(t | t) = π_j(t) f_j(t | t - 1) / Σ_{k=1}^m π_k(t) f_k(t | t - 1),   (12)

where we assume that the distribution π_j(t) (j = 1, ..., m) has been specified prior to observing y_1, ..., y_t.

A potential weakness of the model is the need to specify the time-varying prior probabilities π_j(t). If the investigator has no reason to prefer one state over another at time t, he or she might choose uniform probabilities π_j(t) = m^{-1}. Smoothness can be introduced by letting

π_i(t) = Σ_{j=1}^m π_ij π_j(t - 1 | t - 1),   (13)

where the nonnegative weights π_ij are chosen so that Σ_i π_ij = 1. If the process A_t were Markov with transition probabilities π_ij, then (13) would be the update for the filter probability, as has been shown in Lindgren (1978) or Kitagawa (1987) (see, also, Rabiner and Juang 1986). The difficulty in extending the approach here to the Markov-dependent case is the dependence in y_t, which makes it necessary to enumerate over the possible histories to derive Equation (12). Equation (13) has π_i(t) as a function of the past observations Y_{t-1} and hence is inconsistent with model assumption (10). Nevertheless, this seems to be a reasonable compromise that allows the data to modify the probabilities π_j(t).

The filtered estimators x_t^{t-1} and x_t^t = E(x_t | Y_t) are given by

x_t^{t-1} = Φ x_{t-1}^{t-1},   (14)

P_t^{t-1} = Φ P_{t-1}^{t-1} Φ' + Q,   (15)

x_t^t = x_t^{t-1} + Σ_{j=1}^m π_j(t | t) K_tj (y_t - M_j x_t^{t-1}),   (16)

and

P_t^t = Σ_{j=1}^m π_j(t | t)(I - K_tj M_j) P_t^{t-1},   (17)

where

K_tj = P_t^{t-1} M_j' Σ_tj^{-1},   (18)

with Σ_tj defined as in Equation (10). This expresses the updated filters x_t^t and filtered covariances as weighted combinations of the m scaled innovations and covariances, respectively, corresponding to each of the possible measurement matrices. The equations are somewhat similar to the approximations introduced by Bar-Shalom and Tse (1975) as "probabilistic data association filters" [see, also, the approximations of Gordon and Smith (1988) and Peña and Guttman (1988)].

To verify (16), let I(A_t = M_j) be the indicator function of the set A_t = M_j and note that

x_t^t = E(x_t | Y_t) = E[E(x_t | Y_t, A_t) | Y_t]
      = E{ Σ_{j=1}^m E(x_t | Y_t, A_t = M_j) I(A_t = M_j) | Y_t }
      = E{ Σ_{j=1}^m [x_t^{t-1} + K_tj (y_t - M_j x_t^{t-1})] I(A_t = M_j) | Y_t }
      = Σ_{j=1}^m π_j(t | t) [x_t^{t-1} + K_tj (y_t - M_j x_t^{t-1})],

where K_tj is given by (18). Equation (17) is derived in similar fashion.

To review, we have shown how to extend the classical Kalman filtering recursions to the case where the measurement matrices are switching (or not) in accordance with a nonstationary independent measurement process. In this model we have derived estimators for the probability of being in state j at time t given the past and present Y_t in Equation (12). The modified results for the filtered state estimators (16) show that the filtered estimators involve weighted combinations of the gain-adjusted innovations. The filtered covariances P_t^t again involve weighted combinations of the conventional estimators.

4. MAXIMUM LIKELIHOOD ESTIMATION

To develop a procedure for maximum likelihood estimation, note first of all that the innovations form of the log-likelihood function is proportional to

ln L_Y(Θ) ∝ Σ_{t=1}^n ln { Σ_{j=1}^m π_j(t) f_j(t | t - 1) },   (19)

where f_j(t | t - 1) are the multivariate normal densities defined earlier with means M_j x_t^{t-1} and covariance matrices given in (10). We may consider maximizing (19) directly as a function of the parameters Θ = (μ, Φ, Q, R), or we may consider applying the EM algorithm to the complete-data log-likelihood. In general, the EM algorithm converges nicely in the initial stages and more slowly in the final stages, where it is advantageous to switch to a standard nonlinear optimization procedure.

To apply the EM algorithm, note in the Appendix that the log-likelihood of the complete data, x_0, x_1, ..., x_n, A_1, A_2, ..., A_n, and y_1, ..., y_n, can be written as in Shumway and Stoffer (1982) or Shumway (1988) with an additional term corresponding to the unknown probabilities π_j(t). This leads to the same regression equations for the transition matrix Φ and the error covariance Q expressed in terms of the same smoothers x_t^n = E(x_t | Y_n) and smoother covariances P_t^n and P_{t,t-1}^n. We note only the modification (A.16) for the last smoothed covariance. The equation for updating R given in (A.8) involves smoothed probabilities π_j(t | n) that have not been computed. The backwards recursions for the smoothed probabilities involve integrating over mixtures of normal distributions and are excessively complicated. Monte Carlo integration techniques such as the Gibbs sampler (Carlin, Polson, and Stoffer 1990) may be useful, but here we apply Equation (A.8) in the EM algorithm assuming that the smoothed probabilities can be approximated by the filtered probabilities π_j(t | t). The resulting "pseudo-EM" algorithm works quite well, as is evident from the example considered in the following section.

Since the pseudo-EM algorithm is quite slow in later iterations and does not necessarily increase the incomplete-data log-likelihood or converge to maximizing values for the parameters, a variable-metric (Fletcher-Powell-Davidon) nonlinear optimization procedure (see Nash and Walker-Smith 1987) was applied to finish the maximization of Equation (19) in the latter stages.

5. AN EXAMPLE

To illustrate the procedures of the preceding sections, we return to the problem of tracking multiple targets, as introduced in Section 1. The example given here uses a set of contrived data that simulates the behavior of three sensors observing various configurations of one to three targets over n = 100 time points. The difficulty is that it is not possible to identify which configuration of targets is being observed at any given time point. This introduces a whole collection of plausible measurement matrices, M_1, M_2, ..., M_m, at each time point to serve as possible models for the true measurement matrix A_t. Two of these matrices, corresponding to observing only one track on all three sensors or observing one track on each of the three sensors, were given in Section 2. The complete collection of measurement matrices investigated at each point as plausible explanations for the data forms the set of possible states for the hidden Markov chain in this example. A listing of the m = 10 configurations assumed to be possible at each time point is shown in Table 1. Notice that the two measurement matrices shown in Section 2 can be identified as M_1 and M_7 in Table 1.

Table 1. Definition of Sensor Target Associations Used to Determine m = 10 Possible Measurement Matrices in Example

Measurement    Measurements and observed tracks
matrix         y_t1      y_t2      y_t3
M_1            x_t1      x_t1      x_t1
M_2            x_t1      x_t1      x_t2
M_3            x_t1      x_t2      x_t1
M_4            x_t2      x_t1      x_t1
M_5            x_t1      x_t3      x_t2
M_6            x_t1      x_t2      x_t3
M_7            x_t2      x_t1      x_t3
M_8            x_t2      x_t3      x_t1
M_9            x_t3      x_t2      x_t1
M_10           x_t3      x_t1      x_t2

The underlying tracks were generated according to a model with a transition matrix of the form Φ = diag(1.005, .990, 1.000), which corresponds to three tracks: a random walk, and two tracks that increase and decrease slightly over time. The matrices Q and R were taken to be diag(.0025, .0025, .0025) and diag(.0625, .0625, .0625), respectively, and the initial mean was μ = (5, 5, 5)' with n = 100 points. The measurement matrices were switched three times during the first 100 points, leading to the observed data shown in Figure 1. Since all tracks started at the same mean, the observed data correspond roughly to three one-dimensional targets originating from a common launch point. To simplify further we have not exhibited the nonlinear dynamics of the measurement matrices, which would certainly be present in any realistic application. It is fairly clear from the observed tracks where the transition points lie and which series (staying level, decreasing, increasing) we are observing. We have chosen a fairly straightforward set of realizations to test the method.

[Figure 1. Original Measurements for Tracking Example: three panels (First, Second, and Third Observed Series) plotted against time in points.]

There is the question in any application as to whether time-varying probabilities or the parameters in the dynamic linear model are more important or whether the two are equally important. Often the dynamics of the tracks are assumed known so that those parameters are known. In that case adjustment of the initial probabilities might be very useful for effectively pruning highly improbable measurement matrices. In other cases the dynamics of the signal or track are unknown, and the initial probabilities simply may not matter that much. In this case parameter estimation might proceed for some fixed reasonable a priori set of probabilities, such as π_j(t) = m^{-1}. The filtered probabilities (12) can serve as the important indicator of whether or not we are in state j at time t. In the present example there are 10 initial probabilities. Because of the large number of states that are common in examples involving tracking, we chose to hold all of these probabilities fixed at reasonable starting values and identify the tracks by varying the parameters in the state-space model. The m = 10 initial probabilities were all taken as .1, corresponding to assuming that each configuration is equally likely.

A summary of parameter estimates using 50 iterations of the EM algorithm and several variable-metric (Fletcher-Powell-Davidon) iterations is shown in Table 2. It is clear that the log-likelihood (19) increases at each step and that the transition matrix parameters converge rather quickly, say after 5 iterations. Estimating variances requires somewhat more time, and these may be assumed known to increase processing speed. One iteration requires running the filters and smoothers for both the probabilities and the signal tracks, as detailed in Section 3. Time requirements were 6 min., 3 min., and 1 min., respectively, for 8086-, 80286-, and 80386-based microcomputers. It is clear that tremendous reductions in computing time are available through parallel processing, which can be done in Equations (12), (16), and (17) using parallel paths for each possible measurement matrix. Hence the multiple path filters would require no more time than a single Kalman filter and smoother.

Table 2. Iterations for Parameters in Tracking Example

Iteration
no.    φ_11   φ_22   φ_33   q_11   q_22   q_33   r_11   r_22   r_33   ln L_Y

Pseudo-EM
0     1.000  1.000  1.000  .1000  .1000  .1000  .1000  .1000  .1000  -316.67
5     1.006   .990   .999  .0272  .0283  .0332  .0658  .0653  .0744  -252.50
10    1.006   .990   .999  .0118  .0129  .0160  .0516  .0533  .0643  -232.53
15    1.006   .991   .999  .0070  .0081  .0102  .0509  .0522  .0664  -226.84
20    1.006   .991   .999  .0047  .0059  .0073  .0512  .0519  .0682  -223.93
50    1.006   .991   .999  .0016  .0026  .0025  .0534  .0516  .0724  -219.79

Nonlinear optimization (variable metric)
51    1.006   .991   .999  .0013  .0025  .0023  .0533  .0516  .0732  -219.70
52    1.006   .991   .999  .0010  .0029  .0019  .0533  .0516  .0723  -219.51
53    1.006   .991   .999  .0011  .0022  .0015  .0533  .0515  .0723  -219.41
54    1.006   .991   .999  .0011  .0021  .0015  .0533  .0515  .0722  -219.38
55    1.006   .991   .999  .0011  .0020  .0015  .0533  .0514  .0722  -219.38
56    1.006   .991   .999  .0011  .0023  .0012  .0536  .0466  .0711  -219.36
57    1.006   .991   .999  .0011  .0024  .0011  .0536  .0461  .0709  -219.33
58    1.006   .991   .999  .0011  .0024  .0012  .0533  .0460  .0704  -219.32

*     1.005   .990  1.000  .0025  .0025  .0025  .0625  .0625  .0625

* denotes true parameter values.

Figure 2 shows 4 of the 10 filtered probabilities, computed using the forward recursions (12) and (14)-(18). These are states exhibiting nonzero probabilities over a significant portion of the range. It is clear that after an initial period of indecisiveness (t < 20), the filter indicates M_10 for 20 ≤ t ≤ 29, M_6 for 30 ≤ t ≤ 49, M_9 for 50 ≤ t ≤ 69, and M_10 again for 70 ≤ t ≤ 100.

[Figure 2. Filtered Probabilities for Most ...: four panels (Filtered Probability for States 2, 6, 9, and 10) plotted against time in points.]

Figure 3 shows the signal tracks, estimated using Equations (14)-(18), and we note that they compare well to the input series mixed to determine the observed realizations in Figure 1. In fact, we did not plot the original tracks because they would have overlaid the estimated tracks almost exactly.

[Figure 3. Filtered Estimators for Tracks: the final filtered series plotted against time in points.]

6. DISCUSSION

We have developed an approach to modeling change in a vector time series that uses a model for switching that is different than that considered by Gordon and Smith (1988, 1990) or by Peña and Guttman (1988). In general, restricting the switching to the measurement matrices simplifies the Kalman filtering recursions considerably, since the equations are exact and it is not necessary to approximate mixtures of normals as in the Gordon and Smith approach.

The model is still general enough to include most structural models useful in modeling and monitoring changes of regime in vector time series. Even changes of the kind considered by Gordon and Smith can be accommodated as long as one is willing to assume that they occur as abrupt changes in the configuration of elements observed in the state vector. Modeling of abrupt changes in the observation error, however, is precluded under our formulation, making our switching structure of relatively little use in robust detection of outliers as considered by Peña and Guttman.

We mention also that our approach allows for simultaneous estimation of parameters in the dynamic linear model by maximum likelihood. This means that the sequential model selection approach and parameter estimation can be accomplished simultaneously.

It should be noted in closing that more general models can be considered that introduce dependence in the measurement matrices as they evolve over time. For example, the measurement matrices may form a stationary Markov chain with constant initial and transition probabilities. The cost of increasing the level of generality to the dependent case seems to be high in computational effort, since the possible dependent paths must be laboriously traced back through the chain, as in Hamilton (1989), who analyzed a simple two-state Markov structure. In addition, one will have to assume reasonable values for the transition probabilities or treat them as additional parameters. Thus it seems that the simple extensions to dependent switching transitions that obtained for independent data (see Lindgren 1978) do not carry over easily to the dependent data case given here.

APPENDIX: PSEUDO-EM ALGORITHM

We give details for the pseudo-EM algorithm used in the first stages of the procedure for maximizing the incomplete-data log-likelihood (19). If we imagine the unobserved and observed components of the model (1) and (2) generated in the order x_0, x_1, A_1, y_1, x_2, A_2, y_2, ..., the complete-data log-likelihood can be written as

ln L(Θ) = -(1/2) ln|Σ| - (1/2)(x_0 - μ)' Σ^{-1} (x_0 - μ) - (n/2) ln|Q|
          - (1/2) Σ_{t=1}^n (x_t - Φ x_{t-1})' Q^{-1} (x_t - Φ x_{t-1})
          + Σ_{t=1}^n Σ_{j=1}^m I(A_t = M_j) ln π_j(t) - (n/2) ln|R|
          - (1/2) Σ_{t=1}^n Σ_{j=1}^m I(A_t = M_j)(y_t - M_j x_t)' R^{-1} (y_t - M_j x_t).   (A.1)

The EM algorithm requires that we maximize the conditional expectation

Q(Θ, Θ_0) = E_0[ln L(Θ) | Y_n]   (A.2)

with respect to Θ at each step, where Θ_0 is the parameter value from the previous iteration.

Now, taking conditional expectations in (A.1), noting that

π_j(t | n) = E[I(A_t = M_j) | Y_n],   (A.3)

leads to

π̂_j(t) = π_j(t | n),   (A.4)

μ̂ = x_0^n,   (A.5)

Φ̂ = B A^{-1},   (A.6)

Q̂ = n^{-1}(C - Φ̂ B' - B Φ̂' + Φ̂ A Φ̂'),   (A.7)
Shumway and Stoffer: Dynamic Switching 769

and

\hat{R} = n^{-1} \sum_{t=1}^{n} \sum_{j=1}^{m} \pi_j(t \mid n) [ (y_t - M_j x_t^n)(y_t - M_j x_t^n)' + M_j P_t^n M_j' ].   (A.8)

The covariance matrix \Sigma is held fixed at some reasonable value. The matrices A, B, and C are defined as

A = \sum_{t=1}^{n} [ P_{t-1}^n + x_{t-1}^n (x_{t-1}^n)' ],   (A.9)

B = \sum_{t=1}^{n} [ P_{t,t-1}^n + x_t^n (x_{t-1}^n)' ],   (A.10)

and

C = \sum_{t=1}^{n} [ P_t^n + x_t^n (x_t^n)' ].   (A.11)

These matrices involve the usual Kalman smoothers x_t^n = E(x_t \mid Y_n) and their covariances

P_t^n = E\{ (x_t - x_t^n)(x_t - x_t^n)' \mid Y_n \}

and

P_{t,t-1}^n = E\{ (x_t - x_t^n)(x_{t-1} - x_{t-1}^n)' \mid Y_n \}.

The smoothers are derived under the assumption that the A_t are stochastic, in a manner analogous to that used for deriving the filters in Section 3. We obtain, for t = n, n - 1, \ldots,

x_{t-1}^n = x_{t-1}^{t-1} + J_{t-1} (x_t^n - x_t^{t-1}),   (A.12)

where

J_{t-1} = P_{t-1}^{t-1} \Phi' (P_t^{t-1})^{-1}.   (A.13)

The smoother covariances satisfy

P_{t-1}^n = P_{t-1}^{t-1} + J_{t-1} (P_t^n - P_t^{t-1}) J_{t-1}',   (A.14)

for t = n, n - 1, \ldots, 1, and

P_{t-1,t-2}^n = P_{t-1}^{t-1} J_{t-2}' + J_{t-1} (P_{t,t-1}^n - \Phi P_{t-1}^{t-1}) J_{t-2}',   (A.15)

for t = n, n - 1, \ldots, 2, subject to

P_{n,n-1}^n = \sum_{j=1}^{m} \pi_j(n \mid n) (I - K_n^j M_j) \Phi P_{n-1}^{n-1},   (A.16)

where K_n^j is defined in (15).

As discussed in the text, the difficulties encountered in computing \pi_j(t \mid n) led us to replace it with \pi_j(t \mid t) in (A.8). The advantages of using the EM algorithm are (a) its stability under the switching structure, and (b) the simple regression computations involved at each step, as exhibited in (A.6)-(A.8).

[Received February 1990. Revised January 1991.]

REFERENCES

Bar-Shalom, Y. (1978), "Tracking Methods in a Multi-Target Environment," IEEE Transactions on Automatic Control, 23, 618-626.
Bar-Shalom, Y., and Tse, E. (1975), "Tracking in a Cluttered Environment With Probabilistic Data Association," Automatica, 11, 451-460.
Carlin, B. P., Polson, N. G., and Stoffer, D. S. (1990), "A Monte Carlo Approach to Nonnormal and Nonlinear State Space Modeling," preprint.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum Likelihood From Incomplete Data via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
Goldfeld, S. M., and Quandt, R. E. (1973), "A Markov Model for Switching Regressions," Journal of Econometrics, 1, 3-16.
Gordon, K., and Smith, A. F. M. (1988), "Modeling and Monitoring Discontinuous Changes in Time Series," in Bayesian Analysis of Time Series and Dynamic Linear Models, ed. J. C. Spall, New York: Marcel Dekker, pp. 359-392.
Gordon, K., and Smith, A. F. M. (1990), "Modeling and Monitoring Biomedical Time Series," Journal of the American Statistical Association, 85, 328-337.
Hamilton, J. D. (1989), "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357-384.
Harrison, P. J., and Stevens, C. F. (1976), "Bayesian Forecasting" (with discussion), Journal of the Royal Statistical Society, Ser. B, 38, 205-247.
Harvey, A. C., and Todd, P. H. J. (1983), "Forecasting Economic Time Series With Structural and Box-Jenkins Models," Journal of Business and Economic Statistics, 1, 299-315.
Juang, B. H., and Rabiner, L. R. (1985), "Mixture Autoregressive Hidden Markov Models for Speech Signals," IEEE Transactions on Acoustics, Speech, and Signal Processing, 33, 1404-1413.
Kitagawa, G. (1987), "Non-Gaussian State-Space Modeling of Nonstationary Time Series," Journal of the American Statistical Association, 82, 1032-1041.
Kitagawa, G., and Gersch, W. (1984), "A Smoothness Priors Modeling of Time Series With Trend and Seasonality," Journal of the American Statistical Association, 79, 378-389.
Lam, P. S. (1990), "The Hamilton Model With a General Autoregressive Component: Estimation and Comparison With Other Models of Economic Time Series," preprint.
Lindgren, G. (1978), "Markov Regime Models for Mixed Distributions and Switching Regressions," Scandinavian Journal of Statistics, 5, 81-91.
Nash, J. C., and Walker-Smith, M. (1987), Nonlinear Parameter Estimation, New York: Marcel Dekker.
Peña, D., and Guttman, I. (1988), "Bayesian Approach to Robustifying the Kalman Filter," in Bayesian Analysis of Time Series and Dynamic Linear Models, ed. J. C. Spall, New York: Marcel Dekker, pp. 227-254.
Quandt, R. E. (1972), "A New Approach to Estimating Switching Regressions," Journal of the American Statistical Association, 67, 306-310.
Rabiner, L. R., and Juang, B. H. (1986), "An Introduction to Hidden Markov Models," IEEE Acoustics, Speech, and Signal Processing Magazine, January, 4-16.
Shumway, R. H. (1988), Applied Statistical Time Series Analysis, New York: Prentice-Hall.
Shumway, R. H., and Stoffer, D. S. (1982), "An Approach to Time Series Smoothing and Forecasting Using the EM Algorithm," Journal of Time Series Analysis, 3, 253-264.
Smith, A. F. M., and West, M. (1983), "Monitoring Renal Transplants: An Application of the Multiprocess Kalman Filter," Biometrics, 39, 867-878.
Tsurumi, H. (1988), "Survey of Bayesian and Non-Bayesian Testing of Model Stability in Econometrics," in Bayesian Analysis of Time Series and Dynamic Linear Models, ed. J. C. Spall, New York: Marcel Dekker, pp. 75-100.

