Dynamic Linear Models With Switching
Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to
digitize, preserve and extend access to Journal of the American Statistical Association
The problem of modeling change in a vector time series is studied using a dynamic linear model with measurement matrices
that switch according to a time-varying independent random process. We derive filtered estimators for the usual state vectors
and also for the state occupancy probabilities of the underlying nonstationary measurement process. A maximum likelihood
estimation procedure is given that uses a pseudo-expectation-maximization algorithm in the initial stages and nonlinear optimization in the final stages. We relate the models to those considered previously in the literature and give an application involving the tracking of multiple targets.
KEY WORDS: Change points; Expectation-maximization algorithm; Nonlinear models; State-space; Target tracking.
in the defining model (1), it is clear that all sensors are observing target x_t1, whereas for the measurement matrix

    A_t = [ 0  1  0 ]
          [ 1  0  0 ]
          [ 0  0  1 ],

the first sensor observes the second target, the second sensor observes the first target, and the third sensor observes the third target.

for j = 1, ..., m and t = 1, 2, ..., n, independent of past measurement matrices A_1, ..., A_{t-1} and of past data y_1, y_2, ..., y_{t-1}. Important information about the current state of the measurement process is given by the filtered probabilities for being in state j, defined as the conditional probabilities π_j(t | t) = Pr(A_t = M_j | Y_t).
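The one-step recursion that these filtered probabilities and the state estimates satisfy (Equations (12)-(18)) can be sketched in NumPy as follows. The weighted-update forms are an outline of the section's recursions; the function and variable names are our own illustration, not the paper's notation.

```python
import numpy as np

def switching_filter_step(x, P, y, Ms, pi_t, Phi, Q, R):
    """One step of the switching filter outlined in Section 3.

    Ms   : list of candidate measurement matrices M_1, ..., M_m
    pi_t : a priori probabilities pi_j(t) of each matrix at time t
    Names and the exact update forms are an illustrative sketch.
    """
    # State prediction (Equations (14)-(15)).
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q

    q = len(y)
    dens = np.empty(len(Ms))
    gains, innovs = [], []
    for j, M in enumerate(Ms):
        eps = y - M @ x_pred                     # innovation under M_j
        S = M @ P_pred @ M.T + R                 # innovation covariance
        K = P_pred @ M.T @ np.linalg.inv(S)      # Kalman gain for M_j
        # Multivariate normal density f_j(t | t-1) of the innovation.
        dens[j] = np.exp(-0.5 * eps @ np.linalg.solve(S, eps)) \
            / np.sqrt((2 * np.pi) ** q * np.linalg.det(S))
        gains.append(K)
        innovs.append(eps)

    # Filtered probabilities: pi_j(t|t) proportional to pi_j(t) f_j(t|t-1).
    pi_filt = pi_t * dens
    loglik_term = np.log(pi_filt.sum())          # time-t term of the
    pi_filt = pi_filt / pi_filt.sum()            # innovations likelihood

    # Filtered state: weighted combination of gain-adjusted innovations;
    # filtered covariance: weighted combination of conventional updates.
    x_filt = x_pred + sum(p * (K @ e)
                          for p, K, e in zip(pi_filt, gains, innovs))
    eye = np.eye(len(x))
    P_filt = sum(p * ((eye - K @ M) @ P_pred)
                 for p, K, M in zip(pi_filt, gains, Ms))
    return x_filt, P_filt, pi_filt, loglik_term
```

Summing `loglik_term` over t gives the innovations form of the log-likelihood that Section 4 maximizes.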
π_i(t) = Σ_{j=1}^m π_{ij} π_j(t - 1 | t - 1),    (13)

where the nonnegative weights π_{ij} are chosen so that Σ_{i=1}^m π_{ij} = 1. If the process A_t were Markov with transition probabilities π_{ij}, then (13) would be the update for the filter probability, as has been shown in Lindgren (1978) or Kitagawa (1987) (see also Rabiner and Juang 1986). The difficulty in extending the approach here to the Markov-dependent case is the dependence in y_t, which makes it necessary to enumerate over the possible histories to derive Equation (12). Equation (13) has π_i(t) as a function of the past observations Y_{t-1} and hence is inconsistent with model assumption (10). Nevertheless, this seems to be a reasonable compromise that allows the data to modify the probabilities π_i(t).

The filtered estimators x_t^{t-1} and x_t^t = E(x_t | Y_t) are given by

x_t^{t-1} = Φ x_{t-1}^{t-1},    (14)

P_t^{t-1} = Φ P_{t-1}^{t-1} Φ' + Q.    (15)

To summarize, we derive exact equations for the filtered probabilities and state vectors in Section 3. The estimators (16) show that the filtered estimators involve weighted combinations of the gain-adjusted innovations. The filtered covariances P_t^t again involve weighted combinations of the conventional estimators.

4. MAXIMUM LIKELIHOOD ESTIMATION

To develop a procedure for maximum likelihood estimation, note first of all that the innovations form of the log-likelihood function is proportional to

ln L_Y(Θ) = Σ_{t=1}^n ln Σ_{j=1}^m π_j(t) f_j(t | t - 1),    (19)

where the f_j(t | t - 1) are the multivariate normal densities defined earlier, with means M_j x_t^{t-1} and covariance matrices given in (10). We may consider maximizing (19) directly as a function of the parameters Θ = (μ, Φ, Q, R), or we may consider applying the EM algorithm to the complete-
data log-likelihood. In general, the EM algorithm converges nicely in the initial stages and more slowly in the final stages, where it is advantageous to switch to a standard nonlinear optimization procedure.

To apply the EM algorithm, note in the Appendix that the log-likelihood of the complete data, x_0, x_1, ..., x_n, A_1, A_2, ..., A_n, and y_1, ..., y_n, can be written as in Shumway and Stoffer (1982) or Shumway (1988), with an additional term corresponding to the unknown probabilities π_j(t). This leads to the same regression equations for the transition matrix Φ and the error covariance Q, expressed in terms of the same smoothers x_t^n = E(x_t | Y_n) and smoother covariances P_t^n and P_{t,t-1}^n. We note only the modification (A.16) for the last smoothed covariance. The equation for updating R given in (A.8) involves smoothed probabilities π_j(t | n) that have not been computed. The backwards recursions for the smoothed probabilities involve integrating over mixtures of normal distributions and are excessively complicated. Monte Carlo integration techniques such as the Gibbs sampler (Carlin, Polson, and Stoffer 1990) may be useful, but here we apply Equation (A.8) in the EM algorithm assuming that the smoothed probabilities can be approximated by the filtered probabilities π_j(t | t). The resulting "pseudo-EM" algorithm works quite well, as is evident from the example considered in the following section.

Since the pseudo-EM algorithm is quite slow in later iterations and does not necessarily increase the incomplete-data likelihood at every step, we switch to the standard nonlinear optimization procedure in the final iterations.

Table 1. Definition of Sensor-Target Associations Used to Determine m = 10 Possible Measurement Matrices in the Example

Measurement      Measurements and observed tracks
matrix           y_t1     y_t2     y_t3
M1               x_t1     x_t1     x_t1
M2               x_t1     x_t1     x_t2
M3               x_t1     x_t2     x_t1
M4               x_t2     x_t1     x_t1
M5               x_t1     x_t3     x_t2
M6               x_t1     x_t2     x_t3
M7               x_t2     x_t1     x_t3
M8               x_t2     x_t3     x_t1
M9               x_t3     x_t2     x_t1
M10              x_t3     x_t1     x_t2

5. AN EXAMPLE

To illustrate the procedures of the preceding sections, we return to the problem of tracking multiple targets, as introduced in Section 1. The example given here uses a set of m = 10 possible measurement matrices, defined in Table 1. The matrices Q and R were taken to be diag(.0025, .0025, .0025) and diag(.0625, .0625, .0625), respectively, and the initial mean was μ = (5, 5, 5)' with n = 100 points. The measurement matrices were switched three times during the first 100 points, leading to the observed data shown in Figure 1. Since all tracks started at the same mean, the observed data corresponds roughly to three one-dimensional targets originating from a common launch point. To simplify further, we have not exhibited the nonlinear dynamics of the measurement matrices, which would certainly be present in any realistic application. It is fairly clear from the observed tracks where the transition points lie and which series (staying level, decreasing, increasing) we are observing. We have chosen a fairly straightforward set of realizations to test the method.

[Figure 1: panels "First Observed Series" and "Second Observed Series" of the observed data, plotted over time in points, 0-100.]

Time requirements were 6 min., 3 min., and 1 min., respectively, for 8086-, 80286-, and 80386-based microcomputers. It is clear that tremendous reductions in computing time are available through parallel processing, which can be done in Equations (12), (16), and (17) using parallel paths for each possible measurement matrix. Hence the multiple path filters would require no more time than a single Kalman filter and smoother.
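The per-matrix quantities in (12), (16), and (17) do not depend on one another, so the m candidate paths can be evaluated simultaneously. A sketch of the idea, vectorizing over j with stacked arrays rather than literal parallel hardware (the function name and array shapes are our illustration):

```python
import numpy as np

def all_candidate_paths(y, x_pred, P_pred, Ms, R):
    """Innovations, innovation covariances, and gains for every candidate
    measurement matrix at once; Ms is stacked with shape (m, q, p)."""
    eps = y - np.einsum('jqp,p->jq', Ms, x_pred)           # (m, q)
    S = np.einsum('jqp,pr,jsr->jqs', Ms, P_pred, Ms) + R   # (m, q, q)
    K = np.einsum('pr,jqr,jqs->jps', P_pred, Ms, np.linalg.inv(S))
    return eps, S, K
```

Each slice `eps[j]`, `S[j]`, `K[j]` reproduces the single-matrix computation for M_j, so the candidate filters can also be farmed out to separate processors with no change in the results.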
There is the question in any application as to whether the time-varying probabilities or the parameters in the dynamic linear model are more important, or whether the two are equally important. Often the dynamics of the tracks are assumed known, so that those parameters are known. In that case adjustment of the initial probabilities might be very useful for effectively pruning highly improbable measurement matrices. In other cases the dynamics of the signal or track are unknown, and the initial probabilities simply may not matter that much. In this case parameter estimation might proceed for some fixed reasonable a priori set of probabilities, such as π_j(t) = m^{-1}. The filtered probabilities (12) can serve as the important indicator of whether or not we are in state j at time t. In the present example there are 10 initial probabilities. Because of the large number of states that are common in examples involving tracking, we chose to hold all of these probabilities fixed at reasonable starting values and identify the tracks by varying the parameters in the state-space model. The m = 10 initial probabilities were all taken as .1, corresponding to assuming that each configuration is equally likely.

A summary of parameter estimates using 50 iterations of the EM algorithm and several variable-metric (Fletcher-Powell-Davidon) iterations is shown in Table 2. It is clear that the log-likelihood (19) increases at each step and that the transition matrix parameters converge rather quickly, say after 5 iterations. Estimating variances requires somewhat more time, and these may be assumed known to increase processing speed. One iteration requires running the filters and smoothers for both the probabilities and the signal tracks, as detailed in Section 2.

Figure 2 shows 4 of the 10 filtered probabilities, computed using the forward recursions (12) and (14)-(18). These are states exhibiting nonzero probabilities over a significant portion of the range. It is clear that after an initial period of indecisiveness (t < 20), the filter indicates M10 for 20 ≤ t ≤ 29, M6 for 30 ≤ t ≤ 49, M9 for 50 ≤ t ≤ 69, and M10 again for 70 ≤ t ≤ 100.

Figure 3 shows the signal tracks, estimated using Equations (14)-(18), and we note that they compare well to the input series mixed to determine the observed realizations in Figure 1. In fact, we did not plot the original tracks because they would have overlayed the estimated tracks almost exactly.

6. DISCUSSION

We have developed an approach to modeling change in a vector time series that uses a model for switching different from that considered by Gordon and Smith (1988, 1990) or by Peña and Guttman (1988). In general, restricting the switching to the measurement matrices simplifies the Kalman filtering recursions considerably, since the equations are exact and it is not necessary to approximate mixtures of normals as in the Gordon and Smith approach.

The model is still general enough to include most structural models useful in modeling and monitoring changes of regime in vector time series. Even changes of the kind considered by Gordon and Smith can be accommodated, as long as one is willing to assume that they occur as abrupt changes in the configuration of elements observed in the state vector.
Table 2. Summary of Parameter Estimates

Iteration
no.     φ11    φ22    φ33    q11    q22    q33    r11    r22    r33    ln L_Y

Pseudo-EM
0      1.000  1.000  1.000  .1000  .1000  .1000  .1000  .1000  .1000  -316.67
5      1.006   .990   .999  .0272  .0283  .0332  .0658  .0653  .0744  -252.50
10     1.006   .990   .999  .0118  .0129  .0160  .0516  .0533  .0643  -232.53
15     1.006   .991   .999  .0070  .0081  .0102  .0509  .0522  .0664  -226.84
20     1.006   .991   .999  .0047  .0059  .0073  .0512  .0519  .0682  -223.93
50     1.006   .991   .999  .0016  .0026  .0025  .0534  .0516  .0724  -219.79

Variable-metric
51     1.006   .991   .999  .0013  .0025  .0023  .0533  .0516  .0732  -219.70
52     1.006   .991   .999  .0010  .0029  .0019  .0533  .0516  .0723  -219.51
53     1.006   .991   .999  .0011  .0022  .0015  .0533  .0515  .0723  -219.41
54     1.006   .991   .999  .0011  .0021  .0015  .0533  .0515  .0722  -219.38
55     1.006   .991   .999  .0011  .0020  .0015  .0533  .0514  .0722  -219.38
56     1.006   .991   .999  .0011  .0023  .0012  .0536  .0466  .0711  -219.36
57     1.006   .991   .999  .0011  .0024  .0011  .0536  .0461  .0709  -219.33
58     1.006   .991   .999  .0011  .0024  .0012  .0533  .0460  .0704  -219.32
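The regression updates behind the pseudo-EM iterations in Table 2 can be sketched as follows. The Φ and Q formulas are the standard M-step regressions in terms of the smoothers, as described in Section 4; the R update shown, with the filtered probabilities π_j(t | t) standing in for the smoothed ones, is our own reconstruction patterned on the usual EM update, not the paper's (A.8).

```python
import numpy as np

def m_step(xs, Ps, Pcc, ys, Ms, pi_filt):
    """One pseudo-EM M-step sketch (names and R form are illustrative).

    xs      : smoothers x_t^n, t = 0..n, shape (n+1, p)
    Ps      : smoother covariances P_t^n, shape (n+1, p, p)
    Pcc     : lag-one covariances P_{t,t-1}^n, t = 1..n, shape (n, p, p)
    ys      : observations y_1..y_n; Ms: candidate measurement matrices
    pi_filt : filtered probabilities pi_j(t|t), shape (n, m)
    """
    n = len(ys)
    A = sum(np.outer(xs[t - 1], xs[t - 1]) + Ps[t - 1] for t in range(1, n + 1))
    B = sum(np.outer(xs[t], xs[t - 1]) + Pcc[t - 1] for t in range(1, n + 1))
    C = sum(np.outer(xs[t], xs[t]) + Ps[t] for t in range(1, n + 1))
    Phi = B @ np.linalg.inv(A)          # regression update for Phi
    Q = (C - Phi @ B.T) / n             # residual covariance update
    # Approximate R update: smoothed probabilities replaced by filtered.
    q = len(ys[0])
    R = np.zeros((q, q))
    for t in range(n):
        for j, M in enumerate(Ms):
            e = ys[t] - M @ xs[t + 1]
            R += pi_filt[t, j] * (np.outer(e, e) + M @ Ps[t + 1] @ M.T)
    return Phi, Q, R / n
```

Each call uses the smoothers from the previous pass of the filters and smoothers, so one iteration costs one forward-backward sweep plus these sums.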
[Figure 2: four panels of filtered probabilities plotted against time in points (0-100); vertical axis 0 to 1.]
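Reading off the most probable measurement matrix at each time, as done informally from Figure 2, can be automated by collapsing the argmax of the filtered probabilities into runs. This small utility is our own illustration, not part of the paper's procedure:

```python
import numpy as np

def regime_segments(pi_filt, labels):
    """Collapse the per-time argmax of the filtered probabilities
    (rows of pi_filt, one row per time point) into labeled runs
    (label, first time index, last time index)."""
    best = pi_filt.argmax(axis=1)
    segments, start = [], 0
    for t in range(1, len(best) + 1):
        if t == len(best) or best[t] != best[start]:
            segments.append((labels[best[start]], start, t - 1))
            start = t
    return segments
```

Applied to the example's filtered probabilities, this would return runs such as ('M10', 20, 29).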
Modeling of abrupt changes in the observation error, however, is precluded under our formulation, making our switching structure of relatively little use in robust detection of outliers as considered by Peña and Guttman.

We mention also that our approach allows for simultaneous estimation of parameters in the dynamic linear model by maximum likelihood. This means that the sequential model selection approach and parameter estimation can be accomplished simultaneously.

It should be noted in closing that more general models [...] possible dependent paths must be laboriously traced back through the chain, as in Hamilton (1989), who analyzed a simple [...] obtained for independent data (see Lindgren 1978) do not carry over easily to the dependent data case given here.

APPENDIX: PSEUDO-EM ALGORITHM

We give details for the pseudo-EM algorithm used in the first stages of the procedure for maximizing the incomplete-data log-likelihood (19). If we imagine the unobserved and observed components of the model (1) and (2) generated in the order x_0, x_1, A_1, y_1, x_2, A_2, y_2, ..., the complete-data log-likelihood can be written as

... + Σ_{t=1}^n Σ_{j=1}^m I(A_t = M_j) ln π_j(t) + ...
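The truncated display above is consistent with the standard complete-data log-likelihood of a Gaussian state-space model, with the indicator term appended for the switching probabilities. Under that assumption (our reconstruction, with Σ_0 denoting the initial state covariance; this is not the paper's display), the full expression would read, up to a constant:

```latex
\ln L_{X,A,Y}(\Theta)
  = -\tfrac{1}{2}\ln|\Sigma_0|
    - \tfrac{1}{2}(x_0-\mu)'\Sigma_0^{-1}(x_0-\mu)
    - \tfrac{n}{2}\ln|Q|
    - \tfrac{1}{2}\sum_{t=1}^{n}(x_t-\Phi x_{t-1})'Q^{-1}(x_t-\Phi x_{t-1})
    + \sum_{t=1}^{n}\sum_{j=1}^{m} I(A_t = M_j)\ln\pi_j(t)
    - \tfrac{n}{2}\ln|R|
    - \tfrac{1}{2}\sum_{t=1}^{n}\sum_{j=1}^{m} I(A_t = M_j)
      \,(y_t - M_j x_t)'R^{-1}(y_t - M_j x_t)
```

Taking conditional expectations of each term given the data, with the smoothed probabilities approximated by the filtered ones, yields the pseudo-EM updates described in Section 4.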