Updating Techniques in Recursive Least-Squares Estimation

Anders Sjö

October 23, 1992
Contents

1 Introduction
2 Algorithms
3 Error Analysis
4 Time-Varying Parameter Estimation
5 Test Results
Chapter 1
Introduction
The need for recursive algorithms for parameter estimation has arisen because on-line parameter estimation is frequently used in scientific areas such as adaptive control and signal processing.
In this survey we have only considered algorithms based on the least-squares problem, i.e.

$$\min_x \|Ax - b\|_2, \qquad (1.1)$$

where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, with $m > n$. The solution to the least-squares problem is given by the normal equations

$$A^T A x = A^T b.$$
When a new linear equation $a^T x = \alpha$ is added to the system, we get a modified least-squares problem

$$\min_x \|\bar{A}x - \bar{b}\|_2,$$

where $\bar{A} = \begin{pmatrix} A^T & a \end{pmatrix}^T$ and $\bar{b} = \begin{pmatrix} b^T & \alpha \end{pmatrix}^T$. This procedure is called updating.
The classical recursive algorithms are based on the normal equations, which has the drawback that we lose accuracy when the matrix $A^T A$ is formed. One alternative is to update the Cholesky factorization of this matrix or its inverse, e.g. $A^T A = LL^T$, where $L$ is a lower-triangular $(n \times n)$-matrix, but still use the same idea as in the classical algorithms. The numerical disadvantages of the normal equations have, however, led to a new set of algorithms that in principle update the QR decomposition of the matrix $A$,

$$A = Q\bar{R}, \qquad \bar{R} = \begin{pmatrix} R \\ 0 \end{pmatrix},$$

where $Q$ is an orthogonal $(m \times m)$-matrix and $R$ is an upper-triangular $(n \times n)$-matrix. Although different in approach, these two groups of alternative algorithms are closely related, since

$$\bar{A}^T\bar{A} = A^T A + aa^T = LL^T + aa^T = R^T R + aa^T,$$

i.e. $L = R^T$.
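The relation above is easy to verify numerically. The following sketch (variable names are mine, not from the text) appends a row $a$ to a matrix $A$ and checks that the R-factor of the extended matrix satisfies the rank-one update of the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
a = rng.standard_normal(3)

# R-factor of A (upper triangular, n x n)
R = np.linalg.qr(A, mode="r")

# updated R-factor: QR of A with the new row a^T appended
R_new = np.linalg.qr(np.vstack([A, a]), mode="r")

# both update routes agree: R_new^T R_new = R^T R + a a^T
assert np.allclose(R_new.T @ R_new, R.T @ R + np.outer(a, a))
```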
In Chapter 2 we present the different alternatives for performing recursive least-squares estimation. To make the algorithms comparable we have aimed at making the notation as uniform as possible, implying that it often will differ from the presentations in the references. In Chapter 3 a brief error analysis is presented to form the basis of the test examples presented in Chapter 5. In Chapter 4 we discuss the problem of time-varying parameters. There are mainly two strategies to deal with this problem: downdating and forgetting. By downdating we mean the case when a linear equation $a^T x = \alpha$ is discarded from (1.1), giving for example

$$\tilde{R}^T\tilde{R} = R^T R - aa^T.$$

Downdating is, however, numerically a more complicated problem than updating, mainly because the singular values of $\tilde{R}$ are less than or equal to the corresponding singular values of $R$, i.e. $\tilde{R}$ may become singular. In forgetting we put time-dependent weights on the old data, giving a weighted least-squares problem

$$\min_x \|D(Ax - b)\|_2,$$

where $D = \mathrm{diag}(\lambda^{(m-i)/2})$ and the forgetting factor $\lambda$ is limited by $0 < \lambda < 1$. This type of forgetting is often referred to as exponential forgetting. It too is connected to some minor problems, which will be further discussed in Chapter 4. An alternative to exponential forgetting is directional forgetting, i.e. we only forget in the direction of the new vector $a$.
Chapter 2
Algorithms
2.1 Recursive Algorithms and the Regression Model
In this chapter we will derive a number of algorithms for the recursive least-squares problem. These algorithms must have certain qualities that to some extent are connected to their applications. The most important qualities are that

- the algorithm should give a solution that is accurate up to the limitations of the data and conditioning of the problem, i.e. the algorithm should not aggravate the problem's sensitivity to perturbations,
- the algorithm should use as few operations (and particularly as few square-roots) as possible, and
- it should be possible to use a computer with short word-length,

cf. Björck et al. [5] and Alexander et al. [1].
The least-squares method has many applications, among which we will concentrate on the regression problem. Assume that we have a mathematical model for our system that can be represented as a linear regression model

$$y(t) = \varphi_1(t)\theta_1 + \varphi_2(t)\theta_2 + \cdots + \varphi_n(t)\theta_n + \varepsilon(t) = \varphi^T(t)\theta + \varepsilon(t), \qquad (2.1)$$

where $y$ is the observed variable, $\theta$ is a vector containing the parameters to be estimated, $\varphi$ is a regression vector of known functions $\varphi_i$, that may depend on other known variables, and $\varepsilon$ represents additive errors. Now, let us say that in a discrete-time problem we have observed pairs $(y(i), \varphi(i))$ for $i = 1, 2, \ldots, t$. These observations give an overdetermined linear system

$$\Phi_t\theta + E_t = Y_t,$$

with

$$\Phi_t = \begin{pmatrix} \varphi^T(1) \\ \varphi^T(2) \\ \vdots \\ \varphi^T(t) \end{pmatrix},$$

$Y_t = \begin{pmatrix} y(1) & y(2) & \cdots & y(t) \end{pmatrix}^T$ and $E_t = \begin{pmatrix} \varepsilon(1) & \varepsilon(2) & \cdots & \varepsilon(t) \end{pmatrix}^T$.
To derive the RLS algorithm we note that when adding a new data vector, the inverse of the covariance matrix is updated as

$$P_t^{-1} = P_{t-1}^{-1} + \sigma^{-2}\varphi_t\varphi_t^T \qquad (2.5)$$

and, furthermore,

$$\hat\theta_t = \sigma^{-2}P_t\Phi_t^T Y_t = \sigma^{-2}P_t\left(\Phi_{t-1}^T Y_{t-1} + \varphi_t y_t\right). \qquad (2.6)$$

The Kalman gain is

$$K_t = \sigma^{-2}P_t\varphi_t, \qquad (2.9)$$

and applying the Sherman-Morrison formula to (2.5) gives

$$P_t = P_{t-1} - \frac{P_{t-1}\varphi_t\varphi_t^T P_{t-1}}{\sigma^2 + \varphi_t^T P_{t-1}\varphi_t} = P_{t-1} - \beta_t^{-1}\left(P_{t-1}\varphi_t\right)\left(P_{t-1}\varphi_t\right)^T, \qquad (2.10)$$

where $\beta_t = \sigma^2 + \varphi_t^T P_{t-1}\varphi_t$. The Kalman gain can now be expressed with the old covariance matrix,

$$K_t = \frac{\left(P_{t-1} - \beta_t^{-1}(P_{t-1}\varphi_t)(P_{t-1}\varphi_t)^T\right)\varphi_t}{\sigma^2} = \frac{P_{t-1}\varphi_t}{\beta_t}. \qquad (2.11)$$

The RLS algorithm is summarized as follows:

RLS algorithm

$$\varepsilon_t := y_t - \varphi_t^T\hat\theta_{t-1}$$
$$\beta_t := \sigma^2 + \varphi_t^T P_{t-1}\varphi_t$$
$$K_t := P_{t-1}\varphi_t/\beta_t$$
$$\hat\theta_t := \hat\theta_{t-1} + K_t\varepsilon_t$$
$$P_t := P_{t-1} - \beta_t^{-1}\left(P_{t-1}\varphi_t\right)\left(P_{t-1}\varphi_t\right)^T.$$

Due to the fact that $P$ is symmetric, the RLS algorithm only requires $3n^2/2 + O(n)$ multiplications and $n$ divisions.
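A minimal numpy sketch of one RLS step, following the update equations above (the function name, test harness and the initialization $P_0 = 10^6 I$ are my own choices):

```python
import numpy as np

def rls_step(theta, P, phi, y, sigma2=1.0):
    """One RLS update: returns the new estimate and covariance."""
    eps = y - phi @ theta                 # prediction error
    beta = sigma2 + phi @ P @ phi         # scalar denominator beta_t
    K = P @ phi / beta                    # Kalman gain
    theta = theta + K * eps
    Pphi = P @ phi
    P = P - np.outer(Pphi, Pphi) / beta   # rank-one correction of P
    return theta, P

# fit y = phi^T theta_true on noiseless data
rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0, 0.5])
theta = np.zeros(3)
P = 1e6 * np.eye(3)                       # large P0 = rho*I, i.e. a vague prior
for _ in range(50):
    phi = rng.standard_normal(3)
    theta, P = rls_step(theta, P, phi, phi @ theta_true)

assert np.allclose(theta, theta_true, atol=1e-4)
```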
Peterka's and Bierman's algorithms are two algorithms that are based on the covariance matrix updating formula (2.10), but now the covariance matrix is written in $LDL^T$ factorized form, cf. Peterka [22] and Bierman [4] respectively. Although the lines of derivation differ, the two algorithms can be shown to be theoretically equivalent.

To make the derivations of the algorithms easier to follow, we will in the remainder of this section use the notation $\bar P = P_t$, $P = P_{t-1}$ and so forth.
Peterka's algorithm is based on a transformation of a joint covariance matrix $\bar P \in \mathbb{R}^{(n+1)\times(n+1)}$. Considering the system (2.1), the joint covariance matrix of $y$ and $\theta$ is given by

$$\bar P = \begin{pmatrix} \sigma^2 + \varphi^T P\varphi & \varphi^T P \\ P\varphi & P \end{pmatrix}.$$

With $P = LDL^T$ this matrix can be factorized as

$$\bar P = \begin{pmatrix} 1 & \varphi^T L \\ 0 & L \end{pmatrix}\begin{pmatrix} \sigma^2 & 0 \\ 0 & D \end{pmatrix}\begin{pmatrix} 1 & 0 \\ L^T\varphi & L^T \end{pmatrix}.$$

The factorization is then transformed into

$$\bar P = \begin{pmatrix} 1 & 0 \\ \bar K & \bar L \end{pmatrix}\begin{pmatrix} \beta & 0 \\ 0 & \bar D \end{pmatrix}\begin{pmatrix} 1 & \bar K^T \\ 0 & \bar L^T \end{pmatrix},$$

where $\beta = \sigma^2 + \varphi^T P\varphi$, $\bar K = P\varphi/\beta$ is the Kalman gain, and the new factors are obtained from

$$L\tilde L\tilde D\tilde L^T L^T = \bar L\bar D\bar L^T.$$
Peterka's algorithm

1. Define the vectors $v := L^T\varphi$ and $w := Dv$.
2. Set $\beta_{n+1} := \sigma^2$ and $\bar K^{(n)} := 0$.
3. Use the backward recurrence

for $j = n$ to $1$
    $\beta_j := \beta_{j+1} + v_j w_j$
    $\bar d_j := \beta_{j+1} d_j / \beta_j$
    $\bar L_j := L_j - v_j \bar K^{(j)}$
    $\bar K^{(j-1)} := \bar K^{(j)} + w_j \bar L_j / \beta_j$

4. Use the Kalman gain $K := \bar K^{(0)}$.
5. Update the parameters $\hat\theta := \hat\theta + K\left(y - \varphi^T\hat\theta\right)$.
Bierman's algorithm

1. Define the vectors $v := L^T\varphi$ and $w := Dv$.
2. Set $\beta_{n+1} := \sigma^2$ and $\bar K^{(n)} := 0$.
3. Use the backward recurrence

for $j = n$ to $1$
    $\beta_j := \beta_{j+1} + v_j w_j$
    $\bar d_j := \beta_{j+1} d_j / \beta_j$
    $\bar L_j := L_j - v_j \bar K^{(j)} / \beta_{j+1}$
    $\bar K^{(j-1)} := \bar K^{(j)} + w_j L_j$

4. Compute the Kalman gain $K := \bar K^{(0)} / \beta_1$.
5. Update the parameters $\hat\theta := \hat\theta + K\left(y - \varphi^T\hat\theta\right)$.
As we can see there is very little that differs between the two algorithms, and we can thus expect them to perform equally well. The complexity of the algorithms is also comparable: Peterka's algorithm requires $3n^2/2 + O(n)$ multiplications and $2n$ divisions, while Bierman's algorithm requires $3n^2/2 + O(n)$ multiplications and $2n + 1$ divisions.

Like all algorithms that update the covariance matrix, Peterka's and Bierman's algorithms have the advantage that the Kalman gain appears directly. Furthermore, the usage of the $LDL^T$ factorization rather than the Cholesky factorization avoids the involvement of square-roots, which in some applications is preferable.

Here we can also easily use exponential forgetting by setting $\beta_{n+1} := \lambda\sigma^2$ and scaling $\bar D := \bar D/\lambda$.
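Bierman's recurrence above can be transcribed directly into code. The following sketch (helper name and test harness are mine) checks the resulting factors and gain against the explicit covariance update (2.10):

```python
import numpy as np

def bierman_update(L, d, phi, sigma2=1.0):
    """Bierman's LDL^T covariance update; returns (Lbar, dbar, Kalman gain)."""
    n = len(d)
    v = L.T @ phi                       # v := L^T phi
    w = d * v                           # w := D v
    beta = sigma2                       # beta_{n+1} := sigma^2
    K = np.zeros(n)                     # Kbar^{(n)} := 0
    Lbar, dbar = L.copy(), d.copy()
    for j in range(n - 1, -1, -1):      # backward recurrence, j = n..1
        beta_new = beta + v[j] * w[j]               # beta_j
        dbar[j] = beta * d[j] / beta_new            # dbar_j := beta_{j+1} d_j / beta_j
        Lbar[:, j] = L[:, j] - (v[j] / beta) * K    # Lbar_j := L_j - v_j K / beta_{j+1}
        K = K + w[j] * L[:, j]                      # K^{(j-1)} := K^{(j)} + w_j L_j
        beta = beta_new
    return Lbar, dbar, K / beta         # gain K := K^{(0)} / beta_1

rng = np.random.default_rng(2)
n = 4
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)  # unit lower triangular
d = rng.uniform(0.5, 2.0, size=n)
phi = rng.standard_normal(n)
P = L @ np.diag(d) @ L.T

Lb, db, K = bierman_update(L, d, phi)
beta = 1.0 + phi @ P @ phi
assert np.allclose(K, P @ phi / beta)
assert np.allclose(Lb @ np.diag(db) @ Lb.T,
                   P - np.outer(P @ phi, P @ phi) / beta)
```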
Givens 1 algorithm

1. Define the vector $v := L^T\varphi$.
2. Set $\beta_n := \sigma^2$, $\gamma_n := \sqrt{\beta_n}$ and $\bar K^{(n)} := 0$.
3. Use the backward recurrence

for $j = n$ to $1$
    $\beta_{j-1} := \beta_j + v_j^2$
    $\gamma_{j-1} := \sqrt{\beta_{j-1}}$
    $\bar L_j := \left(\gamma_j L_j - v_j \bar K^{(j)}\right)/\gamma_{j-1}$
    $\bar K^{(j-1)} := \left(\gamma_j \bar K^{(j)} + v_j L_j\right)/\gamma_{j-1}$

4. Compute the Kalman gain $K := \bar K^{(0)}/\gamma_0$.
5. Update the parameters $\hat\theta := \hat\theta + K\left(y - \varphi^T\hat\theta\right)$.
This algorithm requires $5n^2/2 + O(n)$ multiplications, $2n + 1$ divisions and $n + 1$ square-roots. When comparing this algorithm to the RLS algorithm and to Peterka's and Bierman's algorithms, it is clearly seen that it cannot hold its own when it comes to the number of operations. The involvement of square-roots also proves to make this algorithm slower than the others. On the other hand one could think that the orthogonal transformations should make the algorithm more stable, but according to van Dooren and Verhaegen [8] this is not obvious. In particular, the usage of the Sherman-Morrison formula changes the numerical behaviour of the updating, and it is not certain that the orthogonal transformations will restore the effects of this.
Both exponential forgetting and downdating can be applied to this algorithm. Exponential forgetting is achieved by using the same rescaling as in the RLS algorithm, leading to $\gamma_n := \sqrt{\lambda\sigma^2}$ and $\bar L := \bar L/\sqrt{\lambda}$. The aspects of using this type of algorithm for downdating are discussed in Alexander et al. [1] and Gill et al. [12]. An alternative to this orthogonal downdating is an algorithm presented by Ferng et al. [10], using hyperbolic rotations. A hyperbolic rotation $H$ is a semi-orthogonal matrix with the following appearance,

$$H = \begin{pmatrix} \cosh\psi & -\sinh\psi \\ -\sinh\psi & \cosh\psi \end{pmatrix},$$

satisfying

$$HSH^T = S, \qquad S = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
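As an illustration, a sequence of such hyperbolic rotations can carry out the downdating $\tilde R^T\tilde R = R^T R - aa^T$ directly on the triangular factor. The sketch below is my own minimal implementation of the idea, not Ferng et al.'s exact algorithm:

```python
import numpy as np

def hyperbolic_downdate(R, a):
    """Return Rd with Rd^T Rd = R^T R - a a^T, one hyperbolic rotation per column."""
    R = R.astype(float).copy()
    a = a.astype(float).copy()
    n = len(a)
    for j in range(n):
        t = a[j] / R[j, j]                 # tanh of the angle; |t| < 1 when well posed
        c = 1.0 / np.sqrt(1.0 - t * t)     # cosh
        s = t * c                          # sinh
        Rj = R[j, j:].copy()
        R[j, j:] = c * Rj - s * a[j:]
        a[j:] = c * a[j:] - s * Rj         # a[j] is annihilated
    return R

# check: remove the first row of a tall matrix from its R-factor
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))
R = np.linalg.qr(A, mode="r")
Rd = hyperbolic_downdate(R, A[0])
assert np.allclose(Rd.T @ Rd, A[1:].T @ A[1:])
```

Since each rotation preserves $R_j^2 - a_j^2$ row-wise, the difference $R^T R - aa^T$ is invariant while $a$ is zeroed out, which is exactly the downdating relation above.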
$$P_t^{-1} = L_t L_t^T$$

and we form the array

$$\bar R_{t-1} = \begin{pmatrix} L_{t-1}^T & \hat z_{t-1} \\ \sigma^{-1}\varphi_t^T & \sigma^{-1} y_t \end{pmatrix}.$$

Then we apply $n$ Givens rotations to zero the vector $\sigma^{-1}\varphi_t^T$, giving

$$\bar R_t = U\bar R_{t-1} = \begin{pmatrix} L_t^T & \hat z_t \\ 0 & r_t \end{pmatrix}. \qquad (2.13)$$

The parameter estimate is then given by

$$\hat\theta_t = L_t^{-T}\hat z_t.$$

The algorithm is summarized as follows:

SRIF algorithm

1. Define the vector $v^{(1)} := \sigma^{-1}\varphi_t$.
2. Set $r^{(1)} := \sigma^{-1} y_t$.
3. Use the forward recurrence

for $j = 1$ to $n$
    $\alpha_j := \sqrt{\left(v_j^{(j)}\right)^2 + l_{jj}^2}$
    $c := l_{jj}/\alpha_j$
    $s := v_j^{(j)}/\alpha_j$
    $r^{(j+1)} := c\,r^{(j)} - s\,\hat z_j$
    $\hat z_j := c\,\hat z_j + s\,r^{(j)}$
    $v^{(j+1)} := c\,v^{(j)} - s\,L_j$
    $L_j := c\,L_j + s\,v^{(j)}$

4. Update the parameters $\hat\theta_t := L_t^{-T}\hat z_t$.
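The SRIF step can be sketched in numpy by letting a QR factorization play the role of the $n$ Givens rotations (a simplification of the algorithm above; function and variable names are mine):

```python
import numpy as np

def srif_update(Rt, z, phi, y, sigma=1.0):
    """Information-form update: rotate the new row into the triangular factor."""
    n = len(z)
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = Rt                  # R_t = L_t^T, upper triangular
    M[:n, n] = z
    M[n, :n] = phi / sigma          # the row to be zeroed
    M[n, n] = y / sigma
    R = np.linalg.qr(M, mode="r")   # orthogonal triangularization
    s = np.sign(np.diag(R))         # keep the diagonal positive
    s[s == 0] = 1
    R = s[:, None] * R
    return R[:n, :n], R[:n, n]      # new triangular factor and new z

# sanity check against the batch least-squares solution
rng = np.random.default_rng(4)
theta_true = np.array([1.0, -2.0])
R, z = 1e-3 * np.eye(2), np.zeros(2)   # weak prior information
for _ in range(20):
    phi = rng.standard_normal(2)
    R, z = srif_update(R, z, phi, phi @ theta_true)
theta = np.linalg.solve(R, z)          # theta_t = L_t^{-T} z_t
assert np.allclose(theta, theta_true, atol=1e-5)
```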
$$Q_t^T\begin{pmatrix} \Phi_t & Y_t \end{pmatrix} = \begin{pmatrix} R_t & \tilde Y_t \\ 0 & \rho_t \\ 0 & 0 \end{pmatrix}. \qquad (2.14)$$

First of all we will derive the LINPACK updating algorithm. Assume that we know the factors $R_t$, $\tilde Y_t$ and $\rho_t$ of Equation (2.14) and that we have a new pair of observations $(y_{t+1}, \varphi_{t+1})$. The updating is now simply done by

$$\begin{pmatrix} R_{t+1} & \tilde Y_{t+1} \\ 0 & \rho_{t+1} \\ 0 & 0 \end{pmatrix} := \bar Q_t^T\begin{pmatrix} \varphi_{t+1}^T & y_{t+1} \\ R_t & \tilde Y_t \\ 0 & \rho_t \end{pmatrix}, \qquad (2.15)$$

giving the new parameter estimate

$$\hat\theta_{t+1} := R_{t+1}^{-1}\tilde Y_{t+1}.$$
The updating method is equivalent to the sequential row orthogonalization
method for computing the QR decomposition, which is known to be stable.
Note the similarities to the SRIF algorithm.
Downdating is a more sensitive problem, and more care has to be taken when
deriving and implementing the algorithms. We will present two algorithms for
least-squares downdating: the LINPACK algorithm and the CSNE algorithm
(Corrected Semi-Normal Equations).
Assume that we have the QR factorization of

$$\begin{pmatrix} \Phi_t & Y_t \end{pmatrix} = \begin{pmatrix} \varphi_1^T & y_1 \\ \Phi_{t+1} & Y_{t+1} \end{pmatrix} = Q_t\begin{pmatrix} R_t & \tilde Y_t \\ 0 & \rho_t \\ 0 & 0 \end{pmatrix} \qquad (2.16)$$

and want to delete the first row $\varphi_1^T$. By forming the matrix $\begin{pmatrix} e_1 & \Phi_t & Y_t \end{pmatrix}$ we obtain

$$Q_t^T\begin{pmatrix} 1 & \varphi_1^T & y_1 \\ 0 & \Phi_{t+1} & Y_{t+1} \end{pmatrix} = \begin{pmatrix} q & R_t & \tilde Y_t \\ \bar q & 0 & \begin{matrix} \rho_t \\ 0 \end{matrix} \end{pmatrix},$$

where $q_t^T = \begin{pmatrix} q^T & \bar q^T \end{pmatrix}$ is the first row of $Q_t$, $\Phi_t^T = \begin{pmatrix} \varphi_1 & \Phi_{t+1}^T \end{pmatrix}$ and $Y_t^T = \begin{pmatrix} y_1 & Y_{t+1}^T \end{pmatrix}$. We can now apply a matrix $\bar U$, formed by a number of Givens rotations, such that

$$\bar U^T\begin{pmatrix} q & R_t & \tilde Y_t \\ \bar q & 0 & \begin{matrix} \rho_t \\ 0 \end{matrix} \end{pmatrix} = \begin{pmatrix} 1 & \hat\varphi^T & \hat y \\ 0 & R_{t+1} & \tilde Y_{t+1} \\ 0 & 0 & \rho_{t+1} \\ 0 & 0 & 0 \end{pmatrix}. \qquad (2.17)$$

We have thus performed the transformation

$$U^T\begin{pmatrix} 1 & \varphi_1^T & y_1 \\ 0 & \Phi_{t+1} & Y_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & \hat\varphi^T & \hat y \\ 0 & R_{t+1} & \tilde Y_{t+1} \\ 0 & 0 & \rho_{t+1} \\ 0 & 0 & 0 \end{pmatrix},$$

where $U = Q_t\bar U$. The first columns on each side read $U^T e_1 = e_1$, saying that $U$ must have the following appearance

$$U = \begin{pmatrix} 1 & 0 \\ 0 & Q_{t+1} \end{pmatrix}$$

and, hence, $\hat\varphi = \varphi_1$ and $\hat y = y_1$. By removing the first row we have thus obtained the new downdated QR decomposition.
The sliding window computations require a large amount of data storage, which easily becomes too costly. One way to avoid this is to downdate only the R-matrix of the QR factorization. This is done in Saunders' LINPACK algorithm, cf. Saunders [24]. Another way is to use the Gram-Schmidt factorization, which reduces the storage requirement for the Q-matrix, cf. Björck et al. [5] and Daniel et al. [7].

To derive the LINPACK algorithm we observe that only the first row of $Q_t$ is used in the downdating above. In Björck et al. [5] the derivations of how to compute the elements of this vector are given in detail; here we will only present the results of these derivations in the LINPACK algorithm given below.

By leaving the Q-matrix out of the computations, Saunders' LINPACK algorithm improves the efficiency. On the other hand it can be shown, cf. Björck et al. [5], that we lose accuracy compared to algorithms that downdate the whole QR factorization, due to the fact that information is lost when the Q-matrix is left unchanged.

An alternative algorithm is Björck's CSNE algorithm, where again the Q-matrix is left out of the computations, but instead the data matrix $\Phi_t$ is included in iterative refinement. The derivations are again given in detail in [5]. The semi-normal equations are given by

$$R_t^T R_t\hat\theta_t = \Phi_t^T Y_t.$$
LINPACK algorithm

1. Solve $R_t^T q = \varphi_1$.
2. Calculate $\alpha := \sqrt{1 - \|q\|_2^2}$.
3. Determine $\hat U$, a product of Givens rotations, such that

$$\hat U^T\begin{pmatrix} q & R_t \\ \alpha & 0 \end{pmatrix} = \begin{pmatrix} 1 & \varphi_1^T \\ 0 & R_{t+1} \end{pmatrix}, \qquad (2.19)$$

and apply the same rotations to the right-hand side to obtain $\tilde Y_{t+1}$.
4. Calculate the new estimate $\hat\theta_{t+1} := R_{t+1}^{-1}\tilde Y_{t+1}$.

CSNE algorithm

Björck's CSNE algorithm computes the vector $q$ from the semi-normal equations and improves it with one step of iterative refinement, in which the residual is corrected through the data matrix ($r := r - \Phi_t e$). The refined quantities are then used to form

$$\begin{pmatrix} 1 & \varphi_1^T & y_1 \\ 0 & R_{t+1} & \tilde Y_{t+1} \end{pmatrix},$$

and finally $\rho_{t+1}$ is calculated from $\rho_t$.
Obviously the complexity of the algorithms is rather high. The LINPACK algorithm uses two triangular solves, and the iterative refinement of the CSNE algorithm adds another three. Furthermore, one QR decomposition is carried out in each algorithm, giving a total cost of about $3n^2$ operations for the LINPACK algorithm and about $4pn + 4.5n^2$ operations, where $p$ is the length of the window, for the CSNE algorithm. If one can accept the complexity of the CSNE algorithm, then it can be used whenever the conditioning of the downdating problem is poor. This Hybrid algorithm was suggested by Eldén, cf. Björck et al. [5].
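The LINPACK downdating step described above can be sketched as follows: solve for $q$, form $\alpha$, and zero $q$ with Givens rotations as in (2.19). This is my own minimal implementation of that construction, not the LINPACK routine itself:

```python
import numpy as np

def linpack_downdate(R, a):
    """Downdate: return Rd with Rd^T Rd = R^T R - a a^T (well-posed case)."""
    n = len(a)
    q = np.linalg.solve(R.T, a)          # step 1: R^T q = phi_1
    alpha = np.sqrt(1.0 - q @ q)         # step 2: needs ||q||_2 < 1
    M = np.zeros((n + 1, n + 1))         # the array [[q, R], [alpha, 0]]
    M[:n, 0] = q
    M[n, 0] = alpha
    M[:n, 1:] = R
    for j in range(n - 1, -1, -1):       # step 3: Givens rotations zeroing q
        r = np.hypot(M[j, 0], M[n, 0])
        c, s = M[n, 0] / r, M[j, 0] / r
        top = c * M[n, :] + s * M[j, :]
        M[j, :] = -s * M[n, :] + c * M[j, :]
        M[n, :] = top
    return M[:n, 1:]                     # downdated upper-triangular factor

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 4))
R = np.linalg.qr(A, mode="r")
Rd = linpack_downdate(R, A[0])
assert np.allclose(Rd.T @ Rd, A[1:].T @ A[1:])
```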
Note that the scalar $\alpha$ is not used in the downdating in the LINPACK algorithm, but may still be calculated. This is because, when we want to use the Hybrid algorithm, we use the factor

$$\alpha^2 = 1 - \|q\|_2^2 > \mathrm{TOL},$$

where TOL is a suitably chosen factor in the interval $(0, 1)$, e.g. TOL $= 0.25$, to check whether the system is too ill-conditioned to use the LINPACK algorithm. Another alternative to the LINPACK algorithm is an algorithm presented by Alexander et al. [1], using hyperbolic rotations (see Section 2.2.3). This algorithm has been shown to have almost as good numerical qualities as the LINPACK algorithm, if the problem is not too poorly conditioned, and furthermore it uses $n^2$ fewer operations than the LINPACK algorithm.
The updating of the inverse factor $S_t = R_t^{-1}$ can be derived from the same orthogonal transformation. From

$$\bar Q^T\begin{pmatrix} 1 & \varphi_{t+1}^T \\ 0 & R_t \end{pmatrix} = \begin{pmatrix} q & R_{t+1} \\ \gamma & 0 \end{pmatrix}$$

it follows by inversion that

$$\begin{pmatrix} 1 & -\varphi_{t+1}^T S_t \\ 0 & S_t \end{pmatrix}\bar Q = \begin{pmatrix} 0 & \gamma^{-1} \\ S_{t+1} & -\gamma^{-1} S_{t+1} q \end{pmatrix}.$$

The updating of the last column of Equation (2.15) is achieved by taking the transpose of the columns,

$$\begin{pmatrix} y_{t+1} & \tilde Y_t^T \end{pmatrix}\bar Q = \begin{pmatrix} \tilde Y_{t+1}^T & \alpha \end{pmatrix}.$$

Note that $\bar Q$, as distinguished from $\bar Q_t$ in Equation (2.15), does not affect $\rho_t$. Finally, we calculate

$$\rho_{t+1} = \sqrt{\alpha^2 + \rho_t^2}$$

and update our parameters as

$$\hat\theta_{t+1} = S_{t+1}\tilde Y_{t+1}.$$

The complete inverse updating can be collected as

$$\begin{pmatrix} 0 & \gamma^{-1} \\ \tilde Y_{t+1}^T & \alpha \\ S_{t+1} & -\gamma^{-1} S_{t+1} q \end{pmatrix} := \begin{pmatrix} 1 & -\varphi_{t+1}^T S_t \\ y_{t+1} & \tilde Y_t^T \\ 0 & S_t \end{pmatrix}\bar Q.$$
For the downdating of $R^{-1}$ we will present two algorithms directly related to the ones in Section 2.4.2. To derive the LINPACK inverse downdating algorithm we recall the QR decomposition (2.19), which shortly can be written

$$\hat U^T\begin{pmatrix} q & R_t \\ \alpha & 0 \end{pmatrix} = \begin{pmatrix} 1 & \varphi_1^T \\ 0 & R_{t+1} \end{pmatrix}$$

and, furthermore,

$$\begin{pmatrix} 0 & \alpha^{-1} \\ S_t & -\alpha^{-1} S_t q \end{pmatrix}\hat U = \begin{pmatrix} 1 & -\varphi_1^T S_{t+1} \\ 0 & S_{t+1} \end{pmatrix},$$

together with

$$\begin{pmatrix} \tilde Y_t^T & \hat\delta/\alpha \end{pmatrix}\hat U = \begin{pmatrix} y_1 & \tilde Y_{t+1}^T \end{pmatrix}, \qquad \hat\delta = y_1 - \varphi_1^T\hat\theta_t.$$

This algorithm is presented below together with the CSNE inverse downdating algorithm (Corrected Semi-Normal Equations), which uses the semi-normal equations with one step of iterative refinement.
LINPACK inverse downdating algorithm

1. Calculate $q := S_t^T\varphi_1$.
2. Calculate $\alpha := \sqrt{1 - \|q\|_2^2}$ and $\hat\delta := y_1 - \varphi_1^T\hat\theta_t$.
3. Determine Givens rotations $\hat U$ as in (2.19) and apply

$$\begin{pmatrix} 0 & \alpha^{-1} \\ \tilde Y_t^T & \hat\delta/\alpha \\ S_t & -\alpha^{-1} S_t q \end{pmatrix}\hat U =: \begin{pmatrix} 1 & -\varphi_1^T S_{t+1} \\ y_1 & \tilde Y_{t+1}^T \\ 0 & S_{t+1} \end{pmatrix}.$$

4. Calculate $\rho_{t+1} := \sqrt{\rho_t^2 - (\hat\delta/\alpha)^2}$.
5. Update the parameters $\hat\theta_{t+1} := S_{t+1}\tilde Y_{t+1}$.

CSNE inverse downdating algorithm

1. Calculate $q := S_t^T\varphi_1$ and $v := S_t q$.
2. Use one step of iterative refinement:

$r := e_1 - \Phi_t S_t q$
$\delta q := S_t^T\Phi_t^T r$
$q := q + \delta q$
$\delta v := S_t\,\delta q$
$v := v + \delta v$
$r := r - \Phi_t\,\delta v$

and $\alpha := \|r\|_2$.
3. If $\alpha^2 > \mathrm{TOL}$, proceed as in the LINPACK inverse downdating algorithm with $\rho_{t+1} := \sqrt{\rho_t^2 - (\hat\delta/\alpha)^2}$; else compute the normalized residual and use another step of iterative refinement to calculate

$e := (Y_t - \Phi_t\hat\theta_t)/\rho_t$
$\delta := (y_1 - \varphi_1^T\hat\theta_t)/\rho_t$
$r := r - \delta e$
$\eta := e^T r$
$\delta := \delta + \eta$
$r := r - \eta e$
$\rho_{t+1} := \|r\|_2\,\rho_t$
$\hat\delta := \delta\rho_t$.
2
To modify R;1 rather than R has the advantage that the triangular solves
are avoided, and the number of operations is thus decreased. And, furtermore,
in our application we have no need of knowing the QR decomposition of
explicitly, so in this sense it is not advantageous to use the methods of Sections 2.4.1 and 2.4.2. But then we have forgotten the most important quality
of the algorithms: the numerical stability. Whereas the error analysis of the
algorithms using modication of the QR factorization is rather straightforward
and well established, the ditto for the inverse modication methods have not
been fully examined (for further discussion, see Chapter 3).
21
Chapter 3
Error Analysis
When analysing the effect of round-off errors, there are three aspects that have to be studied:

- What bounds can be obtained for the errors performed in step t?
- How do errors performed in step t propagate in subsequent steps?
- How do errors performed in different steps interact and accumulate?

cf. van Dooren and Verhaegen [8]. In the literature, however, mostly the second item has been studied so far.

Error analysis in numerical linear algebra can be very complicated, usually requiring tedious studies to fully examine how errors emerge and propagate for a certain algorithm. An example is Gaussian elimination, which up to this date is still the subject of intensive research. Therefore, we do not hope to fully examine the numerical properties of the algorithms presented in Chapter 2, but only to present some results of the error analyses made by e.g. van Dooren and Verhaegen, Björck et al. and Alexander et al., and then by numerical experiments draw certain conclusions about the algorithms.
increase. This is of course alarming, but on the other hand we know that the RLS algorithm works in practice if the problem is not too poorly conditioned, which implies that the bounds of (3.1) are too pessimistic.
Although the algorithms of Sections 2.2.2 and 2.2.3 differ from the RLS algorithm, the algorithms are inherently equivalent, and the equations above hold. For the SRIF algorithm we do not calculate $\hat\theta_{t+1}$ from $\hat\theta_t$ directly, and we would expect no propagation of errors between them. Yet, such a propagation is present via the vectors $\hat z_{t+1}$ and $\hat z_t$ and the relation

$$\hat\theta_{t+1} = \left(L_{t+1} + \Delta L_{t+1}\right)^{-T}\hat z_{t+1}.$$

In fact, the SRIF algorithm is inherently equivalent to an update of $P$ and $\hat\theta$, and the equations above hold for this algorithm as well, cf. van Dooren and Verhaegen [8].
The main reason for using factorized forms of the covariance matrix (or the information matrix) is that one is bound to lose accuracy whenever $P$ (or $P^{-1}$) is calculated explicitly. In [8] it is shown that this is valid, as far as the error analysis is concerned, only for $P$ (or $P^{-1}$) but not for $K$ or $\hat\theta$. And clearly, since we are mainly interested in a good parameter estimate, this implies that we do not necessarily gain accuracy by using the algorithms that involve factorized forms instead of the RLS algorithm. What we know, however, is that we will lose efficiency when these algorithms are used.
The error analysis concerning items one and three above, i.e. what bounds can be obtained for the errors performed in step t, and how errors performed in different steps interact and accumulate, is more difficult to derive. Approximate bounds for the RLS algorithm, the Givens 1 algorithm and the SRIF algorithm are given by van Dooren and Verhaegen [8] ($\|\cdot\|$ denotes the 2-norm throughout). In these bounds the $\varepsilon_i$ are constants close to the machine precision $u$, $\kappa(\cdot)$ is the condition number and $r$ is the factor from Equation (2.13).
The differences are quite clear and can be left to the reader to observe. We will, however, make the following comments:

- The expression for the error in the parameters is almost the same for the RLS and Givens 1 algorithms, although the errors in the covariance matrix and the Kalman gain differ. This implies that accuracy might not necessarily be gained by using factorized forms of the covariance matrix as far as the parameters are concerned.
- Regarding the errors in the parameters, there are factors that are not multiplied by an $\varepsilon_i$ in all three algorithms.
- The appearance of the parameter error in the SRIF algorithm differs significantly from the other two, implying that other stability properties can be expected.
$$F(x + \Delta x) = y + \Delta y.$$

Provided that $|\Delta x| < \delta \Rightarrow |\Delta y| < \varepsilon$, we say that we have forward stability.

A more powerful tool for the stability analysis of problems and algorithms is backward stability. For the problem (3.2) it can be defined as follows: if we perturb the data $y$ by $\Delta y$, we obtain a perturbed solution $x + \Delta x$ satisfying (3.2). If $|\Delta y| < \delta \Rightarrow |\Delta x| < \varepsilon$, the problem is backward stable. Note that this is a condition on the inverse $F^{-1}$; the solution must depend continuously on the data $y$.
This technique is also used in the analysis of a computational algorithm $A$ for the solution of (3.2). The algorithm can be thought of as an approximate inverse to $F$. Given the data $(F, y)$ of the problem (3.2), it produces an approximate solution $\tilde x$:

$$\tilde x = A(F, y).$$

This approximate solution satisfies some equation

$$\tilde F(\tilde x) = \tilde y,$$

and the backward stability approach tries to ensure that this perturbed problem is "nearby" the original problem. In other words, the algorithm is amenable to a backward stability analysis if it produces the exact solution of a nearby problem. Backward stability can then be expressed as

$$|\tilde F - F| < \varepsilon_1 \;\wedge\; |y - \tilde y| < \varepsilon_2 \;\Rightarrow\; |x - \tilde x| < \delta.$$

It is common to ask for uniform continuity, i.e. $\delta$ should be proportional to the $\varepsilon_i$. In addition, the typical error analysis is often carried out in such a way that we require $\delta$ and $\varepsilon_i$ to be expressed in terms of problem parameters.
For the downdating algorithms, Björck et al. [5] derive forward error bounds. For the LINPACK algorithm the bound on the relative error in the downdated factor,

$$\frac{\|\tilde R(\alpha) - \tilde R\|_2}{\|\tilde R\|_2},$$

grows with $\kappa(R)$ and with the factor

$$C = \frac{\|q\|_2}{\sqrt{1 - \|q\|_2^2}}.$$

Once again we see that the closeness of $\|q\|_2$ to 1 is important via the factor $C$. For the CSNE algorithm the corresponding bound contains a factor $\eta$ built from the problem dimensions and the condition number; according to Björck et al., if $\eta < 1$, the forward error will not be worse than for a backward stable method.
Chapter 4
Time-Varying Parameter Estimation
In almost any system where recursive least-squares algorithms are used, the parameters are assumed to vary with time. If this were not the case, the parameter estimation would simply be a tuning process, which is not really a demanding problem.

If the parameters vary with time, we have to discount old observations in some way, or otherwise the result will be poor estimates. The two major categories of discarding old observations are forgetting and downdating. In this chapter we will discuss different ways of dealing with the problem of time-varying parameters.
Of course we have to make certain assumptions about how the parameters can vary. To be able to produce good parameter estimates, we can only allow the parameters to vary slowly and/or seldom compared with the time constants of the system, cf. Hägglund [15]. If we for instance allow the parameters to vary quickly, noise can be interpreted as a parameter change and we will obtain poor estimates. The parameter variations can now be divided into two types: the slow parameter changes, where the problem is to determine "the right amount of forgetting", and the large parameter changes, where the problem is to detect when a parameter change has occurred.
The algorithms based on downdating are automatically adapted to time-varying parameters, and therefore in this chapter we will mainly discuss forgetting. The most common type of forgetting in control theory is exponential forgetting, where the data are subject to time-dependent weights, and the least-squares problem becomes

$$\min_\theta \sum_{i=1}^{t} \lambda^{t-i}\left(y_i - \varphi_i^T\theta\right)^2,$$

where $0 < \lambda < 1$. Exponential forgetting is appealing due to its obvious interpretation and because it easily can be applied to the recursive algorithms that use updating of the covariance matrix, see Section 2.2. Exponential forgetting does however have disadvantages. If the forgetting factor $\lambda$ is chosen too large, it will lead to slow convergence of the estimates, and if $\lambda$ is chosen too small, it will lead to "noisy" estimates. Another problem that may occur during times of low excitation is estimator wind-up, i.e. whenever the vector $P_{t-1}\varphi_t = 0$ the covariance matrix will increase exponentially.
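Exponential forgetting is obtained from the RLS step by the rescaling described in Section 2.2. A minimal sketch (function name, $\lambda = 0.95$ and the piecewise-constant test parameters are my own choices), tracking a parameter jump:

```python
import numpy as np

def rls_forget(theta, P, phi, y, lam=0.98, sigma2=1.0):
    """RLS step with exponential forgetting: data i steps old is weighted lam**i."""
    eps = y - phi @ theta
    beta = lam * sigma2 + phi @ P @ phi
    K = P @ phi / beta
    theta = theta + K * eps
    Pphi = P @ phi
    P = (P - np.outer(Pphi, Pphi) / beta) / lam   # rescaling implements forgetting
    return theta, P

rng = np.random.default_rng(6)
theta, P = np.zeros(2), 1e4 * np.eye(2)
for t in range(400):
    target = np.array([1.0, -1.0]) if t < 200 else np.array([3.0, 0.5])
    phi = rng.standard_normal(2)
    theta, P = rls_forget(theta, P, phi, phi @ target, lam=0.95)

# the estimator has re-converged after the parameter jump at t = 200
assert np.allclose(theta, [3.0, 0.5], atol=1e-2)
```

Without the division by `lam`, the same data would pin the estimate near a compromise between the two regimes, which is exactly the slow-convergence problem described above.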
These problems have led to a number of more or less successful alternative types of forgetting. Fortesque et al. [11] and Sano and Wellstead [23] have made attempts to use a time-variable forgetting factor that depends on the prediction error, i.e. $\lambda(t) \propto \varepsilon(t)^{-1}$. Irving [16] proposed the use of a forgetting factor that keeps the trace of the covariance matrix constant to eliminate the estimator wind-up. The estimation can also be restarted repeatedly, instead of using a forgetting factor. This method is successfully practised by Betz and Evans [3], where the covariance matrix is reset to a large matrix repeatedly. Another way to deal with the trade-off between fast adaption and parameter fluctuations has been suggested by Bobrow and Murray [6], where a desire to minimize the parameter fluctuations is added to the cost function. This is achieved by a stable polynomial in the forward shift operator, which determines how the parameters should decay. The method does however have the drawback that an inertia is added to the system.
We will now present an important alternative suggested by Hägglund [15] called directional forgetting. One problem with exponential forgetting is that although new information only arrives in the $\varphi$-direction, old information is removed in all directions. In directional forgetting, however, information will only be removed in the $\varphi$-direction. The algorithm given in [15] gives a covariance matrix proportional to the identity matrix, i.e.

$$\lim_{t\to\infty} P_t = aI, \qquad (4.1)$$

where the diagonal elements of $P_t$ may be interpreted as approximations of the variances of the corresponding parameters.
To derive the new algorithm a more general definition of the covariance matrix, or rather the information matrix, is needed. Let the updating of the information matrix be given by

$$P_t^{-1} = \Phi_t^T V_t^{-1}\Phi_t = \Phi_{t-1}^T V_{t-1|t}^{-1}\Phi_{t-1} + \varphi_t v_t^{-1}\varphi_t^T,$$

where

$$V_t^{-1} = \begin{pmatrix} V_{t-1|t}^{-1} & 0 \\ 0 & v_t^{-1} \end{pmatrix}.$$

If information shall only be discarded in the $\varphi$-direction, $V_{t-1|t}^{-1}$ is chosen so that

$$\Phi_{t-1}^T V_{t-1|t}^{-1}\Phi_{t-1} = \Phi_{t-1}^T V_{t-1}^{-1}\Phi_{t-1} - \alpha(t)\varphi_t\varphi_t^T, \qquad (4.2)$$

giving

$$P_t^{-1} := P_{t-1}^{-1} + \left(v_t^{-1} - \alpha(t)\right)\varphi_t\varphi_t^T.$$

The information at time $t$ can be expressed as a sum of three terms: the information at time $t - 1$, the old information which is removed, and the new information. The corresponding expression for the exponential forgetting is

$$P_t^{-1} := \lambda P_{t-1}^{-1} + v_t^{-1}\varphi_t\varphi_t^T.$$

The Kalman gain is given by

$$K_t := \frac{P_t\varphi_t}{v_t}.$$
The remaining issue is how to choose the factor $\alpha(t)$. In [15] this is derived by using (4.1) and the fact that $\alpha(t)$ should be non-negative.

In the case of large and seldom occurring parameter changes there are mainly two methods that have been put into practice. Both use the fact that whenever a large parameter change is detected, the covariance matrix should be increased. The first method is to decrease the forgetting factor $\lambda$. The growth of $P_t$ is then nearly exponential. The second method is to add a constant times the unit matrix to the $P$-matrix, in which case $P_t$ is increased instantaneously. The updating of the covariance matrix is then augmented by a term $\delta(t)I$, where $\delta(t) = 0$ until a parameter change has been detected, when it becomes a non-negative number. How this number is chosen and, the most important issue, how the parameter change is detected, is discussed in detail in Hägglund [15].
Chapter 5
Test Results
In this chapter we will apply some of the algorithms presented in the previous chapters to five test examples. As we could see in e.g. Chapter 3, there are some differences between the way a numerical analyst and a control theorist tackle the recursive least-squares problem. The test examples will therefore be divided into two sections, of which the first deals with an example taken from control theory while the second deals with an example taken from numerical analysis.
$$y_t + a_1 y_{t-1} + a_2 y_{t-2} = b_1 u_{t-1} + b_2 u_{t-2}, \qquad (5.1)$$

where $u_t$ is the input signal (the rudder angle) and $y_t$ is the output signal (the heading angle). In [17] the parameters are estimated by using free steering experiments on the freighter itself, and are given by

$$a_1 = -1.60, \quad a_2 = 0.61, \quad b_1 = -0.161, \quad b_2 = -0.285.$$
Example 5.1
Example 5.2
$$\sum_{k=1}^{t} \left(P(q)u(k)\right)^2 > 0,$$

where $q$ is the forward shift operator, for all nonzero polynomials $P$ of degree $n - 1$ or less, cf. Wittenmark and Åström [27]. In our case we have a second order system, so to get an input signal that is not PE we simply let $u_t$ be a step, since a step is PE of order 1.
Example 5.3
The problem with the RLS algorithm in particular is that the elements of the $\Phi$-matrix are squared when the covariance matrix is calculated. This means that if we for example have an input signal containing one large constant term and one small time-varying term, problems may occur when the RLS algorithm is used.

In this example we will use the same input signal as in Example 5.1 but will add a constant term $\bar u = 10^8$. Since we are using MATLAB's double precision calculations in all test examples, this is bound to cause problems, at least when the RLS algorithm is used. The input and output signals are thus transformed as

$$u_t := u_t + \bar u, \qquad y_t := y_t + \frac{B(1)}{A(1)}\bar u,$$

where $A(q) = q^2 + a_1 q + a_2$ and $B(q) = b_1 q + b_2$.
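The squaring effect can be illustrated as follows: with a constant term of $10^8$ in the regressors, $\kappa(\Phi^T\Phi) = \kappa(\Phi)^2$ reaches the double-precision limit even though $\Phi$ itself is still manageable. The numbers below are illustrative, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(7)
t = 200
u = 1e8 + rng.standard_normal(t)       # regressor with a large constant term
w = rng.standard_normal(t)             # small time-varying regressor
Phi = np.column_stack([u, w])

# forming the covariance/normal-equations matrix squares the conditioning
kappa_sq = np.linalg.cond(Phi.T @ Phi)
assert kappa_sq > 1e15                 # ~ kappa(Phi)**2, near the 1e16 double limit

# an orthogonalization-based solver still recovers the parameters here
theta_true = np.array([1e-8, 1.0])
y = Phi @ theta_true
theta_qr = np.linalg.lstsq(Phi, y, rcond=None)[0]
assert abs(theta_qr[1] - 1.0) < 1e-4
```

This is exactly the situation in which the QR-based updating algorithms of Chapter 2 can be expected to outperform the covariance-based ones.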
Example 5.4
where $a_2(0)$ is the value of $a_2$ in the previous examples and $t \in (0, T)$. This represents a slow variation in the system. A fast variation in the system is obtained if we add some noise to the model (5.1). The discrete-time model will now have the following appearance:

$$y_t + a_1 y_{t-1} + a_2(t)y_{t-2} = b_1 u_{t-1} + b_2 u_{t-2} + e_t,$$

where $e_t \in N(0, \sigma^2)$ is white noise.
We have chosen five of the algorithms given in Chapter 2: the RLS algorithm, Peterka's algorithm, the Givens algorithm, the Hybrid algorithm and the Inverse Hybrid algorithm. For each of the first three test examples we have plotted the input and output signals, followed by plots of the factor

$$q = \|R^{-T}\varphi\|_2 = \|L\varphi\|_2$$

(for Peterka's algorithm $L$ here stands for $LD^{1/2}$),
Figure 5.1: The input signal u (rudder angle) and output signal y (heading angle) in Example 5.1.
cf. Section 3.2, the parameter estimates $\hat\theta$ and the prediction error $\varepsilon$ for each algorithm. (For some algorithms the results are indistinguishable, and we have in those cases only presented a single plot.)

The initial values for the parameters and matrices in the algorithms are given in different ways: for the algorithms that use updating of the covariance matrix the parameters are simply set to 0 and the covariance matrix is initialized as $P_0 = \rho I$, where $\rho$ is chosen to be a large number to minimize the influence on $P_t$, since according to Equation (2.5), $P_t$ depends on $P_0$ as

$$P_t = \left(\sigma^{-2}\Phi_t^T\Phi_t + P_0^{-1}\right)^{-1}.$$

For the Hybrid algorithms the initial values are given by solving an overdetermined system of the size of the window. Choosing the window size can sometimes be a problem, as we will see later on. In the examples where nothing else is mentioned, a window of length 20 will be used.
Since we are estimating a third order system, this will in our example show up as two constant poles and one constant zero, which can be seen in Figures 5.10-5.11. The consequence of this is unsteady estimates. It may be interesting to know that a realization of the final estimates of the Hybrid algorithm gives an output signal $y_m$ that is close to $y$. This is however not the case for the Inverse Hybrid algorithm.

So, we cannot say if the estimates obtained by the Hybrid algorithm are worse than e.g. the ones obtained by Peterka's algorithm. But if we have to choose, it would certainly be preferable to have moderately varying estimates.
Figure 5.2: The factor $q = \|L\varphi\|_2$ for Peterka's algorithm and the Givens algorithm in Example 5.1.
Figure 5.3: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm and the Givens algorithm in Example 5.1.
Figure 5.4: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the RLS algorithm in Example 5.1.
Figure 5.5: $\log(\|\varepsilon\|)$ vs. $\log(\rho)$, where $\rho$ is the factor in $P_0 = \rho I$, for Peterka's algorithm in Example 5.1.
Figure 5.6: The factor $q = \|R^{-T}\varphi\|_2$ for the Hybrid algorithm in Example 5.1.
Figure 5.7: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.1.
Figure 5.8: The factor $q = \|S^T\varphi\|_2$ for the Inverse Hybrid algorithm in Example 5.1.
Figure 5.9: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Inverse Hybrid algorithm in Example 5.1.
Figure 5.10: The zeros of the polynomials $A(z)$ and $B(z)$ respectively for the Hybrid algorithm in Example 5.1.
Figure 5.11: The zeros of the polynomials $A(z)$ and $B(z)$ respectively for the Inverse Hybrid algorithm in Example 5.1.
Figure 5.12: The input signal $u$ (rudder angle) and output signal $y$ (heading angle) in Example 5.2.
Figure 5.13: The factor $q = \|R^{-T}\varphi\|_2$ for the Hybrid algorithm in Example 5.2.
Figure 5.14: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.2.
Figure 5.15: The factor $q = \|S^T\varphi\|_2$ for the Inverse Hybrid algorithm in Example 5.2.
Figure 5.16: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Inverse Hybrid algorithm in Example 5.2.
Figure 5.17: The factor $q = \|L\varphi\|_2$ for Peterka's algorithm and the Givens algorithm in Example 5.2.
Figure 5.18: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm, the RLS algorithm and the Givens algorithm in Example 5.2.
Figure 5.19: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the RLS algorithm in Example 5.3.
For Peterka's algorithm and the Givens algorithm we can again observe a
more stable behaviour. The estimates converge quickly to some values ($\hat\theta_3$ ($b_1$)
and $\hat\theta_4$ ($b_2$) converge to erroneous values). As the step occurs, the prediction
error increases and $q$ immediately goes to 1, but as the estimates instantaneously
converge to new (and correct) values, $\varepsilon$ and $q$ decrease again. See Figures 5.17–5.18.
For the RLS algorithm the covariance matrix once again becomes non-positive
definite, but still the estimates and the prediction error are close to the
results of Peterka's algorithm and the Givens algorithm.
Figure 5.20: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm in Example 5.3.
Figure 5.21: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Givens algorithm in Example 5.3.
Figure 5.22: The input signal $u$ (rudder angle) and output signal $y$ (heading angle) in Example 5.4.
The Hybrid algorithm tackles this test example successfully, while the Inverse
Hybrid algorithm fails again.
Figure 5.23: $\|\varepsilon\|_2$ vs. $\ln(\lambda)$, where $\lambda \in \{0.5, 0.7, 0.9, 0.95, 0.99, 0.995, 0.999, 1\}$, for Peterka's algorithm in Example 5.4.
Figure 5.24: $\|\varepsilon\|_2$ vs. the window length for the Hybrid algorithm in Example 5.4.
Figure 5.25: The factor $q = \|L\varphi\|_2$ for Peterka's algorithm and the Givens algorithm in Example 5.4.
The results for the different algorithms are shown in Figures 5.25–5.31, and
will not be commented upon further.
This test example is taken from a report by Björck et al. [5] and is constructed
in the following way: generate an overdetermined system
$$\Phi\theta = Y,$$
where $\theta$ is a $5\times 1$ vector with ones as its elements,
$$\Phi = \begin{pmatrix} H \\ \bar{H} \end{pmatrix} + \Delta,$$
where $H$ is a $25\times 5$ Hilbert matrix, $\bar{H}$ is the same matrix but with the rows in
reversed order and $\Delta$ is a $50\times 5$ matrix with elements uniformly distributed in
$(0, 10^{-9})$, and
$$Y = \Phi\theta + \Delta Y,$$
Figure 5.26: The factor $q = \varphi_t^T P_t \varphi_t$ for the RLS algorithm in Example 5.4.
Figure 5.27: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm, the RLS algorithm and the Givens algorithm in Example 5.4.
Figure 5.28: The factor $q = \|R^{-T}\varphi\|_2$ for the Hybrid algorithm in Example 5.4.
Figure 5.29: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.4.
Figure 5.30: The factor $q = \|S^T\varphi\|_2$ for the Inverse Hybrid algorithm in Example 5.4.
Figure 5.31: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Inverse Hybrid algorithm in Example 5.4.
Figure 5.32: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.5.
where $\Delta Y$ is a $50\times 1$ vector with elements uniformly distributed in $(0, 10^{-6})$.

In this test example we choose only to compare the Hybrid algorithm and
Peterka's algorithm. For the Hybrid algorithm we have chosen a window of
length 8, and for Peterka's algorithm we have chosen a forgetting factor $\lambda = 1$
and initial values $\hat\theta_0 = 0$ and $P_0 = 10^{16} I$. We will present plots of the parameter
estimates $\hat\theta$, the prediction error $\varepsilon$, the factor $q$ and the relative error norm of
the parameter estimates, i.e.
$$\varepsilon_\theta = \frac{\|\hat\theta - \hat\theta_{ls}\|_2}{\|\hat\theta_{ls}\|_2},$$
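The construction of this test problem can be sketched as follows (a hypothetical NumPy version; [5] should be consulted for the exact data):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 25, 5
# 25 x 5 Hilbert matrix, H_ij = 1/(i + j - 1) with 1-based indices
H = 1.0 / (np.arange(m)[:, None] + np.arange(n)[None, :] + 1)
Hbar = H[::-1, :]                            # rows in reversed order
Delta = rng.uniform(0.0, 1e-9, (2 * m, n))   # uniform in (0, 1e-9)
Phi = np.vstack([H, Hbar]) + Delta
theta = np.ones(n)                           # 5 x 1 vector of ones
DeltaY = rng.uniform(0.0, 1e-6, 2 * m)       # uniform in (0, 1e-6)
Y = Phi @ theta + DeltaY
# Reference least-squares solution used in the relative error norm
theta_ls, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
```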
Figure 5.33: The factor $q = \|R^{-T}\varphi\|_2$ and relative error norm for the parameter estimates $\varepsilon_\theta$ for the Hybrid algorithm in Example 5.5.
Figure 5.34: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm in Example 5.5.
54
1
0.8
0.6
0.4
0.2
0
0
10
15
20
25
30
35
40
45
50
10
15
20
25
30
35
40
45
50
10 4
10 1
10 -2
10 -5
10 -8
10 -11
Figure 5.35: The factor q = kL'k2 and relative error norm for the parameter
estimates " for Peterka's algorithm in Example 5.5.
differ from the results in [5]). It is however doubtful whether the relative error norms
for the two algorithms are comparable, since they are calculated in different ways.
Appendix A
Alternative Algorithms Using Orthogonal Transformations
As mentioned in Chapter 2 there are other ways to apply the orthogonal
transformations when updating Cholesky factorizations. In the Givens 2 algorithm
we multiply an orthogonal matrix $Q$, formed by $n-1$ Givens rotations,
into the updating formula (2.12) in the following way
$$\bar{P} = LQ^T Q\left(I - \beta^{-1}vv^T\right)Q^T QL^T,$$
where $Q$ is chosen so that $Qv = \alpha e_1$. This gives
$$\bar{P} = H^T JJH = \bar{H}^T\bar{H},$$
where $H = QL^T$ and $\bar{H} = JH$ are upper Hessenberg matrices and
$$J = \begin{pmatrix} \sqrt{1-\beta^{-1}\alpha^2} & 0 \\ 0 & I_{n-1} \end{pmatrix}.$$
Then we apply a new orthogonal matrix $\bar{Q}$, formed by $n-1$ Givens rotations, to
give the final result
$$\bar{P} = \bar{H}^T\bar{Q}^T\bar{Q}\bar{H} = \bar{L}\bar{L}^T.$$
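The key identity in this derivation, $Q(I - \beta^{-1}vv^T)Q^T = JJ^T$ when $Qv = \alpha e_1$, can be checked numerically. The sketch below uses a single Householder reflection as a stand-in for the product of $n-1$ Givens rotations (both map $v$ to $\alpha e_1$), and takes $\beta = 1 + v^Tv$ as in a unit-forgetting update; both choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
v = rng.standard_normal(n)
beta = 1.0 + v @ v                   # e.g. beta = lambda + v^T v with lambda = 1
alpha = np.linalg.norm(v)            # so that Q v = alpha * e1
# Householder reflection with Q v = alpha * e1 (stand-in for the Givens product)
u = v - alpha * np.eye(n)[0]
Q = np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)
M = Q @ (np.eye(n) - np.outer(v, v) / beta) @ Q.T
J = np.eye(n)
J[0, 0] = np.sqrt(1.0 - alpha**2 / beta)
```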
To derive the Givens 3 algorithm we will once again modify the updating
formula (2.12), i.e.
$$\bar{P} = L\left(I - \beta^{-1}vv^T\right)L^T = L\left(I + \gamma vv^T\right)\left(I + \gamma vv^T\right)L^T,$$
where $v = L^T\varphi$ and
$$\gamma = -\frac{\beta^{-1}}{1 + \sqrt{1 - \beta^{-1}v^Tv}}.$$
With $Q$ an orthogonal matrix such that $Qv = \alpha e_1$ we get
$$Q\left(I + \gamma vv^T\right) = Q + \gamma\alpha e_1 v^T = H,$$
where $H$ is an upper Hessenberg matrix. As in the Givens 2 algorithm we apply
a new orthogonal transformation $\bar{Q}$ to reduce $H$ to an upper triangular matrix $\tilde{L}^T$,
i.e.
$$\bar{P} = LH^T HL^T = LH^T\bar{Q}^T\bar{Q}HL^T = L\tilde{L}\tilde{L}^T L^T.$$
Since the product of two lower triangular matrices is lower triangular, the
derivation is completed with the new Cholesky factor $\bar{L} = L\tilde{L}$.
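The choice of $\gamma$ can be verified directly: $(I + \gamma vv^T)^2 = I - \beta^{-1}vv^T$ exactly when $2\gamma + \gamma^2 v^Tv = -\beta^{-1}$, which the formula above satisfies. A quick numerical check (again with the illustrative assumption $\beta = 1 + v^Tv$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
v = rng.standard_normal(n)
beta = 1.0 + v @ v                   # keeps 1 - v^T v / beta > 0
gamma = -(1.0 / beta) / (1.0 + np.sqrt(1.0 - (v @ v) / beta))
A = np.eye(n) + gamma * np.outer(v, v)
# A @ A should equal I - v v^T / beta
```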
Finally, we have the Householder algorithm, where the covariance matrix
will be written in $LDL^T$ form, giving the updating formula
$$\bar{P} = \bar{L}\bar{D}\bar{L}^T = LDL^T - \beta^{-1}LDL^T\varphi\varphi^T LDL^T
= LD^{1/2}\left(I - \beta^{-1}vv^T\right)D^{1/2}L^T,$$
with $v = D^{1/2}L^T\varphi$. Now, we factorize the matrix $I - \beta^{-1}vv^T = \hat{L}\hat{L}^T$, giving
$$\bar{P} = LD^{1/2}\hat{L}\hat{L}^T D^{1/2}L^T.$$
Note that $\hat{L}$, as distinguished from $L$, does not have a unit diagonal. Finally,
Bibliography
[1] S. T. Alexander, C. T. Pan and R. J. Plemmons. Analysis of a Recursive Least-Squares Hyperbolic Rotation Algorithm for Signal Processing. Lin. Alg. Appl., vol. 98, pp. 3–40, 1988.
[2] Andersson, Björck and Dahlquist. Numerical Methods. Prentice-Hall, 1974.
[3] R. E. Betz and R. J. Evans. New Results and Applications of Adaptive Control to Classes of Nonlinear Systems. Proc. Workshop on Adaptive Control, Florence, Italy, 1982.
[4] G. J. Bierman. Factorization Methods for Discrete Sequential Estimation. Mathematics in Science and Engineering, vol. 128, Academic Press, 1977.
[5] Å. Björck, L. Eldén and H. Park. Accurate Downdating of Least Squares Solutions. IMA Preprint Series #947, April 1992.
[6] J. E. Bobrow and W. Murray. An Algorithm for RLS Identification of Parameters that Vary Quickly with Time. Report, April 1991.
[7] J. Daniel, W. B. Gragg, L. Kaufman and G. W. Stewart. Reorthogonalization and Stable Algorithms for Updating the Gram-Schmidt QR Factorization. Math. Comp., vol. 30, pp. 772–795, 1976.
[8] P. van Dooren and M. Verhaegen. Numerical Aspects of Different Kalman Filter Implementations. IEEE Trans. Aut. Control, vol. 31, no. 10, pp. 907–917, 1986.
[9] L. Eldén and H. Park. Block Downdating of Least Squares Solutions. Department of Mathematics, Linköping University, August 1992.
[10] W. R. Ferng, G. H. Golub and R. J. Plemmons. Adaptive Lanczos Methods for Recursive Condition Estimation. Numerical Algorithms, pp. 11–20, 1991.
[11] T. R. Fortescue, L. S. Kershenbaum and B. E. Ydstie. Implementation of Self-Tuning Regulators with Variable Forgetting Factors. Automatica, vol. 17, pp. 831–835, 1981.
[12] P. E. Gill, G. H. Golub, W. Murray and M. A. Saunders. Methods for Modifying Matrix Factorizations. Mathematics of Computation, vol. 28, no. 126, pp. 505–535, April 1974.
[13] J. Götze and U. Schwiegelshohn. A Square Root and Division Free Givens Rotation for Solving Least Squares Problems on Systolic Arrays. SIAM J. Sci. Stat. Comput., vol. 12, no. 4, pp. 800–807, July 1991.
[14] C. S. Henkel and R. J. Plemmons. Recursive Least Squares on a Hypercube Multiprocessor Using the Covariance Factorization. SIAM J. Sci. Stat. Comput., vol. 12, no. 1, pp. 95–106, January 1991.
[15] T. Hägglund. New Estimation Techniques for Adaptive Control. Department of Automatic Control, Lund, December 1983.
[16] E. Irving. New Developments in Improving Power Network Stability with Adaptive Control. Proc. Workshop on Appl. of Adaptive Control, Yale University, New Haven, 1979.
[17] C. G. Källström and K. J. Åström. Identification of Ship Steering Dynamics. Automatica, vol. 12, pp. 9–22, Pergamon Press, 1976.
[18] M. Morf and T. Kailath. Square-Root Algorithms for Least-Squares Estimation. IEEE Trans. Aut. Control, vol. 20, no. 4, pp. 487–497, August 1975.
[19] C. C. Paige. Error Analysis of Some Techniques for Updating Orthogonal Decompositions. Math. Comp., vol. 34, pp. 465–471, 1980.
[20] C. T. Pan. A Perturbation Analysis of the Problem of Downdating a Cholesky Factorization. Department of Mathematics, Northern Illinois University, October 1990.
[21] C. T. Pan and R. J. Plemmons. Least-Squares Modifications with Inverse Factorizations: Parallel Implementations. Comp. Appl. Math., vol. 27, pp. 109–127, 1989.
[22] V. Peterka. Algorithms for LQG Self-Tuning Control Based on Input-Output Delta Models. Institute of Information Theory and Automation, Prague.
[23] S. P. Sanoff and P. E. Wellstead. Extended Self-Tuning Algorithm. Int. J. Control, vol. 34, pp. 433–455, 1981.
[24] M. A. Saunders. Large-Scale Linear Programming Using the Cholesky Factorization. Technical Report CS252, Comp. Sci. Dept., Stanford University, 1972.
[25] A. Sjö. Updating Techniques in Recursive Least-Squares Estimation. Algorithms Using Rank-1 Modification of Covariance Matrix Factorization. Department of Computer Science, Lund, March 1992.
[26] G. W. Stewart. The Effects of Rounding Error on an Algorithm for Downdating a Cholesky Factorization. Journal of the Institute for Mathematics and Applications, vol. 23, pp. 203–213, 1979.
[27] B. Wittenmark and K. J. Åström. Adaptive Control. Addison-Wesley, 1989.