EE531 (Semester II, 2010)
13. Recursive Identification Methods
• Introduction
• Recursive least-squares method
• Recursive instrumental variable method
• Recursive prediction error method
13-1
Introduction
Features of recursive (online) identification
• θ̂(t) is computed by some 'simple modification' of θ̂(t − 1)
• used as a central part of adaptive systems
• not all data are stored, so a small requirement on memory
• easily modified into real-time algorithms
• used in fault detection, to find out if the system has changed
significantly
How to estimate time-varying parameters
• update the model regularly
• make use of previous calculations in an efficient manner
• the basic procedure is to modify the corresponding off-line method
Recursive Identification Methods 13-2
Desirable properties of recursive algorithms
• fast convergence
• consistent estimates (time-invariant case)
• good tracking (time-varying case)
• computationally simple
Trade-offs
• convergence vs tracking
• computational complexity vs accuracy
Recursive Identification Methods 13-3
Recursive least-squares method (RLS)
Recursive estimation of a constant: Consider the model
y(t) = b + ν(t), ν(t) is a disturbance of variance λ²
The least-squares estimate of b is the arithmetic mean:
θ̂(t) = (1/t) Σ_{k=1}^{t} y(k)
This expression can be reformulated as
θ̂(t) = θ̂(t − 1) + (1/t)[y(t) − θ̂(t − 1)]
• the current estimate is equal to the previous estimate plus a correction
• the correction term is the deviation of the predicted value from what is
actually observed
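As a quick check, the recursion reproduces the arithmetic mean exactly; a minimal Python sketch (the value b = 2 and the variable names are my own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    b_true = 2.0
    y = b_true + rng.standard_normal(1000)   # y(t) = b + v(t), v(t) white noise

    theta = 0.0
    for t, yt in enumerate(y, start=1):
        theta = theta + (yt - theta) / t     # theta(t) = theta(t-1) + (1/t)[y(t) - theta(t-1)]

    assert np.isclose(theta, y.mean())       # identical to the arithmetic mean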
Recursive Identification Methods 13-4
RLS algorithm for a general linear model
y(t) = H(t)θ + ν(t)
The recursive least-squares algorithm is given by
θ̂(t) = θ̂(t − 1) + K(t)e(t)
K(t) = P (t)H(t)∗ = P (t − 1)H(t)∗[I + H(t)P (t − 1)H(t)∗]−1
e(t) = y(t) − H(t)θ̂(t − 1)
P (t) = P (t − 1) − P (t − 1)H(t)∗[I + H(t)P (t − 1)H(t)∗]−1H(t)P (t − 1)
• interpret e(t) as a prediction error and K(t) as a gain factor
• the matrix inversion in the update of P (t) reduces to a scalar division when y(t) is scalar
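A sketch of one update step in Python, assuming real-valued data so that the conjugate transpose ∗ becomes an ordinary transpose (function and variable names are my own):

    import numpy as np

    def rls_step(theta, P, H, y):
        # One step of the RLS recursion above.
        # theta: (n,) current estimate, P: (n, n), H: (p, n) regressor, y: (p,) output
        e = y - H @ theta                              # prediction error e(t)
        S = np.eye(H.shape[0]) + H @ P @ H.T           # I + H(t) P(t-1) H(t)*
        K = P @ H.T @ np.linalg.inv(S)                 # gain K(t)
        P = P - P @ H.T @ np.linalg.inv(S) @ H @ P     # covariance update
        theta = theta + K @ e
        return theta, P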
Recursive Identification Methods 13-5
Proof of the update formula: The least-squares estimate is given by
θ̂(t) = [ Σ_{k=1}^{t} H(k)∗H(k) ]−1 [ Σ_{k=1}^{t} H(k)∗y(k) ]
Denote P (t) as
P (t) = [ Σ_{k=1}^{t} H(k)∗H(k) ]−1  =⇒  P −1(t) = P −1(t − 1) + H(t)∗H(t)
Then it follows that
" t−1 #
X
θ̂(t) = P (t) H(k)∗y(k) + H(t)∗y(t)
k=1
h i
= P (t) P −1(t − 1)θ̂(t − 1) + H(t)∗y(t)
Recursive Identification Methods 13-6
θ̂(t) = P (t)[ (P −1(t) − H(t)∗H(t))θ̂(t − 1) + H(t)∗y(t) ]
      = θ̂(t − 1) + P (t)H(t)∗[ y(t) − H(t)θ̂(t − 1) ]
To obtain the update rule for P (t), we apply the matrix inversion lemma:
(A + BCD)−1 = A−1 − A−1B(C −1 + DA−1B)−1DA−1
to
P −1(t) = P −1(t − 1) + H(t)∗H(t)
where we use
A = P −1(t − 1), B = H(t)∗, C = I, D = H(t)
Recursive Identification Methods 13-7
Initial conditions
• θ̂(0) is the initial parameter estimate
• P (0) is an estimate of the covariance matrix of the initial parameter estimate
• if P (0) is small then K(t) will be small and θ̂(t) will not change much
• if P (0) is large, θ̂(t) will quickly jump away from θ̂(0)
• it is common in practice to choose
θ̂(0) = 0, P (0) = ρI
where ρ is a constant
• using a large ρ is good if the initial estimate θ̂(0) is uncertain
Recursive Identification Methods 13-8
Effect of the initial values
We simulate the following system
y(t) − 0.9y(t − 1) = 1.0u(t − 1) + ν(t)
• u(t) is binary white noise
• ν(t) is white noise of zero mean and variance 1
• Identify the system using RLS with 250 points of data
• The parameters are initialized by
θ̂(0) = 0,  P (0) = ρI
for ρ = 0.01, 0.1, 1, 10
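A possible way to reproduce this experiment, reusing rls_step from the sketch above and assuming the binary input takes values ±1 (variable names are my own):

    rng = np.random.default_rng(1)
    N = 250
    u = rng.choice([-1.0, 1.0], size=N)          # binary white noise input (assumed +/-1)
    v = rng.standard_normal(N)                   # white noise, variance 1
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = 0.9 * y[t-1] + 1.0 * u[t-1] + v[t]

    for rho in [0.01, 0.1, 1.0, 10.0]:
        theta = np.zeros(2)                      # theta = (a, b)
        P = rho * np.eye(2)
        for t in range(1, N):
            H = np.array([[-y[t-1], u[t-1]]])    # y(t) = -a y(t-1) + b u(t-1) + v(t)
            theta, P = rls_step(theta, P, H, np.array([y[t]]))
        print(rho, theta)                        # final (a_hat, b_hat), roughly (-0.9, 1.0)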
Recursive Identification Methods 13-9
The graphs show the influence of the initial values
[Figure: estimates â (left) and b̂ (right) versus k, for ρ = 0.01, 0.1, 1, 10]
• Large and moderate values of ρ (i.e., ρ = 1 and ρ = 10) lead to similar
results
• For large ρ, little confidence is given to θ̂(0), so quick transient response
• A small value of ρ leads to a small K(t), so it gives a slower convergence
Recursive Identification Methods 13-10
Forgetting factor
The loss function in the least-squares method is modified as
f (θ) = Σ_{k=1}^{t} λ^{t−k} ‖y(k) − H(k)θ‖₂²
• λ is called the forgetting factor and takes values in (0, 1]
• the smaller the value of λ, the quicker the previous info will be forgotten
• the parameters are adapted to describe the newest data
The RLS method with a forgetting factor is
θ̂(t) = θ̂(t − 1) + K(t)[y(t) − H(t)θ̂(t − 1)]
K(t) = P (t)H(t)∗ = P (t − 1)H(t)∗[λI + H(t)P (t − 1)H(t)∗]−1
P (t) = (1/λ){ P (t − 1) − P (t − 1)H(t)∗[λI + H(t)P (t − 1)H(t)∗]−1H(t)P (t − 1) }
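Compared with the basic rls_step sketch earlier, only the λ-weighting changes; a sketch assuming real-valued data:

    import numpy as np

    def rls_step_forgetting(theta, P, H, y, lam=0.99):
        # RLS step with forgetting factor lam, 0 < lam <= 1
        e = y - H @ theta
        S = lam * np.eye(H.shape[0]) + H @ P @ H.T           # lam*I + H P H*
        K = P @ H.T @ np.linalg.inv(S)
        theta = theta + K @ e
        P = (P - P @ H.T @ np.linalg.inv(S) @ H @ P) / lam   # (1/lam) * { ... }
        return theta, P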
Recursive Identification Methods 13-11
The solution θ̂(t) that minimizes f (θ) is given by
θ̂(t) = [ Σ_{k=1}^{t} λ^{t−k} H(k)∗H(k) ]−1 [ Σ_{k=1}^{t} λ^{t−k} H(k)∗y(k) ]
The update formulas follow analogously to RLS by introducing
P (t) = [ Σ_{k=1}^{t} λ^{t−k} H(k)∗H(k) ]−1
The choice of λ is a trade-off between convergence and tracking
performance
• λ small =⇒ old data is forgotten fast, hence good tracking
• λ close to 1 =⇒ good convergence and small variances of the estimates
Recursive Identification Methods 13-12
Effect of the forgetting factor
Consider the problem of tracking a time-varying system
y(t) − 0.9y(t − 1) = b0u(t) + ν(t),   b0 = 1.5 for t ≤ N/2,  b0 = 0.5 for t > N/2
• u(t) is binary white noise
• ν(t) is white noise of zero mean and variance 1
• Identify the system using RLS with 250 points of data
• The parameters are initialized by
θ̂(0) = 0,  P (0) = I
• The forgetting factor is varied over the values λ = 1, 0.99, 0.95
Recursive Identification Methods 13-13
The graphs show the influence of the forgetting factors
[Figure: estimates â (left) and b̂ (right) versus k, for λ = 1, 0.99, 0.95]
A decrease in the forgetting factor leads to two effects:
• the estimates approach the true value more rapidly
• the algorithm becomes more sensitive to noise
As λ decreases, the oscillations become larger
Recursive Identification Methods 13-14
In summary
• one must have λ = 1 to get convergence
• if λ < 1 the parameter estimate can change quickly, and the algorithm
becomes more sensitive to noise
For this reason, it is often useful to allow the forgetting factor to vary with time
A typical choice is to let λ(t) tend exponentially to 1
λ(t) = 1 − λ0^t (1 − λ(0))
which can be easily implemented via a recursion
λ(t) = λ0λ(t − 1) + (1 − λ0)
Typical values are λ0 = 0.99 and λ(0) = 0.95
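The recursion is trivial to implement; a minimal sketch with these typical values:

    lam0 = 0.99                          # lambda_0
    lam = 0.95                           # lambda(0)
    lams = []
    for t in range(200):
        lam = lam0 * lam + (1 - lam0)    # lambda(t) = lam0*lambda(t-1) + (1 - lam0)
        lams.append(lam)                 # tends exponentially to 1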
Recursive Identification Methods 13-15
Kalman Filter interpretation
The Kalman filter for estimating the state of a time-varying system
x(t + 1) = A(t)x(t) + B(t)u(t) + ν(t)
y(t) = C(t)x(t) + η(t)
where ν(t), η(t) are independent white noise with covariances R1, R2,
resp., is given by
x̂(t + 1) = A(t)x̂(t) + B(t)u(t) + K(t)[y(t) − C(t)x̂(t)]
K(t) = A(t)P (t)C ∗(t)[C(t)P (t)C ∗(t) + R2]−1
P (t + 1) = A(t)P (t)A∗(t) + R1
− A(t)P (t)C(t)∗[C(t)P (t)C ∗(t) + R2]−1C(t)P (t)A(t)∗
Recursive Identification Methods 13-16
The linear regression model
y(t) = H(t)θ + ν(t)
can be written as a state-space equation
θ(t + 1) = θ(t) (= θ)
y(t) = H(t)θ(t) + ν(t)
Apply the Kalman filter to the state-space equation with
A(t) = I, B(t) = 0, C(t) = H(t), R1 = 0
With R2 = I, this gives precisely the basic RLS algorithm on page 13-5
The tracking capability is affected by R2
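A sketch of the Kalman update specialized to this state-space model (A = I, B = 0, C = H(t), R1 = 0), assuming real-valued data; with R2 = I it coincides with the rls_step sketch on page 13-5:

    import numpy as np

    def kalman_param_step(theta, P, H, y, R2):
        # Kalman filter step for theta(t+1) = theta(t), y(t) = H(t) theta(t) + v(t),
        # i.e. A = I, B = 0, C = H(t), R1 = 0; with R2 = I it reduces to the RLS step
        S = H @ P @ H.T + R2
        K = P @ H.T @ np.linalg.inv(S)
        theta = theta + K @ (y - H @ theta)
        P = P - K @ H @ P
        return theta, P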
Recursive Identification Methods 13-17
Recursive instrumental variable method
The IV estimate of a scalar linear system
y(t) = H(t)θ + ν(t)
is given by
" t
#−1 " t
#
X X
θ̂(t) = Z(k)∗H(k) Z(k)∗y(k)
k=1 k=1
The IV estimate can be computed recursively as
θ̂(t) = θ̂(t − 1) + K(t)[y(t) − H(t)θ̂(t − 1)]
K(t) = P (t)Z(t)∗ = P (t − 1)Z(t)∗[I + H(t)P (t − 1)Z(t)∗]−1
P (t) = P (t − 1) − P (t − 1)Z(t)∗[I + H(t)P (t − 1)Z(t)∗]−1H(t)P (t − 1)
(the proof is analogous to RLS, using P (t) = [ Σ_{k=1}^{t} Z(k)∗H(k) ]−1)
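A sketch of one recursive IV step, assuming real-valued data (the choice of instruments Z(t) is up to the user and not shown):

    import numpy as np

    def riv_step(theta, P, H, Z, y):
        # One recursive IV step; Z(t) holds the instruments (same shape as H(t))
        e = y - H @ theta                                  # y(t) - H(t) theta(t-1)
        S = np.eye(H.shape[0]) + H @ P @ Z.T               # I + H(t) P(t-1) Z(t)*
        K = P @ Z.T @ np.linalg.inv(S)                     # K(t)
        theta = theta + K @ e
        P = P - P @ Z.T @ np.linalg.inv(S) @ H @ P
        return theta, P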
Recursive Identification Methods 13-18
Recursive prediction error method
We will use the cost function
f (t, θ) = (1/2) Σ_{k=1}^{t} λ^{t−k} e(k, θ)∗W e(k, θ)
where W ≻ 0 is a weighting matrix
• For λ = 1, f (θ) = tr(W R(θ)), where R(θ) = (1/2) Σ_{k=1}^{t} e(k, θ)e(k, θ)∗
• The off-line estimate of θ̂ cannot be found analytically (except for the
LS case)
• It is not possible to derive an exact recursive algorithm
• Some approximations must be used; they hold exactly in the LS case
Recursive Identification Methods 13-19
Main idea: Assume that
• θ̂(t − 1) minimizes f (t − 1, θ)
• the minimum point of f (t, θ) is close to θ̂(t − 1)
Using a second-order Taylor series approximation around θ̂(t − 1) gives
f (t, θ) ≈ f (t, θ̂(t − 1)) + ∇f (t, θ̂(t − 1))∗(θ − θ̂(t − 1))
           + (1/2)[θ − θ̂(t − 1)]∗∇2f (t, θ̂(t − 1))[θ − θ̂(t − 1)]
Minimize this w.r.t. θ and let the minimizer be θ̂(t):
θ̂(t) = θ̂(t − 1) − [∇2f (t, θ̂(t − 1))]−1∇f (t, θ̂(t − 1))
(Newton-Raphson step)
We must find ∇f (t, θ̂(t − 1)) and P (t) = [∇2f (t, θ̂(t − 1))]−1
Recursive Identification Methods 13-20
Details: To proceed, the gradient and Hessian of f (t, θ) w.r.t. θ are needed
f (t, θ) = λf (t − 1, θ) + (1/2)e(t, θ)∗W e(t, θ)
∇f (t, θ) = λ∇f (t − 1, θ) + e(t, θ)∗W ∇e(t, θ)
∇2 f (t, θ) = λ∇2 f (t − 1, θ) + ∇e(t, θ)∗W ∇e(t, θ) + e(t, θ)∗W ∇2e(t, θ)
First approximations:
• ∇f (t − 1, θ̂(t − 1)) = 0 (θ̂(t − 1) minimizes f (t − 1, θ))
• ∇2f (t − 1, θ̂(t − 1)) = ∇2f (t − 1, θ̂(t − 2)) (∇2 f varies slowly with θ)
• e(t, θ)∗W ∇2e(t, θ) is negligible
Inserting the above approximations into
θ̂(t) = θ̂(t − 1) − [∇2f (t, θ̂(t − 1))]−1∇f (t, θ̂(t − 1))
Recursive Identification Methods 13-21
we will have
θ̂(t) = θ̂(t − 1) − [∇2f (t, θ̂(t − 1))]−1[e(t, θ̂(t − 1))∗W ∇e(t, θ̂(t − 1))]
∇2f (t, θ̂(t − 1)) = λ∇2f (t − 1, θ̂(t − 2)) + ∇e(t, θ̂(t − 1))∗W ∇e(t, θ̂(t − 1))
(still not well suited as an online algorithm due to the term e(t, θ̂(t − 1)))
Second approximations: Let
e(t) ≈ e(t, θ̂(t − 1)), H(t) ≈ −∇e(t, θ̂(t − 1))
(the actual way of computing these depends on model structures), then
θ̂(t) = θ̂(t − 1) + P (t)H ∗(t)W e(t)
where we denote P (t) = [∇2f (t, θ̂(t − 1))]−1 which satisfies
P −1(t) = λP −1(t − 1) + H(t)∗W H(t)
Recursive Identification Methods 13-22
Applying the matrix inversion lemma to the recursive formula for P −1(t),
we arrive at the recursive prediction error method (RPEM)
Algorithm:
θ̂(t) = θ̂(t − 1) + K(t)e(t)
K(t) = P (t)H(t)∗
P (t) = (1/λ){ P (t − 1) − P (t − 1)H(t)∗[λW −1 + H(t)P (t − 1)H(t)∗]−1H(t)P (t − 1) }
where the approximations
e(t) ≈ e(t, θ̂(t − 1)), H(t) ≈ −∇e(t, θ̂(t − 1))
depend on the model structure
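A sketch of one generic RPEM step with the P −1(t) recursion inverted as above; e(t) and H(t) must be supplied by the model structure (here W defaults to I, real-valued data is assumed, and all names are my own; the ARMAX example on the next slides shows one concrete choice):

    import numpy as np

    def rpem_step(theta, P, e, H, lam=0.99, W=None):
        # One RPEM step; e ~ e(t, theta(t-1)) and H ~ -grad e(t, theta(t-1))
        p = H.shape[0]
        W = np.eye(p) if W is None else W
        S = lam * np.linalg.inv(W) + H @ P @ H.T            # lam*W^{-1} + H P H*
        P = (P - P @ H.T @ np.linalg.inv(S) @ H @ P) / lam  # update of P(t)
        theta = theta + P @ H.T @ W @ e                     # theta update (page 13-22); K = P H* when W = I
        return theta, P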
Recursive Identification Methods 13-23
Example of RPEM: ARMAX models
Consider the scalar ARMAX model
A(q −1)y(t) = B(q −1)u(t) + C(q −1)ν(t)
where all the polynomials have the same order
A(q −1 ) = 1 + a1q −1 + · · · + anq −n
B(q −1) = b1q −1 + · · · + bnq −n
C(q −1 ) = 1 + c1q −1 + · · · + cnq −n
Recursive Identification Methods 13-24
Define
ỹ(t, θ) = (1/C(q −1)) y(t),   ũ(t, θ) = (1/C(q −1)) u(t),   ẽ(t, θ) = (1/C(q −1)) e(t)
We can derive the following relations
e(t, θ) = [A(q −1)y(t) − B(q −1)u(t)] / C(q −1)
∇e(t, θ) = (ỹ(t − 1, θ), . . . , ỹ(t − n, θ), −ũ(t − 1, θ), . . . , −ũ(t − n, θ),
− ẽ(t − 1, θ), . . . , −ẽ(t − n, θ))
To compute e(t, θ), we need to process all data up to time t
Recursive Identification Methods 13-25
We use the following approximations
e(t, θ) ≈ e(t) = y(t) + â1(t − 1)y(t − 1) + · · · + ân(t − 1)y(t − n)
− b̂1(t − 1)u(t − 1) − · · · − b̂n(t − 1)u(t − n)
− ĉ1(t − 1)e(t − 1) − · · · − ĉn(t − 1)e(t − n)
− ∇e(t, θ) ≈ H(t) = (−ȳ(t − 1), . . . , −ȳ(t − n),
ū(t − 1), . . . , ū(t − n), ē(t − 1), . . . , ē(t − n))
where
ȳ(t) = y(t) − ĉ1(t)ȳ(t − 1) − · · · − ĉn(t)ȳ(t − n)
ū(t) = u(t) − ĉ1(t)ū(t − 1) − · · · − ĉn(t)ū(t − n)
ē(t) = e(t) − ĉ1(t)ē(t − 1) − · · · − ĉn(t)ē(t − n)
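Putting the pieces together, a sketch of RPEM for the scalar ARMAX model above with W = I; the initialization θ̂(0) = 0, P (0) = ρI and the parameter ordering are my own assumptions, not part of the slide:

    import numpy as np

    def rpem_armax(y, u, n, lam=0.99, rho=100.0):
        # RPEM sketch for a scalar ARMAX model of order n, W = I
        # theta = (a_1..a_n, b_1..b_n, c_1..c_n)
        N = len(y)
        d = 3 * n
        theta = np.zeros(d)
        P = rho * np.eye(d)
        e = np.zeros(N)                                      # prediction errors e(t)
        yf, uf, ef = np.zeros(N), np.zeros(N), np.zeros(N)   # y_bar, u_bar, e_bar
        est = np.zeros((N, d))
        for t in range(n, N):
            a, b, c = theta[:n], theta[n:2*n], theta[2*n:]
            # prediction error using the estimates from time t-1 (page 13-26)
            e[t] = (y[t] + a @ y[t-n:t][::-1]
                         - b @ u[t-n:t][::-1]
                         - c @ e[t-n:t][::-1])
            # H(t) ~ -grad e(t, theta), built from the filtered signals
            H = np.concatenate((-yf[t-n:t][::-1], uf[t-n:t][::-1], ef[t-n:t][::-1]))
            H = H.reshape(1, d)
            # RPEM update (scalar output, W = I)
            S = lam + float(H @ P @ H.T)
            P = (P - (P @ H.T) @ (H @ P) / S) / lam
            theta = theta + (P @ H.T).ravel() * e[t]
            # update the filtered signals with the new c-estimates
            c = theta[2*n:]
            yf[t] = y[t] - c @ yf[t-n:t][::-1]
            uf[t] = u[t] - c @ uf[t-n:t][::-1]
            ef[t] = e[t] - c @ ef[t-n:t][::-1]
            est[t] = theta
        return est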
Recursive Identification Methods 13-26
Comparison of recursive algorithms
We simulate the following system
y(t) = [1.0q −1/(1 − 0.9q −1)] u(t) + ν(t)
• u(t), ν(t) are independent white noise with zero mean and variance 1
• we use RLS, RIV, and RPEM to identify the system
The model structure for RLS and RIV:
y(t) + ay(t − 1) = bu(t − 1) + ν(t), θ = (a, b)
The model structure for RPEM:
y(t) + ay(t − 1) = bu(t − 1) + ν(t) + cν(t − 1), θ = (a, b, c)
Recursive Identification Methods 13-27
Numerical results
[Figure: RLS estimates â and b̂ (left) and RIV estimates â and b̂ (right) versus k]
• RLS does not give consistent estimates for systems with correlated noise
• This is because RLS is equivalent to an off-line LS algorithm
• In contrast to RLS, RIV gives consistent estimates
• This follows from the fact that RIV is equivalent to an off-line IV method
Recursive Identification Methods 13-28
RPEM
[Figure: RPEM estimates â, b̂, and ĉ versus k]
• RPEM gives consistent estimates of a, b, c
• The estimates â and b̂ converge more quickly than ĉ
Recursive Identification Methods 13-29
Common problems for recursive identification
• Excitation
• Estimator windup
• P (t) becomes indefinite
Excitation: it is important that the input is persistently exciting of
sufficiently high order
Recursive Identification Methods 13-30
Estimator windup
Some periods of an identification experiment exhibit poor excitation
Consider when H(t) = 0 in the RLS algorithm, then
θ̂(t) = θ̂(t − 1),   P (t) = (1/λ)P (t − 1)
• θ̂ remains constant as t increases
• P increases exponentially with time for λ < 1
• When the system is excited again (H(t) ≠ 0), the gain
K(t) = P (t)H(t)∗
will be very large, causing an abrupt change in θ̂
• This is referred to as estimator windup
Solution: Do not update P (t) if we have poor excitation
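One simple way to implement this guard, reusing rls_step_forgetting from page 13-11 (the norm test and tolerance are my own choices; more refined excitation tests exist):

    def rls_step_guarded(theta, P, H, y, lam=0.99, tol=1e-6):
        # Skip the update of P when the regressor carries almost no excitation
        if np.linalg.norm(H) < tol:
            return theta, P          # freeze P to avoid windup; theta is unchanged anyway
        return rls_step_forgetting(theta, P, H, y, lam)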
Recursive Identification Methods 13-31
P (t) indefinite
P (t) represents a covariance matrix
Therefore, it must be symmetric and positive definite
Rounding errors may accumulate and make P (t) indefinite
This will make the estimate diverge
The solution is to note that every positive definite matrix can be factorized
as
P (t) = S(t)S(t)∗
and rewrite the algorithm to update S(t) instead
Recursive Identification Methods 13-32
References
Chapter 9 in
T. Söderström and P. Stoica, System Identification, Prentice Hall, 1989
Chapter 11 in
L. Ljung, System Identification: Theory for the User, 2nd edition, Prentice
Hall, 1999
Lecture on
Recursive Identification Methods, System Identification (1TT875), Uppsala
University,
http://www.it.uu.se/edu/course/homepage/systemid/vt05
Recursive Identification Methods 13-33