Models replicate the behavior of a physical system and are common in science and engineering. They are used in control design, optimization, simulation, fault detection and many other applications [3], as well as in the estimation of signals in signal processing. In estimation problems, the model is assumed to be known and the input signal is to be estimated from the observed output. In system identification, we estimate the system model from the known input and the observed output. Thus, estimation and system identification are closely related.
Estimation: A standard estimation problem involves
1. Collecting output observations y(t).
2. A known model structure of the underlying system/process, for example, y(t) = f(u(t)) + v(t), where f can be linear or nonlinear. A commonly used linear model structure is y = Hu + v, where y, H are known and v is modeled as Gaussian noise.
3. Estimating u from the measurements y, i.e., minimizing a suitable cost function
$$\underset{u}{\mathrm{minimize}}\ \|y(t) - f(u(t))\|.$$
For linear models this involves solving least squares problems if the 2-norm is used. For sparse signal recovery/estimation, one uses the 1-norm instead (see the sketch after this list).
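As a minimal sketch of the 2-norm case (the sizes, the matrix H, and the noise level below are illustrative assumptions, not taken from the text), the estimation problem reduces to an ordinary least squares solve:

```python
import numpy as np

# Illustrative sizes; H and the true input u0 are synthetic assumptions.
rng = np.random.default_rng(0)
n_out, n_in = 50, 10
H = rng.standard_normal((n_out, n_in))          # known linear model: y = H u + v
u0 = rng.standard_normal(n_in)                  # unknown input to be estimated
y = H @ u0 + 0.05 * rng.standard_normal(n_out)  # noisy output observations

# 2-norm cost: minimize ||y - H u|| via least squares.
u_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
print("estimation error:", np.linalg.norm(u_hat - u0))
```

For the 1-norm (sparse) variant, one would replace the least squares solve with an ℓ1-regularized solver; the structure of the problem is otherwise the same.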
Notice that the input signal is unknown whereas the output signal is measurable and the system model is known. The noise/disturbance affects the output measurements and is modeled appropriately.
System Identification: System identification involves modeling a system from data. One encounters three types of signals [3]: the known input u(t), the measured output y(t), and the unmeasured disturbance/noise. The system model is unknown and is to be determined/estimated. The system identification procedure involves
1. Collecting input-output data: A data set can be represented as D := {u(1), . . . , u(N), y(1), . . . , y(N)}. The data needs to be informative to select appropriate models. In practice, we need some data processing (filtering) to reduce the effect of disturbances, outliers and so on, i.e., to clean up the data.
2. Selecting a model structure: A set of candidate models includes linear, nonlinear, input-output, state
space models etc. One may need to revise the model set if the available models do not fit the data. We
also need models for noise and disturbances/uncertainties. Selecting a model M is the most difficult
step.
3. Best model fit: We need a set of rules (model selection criteria) for assessing the candidate models, i.e., a cost function (e.g., least squares, ℓ1 minimization, etc.). This leads to an optimization problem. We select the candidate model which minimizes the cost function on the given data set. We may need to revise the cost function/model selection criteria if the best model does not give a good fit.
4. Model validation: The model chosen based on the available data is then validated on new data. If the model does not pass the validation step, we need to revisit the previous steps. (A compressed sketch of these steps appears after this list.)
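The following sketch walks through steps 1 to 4 for a simple linear-in-parameters model. The "true" system, the one-step regressor choice, and all signal sizes below are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(u, rng):
    # Assumed "true" system (unknown to the identifier):
    # y(t) = 0.6 y(t-1) + 0.8 u(t-1) + noise
    y = np.zeros_like(u)
    for t in range(1, len(u)):
        y[t] = 0.6 * y[t - 1] + 0.8 * u[t - 1] + 0.02 * rng.standard_normal()
    return y

def regressors(u, y):
    # Model structure M: y(t) = a y(t-1) + b u(t-1), parameters Theta = (a, b)
    return np.column_stack([y[:-1], u[:-1]]), y[1:]

# Step 1: collect data (estimation set) plus fresh data for step 4.
u_est, u_val = rng.standard_normal(200), rng.standard_normal(200)
y_est, y_val = simulate(u_est, rng), simulate(u_val, rng)

# Steps 2-3: fix the model structure and find the best fit by least squares.
Phi, target = regressors(u_est, y_est)
theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)

# Step 4: validate on data not used for fitting.
Phi_v, target_v = regressors(u_val, y_val)
val_mse = np.mean((target_v - Phi_v @ theta) ** 2)
print("Theta:", theta, "validation MSE:", val_mse)
```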
There is also a step of designing experiments (a 0th step) for generating data by feeding appropriate inputs to the system so that the generated data is informative for system identification, in other words, so that the system is identifiable from the data. This involves using a persistently exciting set of inputs (defined below under Input design).
Models can be first principles models (white box models) based on physics, or data driven models. Black box models such as neural network models do not use any physics of the underlying system; one tunes the weights of the neural network to minimize the cost function and obtain a good fit. Weights may be updated after the validation step. Gray box modeling combines both approaches, i.e., first principles (physical laws, differential equations, etc.) and gathered data, and is used when we have prior knowledge about the system/data. Sometimes modeling from first principles is impossible; in other cases first principles models can be too complex or unreliable. One may need real data experiments to validate first principles models. Thus, we need black box and gray box models as well.
The model structures include
• Parametric models: These can be linear or nonlinear, input-output (transfer function) or state space models. These models require prior assumptions on a model structure.
• Nonparametric linear models: Impulse response or frequency response models. These methods estimate the time/frequency response without selecting a particular set of models (see the sketch below).
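As one illustration of the nonparametric route, here is a correlation-analysis sketch under the assumption of a white-noise input; the impulse response, signal length, and noise level are made up for the example. The impulse response is read off sample correlations without fixing a parametric model set:

```python
import numpy as np

rng = np.random.default_rng(2)
g_true = np.array([1.0, 0.5, 0.25, 0.125])   # assumed true impulse response
u = rng.standard_normal(5000)                # white-noise input
y = np.convolve(u, g_true)[: len(u)] + 0.05 * rng.standard_normal(len(u))

# For white u: R_yu(tau) ~= g(tau) * R_uu(0), so divide the sample
# cross-correlation by the sample autocorrelation at lag 0.
N = len(u)
Ruu0 = u @ u / N
g_hat = np.array([(y[tau:] @ u[: N - tau]) / N for tau in range(4)]) / Ruu0
print("estimated impulse response:", g_hat)
```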
Delays may need to be incorporated into the transfer function and state space models. Models are used in controller design, state estimation and trajectory optimization, formal analysis, and simulation of real systems.
The model set M is parametrized by a vector Θ. The cost function (also called a loss function) L(M, Θ) is evaluated on the data set D, which may be continuous or discrete. We look for the Θ which minimizes L(M, Θ), i.e.,
$$\Theta^{\ast} := \operatorname*{argmin}_{\Theta}\ L(M, \Theta),$$
and the optimization problem is
$$\min_{\Theta}\ L(M, \Theta).$$
One commonly used selection criterion is the prediction error method (PEM). Let
$$\varepsilon(t, \Theta) := y(t) - \hat{y}(t \mid \Theta) \qquad (1)$$
be the error between the observed output and the output predicted by the model parametrized by Θ, and let $l(\varepsilon(t, \Theta))$ be a scalar norm. Then
$$L(M, \Theta) = V_N(M, \Theta) := \frac{1}{N} \sum_{t=1}^{N} l(\varepsilon(t, \Theta)) \qquad (2)$$
is the evaluation criterion. With a quadratic l, this leads to least squares estimation. One can also have a maximum likelihood estimator when the measurement noise is modeled as a random process; we then select the Θ which maximizes the likelihood of the observed data.
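A minimal numerical reading of (1)-(2), assuming a quadratic $l(\varepsilon) = \varepsilon^2$ and a one-parameter predictor $\hat{y}(t \mid \Theta) = \Theta\, u(t)$ chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(500)
y = 0.9 * u + 0.1 * rng.standard_normal(500)  # data from an assumed system

def V_N(theta, u, y):
    # Prediction error eps(t, Theta) = y(t) - y_hat(t | Theta), y_hat = theta*u(t)
    eps = y - theta * u
    return np.mean(eps ** 2)                  # (1/N) sum l(eps), l = squared error

# Minimize V_N over a grid of candidate parameters.
thetas = np.linspace(0.0, 2.0, 201)
best = thetas[np.argmin([V_N(th, u, y) for th in thetas])]
print("PEM estimate of Theta:", best)
```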
In addition to the prediction error criterion, one would like to add a penalty on the complexity of the model structure, i.e.,
$$W_N(D, M, \Theta) := V_N(M, \Theta)\,\big(1 + U_N(M)\big). \qquad (3)$$
Different choices of $U_N(M)$ (e.g., functions of $\dim(\Theta)$) give different model selection criteria. Choosing $U_N(M) = \frac{2\,\dim(\Theta)}{N}$ gives the AIC criterion, whereas choosing $U_N(M) = \frac{\dim(\Theta)\,\log N}{N}$ gives the MDL or BIC criterion.
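To make (3) concrete, here is a sketch that scores FIR models of increasing order under the AIC and BIC penalties; the data-generating system and the range of candidate orders are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 400
u = rng.standard_normal(N)
g = np.array([0.8, 0.4, 0.2])                 # assumed true order-3 FIR system
y = np.convolve(u, g)[:N] + 0.05 * rng.standard_normal(N)

def fit_fir(order):
    # Regressor rows [u(t), u(t-1), ..., u(t-order+1)] for t >= order-1.
    Phi = np.column_stack([u[order - 1 - i : N - i] for i in range(order)])
    theta, *_ = np.linalg.lstsq(Phi, y[order - 1 :], rcond=None)
    return np.mean((y[order - 1 :] - Phi @ theta) ** 2)  # V_N

for k in range(1, 7):
    V = fit_fir(k)
    aic = V * (1 + 2 * k / N)                 # U_N = 2 dim(Theta)/N
    bic = V * (1 + k * np.log(N) / N)         # U_N = dim(Theta) log(N)/N
    print(f"order {k}: V_N={V:.5f}  AIC={aic:.5f}  BIC={bic:.5f}")
```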
Bias and variance tradeoff: Let $\theta_n$ be an estimate of $\theta_0$ using a random sample of size n. Then, the mean squared error (MSE) is
$$E\big[\|\theta_0 - \theta_n\|^2\big] = E\big[\|\theta_0 - E[\theta_n]\|^2\big] + E\big[\|E[\theta_n] - \theta_n\|^2\big].$$
The first term $E[\|\theta_0 - E[\theta_n]\|^2]$ is referred to as the (squared) bias and the second term $E[\|E[\theta_n] - \theta_n\|^2]$ is referred to as the variance associated with the estimator $\theta_n$. One would like to have a minimum variance unbiased estimator (MVUE), i.e., the bias is zero and the variance is minimum. The estimators which achieve the Cramér-Rao lower bound are MVU estimators. One looks for the best linear unbiased estimator (BLUE) for ease of computation.
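A quick Monte Carlo illustration of this decomposition (the shrunken estimator below is an assumed toy example): shrinking the sample mean trades a nonzero bias for a smaller variance.

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, n, trials = 2.0, 20, 20000

# Each row is one random sample of size n drawn from N(theta0, 1).
samples = theta0 + rng.standard_normal((trials, n))
for name, est in [("sample mean", samples.mean(axis=1)),
                  ("shrunk mean", 0.8 * samples.mean(axis=1))]:
    bias2 = (theta0 - est.mean()) ** 2        # squared bias across trials
    var = est.var()                           # variance across trials
    print(f"{name}: bias^2={bias2:.4f}  var={var:.4f}  MSE={bias2 + var:.4f}")
```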
Input design: The persistent excitation (PE) condition on input signals is crucial in system identification. Consider the FIR model
$$y(t) = \sum_{i=0}^{k-1} g_i\, u(t-i) + e(t).$$
We want to find the impulse response $\theta := (g_{k-1}, \ldots, g_1, g_0)^T$ based on $\{u(t), y(t),\ t = 0, 1, \ldots, N-1\}$. Stacking the observations for $t = k-1, \ldots, N-1$, let
$$y_{N-1} = \begin{bmatrix} y(k-1) \\ y(k) \\ \vdots \\ y(N-1) \end{bmatrix}, \qquad e_{N-1} = \begin{bmatrix} e(k-1) \\ e(k) \\ \vdots \\ e(N-1) \end{bmatrix},$$
and let $U_{N-1}$ be the matrix whose rows are $(u(t-k+1), \ldots, u(t))$ for $t = k-1, \ldots, N-1$. Then,
$$y_{N-1} = U_{N-1}\,\theta + e_{N-1} \qquad (6)$$
and the impulse response coefficients can be found by solving the LS problem $\min_\theta \|y_{N-1} - U_{N-1}\theta\|$, which requires $U_{N-1}$ to be of full column rank, i.e., the input sequence u to be PE. If the input is a vector process $u \in \mathbb{R}^m$, then this rank condition becomes $\operatorname{rank}(U_{N-1}) = km$.
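A sketch of (6) in code, building $U_{N-1}$ column by column from an assumed white-noise input (which is PE of any order) and recovering the impulse response by least squares:

```python
import numpy as np

rng = np.random.default_rng(6)
k, N = 4, 300
g = np.array([1.0, -0.5, 0.3, -0.1])          # assumed g_0, ..., g_{k-1}
u = rng.standard_normal(N)                     # white noise: PE of any order
y = np.convolve(u, g)[:N] + 0.05 * rng.standard_normal(N)

# Rows of U_{N-1}: (u(t-k+1), ..., u(t)) for t = k-1, ..., N-1,
# matching theta = (g_{k-1}, ..., g_1, g_0)^T.
U = np.column_stack([u[i : N - k + 1 + i] for i in range(k)])
theta, *_ = np.linalg.lstsq(U, y[k - 1 :], rcond=None)
g_hat = theta[::-1]                            # reverse to recover (g_0, ..., g_{k-1})
print("g_hat:", g_hat)
```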
For a zero mean stationary process $u \in \mathbb{R}^m$, define the covariance matrix as
$$\bar{\Lambda}_{uu}(k) = \lim_{N \to \infty} \frac{1}{N}\, U_N^T U_N = \begin{bmatrix} \Lambda_{uu}(0) & \Lambda_{uu}^T(1) & \cdots & \Lambda_{uu}^T(k-1) \\ \Lambda_{uu}(1) & \Lambda_{uu}(0) & \cdots & \Lambda_{uu}^T(k-2) \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda_{uu}(k-1) & \Lambda_{uu}(k-2) & \cdots & \Lambda_{uu}(0) \end{bmatrix}, \qquad (7)$$
where $\Lambda_{uu}(j) := E[u(t)\,u(t-j)^T]$.
It turns out that if $\bar{\Lambda}_{uu}(k)$ is positive definite, then u is PE of order k. A model of order n is identifiable when the input signal $\{u(t)\}_t$ is PE of order n. Persistency of excitation can also be defined using autocorrelations of quasi-stationary input signals; we will give this alternate definition after defining quasi-stationary signals.
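One can test PE of order k numerically by checking positive definiteness of an empirical version of $\bar{\Lambda}_{uu}(k)$; the scalar-input helper below and its eigenvalue threshold are assumptions for illustration, with a finite-N average standing in for the limit in (7):

```python
import numpy as np

def pe_order_check(u, k):
    # Empirical (1/N) U_N^T U_N with rows (u(t), u(t-1), ..., u(t-k+1)).
    N = len(u)
    U = np.column_stack([u[k - 1 - i : N - i] for i in range(k)])
    Lam = U.T @ U / U.shape[0]
    return np.linalg.eigvalsh(Lam).min() > 1e-8  # positive definite => PE of order k

rng = np.random.default_rng(7)
white = rng.standard_normal(1000)                # PE of any order
const = np.ones(1000)                            # PE of order 1 only
print(pe_order_check(white, 5))                  # True
print(pe_order_check(const, 1), pe_order_check(const, 2))  # True False
```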
References
[1] L. Ljung, System Identification: Theory for the User, 2nd Edition, PHI, 1999.