The Use of Gaussian Processes in System Identification
Simo Särkkä
To appear in Encyclopedia of Systems and Control, 2nd edition
arXiv:1907.06066v1 [stat.ML] 13 Jul 2019

1 Abstract
Gaussian processes are used in machine learning to learn input-output mappings from
observed data. Gaussian process regression is based on imposing a Gaussian process
prior on the unknown regressor function and statistically conditioning it on the ob-
served data. In system identification, Gaussian processes are used to form time series
prediction models such as non-linear finite-impulse response (NFIR) models as well
as non-linear autoregressive (NARX) models. Gaussian process state-space models
(GPSS) can be used to learn the dynamic and measurement models for a state-space
representation of the input-output data. Temporal and spatio-temporal Gaussian pro-
cesses can be used directly to form regressors for the data in the time domain. The aim
of this article is to briefly outline the main directions in system identification methods
using Gaussian processes.

2 Keywords
Gaussian process regression, non-linear system identification, GP-NFIR model, GP-
NARX model, GP-NOE model, Gaussian process state-space model, temporal Gaussian
process, state-space Gaussian process

3 Introduction
Gaussian process regression (Rasmussen and Williams, 2006) refers to a statistical
methodology where we use Gaussian processes as prior models for regression func-
tions that we fit to observed data. This kind of methodology is particularly popular in
machine learning although the origins of the basic ideas can be traced to geostatistics
(Cressie, 1993). In geostatistics, the corresponding methodology is called "kriging",
which is named after South African mining engineer D. G. Krige. In system identifica-
tion, Gaussian processes can be used to identify (or "learn" in machine learning terms)
the input-output relationships from observed data. Even when there is no explicit input
in the system, Gaussian processes can be used to identify a model for an observed time
series of outputs which is a specific form of a system identification problem. Overviews
of the use of Gaussian processes in system identification can be found, for example, in
the monograph of Kocijan (2016), and PhD theses of McHutchon (2015) and Frigola
(2016).

4 Gaussian processes in system identification
4.1 Gaussian process regression
Gaussian process regression is concerned with the following problem: Given a set
of observed (training) input-output data D = {(zk , yk ) : k = 1, . . . , N } from an
unknown function y = f (z), predict the values of the function at new (test) inputs
{z∗k : k = 1, . . . , M }. That is, the problem is a classical regression problem. However,
the classical solution to the problem usually amounts to fixing a parametric function
class f (z; θ), where θ is a set of parameters and then fitting the parameters to the
observed data. In Gaussian process regression, we take a different route; instead of
fixing a parametric class of functions, we put a Gaussian process prior measure on
the whole regression function and condition on the observed data using Bayes’ rule
(Rasmussen and Williams, 2006).

4.1.1 Gaussian process regression problem


Mathematically, the Gaussian process regression problem can be written as

f(z) \sim \mathcal{GP}(m(z), k(z, z')),  (1a)
y_k = f(z_k) + \varepsilon_k, \quad \varepsilon_k \sim \mathcal{N}(0, \sigma_n^2),  (1b)

where Equation (1a) states that, a priori, the function is a Gaussian process
with mean function m(z) = E[f(z)] and covariance function (or kernel)
k(z, z') = Cov[f(z), f(z')] = E[(f(z) − m(z)) (f(z') − m(z'))]. Equation (1b)
states that we observe the function values at the points z_k, k = 1, . . . , N, and that
they are corrupted by (independent) Gaussian noises with variance σ_n^2.
The mean and covariance functions define the regressor function class and they, or
at least their parametric classes, need to be selected a priori. The mean function can
typically be selected to be identically zero m(z) = 0. The covariance function defines
the smoothness properties of the functions, and a typical choice in machine learning is
the squared exponential covariance function

k(z, z') = s^2 \exp\left( -\frac{\|z - z'\|^2}{2\ell^2} \right)  (2)

which produces infinitely differentiable (i.e., analytic) regressor functions. The pa-
rameters s and ℓ in the aforementioned covariance function define the magnitude and
length scales of the regressor functions, respectively. Other common choices of co-
variance functions are, for example, the Matérn class of covariance functions (Matérn,
1960; Rasmussen and Williams, 2006).

4.1.2 Gaussian process regression equations


Given the mean and covariance functions as well as the measurements, we can form the
Gaussian process regressor. Assuming that the noises are independent of the function
values, we can write the joint distribution of the observed values and the unknown
function values as follows:
\begin{pmatrix} y \\ f(Z^*) \end{pmatrix}
\sim \mathcal{N}\left(
\begin{pmatrix} m(Z) \\ m(Z^*) \end{pmatrix},
\begin{pmatrix} K + \sigma_n^2 I & k^\top(Z^*) \\ k(Z^*) & k(Z^*, Z^*) \end{pmatrix}
\right),  (3)

where y = (y_1, . . . , y_N)^T, m(Z) = (m(z_1), . . . , m(z_N))^T, m(Z^*) =
(m(z^*_1), . . . , m(z^*_M))^T, and K and k(Z^*) denote matrices with element (i, j)
given as k(z_i, z_j) and k(z^*_i, z_j), respectively. By conditioning this joint Gaussian
distribution on the measurements y we get that the conditional (i.e., posterior)
distribution of the function values f(Z^*) = (f(z^*_1), . . . , f(z^*_M))^T is Gaussian with
the mean and covariance

E[f(Z^*) \mid y] = m(Z^*) + k(Z^*) \left( K + \sigma_n^2 I \right)^{-1} (y - m(Z)),
Cov[f(Z^*) \mid y] = k(Z^*, Z^*) - k(Z^*) \left( K + \sigma_n^2 I \right)^{-1} k^\top(Z^*).  (4)
These are the fundamental equations of Gaussian process regression. An example of
Gaussian process regression with squared exponential covariance function is shown in
Figure 1.
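
To make Equations (2)-(4) concrete, the following minimal sketch (not part of the original
article) implements the posterior mean and covariance in Python/NumPy, assuming a zero
mean function; the data, test inputs, and hyperparameter values s, ℓ, and σ_n are
illustrative placeholders only.

```python
import numpy as np

def se_kernel(Z1, Z2, s=1.0, ell=1.0):
    """Squared exponential covariance function, Equation (2)."""
    d2 = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return s**2 * np.exp(-0.5 * d2 / ell**2)

def gp_regression(Z, y, Z_star, s=1.0, ell=1.0, sigma_n=0.1):
    """Posterior mean and covariance of f(Z*), Equation (4), with m(z) = 0."""
    K = se_kernel(Z, Z, s, ell)                  # N x N training covariance
    K_star = se_kernel(Z_star, Z, s, ell)        # M x N cross-covariance k(Z*)
    K_ss = se_kernel(Z_star, Z_star, s, ell)     # M x M test covariance
    # Apply (K + sigma_n^2 I)^{-1} via a Cholesky factor for numerical stability.
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, K_star.T)
    mean = K_star @ alpha
    cov = K_ss - V.T @ V
    return mean, cov

# Illustrative data: noisy observations of a sinusoid, as in Figure 1.
rng = np.random.default_rng(0)
Z = rng.uniform(0, 10, size=(10, 1))
y = np.sin(Z[:, 0]) + 0.1 * rng.standard_normal(10)
Z_star = np.linspace(0, 10, 100)[:, None]
mean, cov = gp_regression(Z, y, Z_star, s=1.0, ell=1.0, sigma_n=0.1)
```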

4.1.3 Hyperparameter learning


Even though Gaussian process regression is a non-parametric method, for which we
do not need to fix a parametric class of functions, the mean and covariance functions
can have unknown hyperparameters ϕ which can be estimated from data. For example,
the squared exponential covariance function in Equation (2) has the hyperparameters
ϕ = (s, ℓ).

Figure 1: Example of Gaussian process regression with squared exponential covariance func-
tion. The true function is a sinusoid which is observed only at 10 points that are corrupted by
Gaussian noise. The quantiles provide error bars for the predicted function values.

A common way to estimate the parameters is to maximize the marginal likelihood


– also called evidence – p(y | ϕ) of the measurements, or equivalently, minimize the
negative log-likelihood of the measurements
-\log p(y \mid \varphi) = \tfrac{1}{2} \log \left| 2\pi \left( K_\varphi + \sigma_n^2 I \right) \right|
  + \tfrac{1}{2} \left( y - m_\varphi(Z) \right)^\top \left( K_\varphi + \sigma_n^2 I \right)^{-1} \left( y - m_\varphi(Z) \right).  (5)

The gradient of this function with respect to the hyperparameters is also available (see,
e.g., Rasmussen and Williams, 2006) which allows for the use of gradient-based opti-
mization methods to estimate the parameters.
Instead of using the maximum likelihood method to estimate the parameters, it is
also possible to use a Bayesian approach to the problem and consider the posterior
distribution of the hyperparameters
p(\varphi \mid y) = \frac{p(y \mid \varphi)\, p(\varphi)}{\int p(y \mid \varphi)\, p(\varphi)\, d\varphi},  (6)
where p(ϕ) is the prior distribution of the hyperparameters. We can, for example,
compute the maximum a posteriori estimate of the parameters by finding the maximum
of this distribution or use Markov chain Monte Carlo (MCMC) methods (Brooks et al,
2011) to estimate the statistics of the distribution.
In what follows, to avoid notational clutter, we drop the hyperparameters from
the Gaussian process formulations and inference methods although they are commonly
estimated as part of the Gaussian process learning.
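
As an illustration of the maximum likelihood route, the sketch below minimizes the negative
log marginal likelihood of Equation (5) numerically for the squared exponential covariance
with a zero mean function; it is an illustrative assumption of this sketch to rely on SciPy's
general-purpose optimizer with numerical gradients rather than the analytic gradient
mentioned above, and the data and initial values are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def se_kernel(Z1, Z2, s, ell):
    """Squared exponential covariance function, Equation (2)."""
    d2 = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return s**2 * np.exp(-0.5 * d2 / ell**2)

def neg_log_marginal_likelihood(log_params, Z, y):
    """Negative log marginal likelihood, Equation (5), with m(z) = 0.

    log_params = (log s, log ell, log sigma_n) so that the optimizer works in
    an unconstrained space.
    """
    s, ell, sigma_n = np.exp(log_params)
    K = se_kernel(Z, Z, s, ell) + sigma_n**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log|2*pi*K| = 0.5 * N * log(2*pi) + sum(log(diag(L)))
    return (0.5 * len(y) * np.log(2 * np.pi)
            + np.sum(np.log(np.diag(L)))
            + 0.5 * y @ alpha)

# Illustrative data and maximum likelihood fit of (s, ell, sigma_n).
rng = np.random.default_rng(0)
Z = rng.uniform(0, 10, size=(10, 1))
y = np.sin(Z[:, 0]) + 0.1 * rng.standard_normal(10)
res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]),
               args=(Z, y), method="L-BFGS-B")
s_hat, ell_hat, sigma_n_hat = np.exp(res.x)
```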

4.1.4 Reduction of computational complexity


A limitation of Gaussian process regression in its explicit form is that the computa-
tional complexities of the regression Equations (4) and likelihood Equation (5) are
cubic O(N 3 ) in the number of measurements N . This is due to the N × N matrix
inversion appearing in the equations which, even when implemented with Cholesky or
LU decompositions, needs a cubic number of computational steps.
Ways of solving the computational complexity problem are, for example, sparse
approximations using inducing points (Quiñonero-Candela and Rasmussen, 2005; Ras-
mussen and Williams, 2006; Titsias, 2009), approximating the problem with a discrete
Gaussian random field model (Lindgren et al, 2011), or use of random or deterministic
basis/spectral expansions (Quiñonero-Candela et al, 2010; Solin and Särkkä, 2018).

4.2 GP-NFIR, GP-NARX, GP-NOE, and related models


In system identification, we can use Gaussian processes to model unknown input-
output relationships in time series. Several different model architectures are available
for this purpose. Let us assume that we have a system with input sequence u1 , u2 , . . .
and output sequence y1 , y2 , . . . and the aim is to predict the outputs from inputs. We
also assume that we have been given a set of training data consisting of known inputs
and (noisy) outputs. In the following, we present some typically used architectures that
have been proposed for this purpose. More details can be found in the monograph of
Kocijan (2016).

4.2.1 GP-NFIR model


The Gaussian process non-linear finite impulse response (GP-NFIR) model (Acker-
mann et al, 2011; Kocijan, 2016) has the form (see Figure 2)
\hat{y}_k = f(u_{k-1}, \ldots, u_{k-m}),  (7)
where f (·) is an unknown mapping which we model as a Gaussian process, and ŷk
denotes the estimate produced by the regressor. In this model, we form a Gaussian pro-
cess regressor that predicts the current output from a finite number of previous inputs.

This model can be identified by reducing it into a Gaussian process regression model

y_k = f(z_k) + \varepsilon_k,  (8a)

where z_k = (u_{k-1}, . . . , u_{k-m})^T, and by using standard Gaussian process regression
methods on it.
Figure 2: In the GP-NFIR model the Gaussian process regressor is used to predict the next output
from the previous inputs.
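
The reduction to standard Gaussian process regression amounts to stacking delayed inputs
into regressor vectors. The sketch below is an illustrative construction (not code from the
cited works): with n = 0 it builds the GP-NFIR training pairs of Equations (7)-(8a), and
with n > 0 it also prepends delayed outputs, which gives the GP-NARX regressors of the
next subsection.

```python
import numpy as np

def lagged_regressors(u, y, m, n=0):
    """Build regressors z_k = (y_{k-1},...,y_{k-n}, u_{k-1},...,u_{k-m}) and targets y_k."""
    start = max(m, n)
    Z, targets = [], []
    for k in range(start, len(y)):
        z = list(u[k - m:k][::-1])              # (u_{k-1}, ..., u_{k-m})
        if n > 0:
            z = list(y[k - n:k][::-1]) + z      # prepend (y_{k-1}, ..., y_{k-n})
        Z.append(z)
        targets.append(y[k])
    return np.array(Z), np.array(targets)

# Illustrative data from a simple nonlinear difference equation.
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
y = np.zeros(200)
for k in range(1, 200):
    y[k] = 0.8 * np.tanh(y[k - 1]) + 0.5 * u[k - 1] + 0.05 * rng.standard_normal()

Z_nfir, t_nfir = lagged_regressors(u, y, m=3)          # GP-NFIR training pairs
Z_narx, t_narx = lagged_regressors(u, y, m=3, n=2)     # GP-NARX training pairs
# Either (Z, t) pair can be fed to a standard GP regression routine such as
# the gp_regression sketch given earlier.
```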

4.2.2 GP-NARX model


The Gaussian process nonlinear autoregressive model with exogenous input (GP-
NARX) (Kocijan et al, 2005; Kocijan, 2016) is a model of the form (see Figure 3)

y_k = f(y_{k-1}, \ldots, y_{k-n}, u_{k-1}, \ldots, u_{k-m}) + \varepsilon_k,  (9)

where ε_k is a Gaussian random variable. This model can be reduced to a Gaussian pro-
cess regression problem by setting z_k = (y_{k-1}, . . . , y_{k-n}, u_{k-1}, . . . , u_{k-m})^T
in Equation (8a).

Figure 3: In the GP-NARX model the Gaussian process is used to predict the next output from the
previous inputs and outputs.

4.2.3 GP-NOE model


In the Gaussian process nonlinear output error (GP-NOE) model (Kocijan and Petelin,
2011; Kocijan, 2016) we form a Gaussian process regressor for the problem (see Fig-
ure 4)

y_k = f(\hat{y}_{k-1}, \ldots, \hat{y}_{k-n}, u_{k-1}, \ldots, u_{k-m}) + \varepsilon_k,  (10)

where ŷk−1 , . . . , ŷk−n are the Gaussian process regressor predictions from the previous
steps.

Figure 4: The GP-NOE model uses previous inputs and the previous outputs of the Gaussian
process regressor to predict the next output. In the figure, q^{-n} denotes an n-step delay operator.

Learning in this kind of model requires further approximations because the predic-
tions of the Gaussian process are directly used as inputs on the next step.
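
The feedback structure of Equation (10) can be illustrated with a free-run simulation of an
already fitted model. The sketch below is a simplification that propagates only the posterior
mean of the GP regressor (gp_mean is a hypothetical placeholder for any fitted regressor)
and ignores the uncertainty of the fed-back outputs, which is exactly what the further
approximations mentioned above are needed to handle during learning.

```python
import numpy as np

def noe_free_run(gp_mean, u, m, n, y0):
    """Free-run simulation of a fitted GP-NOE model, Equation (10).

    gp_mean(z) stands for any fitted GP regressor returning the posterior mean
    for z = (yhat_{k-1}, ..., yhat_{k-n}, u_{k-1}, ..., u_{k-m}). y0 must
    contain at least max(m, n) initial outputs; only the mean is fed back.
    """
    yhat = list(y0)
    for k in range(len(y0), len(u)):
        z = np.concatenate([yhat[-1:-n - 1:-1],        # (yhat_{k-1}, ..., yhat_{k-n})
                            u[k - m:k][::-1]])         # (u_{k-1}, ..., u_{k-m})
        yhat.append(float(gp_mean(z)))
    return np.array(yhat)

# Illustrative use with a placeholder "GP mean" (a fixed linear map).
rng = np.random.default_rng(0)
u = rng.standard_normal(50)
w = 0.1 * rng.standard_normal(5)                       # n + m = 2 + 3 weights
y_sim = noe_free_run(lambda z: w @ z, u, m=3, n=2, y0=[0.0, 0.0, 0.0])
```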

4.2.4 Other model architectures


As discussed in Kocijan (2016), it is also possible to extend these architectures to, for
example, GP-NARMAX (nonlinear autoregressive and moving average model with
exogenous input) models and NBJ (nonlinear Box-Jenkins) models.

4.3 Gaussian process state-space (GPSS) models


Another approach to system identification is to form a state-space model where the
dynamic and measurement models are identified using Gaussian process regression
methods. This leads to so-called Gaussian process state-space models.

4.3.1 General GPSS model


A Gaussian process state-space (GPSS) model (see Figure 5) has the mathematical
form (e.g. Kocijan, 2016)

x_{k+1} = f(x_k, u_k) + w_k,  (11a)
y_k = g(x_k, u_k) + \varepsilon_k,  (11b)

where the state vector xk , k = 0, 1, 2, . . . , N contains the current state of the system,
u1 , u2 , . . . is the input sequence and y1 , y2 , . . . is the output sequence. In the model, wk
is a Gaussian distributed process noise. The aim is now to learn the functions f (xk , uk )
and g(xk , uk ), which are modeled as Gaussian processes, given the input and output
sequences, or in some cases, also given direct observations of the state vector.

4.3.2 Learning with fully observed state


When the state vector xk is fully observed, then both the dynamic model (11a) and
measurement model (11b) become standard Gaussian process regression models. In


Figure 5: In the GPSS model we learn Gaussian process regressors for approximating the dynamic
and measurement models in a state-space model. In this figure, q^{-1} denotes a one-step delay
operator.

dynamic model (11a), the training set consists of measurements xk+1 with the cor-
responding inputs (xk , uk ), and in measurement model (11b) the measurements are
yk with the corresponding inputs (xk , uk ). These kinds of fully observed models are
important in many applications such as robotics (Deisenroth et al, 2015).
After conditioning on the training data, the functions f and g will still be Gaussian
processes and their mean and covariance functions are given by (multivariate general-
izations of) Equations (4). State estimation in these kinds of models has been considered
by Ko and Fox (2009) and Deisenroth et al (2011), and it turns out that it is possible
to construct closed-form Gaussian approximation (moment matching) based filters and
smoothers for these models (Deisenroth et al, 2011). Control problems related to these
kinds of models have been considered, for example, by Deisenroth et al (2015).
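
In the fully observed case, identification therefore splits into two ordinary regression data
sets. The sketch below uses illustrative data from a hypothetical scalar model (not an
example from the cited works) and only assembles the training pairs for the dynamic
model (11a) and the measurement model (11b); each pair can then be passed to any GP
regression routine, such as the gp_regression sketch given earlier.

```python
import numpy as np

# Illustrative fully observed data from a hypothetical scalar model:
#   x_{k+1} = 0.9 x_k + tanh(u_k) + w_k,    y_k = x_k^2 + e_k.
rng = np.random.default_rng(0)
N = 200
u = rng.standard_normal(N)
x = np.zeros(N + 1)
for k in range(N):
    x[k + 1] = 0.9 * x[k] + np.tanh(u[k]) + 0.05 * rng.standard_normal()
y = x[:N] ** 2 + 0.1 * rng.standard_normal(N)

# Dynamic model (11a): GP regression inputs (x_k, u_k), targets x_{k+1}.
Z_dyn = np.column_stack([x[:N], u])
t_dyn = x[1:N + 1]

# Measurement model (11b): GP regression inputs (x_k, u_k), targets y_k.
Z_meas = np.column_stack([x[:N], u])
t_meas = y

# Each (Z, t) pair is a standard GP regression training set and can be
# handled, e.g., with the gp_regression sketch above or any GP library.
```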

4.3.3 Marginalization of the GP


When the states xk are not observed, then we need to treat both the states and the Gaus-
sian processes as unknown. There are a few different ways to cope with the model in
that case, and one approach is the marginalization approach of Frigola et al (2013,
2014b,a) and Frigola (2016). First note that if we have a method to learn f , we can
learn both f and g using a state-augmentation trick (Frigola, 2016): we define an aug-
mented state as x̃ = (x, γ)^T, a Gaussian process h(x̃, u) = (f(x, u), g(x, u))^T, and an
augmented process noise w̃ = (w_k, 0)^T, which reduces the model to

\tilde{x}_{k+1} = h(\tilde{x}_k, u_k) + \tilde{w}_k,  (12a)
y_k = \gamma_k + \varepsilon_k.  (12b)

This is now a model with an unknown dynamic model, but with a given linear Gaussian
measurement model p(y_k | x̃_k, u_k) = N(y_k | γ_k, σ_n^2).
Thus, without a loss of generality, we can focus on models with an unknown dy-
namic model f and a known measurement model:

x_{k+1} = f(x_k, u_k) + w_k,  (13a)
y_k \sim p(y_k \mid x_k, u_k).  (13b)

The aim is to learn the function f(x, u) from a sequence of measurement data y_k.
One way to understand this model (Frigola, 2016) is that we could hypothetically
generate data from it by sampling the (infinite-dimensional) function f (x, u), and then
starting from x0 sequentially produce each {f1 , x1 , . . . , fN , xN }, where we have de-
noted fk = f (xk−1 , uk−1 ) for k = 1, 2, . . . , N . Each of the conditional distributions

p(f_k \mid f_{1:k-1}, x_{0:k-1})  (14)

turns out to be Gaussian. Note that above, we have introduced the short-hand notation
f1:k = (f1 , . . . , fk ) which we will use also in the rest of this article.
The above observation allows us to integrate out (i.e., marginalize) the Gaussian
process from the model in closed form. The result is the following representation
(Frigola et al, 2013):
p(x_{0:N}) = \prod_{k=1}^{N} \mathcal{N}\left( x_k \mid \mu_k(x_{0:k-1}), \Sigma_k(x_{0:k-1}) \right),  (15)

where the means µk (x0:k−1 ) and covariances Σk (x0:k−1 ) are (quite complicated)
functions of the prior mean and covariance functions of the Gaussian process f , which
are evaluated on the whole previous state history. The above equation defines a non-
Markovian prior model for the state sequence xk , k = 1, . . . , N .
For a given x0:N , the distribution p(f (x∗ ) | x∗ , x0:N ) for a test point x∗ can be
computed by using the conventional Gaussian process prediction equations as follows:

p(f(x^*) \mid x^*, y_{1:N}) = \int p(f(x^*) \mid x^*, x_{0:N})\, p(x_{0:N} \mid y_{1:N})\, dx_{0:N},  (16)

which can be numerically approximated, provided that we use a convenient (e.g. Monte
Carlo) approximation for p(x0:N | y1:N ).
Given the model (15) it is then possible to use, for example, particle Markov chain
Monte Carlo methods (Frigola et al, 2013) to sample state trajectories from the poste-
rior distribution p(x0:N | y1:N ) jointly with the parameters of the model, which pro-
vides a Monte Carlo approximation to the above integral. Other proposed approaches
are, for example, particle stochastic approximation expectation–maximization (EM,
Frigola et al, 2014b) which uses a Monte Carlo approximation to the EM algorithm
aiming at computing the maximum likelihood estimates of the parameters while han-
dling the states as missing data.

4.3.4 Approximation of the GP


Another way to approach the problem where both the states and Gaussian processes
are unknown is to approximate the Gaussian process as a finite-dimensional parametric
model and apply conventional parameter estimation methods to it.

One possible approximation considered in Svensson et al (2016) is to employ a
Karhunen–Loève type of basis function expansion of the Gaussian process as follows:
f(x, u) = \sum_{i=1}^{S} c_i\, \phi_i(x, u),  (17)

where φ_i(x, u) are deterministic basis functions (e.g. sinusoids) and c_i are Gaussian
random variables. With this approximation, the model in Equation (13) becomes
x_{k+1} = \sum_{i=1}^{S} c_i\, \phi_i(x_k, u_k) + w_k,  (18a)
y_k \sim p(y_k \mid x_k, u_k),  (18b)
where learning of the Gaussian process f reduces to estimation of the finite number of
parameters c = (c_1, . . . , c_S)^T in the state-space model. The states and parameters
in this model can now be determined using, for example, particle Markov chain Monte
Carlo (PMCMC) methods (see Svensson et al, 2016).
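
As a rough illustration of the basis function idea in Equations (17)-(18), the sketch below
uses random Fourier-type features as the deterministic basis φ_i and a Gaussian draw for
the weights c_i; these particular choices, as well as the scalar state and input, are illustrative
assumptions of this sketch and differ from the basis construction used by Svensson et al (2016).

```python
import numpy as np

def make_basis(S, input_dim, lengthscale=1.0, seed=0):
    """Random Fourier-type basis functions phi_i(x, u), an illustrative stand-in
    for the deterministic basis of Equation (17)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((S, input_dim)) / lengthscale   # random frequencies
    b = rng.uniform(0, 2 * np.pi, S)                        # random phases

    def phi(xu):
        # xu: array (input_dim,) containing (x_k, u_k); returns S feature values.
        return np.sqrt(2.0 / S) * np.cos(W @ xu + b)

    return phi

# Approximate dynamic model of Equation (18a) for a scalar state and input.
S = 50
phi = make_basis(S, input_dim=2)
rng = np.random.default_rng(1)
c = rng.standard_normal(S)            # Gaussian weights c_i (here: a prior draw)

def f_approx(x, u):
    """f(x, u) = sum_i c_i * phi_i(x, u)."""
    return c @ phi(np.array([x, u]))

# One step of the approximate state-space model (18a), with process noise w_k.
x_k, u_k = 0.3, -0.5
x_next = f_approx(x_k, u_k) + 0.05 * rng.standard_normal()
```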
Another possibility is to use inducing points (typically denoted with u, but here
we denote them with f_u to avoid confusion with the input sequence). In those ap-
proaches the idea is to first perform Gaussian process inference on the inducing points
alone, that is, compute p(f_u | y_{1:N}), and then compute the (approximate) predictions
by conditioning on the inducing points instead of the original data. This leads to the
approximation
p(f(x^*) \mid y_{1:N}) = \int p(f(x^*) \mid x^*, x_{0:N})\, p(x_{0:N} \mid y_{1:N})\, dx_{0:N}
                 \approx \int p(f(x^*) \mid x^*, f_u)\, p(f_u \mid y_{1:N})\, df_u.  (19)

In Turner et al (2010) the inducing points are learned using the expectation–
maximization (EM) algorithm. Frigola et al (2014a) propose a method for variational
learning of (or integration over) the inducing points by forming a variational Bayesian
approximation q(f_u) ≈ p(f_u | y_{1:N}), which further results in
p(f(x^*) \mid y_{1:N}) \approx \int p(f(x^*) \mid x^*, f_u)\, q(f_u)\, df_u,  (20)

and turns out to be analytically tractable as the optimal variational distribution q(f_u) is
Gaussian.

4.4 Spatio-temporal Gaussian process models


4.4.1 Temporal Gaussian processes
Another way of modeling time series using Gaussian processes is by considering them
as functions of time (Hartikainen and Särkkä, 2010) which are sampled at certain time
instants t1 , t2 , . . .:
f(t) \sim \mathcal{GP}(m(t), k(t, t')),
y_k = f(t_k) + \varepsilon_k.  (21)

That is, instead of attempting to form a predictor from the previous measurements or
inputs, the idea is to condition the temporal Gaussian process on its observed values
and use the conditional Gaussian process for predicting values at new time points.
Unfortunately, due to the cubic computational scaling of the Gaussian process re-
gression, this quickly becomes intractable when the time series length increases. However,
temporal Gaussian process regression is closely related to the classical Kalman filtering
and Rauch-Tung-Striebel smoothing problems (e.g. Särkkä, 2013), which can be used
to reduce the required computations. It turns out that provided that the Gaussian pro-
cess is stationary, that is, the covariance function only depends on the time difference
k(t, t') = k(t − t'), then, under certain restrictions, the Gaussian process regression
problem is essentially equivalent to state-estimation in a model of the form
\frac{dx(t)}{dt} = A\, x(t) + B\, \eta(t),
y_k = C\, x(t_k) + \varepsilon_k,  (22)

where η(t) is a white noise process and the matrices A, B, and C are selected suitably
to match the original covariance function. For example, the Matérn covariance func-
tions with half-integer smoothness parameters have exact representations as state-space
models (Hartikainen and Särkkä, 2010).
The Gaussian process regression problem can now be solved by applying a Kalman
filter and Rauch-Tung-Striebel smoother to this model. These methods have the
fortunate property that their complexity is linear O(N) with respect to the number
of measurements N, as opposed to the cubic complexity of the direct Gaussian process
regression solution.
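
A small sketch of this state-space route is given below for the exponential (Matérn class
with smoothness 1/2) covariance function k(t, t') = s^2 exp(−|t − t'|/ℓ), whose state-space
representation in the sense of Equation (22) is the scalar Ornstein-Uhlenbeck model; the
plain-NumPy filter, the omission of the smoother pass, and the data are illustrative choices
of this sketch.

```python
import numpy as np

def kalman_filter_ou(t, y, s=1.0, ell=1.0, sigma_n=0.1):
    """Kalman filter for a temporal GP with covariance s^2 exp(-|t - t'|/ell).

    This covariance corresponds to the scalar Ornstein-Uhlenbeck model
    dx/dt = -x/ell + eta(t) of Equation (22), measured as y_k = x(t_k) + eps_k.
    Complexity is O(N) in the number of measurements.
    """
    m, P = 0.0, s**2                      # stationary prior mean and variance
    means, variances = [], []
    t_prev = t[0]
    for tk, yk in zip(t, y):
        # Prediction step: exact discretization of the OU dynamics over dt.
        dt = tk - t_prev
        a = np.exp(-dt / ell)
        q = s**2 * (1.0 - np.exp(-2.0 * dt / ell))
        m, P = a * m, a**2 * P + q
        # Update step with measurement noise variance sigma_n^2.
        S = P + sigma_n**2
        K = P / S
        m = m + K * (yk - m)
        P = (1.0 - K) * P
        means.append(m)
        variances.append(P)
        t_prev = tk
    return np.array(means), np.array(variances)

# Illustrative use on irregularly sampled noisy data.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 100))
y = np.sin(t) + 0.1 * rng.standard_normal(100)
m_filt, P_filt = kalman_filter_ou(t, y, s=1.0, ell=1.0, sigma_n=0.1)
```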

4.4.2 Spatio-temporal Gaussian processes


A similar state-space approach also works for spatio-temporal Gaussian process models
with a covariance function of a stationary form k(z, t; z', t') = k(z, z'; t − t'). In that
case, the state-estimation problem becomes infinite-dimensional, that is, a distributed
parameter system
\frac{dx(z, t)}{dt} = A\, x(z, t) + B\, \eta(z, t),
y_k = C\, x(z, t_k) + \varepsilon_k,  (23)

where A is a matrix of operators and C is a matrix of functionals (Särkkä et al, 2013).


The solution of the Gaussian process regression problem in this form requires the use
of methods from partial differential equations, but in many cases we can obtain an exact
O(N ) inference procedure from this route.

4.4.3 Latent force models


In so-called latent force models (Álvarez et al, 2013) the idea is to infer a latent force
ξ(t) in a differential equation model such as

\frac{d^2 x(t)}{dt^2} + \gamma \frac{dx(t)}{dt} + \nu^2 x(t) = \xi(t),  (24)
where ξ(t) is an unknown function which is modeled as a Gaussian process. The in-
ference in this model can be recast as Gaussian process regression with a modified

covariance function. The idea can be further generalized to partial differential equation
models and non-linear models when approximation methods such as the Laplace approxi-
mation are used.
The inference in these models can be further re-stated in state-space form
(Särkkä et al, 2019), which also allows for the study of control problems on latent force
models. This formulation also allows the analysis of observability and controllability
properties of latent force models.
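
To make the state-space reformulation concrete, the sketch below augments the model of
Equation (24) with the latent force as an extra state component, modeling ξ(t), purely for
illustration, as an Ornstein-Uhlenbeck (exponential covariance) Gaussian process; the force
prior, the parameter values, and the discretization are assumptions of this sketch and not
the construction of Särkkä et al (2019).

```python
import numpy as np
from scipy.linalg import expm

def latent_force_ss(gamma, nu, ell, s, dt):
    """Augmented state-space model for Equation (24) with an OU latent force.

    State: (x, dx/dt, xi). Returns the discrete-time transition matrix A_d,
    process noise covariance Q_d for a step of length dt, and the measurement
    matrix H picking out x(t_k).
    """
    # Continuous-time dynamics: dx/dt = x', dx'/dt = -nu^2 x - gamma x' + xi,
    # dxi/dt = -xi/ell + white noise with spectral density q = 2 s^2 / ell.
    F = np.array([[0.0, 1.0, 0.0],
                  [-nu**2, -gamma, 1.0],
                  [0.0, 0.0, -1.0 / ell]])
    Qc = np.zeros((3, 3))
    Qc[2, 2] = 2.0 * s**2 / ell
    # Discretize (A_d, Q_d) with the matrix-fraction (Van Loan) construction.
    M = np.block([[F, Qc], [np.zeros((3, 3)), -F.T]]) * dt
    Phi = expm(M)
    A_d = Phi[:3, :3]
    Q_d = Phi[:3, 3:] @ A_d.T
    H = np.array([[1.0, 0.0, 0.0]])     # we measure x(t_k) plus noise
    return A_d, Q_d, H

# With (A_d, Q_d, H) in hand, a standard vector-valued Kalman filter and
# smoother, analogous to the scalar sketch above, infers both x(t) and the
# latent force xi(t) from noisy observations of x.
A_d, Q_d, H = latent_force_ss(gamma=0.5, nu=2.0, ell=1.0, s=1.0, dt=0.1)
```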

5 Summary and Future Directions


In this article, we have briefly outlined the main directions in system identification us-
ing Gaussian processes. The methods can be divided into three classes: (1) GP-NFIR,
GP-NARX, and GP-NOE type of models which directly aim at learning the function
from the previous inputs and outputs to the current output, (2) Gaussian process state-
space models which aim to learn the dynamic and measurement models in the state-
space model, and (3) Gaussian process regression methods for spatio-temporal time
series by direct or state-space Gaussian process regression.
Active research problems in this area appear to be, for example, joint learning
and control in all the model types outlined here. Another direction of further study is
the analysis of observability, identifiability, and controllability. In practice, the Gaus-
sian process-based system identification methods are very similar to classical methods,
and hence they can be expected to inherit many limitations and theoretical properties
of the classical methods.
An emerging research area in Gaussian process-based models is so-called deep
Gaussian processes (Damianou and Lawrence, 2013), which borrow the idea of deep
neural networks by forming hierarchies of Gaussian processes. These kinds of models
could also turn out to be useful in system identification. Furthermore, Gaussian pro-
cesses are easily combined with first-principles models (cf. the latent force models
described above), which allows for flexible gray-box modeling using Gaussian pro-
cesses.
One of the main obstacles in Gaussian process regression is still the computational
scaling in the number of measurements, which is also inherited by all the new develop-
ments. Although several good approaches to tackle this problem have been proposed,
the problem remains that they inherently replace the original model with an approxima-
tion. New and better approaches to this problem are likely to appear in the near future.

6 Cross References
References
Ackermann ER, De Villiers JP, Cilliers P (2011) Nonlinear dynamic systems modeling
using Gaussian processes: Predicting ionospheric total electron content over South
Africa. Journal of Geophysical Research: Space Physics 116(10)

Álvarez MA, Luengo D, Lawrence ND (2013) Linear latent force models using Gaus-
sian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence
35(11):2693–2705

Brooks S, Gelman A, Jones GL, Meng XL (2011) Handbook of Markov Chain Monte
Carlo. Chapman & Hall/CRC, Boca Raton, FL

Cressie NAC (1993) Statistics for Spatial Data. Wiley


Damianou AC, Lawrence ND (2013) Deep Gaussian processes. In: International Con-
ference on Artificial Intelligence and Statistics (AISTATS), pp 207–215
Deisenroth MP, Turner RD, Huber MF, Hanebeck UD, Rasmussen CE (2011) Robust
filtering and smoothing with Gaussian processes. IEEE Transactions on Automatic
Control 57(7):1865–1871
Deisenroth MP, Fox D, Rasmussen CE (2015) Gaussian processes for data-efficient
learning in robotics and control. IEEE Transactions on Pattern Analysis and Machine
Intelligence 37(2):408–423

Frigola R (2016) Bayesian time series learning with Gaussian processes. PhD thesis,
University of Cambridge
Frigola R, Lindsten F, Schön TB, Rasmussen CE (2013) Bayesian inference and learn-
ing in Gaussian process state-space models with particle MCMC. In: Advances in
Neural Information Processing Systems, pp 3156–3164

Frigola R, Chen Y, Rasmussen CE (2014a) Variational Gaussian process state-space


models. In: Advances in Neural Information Processing Systems, pp 3680–3688
Frigola R, Lindsten F, Schön TB, Rasmussen CE (2014b) Identification of Gaussian
process state-space models with particle stochastic approximation EM. IFAC Pro-
ceedings Volumes, Proceedings of the 19th IFAC World Congress 47(3):4097–4102

Hartikainen J, Särkkä S (2010) Kalman filtering and smoothing solutions to temporal


Gaussian process regression models. In: IEEE International Workshop on Machine
Learning for Signal Processing (MLSP), pp 379–384
Ko J, Fox D (2009) GP-BayesFilters: Bayesian filtering using Gaussian process pre-
diction and observation models. Autonomous Robots 27(1):75–90

Kocijan J (2016) Modelling and control of dynamic systems using Gaussian process
models. Springer
Kocijan J, Petelin D (2011) Output-error model training for Gaussian process mod-
els. In: International Conference on Adaptive and Natural Computing Algorithms,
Springer, pp 312–321
Kocijan J, Girard A, Banko B, Murray-Smith R (2005) Dynamic systems identifica-
tion with Gaussian processes. Mathematical and Computer Modelling of Dynamical
Systems 11(4):411–424
Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields
and Gaussian Markov random fields: the stochastic partial differential equation ap-
proach. Journal of the Royal Statistical Society: Series B 73(4):423–498
Matérn B (1960) Spatial variation. Tech. rep., Meddelanden från Statens Skogforskn-
ingsinstitut, band 49 - Nr 5

McHutchon AJ (2015) Nonlinear modelling and control using Gaussian processes. PhD
thesis, University of Cambridge

Quiñonero-Candela J, Rasmussen CE (2005) A unifying view of sparse approximate


Gaussian process regression. Journal of Machine Learning Research 6:1939–1959
Quiñonero-Candela J, Rasmussen CE, Figueiras-Vidal AR, et al (2010) Sparse
spectrum Gaussian process regression. Journal of Machine Learning Research
11(Jun):1865–1881

Rasmussen CE, Williams CK (2006) Gaussian Processes for Machine Learning. MIT
Press, Cambridge, MA
Särkkä S (2013) Bayesian Filtering and Smoothing. Cambridge University Press
Särkkä S, Solin A, Hartikainen J (2013) Spatiotemporal learning via infinite-
dimensional Bayesian filtering and smoothing. IEEE Signal Processing Magazine
30(4):51–61
Särkkä S, Álvarez MA, Lawrence ND (2019) Gaussian process latent force models for
learning and stochastic control of physical systems. IEEE Transactions on Automatic
Control (to appear)

Solin A, Särkkä S (2018) Hilbert space methods for reduced-rank Gaussian process
regression. ArXiv:1401.5508
Svensson A, Solin A, Särkkä S, Schön T (2016) Computationally efficient Bayesian
learning of Gaussian process state space models. In: Artificial Intelligence and
Statistics, pp 213–221

Titsias M (2009) Variational learning of inducing variables in sparse Gaussian pro-


cesses. In: Artificial Intelligence and Statistics, pp 567–574
Turner R, Deisenroth M, Rasmussen C (2010) State-space inference and learning with
Gaussian processes. In: Proceedings of the Thirteenth International Conference on
Artificial Intelligence and Statistics, pp 868–875

