The Use of Gaussian Processes in System Identification
Simo Särkkä
To appear in Encyclopedia of Systems and Control, 2nd edition
arXiv:1907.06066v1 [stat.ML] 13 Jul 2019

1 Abstract
Gaussian processes are used in machine learning to learn input-output mappings from
observed data. Gaussian process regression is based on imposing a Gaussian process
prior on the unknown regressor function and statistically conditioning it on the ob-
served data. In system identification, Gaussian processes are used to form time series
prediction models such as non-linear finite-impulse response (NFIR) models as well
as non-linear autoregressive (NARX) models. Gaussian process state-space models
(GPSS) can be used to learn the dynamic and measurement models for a state-space
representation of the input-output data. Temporal and spatio-temporal Gaussian pro-
cesses can be used directly to form regressors for the data in the time domain. The aim
of this article is to briefly outline the main directions in system identification methods
using Gaussian processes.

2 Keywords
Gaussian process regression, non-linear system identification, GP-NFIR model, GP-
NARX model, GP-NOE model, Gaussian process state-space model, temporal Gaussian
process, state-space Gaussian process

3 Introduction
Gaussian process regression (Rasmussen and Williams, 2006) refers to a statistical
methodology where we use Gaussian processes as prior models for regression func-
tions that we fit to observed data. This kind of methodology is particularly popular in
machine learning although the origins of the basic ideas can be traced to geostatistics
(Cressie, 1993). In geostatistics, the corresponding methodology is called "kriging",
which is named after South African mining engineer D. G. Krige. In system identifica-
tion, Gaussian processes can be used to identify (or "learn" in machine learning terms)
the input-output relationships from observed data. Even when there is no explicit input
in the system, Gaussian processes can be used to identify a model for an observed time
series of outputs which is a specific form of a system identification problem. Overviews
of the use of Gaussian processes in system identification can be found, for example, in
the monograph of Kocijan (2016), and PhD theses of McHutchon (2015) and Frigola
(2016).

4 Gaussian processes in system identification
4.1 Gaussian process regression
Gaussian process regression is concerned with the following problem: Given a set
of observed (training) input-output data D = {(zk , yk ) : k = 1, . . . , N } from an
unknown function y = f (z), predict the values of the function at new (test) inputs
{z∗k : k = 1, . . . , M }. That is, the problem is a classical regression problem. However,
the classical solution to the problem usually amounts to fixing a parametric function
class f (z; θ), where θ is a set of parameters and then fitting the parameters to the
observed data. In Gaussian process regression, we take a different route; instead of
fixing a parametric class of functions, we put a Gaussian process prior measure on
the whole regression function and condition on the observed data using Bayes’ rule
(Rasmussen and Williams, 2006).

4.1.1 Gaussian process regression problem


Mathematically, the Gaussian process regression problem can be written as

f(z) \sim \mathcal{GP}(m(z), k(z, z')),  (1a)
y_k = f(z_k) + \varepsilon_k, \quad \varepsilon_k \sim \mathcal{N}(0, \sigma_n^2),  (1b)

where Equation (1a) states that, a priori, the function is a Gaussian process
with mean function m(z) = E[f(z)] and covariance function (or kernel)
k(z, z') = Cov[f(z), f(z')] = E[(f(z) − m(z)) (f(z') − m(z'))]. Equation (1b)
states that we observe the function values at the points z_k, k = 1, . . . , N, and that
they are corrupted by (independent) Gaussian noises with variance σ_n^2.
The mean and covariance functions define the regressor function class and they, or
at least their parametric classes, need to be selected a priori. The mean function can
typically be selected to be identically zero m(z) = 0. The covariance function defines
the smoothness properties of the functions, and a typical choice in machine learning is
the squared exponential covariance function

k(z, z') = s^2 \exp\left( -\frac{\|z - z'\|^2}{2\ell^2} \right)  (2)

which produces infinitely differentiable (i.e., analytic) regressor functions. The pa-
rameters s and ℓ in the aforementioned covariance function define the magnitude and
length scales of the regressor functions, respectively. Other common choices of co-
variance functions are, for example, the Matérn class of covariance functions (Matérn,
1960; Rasmussen and Williams, 2006).

4.1.2 Gaussian process regression equations


Given the mean and covariance functions as well as the measurements, we can form the
Gaussian process regressor. Assuming that the noises are independent of the function
values, we can write the joint distribution of the observed values and the unknown
function values as follows:
\begin{pmatrix} y \\ f(Z^*) \end{pmatrix}
\sim \mathcal{N}\left(
\begin{pmatrix} m(Z) \\ m(Z^*) \end{pmatrix},
\begin{pmatrix} K + \sigma_n^2 I & k^\top(Z^*) \\ k(Z^*) & k(Z^*, Z^*) \end{pmatrix}
\right),  (3)

where y = (y_1, . . . , y_N)^T, m(Z) = (m(z_1), . . . , m(z_N))^T, m(Z^*) =
(m(z^*_1), . . . , m(z^*_M))^T, and K and k(Z^*) denote matrices with element (i, j)
given as k(z_i, z_j) and k(z^*_i, z_j), respectively. By conditioning this joint Gaussian
distribution on the measurements y we get that the conditional (i.e., posterior)
distribution of the function values f(Z^*) = (f(z^*_1), . . . , f(z^*_M))^T is Gaussian with
the mean and covariance

E[f(Z^*) \mid y] = m(Z^*) + k(Z^*) \left( K + \sigma_n^2 I \right)^{-1} (y - m(Z)),
Cov[f(Z^*) \mid y] = k(Z^*, Z^*) - k(Z^*) \left( K + \sigma_n^2 I \right)^{-1} k^\top(Z^*).  (4)
These are the fundamental equations of Gaussian process regression. An example of
Gaussian process regression with squared exponential covariance function is shown in
Figure 1.
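
To make Equations (2)-(4) concrete, the following minimal sketch (not part of the original
article) implements the posterior mean and covariance in Python/NumPy, assuming a zero
mean function; the data, test inputs, and hyperparameter values s, ℓ, and σ_n are
illustrative placeholders only.

```python
import numpy as np

def se_kernel(Z1, Z2, s=1.0, ell=1.0):
    """Squared exponential covariance function, Equation (2)."""
    d2 = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return s**2 * np.exp(-0.5 * d2 / ell**2)

def gp_regression(Z, y, Z_star, s=1.0, ell=1.0, sigma_n=0.1):
    """Posterior mean and covariance of f(Z*), Equation (4), with m(z) = 0."""
    K = se_kernel(Z, Z, s, ell)                  # N x N training covariance
    K_star = se_kernel(Z_star, Z, s, ell)        # M x N cross-covariance k(Z*)
    K_ss = se_kernel(Z_star, Z_star, s, ell)     # M x M test covariance
    # Apply (K + sigma_n^2 I)^{-1} via a Cholesky factor for numerical stability.
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, K_star.T)
    mean = K_star @ alpha
    cov = K_ss - V.T @ V
    return mean, cov

# Illustrative data: noisy observations of a sinusoid, as in Figure 1.
rng = np.random.default_rng(0)
Z = rng.uniform(0, 10, size=(10, 1))
y = np.sin(Z[:, 0]) + 0.1 * rng.standard_normal(10)
Z_star = np.linspace(0, 10, 100)[:, None]
mean, cov = gp_regression(Z, y, Z_star, s=1.0, ell=1.0, sigma_n=0.1)
```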

4.1.3 Hyperparameter learning


Even though Gaussian process regression is a non-parametric method, for which we
do not need to fix a parametric class of functions, the mean and covariance functions
can have unknown hyperparameters ϕ which can be estimated from data. For example,
the squared exponential covariance function in Equation (2) has the hyperparameters
ϕ = (s, ℓ).

Figure 1: Example of Gaussian process regression with squared exponential covariance func-
tion. The true function is a sinusoid which is observed only at 10 points that are corrupted by
Gaussian noise. The quantiles provide error bars for the predicted function values.

A common way to estimate the parameters is to maximize the marginal likelihood


– also called evidence – p(y | ϕ) of the measurements, or equivalently, minimize the
negative log-likelihood of the measurements
-\log p(y \mid \varphi) = \tfrac{1}{2} \log \left| 2\pi \left( K_\varphi + \sigma_n^2 I \right) \right|
  + \tfrac{1}{2} \left( y - m_\varphi(Z) \right)^\top \left( K_\varphi + \sigma_n^2 I \right)^{-1} \left( y - m_\varphi(Z) \right).  (5)

The gradient of this function with respect to the hyperparameters is also available (see,
e.g., Rasmussen and Williams, 2006) which allows for the use of gradient-based opti-
mization methods to estimate the parameters.
Instead of using the maximum likelihood method to estimate the parameters, it is
also possible to use a Bayesian approach to the problem and consider the posterior
distribution of the hyperparameters
p(\varphi \mid y) = \frac{p(y \mid \varphi)\, p(\varphi)}{\int p(y \mid \varphi)\, p(\varphi)\, d\varphi},  (6)
where p(ϕ) is the prior distribution of the hyperparameters. We can, for example,
compute the maximum a posteriori estimate of the parameters by finding the maximum
of this distribution or use Markov chain Monte Carlo (MCMC) methods (Brooks et al,
2011) to estimate the statistics of the distribution.
In what follows, to avoid notational clutter, we drop the hyperparameters from
the Gaussian process formulations and inference methods although they are commonly
estimated as part of the Gaussian process learning.
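
As an illustration of the maximum likelihood route, the sketch below minimizes the negative
log marginal likelihood of Equation (5) numerically for the squared exponential covariance
with a zero mean function; it is an illustrative assumption of this sketch to rely on SciPy's
general-purpose optimizer with numerical gradients rather than the analytic gradient
mentioned above, and the data and initial values are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def se_kernel(Z1, Z2, s, ell):
    """Squared exponential covariance function, Equation (2)."""
    d2 = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return s**2 * np.exp(-0.5 * d2 / ell**2)

def neg_log_marginal_likelihood(log_params, Z, y):
    """Negative log marginal likelihood, Equation (5), with m(z) = 0.

    log_params = (log s, log ell, log sigma_n) so that the optimizer works in
    an unconstrained space.
    """
    s, ell, sigma_n = np.exp(log_params)
    K = se_kernel(Z, Z, s, ell) + sigma_n**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log|2*pi*K| = 0.5 * N * log(2*pi) + sum(log(diag(L)))
    return (0.5 * len(y) * np.log(2 * np.pi)
            + np.sum(np.log(np.diag(L)))
            + 0.5 * y @ alpha)

# Illustrative data and maximum likelihood fit of (s, ell, sigma_n).
rng = np.random.default_rng(0)
Z = rng.uniform(0, 10, size=(10, 1))
y = np.sin(Z[:, 0]) + 0.1 * rng.standard_normal(10)
res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]),
               args=(Z, y), method="L-BFGS-B")
s_hat, ell_hat, sigma_n_hat = np.exp(res.x)
```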

4.1.4 Reduction of computational complexity


A limitation of Gaussian process regression in its explicit form is that the computa-
tional complexities of the regression Equations (4) and likelihood Equation (5) are
cubic O(N 3 ) in the number of measurements N . This is due to the N × N matrix
inversion appearing in the equations which, even when implemented with Cholesky or
LU decompositions, needs a cubic number of computational steps.
Ways of solving the computational complexity problem are, for example, sparse
approximations using inducing points (Quiñonero-Candela and Rasmussen, 2005; Ras-
mussen and Williams, 2006; Titsias, 2009), approximating the problem with a discrete
Gaussian random field model (Lindgren et al, 2011), or use of random or deterministic
basis/spectral expansions (Quiñonero-Candela et al, 2010; Solin and Särkkä, 2018).

4.2 GP-NFIR, GP-NARX, GP-NOE, and related models


In system identification, we can use Gaussian processes to model unknown input-
output relationships in time series. Several different model architectures are available
for this purpose. Let us assume that we have a system with input sequence u1 , u2 , . . .
and output sequence y1 , y2 , . . . and the aim is to predict the outputs from inputs. We
also assume that we have been given a set of training data consisting of known inputs
and (noisy) outputs. In the following, we present some typically used architectures that
have been proposed for this purpose. More details can be found in the monograph of
Kocijan (2016).

4.2.1 GP-NFIR model


The Gaussian process non-linear finite impulse response (GP-NFIR) model (Acker-
mann et al, 2011; Kocijan, 2016) has the form (see Figure 2)
\hat{y}_k = f(u_{k-1}, \ldots, u_{k-m}),  (7)
where f (·) is an unknown mapping which we model as a Gaussian process, and ŷk
denotes the estimate produced by the regressor. In this model, we form a Gaussian pro-
cess regressor that predicts the current output from a finite number of previous inputs.

This model can be identified by reducing it into a Gaussian process regression model

y_k = f(z_k) + \varepsilon_k,  (8a)

where z_k = (u_{k-1}, . . . , u_{k-m})^T, and by using standard Gaussian process regression
methods on it.
Figure 2: In the GP-NFIR model the Gaussian process regressor is used to predict the next output
from the previous inputs.
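
The reduction to standard Gaussian process regression amounts to stacking delayed inputs
into regressor vectors. The sketch below is an illustrative construction (not code from the
cited works): with n = 0 it builds the GP-NFIR training pairs of Equations (7)-(8a), and
with n > 0 it also prepends delayed outputs, which gives the GP-NARX regressors of the
next subsection.

```python
import numpy as np

def lagged_regressors(u, y, m, n=0):
    """Build regressors z_k = (y_{k-1},...,y_{k-n}, u_{k-1},...,u_{k-m}) and targets y_k."""
    start = max(m, n)
    Z, targets = [], []
    for k in range(start, len(y)):
        z = list(u[k - m:k][::-1])              # (u_{k-1}, ..., u_{k-m})
        if n > 0:
            z = list(y[k - n:k][::-1]) + z      # prepend (y_{k-1}, ..., y_{k-n})
        Z.append(z)
        targets.append(y[k])
    return np.array(Z), np.array(targets)

# Illustrative data from a simple nonlinear difference equation.
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
y = np.zeros(200)
for k in range(1, 200):
    y[k] = 0.8 * np.tanh(y[k - 1]) + 0.5 * u[k - 1] + 0.05 * rng.standard_normal()

Z_nfir, t_nfir = lagged_regressors(u, y, m=3)          # GP-NFIR training pairs
Z_narx, t_narx = lagged_regressors(u, y, m=3, n=2)     # GP-NARX training pairs
# Either (Z, t) pair can be fed to a standard GP regression routine such as
# the gp_regression sketch given earlier.
```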

4.2.2 GP-NARX model


The Gaussian process nonlinear autoregressive model with exogenous input (GP-
NARX) (Kocijan et al, 2005; Kocijan, 2016) is a model of the form (see Figure 3)

y_k = f(y_{k-1}, \ldots, y_{k-n}, u_{k-1}, \ldots, u_{k-m}) + \varepsilon_k,  (9)

where ε_k is a Gaussian random variable. This model can be reduced to a Gaussian pro-
cess regression problem by setting z_k = (y_{k-1}, . . . , y_{k-n}, u_{k-1}, . . . , u_{k-m})^T
in Equation (8a).

Figure 3: In the GP-NARX model the Gaussian process is used to predict the next output from the
previous inputs and outputs.

4.2.3 GP-NOE model


In the Gaussian process nonlinear output error (GP-NOE) model (Kocijan and Petelin,
2011; Kocijan, 2016) we form a Gaussian process regressor for the problem (see Fig-
ure 4)

y_k = f(\hat{y}_{k-1}, \ldots, \hat{y}_{k-n}, u_{k-1}, \ldots, u_{k-m}) + \varepsilon_k,  (10)

where ŷk−1 , . . . , ŷk−n are the Gaussian process regressor predictions from the previous
steps.

Figure 4: The GP-NOE model uses previous inputs and the previous outputs of the Gaussian
process regressor to predict the next output. In the figure, q^{-n} denotes an n-step delay operator.

Learning in this kind of model requires further approximations because the predic-
tions of the Gaussian process are directly used as inputs on the next step.
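
The feedback structure of Equation (10) can be illustrated with a free-run simulation of an
already fitted model. The sketch below is a simplification that propagates only the posterior
mean of the GP regressor (gp_mean is a hypothetical placeholder for any fitted regressor)
and ignores the uncertainty of the fed-back outputs, which is exactly what the further
approximations mentioned above are needed to handle during learning.

```python
import numpy as np

def noe_free_run(gp_mean, u, m, n, y0):
    """Free-run simulation of a fitted GP-NOE model, Equation (10).

    gp_mean(z) stands for any fitted GP regressor returning the posterior mean
    for z = (yhat_{k-1}, ..., yhat_{k-n}, u_{k-1}, ..., u_{k-m}). y0 must
    contain at least max(m, n) initial outputs; only the mean is fed back.
    """
    yhat = list(y0)
    for k in range(len(y0), len(u)):
        z = np.concatenate([yhat[-1:-n - 1:-1],        # (yhat_{k-1}, ..., yhat_{k-n})
                            u[k - m:k][::-1]])         # (u_{k-1}, ..., u_{k-m})
        yhat.append(float(gp_mean(z)))
    return np.array(yhat)

# Illustrative use with a placeholder "GP mean" (a fixed linear map).
rng = np.random.default_rng(0)
u = rng.standard_normal(50)
w = 0.1 * rng.standard_normal(5)                       # n + m = 2 + 3 weights
y_sim = noe_free_run(lambda z: w @ z, u, m=3, n=2, y0=[0.0, 0.0, 0.0])
```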

4.2.4 Other model architectures


As discussed in Kocijan (2016), it is also possible to extend these architectures to, for
example, GP-NARMAX (nonlinear autoregressive and moving average model with
exogenous input) models and NBJ (nonlinear Box-Jenkins) models.

4.3 Gaussian process state-space (GPSS) models


Another approach to system identification is to form a state-space model where the
dynamic and measurement models are identified using Gaussian process regression
methods. This leads to so-called Gaussian process state-space models.

4.3.1 General GPSS model


A Gaussian process state-space (GPSS) model (see Figure 5) has the mathematical
form (e.g. Kocijan, 2016)

x_{k+1} = f(x_k, u_k) + w_k,  (11a)
y_k = g(x_k, u_k) + \varepsilon_k,  (11b)

where the state vector xk , k = 0, 1, 2, . . . , N contains the current state of the system,
u1 , u2 , . . . is the input sequence and y1 , y2 , . . . is the output sequence. In the model, wk
is a Gaussian distributed process noise. The aim is now to learn the functions f (xk , uk )
and g(xk , uk ), which are modeled as Gaussian processes, given the input and output
sequences, or in some cases, also given direct observations of the state vector.

4.3.2 Learning with fully observed state


When the state vector xk is fully observed, then both the dynamic model (11a) and
measurement model (11b) become standard Gaussian process regression models. In


Figure 5: In the GPSS model we learn Gaussian process regressors for approximating the dynamic
and measurement models in a state-space model. In this figure, q^{-1} denotes a one-step delay
operator.

dynamic model (11a), the training set consists of measurements xk+1 with the cor-
responding inputs (xk , uk ), and in measurement model (11b) the measurements are
yk with the corresponding inputs (xk , uk ). These kinds of fully observed models are
important in many applications such as robotics (Deisenroth et al, 2015).
After conditioning on the training data, the functions f and g will still be Gaussian
processes and their mean and covariance functions are given by (multivariate general-
izations of) Equations (4). State estimation in these kinds of models has been considered
by Ko and Fox (2009) and Deisenroth et al (2011), and it turns out that it is possible
to construct closed-form Gaussian approximation (moment matching) based filters and
smoothers for these models (Deisenroth et al, 2011). Control problems related to these
kinds of models have been considered, for example, by Deisenroth et al (2015).
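
In the fully observed case, identification therefore splits into two ordinary regression data
sets. The sketch below uses illustrative data from a hypothetical scalar model (not an
example from the cited works) and only assembles the training pairs for the dynamic
model (11a) and the measurement model (11b); each pair can then be passed to any GP
regression routine, such as the gp_regression sketch given earlier.

```python
import numpy as np

# Illustrative fully observed data from a hypothetical scalar model:
#   x_{k+1} = 0.9 x_k + tanh(u_k) + w_k,    y_k = x_k^2 + e_k.
rng = np.random.default_rng(0)
N = 200
u = rng.standard_normal(N)
x = np.zeros(N + 1)
for k in range(N):
    x[k + 1] = 0.9 * x[k] + np.tanh(u[k]) + 0.05 * rng.standard_normal()
y = x[:N] ** 2 + 0.1 * rng.standard_normal(N)

# Dynamic model (11a): GP regression inputs (x_k, u_k), targets x_{k+1}.
Z_dyn = np.column_stack([x[:N], u])
t_dyn = x[1:N + 1]

# Measurement model (11b): GP regression inputs (x_k, u_k), targets y_k.
Z_meas = np.column_stack([x[:N], u])
t_meas = y

# Each (Z, t) pair is a standard GP regression training set and can be
# handled, e.g., with the gp_regression sketch above or any GP library.
```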

4.3.3 Marginalization of the GP


When the states xk are not observed, then we need to treat both the states and the Gaus-
sian processes as unknown. There are a few different ways to cope with the model in
that case, and one approach is the marginalization approach of Frigola et al (2013,
2014b,a) and Frigola (2016). First note that if we have a method to learn f , we can
learn both f and g using a state-augmentation trick (Frigola, 2016): we define an aug-
mented state as x̃ = (x, γ)^T, a Gaussian process h(x̃, u) = (f(x, u), g(x, u))^T, and an
augmented process noise w̃ = (w_k, 0)^T, which reduces the model to

\tilde{x}_{k+1} = h(\tilde{x}_k, u_k) + \tilde{w}_k,  (12a)
y_k = \gamma_k + \varepsilon_k.  (12b)

This is now a model with an unknown dynamic model, but with a given linear Gaussian
measurement model p(y_k | x̃_k, u_k) = N(y_k | γ_k, σ_n^2).
Thus, without a loss of generality, we can focus on models with an unknown dy-
namic model f and a known measurement model:

x_{k+1} = f(x_k, u_k) + w_k,  (13a)
y_k \sim p(y_k \mid x_k, u_k).  (13b)

The aim is to learn the function f(x, u) from a sequence of measurement data y_k.
One way to understand this model (Frigola, 2016) is that we could hypothetically
generate data from it by sampling the (infinite-dimensional) function f (x, u), and then
starting from x0 sequentially produce each {f1 , x1 , . . . , fN , xN }, where we have de-
noted fk = f (xk−1 , uk−1 ) for k = 1, 2, . . . , N . Each of the conditional distributions

p(f_k \mid f_{1:k-1}, x_{0:k-1})  (14)

turns out to be Gaussian. Note that above, we have introduced the short-hand notation
f1:k = (f1 , . . . , fk ) which we will use also in the rest of this article.
The above observation allows us to integrate out (i.e., marginalize) the Gaussian
process from the model in closed form. The result is the following representation
(Frigola et al, 2013):
p(x_{0:N}) = \prod_{k=1}^{N} \mathcal{N}\left( x_k \mid \mu_k(x_{0:k-1}), \Sigma_k(x_{0:k-1}) \right),  (15)

where the means µk (x0:k−1 ) and covariances Σk (x0:k−1 ) are (quite complicated)
functions of the prior mean and covariance functions of the Gaussian process f , which
are evaluated on the whole previous state history. The above equation defines a non-
Markovian prior model for the state sequence xk , k = 1, . . . , N .
For a given x0:N , the distribution p(f (x∗ ) | x∗ , x0:N ) for a test point x∗ can be
computed by using the conventional Gaussian process prediction equations as follows:

p(f(x^*) \mid x^*, y_{1:N}) = \int p(f(x^*) \mid x^*, x_{0:N})\, p(x_{0:N} \mid y_{1:N})\, dx_{0:N},  (16)

which can be numerically approximated, provided that we use a convenient (e.g. Monte
Carlo) approximation for p(x0:N | y1:N ).
Given the model (15) it is then possible to use, for example, particle Markov chain
Monte Carlo methods (Frigola et al, 2013) to sample state trajectories from the poste-
rior distribution p(x0:N | y1:N ) jointly with the parameters of the model, which pro-
vides a Monte Carlo approximation to the above integral. Other proposed approaches
are, for example, particle stochastic approximation expectation–maximization (EM,
Frigola et al, 2014b) which uses a Monte Carlo approximation to the EM algorithm
aiming at computing the maximum likelihood estimates of the parameters while han-
dling the states as missing data.

4.3.4 Approximation of the GP


Another way to approach the problem where both the states and Gaussian processes
are unknown is to approximate the Gaussian process as a finite-dimensional parametric
model and apply conventional parameter estimation methods to it.

One possible approximation considered in Svensson et al (2016) is to employ a
Karhunen–Loève type of basis function expansion of the Gaussian process as follows:
f(x, u) = \sum_{i=1}^{S} c_i\, \phi_i(x, u),  (17)

where φ_i(x, u) are deterministic basis functions (e.g. sinusoids) and c_i are Gaussian
random variables. With this approximation, the model in Equation (13) becomes
x_{k+1} = \sum_{i=1}^{S} c_i\, \phi_i(x_k, u_k) + w_k,  (18a)
y_k \sim p(y_k \mid x_k, u_k),  (18b)
where learning of the Gaussian process f reduces to estimation of the finite number of
parameters c = (c_1, . . . , c_S)^T in the state-space model. The states and parameters
in this model can now be determined using, for example, particle Markov chain Monte
Carlo (PMCMC) methods (see Svensson et al, 2016).
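
As a rough illustration of the basis function idea in Equations (17)-(18), the sketch below
uses random Fourier-type features as the deterministic basis φ_i and a Gaussian draw for
the weights c_i; these particular choices, as well as the scalar state and input, are illustrative
assumptions of this sketch and differ from the basis construction used by Svensson et al (2016).

```python
import numpy as np

def make_basis(S, input_dim, lengthscale=1.0, seed=0):
    """Random Fourier-type basis functions phi_i(x, u), an illustrative stand-in
    for the deterministic basis of Equation (17)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((S, input_dim)) / lengthscale   # random frequencies
    b = rng.uniform(0, 2 * np.pi, S)                        # random phases

    def phi(xu):
        # xu: array (input_dim,) containing (x_k, u_k); returns S feature values.
        return np.sqrt(2.0 / S) * np.cos(W @ xu + b)

    return phi

# Approximate dynamic model of Equation (18a) for a scalar state and input.
S = 50
phi = make_basis(S, input_dim=2)
rng = np.random.default_rng(1)
c = rng.standard_normal(S)            # Gaussian weights c_i (here: a prior draw)

def f_approx(x, u):
    """f(x, u) = sum_i c_i * phi_i(x, u)."""
    return c @ phi(np.array([x, u]))

# One step of the approximate state-space model (18a), with process noise w_k.
x_k, u_k = 0.3, -0.5
x_next = f_approx(x_k, u_k) + 0.05 * rng.standard_normal()
```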
Another possibility is to use inducing points (typically denoted with u, but here
we denote them with f_u to avoid confusion with the input sequence). In those ap-
proaches the idea is to first perform Gaussian process inference on the inducing points
alone, that is, compute p(f_u | y_{1:N}), and then compute the (approximate) predictions
by conditioning on the inducing points instead of the original data. This leads to the
approximation
p(f(x^*) \mid y_{1:N}) = \int p(f(x^*) \mid x^*, x_{0:N})\, p(x_{0:N} \mid y_{1:N})\, dx_{0:N}
                 \approx \int p(f(x^*) \mid x^*, f_u)\, p(f_u \mid y_{1:N})\, df_u.  (19)

In Turner et al (2010) the inducing points are learned using the expectation–
maximization (EM) algorithm. Frigola et al (2014a) propose a method for variational
learning of (or integration over) the inducing points by forming a variational Bayesian
approximation q(f_u) ≈ p(f_u | y_{1:N}), which further results in
p(f(x^*) \mid y_{1:N}) \approx \int p(f(x^*) \mid x^*, f_u)\, q(f_u)\, df_u,  (20)

and turns out to be analytically tractable as the optimal variational distribution q(f_u) is
Gaussian.

4.4 Spatio-temporal Gaussian process models


4.4.1 Temporal Gaussian processes
Another way of modeling time series using Gaussian processes is by considering them
as functions of time (Hartikainen and Särkkä, 2010) which are sampled at certain time
instants t1 , t2 , . . .:
f(t) \sim \mathcal{GP}(m(t), k(t, t')),
y_k = f(t_k) + \varepsilon_k.  (21)

That is, instead of attempting to form a predictor from the previous measurements or
inputs, the idea is to condition the temporal Gaussian process on its observed values
and use the conditional Gaussian process for predicting values at new time points.
Unfortunately, due to the cubic computational scaling of the Gaussian process re-
gression, this quickly becomes intractable when the time series length increases. However,
temporal Gaussian process regression is closely related to the classical Kalman filtering
and Rauch-Tung-Striebel smoothing problems (e.g. Särkkä, 2013), which can be used
to reduce the required computations. It turns out that provided that the Gaussian pro-
cess is stationary, that is, the covariance function only depends on the time difference
k(t, t') = k(t − t'), then, under certain restrictions, the Gaussian process regression
problem is essentially equivalent to state-estimation in a model of the form
\frac{dx(t)}{dt} = A\, x(t) + B\, \eta(t),
y_k = C\, x(t_k) + \varepsilon_k,  (22)

where η(t) is a white noise process and the matrices A, B, and C are selected suitably
to match the original covariance function. For example, the Matérn covariance func-
tions with half-integer smoothness parameters have exact representations as state-space
models (Hartikainen and Särkkä, 2010).
The Gaussian process regression problem can now be solved by applying a Kalman
filter and Rauch-Tung-Striebel smoother to this model. These methods have the
fortunate property that their complexity is linear O(N) with respect to the number
of measurements N, as opposed to the cubic complexity of the direct Gaussian process
regression solution.
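
A small sketch of this state-space route is given below for the exponential (Matérn class
with smoothness 1/2) covariance function k(t, t') = s^2 exp(−|t − t'|/ℓ), whose state-space
representation in the sense of Equation (22) is the scalar Ornstein-Uhlenbeck model; the
plain-NumPy filter, the omission of the smoother pass, and the data are illustrative choices
of this sketch.

```python
import numpy as np

def kalman_filter_ou(t, y, s=1.0, ell=1.0, sigma_n=0.1):
    """Kalman filter for a temporal GP with covariance s^2 exp(-|t - t'|/ell).

    This covariance corresponds to the scalar Ornstein-Uhlenbeck model
    dx/dt = -x/ell + eta(t) of Equation (22), measured as y_k = x(t_k) + eps_k.
    Complexity is O(N) in the number of measurements.
    """
    m, P = 0.0, s**2                      # stationary prior mean and variance
    means, variances = [], []
    t_prev = t[0]
    for tk, yk in zip(t, y):
        # Prediction step: exact discretization of the OU dynamics over dt.
        dt = tk - t_prev
        a = np.exp(-dt / ell)
        q = s**2 * (1.0 - np.exp(-2.0 * dt / ell))
        m, P = a * m, a**2 * P + q
        # Update step with measurement noise variance sigma_n^2.
        S = P + sigma_n**2
        K = P / S
        m = m + K * (yk - m)
        P = (1.0 - K) * P
        means.append(m)
        variances.append(P)
        t_prev = tk
    return np.array(means), np.array(variances)

# Illustrative use on irregularly sampled noisy data.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 100))
y = np.sin(t) + 0.1 * rng.standard_normal(100)
m_filt, P_filt = kalman_filter_ou(t, y, s=1.0, ell=1.0, sigma_n=0.1)
```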

4.4.2 Spatio-temporal Gaussian processes


A similar state-space approach also works for spatio-temporal Gaussian process models
with a covariance function of a stationary form k(z, t; z', t') = k(z, z'; t − t'). In that
case, the state-estimation problem becomes infinite-dimensional, that is, a distributed
parameter system
\frac{dx(z, t)}{dt} = A\, x(z, t) + B\, \eta(z, t),
y_k = C\, x(z, t_k) + \varepsilon_k,  (23)

where A is a matrix of operators and C is a matrix of functionals (Särkkä et al, 2013).


The solution of the Gaussian process regression problem in this form requires the use
of methods from partial differential equations, but in many cases we can obtain an exact
O(N ) inference procedure from this route.

4.4.3 Latent force models


In so-called latent force models (Álvarez et al, 2013) the idea is to infer a latent force
ξ(t) in a differential equation model such as

\frac{d^2 x(t)}{dt^2} + \gamma \frac{dx(t)}{dt} + \nu^2 x(t) = \xi(t),  (24)
where ξ(t) is an unknown function which is modeled as a Gaussian process. The in-
ference in this model can be recast as Gaussian process regression with a modified

covariance function. The idea can be further generalized to partial differential equation
models and non-linear models when approximation methods such as the Laplace approxi-
mation are used.
The inference in these models can be further re-stated in state-space form
(Särkkä et al, 2019), which also allows for the study of control problems on latent force
models. This formulation also allows the analysis of observability and controllability
properties of latent force models.
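
To make the state-space reformulation concrete, the sketch below augments the model of
Equation (24) with the latent force as an extra state component, modeling ξ(t), purely for
illustration, as an Ornstein-Uhlenbeck (exponential covariance) Gaussian process; the force
prior, the parameter values, and the discretization are assumptions of this sketch and not
the construction of Särkkä et al (2019).

```python
import numpy as np
from scipy.linalg import expm

def latent_force_ss(gamma, nu, ell, s, dt):
    """Augmented state-space model for Equation (24) with an OU latent force.

    State: (x, dx/dt, xi). Returns the discrete-time transition matrix A_d,
    process noise covariance Q_d for a step of length dt, and the measurement
    matrix H picking out x(t_k).
    """
    # Continuous-time dynamics: dx/dt = x', dx'/dt = -nu^2 x - gamma x' + xi,
    # dxi/dt = -xi/ell + white noise with spectral density q = 2 s^2 / ell.
    F = np.array([[0.0, 1.0, 0.0],
                  [-nu**2, -gamma, 1.0],
                  [0.0, 0.0, -1.0 / ell]])
    Qc = np.zeros((3, 3))
    Qc[2, 2] = 2.0 * s**2 / ell
    # Discretize (A_d, Q_d) with the matrix-fraction (Van Loan) construction.
    M = np.block([[F, Qc], [np.zeros((3, 3)), -F.T]]) * dt
    Phi = expm(M)
    A_d = Phi[:3, :3]
    Q_d = Phi[:3, 3:] @ A_d.T
    H = np.array([[1.0, 0.0, 0.0]])     # we measure x(t_k) plus noise
    return A_d, Q_d, H

# With (A_d, Q_d, H) in hand, a standard vector-valued Kalman filter and
# smoother, analogous to the scalar sketch above, infers both x(t) and the
# latent force xi(t) from noisy observations of x.
A_d, Q_d, H = latent_force_ss(gamma=0.5, nu=2.0, ell=1.0, s=1.0, dt=0.1)
```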

5 Summary and Future Directions


In this article, we have briefly outlined the main directions in system identification us-
ing Gaussian processes. The methods can be divided into three classes: (1) GP-NFIR,
GP-NARX, and GP-NOE type of models which directly aim at learning the function
from the previous inputs and outputs to the current output, (2) Gaussian process state-
space models which aim to learn the dynamic and measurement models in the state-
space model, and (3) Gaussian process regression methods for spatio-temporal time
series by direct or state-space Gaussian process regression.
Active research problems in this area appear to be, for example, joint learning
and control in all the model types outlined here. Another direction of further study is
the analysis of observability, identifiability, and controllability. In practice, the Gaus-
sian process-based system identification methods are very similar to classical methods,
and hence they can be expected to inherit many limitations and theoretical properties
of the classical methods.
An emerging research area in Gaussian process-based models is so-called deep
Gaussian processes (Damianou and Lawrence, 2013), which borrow the idea of deep
neural networks by forming hierarchies of Gaussian processes. These kinds of models
could also turn out to be useful in system identification. Furthermore, Gaussian pro-
cesses are easily combined with first-principles models (cf. the latent force models
described above), which allows for flexible gray-box modeling using Gaussian pro-
cesses.
One of the main obstacles in Gaussian process regression is still the computational
scaling in the number of measurements, which is also inherited by all the new develop-
ments. Although several good approaches to tackle this problem have been proposed,
the problem remains that they inherently replace the original model with an approxima-
tion. New and better approaches to this problem are likely to appear in the near future.

6 Cross References
References
Ackermann ER, De Villiers JP, Cilliers P (2011) Nonlinear dynamic systems modeling
using Gaussian processes: Predicting ionospheric total electron content over South
Africa. Journal of Geophysical Research: Space Physics 116(10)

Álvarez MA, Luengo D, Lawrence ND (2013) Linear latent force models using Gaus-
sian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence
35(11):2693–2705

Brooks S, Gelman A, Jones GL, Meng XL (2011) Handbook of Markov Chain Monte
Carlo. Chapman & Hall/CRC, Boca Raton, FL

Cressie NAC (1993) Statistics for Spatial Data. Wiley


Damianou AC, Lawrence ND (2013) Deep Gaussian processes. In: International Con-
ference on Artificial Intelligence and Statistics (AISTATS), pp 207–215
Deisenroth MP, Turner RD, Huber MF, Hanebeck UD, Rasmussen CE (2011) Robust
filtering and smoothing with Gaussian processes. IEEE Transactions on Automatic
Control 57(7):1865–1871
Deisenroth MP, Fox D, Rasmussen CE (2015) Gaussian processes for data-efficient
learning in robotics and control. IEEE Transactions on Pattern Analysis and Machine
Intelligence 37(2):408–423

Frigola R (2016) Bayesian time series learning with Gaussian processes. PhD thesis,
University of Cambridge
Frigola R, Lindsten F, Schön TB, Rasmussen CE (2013) Bayesian inference and learn-
ing in Gaussian process state-space models with particle MCMC. In: Advances in
Neural Information Processing Systems, pp 3156–3164

Frigola R, Chen Y, Rasmussen CE (2014a) Variational Gaussian process state-space


models. In: Advances in Neural Information Processing Systems, pp 3680–3688
Frigola R, Lindsten F, Schön TB, Rasmussen CE (2014b) Identification of Gaussian
process state-space models with particle stochastic approximation EM. IFAC Pro-
ceedings Volumes, Proceedings of the 19th IFAC World Congress 47(3):4097–4102

Hartikainen J, Särkkä S (2010) Kalman filtering and smoothing solutions to temporal


Gaussian process regression models. In: IEEE International Workshop on Machine
Learning for Signal Processing (MLSP), pp 379–384
Ko J, Fox D (2009) GP-BayesFilters: Bayesian filtering using Gaussian process pre-
diction and observation models. Autonomous Robots 27(1):75–90

Kocijan J (2016) Modelling and control of dynamic systems using Gaussian process
models. Springer
Kocijan J, Petelin D (2011) Output-error model training for Gaussian process mod-
els. In: International Conference on Adaptive and Natural Computing Algorithms,
Springer, pp 312–321
Kocijan J, Girard A, Banko B, Murray-Smith R (2005) Dynamic systems identifica-
tion with Gaussian processes. Mathematical and Computer Modelling of Dynamical
Systems 11(4):411–424
Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields
and Gaussian Markov random fields: the stochastic partial differential equation ap-
proach. Journal of the Royal Statistical Society: Series B 73(4):423–498
Matérn B (1960) Spatial variation. Tech. rep., Meddelanden från Statens Skogforskn-
ingsinstitut, band 49 - Nr 5

McHutchon AJ (2015) Nonlinear modelling and control using Gaussian processes. PhD
thesis, University of Cambridge

Quiñonero-Candela J, Rasmussen CE (2005) A unifying view of sparse approximate


Gaussian process regression. Journal of Machine Learning Research 6:1939–1959
Quiñonero-Candela J, Rasmussen CE, Figueiras-Vidal AR, et al (2010) Sparse
spectrum Gaussian process regression. Journal of Machine Learning Research
11(Jun):1865–1881

Rasmussen CE, Williams CK (2006) Gaussian Processes for Machine Learning. MIT
Press, Cambridge, MA
Särkkä S (2013) Bayesian Filtering and Smoothing. Cambridge University Press
Särkkä S, Solin A, Hartikainen J (2013) Spatiotemporal learning via infinite-
dimensional Bayesian filtering and smoothing. IEEE Signal Processing Magazine
30(4):51–61
Särkkä S, Álvarez MA, Lawrence ND (2019) Gaussian process latent force models for
learning and stochastic control of physical systems. IEEE Transactions on Automatic
Control (to appear)

Solin A, Särkkä S (2018) Hilbert space methods for reduced-rank Gaussian process
regression. ArXiv:1401.5508
Svensson A, Solin A, Särkkä S, Schön T (2016) Computationally efficient Bayesian
learning of Gaussian process state space models. In: Artificial Intelligence and
Statistics, pp 213–221

Titsias M (2009) Variational learning of inducing variables in sparse Gaussian pro-


cesses. In: Artificial Intelligence and Statistics, pp 567–574
Turner R, Deisenroth M, Rasmussen C (2010) State-space inference and learning with
Gaussian processes. In: Proceedings of the Thirteenth International Conference on
Artificial Intelligence and Statistics, pp 868–875

