Lecture 7: Introduction to Kriging
Looking at a Nonlinear Regression Problem

[Figure: training data points sampled from an unknown nonlinear function; the true function is unknown.]

Can we create this regressor based entirely on Gaussians?
Gaussian Basics

Example: [X1; X2] ∼ N( [0; 0], [1 0.8; 0.8 1] )
Covariance: σ12 = ρ σ1 σ2
Gaussian Basics

Example: [X1; X2] ∼ N( [0; 0], [1 0.8; 0.8 1] )

Bivariate Gaussian: X = [X1; X2] ∼ N(µ, Σ), where Σ is the covariance matrix:
X ∼ N( [µ1; µ2], [σ1² σ12; σ21 σ2²] )
Covariance: σ12 = ρ σ1 σ2
Correlation coefficient: ρ = Cov(X1, X2) / (σ1 σ2) = σ12 / (σ1 σ2)
Gaussian Basics

[Figures: contour and scatter plots of [X1; X2] ∼ N( [0; 0], [1 ρ; ρ 1] ) for ρ = 0.8, 0.5, 0.3, and 0, with X1 and X2 plotted from −4 to 4. As ρ decreases the elliptical contours become less elongated; at ρ = 0 the components are independent and the contours are circular.]
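As a quick illustration of how the off-diagonal entry controls these plots, here is a small NumPy sketch (not from the slides; variable names are mine) that draws samples from each of the bivariate Gaussians above and reports the empirical correlation:

import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)
for rho in [0.8, 0.5, 0.3, 0.0]:
    Sigma = np.array([[1.0, rho],
                      [rho, 1.0]])                       # covariance matrix from the slides
    samples = rng.multivariate_normal(mu, Sigma, size=5000)
    r = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]  # empirical correlation coefficient
    print(f"rho = {rho:.1f}, empirical correlation = {r:.2f}")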
Looking at a Joint Gaussian PDF in 3D

[Figure: 3D surface of the joint PDF of [X1; X2] ∼ N( [0; 0], [1 0.8; 0.8 1] ), sliced at the observation X1 = x1.]

Conditional PDF:
f_X2|X1(x2 | x1) = f_X1X2(x1, x2) / f_X1(x1)
Looking at a Conditional Gaussian PDF

f(x2 | X1 = x1) = N( µ2|1, σ²2|1 )
µ2|1 = µ2 + (ρ σ1 σ2 / σ1²)(x1 − µ1)
σ²2|1 = σ2² − (ρ σ1 σ2)² / σ1²

[Figure: conditioning a 2D Gaussian.]

Group Discussion (4 min): posterior mean µ2|1 = ? posterior std σ2|1 = ?
Looking at a Conditional Gaussian PDF

f(x2 | X1 = x1) = N( µ2|1, σ²2|1 )
µ2|1 = µ2 + (ρ σ1 σ2 / σ1²)(x1 − µ1)  →  posterior mean µ2|1 = 0.8
σ²2|1 = σ2² − (ρ σ1 σ2)² / σ1²        →  posterior std σ2|1 = 0.6
Looking at a Conditional Gaussian PDF

Prior:     mean µ2 = 0,     std σ2 = 1
Posterior: mean µ2|1 = 0.8, std σ2|1 = 0.6
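A minimal numerical check of the two conditioning formulas for this example, assuming the observed value is x1 = 1 (the value that makes µ2|1 = 0.8 under ρ = 0.8); this is an illustrative sketch, not part of the lecture:

import numpy as np

mu1, mu2 = 0.0, 0.0
sigma1, sigma2, rho = 1.0, 1.0, 0.8
x1 = 1.0                                   # assumed observed value of X1

mu_2_given_1 = mu2 + rho * sigma1 * sigma2 / sigma1**2 * (x1 - mu1)
var_2_given_1 = sigma2**2 - (rho * sigma1 * sigma2)**2 / sigma1**2

print(mu_2_given_1)               # 0.8  (posterior mean)
print(np.sqrt(var_2_given_1))     # 0.6  (posterior std)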
Looking Back at 2D Gaussian Contour

[X1; X2] ∼ N( [0; 0], [1 0.8; 0.8 1] )

What is f(x2 | X1 = x1), conditional on the observation X1 = x1?
Looking Back at 2D Gaussian Contour

[X1; X2] ∼ N( [0; 0], [1 0.8; 0.8 1] ),  observation X1 = x1

f(x2 | X1 = x1) ∼ N( µ2|1, σ²2|1 )
µ2|1 = µ2 + (ρ σ1 σ2 / σ1²)(x1 − µ1) = ρ x1 = 0.8
σ²2|1 = σ2² − (ρ σ1 σ2)² / σ1² = 1 − ρ² = 0.36,  so σ2|1 = 0.6
Looking Back at 2D Gaussian Contour

[X1; X2] ∼ N( [0; 0], [1 0; 0 1] ),  observation X1 = x1

f(x2 | X1 = x1) ∼ N( µ2|1, σ²2|1 )
µ2|1 = µ2 + (ρ σ1 σ2 / σ1²)(x1 − µ1) = ρ x1 = 0
σ²2|1 = σ2² − (ρ σ1 σ2)² / σ1² = 1 − ρ² = 1
With ρ = 0, observing X1 does not change the distribution of X2.
Marginals and Conditionals of a Multivariate Normal

Suppose X = (X1; X2) is jointly Gaussian with parameters
µ = [µ1; µ2],  Σ = [Σ11 Σ12; Σ21 Σ22]

Then the marginals are given by
f(x1) = N( µ1, Σ11 )
f(x2) = N( µ2, Σ22 )

and the posterior conditional is given by
f(x2 | X1 = x1) = N( µ2|1, Σ2|1 )
µ2|1 = µ2 + Σ21 Σ11⁻¹ (x1 − µ1)
Σ2|1 = Σ22 − Σ21 Σ11⁻¹ Σ12

Murphy, 2022, "Probabilistic Machine Learning: An Introduction," Section 3.2.3, p. 82.
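These block formulas translate directly into a small routine. The sketch below is mine (NumPy; function and variable names are assumptions, not from Murphy) and reproduces the 2D example from the previous slides:

import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S21, S22, x1):
    """Posterior N(mu_2|1, Sigma_2|1) of X2 given X1 = x1."""
    solve = np.linalg.solve                      # avoids forming S11^{-1} explicitly
    mu_cond = mu2 + S21 @ solve(S11, x1 - mu1)   # mu_2|1
    Sigma_cond = S22 - S21 @ solve(S11, S12)     # Sigma_2|1
    return mu_cond, Sigma_cond

# 2D example: rho = 0.8, observe X1 = 1
m, S = condition_gaussian(np.array([0.0]), np.array([0.0]),
                          np.array([[1.0]]), np.array([[0.8]]),
                          np.array([[0.8]]), np.array([[1.0]]),
                          np.array([1.0]))
print(m, S)   # [0.8] [[0.36]]  ->  posterior std 0.6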
Looking Back at the Nonlinear Regression Problem

Given training points X = (x1, …, x6)ᵀ, we want to model y = (y1, …, y6)ᵀ.

[Figure: the six training points (x1, y1), …, (x6, y6), with a question mark at an unobserved location.]
Looking Back at the Nonlinear Regression Problem

Given training points X = (x1, …, x6)ᵀ, we want to model y = (y1, …, y6)ᵀ. Treat the training outputs as one jointly Gaussian vector:

y = (y1, …, y6)ᵀ ∼ N( 0, K_X,X )

where K_X,X, the covariance matrix of the training points, is

  1.00 0.12 0.05 0.00 0.00 0.00
  0.12 1.00 0.92 0.46 0.06 0.00
  0.05 0.92 1.00 0.71 0.15 0.00
  0.00 0.46 0.71 1.00 0.54 0.00
  0.00 0.06 0.15 0.54 1.00 0.07
  0.00 0.00 0.00 0.00 0.07 1.00
Looking Back at the Nonlinear Regression Problem

Gaussian process prior (assume no noise)
• Mean function: m(x). This example uses a zero-mean prior, m(x) ≡ 0.
• Covariance function: k(x, x′).
  Note: k(x, x′) denotes the covariance between the outputs y and y′, NOT the covariance between the inputs x and x′.

For example, the pair (y2, y3) in the matrix above is jointly distributed as
[y2; y3] ∼ N( [0; 0], [1.00 0.92; 0.92 1.00] )

This example uses the squared exponential kernel (a.k.a. radial basis function kernel, Gaussian kernel):
k(x, x′) = σ² exp( −(x − x′)² / (2 l²) )
Each entry of the prior covariance matrix above is obtained by evaluating this kernel at a pair of training inputs.
Looking Back at the Nonlinear Regression Problem

Given training points X = (x1, …, x6)ᵀ, we want to model y = (y1, …, y6)ᵀ.

Squared exponential kernel (a.k.a. radial basis function kernel, Gaussian kernel):
K_ij = σ² exp( −(x_i − x_j)² / (2 l²) )
where σ² > 0 is the signal variance and l > 0 is the length scale.

With σ² = 1 and l = 1, this kernel yields the 6 × 6 training covariance matrix K_X,X shown above, and the prior on the training outputs is
y ∼ N( 0, K_X,X )
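In code, the training covariance matrix is just this kernel evaluated at every pair of training inputs. A sketch in NumPy; the six x-values below are made up for illustration (the actual training inputs are the ones in the figure):

import numpy as np

def sq_exp_kernel(xa, xb, sigma2=1.0, length=1.0):
    """Squared exponential kernel: sigma2 * exp(-(xa_i - xb_j)^2 / (2 l^2))."""
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / length**2)

X = np.array([0.0, 2.5, 3.0, 4.0, 6.0, 10.0])   # hypothetical training inputs
K_XX = sq_exp_kernel(X, X)                       # 6 x 6, unit diagonal since sigma2 = 1
print(np.round(K_XX, 2))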
First, Looking at a New Joint Prior

Given a training dataset D = {(xi, yi) | i = 1, …, 6}, where yi = g(xi), and a test point x*, we want to predict y* = g(x*).

Stack the training outputs y and the test output y* into a single joint Gaussian prior:

[y; y*] ∼ N( 0, [K_X,X  K_X,*;  K_*,X  K_*,*] )

where
• K_X,X: covariance matrix of the training points
• K_X,*: covariances between the training points and the test point
• K_*,*: variance of the test point

Numerically:
  1.00 0.12 0.05 0.00 0.00 0.00 | 0.65
  0.12 1.00 0.92 0.46 0.06 0.00 | 0.53
  0.05 0.92 1.00 0.71 0.15 0.00 | 0.30
  0.00 0.46 0.71 1.00 0.54 0.00 | 0.06
  0.00 0.06 0.15 0.54 1.00 0.07 | 0.00
  0.00 0.00 0.00 0.00 0.07 1.00 | 0.00
  ------------------------------+-----
  0.65 0.53 0.30 0.06 0.00 0.00 | 1.00
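Continuing the same sketch (made-up training inputs, hypothetical test location x_star), the joint prior simply appends one row and one column to the training covariance matrix:

import numpy as np

def sq_exp_kernel(xa, xb, sigma2=1.0, length=1.0):
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / length**2)

X = np.array([0.0, 2.5, 3.0, 4.0, 6.0, 10.0])   # hypothetical training inputs
x_star = np.array([1.0])                         # hypothetical test input

K_XX = sq_exp_kernel(X, X)                       # 6 x 6  covariance of training outputs
K_Xs = sq_exp_kernel(X, x_star)                  # 6 x 1  covariances training vs. test
K_ss = sq_exp_kernel(x_star, x_star)             # 1 x 1  prior variance at the test point

K_joint = np.block([[K_XX,   K_Xs],
                    [K_Xs.T, K_ss]])             # 7 x 7 joint prior covariance of (y, y*)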
Then, Looking at the Posterior

Given the training dataset D = {(xi, yi) | i = 1, …, 6}, where yi = g(xi), and a test point x*, we want to predict y* = g(x*).

[Figure: candidate values of y* at x* under the joint prior above. Are these y* values equally likely?]
Marginals and Conditionals of a Multivariate Normal

Recall (Murphy, 2022, "Probabilistic Machine Learning: An Introduction," Section 3.2.3, p. 82): if X = (X1; X2) is jointly Gaussian with
µ = [µ1; µ2],  Σ = [Σ11 Σ12; Σ21 Σ22]
then the marginals are f(x1) = N(µ1, Σ11) and f(x2) = N(µ2, Σ22), and the posterior conditional is
f(x2 | X1 = x1) = N( µ2|1, Σ2|1 )
µ2|1 = µ2 + Σ21 Σ11⁻¹ (x1 − µ1)
Σ2|1 = Σ22 − Σ21 Σ11⁻¹ Σ12

Customize for the regression problem:
  x1 → y        µ1 → m(X)      Σ11 → K_X,X
  x2 → y*       µ2 → m(x*)     Σ21 → K_*,X
                               Σ22 → K_*,*
Making Prediction at a Test Point

Given a training dataset D = {(xi, yi) | i = 1, …, 6}, where yi = g(xi), and a test point x*, we want to predict y* = g(x*). Applying the conditioning formula to the joint prior gives

f(y* | x*, D) = N( µ*, σ*² )
µ* = m(x*) + K_X,*ᵀ K_X,X⁻¹ ( y − m(X) )      [dimensions: 1×1 + (1×6)(6×6)(6×1)]
σ*² = K_*,* − K_X,*ᵀ K_X,X⁻¹ K_X,*

Here the mean function is assumed to be zero (a zero-mean Gaussian prior), although this is not necessary.

[Figure: posterior mean µ* and ±3σ* interval at the test point x*.]
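A sketch of these two prediction formulas in NumPy, under the same assumptions as before (zero prior mean, no noise, made-up training data):

import numpy as np

def sq_exp_kernel(xa, xb, sigma2=1.0, length=1.0):
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / length**2)

X = np.array([0.0, 2.5, 3.0, 4.0, 6.0, 10.0])    # hypothetical training inputs
y = np.sin(X)                                     # hypothetical training outputs
x_star = np.array([1.0])                          # hypothetical test input

K_XX = sq_exp_kernel(X, X)
K_Xs = sq_exp_kernel(X, x_star)
K_ss = sq_exp_kernel(x_star, x_star)

alpha = np.linalg.solve(K_XX, y)                         # K_X,X^{-1} y (zero prior mean)
mu_star = K_Xs.T @ alpha                                 # posterior mean mu_*
var_star = K_ss - K_Xs.T @ np.linalg.solve(K_XX, K_Xs)   # posterior variance sigma_*^2
print(mu_star, np.sqrt(var_star))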
Making Predictions at a Second and a Third Test Point

The same formulas apply at any other test point x*:
f(y* | x*, D) = N( µ*, σ*² )
µ* = m(x*) + K_X,*ᵀ K_X,X⁻¹ ( y − m(X) )
σ*² = K_*,* − K_X,*ᵀ K_X,X⁻¹ K_X,*

[Figures: posterior mean µ* and ±3σ* intervals at a second and a third test point.]
Making Predictions at Many Test Points

Repeating the prediction at many test points traces out the posterior mean curve and its uncertainty band:
f(y* | x*, D) = N( µ*, σ*² )
µ* = m(x*) + K_X,*ᵀ K_X,X⁻¹ ( y − m(X) )
σ*² = K_*,* − K_X,*ᵀ K_X,X⁻¹ K_X,*
Making Predictions at a Finer Grid of Test Points

[Figure: posterior mean µ* with a ±3σ* band evaluated over a fine grid of test points x*.]

We have created a regressor based entirely on Gaussians.

Adding the True Function for Comparison

[Figure: the same plot with the true function overlaid for comparison.]
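The whole curve with its ±3σ* band is the same computation vectorized over a grid of test points; only the diagonal of the posterior covariance is needed for the band. Again an illustrative sketch with made-up data:

import numpy as np

def sq_exp_kernel(xa, xb, sigma2=1.0, length=1.0):
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / length**2)

X = np.array([0.0, 2.5, 3.0, 4.0, 6.0, 10.0])    # hypothetical training inputs
y = np.sin(X)                                     # hypothetical training outputs
Xs = np.linspace(-1.0, 11.0, 200)                 # fine grid of test points

K_XX = sq_exp_kernel(X, X)
K_Xs = sq_exp_kernel(X, Xs)                       # 6 x 200
mu = K_Xs.T @ np.linalg.solve(K_XX, y)            # posterior mean on the grid
var = sq_exp_kernel(Xs, Xs).diagonal() - np.einsum('ij,ij->j', K_Xs, np.linalg.solve(K_XX, K_Xs))
var = np.clip(var, 0.0, None)                     # guard against tiny negative round-off
upper, lower = mu + 3 * np.sqrt(var), mu - 3 * np.sqrt(var)   # +/- 3 sigma_* band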
Generating Multivariate Gaussian Samples

Monte Carlo simulation: map a standard uniform sample (xi) to a standard normal random sample (ui) through the CDFs.

[Figure: CDF mapping from the standard uniform CDF F_Xi(xi) to the standard normal CDF Φ_Ui(ui), with the corresponding PDFs f_Xi(xi) and f_Ui(ui) shown below.]
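This CDF mapping is inverse-transform sampling. A short sketch using SciPy's standard normal quantile function (norm.ppf, the inverse of the standard normal CDF):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.uniform(size=100_000)     # standard uniform seeds x_i
u = norm.ppf(x)                   # map through the inverse standard normal CDF
print(u.mean(), u.std())          # approximately 0 and 1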
Generating Multivariate Gaussian Samples

Step 1: Draw samples from the standard Gaussian distribution (based on standard uniform seeds):
ui ∼ N(0, 1)
Note we can also draw samples from an arbitrary Gaussian:
xi ∼ N(µ, σ²)  ⇔  xi = µ + σ ui

Step 2: Draw samples from the multivariate standard Gaussian by repeatedly calling the standard Gaussian generator in Step 1:
[u1i; u2i] ∼ N( [0; 0], [1 0; 0 1] ) = N(0, I2)

Step 3: Perform the Cholesky decomposition of the covariance matrix K to obtain the "matrix square root" L of K (K = LᵀL), then shift and correlate the standard samples:
[x1i; x2i] ∼ N( [µ1; µ2], [σ1² σ12; σ21 σ2²] ),   xi = µ + Lᵀ ui
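A sketch of the three steps in NumPy. One convention difference to note: np.linalg.cholesky returns a lower-triangular factor C with K = C Cᵀ (the transpose of what MATLAB's chol() returns), so C plays the role of Lᵀ above and the correlated sample is formed as µ + C u:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
K = np.array([[1.0, 0.8],
              [0.8, 1.0]])                 # target covariance matrix

C = np.linalg.cholesky(K)                  # lower-triangular factor, K = C @ C.T
u = rng.standard_normal(size=(2, 5000))    # Steps 1-2: i.i.d. standard normal samples
x = mu[:, None] + C @ u                    # Step 3: correlated samples ~ N(mu, K)

print(np.cov(x))                           # close to K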
Generating Multivariate Gaussian Samples (Prior)

[Figure: sample paths drawn from the zero-mean Gaussian prior, lying mostly within the ±3σ band [−3, 3].]
Generating Multivariate Gaussian Samples (Posterior)

[Figure: sample paths drawn from the posterior, lying mostly within the ≈99.7% prediction interval [µ* − 3σ*, µ* + 3σ*].]
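Posterior sample paths are drawn the same way, but from N(µ_post, Σ_post) with the conditional mean and full conditional covariance. An illustrative sketch with the same made-up data as before; a small jitter is added before the Cholesky factorization for numerical stability:

import numpy as np

def sq_exp_kernel(xa, xb, sigma2=1.0, length=1.0):
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / length**2)

rng = np.random.default_rng(0)
X = np.array([0.0, 2.5, 3.0, 4.0, 6.0, 10.0])    # hypothetical training inputs
y = np.sin(X)                                     # hypothetical training outputs
Xs = np.linspace(-1.0, 11.0, 100)                 # grid of test points

K_XX, K_Xs, K_ss = sq_exp_kernel(X, X), sq_exp_kernel(X, Xs), sq_exp_kernel(Xs, Xs)
mu_post = K_Xs.T @ np.linalg.solve(K_XX, y)                   # posterior mean
Sigma_post = K_ss - K_Xs.T @ np.linalg.solve(K_XX, K_Xs)      # posterior covariance

C = np.linalg.cholesky(Sigma_post + 1e-8 * np.eye(len(Xs)))   # jitter for stability
paths = mu_post[:, None] + C @ rng.standard_normal(size=(len(Xs), 10))   # 10 posterior paths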
A Practical Implementation

MATLAB's built-in function chol() returns the transpose of L, i.e., Lᵀ.

Algorithm 2.1: Predictions and log marginal likelihood for Gaussian process regression.
Rasmussen and Williams, 2006, "Gaussian Processes for Machine Learning," Section 2.2, p. 19.
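For reference, a Python transcription of the Cholesky-based predictor in Rasmussen and Williams' Algorithm 2.1; the noise-free case in this lecture corresponds to sigma_n = 0, and the function and argument names are mine:

import numpy as np

def gp_predict(K_XX, K_Xs, K_ss, y, sigma_n=0.0):
    """Algorithm 2.1-style GP prediction with a zero prior mean."""
    n = len(y)
    L = np.linalg.cholesky(K_XX + sigma_n**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # (K + sigma_n^2 I)^{-1} y
    mu_star = K_Xs.T @ alpha                               # predictive mean
    v = np.linalg.solve(L, K_Xs)
    cov_star = K_ss - v.T @ v                              # predictive (co)variance
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
              - 0.5 * n * np.log(2.0 * np.pi))             # log marginal likelihood
    return mu_star, cov_star, log_ml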
Project 1: Kriging Implementation (Due March 24)

Part 1 (20%): Hand-code kriging on a 1D regression problem to replicate the plot on the right (style can differ slightly; six training points will be provided).

Part 2 (3% bonus): Produce an uncertainty calibration curve on a test dataset following the procedure in Sec. 4.1.1 of Nemani et al. 2023.

Work in groups of 2–3; submit a brief report and code.