Course PDF
Spatial Processes
practical exercises
Edzer Pebesma, Kristina Helle
Institute for Geoinformatics
University of Münster
January 13, 2009
Contents
1 Introduction to R
  1.1 S, S-Plus and R
  1.2 Downloading and installing R
  1.3 Starting R; saving or restoring an R session
7 Spatial modelling: introductory matter
  7.1 Data sets
    7.1.1 meuse
  7.2 Models: the formula interface; linear regression
  7.3 Spatial data in R: the sp package
    7.3.1 Points
    7.3.2 Grids
  7.4 Import/export: rgdal
8 Geostatistics
  8.1 Exploratory data analysis
  8.2 Simple interpolation algorithms
    8.2.1 Trend surface analysis
    8.2.2 Inverse distance interpolation
    8.2.3 Thiessen polygons
  8.3 Spatial prediction with multiple linear regression
  8.4 Spatial correlation: the h-scatterplot
  8.5 Spatial correlation: the variogram cloud
  8.6 Spatial correlation: the variogram
  8.7 Sample variogram and variogram model
  8.8 Simple and ordinary kriging
    8.8.1 Weights
    8.8.2 Influence of Variogram on Prediction
  8.9 Universal kriging
  8.10 Regression coefficients
  8.11 Block kriging
  8.12 Cokriging
  8.13 Cokriging: the undersampled case
  8.14 Kriging errors and confidence intervals
  8.15 Conditional simulation
  8.16 Cross validation
1 Introduction to R
1.1 S, S-Plus and R
S is a language for data analysis and statistical computing. It currently has
two implementations: one commercial, called S-Plus, and one free, open source
implementation called R. Both have advantages and disadvantages.
S-Plus has the (initial) advantage that you can work with much of the
functionality without having to know the S language: it has a large, extensively
developed graphical user interface. R works by typing commands (although the
R packages Rcmdr and JGR do provide graphical user interfaces). Compare typing
commands with sending SMS messages: at first it seems clumsy, but once
you get up to speed it is fun.
Compared to working with a graphical program that requires large amounts
of mouse clicks to get something done, working directly with a language has a
couple of notable advantages:
• after a few commands, you can exactly (i) see and (ii) replicate what
you did, and send the list of commands to someone else to replicate your
analysis;
• when you have a large number of analyses that take much time, you can
easily perform them as batches, possibly on a remote computer or on a
cluster of computers;
• when you tend to repeat a set of commands often, you can very easily turn
them into an S function, which then extends the language, is available
forever, and can be redistributed. In the world of R, users soon become
developers; vice versa, most developers are users in the first place.
Of course the disadvantage (as you may consider it right now) is that you need
to learn a few new ideas.
Although S-Plus and R are both full-grown statistical computing engines,
throughout this course we use R for the following reasons: (i) it is free (S-Plus
may cost on the order of 10,000 USD), and (ii) development and availability of
novel statistical computation methods is much faster in R.
Recommended text books that deal with both R and S-Plus are e.g.:
1. W. Venables, B. Ripley: Modern Applied Statistics with S; Springer (emphasis
on statistical analysis)
2. W. Venables, B. Ripley: S Programming; Springer (emphasis on programming)
3. J. Chambers: Programming with Data; Springer (fundamental; written
by one of the designers of S)
Further good books on R are:
1. P. Dalgaard, Introductory Statistics with R; Springer
1.3 Starting R; saving or restoring an R session
You can start an R session on an MS-Windows machine by double-clicking the
R icon, or through the Start menu. (Note that this is not true in the ifgi CIP
pools: there you have to find the executable in the right directory on the C:
drive.) When you start R, you'll see the R console with the > prompt, on which
you can give commands.
Another way of starting R is through a data file associated with R (one whose
name ends in .RData). If you click on such a file, R starts and reads the data
in this file at start-up.
Try this using the meteo.RData file. Next, try the following commands:
> objects()
> summary(meteo)
> a = 1
> objects()
> print(a)
> a
(Note that you can copy and paste the commands including the > to the R
prompt only if you use "paste as commands".)
You can see that your R workspace now has two objects in it. You can find
out to which class these objects belong by
> class(a)
> class(meteo)
If you want to remove an object, use
> rm(a)
If you want to save a (modified) workspace to a file, use
> save.image(file = "meteo.RData")
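The counterpart, restoring a saved workspace in a later session, is done with
load:
> load("meteo.RData")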
A random walk (RW) process can be defined by y(ti) = y(ti−1) + e(ti),
with, e.g., initial condition y(t0) = 0 and e(ti) a random value from the standard
normal distribution (mean 0, variance 1). Random normal deviates are
generated by the function rnorm.
> x = rnorm(10)
> x
> cumsum(x)
> plot(cumsum(rnorm(1000)), type = "l")
> plot(cumsum(rnorm(1000)), type = "l")
> plot(cumsum(rnorm(1000)), type = "l")
> plot(cumsum(rnorm(10000)), type = "l")
> plot(cumsum(rnorm(1e+05)), type = "l")
Exercise 1 Does a RW process in general increase, decrease, or does it not
have a dominant direction?
> var(cumsum(rnorm(10)))
> var(cumsum(rnorm(100)))
> var(cumsum(rnorm(1000)))
> var(cumsum(rnorm(10000)))
> var(cumsum(rnorm(1e+05)))
> var(cumsum(rnorm(1e+06)))
Exercise 2 Explain why a RW process has a variance that increases with its
length.
3.2 Fitting a periodic component
We will now fit a periodic component to this data, using a non-linear fit.
> f = function(x) sum((meteo$T.outside - (x[1] + x[2] * sin(pi *
(meteo$hours + x[3])/12)))^2)
> nlm(f, c(0, 0, 0))
We used the function nlm, which minimizes the function given as its first
argument (f), starting from the initial guess of the parameter vector given as
its second argument. The function f computes the sum of squared residuals:
sum_{i=1}^{n} (observed_i − predicted_i)^2
Exercise 4 What is the interpretation of the fitted parameters? (if you need
to guess, modify them and replot)
We can now also plot the residual from this fitted model:
> plot(T.outside - T.per ~ date, meteo, type = "l")
> title("difference from fitted sinus")
An AR(1) process is defined as
y_t = φ_1 y_{t−1} + e_t.
Now try to model the residual process as an AR(5) process, and look at the
partial correlations.
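A sketch of these steps, assuming the detrended residual series is stored in a
vector an (the name used further below):
> an.ar5 = arima(an, order = c(5, 0, 0))
> acf(an)
> pacf(an)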
(Note that this generates 2 plots.)
Exercise 5 Does the an process (the residual series) exhibit temporal correlation
for lags larger than 0?
Exercise 8 Try to explain what you see in the first two plots obtained.
Exercise 10 HAND IN: Do a similar analysis for the humidity variable in the
meteo data set. (i) Fit a periodic trend; give the trend equation; (ii) plot the
humidity data and the fitted model; (iii) detrend the humidity data to obtain
residuals, and report which value of n in an AR(n) model of the anomalies
(residuals) gives the lowest AIC; (iv) up to which lag does the residual
humidity process exhibit temporal correlation?
3.5 Prediction with an AR model
Let us now work with the AR(6) model for the temperature, ignoring the
periodic (diurnal) component. Make sure you have "plot recording" on (activate
the plot window to get this option).
Exercise 11 To what value does the predicted value converge for long-term
forecasts?
Now compare this with prediction using an AR(6) model for the residual
with respect to the daily cycle:
> plot(meteo$T.outside, xlim = c(1, 19970), type = "l")
> x.an = arima(an, c(6, 0, 0))
> x.pr = as.numeric(predict(x.an, 10080)$pred)
> x.se = as.numeric(predict(x.an, 10080)$se)
> hours.all = c(meteo$hours, max(meteo$hours) + (1:10080)/60)
> T.per = 18.2 - 4.9 * sin(pi * (hours.all + 1.6)/12)
> lines(T.per, col = "blue")
> hours.pr = c(max(meteo$hours) + (1:10080)/60)
> T.pr = 18.2 - 4.9 * sin(pi * (hours.pr + 1.6)/12)
> lines(9891:19970, T.pr + x.pr, col = "red")
> lines(9891:19970, T.pr + x.pr + 2 * x.se, col = "green")
> lines(9891:19970, T.pr + x.pr - 2 * x.se, col = "green")
> title("predicting 1 week")
Exercise 12 To what value does the predicted value now converge for long-term
forecasts?
Exercise 14 Is this fit better? Did optimize get the full 9 (or more) digits
right?
Exercise 15 HAND IN: What is the other approach used? Search for the
explanation of this method (English Wikipedia), and briefly describe in your
own words how it works. Explain why this method is used as well.
5 Linear and non-linear least squares
As preparation, go through the course slides; if you missed the lecture you may
want to go through the Gauss-Newton algorithm, e.g. on English Wikipedia.
5.1 By hand
First, we will try an example by hand. Consider the following data set:
x1 x2 y
0 0 5
0 1 7
0 2 6
1 0 5
and try to find the coefficients b0, b1 and b2 of the linear regression equation
y = b0 + b1 x1 + b2 x2 + e
Exercise 16 HAND IN: report the values of the coefficients found; compute the
residual sum of squares R; report the steps used to compute R and the value
found.
5.2 Linear: by lm
Fit the same model and check the results you found above:
> x1 = c(0, 0, 0, 1)
> x2 = c(0, 1, 2, 0)
> y = c(5, 7, 6, 5)
> d = data.frame(x1 = x1, x2 = x2, y = y)
> summary(lm(y ~ x1 + x2, d))
fits and summarizes the linear regression model
T.outside = β0 + β1 hours + e.
Here, β0 (intercept) and β1 (slope) are implicit, and not named in the formula.
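A call of the following form fits and summarizes this model (a sketch; the
exact call is not shown on this page):
> summary(lm(T.outside ~ hours, meteo))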
For non-linear models, we need to name coefficients explicitly, as they may in
principle appear anywhere. In the example below, the now familiar sine model
is refitted using function nls, which uses Gauss-Newton optimization; the
coefficients are named a, b and c in the formula.
Exercise 17 What does the third argument to nls contain, and why does it
change the outcome?
5.5 Iteration
The consecutive steps of the optimization are shown when a trace is requested.
> nls(T.outside ~ a + b * sin(pi * (hours + c)/12), meteo, c(a = 0,
b = 1, c = 0), trace = TRUE)
Exercise 22 Why is more than one step needed?
Exercise 24 Which coefficients are, with 95% confidence, different from zero?
5.7 Prediction
Try
> plot(T.outside ~ hours, meteo)
> t.nls = nls(T.outside ~ a + b * sin(pi * (hours + c)/12), meteo,
c(a = 0, b = 1, c = 2))
> lines(predict(t.nls) ~ hours, meteo, col = "red")
to see how predictions are generated for the data points. Interestingly, the three
commands
> predict(t.nls)[1:10]
> predict(t.nls, se.fit = TRUE)[1:10]
> predict(t.nls, interval = "prediction")[1:10]
all do the same. Although surprising, this is documented behaviour; see
?predict.nls. Prediction errors or intervals are much harder to obtain for
non-linear models than for linear models.
6 Metropolis and Simulated Annealing
6.1 Fun with Metropolis
The Metropolis algorithm is a simplified version of the Metropolis-Hastings
algorithm (search English Wikipedia); it provides a search algorithm for finding
global optima, and a way of sampling the (posterior) parameter distribution,
given the data.
Let P(θ) be the probability that parameter value θ is the right value, given
the data.
The Metropolis algorithm considers proposal values θ′ for the parameter
vector and, given the current value θi, computes the ratio
α = P(θ′)/P(θi).
The proposal is accepted when α ≥ 1, or else with probability α.
class(out) = c("Metropolis", "matrix")
out
}
Then, insert the function in your working environment by copying and pasting
the whole block.
To understand why the ratio is computed the way it is here, have a look at
the likelihood function for the normal distribution. Also recall that using
exp(a)/exp(b) = exp(a − b) is needed for numerical stability.
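Since the first part of the listing above is missing here, the following is a
minimal sketch of what a complete Metropolis function ending in those lines
could look like; the argument names (chosen to match its use below) and the
unit error variance are assumptions:
> Metropolis = function(theta, sigma, data, f, n = 1000) {
      out = matrix(NA, n, length(theta))
      out[1, ] = theta
      # log-likelihood for independent standard normal errors:
      loglik = function(th) sum(dnorm(data - f(th), log = TRUE))
      ll.old = loglik(theta)
      for (i in 2:n) {
          # random walk proposal around the current value:
          prop = out[i - 1, ] + rnorm(length(theta), 0, sigma)
          ll.prop = loglik(prop)
          # accept with probability min(1, P(proposal)/P(current)),
          # computed on the log scale using exp(a)/exp(b) = exp(a - b):
          if (log(runif(1)) < ll.prop - ll.old) {
              ll.old = ll.prop
          } else prop = out[i - 1, ]
          out[i, ] = prop
      }
      class(out) = c("Metropolis", "matrix")
      out
  }
Note that the sampler works entirely on the log-likelihood scale, which is where
the exp(a)/exp(b) identity comes in.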
Let’s try and see if it works. First recollect the non-linear least squares fit
with nls and use that as a starting condition:
> t.nls = nls(T.outside ~ a + b * sin(pi * (hours + c)/12), meteo,
c(a = 0, b = 1, c = 2))
> f = function(x) {
x[1] + x[2] * sin(pi * (hours + x[3])/12)
}
> temp = meteo$T.outside
> hours = meteo$hours
> coef(t.nls)
> out = Metropolis(coef(t.nls), c(0.1, 0.1, 0.1), temp, f, n = 5000)
> out[1:10, ]
> out[4990:5000, ]
We can see what happened to the three parameters by plotting them. For
this we will write a small dedicated plot function, called plot.Metropolis,
which will be called automatically for objects of class Metropolis.
> class(out)
Here’s the plot function:
> plot.Metropolis = function(x, ...) {
oldpar = par(no.readonly = TRUE) # save restorable graphics settings
par(mfrow = c(ncol(x), 1), mar = rep(2, 4))
apply(x, 2, function(x) plot(x, type = "l", ...))
par(oldpar) # restore them afterwards
}
that you should load in the environment. Try it with:
> plot(out)
Note that now your own plot.Metropolis has been called, because Metropolis
is the class of out.
Exercise 25 HAND IN: which percentage of the proposals was rejected, and
how could you decrease this percentage?
Compare the values found with those obtained with nls e.g. by:
> confint(t.nls)
> t(apply(out, 2, quantile, probs = c(0.025, 0.975)))
6.3 The effect of sigma
Set the plot recording to “on” (activate the plot window, use the menu). Now
you can browse previous plots with PgUp and PgDown, when the plot window
is active.
Let's try a series of different values for sigma, the standard deviation of the
proposal distribution.
> out1 = Metropolis(coef(t.nls), rep(0.3, 3), temp, f, n = 5000)
> plot(out1)
> out2 = Metropolis(coef(t.nls), rep(0.1, 3), temp, f, n = 5000)
> plot(out2)
> out3 = Metropolis(coef(t.nls), rep(0.05, 3), temp, f, n = 5000)
> plot(out3)
> out4 = Metropolis(coef(t.nls), rep(0.01, 3), temp, f, n = 5000)
> plot(out4)
Exercise 26 HAND IN: (i) which chain becomes stationary, meaning that
it fluctuates sufficiently for a long time over the same area? (ii) which of the
chains do not mix well?
> out = Metropolis(c(0, 0, 0), rep(0.3, 3), temp, f, n = 5000)
> plot(out)
> out = Metropolis(c(0, 0, 0), rep(0.1, 3), temp, f, n = 5000)
> plot(out)
> out = Metropolis(c(0, 0, 0), rep(0.03, 3), temp, f, n = 5000)
> plot(out)
The burn-in period is the period that the chain needs to go from the initial
values to the region where it becomes stationary.
Exercise 28 HAND IN: for the different values of sigma, how long does it
take before the chain becomes stationary?
Exercise 29 HAND IN: How do you compute (give the R command) the mean
and summary statistics for a chain ignoring the burn-in values?
Suppose we want to find the mean, scale and phase of the temperature data,
and are looking for the model that best fits in terms of least absolute deviations.
Define the function
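A minimal sketch of such a least-absolute-deviations objective (the name f.lad
and the reuse of the global temp and hours vectors are assumptions):
> f.lad = function(x) sum(abs(temp - (x[1] + x[2] * sin(pi * (hours + x[3])/12))))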
Exercise 32 HAND IN: explain why the parameters are equal/not equal
Exercise 33 HAND IN: use simulated annealing to find the least squares
solution; give the function used, and compare the resulting parameter values
with the least absolute error solution.
> library(gstat)
> data(meuse)
> objects()
> summary(meuse)
7.1.1 meuse
The meuse data set contains measurements collected in the floodplain of the
river Meuse. The heavy metals analyzed are copper, cadmium, lead and zinc
concentrations, extracted from the topsoil of a part of the Meuse floodplain,
the area around Meers and Maasband (Limburg, The Netherlands). The data
were collected during fieldwork in 1990 (Rikken and Van Rijn, 1993; Burrough
and McDonnell, 1998).
Load library gstat by
> library(gstat)
and read the documentation for the data set by ?meuse
> library(gstat)
> data(meuse)
> zinc.lm = lm(zinc ~ dist, meuse)
> class(zinc.lm)
> summary(zinc.lm)
Do the following yourself:
> plot(zinc.lm)
Besides the variety of plots you now obtain, there are many further options
you can set to help analyse these data. Asking for help by ?plot does not
provide very helpful information. To get help on the plot method that is
called when plotting an object of class lm, remember that the function called
is plot.lm. Read its help by ?plot.lm. You can customize the plot call, e.g. by
> plot(zinc.lm, add.smooth = FALSE)
In this call, the argument add.smooth is given by name because it comes at a
very late position in the argument list of plot.lm. Passing FALSE as a positional
argument, as in plot(zinc.lm, FALSE), would instead be matched to the second
formal argument, which.
Exercise 34 HAND IN: For this regression model, which of the following
assumptions underlying linear regression are violated (if any):
a none
b residuals are heteroscedastic (i.e., their variance is not constant over the
range of fitted values)
c residuals are not normally distributed (their distribution is for example
non-symmetric)
d residuals are heteroscedastic and not normally distributed
and explain how you can tell that this is the case.
7.3 Spatial data in R: the sp package
The R package sp provides classes and methods for spatial data; currently it
can deal with points, grids, lines and polygons. It further provides interfaces
to various GIS functions such as overlay, projection and reading and writing of
external GIS formats.
7.3.1 Points
Try:
> library(sp)
> data(meuse)
> class(meuse)
> summary(meuse)
> coordinates(meuse) = c("x", "y")
> class(meuse)
> summary(meuse)
> sp.theme(TRUE)
> spplot(meuse, "zinc", key.space = "right")
Here, the crucial step is the call to the coordinates function: it specifies for the
meuse data.frame which variables are spatial coordinates. In effect, it replaces
the data.frame meuse, in which the coordinates are "just" two of the columns,
with a structure (of class SpatialPointsDataFrame) that knows which columns
are spatial coordinates. As a consequence, spplot knows how to plot the data
in map form. If there is any function or data set for which you want help, e.g.
meuse, read the documentation: ?meuse, ?spplot, etc.
A shorter way to specify coordinates is to assign a formula, as in
> data(meuse)
> coordinates(meuse) = ~x + y
> coordinates(meuse)[1:10, ]
> spplot(meuse, "zinc", key.space = "right")
The function coordinates without assignment retrieves the matrix with coor-
dinate values.
The plot now obtained shows points, but without reference to where on earth
they lie. You could e.g. add such a reference by showing the coordinate system:
> spplot(meuse, "zinc", key.space = "right", scales = list(draw = TRUE))
but this tells little about what we see. As another option, you can add geographic
reference by e.g. lines of rivers, or administrative boundaries. In our example,
we prepared the coordinates of (part of) the river Meuse boundaries, and
will add them as reference:
> data(meuse.riv)
> dim(meuse.riv)
> meuse.riv[1:10, ]
> meuse.sp = SpatialPolygons(list(Polygons(list(Polygon(meuse.riv)),
"meuse.riv")))
> meuse.lt = list("sp.polygons", meuse.sp, fill = "grey")
> spplot(meuse, "zinc", key.space = "right", sp.layout = meuse.lt)
SpatialPolygons, Polygons and Polygon create an sp polygon object from a
simple matrix of coordinates; the sp.layout argument contains map elements
to be added to an spplot plot.
Note that the points in the plot partly fall in the river; this may be attributed
to one or more of the following reasons: (i) the river coordinates are not correct,
(ii) the point coordinates are not correct, (iii) points were taken on the river
bank when the water was low, whereas the river boundary indicates the high
water extent of the river.
7.3.2 Grids
Try:
> data(meuse.grid)
> class(meuse.grid)
> coordinates(meuse.grid) = c("x", "y")
> class(meuse.grid)
> gridded(meuse.grid) = TRUE
> class(meuse.grid)
> summary(meuse.grid)
> meuse.lt = list(riv = list("sp.polygons", meuse.sp, fill = "grey"),
pts = list("sp.points", meuse, pch = 3, cex = 0.5, col = "black"))
> spplot(meuse.grid, sp.layout = meuse.lt, key.space = "right")
Here, gridded(meuse.grid) = TRUE promotes the (gridded!) points to a struc-
ture which knows that the points are on a regular grid. As a consequence they
are drawn with filled rectangular symbols instead of filled circles. (Try the same
sequence of commands without the call to gridded() if you are curious what
happens when meuse.grid is left as a set of points.)
For an explanation of the sp.layout argument, read ?spplot; many of the
codes in it (pch, cex, col) are documented in ?par.
Note that spplot plots points on top of the grid, and the grid cells on top
of the polygon with the river. (When printing to pdf, transparency options can
be used to combine both.)
> library(rgdal)
> data(meuse)
> coordinates(meuse) = ~x + y
> proj4string(meuse) = CRS("+init=epsg:28992")
> proj4string(meuse)
> meuse.ll = spTransform(meuse, CRS("+proj=longlat"))
> writeOGR(meuse.ll, "meuse.kml", "meuse.kml", driver = "KML")
and import it into Google Earth.
8 Geostatistics
These practical exercises cover the use of geostatistical applications in environ-
mental research. They give an introduction to exploratory data analysis, sample
variogram calculation and variogram modelling and several spatial interpolation
techniques. Interpolation methods described and applied are inverse distance
interpolation, trend surface fitting, Thiessen polygons, ordinary (block) kriging,
stratified (block) kriging and indicator (block) kriging.
The gstat R package is used for all geostatistical calculations. See:
E.J. Pebesma, 2004. Multivariable geostatistics in S: the gstat package.
Computers & Geosciences 30: 683-691.
prediction in most cases because higher powers of large coordinate values result
in large numerical errors. The krige function in library gstat scales coordinates
and computes trend surfaces up to degree 3, using the scaled coordinates.
> library(gstat)
> data(meuse.grid)
> coordinates(meuse.grid) = ~x + y
> gridded(meuse.grid) = TRUE
> meuse.grid$tr1 = krige(log(zinc) ~ 1, meuse, meuse.grid, degree = 1)$var1.pred
> meuse.grid$tr2 = krige(log(zinc) ~ 1, meuse, meuse.grid, degree = 2)$var1.pred
> meuse.grid$tr3 = krige(log(zinc) ~ 1, meuse, meuse.grid, degree = 3)$var1.pred
> spplot(meuse.grid, c("tr1", "tr2", "tr3"), sp.layout = meuse.lt,
main = "log(zinc), trend surface interpolation")
Function surf.ls from library spatial (described in the MASS book) fits
trends up to order 6 with scaled coordinates.
As you can see, ?krige does not provide help on the argument degree. It does,
however, tell you that any remaining arguments (...) are passed to function
gstat, so look for the argument degree in ?gstat.
The argument set is a list because other parameters can be passed to gstat
using this list as well.
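For example, inverse distance interpolation with a larger power can be
computed as follows (a sketch; the value 4 for idp is an arbitrary choice):
> lzn.idw = krige(log(zinc) ~ 1, meuse, meuse.grid, set = list(idp = 4))
> spplot(lzn.idw, "var1.pred", main = "log(zinc), inverse distance, idp = 4")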
Exercise 37 HAND IN: With a larger inverse distance power (idp), the
weight of the nearest data point
a becomes larger
b becomes smaller
c does not change
explain in words why this is the case.
> library(gstat)
> library(sp)
> lzn.tp = krige(log(zinc) ~ 1, meuse, meuse.grid, nmax = 1)
> image(lzn.tp["var1.pred"])
> points(meuse, pch = "+", cex = 0.5)
> cc = coordinates(meuse)
> library(tripack)
> plot(voronoi.mosaic(cc[, 1], cc[, 2]), do.points = FALSE, add = TRUE)
> title("Thiessen (or Voronoi) polygon interpolation of log(zinc)")
> lzn.lm = lm(log(zinc) ~ sqrt(dist), meuse)
> summary(lzn.lm)
> plot(lzn.lm)
> plot(log(zinc) ~ sqrt(dist), meuse)
> abline(lzn.lm)
> lzn.pr = predict(lzn.lm, meuse.grid, se.fit = TRUE)
> meuse.grid$lzn.fit = lzn.pr$fit
> spplot(meuse.grid, "lzn.fit", sp.layout = meuse.lt)
> meuse.grid$se.fit = lzn.pr$se.fit
> spplot(meuse.grid, "se.fit", sp.layout = meuse.lt)
Exercise 38 Why are prediction standard errors on the log-scale largest for
small and large distance to the river?
a Because variability is smallest for intermediate distance to river
b because the uncertainty in the regression slope is most prevalent when the
regressor distance takes extreme values
c because we have too few degrees of freedom for regression (i.e., too few
observations) overall
d because we have too little data close to the river and far away from the
river
Note that all these prediction intervals refer to prediction of single observa-
tions; confidence intervals for mean values (i.e., the regression line alone) are
obtained by specifying interval = "confidence".
a wider
b narrower
c of approximately equal width
Exercise 40 Why is a box-and-whisker plot drawn for this plot, and not a
scatter plot?
ffreq is a categorical variable with numeric levels
Exercise 42 What is needed to be able to use flood frequency for mapping zinc?
> lzn.lm2 = lm(log(zinc) ~ ffreq, meuse)
> summary(lzn.lm2)
> pr = predict(lzn.lm2, meuse.grid, interval = "prediction")
> meuse.grid$fit = pr[, 1]
> meuse.grid$upper = pr[, 3]
> meuse.grid$lower = pr[, 2]
> spplot(meuse.grid, c("fit", "lower", "upper"), sp.layout = meuse.lt,
layout = c(3, 1))
Exercise 45 HAND IN: how many point pairs can you form from 155 obser-
vations?
Exercise 46 HAND IN: The correlation steadily drops with larger distance.
Why?
Exercise 47 How would you interpret the negative correlation for the largest
lag?
The variogram cloud plots, for all point pairs,
(z(xi) − z(xj))^2 / 2
against
h = |xi − xj|;
in words: for each point pair z(xi), z(xj), half the squared difference in measured
values is plotted against the spatial separation distance.
Plot a variogram cloud for the log-zinc data:
> plot(variogram(log(zinc) ~ 1, meuse, cloud = TRUE))
Exercise 50 How many points does the variogram cloud contain if the sepa-
ration distance is not limited?
Identify a couple of points in the cloud, and plot the pairs in a map by
repeating the following commands a couple of times. They let you select point
pairs by drawing a polygon around them; note the use of the different mouse
buttons.
> out = plot(variogram(log(zinc) ~ 1, meuse, cloud = TRUE), digitize = TRUE)
> plot(out, meuse)
8.7 Sample variogram and variogram model
To detect, or model, spatial correlation, we average the variogram cloud values
over distance intervals (bins).
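A sketch of such a call (the cutoff and width values are assumptions):
> lzn.vgm = variogram(log(zinc) ~ 1, meuse, cutoff = 1500, width = 100)
> plot(lzn.vgm)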
Exercise 54 HAND IN: What is the problem if we set the cutoff value to
e.g. a value of 500 and width to 50?
For kriging, we need a suitable, statistically valid model fitted to the sample
variogram. The reason is that kriging requires the specification of semivariances
for any distance value in the spatial domain we work with. Simply
connecting the sample variogram points, as in
> lzn.vgm = variogram(log(zinc) ~ 1, meuse, cutoff = 1000, width = 50)
> plot(gamma ~ dist, lzn.vgm)
> lines(c(0, lzn.vgm$dist), c(0, lzn.vgm$gamma))
will not result in a valid model. Instead, we will fit a valid parametric model.
Some of the valid parametric models are shown by e.g.
> show.vgms()
These models may be combined; the most commonly used are Exponential
or Spherical models, combined with a Nugget model.
We will try a spherical model:
> lzn.vgm = variogram(log(zinc) ~ 1, meuse)
> plot(lzn.vgm)
> lzn.mod = vgm(0.6, "Sph", 1000, 0.05)
> plot(lzn.vgm, lzn.mod)
The variogram model ("Sph" for spherical) and the three parameters were
chosen by eye. We can fit the three parameters also automatically:
> lzn.vgm = variogram(log(zinc) ~ 1, meuse)
> lzn.mod = fit.variogram(lzn.vgm, vgm(0.6, "Sph", 1000, 0.05))
> lzn.mod
> plot(lzn.vgm, lzn.mod)
but we need to have good "starting" values; e.g., the following will not yield a
satisfactory fit:
> lzn.misfit = fit.variogram(lzn.vgm, vgm(0.6, "Sph", 10, 0.05))
> plot(lzn.vgm, lzn.misfit)
Exercise 58 HAND IN: Compute the variogram for four different direction
ranges (hint: look into argument alpha for function variogram) and plot it in
a single plot. Is the zinc process anisotropic? Try to explain what you find in
terms of what you know about the process.
8.8 Simple and ordinary kriging
Ordinary kriging is an interpolation method that uses weighted averages of all,
or a defined set of neighbouring, observations. It calculates a weighted average,
where weights are chosen such that (i) prediction error variance is minimal,
and (ii) predictions are unbiased. The second requirement results in weights
that sum to one. Simple kriging assumes the mean (or mean function) to be
known; consequently it drops the second requirement, and the weights do not
have to sum to unity. Kriging assumes the variogram to be known; based on
the variogram (or the covariogram, derived from the variogram) it calculates
prediction variances.
To load everything you need into your workspace do
> library(gstat)
> sp.theme(TRUE)
> data(meuse.riv)
> meuse.sp = SpatialPolygons(list(Polygons(list(Polygon(meuse.riv)),
"meuse.riv")))
> meuse.lt = list(riv = list("sp.polygons", meuse.sp, fill = "grey"),
pts = list("sp.points", meuse, pch = 3, cex = 0.5, col = "black"))
> data(meuse.grid)
> coordinates(meuse.grid) = c("x", "y")
> gridded(meuse.grid) = TRUE
> data(meuse)
> coordinates(meuse) = ~x + y
> lzn.vgm = variogram(log(zinc) ~ 1, meuse)
> lzn.mod = fit.variogram(lzn.vgm, vgm(0.6, "Sph", 1000, 0.05))
8.8.1 Weights
One main advantage of kriging compared to e.g. inverse distance weighting is
that it accounts for clustering of the points from which the values for prediction
come.
Exercise 59 Imagine you want to predict a value by three others. All three
points have the same distance from the point to predict, two of them are very
close together and the third one is on the opposite side of the point to predict.
Which weighting would you choose and why?
a Equal weights on all points.
b Put more weight on the single point than on the two clustered ones.
c Put less weight on the single point than on the two clustered ones.
Download the EZ-Kriging tool to see how weighting is done by kriging (download
the .zip file and unzip it; then it can be started by clicking the icon). Use
it to answer the following questions.
a Points closer to the point to predict have more weight.
b Clustered points have more weight.
Can you imagine what the interpolated map looks like? Try pure nugget
effect kriging on the log(zinc) data from meuse:
> vgm.nugget = vgm(psill = 0, model = "Exp", range = 1, nugget = 0.64)
> lzn.krige.nugget = krige(log(zinc) ~ 1, meuse, meuse.grid, vgm.nugget)
> spplot(lzn.krige.nugget, main = "Kriging with pure nugget effect")
Exercise 62 HAND IN: How are clustered points weighted? How important is
the distance to the point to predict? Does a pure nugget effect fit data with strong
spatial correlation, or data where values at different points are uncorrelated?
Exercise 63 What are the predicted values? Explain why this makes sense
here!
Exercise 65 HAND IN: Describe the main differences between the kriged maps
for a small and a large range, and compare the errors.
Given that lzn.mod contains a suitable variogram model (see above), we can
apply ordinary kriging by
> lzn.ok = krige(log(zinc) ~ 1, meuse, meuse.grid, lzn.mod)
> spplot(lzn.ok, "var1.pred", main = "log(zinc), ordinary kriging")
For simple kriging, we have to specify in addition the known mean value,
passed as argument beta:
> lzn = krige(log(zinc) ~ 1, meuse, meuse.grid, lzn.mod, beta = 5.9)
> lzn$sk = lzn$var1.pred
> lzn$ok = lzn.ok$var1.pred
> spplot(lzn, c("ok", "sk"), main = "log(zinc), ordinary and simple kriging")
Comparing the kriging standard errors:
> lzn$sk.se = sqrt(lzn$var1.var)
> lzn$ok.se = sqrt(lzn.ok$var1.var)
> meuse.lt$pts$col = "green"
> spplot(lzn, c("ok.se", "sk.se"), sp.layout = meuse.lt, main = "log(zinc), ordinary and si
Exercise 66 HAND IN: Why is the kriging standard error not zero in grid
cells that contain data points?
a kriging standard errors are not zero on data points
b the data points do not coincide exactly with grid cell centres
c kriging standard errors do not depend on data locations, they only depend
on the variability of measured values
Explain your answer briefly.
Local kriging is obtained when, instead of all data points, only a subset of
points nearest to the prediction location are selected. If we use only the nearest
5 points:
> lzn.lok = krige(log(zinc) ~ 1, meuse, meuse.grid, lzn.mod, nmax = 5)
> lzn$lok = lzn.lok$var1.pred
> spplot(lzn, c("lok", "ok"), main = "lok: local (n=5); ok: global")
Exercise 67 When we increase the value of nmax, the differences between the
local and global kriging will
a become smaller
b become larger
c remain the same
> lzn.lok2 = krige(log(zinc) ~ 1, meuse, meuse.grid, lzn.mod, maxdist = 200)
> lzn.ok$lok2 = lzn.lok2$var1.pred
> lzn.ok$ok = lzn.ok$var1.pred
> spplot(lzn.ok, c("lok2", "ok"), main = "lok: local (maxdist=200); ok: global",
sp.layout = meuse.lt, scales = list(draw = TRUE))
Exercise 68 How can the gaps that now occur in the local neighbourhood
kriging map be explained? At these prediction locations
a the kriging neighbourhood is empty
b the kriging neighbourhood only contains a single observation
Exercise 69 HAND IN: In the areas with gaps in the ordinary kriging map,
simple kriging yields
a the simple kriging mean value (5.9)
b a compromise between the mean value (5.9) and the nearest observation
c a compromise between the mean value (5.9) and the nearest two or more
observations
Explain why this is the case.
> n.obtained = dim(coordinates(pts))[1]
> n.obtained
Create nonsense data from a standard normal distribution; interpolate and plot
them:
> dummy = SpatialPointsDataFrame(pts, data.frame(z = rnorm(n.obtained)))
> system.time(dummy.ok <- krige(z ~ 1, dummy, meuse.grid, vgm(1,
"Exp", 300)))
> system.time(dummy.lok <- krige(z ~ 1, dummy, meuse.grid, vgm(1,
"Exp", 300), nmax = 20))
> dummy.ok$var1.local = dummy.lok$var1.pred
> spplot(dummy.ok, c("var1.pred", "var1.local"), sp.layout = list("sp.points",
dummy, col = "black", cex = 0.2))
and compare the run time for global kriging to local kriging with the nearest 20
observations.
Note the use of <- rather than = for the assignment inside the system.time
call: arguments in a function call are passed with =, so an assignment given as
the expression argument must be written with <-.
Z(s) = m + e(s)
with Z(s) the observed variable Z at spatial location s, m a constant but
unknown mean value and e(s) an intrinsically stationary residual, we can write
stratified kriging as j models
Zj(s) = mj + ej(s),
j referring to the stratum: each stratum has a different mean and different
parameters describing the spatial correlation of ej, and ej(s) and ek(s) are
independent if j ≠ k.
Universal kriging extends the ordinary kriging model by replacing the constant
mean value by a non-constant mean function:
Z(s) = Σ_{j=0}^{p} Xj(s) βj + e(s),
with βj the trend (i.e., regression) coefficients, and Xj(s) known predictor
functions (independent variables, regressors), where X0(s) ≡ 1.
The hope is that the predictor functions carry information in them that
explains the variability of Z(s) to a considerable extent, in which case the resid-
ual e(s) will have lower variance and less spatial correlation compared to the
ordinary kriging case. To see that this model extends ordinary kriging, take
p = 0 and note that for that case β0 = m. Another simplification is that if
e(s) is spatially independent, the universal kriging model reduces to a multiple
linear regression model. One problem with universal kriging is that we need the
variogram of the residuals, without having measured the residuals.
As we’ve seen before, we can predict the zinc concentrations fairly well from
the sqrt-transformed values of distance to river:
> plot(log(zinc) ~ sqrt(dist), as.data.frame(meuse))
> lzn.lm = lm(log(zinc) ~ sqrt(dist), as.data.frame(meuse))
> summary(lzn.lm)
> abline(lzn.lm)
To compare the variograms from raw (but log-transformed) data and regres-
sion residuals, look at the conditioning plot obtained by
> g = gstat(id = "raw data", formula = log(zinc) ~ 1, data = meuse)
> g = gstat(g, id = "residuals", formula = log(zinc) ~ sqrt(dist),
data = meuse)
> plot(variogram(g, cross = F), scales = list(relation = "same"),
layout = c(2, 1))
Exercise 71 How is the residual standard error from the regression model
above (lzn.lm) related to the variogram of the residuals?
a it is half the mean semivariance of the raw data and that of the residuals
b the square of the residual standard error (the residual variance) equals the
sill of the residual variogram
c it equals the range of the residual variogram
d it equals the sill of the residual variogram
e it equals the nugget of the residual variogram
f the residual variance (approximately) equals the nugget of the variogram
Exercise 72 HAND IN: Why is the sill of the residual variogram much lower
than the sill of the variogram for the raw data?
Now model the residual variogram with an exponential model, say zn.res.m,
and apply universal kriging. Compare the maps with predictions from universal
kriging and ordinary kriging in a single plot (with a single scale), using spplot.
Then print the maps with the prediction errors for ordinary kriging and
universal kriging, again with a single scale. See if you can answer the previous
two questions in the light of the differences between the prediction errors.
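A minimal sketch of these steps (the starting values in vgm are guesses to be
refined by fit.variogram):
> zn.res.m = fit.variogram(variogram(log(zinc) ~ sqrt(dist), meuse),
      vgm(0.2, "Exp", 300, 0.05))
> lzn.uk = krige(log(zinc) ~ sqrt(dist), meuse, meuse.grid, zn.res.m)
> lzn.uk$uk = lzn.uk$var1.pred
> lzn.uk$ok = lzn.ok$var1.pred
> spplot(lzn.uk, c("ok", "uk"), main = "log(zinc), ordinary and universal kriging")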
Exercise 75 What is calculated here (two correct answers)?
a The universal kriging trend for log(zinc) at the points in newdat.
b The universal kriging prediction at these points.
c The universal kriging prediction at these points using only sqrt(dist) for
prediction.
d The value for the parameter β̂ of the kriging trend.
Compare the regression coefficients obtained above with those obtained from
function lm:
> summary(lm(log(zinc) ~ sqrt(dist), meuse))
Exercise 77 HAND IN: explain why the two sets of fitted coefficients, and
their standard errors, are different.
Exercise 78 HAND IN: Describe what happens if we change from point
prediction to block prediction with blocks of size 40 m, in terms of predictions
and in terms of prediction standard errors.
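Block predictions are obtained with the block argument of krige; a sketch,
reusing zn.res.m from the universal kriging section (the object name lzn.ukb
is an assumption):
> lzn.ukb = krige(log(zinc) ~ sqrt(dist), meuse, meuse.grid, zn.res.m,
      block = c(40, 40))
> spplot(lzn.ukb, "var1.pred", main = "log(zinc), universal block kriging, 40 m blocks")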
Exercise 79 HAND IN: Describe what happens if we change from block
prediction for blocks of size 40 m to block prediction for blocks of size 400 m,
in terms of predictions and in terms of prediction standard errors.
Exercise 81 HAND IN: give the four corner points of the block mentioned in
the previous exercise
Compare the lowest standard error found with the standard error of the
mean in a simple linear regression (with a mean only):
> summary(lm(log(zinc) ~ 1, meuse))
Exercise 82 HAND IN: why is the block mean value for the mean of the com-
plete area different from the sample mean value of the log(zinc) observations?
8.12 Cokriging
> g <- gstat(NULL, "logCd", log(cadmium) ~ 1, meuse)
> g <- gstat(g, "logCu", log(copper) ~ 1, meuse)
> g <- gstat(g, "logPb", log(lead) ~ 1, meuse)
> g <- gstat(g, "logZn", log(zinc) ~ 1, meuse)
> g
> vm <- variogram(g)
> vm.fit <- fit.lmc(vm, g, vgm(1, "Sph", 800, 1))
> vm.fit
> plot(vm, vm.fit)
Exercise 83 Which of the parameters (nugget, partial sill, range, model type)
were fitted in the above call to fit.lmc?
Do a cokriging:
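A sketch of the prediction step; calling predict on a gstat object with a fitted
linear model of coregionalization performs cokriging for all four variables at
once:
> cok.pr = predict(vm.fit, meuse.grid)
> spplot(cok.pr, c("logCd.pred", "logCu.pred", "logPb.pred", "logZn.pred"))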
Exercise 85 HAND IN: explain why all the prediction error covariances are
positive
Exercise 86 HAND IN: give the R commands to compute this index and to
show a map of it. Also give the commands to compute the standard error of
the index, taking the error covariances into account.
Exercise 87 HAND IN: how much does the standard error of this toxicity
index change as a result of taking the covariances into account, as opposed to
ignoring them?
> g = gstat(NULL, "logZinc", log(zinc) ~ 1, meuse1)
> g = gstat(g, "logLead", log(lead) ~ 1, meuse)
> v12 = variogram(g)
> plot(v12)
> v12.fit = fit.lmc(v12, g, vgm(1, "Sph", 900, 1))
> v12.fit
> class(v12.fit)
> plot(v12, v12.fit)
cokrige:
> v12.pr = predict(v12.fit, meuse.grid)
> spplot(v12.pr, c("logZinc.pred", "logLead.pred"))
> spplot(v12.pr, c("logZinc.var", "logLead.var", "cov.logZinc.logLead"))
Compare the differences between kriging the 50 observations of zinc and
cokriging from the 50 zinc and 155 lead observations:
> v12.fit[1]
> v1.pr = predict(v12.fit[1], meuse.grid)
> lt = list(list("sp.points", meuse2, pch = 3), list("sp.points",
meuse1, pch = 2))
> v12.pr$diffpr = v12.pr$logZinc.pred - v1.pr$logZinc.pred
> spplot(v12.pr["diffpr"], sp.layout = lt)
> v12.pr$diffse = sqrt(v12.pr$logZinc.var) - sqrt(v1.pr$logZinc.var)
> spplot(v12.pr["diffse"], sp.layout = lt)
Exercise 88 HAND IN: why are the standard errors smaller in the cokriging
case?
Exercise 89 HAND IN: for which fraction of the total area is zinc concentra-
tion lower than 500 ppm, with 95% confidence? Give the number of grid cells
for which this is the case, and describe where in the area they are situated.
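The function ci used below is a small helper whose definition is not part of
this text; a sketch of what such a classifier could look like (the class labels,
their order, and the two-sided rule are assumptions):
> ci = function(cutoff, pred, var, alpha = 0.05) {
      # probability that the true value lies below the cutoff:
      p = pnorm(cutoff, pred, sqrt(var))
      factor(ifelse(p > 1 - alpha/2, "lower",
          ifelse(p < alpha/2, "higher", "not distinguishable")),
          levels = c("lower", "not distinguishable", "higher"))
  }
It classifies each prediction location as having, at confidence level 1 − alpha,
a value below the threshold, above it, or not distinguishable from it.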
For a single threshold, we can vary the confidence level by specifying alpha:
> lzn.ok$ci001 = ci(log(500), lzn.ok$var1.pred, lzn.ok$var1.var,
0.01)
> lzn.ok$ci005 = ci(log(500), lzn.ok$var1.pred, lzn.ok$var1.var,
0.05)
> lzn.ok$ci010 = ci(log(500), lzn.ok$var1.pred, lzn.ok$var1.var,
0.1)
> lzn.ok$ci050 = ci(log(500), lzn.ok$var1.pred, lzn.ok$var1.var,
0.5)
> spplot(lzn.ok, c("ci001", "ci005", "ci010", "ci050"), col.regions = c("green",
"grey", "red"))
Exercise 90 HAND IN: which of these four maps has the highest confidence
level, and what is this level? Why are there gray areas on map ci050 at all?
Consider the object zn, created in the section on block kriging.
> zn$ok_point = ci(log(500), lzn.ok$var1.pred, lzn.ok$var1.var,
0.05)
> zn$uk_point = ci(log(500), zn$pointkr, zn$pointkr.se^2, 0.05)
> zn$uk_block40 = ci(log(500), zn$block40, zn$block40.se^2, 0.05)
> zn$uk_block400 = ci(log(500), zn$block400, zn$block400.se^2,
0.05)
> spplot(zn, c("ok_point", "uk_point", "uk_block40", "uk_block400"),
col.regions = c("green", "grey", "red"))
Exercise 91 HAND IN: compare the fraction classified as lower and higher
between ok_point and uk_point, and try to explain the difference
Exercise 92 HAND IN: compare the fraction classified as lower and higher
between uk_point, uk_block40 and uk_block400, and try to explain the dif-
ference
8.16 Cross validation
For cross validating kriging results, the functions krige.cv (univariate) and
gstat.cv (multivariate) can be used.
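A sketch of one such cross validation run, using the ordinary kriging model
lzn.mod fitted earlier (nfold = 5 is an arbitrary choice):
> lzn.cv = krige.cv(log(zinc) ~ 1, meuse, lzn.mod, nfold = 5)
> summary(lzn.cv)
> mean(lzn.cv$residual)    # mean error
> mean(lzn.cv$residual^2)  # mean square error
> mean(lzn.cv$zscore)      # mean z-score
> var(lzn.cv$zscore)       # variance of the z-score
> bubble(lzn.cv, "residual")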
Exercise 94 HAND IN: perform cross validation for three different (e.g.
previously evaluated) kriging types/models using one of these two functions.
Compare the results at least based on (i) mean error, (ii) mean square error,
(iii) mean and variance of the z-score, and (iv) a map of residuals (use function
bubble for the residual variable). Comment on the results, and compare them
to the ideal values for the evaluated statistics/graphs.
If you want to save graphs in an efficient way (for once, as a bitmap), you could
use e.g.
> jpeg("diff1.jpg")
> out = diffuse()
> dev.off()
Exercise 96 What is the meaning of the different parameters (dt, dx, n.end,
x.end)? Try them! What can be seen in these plots compared to the initial
ones?
Exercise 97 [HAND IN:] What happens if dt is close to but below 0.5, at 0.5,
or above 0.5? Find plots to show it, and describe and explain with their help.
Exercise 98 How could the problem be solved (hint: change other parame-
ters)?
Try
> out = diffuse(dt = 0.4, x.end = 100, n.end = 10000, dirichlet = TRUE)
> out = diffuse(dt = 0.4, x.end = 100, n.end = 10000, dirichlet = FALSE)
Exercise 99 What is the total (sum) of out, after the last time step, for the
last two commands? What has happened to the material released?
Exercise 100 What does the Dirichlet boundary condition mean for the
values?
Exercise 101 [HAND IN:] In case of dirichlet = TRUE: what happens to the
released material until the last time step (calculate the total sum of out)? What
has happened at earlier time steps (calculate a few numbers to describe it)?
Exercise 102 [HAND IN:] In case of dirichlet = FALSE: what are the
conditions at the boundaries? What happens to the released material (total sum
of out)?
Change one parameter at each step; the others stay at their original values.
Exercise 103 What is the concentration (out) at the points 20 steps from
the release point after 1000 time steps (n.end = 1000)? Why? How does it
change when the release is halved? Why? How does it change when dt is halved?
Why? How does it change when dirichlet = TRUE? Why?
Exercise 104 [HAND IN:] Hand in a short description of your plan for the
assignment: which data will you analyze, what kind of hypothesis do you want
to look at, which type of analysis methods will you use