Mathematical Image Processing 9783030014575 9783030014582
Kristian Bredies
Dirk Lorenz
Mathematical Image Processing
Applied and Numerical Harmonic Analysis
Series Editor
John J. Benedetto
University of Maryland
College Park, MD, USA
Jelena Kovačević
Carnegie Mellon University
Pittsburgh, PA, USA
Mathematical Image Processing
Kristian Bredies Dirk Lorenz
Kristian Bredies
Institute for Mathematics and Scientific Computing
University of Graz
Graz, Austria

Dirk Lorenz
Braunschweig, Germany
This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered
company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ANHA Series Preface
The Applied and Numerical Harmonic Analysis (ANHA) book series aims to
provide the engineering, mathematical, and scientific communities with significant
developments in harmonic analysis, ranging from abstract harmonic analysis to
basic applications. The title of the series reflects the importance of applications
and numerical implementation, but richness and relevance of applications and
implementation depend fundamentally on the structure and depth of theoretical
underpinnings. Thus, from our point of view, the interleaving of theory and
applications and their creative symbiotic evolution is axiomatic.
Harmonic analysis is a wellspring of ideas and applicability that has flourished,
developed, and deepened over time within many disciplines and by means of
creative cross-fertilization with diverse areas. The intricate and fundamental rela-
tionship between harmonic analysis and fields such as signal processing, partial
differential equations (PDEs), and image processing is reflected in our state-of-the-
art ANHA series.
Our vision of modern harmonic analysis includes mathematical areas such as
wavelet theory, Banach algebras, classical Fourier analysis, time-frequency analysis,
and fractal geometry, as well as the diverse topics that impinge on them.
For example, wavelet theory can be considered an appropriate tool to deal with
some basic problems in digital signal processing, speech and image processing,
geophysics, pattern recognition, biomedical engineering, and turbulence. These
areas implement the latest technology from sampling methods on surfaces to fast
algorithms and computer vision methods. The underlying mathematics of wavelet
theory depends not only on classical Fourier analysis, but also on ideas from abstract
harmonic analysis, including von Neumann algebras and the affine group. This leads
to a study of the Heisenberg group and its relationship to Gabor systems, and of the
metaplectic group for a meaningful interaction of signal decomposition methods.
The unifying influence of wavelet theory in the aforementioned topics illustrates the
justification for providing a means for centralizing and disseminating information
from the broader, but still focused, area of harmonic analysis. This will be a key role
of ANHA. We intend to publish with the scope and interaction that such a host of
issues demands.
The above point of view for the ANHA book series is inspired by the history of
Fourier analysis itself, whose tentacles reach into so many fields.
In the last two centuries Fourier analysis has had a major impact on the
development of mathematics, on the understanding of many engineering and
scientific phenomena, and on the solution of some of the most important problems
in mathematics and the sciences. Historically, Fourier series were developed in
the analysis of some of the classical PDEs of mathematical physics; these series
were used to solve such equations. In order to understand Fourier series and the
kinds of solutions they could represent, some of the most basic notions of analysis
were defined, e.g., the concept of “function." Since the coefficients of Fourier
series are integrals, it is no surprise that Riemann integrals were conceived to deal
with uniqueness properties of trigonometric series. Cantor’s set theory was also
developed because of such uniqueness questions.
A basic problem in Fourier analysis is to show how complicated phenomena,
such as sound waves, can be described in terms of elementary harmonics. There are
two aspects of this problem: first, to find, or even define properly, the harmonics or
spectrum of a given phenomenon, e.g., the spectroscopy problem in optics; second,
to determine which phenomena can be constructed from given classes of harmonics,
as done, for example, by the mechanical synthesizers in tidal analysis.
Fourier analysis is also the natural setting for many other problems in engineer-
ing, mathematics, and the sciences. For example, Wiener’s Tauberian theorem in
Fourier analysis not only characterizes the behavior of the prime numbers, but also
provides the proper notion of spectrum for phenomena such as white light; this
latter process leads to the Fourier analysis associated with correlation functions in
filtering and prediction problems, and these problems, in turn, deal naturally with
Hardy spaces in the theory of complex variables.
Nowadays, some of the theory of PDEs has given way to the study of Fourier
integral operators. Problems in antenna theory are studied in terms of unimodular
trigonometric polynomials. Applications of Fourier analysis abound in signal
processing, whether with the fast Fourier transform (FFT), or filter design, or the
adaptive modeling inherent in time-frequency-scale methods such as wavelet theory.

Preface
Mathematical imaging is the treatment of mathematical objects that stand for images
where an “image” is just what is meant in everyday conversation, i.e., a picture of
a real scene, a photograph, or a scan. In this book, we treat images as continuous
objects, i.e., as an image of a continuous scene or, put differently, as a function of a
continuous variable. The closely related field of digital imaging, on the other hand,
treats discrete images, i.e., images that are described by a finite number of values
or pixels. Mathematical imaging is a subfield of computer vision where one tries
to understand how information is stored in images and how information can be
extracted from images in an automatic way. Methods of computer vision usually
use underlying mathematical models for images and the information therein. To
apply methods for continuous images in practice, i.e., to digital images, one has
to discretize the methods. Hence, mathematical imaging and digital imaging are
closely related and often methods in both fields are developed simultaneously. A
method based on a mathematical model is useful only if it can be implemented in an
efficient way and the mathematical treatment of a digital method often reveals the
underlying assumptions and may explain observed effects.
This book emphasizes the mathematical character of imaging and as such is
geared toward students of mathematical subjects. However, students of computer
science, engineering, or natural sciences who have a knack for mathematics may
also find this book useful. We assume knowledge of introductory courses like
linear algebra, calculus, and numerical analysis; some basics of real analysis and
functional analysis are advantageous. The book should be suited for students in their
third year; however, later chapters of the book use some advanced mathematics. In
this book, we give an overview of mathematical imaging; we describe methods and
solutions for standard problems in imaging. We will also introduce elementary tools
such as histograms and linear and morphological filters, since they often suffice to solve
a given task. A special focus is on methods based on multiscale representations,
partial differential equations, and variational methods. In most cases, we illustrate
how the methods can be realized practically, i.e., we derive applicable algorithms.
This book can serve as the basis for a lecture on mathematical imaging, but it is also
possible to use parts of it in lectures on applied mathematics or in advanced seminars.
The introduction of the book outlines the mathematical framework and intro-
duces the basic problems of mathematical imaging. Since we will need mathematics
from quite different fields, there is a chapter on mathematical basics. Advanced
readers may skip this chapter, just use it to brush up their knowledge, or use it as a
reference for the terminology used in this book. The chapter on mathematical basics
does not cover all the basics we will need. Many mathematical facts and concepts are
introduced when they are needed for specific methods. Mathematical imaging itself
is treated in Chaps. 3–6. We organized the chapters according to the methods, and
not according to the problems. In this way, we present a box of tools that serves
as a reservoir of methods so that the user can pick, combine, and develop tools
that seem to be best suited for the problem at hand. These mathematical chapters
conclude with exercises that help to develop a deeper understanding of
the methods and techniques. Some exercises involve programming, and we would
like to encourage all readers to try to implement the methods in their favorite
programming language. As with every book, there are a lot of topics which did
not find their way into the book. We would still like to mention some of these topics
in the sections called “Further developments.”
Finally, we would like to thank all the people who contributed to this book in one
way or another: Matthias Bremer, Jan Hendrik Kobarg, Christian Kruschel, Rainer
Löwen, Peter Maaß, Markus Müller (who did a large part of the translation from the
German edition), Tobias Preusser, and Nadja Worliczek.
Contents

1 Introduction
   1.1 What Are Images?
   1.2 The Basic Tasks of Imaging
2 Mathematical Preliminaries
   2.1 Fundamentals of Functional Analysis
      2.1.1 Analysis on Normed Spaces
      2.1.2 Banach Spaces and Duality
      2.1.3 Aspects of Hilbert Space Theory
   2.2 Elements of Measure and Integration Theory
      2.2.1 Measure and Integral
      2.2.2 Lebesgue Spaces and Vector Spaces of Measures
      2.2.3 Operations on Measures
   2.3 Weak Differentiability and Distributions
3 Basic Tools
   3.1 Continuous and Discrete Images
      3.1.1 Interpolation
      3.1.2 Sampling
      3.1.3 Error Measures
   3.2 Histograms
   3.3 Linear Filters
      3.3.1 Definition and Properties
      3.3.2 Applications
      3.3.3 Discretization of Convolutions
   3.4 Morphological Filters
      3.4.1 Fundamental Operations: Dilation and Erosion
      3.4.2 Concatenated Operations
      3.4.3 Applications
      3.4.4 Discretization of Morphological Operators
   3.5 Further Developments
   3.6 Exercises
References
Notation
Index

Chapter 1
Introduction

1.1 What Are Images?
We omit the philosophical aspect of the question “What are images?” and aim
to answer the question “What kind of images are there?” instead. Images can be
produced in many different ways:
Photography: Photography produces two-dimensional images by projecting a
scene of the real world through some optics onto a two-dimensional image plane.
The optics are focused onto some plane, called the focal plane, and objects appear
more blurred the farther they are from the focal plane. Hence, photos usually have
both sharp and blurred regions.
At first, photography was based on chemical reactions to map the different
values of brightness and color onto photographic film. Then some other chemical
reactions were used to develop the film and to produce photoprints. Each of the
different chemical reactions happens with some slight uncontrolled variations,
and hence the photoprint does not exactly correspond to the real brightness and
color values. In particular, photographic film has a certain granularity, which
amounts to a certain noise in the picture.
Nowadays, most photos are obtained digitally. Here, the brightness and color
are measured digitally at certain places—the pixels, or picture elements. This
results in a matrix of brightness or color values. The process of digital picture
acquisition also results in some noise in the picture.
Scans: To digitize photos one may use a scanner. The scanner illuminates the
photo row by row and measures the brightness or color along the lines. Usually
this does not result in some additional blur. However, a scanner operates at some
resolution, which results in a reduction of information. Moreover, the scanning
process may result in some additional artifacts. Older scans are often pale and
may contain some contamination. The correction of such errors is an important
problem in image processing.
Mathematically, we describe an image as a function

    u : Ω → F,

where Ω is the image domain and F is the set of color values.
Fig. 1.1 Different types of images. First row: Photos. Second row: A scan and a microscopy image
of cells. Third row: An image from indirect measurements (holography image of droplets) and a
generalized image (“elevation profile” of a surface produced by a turning process)
The field of digital image processing treats mostly discrete images, often also
with discrete color space. This is reasonable in the sense that images are most often
generated in discrete form or have to be transformed to a discrete image before
further automatic processing. The methods that are used are often motivated by
continuous considerations. In this book we take the viewpoint that our images are
continuous objects (Ω ⊂ R^d). Hence, we will derive methods for continuous images
with continuous color space. Moreover, we will deal mostly with grayscale images
(F = R or F = [0, 1]).
The mathematical treatment of color images is a delicate subject. For example,
one has to be aware of the question of how to measure distances in the color space:
is the distance from red to blue larger than that from red to yellow? Moreover, the
perception of color is very complex and also subjective. Colors can be represented
in different color spaces and usually they are encoded in different color channels.
For example, there are the RGB space, where colors are mixed additively from the
red, green, and blue channels (as on screens and monitors) and the CMYK space,
where colors are mixed subtractively from the cyan (C), magenta (M), yellow (Y),
and black (K) channels (as is common in print). In the RGB space, color
values are encoded by triplets (R, G, B) ∈ [0, 1]3, where the components represent
the amount of the respective color; (0, 0, 0) represents the color black, (1, 1, 1)
stands for white. This is visualized in the so-called RGB cube; see Fig. 1.2. Also the
colors cyan, magenta, and yellow appear as corners of the color cube. To process
color images one often uses the so-called HSV space: a color is described by the
channels Hue, Saturation, and Value. In the HSV space a color is encoded by a
triplet (H, S, V ) ∈ [0, 360[ × [0, 100] × [0, 100]. The hue H is interpreted as an
angle, the saturation S and the value V as percentages. The HSV space is visualized
as a cylinder; see Fig. 1.3. Processing only the V-channel for the value (and leaving
the other channels untouched) often leads to fairly good results in practice.
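The following short Python sketch illustrates this strategy for a single pixel using the standard-library colorsys module: the color is converted to HSV, only the value channel is modified, and the result is converted back. The brightening factor and the convention that all channels lie in [0, 1] (which is what colorsys uses, rather than degrees and percentages) are choices made here for illustration.

```python
import colorsys

def brighten_value_channel(rgb, factor=1.2):
    """Scale only the V channel of an RGB color with components in [0, 1]."""
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # colorsys uses [0, 1] for H, S and V
    v = min(1.0, v * factor)                 # process V, leave hue and saturation untouched
    return colorsys.hsv_to_rgb(h, s, v)

print(brighten_value_channel((0.5, 0.2, 0.1)))
```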
The goal of image processing is to automate or facilitate the evaluation and
interpretation of images. One speaks of high-level methods if one obtains certain
information from the images (e.g., the number of objects, the viewpoint of the
camera, the size of an object, and even the meaning of the scene). Low-level methods
are methods that produce new and improved images out of given images. This book
treats mainly low-level methods.
For the automatic processing of images one usually focuses on certain properties
and structures of interest. These may be, for example:
Edges, corners: An edge describes the boundary between two different struc-
tures, e.g., between different objects. However, a region in the shade may also be
separated from a lighter area by an edge.
Smooth regions: Objects with uniform color appear as smooth regions. If the
object is curved, the illumination creates a smooth transition of the brightness.
Textures: The word “texture” mostly stands for something like a pattern. This
refers, for example, to the fabric of a cloth, the structure of wallpapers, or fur.
Periodic structures: Textures may feature some periodic structures. These struc-
tures may have different directions and different frequencies and also occur as
the superposition of different periodic structures.
Coherent regions: Coherent regions are regions with a similar orientation of
objects as, for example, in the structure of wood or hair.
If such structures are to be detected or processed automatically, one needs good
models for the structures. Is an edge adequately described by a sudden change of
brightness? Does texture occur where the gray values have a high local variance?
The choice of the model then influences how methods are derived.
1.2 The Basic Tasks of Imaging

Many problems in imaging can be reduced to only a few basic tasks. This section
presents some of the classical basic tasks. The following chapters of this book will
introduce several tools that can be used for these basic tasks. For a specific real-
Fig. 1.4 Unfavorable light conditions lead to noisy images. Left: Photo taken in dim light. Right:
Gray values along the depicted line
world application one usually deals with several problems and tasks and has to
combine and adapt methods or even invent new methods.
Denoising: Digital images contain erroneous information. Modern cameras that
can record images with several megapixels still produce noisy images, see
Fig. 1.4; in fact, it is usually the case that an increase of resolution also results
in a higher noise level. The camera’s chip uses the photon count to measure the
brightness. Since the emission of photons is fundamentally a random process,
the measurement is also a random variable and hence contains some noise. The
presence of noise is an inherent problem in imaging. The task of denoising is:
• Identify and remove the noise but at the same time preserve all important
information and structure.
Noise does not pose a serious problem for the human eye; we have no problem
with images with a high noise level. For computers, however, the situation is different. To successfully
denoise an image, one needs a good model for the noise and the image. Some
reasonable assumptions are, for example:
• The noise is additive.
• The noise is independent of the pixel and comes from some distribution.
• The image consists of piecewise smooth regions that are separated by lines.
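A minimal numerical sketch of the first two assumptions (additive noise that is independent and identically distributed over the pixels) could look as follows; the Gaussian distribution, the noise level sigma, and the synthetic test image are illustrative choices and not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.zeros((64, 64))
u[16:48, 16:48] = 1.0                       # a piecewise constant synthetic image

sigma = 0.1                                 # assumed noise level
noise = rng.normal(0.0, sigma, u.shape)     # i.i.d. at every pixel, same distribution
u_noisy = u + noise                         # the additive noise model
print(float(np.std(u_noisy - u)))           # empirical noise level, close to sigma
```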
In this book we will treat denoising at the following places: Example 3.12,
Example 3.25, Sect. 3.4.4, Sect. 3.5, Example 4.19, Example 5.5, Remark 5.21,
Example 5.39, Example 5.40, Example 6.1, Application 6.94, and Example 6.124.
Image decomposition: This usually refers to an additive decomposition of an
image into different components. The underlying assumption is that an image is
composed additively of parts of a different nature; a typical example is the
decomposition into a "cartoon" part and a "texture" part.
Here, “cartoon” refers to a rough sketch of the image in which textures and
similar components are omitted, and “texture” refers to these textures and other
fine structure.
A decomposition of an image can be successful if one has good models for the
different components.
In this book we will treat image decomposition at these places: Example 4.20,
Sect. 6.5.
Enhancement, deblurring: Besides noise, there are other errors that may be
present in images:
• Blur due to wrong focus: If the focus is not adjusted properly, one point in the
image is mapped to an entire region on the film or chip.
• Blur due to camera motion: The object or the camera may move during
exposure time. One point of the object is mapped to a line on the film or
chip.
• Blur due to turbulence: This occurs, e.g., as “shimmering” of the air above a
hot street, but is also present in the observation of astronomical objects.
• Blur due to erroneous optics: One of the most famous examples is the Hubble
Telescope. Only after the launch of the telescope was it recognized that one
mirror had not been made properly. Since a fix in orbit was not considered
appropriate at the beginning, elaborate digital methods to correct the resulting
errors were developed.
See Fig. 1.5 for illustrations of these defects.
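Both out-of-focus blur and motion blur are commonly modeled as a convolution of the sharp image with a blur kernel, a disk-like kernel for defocus and a short line segment for motion. The sketch below, which assumes SciPy is available, only illustrates this common model with crude stand-in kernels; convolution filters themselves are treated in Sect. 3.3.

```python
import numpy as np
from scipy.ndimage import convolve   # assumes SciPy is installed

u = np.zeros((64, 64))
u[30:34, 30:34] = 1.0                          # a small bright square as test image

box = np.ones((5, 5)) / 25.0                   # crude stand-in for an out-of-focus kernel
motion = np.zeros((5, 5))
motion[2, :] = 1.0 / 5.0                       # horizontal line: simple motion-blur kernel

defocused = convolve(u, box, mode="nearest")
motion_blurred = convolve(u, motion, mode="nearest")
```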
The goal of enhancement is to reduce the blur in images. The more is known
about the type of blur, the better. Noise is a severe problem for enhancement and
deblurring, since usually, noise is also amplified during deblurring or sharpening.
Methods for deblurring are developed at the following places in this book:
Application 3.24, Remark 4.21, Example 6.2, Application 6.97,
and Example 6.127.
Edge detection: One key component to the understanding of images is the
detection of edges:
• Edges separate different objects or an object from the background.
• Edges help to infer the geometry of a scene.
• Edges describe the shape of an object.
Edges pose different questions:
• How to define an edge mathematically?
• Edges exist at different scales (e.g., fine edges describe the shape of bricks,
while coarse edges describe the shape of a house). Which edges are important
and should be detected?
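A common first answer to the question of what an edge is, is a point where the gradient of the image is large in magnitude. The following sketch makes this idea concrete with forward differences; the threshold value is arbitrary and only serves the illustration.

```python
import numpy as np

def gradient_magnitude(u):
    """Forward-difference approximation of |grad u| for a 2D gray-value array."""
    dx = np.diff(u, axis=0, append=u[-1:, :])
    dy = np.diff(u, axis=1, append=u[:, -1:])
    return np.sqrt(dx ** 2 + dy ** 2)

u = np.zeros((32, 32))
u[:, 16:] = 1.0                        # a vertical edge between two constant regions
edges = gradient_magnitude(u) > 0.5    # crude edge indicator with an arbitrary threshold
```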
Fig. 1.5 Blur in images. Top left: Blur due to wrong focus; top right: motion blur due to camera
shake; bottom left: shimmering; bottom right: an image from the Hubble Telescope before the error
was corrected
Optical flow: For image sequences, the optical flow describes the apparent motion
of the gray values from one frame to the next. It does not always coincide with the
real motion of the objects in the scene; for example:
• A ball at rest is illuminated by a moving light source. The real field of motion
is zero, but the optical flow is not.
The correspondence problem is a consequence of the fact that the optical flow
and the real field of motion do not coincide. In some cases different fields of
motion may cause the same difference between two images. Also there may be
some points in one image that may have moved to more than one place in the
other image:
• A regular grid is translated. If we observe only a small part of the grid, we
cannot detect a translation that is approximately a multiple of the grid size.
• Different particles move. If the mean distance between particles is larger than
their movements, we may find a correspondence; if the movement is too large,
we cannot.
• A balloon is inflated. Since the surface of the object increases, one point does
not have a single trajectory, but splits up into multiple points.
The aperture problem is related to the correspondence problem. Since we see only
a part of the whole scene, we may not be able to trace the motion of some object
correctly. If a straight edge moves through the picture, we cannot detect any
motion along the direction of the edge. Similarly, we are unable to detect whether
a circle is rotating. This problem does not occur if the edges of the object have a
varying curvature, as illustrated by this picture:
Registration: The task of registration is to find a transformation that maps one
image onto a second image of the same scene. This problem is related to that of
determining the optical flow. Thus, there are similar
problems, but there are some differences:
• Both images may come from different imaging modalities with different
properties (e.g., the images may have a different range of gray values or
different characteristics of the noise).
• There is no notion of time regularity, since there are only two images.
• In practice, the objects may not be rigid.
As for optical flow, we will not treat registration in this book. Again, the methods
from Chap. 6 can be adapted for this problem, too; see Sect. 6.5. Moreover,
one may consult the book [100].
Restoration (inpainting): Inpainting means the reconstruction of destroyed parts
of an image. Reasons for missing parts of an image may be:
• Scratches in old photos.
• Occlusion of objects by other objects.
• Destroyed artwork.
• Occlusion of an image by text.
• Failure of sensors.
• Errors during transmission of images.
There may be several problems:
• If a line is covered, it is not clear whether there may have been two different,
separated, objects.
• If there is an occluded crossing, one cannot tell which line is in front and
which is in back.
We are going to treat inpainting in this book at the following places: Sect. 5.5,
Example 6.4, Application 6.98, and Example 6.128.
Compression: “A picture is worth a thousand words.” However, it needs even
more disk space:
1 letter = 1 byte
1 word ≈ 8 letters = 8 bytes
1000 words ≈ 8 KB.
1 pixel = 3 bytes
1 picture ≈ 4,000,000 pixels ≈ 12 MB
So one picture is worth about 1,500,000 words!
To transmit an uncompressed image with, say, four megapixels via email with an
upstream capacity of 128 kbit/s, the upload will take about 12 min. However,
image data is usually somewhat redundant, and an appropriate compression
allows for significant reduction of this time. Several different compression
methods have entered our daily lives, e.g. JPEG, PNG, and JPEG2000.
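The back-of-the-envelope figures above can be reproduced with a few lines of Python; the numbers used (4 megapixels, 3 bytes per pixel, an upstream of 128 kbit/s) are the assumptions from the text and not fixed constants.

```python
# Rough size and upload-time estimate for an uncompressed RGB image.
pixels = 4_000_000            # assumed 4-megapixel image
bytes_per_pixel = 3           # one byte each for R, G, B
size_bytes = pixels * bytes_per_pixel            # 12,000,000 bytes, about 12 MB

upstream_bits_per_second = 128_000               # assumed 128 kbit/s upstream
seconds = size_bytes * 8 / upstream_bits_per_second
print(f"size = {size_bytes / 1e6:.0f} MB, upload = {seconds / 60:.1f} min")  # about 12.5 min
```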
One distinguishes between lossless and lossy compression. Lossless compression
allows for a reconstruction of the image that is accurate bit by bit. Lossy
compression, on the other hand, allows for a reconstruction of an image that
is very similar to the original image. Inaccuracies and artifacts are allowed as
long as they are not disturbing to the human observer. This raises several questions, for example:
• How to measure the "similarity" of images?
• Compression should work for a large class of images. However, a simple
reduction of the color values works well for simple graphics or diagrams, but
not for photos.
We will treat compression of images in this book at the following places:
Sect. 3.1.3, Application 4.53, Application 4.73, and Remark 6.5.
Chapter 2
Mathematical Preliminaries
For image processing, mainly those aspects of functional analysis are of interest that
deal with function spaces (as mathematically, images are modeled as functions).
Later, we shall see that, depending on the space in which an image is contained,
Let K denote either the field of real numbers R or complex numbers C. For complex
numbers, the real part, the imaginary part, the conjugate, and the absolute value are
respectively defined by
z = a + ib with a, b ∈ R :  Re z = a,  Im z = b,  z̄ = a − ib,  |z| = √(z z̄).
In order to distinguish norms, we may add the name of the underlying vector
space to it, for instance, · X for the norm on X. It is also common to refer to X
itself as the normed space if the norm used is obvious due to the context. Norms
on finite-dimensional spaces will be denoted by | · | in many cases. Since in finite-
dimensional spaces all norms are equivalent, they play a different role from that in
infinite-dimensional spaces.
Example 2.2 (Normed Spaces) Obviously, the pair (K, | · |) is a normed space. For
N ≥ 1 and 1 ≤ p < ∞,

    |x|_p = ( Σ_{i=1}^{N} |x_i|^p )^{1/p}   and   |x|_∞ = max_{i∈{1,...,N}} |x_i|

define norms on K^N.
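The p-norms of Example 2.2 are available in NumPy, which allows a quick numerical check; the inequality asserted at the end is the standard finite-dimensional estimate |x|_∞ ≤ |x|_p ≤ N^{1/p} |x|_∞ and is stated here only for illustration.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
for p in (1, 2, np.inf):
    print(p, np.linalg.norm(x, p))   # |x|_1 = 8, |x|_2 = sqrt(26), |x|_inf = 4

# finite-dimensional equivalence of |.|_p and |.|_inf (illustration only)
N, p = x.size, 2
assert np.linalg.norm(x, np.inf) <= np.linalg.norm(x, p) <= N ** (1 / p) * np.linalg.norm(x, np.inf)
```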
For x ∈ X and r > 0, we define the open ball around x as the set

    B_r(x) = {y ∈ X : ‖x − y‖ < r}.
A subset U ⊂ X is
• open if it consists of interior points only, i.e., for every x ∈ U , there exists an
ε > 0 such that Bε (x) ⊂ U ,
• a neighborhood of x ∈ X if x is an interior point of U ,
• closed if it contains all of its limit points, i.e., whenever x ∈ X is such that for every
ε > 0 the ball Bε(x) intersects U, one also has x ∈ U,
• compact if every covering of U by a family of open sets has a finite subcover,
i.e.,

    V_i open, i ∈ I, with U ⊂ ⋃_{i∈I} V_i   ⇒   ∃ J ⊂ I, J finite, with U ⊂ ⋃_{j∈J} V_j .
which is why the latter is also referred to as a closed r-ball. We say that a subset
U ⊂ X is dense in X if its closure is all of X. Moreover, X is called separable if it possesses a
countable and dense subset.
Normed spaces are first countable (i.e., each point has a countable neighborhood
base, cf. [122]). That is why we can also describe the terms closed and compact also
by means of sequences and their convergence properties:
• We say that a sequence (xn ) : N → X converges to x ∈ X if (xn − x) is a null
sequence. This is also denoted by xn → x for n → ∞ or x = limn→∞ xn .
• The subset U is closed if and only if for every sequence (xn ) in U with xn → x,
the limit x lies in U as well (sequential closedness).
• The subset U is compact if and only if every sequence (xn ) in U has a convergent
subsequence (sequential compactness).
For nonempty subsets V ⊂ X, we naturally obtain a topology on V through
restriction, which is referred to as the relative topology. The notions introduced
above result simply through substituting X by the subset V in the respective
definitions.
is a normed space.
2. For a subspace U of a normed space (X, · X ), the pair (U, · X ) is a normed
space again. Its topology corresponds to the relative topology on U .
3. If (X, ‖·‖_X) is a normed space and Y ⊂ X is a closed subspace, the quotient
vector space X/Y = {[x] : x ∈ X}, where [x] denotes the equivalence class of x
with respect to x1 ∼ x2 if and only if x1 − x2 ∈ Y, endowed with
‖[x]‖_{X/Y} = inf_{y∈Y} ‖x − y‖_X, is again a normed space.
Continuity in x ∈ U can equivalently be expressed through sequential continuity,
i.e., for every sequence (xn ) in U , one has xn → x ⇒ F (xn ) → F (x) for n → ∞.
The weaker property that F is closed is expressed with sequences as follows: for
every sequence (xn ) in U with xn → x such that F (xn ) → y for some y ∈ Y , we
always have x ∈ dom F and y = F (x).
On normed spaces, a stronger notion of continuity is of importance as well:
Definition 2.5 (Lipschitz Continuity) A mapping F : X ⊃ U → Y between
the normed spaces (X, ‖·‖_X) and (Y, ‖·‖_Y) is called Lipschitz continuous if there
exists a constant C ≥ 0 such that

    ‖F(x1) − F(x2)‖_Y ≤ C ‖x1 − x2‖_X   for all x1, x2 ∈ U.

The infimum over all these constants C is called the Lipschitz constant.
Sets of uniformly continuous mappings can be endowed with a norm structure:
Definition 2.6 (Spaces of Continuous Mappings) Let U ⊂ X be a non-empty
subset of the normed space (X, · X ), endowed with the relative topology, and let
(Y, · Y ) be a normed space.
We denote the vector space of continuous mappings by

    C(U, Y) = {F : U → Y : F continuous}.

The set

    L(X, Y) = {F : X → Y : F linear and continuous},   ‖F‖_{L(X,Y)} = sup_{‖x‖_X ≤ 1} ‖F x‖_Y,

forms the space of linear and continuous mappings between X and Y. We also refer
to the norm given on L(X, Y) as the operator norm.
Linear and continuous mappings are often also called bounded linear mappings.
Note that densely defined and continuous mappings can be extended onto the
whole of X, which is why densely defined linear mappings that are not continuous
are often also called unbounded.
Normed spaces are also the starting point for the definition of the differentiability
of a mapping. Apart from the classical definition of Fréchet differentiability, we will
also introduce the weaker notion of Gâteaux differentiability.
Definition 2.9 (Fréchet Differentiability) Let F : U → Y be a mapping defined
on the open subset U ⊂ X between the normed spaces (X, · X ) and (Y, · Y ).
Then, F is Fréchet differentiable (or differentiable) at x ∈ U if there exists DF (x) ∈
L(X, Y ) such that for every ε > 0, there exists δ > 0 such that
    0 < ‖h‖_X < δ   ⇒   x + h ∈ U   and   ‖F(x + h) − F(x) − DF(x)h‖_Y / ‖h‖_X < ε.
The linear and continuous mapping DF (x) is also called the (Fréchet) derivative at
the point x.
If F is differentiable at every point x ∈ U , then F is called (Fréchet)
differentiable, and DF : U → L(X, Y ), given by DF : x → DF (x), denotes
the (Fréchet) derivative. If DF is continuous, we call F continuously differentiable.
For a k-linear mapping G : X × · · · × X → Y (k times), one sets

    ‖G‖ = inf { M > 0 : ‖G(x1, . . . , xk)‖_Y ≤ M ∏_{i=1}^{k} ‖xi‖_X for all (x1, . . . , xk) ∈ X^k },
where the latter norm coincides with the respective operator norm. Usually, we
regard the kth derivative as an element in Lk (X, Y ) (the space of k-linear continuous
mappings), or equivalently, as a mapping Dk F : U → Lk (X, Y ). If such a derivative
exists in x ∈ U and is continuous at this point, then Dk F (x) is symmetric.
Example 2.10 (Differentiable Mappings)
• A linear and continuous mapping F ∈ L(X, Y ) is infinitely differentiable with
itself as the first derivative and 0 as every higher derivative.
• On KN , every polynomial is infinitely differentiable.
• Functions on U ⊂ KN that possess continuous partial derivatives are also
continuously differentiable.
In the case of functions, i.e., X = RN and Y = K, the following notations are
common:
    ∇F = ( ∂F/∂x_1 , . . . , ∂F/∂x_N ),        ∇²F = ( ∂²F/(∂x_i ∂x_j) )_{i,j=1,...,N} .

By means of that notation and under the assumption that the respective partial
derivatives are continuous, the Fréchet derivatives can be represented by matrix-vector
products:

    DF(x)h = ∇F(x) · h ,        D²F(x)(h_1, h_2) = h_1 · ∇²F(x) h_2 .
For higher derivatives, there are similar summation notations. In the former
situation, the vector field ∇F is called the gradient of F , while the matrix-valued
mapping ∇ 2 F is, slightly abusively, referred to as the Hessian matrix. In the case
of the gradient, it has become common in the literature not to distinguish between a
row vector and a column vector—in the sense that one can multiply the gradient at
a point by a matrix from the left as well. We will also make use of this fact as long
as ambiguities are impossible, but we will point this out again at suitable locations.
Apart from that, let us introduce two specific and frequently used differential
operators: For a vector field F : U → KN with U ⊂ RN a nonempty open subset,
the function

    div F = trace ∇F = Σ_{i=1}^{N} ∂F_i/∂x_i

is called the divergence and the associated operator div the divergence operator. For
functions F : U → K, the operator Δ with

    ΔF = trace ∇²F = Σ_{i=1}^{N} ∂²F/∂x_i²

is called the Laplace operator.
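Gradient, divergence, and Laplace operator reappear later in the book when partial differential equations and variational methods are discretized on the pixel grid. The following sketch shows one possible finite-difference discretization (forward differences for the gradient, backward differences for the divergence, unit grid spacing); the concrete scheme is a choice made here for illustration and not notation from the text.

```python
import numpy as np

def grad(u):
    """Discrete gradient via forward differences (Neumann-type boundary)."""
    ux = np.diff(u, axis=0, append=u[-1:, :])
    uy = np.diff(u, axis=1, append=u[:, -1:])
    return ux, uy

def div(px, py):
    """Discrete divergence via backward differences (zero boundary)."""
    dx = np.diff(px, axis=0, prepend=np.zeros((1, px.shape[1])))
    dy = np.diff(py, axis=1, prepend=np.zeros((py.shape[0], 1)))
    return dx + dy

def laplace(u):
    """Discrete Laplacian as div(grad u), i.e., the trace of the Hessian."""
    return div(*grad(u))

u = np.fromfunction(lambda i, j: (i - 8) ** 2 + (j - 8) ** 2, (17, 17))
print(laplace(u)[8, 8])  # approximately 4, the trace of the constant Hessian diag(2, 2)
```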
For a multi-index α ∈ N^d, we write

    ∂^α/∂x^α = ∂^{α_1}/∂x_1^{α_1} · · · ∂^{α_d}/∂x_d^{α_d} .

We will also use the notation ∂^α = ∂^α/∂x^α.
By |α| = Σ_{k=1}^{d} α_k, we denote the order of the multi-index. By means of multi-indices,
we can, for instance, formulate the Leibniz rule for the higher-order derivatives of a
product in a compact fashion: with α! = ∏_{k=1}^{d} α_k! and (α choose β) = α!/(β!(α−β)!) for
β ≤ α (i.e., β_k ≤ α_k for 1 ≤ k ≤ d), one has

    ∂^α(f g) = Σ_{β≤α} (α choose β) ∂^{α−β}f ∂^β g .
Apart from the concept of normed spaces, the notion of completeness, i.e., the
existence of limits of Cauchy sequences, is essential for various fundamental results.
Definition 2.12 (Banach Space) A normed space (X, · X ) is complete if every
Cauchy sequence converges in X, i.e., for every sequence (xn ) in X with the
property
for all ε > 0, there exists n0 ∈ N such that for all n, m ≥ n0, one has ‖xn − xm‖_X < ε,
there exists a limit x ∈ X with xn → x. A complete normed space is called a Banach space.
The subspaces of X and X∗ can be related in the following way: For a subspace
U ⊂ X, the annihilator is defined as the set
U⊥ = {x∗ ∈ X∗ : x∗(x) = 0 for all x ∈ U} in X∗,
and analogously, for a subspace V ⊂ X∗, as V⊥ = {x ∈ X : x∗(x) = 0 for all x∗ ∈ V} in X.
The sets U ⊥ and V ⊥ are always closed subspaces. Annihilators are used for the
characterizations of dual spaces of subspaces.
Example 2.17 (Dual Spaces)
1. The dual space X∗ of an N-dimensional normed space X is again N-dimensional
and hence isomorphic to X itself. In particular, (K^N, ‖·‖)∗ = (K^N, ‖·‖∗), where ‖·‖∗ is
the norm dual to ‖·‖.
2. The dual space of Y = X1 × · · · × XN of Example 2.3 can be regarded as
Y∗ = X1∗ × · · · × XN∗ with

    ‖(x1∗, . . . , xN∗)‖_{Y∗} = ‖( ‖x1∗‖_{X1∗}, . . . , ‖xN∗‖_{XN∗} )‖∗ .
The consideration of the dual space is the basis for the notion of weak
convergence, for instance, and in many cases, X∗ reflects important properties of
the predual space X. It is common to regard the application of elements in X∗ to
elements in X as a bilinear mapping called the duality pairing:

    ⟨x∗, x⟩_{X∗×X} = x∗(x).

The subscript is often omitted if the spaces are evident from the context. Of course,
one can iterate the generation of the dual space, the next space being the bidual
space X∗∗ . This space naturally contains X, and the canonical injection is given by
Hence, the bidual space is always at least as large as the original space. We can
now consider the closure of J (X) in X∗∗ and naturally obtain a Banach space that
contains X in a certain sense.
Theorem 2.18 (Completion of Normed Spaces) For every normed space
(X, · X ), there exists a Banach space (X̃, · X̃ ) and a norm-preserving mapping
J : X → X̃ such that J (X) is dense in X̃.
Often, one identifies X ⊂ X̃ and considers the completed space X̃ instead of
X. A completion can also be constructed by taking equivalence classes of Cauchy
sequences; according to the inverse mapping theorem, this procedure yields an
equivalent Banach space.
The notion of the bidual space is also essential in another context: If the injection
J : X → X∗∗ is surjective, X is called reflexive. Reflexive spaces play a particular
role in the context of weak convergence:
Definition 2.19 (Weak Convergence, Weak*-Convergence) A sequence (xn) in
a normed space (X, ‖·‖_X) converges weakly to some x ∈ X if for every x∗ ∈ X∗,
one has x∗(xn) → x∗(x) for n → ∞, which we denote by xn ⇀ x. A sequence (xn∗)
in the dual space X∗ converges in the weak* sense to x∗ ∈ X∗ if xn∗(x) → x∗(x)
for every x ∈ X, which we also denote by xn∗ ⇀∗ x∗ for n → ∞.
Note that the definition coincides with the notion of the convergence of sequences
in the weak or weak* topology, respectively. However, we will not go into further
detail here. While the convergence in the norm sense implies convergence in the
weak sense, the converse holds only in finite-dimensional spaces. Also, in the
dual space X∗ , weak* convergence is in general a weaker property than weak
convergence; for reflexive spaces, however, the two notions coincide. According
to Theorem 2.15 (Banach-Steinhaus), weakly or weak*-convergent sequences,
respectively, are at least still bounded, i.e., xn ⇀ x for n → ∞ implies
sup_n ‖xn‖_X < ∞, and the analogue holds for weak* convergence.
Of course, notions such as continuity and closedness of mappings can be
generalized for these types of convergence.
Definition 2.20 (Weak/Weak* Continuity, Closedness) Let X, Y be normed
spaces and U ⊂ X a nonempty subset.
• A mapping F : U → Y is called {strongly, weakly}-{strongly, weakly}
continuous if the {strong, weak} convergence of a sequence (xn ) to some x ∈ U
implies the {strong, weak} convergence of (F (xn )) to F (x).
• The mapping is {strongly, weakly}-{strongly, weakly} closed if the {strong,
weak} convergence of (xn ) to some x ∈ X and the {strong, weak} convergence
of (F (xn )) to some y ∈ Y imply: x ∈ U and y = F (x).
In the case that X or Y is a dual space, the corresponding weak* terms are defined
analogously with weak* convergence.
One of the main reasons to study these types of convergences is compactness
results, whose assertions are similar to the Heine Borel theorem in the finite-
dimensional case.
• A subset U ⊂ X is weakly sequentially compact if every sequence in U possesses
a weakly convergent subsequence with limit in U .
• We say that a subset U ⊂ X∗ is weak*-sequentially compact if the analogue
holds for weak* convergence.
Theorem 2.21 (Banach-Alaoglu for Separable Spaces) Every closed ball in the
dual space of a separable normed space is weak*-sequentially compact.
Theorem 2.22 (Eberlein Šmulyan) A normed space is reflexive if and only if
every closed ball is weakly sequentially compact.
Dual spaces and weak and weak* convergence can naturally be used in connec-
tion with linear and continuous mappings as well. Corresponding examples are the
adjoint as well as the notions of weak and weak* sequential continuity.
Definition 2.23 (Adjoint Mapping) For F ∈ L(X, Y), the adjoint mapping
F∗ ∈ L(Y∗, X∗) is defined by ⟨F∗ y∗, x⟩_{X∗×X} = ⟨y∗, F x⟩_{Y∗×Y} for all x ∈ X and y∗ ∈ Y∗.
The circumstances under which these identities hold for interchanged annihilators
are given in the following theorem on linear mappings with a closed range.
The concept of a Banach space is very general. Thus, it is not surprising that many
desired properties (such as reflexivity, for instance) do not hold in general and have
to be required separately. In the case of a Hilbert space, however, one has additional
structure at hand due to the inner product, which naturally yields several properties.
Let us give a brief summary of the most important of these properties.
Definition 2.30 (Inner Product) Let X be a K-vector space. A mapping ( · , · )X :
X × X → K is called an inner product if it satisfies:
1. (λ1 x1 + λ2 x2, y)_X = λ1 (x1, y)_X + λ2 (x2, y)_X for x1, x2, y ∈ X and λ1, λ2 ∈ K, (linearity)
2. (x, y)_X is the complex conjugate of (y, x)_X for x, y ∈ X, (Hermitian symmetry)
3. (x, x)_X ≥ 0 and (x, x)_X = 0 ⇔ x = 0. (positive definiteness)
An inner product on X induces a norm by ‖x‖_X = √((x, x)_X), which satisfies the
Cauchy-Schwarz inequality:

    |(x, y)_X| ≤ ‖x‖_X ‖y‖_X   for all x, y ∈ X.
Based on that, one can easily deduce the continuity of the inner product as well.
For a K-vector space with inner product and the associated normed space
(X, · X ), the terms inner
product space
and pre-Hilbert space are common. We
also denote this space by (X, (·,·)_X). Together with the notion of completeness,
we are led to the following definition.
Definition 2.31 (Hilbert Space) A Hilbert space is a complete pre-Hilbert space
(X, (·,·)_X). Depending on K = R or K = C, it is called a real or a complex
Hilbert space, respectively.
Example 2.32 For N ≥ 1, the set KN is a finite-dimensional Hilbert space if the
inner product is chosen as

    (x, y)_2 = Σ_{i=1}^{N} x_i ȳ_i .
We call this inner product the Euclidean inner product and also write x ·y = (x, y)2 .
Analogously, the set ℓ² = {x : N → K : Σ_{i=1}^{∞} |x_i|² < ∞}, endowed with the
inner product

    (x, y)_2 = Σ_{i=1}^{∞} x_i ȳ_i ,
yields an infinite-dimensional separable Hilbert space. (Note that the inner product
is well defined as a consequence of the Cauchy-Schwarz inequality.)
For (pre-)Hilbert spaces, the notion of orthogonality is characteristic:
Definition 2.33 (Orthogonality) Let (X, (·,·)_X) be a pre-Hilbert space.
• Two elements x, y ∈ X are called orthogonal if (x, y)_X = 0, which we also
denote by x ⊥ y. A set U ⊂ X whose elements are mutually orthogonal is called
an orthogonal system.
• The subspaces U, V ⊂ X are orthogonal, denoted by U ⊥ V , if orthogonality
holds for every pair (x, y) ∈ U × V . For a subspace W ⊂ X with W = U + V ,
we also write W = U ⊥ V .
• For a subspace U ⊂ X, the subspace of all vectors orthogonal to U,

    U⊥ = {y ∈ X : x ⊥ y for all x ∈ U},

is called the orthogonal complement of U.
The analogous assertion on the square of norms of sums remains true for finite
orthogonal systems as well as countable orthogonal systems whose series converge
in X.
The orthogonal complement U ⊥ is closed and for closed subspaces U in the
Hilbert space X, one has X = U ⊥ U ⊥ . That implies the existence of the
orthogonal projection onto U: if (X, (·,·)_X) is a Hilbert space and U ⊂ X is a
closed subspace, then there exists a unique P ∈ L(X, X) with P x ∈ U and
‖x − P x‖_X = min_{u∈U} ‖x − u‖_X for every x ∈ X.
For an orthonormal system U ⊂ X, the Bessel inequality holds:

    ‖x‖²_X ≥ Σ_{y∈U} |(x, y)_X|²   for all x ∈ X,

where (x, y)_X ≠ 0 is true for at most countably many y ∈ U, i.e., the sum is to be
interpreted as a convergent series.
If U is complete, then equality holds as long as X is a Hilbert space. This
relationship is called Parseval’s identity:
    x ∈ X :   ‖x‖²_X = Σ_{y∈U} |(x, y)_X|² .

The latter can also be interpreted as a special case of the Parseval identity

    x1, x2 ∈ X :   (x1, x2)_X = Σ_{y∈U} (x1, y)_X (x2, y)_X ,

in which, for K = C, the second factor is to be complex conjugated.
One can show that in every Hilbert space, there exists a complete orthonormal
system (cf. [22]). Furthermore, a Hilbert space X is separable if and only if there
exists a countable orthonormal basis in X. Due to the Parseval identity, every separable
Hilbert space is thus isometrically isomorphic to either ℓ² or K^N for some N ≥ 0. In
particular, the Parseval relation implies that every sequence of orthonormal vectors
(x_n) converges weakly to zero.
Let us finally consider the dual spaces of Hilbert spaces. Since the inner product
is continuous, for every y ∈ X we obtain an element JX y ∈ X∗ by means of
⟨JX y, x⟩_{X∗×X} = (x, y)_X. The mapping JX : X → X∗ is semilinear, i.e., we have

    JX(y1 + y2) = JX y1 + JX y2   and   JX(λ y) = λ̄ JX y   for y, y1, y2 ∈ X, λ ∈ K.
The remarkable property of a Hilbert space is now that the range of JX is the whole
of the dual space:
Theorem 2.35 (Riesz Representation Theorem) Let X, ( · , · )X be a Hilbert
space. For every x∗ ∈ X∗, there exists y ∈ X with ‖y‖_X = ‖x∗‖_{X∗} such that
x∗ = JX y, i.e., x∗(x) = (x, y)_X for all x ∈ X.
The notion of the Lebesgue integral is based on measuring the contents of sets by
means of a so-called measure.
Definition 2.36 (Measurable Space) Let Ω be a nonempty set. A family of subsets
F is a σ-algebra if
1. ∅ ∈ F,
2. for all A ∈ F, one has Ω \ A ∈ F,
3. Ai ∈ F, i ∈ N, implies that ⋃_{i∈N} Ai ∈ F.
The pair (Ω, F) is called a measurable space, and the sets in F are called
measurable. The smallest σ-algebra that contains a family of subsets G is the σ-
algebra induced by G. For a topological space Ω, the σ-algebra induced by the
open sets is called a Borel algebra, denoted by B(Ω).
A mapping μ : F → [0, ∞] with μ(∅) = 0 is called a measure if it is σ-additive, i.e., if
for pairwise disjoint Ai ∈ F, i ∈ N, one has

    μ( ⋃_{i∈N} Ai ) = Σ_{i=1}^{∞} μ(Ai).
If μ(Ω) < ∞, then μ is a finite measure; in the special case that μ(Ω) = 1, it is
called a probability measure. If there exists a sequence (Ai) in F for which μ(Ai) <
∞ for all i ∈ N as well as Ω = ⋃_{i∈N} Ai, then the measure is called σ-finite.
The triple (Ω, F, μ) is called a measure space.
Often, the concrete measurable space to which a measure is associated is of
interest: in the case F = B(Ω), we speak of a Borel measure; if additionally one
has μ(K) < ∞ for all compact sets K, we call μ a positive Radon measure.
Example 2.38 (Measures)
1. On every nonempty set Ω, a measure is defined on the power set F = P(Ω) in
an obvious manner:

    μ(A) = card(A) if A is finite,   μ(A) = ∞ otherwise.

This measure is called the counting measure on Ω. One can restrict it to a Borel
measure, but in general, it is not a positive Radon measure: in the important
special case of the standard topology on R^d, there are compact sets with infinite
measure.
2. The following example is of a similar nature: for Ω ⊂ R^d, x ∈ Ω, and A ∈ F =
B(Ω),

    δx(A) = 1 if x ∈ A,   δx(A) = 0 otherwise,

defines a probability measure δx, the Dirac measure in x.
3. For half-open cuboids [a, b[ = {x ∈ R^d : ai ≤ xi < bi} ∈ B(R^d) with a, b ∈
R^d, we define

    L^d([a, b[) = ∏_{i=1}^{d} (bi − ai).

One can show that this function possesses a unique extension to a Radon measure
on B(R^d) (cf. [123]). This measure, denoted by L^d as well, is the d-dimensional
Lebesgue measure. It corresponds to the intuitive idea of the "volume" of a d-
dimensional set Ω, and we also write |Ω| = L^d(Ω).
4. A different approach to the Lebesgue measure assumes that the volume of k-
dimensional unit balls is known: For an integer k ≥ 0, this volume is given by
    ω_k = π^{k/2} / Γ(1 + k/2),        Γ(k) = ∫_0^∞ t^{k−1} e^{−t} dt,

where Γ is known as the gamma function. For k ∈ [0, ∞[, a volume can be
defined even for “fractional dimensions.” For an arbitrary bounded set A ⊂ Rd ,
one now expects that the k-dimensional volume is at most ωk diam(A)k /2k with
diam(A) = sup {|x − y| x, y ∈ A}, diam(∅) = 0.
The k-dimensional Hausdorff measure of A is then defined as

    H^k(A) = lim_{δ→0} (ω_k / 2^k) inf { Σ_{i=1}^{∞} diam(Ai)^k : A ⊂ ⋃_{i∈N} Ai, diam(Ai) < δ }.
A ∈ Fμ ⇔ A = B ∪ N, B ∈ F, N null set,
is the completion of F with respect to μ. Its elements are the μ-measurable sets.
• For A ∈ Fμ , we extend μ(A) = μ(B) using B ∈ F above.
The extension of μ to Fμ results in a measure again, which we tacitly denote by μ
as well. For the Lebesgue measure, the construction B(Rd )Ld yields the Lebesgue
measurable sets. The measure space

    (Ω, B(Ω)_{L^d}, L^d)

associated to Ω ∈ B(R^d)_{L^d} presents the basis for the standard Lebesgue integration
on Ω. If nothing else is mentioned explicitly, the notions of measure and integration
theory refer to this measure space.
For measurable functions with finite range, called step functions, the integral is
defined by

    ∫_Ω f(t) dμ(t) = ∫_Ω f dμ = Σ_{a∈f(Ω)} a · μ(f^{−1}({a})) ∈ [0, ∞].
Otherwise, we set

    ∫_Ω f(t) dμ(t) = ∫_Ω f dμ = lim_{n→∞} ∫_Ω fn dμ

for a sequence (fn) of step functions that increases pointwise to f.
• In connection with the integral sign, several special notations are common: if A is
a μ-measurable subset of Ω, then the integral of a measurable/integrable f over
A is defined by

    ∫_A f dμ = ∫_Ω f̄ dμ,        f̄(t) = f(t) if t ∈ A,   f̄(t) = 0 otherwise;

for the Lebesgue measure one also writes ∫_A f(t) dt,
where the latter notation can lead to misunderstandings and is used only if it is
evident that f is a function of t.
The notions μ-measurability and μ-integrability for non-negative functions as
well as vector-valued mappings present the respective analogues for the completed
measure space (Ω, Fμ, μ). They constitute the basis for the Lebesgue spaces, which
we will introduce now as spaces of equivalence classes of measurable functions.
f ∼g ⇔ f =g almost everywhere.
which yields the triangle inequality for p ∈ [1, ∞[; the case p = ∞ can
be proved in a direct way. Finally, the requirement of the positive definiteness
reflects the fact that a transition to equivalence classes is necessary, since the
integral of a nonnegative, measurable function vanishes if and only if it is zero
almost everywhere.
For sequences of measurable or integrable functions in the sense of Lebesgue,
several convergence results hold. The most important of these findings are as
follows. Proofs can be found in [50, 53], for instance.
Lemma 2.46 (Fatou's Lemma) Let (Ω, Fμ, μ) be a measure space and (fn) a
sequence of nonnegative measurable functions fn : Ω → [0, ∞], n ∈ N. Then,

    ∫_Ω lim inf_{n→∞} fn(t) dμ(t) ≤ lim inf_{n→∞} ∫_Ω fn(t) dμ(t).
    ‖f‖_p = ( Σ_{n=1}^{∞} ‖fn‖_X^p )^{1/p},        ‖f‖_∞ = sup_{n∈N} ‖fn‖_X.
Due to the above results, Lebesgue spaces become part of Banach and Hilbert
space theory. This also motivates further topological considerations, such as the
p
characterization of the dual spaces, for instance: if Lμ (, X) is a Hilbert space,
i.e., p = 2 and X is a Hilbert space, then the Riesz representation theorem
2.2 Elements of Measure and Integration Theory 41
∗
(Theorem 2.35) immediately yields L2μ (, X) = L2μ (, X) in the sense of the
Hilbert space isometry
∗
J : L2μ (, X) → L2μ (, X) , (Jf ∗ )f = (f (t), f ∗ (t)) dμ(t).
is compact in Ω.
We denote the subspace of continuous functions with compact support by
Cc(Ω, X) = {f ∈ C(Ω, X) : f has compact support} ⊂ C(Ω, X).
Theorem 2.55 For 1 ≤ p < ∞, the set Cc(Ω, X) is dense in L^p(Ω, X), i.e., for
every f ∈ L^p(Ω, X) and every ε > 0, there exists some g ∈ Cc(Ω, X) such that
‖f − g‖_p ≤ ε.
Apart from Lebesgue spaces, which contain classes of measurable functions,
we also consider Banach spaces of measures. In particular, we are interested in
the spaces of signed or vector-valued Radon measures. These spaces, in some
sense, contain functions, but additionally also measures that cannot be interpreted
as functions or equivalence classes of functions. Thus, they already present a set
of generalized functions, i.e., of objects that in general can no longer be evaluated
pointwise. In the following, we give a summary of the most important corresponding
results, following the presentation in [5, 61].
Definition 2.56 (Vector-Valued Measure) A function μ : F → X on a measurable
space (Ω, F) into a finite-dimensional Banach space X is called a vector-valued
measure if
1. μ(∅) = 0 and
2. for Ai ∈ F, i ∈ N, with Ai mutually disjoint, one has

    μ( ⋃_{i∈N} Ai ) = Σ_{i=1}^{∞} μ(Ai).
The total variation of a vector-valued measure μ is defined by

    A ∈ F :   |μ|(A) = sup { Σ_{i=1}^{∞} ‖μ(Ai)‖_X : A = ⋃_{i∈N} Ai, Ai ∈ F mutually disjoint }.
Since ‖μf‖_M = ‖f‖_1, this mapping is injective and its range is a closed subspace.
and therefore, it yields a finite, even positive, Radon measure that coincides with the
restriction of the one-dimensional Lebesgue measure to [a, b].
By means of the characterization of M(Ω, X) as a dual space, we immediately
obtain the notion of weak* convergence in the sense of Definition 2.19: in fact,
μn ⇀∗ μ for a sequence (μn) in M(Ω, X) and μ ∈ M(Ω, X) if and only if

    ∫_Ω f dμn → ∫_Ω f dμ   for all f ∈ C0(Ω, X∗).
For two measure spaces given on Ω1 and Ω2, respectively, one can easily construct
a measure on the Cartesian product Ω1 × Ω2. This, for example, is helpful for
integration on R^{d1+d2} = R^{d1} × R^{d2}.
Definition 2.65 (Product Measure) For measure spaces (Ω1, F1, μ1) and
(Ω2, F2, μ2), the product F1 ⊗ F2 denotes the σ-algebra induced by the sets
A × B, A ∈ F1 and B ∈ F2.
A product measure μ1 ⊗ μ2 is a measure given on F1 ⊗ F2 that satisfies

    (μ1 ⊗ μ2)(A × B) = μ1(A) μ2(B)   for all A ∈ F1, B ∈ F2.
Remark 2.67 The respective assertions hold for the completed measure space
(Ω1 × Ω2, (F1 ⊗ F2)_{μ1⊗μ2}, μ1 ⊗ μ2) as well.
Example 2.68 One can show that the product of Lebesgue measures is again a
Lebesgue measure: L^{d−n} ⊗ L^n = L^d for integers 1 ≤ n < d (cf. [146]). This fact
facilitates integration on R^d:

    ∫_{R^d} f(t) dt = ∫_{R^n} ∫_{R^{d−n}} f(t1, t2) dt1 dt2,        t = (t1, t2),
for f ∈ L1 (Rd , X). According to Fubini’s theorem, the value of the integral does
not depend on the order of integration.
For a measure space (Ω1, F1, μ1), a measurable space (Ω2, F2), and a measurable
mapping ϕ : Ω1 → Ω2, one can abstractly define the pushforward measure:

    μ2(A) = μ1(ϕ^{−1}(A)),   A ∈ F2.
Fig. 2.1 The mappings ψj pull the boundary of a Lipschitz domain “locally straight”
    supp ϕj ⊂⊂ Uj,   ϕj(x) ∈ [0, 1] ∀x ∈ R^d,   Σ_{j=0}^{n} ϕj(x) = 1 ∀x ∈ Ω.
    ∫_{∂Ω} f(x) dH^{d−1}(x) = Σ_{j=1}^{n} ∫_{Vj ∩ R^{d−1}} Jj(x) ((ϕj f) ∘ ψj)(x, 0) dL^{d−1}(x),

where Jj(x) = √( det( Dj(x)^T Dj(x) ) ) with Dj(x) corresponding to the first d −
1 columns of the Jacobian matrix ∇ψj(x, 0) and Dj(x)^T corresponding to its
transpose. In particular, the function Jj can be defined L^{d−1}-almost everywhere
in Vj ∩ R^{d−1} and is measurable and essentially bounded there.
3. There exists an H^{d−1}-measurable mapping ν : ∂Ω → R^d, called the outer
normal, with |ν(x)| = 1 H^{d−1}-almost everywhere such that for all vector fields
f ∈ L¹(∂Ω, R^d),

    ∫_{∂Ω} (f · ν)(x) dH^{d−1}(x) = Σ_{j=1}^{n} ∫_{Vj ∩ R^{d−1}} Jj(x) ((ϕj f) ∘ ψj · Ej)(x, 0) dL^{d−1}(x),

where Ej is given by Ej(x) = (∇ψj(x)^{−T} e_d)/|∇ψj(x)^{−T} e_d| and ∇ψj(x)^{−T}
denotes the inverse of the transposed Jacobian matrix ∇ψj(x).
and analogously C^k(Ω), endowed with the norm ‖f‖_{k,∞} = max_{|α|≤k} ‖∂^α f/∂x^α‖_∞.
The space of functions that are infinitely differentiable is given by

    C^∞(Ω) = {f : Ω → K : ∂^α f/∂x^α ∈ C(Ω) for all α ∈ N^d},
∂^α φn → ∂^α φ uniformly in Ω.
The Dirac measures of Example 2.38 are called Dirac distributions or delta
distributions in this context.
If for f ∈ L¹_loc(Ω) there exists g ∈ L¹_loc(Ω) such that ∫_Ω f ∂^α φ/∂x^α dx = (−1)^{|α|} ∫_Ω g φ dx for all test functions φ,
we say that the weak derivative ∂^α f = g exists. If for an integer m ≥ 1 and all
multi-indices α with |α| ≤ m, the weak derivatives ∂ α f exist, then f is m-times
weakly differentiable.
Note that we denote the classical as well as the weak derivative by the same
symbol, which normally will not lead to ambiguities. If necessary, we will explicitly
state which derivative is meant. According to Lemma 2.75 (fundamental lemma
of the calculus of variations), the weak derivative is uniquely determined almost
everywhere.
Definition 2.79 (Sobolev Spaces) Let 1 ≤ p ≤ ∞ and m ∈ N. The Sobolev
space H^{m,p}(Ω) is the set

    H^{m,p}(Ω) = {f ∈ L^p(Ω) : ∂^α f ∈ L^p(Ω) for |α| ≤ m},

endowed with the norm ‖f‖_{m,p} = ( Σ_{|α|≤m} ‖∂^α f‖_p^p )^{1/p} (with the usual modification for p = ∞).
The Sobolev spaces are Banach spaces; for 1 < p < ∞, they are reflexive, and
for p = 2, together with the inner product
(f, g)_{H^m} = Σ_{|α|≤m} (∂^α f, ∂^α g)_2,
property of the trace is the fact that Gauss’s theorem holds for Sobolev functions on
Lipschitz domains:
Theorem 2.81 (Gauss's Theorem, Weak Form) Let Ω be a bounded Lipschitz
domain and f ∈ H^{1,1}(Ω, K^d) = H^{1,1}(Ω)^d a Sobolev vector field. Then
∫_{∂Ω} f · ν dH^{d−1} = ∫_Ω div f dx,
where f ∈ L^1_{H^{d−1}}(∂Ω) is the Sobolev trace of Theorem 2.80 and ν is the outer
normal introduced in Theorem 2.73. In particular, we have for f ∈ H^{1,p}(Ω, K^d)
and g ∈ H^{1,p*}(Ω),
∫_{∂Ω} f g · ν dH^{d−1} = ∫_Ω (g div f + f · ∇g) dx.
The proof is again based on the fact that the assertion holds for smooth functions
and vector fields. Then density arguments transfer the result to the general case. For
the second claim, we additionally use that the product satisfies f g ∈ H^{1,1}(Ω, K^d)
and div(f g) = g div f + f · ∇g.
Chapter 3
Basic Tools
In Sect. 1.1 we considered images with continuous and discrete image domains. In
this book, we essentially work with continuous image domains. However, there are
good reasons to deal with discrete images and in particular with the connection of
discrete and continuous images:
• In practice, images are given in discrete form.
• In order to apply continuous methods to discrete images, the method has to
be discretized. This can, for instance, be achieved by interpolating the discrete
image to a continuous one and then employing the continuous method.
• Also images that are given in discrete form often stem from “continuous
brightness distributions.” For this purpose, the continuous scene was sampled.
What does this sampled image have to do with the real image?
Let us first deal with the interpolation of images.
3.1.1 Interpolation
Suppose, for instance, that we want to rotate an image: we obtain each pixel of the
rotated image by calculating where this pixel was located before the rotation. This
point, however, will in general not be a pixel of the original image, i.e., we have
to evaluate the image at intermediate points. Of course, the same happens for other
geometric transformations such as stretching, shrinking, shearing, and shifting, for
instance. Let us define the following geometric operations on images, which we will
encounter frequently in this book:
Definition 3.1 For y ∈ R^d and A ∈ R^{d×d}, we define
t_y : R^d → R^d,  t_y(x) = x + y,   and   d_A : R^d → R^d,  d_A(x) = Ax.
By means of that, we define the linear operators for translation (shifting) and linear
coordinate transformation (scaling) on C(R^d) by
T_y u = u ∘ t_y   and   D_A u = u ∘ d_A.
Remark 3.2 The operators T_y and D_A act “from the right,” i.e., they are applied
before the use of the function u. For concatenation, one has, for instance,
T_y D_A = D_A T_{Ay}
(cf. Exercise 3.2).
u(x) = U_j  if  j − 1/2 ≤ x < j + 1/2,   i.e.,   u(x) = U_{⌊x + 1/2⌋},
where ⌊y⌋ denotes the greatest integer less than or equal to y. This interpolation
is also called nearest-neighbor interpolation and can be interpreted as a
spline interpolation of zeroth order. For this purpose, we define the vector space
V^0 = {u : [1/2, N + 1/2] → R | u|_{[j−1/2, j+1/2[} is constant for j = 1, . . . , N}.
In this vector space, the plateau functions form a basis. These functions are given
by φ_j^0(x) = T_{−j} φ^0(x) = φ^0(x − j), i.e., translations of
φ^0(x) = χ_{[−1/2, 1/2[}(x) = { 1 if x ∈ [−1/2, 1/2[,  0 else. }
The nearest-neighbor interpolation thus reads
u(x) = Σ_{j=1}^{N} U_j φ_j^0(x).
Obviously, a basis of this vector space is given by the hat functions φ_j^1(x) =
T_{−j} φ^1(x) = φ^1(x − j) with
φ^1(x) = { x + 1 if x ∈ [−1, 0[,   1 − x if x ∈ [0, 1[,   0 otherwise. }
The corresponding piecewise linear interpolation is
u(x) = Σ_{j=1}^{N} U_j φ_j^1(x).
A function that will play a major role in Sect. 4.2.2 is the sinc function:
sinc(x) = { sin(πx)/(πx) if x ≠ 0,   1 if x = 0. }
The corresponding interpolation is given by
u(x) = Σ_{j=1}^{N} U_j T_{−j} sinc(x).
For two-dimensional images U ∈ R^{N×M}, the interpolations are obtained as tensor
products. The nearest-neighbor interpolation reads
u(x, y) = U_{⌊x+1/2⌋, ⌊y+1/2⌋} = Σ_{i=1}^{N} Σ_{j=1}^{M} U_{i,j} φ_i^0(x) φ_j^0(y),
the piecewise bilinear interpolation reads
u(x, y) = Σ_{i=1}^{N} Σ_{j=1}^{M} U_{i,j} φ_i^1(x) φ_j^1(y),
and for a general interpolation function φ, one sets
u(x, y) = Σ_{i=1}^{N} Σ_{j=1}^{M} U_{i,j} T_{−i}φ(x) T_{−j}φ(y).
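The tensor-product interpolations are straightforward to implement. The following is a minimal NumPy sketch (not taken from the text) that evaluates a discrete image U ∈ R^{N×M} at an arbitrary point (x, y) with either φ^0 (nearest neighbor) or φ^1 (bilinear); pixel centers are assumed at the integer coordinates 1, …, N and 1, …, M.

```python
import numpy as np

def phi0(x):
    # plateau function: characteristic function of [-1/2, 1/2)
    return ((x >= -0.5) & (x < 0.5)).astype(float)

def phi1(x):
    # hat function: piecewise linear, supported on [-1, 1]
    return np.maximum(1.0 - np.abs(x), 0.0)

def interpolate(U, x, y, phi):
    # tensor-product interpolation u(x, y) = sum_{i,j} U[i,j] * phi(x - i) * phi(y - j)
    N, M = U.shape
    i = np.arange(1, N + 1)
    j = np.arange(1, M + 1)
    wx = phi(x - i)          # weights in the x-direction
    wy = phi(y - j)          # weights in the y-direction
    return wx @ U @ wy

U = np.array([[0.0, 1.0], [2.0, 3.0]])
print(interpolate(U, 1.3, 1.7, phi0))  # nearest-neighbor value
print(interpolate(U, 1.3, 1.7, phi1))  # bilinear value
```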
3.1.2 Sampling
Using the Dirac measure of Example 2.38, we can view U as a delta comb
U = Σ_i U_i δ_{x_i}.
This approach is somewhat closer to what happens inside a digital camera: the chip
collects photons over a small area. Mathematically, one can argue that an image u
cannot be evaluated pointwise, since it actually corresponds to the “distribution”
of brightness. Slightly more generally, we can define the mean sampling also by
means of a test function φ ∈ D(R^d) with φ ≥ 0 and ∫ φ dx = 1. For this purpose,
let x_i ∈ R^d be the sampling points and
U_i = ∫_{R^d} φ(x − x_i) u(x) dx.
In this case, the mean sampling is justified for distributions as well, since it
corresponds to the application of the distribution to a test function in the sense of
Sect. 2.3.
When an image is sampled, some information is obviously lost. However, it can
even happen that “wrong” or undesired information creeps in; cf. Fig. 3.1. With
both point sampling and mean sampling, errors occur in this way, yet the errors
introduced by mean sampling are less obvious. The sampling of continuous images
(or signals, respectively) is a particular mathematical theory, which we will cover in
Sect. 4.2.2. We can then explain in Sect. 4.2.3 the so-called “alias effect” shown in
Fig. 3.1.
Fig. 3.1 Error due to wrong sampling. Top: Original image in high resolution. Lower left: The
same image after an eight times point subsampling, i.e., in the horizontal and vertical directions,
every eighth value was taken. Lower right: The same image after an eight times mean subsampling,
i.e., the mean was taken of squares of eight by eight pixels, respectively. In order to ease the
comparison, the images were scaled to the same size
Definition 3.4 (Mean Squared Error (MSE)) For two continuous images u, v ∈
L^2(Ω), the mean squared error is given by
MSE(u, v) = (1/|Ω|) ‖u − v‖_2^2 = (1/|Ω|) ∫_Ω (u(x) − v(x))^2 dx.
For discrete images U, V ∈ R^{N×M}, it is analogously given by
MSE(U, V) = (1/(NM)) Σ_{i,j} (U_{i,j} − V_{i,j})^2.
The mean squared error can be used to evaluate the difference of images, for instance
to compare the result of a denoising method with the original image.
In the context of image compression, one compares the uncompressed image
with the compressed one. For this purpose, the “peak signal-to-noise ratio,” PSNR,
is common. Essentially, the PSNR is a scaled version of the MSE; it measures the
ratio of the maximal possible energy of the signal and the energy of the existing
noise. The PSNR is usually given logarithmically (more precisely, in decibels):
Definition 3.5 (Peak Signal-to-Noise Ratio (PSNR)) For two continuous images
u, v ∈ L^2(Ω) with u, v : Ω → [0, 1], the PSNR is given by
PSNR(u, v) = 10 log_{10} (1 / MSE(u, v)) db.
If U, V ∈ R^{N×M} are discrete images with U_{i,j}, V_{i,j} ∈ [0, 255], then we have
PSNR(U, V) = 10 log_{10} (255^2 / MSE(U, V)) db.
Note that a higher PSNR value implies a better image quality. We set PSNR(u, u) =
∞; a PSNR value of over 40db typically means that the difference between the
images cannot be perceived; cf. Fig. 3.2. The PSNR is designed to measure noise
or compression artifacts. It is not suitable for specifying a distance between two
general images that in some sense reflects the “similarity” of these images.
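For discrete images, both quantities are one-liners. A minimal sketch (the peak value 255 is an assumption matching the discrete definition above):

```python
import numpy as np

def mse(U, V):
    # mean squared error of two discrete images of equal size
    return np.mean((U.astype(float) - V.astype(float)) ** 2)

def psnr(U, V, peak=255.0):
    # peak signal-to-noise ratio in decibels; peak is the maximal possible gray value
    m = mse(U, V)
    return np.inf if m == 0 else 10 * np.log10(peak ** 2 / m)

U = np.random.randint(0, 256, (64, 64))
V = np.clip(U + np.random.normal(0, 5, U.shape), 0, 255)
print(psnr(U, V))
```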
3.2 Histograms
A histogram contains some important properties of the image and is very helpful
for several basic applications. Roughly speaking, a histogram specifies how often
the different gray values appear in the image. Before we introduce the histogram for
continuous images, we consider a basic example:
Example 3.6 (Histogram of a Discrete Image) We consider a discrete image u :
Ω → F with Ω = {1, . . . , N} × {1, . . . , M} and F = {0, . . . , n}. The histogram
H_u of u states how often the respective gray values appear in the image:
H_u(k) = #{(i, j) ∈ Ω | u_{i,j} = k}.
By means of the Kronecker delta, we can represent the histogram in a different way:
H_u(k) = Σ_{i=1}^{N} Σ_{j=1}^{M} δ_{k, u_{i,j}}.
with 0 < s_1 < s_2 < s_3, i.e., the color space is essentially discrete. Then the
distribution function reads
G_u(s) = { 0 if s < s_1,   μ(Ω_1) if s_1 ≤ s < s_2,   μ(Ω_1) + μ(Ω_2) if s_2 ≤ s < s_3,   μ(Ω_1) + μ(Ω_2) + μ(Ω_3) = μ(Ω) if s_3 ≤ s. }
the resulting image may exhibit a low contrast. By means of the histogram, a
reasonable contrast improvement method can be motivated.
An image with high contrast usually has gray values in all the range available. If
we assume this range to be the interval [0, 1], the linear gray value spread
s ↦ (s − ess inf u)/(ess sup u − ess inf u)
leads to a full coverage of the range. However, this does not necessarily suffice to
increase the contrast sufficiently in all parts of the image, i.e., in particular areas,
the contrast can still be improved. One possibility is to distribute the gray values
as equally as possible over the whole range of gray values. For this purpose, we look
for a monotonic function Φ : R → [0, 1] such that
Φ(s) = G_u(s)/μ(Ω).
Φ(s_0) = round( (n/μ(Ω)) G_u(s_0) ) = round( (n/μ(Ω)) Σ_{s=0}^{s_0} H_u(s) ).   (3.1)
Fig. 3.3 Histogram equalization according to Application 3.10. Left column: Original image with
histogram. Middle column: Spreading of the gray values to the full range. Right column: Image
with equalized histogram. Lowest row: Respective transformation
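A possible NumPy implementation of the transformation (3.1) for a discrete image with gray values 0, …, n could look as follows; the function name and the example are ours, not the book's.

```python
import numpy as np

def equalize(U, n=255):
    # histogram equalization following (3.1): Phi(s) = round(n * G_u(s) / |Omega|)
    hist = np.bincount(U.ravel(), minlength=n + 1)     # histogram H_u(k)
    G = np.cumsum(hist)                                 # distribution function G_u
    Phi = np.round(n * G / U.size).astype(U.dtype)      # gray value transformation
    return Phi[U]

U = np.random.randint(50, 100, (32, 32), dtype=np.uint8)  # low-contrast test image
V = equalize(U)
print(V.min(), V.max())
```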
The question remains how to determine the threshold. There are numerous corre-
sponding methods. In this simple case, the following idea often works well:
We select the threshold such that it corresponds to the arithmetic mean of the centers of
mass of the histogram above and below the threshold.
Since the center of mass corresponds to the normed first moment, we can express this
idea in a formula as follows: the threshold s_0 satisfies the equation
s_0 = (1/2) ( ∫_0^{s_0} s dH_u(s) / ∫_0^{s_0} 1 dH_u(s) + ∫_{s_0}^{S} s dH_u(s) / ∫_{s_0}^{S} 1 dH_u(s) ).
This equation can be solved by the fixed-point iteration
s_0^{n+1} = f(s_0^n)   with   f(s_0) = (1/2) ( ∫_0^{s_0} s dH_u(s) / ∫_0^{s_0} 1 dH_u(s) + ∫_{s_0}^{S} s dH_u(s) / ∫_{s_0}^{S} 1 dH_u(s) ).
Why does this fixed-point iteration converge? The iteration map f , as a sum of two
increasing functions, is monotonically increasing. Furthermore, we have
f(0) = (1/2) ∫_0^{S} s H_u(s) ds / ∫_0^{S} H_u(s) ds ≥ 0,    f(S) = (1/2) ( ∫_0^{S} s H_u(s) ds / ∫_0^{S} H_u(s) ds + S ) ≤ S.
Due to the monotonicity, there exists at least one fixed point with a slope of less than
one. Thus, the fixed-point iteration converges.
This method is also known as the isodata algorithm and has been used since the
1970s (cf. [117]). An example is given in Fig. 3.4.
Fig. 3.4 Segmentation through thresholding according to Application 3.11. Left: Scanned hand-
writings. Right: Segmentation
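The fixed-point iteration of the isodata method can be sketched as follows; the stopping tolerance and the initial value (the global mean) are our choices, not prescribed by the text.

```python
import numpy as np

def isodata_threshold(U, tol=0.5, max_iter=100):
    # fixed point of f: threshold = mean of the centers of mass below and above it
    s = U.mean()
    for _ in range(max_iter):
        low, high = U[U <= s], U[U > s]
        s_new = 0.5 * (low.mean() + high.mean())
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

# synthetic bimodal gray value distribution
U = np.concatenate([np.random.normal(60, 10, 1000), np.random.normal(180, 15, 1000)])
s0 = isodata_threshold(U)
segmentation = U > s0
print(s0)
```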
3.3 Linear Filters

Linear filters belong to the oldest tools in digital image processing. We first consider
an introductory example:
Example 3.12 (Denoising with the Moving Average) We consider a continuous
image u : Rd → R and expect that this image exhibits some form of noise, i.e., we
assume that the image u results from a real u† by adding noise n:
u = u† + n.
Furthermore, we assume that the noise is distributed in some way uniformly around
zero (this is not a precise mathematical assumption, but we do without a more
precise formulation here). In order to reduce the noise, we take averages over
neighboring values and hope that this procedure will suppress the noise. In formulas,
this reads: for a radius r > 0, we compute
M_r u(x) = (1/L^d(B_r(0))) ∫_{B_r(x)} u(y) dy.
Fig. 3.5 Denoising with moving average. Left: Original image. Middle: Original image with
noise. Right: Application of the moving average. Next to this: The indicator function used
The operation that underlies filtering is the convolution (apart from the sign in the
argument of the filter function).
Ty (u ∗ h) = (u ∗ Ty h) = (Ty u ∗ h).
∂^α(u ∗ φ)/∂x^α = u ∗ ∂^α φ/∂x^α.
φ_ε(x) = (1/ε^d) φ(x/ε).
For p = ∞, we can pull the supremum of |u| out of the integral and obtain the
required estimate. For p < ∞, we integrate the pth power of u ∗ v and get
∫_{R^d} |u ∗ v(x)|^p dx ≤ ∫_{R^d} ( ∫_{R^d} |u(x − y)| |v(y)| dy )^p dx.
The application of Fubini’s theorem is justified here in retrospect since the latter
integrals exist. After taking the pth root, the assertion follows. For q > 1, the
proof is similar but more complicated, cf. [91], for instance.
2. We prove the assertion only for the first partial derivatives; the general case then
follows through repeated application. We consider the difference quotient in the
which can be found using the variable transformation ξ = x/ε. We conclude that
|(u ∗ φ_ε)(x) − u(x)| ≤ ∫_{R^d} |u(x − y) − u(x)| |φ_ε(y)| dy.
We choose ρ > 0 and split the integral into large and small y:
On the one hand, this shows the pointwise convergence; on the other hand, it also
shows the uniform convergence on compact sets. □
Expressed in words, the properties of the convolution read:
1. The convolution is a linear and continuous operation if it is considered between
the function spaces specified.
2. The convolution of functions inherits the smoothness of the smoother function
of the two, and the derivative of the convolution corresponds to the convolution
with the derivative.
3. The convolution of a function u with a “narrow” function φ approximates u in a
certain sense.
For image processing, the second and third property are of particular interest:
Convolving with a smooth function smooths the image. When convolving with a
function φε , for small ε, only a small error occurs.
In fact, the convolution is often even slightly smoother than the smoother one of
the functions. A basic example for this case is given in the following proposition:
Theorem 3.14 Let p > 1 and p* the dual exponent, u ∈ L^p(R^d), and v ∈ L^{p*}(R^d).
Then u ∗ v ∈ C(R^d).
Proof For h ∈ R^d, we estimate by means of Hölder's inequality:
|u ∗ v(x + h) − u ∗ v(x)| ≤ ∫_{R^d} |u(y)| |v(x + h − y) − v(x − y)| dy
≤ ( ∫_{R^d} |u(y)|^p dy )^{1/p} ( ∫_{R^d} |v(x + h − y) − v(x − y)|^{p*} dy )^{1/p*}.
The fact that the last integral converges to 0 for h → 0 corresponds to the assertion
that L^{p*}-functions are continuous in the p*-mean for 1 ≤ p* < ∞, see Exercise 3.4. □
Note that Theorem 3.13 yields only u ∗ v ∈ L∞ (Rd ) in this case.
The smoothing properties of the convolution can also be formulated in several
other ways than in Theorem 3.13, for instance, in Sobolev spaces H m,p (Rd ):
Theorem 3.15 Let m ∈ N, 1 ≤ p, q ≤ ∞, 1/r + 1 = 1/p + 1/q, u ∈ L^p(R^d), and
h ∈ H^{m,q}(R^d). Then u ∗ h ∈ H^{m,r}(R^d), and for the weak derivatives up to order
m, we have
∂^α(u ∗ h) = u ∗ ∂^α h almost everywhere.
Furthermore, let Ω_1 and Ω_2 be domains. The assertion holds with u ∗ h ∈ H^{m,r}(Ω_2)
if u ∈ L^p(Ω_1) and h ∈ H^{m,q}(Ω_2 − Ω_1).
Proof We use the definition of the weak derivative and calculate, similarly to
Theorem 3.13, using Fubini’s theorem:
∫_{R^d} ∂^α(u ∗ h)(x) φ(x) dx = (−1)^{|α|} ∫_{R^d} (u ∗ h)(x) (∂^α φ/∂x^α)(x) dx
= (−1)^{|α|} ∫_{R^d} ∫_{R^d} u(y) h(x − y) dy (∂^α φ/∂x^α)(x) dx
= (−1)^{|α|} ∫_{R^d} u(y) ∫_{R^d} h(x − y) (∂^α φ/∂x^α)(x) dx dy
= ∫_{R^d} u(y) ∫_{R^d} ∂^α h(x − y) φ(x) dx dy
= ∫_{R^d} (u ∗ ∂^α h)(x) φ(x) dx.
Now the asserted rule for the derivative follows from the fundamental lemma of the
calculus of variations, Lemma 2.75. Due to ∂^α h ∈ L^q(R^d), Theorem 3.13 yields
∂^α(u ∗ h) = u ∗ ∂^α h ∈ L^r(R^d) for |α| ≤ m, and hence we have u ∗ h ∈ H^{m,r}(R^d).
In particular, the existence of the integral on the left-hand side above is justified
retrospectively.
The additional claim follows analogously; we remark, however, that for φ ∈
D(Ω_2), extended by zero, and y ∈ Ω_1, we have T_y φ ∈ D(Ω_2 − y). Thus, the
definition of the weak derivative can be applied. □
If the function φ of Theorem 3.13 (3) additionally lies in D(Rd ), the functions
φε are also called mollifiers. This is motivated by the fact that in this case, u ∗
φε is infinitely differentiable and for small ε, it differs only slightly from u. This
situation can be expressed more precisely in various norms, such as the Lp -norms,
for instance:
Lemma 3.16 Let Ω ⊂ R^d be a domain and 1 ≤ p < ∞. Then for every u ∈
L^p(Ω), mollifier φ, and δ > 0, there exists some ε > 0 such that
‖φ_ε ∗ u − u‖_p ≤ ‖φ_ε ∗ u − φ_ε ∗ f‖_p + ‖φ_ε ∗ f − f‖_p + ‖f − u‖_p ≤ 2δ/3 + ‖φ_ε ∗ f − f‖_p,
‖u − φ_ε ∗ f‖_p ≤ ‖u − f‖_p + ‖φ_ε ∗ f − f‖_p < 2δ/3 < δ,
where ε > 0 is chosen sufficiently small such that we have supp f − supp φ_ε ⊂⊂ Ω.
Together with φ_ε ∗ f ∈ D(Ω), this implies the density. □
A similar density result also holds for Sobolev spaces:
Lemma 3.17 Let Ω ⊂ R^d be a domain, m ∈ N with m ≥ 1, 1 ≤ p < ∞, and
u ∈ H^{m,p}(Ω). Then for every δ > 0 and subdomain Ω' with Ω' ⊂⊂ Ω, there exists
f ∈ C^∞(Ω') such that ‖u − f‖_{m,p} < δ on Ω'.
Proof We choose a mollifier φ and ε_0 > 0 such that Ω' − supp φ_ε ⊂⊂ Ω holds
for all ε ∈ ]0, ε_0[. The function f_ε = φ_ε ∗ u now lies in C^∞(Ω') and according
to Theorem 3.15, for every ε ∈ ]0, ε_0[, one has f_ε ∈ H^{m,p}(Ω'), where we have
∂^α f_ε = u ∗ ∂^α φ_ε for every multi-index α with |α| ≤ m. By means of Lemma 3.16,
we can choose ε sufficiently small such that for every multi-index with |α| ≤ m,
one has
‖∂^α(u − f_ε)‖_p = ‖∂^α u − φ_ε ∗ ∂^α u‖_p < δ/M in Ω',
where M denotes the number of multi-indices with |α| ≤ m. Setting f = f_ε and
using the Minkowski inequality yields the desired assertion. □
Theorem 3.18 Let 1 ≤ p < ∞ and m ∈ N. Then the space D(Rd ) is dense in
H m,p (Rd ).
Proof We first show that the set C^∞(R^d) ∩ H^{m,p}(R^d) is dense. Let u ∈ H^{m,p}(R^d)
and φ a mollifier. Since in particular u ∈ Lp (Rd ), we obtain u ∗ φε → u in Lp (Rd )
and for |α| ≤ m, (∂ α u) ∗ φε = ∂ α (u ∗ φε ) converges to ∂ α u in Lp (Rd ), i.e., we have
u ∗ φε → u in H m,p (Rd ).
Now we show that the space D(Rd ) is dense in C ∞ (Rd ) ∩ H m,p (Rd ) (which will
complete the proof). For this purpose, let u ∈ C ∞ (Rd ) ∩ H m,p (Rd ), which implies
in particular that the classical derivatives ∂ α u up to order m are in Lp (Rd ). Now
let η ∈ D(Rd ) with η ≡ 1 in a neighborhood of zero. For R > 0, we consider the
functions uR (x) = u(x)η(x/R). Then uR ∈ D(Rd ), and according to the dominated
convergence theorem, we also have uR → u in Lp (Rd ) for R → ∞. For the partial
derivatives, due to the Leibniz formula, we have
α
(∂ uR )(x) =
α
(∂ α−β u)(x)R −|β| (∂ β η)(x/R).
β
β≤α
Remark 3.21 In image processing, it is more common to speak of filters rather than
convolutions. This corresponds to a convolution with a reflected convolution kernel:
the linear filter for h is defined by u ∗ D− id h. If not stated otherwise, we will use
the term “linear filter” for the operation u ∗ h in this book.
3.3.2 Applications
By means of linear filters, one can create interesting effects and also tackle some
of the fundamental problems of image processing. Exemplarily, we here show three
applications:
Application 3.22 (Effect Filters) Some effects of analog photography can be
realized by means of linear filters:
Duto filter: The Duto filter overlays the image with a smoothed version of itself.
The result gives the impression of blur on the one hand, whereas on the other
hand, the sharpness is maintained. This results in a “dream-like” effect. In
mathematical terms, the Duto filter can be realized by means of a convolution
with a Gaussian function, for instance. For this purpose, let
G_σ(x) = (2πσ)^{−d/2} exp(−|x|²/(2σ))   (3.2)
be the d-dimensional Gaussian function with variance σ . In the case of the Duto
filter, the convolved image is linearly overlayed with the original image. This can
be written as a convex combination with parameter λ ∈ [0, 1]:
λu + (1 − λ)u ∗ Gσ .
Motion blur: If an object (or the camera) moves during the exposure time, a point
of the object is mapped onto a line. For a linear motion along a line segment L of
length l, one defines the measure
μ = (1/l) H^1 ⌞ L
by means of the Hausdorff measure of Example 2.38. The motion blur is then
given by
x ↦ ∫ u(x + y) dμ(y).
In Fig. 3.6, an example for the application of the Duto filter as well as motion
blurring is given.
Fig. 3.6 Effect filters of Application 3.22. Left: Original image. Center: Duto blurrer with λ =
0.5. Right: Motion blurring
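As an illustration of the Duto filter described above, a minimal sketch using SciPy's Gaussian smoothing; note that scipy.ndimage.gaussian_filter takes the standard deviation as parameter, whereas σ in (3.2) denotes the variance, so the two parametrizations differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def duto(u, sigma=2.0, lam=0.5):
    # Duto filter: convex combination of the image with a Gaussian-smoothed copy,
    # lam * u + (1 - lam) * (u * G_sigma)
    smoothed = gaussian_filter(u, sigma=sigma)
    return lam * u + (1 - lam) * smoothed

u = np.random.rand(128, 128)
v = duto(u, sigma=3.0, lam=0.5)
```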
Fig. 3.7 A one-dimensional gray value distribution u and its first two derivatives u′ and u′′
Fig. 3.8 Edge detection by determining the maxima of the smoothed derivative. Left: A noisy edge
at zero. Center and right: On top a filter function, below the corresponding result of the filtering
∂(u ∗ f)(x)/∂x = (u ∗ f′)(x).
Different functions h = f′ lead to results of different quality for the edge detection
by determining the maxima of (u ∗ f′)(x), cf. Fig. 3.8.
Canny [29] presents a lengthy derivation of a class of, in some sense, optimal
functions f . Here, we present a heuristic variant, which leads to the same result.
Edges exist on different scales, i.e., there are “coarse edges” and “fine edges.”
Fine edges belong to small, delicately structured objects and are thus suppressed
by a convolution with a function with a high variance. The convolution with a
function with a small variance changes the image only slightly (cf. Proposition 3.13)
and hence preserves all edges. Therefore, we consider rescaled versions fσ (x) =
σ −1 f (σ −1 x) of a given convolution kernel f . If σ is large, fσ is “wider” and if σ
is small, fσ is “narrower.” For one original image u0 , we thus obtain a whole class
of smoothed images:
u(x, σ ) = u0 ∗ fσ (x).
We now formulate requirements for finding a suitable f . We require that the location
of the edges remain constant for different σ . Furthermore, no new edges shall appear
for larger σ . In view of Fig. 3.7, we hence require that at an edge point x0 , we have
∂²/∂x² u(x_0, σ) > 0 ⟹ ∂/∂σ u(x_0, σ) > 0,
∂²/∂x² u(x_0, σ) = 0 ⟹ ∂/∂σ u(x_0, σ) = 0,
∂²/∂x² u(x_0, σ) < 0 ⟹ ∂/∂σ u(x_0, σ) < 0.
In other words, if the second derivative in the x-direction of u is positive (or
zero/negative), then u will increase (or remain constant/decrease) for increasing σ ,
i.e., for coarser scales. In order to ensure this, we require that the function u solve
the following differential equation:
∂/∂σ u(x, σ) = ∂²/∂x² u(x, σ).   (3.3)
Furthermore, σ = 0 should lead to the function u_0, of course. Thus, we set the initial
value for the differential equation as follows:
u(x, 0) = u_0(x).   (3.4)
The initial value problem (3.3), (3.4) is known from physics, where it models heat
conduction in one dimension. The problem admits a unique solution, which is given
by the convolution with the Gaussian function (3.2) with variance 2σ:
u(x, σ) = (u_0 ∗ G_{2σ})(x).
The points x at which ρ(x) exhibits a local maximum in the direction of the vector
(sin(θ(x)), cos(θ(x))) are then marked as edges. Afterward, a threshold is applied
in order to suppress edges that are not important or induced by noise, i.e., if ρ(x)
is smaller than a given value τ , the corresponding x is removed. The result of the
Fig. 3.9 Edge detection. Top left: Original image with 256×256 pixels. Top center: Edge detection
through thresholding of the gradient after convolution with a Gaussian function with σ = 3.
Bottom: Edge detection with Canny edge detector (from left to right: σ = 1, 2, 3)
Canny detector is shown in Fig. 3.9. For the sake of comparison, the result of a
simple thresholding method is depicted as well. In this case, ρ is calculated as for
the Canny edge detector, and then all points x for which ρ(x) is larger than a given
threshold are marked as edges.
Furthermore, we consider the second derivative of the cross section f and remark
(analogously to Fig. 3.7) that if we subtract from f a small multiple of its second
derivative f′′, we obtain an image in which the edge is steeper than before:
f ↦ f − τ f′′.
Note, however, that edges occur in different directions. Thus, a rotationally invariant
differential operator is necessary. The simplest one is given by the Laplace operator
Δu = ∂²u/∂x² + ∂²u/∂y²,
and the corresponding sharpening operation reads
u − τ Δu.
Note that this is a linear operation. In general, we cannot assume that the images u
are sufficiently smooth, so that the Laplace operator may not be well defined. Again,
a simple remedy is to smoothen the image beforehand—by a convolution with a
Gaussian function, for instance. According to Proposition 3.13, we then obtain
...
In general, however, the edges are overemphasized after some time, i.e., the function
values in a neighborhood of the edge are smaller or larger than in the original image.
Furthermore, noise can be increased by this operation as well. These effects can be
seen in Fig. 3.10.
Fig. 3.10 Laplace sharpening. Top left: Original image with 256 × 256 pixels. Next to it:
Successively applying Laplace sharpening of Application 3.24 (parameter: σ = 0.25, α = 1)
The generalization to higher dimensions is obvious. Dealing with finite images and
finite convolution kernels, we obtain finite sums and are faced with the problem
that we have to evaluate U or H at undefined points. This problem can be tackled
by means of a boundary treatment or a boundary extension, in which it suffices to
extend U . We extend a given discrete and finite image U : {0, . . . , N − 1} → R to
an image Ũ : Z → R in one of the following ways:
• Periodic extension: Tessellate Z with copies of the original image
Ũi = Ui mod N .
For the extension of images of multiple dimensions, the rules are applied to each
dimension separately. Figure 3.11 shows an illustration of the different methods in
two dimensions.
Note that the periodical and zero extensions induce unnatural jumps at the
boundary of the image. The constant and the symmetric extensions do not produce
such additional jumps.
Let us now consider two-dimensional images U : Z2 → R and cover some
classical methods that belong to the very first methods of digital image processing.
In this context, one usually speaks of filter masks rather than convolution kernels. A
filter mask H ∈ R^{(2r+1)×(2s+1)} defines a filter by
(U ⋆ H)_{i,j} = Σ_{k=−r}^{r} Σ_{l=−s}^{s} H_{k,l} U_{i+k,j+l}.
We assume throughout that the filter mask is of odd size and is indexed in the
following way:
H = [ H_{−r,−s} ⋯ H_{−r,s} ;  ⋮ H_{0,0} ⋮ ;  H_{r,−s} ⋯ H_{r,s} ].
Filtering corresponds to a convolution with the reflected filter mask. Due to the
symmetry of the convolution, we observe that
(U ⋆ H) ⋆ G = U ⋆ (H ∗ G) = U ⋆ (G ∗ H) = (U ⋆ G) ⋆ H.
Therefore, the order of applying different filter masks does not matter. We now
introduce some important filters:
Moving average: For odd n, the moving average is given by
M^n = (1/n) [1 ⋯ 1] ∈ R^n.
Gaussian filter: The sampled and renormalized Gaussian filter with parameter σ is
given by
G̃^σ_{k,l} = exp(−(k² + l²)/(2σ²)),    G^σ = G̃^σ / Σ_{k,l} G̃^σ_{k,l}.
Binomial filter: The one-dimensional binomial filters are given by
B^1 = (1/2) [1 1 0],
B^2 = (1/4) [1 2 1],
B^3 = (1/8) [1 3 3 1 0],
B^4 = (1/16) [1 4 6 4 1],
. . .
For large n, the binomial filters present good approximations to Gaussian filters.
Two-dimensional binomial filters are obtained by (B n )T ∗ B n . An important
property of binomial filters is the fact that B n+1 can be obtained by a convolution
of B n with B 1 (up to translation).
Derivative filter according to Prewitt and Sobel: In Application 3.23, we saw
that edge detection can be realized by calculation of derivatives. Discretizing the
derivative in the x and y direction by central difference quotients and normalizing
the distance of two pixels to 1, we obtain the filters
D^x = (1/2) [−1 0 1],    D^y = (D^x)^T = (1/2) [−1 0 1]^T.
Since derivatives amplify noise, it was suggested in the early days of image
processing to complement these derivative filters with a smoothing into the
respectively opposite direction. In the case of the Prewitt filters [116], a moving
average M^3 = [1 1 1]/3 is used:
D^x_Prewitt = (M^3)^T ∗ D^x = (1/6) [−1 0 1; −1 0 1; −1 0 1],
D^y_Prewitt = M^3 ∗ D^y = (1/6) [−1 −1 −1; 0 0 0; 1 1 1].
The Sobel filters [133] use the binomial filter B 2 as a smoothing filter:
D^x_Sobel = (B^2)^T ∗ D^x = (1/8) [−1 0 1; −2 0 2; −1 0 1],
D^y_Sobel = B^2 ∗ D^y = (1/8) [−1 −2 −1; 0 0 0; 1 2 1].
Laplace filter: We have already seen the Laplace operator in Application 3.24,
where it was used for image sharpening. In this case, we need to discretize
second-order derivatives. We realize this by successively applying forward and
backward difference quotients:
∂²u/∂x² ≈ (U ⋆ D^x_−) ⋆ D^x_+ = (U ⋆ [−1 1 0]) ⋆ [0 −1 1] = U ⋆ [1 −2 1].
Therefore, the Laplace filter is obtained by
Δu ≈ (U ⋆ D^x_−) ⋆ D^x_+ + (U ⋆ D^y_−) ⋆ D^y_+ = U ⋆ [0 1 0; 1 −4 1; 0 1 0].
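A direct (unoptimized) implementation of filtering with a mask H and a selectable boundary extension might look as follows; the mapping of the extension types to np.pad modes ("wrap" = periodic, "edge" = constant, "symmetric" = mirrored, "constant" = zero padding) is our choice of illustration.

```python
import numpy as np

def apply_mask(U, H, mode="symmetric"):
    # (U * H)_{i,j} = sum_{k,l} H_{k,l} U_{i+k, j+l} with boundary extension
    r, s = H.shape[0] // 2, H.shape[1] // 2
    Upad = np.pad(U.astype(float), ((r, r), (s, s)), mode=mode)
    out = np.zeros(U.shape, dtype=float)
    for k in range(-r, r + 1):
        for l in range(-s, s + 1):
            out += H[k + r, l + s] * Upad[r + k : r + k + U.shape[0],
                                          s + l : s + l + U.shape[1]]
    return out

U = np.random.rand(64, 64)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8   # Sobel mask D^x
print(apply_mask(U, sobel_x).shape)
```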
Separability: If a filter mask can be written as H = G^T ∗ F with one-dimensional
filter masks F and G, then
U ⋆ H = (U ⋆ G^T) ⋆ F,
and therefore the numerical cost reduces to O((2r + 1) + (2s + 1)). The moving
average, as well as the Laplace, Sobel, Prewitt, and binomial filters, is separable.
Recursive implementation: The moving average can be implemented recur-
sively: if V_i = (U ⋆ M^{2n+1})_i = (1/(2n+1)) Σ_{k=−n}^{n} U_{i+k} is known, then
V_{i+1} = V_i + (U_{i+n+1} − U_{i−n})/(2n + 1),
i.e., each further value requires only one addition, one subtraction, and one
multiplication, independently of the filter size. Binomial filters can be decomposed
into short partial filters, for instance
(1/16) [1 4 6 4 1] = (1/16) [1 1 0] ∗ [0 1 1] ∗ [1 1 0] ∗ [0 1 1].
16 16
Note that each of the partial filters consists of only one addition. Furthermore,
the multiplication by 1/16 presents a bit shift, which can be executed faster than
a multiplication.
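The running-sum idea can be sketched in one dimension as follows; the constant boundary extension and the comparison against a direct convolution are our choices for the example, not prescribed by the text.

```python
import numpy as np

def moving_average_1d(u, n):
    # moving average with window 2n+1 via a running sum
    N = len(u)
    upad = np.concatenate([np.full(n, u[0]), u, np.full(n, u[-1])])  # constant extension
    v = np.empty(N)
    window = upad[:2 * n + 1].sum()
    v[0] = window / (2 * n + 1)
    for i in range(1, N):
        window += upad[i + 2 * n] - upad[i - 1]   # one addition, one subtraction
        v[i] = window / (2 * n + 1)
    return v

u = np.random.rand(1000)
direct = np.convolve(np.pad(u, 3, mode="edge"), np.ones(7) / 7, mode="valid")
print(np.allclose(moving_average_1d(u, 3), direct))   # consistency check
```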
3.4 Morphological Filters

Morphological filters are the main tools of mathematical morphology, i.e., the
theory of the analysis of spatial structures in images (the name is derived from
the Greek word “morphe” = shape). The mathematical theory of morphological
filters traces back to the engineers Georges Matheron and Jean Serra, cf. [134], for
instance. Morphological methods aim mainly at the recognition and transformation
of the shape of objects. We again consider an introductory example:
Example 3.25 (Denoising of Objects) Let us first assume that we have found an
object in a discrete digital image—by means of a suitable segmentation method, for
instance. For the mathematical description of the object, an obvious approach is to
encode the object as a binary image, i.e., as a binary function u : R^d → {0, 1} with
u(x) = { 1 if x belongs to the object,   0 if x does not belong to the object. }
Furthermore, we assume that the object is “perturbed,” i.e., that there are perturba-
tions in the form of “small” objects. Since “1” typically encodes the color white
and “0” the color black, the shape consists of the white part of the image u, and the
perturbations are small additional white points.
Since we know that the perturbations are small, we define a “structure element”
B ⊂ R^d of which we assume that it is just large enough to cover each perturbation.
In order to eliminate the perturbations, we compute a new image v by
v(x) = { 1 if u(x + y) = 1 for all y ∈ B,   0 otherwise. }
This implies that only those points x for which the structure element B shifted by x
lies within the old object completely are part of the new object v. This eliminates all
parts of the objects that are smaller than the structuring element. However, the object
is also changed significantly: the object is “thinner” than before. We try to undo the
“thinning” by means of the following procedure: we compute another image w by
w(x) = { 1 if v(x + y) = 1 for some y ∈ B,   0 otherwise. }
Hence, a point x is part of the new object w if the structure element B shifted by
x touches the old object v. This leads to an enlargement of the object. However, all
remaining perturbations are increased again as well. All in all, we have reached our
goal relatively well: the perturbations are eliminated to a large extent and the object
is changed only slightly; cf. Fig. 3.12.
The methods used in this introductory example are the fundamental methods
in mathematical morphology and will now be introduced systematically.
We will also use the notation u ∨ v and u ∧ v for the supremum and infimum,
respectively. Obtaining the complement corresponds to subtraction from one:
ū(x) = 1 − u(x).
Fig. 3.13 Illustration of dilation (left) and erosion (right) of an object (dashed line) with a circular
disk
Erosion and dilation can be extended to grayscale images in a natural way. The
key to this is provided by the following simple lemma:
Lemma 3.27 Let u : R^d → {0, 1} and B ⊂ R^d nonempty. Then
(u ⊕ B)(x) = sup_{y∈B} u(x + y)   and   (u ⊖ B)(x) = inf_{y∈B} u(x + y).
Proof The proof consists simply in carefully considering the definitions. For the
dilation, we observe that we have supy∈B u(x + y) = 1 if and only if u(x + y) = 1
for some y ∈ B. And for the erosion we have infy∈B u(x + y) = 1 if and only if
u(x + y) = 1 for all y ∈ B. $
#
The formulation of erosion and dilation in this lemma does not use the fact that u(x)
attains only the values 0 and 1. Therefore, we can use the formulas for real-valued
functions u analogously. In order to avoid the values ±∞ in the supremum and
infimum, we assume in the following that u is bounded, i.e., we work in the vector
space of bounded functions:
B(Rd ) = {u : Rd → R u bounded}.
Duality
−(u ⊕ B) = (−u) ⊖ B.
Translation invariance
(T_y u) ⊖ B = T_y(u ⊖ B),   (T_y u) ⊕ B = T_y(u ⊕ B).
Monotonicity
u ≤ v ⟹ u ⊖ B ≤ v ⊖ B and u ⊕ B ≤ v ⊕ B.
Distributivity
(u ∧ v) ⊖ B = (u ⊖ B) ∧ (v ⊖ B),   (u ∨ v) ⊕ B = (u ⊕ B) ∨ (v ⊕ B).
Composition   For B + C = {x + y ∈ R^d | x ∈ B, y ∈ C}, one has
(u ⊖ B) ⊖ C = u ⊖ (B + C),   (u ⊕ B) ⊕ C = u ⊕ (B + C).
Proof The proofs of these assertions rely on the respective properties of the
supremum and infimum. For instance, we can show duality as follows:
The further proofs are a good exercise in understanding the respective notions. □
Dilation and erosion obey a further fundamental property: among all operations
on binary images, they are the only ones that are translation invariant and satisfy
the distributivity of Proposition 3.29, i.e., dilation and erosion are characterized by
these properties, as the following theorem shows:
Theorem 3.30 Let D be a translation invariant operator on binary images with
D(0) = 0 such that for each set of binary images u_i ⊂ R^d,
D(∪_i u_i) = ∪_i D(u_i).
Then there exists a structure element B ⊂ R^d such that
D(u) = u ⊕ B.
Analogously, a translation invariant operator E that distributes over intersections
is an erosion:
E(u) = u ⊖ B.
Proof Since D is translation invariant, translation invariant images are mapped onto
translation invariant images again. Since 0 and χRd are the only translation invariant
images, we must have either D(0) = 0 or D(0) = χRd . The second case, in which
D would not be a dilation, is excluded by definition.
Since we can write every binary image u ⊂ R^d as a union of its elements χ_{{y}},
we have
D u = D( ∪_{y∈u} χ_{{y}} ) = ∪_{y∈u} D χ_{{y}},
and we obtain
Du = u ⊕ Dχ{0} (− · ),
D̃(u) = u ⊕ B,
A further property of erosion and dilation that makes them interesting for image
processing is their contrast invariance:
Theorem 3.31 Let B ⊂ R^d be nonempty. Erosion and dilation with structure
element B are invariant with respect to changes of contrast, i.e., for every continuous,
monotonically increasing grayscale transformation Φ : R → R and every u ∈ B(R^d),
Φ(u) ⊖ B = Φ(u ⊖ B)   and   Φ(u) ⊕ B = Φ(u ⊕ B).
Proof The assertion is a direct consequence of the fact that continuous, monotoni-
cally increasing functions can be interchanged with supremum and infimum; shown
exemplarily for the dilation,
(Φ(u) ⊕ B)(x) = sup_{y∈B} Φ(u(x + y)) = Φ(sup_{y∈B} u(x + y)) = Φ((u ⊕ B)(x)). □
Erosion and dilation can be combined in order to achieve specific effects. We already
saw this procedure in the introductory Example 3.25: By means of erosion followed
by a dilation we could remove small perturbations. By erosion of binary images, all
objects that are “smaller” than the structure element B (in the sense of set inclusion)
are eliminated. The eroded image contains only the larger structures, albeit “shrunk”
by B. A subsequent dilation with −B then has the effect that the object is suitably
enlarged again.
On the other hand, dilation fills up “holes” in the object that are smaller than
B. However, the result is enlarged by B, which can analogously be corrected by a
subsequent erosion with −B. In this way, we can “close” holes that are smaller than
B. In fact, the presented procedures are well-known methods in morphology.
Definition 3.32 (Opening and Closing) Let B ⊂ R^d be a nonempty structure
element and u ∈ B(R^d) an image. Set −B = {−y | y ∈ B}. The operator
u ◦ B = (u ⊖ B) ⊕ (−B)
is called opening, and the operator
u • B = (u ⊕ B) ⊖ (−B)
is called closing.
The operators inherit many properties from the basic operators, yet in contrast to
erosion and dilation, it is less reasonable to iterate them.
Theorem 3.33 Let B be a nonempty structure element, u, v ∈ B(R^d) images, and
y ∈ R^d. Then the following properties hold:
Translation invariance
(T_y u) ◦ B = T_y(u ◦ B),   (T_y u) • B = T_y(u • B).
Duality
−(u • B) = (−u) ◦ B.
Monotonicity
u ≤ v ⟹ u ◦ B ≤ v ◦ B and u • B ≤ v • B.
Anti-extensionality and extensionality
u ◦ B ≤ u,   u ≤ u • B.
Idempotence
(u ◦ B) ◦ B = u ◦ B,
(u • B) • B = u • B.
which apparently does not apply for y = z. We obtain a contradiction and hence
conclude that u ◦ B ≤ u. Analogously, we can deduce u • B ≥ u.
In order to show the idempotence of the opening, we remark that due to the anti-
extensionality of the opening, we have
(u ◦ B) ◦ B ≤ u ◦ B.
On the other hand, the monotonicity of the erosion and the extensionality of the
closing imply
(u ◦ B) ⊖ B = ((u ⊖ B) ⊕ (−B)) ⊖ B = (u ⊖ B) • (−B) ≥ u ⊖ B.
u # B = u − u ◦ B,
while
u$B =u•B −u
u # B ≥ 0, u $ B ≥ 0,
3.4.3 Applications
Morphological operators can be used to solve a multitude of tasks. Often, the oper-
ators introduced here are combined in an intelligent and creative way. Therefore, it
is difficult to tell in general for which type of problems morphology can provide a
solution. The following application examples primarily illustrate the possibilities of
applying morphological filters and serve as an inspiration for developing one’s own
methods.
If there are many lines with direction α in the image, there will remain many points;
otherwise, there will not. Considering the area of the remaining object, we can thus
determine the dominant directions by means of the largest local maxima (also see
Fig. 3.14).
Fig. 3.14 Example for the detection of dominant directions. Upper left: Underlying image. Upper
right: Result of edge detection by the algorithm according to Canny. Lower left: Amount of pixels
in the eroded edge image depending on the angle α. Lower right: Image overlaid with the three
dominant directions determined by maxima detection
The application of the hit-or-miss operator now results in an image in which exactly
those letters are marked that exhibit a serif at the bottom in the form of the hit mask;
see Fig. 3.15.
Fig. 3.15 Example for a simple application of the hit-or-miss operator to select a downward
oriented serif B. The “hit operator” u * B also selects the cross bar of “t”, which is excluded by
applying the “miss operator” u * C. The combination finally leads to exactly the serif described
by B and C
Fig. 3.16 Correction of an irregular background to improve segmentation. The structure element
is chosen to be a square. The lower two images show the respective results of the automatic
segmentation introduced in Application 3.11
where B_{i,j} = 1 denotes that (i, j) belongs to the structure element and B_{i,j} = 0
implies that (i, j) is not part of the structure element. Erosion and dilation are then
defined as follows:
(u ⊖ B)_{i,j} = min{u_{i+k,j+l} | (k, l) with B_{k,l} = 1},
(u ⊕ B)_{i,j} = max{u_{i+k,j+l} | (k, l) with B_{k,l} = 1}.
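A direct implementation of the discrete erosion and dilation could look like the following sketch; padding the boundary with the maximal (respectively minimal) gray value so that it does not influence the result is one possible convention, not prescribed by the text.

```python
import numpy as np

def _morph(U, B, reduce_fn):
    r, s = B.shape[0] // 2, B.shape[1] // 2
    pad_val = U.max() if reduce_fn is np.min else U.min()
    Upad = np.pad(U, ((r, r), (s, s)), mode="constant", constant_values=pad_val)
    shifted = [Upad[r + k : r + k + U.shape[0], s + l : s + l + U.shape[1]]
               for k in range(-r, r + 1) for l in range(-s, s + 1)
               if B[k + r, l + s] == 1]
    return reduce_fn(np.stack(shifted), axis=0)

def erode(U, B):
    # minimum over the pixels hit by the structure element B
    return _morph(U, B, np.min)

def dilate(U, B):
    # maximum over the pixels hit by the structure element B
    return _morph(U, B, np.max)

U = (np.random.rand(64, 64) > 0.5).astype(np.uint8)   # binary image
B = np.ones((3, 3), dtype=np.uint8)                    # symmetric structure element
opened = dilate(erode(U, B), B)                        # opening (here -B = B)
```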
The discrete erosion and dilation satisfy all properties of their continuous variants
presented in Theorem 3.29. For the composition property
(u ⊖ B) ⊖ C = u ⊖ (B + C),   (u ⊕ B) ⊕ C = u ⊕ (B + C),
to find the minimum (or maximum) of n numbers; which can be achieved with
n − 1 pairwise comparisons. Let us further assume that the image consists of NM
pixels and that the boundary extension is of negligible cost. Then we observe that
the application of the erosion or dilation requires (n − 1)NM pairwise comparisons.
Due to the composition property in Theorem 3.29, we can increase the efficiency in
certain cases:
If B and C consist of n and m elements, respectively, then B + C can have at most
nm elements. Hence, for the calculation of u ⊖ (B + C), we need (nm − 1)NM
pairwise comparisons in the worst case. However, to compute (u ⊖ B) ⊖ C, we
need only (n + m − 2)NM pairwise comparisons. Already for moderately large
structure elements, this can make a significant difference, as the following example
demonstrates. Therein, we omit the zeros at the boundary of the structure element
and denote the center of a structure element by an underscore:
B_1 + B_2 + B_3 = [1 1] + [1 0 1] + [1 0 0 0 1] = [1 1 1 1 1 1 1 1],
B_1 + B_2 + B_3 + (B_1^T + B_2^T + B_3^T) = the 8 × 8 structure element with all entries equal to 1.
The erosion with an 8 × 8 square (64 elements, 63 pairwise comparisons) can hence
be reduced to 6 erosions with structure elements containing two elements each (6
pairwise comparisons).
The other morphological operators (opening, closing, hit-or-miss, top-hat trans-
formations) are obtained by combination. Their properties are analogous to the
continuous versions.
In the discrete variant, there is an effective generalization of erosion and dilation:
for erosion with a structure element with n elements, for each pixel of the image, the
smallest gray value is taken that is hit by the structure element (for dilation, we take
the largest one). The idea of this generalization is to sort the image values masked
by the structure element and the subsequent replacement by the nth value within
this rank order. These filters are called rank-order filters.
Definition 3.42 Let B ∈ {0, 1}^{(2r+1)×(2s+1)} be a structure element with n ≥ 1
elements. The elements will be indexed by I = {(k_1, l_1), . . . , (k_n, l_n)}, i.e., we have
B_{k,l} = 1 if and only if (k, l) ∈ I. By sort(a_1, . . . , a_n) we denote the nondecreasing
reordering of the vector (a_1, . . . , a_n) ∈ R^n. The mth rank-order filter of a bounded
In particular, the first and the nth rank-order filters coincide with erosion and
dilation, respectively: for m = 1 one obtains u ⊖ B, and for m = n one obtains u ⊕ B.
The operator performs a type of averaging that is more robust with respect to outliers
than the computation of the arithmetic mean. It is thus well suited for denoising in
case of so-called impulsive noise, i.e., the pixels are not all perturbed additively, but
random pixels have a random value that is independent of the original gray value;
refer to Fig. 3.17.
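A simple (non-optimized) rank-order filter in NumPy; the median filter is the special case m = (n + 1)/2 for odd n. The edge padding is our choice for the example.

```python
import numpy as np

def rank_order_filter(U, B, m):
    # m-th rank-order filter: sort the gray values covered by the structure element B
    # at every pixel and take the m-th smallest (m=1: erosion, m=n: dilation)
    r, s = B.shape[0] // 2, B.shape[1] // 2
    Upad = np.pad(U, ((r, r), (s, s)), mode="edge")
    offsets = [(k, l) for k in range(-r, r + 1) for l in range(-s, s + 1)
               if B[k + r, l + s] == 1]
    stack = np.stack([Upad[r + k : r + k + U.shape[0], s + l : s + l + U.shape[1]]
                      for (k, l) in offsets])
    return np.sort(stack, axis=0)[m - 1]

U = np.random.rand(64, 64)
B = np.ones((3, 3), dtype=int)
median = rank_order_filter(U, B, (B.sum() + 1) // 2)   # median filter, 3x3 mask
```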
3.5 Further Developments

Even though this chapter treats basic methods, it is worthwhile to cover further
developments in this field in the following. In particular, in the area of linear filters,
there are noteworthy further developments. An initial motivation for linear filters
was the denoising of images. However, we quickly saw that this does not work
particularly well with linear filters, since in particular, edges cannot be preserved. In
order to remedy this, we recall the idea in Example 3.12. Therein, the noise should
be reduced by computing local averages. The blurring of the edges can then be
explained by the fact that the average near an edge considers the gray values on both
sides of the edge. The influence of the pixels on “the other side of the edge” can be
resolved by considering not only the spatial proximity, but also the proximity of the
gray values during the averaging process. The so-called bilateral filter achieves
this as follows: for a given image u : R^d → R and two functions h : R^d → R and
g : R → R, it computes a new image by
B_{h,g} u(x) = ( ∫_{R^d} u(y) h(x − y) g(u(x) − u(y)) dy ) / ( ∫_{R^d} h(x − y) g(u(x) − u(y)) dy ).
Fig. 3.17 Denoising in case of impulsive noise. Upper left: Original image. Upper right: Image
perturbed by impulsive noise; 10% of the pixels were randomly replaced by black or white pixels
(PSNR of 8.7 db). Lower left: Application of the moving average with a small circular disk with
a radius of seven pixels (PSNR of 22.0 db). Lower right: Application of the median filter with a
circular structure element with a radius of two pixels (PSNR of 31.4 db). The values of the radii
were determined such that the respective PSNR is maximal
We observe that the function h denotes the weight of the gray value u(y) depending
on the distance x − y, while the function g presents the weight of the gray value
depending on the similarity of the gray values u(x) and u(y). The factor
( ∫_{R^d} h(x − y) g(u(x) − u(y)) dy )^{−1} is a normalization factor that ensures that the
weights integrate to one at every point x. For linear filters (in this case, when there
is no function g), it does not depend on x, and usually ∫_{R^d} h(y) dy = 1 is required.
The name “bilateral filter” traces back to Tomasi and Manduchi [136]. If we choose
h and g to be Gaussian functions, the filters are also called “nonlinear Gaussian
filters” after Aurich and Weule [11]. In the case of characteristic functions h and
g, the filter is also known as SUSAN [132]. The earliest reference for this kind of
filters is probably Yaroslavski [145]. The bilateral filter exhibits excellent properties
for edge-preserving denoising; see Fig. 3.18. A naive discretization of the integrals,
however, reveals a disadvantage: the numerical cost is significantly higher than for
linear filters, since the normalization factor has to be recalculated for every point
Fig. 3.18 The bilateral filter with Gaussian functions h and g, i.e., h(x) = exp(−|x|2 /(2σh2 )) and
g(x) = exp(−|x|2 /(2σg2 )) applied to the original image in Fig. 3.17. The range of gray values is
[0, 1], i.e., the distance of black to white amounts to one
x. Methods to increase the efficiency are covered in the overview article [108], for
instance.
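A naive discretization of the bilateral filter with Gaussian weights illustrates the per-pixel normalization and its cost; window radius and the parameter values are example choices, not taken from the text.

```python
import numpy as np

def bilateral(u, sigma_h=3.0, sigma_g=0.1, radius=6):
    # direct (slow) bilateral filter with h(x) = exp(-|x|^2/(2 sigma_h^2)) and
    # g(t) = exp(-t^2/(2 sigma_g^2))
    N, M = u.shape
    upad = np.pad(u, radius, mode="edge")
    out = np.empty_like(u)
    k = np.arange(-radius, radius + 1)
    KX, KY = np.meshgrid(k, k, indexing="ij")
    h = np.exp(-(KX ** 2 + KY ** 2) / (2 * sigma_h ** 2))   # spatial weight
    for i in range(N):
        for j in range(M):
            patch = upad[i : i + 2 * radius + 1, j : j + 2 * radius + 1]
            w = h * np.exp(-(u[i, j] - patch) ** 2 / (2 * sigma_g ** 2))
            out[i, j] = np.sum(w * patch) / np.sum(w)        # normalized average
    return out

u = np.random.rand(64, 64)
v = bilateral(u)
```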
Progressing a step further than bilateral filters, the so-called nonlocal aver-
ages [25] take averages not over values that are close and have a similar gray value,
but over values that have a similar neighborhood. Mathematically, this is realized
as follows: for an image u : Rd → R and a function g : Rd → R, we define the
function
h(x, y) = ∫_{R^d} g(t) |u(x + t) − u(y + t)|² dt.
For the choice of the function g, again a Gaussian function or the characteristic
function of a ball around the origin is suitable, for instance. The function h exhibits
a small value in (x, y) if the functions t → u(x + t) and t → u(y + t) are similar
in a neighborhood of the origin. If they are dissimilar, the value is large. Hence, the
values x and y have similar neighborhoods in this sense if h(x, y) is small. This
motivates the following definition of the nonlocal averaging filter:
NL u(x) = ( ∫_{R^d} u(y) e^{−h(x,y)} dy ) / ( ∫_{R^d} e^{−h(x,y)} dy ).
Nonlocal averaging filters are particularly well suited for denoising of regions
with textures. Their naive discretization is even more costly than for the bilateral
filter, since for every value x, we first have to determine h(x, y) by means of an
integration. For ideas regarding an efficient implementation, we refer to [23].
The median filter that we introduced in Sect. 3.4.4 was defined only for images
with discrete image domain. It was based on the idea of ordering the neighboring
pixels. A generalization to images u : Rd → R is given by the following: for a
measurable set B ⊂ Rd let
For this filter, there is a connection to the mean curvature flow, which we will cover
in Sect. 5.2.2. Roughly speaking, the iterated application of the median filter with
B = Bh (0) asymptotically corresponds (for h → 0) to a movement of the level
lines of the image into the direction of the normal with a velocity proportional to the
average curvature of the contour lines. For details, we refer to [69].
In this form, the median filter is based on the ordering of the real numbers. For
an image u : → F with discrete support but non-ordered color space F , the
concept of the median based on ordering cannot be transferred. Non-ordered color
spaces, for instance, arise in the case of color images (e.g. F = R3 ) or in so-called
diffusion tensor imaging; in which F is the set of symmetric matrices. If there is a
distance defined on F , we can use the following idea: the median of real numbers
a_1, . . . , a_n is a minimizer of the functional
F(a) = Σ_{i=1}^{n} |a − a_i|
(cf. Exercise 3.13). If there is a distance ‖·‖ defined on F, we can define the median
of n “color values” A_1, . . . , A_n as a minimizer of
F(A) = Σ_{i=1}^{n} ‖A − A_i‖.
In [143], it was shown that this procedure defines reasonable “medians” in case of
matrices Ai depending on the matrix norm, for instance.
3.6 Exercises
Ty DA = DA TAy .
Exercise 3.4 (Lp -Functions Are Continuous in the pth Mean) Let 1 ≤ p < ∞ and
u ∈ L^p(R^d). Show that
‖T_h u − u‖_p → 0   as   h → 0.
Exercise 3.5 (Solution of the Heat Equation) Let Gσ be the d-dimensional Gaus-
sian function defined in (3.2) and
∂_t F = ΔF.
Δ = Σ_{i=1}^{d} ∂²/∂x_i²
does not exhibit any terms of odd order of differentiation, i.e., one has cα = 0 for
all multi-indices α with |α| odd.
Exercise 3.7 (Separability Test) We call a discrete two-dimensional filter mask
H ∈ R(2r+1)×(2r+1) separable if for some one-dimensional filter masks F, G ∈
R2r+1 , one has H = F ⊗ G (i.e., Hi,j = Fi Gj ).
Derive a method that for every H ∈ R(2r+1)×(2r+1), provides an n ≥ 0 as well
as separable filter masks Hk ∈ R(2r+1)×(2r+1), 1 ≤ k ≤ n, such that there holds
H = H1 + H2 + · · · + Hn
and n is minimal.
Exercise 3.8 (Proofs in the Morphology Section) Prove the remaining parts of
Theorems 3.29, 3.33, and 3.37.
Exercise 3.9 (Lipschitz Constants for Erosion and Dilation) Let B ⊂ Rd be a
nonempty structure element and u ∈ B(Rd ) Lipschitz continuous with constant
L > 0. Show that u * B and u ⊕ B are also Lipschitz continuous and that their
Lipschitz constants are less than or equal to L.
‖u ⊕ B − v ⊕ B‖_∞ ≤ ‖u − v‖_∞,
‖u ⊖ B − v ⊖ B‖_∞ ≤ ‖u − v‖_∞.
min_{a∈R} Σ_{i=1}^{n} |a − a_i|.
Furthermore, show that the arithmetic mean ā = (a1 + · · · + an )/n is the unique
solution to the minimization problem
min_{a∈R} Σ_{i=1}^{n} |a − a_i|².
Chapter 4
Frequency and Multiscale Methods
Like the methods covered in Chap. 3, the methods based on frequency or scale-
space decompositions belong to the older methods in image processing. In this
case, the basic idea is to transform an image into a different representation in order
to determine its properties or carry out manipulations. In this context, the Fourier
transformation plays an important role.
4.1 The Fourier Transform

Throughout this chapter, we consider complex-valued images u : R^d → C.
In this section, we will define the Fourier transform for certain Lebesgue spaces and
measures of Sect. 2.2.2 as well as distributions of Sect. 2.3. We begin by defining
the Fourier transform on the space L1 (Rd ).
lim_{n→∞} û(ξ_n) = û(ξ),
and hence the continuity of û. The linearity of F is obvious, and the continuity
results from the estimate
|û(ξ)| = (2π)^{−d/2} | ∫_{R^d} u(x) e^{−ix·ξ} dx | ≤ (2π)^{−d/2} ∫_{R^d} |u(x)| dx = (2π)^{−d/2} ‖u‖_1,
which implies ‖û‖_∞ ≤ (2π)^{−d/2} ‖u‖_1. □
Furthermore, the minus sign in the exponent may be omitted. Therefore, caution is
advised when using tables of Fourier transforms or looking up calculation rules.
The Fourier transform goes well with translations Ty and linear coordinate
transformations DA . Furthermore, it also goes well with modulations, which we
will define now.
Definition 4.4 For y ∈ Rd , we set
my : Rd → C, my (x) = eix·y
My : L1 (Rd ) → L1 (Rd ), My u = my u.
F(T_y u) = M_y(F u),
F(M_y u) = T_{−y}(F u),
F(D_A u) = |det A|^{−1} D_{A^{−T}}(F u),
F(ū) = D_{−id} \overline{F u}.
Proof One should first assure oneself that the operators Ty , My , and DA map
both L1 (Rd ) and C (Rd ) onto themselves, i.e., all occurring terms are well defined.
According to the transformation formula for integrals,
1
(F Mω Ty u)(ξ ) = u(x + y)e−ix·(ξ −ω) dx
(2π)d/2 Rd
1
= ei(ξ −ω)·y u(z)e−iz·(ξ −ω) dz
(2π)d/2 Rd
= (T−ω My F u)(ξ ).
u real-valued ⟹ û(ξ) = \overline{û(−ξ)},
u imaginary-valued ⟹ û(ξ) = −\overline{û(−ξ)},
û real-valued ⟹ u(x) = \overline{u(−x)},
û imaginary-valued ⟹ u(x) = −\overline{u(−x)}.
However, this is not allowed at this instance, since in Definition 4.1, we defined
the Fourier transform for L1 -functions only. This was for a good reason, since for
L2 -functions, it cannot be readily ensured that the defining integral exists. Anyhow,
it appears desirable and will prove to be truly helpful to have access to the Fourier
transform not only on the (not even reflexive) Banach space L1 (Rd ), but also on the
Hilbert space L2 (Rd ).
The extension of the Fourier transform to the space L2 (Rd ) requires some further
work. As a first step, we define a “small” function space on which the Fourier
transform exhibits some further interesting properties—the Schwartz space:
Definition 4.9 The Schwartz space of rapidly decreasing functions is defined by
S(R^d) = { u ∈ C^∞(R^d) | ∀α, β ∈ N^d : C_{α,β}(u) = sup_{x∈R^d} |x^α (∂^β/∂x^β) u(x)| < ∞ }.
Roughly speaking, the Schwartz space contains smooth functions that tend to
zero faster than polynomials tend to infinity. It can be verified elementarily that
the Schwartz space is a vector space. In order to make it accessible for analytical
methods, we endow it with a topology. We describe this topology by defining a
notion of convergence for sequences of functions.
Definition 4.10 A sequence (u_n) in the Schwartz space converges to u if and only
if for all multi-indices α, β, one has C_{α,β}(u_n − u) → 0 as n → ∞.
F(p_α u) = i^{|α|} (∂^α/∂ξ^α) F(u).
(∂^α/∂x^α)(e^{−ix·ξ}) = (−i)^{|α|} ξ^α e^{−ix·ξ}   and   x^α e^{−ix·ξ} = i^{|α|} (∂^α/∂ξ^α)(e^{−ix·ξ}).
= i|α| pα (ξ )F u(ξ ).
= i|α| ( ∂ξ
α
α F u)(ξ ).
∂
Both of the previous arguments are valid, since the integrands are infinitely
differentiable with respect to ξ and integrable with respect to x. $
#
We thus observe that the Fourier transform transforms a differentiation into a
multiplication and vice versa. This lets us assume already that the Schwartz space
S(Rd ) is mapped onto itself by the Fourier transform. In order to show this, we state
the following lemma:
Lemma 4.14 For the Gaussian function G(x) = e^{−|x|²/2}, one has
Ĝ(ξ) = G(ξ).
Ĝ(ξ) = (2π)^{−d/2} ∫_{R^d} Π_{k=1}^{d} g(x_k) e^{−i x_k ξ_k} dx = Π_{k=1}^{d} ĝ(ξ_k).
g and ĝ satisfy the same differential equation with the same initial value. By the Picard-
Lindelöf theorem on uniqueness of solutions of initial value problems, they thus
have to coincide, which proves the assertion. □
Theorem 4.15 The Fourier transform is a continuous and bijective mapping of the
Schwartz space into itself. For u ∈ S(Rd ), we have the inversion formula
(F^{−1} F u)(x) = (û)^∨(x) = (2π)^{−d/2} ∫_{R^d} û(ξ) e^{ix·ξ} dξ = u(x).
|ξ^α (∂^β/∂ξ^β) û(ξ)| = |F((∂^α/∂x^α)(p_β u))(ξ)| ≤ (2π)^{−d/2} ‖(∂^α/∂x^α)(p_β u)‖_1.   (4.1)
Therefore, for u ∈ S(R^d), we also have û ∈ S(R^d). Since the Fourier transform is
linear, it is sufficient to show the continuity at zero. We thus consider a null sequence
(u_n) in the Schwartz space, i.e., as n → ∞, C_{α,β}(u_n) → 0. That is, (u_n), as well
as (∂^α p_β u_n) for all α, β, converges to zero uniformly. This implies that the right-
hand side in (4.1) tends to zero. In particular, we obtain that C_{α,β}(û_n) → 0, which
implies that (û_n) is a null sequence, proving continuity.
In order to prove the inversion formula, we for now consider two arbitrary
functions u, φ ∈ S(Rd ). By means of Lemma 4.8 and the calculation rules for
translation and modulation given in Lemma 4.5, we infer for the convolution of 0 u
and φ that
(0
u ∗ φ)(x) = 0
u(y)φ(x − y) dy = u(y)eix·y 0
0 φ (−y) dy
Rd Rd
= u(y)0
φ(−x − y) dy = (u ∗ 0
φ)(−x).
Rd
φ_ε(x) = ε^{−d} (D_{ε^{−1} id} G)(x) = ε^{−d} e^{−|x|²/(2ε²)}.
0
u ∗ φε (x) → 0
u(x) and u ∗ φε (−x) → u(−x).
\hat{\hat{u}}(x) = u(−x).
Note that we can state the inversion formula for the Fourier transform in the
following way as well:
\check{u} = \overline{F \bar{u}}.
According to the calculation rule for conjugation in Lemma 4.5, we infer that \check{u} =
D_{−id} û, and substituting û for u, this altogether results in
\check{\hat{u}} = D_{−id} \hat{\hat{u}} = u. □
(û, v̂)_2 = (u, v)_2
and in particular u2 = F u2 . Thus, the Fourier transform is an isometry defined
on a dense subset of L2 (Rd ). Hence, there exists a unique continuous extension onto
the whole space. Due to the symmetry between F and F −1 , an analogous argument
yields the remainder of the assertion. □
Remark 4.17 As remarked earlier, the formula
F(u)(ξ) = (2π)^{−d/2} ∫_{R^d} u(x) e^{−iξ·x} dx
cannot be applied to a function u ∈ L2 (Rd ), since the integral does not necessarily
exist. However, for u ∈ L2 (Rd ), there holds that the function
ψ_R(ξ) = (2π)^{−d/2} ∫_{|x|≤R} u(x) e^{−iξ·x} dx
This is to say that the Fourier transform of h indicates in what way the frequency
components of u are damped, amplified, or modulated. We also call ĥ the transfer
function in this context. A convolution kernel h whose transfer function ĥ is zero
(or attains small values) for large ξ is called a low-pass filter, since it lets low
frequencies pass. Analogously, we call h a high-pass filter if ĥ(ξ) is zero (or small)
for small ξ . Since noise contains many high-frequency components, one can try to
reduce noise by a low-pass filter. For image processing, it is a disadvantage in this
context that edges also exhibit many high-frequency components. Hence, a low-pass
filter necessarily blurs the edges as well. It turns out that edge-preserving denoising
cannot be accomplished with linear filters; cf. Fig. 4.1 as well.
Fig. 4.1 High- and low-pass filters applied to an image. The Fourier transform of the low-pass
filter g is a characteristic function of a ball around the origin, and the Fourier transform of the
high-pass filter h is a characteristic function of an annulus around the origin. Note that the filters
oscillate slightly, which is noticeable in the images as well
low-pass filter, i.e., for a radius r > 0, the Fourier transform of h is given by
$$\hat h = \frac{1}{(2\pi)^{d/2}}\,\chi_{B_r(0)}.$$
The low- and high-frequency components of the image u are then respectively given by
$$u_{\mathrm{low}} = u*h \qquad\text{and}\qquad u_{\mathrm{high}} = u - u*h.$$
In particular,
$$\hat u_{\mathrm{high}} = \hat u\,\big(1-(2\pi)^{d/2}\hat h\big) = \hat u\cdot\chi_{\{|\xi|>r\}}.$$
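This separation into low- and high-frequency parts is easy to reproduce numerically. The following minimal NumPy sketch is only an illustration (the function name, the use of np.fft.fftfreq, and the meaning of the radius r in normalized discrete frequency units are choices of this example, not taken from the text):

```python
import numpy as np

def ideal_split(u, r):
    """Split an image into low- and high-frequency parts using an ideal
    (characteristic-function) low-pass filter of radius r in the discrete
    frequency grid given by np.fft.fftfreq."""
    U = np.fft.fft2(u)
    xi1 = np.fft.fftfreq(u.shape[0])[:, None]
    xi2 = np.fft.fftfreq(u.shape[1])[None, :]
    mask = (xi1**2 + xi2**2) <= r**2          # characteristic function of a ball
    u_low = np.real(np.fft.ifft2(U * mask))   # low-pass filtered image
    u_high = u - u_low                        # complementary high-frequency part
    return u_low, u_high
```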
In Fig. 4.2, we observe that the separation of textured components by this method
has its limitations: the low-frequency component now contains almost no texture of
the fabric, whereas this is contained in the high-frequency component. However,
we also find essential portions of the edges, i.e., the separation of texture from
nontextured components is not very good.
Remark 4.21 (Deconvolution with the Fourier Transform) The “out-of-focus” and
motion blurring as well as other models of blurring assume that the blurring is
modeled by a linear filter, i.e., by a convolution. A deblurring can in this case be
achieved by a deconvolution: For a blurred image u given by
$$u = u_0*h$$
with an unknown image $u_0$ and a known convolution kernel h, one has $\hat u = 2\pi\,\hat u_0\,\hat h$ (in the two-dimensional case), and we obtain the unknown image by
$$u_0 = \mathcal F^{-1}\Big(\frac{\hat u}{2\pi\,\hat h}\Big).$$
Fig. 4.3 Deconvolution with the Fourier transform. The convolution kernel h models a motion
blurring. The degraded image ũ results from a quantization of u into 256 gray values (a difference
the eye cannot perceive). After the deconvolution, the error becomes unpleasantly apparent
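The naive deconvolution formula can be reproduced with the FFT. The sketch below is only an illustration of this remark, not the book's procedure: the factor 2π of the continuous formula is absorbed by the discrete transform, periodic boundary conditions are assumed, and the parameter eps is a hypothetical safeguard against division by transfer-function values close to zero — precisely the instability that the quantization error in Fig. 4.3 makes visible.

```python
import numpy as np

def fourier_deconvolve(u, h, eps=1e-3):
    """Naive deconvolution u0 = F^{-1}(u_hat / h_hat) for a periodic blur u = u0 * h.
    eps regularizes the division where the transfer function is close to zero;
    without it, tiny perturbations (e.g. quantization) are amplified enormously."""
    U = np.fft.fft2(u)
    H = np.fft.fft2(h, s=u.shape)                # transfer function of the kernel
    H_safe = np.where(np.abs(H) < eps, eps, H)   # avoid division by ~0
    return np.real(np.fft.ifft2(U / H_safe))
```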
We can not only extend the Fourier transform from the Schwartz space S(Rd ) to
L2 (Rd ), but even define it for certain distributions as well. For this purpose, we
define a specific space of distributions:
Definition 4.22 By S(Rd )∗ we denote the dual space of S(Rd ), i.e., the space of
all linear and continuous functionals T : S(Rd ) → C. We call this space the space
of tempered distributions.
Tempered distributions are distributions in the sense of Sect. 2.3. Furthermore,
there are both regular and non-regular tempered distributions. The delta-distribution
is a non-regular tempered distribution, for example. In particular, every function
u ∈ S(Rd ) induces a regular tempered distribution Tu :
$$T_u(\phi) = \int_{\mathbb R^d}u(x)\,\phi(x)\,dx.$$
Remark 4.23 We use the notation Tu for the distribution induced by u and the
similar notation Ty for the translation by y as long as there cannot be any confusion.
Our goal is to define a Fourier transform for tempered distributions. Since one
often does not distinguish between a function and the induced distribution, it is
reasonable to denote the Fourier transform of $T_u$ by $\widehat{T_u} = T_{\hat u}$. According to Lemma 4.8,
$$\widehat{T_u}(\phi) = \int_{\mathbb R^d}\hat u(\xi)\,\phi(\xi)\,d\xi = \int_{\mathbb R^d}u(\xi)\,\hat\phi(\xi)\,d\xi = T_u(\hat\phi).$$
Definition 4.24 For a tempered distribution T, the Fourier transform $\hat T$ and the inverse Fourier transform $\check T$ are defined by
$$\hat T(\phi) = T(\hat\phi),\qquad \check T(\phi) = T(\check\phi).$$
Since the Fourier transform is bijective from the Schwartz space to itself, the
same holds if we view the Fourier transform as a map from the space of tempered
distributions to itself.
Theorem 4.25 As a mapping of the space of tempered distributions into itself, the Fourier transform $T\mapsto\hat T$ is bijective and is inverted by $T\mapsto\check T$.
Since according to the Riesz-Markov representation theorem (Theorem 2.62),
Radon measures are elements of the dual space of continuous functions, they are in
particular tempered distributions as well. Hence, by Definition 4.24, we have defined
a Fourier transform for Radon measures as well.
Example 4.26 The distribution belonging to the Dirac measure $\delta_x$ of Example 2.38 is the delta distribution, denoted by $\delta_x$ as well:
$$\delta_x(\phi) = \int_{\mathbb R^d}\phi\,d\delta_x = \phi(x).$$
$$\widehat{u*v}(\xi) = (2\pi)^{d/2}\,\hat u(\xi)\,\hat v(\xi).$$
The computation rules for Fourier transforms and derivatives in Lemma 4.13
hold analogously for weak derivatives:
Lemma 4.28 Let u ∈ L2 (Rd ) and α ∈ Nd be such that the weak derivative ∂ α u
lies in L2 (Rd ) as well. Then
F (∂ α u) = i|α| pα F (u).
Proof As above, we show the equation in the distributional sense. We use integration by parts, Lemma 4.13, and the Plancherel formula (4.2) to obtain, for a Schwartz function φ,
$$\widehat{T_{\partial^\alpha u}}(\phi) = T_{\partial^\alpha u}(\hat\phi) = \int_{\mathbb R^d}\partial^\alpha u(x)\,\hat\phi(x)\,dx = (-1)^{|\alpha|}\int_{\mathbb R^d}u(x)\,\partial^\alpha\hat\phi(x)\,dx$$
$$= (-1)^{|\alpha|}\int_{\mathbb R^d}u(x)\,\big((-i)^{|\alpha|}\,\widehat{p^\alpha\phi}\big)(x)\,dx = \int_{\mathbb R^d}\hat u(x)\,i^{|\alpha|}p^\alpha(x)\,\phi(x)\,dx = T_{i^{|\alpha|}p^\alpha\hat u}(\phi).$$
The asserted equivalence now follows from the fact that the functions $h(\xi) = \sum_{|\alpha|\le k}|\xi^\alpha|^2$ and $g(\xi) = (1+|\xi|^2)^k$ are comparable, i.e., they can be estimated against each other by constants that depend on k and d only, cf. Exercise 4.9. This shows in particular that
$$\Big(\int_{\mathbb R^d}(1+|\xi|^2)^k\,|\hat u(\xi)|^2\,d\xi\Big)^{1/2}$$
is an equivalent norm on $H^k(\mathbb R^d)$. □
Another way to put the previous theorem is that the Sobolev space $H^k(\mathbb R^d)$ is the Fourier transform of the weighted Lebesgue space $L^2\big((1+|\,\cdot\,|^2)^k\mathcal L^d\big)(\mathbb R^d)$.
Fig. 4.4 Illustration to Example 4.30: the smoothness of a function is reflected in the rapid decay
of the Fourier transform (and vice versa)
depicted in Fig. 4.4. The Fourier transform of u exhibits a decay rate at infinity like $|\xi|^{-1}$; in particular, the function $\xi\mapsto|\xi|^2\hat u(\xi)$ is not in $L^2(\mathbb R)$. For v and w, however, the Fourier transforms decay exponentially (cf. Exercise 4.4); in particular, $\xi\mapsto|\xi|^k\hat v(\xi)$ is an $L^2(\mathbb R)$-function for every $k\in\mathbb N$ (just as it is for w). Conversely, the slow decay of w is reflected in the non-differentiability of $\hat w$.
The relationship between smoothness and decay is of fundamental importance
for image processing: images with discontinuities never have a rapidly decaying
Fourier transform. This demonstrates again that in filtering with low-pass filters,
edges are necessarily smoothed and hence become blurred (cf. Example 4.19).
The equivalence in Theorem 4.29 motivates us to define Sobolev spaces for
arbitrary smoothness s ∈ R as well:
Definition 4.31 The fractional Sobolev space to $s\in\mathbb R$ is defined by
$$u\in H^s(\mathbb R^d)\iff \int_{\mathbb R^d}(1+|\xi|^2)^s\,|\hat u(\xi)|^2\,d\xi<\infty.$$
Remark 4.32 In fractional Sobolev spaces, “nonsmooth” functions can still exhibit
a certain smoothness. For instance, the characteristic function u(x) = χ[−1,1] (x)
lies in the space H s (R) for every s ∈ [0, 1/2[ as one should convince oneself in
Exercise 4.10.
Apart from the Fourier transform on $L^1(\mathbb R^d)$, $L^2(\mathbb R^d)$, $\mathcal S(\mathbb R^d)$, and $\mathcal S(\mathbb R^d)^*$, analogous transformations for functions f on rectangles $\prod_{k=1}^d[a_k,b_k]\subset\mathbb R^d$ are of interest for image processing as well. This leads to the so-called Fourier series. By
means of these, we will prove the sampling theorem, which explains the connection
of a continuous image to its discrete sampled version. Furthermore, we will be able
to explain the aliasing in Fig. 3.1 that results from incorrect sampling.
Since the continuous functions are dense in L2 ([−π, π]), every L2 -function can also
be approximated by trigonometric polynomials arbitrarily well, and we conclude
that $(e_k)_k$ forms a basis. □
The values
$$(u,e_k)_{[-\pi,\pi]} = \frac{1}{2\pi}\int_{-\pi}^{\pi}u(x)\,e^{-ikx}\,dx$$
are called the Fourier coefficients of u.
In this case, the functions $e_k(x) = e^{ik\frac{\pi}{B}x}$ form an orthonormal basis and, together with the Fourier coefficients
$$(u,e_k)_{[-B,B]} = \frac{1}{2B}\int_{-B}^{B}u(x)\,e^{-ik\frac{\pi}{B}x}\,dx$$
On a d-dimensional rectangle $\Omega = \prod_{l=1}^d[-B_l,B_l]$, we define the functions $(e_{k,\Omega})_{k\in\mathbb Z^d}$ by
$$e_{k,\Omega}(x) = \prod_{l=1}^d e^{ik_l\frac{\pi}{B_l}x_l}$$
and obtain an orthonormal basis in $L^2(\Omega)$ with respect to the inner product
$$(u,v)_\Omega = \frac{1}{2^d\prod_{l=1}^dB_l}\int_\Omega u(x)\,\overline{v(x)}\,dx.$$
Fig. 4.5 Sampling of the function u(x) = sin(5x). While sampling with rate T = 0.1 (crosses)
reflects the function well, the sampling rate T = 1.2 (dots) yields a totally wrong impression, since
it suggests a much too low frequency
Proof In this proof, we make use of the trick that 0u can be regarded as an element in
L2 (R) as well as in L2 ([−B, B]). Thus, we can consider both the Fourier transform
and the Fourier series of 0
u.
Since 0
u lies in L2 ([−B, B]), it lies in L1 ([−B, B]) as well. Thus, u is continuous
and the evaluation of u at a point is well defined. We use the inversion formula of
the Fourier transform, and with the basis functions $e_k(x) = e^{ik\frac{\pi}{B}x}$, we obtain
$$u\Big(\frac{k\pi}{B}\Big) = \frac{1}{\sqrt{2\pi}}\int_{-B}^{B}\hat u(\xi)\,e^{i\xi\frac{k\pi}{B}}\,d\xi = \sqrt{\frac{2}{\pi}}\,B\,(\hat u,e_{-k})_{[-B,B]}.$$
Hence, the values $u(\frac{k\pi}{B})$ determine the coefficients $(\hat u,e_{-k})_{[-B,B]}$, and due to $\hat u\in L^2([-B,B])$, they actually determine the whole function $\hat u$. This proves that u is determined by the values $\big(u(\frac{k\pi}{B})\big)_{k\in\mathbb Z}$.
In order to prove the reconstruction formula, we develop $\hat u$ into its Fourier series and note that for $\xi\in\mathbb R$, we need to restrict the result by means of the characteristic function $\chi_{[-B,B]}$:
$$\hat u(\xi) = \sum_{k\in\mathbb Z}(\hat u,e_k)_{[-B,B]}\,e_k(\xi)\,\chi_{[-B,B]}(\xi) = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(-\frac{k\pi}{B}\Big)\,e_k(\xi)\,\chi_{[-B,B]}(\xi).$$
Since the inverse Fourier transform is continuous, we can pull it inside the series and obtain
$$u = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(-\frac{k\pi}{B}\Big)\,\mathcal F^{-1}\big(e_k\chi_{[-B,B]}\big).$$
By means of the calculation rules in Lemma 4.5 and Exercise 4.3, we infer
$$\mathcal F^{-1}\big(e_k\chi_{[-B,B]}\big)(x) = \sqrt{\frac{2}{\pi}}\,B\,\operatorname{sinc}\Big(\frac{B}{\pi}\Big(x+\frac{k\pi}{B}\Big)\Big).$$
Inserting this expression into the previous equation yields the assertion. □
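The reconstruction formula can be tried out directly. The following NumPy sketch is only an illustration (truncation to finitely many samples is an assumption of the example; np.sinc uses the convention sinc(t) = sin(πt)/(πt) appearing above), so it is accurate only away from the ends of the finite sample window:

```python
import numpy as np

def sinc_reconstruct(samples, B, x):
    """Evaluate u(x) = sum_k u(k*pi/B) * sinc(B/pi * (x - k*pi/B))
    from finitely many samples u(k*pi/B), k = 0, ..., len(samples)-1."""
    samples = np.asarray(samples, dtype=float)
    k = np.arange(len(samples))
    t = B / np.pi * (np.asarray(x, dtype=float)[:, None] - k[None, :] * np.pi / B)
    return np.sum(samples[None, :] * np.sinc(t), axis=1)

# Example: u(x) = sin(3x) has bandwidth 3 < B = 5, so the rate pi/B suffices.
B = 5.0
k = np.arange(64)
u_rec = sinc_reconstruct(np.sin(3 * k * np.pi / B), B, np.linspace(5, 30, 200))
# u_rec approximates sin(3x) up to the truncation error of the finite sum.
```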
Remark 4.36 In the above case, we call B the bandwidth of the signal. This
bandwidth indicates the highest frequency contained in the signal. Expressed in
words, the sampling theorem reads:
A signal with bandwidth B has to be sampled with sampling rate π/B in order to store all information of the signal.
We here use the word “frequency” not in the sense in which it is often used in engineering. In this context, typically the frequency f with $B = 2\pi f$ is used instead of the angular frequency B. Also, the variant of the Fourier transform in Remark 4.3 including the term $e^{-2\pi ix\cdot\xi}$ is common there. For this situation, the assertion of the sampling theorem reads:
If a signal exhibits frequencies up to a maximal frequency f, it has to be sampled with the sampling rate 1/(2f) in order to store all information of the signal.
That is, one has to sample twice as fast as the highest frequency. The sampling frequency 2f is also called the Nyquist rate or Nyquist frequency.
4.2.3 Aliasing
Aliasing is what we observe in Figs. 3.1 and 4.5: the discrete image or signal does
not match the original signal, since in the discrete version, frequencies arise that are
not contained in the original. As “aliases,” they stand for the actual frequencies.
In the previous subsection, we saw that this effect cannot occur if the signal is
sampled at a sufficiently high rate. In this subsection, we desire to understand how
exactly aliasing arises and how we can eliminate it.
For this purpose, we need an additional tool:
Suppose that either the function $\sum_{k\in\mathbb Z}\hat u(\,\cdot\,+2Bk)\in L^2([-B,B])$ or the series $\sum_{k\in\mathbb Z}\big|u\big(\frac{k\pi}{B}\big)\big|$ converges. Then, for almost all $\xi\in\mathbb R$,
$$\sum_{k\in\mathbb Z}\hat u(\xi+2Bk) = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(\frac{k\pi}{B}\Big)\,e^{-i\frac{k\pi}{B}\xi}.$$
For $\phi\in L^2([-B,B])$, we can represent the function by its Fourier series. The Fourier coefficients are given by
$$(\phi,e_k)_{[-B,B]} = \frac{1}{2B}\int_{-B}^{B}\phi(\xi)\,e^{-i\frac{k\pi}{B}\xi}\,d\xi = \frac{1}{2B}\int_{-B}^{B}\sum_{l\in\mathbb Z}\hat u(\xi+2Bl)\,e^{-i\frac{k\pi}{B}\xi}\,d\xi$$
$$= \frac{1}{2B}\int_{-B}^{B}\sum_{l\in\mathbb Z}\hat u(\xi+2Bl)\,e^{-i\frac{k\pi}{B}(\xi+2Bl)}\,d\xi = \frac{1}{2B}\int_{\mathbb R}\hat u(\xi)\,e^{-i\frac{k\pi}{B}\xi}\,d\xi = \frac{\sqrt{2\pi}}{2B}\,u\Big(-\frac{k\pi}{B}\Big).$$
Therefore, the Fourier series reads
$$\phi(\xi) = \sum_{k\in\mathbb Z}(\phi,e_k)_{[-B,B]}\,e_k(\xi) = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(-\frac{k\pi}{B}\Big)\,e^{i\frac{k\pi}{B}\xi}.$$
The connection between u and $u_d$ can be clarified via the Fourier transform:
Lemma 4.39 For almost all $\xi\in\mathbb R$, one has
$$\hat u_d(\xi) = \frac{B}{\pi}\sum_{k\in\mathbb Z}\hat u(\xi+2Bk).$$
Proof The Fourier transform of the Dirac measure is
$$\mathcal F\big(\delta_{\frac{k\pi}{B}}\big)(\xi) = \frac{1}{\sqrt{2\pi}}\,e^{-i\frac{k\pi}{B}\xi},$$
and hence
$$\hat u_d(\xi) = \frac{1}{\sqrt{2\pi}}\sum_{k\in\mathbb Z}u\Big(\frac{k\pi}{B}\Big)\,e^{-i\frac{k\pi}{B}\xi} = \frac{B}{\pi}\sum_{n\in\mathbb Z}\hat u(\xi+2Bn),$$
where the last equality is the identity established above. □
Expressed in words, the lemma states that the Fourier transform of the sampled
signal corresponds to a periodization of the Fourier transform of the original signal
with period 2B.
In this way of speaking, we can interpret the reconstruction formula in the sampling theorem (Theorem 4.35) as a convolution as well:
$$u(x) = \sum_{k\in\mathbb Z}u\Big(\frac{k\pi}{B}\Big)\,\operatorname{sinc}\Big(\frac B\pi\Big(x-\frac{k\pi}{B}\Big)\Big) = \Big(u_d*\operatorname{sinc}\big(\tfrac B\pi\,\cdot\,\big)\Big)(x).$$
Equivalently, in the frequency domain,
$$\hat u(\xi) = \hat u_d(\xi)\,\frac{\pi}{B}\,\chi_{[-B,B]}(\xi).$$
If the support of $\hat u$ is contained in the interval $[-B,B]$, then no overlap occurs during periodization, and $\hat u_d\,\frac{\pi}{B}\,\chi_{[-B,B]}$ corresponds to $\hat u$ exactly. This procedure is depicted in Fig. 4.6.
However, if $\hat u$ has a larger support, then the support of $\hat u(\,\cdot\,+2Bk)$ exhibits a nonempty intersection with $[-B,B]$ for several k. This “folding” in the frequency domain is responsible for aliasing; cf. Fig. 4.7.
Fig. 4.6 Reconstruction of a discretized signal by means of the reconstruction formula in the
sampling theorem. First row: A signal u and its Fourier transform. Second row: Sampling the
function renders the Fourier transform periodic. Third row: The sinc convolution kernel and its
Fourier transform. Fourth row: Convoluting with the sinc function reconstructs the signal perfectly
$$u(x) = \cos(\xi_0x) = \frac{e^{i\xi_0x}+e^{-i\xi_0x}}{2}.$$
Its Fourier transform is given by
$$\hat u = \sqrt{\frac{\pi}{2}}\,\big(\delta_{\xi_0}+\delta_{-\xi_0}\big).$$
Fig. 4.7 Illustration of aliasing. First row: A signal u and its Fourier transform. Second row:
Sampling of the function renders the Fourier transform periodic and produces an overlap. Third
row: The sinc convolution kernel and its Fourier transform. Fourth row: Convoluting with the sinc
function reconstructs a signal in which the high-frequency components are represented by low
frequencies
We denote by
$$u_d = \sum_{k\in\mathbb Z^2}u(k_1T_1,k_2T_2)\,\delta_{(k_1T_1,k_2T_2)}$$
an image that is discretely sampled on a rectangular grid with sampling rates $T_1$ and $T_2$. By means of the Fourier transform, the connection to the continuous image u can be expressed as
$$\hat u_d(\xi) = \frac{B_1B_2}{\pi^2}\sum_{k\in\mathbb Z^2}\hat u(\xi_1+2B_1k_1,\ \xi_2+2B_2k_2).$$
Also in this case, aliasing occurs if the image does not have finite bandwidth or is
sampled too slowly. In addition to the change of frequency, a change in the direction
also may occur here:
$\sum_{k\in\mathbb Z^2}u_{lk}\,\delta_{lk}$. Also during this undersampling, aliasing arises; cf. Fig. 3.1. In order to prevent this, a low-pass filter h should be applied before the undersampling in order to eliminate those frequencies that, due to the aliasing, would be reconstructed as incorrect frequencies. It suggests itself to choose this filter as the perfect low-pass filter with width π/l, i.e., we have $\hat h = \chi_{[-\pi/l,\pi/l]^2}$. This prevents aliasing; cf. Fig. 4.8.
Fig. 4.8 Preventing aliasing by low-pass filtering. For better comparability, the subsampled
images are rescaled to the original size
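A minimal sketch of this anti-aliasing strategy is given below. It is only an illustration under simplifying assumptions (periodic boundary conditions, an integer subsampling step l, and frequencies measured on the discrete FFT grid); the function name is hypothetical:

```python
import numpy as np

def subsample_with_prefilter(u, l):
    """Subsample an image by the factor l, applying the perfect low-pass filter
    with transfer function chi_{[-pi/l, pi/l]^2} beforehand to prevent aliasing."""
    U = np.fft.fft2(u)
    w1 = 2 * np.pi * np.fft.fftfreq(u.shape[0])[:, None]   # frequencies in [-pi, pi)
    w2 = 2 * np.pi * np.fft.fftfreq(u.shape[1])[None, :]
    mask = (np.abs(w1) <= np.pi / l) & (np.abs(w2) <= np.pi / l)
    u_filtered = np.real(np.fft.ifft2(U * mask))
    return u_filtered[::l, ::l]                             # keep every l-th sample
```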
For the numerical realization of frequency methods, we need the discrete Fourier
transform. Also in this case, we shall study one-dimensional discrete images for
now and will obtain the higher-dimensional version later as a tensor product. Hence,
we consider
u : {0, . . . , N − 1} → C.
These images form an N-dimensional vector space $\mathbb C^N$, which by means of the inner product
$$(u,v) = \sum_{n=0}^{N-1}u_n\,\overline{v_n}$$
becomes a Hilbert space. The discrete Fourier transform of u is defined by
$$\hat u_k = \frac{1}{N}\sum_{n=0}^{N-1}u_n\exp\Big(\frac{-2\pi ink}{N}\Big),$$
i.e., in matrix-vector notation,
$$\hat u = \frac{1}{N}\,\mathcal Bu,$$
where $\mathcal B$ denotes the Fourier matrix with entries $\exp(-2\pi ink/N)$. The transform is inverted by
$$u_n = \sum_{k=0}^{N-1}\hat u_k\exp\Big(\frac{2\pi ink}{N}\Big),\qquad\text{i.e.,}\qquad u = N\mathcal B^{-1}\hat u = \mathcal B^*\hat u.$$
Also in the discrete case, we denote the inverse of the Fourier transform by ǔ.
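Note that software libraries usually place the normalization differently. A short sketch of the correspondence with NumPy (whose fft is unnormalized, while the transform above carries the factor 1/N in the forward direction) is the following illustration:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0, -1.0])
N = len(u)

u_hat = np.fft.fft(u) / N        # hat{u}_k = (1/N) sum_n u_n exp(-2*pi*i*n*k/N)
u_back = np.fft.ifft(u_hat) * N  # u_n = sum_k hat{u}_k exp(2*pi*i*n*k/N)

assert np.allclose(u_back, u)    # the two conventions are consistent
```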
The generalization to two-dimensional images is simple:
Remark 4.45 (Two-Dimensional Discrete Fourier Transform) The two-dimensional discrete Fourier transform $\hat u\in\mathbb C^{N\times M}$ of $u\in\mathbb C^{N\times M}$ is defined by
$$\hat u_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}u_{n,m}\exp\Big(\frac{-2\pi ink}{N}\Big)\exp\Big(\frac{-2\pi iml}{M}\Big)$$
and is inverted by
$$u_{n,m} = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1}\hat u_{k,l}\exp\Big(\frac{2\pi ink}{N}\Big)\exp\Big(\frac{2\pi iml}{M}\Big).$$
Remark 4.46 (Periodicity of the Discrete Fourier Transform) The vectors $b^n$ can also be regarded as N-periodic, i.e.,
$$b^n_{k+N} = \exp\Big(\frac{2\pi in(k+N)}{N}\Big) = \exp\Big(\frac{2\pi ink}{N}\Big) = b^n_k.$$
Furthermore, one also has $b^{n+N} = b^n$. In other words, for the discrete Fourier transform, all signals are N-periodic, since we have
$$\hat u_{k+N} = \hat u_k.$$
This observation is important for the interpretation of the Fourier coefficients. The
basis vectors $b^{N-n}$ correspond to the vectors $b^{-n}$, i.e., the entries $\hat u_{N-k}$ with small k correspond to the low-frequency vectors $b^{-k}$. This can also be observed in Fig. 4.9:
the “high” basis vectors have low frequencies, while the “middle” basis vectors
have the highest frequencies. Another explanation for this situation is given by the
sampling theorem as well as the alias effect: when sampling with rate 1, the highest
There is also a convolution theorem for the discrete Fourier transform. In view
of the previous remark, it is not surprising that it holds for periodic convolution:
Definition 4.47 Let $u,v\in\mathbb C^N$. The periodic convolution of u with v is defined by
$$(u\circledast v)_n = \sum_{k=0}^{N-1}v_k\,u_{(n-k)\bmod N}.$$
For the discrete Fourier transform, one has the convolution theorem
$$\widehat{(u\circledast v)}_n = N\,\hat u_n\,\hat v_n.$$
Proof Using the periodicity of the complex exponential function, the equation can be verified directly:
$$\widehat{(u\circledast v)}_n = \frac{1}{N}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1}v_l\,u_{(k-l)\bmod N}\exp\Big(\frac{-2\pi ink}{N}\Big)$$
$$= \frac{1}{N}\sum_{l=0}^{N-1}v_l\exp\Big(\frac{-2\pi inl}{N}\Big)\sum_{k=0}^{N-1}u_{(k-l)\bmod N}\exp\Big(\frac{-2\pi in(k-l)}{N}\Big) = N\,\hat v_n\,\hat u_n. \qquad\square$$
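The statement is easy to verify numerically. The following small sketch is only an illustration (the direct O(N²) sum serves as reference, and the hats of the text correspond to fft divided by N):

```python
import numpy as np

def periodic_conv(u, v):
    """Periodic convolution (u * v)_n = sum_k v_k u_{(n-k) mod N}, computed directly."""
    N = len(u)
    return np.array([sum(v[k] * u[(n - k) % N] for k in range(N)) for n in range(N)])

u, v = np.random.rand(8), np.random.rand(8)
N = len(u)

# Convolution theorem in the normalization of the text (forward DFT carries 1/N):
lhs = np.fft.fft(periodic_conv(u, v)) / N
rhs = N * (np.fft.fft(u) / N) * (np.fft.fft(v) / N)
assert np.allclose(lhs, rhs)
```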
Also in the discrete case, convolution can be expressed via the multiplication of
the Fourier transform. And here we again call the Fourier transform of a convolution
kernel the transfer function:
$$\hat h_k = \frac{1}{N}\sum_{n=-r}^{r}h_n\exp\Big(-\frac{2\pi ink}{N}\Big).$$
Note that the Fourier transform of a convolution kernel depends on the period N
of the signal! Furthermore, it is important that we use here the periodic boundary
extension of Sect. 3.3.3 throughout.
Example 4.50 We consider some of the filters introduced in Sect. 3.3.3.
For the moving average filter M 3 = [1 1 1]/3, the transfer function is given by
$$\widehat{M^3}_k = \frac{1}{3N}\sum_{n=-1}^{1}\exp\Big(-\frac{2\pi ink}{N}\Big) = \frac{1}{3N}\Big(1+\exp\Big(\frac{2\pi ik}{N}\Big)+\exp\Big(-\frac{2\pi ik}{N}\Big)\Big) = \frac{1}{3N}\Big(1+2\cos\Big(\frac{2\pi k}{N}\Big)\Big).$$
The transfer function of this filter is nonnegative throughout, i.e., no frequencies are
“flipped.”
In this light, Sobel filters appear more reasonable than Prewitt filters.
Remark 4.51 (Fast Fourier Transform and Fast Convolution) Evaluating the sums
directly, the discrete Fourier transform needs O(N 2 ) operations. By making use
of symmetries, the cost can be significantly reduced to O(N log2 N), cf. [97],
for instance. By means of the convolution theorem, this can be utilized for a fast
convolution; cf. Exercises 4.11 and 4.12.
Remark 4.52 For real-valued signals, the discrete Fourier transform satisfies the following symmetry relation:
$$\hat u_k = \frac{1}{N}\sum_{n=0}^{N-1}u_n\exp\Big(-\frac{2\pi ink}{N}\Big) = \overline{\frac{1}{N}\sum_{n=0}^{N-1}u_n\exp\Big(-\frac{2\pi in(N-k)}{N}\Big)} = \overline{\hat u_{N-k}}.$$
$$\mathrm{DCT}(u)_k = \sqrt{\frac{2\lambda_k}{N}}\,\sum_{n=0}^{N-1}u_n\cos\Big(\frac{k\pi}{N}\big(n+\tfrac12\big)\Big)$$
and is inverted by
$$u_n = \mathrm{IDCT}\big(\mathrm{DCT}(u)\big)_n = \sum_{k=0}^{N-1}\sqrt{\frac{2\lambda_k}{N}}\,\mathrm{DCT}(u)_k\cos\Big(\frac{k\pi}{N}\big(n+\tfrac12\big)\Big).$$
Like the discrete Fourier transform, the DCT can be computed with complexity
O(N log2 N). There are three further variants of the discrete cosine transform;
cf. [97] for details, for instance.
Application 4.53 (Compression with the DCT: JPEG) The discrete cosine
transform is a crucial part of the JPEG compression standard, which is based on
the idea of transform coding, whereby the image is transformed into a different
representation that is better suited for compression through quantization. One can
observe that the discrete cosine transform of an image typically exhibits many small
coefficients, which mostly belong to the high frequencies. Additionally, there is a
physiological observation that the eye perceives gray value variations less well for
higher frequencies. Together with further techniques, this constitutes the foundation
of the JPEG standard. For grayscale images, this standard consists of the following
steps:
• Partition the image into 8 × 8-blocks and apply the discrete cosine transform to
these blocks.
• Quantize the transformed values (this is the lossy part).
• Reorder the quantized values.
• Apply entropy coding to the resulting numerical sequence.
The compression potential of the blockwise two-dimensional DCT is illustrated in
Fig. 4.10. For a detailed description of the functioning of JPEG, we refer to [109].
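The compression idea can be imitated in a few lines. The following sketch is a toy illustration, not the JPEG standard: it uses scipy.fft.dctn/idctn with orthonormal scaling and simply keeps a fraction of the largest coefficients instead of the quantization, reordering, and entropy-coding steps listed above.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_dct_compress(u, keep=0.1, block=8):
    """Toy transform coding: blockwise 8x8 DCT, keep only the fraction `keep`
    of the largest coefficients (per image), then reconstruct."""
    H, W = u.shape
    c = np.zeros_like(u, dtype=float)
    for i in range(0, H - H % block, block):
        for j in range(0, W - W % block, block):
            c[i:i+block, j:j+block] = dctn(u[i:i+block, j:j+block], norm='ortho')
    thresh = np.quantile(np.abs(c), 1 - keep)   # keep the largest coefficients
    c[np.abs(c) < thresh] = 0.0
    out = np.zeros_like(c)
    for i in range(0, H - H % block, block):
        for j in range(0, W - W % block, block):
            out[i:i+block, j:j+block] = idctn(c[i:i+block, j:j+block], norm='ortho')
    return out
```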
In the previous sections, we discussed in detail that the Fourier transform yields the
frequency representation of a signal or an image. However, any information about
location is not encoded in an obvious way. In particular, a local alteration of the
signal or image at only one location results in a global change of the whole Fourier
transform. In other words, the Fourier transform is a global transformation in the
sense that the value $\hat u(\xi)$ depends on all values of u. In several circumstances,
transformations that are “local” in a certain sense appear desirable. Before we
introduce the wavelet transform, we present another alternative to obtain “localized”
frequency information: the windowed Fourier transform.
An intuitive idea for localizing the Fourier transform is the following modification.
We use a window function $g:\mathbb R^d\to\mathbb C$, which is nothing more than a function that is “localized around the origin,” i.e., a function that assumes small values for large |x|. For σ > 0, two such examples are given by the following functions.
Fig. 4.10 Illustration of the compression potential of the two-dimensional DCT on 8 × 8 blocks.
From upper left to lower right: Original image, reconstruction based on 10%, 5% and 2% of the
DCT coefficients, respectively. The resulting artifacts are disturbing only in the last case
The first alternative again explains the name “windowed” Fourier transform:
through multiplication by the shifted window g, the function u is localized prior
to the Fourier transform.
Note that the windowed Fourier transform is a function of 2d variables: Gg u :
R2d → C. If the window function g is a Gaussian function, i.e., g(x) =
(2πσ )−d/2 exp(−|x|2/(2σ )) for some σ > 0, the transform is also called the Gabor
transform.
Thanks to the previous work in Sects. 4.1.1–4.1.3, the analysis of the elementary
properties of the windowed Fourier transform is straightforward.
Lemma 4.55 Let u, v, g ∈ L2 (Rd ). Then we have Gg u ∈ L2 (R2d ) and
Proof In order to prove the equality of the inner products, we use the isometry
property of the Fourier transform and in particular the Plancherel formula (4.2).
With Ft denoting the Fourier transform with respect to the variable t, we use one
of the alternative representations of the windowed Fourier transform as well as the
convolution theorem to obtain
$$\mathcal F_t\big(G_gu(\xi,\,\cdot\,)\big)(\omega) = \mathcal F_t\big((2\pi)^{-d/2}(M_{-\xi}u*D_{-\mathrm{id}}g)\big)(\omega) = \mathcal F(M_{-\xi}u)(\omega)\,\big(\mathcal FD_{-\mathrm{id}}g\big)(\omega) = \hat u(\omega+\xi)\,\hat g(\omega).$$
Computing the inner product of $G_gu$ and $G_gv$ with the Plancherel formula (first with respect to ω, then with respect to ξ) therefore gives
$$(G_gu,G_gv)_{L^2(\mathbb R^{2d})} = \|\hat g\|_2^2\,(\hat u,\hat v)_2 = \|g\|_2^2\,(u,v)_2. \qquad\square$$
We thereby see that the windowed Fourier transform is an isometry, and as such,
it is inverted on its range by its adjoint (up to a constant). In particular, we have the
following.
Corollary 4.56 For $u,g\in L^2(\mathbb R^d)$ with $\|g\|_2 = 1$, we have the inversion formula
$$u(x) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^d}\int_{\mathbb R^d}G_gu(\xi,t)\,g(x-t)\,e^{ix\cdot\xi}\,d\xi\,dt\qquad\text{for almost all }x,$$
which implies
$$G_g^*F(x) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^{2d}}F(\xi,t)\,e^{ix\cdot\xi}\,g(x-t)\,d\xi\,dt. \qquad\square$$
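A direct discretization makes the localization of the windowed Fourier transform visible. The sketch below is a slow Riemann-sum illustration with a Gaussian window, not an efficient short-time Fourier transform; the grids and the parameter σ are choices of this example:

```python
import numpy as np

def windowed_fourier(u, sigma, xi_grid, t_grid, x_grid):
    """Riemann-sum discretization of the Gabor transform
    G_g u(xi, t) = (2*pi)^(-1/2) * integral u(x) g(x - t) exp(-i x xi) dx
    with the Gaussian window g(x) = (2*pi*sigma)^(-1/2) exp(-x^2/(2*sigma))."""
    dx = x_grid[1] - x_grid[0]
    G = np.zeros((len(xi_grid), len(t_grid)), dtype=complex)
    for i, xi in enumerate(xi_grid):
        for j, t in enumerate(t_grid):
            g = np.exp(-(x_grid - t)**2 / (2 * sigma)) / np.sqrt(2 * np.pi * sigma)
            G[i, j] = np.sum(u * g * np.exp(-1j * xi * x_grid)) * dx / np.sqrt(2 * np.pi)
    return G
```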
Another transformation that analyzes local behavior is given by the wavelet trans-
form. This transform has found broad applications in signal and image processing, in
particular due to its particularly elegant discretization and its numerical efficiency.
Fig. 4.11 Localization during windowed Fourier transform (left) and during wavelet transform
(right)
We will follow a similar path as for the Fourier transform: we first introduce the
continuous wavelet transform as well as the wavelet series, and finally, we cover the
discrete wavelet transform.
While the windowed Fourier transform uses a fixed window in order to localize
the function of interest, the wavelet transform uses functions of varying widths,
cf. Fig. 4.11. In case of dimensions higher than one, the wavelet transform can be
defined in various ways. We here cover the one-dimensional case of real-valued
functions:
Definition 4.57 Let $u,\psi\in L^2(\mathbb R,\mathbb R)$. For $b\in\mathbb R$ and $a>0$, the wavelet transform of u with ψ is defined by
$$L_\psi u(a,b) = \int_{\mathbb R}u(x)\,\frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big)\,dx.$$
By means of the translation and dilation operators, this can also be written as
$$L_\psi u(a,b) = \frac{1}{\sqrt a}\,(u,\,T_{-b}D_{1/a}\psi)_{L^2(\mathbb R)} = \frac{1}{\sqrt a}\,(u*D_{-1/a}\psi)(b),$$
where we have omitted the usual transition to equivalence classes (cf. Sect. 2.2.2).
The inner product in this space is given by
$$(F,G)_{L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})} = \int_{\mathbb R}\int_0^\infty F(a,b)\,\overline{G(a,b)}\,\frac{da\,db}{a^2}.$$
Then
$$L_\psi: L^2(\mathbb R)\to L^2\Big([0,\infty[\times\mathbb R,\ \tfrac{da\,db}{a^2}\Big)$$
Proof We use the inner product representation of the wavelet transform, the
calculation rules for the Fourier transform, and the Plancherel formula (4.2) to
obtain
$$L_\psi u(a,b) = \frac{1}{\sqrt a}\,(u,\,T_{-b}D_{1/a}\psi)_{L^2(\mathbb R)} = \frac{1}{\sqrt a}\,\big(\hat u,\,\mathcal F(T_{-b}D_{1/a}\psi)\big)_{L^2(\mathbb R)} = \frac{1}{\sqrt a}\,\big(\hat u,\,a\,M_{-b}D_a\hat\psi\big)_{L^2(\mathbb R)}$$
$$= \sqrt a\int_{\mathbb R}\hat u(\xi)\,e^{ib\xi}\,\hat\psi(a\xi)\,d\xi = \sqrt a\,\sqrt{2\pi}\,\mathcal F^{-1}\big(\hat u\,D_a\hat\psi\big)(b).$$
Now we compute the inner product of Lψ u and Lψ v, again using the calculation
rules for the Fourier transform and the Plancherel formula with respect to the
variable b:
$$(L_\psi u,L_\psi v)_{L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})} = \int_{\mathbb R}\int_0^\infty L_\psi u(a,b)\,\overline{L_\psi v(a,b)}\,\frac{da\,db}{a^2}$$
$$= 2\pi\int_0^\infty\int_{\mathbb R}a\,\mathcal F^{-1}\big(\hat u\,D_a\hat\psi\big)(b)\,\overline{\mathcal F^{-1}\big(\hat v\,D_a\hat\psi\big)(b)}\,db\,\frac{da}{a^2}$$
$$= 2\pi\int_0^\infty\int_{\mathbb R}a\,\hat u(\xi)\,\hat\psi(a\xi)\,\overline{\hat v(\xi)\,\hat\psi(a\xi)}\,d\xi\,\frac{da}{a^2} = 2\pi\int_{\mathbb R}\hat u(\xi)\,\overline{\hat v(\xi)}\int_0^\infty\frac{|\hat\psi(a\xi)|^2}{a}\,da\,d\xi.$$
Then a change of variables and $|\hat\psi(-\xi)| = |\hat\psi(\xi)|$ lead to
$$\int_0^\infty\frac{|\hat\psi(a\xi)|^2}{a}\,da = \int_0^\infty\frac{|\hat\psi(a|\xi|)|^2}{a}\,da = \int_0^\infty\frac{|\hat\psi(\omega)|^2}{\omega}\,d\omega = \frac{c_\psi}{2\pi}.$$
Applying the Plancherel formula again yields the assertion. □
The condition cψ < ∞ ensures that Lψ is a continuous mapping, while cψ > 0
guarantees the stable invertibility on the range of Lψ .
Definition 4.59 The condition
$$0 < c_\psi = 2\pi\int_0^\infty\frac{|\hat\psi(\xi)|^2}{\xi}\,d\xi < \infty \qquad(4.3)$$
is called the admissibility condition, and the functions ψ that satisfy it are called wavelets.
The admissibility condition says in particular that around zero, the Fourier transform of a wavelet must tend to zero sufficiently fast; roughly speaking, $\hat\psi(0) = 0$. This implies that the average of a wavelet vanishes.
Analogously to Corollary 4.56 for the windowed Fourier transform, we derive
that the wavelet transform is inverted on its range by its adjoint (up to a constant).
Corollary 4.60 Let $u,\psi\in L^2(\mathbb R)$ and $c_\psi = 1$. Then,
$$u(x) = \int_{\mathbb R}\int_0^\infty L_\psi u(a,b)\,\frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big)\,\frac{da\,db}{a^2}.$$
Proof Due to the normalization, we only have to compute the adjoint of the wavelet transform. For $u\in L^2(\mathbb R)$ and $F\in L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})$,
$$(L_\psi u,F)_{L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})} = \int_{\mathbb R}\int_0^\infty\Big(\int_{\mathbb R}u(x)\,\tfrac{1}{\sqrt a}\,\psi\big(\tfrac{x-b}{a}\big)\,dx\Big)\,\overline{F(a,b)}\,\frac{da\,db}{a^2}$$
$$= \int_{\mathbb R}u(x)\,\overline{\int_{\mathbb R}\int_0^\infty F(a,b)\,\tfrac{1}{\sqrt a}\,\psi\big(\tfrac{x-b}{a}\big)\,\frac{da\,db}{a^2}}\,dx.$$
This implies
$$L_\psi^*F(x) = \int_{\mathbb R}\int_0^\infty F(a,b)\,\frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big)\,\frac{da\,db}{a^2},$$
which yields the assertion. □
Derivative of a Gaussian: Consider $\psi(x) = -\frac{d}{dx}G(x) = x\,e^{-x^2/2}$ with $G(x) = e^{-x^2/2}$. This function is a wavelet, since $\hat\psi(\xi) = i\xi\,e^{-\xi^2/2}$, which implies
$$c_\psi = 2\pi\int_0^\infty\frac{|\hat\psi(\xi)|^2}{\xi}\,d\xi = \pi.$$
We can express the wavelet transform by means of the convolution and obtain in this case
$$L_\psi u(a,b) = \frac{1}{\sqrt a}\,(u*D_{-1/a}\psi)(b) = -\frac{1}{\sqrt a}\,(u*D_{-1/a}G')(b) = \sqrt a\,\big(u*(D_{-1/a}G)'\big)(b) = \sqrt a\,\frac{d}{db}\,(u*D_{-1/a}G)(b).$$
Here we can recognize an analogy to edge detection according to Canny in
Application 3.23, whereby the image was convolved with scaled Gaussian
functions as well.
A similar construction leads to the so-called Mexican hat function: $\psi(x) = -\frac{d^2}{dx^2}G(x) = (1-x^2)\,e^{-x^2/2}$. Here we have $\hat\psi(\xi) = \xi^2\,e^{-\xi^2/2}$ and $c_\psi = \pi/2$. The Mexican hat function is named after its shape.
Haar wavelet: A different kind of wavelet is given by the Haar wavelet, defined by
$$\psi(x) = \begin{cases}1 & \text{if } 0\le x<\tfrac12,\\ -1 & \text{if } \tfrac12\le x<1,\\ 0 & \text{otherwise.}\end{cases}$$
The wavelets that result from derivatives of the Gaussian function are very
smooth (infinitely differentiable) and decay rapidly. In particular, they (as well as
their Fourier transforms) are well localized. For a discrete implementation, compact
support would furthermore be desirable, since then, the integrals that need to be
computed would be finite. As stated above, the Haar wavelet exhibits compact
support, but it is discontinuous. In the following subsection, we will see that it is
particularly well suited for the discrete wavelet transform. In this context, we will
also encounter further wavelets.
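The continuous wavelet transform itself is easy to discretize by a Riemann sum. The following sketch is illustrative only: it uses the Mexican hat from above and evaluates the defining integral on a grid, which is far less efficient than the fast transform developed in the next subsection.

```python
import numpy as np

def cwt_mexican_hat(u, x, scales):
    """Riemann-sum discretization of L_psi u(a,b) = integral u(x) a^{-1/2} psi((x-b)/a) dx
    with the Mexican hat psi(x) = (1 - x^2) exp(-x^2/2); b runs over the grid x."""
    dx = x[1] - x[0]
    out = np.zeros((len(scales), len(x)))
    for i, a in enumerate(scales):
        for j, b in enumerate(x):
            psi = (1 - ((x - b) / a)**2) * np.exp(-((x - b) / a)**2 / 2)
            out[i, j] = np.sum(u * psi) * dx / np.sqrt(a)
    return out
```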
form an orthonormal basis of L2 (R). To comprehend this will require quite a bit
of work. We first introduce the central notion for wavelet series and the discrete
wavelet transform, namely the notion of a “multiscale analysis.”
For all $j,k\in\mathbb Z$: $u\in V_j\iff T_{2^jk}u\in V_j$.
$V_{j+1}\subset V_j$.
$u\in V_j\iff D_{1/2}u\in V_{j+1}$.
Trivial intersection: $\bigcap_{j\in\mathbb Z}V_j = \{0\}$.
Completeness: $\overline{\bigcup_{j\in\mathbb Z}V_j} = L^2(\mathbb R)$.
Orthonormal basis: There is a function $\phi\in V_0$ such that the functions $\{T_k\phi\mid k\in\mathbb Z\}$ form an orthonormal basis of $V_0$.
The function φ is called a generator or scaling function of the multiscale analysis.
Let us make some remarks regarding this definition: The spaces Vj are translation
invariant with respect to the dyadic translations by 2j . Furthermore, they are nested
into each other and become smaller with increasing j . If we denote the orthogonal
projection onto Vj by PVj , then we have for every u that
Since $\phi\in V_0\subset V_{-1}$ and $\{\phi_{-1,k}\mid k\in\mathbb Z\}$ is an orthonormal basis of $V_{-1}$, the generator satisfies
$$\phi = \sum_{k\in\mathbb Z}h_k\,\phi_{-1,k},\qquad\text{i.e.,}\qquad \phi(x) = \sqrt2\sum_{k\in\mathbb Z}h_k\,\phi(2x-k), \qquad(4.4)$$
with $h_k = (\phi,\phi_{-1,k})$. Equation (4.4) is called the scaling equation, and it explains the name scaling function for φ. The functions $\phi_{j,k}$ already remind us of the continuous wavelet transform with discrete values $a = 2^j$ and $b = 2^jk$. We find the wavelets in the following construction again:
Definition 4.65 (Approximation and Detail Spaces) Let (Vj )j ∈Z be a multiscale
analysis. Let the spaces Wj be defined as orthogonal complements of Vj in Vj −1 ,
i.e.,
$$V_{j-1} = V_j\oplus W_j,\qquad V_j\perp W_j.$$
Fig. 4.12 Representation of a function in the piecewise constant multiscale analysis. Left:
Generator function φ and φj,k for j = −1, k = 3. Right: The function u(x) = cos(πx) and
its representation in the spaces V−1 and V−2
The space Vj is called the approximation space to the scale j ; the space Wj is called
the detail space or wavelet space to the scale j .
The definition of the spaces $W_j$ immediately implies
$$V_j = \bigoplus_{m\ge j+1}W_m.$$
These equations justify the name “multiscale analysis”: the spaces Vj allow a
systematic approximation of functions on different scales.
Example 4.66 (Detail Spaces Wj for Piecewise Constant Multiscale Analysis) Let us investigate what the spaces $W_j$ look like in Example 4.64. We construct the spaces by means of the projection $P_{V_j}$. For $x\in[k2^j,(k+1)2^j[$, we have
$$P_{V_j}u(x) = 2^{-j}\int_{k2^j}^{(k+1)2^j}u(y)\,dy.$$
In order to obtain $P_{W_{j+1}} = P_{V_j}-P_{V_{j+1}}$, we use the scaling equation for φ. In this case, we have
cf. Fig. 4.13. Therefore, also the spaces Wj have orthonormal bases, namely
(ψj,k )k∈Z . The function ψ is just the Haar wavelet in Example 4.61 again.
The above example shows that for piecewise constant multiscale analysis, there is
actually a wavelet (the Haar wavelet) that yields an orthonormal basis of the wavelet
spaces Wj . A similar construction also works in general, as the following theorem
demonstrates:
Theorem 4.67 Let (Vj ) be a multiscale analysis with generator φ such that φ
satisfies the scaling equation (4.4) with a sequence (hk ). Furthermore, let ψ ∈ V−1
be defined by
$$\psi(x) = \sqrt2\sum_{k\in\mathbb Z}(-1)^k\,h_{1-k}\,\phi(2x-k).$$
Then:
1. The set $\{\psi_{j,k}\mid k\in\mathbb Z\}$ is an orthonormal basis of $W_j$.
2. The set $\{\psi_{j,k}\mid j,k\in\mathbb Z\}$ is an orthonormal basis of $L^2(\mathbb R)$.
3. The function ψ is a wavelet with $c_\psi = 2\ln 2$.
$$(\psi,\phi_{k,0}) = 0,\qquad (\psi,\psi_{k,0}) = \delta_{0,k}.$$
In Exercise 4.16, you will prove that in particular $\sum_{k\in\mathbb Z}h_k^2 = 1$, and due to $\|\phi_{-1,0}\| = 1$, it follows that the system $\{\phi_{k,0},\psi_{k,0}\mid k\in\mathbb Z\}$ is complete in $V_{-1}$. The assertions 1 and 2 can now be shown by simple arguments (cf. Exercise 4.14). For assertion 3, we refer to [94]. □
In the context of Theorem 4.67, we also note that the set $\{\psi_{j,k}\mid j,k\in\mathbb Z\}$ is an orthonormal wavelet basis of $L^2(\mathbb R)$.
Given a sequence of subspaces (Vj ) that satisfies all the further requirements
for a multiscale analysis, it is in general not easy to come up with an orthonormal
basis for V0 (another example is given in Exercise 4.15). However, it can be shown
that under the assumption in Remark 4.63 (the translates of φ form a Riesz basis of
V0 ), a generator can be constructed whose translates form an orthonormal basis of
V0 , cf. [94], for instance. Theorem 4.67 expresses what the corresponding wavelet
looks like. By now, a large variety of multiscale analyses and wavelets with different
properties have been constructed. Here, we only give some examples:
Example 4.68 (Daubechies Wavelets and Symlets) We consider two important
examples of multiscale analysis:
Daubechies Wavelets: The Daubechies wavelets (named after Ingrid Daubechies,
cf. [48]) are wavelets with compact support, a certain degree of smoothness, and a certain number of vanishing moments, i.e., the integrals $\int_{\mathbb R}x^l\psi(x)\,dx$ are all zero for $l = 0,\dots,k$ up to a certain k. There is a whole scale of these wavelets, and for a given support, these wavelets are those that exhibit the most vanishing moments, i.e., k is maximal. The so-called db2-wavelet ψ (featuring two vanishing moments) and the corresponding scaling function φ look as follows:
For the db2-wavelet, the analytical coefficients of the scaling equation are given
by
$$h_0 = \frac{1-\sqrt3}{4\sqrt2},\qquad h_1 = \frac{3-\sqrt3}{4\sqrt2},\qquad h_2 = \frac{3+\sqrt3}{4\sqrt2},\qquad h_3 = \frac{1+\sqrt3}{4\sqrt2}.$$
For the other Daubechies wavelets, the values are tabulated in [48], for instance.
Symlets: Symlets also trace back to Ingrid Daubechies. They are similar to
Daubechies wavelets, but more “symmetric.” Also in this case, there is a scale of
symlets. The coefficients of the scaling equation are tabulated but not available
in analytical form. The so-called sym4-wavelet (exhibiting four vanishing
moments) and the corresponding scaling function look as follows:
The scaling equation (4.4) and the definition of the wavelet in Theorem 4.67 are
the key to a fast wavelet transform. We recall that since $\{\psi_{j,k}\mid j,k\in\mathbb Z\}$ forms an orthonormal basis of $L^2(\mathbb R)$, one has
$$u = \sum_{j,k\in\mathbb Z}(u,\psi_{j,k})\,\psi_{j,k}.$$
The equation for $\psi_{j,k}$ can be shown analogously, and the equations for the inner products are an immediate consequence. □
Starting from the values $(u,\phi_{0,k})$, we now have recurrence formulas for the coefficients on coarser scales $j > 0$. By means of the abbreviations $c^j_k = (u,\phi_{j,k})$ and $d^j_k = (u,\psi_{j,k})$, these read
$$c^j_k = \sum_{l\in\mathbb Z}h_l\,c^{j-1}_{2k+l}\qquad\text{and}\qquad d^j_k = \sum_{l\in\mathbb Z}(-1)^l\,h_{1-l}\,c^{j-1}_{2k+l}.$$
Based on the projection PVj u, the fast wavelet transform computes the coarser
projection PVj+1 u in the approximation space Vj +1 and the wavelet component
PWj+1 u in the detail space Wj +1 . Note that in the case of a finite coefficient sequence
h, the summation processes are finite. In the case of short coefficient sequences, only
few calculations are necessary in each recursive step.
For the reconstruction, the projection PVj u is computed based on the coarser
approximation PVj+1 u and the details PWj+1 u, as the following lemma describes:
Lemma 4.70 For the coefficient sequences $d^j = ((u,\psi_{j,k}))_{k\in\mathbb Z}$ and $c^j = ((u,\phi_{j,k}))_{k\in\mathbb Z}$, we have the following recurrence formula:
$$c^j_k = \sum_{l\in\mathbb Z}c^{j+1}_l\,h_{k-2l} + \sum_{l\in\mathbb Z}d^{j+1}_l\,(-1)^{k-2l}\,h_{1-(k-2l)}.$$
Proof Since the space $V_j$ is orthogonally decomposed into the spaces $V_{j+1}$ and $W_{j+1}$, one has $P_{V_j}u = P_{V_{j+1}}u + P_{W_{j+1}}u$. Expressing the projections by means of the respective orthonormal bases yields
$$P_{V_j}u = P_{V_{j+1}}u + P_{W_{j+1}}u = \sum_{l\in\mathbb Z}c^{j+1}_l\,\phi_{j+1,l} + \sum_{l\in\mathbb Z}d^{j+1}_l\,\psi_{j+1,l}$$
$$= \sum_{l\in\mathbb Z}\sum_{n\in\mathbb Z}c^{j+1}_l\,h_n\,\phi_{j,n+2l} + \sum_{l\in\mathbb Z}\sum_{n\in\mathbb Z}d^{j+1}_l\,(-1)^n\,h_{1-n}\,\phi_{j,n+2l}$$
$$= \sum_{l\in\mathbb Z}\sum_{k\in\mathbb Z}c^{j+1}_l\,h_{k-2l}\,\phi_{j,k} + \sum_{l\in\mathbb Z}\sum_{k\in\mathbb Z}d^{j+1}_l\,(-1)^{k-2l}\,h_{1-(k-2l)}\,\phi_{j,k}.$$
Swapping the sums and comparing the coefficients yields the assertion. □
In order to denote the decomposition and the reconstruction in a concise way, we introduce the following operators:
$$H:\ell^2(\mathbb Z)\to\ell^2(\mathbb Z),\qquad (Hc)_k = \sum_{l\in\mathbb Z}h_l\,c_{2k+l},$$
$$G:\ell^2(\mathbb Z)\to\ell^2(\mathbb Z),\qquad (Gc)_k = \sum_{l\in\mathbb Z}(-1)^l\,h_{1-l}\,c_{2k+l}.$$
Thereby, we obtain
$$(H^*c)_k = \sum_{l\in\mathbb Z}h_{k-2l}\,c_l = \sum_{n\in\mathbb Z}h_{k-n}\,\tilde c_n = (\tilde c*h)_k,$$
where $\tilde c$ denotes the sequence c upsampled by inserting zeros at the odd positions.
Schematically, one decomposition step convolves $c^j$ with $D_{-1}h$ and $D_{-1}g$ and downsamples the results by the factor 2 to obtain $c^{j+1}$ and $d^{j+1}$; the reconstruction step upsamples $c^{j+1}$ and $d^{j+1}$ by the factor 2, convolves with h and g, respectively, and adds the results to recover $c^j$. Here, ↓2 refers to downsampling with the factor 2, i.e., omitting every second value. Analogously, ↑2 denotes upsampling with the factor 2, i.e., the extension of the vector by filling in zeros at every other place.
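One decomposition step is only a convolution followed by downsampling. The following sketch is an illustration with periodic boundary extension (the filter h is passed as the finite sequence h_0, …, h_{L−1}, and the Haar filter serves as example); it implements the operators H and G directly:

```python
import numpy as np

def dwt_step(c, h):
    """One decomposition step of the fast wavelet transform with periodic
    boundary extension.  The wavelet coefficients are g_l = (-1)^l h_{1-l}.
    Returns (coarse, detail), each of half the length of c."""
    c = np.asarray(c, dtype=float)
    N, L = len(c), len(h)
    c_out, d_out = np.zeros(N // 2), np.zeros(N // 2)
    for k in range(N // 2):
        # (H c)_k = sum_l h_l c_{2k+l}
        c_out[k] = sum(h[l] * c[(2 * k + l) % N] for l in range(L))
        # (G c)_k = sum_l (-1)^l h_{1-l} c_{2k+l}, nonzero for l = 2-L, ..., 1
        d_out[k] = sum((-1) ** l * h[1 - l] * c[(2 * k + l) % N]
                       for l in range(2 - L, 2))
    return c_out, d_out

# Haar filter h = (1/sqrt(2), 1/sqrt(2)): averages and differences of pairs.
c1, d1 = dwt_step([4.0, 2.0, 5.0, 5.0], [1 / np.sqrt(2), 1 / np.sqrt(2)])
```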
It remains to discuss how to obtain an initial sequence cJ . Let us assume that the
signal of interest u lies in a certain approximation space VJ , i.e.,
$$u = \sum_{k\in\mathbb Z}c^J_k\,\phi_{J,k},$$
Fig. 4.14 One-dimensional wavelet transform of a signal with the wavelet sym4 (cf. Exam-
ple 4.68). Bottom row: Signal of interest u : [0, 1] → R, sampled with T = 2−8 corresponding
to approximately 2−4 c−8 . Upper graphs: Wavelet and approximation coefficients, respectively.
Note that the jumps and singularities of the signal evoke large coefficients on the fine scales. In
contrast, the smooth parts of the signal can be represented nearly exclusively by the approximation
coefficients
For a finite signal u, the sampling results in a finite sequence. Since the discrete
wavelet transform consists of convolutions, again the problem arises that the
sequences have to be evaluated at undefined points. Also in this case, boundary
extension strategies are of help. The simplest method is given by the periodic
boundary extension: the convolutions are replaced by periodic convolutions. This
corresponds to the periodization of the wavelet functions ψj,k . A drawback of this
method is the fact that the periodization typically induces a jump discontinuity
that leads to unnaturally large wavelet coefficients at the boundary. Other methods
such as symmetric extension or zero extension are somewhat more complex to
implement; cf. [97], for instance.
The numerical cost of a decomposition or reconstruction step is proportional to
the length of the filter h and the length of the signal. For a finite sequence c0 , the
length of the sequence is cut by half due to the downsampling in every decompo-
sition step (up to boundary extension effects). Since the boundary extension effects
are of the magnitude of the filter length, the total complexity of the decomposition
of a signal of length N = 2M into M levels with a filter h of length n is given by
O(nN). The same complexity holds for the reconstruction. For short filters h, this
cost is even lower than for the fast Fourier transform.
$$V_j^2 = V_j\otimes V_j\subset L^2(\mathbb R^2),$$
and the functions $\phi_{j,k_1}\otimes\phi_{j,k_2}$, $k\in\mathbb Z^2$, form an orthonormal basis of $V_j^2$. This construction is also called a tensor product of separable Hilbert spaces; cf. [141], for instance.
In the two-dimensional case, the wavelet spaces, i.e., the orthogonal comple-
ments of Vj2 in Vj2−1 , exhibit a little more structure. We define the wavelet space
Wj2 by
(where the superscripted number two in one case denotes a tensor product and in the
other just represents a name). On the other hand, Vj −1 = Vj ⊕ Wj , and we obtain
can be expressed as
$$\{\psi^m_{j,k}\mid m = 1,2,3,\ k\in\mathbb Z^2,\ j\in\mathbb Z\}$$
the horizontal details on the scale j (i.e., the details in the x1 -direction), the spaces
Sj2 the vertical details (in the x2 -direction) and the spaces Dj2 the diagonal details;
cf. Fig. 4.15.
Fig. 4.15 Two-dimensional wavelet transform of an image by means of the Haar wavelet. The
image itself is interpreted as the finest wavelet representation in the space V0 . Based on this, the
components in the coarser approximation and detail spaces are computed
Schematically, one decomposition step of the two-dimensional transform first convolves the rows of $c^j$ with $D_{-1}h$ and $D_{-1}g$ and downsamples by 2, and then does the same with the columns of the two results; this yields the coarse part $c^{j+1}$ and the three detail parts $d^{1,j+1}$, $d^{2,j+1}$, $d^{3,j+1}$.
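Since the two-dimensional transform is separable, one level can be built from the one-dimensional step. The sketch below is an illustration reusing dwt_step from the one-dimensional sketch above; which of the three detail parts is called horizontal, vertical, or diagonal is a matter of convention.

```python
import numpy as np

def dwt2_step(c, h):
    """One level of the separable 2D wavelet transform: apply dwt_step along the
    rows, then along the columns of both results.  Returns the coarse part and
    the three detail parts."""
    rows_low, rows_high = zip(*(dwt_step(row, h) for row in c))
    rows_low, rows_high = np.array(rows_low), np.array(rows_high)
    ll, lh = zip(*(dwt_step(col, h) for col in rows_low.T))
    hl, hh = zip(*(dwt_step(col, h) for col in rows_high.T))
    return np.array(ll).T, np.array(lh).T, np.array(hl).T, np.array(hh).T
```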
Application 4.73 (JPEG2000) The DCT being a foundation of the JPEG stan-
dard, the discrete wavelet transform constitutes a foundation of the JPEG2000
standard. Apart from numerous further differences between JPEG and JPEG2000,
using the wavelet transform instead of the blockwise DCT is the most profound
distinction. This procedure has several advantages:
• Higher compression rates, but preserving the same subjective visual quality.
• More “pleasant” artifacts.
• Stepwise image buildup through stepwise decoding of the scales (of advantage
in transferring with a low data rate).
Figure 4.16 shows the compression potential of the discrete wavelet transform
(compare to the DCT case in Fig. 4.10). For a detailed description of JPEG2000,
we refer to [109] again.
Fig. 4.16 Illustration of the compression potential of the two-dimensional wavelet transform.
From upper left to lower right: Original image, reconstruction based on 10%, 5% and 2% of the
wavelet coefficients, respectively
in order to define
$$\gamma_{a,\theta,b}(x) = a^{\frac{3}{2}}\,\gamma(S_aR_\theta x - b).$$
The functions $\gamma_{a,\theta,b}$ hence result from γ through translation, rotation, and parabolic scaling. The continuous curvelet transform is then given by
$$\Gamma u(a,\theta,b) = \int_{\mathbb R^2}u(x)\,\gamma_{a,\theta,b}(x)\,dx;$$
cf. [96] as well. It is remarkable about the construction of the curvelets that they
allow a discretization that nearly results in an orthonormal basis, cf. [27]. Apart from
that, curvelets are in a certain sense nearly optimally suited to represent functions
that are piecewise twice continuously differentiable and whose discontinuities occur
on sets that can be parameterized twice continuously differentiably. A curvelet
decomposition and reconstruction can be implemented efficiently, cf. [28]. In
comparison to Fourier or wavelet decompositions, however, it is still quite involved.
Another ansatz is given by the so-called shearlets; cf. [90], for instance. These
functions are also based on translations and parabolic scalings. In contrast to
curvelets, however, shearings are used instead of rotations. Specifically, for $a,s>0$, we set
$$M_{a,s} = \begin{pmatrix}1 & s\\ 0 & 1\end{pmatrix}\begin{pmatrix}a & 0\\ 0 & \sqrt a\end{pmatrix} = \begin{pmatrix}a & s\sqrt a\\ 0 & \sqrt a\end{pmatrix},$$
$$\psi_{a,s,b}(x) = a^{-\frac{3}{4}}\,\psi\big(M_{a,s}^{-1}(x-b)\big),$$
Also, the shearlet transform allows a systematic discretization and can be imple-
mented by efficient algorithms.
Apart from curvelets and shearlets, there are numerous other procedures to
decompose images into elementary components that reflect the structure of the
image as well as possible: Ridgelets, edgelets, bandlets, brushlets, beamlets, or
platelets are just a few of these approaches. In view of the abundance of “-lets,”
some people also speak of ∗-lets (read “starlets”).
4.6 Exercises
$$M_\xi T_y = e^{i\xi\cdot y}\,T_yM_\xi,\qquad M_\xi D_A = D_AM_{A^T\xi}.$$
$$f(x) = \frac{1}{1+x^2}.$$
$$f_a(x) = \frac{1}{a\pi}\,\frac{1}{1+\frac{x^2}{a^2}},$$
$$f_a*f_b = f_{a+b}.$$
$$g_a(x) = \frac{1}{(4\pi a)^{d/2}}\,e^{-\frac{|x|^2}{4a}}$$
$$g_a*g_b = g_{a+b}.$$
Exercise 4.9 (Regarding Equivalent Sobolev Norms, cf. Theorem 4.29) Let $k,d\in\mathbb N$ and define $f,g:\mathbb R^d\to\mathbb R$ by
$$f(\xi) = \sum_{|\alpha|\le k}|\xi^\alpha|^2,\qquad g(\xi) = (1+|\xi|^2)^k.$$
Show that there exist constants $c,C>0$ (which may depend on k and d) such that
$$cf\le g\le Cf.$$
$$\hat y_{2k} = \frac{1}{N}\sum_{n=0}^{N/2-1}\big(y_n+y_{n+N/2}\big)\,e^{-\frac{2\pi ikn}{N/2}},$$
$$\hat y_{2k+1} = \frac{1}{N}\sum_{n=0}^{N/2-1}e^{-\frac{2\pi in}{N}}\big(y_n-y_{n+N/2}\big)\,e^{-\frac{2\pi ikn}{N/2}}.$$
4. Test the algorithm fft2conv with convolution kernels of your choice. Compare
the results and execution times with an implementation of the direct calculation
of the sums according to Sect. 3.3.3 (also in the light of the complexity estimates).
Exercise 4.13 (Overlap-Add Convolution) For a signal $u\in\mathbb C^N$ and a convolution kernel $h\in\mathbb C^M$ that is significantly shorter than the signal, the convolution of u with h can be computed considerably more efficiently than in Exercise 4.12. For M a factor of N, we partition u into N/M blocks of length M:
$$u_n = \sum_{r=0}^{N/M-1}u^r_{n-rM}\qquad\text{with}\qquad u^r_n = \begin{cases}u_{rM+n} & \text{if } 0\le n\le M-1,\\ 0 & \text{otherwise,}\end{cases}$$
i.e., the blocks are $u^0 = (u_0,\dots,u_{M-1})$, $u^1 = (u_M,\dots,u_{2M-1})$, and so on. With $v^r$ denoting the contribution of the r-th block to the convolution (obtained by convolving $u^r$ with h and shifting appropriately), one has
$$u*h = \sum_{r=0}^{N/M-1}v^r.$$
Note that $v^r$ and $v^{r+1}$ overlap. This procedure is also called the overlap-add method.
1. Show that the complexity of the overlap-add convolution is given by
O(N log M).
2. Develop, implement, and document an algorithm fftconv_oa that computes
the convolution of u and v on the whole support by means of the overlap-add
method. Input: Two vectors u ∈ CN and h ∈ CM where M is a factor of N.
Output: The result w ∈ CN+M−1 of the convolution of u and v.
3. For suitable test examples, compare the results and execution times of your
algorithm with those of the algorithm fftconv in Exercise 4.12.
Exercise 4.14 (Scaled Bases of a Multiscale Analysis) Let $(V_j)$ be a multiscale analysis with generator φ. Show that the set $\{\phi_{j,k}\mid k\in\mathbb Z\}$ forms an orthonormal basis of $V_j$.
Exercise 4.15 (Multiscale Analysis of Bandlimited Functions) Let
$$V_j = \{u\in L^2(\mathbb R)\mid \operatorname{supp}\hat u\subset[-2^{-j}\pi,2^{-j}\pi]\}.$$
1. Show that (Vj ) together with the generator φ(x) = sinc(x) forms a multiscale
analysis of L2 (R).
2. Determine the coefficient sequence (hk ) with which φ satisfies the scaling
equation (4.4) and calculate the corresponding wavelet.
Exercise 4.16 (Properties of ψ in Theorem 4.67) Let φ : R → R be the generator
of a multiscale analysis and let ψ be defined as in Theorem 4.67.
1. Show that the coefficients $(h_k)$ of the scaling equation are real and satisfy the following condition: For all $l\in\mathbb Z$,
$$\sum_{k\in\mathbb Z}h_kh_{k+2l} = \begin{cases}1 & \text{if } l = 0,\\ 0 & \text{if } l\ne 0.\end{cases}$$
(Use the fact that the function φ is orthogonal to the functions $\phi(\,\cdot\,+m)$ for $m\in\mathbb Z$, $m\ne 0$.)
2. Show:
(a) For all $l\in\mathbb Z$, $(\psi,\psi(\,\cdot\,-l)) = \begin{cases}1 & \text{if } l = 0,\\ 0 & \text{if } l\ne 0.\end{cases}$
(b) For all $l\in\mathbb Z$, $(\phi,\psi(\,\cdot\,-l)) = 0$.
Chapter 5
Partial Differential Equations in Image Processing
Our first encounter with a partial differential equation in this book was Application 3.23 on edge detection according to Canny: we obtained a smoothed image by
solving the heat equation. The underlying idea was that images contain information
on different spatial scales and one should not fix one scale a priori. The perception of
an image depends crucially on the resolution of the image. If you consider a satellite
image, you may note the shape of coastlines or mountains. For an aerial photograph
taken from a plane, these features are replaced by structures on smaller scales such
as woods, settlements, or roads. We see that there is no notion of an absolute scale
and that the scale depends on the aims of the analysis. Hence, we ask ourselves
whether there is a mathematical model of this concept of scale. Our aim is to develop
a scale-independent representation of an image. This aim is the motivation behind
the notion of scale space and the multiscale description of images [88, 98, 144].
The notion “scale space” does not refer to a vector space or a similar structure,
but to a scale space representation or multiscale representation: for a given image $u_0:\Omega\to\mathbb R$ one defines a function $u:[0,\infty[\times\Omega\to\mathbb R$, and the new positive parameter describes the scale. The scale space representation for scale parameter equal to 0 will be the original image:
u(0, x) = u0 (x).
For larger scale parameters we should obtain images on “coarser scales.” We could also view the introduction of the new scale variable as follows: We consider the original image $u_0$ as an element in a suitable space X (e.g., a space of functions $\Omega\to\mathbb R$). The scale space representation is then a map $u:[0,\infty[\to X$, i.e., a path through the space X. This view is equivalent to the previous one (setting $u(0) = u_0$ and “$u(\sigma)(x) = u(\sigma,x)$”).
An alternative way would be to set $u(\sigma) = u_0\ominus\sigma B$ and $u(\sigma) = u_0\oplus\sigma B$, respectively, to get $u:[0,\infty[\to B(\mathbb R^d)$.
One could easily produce more examples of scale spaces but one should ask, which
of these are meaningful. There is an axiomatic approach to this question from [3],
which, starting from a certain set of axioms, arrives at a restricted set of scale spaces
or multiscale analyses. In this chapter we will take this approach, which will lead
us to partial differential equations. This allows for a characterization of scale spaces
and to build further methods on top of these.
The idea behind an axiomatic approach is to characterize and build methods for
image processing by specifying certain obvious properties that can be postulated. In
the following we will develop a theory, starting from a fairly small set of axioms,
and this theory will show that the corresponding methods indeed correspond to the
solution of certain partial differential equations. This provides the foundation of the
widely developed theory of partial differential equations in image processing. The
starting point for scale space theory is the notion of multiscale analysis according
to [3].
We start with the notion of scale space. Roughly speaking, a scale space is a family
of maps. We define the following spaces of functions:
Cb∞ (Rd ) = {u : Rd → R u ∈ C ∞ (Rd ), ∂ α u bounded for all α ∈ Nd },
BC(Rd ) = {u : Rd → R u ∈ C 0 (Rd ), u bounded}.
Architectural Axioms
For all $u\in X$ and $s,t\ge 0$:
$$T_0(u) = u,\qquad T_s\circ T_t(u) = T_{s+t}(u). \qquad\text{[REC]}$$
The concatenation of two transforms in a scale space should give another trans-
formation of said scale space (associated with the sum of the scale parameters).
This implies that one can obtain Tt (u) from any representation Ts (u) with s < t.
Hence, Ts (u) contains all information to generate the coarser representations
Tt (u) in some sense. Put differently, the amount of information in the images
decreases. Moreover, one can calculate all representations on an equidistant
discrete scale by iterating a single operator Tt /N .
There is small a technical problem: the range of the operators Tt may not be
contained in the respective domains of definition. Hence, the concatenation of Tt
and Ts is not defined in general. For now we resort to saying that [REC] will be
satisfied whenever Ts ◦ Tt (u) is defined. In Lemma 5.8 we will see a more elegant
solution of this problem.
Regularity:
For all $v\in X$ and $t\in[0,1]$ there exists $C(v)>0$ such that $\|T_tv-v\|_\infty\le C(v)\,t$. [REG]
Locality:
Roughly speaking this axiom says that the value Tt u(x) depends only on the
behavior of u in a neighborhood of x if t is small.
Stability
If one image is brighter than another, this will be preserved under the scale space.
If Tt is linear, this is equivalent to Tt u ≥ 0 for u ≥ 0.
Morphological Axioms
The architectural axioms and stability do not say much about what actually happens
to images under the scale space. The morphological axioms describe properties that
are natural from the point of view of image processing.
Gray-level-shift invariance:
This axiom says the one does not have any a priori assumption on the range of
gray values of the image.
Gray-scale invariance: (also contrast invariance; contains [GLSI], but is
stronger)
The map h rescales the gray values but preserves their order. The axiom says
that the scale space will depend only on the shape of the levelsets and not on the
contrast.
Translation invariance:
All points in Rd are treated equally, i.e., the action of the operators Tt does not
depend on the location of objects.
Isometry invariance:
In some sense, the scale space should be invariant with respect to zooming of
the images. Otherwise, it would depend on unknown distance of the object to the
camera.
The notion of scale space and its axiomatic description contains a broad class of the
methods that we already know. We present some of these as examples.
Example 5.2 (Coordinate Transformations) Let $A\in\mathbb R^{d\times d}$ be a matrix. Then the linear coordinate transformation
$$(T_tu)(x) = u\big(\exp(At)\,x\big) \qquad(5.1)$$
is a scale space.
The operators Tt are linear and the scale space satisfies the axioms [REC], [REG],
[LOC], [COMP], [GSI], and [SCALE], but not [TRANS] or [ISO] (in general).
This can be verified by direct computation. We show two examples. For [REC]: Let $s,t\ge 0$. Then for $u_t(x) = (T_tu)(x) = u(\exp(At)x)$, one has
$$(T_sT_tu)(x) = (T_su_t)(x) = u_t\big(\exp(As)x\big) = u\big(\exp(As)\exp(At)x\big) = u\big(\exp(A(s+t))x\big) = (T_{s+t}u)(x).$$
Moreover, exp(A0) = id, and hence (T0 u) = u. For [LOC] we consider the Taylor
expansion for u ∈ X:
The properties of the matrix exponential imply that exp At − id = O(t) and thus
(Tt u − Tt v)(x) = o(t).
In general, one gets semigroups of transformations also by solving ordinary dif-
ferential equations. We consider a vector field v ∈ C ∞ (Rd , Rd ). The corresponding
integral curves j are defined by
$$\partial_t j(t,x) = v\big(j(t,x)\big),\qquad j(0,x) = x.$$
In the special case v(x) = Ax this reduces to the above coordinate transfor-
mation (5.1). Analogously, this class of scale space inherits the listed properties
of (5.1). The action of (5.1) and (5.2) on an image is shown in Fig. 5.1.
$$(T_tu) = \begin{cases}u*\varphi_t & \text{if } t>0,\\ u & \text{if } t = 0,\end{cases} \qquad(5.3)$$
with
$$\varphi(x) = \frac{1}{(4\pi)^{d/2}}\,e^{-\frac{|x|^2}{4}},\qquad \tau(t) = \sqrt t.$$
Hence, the validity of this axiom depends crucially on the kernel and the time
scaling.
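For the Gaussian kernel, the multiscale convolution can be realized in the Fourier domain, since convolution with $\varphi_t$ corresponds to multiplication of the Fourier transform with $e^{-t|\xi|^2}$. The following sketch is only an illustration (periodic boundary conditions, frequencies in pixel units, d = 2), not the book's implementation:

```python
import numpy as np

def gaussian_scale_space(u, t):
    """Multiscale convolution (5.3): multiply the FFT of u with exp(-t*|xi|^2)."""
    if t == 0:
        return u
    xi1 = 2 * np.pi * np.fft.fftfreq(u.shape[0])[:, None]
    xi2 = 2 * np.pi * np.fft.fftfreq(u.shape[1])[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.exp(-t * (xi1**2 + xi2**2))))
```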
[REG]: To check this axiom, we assume further properties of the kernel and the
time scaling:
$$\int_{\mathbb R^d}x\,\varphi(x)\,dx = 0,\qquad \int_{\mathbb R^d}|x|^2\,|\varphi(x)|\,dx<\infty,\qquad \tau(t)\le C\sqrt t. \qquad(5.4)$$
$$A = \begin{pmatrix}-1 & 2\\ 2 & -3\end{pmatrix}$$
Fig. 5.1 Example of a scale space given by a coordinate transformation. The first row shows the
image and the matrix that generates the scale space according to (5.1), as show in the second
row. Similarly, the third row shows the image and the vector field, and the fourth row shows the
applications of Tt according to (5.2)
Fig. 5.2 Illustration of multiscale convolution with a Gaussian kernel on different scales
$$\cdots = C\,\tau(t)^2\le Ct.$$
[LOC]: To check locality, we can use linearity and restrict our attention to the case of u with $\partial^\alpha u(x) = 0$ for all α. Without loss of generality we can also assume $x = 0$. Moreover, we assume that the kernel φ and the time scaling τ satisfy the assumptions in (5.4) and, on top of that, also
$$\int_{|x|>R}|\varphi(x)|\,dx\le CR^{-\alpha},\qquad \alpha>2.$$
Now we estimate:
$$|(T_tu)(0)| = \Big|\int_{\mathbb R^d}u(y)\,\varphi_t(y)\,dy\Big| \le \sup_{|y|\le\varepsilon}|\nabla^2u(y)|\int_{|y|\le\varepsilon}|y|^2\,|\varphi_t(y)|\,dy + \|u\|_\infty\int_{|y|>\varepsilon}|\varphi_t(y)|\,dy$$
$$\le \sup_{|y|\le\varepsilon}|\nabla^2u(y)|\,\tau(t)^2\int_{\mathbb R^d}|x|^2\,|\varphi(x)|\,dx + C\,\|u\|_\infty\int_{|x|\ge\varepsilon/\tau(t)}|\varphi(x)|\,dx \le C\Big(\sup_{|y|\le\varepsilon}|\nabla^2u(y)|\,t + \varepsilon^{-\alpha}t^{\alpha/2}\Big).$$
Now let δ > 0 be given. Since $\nabla^2u(0) = 0$ and $\nabla^2u$ is continuous, we see that $\sup_{|y|\le\varepsilon}|\nabla^2u(y)|\to 0$ for ε → 0. Hence, we can choose ε > 0 small enough to ensure that
$$C\sup_{|y|\le\varepsilon}|\nabla^2u(y)|\,t\le\frac{\delta}{2}\,t.$$
Moreover, for t > 0 small enough, we also have
$$C\varepsilon^{-\alpha}t^{\alpha/2}\le\frac{\delta}{2}\,t.$$
Putting things together, we see that for every δ > 0, we have (if t is small enough)
$$|(T_tu)(0)|\le\delta t,$$
and this means that $|(T_tu)(0)| = o(t)$, which was our aim.
[COMP]: Again we can use the linearity of $T_t$ to show that this axiom is satisfied: it is enough to show $T_tu\ge 0$ for all $u\ge 0$. This is fulfilled if $\varphi\ge 0$ almost everywhere, since then,
$$(T_tu)(x) = \int_{\mathbb R^d}\underbrace{u(y)}_{\ge 0}\,\underbrace{\varphi_t(x-y)}_{\ge 0\text{ a.e.}}\,dy\ge 0.$$
Hence, isometry invariance holds for kernels that are rotationally invariant.
[SCALE]: For λ ≥ 0 we can write
$$\big(T_t(D_{\lambda\,\mathrm{id}}u)\big)(x) = \int_{\mathbb R^d}u(\lambda y)\,\frac{1}{\tau(t)^d}\,\varphi\Big(\frac{x-y}{\tau(t)}\Big)\,dy = \int_{\mathbb R^d}u(z)\,\frac{1}{\lambda^d\tau(t)^d}\,\varphi\Big(\frac{\lambda x-z}{\lambda\tau(t)}\Big)\,dz = \big(D_{\lambda\,\mathrm{id}}(T_{t'}u)\big)(x),$$
where $t'$ is chosen such that $\tau(t') = \lambda\tau(t)$.
$$(T_tu) = u\oplus tB. \qquad(5.5)$$
Similarly one could define a multiscale erosion related to B; see Fig. 5.3. We restrict
our attention to the case of dilation.
In contrast to Examples 5.2 and 5.3 above, the scale space in (5.5) is in general
nonlinear. We discuss some axioms in greater detail:
[REC]: The axiom [REC] is satisfied if B is convex. You should check this in
Exercise 5.4.
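On a pixel grid, the multiscale dilation can be computed directly from the definition. The sketch below is only an illustration: the structure element B is taken to be a discrete disk, t is rounded to whole pixels, and boundary values are replicated.

```python
import numpy as np

def dilation(u, t):
    """Multiscale dilation (5.5) on a grid: (u (+) tB)(x) = sup_{y in tB} u(x + y),
    with B a discrete disk of radius t (in pixels)."""
    t = int(round(t))
    if t == 0:
        return u.copy()
    H, W = u.shape
    out = np.full_like(u, -np.inf, dtype=float)
    offsets = [(dy, dx) for dy in range(-t, t + 1) for dx in range(-t, t + 1)
               if dy * dy + dx * dx <= t * t]          # discrete disk tB
    padded = np.pad(u.astype(float), t, mode='edge')   # replicate boundary values
    for dy, dx in offsets:
        out = np.maximum(out, padded[t + dy:t + dy + H, t + dx:t + dx + W])
    return out
```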
Fig. 5.3 Example of multiscale erosion (second row) and dilation (third row). The structure
element is an octagon centered at the origin
[REG]: We estimate
If we assume again that B is bounded and $\partial^\alpha u(x) = \partial^\alpha v(x)$ for $|\alpha|\le 1$, we get
$$\sup_{y\in tB}\big[u(x+y)-v(x+y)\big] = o(t)\qquad\text{and}\qquad \sup_{y\in tB}\big[v(x+y)-u(x+y)\big] = o(t).$$
Again we deduce that multiscale dilation satisfies the axiom [LOC] if the
structure element is bounded.
[COMP]: We have already seen that the comparison principle is satisfied in
Theorem 3.29 (under the name “monotonicity”).
[TRANS]: Translation invariance has also been shown in Theorem 3.29.
[GSI]: Gray-scale invariance has been shown in Theorem 3.31 under the name
“contrast invariance.”
[ISO]: Isometry invariance is satisfied if the structure element is invariant under
rotations.
[SCALE]: Scale invariance can be seen as follows:
Example 5.5 (Fourier Soft Thresholding) A somewhat unusual scale space is given by the following construction. Let $S_t$ be the operator that applies the complex soft thresholding function
$$S_t(x) = \begin{cases}\frac{x}{|x|}\,(|x|-t) & \text{if } |x|>t,\\ 0 & \text{if } |x|\le t,\end{cases}$$
pointwise, i.e., $\big(S_t(u)\big)(x) = S_t\big(u(x)\big)$. We apply soft thresholding to the Fourier transform, i.e.,
$$T_t(u) = \mathcal F^{-1}\big(S_t(\mathcal Fu)\big). \qquad(5.6)$$
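The scale space (5.6) is straightforward to implement with the FFT. The following sketch is only an illustration for two-dimensional images; in the discrete setting the threshold t refers to the unnormalized FFT coefficients, so its numerical meaning differs from the continuous formula.

```python
import numpy as np

def soft_threshold(x, t):
    """Complex soft thresholding S_t(x) = x/|x| * (|x| - t) for |x| > t, else 0."""
    mag = np.abs(x)
    return np.where(mag > t, x / np.maximum(mag, 1e-30) * (mag - t), 0.0)

def fourier_soft_threshold(u, t):
    """Scale space (5.6): T_t(u) = F^{-1}( S_t( F u ) ), here with the 2D FFT."""
    return np.real(np.fft.ifft2(soft_threshold(np.fft.fft2(u), t)))
```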
Fig. 5.4 Illustration of Fourier soft thresholding from Example 5.5 and different scales t
If we assume that there exists a unique solution of (5.7) for every initial value u0 ∈
Cb∞ (Rd ), then we can define
Tt u0 = u(t, ·),
Fig. 5.5 Denoising by Fourier and wavelet soft thresholding from Example 5.5. The parameter t
has been chosen to maximize the PSNR
$$\|T_tu - T_tv\|_\infty\le\|v-u\|_\infty$$
holds. Swapping the roles of u and v, we obtain the claimed property [CONT]. □
As we have seen in Example 5.4, the scale space for dilation satisfies the
axioms [COMP] and [GLSI]. So we have shown again that [CONT] holds in this
case, a result we already derived in Exercise 3.10.
The next lemma allows us to extend a scale space from Cb∞ (Rd ) to a larger space,
namely to the space
$$\mathrm{BUC}(\mathbb{R}^d) = \{u : \mathbb{R}^d \to \mathbb{R} \mid u \text{ bounded and uniformly continuous}\}.$$
Lemma 5.8 If [CONT] and [TRANS] hold for (Tt ), one can extend every Tt
uniquely to a mapping
Tt : BUC(Rd ) → BUC(Rd ).
Proof By Lipschitz continuity, [CONT], and the density of $C_b^\infty(\mathbb{R}^d)$ in the space $\mathrm{BUC}(\mathbb{R}^d)$, we can extend $T_t$ uniquely to a map $T_t : \mathrm{BUC}(\mathbb{R}^d) \to \mathrm{BC}(\mathbb{R}^d)$. It remains to show the uniform continuity of $T_t u$ for $u \in \mathrm{BUC}(\mathbb{R}^d)$.
We choose for arbitrary ε > 0 a δ > 0 such that for all x ∈ Rd and |h| < δ, one
has |u(x) − u(x + h)| < ε. With v = Th u and because of [TRANS] and [CONT]
we get
$$\bigl|(T_t u)(x) - (T_t u)(x+h)\bigr| = \bigl|(T_t u)(x) - (T_t v)(x)\bigr| \le \|u - v\|_\infty = \sup_{x\in\mathbb{R}^d}\bigl|u(x) - u(x+h)\bigr| \le \varepsilon,$$
$$\|\delta_t(u)\|_\infty = \frac{1}{t}\,\bigl\|T_t(0 + 1\cdot u) - T_t(0) - 1\cdot u\bigr\|_\infty \le C(u),$$
where, for h → 0, one has vh ∈ Cb∞ (Rd ). Now one easily sees that all vh are in a
suitable set Q from (5.8). The estimate (5.9) gives the desired Lipschitz inequality
$$\|\delta_t(u)(\,\cdot\, + hy) - \delta_t(u)\|_\infty = \|\delta_t(u + h v_h) - \delta_t(u)\|_\infty \le Ch,$$
(which also contains a detailed argument for the uniform convergence A[un ] →
A[u]). □
Our next step is to note that the operator A can be written as a (degenerate)
elliptic differential operator of second order.
Definition 5.10 Denote by $S^{d\times d}$ the space of symmetric d × d matrices. We write $X - Y \succeq 0$ or $X \succeq Y$ if $X - Y$ is positive semi-definite. A function $f : S^{d\times d} \to \mathbb{R}$ is called elliptic if $f(X) \ge f(Y)$ for $X \succeq Y$. If $f(X) > f(Y)$ for $X \succeq Y$ with $X \ne Y$, then f is called strictly elliptic, and degenerate elliptic otherwise.
Theorem 5.11 Let (Tt )t ≥0 be a scale space that satisfies the axioms [GEN],
[COMP], and [LOC]. Then there exists a continuous function F : Rd × R × Rd ×
S d×d → R such that F (x, c, p, · ) is elliptic for all (x, c, p) ∈ Rd × R × Rd and
$$A[u](x) = F\bigl(x, u(x), \nabla u(x), \nabla^2 u(x)\bigr)$$
w ≥ 0, w = 1 on Bσ (x0 ), (5.10)
and use wε (x) = w((x − x0 )/ε + x0 ) (see Fig. 5.6) to construct the functions
$$\bar u_\varepsilon(x) = w_\varepsilon(x)\, u_\varepsilon(x) + \bigl(1 - w_\varepsilon(x)\bigr) v(x).$$
These functions have the property that ∂ α ūε (x) = ∂ α uε (x) for all α ∈ Nd as well as
ūε (x) ≥ v(x) on the whole Rd . By [COMP] this implies Tt ūε (x0 ) ≥ Tt v(x0 ), and
by the monotonicity of the limit also A[ūε ](x0) ≥ A[v](x0 ). Moreover, A[ūε ](x0 ) =
A[u_ε](x_0) by construction of w and, again, by [LOC]. Letting ε → 0, the continuity of A then gives A[u](x_0) ≥ A[v](x_0). Switching the sign in the previous argument, we also get
A[u](x0) ≤ A[v](x0 ) and hence A[u](x0) = A[v](x0 ). We conclude that
$$A[u](x) = F\bigl(x, u(x), \nabla u(x), \nabla^2 u(x)\bigr),$$
as desired.
It remains to show the continuity of F and that F is elliptic in its last component.
The latter follows from the following consideration: Construct, using w from (5.10),
the functions
$$u(x) = \Bigl(c + p\cdot(x-x_0) + \tfrac12 (x-x_0)^T X (x-x_0)\Bigr) w(x), \qquad v(x) = \Bigl(c + p\cdot(x-x_0) + \tfrac12 (x-x_0)^T Y (x-x_0)\Bigr) w(x),$$
converge uniformly to
$$u(x) = \Bigl(c + p\cdot(x-x_0) + \tfrac12 (x-x_0)^T X (x-x_0)\Bigr) w(x),$$
and all their derivatives converge to the respective derivatives, too. By the conclusion
of Theorem 5.9 we get that A[un ] → A[u] uniformly. This implies
Remark 5.12 The proof also reveals the reason behind the fact that the order of the
differential operator has to be two. The auxiliary function η from the proof is, in a
neighborhood of zero, a polynomial of degree two. Every other positive polynomial
of higher degree would also work, but would imply a dependence on higher
derivatives. However, there is no polynomial of degree one that is strictly positive in a punctured neighborhood of zero. Hence, degree two is the lowest degree for which the argument in the proof works, and hence the order of the differential operator
is two.
If we add further morphological axioms to the setting of Theorem 5.11, we obtain
an even simpler form of F .
Lemma 5.13 Assume that the assumptions in Theorem 5.11 hold.
1. If, additionally, [TRANS] holds, then
Proof The proof is based on the fact that the properties [TRANS] and [GLSI] are
transferred from $T_t$ to A, and you should work out the rest in Exercise 5.5. □
By Theorem 5.11 one may say that for u0 ∈ Cb∞ (Rd ), one has that u(t, x) =
(Tt u0 )(x) solves the Cauchy problem
$$\frac{\partial u}{\partial t}(t, x) = F\bigl(x, u(t,x), \nabla u(t,x), \nabla^2 u(t,x)\bigr), \qquad u(0, x) = u_0(x)$$
in some sense, but only at time t = 0 (and the time derivative is only a one-sided
limit).
Remark 5.14 In the above formula for the Cauchy problem we used the widely adopted convention that for functions with a distinguished "time coordinate" (t in this case), the operators ∇ and ∇² act only on the "spatial variable" x. We will keep using this convention in the following.
To show that the equation is also satisfied for t > 0, we can argue as follows:
By [REC], we should have
$$\frac{\partial u}{\partial t}(t, x) = \lim_{s\to 0^+} \frac{T_{t+s}(u_0)(x) - T_t(u_0)(x)}{s} = \lim_{s\to 0^+} \frac{\bigl(T_s T_t(u_0)\bigr)(x) - T_t(u_0)(x)}{s} = A\bigl[T_t(u_0)\bigr](x) = A\bigl[u(t,\cdot)\bigr](x) = F\bigl(x, u(t,x), \nabla u(t,x), \nabla^2 u(t,x)\bigr).$$
This would imply that u satisfies the differential equation for all times. However,
there is a problem with this argument: $T_t(u_0)$ is not necessarily an element of $C_b^\infty(\mathbb{R}^d)$, and the conclusion $\lim_{s\to 0^+} \frac{1}{s}\bigl(T_s(T_t(u_0)) - T_t(u_0)\bigr) = A\bigl[T_t(u_0)\bigr]$ is not valid.
The lack of regularity is a central problem in the theory of partial differential
equations. An approach that often helps is to introduce a suitable notion of weak
solutions. This means a generalized notion of solution that requires less regularity
than the original equation requires. In the context of scale space theory, the notion
of viscosity solutions is appropriate to define weak solutions with the desired
properties. In the following we give a short introduction to the wide field of viscosity
solutions but do not go into great detail.
The notion of viscosity solution is based on the following important observation:
Theorem 5.15 Let F : [0, ∞[ × Rd × R × Rd × S d×d → R be a continuous
function that is elliptic, i.e., $F(t,x,u,p,X) \ge F(t,x,u,p,Y)$ for $X \succeq Y$. Then $u \in C^2([0,\infty[\,\times \mathbb{R}^d)$ is a solution of the partial differential equation
$$\frac{\partial u}{\partial t}(t, x) = F\bigl(t, x, u(t,x), \nabla u(t,x), \nabla^2 u(t,x)\bigr) \tag{5.11}$$
if and only if the following two conditions hold:
1. For all $\varphi \in C^2([0,\infty[\,\times \mathbb{R}^d)$ and for all local maxima $(t_0, x_0)$ of the function $u - \varphi$,
$$\frac{\partial \varphi}{\partial t}(t_0, x_0) \le F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr).$$
2. For all $\varphi \in C^2([0,\infty[\,\times \mathbb{R}^d)$ and for all local minima $(t_0, x_0)$ of the function $u - \varphi$,
$$\frac{\partial \varphi}{\partial t}(t_0, x_0) \ge F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr).$$
Proof Let ϕ ∈ C 2 ([0, ∞[ × Rd ) and let (t0 , x0 ) be a local maximum of the function
u − ϕ. By the classical necessary conditions for a maximum, we obtain
Hence, by ellipticity,
$$\frac{\partial \varphi}{\partial t}(t_0,x_0) = \frac{\partial u}{\partial t}(t_0,x_0) = F\bigl(t_0, x_0, u(t_0,x_0), \nabla u(t_0,x_0), \nabla^2 u(t_0,x_0)\bigr) \le F\bigl(t_0, x_0, u(t_0,x_0), \nabla u(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr) = F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr).$$
$$\frac{\partial \varphi}{\partial t}(t_0,x_0) \ge F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr),$$
3. u is a viscosity solution if u is both a viscosity sub-solution and viscosity super-
solution.
For the special case that the function F depends only on ∇u and ∇ 2 u, we have
the following helpful lemma:
Lemma 5.17 Let $F : \mathbb{R}^d \times S^{d\times d} \to \mathbb{R}$ be continuous and elliptic, i.e., $F(p, X) \ge F(p, Y)$ for $X \succeq Y$. A function $u \in C([0,\infty[\,\times \mathbb{R}^d)$ is a viscosity sub- or super-
solution, respectively, if for all f ∈ Cb∞ (Rd ) and g ∈ Cb∞ (R), the respective part in
Definition 5.16 holds for ϕ(t, x) = f (x) + g(t).
Proof We show the case of a viscosity sub-solution and assume that Definition 5.16
holds for all ϕ of the form ϕ(t, x) = f (x) + g(t) with f, g as given. Without
loss of generality we assume that (t0 , x0 ) = (0, 0) and consider a function ϕ ∈
C 2 ([0, ∞[ × Rd ) such that u − ϕ has a maximum in (0, 0). Hence, we have to show
that
$$\frac{\partial \varphi}{\partial t}(0, 0) \le F\bigl(\nabla\varphi(0,0), \nabla^2\varphi(0,0)\bigr).$$
We consider the Taylor expansion of ϕ at (0, 0) and get, with $a = \varphi(0,0)$, $b = \frac{\partial\varphi}{\partial t}(0,0)$, $p = \nabla\varphi(0,0)$, $c = \tfrac12 \frac{\partial^2\varphi}{\partial t^2}(0,0)$, $Q = \tfrac12 \nabla^2\varphi(0,0)$, and $q = \frac{\partial}{\partial t}\nabla\varphi(0,0)$, that
$$\varphi(t, x) = a + bt + p\cdot x + ct^2 + x^T Q x + t q\cdot x + o(|x|^2 + t^2).$$
We define, for all ε > 0, the functions f ∈ Cb∞ (Rd ) and g ∈ Cb∞ (R) for small
values of x and t by
$$f(x) = a + p\cdot x + x^T Q x + \varepsilon\Bigl(1 + \tfrac{|q|}{2}\Bigr)|x|^2, \qquad g(t) = bt + \Bigl(\tfrac{|q|}{2\varepsilon} + \varepsilon + c\Bigr)t^2.$$
The boundedness of f and g can be ensured with the help of a suitable cutoff
function w as in (5.10) in the proof of Theorem 5.11. Hence, for small values of
x and t, we have
$$\varphi(t,x) = f(x) + g(t) - \Bigl(\tfrac{\varepsilon|q|}{2}|x|^2 + \tfrac{|q|}{2\varepsilon}t^2 - tq\cdot x\Bigr) - \varepsilon(|x|^2 + t^2) + o(|x|^2 + t^2).$$
Because of
$$\frac{\varepsilon|q|}{2}|x|^2 + \frac{|q|}{2\varepsilon}t^2 - tq\cdot x \;\ge\; \frac{\varepsilon|q|}{2}|x|^2 + \frac{|q|}{2\varepsilon}t^2 - t|q||x| \;=\; \Bigl(\sqrt{\tfrac{|q|\varepsilon}{2}}\,|x| - \sqrt{\tfrac{|q|}{2\varepsilon}}\,t\Bigr)^2 \;\ge\; 0$$
we obtain, for small x and t, that ϕ(t, x) ≤ f (x) + g(t). Hence, in a neighborhood
of (0, 0), we also get u(t, x) − ϕ(t, x) ≥ u(t, x) − f (x) − g(t), and in particular
we see that u − f − g has a local maximum at (0, 0). By assumption we get
$$\frac{\partial (f+g)}{\partial t}(0, 0) \le F\bigl(\nabla(f+g)(0,0), \nabla^2(f+g)(0,0)\bigr).$$
Now we note that
$$\frac{\partial (f+g)}{\partial t}(0,0) = \frac{\partial \varphi}{\partial t}(0,0), \qquad \nabla(f+g)(0,0) = \nabla\varphi(0,0), \qquad \nabla^2(f+g)(0,0) = \nabla^2\varphi(0,0) + 2\varepsilon\Bigl(1 + \tfrac{|q|}{2}\Bigr)\,\mathrm{id},$$
$$\frac{\partial u}{\partial t}(t, x) = F\bigl(\nabla u(t,x), \nabla^2 u(t,x)\bigr)$$
with initial condition u(0, x) = u0 (x).
Proof Theorem 5.11 and Lemma 5.13 ensure that the generator has the stated form.
Now we show that u is a viscosity sub-solution. The proof that u is also a viscosity
super-solution is similar. Let ϕ ∈ C 2 ([0, ∞[×Rd ) be such that (t0 , x0 ) with t0 > 0 is
a local maximum of u−ϕ. Without loss of generality we can assume that u(t0 , x0 ) =
Rearranging gives
$$\frac{g(t_0) - g(t_0 - h)}{h} \le \frac{T_h(f) - f}{h}(x_0).$$
Since $f \in C_b^\infty(\mathbb{R}^d)$ and g are differentiable, we can pass to the limit h → 0 and get, by Theorem 5.11, $g'(t_0) \le F\bigl(\nabla f(x_0), \nabla^2 f(x_0)\bigr)$. Since $\varphi(t, x) = f(x) + g(t)$, we also have
$$\frac{\partial \varphi}{\partial t}(t_0, x_0) \le F\bigl(\nabla\varphi(t_0,x_0), \nabla^2\varphi(t_0,x_0)\bigr).$$
Hence, by definition, u is a viscosity sub-solution. □
We have now completed our structural analysis of scale spaces. We have seen that
scale spaces naturally (i.e., given the specific axioms) lead to functions F : Rd ×
R × Rd × S d×d → R and that these functions characterize the respective scale
space. In this section we will see how the morphological axioms influence F and
discover some differential equations that are important in imaging.
The main result of this chapter will be that among the linear scale spaces there is
essentially only the heat equation.
$$\partial_t u - c\Delta u = 0 \quad\text{in } \mathbb{R}_+ \times \mathbb{R}^d, \qquad u(0, \cdot) = u_0 \quad\text{in } \mathbb{R}^d.$$
Proof By Theorem 5.9 the infinitesimal generator A exists, and by Theorem 5.11
and Lemma 5.13 it has the form
In particular, we have
$$A[D_R u] = D_R A[u],$$
$$F_1(R^T p) = F_1(p), \tag{A}$$
$$F_2(R X R^T) = F_2(X). \tag{B}$$
3. Now (A) implies that F1 (p) = f (|p|). Linearity of F implies F1 is also linear
and thus, F1 (p) ≡ 0. From (B) we deduce that F2 depends only on quantities
that are invariant under similarity transforms. This is the set of eigenvalues of X
with their multiplicities. Since all eigenvalues have the same role, linearity leads
to
$$F_2(X) = c \operatorname{trace} X$$
for some c ∈ R.
4. By [COMP] we get that for $X \succeq Y$, we have $F(p, X) \ge F(p, Y)$. This implies that for $X \succeq Y$, we have
$$c \operatorname{trace} X \ge c \operatorname{trace} Y, \quad\text{i.e.,}\quad c \operatorname{trace}(X - Y) \ge 0.$$
Fig. 5.7 Dislocation of information by the heat equation, illustrated by the movement of edges. Left: original image, middle: image after application of the heat equation, right: level lines of the middle image
Among the linear scale spaces, there is essentially only the heat equation. In
Remark 5.20 we saw that linearity and contrast invariance are mutually exclusive.
In this section we study the consequences of contrast invariance, i.e., of the
axiom [GSI]. This means that we look for scale spaces that are invariant under
contrast changes. In other words, the scale space depends only on the level sets,
i.e., only on the shape of objects and not on the particular gray values, hence the
name “morphological equations.”
Lemma 5.22 Let F : Rd \ {0} × S d×d → R be continuous. A scale space
∂t u = F (∇u, ∇ 2 u)
satisfies [GSI] if and only if F satisfies the following invariance: for all $p \ne 0$, $X \in S^{d\times d}$, and all $\lambda \in \mathbb{R}$, $\mu \ge 0$, one has (with $p \otimes p = pp^T$) that
∂t u = F (∇u, ∇ 2 u),
(I)
u(0) = u0 ,
$$\nabla(h\circ u) = h'(u)\,\nabla u, \qquad \nabla^2(h\circ u) = h'(u)\,\nabla^2 u + h''(u)\,\nabla u \otimes \nabla u, \qquad \partial_t v = h'(u)\,\partial_t u = h'(u)\,F(\nabla u, \nabla^2 u).$$
By (∗) we get, again using the chain rule (similarly to the previous part), that
In particular, we get from Lemma 5.22, that F (p, · ) does not depend on the entry
Xd,d . It remains to show that the entries Xd,i with 1 ≤ i ≤ d − 1 also do not play a
role. To that end, we define $M = X_{d,1}^2 + \cdots + X_{d,d-1}^2$ and
$$I_\varepsilon = \begin{pmatrix} \varepsilon & & & \\ & \ddots & & \\ & & \varepsilon & \\ & & & \frac{M}{\varepsilon} \end{pmatrix}.$$
$$Q_p X Q_p \preceq X + I_\varepsilon, \qquad X \preceq Q_p X Q_p + I_\varepsilon.$$
Axiom [COMP] implies ellipticity of F . Since F does not depend on the entry Xd,d ,
we get, on the one hand, that
Proof First we note that we have seen in step 2 of the proof of Theorem 5.19
that [ISO] implies that for every isometry R ∈ Rd×d ,
Hence, the function F can depend only on the eigenvalues of Qp XQp on the
space orthogonal to p. Since p is an eigenvector of Qp for the eigenvalue zero,
there is a function G such that
2. Now let R denote any isometry and set q = R T p. Then also Rq = p and
|p| = |q|. We calculate
$$R Q_q = R\Bigl(\mathrm{id} - \frac{q q^T}{|q|^2}\Bigr) = R - \frac{p q^T}{|q|^2} = \Bigl(\mathrm{id} - \frac{p p^T}{|p|^2}\Bigr) R = Q_p R,$$
We consider the unit ball B1 (0) ⊂ Rd as a structure element and the corresponding
scale spaces for erosion and dilation:
$$E_t u_0 = u_0 \ominus tB, \qquad D_t u_0 = u_0 \oplus tB.$$
By Theorem 5.15 we immediately get that these differential equations are solved
by erosion and dilation, respectively, in the viscosity sense. Since erosion and
dilation produce non-differentiable functions in general, one cannot define classical
solutions here, and we see how helpful the notion of viscosity solutions is.
Remark 5.25 (Interpretation as a Transport Equation) To understand the equations
in a qualitative way, we interpret them as transport equations. We write the
infinitesimal generator for the dilation as
$$|\nabla u| = \frac{\nabla u}{|\nabla u|}\cdot\nabla u.$$
We see that the invariance from Lemma 5.22 is satisfied. The differential operator
related to F is
$$F(\nabla u, \nabla^2 u) = \operatorname{trace}\Bigl(\bigl(\mathrm{id} - \tfrac{\nabla u\otimes\nabla u}{|\nabla u|^2}\bigr)\nabla^2 u\Bigr).$$
To better understand this quantity we use ∇u⊗∇u = ∇u∇uT , linearity of the trace,
and the formula trace(A B C) = trace(C A B) (invariance under cyclic shifts, if the
dimensions fit):
$$F(\nabla u, \nabla^2 u) = \Delta u - \frac{1}{|\nabla u|^2}\operatorname{trace}(\nabla u\,\nabla u^T\,\nabla^2 u) = \Delta u - \frac{1}{|\nabla u|^2}\operatorname{trace}(\nabla u^T\,\nabla^2 u\,\nabla u) = \Delta u - \frac{1}{|\nabla u|^2}\,\nabla u^T\,\nabla^2 u\,\nabla u.$$
We set $\eta = \frac{\nabla u}{|\nabla u|}$ and denote by $\partial_\eta u$ the first derivative of u in the direction η and by $\partial_{\eta\eta} u$ the second derivative, i.e.,
$$\partial_\eta u = |\nabla u|, \qquad \partial_{\eta\eta} u = \eta^T \nabla^2 u\, \eta.$$
Hence,
$$F(\nabla u, \nabla^2 u) = \Delta u - \partial_{\eta\eta} u.$$
We recall that the Laplace operator is rotationally invariant and note that $\Delta u - \partial_{\eta\eta}u$ is the sum of the second derivatives of u in the d − 1 directions perpendicular to the gradient. Since the gradient is normal to the level sets of u, we see that $\Delta u - \partial_{\eta\eta}u$ is the projection of the Laplace operator onto the space tangential to the level sets. Hence, the differential equation
$$\partial_t u = \operatorname{trace}(Q_{\nabla u}\,\nabla^2 u) = \Delta u - \partial_{\eta\eta} u$$
is also called the heat equation on the tangent space or the morphological equivalent
of the heat equation.
The case d = 2 is even easier, since the space orthogonal to η is one-dimensional.
We define
$$\xi = \eta^\perp = \begin{pmatrix} -\eta_2 \\ \eta_1 \end{pmatrix}$$
and obtain
$$\partial_\xi u = 0, \qquad \partial_{\xi\xi} u = \xi^T \nabla^2 u\, \xi.$$
Hence, $\operatorname{trace}(Q_{\nabla u}\,\nabla^2 u) = \partial_{\xi\xi} u.$
In Exercise 5.9 you will show that $\partial_{\xi\xi}u$ is related to the curvature κ of the level set; more precisely,
$$\partial_{\xi\xi} u = |\nabla u|\,\kappa.$$
We summarize: the differential equation of the generator (5.13) has the form
∂t u = |∇u|κ.
Equivalently,
$$\partial_t u = \kappa\,\frac{\nabla u}{|\nabla u|}\cdot\nabla u,$$
and we see that the initial value (as in the case of dilation and erosion) is shifted
in the direction of the negative gradient. In this case, the velocity is proportional to
the curvature of the level sets, and the resulting scale space is also called curvature motion. As a result, the level sets are "straightened"; this explains the alternative
name curve shortening flow. One can even show that the boundaries of convex
and compact sets shrink to a point in finite time and that they look like a circle,
asymptotically; see, for example, [79]. The action of the curvature motion on an
image is shown in Fig. 5.8.
In higher dimensions, similar claims are true: here trace(Q∇u ∇ 2 u) = |∇u|κ
with the mean curvature κ; this motivates the name mean curvature motion. For the
definition and further properties of the mean curvature we refer to [12].
The heat equation satisfies the most axioms among the linear methods, but it is
not well suited to denoise images. Its main drawback is the heavy smoothing of
edges, see Remark 5.21. In this section we will treat denoising methods that are
variations of the heat equation. We build the modifications of the heat equation on
its interpretation as a diffusion process: Let u describe some quantity (e.g., the temperature of a metal or the concentration of salt in a liquid) in d dimensions. A concentration gradient induces a flow j from high to low concentration (Fick's law):
$$j = -A\nabla u.$$
The matrix A is called a diffusion tensor. If A is a multiple of the identity, one speaks
of isotropic diffusion, if not, of anisotropic diffusion. The diffusion tensor controls
how fast and in which direction the flow goes.
Moreover, we assume that the overall concentration remains constant, i.e., that
no quantity appears or vanishes. Mathematically we describe this as follows. For
some volume V we consider the change of the overall concentration in this volume:
$$\int_V \partial_t u(x)\,\mathrm{d}x.$$
If the total concentration stays the same, this change has to be equal to the flow across the boundary of V by the flow j, i.e.,
$$\int_V \partial_t u(x)\,\mathrm{d}x = \int_{\partial V} j\cdot(-\nu)\,\mathrm{d}\mathcal{H}^{d-1}.$$
Interchanging integration and differentiation on the left-hand side and using the
divergence theorem (Theorem 2.81) on the right-hand side, we obtain
$$\int_V \partial_t u(x)\,\mathrm{d}x = -\int_V (\operatorname{div} j)(x)\,\mathrm{d}x.$$
Since this holds for all volumes, we get that the integrands are equal at almost every point:
$$\partial_t u = -\operatorname{div} j.$$
Plugging Fick's law into this continuity equation, we obtain the following equation for u:
$$\partial_t u = \operatorname{div}(A\nabla u).$$
The idea of Perona and Malik in [110] was to slow down diffusion at edges. As we
have seen in Application 3.23, edges can be described as points where the gradient
has a large magnitude. Hence, we steer the diffusion tensor A in such a way that it
slows down diffusion where gradients are large. Since we don’t have any reason for
anisotropic diffusion yet, we set
A = g(|∇u|) id
Fig. 5.9 Functions g from (5.14) in the diffusion coefficient of the Perona-Malik equation
with some function g : [0, ∞[ → [0, ∞[ that is close to one for small arguments
and monotonically decreasing to zero. Consequently, diffusion acts as it does in
the heat equation at places with small gradient, and diffusion is slower at places
with large gradient. Two widely used examples of such functions, depending on a
parameter λ > 0, are
$$g_1(s) = \frac{1}{1 + \frac{s^2}{\lambda^2}}, \qquad g_2(s) = e^{-\frac{s^2}{2\lambda^2}}, \tag{5.14}$$
and the Perona-Malik equation reads
$$\partial_t u = \operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr), \qquad u(0, x) = u_0(x). \tag{5.15}$$
Figure 5.10 illustrates that the Perona-Malik equation indeed has the desired effect.
We begin our analysis of the Perona-Malik equation with the following observa-
tion:
Lemma 5.26 We have
$$\operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr) = \frac{g'(|\nabla u|)}{|\nabla u|}\,\nabla u^T \nabla^2 u\,\nabla u + g(|\nabla u|)\,\Delta u,$$
and thus the Perona-Malik equation has the following infinitesimal generator:
$$F(p, X) = \frac{g'(|p|)}{|p|}\,p^T X p + g(|p|)\operatorname{trace} X.$$
Fig. 5.10 Results for the Perona-Malik equation. Top left: original image. Second row: Function
g1 with λ = 0.02. Third row: function g1 with λ = 0.005. Fourth row: Function g2 with λ = 0.02.
The images show the results at times t = 5, 50, 500, respectively
and orthogonal to η. For that reason some authors call the Perona-Malik equation
“anisotropic diffusion.”
The function F with g1 and g2 from (5.14) is not elliptic. This leads to a problem,
since our current results do not imply that the Perona-Malik equation has any
solution, even in the viscosity sense. In fact, one can even show that the equation
does not possess any solution for certain initial values [83, 86]. Roughly speaking,
the reason for this is that one cannot expect or show that the gradients remain
bounded, and hence the diffusion coefficient cannot be bounded away from zero.
Some experience with parabolic equations hints that this should lead to problems
with existence of solutions.
We leave the problem aside for now and assume that the Perona-Malik equation
has solutions, at least for small time.
Theorem 5.28 The Perona-Malik method satisfies the axioms [REC], [GLSI],
[TRANS], [ISO]. The axiom [SCALE] holds if g is a monomial.
Proof Recursivity [REC] holds, since the operators Tt are solution operators of a
differential equation, and hence possess the needed semi-group property.
For gray value shift invariance [GLSI] we note that if u solves the differential
equation (5.15), then u + c solves the same differential equation with initial value
u0 (x) + c. This implies Tt (u0 + c) = Tt (u0 ) + c as desired. (In other words, the
differential operator is invariant with respect to linear gray level shifts.)
Translation invariance [TRANS] and isometry invariance [ISO] can be seen
similarly (the differential operator is invariant with respect to translations and
rotations).
If g is a monomial, i.e., g(s) = s p , then v(t, x) = u(t, λx) satisfies
as claimed. □
Since the map F is not elliptic in general, we cannot use the theory of viscosity
solutions in this case. However, we can consider the equation on some domain Ω ⊂
Rd (typically on a rectangle in R2 ) and add boundary conditions, which leads us to a
so-called boundary initial value problem. (One can also consider viscosity solutions
for differential equations on domains; however, the formulation of boundary values
is intricate.) For the boundary initial value problem for the Perona-Malik equation
one can show that a maximum principle is satisfied:
$$\begin{aligned} \partial_t u &= \operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr) && \text{in } [0,\infty[\,\times\Omega,\\ \partial_\nu u &= 0 && \text{on } [0,\infty[\,\times\partial\Omega,\\ u(0,x) &= u_0(x) && \text{for } x\in\Omega. \end{aligned}$$
and moreover,
$$\int_\Omega u(t, x)\,\mathrm{d}x = \int_\Omega u_0(x)\,\mathrm{d}x.$$
Proof For p < ∞ we set $h(t) = \int_\Omega |u(t,x)|^p\,\mathrm{d}x$ and differentiate h:
$$h'(t) = \frac{\mathrm{d}}{\mathrm{d}t}\int_\Omega |u(t,x)|^p\,\mathrm{d}x = \int_\Omega p|u(t,x)|^{p-2}u(t,x)\,\partial_t u(t,x)\,\mathrm{d}x = \int_\Omega p|u(t,x)|^{p-2}u(t,x)\operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr)(t,x)\,\mathrm{d}x.$$
The boundary integral vanishes due to the boundary condition, and the remaining integral is nonnegative. Hence, $h'(t) \le 0$, i.e., h is decreasing, which proves the claim for $p < \infty$; moreover, $h(t)^{1/p} = \|u(t,\cdot)\|_p$ is decreasing for all $p \in [2,\infty[$. The case $p = \infty$ now follows by letting $p \to \infty$.
For the second claim we argue similarly and differentiate the function $\mu(t) = \int_\Omega u(t,x)\,\mathrm{d}x$, again using the divergence theorem to get
$$\mu'(t) = \int_\Omega \partial_t u(t,x)\,\mathrm{d}x = \int_\Omega 1\cdot\operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr)(t,x)\,\mathrm{d}x = \int_{\partial\Omega} g(|\nabla u(t,x)|)\,\partial_\nu u(t,x)\,\mathrm{d}\mathcal{H}^{d-1} - \int_\Omega (\nabla 1)\cdot g(|\nabla u(t,x)|)\nabla u(t,x)\,\mathrm{d}x.$$
$$f_1(s) = \frac{s}{1 + \frac{s^2}{\lambda^2}}, \qquad f_2(s) = s\, e^{-\frac{s^2}{2\lambda^2}}; \tag{5.16}$$
Fig. 5.11 Flux functions f₁ and f₂ from (5.16) corresponding to the functions g₁ and g₂ from (5.14)
lines we see that the coefficient g(|∇u|) is small for large gradient, leading to slow
diffusion. We may conjecture three things:
• The Perona-Malik equation has the ability to make edges steeper (i.e., sharper).
• The Perona-Malik equation is unstable, since it has parts that behave like
backward diffusion.
• At steep edges, noise reduction may not be good.
We try to get some rigorous results and ask what happens to solutions of the Perona-Malik equation at edges. To answer this question we follow [85] and consider only the one-dimensional equation. We denote by u′ the derivative with respect to x and consider
Hence, the one-dimensional equation behaves like the equation in higher dimensions
in the direction perpendicular to the level lines, and we can use results for the one-
dimensional equation to deduce similar properties for the case in higher dimensions.
To analyze what happens at edges under the Perona-Malik equation, we define an
edge as follows:
Definition 5.30 We say that $u : \mathbb{R} \to \mathbb{R}$ has an edge at $x_0$ if
1. $|u'(x_0)| > 0$;
2. $u''(x_0) = 0$;
3. $u'''(x_0)\,u'(x_0) < 0$.
The first condition says that the image is not flat, while the second and the third
conditions guarantee a certain kind of inflection point; see Fig. 5.12.
Since in one dimension we have
Fig. 5.12 Left: inflection point of the first kind (edge at x₀), right: inflection point of the second kind (no edge at x₀)
In the following we will use the compact notation uη = ∂η u, uηη = ∂ηη u, etc. The
next theorem shows what happens locally at an edge.
Theorem 5.31 Let u be a five times continuously differentiable solution of the
Perona-Malik equation (5.17). Moreover, let x0 be an edge of u(t0 , · ) where
additionally uηηηη (t0 , x0 ) = 0 is satisfied. Then at (t0 , x0 ),
1. $\partial_t u_\eta = f'(u_\eta)\,u_{\eta\eta\eta}$,
2. $\partial_t u_{\eta\eta} = 0$, and
3. $\partial_t u_{\eta\eta\eta} = 3 f''(u_\eta)\,u_{\eta\eta\eta}^2 + f'(u_\eta)\,u_{\eta\eta\eta\eta\eta}$.
Proof We calculate the derivatives by swapping the order of differentiation:
and
Since there is an edge at $x_0$, all terms $u_{\eta\eta}$ and $u_{\eta\eta\eta\eta}$ vanish. With $\partial_\eta(f(u_\eta)) = f'(u_\eta)u_{\eta\eta}$ and $\partial_{\eta\eta}(f(u_\eta)) = f''(u_\eta)u_{\eta\eta}^2 + f'(u_\eta)u_{\eta\eta\eta}$ we obtain the claim. □
The second assertion of the theorem says that edges remain inflection points. The
first assertion indicates whether the edge gets steeper or less steep with increasing
t. The third assertion tells us, loosely speaking, whether the inflection point tries to
change its type. The specific behavior depends on the functions f and g.
Corollary 5.32 In addition to the assumptions of Theorem 5.31 assume that the
diffusion coefficient g is given by one of the functions (5.14). Moreover, assume that
Since $u_{\eta\eta\eta\eta\eta} > 0$, we conclude that $f_2'(u_\eta) < 0$ for $u_\eta > \lambda$ and $f_2''(u_\eta) < 0$ for $u_\eta < \sqrt{3}\,\lambda$, which implies assertion 3.
For $g_1$ from (5.14) and $f_1$ from (5.16) we have that $f_1'(u_\eta) < 0$ if and only if $u_\eta > \lambda$ (which proves assertion 1), and moreover,
$$f_1''(s) = \frac{\frac{2s}{\lambda^2}\Bigl(\frac{s^2}{\lambda^2} - 3\Bigr)}{\Bigl(1 + \frac{s^2}{\lambda^2}\Bigr)^3}.$$
Similarly to the case of $g_2$, we also have $f_1'(u_\eta) < 0$ for $u_\eta > \lambda$ and $f_1''(u_\eta) < 0$ for $u_\eta < \sqrt{3}\,\lambda$. □
We may interpret the corollary as follows:
• The second point says that inflection points remain inflection points.
• The first point says that steep edges get steeper and flat edges become flatter.
More precisely: Edges no steeper than λ become flatter.
• If an edge is steeper than λ, there are two possibilities: If it is no steeper than $\sqrt{3}\,\lambda$, the inflection point remains an inflection point of the first kind, i.e., the edge stays an edge. If the edge is steeper than $\sqrt{3}\,\lambda$, the inflection point may try to change its type ($u_{\eta\eta\eta}$ can grow and become positive). This potentially leads to the so-called staircasing effect; see Fig. 5.13.
Using a similar technique, one can derive a condition for a maximum principle of
the one-dimensional Perona-Malik equation; see Exercise 5.11.
Fig. 5.13 Illustration of the staircasing effect, sharpening and smoothing by the Perona-Malik equation with the function g₂ and λ = 1.6. Left: initial value u₀ of the Perona-Malik equation and its derivative. Right: solution of the Perona-Malik equation at times t = 0.5, 2, 3.5, 5. We see all predicted effects from Corollary 5.32: the first edge is steeper than √3 λ and indeed we observe staircasing. The middle edge has a slope between λ and √3 λ and becomes steeper, while the right edge has a slope below λ and is flattened out
Fig. 5.14 Detection of edges in noisy images. Left column: Noisy image with gray values in
[0, 1] and the edges detected by Canny’s edge detector from Application 3.23 (parameters: σ = 2,
τ = 0.01). Middle column: Presmoothing by the heat equation, final time T = 20. Right column:
Presmoothing by the Perona-Malik equation with the function g1 , final time T = 20 and λ = 0.05
The only difference from the original Perona-Malik equation is the smoothing of u
in the argument of g. To show existence of a solution, we need a different notion
from that of viscosity solutions, namely the notion of weak solutions. This notion is
based on an observation that is similar to that in Theorem 5.15.
Theorem 5.34 Let $A : \Omega \to \mathbb{R}^{d\times d}$ be differentiable and T > 0. Then $u \in C^2([0,\infty[\,\times\Omega)$ with $u(t,\cdot) \in L^2(\Omega)$ is a solution of the partial differential equation
$$\partial_t u = \operatorname{div}(A\nabla u) \quad\text{in } [0,T]\times\Omega, \qquad A\nabla u\cdot\nu = 0 \quad\text{on } [0,T]\times\partial\Omega$$
if and only if for every function $v \in H^1(\Omega)$ and every $t \in [0,T]$ one has
$$\int_\Omega \bigl(\partial_t u(t,x)\bigr)v(x)\,\mathrm{d}x = -\int_\Omega \bigl(A(x)\nabla u(t,x)\bigr)\cdot\nabla v(x)\,\mathrm{d}x.$$
Proof Let u be a solution of the differential equation with the desired regularity and
$v \in H^1(\Omega)$. Multiplying both sides of the differential equation by v, integrating
The boundary integral vanishes because of the boundary condition, and this implies
the claim.
Conversely, let the equation for the integral be satisfied for all $v \in H^1(\Omega)$. Similarly to the above calculation we get
$$\int_\Omega \bigl(\partial_t u - \operatorname{div}(A\nabla u)\bigr)(t,x)\,v(x)\,\mathrm{d}x = -\int_{\partial\Omega} v(x)\bigl(A(x)\nabla u(t,x)\bigr)\cdot\nu\,\mathrm{d}\mathcal{H}^{d-1}.$$
Since v is arbitrary, we can conclude that the integrals on both sides have to
vanish. Invoking the fundamental lemma of the calculus of variations (Lemma 2.75)
establishes the claim. □
The characterization of solutions in the above theorem does not use the assump-
tion $u \in C^2([0,\infty[\,\times\Omega)$, and also differentiability of A is not needed. The equality of the integrals can be formulated for functions $u \in C^1([0,T], L^2(\Omega)) \cap L^2(0,T; H^1(\Omega))$. The following reformulation allows us to get rid of the time derivative of u: we define a bilinear form $a : H^1(\Omega)\times H^1(\Omega) \to \mathbb{R}$ by
$$a(u, v) = \int_\Omega (A\nabla u)\cdot\nabla v\,\mathrm{d}x.$$
The initial-boundary value problem
$$\partial_t u = \operatorname{div}(A\nabla u) \quad\text{in } [0,T]\times\Omega, \qquad A\nabla u\cdot\nu = 0 \quad\text{on } [0,T]\times\partial\Omega, \qquad u(0) = u_0,$$
is then replaced by the problem of finding u such that for all $v \in H^1(\Omega)$,
$$\frac{\mathrm{d}}{\mathrm{d}t}\bigl(u(t), v\bigr) + a\bigl(u(t), v\bigr) = 0, \qquad u(0) = u_0.$$
This form of the initial-boundary value problem is called the weak formulation.
Remark 5.36 The time derivative in the weak formulation has to be understood in
the weak sense as described in Sect. 2.3. In more detail: the first equation of the
weak formulation states that for all $v \in H^1(\Omega)$ and $\phi \in \mathcal{D}(]0,T[)$,
$$\int_0^T \bigl[a(u(t), v)\phi(t) - (u(t), v)\phi'(t)\bigr]\,\mathrm{d}t = 0.$$
Example 5.40 (Color Images and the Perona-Malik Equation) We use the example
of nonlinear diffusion according to Perona-Malik to illustrate some issues that arise
in the processing of color images. We consider a color image with three color
channels $u_0 : \Omega \to \mathbb{R}^3$ (cf. Sect. 1.1). Alternatively, we could also consider three separate images or $u_0 : \Omega\times\{1,2,3\} \to \mathbb{R}$, where the image $u_0(\cdot, k)$ is the kth
color channel. If we want to apply the Perona-Malik equation to this image, we
have several possibilities for choosing the diffusion coefficient. One naive approach
would be to apply the Perona-Malik equation to all channels separately:
$$\partial_t u(t,x,k) = \operatorname{div}\bigl(g(|\nabla u(\cdot,\cdot,k)|)\nabla u(\cdot,\cdot,k)\bigr)(t,x), \qquad k = 1, 2, 3.$$
This may lead to problems, since it is not clear whether the different color channels
have their edges at the same positions. This can be seen clearly in Fig. 5.16. The
image there consists of a superposition of slightly shifted, blurred circles in the RGB
channels. After the Perona-Malik equation has been applied to all color channels,
one can clearly see the shifts. One way to get around this problem is to use the
HSV color system. However, there is another problem: edges do not need to have
the same slope in the different channels, and this may lead to further errors in the
colors. In the HSV system, the V-channel carries the most information, and often it
Fig. 5.16 Nonlinear Perona-Malik diffusion for color images. The original image consists of
slightly shifted and blurred circles with different intensities in the three color RGB channels. The
color system and the choice of the diffusion coefficient play a significant role. The best results are
achieved when the diffusion coefficients are coupled among the channels
is enough just to denoise this channel; however, there may still be errors. All these
effects can be seen in Fig. 5.16.
Another possibility is to remember the role of the diffusion coefficient as an edge
detector. Since being an edge is not a property of a single color channel, edges
should be at the same places in all channels. Hence, one should choose the diffusion
coefficient to be equal in all channels, for example by using the average of the
magnitudes of the gradients:
$$\partial_t u(t,x,k) = \operatorname{div}\Bigl(g\Bigl(\frac{1}{3}\sum_{i=1}^3 |\nabla u(\cdot,\cdot,i)|\Bigr)\nabla u(\cdot,\cdot,k)\Bigr)(t,x), \qquad k = 1, 2, 3.$$
Hence, the diffusion coefficient is coupled among the channels. Usually this gives
the best results. The effects can also be seen in real images, e.g., in images where
so-called chromatic aberration is present. This refers to the effect that is caused by
the fact that light rays of different wavelengths are refracted differently. This effect
can be observed, for example, in lenses of low quality; see Fig. 5.17.
The Perona-Malik equation has shown good performance for denoising and simul-
taneous preservation of edges. Smoothing along edges has not been that good,
though. This drawback can be overcome by switching to an anisotropic model. The
basic idea is to design a diffusion tensor that enables Perona-Malik-like diffusion
perpendicular to the edges but linear diffusion along the edges. We restrict ourselves
to the two-dimensional case, since edges are curves in this case and there is only
one direction along the edges. The development of methods based on anisotropic
diffusion goes back to [140].
The diffusion tensor should encode as much local image information as possible.
We follow the modified model (5.18) and take ∇uσ as an edge detector. As
preparation for the following, we define the structure tensor:
Definition 5.41 (Structure Tensor) The structure tensor for $u : \mathbb{R}^2 \to \mathbb{R}$ and noise level σ > 0 is the matrix-valued function $J_0(\nabla u_\sigma) : \mathbb{R}^2 \to \mathbb{R}^{2\times 2}$ defined by
$$J_0(\nabla u_\sigma) = \nabla u_\sigma\, \nabla u_\sigma^T.$$
It is obvious that the structure tensor does not contain any more information than
the smoothed gradient ∇uσ , namely the information on the local direction of the
image structure and the rate of the intensity change. We can find this information in
the structure tensor as follows:
Fig. 5.17 Nonlinear Perona-Malik diffusion for a color image degraded by chromatic aberration.
If one treats the color channels separately in either the RGB or HSV system, some color errors
along the edges occur. Only denoising of the V channels shows good results; coupling of the color
channels gives best results
Lemma 5.42 The structure tensor has two orthonormal eigenvectors $v_1 \parallel \nabla u_\sigma$ and $v_2 \perp \nabla u_\sigma$. The corresponding eigenvalues are $|\nabla u_\sigma|^2$ and zero.
Proof For $v_1 = c\nabla u_\sigma$, one has $J_0(\nabla u_\sigma)v_1 = \nabla u_\sigma \nabla u_\sigma^T (c\nabla u_\sigma) = \nabla u_\sigma\, c|\nabla u_\sigma|^2 = |\nabla u_\sigma|^2 v_1$. Similarly, one sees that $J_0(\nabla u_\sigma)v_2 = 0$. □
Thus, the directional information is encoded in the eigenvectors of the structure tensor. The eigenvalues correspond, roughly speaking, to the contrast in the
As a consequence of the above lemma, Jρ (∇uσ ) also has orthonormal eigenvectors
v1 , v2 and corresponding nonnegative eigenvalues μ1 ≥ μ2 ≥ 0. We interpret these
quantities as follows:
• The eigenvalues μ1 and μ2 are the “averaged contrasts” in the directions v1 and
v2 , respectively.
• The vector v1 points in the direction of “largest averaged gray value variation.”
• The vector v2 points in the direction of “average local direction of an edge.” In
other words, v2 is the “averaged direction of coherence.”
Starting from this interpretation we can use the eigenvalues μ1 and μ2 to discrimi-
nate different regions of an image:
• μ1 , μ2 small: There is no direction of significant change in gray values. Hence,
this is a flat region.
• μ1 large, μ2 small: There is a large gray value variation in one direction, but not
in the orthogonal direction. Hence, this is an edge.
• μ1, μ2 both large: There are two orthogonal directions of significant gray value change, and this is a corner.
Fig. 5.18 The structure tensor Jρ(∇uσ) encodes information about flat regions, edges, and corners. In the bottom row, the respective regions have been colored in white. In this example the noise level is σ = 4 and the spatial scale is ρ = 2
See Fig. 5.18 for an illustration. We observe that the structure tensor Jρ (∇uσ )
indeed contains more information than J0 (∇uσ ): For the latter, there is always one
eigenvalue zero, and thus it cannot see corners. The matrix Jρ (∇uσ ) is capable of
doing this, since direction information from some neighborhood is used.
Before we develop special methods for anisotropic diffusion, we cite a theorem
about the existence of solutions of anisotropic diffusion equations where the diffu-
sion tensor is based on the structure tensor. The theorem is due to Weickert [140]
and is a direct generalization of Theorem 5.38.
Theorem 5.45 Let u0 ∈ L∞ (), ρ ≥ 0, σ > 0, T > 0, and let D : R2×2 → R2×2
satisfy the following properties:
• $D \in C^\infty(\mathbb{R}^{2\times 2}, \mathbb{R}^{2\times 2})$.
• $D(J)$ is symmetric for every symmetric J.
• For every bounded function $w \in L^\infty(\Omega, \mathbb{R}^2)$ with $\|w\|_\infty \le K$ there exists a constant $\nu(K) > 0$ such that the eigenvalues of $D(J_\rho(w))$ are larger than $\nu(K)$.
λ1 = g(μ1 ), λ2 = 1.
The eigenvectors of D are obviously the vectors v1 and v2 , and the corresponding
eigenvalues are λ₁ and λ₂. Hence, the equation
$$\partial_t u = \operatorname{div}\bigl(D(J_\rho(\nabla u_\sigma))\nabla u\bigr)$$
should indeed lead to linear diffusion as in the heat equation along the edges and
to Perona-Malik-like diffusion perpendicular to each edge; see Fig. 5.19. For results
on existence and uniqueness of solutions we refer to [140].
Fig. 5.19 Effect of anisotropic diffusion according to Examples 5.46 and 5.47 based on the
structure tensor Jρ (∇uσ ) (parameters σ = 0.5, ρ = 2). Top left: original image. Middle row:
edge-enhancing diffusion with function g2 and parameter λ = 0.0005. Bottom row: coherence-
enhancing diffusion with function g2 and parameters λ = 0.001, α = 0.001. Images are shown at
times t = 25, 100, 500
the same eigenvectors v1 and v2 as the structure tensor, and the eigenvalue for v2
should become larger for higher local coherence |μ1 − μ2 |. With a small parameter
α > 0 and a function g as in (5.14) we use the following eigenvalues:
The parameter α > 0 is needed to ensure that the diffusion tensor is positive definite.
As in the previous example we use the model
$$\partial_t u = \operatorname{div}\bigl(D(J_\rho(\nabla u_\sigma))\nabla u\bigr).$$
The function g is used in a way that the eigenvalue λ2 is small for low coherence (in
the order of α) and close to one for high coherence. In Fig. 5.19 one sees that this
equation indeed enhances coherent structures. For further theory we refer to [140].
Application 5.48 (Visualization of Vector Fields) Vector fields appear in several
applications, for example vector fields that describe flows such as winds in the weather forecast or fluid flows around an object. However, the visualization of vector fields for visual inspection is not easy. Here are some methods: On the one hand, one can visualize a vector field $v : \Omega \to \mathbb{R}^d$ by plotting small arrows $v(x)$ at some grid points $x \in \Omega$. Another method is to plot so-called integral curves, i.e., curves $\gamma : [0,T] \to \Omega$ such that the vector field is tangential to the curve, which means $\gamma'(t) = v(\gamma(t))$. The first variant can quickly lead to messy pictures, and the choice of the grid plays a crucial role. For the second variant one has to choose a set of integral curves, and it may happen that these curves accumulate in one region of the
image while other regions may be basically empty.
Another method for the visualization of a vector field, building on anisotropic
diffusion, has been proposed in [52]. The idea is to design a diffusion tensor that
allows diffusion along the vector field, but not orthogonal to the field. The resulting
diffusion process is then applied to an image consisting of pure noise. In more detail, the method goes as follows: for a continuous vector field $v : \Omega \to \mathbb{R}^d$ that is not zero on Ω, there exists a continuous map $B(v)$ that maps a point $x \in \Omega$ to a rotation matrix $B(v)(x)$ that rotates the vector $v(x)$ to the first unit vector $e_1$: $B(v)v = |v|e_1$.
Using an increasing and positive mapping α : [0, ∞[ → [0, ∞[ and a decreasing
map G : [0, ∞[ → [0, ∞[ with G(r) → 0 for r → ∞, we define the matrix
$$A(v, r) = B(v)^T \begin{pmatrix} \alpha(|v|) & 0 \\ 0 & G(r)\,\mathrm{id}_{d-1} \end{pmatrix} B(v).$$
For some initial image $u_0 : \Omega \to [0,1]$ and σ > 0 we consider the following
differential equation:
Since this equation drives all initial images to a constant image in the limit t → ∞ (as all the diffusion equations in this chapter do), the authors of [52] proposed to add a source term. For concreteness let $f : [0,1] \to \mathbb{R}$ be continuous such that $f(0) = f(1) = 0$, $f < 0$ on $]0, 0.5[$, and $f > 0$ on $]0.5, 1[$. This leads to the
modified differential equation
The new term f (u) will push the gray values toward 0 and 1, respectively, and thus
leads to higher contrast. This can be accomplished, for example, with the following
function:
$$f(u) = \Bigl(u - \tfrac12\Bigr)\Bigl(\bigl(\tfrac12\bigr)^2 - \bigl(u - \tfrac12\bigr)^2\Bigr).$$
Figure 5.20 shows the effect of Eq. (5.19) with this f and a random initial value.
One can easily extract the directions of the vector field, even in regions where the
vector field has a small magnitude.
To produce images from the equations of the previous sections of this chapter,
we have to solve these equations. In general this is not possible analytically, and
numerical methods are used. The situation in image processing is a little special in
some respects:
• Typically, images are given on a rectangular equidistant grid, and one aims to
produce images of a similar shape. Hence, it is natural to use these grids.
• The visual quality of the images is more important than solving the equations as accurately as possible. Thus, methods of lower order are acceptable if they produce "good images."
• Some of the partial differential equations we have treated preserve edges or create
“kinks.” Two examples are the equations for erosion and dilation (5.12). This
poses a special challenge for numerical methods, since the solutions that will be
approximated are not differentiable.
We consider an introductory example:
Fig. 5.20 Visualization of vector fields by anisotropic diffusion as in Application 5.48. Top left: the vector field. Below: the initial value and solutions of Eq. (5.19) at different times
Example 5.49 (Discretization of the Heat Equation) We want to solve the follow-
ing initial-boundary value problem for the heat equation
$$\begin{aligned} \partial_t u &= \Delta u && \text{in } [0,T]\times\Omega,\\ \partial_\nu u &= 0 && \text{on } [0,T]\times\partial\Omega,\\ u(0) &= u_0, \end{aligned}\tag{5.20}$$
[Figure: the rectangular domain is discretized by an equidistant grid with spacing h, with M grid points in the x₁ direction and N grid points in the x₂ direction.]
For the time variable we proceed similarly and discretize it with step-size τ .
Using u for the solution of the initial-boundary value problem (5.20), we want to
find uni,j as an approximation to u(nτ, (i − 1)h, (j − 1)h), i.e., all three equations
in (5.20) have to be satisfied. The initial condition $u(0) = u_0$ is expressed by the equation $u^0_{i,j} = u_0\bigl((i-1)h, (j-1)h\bigr)$. Replacing the Laplacian and the time derivative by difference quotients gives the following equation for the discrete values $u^n_{i,j}$:
$$\frac{u^{n+1}_{i,j} - u^n_{i,j}}{\tau} = \frac{u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} - 4u^n_{i,j}}{h^2}. \tag{5.21}$$
There is a problem at the points with i = 1, N or j = 1, M: the terms that involve
i = 0, N + 1 or j = 0, M + 1, respectively, are not defined. To deal with this
issue we take the boundary condition ∂ν u = 0 into account. In this example, the
domain is a rectangle, and the boundary condition has to be enforced for values
with i = 1, N and j = 1, M. We add the auxiliary points un0,j , unN+1,j , uni,0 , and
uni,M+1 and replace the derivative by a central difference quotient and get
Thus, the boundary condition is realized by mirroring the values over the boundary.
The discretized equation (5.21) for i = 1, for example, has the following form:
$$\frac{u^{n+1}_{1,j} - u^n_{1,j}}{\tau} = \frac{2u^n_{2,j} + u^n_{1,j+1} + u^n_{1,j-1} - 4u^n_{1,j}}{h^2}.$$
We can circumvent the distinction of different cases by the following notation: we
solve Eq. (5.21) for $u^{n+1}_{i,j}$ and get
$$u^{n+1}_{i,j} = u^n_{i,j} + \frac{\tau}{h^2}\bigl(u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} - 4u^n_{i,j}\bigr).$$
This can be realized by a discrete convolution as in Sect. 3.3.3:
$$u^{n+1} = u^n + \frac{\tau}{h^2}\, u^n \ast \begin{bmatrix} 0 & 1 & 0\\ 1 & -4 & 1\\ 0 & 1 & 0 \end{bmatrix}.$$
One advantage of this formulation is that we can realize the boundary condition
by a symmetric extension over the boundary. Starting from the initial value u0i,j =
u0 ((i − 1)h, (j − 1)h) we can use this to calculate an approximate solution uni,j
iteratively for every n.
We call the resulting scheme explicit, since we can calculate the values un+1
directly from the values un . One reason for this is that we discretized the time
derivative ∂t u by a forward difference quotient. If we use a backward difference
quotient, we get
$$\frac{u^n_{i,j} - u^{n-1}_{i,j}}{\tau} = \frac{u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} - 4u^n_{i,j}}{h^2}$$
or
$$u^n - \frac{\tau}{h^2}\, u^n \ast \begin{bmatrix} 0 & 1 & 0\\ 1 & -4 & 1\\ 0 & 1 & 0 \end{bmatrix} = u^{n-1}$$
(again, with a symmetric extension over the boundary to take care of the boundary
condition). This is a linear system of equations for un , and we call this scheme
implicit.
This initial example illustrates a simple approach to constructing numerical
methods for differential equations:
• Approximate the time derivative by forward or backwards difference quotients.
• Approximate the spatial derivative by suitable difference quotients and use
symmetric boundary extension to treat the boundary condition.
• Solve the resulting equation for un+1 .
A little more abstractly: we can solve a partial differential equation of the form
∂t u(t, x) = L(u)(t, x)
with a differential operator L that acts on the spatial variable x only by so-called
semi-discretization:
• Treat the equation in a suitable space X, i.e., u : [0, T ] → X, such that ∂t u =
L(u).
• Discretize the operator L: Choose a spatial discretization of the domain of the
x variable and thus, an approximation of the space X. Define an operator L
that operates on the discretized space and approximates L. Thus, this partial
differential equation turns into a system of ordinary differential equations
∂t u = L(u).
• Solve the system of ordinary differential equations with some method known
from numerical analysis (see, e.g., [135]).
In imaging, one often faces the special case that the domain for the x variable
is a rectangle. Moreover, the initial image u0 is often given on an equidistant grid.
This gives a natural discretization of the domain. Hence, many methods in imaging
replace the differential operators by difference quotients. This is generally known
as the method of finite differences.
The equations from Sects. 5.2 and 5.3 can be roughly divided into two groups:
equations of diffusion type (heat equation, nonlinear diffusion) and equations of
transport type (erosion, dilation, mean curvature motion). These types have to be
treated differently.
∂t u = div(A∇u),
i.e., the differential operator L(u) = div(A∇u). First we treat the case of isotropic
diffusion, i.e., $A : \Omega \to \mathbb{R}$ is a scalar function. We start with the approximation of
the differential operator div(A∇u) = ∂x1 (A∂x1 u)+∂x2 (A∂x2 u) by finite differences.
Obviously it is enough to consider the term ∂x1 (A∂x1 u). At some point (i, j ) we
proceed as follows:
$$\partial_{x_1}(A\partial_{x_1}u) \approx \frac{1}{h}\Bigl((A\partial_{x_1}u)_{i+\frac12,j} - (A\partial_{x_1}u)_{i-\frac12,j}\Bigr)$$
with
$$(A\partial_{x_1}u)_{i+\frac12,j} = A_{i+\frac12,j}\,\frac{u_{i+1,j} - u_{i,j}}{h}, \qquad (A\partial_{x_1}u)_{i-\frac12,j} = A_{i-\frac12,j}\,\frac{u_{i,j} - u_{i-1,j}}{h}.$$
$$\operatorname{div}(A\nabla u) \approx \frac{1}{h^2}\Bigl(A_{i,j-\frac12}u_{i,j-1} + A_{i,j+\frac12}u_{i,j+1} + A_{i-\frac12,j}u_{i-1,j} + A_{i+\frac12,j}u_{i+1,j} - \bigl(A_{i,j-\frac12} + A_{i,j+\frac12} + A_{i-\frac12,j} + A_{i+\frac12,j}\bigr)u_{i,j}\Bigr). \tag{5.22}$$
We can arrange this efficiently in matrix notation. To that end, we arrange the matrix
$u \in \mathbb{R}^{N\times M}$ in a vector $U \in \mathbb{R}^{NM}$ by stacking the rows into a vector.¹
1 Of course, we could also stack the columns into a vector, and indeed, some software packages
have this as a default operation. The only difference between these two approaches is the direction
of the x1 and x2 coordinates.
If we denote the right-hand side in (5.22) by vi,j and define V(i,j ) = vi,j and
U(i,j ) = ui,j , we get V = AU with a matrix A ∈ RNM×NM defined by
$$A_{(i,j),(k,l)} = \begin{cases} -\bigl(A_{i,j-\frac12} + A_{i,j+\frac12} + A_{i-\frac12,j} + A_{i+\frac12,j}\bigr) & \text{if } i = k,\ j = l,\\ A_{i\pm\frac12,j} & \text{if } i \pm 1 = k,\ j = l,\\ A_{i,j\pm\frac12} & \text{if } i = k,\ j \pm 1 = l,\\ 0 & \text{otherwise.} \end{cases} \tag{5.23}$$
$$\partial_t U = \frac{1}{h^2} A U.$$
This is a system of linear ordinary differential equations. Up to now we did not
incorporate that the diffusion coefficient A may depend on u (or on the gradient of
u, respectively). In this case A depends on u and we obtain the nonlinear system
$$\partial_t U = \frac{1}{h^2} A(U)\, U.$$
Explicit:
$$\frac{U^{n+1} - U^n}{\tau} = \frac{1}{h^2} A(U^n)\, U^n.$$
Implicit:
$$\frac{U^{n+1} - U^n}{\tau} = \frac{1}{h^2} A(U^{n+1})\, U^{n+1}.$$
Semi-implicit:
$$\frac{U^{n+1} - U^n}{\tau} = \frac{1}{h^2} A(U^n)\, U^{n+1}.$$
The implicit variant leads to a nonlinear system of equations, and its solution may
pose a significant challenge. Hence, implicit methods are usually not the method of
choice. The explicit method can be written as
$$U^{n+1} = \Bigl(\mathrm{id} + \frac{\tau}{h^2} A(U^n)\Bigr) U^n \tag{5.24}$$
and it requires only one discrete convolution per iteration. The semi-implicit method leads to
$$\Bigl(\mathrm{id} - \frac{\tau}{h^2} A(U^n)\Bigr) U^{n+1} = U^n. \tag{5.25}$$
This is a linear system of equations for U n+1 , i.e., in every iteration we need to solve
one such system.
Now we analyze properties of the explicit and semi-implicit methods.
Theorem 5.50 Let $A_{i\pm\frac12,j} \ge 0$, $A_{i,j\pm\frac12} \ge 0$, and let $A(U^n)$ be given according to Eq. (5.23). For the explicit method (5.24) assume that the step-size restriction
$$\tau \le \frac{h^2}{\max_I |A(U^n)_{I,I}|}$$
holds, while for the semi-implicit method (5.25) no upper bound on τ is assumed. Then the iterates $U^n$ of (5.24) and (5.25), respectively, satisfy the discrete maximum principle, i.e., for all I,
$$\min_J U^0_J \le U^n_I \le \max_J U^0_J,$$
and moreover
$$\sum_{J=1}^{NM} U^n_J = \sum_{J=1}^{NM} U^0_J$$
for both the explicit and semi-implicit methods, i.e., the mean gray value is preserved.
Proof First we consider the explicit iteration (5.24) and set $Q(U^n) = \mathrm{id} + \frac{\tau}{h^2}A(U^n)$. Then the explicit iteration reads $U^{n+1} = Q(U^n)U^n$. By definition of $A(U^n)$, we have $\sum_{J=1}^{NM} A(U^n)_{I,J} = 0$ for all I (note the treatment of the boundary condition), and hence
$$\sum_{J=1}^{NM} Q(U^n)_{I,J} = 1.$$
This immediately implies the preservation of the mean gray value, since
$$\sum_{J=1}^{NM} U^{n+1}_J = \sum_{J=1}^{NM}\sum_{I=1}^{NM} Q(U^n)_{I,J}\, U^n_I = \sum_{I=1}^{NM}\Bigl(\sum_{J=1}^{NM} Q(U^n)_{I,J}\Bigr) U^n_I = \sum_{I=1}^{NM} U^n_I,$$
The step-size restriction implies $Q(U^n)_{I,I} \ge 0$, which shows that the matrix $Q(U^n)$ has nonnegative components. We deduce that
$$U^{n+1}_I = \sum_{J=1}^{NM} Q(U^n)_{I,J}\, U^n_J \le \max_K U^n_K \underbrace{\sum_{J=1}^{NM} Q(U^n)_{I,J}}_{=1} = \max_K U^n_K.$$
For the semi-implicit method, set $R(U^n) = \mathrm{id} - \frac{\tau}{h^2}A(U^n)$. By construction,
$$A(U^n)_{I,I} = -\sum_{J\ne I} A(U^n)_{I,J},$$
and thus
$$R(U^n)_{I,I} = 1 - \frac{\tau}{h^2}A(U^n)_{I,I} = 1 + \frac{\tau}{h^2}\sum_{J\ne I}A(U^n)_{I,J} > \frac{\tau}{h^2}\sum_{J\ne I}A(U^n)_{I,J} = \sum_{J\ne I}\bigl|R(U^n)_{I,J}\bigr|.$$
The property $R(U^n)_{I,I} > \sum_{J\ne I}|R(U^n)_{I,J}|$ is called "strict diagonal dominance." This property implies that $R(U^n)$ is invertible and that the inverse matrix $R(U^n)^{-1}$ has nonnegative entries (cf. [72]). Moreover, with $e = (1,\dots,1)^T \in \mathbb{R}^{NM}$, we have $R(U^n)e = e$, which implies, by invertibility of $R(U^n)$, that $R(U^n)^{-1}e = e$ holds, too. We conclude that
$$\sum_{J=1}^{NM} \bigl(R(U^n)^{-1}\bigr)_{I,J} = 1.$$
Similarly to the explicit case, we deduce the preservation of the mean gray value from the nonnegativity of $R(U^n)^{-1}$, and
$$U^{n+1}_I = \sum_{J=1}^{NM} \bigl(R(U^n)^{-1}\bigr)_{I,J}\, U^n_J \le \max_K U^n_K \underbrace{\sum_{J=1}^{NM} \bigl(R(U^n)^{-1}\bigr)_{I,J}}_{=1} = \max_K U^n_K$$
$$A(u) = g(|\nabla u|)\,\mathrm{id}.$$
$$A_{i\pm\frac12,j} = g\bigl(|\nabla u|_{i\pm\frac12,j}\bigr).$$
$$|\nabla u|_{i\pm\frac12,j} = \frac{|\nabla u|_{i,j} + |\nabla u|_{i\pm1,j}}{2}.$$
The gradients at these integer places can be approximated by finite differences. For
the modified Perona-Malik equation we have
$$A_{i\pm\frac12,j} = g\bigl(|\nabla u_\sigma|_{i\pm\frac12,j}\bigr).$$
Then the entries of A are calculated in exactly the same way, after an initial
presmoothing. If the function g is nonnegative, the entries $A_{i\pm\frac12,j}$ and $A_{i,j\pm\frac12}$ are nonnegative, and Theorem 5.50 applies. This discretization was used to generate the
images in Fig. 5.10. Alternative methods for the discretization of isotropic nonlinear
diffusion are described, for example, in [142] and [111].
Remark 5.53 (Anisotropic Equations) In the case of anisotropic diffusion with
symmetric diffusion tensor
$$A = \begin{pmatrix} B & C\\ C & D \end{pmatrix},$$
there are mixed second derivatives in the divergence. For example, in two dimen-
sions,
If we form the matrix A similarly to Eqs. (5.22) and (5.23) by finite differences, it is not clear a priori how one can ensure that the entries $A_{i\pm\frac12,j}$ and $A_{i,j\pm\frac12}$ are
nonnegative. In fact, this is nontrivial, and we refer to [140, Section 3.4.2]. One
alternative to finite differences is the method of finite elements, and we refer to [52,
115] for details.
Transport equations are a special challenge. To see why this is so, we begin with a
simple one-dimensional example of a transport equation. For $a \ne 0$ we consider
$$\partial_t u + a\,\partial_x u = 0, \qquad t > 0,\ x \in \mathbb{R},$$
with initial value u(0, x) = u0 (x). It is simple to check that the solution is just the
initial value transported with velocity a, i.e.,
u(t, x) = u0 (x − at).
$$X'(t) = a, \qquad X(0) = x_0.$$
Method of Characteristics
$$\partial_t u + a\cdot\nabla u = 0, \qquad u(0, x) = u_0(x).$$
$$X' = a(X), \qquad X(0) = x_0,$$
Proof We consider u along the solutions X of the initial value problems and take
the derivative with respect to t:
$$\frac{\mathrm{d}}{\mathrm{d}t}\, u(t, X(t)) = \partial_t u(t, X(t)) + a(X(t))\cdot\nabla u(t, X(t)) = 0.$$
Thus, u is constant along X, and by the initial value for X we get at t = 0 that
(cf. Exercise 5.6). Hence, the scale space is described by the differential equation
∂t u − v · ∇u = 0, u(0, x) = u0 (x).
with some suitable routine up to time T . Here one can use, for example, the Runge-
Kutta methods, see, e.g., [72]. If v is given only on a discrete set of points, one can
use interpolation as in Sect. 3.1.1 to evaluate v at intermediate points. Then one gets
u(T , X(T )) = u0 (x0 ) (where one may need interpolation again to obtain u(T , · ) at
the grid points). This method was used to generate the images in Fig. 5.1.
Application 5.56 (Erosion, Dilation, and Mean Curvature Motion) The equa-
tions for erosion, dilation, and mean curvature motion can be interpreted as transport
equations, cf. Remark 5.25 and Sect. 5.2.2. However, the vector field v depends on
u in these cases, i.e.,
∂t u − v(u) · ∇u = 0.
Hence, the method of characteristics from Application 5.55 cannot be applied in its
plain form. One still obtains reasonable results if the function u is kept fixed for
the computation of the vector field v(u) for some time. In the example of mean curvature motion, for instance, the vector field is given by
$$v\bigl(u(t_n, x)\bigr) = \kappa(t_n, x)\,\frac{\nabla u(t_n, x)}{|\nabla u(t_n, x)|},$$
where the curvature is given by
$$\kappa = \operatorname{div}\Bigl(\frac{\nabla u}{|\nabla u|}\Bigr).$$
Thus we may proceed as follows.
• First calculate the unit vector field $\nu(t_n, x) = \frac{\nabla u(t_n,x)}{|\nabla u(t_n,x)|}$, e.g., by finite differences (avoiding division by zero, e.g., by $|\nabla u(t_n,x)| \approx \sqrt{|\nabla u(t_n,x)|^2 + \varepsilon^2}$ with some small ε > 0). Compute $v_{t_n}(x) = (\operatorname{div}\nu)(t_n, x)\,\nu(t_n, x)$, e.g., again by finite differences.
• Solve the equation
∂t u − vtn · ∇u = 0
with initial value u(tn , x) up to time tn+1 = tn + T with T not too large by the
method of characteristics and go back to the previous step.
This method was used to produce the images in Fig. 5.8.
Similarly one can apply the method for the equations of erosion and dilation with
a circular structure element
∂t u ± |∇u| = 0;
see Fig. 5.21 for the case of dilation. One notes some additional smoothing that
results from the interpolation.
In this application we did not treat the nonlinearity in a rigorous way. For a nonlinear transport equation of the form $\partial_t u + \operatorname{div}(F(u)) = 0$ one can still define characteristics, but it may occur that two characteristics intersect or are not well
defined. The first case leads to so-called “shocks,” while the second case leads to
nonunique solutions. Our method does not consider these cases and hence may run
into problems.
Fig. 5.21 Solutions of the equation for the dilation by the method of characteristics according to Application 5.56, at times t = 4 and t = 6
Example 5.57 (Stability Analysis for the One-Dimensional Case) Again we begin
with a one-dimensional example
We use forward differences in the t direction and a central difference quotient in the x direction and get, with notation similar to Example 5.49, the explicit scheme
$$u^{n+1}_j = u^n_j + a\,\frac{\tau}{2h}\bigl(u^n_{j+1} - u^n_{j-1}\bigr). \tag{5.26}$$
To see that this method is not useful we use the so-called “von Neumann stability
analysis.” To that end, we consider the method on a finite interval with periodic
boundary conditions, i.e., j = 1, . . . , M and unj+M = unj . We make a special ansatz
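As a brief illustration of how such an ansatz is used (a sketch of the standard computation, with θ = 2πkh denoting the discrete frequency): inserting u_j^n = g^n e^{ijθ} into (5.26) gives
g^{n+1} e^{ijθ} = g^n e^{ijθ} + a (τ/(2h)) g^n (e^{i(j+1)θ} − e^{i(j−1)θ}),
hence the amplification factor is g = 1 + i (aτ/h) sin θ with |g|² = 1 + (aτ/h)² sin² θ > 1 whenever sin θ ≠ 0. Every nonconstant Fourier mode is amplified in each step, no matter how small τ is chosen, which is why the scheme (5.26) is not useful.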
or more compactly,
u_j^{n+1} = u_j^n + (τ/h) [ max(0, a_j)(u_{j+1}^n − u_j^n) + min(0, a_j)(u_j^n − u_{j−1}^n) ].
Depending on the sign of ∂xi u, we choose either the forward or backward difference,
i.e.,
(∂_{x1} u)²_{i,j} ≈ (1/h²) max(0, u_{i+1,j} − u_{i,j}, −(u_{i,j} − u_{i−1,j}))²,
(∂_{x2} u)²_{i,j} ≈ (1/h²) max(0, u_{i,j+1} − u_{i,j}, −(u_{i,j} − u_{i,j−1}))².
The resulting method is known as Rouy-Tourin method [120]. Results for this
method are shown in Fig. 5.22. Again we note, similar to the method of characteris-
tics from Application 5.56, that a certain blur occurs. This phenomenon is called
numerical viscosity. Finite difference methods with less numerical viscosity are
proposed, for example, in [20].
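For illustration, one explicit Rouy-Tourin time step for the dilation equation ∂t u = |∇u| might look as follows (a sketch under the assumptions of a periodic grid with spacing h and a sufficiently small time step τ; not the implementation used for Fig. 5.22):

    import numpy as np

    def rouy_tourin_dilation_step(u, tau, h=1.0):
        # one explicit step of u_t = |grad u| with the Rouy-Tourin upwind
        # discretization of the squared partial derivatives
        fwd1 = np.roll(u, -1, axis=0) - u      # u_{i+1,j} - u_{i,j}
        bwd1 = u - np.roll(u, 1, axis=0)       # u_{i,j}   - u_{i-1,j}
        fwd2 = np.roll(u, -1, axis=1) - u
        bwd2 = u - np.roll(u, 1, axis=1)
        dx1_sq = np.maximum(0.0, np.maximum(fwd1, -bwd1))**2 / h**2
        dx2_sq = np.maximum(0.0, np.maximum(fwd2, -bwd2))**2 / h**2
        return u + tau * np.sqrt(dx1_sq + dx2_sq)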
Remark 5.59 (Upwind Method According to Osher and Sethian) The authors
of [106] propose the following different upwind method:
(∂_{x1} u)²_{i,j} ≈ (1/h²) [ max(0, u_{i+1,j} − u_{i,j})² + max(0, u_{i−1,j} − u_{i,j})² ],
(∂_{x2} u)²_{i,j} ≈ (1/h²) [ max(0, u_{i,j+1} − u_{i,j})² + max(0, u_{i,j−1} − u_{i,j})² ].
This method gives results that are quite similar to those of the Rouy-Tourin method, and hence we
do not show extra pictures. In particular, some numerical viscosity can be observed as well.
Fig. 5.22 Solution of the dilation equation with the upwind method according to Rouy-Tourin
from Application 5.58
Partial differential equations can be used for many further tasks, e.g., also for
inpainting; cf. Sect. 1.2. An idea proposed by Bertalmio [13], is, to “transport” the
information of the image into the inpainting domain. Bornemann and März [15]
provide the following motivation for this approach: in two dimensions we denote by
∇⊥u the gradient of u rotated by π/2 to the left,
∇⊥u = [ 0  −1 ; 1  0 ] ∇u = ( −∂_{x2} u , ∂_{x1} u )ᵀ.
∂t u = −∇ ⊥ (u) · ∇u.
As illustrated in Application 3.23, the level lines of u roughly follow the edges.
Hence, the vector ∇ ⊥ (u) is also tangential to the edges, and thus the equation
realizes some transport along the edges of the image. This is, roughly speaking,
the same one would do by hand to fill up a missing piece of an image: take the
edges and extend them into the missing part and then fill up the domain with the
right colors. Bornemann and März [15] propose, on the one hand, to calculate the
transport direction ∇ ⊥ (uσ ) with a presmoothed uσ and, on the other hand, get
further improved results by replacing the transport direction with the eigenvector
corresponding to the smaller eigenvalue of the structure tensor Jρ (∇uσ ).
Methods that are based on diffusion can be applied to objects different from
images; we can also “denoise” surfaces. Here a surface is a manifold, and the
Laplace operator has to be replaced by the so-called Laplace-Beltrami operator. Also
one can adapt the ideas of anisotropic diffusion to make them work on surfaces, too;
see, e.g., [45].
The Perona-Malik equation and its analytical properties are still subject to
research. Amann [4] describes a regularization of the Perona-Malik equation that
uses a temporal smoothing instead of the spatial smoothing used in the modified
model (5.18). This can be interpreted as a continuous analogue of the semi-
implicit method (5.25). Chen and Zhang [44] provide a new interpretation of the
nonexistence of solutions of the Perona-Malik equation in the context of Young
measures. Esedoglu [59] develops a finer analysis of the stability of the discretized
Perona-Malik method and proves a maximum principle for certain initial values.
5.6 Exercises
ϕ(x) = (Γ(1 + d/2)/π^{d/2}) χ_{B1(0)}(x)
(cf. Example 2.38) and ϕt(x) = τ(t)^{−d} ϕ(x/τ(t)). We consider the scale space
Tt u = u ∗ ϕt if t > 0,  and  Tt u = u if t = 0.
Which scale space axioms are satisfied, and which are not? Can you show that an
infinitesimal generator exists?
Exercise 5.4 (Recursivity of Scaled Dilation) Let B ⊂ Rd be nonempty.
1. Show that
2. Show that the multiscale dilation from Example 5.4 satisfies the axiom [REC] if
B is convex. Which assumptions are needed for the reverse implication?
Exercise 5.5 (Properties of the Infinitesimal Generator) Let the assumptions of
Theorem 5.11 be fulfilled. Show the following:
1. If axiom [TRANS] is satisfied in addition, then
∂j/∂t (t, x) = v(j(t, x)), j(0, x) = x.
Show that the infinitesimal generator of (Tt ) is given by
∇(h ◦ u) = (h′ ◦ u) ∇u,
∇²(h ◦ u) = (h′ ◦ u) ∇²u + (h″ ◦ u) ∇u ⊗ ∇u.
Exercise 5.8 (Auxiliary Calculation for Theorem 5.23) Let X ∈ S^{d×d} with x_{d,d} = 0 and M = Σ_{i=1}^{d−1} x_{d,i}². Moreover, let ε > 0 and
Q = diag(1, . . . , 1, 0),   Iε = diag(ε, . . . , ε, M/ε).
Show that
QXQ ≤ X + Iε,
X ≤ QXQ + Iε.
κ = (x′y″ − x″y′) / ((x′)² + (y′)²)^{3/2}.
Let u : R² → R be such that the zero level set {(x, y) | u(x, y) = 0} is parameterized by such a curve c.
Show that on this zero level set, at points with ∇u ≠ 0, one has
κ = div(∇u/|∇u|).
F(p, X) = (g′(|p|)/|p|) pᵀXp + g(|p|) trace X
(In this case one says that the solution satisfies a maximum principle.)
Exercise 5.12 (Decrease of Energy and Preservation of the Mean Gray Value for the Modified Perona-Malik Equation) Let Ω ⊂ Rd, u0 ∈ L∞(Ω), g : [0, ∞[ → [0, ∞[ infinitely differentiable, and u : [0, T] × Ω → R a solution of the modified Perona-Malik equation, i.e., a solution of the corresponding initial-boundary value problem. Show that the associated energy is nonincreasing in time and that the mean gray value t ↦ ∫_Ω u(t, x) dx is constant.
Chapter 6
Variational Methods
u0 = u† + η,
where the noise η is unknown. As we have already seen, there are different
approaches to solve the denoising problem, e.g., the application of the moving
average, morphological opening, the median filter, and the solution of the Perona-
Malik equation.
Since we do not know the noise η, we need to make assumptions on u† and η and
hope that these assumptions are indeed satisfied for the given data u0 . Let us make
some basic observations:
• The noise η = u0 − u† is a function whose value at every point is independent of
the values in the neighborhood of that point. There is no special spatial structure
in η.
• The function u† represents an image that has some spatial structure. Hence, it is
possible to make assertions about the behavior of the image in the neighborhood
of a point.
In a little bit more abstract terms, the image and the noise will have different
characteristics that allow one to discriminate between these two; in this case these
characteristics are given by the local behavior. These assumptions, however, do
not lead to a mathematical model and of course not to a denoising method. The
basic idea of variational methods in mathematical imaging is to express the above
assumptions in quantitative expressions. Usually, these expressions say how “well”
a function “fits” the modeling assumption; it should be small for a good fit, and large
for a bad fit.
With this is mind, we can reformulate the above points as follows:
• There is a real-valued function Φ that gives, for every “noise” function η, the “size” of the noise. The function should use only point-wise information. “Large” noise, or the presence of spatial structure, should lead to large values.
• There is a real-valued function Ψ that says how much an image u looks like a “natural image.” The function should use information from neighborhoods and should be “large” for “unnatural” images and “small” for “natural” images.
These assumptions are based on the hope that the quantification of the local behavior is sufficient to discriminate image information from noise. For suitable functions Φ and Ψ one chooses a weight λ > 0, and this leads, for every image u (and, consequently, for every noise η = u − u0), to the expression
Φ(u0 − u) + λΨ(u),
which gives a value that says how well both requirements are fulfilled; the smaller, the better. Thus, it is natural to look for an image u that minimizes the expression, i.e., we are looking for u∗ for which
Φ(u0 − u∗) + λΨ(u∗) = min_u Φ(u0 − u) + λΨ(u)
holds.
Within the model given by Φ, Ψ, and λ, the resulting u∗ is optimal and gives the denoised image. A characteristic feature of these methods is the solution of a minimization problem. Since one varies over all u to search for an optimum u∗, these methods are called variational methods or variational problems. The function to be minimized in a variational problem is called an objective functional. Since Φ measures the difference u0 − u, it is often called a discrepancy functional or discrepancy term. In this context, Ψ is also called a penalty functional.
The following example, chosen such that the calculations remain simple, gives
an impression as to how the model assumption can be transferred to a functional
and what mathematical questions can arise in this context.
Example 6.1 (L²-H¹ Denoising) Consider the whole space Rd and a (complex-valued) noisy function u0 ∈ L²(Rd). It appears natural to choose for Φ the squared norm
Φ(u) = (1/2) ∫_{Rd} |u(x)|² dx.
The penalty functional
Ψ(u) = (1/2) ∫_{Rd} |∇u(x)|² dx
uses the gradient of u and hence uses in some sense also information from a neighborhood. The corresponding variational problem reads
min_{u∈H¹(Rd)} (1/2) ∫_{Rd} |u0(x) − u(x)|² dx + (λ/2) ∫_{Rd} |∇u(x)|² dx.   (6.1)
Note that the gradient has to be understood in the weak sense, and hence the
functional is well defined in the space H 1 (Rd ). Something that is not clear a priori
is the existence of a minimizer and hence the justification to use a minimum instead
of an infimum.
We will treat the question of existence of minimizers in greater detail later in this
chapter and content ourselves with a formal solution of the minimization problem:
by the Plancherel formula (4.2) and the rules for derivatives from Lemma 4.28, we
can reformulate the problem (6.1) as
min_{u∈H¹(Rd)} (1/2) ∫_{Rd} |û0(ξ) − û(ξ)|² dξ + (λ/2) ∫_{Rd} |ξ|² |û(ξ)|² dξ.
We see that this is now a minimization problem for û in which we aim to minimize an integral that depends only on û(ξ). For this “point-wise problem,” and we will
argue in more detail later, the overall minimization is achieved by “point-wise
almost everywhere minimization.” The point-wise minimizer u∗ satisfies for almost
all ξ ,
û∗(ξ) = arg min_{z∈C} (1/2)|û0(ξ) − z|² + (λ/2)|ξ|²|z|².
One has
(1/2)|û0(ξ) − z|² + (λ/2)|ξ|²|z|² = (1/2)(1 + λ|ξ|²)|z|² + (1/2)|û0(ξ)|² − |z| Re(sgn(z) û0(ξ)),
and hence the minimization with respect to the argument sgn(z) yields sgn(z) = sgn(û0(ξ)). This leads to
(1/2)|û0(ξ) − z|² + (λ/2)|ξ|²|z|² = (1/2)(1 + λ|ξ|²)|z|² − |z||û0(ξ)| + (1/2)|û0(ξ)|²,
which we minimize with respect to the absolute value |z| and obtain |z| = |û0(ξ)|/(1 + λ|ξ|²). In total we get z = û0(ξ)/(1 + λ|ξ|²), and hence û∗ is unique and given by
û∗(ξ) = û0(ξ)/(1 + λ|ξ|²) for almost every ξ ∈ Rd.
Transforming back, the solution is obtained by a convolution, u∗ = u0 ∗ Pλ.
Using the (d/2 − 1)th modified Bessel function of the second kind K_{d/2−1}, we can write Pλ as
Pλ(x) = (|x|^{1−d/2} / ((2π)^{d−1} λ^{(d+2)/4})) K_{d/2−1}(2π|x|/√λ).   (6.2)
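For discrete images, the closed-form solution can be evaluated with the FFT; the following sketch (not from the book) assumes periodic boundary conditions and uses the discrete frequencies from np.fft.fftfreq in place of ξ:

    import numpy as np

    def h1_denoise(u0, lam):
        # minimize 1/2 ||u - u0||^2 + lam/2 ||grad u||^2 via the Fourier formula
        # u_hat(xi) = u0_hat(xi) / (1 + lam |xi|^2)
        n1, n2 = u0.shape
        xi1 = 2 * np.pi * np.fft.fftfreq(n1)
        xi2 = 2 * np.pi * np.fft.fftfreq(n2)
        XI1, XI2 = np.meshgrid(xi1, xi2, indexing='ij')
        u0_hat = np.fft.fft2(u0)
        u_hat = u0_hat / (1.0 + lam * (XI1**2 + XI2**2))
        return np.real(np.fft.ifft2(u_hat))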
u0 = u† ∗ k + η and η = u0 − u† ∗ k, respectively.
Fig. 6.1 Denoising by solving problem (6.1). Upper left: Original image u† with 256 × 256
pixels, upper right: Noisy version u0 (PSNR(u0 , u† ) = 19.98 dB). Bottom row: Denoised images
by solving the minimization problem (6.1) u1 (left, PSNR(u1 , u† ) = 26.21 dB) and u2 (right,
PSNR(u2 , u† ) = 24.30 dB). The regularization parameters are λ1 = 25 × 10−5 and λ2 =
75 × 10−5 , respectively and u0 has been extended to R2 by 0
Similar to Example 6.1 we get, using the convolution theorem this time (Theo-
rem 4.27), that the minimization is equivalent to
min_{u∈H¹(Rd)} (1/2) ∫_{Rd} |û0(ξ) − (2π)^{d/2} k̂(ξ) û(ξ)|² dξ + (λ/2) ∫_{Rd} |ξ|² |û(ξ)|² dξ,
with the point-wise solution
û∗(ξ) = ((2π)^{d/2} k̂(ξ) û0(ξ)) / ((2π)^d |k̂(ξ)|² + λ|ξ|²)   for almost all ξ ∈ Rd,
and hence the solution is again obtained by convolution, this time with the kernel
u∗ = u0 ∗ kλ,   kλ = F^{−1}( k̂ / ((2π)^d |k̂|² + λ|·|²) ).   (6.4)
We note that the assumptions k ∈ L¹(Rd) ∩ L²(Rd) and ∫_{Rd} k dx = 1 guarantee that the denominator is continuous and bounded away from zero, and hence we have kλ ∈ L²(Rd). For λ → 0 it follows that (2π)^{d/2} k̂λ → (2π)^{−d/2} k̂^{−1} point-wise, and hence we can say that the convolution with kλ is in some sense a regularization of the division by (2π)^{d/2} k̂, which would be “exact” deconvolution.
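A discrete counterpart of (6.4) can again be evaluated with the FFT; the sketch below is illustrative only, assumes periodic boundary conditions and a kernel image k centered in its array, absorbs the (2π)^{d/2} normalization of the continuous convolution theorem into the discrete transforms, and writes the complex conjugate of k̂ explicitly (which makes no difference for real, symmetric kernels):

    import numpy as np

    def deconvolve_h1(u0, k, lam):
        # minimize 1/2 ||k * u - u0||^2 + lam/2 ||grad u||^2 in Fourier space:
        # u_hat = conj(k_hat) u0_hat / (|k_hat|^2 + lam |xi|^2)
        n1, n2 = u0.shape
        xi1 = 2 * np.pi * np.fft.fftfreq(n1)
        xi2 = 2 * np.pi * np.fft.fftfreq(n2)
        XI1, XI2 = np.meshgrid(xi1, xi2, indexing='ij')
        k_hat = np.fft.fft2(np.fft.ifftshift(k))   # kernel moved to the origin
        u0_hat = np.fft.fft2(u0)
        u_hat = np.conj(k_hat) * u0_hat / (np.abs(k_hat)**2 + lam * (XI1**2 + XI2**2))
        return np.real(np.fft.ifft2(u_hat))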
The numerical implementation can be done similarly to Example 6.1. Figures 6.2
and 6.3 show some results for this method. In contrast to Remark 4.21 we used
Fig. 6.2 Solutions of the deblurring problem in (6.3). Top left: Original image u†, extended by zero (264 × 264 pixels). Bottom left: Convolution with an out-of-focus kernel (diameter of 8 pixels) and quantized to 256 gray values (not visually noticeable). Bottom right: Reconstruction with (6.4) (PSNR(u∗, u†) = 32.60 dB)
Fig. 6.3 Illustration of the influence of noise in u0 on the deconvolution according to (6.3). Top left: Original image u†. Top right: Convolved image u0, corrupted by additive normally distributed noise (visually noticeable; PSNR(u0, u† ∗ k) = 32.03 dB). Bottom row: Reconstruction with different parameters (λ = 5 × 10−6 left, λ = 10−6 right). Smaller parameters amplify the artifacts produced by noise
a convolution kernel for which the Fourier transform is equal to zero at many
places. Hence, division in the Fourier space is not an option. This problem can be
solved by the variational approach (6.3): even after quantization to 256 gray levels
we can achieve a satisfactory, albeit not perfect, deblurring (Fig. 6.2; cf. Fig. 4.3).
However, the method has its limitations. If we distort the image by additive noise,
this will be amplified during the reconstruction (Fig. 6.3). We obtain images that
look somehow sharper, but they contain clearly perceivable artifacts. Unfortunately,
both “sharpness” and artifacts increase for smaller λ, and the best results (in the
sense of visual perception) are obtained with a not-too-small parameter.
This phenomenon is due to the fact that deconvolution is an ill-posed inverse
problem, i.e., minimal distortions in the data lead to arbitrarily large distortions in
the solution. A clever choice of λ can lead to an improvement of the image, but
some essential information seems to be lost. To some extent, this information is
amended by the “artificial” term λΨ in the minimization problem. The penalty term
Ψ, however, is part of our model that we have for our original image; hence, the
result depends on how well u† is represented by that model. Two questions come
up: what reconstruction quality can be expected, despite the loss of information, and
how does the choice of the minimization problem influence this quality?
As a consequence, the theory of inverse problems [58] is closely related to
mathematical imaging [126]. There one deals with the question how to overcome
the problems induced by ill-posedness in greater detail and also answers, to some
extent, the question how to choose the parameter λ. In this book we do not discuss
questions of parameter choice and assume that the parameter λ is given.
Remark 6.3 The assumption that the convolution kernel k is known is a quite strong
assumption in practice. If a blurred image u0 is given, k can usually not be deduced
from the image. Hence, one faces the problem of reconstructing both u† and k
simultaneously, a problem called blind deconvolution. The task is highly ambiguous
since one sees by the convolution theorem that for every u0 there exist many pairs of
u and k such that (2π)^{d/2} k̂ û = û0. This renders blind deconvolution considerably
more difficult and we will restrict ourselves to the, already quite difficult, task of
“non-blind” deconvolution. For blind deconvolution we refer to [14, 26, 39, 81].
As a final introductory example we consider a problem that seems to have a
different flavor at first sight: inpainting.
Example 6.4 (Harmonic Inpainting) The task to fill in a missing part of an image
in a natural way, called inpainting, can also be written as a minimization problem.
We assume that the “true,” real-valued image u† is given on a domain Ω ⊂ Rd, but on a proper subset Ω′ ⊂ Ω with Ω′ ⊂⊂ Ω it is not known. Hence, the given data consists of Ω′ and u0 = u†|_{Ω\Ω′}.
Since we have to “invent” suitable data, the model that we have for an image is of great importance. Again we note that the actual brightness value of an image is not so important in comparison to the behavior of the image in a local neighborhood. As before, we take the Sobolev space H¹(Ω) as a model (this time real-valued) and postulate u† ∈ H¹(Ω). In particular, we assume that the corresponding (semi-)norm measures how well an element u ∈ H¹(Ω) resembles a natural image. The task of inpainting is then formulated as the minimization problem
min_{u∈H¹(Ω), u=u0 on Ω\Ω′} (1/2) ∫_Ω |∇u(x)|² dx.   (6.5)
The set
{u − u∗ ∈ H¹(Ω) | u = u0 in Ω\Ω′} = {v ∈ H¹(Ω) | v = 0 in Ω\Ω′}
is a subspace of H¹(Ω), and it can be shown that it is equal to H¹0(Ω′) (see Exercise 6.2). Hence, u∗ has to satisfy
∫_Ω ∇u∗(x) · ∇v(x) dx = 0 for all v ∈ H¹0(Ω′).   (6.6)
This is the weak form of the so-called Euler-Lagrange equation associated to (6.5).
In fact, (6.6) is the weak form of a partial differential equation. If u∗ is twice continuously differentiable in Ω′, one has for every v ∈ D(Ω′) that
∫_{Ω′} ∇u∗(x) · ∇v(x) dx = − ∫_{Ω′} Δu∗(x) v(x) dx = 0,
and by the fundamental lemma of the calculus of variations (Lemma 2.75) we obtain Δu∗ = 0 in Ω′, i.e. the function u∗ is harmonic there (see [80] for an introduction to the theory of harmonic functions). It happens that u∗ is indeed always twice differentiable, since it is weakly harmonic, i.e., it satisfies
∫_{Ω′} u∗(x) Δv(x) dx = 0 for all v ∈ D(Ω′).
By Weyl’s lemma [148, Theorem 18.G] such functions are infinitely differentiable in Ω′. If we further assume that u∗|_{Ω′} has a trace on ∂Ω′, it has to be the same as the trace of u0 (which is given only on Ω\Ω′). This leads to the conditions
Δu∗ = 0 in Ω′,  u∗ = u0 on ∂Ω′,
which is the strong form of the Euler-Lagrange equation for (6.5). Hence, the inpainting problem with the H¹ norm leads to the solution of the Laplace equation with so-called Dirichlet boundary values, and is also called harmonic inpainting.
Some properties of u∗ are immediate. On the one hand, harmonic functions satisfy the maximum principle, which says that nonconstant u∗ do not have local maxima and minima in Ω′. This implies that the solution is unique and that harmonic inpainting cannot introduce new structures. This seems like a reasonable property. On the other hand, u∗ satisfies the mean-value property
u∗(x) = (1/L^d(Br(0))) ∫_{Br(0)} u∗(x − y) dy
Fig. 6.4 Inpainting by minimization of the H¹ norm. Left column: The original image (top, 256 × 256 pixels) contains some homogeneous regions that should be reconstructed by inpainting (middle, Ω′ is the checkerboard pattern). Bottom: result of the minimization of (6.5). Right column: The original image (top, 256 × 256 pixels) contains some fine structures with high contrast, which have been removed in the middle picture. The bottom shows the result of harmonic inpainting
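A simple way to approximate harmonic inpainting on a pixel grid is a Jacobi iteration for the discrete Laplace equation, keeping the known pixels fixed as Dirichlet data; the following sketch (not the implementation behind Fig. 6.4) assumes a boolean array mask marking Ω′ away from the image boundary:

    import numpy as np

    def harmonic_inpaint(u0, mask, iterations=5000):
        # Laplace(u) = 0 on the masked region, u = u0 elsewhere:
        # Jacobi iteration replaces each unknown pixel by the mean of its
        # four neighbors (a discrete version of the mean-value property).
        u = u0.copy()
        for _ in range(iterations):
            neighbors = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
                         np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1)) / 4.0
            u[mask] = neighbors[mask]
        return u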
One of the most widely used proof techniques to show the existence of minimizers
of some functional is the direct method in the calculus of variations. Its line of
argumentation is very simple and essentially follows three abstract steps.
Before we treat the method we recall the definition of the extended real numbers:
R∞ = ]−∞, ∞] = R ∪ {∞}.
Conversely, for every sequence (un ) in X with limn→∞ un = u there exists a subse-
quence (unk ) such that for tn = F (un ), one has limk→∞ tnk = lim infn→∞ F (un ).
We conclude that F (u) ≤ lim infn→∞ F (un ).
With these notions in hand, we can describe the direct method as follows:
To Show
The functional F : X → R∞, defined on a topological space X, has a minimizer u∗,
i.e., there exists u∗ ∈ X with F(u∗) = min_{u∈X} F(u).
Remark 6.9 (Compactness in Examples 6.1–6.4) For the functional in (6.1), (6.3),
and (6.5) one can show only that every minimizing sequence (un ) is bounded in
X = H 1 (Rd ) or X = H 1 (), respectively. For the functionals in (6.1) and (6.3),
for example, the form of the objective functional implies that there exists a C > 0
such that for all n,
‖∇un‖²₂ ≤ (2/λ) F(un) ≤ C,
For any subsequence nk , for which F (unk ) converges, we get by monotonicity and
lower semicontinuity of ϕ
ϕ(lim inf_{n→∞} F(un)) ≤ ϕ(lim_{k→∞} F(unk)) ≤ lim inf_{k→∞} ϕ(F(unk)).
Since we can argue as above for any subsequence, we obtain the claim.
Assertion 4: For un ⇀ u in Y one has Λ(un) ⇀ Λ(u) in X, and thus
F(Λ(u)) ≤ lim inf_{n→∞} F(Λ(un)).
Assertion 5: For every n ∈ N and i ∈ I one has Fi(un) ≤ sup_{i∈I} Fi(un), and hence we conclude that
Fi(u) ≤ lim inf_{n→∞} Fi(un) ≤ lim inf_{n→∞} sup_{i∈I} Fi(un)  ⇒  sup_{i∈I} Fi(u) ≤ lim inf_{n→∞} sup_{i∈I} Fi(un)
and hence the norm is, by Lemma 6.14, items 5 and 6, weakly lower semicontinuous. The claim follows by item 3 of that lemma. □
Example 6.16 (Weak Lower Semicontinuity for Examples 6.1–6.4) Now we have
all the ingredients to prove weak lower semicontinuity of the functionals in the
examples from the beginning of this chapter.
1. We write the functional F1(u) = (1/2) ∫_{Rd} |u0 − u|² dx with ϕ(x) = (1/2)x² and Λ(u) = u − u0 as
F1 = ϕ ∘ ‖·‖_{L²} ∘ Λ.
The weak lower semicontinuity of F2(u) = (λ/2) ∫_{Rd} |∇u|² dx is shown similarly: we use ϕ(x) = (λ/2)x² and Λ = ∇ and write
F2 = ϕ ∘ ‖·‖_{L²} ∘ Λ.
It is easy to see (Exercise 6.2) that H¹0(Ω′) is a closed subspace of H¹(Ω). For every sequence (un) in H¹0(Ω′) ⊂ H¹(Ω) with weak limit u ∈ H¹(Ω), one has u ∈ H¹0(Ω′), and this shows that I_{H¹0(Ω′)} is weakly lower semicontinuous. With F2 from item 1 and F = F1 + F2 we obtain the weak lower semicontinuity of the functional in (6.5).
This settles the question of existence of minimizing elements for the introductory
problems. We note the general procedure in the following theorem:
Theorem 6.17 (The Direct Method in Banach Spaces) Let X be a reflexive
Banach space and let F : X → R∞ be bounded from below, coercive, and weakly
lower semicontinuous. Then the problem
min F (u)
u∈X
has a solution in X.
For dual spaces X∗ of separable normed spaces (not necessarily reflexive) one
can prove a similar claim under the assumption that F is weakly* lower semicon-
tinuous (Exercise 6.4). As we have seen, the notion of weak lower semicontinuity
is a central element of the argument. Hence, the question as to which functionals
have this property is well investigated within the calculus of variations. However,
there is no general answer to this question. The next example highlights some of the
difficulties that can arise with weak lower semicontinuity.
Fig. 6.5 Visualization of the functions from Example 6.18. (a) The pointwise “energy functions” ϕ1 and ϕ2. (b) The first elements of the sequence (un) and its weak limit u
that ϕ1 is convex, while ϕ2 is not. This is explained within the theory of convex
analysis, which we treat in the next section.
During the course of the chapter we will come back to the notion of weak lower
semicontinuity of functionals. But for now, we end this discussion with a remark.
Remark 6.19 For every functional F : X → R∞ , for which there exists a weakly
lower semicontinuous F0 : X → R∞ such that F0 ≤ F holds pointwise, one can
consider the following construction:
F (u) = sup {G(u) G : X → R∞ , G ≤ F, G weakly lower semicontinuous}.
This and the following subsection give an overview of the basic ideas of convex
analysis, where we focus on the applications to variational problems in mathemati-
cal imaging. The results, more details, and further material can be found in standard
references on convex analysis such as [16, 57, 118]. We focus our study of convex
analysis on convex functionals and recall their definition.
Definition 6.20 (Convexity of Functionals) A functional F : X → R∞ on a normed space X is called convex if for all u, v ∈ X and λ ∈ [0, 1], one has
F(λu + (1 − λ)v) ≤ λF(u) + (1 − λ)F(v).
It is called strictly convex if for all u, v ∈ X with u ≠ v and λ ∈ ]0, 1[, one has
F(λu + (1 − λ)v) < λF(u) + (1 − λ)F(v).
We will study convex functionals in depth in the following, and we will see that
they have several nice properties. These properties make them particularly well
suited for minimization problems. Let us start with fairly obvious constructions
and identify some general examples of convex functionals. The method from
Lemma 6.14 can be applied to convexity in a similar way.
Lemma 6.21 (Construction of Convex Functionals) Let X, Y be normed spaces
and F : X → R∞ convex. Then we have the following:
1. For α ≥ 0 the functional αF is convex.
2. If G : X → R∞ is convex, then so is F + G.
[Figure: graphs and epigraphs epi ϕ1, epi ϕ2, epi ϕ3 of the functions
ϕ1(s) = (3/10)s² for s ≤ 0,  ϕ1(s) = (1/2)s² + s for s > 0,
ϕ2(s) = 0 for s ∈ [−1, 1],  ϕ2(s) = ∞ otherwise,
ϕ3(x) = (1/2)(x1² + x2²).]
2. Norms
Every norm ‖·‖X on a normed space is convex, since for all u, v ∈ X and λ ∈ [0, 1],
‖λu + (1 − λ)v‖X ≤ λ‖u‖X + (1 − λ)‖v‖X.
We remark that a norm on a nontrivial normed space is never strictly convex (due to positive homogeneity). The situation is different for strictly convex functions of norms: for ϕ : [0, ∞[ → R∞ strictly monotonically increasing and strictly convex, the functional F(u) = ϕ(‖u‖X) is strictly convex if and only if the norm in X is strictly convex, i.e. for all u, v ∈ X with ‖u‖X = ‖v‖X = 1, u ≠ v, and for all λ ∈ ]0, 1[, one has ‖λu + (1 − λ)v‖X < 1.
The norm in a Hilbert space is always strictly convex, since for ‖u‖X = ‖v‖X = 1 and u ≠ v, the function F : λ ↦ ‖λu + (1 − λ)v‖²X is twice continuously differentiable with F″(λ) = 2‖u − v‖²X > 0, and hence strictly convex.
3. Indicator functionals
The indicator functional of a set K ⊂ X, i.e.,
IK(u) = 0 if u ∈ K and IK(u) = ∞ otherwise,
is convex if and only if K is convex.
4. Functionals in X∗
An element x∗ ∈ X∗ and a convex function ϕ : K → R∞ always lead to a composition ϕ ∘ ⟨x∗, ·⟩_{X∗×X} that is convex.
5. Composition with a linear map F ∘ A
If A : dom A ⊂ Y → X is a linear map defined on a subspace dom A and F : X → R∞ is convex, then the functional F ∘ A : Y → R∞ with
(F ∘ A)(y) = F(Ay) if y ∈ dom A and (F ∘ A)(y) = ∞ otherwise,
is also convex.
6. Convex functions in integrals
Let (Ω, F, μ) be a measure space, X = Lp(Ω, K^N) for some N ≥ 1, and ϕ : K^N → R∞ convex and lower semicontinuous. Then
F(u) = ∫_Ω ϕ(u(x)) dx
is convex, at least at the points where the integral exists (it may happen that |·| ∘ ϕ ∘ u is not integrable; the function ϕ ∘ u is, due to lower semicontinuity of ϕ, always measurable). A similar claim holds for strict convexity of ϕ, since for nonnegative f ∈ Lp(Ω), it is always the case that ∫_Ω f dx = 0 implies that f = 0 almost everywhere.
In particular, the norms ‖u‖p = (∫_Ω |u(x)|^p dx)^{1/p} in Lp(Ω, K^N) are strictly convex norms for p ∈ ]1, ∞[ if the vector norm |·| on K^N is strictly convex.
Remark 6.24 (Convexity in the Introductory Examples) The functional in (6.1) from
Examples 6.1 is strictly convex: We see this by Remark 6.22 or Lemma 6.21 and
item 2 of Example 6.23. Similarly one sees strict convexity of the functionals in (6.3)
and (6.5) for Examples 6.2–6.4.
Convex functions satisfy some continuity properties. These can be deduced, quite
remarkably, only from assumptions on boundedness. Convexity allows us to transfer
local properties to global properties.
Theorem 6.25 If F : X → R∞ is convex and there exists u0 ∈ X such that F is
bounded from above in a neighborhood of u0 , then F is locally Lipschitz continuous
at every interior point of dom F .
Proof We begin with the proof of the following claim: If F is bounded from above in a neighborhood of u0 ∈ X, then it is Lipschitz continuous in a neighborhood of u0. By assumption, there exist δ0 > 0 and R > 0 such that F(u) ≤ R for u ∈ Bδ0(u0). Moreover, F is bounded from below on Bδ0(u0), say by −L: for u ∈ Bδ0(u0) we have u0 = (1/2)u + (1/2)(2u0 − u), and consequently
F(u0) ≤ (1/2)F(u) + (1/2)F(2u0 − u),
i.e., F(u) ≥ 2F(u0) − R.
For distinct u, v ∈ Bδ0/2(u0) the vector w = u + α^{−1}(u − v) with α = 2‖u − v‖X/δ0 is still in Bδ0(u0), since
‖w − u0‖X ≤ ‖u − u0‖X + α^{−1}‖v − u‖X < δ0/2 + (δ0/(2‖u − v‖X))‖u − v‖X = δ0.
We write u = (1/(1 + α))v + (α/(1 + α))w as a convex combination, and conclude that
F(u) − F(v) ≤ (1/(1 + α))F(v) + (α/(1 + α))F(w) − F(v) = (α/(1 + α))(F(w) − F(v)) ≤ α(R + L) = C‖u − v‖X
(here we used the boundedness of F in Bδ0(u0) from above and below and the definition of α). Swapping the roles of u and v in this argument, we obtain the Lipschitz estimate |F(u) − F(v)| ≤ C‖u − v‖X in Bδ0/2(u0).
Finally, we show that every interior point u1 of dom F has a neighborhood on which F is bounded from above. To that end, let λ ∈ ]0, 1[ be such that F(λ^{−1}(u1 − (1 − λ)u0)) = S < ∞. Such a λ exists, since the mapping λ ↦ λ^{−1}(u1 − (1 − λ)u0) is continuous at λ = 1 and has the value u1 there.
Furthermore, for a given v ∈ B(1−λ)δ0(u1) we choose the vector u = u0 + (v − u1)/(1 − λ) (which is also in Bδ0(u0)). This v is a convex combination, since v = λ(λ^{−1}(u1 − (1 − λ)u0)) + (1 − λ)u, and we conclude that
Moreover, · Y is convex (cf. Example 6.23), and hence also weakly sequen-
tially lower semicontinuous.
The claim also holds if we assume that Y is the dual space of a separable
normed space (by weak* sequential compactness of the unit ball, see Theo-
rem 2.21) and that the embedding into X is weakly*-to-strongly closed.
2. Composition with a linear map F ∘ A
For reflexive Y, F : Y → R∞ convex, lower semicontinuous, and coercive and a strongly-to-weakly closed linear mapping A : X ⊃ dom A → Y, the composition F ∘ A is convex (see Example 6.23) and also lower semicontinuous: if un → u in X and lim inf_{n→∞} F(Aun) < ∞, then by coercivity, (‖Aun‖Y) is bounded. Hence, there exists a weakly convergent subsequence Aunk ⇀ v with v ∈ Y, and without loss of generality, we can assume that lim_{k→∞} F(Aunk) = lim inf_{n→∞} F(Aun). Moreover, unk → u, and we conclude that v = Au and thus
which together with Fatou’s lemma (Lemma 2.46) implies the inequality
F(u) = ∫_Ω ϕ(u(x)) dx ≤ ∫_Ω lim inf_{n→∞} ϕ(un(x)) dx ≤ lim inf_{n→∞} ∫_Ω ϕ(un(x)) dx = lim inf_{n→∞} F(un).
This proves the lower semicontinuity of F and, together with convexity, the weak sequential lower semicontinuity (Corollary 6.28).
Lower semicontinuity of convex functionals yields not only weak lower semicontinuity but also strong continuity in the interior of the effective domain.
Theorem 6.30 A convex and lower semicontinuous functional F : X → R∞ on a
Banach space X is continuous at every interior point of dom F .
Proof For the nontrivial case it is, by Theorem 6.25, enough to show that F is bounded from above in a neighborhood. We choose u0 ∈ int(dom F), R > F(u0) and define V = {u ∈ X | F(u) ≤ R}, so that the sets
Vn = {u ∈ X | u0 + (u − u0)/n ∈ V},
n ≥ 1, are a sequence of convex and closed sets (since F is convex and lower semicontinuous; see Remark 6.26). Moreover, Vn0 ⊂ Vn1 for n0 ≤ n1, since u0 + n1^{−1}(u − u0) = u0 + (n0/n1) n0^{−1}(u − u0) is, by convexity, contained in V if u0 + n0^{−1}(u − u0) ∈ V.
Finally, we see that for all u ∈ X, the convex function Fu : t ↦ F(u0 + t(u − u0)) is finite in a neighborhood of 0 (otherwise, u0 would not be an interior point of dom F). Without loss of generality, we assume that Fu is continuous even in this neighborhood, see Theorem 6.25. Hence, there exists n ≥ 1 such that u0 + n^{−1}(u − u0) ∈ V, i.e., u ∈ Vn, and consequently X = ∪_{n≥1} Vn.
By the Baire category theorem (Theorem 2.14), some Vn has an interior point, and hence V has an interior point, which implies the boundedness of F in a neighborhood. □
Now we state the direct method for the minimization of convex functionals.
Theorem 6.31 (The Direct Method for Convex Functionals in a Banach Space)
Let X be a reflexive Banach space and F : X → R∞ a convex, lower semicontinu-
ous, and coercive functional. Then, there is a solution of the minimization problem
min F (u).
u∈X
Re⟨x∗, u⟩ + t∗ F(u) ≥ λ ∀u ∈ X
and
hold. This shows that λ − t∗ < λ, i.e., t∗ > 0. For all R > 0 we get, for u ∈ X with ‖u‖X ≤ R, the estimate
(λ − R‖x∗‖X∗)/t∗ ≤ (λ − Re⟨x∗, u⟩)/t∗ ≤ F(u).
Coercivity of F implies the existence of some R > 0 such that F(u) ≥ 0 for all ‖u‖X ≥ R. This shows that F is bounded from below, and Theorem 6.17 shows that a minimizer exists.
If we now assume strict convexity of F and let u∗ ≠ u∗∗ be two minimizers of F, we obtain
min_{u∈X} F(u) = (1/2)F(u∗) + (1/2)F(u∗∗) > F((u∗ + u∗∗)/2) ≥ min_{u∈X} F(u),
a contradiction; hence the minimizer is unique in this case.
Φ(v) = (1/p)‖v‖Y^p,   Ψ(u) = (1/q)‖u‖X^q
with some λ > 0. Functionals F of this form are also called Tikhonov functionals.
They play an important role in the theory of ill-posed problems.
Using Lemma 6.21 and the considerations in Example 6.23, it is not hard to show that F : X → R∞ is convex. The functional is finite on all of X, hence continuous (Theorem 6.25), in particular lower semicontinuous. The term (λ/q)‖u‖X^q implies the coercivity of F (see Remark 6.13). Hence, we can apply Theorem 6.31 and see that there exists a minimizer u∗ ∈ X. Note that the penalty λΨ is crucial for this claim, since u ↦ Φ(Au − u0) is not coercive in general. If it were, there would exist A^{−1} on rg(A) and it would be continuous; in the context of inverse problems, Au = u0 would not be ill-posed (see also Exercise 6.6).
For the uniqueness of the minimizer we immediately see two sufficient conditions. On the one hand, strict convexity of ‖·‖X^q (q > 1) implies strict convexity of Ψ and also of F. On the other hand, an injective A and strictly convex ‖·‖Y^p (p > 1) lead to strict convexity of u ↦ Φ(Au − u0) and also of F. In both cases we obtain
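In the simplest finite-dimensional case X = Rn, Y = Rm with p = q = 2, the minimizer of the Tikhonov functional (1/2)‖Au − u0‖² + (λ/2)‖u‖² can be computed directly from the normal equations; a minimal sketch (illustrative only, not part of the general theory above):

    import numpy as np

    def tikhonov_l2(A, u0, lam):
        # minimizer of 1/2 ||A u - u0||^2 + lam/2 ||u||^2,
        # i.e., the solution of (A^T A + lam I) u = A^T u0
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ u0)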
and thus (6.8). Now assume that (6.8) holds. Swapping the roles of u and v and
adding the inequalities gives
⟨w, v̄⟩ ≤ lim_{t→0+} (F(u + t v̄) − F(u))/t = ⟨DF(u), v̄⟩,
and similarly for t < 0, we get ⟨w, v̄⟩ = ⟨DF(u), v̄⟩. This shows that w = DF(u). □
Fig. 6.7 Left: Interpretation of the derivative of a convex function at some point as the slope of the respective affine support function. Right: The characterization does not hold outside of the interior of the domain K. Both G1(v) = F(u) + DF(u)(v − u) and G2(v) = F(u) + w(v − u) with w > DF(u) are estimated by F in K
Remark 6.34 For convex Gâteaux differentiable functionals we can characterize the
derivative at any interior point u of K by the inequality (6.9). More geometrically,
this means that there is an affine linear support functional at the point u, that is tight
at F (u) and below F on the whole of K. The “slope” of this functional must be
equal to DF (u); see Fig. 6.7.
Corollary 6.35 If in Theorem 6.33 the functional F is also convex and if DF(u∗) = 0 holds for some u∗ ∈ K, then u∗ is a minimizer of F in K.
Proof Just plug u∗ into (6.8). □
In the special case K = X we obtain that
if F : X → R is convex and Gâteaux differentiable, then u∗ is a minimizer if and only if
DF (u∗ ) = 0.
Example 6.36 (Euler-Lagrange Equation for Example 6.1) Let us consider again Example 6.1, now with real-valued u0 and respective real spaces L²(Rd) and H¹(Rd). The functional F from (6.1) is Gâteaux differentiable, and it is easy to compute the derivative as
DF(u)(v) = ∫_{Rd} (u(x) − u0(x)) v(x) dx + λ ∫_{Rd} ∇u(x) · ∇v(x) dx.
A minimizer u∗ is therefore characterized by DF(u∗)(v) = 0 for all v ∈ H¹(Rd). This is a weak formulation (see also Example 6.4) of the equation
u∗ − λΔu∗ = u0 in Rd.
min F (u),
u∈[−1,1]
−DF(u∗) = lim_{t→0+} (F(1 − t) − F(1))/t ≥ 0,
and similarly, in the case u∗ = −1 that DF (u∗ ) ≥ 0. By (6.8) these conditions are
also sufficient for u∗ to be a minimizer. Hence, we can characterize optimality of u∗
as follows: there exists μ∗ ∈ R such that
The variable μ∗ is called the Lagrange multiplier for the constraint |u|2 − 1 ≤ 0. In
the next section we will investigate this notion in more detail.
Since we are interested in minimization problems in (infinite-dimensional)
Banach spaces, we ask ourselves how we can transfer the above technique to this
setting. For a domain Ω ⊂ Rd and F : L²(Ω) → R convex and differentiable on the real Hilbert space L²(Ω) and u∗ a solution of
min_{u∈L²(Ω), ‖u‖2≤1} F(u),
min_{u∈L²(Ω), ‖u‖∞≤1} F(u).
If we have a minimizer with ‖u∗‖∞ < 1, we cannot conclude that DF(u∗) = 0, since the set {u ∈ L²(Ω) | ‖u‖∞ ≤ 1} has empty interior (otherwise, the embedding
It is easy to see that a unique solution exists. The functional F is convex, continuous,
and coercive, but not differentiable. Nonetheless, a suitable case distinction allows
us to determine the solution. As an example, we treat the case a = 1, b = 0, and
c = 1.
1. If u∗1 ≠ 0 and u∗2 ≠ 0, then DF(u∗) = 0 has to hold, i.e.,
u∗1 − f1 + sgn(u∗1) = 0 and u∗2 − f2 + sgn(u∗2) = 0, i.e., u∗1 + sgn(u∗1) = f1 and u∗2 + sgn(u∗2) = f2,
and hence |f1| > 1 as well as |f2| > 1 (since we would get a contradiction otherwise). It is easy to check that the solution is
u∗ = (f1 − sgn(f1), f2 − sgn(f2))
in this case.
2. If u∗1 = 0 and u∗2 ≠ 0, then F is still differentiable with respect to u2, and we obtain u∗2 + sgn(u∗2) = f2, and hence |f2| > 1 and u∗2 = f2 − sgn(f2).
3. For u∗1 ≠ 0 and u∗2 = 0 we obtain similarly |f1| > 1 and u∗1 = f1 − sgn(f1).
4. The case u∗1 = u∗2 = 0 does not lead to any new conclusion.
All in all, we get
u∗ = (0, 0) if |f1| ≤ 1, |f2| ≤ 1,
u∗ = (f1 − sgn(f1), 0) if |f1| > 1, |f2| ≤ 1,
u∗ = (0, f2 − sgn(f2)) if |f1| ≤ 1, |f2| > 1,
u∗ = (f1 − sgn(f1), f2 − sgn(f2)) if |f1| > 1, |f2| > 1,
since anything else would contradict the conclusions of the above cases 1–3.
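For a = c = 1 and b = 0 the case distinction above amounts to component-wise soft-thresholding of f; the following sketch (illustrative, with scipy's Nelder-Mead as an independent numerical check) compares the closed-form solution with a direct minimization of the objective:

    import numpy as np
    from scipy.optimize import minimize

    def soft_threshold(f, t=1.0):
        # closed-form minimizer of 1/2|u|^2 - f.u + t(|u_1| + |u_2|)
        # (the case a = c = 1, b = 0 treated above)
        return np.sign(f) * np.maximum(np.abs(f) - t, 0.0)

    f = np.array([2.3, -0.7])               # |f_1| > 1, |f_2| <= 1
    F = lambda u: 0.5 * np.dot(u, u) - np.dot(f, u) + np.abs(u).sum()
    u_numeric = minimize(F, x0=np.zeros(2), method='Nelder-Mead').x
    print(soft_threshold(f), u_numeric)     # both close to [1.3, 0.0]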
In the case of general a, b, c, the computations get a little bit more involved, and
a similar claim holds in higher dimensions. In the case of infinite dimensions with
A ∈ L(ℓ², ℓ²) symmetric and positive definite, f ∈ ℓ², however, we cannot apply the above reasoning to the problem
min_{u∈ℓ²} (u, Au)/2 − (f, u) + Σ_{i=1}^∞ |ui|,
since the objective functional is nowhere continuous, and consequently not differ-
entiable.
The above example shows again that a unified treatment of minimization
problems with nondifferentiable (or even better, noncontinuous) convex objectives
is desirable. The subdifferential is an appropriate tool in these situations.
Some preparations are in order before we define subgradients and the subdifferen-
tial.
Lemma 6.39 Let X be a complex normed space. Then there exist a real normed space XR and norm-preserving maps iX : X → XR and jX∗ : X∗ → XR∗ such that
⟨jX∗ x∗, iX x⟩ = Re⟨x∗, x⟩ for all x ∈ X and x∗ ∈ X∗.
Proof The complex vector space X turns into a real one XR by restricting the scalar multiplication to real numbers. Then ‖·‖XR = ‖·‖X is a norm on XR, and hence iX = id maps X → XR and preserves the norm. We define jX∗ : X∗ → XR∗ via
⟨jX∗ x∗, x⟩_{XR∗×XR} = Re⟨x∗, iX^{−1} x⟩_{X∗×X} ∀x ∈ XR.
It remains to show that jX∗ preserves the norm. On the one hand, we have
‖jX∗ x∗‖_{XR∗} = sup_{‖x‖XR ≤ 1} |Re⟨x∗, iX^{−1} x⟩| ≤ sup_{‖x‖X ≤ 1} |⟨x∗, x⟩| = ‖x∗‖X∗.
On the other hand, we can choose for every sequence (xn) in X with ‖xn‖X ≤ 1 and |⟨x∗, xn⟩| → ‖x∗‖X∗ the sequence x̄n = iX(sgn(⟨x∗, xn⟩) xn) in XR, which also
1. A set-valued mapping F : X ⇒ Y, or graph, is a subset F ⊂ X × Y. We write F(x) = {y ∈ Y | (x, y) ∈ F} and use y ∈ F(x) synonymously to (x, y) ∈ F.
2. For every mapping F : X → Y we denote its graph also by F = {(x, F(x)) | x ∈ X} and use F(x) = y and F(x) = {y} interchangeably.
3. For set-valued mappings F, G : X ⇒ Y and λ ∈ R let
(F + G)(x) = {y1 + y2 | y1 ∈ F(x), y2 ∈ G(x)},
(λF)(x) = {λy | y ∈ F(x)}.
The first function is differentiable on R\{0}. It has a kink at the origin, and it is easy to see that st ≤ (1/2)t² + t for all t ≥ 0 if and only if s ≤ 1. Similarly, st ≤ (3/10)t² − (1/4)t holds for all t ≤ 0 if and only if s ≥ −1/4. Hence, ∂ϕ1(0) = [−1/4, 1].
The function ϕ2 is constant on ]−1, 1[, hence differentiable there with derivative
0. At the point 1 we note that s(t −1) ≤ 0 for all t ∈ [−1, 1] if and only if s ≥ 0. This
shows that ∂ϕ2 (1) = [0, ∞[. A similar argument shows that ∂ϕ2 (−1) = ]−∞, 0],
and the subgradient is empty at all other points. In conclusion we get
∂ϕ1(t) = {(6/10)t − 1/4} if t < 0,  ∂ϕ1(0) = [−1/4, 1],  ∂ϕ1(t) = {t + 1} if t > 0,
∂ϕ2(t) = ]−∞, 0] if t = −1,  ∂ϕ2(t) = {0} if t ∈ ]−1, 1[,  ∂ϕ2(t) = [0, ∞[ if t = 1,  ∂ϕ2(t) = ∅ otherwise.
Fig. 6.8 Example of subdifferentials of convex functions. The top row shows the graphs, the bottom row the respective subdifferentials. The gray affine supporting functionals correspond to the gray points in the subdifferential. The function ϕ1 is differentiable except at the origin, and there the subgradient is a compact interval; the subgradient of the indicator functional ϕ2 is the nonpositive semiaxis at −1 and the nonnegative semiaxis at 1. Outside of [−1, 1] it is empty
But since the closure of int(A) equals the closure of A (Exercise 6.9), there exists for every x ∈ A a sequence (xn) in int(A) converging to x, and hence Re⟨x∗, x⟩ = lim_{n→∞} Re⟨x∗, xn⟩ ≤ λ. □
Now we collect some fundamental properties of the subdifferential.
Theorem 6.46 Let F : X → R∞ be a convex function on a real normed space X.
Then the subdifferential ∂F satisfies the following:
1. For every u, the set ∂F (u) is a convex weakly* closed subset of X∗ .
2. If F is also lower semicontinuous, then ∂F is a strongly-weakly* and also
weakly-strongly closed subset of X × X∗, i.e., for sequences ((un, wn)) in ∂F,
un → u in X, wn ⇀∗ w in X∗   or   un ⇀ u in X, wn → w in X∗   ⇒   (u, w) ∈ ∂F.
Taking the supremum over ‖v‖X < 1, we obtain ‖w‖X∗ ≤ δ^{−1}, and hence ∂F(u) is bounded.
To show that ∂F(u) is not empty, note that epi F has nonempty interior, since the open set Bδ(u) × ]F(u) + 1, ∞[ is a subset of the epigraph of F. Moreover, (u, F(u)) is not in int(epi F), since every (v, t) ∈ int(epi(F)) satisfies t > F(v). Now we choose A = epi F and B = {(u, F(u))} in Lemma 6.45 to get 0 ≠ (w0, t0) ∈ X∗ × R,
F (u) + w · (v − u) ≤ F (v),
then w ∈ ∂F (u).
Proof We show that for every v 0 ∈ dom F there exists a sequence (v n ) in V with
limn→∞ v n = v 0 and lim supn→∞ F (v n ) ≤ F (v 0 ). Then the claim follows by
taking limits in the subgradient inequality.
Let v0 ∈ dom F and set
K = max {k ∈ N | ∃ u1, . . . , uk ∈ dom F, u1 − v0, . . . , uk − v0 linearly independent}.
Then there exist u1, . . . , uK ∈ dom F with
dom F ⊂ U = v0 + span(u1 − v0, . . . , uK − v0).
Consider the sets
Sn = { v0 + Σ_{i=1}^K λi(ui − v0) | Σ_{i=1}^K λi ≤ 1/n, λ1, . . . , λK ≥ 0 }
for n ≥ 1 and note that their interior with respect to the relative topology on U is not empty. Hence, for every n ≥ 1 there exists some vn ∈ Sn ∩ V, since V is also dense in Sn. We have
vn = v0 + Σ_{i=1}^K λni(ui − v0) = (1 − Σ_{i=1}^K λni) v0 + Σ_{i=1}^K λni ui
for suitable λn1, . . . , λnK ≥ 0 with Σ_{i=1}^K λni ≤ 1/n. Thus, lim_{n→∞} vn = v0 and by convexity of F
lim sup_{n→∞} F(vn) ≤ lim sup_{n→∞} ((1 − Σ_{i=1}^K λni) F(v0) + Σ_{i=1}^K λni F(ui)) = F(v0). □
It is easy to see that ∂IK(u) is a convex cone: for w1, w2 ∈ ∂IK(u) we also have w1 + w2 ∈ ∂IK(u), and for w ∈ ∂IK(u), α ≥ 0, also αw ∈ ∂IK(u). This cone is called the normal cone (of K at u). Moreover, for u ∈ K it is always the case that 0 ∈ ∂IK(u), i.e., the subgradient is nonempty exactly on K. The special case K = U + u0, with a closed subspace U and u0 ∈ X, leads to ∂IK(u) = U⊥ for all u ∈ K.
If K ≠ ∅ satisfies
K = {u ∈ X | G(u) ≤ 0},  G : X → R convex and Gâteaux differentiable,
For G(u) < 0 we need to show that ∂IK(u) = {0}. By continuity of G we have for a suitable δ > 0 that G(u + v) < 0 for all ‖v‖X < δ, and hence, every w ∈ ∂IK(u) satisfies the inequality ⟨w, v⟩ = ⟨w, u + v − u⟩ ≤ 0 for all ‖v‖X < δ; thus w = 0 has to hold. Consequently, {0} is the only element in ∂IK(u).
For G(u) = 0 the claim is ∂IK(u) = {μ DG(u) | μ ≥ 0}. We argue that DG(u) ≠ 0 has to hold: otherwise, u would be a minimizer of G and the functional could not take on negative values. Now choose w ∈ ∂IK(u). For every v ∈ X with ⟨DG(u), v⟩ = −αv < 0, Gâteaux differentiability enables us to find some t > 0 such that
G(u + tv) − G(u) − ⟨DG(u), tv⟩ ≤ (αv/2) t.
This implies G(u + tv) ≤ t(αv/2 + ⟨DG(u), v⟩) = −t αv/2 < 0 and hence u + tv ∈ K. Plugging this into the subgradient inequality we get
Fig. 6.9 Visualization of the normal cone associated with convex constraints. Left: The normal cone for the set K = {G ≤ 0} with Gâteaux differentiable G consists of the nonnegative multiples of the derivative DG(u). The plane ⟨DG(u), v − u⟩ = 0 is “tangential” at u, and K is contained in the corresponding nonpositive halfspace ⟨DG(u), v − u⟩ ≤ 0. Right: An example of a convex set for which the normal cone ∂IK(u) at some point u contains more than one linearly independent direction
To prove this claim, let w ∈ ∂F(u) for ‖u‖X ≤ R. For every vector v ∈ X with ‖v‖X = ‖u‖X, the subgradient inequality (6.10) implies
ϕ(‖u‖X) + ⟨w, v − u⟩ ≤ ϕ(‖u‖X)  ⇒  ⟨w, v⟩ ≤ ⟨w, u⟩ ≤ ‖w‖X∗ ‖u‖X.
Taking the supremum over all ‖v‖X = ‖u‖X, we obtain ⟨w, u⟩ = ‖w‖X∗ ‖u‖X.
For u = 0, we get, additionally, by the subgradient inequality that for fixed t ≥ 0 and all ‖v‖X = t, one has
ϕ(0) + ⟨w, v − 0⟩ ≤ ϕ(‖v‖X) = ϕ(t)  ⇒  ϕ(0) + ‖w‖X∗ t ≤ ϕ(t).
And since t ≥ 0 is arbitrary, we get ‖w‖X∗ ∈ ∂ϕ(‖u‖X). (For the latter claim we have implicitly extended ϕ by ϕ(t) = inf_{s≥0} ϕ(s) for t < 0.) For the case u ≠ 0 we plug v = t‖u‖X^{−1} u for some t ≥ 0 into the subgradient inequality,
ϕ(t) = ϕ(‖v‖X) ≥ ϕ(‖u‖X) + ⟨w, v − u⟩ = ϕ(‖u‖X) + ‖w‖X∗ (t − ‖u‖X),
and conclude that ‖w‖X∗ ∈ ∂ϕ(‖u‖X) also in this case.
To prove the reverse inclusion, let w ∈ X∗ be such that ⟨w, u⟩ = ‖w‖X∗ ‖u‖X and ‖w‖X∗ ∈ ∂ϕ(‖u‖X). For all v ∈ X, we have
ϕ(‖u‖X) + ⟨w, v − u⟩ ≤ ϕ(‖u‖X) + ‖w‖X∗ (‖v‖X − ‖u‖X) ≤ ϕ(‖v‖X),
i.e., w ∈ ∂F(u).
Theorem 6.46 says that ∂F(u) ≠ ∅ for all ‖u‖X < R. If, additionally, ∂ϕ(R) ≠ ∅ holds, we also have ∂F(u) ≠ ∅ for all ‖u‖X ≤ R, since there always exists w ∈ X∗ that satisfies ⟨w, u⟩ = ‖w‖X∗ ‖u‖X if the norm ‖w‖X∗ is prescribed.
If X is a Hilbert space, we can describe ∂F a little bit more concretely. Using the Riesz map JX^{−1} we argue as follows: For u ≠ 0, one has ⟨w, u⟩ = ‖w‖X∗ ‖u‖X if and only if (u, JX^{−1} w) = ‖JX^{−1} w‖X ‖u‖X, which in turn is equivalent to the existence of some λ ≥ 0 such that JX^{−1} w = λu holds. Then, the condition ‖w‖X ∈ ∂ϕ(‖u‖X) becomes λ ∈ ∂ϕ(‖u‖X)/‖u‖X, and thus the subgradient is given by λ JX u with λ ∈ ∂ϕ(‖u‖X)/‖u‖X. For u = 0, it consists exactly of those JX v ∈ X∗ with ‖v‖X ∈ ∂ϕ(0). In conclusion, we get
∂F(u) = (∂ϕ(‖u‖X)/‖u‖X) JX u if u ≠ 0,   and   ∂F(0) = ∂ϕ(0) JX {‖v‖X = 1}.
This can be seen as follows: If w ∈ Lp∗(Ω, R^N) = Lp(Ω, R^N)∗ satisfies the condition w(x) ∈ ∂ϕ(u(x)) almost everywhere, we take any v ∈ Lp(Ω, R^N) and plug v(x) for almost every x into the subgradient inequality for ϕ and get, after integration,
∫_Ω ϕ(u(x)) dx + ⟨w, v − u⟩_{Lp∗×Lp} ≤ ∫_Ω ϕ(v(x)) dx,
i.e., w ∈ ∂F(u). For the reverse inclusion, let w ∈ ∂F(u). Then for every v ∈ Lp(Ω, R^N), we have
∫_Ω ( ϕ(v(x)) − ϕ(u(x)) − w(x) · (v(x) − u(x)) ) dx ≥ 0.
Now choose an at most countable set V ⊂ dom ϕ, which is dense in dom ϕ. For every v̄ ∈ V and every measurable A ⊂ Ω with μ(A) < ∞ we can plug vA(x) = χA v̄ + χ_{Ω\A} u in the subgradient inequality and get
∫_A ( ϕ(v̄) − ϕ(u(x)) − w(x) · (v̄ − u(x)) ) dx ≥ 0
and consequently
ϕ(u(x)) + w(x) · (v̄ − u(x)) ≤ ϕ(v̄) for almost every x ∈ Ω.
Since V is countable, the union of all sets where the above does not hold is still a nullset, and hence we get that for almost every x ∈ Ω,
ϕ(u(x)) + w(x) · (v̄ − u(x)) ≤ ϕ(v̄) for all v̄ ∈ V.
By Lemma 6.47 we finally conclude that w(x) ∈ ∂ϕ(u(x)) for almost every x ∈ Ω.
Now we prove some useful rules for subdifferential calculus. Most rules are
straightforward generalizations of the respective rules for the classical derivatives,
sometimes with additional assumptions.
In this context we denote translations as follows: Tu0 u = u+u0 . Since this notion
is used in this chapter exclusively, there will be no confusion with the distributions
Tu0 induced by u0 (cf. Sect. 2.3). The rules of subgradient calculus are particularly
useful to find minimizers (cf. Theorem 6.43). Note that these rules often require
additional continuity assumptions.
Theorem 6.51 (Calculus for Subdifferentials) Let X, Y be real normed spaces,
F, G : X → R∞ proper convex functionals, and A : Y → X linear and continuous.
The subdifferential obeys the following rules:
1. ∂(λF ) = λ∂F for λ > 0,
2. ∂(F ◦ Tu0 )(u) = ∂F (u + u0 ) for u0 ∈ X,
3. ∂(F + G) ⊃ ∂F + ∂G and ∂(F + G) = ∂F + ∂G if F is continuous at some
point u0 ∈ dom F ∩ dom G,
4. ∂(F ◦ A) ⊃ A∗ ◦ ∂F ◦ A and ∂(F ◦ A) = A∗ ◦ ∂F ◦ A if F is continuous at some
point u0 ∈ rg(A) ∩ dom F .
Proof Assertions 1 and 2: It is simple to check the rules by direct application of the
definition.
Assertion 3: The inclusion is immediate: for u ∈ X, w1 ∈ ∂F (u) and w2 ∈
∂G(u) the subgradient inequality (6.10) implies
F (u) + G(u) + w1 + w2 , v − u
= F (u) + w1 , v − u + G(u) + w2 , v − u ≤ F (v) + G(v)
for all v ∈ X.
For the reverse inclusion, let w ∈ ∂(F + G)(u), which implies u ∈ dom F ∩
dom G and hence
With F̄ (v) = F (v) − w, v the inequality becomes F̄ (v) − F̄ (u) ≥ G(u) − G(v).
Now we aim to find a suitable linear functional that “fits” between this inequality,
i.e., some w2 ∈ X∗ for which
We note that K1 , K2 are nonempty convex sets, and moreover, int(K1 ) is not empty
(the latter due to continuity of F̄ in u0 ). Also we note that int(K1 ) ∩ K2 = ∅,
since (v, t) ∈ int(K1 ) implies t > F̄ (v) − F̄ (u) while (v, t) ∈ K2 means that
G(u) − G(v) ≥ t. If both were satisfied we would get a contradiction to (6.11).
Lemma 6.45 implies that there exist 0 ≠ (w0, t0) ∈ X∗ × R and λ ∈ R such that
⟨w0, v⟩ + t0 (t − F̄(u)) ≤ λ  ∀v ∈ dom F, F̄(v) ≤ t,
⟨w0, v⟩ + t0 (G(u) − t) ≥ λ  ∀v ∈ dom G, G(v) ≤ t.
Now we show that t0 < 0. The case t0 > 0 leads to a contradiction by letting v = u,
t > F̄ (u) and t → ∞. In the case t0 = 0, we would get w0 , v ≤ λ for all v ∈
dom F and especially w0 , u0 < λ, since u0 is in the interior of dom F . However,
since u0 ∈ dom G, we also get w0 , u0 ≥ λ, which is again a contradiction.
With t = F̄ (v) and t = G(v), respectively, we obtain
We aim to introduce a separating linear functional into this inequality that amounts to a separation of the nonempty convex sets
K1 = epi F,   K2 = {(Av, F(Au) + ⟨w, v − u⟩) ∈ X × R | v ∈ Y}.
The case t0 > 0 cannot occur, and t0 ≠ 0 follows from the continuity of F at u0 and u0 ∈ dom F ∩ rg(A). If we set v̄ = Au, t = F(v̄) and v = u, we also conclude that λ = ⟨w0, Au⟩ + t0 F(Au). By the second inequality in (6.14) we get
0 ≤ (Fv(t) − Fv(0))/t − ⟨w, v⟩.
On the other hand, for every ε > 0, there exists tε > 0 such that Fv(tε) < Fv(0) + tε ⟨w, v⟩ + tε ε (note that ⟨w, v⟩ + ε ∉ ∂Fv(0)). By convexity of Fv, for every t ∈ [0, tε] one has
Fv(t) ≤ (t/tε) Fv(tε) + ((tε − t)/tε) Fv(0) ≤ Fv(0) + t ⟨w, v⟩ + t ε,
and hence
(Fv(t) − Fv(0))/t − ⟨w, v⟩ ≤ ε.
Since ε > 0 was arbitrary, it follows that lim_{t→0+} (1/t)(Fv(t) − Fv(0)) = ⟨w, v⟩, which proves Gâteaux differentiability and DF(u0) = w. □
Remark 6.53
• The assertion in item 4 of Theorem 6.51 remains valid in other situations as well.
If, for example, rg(A) = X and F is convex, then ∂(F ◦ A) = A∗ ◦ ∂F ◦ A
without additional assumptions on continuity (cf. Exercise 6.11).
As another example, in the case of a densely defined linear mapping A :
dom A ⊂ Y → X, dom A = Y , we obtain the same formula for ∂(F ◦ A)
(cf. Example 6.23) when the adjoint has to be understood in the sense of
Definition 2.25 (Exercise 6.12).
• One can also generalize the continuity assumptions. Loosely speaking, the sum
rule holds if continuity of F and G at one point holds relatively to some subspaces
whose sum is the whole of X and on which one can project continuously.
Analogously, the continuity of F with respect to a subspace that contains the
complement of rg(A) is sufficient for the chain rule to hold (cf. Exercises 6.13–
6.16).
On suitable spaces, there are considerably more “subdifferentiable” convex
functions than Gâteaux-differentiable or continuous ones. We prove this claim using
the sum rule.
Theorem 6.54 (Existence of Nonempty Subdifferentials) Let F : X → R∞
be proper, convex and lower semicontinuous on a reflexive Banach space X. Then
∂F = ∅.
Proof Choose some u0 ∈ X with F (u0 ) < ∞ and t0 < F (u0 ) and consider the
minimization problem
min_{(v,t)∈X×R} ‖v − u0‖²X + (t − t0)² + Iepi F(v, t).   (6.15)
Since F is convex and lower semicontinuous, Iepi F is convex and lower semicontinuous (Remark 6.26 and Example 6.29). The functional (v, t) ↦ (‖v‖²X + t²)^{1/2} is a norm on X × R, and hence convex, continuous, and coercive (Example 6.23 and Remark 6.13), and obviously, the same holds for the functional G defined by G(v, t) = ‖(v − u0, t − t0)‖². Hence, problem (6.15) satisfies all assumptions of Theorem 6.31 (see also Lemmas 6.14 and 6.21) and consequently, it has a minimizer (u, τ) ∈ epi F.
We prove that τ ≠ t0. If τ = t0 held, we would get u ≠ u0, and also that the points (u + λ(u0 − u), t0 + λ(F(u0) − t0)), λ ∈ [0, 1], lie in epi F; for these we would get
G(u + λ(u0 − u), t0 + λ(F(u0) − t0)) = (λ − 1)² ‖u − u0‖²X + λ² (F(u0) − t0)².
Setting a = ‖u − u0‖²X and b = (F(u0) − t0)², we calculate that the right-hand side is minimal for λ = a/(a + b) ∈ ]0, 1[, which leads to
and
for all (v, t) ∈ X × R. One has s ≤ 0, since for s > 0 we obtain a contradiction to (6.16) by letting v = u, t > τ and t → ∞. The case s = 0 can also not occur: We choose v = u and t = t0 in (6.17), and since τ ≠ t0 we obtain
inequality τ ≤ F (u), moreover, since (u, τ ) ∈ epi F , we even get τ = F (u). For
v ∈ dom F and t = F (v) we finally get
w, v − u + s F (v) − F (u) ≤ 0 ∀v ∈ dom F,
min F (u)
u∈K
are exactly the u∗ for which 0 ∈ ∂(F + IK )(u∗ ). By Theorem 6.51 we can write
∂(F + IK )(u∗ ) = ∂F (u∗ ) + ∂IK (u∗ ) = DF (u∗ ) + ∂IK (u∗ ). Using the result of
Example 6.48 we get the optimality condition
K = ∩_{m=1}^M Km,   Km = {u ∈ X | Gm(u) ≤ 0},
∂IK(u) = Σ_{m=1}^M ∂IKm(u) = { Σ_{m=1}^M μm DGm(u) | μm ≥ 0, μm Gm(u) = 0 }  if u ∈ K,  and  ∂IK(u) = ∅ otherwise,
u∗ ∈ K :  DF(u∗) + Σ_{m=1}^M μ∗m DGm(u∗) = 0,  μ∗m ≥ 0,  μ∗m Gm(u∗) = 0  for m = 1, . . . , M.   (6.19)
In this context, one calls the variables μ∗m ≥ 0 the Lagrange multipliers for
the constraints {Gm ≤ 0}. These have to exist for every minimizer u∗ .
The subdifferential calculus provides an alternative approach to optimality
for general convex constraints. The w∗ in (6.18) corresponds to the linear
combination of the derivatives DGm (u∗ ) in (6.19), and the existence of Lagrange
multipliers μ∗m is abstracted by the condition that w∗ is in the normal cone of K.
2. Tikhonov functionals
We consider an example similar to Example 6.32. Let X be a real Banach
space, Y a Hilbert space, A ∈ L(X, Y ), u0 ∈ Y , and p ∈ ]1, ∞[. Moreover, let
Z → X be a real Hilbert space that is densely and continuously embedded in
X, λ > 0, and q ∈ ]1, ∞[. We aim at optimality conditions for the minimization
of the Tikhonov functional (6.7). We begin with the functional Φ(v) = (1/p)‖v‖Y^p,
∂Φ(v) = (JY v) ‖v‖Y^{p−2}.
The objective function F from (6.7) is continuous, and hence we can apply the
sum rule for subgradients to obtain
q
To calculate ∂!, we note that we can write ! as a concatenation of q1 · Z and
the inverse embedding i −1 from X to Z with domain of definition dom i −1 = Z.
By construction, i −1 is a closed and densely defined mapping. By continuity
of the norm in Z, the respective chain rule (Remark 6.53) holds, hence ∂! =
(i −1 )∗ ◦ ∂ q1 · Z ◦ i −1 . The space X∗ is densely and continuously embedded
q
For the minimizer u∗ of the Tikhonov functional F , it must be the case that
u∗ ∈ Z, (JZ u∗ ) ∈ X∗ , and
min_{u∈X} (1/2)‖u − u0‖²X + λ‖u‖Y   (6.20)
for some λ > 0. This could, for example, model a denoising problem (see
Example 6.1, where X = L2 (Rd ), but the penalty is a squared seminorm in
H 1 (Rd )). To reformulate the problem, we identify X = X∗ and write Y ⊂ X =
X∗ ⊂ Y ∗ . Every u ∈ Y is mapped via w = j ∗ j u to some w ∈ Y ∗ , where
j ∗ : X∗ → Y ∗ denotes the adjoint of the continuous embedding. It is easy to
see that X = X∗ → Y ∗ densely, and hence
λ‖u‖Y = sup {⟨w, u⟩ | w ∈ Y∗, ‖w‖Y∗ ≤ λ} = sup {(w, u) | w ∈ X, ‖w‖Y∗ ≤ λ}.
min_{u∈X} sup_{‖w‖Y∗ ≤ λ} (1/2)‖u − u0‖²X + (w, u).
Now assume that we can swap the minimum and the supremum (in general one has
only “sup inf ≤ inf sup”, see Exercise 6.19) to obtain
inf_{u∈X} sup_{‖w‖Y∗ ≤ λ} (1/2)‖u − u0‖²X + (w, u) = sup_{‖w‖Y∗ ≤ λ} inf_{u∈X} (1/2)‖u − u0‖²X + (w, u).
and hence is minimal for u = u0 − w. Plugging this into the functional, we obtain
The maximization problem on the right-hand side is the dual problem to (6.20).
Obviously, it is equivalent (in the sense that the solutions are the same) to the
projection problem
min_{w ∈ X}  ‖u0 − w‖²_X / 2 + I_{{‖·‖_{Y*} ≤ λ}}(w).

The latter has the unique solution w* = P_{{‖·‖_{Y*} ≤ λ}}(u0), since projections onto
nonempty, convex, and closed sets in Hilbert spaces are well defined (note that
{‖w‖_{Y*} ≤ λ} is closed by the continuous embedding j*: X* → Y*).
For ‖w‖_{Y*} ≤ λ and u ∈ X, one has
where equality holds if and only if one plugs in the respective optimal solutions u∗
and w∗ of the primal problem (6.20) and the dual problem (6.21). We rewrite the
last inequality as
pointwise.
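The following Python sketch illustrates this primal–dual relation in a finite-dimensional setting (an illustration of ours, assuming X = Y = R^n with the Euclidean norm): w* is obtained by projecting u0 onto the ball of radius λ, and u* = u0 − w* agrees with the known closed-form minimizer of (6.20) in this case.

```python
import numpy as np

def solve_primal_via_dual(u0, lam):
    """min_u 0.5*||u - u0||^2 + lam*||u||  with X = Y = R^n (Euclidean norm).
    Dual problem: project u0 onto the ball {||w|| <= lam}; then u* = u0 - w*."""
    norm_u0 = np.linalg.norm(u0)
    w_star = u0 if norm_u0 <= lam else lam * u0 / norm_u0
    return u0 - w_star, w_star

u0, lam = np.array([3.0, -4.0, 1.0]), 2.0
u_star, w_star = solve_primal_via_dual(u0, lam)

# Closed form for comparison: block soft-thresholding of u0.
u_ref = max(0.0, 1.0 - lam / np.linalg.norm(u0)) * u0
print(np.allclose(u_star, u_ref))   # True
```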
Proof For F ≡ ∞, we just set K0 = X∗ × R. Otherwise, we construct in the
following for every pair (u, t) ∈ X × R with t < F (u) a pair (w, s) ∈ X∗ × R such
that
⟨w, u⟩_{X*×X} + s > t   and   ⟨w, v⟩_{X*×X} + s < F(v) for all v ∈ X.
If we set K0 as the collection of all these (w, s), we obtain the claim by
and the observation that for every fixed u ∈ X and all t < F (u), one has
So let (u, t) with t < F (u) < ∞ be given (such a pair exists in this case).
Analogously to Theorem 6.31 we separate {(u, t)} from the closed, convex, and
nonempty set epi F by some (w0 , s0 ) ∈ X∗ × R, i.e., for some λ ∈ R and ε > 0,
Hence, w = −s0−1 w0 and s = s0−1 λ with τ = F (v) gives the desired inequalities
w, u + s > t and w, v + s < F (v) for all v ∈ X. In particular, K0 is not empty.
Now we treat the case t < F (u) = ∞, where we can also find some (w0 , s0 ) ∈
X* × R and λ ∈ R, ε > 0 with the above properties. If there exists v ∈ dom F with
⟨w0, u − v⟩ > −2ε, we plug in v and τ > max{t, F(v)} and obtain
and thus s0 > 0; moreover, we can choose (w, s) as above. If such a v does not
exist, then w0 , v − u + 2ε ≤ 0 for all v ∈ dom F . By the above consideration
there exists a pair (w∗ , s ∗ ) ∈ X∗ × R with w∗ , · + s∗ < F . For this, one has with
c > 0 that
Analogous claims hold for functionals on the dual space, i.e., for G : X∗ → R∞
with
spt(G) = {(u, t) ∈ X × R : ⟨·, u⟩_{X*×X} + t ≤ G}.
equivalent to
⟨w, u⟩ − F(u) ≤ −s for all u ∈ X   ⇔   s ≤ − sup_{u∈X} (⟨w, u⟩ − F(u)),

and hence F*(w) = sup_{u∈X} ⟨w, u⟩ − F(u). This motivates the following definition.
Definition 6.60 (Dual Functional) Let F : X → R∞ be a proper functional on a
real Banach space X. Then the dual functional (or Fenchel conjugate) F*: X* → R∞ is defined by F*(w) = sup_{u∈X} ⟨w, u⟩_{X*×X} − F(u).
Lemma 6.63 Let X be a real Banach space. The Fenchel conjugation ∗: Γ₀(X) → Γ₀(X*) is invertible with inverse ∗: Γ₀(X*) → Γ₀(X).
Proof First note that the conjugation on Γ₀(X) and Γ₀(X*), respectively, maps by
definition to the respective sets (Remark 6.62). For F ∈ Γ₀(X), there exists ∅ ≠
K0 ⊂ X* × R with F = sup_{(w,s)∈K0} ⟨w, ·⟩ + s. By definition, K0 is contained in
spt(F), and thus, by Remark 6.62
A similar claim is true for the conjugate of G = I{wX∗ ≤λ} . The latter situation
occurred in Example 6.56.
More generally, for F(u) = ϕ(‖u‖_X) with a proper and even ϕ : R → R∞
(i.e., ϕ(x) = ϕ(−x)), one has F*(w) = ϕ*(‖w‖_{X*}). For instance, for ϕ(t) = (1/p)|t|^p with p ∈ ]1, ∞[, Young's inequality gives

st ≤ (1/p)|t|^p + (1/p*)|s|^{p*}   ⇒   ϕ*(s) ≤ (1/p*)|s|^{p*} = ψ(s),

and on the other hand, we obtain for t = sgn(s)|s|^{1/(p−1)} that

st − (1/p)|t|^p = (1 − 1/p)|s|^{p/(p−1)} = (1/p*)|s|^{p*},

hence ϕ*(s) = (1/p*)|s|^{p*}. Analogously,

ϕ(t) = |t| ⇒ ϕ*(s) = I_{[−1,1]}(s),   ϕ(t) = I_{[−1,1]}(t) ⇒ ϕ*(s) = |s|.
Fig. 6.10 Graphical visualization of Fenchel duality in dimension one. Left: A convex (and
continuous) function F . Right: Its conjugate F ∗ (with inverted s-axis). The plotted lines are
maximal affine linear functionals below the graphs, their intersections with the respective s-axis
correspond to the negative values of F ∗ and F , respectively (suggested by dashed lines). For some
slopes (smaller than −1) there are no affine linear functionals below F , and consequently, F ∗
equals ∞ there
In other words, mw = w, · − F ∗ (w) is, for given w ∈ X∗ , the largest affine linear
functional below F . We have mw (0) = −F ∗ (w), i.e. the intersection of the graph
of mw and the s-axis is the negative value of the dual functional at w. This allows
us to construct the conjugate in a graphical way, see Fig. 6.10.
We collect some obvious rules for Fenchel conjugation.
Lemma 6.65 (Calculus for Fenchel Conjugation) Let F1 : X → R∞ be a
proper functional on a real Banach space X.
1. For λ ∈ R and F2 = F1 + λ, we have F2∗ = F1∗ − λ,
2. for λ > 0 and F2 = λF1 , we have F2∗ = λF1∗ ◦ λ−1 id,
3. for u0 ∈ X, w0 ∈ X∗ , and F2 = F1 ◦ Tu0 + w0 , · we have F2∗ = F1∗ −
· , u0 ◦ T−w0 ,
4. for a real Banach space Y and K ∈ L(Y, X) continuously invertible, we have for
F2 = F1 ◦ K that F2∗ = F1∗ ◦ (K −1 )∗ .
Proof Assertion 1: For w ∈ X∗ we get by definition that
F2*(w) = sup_{u∈X} (⟨w, u⟩ − F1(u) − λ) = sup_{u∈X} (⟨w, u⟩ − F1(u)) − λ = F1*(w) − λ.
= F1∗ (w − w0 ) − w − w0 , u0 .
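These rules can be checked numerically with a discrete Legendre–Fenchel transform. The following sketch (our own illustration, one-dimensional and grid-based, so only an approximation of the conjugate) verifies rules 1 and 2 of Lemma 6.65 for F1(u) = u²/2.

```python
import numpy as np

# Discrete Legendre-Fenchel transform on a grid (an approximation of F*).
u = np.linspace(-5, 5, 2001)
w = np.linspace(-3, 3, 601)

def conjugate(F_vals):
    # F*(w) = sup_u (w*u - F(u)), evaluated on the grid.
    return np.max(np.outer(w, u) - F_vals[None, :], axis=1)

F1 = 0.5 * u**2              # F1(u) = u^2/2, whose conjugate is w^2/2
lam = 0.7

# Rule 1: (F1 + lam)* = F1* - lam
lhs = conjugate(F1 + lam)
rhs = conjugate(F1) - lam
print(np.max(np.abs(lhs - rhs)))                 # ~0

# Rule 2: (lam*F1)*(w) = lam * F1*(w/lam); exact value here is w^2/(2*lam).
lhs2 = conjugate(lam * F1)
print(np.max(np.abs(lhs2 - w**2 / (2 * lam))))   # small grid error
```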
In view of Lemmas 6.14 and 6.21, we may ask how conjugation acts for
pointwise suprema and sums. Similarly for the calculus of the subdifferential
(Theorem 6.51), this question is a little delicate. Let us first look at pointwise
suprema. Let {Fi}, i ∈ I ≠ ∅, be a family of proper functionals Fi : X → R∞
with ⋂_{i∈I} dom Fi ≠ ∅. For w ∈ X* we deduce:

(sup_{i∈I} Fi)*(w) = sup_{u∈X} (⟨w, u⟩ − sup_{i∈I} Fi(u)) = sup_{u∈X} inf_{i∈I} {⟨w, u⟩ − Fi(u)}
                   ≤ inf_{i∈I} sup_{u∈X} (⟨w, u⟩ − Fi(u)) = inf_{i∈I} Fi*(w).
It is natural to ask whether equality holds, i.e., whether infimum and supremum can
be swapped. Unfortunately, this is not true in general: we know that the Fi∗ are con-
vex and lower semicontinuous, but these properties are not preserved by pointwise
infima, i.e. infi∈I Fi∗ is in general neither convex nor lower semicontinuous. Hence,
this functional is not even a conjugate in general. However, it still contains enough
information to extract the desired conjugate:
Theorem 6.66 (Conjugation of Suprema) Let I ≠ ∅ and Fi : X → R∞, i ∈ I,
and sup_{i∈I} Fi be proper on a real Banach space X. Then

(sup_{i∈I} Fi)* = (inf_{i∈I} Fi*)**.
= F1∗ (w 1 ) + F2∗ (w 2 ).
(F1 + F2)*(w) ≤ inf_{w = w1 + w2} (F1*(w1) + F2*(w2)) = (F1* F2*)(w).   (6.23)
If we assume that we can swap infimum and supremum and that the supremum is
actually assumed, then this turns into
sup_{w∈Y*} inf_{u∈X} ⟨A*w, u⟩ + F1(u) − F2*(w) = max_{w∈Y*} −F1*(−A*w) − F2*(w).
It remains to clarify whether infimum and supremum can be swapped and whether
the supremum is realized. In other words, we have to show that the minimum (6.24)
equals the maximum in (6.25). The following theorem gives a sufficient criterion
for this to hold.
Theorem 6.68 (Fenchel-Rockafellar Duality) Let F1 : X → R∞ , F2 : Y →
R∞ be proper, convex, and lower semicontinuous on the real Banach spaces X and
Y , respectively. Further, let A : X → Y be linear and continuous, and suppose that
the minimization problem
This means that there exists w∗ ∈ Y ∗ such that −A∗ w∗ ∈ ∂F1 (u∗ ) and w∗ ∈
∂F2 (Au∗ ). Now we reformulate the subgradient inequality for −A∗ w∗ ∈ ∂F1 (u∗ ):
= F1∗ (−A∗ w∗ ).
Similarly we obtain w∗ , Au∗ − F2 (Au∗ ) ≥ F2∗ (w∗ ). Adding these inequalities,
we get
and hence
On the other hand, we have by Remark 6.62 (see also Exercise 6.19) that
We conclude that
which is the desired equality for the supremum. We also see that it is assumed at w∗ .
□
Remark 6.69 The assumption that there exists some u0 ∈ X such that F1 (u0 ) <
∞, F2 (Au0 ) < ∞ and that F2 is continuous at Au0 is used only to apply the
sum rule and the chain rule for subdifferentials. Hence, we can replace it with the
assumption that ∂(F1 + F2 ◦ A) = ∂F1 + A* ◦ ∂F2 ◦ A. See Exercises 6.11–6.15
for more general sufficient conditions for this to hold.
The previous proof hinges on the applicability of rules for subdifferential
calculus and hence fundamentally relies on the separation of suitable convex sets. A
closer inspection of the proof reveals the following primal-dual optimality system:
Corollary 6.70 (Fenchel-Rockafellar Optimality System) If for proper, convex,
and lower semicontinuous functionals F1 : X → R∞ and F2 : Y → R∞ it is the
case that
This is equivalent to
−A∗ w∗ , u∗ = F1 (u∗ ) + F1∗ (−A∗ w∗ ) and w∗ , Au∗ = F2 (Au∗ ) + F2∗ (w∗ ).
F1(u) = λ‖u‖_X,    F2(v) = (1/2)‖v − u0‖²_Y,

and obtain as primal problem (6.24)

min_{u ∈ X}  ‖Au − u0‖²_Y / 2 + λ‖u‖_X,
Substituting w̄ = −w, flipping the sign, and dropping terms independent of w̄, we
see that the problem is equivalent to a projection problem in a Hilbert space:
w̄* = arg min_{‖A*w̄‖_{X*} ≤ λ}  ‖u0 − w̄‖²_Y / 2   ⇔   w̄* = P_{{‖A*·‖_{X*} ≤ λ}}(u0).
The optimal w∗ = −w̄∗ and every solution u∗ of the primal problem satisfy (6.27),
and in particular we have
and we see that u0 − w̄∗ lies in the image of A even if this image is not closed. If
A is injective, we can apply its inverse (which is not necessarily continuous) and
obtain
u* = A⁻¹(u0 − P_{{‖A*·‖_{X*} ≤ λ}}(u0)).
This is a formula for the solution of the minimization problem (however, often of
limited use in practice), and also we have deduced that the result from Example 6.56
in the case A = I was correct.
At the end of this section we give a more geometric interpretation of the solution
of the primal-dual problem.
Remark 6.72 (Primal-Dual Solutions and Saddle Points) We can interpret the
simultaneous solution of the primal and dual problem as follows. We define the
Lagrange functional L : dom F1 × dom F2∗ → R by
and observe that every optimal pair (u∗ , w∗ ) ∈ X × Y ∗ , in the situation of (6.26),
has to satisfy the inequalities
which means that (u∗ , w∗ ) is a solution of the primal-dual problem. Hence, the
saddle points of L are exactly the primal-dual solutions.
This fact will be useful in deriving so-called primal-dual algorithms for the
numerical solution of the primal problem (6.24). The basic idea of these methods is
to find minimizers of the Lagrange functional in the primal direction and respective
maximizers in the dual direction. In doing so, one can leverage the fact that L has
a simpler structure than the primal and dual problems. More details on that can be
found in Sect. 6.4.
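As a small preview (a sketch of our own under the assumption of a finite-dimensional, discretized setting; the algorithms themselves are the subject of Sect. 6.4), the following primal-dual iteration of Chambolle–Pock type alternates proximal steps on the Lagrange functional for F1(u) = λ‖u‖₁, F2(v) = ½‖v − u0‖², A = id, a case whose exact minimizer is soft-thresholding.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# L(u, w) = <u, w> + lam*||u||_1 - (0.5*||w||^2 + <w, u0>)  with A = id,
# i.e. F1(u) = lam*||u||_1 and F2(v) = 0.5*||v - u0||^2.
u0 = np.array([3.0, -0.5, 1.2, -2.0])
lam, tau, sigma = 1.0, 0.9, 0.9          # tau*sigma*||A||^2 < 1

u = np.zeros_like(u0); w = np.zeros_like(u0); u_bar = u.copy()
for _ in range(500):
    w = (w + sigma * (u_bar - u0)) / (1.0 + sigma)   # prox of sigma*F2*
    u_new = soft(u - tau * w, tau * lam)             # prox of tau*F1
    u_bar = 2.0 * u_new - u
    u = u_new

print(u)                       # approximately the minimizer of the primal problem
print(soft(u0, lam))           # exact minimizer: soft-thresholding of u0
```

The iteration only evaluates simple proximal mappings of F1 and F2*, which is precisely the structural advantage of the saddle-point formulation mentioned above.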
Now we pursue the goal to apply the theory we developed in the previous sections
to convex variational problems in imaging. As described in the introduction and
in Examples 6.1–6.4, it is important to choose a good model for images; in our
terminology, to choose the penalty ! appropriately. Starting from the space H 1 ()
we will consider Sobolev spaces with more general exponents. However, we will
see that these spaces are not satisfactory for many imaging tasks, since they do not
allow a proper treatment of discontinuities, i.e. jumps, in the gray values as they
occur on object borders. This matter is notably different in the space of functions
with bounded total variation, and consequently, this space plays an important role
in mathematical imaging. We will develop the basic theory for this Banach space in
the context of convex analysis and apply it to some concrete problems.
We begin with the analysis of problems involving the Sobolev seminorm in
H^{m,p}(Ω). To that end, we first collect some important notions and results from the
theory of Sobolev spaces.
Lemma 6.73 Let Ω ⊂ R^d be a domain, α ∈ N^d a multiindex, and 1 ≤ p, q < ∞.
Then the weak partial derivative ∂^α/∂x^α, viewed as the mapping

∂^α/∂x^α : dom(∂^α/∂x^α) = { u ∈ L^p(Ω) : ∂^α u/∂x^α ∈ L^q(Ω) } → L^q(Ω),

is densely defined and strongly-to-weakly closed.

Proof Let (u_n) be a sequence in dom(∂^α/∂x^α) with u_n → u in L^p(Ω) and ∂^α u_n/∂x^α ⇀ v in L^q(Ω). Every test function ϕ ∈ D(Ω) lies in the dual
spaces L^{p*}(Ω) and L^{q*}(Ω), and hence, by definition of the weak derivative,

∫_Ω (∂^α ϕ/∂x^α) u dx = lim_{n→∞} ∫_Ω (∂^α ϕ/∂x^α) u_n dx = lim_{n→∞} (−1)^{|α|} ∫_Ω (∂^α u_n/∂x^α) ϕ dx = (−1)^{|α|} ∫_Ω vϕ dx.

Since this holds for every test function, we see that v = ∂^α u/∂x^α as desired.
We show that the mapping ∂^α/∂x^α is densely defined: every u ∈ D(Ω) also lies in
L^p(Ω) and has a continuous (strong) derivative of order α; in particular, ∂^α u/∂x^α ∈
L^q(Ω). This shows that D(Ω) ⊂ dom(∂^α/∂x^α), and using Lemma 3.16 we obtain the
claim. □
In Sect. 3.3 on linear filters we have seen that smooth functions are dense in
Sobolev spaces with = Rd . The density of C ∞ () for bounded , however,
needs some regularity of the boundary of .
Theorem 6.74 Let ⊂ Rd be a bounded Lipschitz domain, m ≥ 1, and
1 ≤ p < ∞. Then there exists a sequence of linear operators (Mn ) in
L(H m,p (), H m,p ()) such that rg(Mn ) ⊂ C ∞ () for all n ≥ 1 and the property
that for every u ∈ H m,p (),
for every multiindex |α| ≤ m. The following argument for m = 1 can be easily
extended by induction for the full proof. For i = 1, . . . , d and v ∈ D(), one has
ϕ_k (∂v/∂x_i) ∈ D(Ω) and

ϕ_k (∂v/∂x_i) = ∂/∂x_i (ϕ_k v) − (∂ϕ_k/∂x_i) v.
Hence,

∫_Ω u ϕ_k (∂v/∂x_i) dx = ∫_Ω u ∂/∂x_i (ϕ_k v) − u (∂ϕ_k/∂x_i) v dx = −∫_Ω (ϕ_k (∂u/∂x_i) + u (∂ϕ_k/∂x_i)) v dx,

so that ϕ_k u possesses the weak derivative ∂(ϕ_k u)/∂x_i = ϕ_k ∂u/∂x_i + (∂ϕ_k/∂x_i) u. Iterating this argument and applying the Leibniz rule yields

‖∂^α(ϕ_k u)‖_p ≤ Σ_{β ≤ α} (α over β) ‖∂^{α−β} ϕ_k‖_∞ ‖∂^β u‖_p ≤ C‖u‖_{m,p},
and hence u → ϕk u is a linear and continuous map from H m,p () to itself.
In the following we write uk = ϕk u and set η0 = 0. Our aim is to translate
uk in the direction of ηk from the segment condition and to convolve the translated
function such that uk is evaluated only on a compact subset of . To that end,
choose (tn ) in ]0, 1] with tn → 0 such that Vk − tn ηk ⊂⊂ Uk holds for all k and
n. Let ψ ∈ D(Rd ) be a mollifier and choose, for every n, an εn > 0 such that the
support of the scaled function ψ n = ψεn satisfies
( − supp ψ n + tn ηk ) ∩ Vk ⊂⊂
for all k = 0, . . . , K. This is possible, since by the segment condition and the choice
of tn , we have
( + tn ηk ) ∩ Vk = ∩ (Vk − tn ηk ) + tn ηk ⊂⊂ .
M_n u = Σ_{k=0}^{K} (T_{t_n η_k}(ϕ_k u)) ∗ ψ^n,   (6.30)
which is, by the above consideration, in L(H m,p (), H m,p ()). Since ψ is a
mollifier, we also have Mn u ∈ C ∞ () for all n and u ∈ H m,p (). Note that
the construction of Mn is indeed independent of m and p. It remains to show that
Mn u − um,p → 0 for n → ∞.
To that end, let ε > 0. Since ϕ0 , . . . , ϕK is a partition of unity, it follows that
‖M_n u − u‖_{m,p} ≤ Σ_{k=0}^{K} ‖(T_{t_n η_k} u_k) ∗ ψ^n − u_k‖_{m,p}.

Since translation is continuous in L^p(Ω), for n large enough one has

‖∂^α(T_{t_n η_k} u_k − u_k)‖_p < ε / (2M(K + 1))
for all multiindices with |α| ≤ m, and we denote the number of these multiindices
by M. With vk,n = Ttn ηk uk we get, by Theorem 3.15, the property that translation
and the weak derivative commute, and by Lemma 3.16 that for n large enough,
The claim is still true for the functions ϕ(t) = min(a, t) and ϕ(t) = max(a, t) with
some a ∈ R (by abuse of notation we set ϕ (a) = 0 here).
Proof For u ∈ C ∞ (), the result follows from the usual chain rule. For general
u ∈ H 1,p () we choose a sequence (un ) in C ∞ () that converges to u in the
Hence, ϕ (un )∇un ϕ (u)∇u in Lp (, Rd ), and the claim follows from the
strong-to-weak closedness of the weak derivative (Lemma 6.73).
Now let ϕ(t) = min(a, t) and choose, for ε > 0,
ϕ_ε(t) = √((t − a)² + ε²) − ε + a   if t > a,   and   ϕ_ε(t) = a   otherwise,
where, of course, the derivatives are weak derivatives. In the following we also
understand ∇^m u as an R^{d^m}-valued mapping, i.e., ∇^m u(x) is a d × d × ··· × d
tuple with

(∇^m u)_{i_1,i_2,…,i_m} = (∂/∂x_{i_1})(∂/∂x_{i_2}) ··· (∂/∂x_{i_m}) u.

By the symmetry of higher derivatives, one has for every permutation π :
{1, …, m} → {1, …, m} that (∇^m u)_{i_1,…,i_m} = (∇^m u)_{i_{π(1)},…,i_{π(m)}}, so that each component is determined by a multiindex α with |α| = m, and

|ξ| = ( Σ_{i_1,…,i_m=1}^{d} |ξ_{i_1,…,i_m}|² )^{1/2} = ( Σ_{|α|=m} (m over α) |ξ_α|² )^{1/2}.   (6.32)
Remark 6.78
The Sobolev seminorm in (6.31) is slightly different from the usual
definition Σ_{|α|=m} ‖∂^α u‖_p, but both are equivalent. We choose the form (6.31) to
ensure that the norm is both translation-invariant and also rotation-invariant; see
Exercise 6.25.
To construct these mappings Pm we start with the calculation of the kernel of the
seminorms ∇ m · p .
Lemma 6.79 Let be a bounded domain, m ≥ 1, and 1 ≤ p, q < ∞. Then for
m
u ∈ Lq () and ∇ m u ∈ Lp (, Rd ), one has that ∇ m up = 0 if and only if
u ∈ "m () = {u : → R u is a polynomial of degree < m}.
is called the projection onto "m ; the mapping Pm : Lq () → Lq () defined by
Pm = id −Qm is the projection onto the complement of "m .
It is clear that the set of monomials {x → x α |α| < m} is a basis for "m ().
Hence, the projection is well defined.
One easily sees that Q2m = Qm . The map Pm = id −Qm is also a projection with
ker(Pm ) = "m and should be a map suitable for (6.33) to hold. The upper estimate
is clear: since Qm u ∈ "m holds and the seminorm can be estimated by the Sobolev
norm, it follows that
for all u ∈ H m,p (). The following lemma establishes the other inequality:
Lemma 6.81 Let ⊂ Rd be a bounded Lipschitz domain, 1 < p < ∞, m ≥ 1.
There exists a constant C > 0 such that for all u ∈ H m,p () with Qm u = 0 one
has
Proof Let us assume that the inequality is wrong, i.e., that there exists a sequence
(un ) in H m,p () with Qm un = 0, un m−1,p = 1, such that ∇ m un p ≤ n1 for all
n, i.e., ∇ m un → 0 for n → ∞. Since H m,p () is reflexive and un is bounded in
the respective norm, we can also assume that un u for some u ∈ H m,p () with
Qm u = 0. By the weak closedness of ∇ m we see that also ∇ m u = 0 has to hold. By
Lemma 6.79 we have u ∈ "m and consequently u = 0, since Qm u = 0.
By the compact embedding into H m−1,p () (see Theorem 6.76) we obtain
Pm un m−1,p → 0 for n → ∞. This is a contradiction to Pm un m−1,p = 1. $
#
Corollary 6.82 (Poincaré-Wirtinger Inequality) In the above situation, for k =
0, . . . , m one has
Proof First, let k = m. We plug Pm u into (6.34), and get Pm um−1,p ≤
C∇ m Pm up = C∇ m up . Adding ∇ m Pm up = ∇ m up on both sides and
using the fact that · m−1,p +∇ m · p is equivalent to the norm in H m,p () yields
the claim.
The case k < m follows from the estimate uk,p ≤ um,p for all u ∈
H m,p (). $
#
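A quick numerical illustration of the Poincaré–Wirtinger inequality for m = 1, k = 0 (our own sketch, on Ω = ]0, 1[ with zero-mean trigonometric polynomials, so the functions satisfy Q_1 u = 0): the ratio ‖u‖_p / ‖u'‖_p stays bounded by a fixed constant.

```python
import numpy as np

# Poincare-Wirtinger on ]0,1[: ||u||_p <= C ||u'||_p whenever the mean of u vanishes.
# Estimate the ratio for random zero-mean trigonometric polynomials.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 4001)
p = 1.5
norm = lambda f: (np.mean(np.abs(f) ** p)) ** (1.0 / p)   # L^p norm, |Omega| = 1

ratios = []
for _ in range(200):
    c = rng.normal(size=5)
    k = np.arange(1, 6)
    u = (c[:, None] * np.cos(np.pi * k[:, None] * x)).sum(axis=0)    # zero mean
    du = (-c[:, None] * np.pi * k[:, None] * np.sin(np.pi * k[:, None] * x)).sum(axis=0)
    ratios.append(norm(u) / norm(du))

print(max(ratios))   # bounded uniformly over all samples
```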
Now we prove the desired properties of the Sobolev penalty.
Lemma 6.83 Let ⊂ Rd be a bounded Lipschitz domain and let ϕ : [0, ∞[ →
R∞ be proper, convex, lower semicontinuous, and non-decreasing. Further, let 1 <
p < ∞ and m ≥ 0. Then the functional
!(u) = ϕ(‖∇^m u‖_p)   if u ∈ H^{m,p}(Ω),   and   !(u) = ∞   otherwise,
By Lemma 3.16, ∇ m is densely defined, and we aim to show that it is also strongly-
to-weakly closed. To that end, let (un ) be a sequence in Lq () ∩ H m,p (),
converging in Lq () to some u ∈ Lq () that also satisfies ∇ m un v in
m
Lp (, Rd ). By Lemma 6.73 we get that v = ∇ m u, but it remains to show that
u∈H m,p ().
Here we apply (6.35) and conclude that the sequence (Pm un ) in H m,p ()
is bounded. By reflexivity we can assume, by moving to a subsequence, that
Pm un w for some w ∈ H m,p (), and by the compact embedding from
Theorem 6.76 we get Pm un → w and, as a consequence of the embedding of the
Lebesgue spaces, un → u in L1 (). This shows strong convergence of the sequence
Qm un = un − Pm un → u − w in the finite dimensional space "m . We can view this
space as a subspace of H m,p () and hence, we have Qm un → u − w in H m,p ()
by equivalence of norms. This gives un = Pm un + Qm un w + u − w = u in
H m,p (); in particular, u is contained in this Sobolev space.
Since ∇ m is strongly-to-weakly closed, we get from Example 6.23 and
Lemma 6.21 the convexity and by Example 6.29 and Lemmas 6.28 and 6.14
the (weak) lower semicontinuity of !.
To prove coercivity, let q ≤ pd/(d − mp) if mp < d and let (un ) be a sequence
in Lq () with Pm un q → ∞. Now assume that there exists a L > 0 such that
∇ m un p ≤ L for infinitely many n. For these n, one has un ∈ H m,p () and by
the Poincaré-Wirtinger inequality (6.35) and the continuous embedding of H m,p ()
into Lq () by Theorem 6.76, we obtain
Now we have all the ingredients for proving the existence of solutions of
minimization problems with Sobolev seminorms as penalty.
Theorem 6.84 (Existence of Solutions with Sobolev Penalty) Let ⊂ Rd be a
bounded Lipschitz domain, m ∈ N, m ≥ 1, 1 < p, q < ∞ with q ≤ pd/(d − mp)
if mp < d. Moreover, let : Lq () → R∞ be proper on H m,p (), convex, lower-
semicontinuous on Lq (), and coercive on "m in the sense that
Moreover, set
!(u) = ϕ(‖∇^m u‖_p)   if u ∈ H^{m,p}(Ω),   and   !(u) = ∞   otherwise.
If ϕ is strictly convex, any two solutions u∗ and u∗∗ differ only in "m .
Proof By assumption and using Lemma 6.83 we see that F = + λ! is proper,
convex, and lower semicontinuous on the reflexive Banach space Lq (). To apply
Theorem 6.31, we need only to show coercivity.
First, note that is bounded from below by an affine linear functional, and by
∗
Theorem 6.54 we can even choose u0 ∈ Lq (), w0 ∈ Lq () such that (u0 ) +
w0 , u − u0 ≤ (u) for all u ∈ Lq (). In particular, we obtain the boundedness
of from below on bounded sets. Now assume that un q → ∞ for a sequence
(un ) in Lq (). For an arbitrary subsequence (unk ) consider the sequences (Pm unk )
and (Qm unk ). We distinguish two cases. First, if Pm unk q is bounded, Qm unk q
has to be unbounded, and by moving to a subsequence we get Qm unk q → ∞. By
assumption we get (unk ) → ∞ and, since !(unk ) ≥ 0, also F (unk ) → ∞.
Second, if Pm unk q is unbounded, we get by Lemma 6.83 (again moving to a
subsequence if necessary) !(unk ) → ∞. If, moreover, Qm unk q is bounded, then
since the term in parentheses goes to infinity by the strong coercivity of ! (again,
cf. Lemma 6.83). In the case that Qm unk q is unbounded, we obtain (again moving
to a subsequence if necessary) (unk ) → ∞ and hence F (unk ) → ∞.
Since the above reasoning holds for every subsequence, we see that for the whole
sequence we must have F (un ) → ∞, i.e., F is coercive. By Theorem 6.31 there
exists a minimizer u∗ ∈ Lq ().
Finally, let u* and u** be minimizers with u* − u** ∉ "m. Then ∇^m u* ≠
∇^m u**, and since ‖·‖_p on L^p(Ω, R^{d^m}) is based on a Euclidean norm on R^{d^m}
(see Definition (6.31) and explanations in Example 6.23 for norms and convex
integrands), for strictly convex ϕ we have

ϕ(‖∇^m(u* + u**)/2‖_p) < (1/2) ϕ(‖∇^m u*‖_p) + (1/2) ϕ(‖∇^m u**‖_p),

and hence F((u* + u**)/2) < (1/2)F(u*) + (1/2)F(u**), a contradiction. We conclude that
u∗ − u∗∗ ∈ "m . $
#
Remark 6.85
• One can omit the assumption q ≤ pd/(d − mp) for mp < d if is coercive on
the whole space Lq ().
• Strong coercivity ϕ can be replaced by mere coercivity if is bounded from
below.
• If is strictly convex, minimizers are unique without further assumptions on ϕ.
We can apply the above existence result to Tikhonov functionals that are
associated with the inversion of linear and continuous mappings.
Theorem 6.86 (Tikhonov Functionals with Sobolev Penalty) Let , d, m, p be
as in Theorem 6.84, q ∈ ]1, ∞[, and Y a Banach space and A ∈ L(Lq (), Y ). If
one of the conditions
1. q ≤ pd/(d − mp) for mp < d and A injective on "m
2. A injective and rg(A) closed
is satisfied, then there exists for every u0 ∈ Y , r ∈ [1, ∞[, and λ > 0 a solution for
the minimization problem
min_{u ∈ L^q(Ω)}  ‖Au − u0‖_Y^r / r + λ ‖∇^m u‖_p^p / p.   (6.36)
In the case r > 1 and strictly convex norm in Y , the solution is unique.
Proof To apply Theorem 6.84 in the first case, it suffices to show coercivity in the
sense that Pm un q bounded and Qm uq → ∞ ⇒ 1r Au − u0 rY → ∞. All
other conditions are satisfied by the assumptions or are simple consequences of them
(see also Example 6.32).
Now consider A restricted to the finite-dimensional space "m and note that this
restriction is, by assumption, injective and hence boundedly invertible on rg(A|"m ).
Hence, there is a C > 0 such that uq ≤ CAuY for all u ∈ "m . Now let
(un ) be a sequence in Lq () with Pm un q bounded and Qm un q → ∞. Then
(APm un − u0 Y ) is also bounded (by some L > 0), and since Qm projects onto
"m , one has
in that space. We use the inner product coming from the norm (6.32), namely

a · b = Σ_{i_1,…,i_m=1}^{d} a_{i_1,…,i_m} b_{i_1,…,i_m}   for a, b ∈ R^{d^m}.

What is (∇^m)* w for w ∈ D(Ω, R^{d^m})? We test with u ∈ C^∞(Ω) ⊂ dom ∇^m, and get

∫_Ω w · ∇^m u dx = Σ_{i_1,…,i_m=1}^{d} ∫_Ω w_{i_1,…,i_m} ∂_{i_1} ··· ∂_{i_m} u dx
                 = (−1)^m Σ_{i_1,…,i_m=1}^{d} ∫_Ω (∂_{i_1} ··· ∂_{i_m} w_{i_1,…,i_m}) u dx.

Since C^∞(Ω) is dense in L^p(Ω), we see that the adjoint is the differential operator
on the right-hand side. In the case m = 1 this amounts to ∇* = −div, and hence
we write

(∇^m)* = (−1)^m div^m = (−1)^m Σ_{i_1,…,i_m=1}^{d} ∂_{i_1} ··· ∂_{i_m}.
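The identity ∇* = −div also has a simple discrete counterpart, which the following sketch (our own, one-dimensional, with forward differences) makes explicit: the matrix of the discrete divergence is exactly the negative transpose of the discrete gradient.

```python
import numpy as np

n = 6
# Forward-difference gradient on a 1D grid:
# (grad u)_i = u_{i+1} - u_i for i < n-1, and 0 in the last row.
grad = np.zeros((n, n))
for i in range(n - 1):
    grad[i, i], grad[i, i + 1] = -1.0, 1.0

div = -grad.T   # discrete analogue of div = -(grad)^*

# Adjointness: <grad u, w> = <u, grad^* w> = -<u, div w>, i.e. the sum below is 0.
rng = np.random.default_rng(1)
u, w = rng.normal(size=n), rng.normal(size=n)
print(np.dot(grad @ u, w) + np.dot(u, div @ w))   # 0 up to rounding
```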
We write v = divm w if the weak divergence exists. Similarly to Lemma 6.73 one
∗ m ∗
can show that this defines a closed operator between Lp (, Rd ) and Lq ().
Since (∇ m )∗ is also closed, we obtain for (wn ) in D(, Rd ) with limn→∞ wn = w
m
∗ ∗
in Lp (, Rd ) and limn→∞ (−1)m divm wn = v in Lq () also (∇ m )∗ w = v =
m
(−1)^m div^m w with the weak mth divergence. We have shown that for

D^m_div = { w ∈ L^{p*}(Ω, R^{d^m}) : ∃ div^m w ∈ L^{q*}(Ω) and a sequence (w_n) in D(Ω, R^{d^m})
            with lim_{n→∞} ‖w_n − w‖_{p*} + ‖div^m(w_n − w)‖_{q*} = 0 },   (6.37)
w_n = M_n* w = Σ_{k=0}^{K} ϕ_k (T_{−t_n η_k}(w ∗ ψ̄^n)),   and set   Ω_n = ⋃_{k=0}^{K} (Ω − supp ψ^n + t_n η_k) ∩ V_k,

and by the fundamental lemma of the calculus of variations (Lemma 2.75) we also
get w_n = 0 on Ω∖Ω_n. This shows that w_n ∈ D(Ω, R^d).
∗
The sequence wn converges weakly in Lp (, Rd ) to w: for u ∈ Lp (, Rd ),
one has Mn u → u in Lp (, Rd ) and hence wn , u = w, Mn u → w, u.
Moreover, for u ∈ dom ∇, one has
M_n ∇u = ∇(M_n u) − Σ_{k=0}^{K} (T_{t_n η_k}(u ∇ϕ_k)) ∗ ψ^n = ∇(M_n u) − N_n u
lim_{n→∞} N_n u = Σ_{k=0}^{K} u ∇ϕ_k = 0   in L^q(Ω, R^d)
holds, since (ϕk ) is a partition of unity. This implies that for every u ∈ Lq (),
since Nn up ≤ CNn uq because we have p ≤ q. By the uniform boundedness
∗
principle (Theorem 2.15) we get that (− div wn ) is bounded in Lq (, Rd ), and
hence there exists a weakly convergent subsequence. By the weak closedness of
− div, the weak limit of every weakly convergent subsequence has to coincide with
− div w; consequently, the whole sequence converges, i.e., − div wn − div w as
n → ∞.
We have shown that
dom ∇* = { w ∈ L^{p*}(Ω, R^d) : ∃ div w ∈ L^{q*}(Ω) and a sequence (w_n) in D(Ω, R^d)
           with w_n ⇀ w in L^{p*}(Ω, R^d) and div w_n ⇀ div w in L^{q*}(Ω) }.
Now assume that there exist ε > 0 and w0 ∈ dom ∇ ∗ for which w − w0 p∗ ≥ ε
or div(w − w0 )q ∗ ≥ ε for every w ∈ D(, Rd ). Then by the definition of the
dual norm as a supremum, there exists v ∈ Lp (, Rd ), vp ≤ 1, or u ∈ Lq (),
uq ≤ 1, with
⟨w − w0, v⟩ ≥ ε/2   or   ⟨div(w − w0), u⟩ ≥ ε/2   for all w ∈ D(Ω, R^d).
This is a contradiction, and hence we can replace weak by strong convergence and
get dom ∇* = D¹_div, as desired. □
Remark 6.89 The domain of definition of the adjoint ∇ ∗ can be interpreted in a
different way. To that end, we consider certain boundary values on ∂.
and with Theorem 2.73 and the fundamental lemma of the calculus of variations
(Lemma 2.75) we get that v − w · ν = 0 Hd−1 -almost everywhere on ∂. This
motivates a more general definition of the so-called normal trace on the boundary.
We say that v ∈ L¹_{H^{d−1}}(∂Ω) is the normal trace of the vector field w ∈
L^{p*}(Ω, R^d) with div w ∈ L^{q*}(Ω) if there exists a sequence (w_n) in C^∞(Ω̄, R^d)
with w_n → w in L^{p*}(Ω, R^d), div w_n → div w in L^{q*}(Ω), and w_n · ν → v in
L¹_{H^{d−1}}(∂Ω), such that for all u ∈ C^∞(Ω̄),

∫_{∂Ω} u v dH^{d−1} = ∫_Ω u div w + ∇u · w dx.
Remark 6.90 The closed operator ∇ m from Lq () to Lp (, Rd ) on the bounded
Lipschitz domain with q ≤ pd/(d − mp) if mp < d has a closed range: If (un )
m
is a sequence in Lq () such that limn→∞ ∇ m un = v for some v ∈ Lp (, Rd ),
then (Pm un ) is a Cauchy sequence, since the Poincaré-Wirtinger inequality (6.35)
and the embedding into Lq () (Theorem 6.76) lead to
Proof The convex functional F(v) = (1/p)‖v‖_p^p defined on L^p(Ω, R^d) is, as a pth
power of a norm, continuous everywhere, in particular at every point of rg(∇).
Since ! = F ∘ ∇, we can apply the identity ∂! = ∇* ∘ ∂F ∘ ∇ in the sense
of Definition 6.41 (see Exercise 6.12). By the rule for subdifferentials for convex
integrands (Example 6.50) as well as Gâteaux differentiability of ξ ↦ (1/p)|ξ|^p, we
get

F(v) = (1/p) ∫_Ω |v(x)|^p dx   ⇒   ∂F(v) = {|v|^{p−2} v},
p
and it holds ∂!(u) = ∅ if and only if u ∈ dom ∇ = H 1,p () and |∇u|p−2 ∇u ∈
dom ∇*. By Theorem 6.88 and Remark 6.89, respectively, we can express the latter
by div(|∇u|^{p−2}∇u) ∈ L^{q*}(Ω) with |∇u|^{p−2}∇u · ν = 0 on ∂Ω. In this case we get
∂!(u) = ∇* ∂F(∇u), which shows the desired identity. □
Remark 6.92 It is easy to see that the case p = 2 leads to the negative Laplace
operator for functions u that satisfy ∇u · ν = 0 on the boundary: ∂((1/2)‖∇ · ‖₂²) = −Δ.
Thus, the generalization for p ∈ ]1, ∞[ is called the p-Laplace operator; hence, one
can say that the subgradient ∂((1/p)‖∇ · ‖_p^p) is the p-Laplace operator for functions with
the boundary conditions |∇u|^{p−2}∇u · ν = 0.
Example 6.93 (Solution of the p-Laplace Equation) An immediate application of
the above result is the proof of existence and uniqueness of the p-Laplace equation.
For a bounded Lipschitz domain Ω, 1 < p ≤ q < ∞, q ≤ d/(d − p) if p < d, and
f ∈ L^{q*}(Ω), we consider the minimization problem

min_{u ∈ L^q(Ω)}  (1/p) ∫_Ω |∇u|^p dx − ∫_Ω f u dx + I_{{v ∈ L^q(Ω) : ∫_Ω v dx = 0}}(u).   (6.38)
and ϕ(t) = p1 t p . To that end we note that (0) = 0, and hence, is proper
on H 1,p (). Convexity and lower semicontinuity are immediate, and coercivity
follows from the fact that for u ∈ L^q(Ω) with ∫_Ω u dx = 0 and v ∈ "1 (i.e., v is constant) with
v ≠ 0, one has (u + v) = ∞, since ∫_Ω (u + v) dx ≠ 0. Thus, the assumptions in
Theorem 6.84 are satisfied, and the minimization problem has a solution u∗ , which
is unique up to contributions from "1 . A solution u∗∗ different from u∗ would
satisfy u** = u* + v with v ∈ "1, v ≠ 0, and this would imply (u**) = ∞,
a contradiction. Hence, the minimizer is unique.
Let us deduce the optimality conditions for u∗ . Here we face a difficulty, since
neither the Sobolev term ! nor is continuous, i.e., the assumption for the sum
rule in Theorem 6.51 are not satisfied. However, we see that both u → !(Q1 u) and
u → (P1 u) are continuous. Since for all u ∈ Lq () we have u = P1 u + Q1 u,
we can apply the conclusion from Exercise 6.14 and get 0 ∈ ∂!(u∗ ) + ∂(u∗ ).
By Lemma 6.91 we know that ∂!; let us compute ∂(u∗ ). Since u → f u dx is
continuous, Theorem 6.51 and Example 6.48 lead to
−f + "1 if u dx = 0,
∂(u) =
∅ otherwise,
⊥
since ("1 )⊥ = "1 . Thus, u∗ is optimal if and only if there exists some λ∗ ∈ R
such that
−div(|∇u*|^{p−2} ∇u*) = f − λ* 1   in Ω,
|∇u*|^{p−2} ∇u* · ν = 0   on ∂Ω,
∫_Ω u* dx = 0,
where 1 ∈ L^{q*}(Ω) denotes the function that is constantly 1. It is easy to calculate the
value λ*: we integrate the equation in Ω on both sides to get

∫_Ω f dx − λ*|Ω| = ∫_Ω ∇*(|∇u*|^{p−2} ∇u*) dx = 0,

and hence λ* = |Ω|^{−1} ∫_Ω f dx, which is the mean value of f. In conclusion, we
have shown that for every f ∈ L^{q*}(Ω) with ∫_Ω f dx = 0, there exists a unique
solution u* with ∫_Ω u* dx = 0 of the Neumann problem for the p-Laplace equation.
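In one space dimension the optimality system can even be integrated directly, which the following sketch uses (a discrete illustration of ours, not part of the original example): the flux |u'|^{p−2}u' is an antiderivative of −f, from which u is recovered up to the zero-mean constraint.

```python
import numpy as np

# 1D p-Laplace problem: -(|u'|^{p-2} u')' = f, zero flux at the boundary,
# zero mean of u and of f. Hence |u'|^{p-2} u' = -F with F an antiderivative of f.
n, p = 2000, 3.0
x = np.linspace(0.0, 1.0, n)
f = np.cos(2 * np.pi * x)                       # zero-mean datum
F = np.cumsum(f) * (x[1] - x[0]); F -= F[0]     # F(0) = 0, F(1) ~ 0
flux = -F                                       # = |u'|^{p-2} u'
du = np.sign(flux) * np.abs(flux) ** (1.0 / (p - 1.0))
u = np.cumsum(du) * (x[1] - x[0])
u -= u.mean()                                   # enforce the zero-mean constraint

# Check the Euler-Lagrange equation in the interior:
residual = -np.gradient(np.abs(du) ** (p - 2) * du, x) - f
print(np.max(np.abs(residual[5:-5])))           # small (discretization error only)
```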
The theory of convex minimization with Sobolev penalty that we have developed up
to now gives a unified framework to treat the motivating examples from Sect. 6.1. In
the following we revisit these problems and present some additional examples.
Application 6.94 (Denoising with Lq -Data and H 1,p -Penalty) Consider the
denoising problem on a bounded Lipschitz domain ⊂ Rd . Further we assume
that 1 < p ≤ q < ∞. Let u0 ∈ Lq () be a noisy image and let λ > 0 be given. We
aim to denoise u0 by solving the minimization problem
min_{u ∈ L^q(Ω)}  (1/q) ∫_Ω |u − u0|^q dx + (λ/p) ∫_Ω |∇u|^p dx.   (6.39)
It is easy to see that this problem has a unique solution: the identity A = id is
injective and has closed image, and since for r = q the norm on Lr () is strictly
convex, we obtain uniqueness and existence from Theorem 6.86.
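For the special case p = q = 2 the optimality condition of a discrete analogue of (6.39) is a linear system, which the following sketch (our own illustration on a 1D signal; the forward-difference matrix D and the parameter values are assumptions) solves directly.

```python
import numpy as np

# Discrete analogue of (6.39) for p = q = 2 on a 1D signal: the optimality
# condition is the linear system (I + lam * D^T D) u = u0, where D is a
# forward-difference matrix.
rng = np.random.default_rng(3)
n, lam = 200, 5.0
t = np.linspace(0, 1, n)
clean = (t > 0.3).astype(float) + 0.5 * np.sin(6 * np.pi * t)
u0 = clean + 0.2 * rng.normal(size=n)            # noisy signal

D = np.diff(np.eye(n), axis=0)                   # (n-1) x n forward differences
u = np.linalg.solve(np.eye(n) + lam * D.T @ D, u0)

print("distance of noisy data to clean signal:", np.linalg.norm(u0 - clean))
print("distance of minimizer  to clean signal:", np.linalg.norm(u - clean))
```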
Let us analyze the solutions u∗ of (6.39) further. For example, it is simple to see
that the mean values of u∗ and u0 are equal in the case q = 2, i.e., Q1 u∗ = Q1 u0
(see Exercise 6.27, which treats a more general case). It is a little more subtle to
show that a maximum principle holds. We can derive this fact directly from the
properties of the minimization problem:
Theorem 6.95 Let ⊂ Rd be a bounded Lipschitz domain and let 1 < p ≤ q <
∞. Moreover, let u0 ∈ L∞ () with L ≤ u0 ≤ R almost everywhere and λ > 0.
Then the solution u∗ of (6.39) also satisfies L ≤ u∗ ≤ R almost everywhere.
Moreover, by Lemma 6.75 we get u ∈ H 1,p (), and also that ∇u = ∇u∗ almost
everywhere in {L ≤ u∗ ≤ R} and ∇u = 0 almost everywhere else. Hence,
(1/p) ∫_Ω |∇u|^p dx ≤ (1/p) ∫_Ω |∇u*|^p dx,
This shows, at least formally, that the solution u∗ satisfies the nonlinear partial
differential equation
−G(x, u*(x), ∇u*(x), ∇²u*(x)) = 0   in Ω,

where

G(x, u, ξ, Q) = |u − u0(x)|^{q−2}(u0(x) − u) + λ|ξ|^{p−2} trace( (id + (p − 2)(ξ/|ξ|) ⊗ (ξ/|ξ|)) Q ).
(Panels: original u†; noisy version u0 = u† + ε, PSNR = 20.00 db.)
Fig. 6.11 Illustration of the denoising capabilities of variational denoising with Sobolev penalty.
Top: Left the original, right its noisy version. Middle and bottom: The minimizer of (6.39) for
q = 2 and different Sobolev exponents p. To allow for a comparison, the parameter λ has been
chosen to maximize the PSNR with respect to the original image. Note that the remaining noise
and the blurring of edges is less for p = 1.1 and p = 1.2 than it is for p = 2
(Panels: original u†; noisy version u0 = u† + ε, PSNR = 12.02 db.)
Fig. 6.12 Illustration of the influence of the exponent q in the data term of Application 6.94. Top:
Left the original, right a version with strong noise. Middle and bottom: The minimizer of (6.39)
for p = 2 and different exponents q, again with λ optimized with respect to the PSNR. For larger
exponents q we see some “impulsive” noise artifacts, which again get less for q = 6, q = 12 due
to the choice of λ. The image sharpness does not vary much
By Theorem 3.13, A maps Lq () to Lq ( ) linearly and continuously for every
1 ≤ q < ∞. Moreover, A maps constant functions in to constant functions in ,
and hence A is injective on "1 .
If we choose 1 < p ≤ q < ∞ and q ≤ pd/(d − p) if p < d, then Theorem 6.86
implies the existence of a unique minimizer of the Tikhonov functional
min_{u ∈ L^q(Ω)}  (1/q) ∫_Ω |u ∗ k − u0|^q dx + (λ/p) ∫_Ω |∇u|^p dx   (6.41)
(Panels: original u†; measured data u0 with PSNR(u0, u† ∗ k) = 33.96 db; convolution kernel k.)
Fig. 6.13 Illustration of the method (6.41) for joint denoising and deblurring. Top: Left the
original (320 × 320 pixels), right the measured data (310 × 310 pixels) obtained by convolution
with an out-of-focus kernel (right, diameter of 11 pixels) and addition of noise. Middle and bottom:
The minimizer of (6.41) for q = 2 and different exponents p with λ optimized for PSNR. For p
close to 1 one sees, similarly to Fig. 6.11, a reduction of noise, fewer oscillating artifacts, and a
sharper reconstruction of edges
u and u coincide on ∂ . If, conversely, for u ∈ H ( ), the trace of u equals the
0 1,p
trace of u0 on ∂ , then Lemma 6.99 implies that u − u0 ∈ H0 () has to hold.
1,p
= {u ∈ Lq () u|\ = u0 |\ , u| ∈ H 1,p ( ), u|∂ = u0 |∂ },
where we have understood the restriction onto ∂ as taking the trace with respect
to . This motivates the definition of a linear map ∇0 from Lq ( ) to Lp ( ):
The space Lq ( ) contains H0 ( ) as a dense subspace, and thus ∇0 is densely
1,p
defined. The map is also closed: To see this, let un ∈ H0 ( ) with un → u in
1,p
is continuous on the affine subspace u0 + X1 , X1 = {v ∈ Lq () v| = 0}. Also
is continuous on the subspace u0 + X2 , X2 = {v ∈ Lq () v|\ = 0}. Since the
restrictions P1 : u → uχ\ and P2 : u → uχ , respectively, are continuous and
they sum to the identity, we can apply the result of Exercise 6.14 and get ∂(F1 +
F2 ) = ∂F1 + ∂F2 .
If we denote by A : Lq () → Lq ( ) the restriction to and by E :
L ( , Rd ) → Lp (, Rd ) the zero padding, we get
p
p
F1 = p · p
1
◦ T∇u0 ◦ E ◦ ∇0 ◦ A ◦ T−u0 .
The map A is surjective, and by the results of Exercises 6.11 and 6.12 for A and
∇0 , respectively, as well as Theorem 6.51 for T−u0 , E, and T∇u0 , the subgradient
satisfies
A∗ ∇0∗ E ∗ Jp ∇u0 + ∇(u − u0 )|
1,p
if (u − u0 )| ∈ H0 (),
∂F1 (u) =
∅ otherwise,
and this shows that ∇0∗ w = − div w in the sense of the weak divergence. Conversely,
∗ ∗
let w ∈ Lp ( , Rd ) such that − div w ∈ Lq ( ). Then by the definition of
H0 ( ), we can choose for every u ∈ H0 ( ) a sequence (un ) in D( ) such
1,p 1,p
w · ∇u dx = lim w · ∇un dx = − lim (div w)un dx = − (div w)u dx,
n→∞ n→∞
and hence w ∈ dom ∇0∗ and ∇0∗ = − div w. We have shown that
∗ ∗
∇0∗ = − div, dom ∇0∗ = {w ∈ Lp ( , Rd ) div w ∈ Lq ( )}
in other words, the adjoint of the gradient with zero boundary conditions is the weak
divergence. In contrast to ∇ ∗ , ∇0∗ operates on all vector fields for which the weak
divergence exists and not only on those for which the normal trace vanishes at the
boundary (cf. Theorem 6.88 and Remark 6.89).
For the subgradients of F1 we get, using the convention that gradient and
divergence are considered on and the divergence will be extended by zero, that
if (u − u0 )| ∈ H0 ( ),
1,p
− div ∇(u| )|∇(u| )|p−2
∂F1 (u) =
∅ otherwise.
Since the divergence of \ is extended by zero and (u∗ − u0 )| ∈ H0 () if
1,p
and only if u∗ | ∈ H 1,p ( ) with u∗ |∂ = u0 |∂ in the sense of the trace, we
conclude the characterization
u* = u0   in Ω∖Ω′ (where Ω′ ⊂ Ω denotes the inpainting domain),
−div(∇u* |∇u*|^{p−2}) = 0   in Ω′,   (6.46)
u* = u0   on ∂Ω′.
Note that the last equality has to be understood in the sense of the trace of u0 on
∂ with respect to . In principle, this could depend on the values of u0 in the
inpainting domain . However, it is simple to see that the traces of ∂ with respect
to and \ coincide for Sobolev functions u0 ∈ H 1,p (). Hence, the solution
of the inpainting problem is independent of the auxiliary function u0 .
Again, the optimality conditions (6.46) show that u∗ has to be locally smooth in
: by the same argument as in Applications 6.94 and 6.97, we get u∗ ∈ H 2,p ( )
for all ⊂⊂ , if p < 2. For p = 2 we even get that the solution u∗ in
is harmonic (Example 6.4), and hence, u∗ ∈ C ∞ ( ). The case p > 2 is
treated in some original papers (see, e.g., [60] and the references therein) and at
least gives u∗ ∈ C 1 ( ). Thus, the two-dimensional case (d = 2) always leads to
continuous solutions, and this says that this method of inpainting is suited only for
the reconstruction of homogeneous regions; see also Fig. 6.14.
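For p = 2 the inpainted values are harmonic with the surrounding image as boundary data, which can be computed by a simple relaxation scheme. The following sketch (our own discrete illustration; the toy image and mask are assumptions) fills a rectangular hole by repeatedly averaging each unknown pixel over its four neighbours.

```python
import numpy as np

# Harmonic inpainting (p = 2): Jacobi iteration on the unknown pixels only.
u = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))      # toy "image": a linear ramp
mask = np.zeros_like(u, dtype=bool)
mask[20:40, 25:50] = True                            # region to be inpainted
u_damaged = u.copy(); u_damaged[mask] = 0.0

v = u_damaged.copy()
for _ in range(2000):
    avg = 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                  + np.roll(v, 1, 1) + np.roll(v, -1, 1))
    v[mask] = avg[mask]                              # update only unknown pixels

print(np.max(np.abs(v[mask] - u[mask])))             # close to 0 for this ramp
```

For the linear ramp the exact harmonic extension is the ramp itself, so the reconstruction error tends to zero; for images with edges crossing the hole the same scheme produces the blurring visible in Fig. 6.14.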
Fig. 6.14 Inpainting by solving (6.43) or (6.44), respectively. Top: Left the original u† together
with two enlarged details (according to the marked regions in the image), right the given u0 on
\ ( is given by the black region), again with details. Middle and bottom: The minimizer of
the inpainting functional for different p together with enlarged details. While the reconstruction of
homogeneous regions is good, edges get blurred in general. As the details show, this effect is more
prominent for larger p; on the other hand, for p = 1.1 some edges are extended in a sharp way (the
edge of the arrow in the left, lower detail), but the geometry is not always reconstructed correctly
(disconnected boundaries of the border of the sign in the upper right detail)
holds for all u ∈ L^q(Ω). The surjectivity of A is equivalent to the linear independence of the w_{i,j}; the assumption that constant functions are not in the kernel of
A can be expressed with the vector w̄ ∈ R^{N×M}, defined by w̄_{i,j} = ∫_Ω w_{i,j} dx,
simply as w̄ ≠ 0. In view of the above, the choice w_{i,j}(x1, x2) = k(i − x1, j − x2)
with suitable k ∈ L^{q*}(R²) seems natural. This amounts to a convolution with
subsequent point sampling, and hence k should be a kind of low-pass filter; see
Sect. 4.2.3. It is not hard to check that, for example, k = χ_{]0,1[×]0,1[} satisfies the
assumptions for the map A. In this case the map A is nothing else than averaging u
over the squares ]i − 1, i[ × ]j − 1, j [. Using k(x1 , x2 ) = sinc(x1 − 12 ) sinc(x2 − 12 )
leads to the perfect low-pass filter for the sampling rate 1 with respect to the
midpoints of the squares (i − 12 , j − 12 ); see Theorem 4.35.
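A discrete version of the block-averaging choice k = χ_{]0,1[×]0,1[} and of its adjoint is easy to write down; the sketch below (our own illustration; the block size s and test data are assumptions) also verifies the adjointness relation ⟨Au, U⟩ = ⟨u, A*U⟩.

```python
import numpy as np

# Sampling operator A for k = chi_{]0,1[x]0,1[}: averaging over s x s blocks.
# Its adjoint distributes each sample value evenly over the corresponding block.
def A(u, s):
    N, M = u.shape[0] // s, u.shape[1] // s
    return u.reshape(N, s, M, s).mean(axis=(1, 3))

def A_adjoint(U, s):
    return np.kron(U, np.ones((s, s))) / (s * s)

rng = np.random.default_rng(0)
u = rng.normal(size=(12, 12)); U = rng.normal(size=(4, 4)); s = 3
# Adjointness check: <A u, U> = <u, A* U>
print(np.allclose(np.sum(A(u, s) * U), np.sum(u * A_adjoint(U, s))))  # True
```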
The assumption that the image u is an interpolation for the data U0 is now
expressed as Au = U 0 . However, this is true for many images; indeed it is true
for an infinite-dimensional affine subspace of Lq (). We aim to find the image that
is best suited for a given image model. Again we use the Sobolev space H 1,p ()
for p ∈ ]1, ∞[ as a model, where we assume that q ≤ 2/(2 − p) holds if p < 2.
This leads to the minimization problem
min_{u ∈ L^q(Ω)}  (1/p) ∫_Ω |∇u|^p dx + I_{{v ∈ L^q(Ω) : Av = U⁰}}(u).   (6.47)
by the embedding H 1,p () → Lq () (see Theorem 6.76). Now assume that A :
H 1,p () → RN×M is not surjective. Since RN×M is finite-dimensional, the image
of A is a closed subspace, and hence it must be that Au − U 0 ≥ ε for all u ∈
H 1,p () and some ε > 0. However, H 1,p () is dense in Lq (), and hence there
has to be some ū ∈ H 1,p () with ū − u0 q < 2ε A−1 , and thus
which is a contradiction. Hence, the operator A has to map H 1,p () onto RN×M ,
and thus there is some u1 ∈ H 1,p () with Au1 = U 0 . In particular is proper on
H 1,p ().
Convexity of is obvious, and the lower semicontinuity follows from the
continuity of A and the representation K = A−1 ({U 0 }). Finally, we assumed that
A does not map constant functions to zero, i.e., for u ∈ K and v ∈ "1 with v = 0,
we have u + v ∈ / K. This shows the needed coercivity of . We conclude with
Theorem 6.84 that there exists a minimizer u∗ of the functional in (6.47), and one
can argue along the lines of Application 6.98 that it is unique.
If we try to apply convex analysis to study the minimizer u∗ , we face similar
problems to the one in Application 6.98: The restriction is not continuous, and
hence we cannot use the sum rule for subdifferentials without additional work.
However, there is a remedy.
To see this, we note that by surjectivity of A : H 1,p () → RN×M there are NM
linear independent vectors ui,j ∈ H 1,p () (1 ≤ i ≤ N and 1 ≤ j ≤ M) such that
the restriction of A to V = span(u_{i,j}) ⊂ L^q(Ω) is bijective. Hence, there exists
A_V^{−1}, and for T1 = A_V^{−1} A one has T1² = T1 and ker(T1) = ker(A). For T2 = id − T1
this implies that rg(T2) = ker(A). With the above u1, the function

u ∈ V :  u ↦ (1/p) ∫_Ω |∇(u1 + u)|^p dx
first term is again the p-Laplace operator, while for the second we have
∂I_K(u) = ker(A)^⊥   if Au = U⁰,   and   ∂I_K(u) = ∅   otherwise.
Since A is surjective, the closed range theorem (Theorem 2.26) implies that
ker A⊥ = rg(A∗ ) = span(wi,j ), the latter since wi,j = A∗ ei,j for 1 ≤ i ≤ N
and 1 ≤ j ≤ M. Hence, the optimality conditions say that u∗ ∈ Lq () is a solution
Σ_{i=1}^{N} Σ_{j=1}^{M} λ*_{i,j} ∫_Ω w_{i,j} dx = ∫_Ω ∇*(|∇u*|^{p−2} ∇u*) dx = 0,
i.e., λ∗ · w̄ = 0 with the above defined vector of integrals w̄. Similar to the previous
applications one can use some theory for the p-Laplace equation to show that the
solution u∗ has to be continuous if the functions wi,j are in L∞ ().
While (6.48) is a nonlinear partial differential equation for p = 2 that is coupled
with linear equalities and the Lagrange multipliers, the case p = 2 leads to a linear
system of equalities. This can be solved as follows. In the first step, we solve for
every (i, j ) the equation
−Δz_{i,j} = w_{i,j} − |Ω|^{−1} ∫_Ω w_{i,j} dx   in Ω,   (6.49)

with homogeneous Neumann boundary conditions and ∫_Ω z_{i,j} dx = 0. In the second step, the Lagrange multipliers are determined from

Σ_{k=1}^{N} Σ_{l=1}^{M} λ*_{k,l} ∫_Ω w_{k,l} z_{i,j} dx + λ*_0 w̄_{i,j} = U⁰_{i,j}.
Here we used that ∇zi,j and ∇u∗ are contained in dom ∇ ∗ . We can simplify the
scalar product further: using zi,j dx = 0, we get
∫_Ω w_{k,l} z_{i,j} dx = ∫_Ω (w_{k,l} − |Ω|^{−1} ∫_Ω w_{k,l} dy) z_{i,j} dx = ∫_Ω (−Δz_{k,l}) z_{i,j} dx = ∫_Ω ∇z_{k,l} · ∇z_{i,j} dx.
Setting S_{(i,j),(k,l)} = ∫_Ω ∇z_{k,l} · ∇z_{i,j} dx and using the constraint λ* · w̄ = 0, we
obtain the finite-dimensional linear system of equations

Sλ* + w̄ λ*_0 = U⁰,
w̄ᵀ λ* = 0,   (6.50)
for the Lagrange multipliers. This system has a unique solution (see Exercise 6.31),
and hence λ∗ and λ∗0 can be computed. The identity for −u∗ in (6.48) gives
uniqueness of u∗ up to constant functions, and the constant offset is determined
by λ∗0 , namely
u* = Σ_{i=1}^{N} Σ_{j=1}^{M} λ*_{i,j} z_{i,j} + λ*_0 1,   (6.51)
where 1 denotes the function that is equal to 1 on . In conclusion, the method for
H 1 interpolation reads as follows
1. For all (i, j ) solve Eq. (6.49).
2. Calculate the matrix S_{(i,j),(k,l)} = ∫_Ω ∇z_{k,l} · ∇z_{i,j} dx and the vector w̄_{i,j} = ∫_Ω w_{i,j} dx, and solve the linear system (6.50).
3. Calculate the solution u∗ by plugging λ∗ and λ∗0 into (6.51).
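A small discrete analogue of the case p = 2 can be solved in one stroke instead of via steps 1–3: minimizing ½‖∇u‖² subject to Au = U⁰ is an equality-constrained quadratic program whose KKT system is linear. The following sketch (our own illustration; grid sizes, the block-averaging A, and the data U⁰ are assumptions) assembles and solves that system directly, which exhibits the same Lagrange-multiplier structure as (6.48)–(6.51).

```python
import numpy as np

# Discrete H^1 interpolation for p = 2: minimize 0.5*||grad u||^2 s.t. A u = U0,
# solved via the KKT system [grad^T grad, A^T; A, 0] [u; mu] = [0; U0].
def diff_matrix(n):
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return D

s, N = 4, 6                      # zoom factor and number of samples per axis
n = s * N                        # fine grid size
Dx = np.kron(np.eye(n), diff_matrix(n))      # horizontal differences
Dy = np.kron(diff_matrix(n), np.eye(n))      # vertical differences
G = np.vstack([Dx, Dy])                      # discrete gradient acting on u.ravel()

# Sampling operator: averaging over s x s blocks (cf. k = chi_{]0,1[x]0,1[}).
A = np.zeros((N * N, n * n))
for I in range(N):
    for J in range(N):
        block = np.zeros((n, n))
        block[I*s:(I+1)*s, J*s:(J+1)*s] = 1.0 / (s * s)
        A[I * N + J] = block.ravel()

rng = np.random.default_rng(0)
U0 = rng.normal(size=N * N)                  # given low-resolution data

K = np.block([[G.T @ G, A.T], [A, np.zeros((N * N, N * N))]])
rhs = np.concatenate([np.zeros(n * n), U0])
sol = np.linalg.solve(K, rhs)
u = sol[:n * n].reshape(n, n)
print(np.max(np.abs(A @ u.ravel() - U0)))    # constraint satisfied up to rounding
```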
In practice, the solution of (6.47) and (6.48) is done numerically, i.e., the domain
is discretized accordingly. This allows one to obtain images of arbitrary resolution
from the given image u0 . Figure 6.15 compares the variational interpolation with
the classical methods from Sect. 3.1.1. One notes that variational interpolation deals
favorably with strong edges. Figure 6.16 illustrates the influence on the sampling
operator A. The choice of this operator is especially important if the data U 0
does not exactly match the original image u†. In this case, straight edges are not
reconstructed exactly, but change their shape in dependence of the sampling operator
(most prominently seen for p close to 1).
One possibility to reduce these unwanted effects is to allow for Au ≠ U⁰.
For example, one can replace problem (6.47) by the minimization of the Tikhonov
functional
min_{u ∈ L^q(Ω)}  ‖Au − U⁰‖₂² / 2 + (λ/p) ∫_Ω |∇u|^p dx
(Panel labels: u_constant, u_linear, u_sinc in the left column; u* for p = 2, p = 1.5, and p = 1.1 in the right column.)
Fig. 6.15 Comparison of classical methods for interpolation and variational interpolation for
eightfold zooming. Left column: the interpolation methods from Sect. 3.1.1 of an image with
128 × 96 pixels to 1024 × 768 pixels. Specifically, we show the result of constant interpolation
(top), piecewise bilinear interpolation (middle), and tensor product interpolation with the sinc
function (bottom). Right column: The minimizers of the variational interpolation method (6.47)
for different exponents p. The used sampling operator A is the perfect low-pass filter. Constant
and bilinear interpolation have problems with the line structures. These are handled satisfactorily,
up to oscillation at the boundary, by usinc , and also u∗ with p = 2 is comparable. Smaller p allows
for sharp and almost perfect reconstruction of the dark lines, but at the expense of smaller details
like the lighter lines between the dark lines
for some λ > 0 (with the norm ‖v‖₂² = Σ_{i=1}^{N} Σ_{j=1}^{M} |v_{i,j}|² on the finite-dimensional space R^{N×M}).
Fig. 6.16 Comparison of different sampling operators A for variational interpolation (6.47). Top
left: Given data U 0 (96 × 96 pixel). Same row: Results of eightfold zooming by averaging over
squares (i.e. kconst = χ]0,1[×]0,1[ ) for different p. Bottom row: Respective results for sampling with
the perfect low-pass filter ksinc (x1 , x2 ) = sinc(x1 − 12 ) sinc x2 − 12 . The operator that averages
over squares favors solutions with “square structures”; the images look “blocky” at some places.
This effect is not present for the perfect lowpass filter but there are some oscillation artifacts due
to the non-locality of the sinc function
The applications in the previous section always used p > 1. One reason for this
constraint was that the image space of ∇^m, namely L^p(Ω, R^{d^m}), is a reflexive Banach space,
and for F convex, lower semicontinuous and coercive, one also could deduce that
F ◦ ∇ m is lower semicontinuous (see Example 6.29). However, the illustrations
indicated that p → 1 leads to interesting effects. On the one hand, edges are more
emphasized, and on the other hand, solutions appear to have more “linear” regions,
which is a favorable property for images with homogeneous regions. Hence, the
question whether p = 1 can be used for a Sobolev penalty suggests itself, i.e.
whether H 1,1 () can be used as an image model. Unfortunately, this leads to
problems in the direct method:
Theorem 6.101 (Failure of Lower Semicontinuity for the H 1,1 Semi-norm) Let
⊂ Rd be a domain and q ∈ [1, ∞[. Then the functional ! : Lq () → R∞ given
by
!(u) = ∫_Ω |∇u| dx   if u ∈ H^{1,1}(Ω),   and   !(u) = ∞   otherwise,

is not lower semicontinuous.
where ∇ϕ_{n^{−1}}(x) = n^{d+1} ∇ϕ(nx). Young's inequality for convolutions and the
identities |B_r(0)| = r^d |B_1(0)| and (1 − n^{−1})^d = Σ_{k=0}^{d} (d over k)(−1)^k n^{−k} lead to

∫_Ω |∇u_n| dx ≤ n‖∇ϕ‖_1 ∫_{{1−n^{−1} ≤ |x| ≤ 1}} 1 dx
             ≤ Cn (1 − (1 − n^{−1})^d) = Cn Σ_{k=1}^{d} (d over k)(−1)^{k+1} n^{−k} ≤ C Σ_{k=0}^{d−1} (d over k+1) n^{−k}.
The right-hand side is bounded for n → ∞, and hence lim infn→∞ !(un ) < ∞.
However, u ∈ H 1,1 () cannot be true. If it were, we could test with φ ∈
D(B1 (0)) for i = 1, . . . , d and get
∫_Ω u (∂φ/∂x_i) dx = ∫_{B_1(0)} (∂φ/∂x_i) dx = ∫_{∂B_1(0)} φ ν_i dH^{d−1} = 0.
By the fundamental lemma of the calculus of variations we would get ∇u|B1 (0) = 0.
Similarly one could conclude that ∇u|\B1 (0) = 0, i.e., ∇u = 0 almost everywhere
in . This would imply that u is constant in (see Lemma 6.79), a contradiction.
By definition of ! this means !(u) = ∞.
In other words, we have found a sequence un → u, with !(u) >
lim inf_{n→∞} !(u_n), and hence ! is not lower semicontinuous. □
Fig. 6.17 Illustration of a sequence (u_n) in H^{1,1}(]0, 1[) that violates the defining property of
lower semicontinuity for u ↦ ∫_0^1 |u′| dt in L^q(]0, 1[). If the ramp part of the functions u_n gets
arbitrarily steep while the total increase remains constant, the derivatives ((u_n)′) form a bounded
sequence in L^1(]0, 1[), but the L^q limit u of (u_n) is only in L^q(]0, 1[) and not in H^{1,1}(]0, 1[).
In particular, ((u_n)′) does not converge in L^1(]0, 1[), and one wonders in what sense there is still
some limit
See also Fig. 6.17 for an illustration of a slightly different counterexample for the
lower semicontinuity of the H 1,1 semi-norm in dimension one.
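The one-dimensional counterexample of Fig. 6.17 is easy to reproduce numerically; the sketch below (our own, with an assumed ramp at ½) shows that the L¹ norms of the derivatives remain equal to 1 for every ramp width, while the limit is a jump function.

```python
import numpy as np

# The ramp sequence from Fig. 6.17: u_n rises from 0 to 1 on [1/2, 1/2 + 1/n].
# The L^1 norms of the derivatives stay equal to 1, but the L^q limit is a jump
# function that does not belong to H^{1,1}(]0,1[).
x = np.linspace(0.0, 1.0, 100001)
h = x[1] - x[0]
for n in [2, 10, 100, 1000]:
    u_n = np.clip((x - 0.5) * n, 0.0, 1.0)
    du_n = np.diff(u_n) / h
    print(n, np.sum(np.abs(du_n)) * h)        # ~1 for every n
```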
The above result prohibits direct generalizations of Theorems 6.84 and 6.86 to the
case p = 1. If we have a closer look at the proof of lower semicontinuity of F ◦A for
F : Y → R∞ convex, lower semicontinuous, coercive and A : X ⊃ dom A → Y
strongly-weakly closed in Example 6.29 we note that an essential ingredient is
to deduce the existence of a weakly convergent subsequence of (Aun ) from the
boundedness of F (Aun ). However, this very property fails for ∇ as a strongly-
weakly closed operator between Lq () and L1 (, Rd ).
However, we can use this failure as a starting point to define a functional that is, in
some sense, a generalization of the integral ∫_Ω |∇u| dx. More precisely, we replace
L1 (, Rd ) by the space of vector-valued Radon measures M(, Rd ), which is, on
the one hand, the dual space of a separable Banach space: by the Riesz-Markov
theorem (Theorem 2.62) M(, Rd ) = C0 (, Rd )∗ . On the other hand, L1 (, Rd )
is isometrically embedded in M(, Rd ) by the map u → uLd , i.e., uLd M =
u1 for all u ∈ L1 (, Rd ) (see Example 2.60).
Since the norm on M(, Rd ) is convex, weakly* lower-semicontinuous and
coercive, it is natural to define the weak gradient ∇ on a subspace of Lq () with
values in M(, Rd ) and to consider the concatenation · M ◦ ∇. How can this
weak gradient be defined? We simply use the most general notion of derivative
that we have, namely the distributional gradient, and claim that this should have a
representation as a finite vector-valued Radon measure.
Definition 6.102 (Weak Gradient in M(, Rd )/Total Variation) Let ⊂ Rd
be a domain. Some μ ∈ M(Ω, R^d) is the weak gradient of some u ∈ L¹_loc(Ω) if for
every ϕ ∈ D(Ω, R^d)

∫_Ω u div ϕ dx = − ∫_Ω ϕ · dμ.
If it exists, we denote μ = ∇u and call its norm, denoted by TV(u) = ∇uM , the
total variation of u. If there does not exist a μ ∈ M(, Rd ) such that μ = ∇u, we
define TV(u) = ∞.
It turns out that this definition is useful, and we can deduce several pleasant
properties.
Lemma 6.103 Let Ω be a domain and q ∈ [1, ∞[. Then for u ∈ L^q(Ω) one has the
following.
1. If there exists a μ ∈ M(Ω, R^d) with μ = ∇u as in Definition 6.102, it is unique.
2. One has ∇u ∈ M(Ω, R^d) with ‖∇u‖_M ≤ C if and only if for all ϕ ∈ D(Ω, R^d),
   |∫_Ω u div ϕ dx| ≤ C‖ϕ‖_∞.
In particular, we obtain

TV(u) = sup { ∫_Ω u div ϕ dx : ϕ ∈ D(Ω, R^d), ‖ϕ‖_∞ ≤ 1 }.   (6.52)
2. Characteristic functions
Let Ω′ be a bounded Lipschitz subdomain of Ω and u = χ_{Ω′}. Then by the
divergence theorem (Theorem 2.81), one has for ϕ ∈ D(Ω, R^d) with ‖ϕ‖_∞ ≤ 1
that

∫_Ω u div ϕ dx = ∫_{Ω′} div ϕ dx = ∫_{∂Ω′} ϕ · ν dH^{d−1}.

This means simply that ∇u = −ν H^{d−1} ⌞ ∂Ω′, and this is a measure in M(Ω, R^d),
since for every ϕ ∈ C_0(Ω, R^d), ϕ · ν is H^{d−1}-integrable on ∂Ω′, and

|−∫_{∂Ω′} ϕ · ν dH^{d−1}| ≤ H^{d−1}(∂Ω′)‖ϕ‖_∞,

i.e., ∇u ∈ C_0(Ω, R^d)*, and the claim follows from the Riesz–Markov theorem
(Theorem 2.62). We also see that ‖∇u‖_M ≤ H^{d−1}(∂Ω′) has to hold. In fact, we
even have equality (see Exercise 6.36), i.e., TV(u) = H^{d−1}(∂Ω′).
In other words, the total variation of the characteristic function of a Lipschitz subdomain equals its perimeter. This motivates the following generalization of the
perimeter for measurable sets Ω′ ⊂ Ω:

Per(Ω′) = TV(χ_{Ω′}).
In this sense, bounded Lipschitz domains have finite perimeter. The study of
sets with finite perimeter leads to the notion of Caccioppoli sets, which are a
further slight generalization.
3. Piecewise smooth functions
Let u ∈ Lq () be piecewise smooth, i.e., we can write as a union of some
k with mutually disjoint bounded Lipschitz domains 1 , . . . , K , and for every
k = 1, . . . , K one has uk = u|k ∈ C 1 (k ), i.e., u restricted to k can be
extended to a differentiable function uk on k . Let us see, whether the weak
gradient is indeed a Radon measure.
To begin with, it is clear that u ∈ H 1,1 (1 ∪· · ·∪K ) with (∇u)L1 |k = ∇uk .
For ϕ ∈ D(, Rd ) we note that
∫_Ω u div ϕ dx = Σ_{k=1}^{K} ( ∫_{∂Ω_k ∩ Ω} ϕ · u_k ν^k dH^{d−1} − ∫_{Ω_k} ϕ · ∇u_k dx ),   (6.53)
where ν^k denotes the outer unit normal to Ω_k. We can rewrite this as follows. For
pairs 1 ≤ l < k ≤ K, let Γ_{l,k} = ∂Ω_l ∩ ∂Ω_k ∩ Ω and ν = ν^k on Γ_{l,k}. Then
ν^k = ν on Γ_{l,k} ∩ ∂Ω_k, ν^l = −ν on Γ_{l,k} ∩ ∂Ω_l, and

∫_{∂Ω_k ∩ Ω} ϕ · u_k ν^k dH^{d−1} = Σ_{l=1}^{k−1} ∫_{Γ_{l,k}} ϕ · u_k ν dH^{d−1} − Σ_{l=k+1}^{K} ∫_{Γ_{k,l}} ϕ · u_k ν dH^{d−1}.

Summing over k, we obtain

∫_Ω u div ϕ dx = − Σ_{k=1}^{K} Σ_{l=1}^{k−1} ∫_{Γ_{l,k}} ϕ · (u_l − u_k) ν dH^{d−1} − ∫_Ω ϕ · (∇u)_{L^1} dx,
Note that this representation is independent of the order of the k : the product
(ul − uk )ν stays the same if we swap k and l . It is easy to see (Exercise 6.37),
that ∇u is a finite vector-valued Radon measure, and for the norm, one has

TV(u) = ‖∇u‖_M = ‖(∇u)_{L^1}‖_1 + Σ_{l<k} ∫_{Γ_{l,k}} |u_l − u_k| dH^{d−1}.
Hence, the weak gradient is measured in the L1 norm, and the “jumps” ul −uk of
the function u are integrated along the interfaces l,k ; see Fig. 6.18 for a simple
example.
These considerations show that the total variation, when used as a penalty in
minimization problems, still allows functions with discontinuities. For images,
this is an exceptionally favorable property, for then we may view images, some-
what oversimplified, as smooth within objects k and discontinuous along object
boundaries (i.e., edges) ∂k . We expect that solutions of suitable optimization
problems have exactly these properties.
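The following short sketch (our own, using the usual anisotropic discrete analogue of the total variation, i.e., sums of absolute neighbour differences) reproduces this jump formula for a piecewise constant image with a single axis-aligned edge.

```python
import numpy as np

# Discrete (anisotropic) total variation of a piecewise constant image: the sum
# of absolute differences between neighbouring pixels. For this image it equals
# jump height times interface length, as in the formula above.
n = 100
u = np.zeros((n, n))
u[:, n // 2:] = 2.0                    # jump of height 2 across a vertical edge

tv = np.sum(np.abs(np.diff(u, axis=0))) + np.sum(np.abs(np.diff(u, axis=1)))
print(tv)                              # = 2 * 100 (jump height x edge length in pixels)
```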
Prepared with the results of Lemma 6.103, we see that the total variation has
properties relevant for the direct method, that the functional |∇ · | dx does not
have.
Lemma 6.105 Let ∈ Rd be bounded. The space of functions of bounded total
variation
BV() = {u ∈ L1 () ∇u ∈ M(, Rd )}
Fig. 6.18 A piecewise constant function u and its gradient as Radon measure. It holds that ∇u =
σ H1 , where σ is vector field on the set of discontinuity of the function in the left image
(visualized in the right image). The point at the T-junction in has H1 measure zero and ∇u is not
defined there
‖∇(u_{n_k} − u)‖_M ≤ lim inf_{m→∞} ‖∇(u_{n_k} − u_m)‖_M ≤ 1/k,
Proof We reuse the operators Mn from Theorem 6.74, and show that the sequence
un = Mn u for u ∈ BV() ∩ Lq () has the desired properties. Let us recall their
definition:
M_n u = Σ_{k=0}^{K} (T_{t_n η_k}(ϕ_k u)) ∗ ψ^n
for a smooth partition of unity (ϕk ), translation vectors ηk , step sizes tn , and scaled
mollifiers ψ n . By the arguments of Theorem 6.74 we get convergence un → u in
Lq ().
Now we choose w ∈ D(, Rd ) with w∞ ≤ 1 as a test function, and obtain
∫_Ω u_n div w dx = ∫_Ω u (M_n* div w) dx = ∫_Ω u div(M_n* w) dx − ∫_Ω u (N_n* w) dx
                 = ∫_Ω u div(M_n* w) dx − ∫_Ω (N_n u) · w dx,   (6.54)
where
M_n* w = Σ_{k=0}^{K} ϕ_k (T_{−t_n η_k}(w ∗ ψ̄^n)),    N_n* w = Σ_{k=0}^{K} ∇ϕ_k · (T_{−t_n η_k}(w ∗ ψ̄^n)).

Moreover,

|(M_n* w)(x)| ≤ Σ_{k=0}^{K} ϕ_k(x) ∫ |w(y)| ψ^n(y − x + t_n η_k) dy ≤ ‖w‖_∞ Σ_{k=0}^{K} ϕ_k(x) ∫_{R^d} ψ^n(y) dy ≤ ‖w‖_∞ ≤ 1.   (6.55)
(div(M_n* w − w))(x) = Σ_{k=0}^{K} ϕ_k(x) ∫ (div w(y) − div w(x)) ψ^n(y − x + t_n η_k) dy
                      + Σ_{k=0}^{K} ∇ϕ_k(x) · ∫ (w(y) − w(x)) ψ^n(y − x + t_n η_k) dy.
Since both w and div w are uniformly continuous in , we can find, for every
ε > 0, a δ > 0 such that |w(x) − w(y)| ≤ ε and |div w(x) − div w(y)| ≤ ε
holds for all x, y ∈ with |x − y| ≤ δ. Since (tn ) converges to zero and supp ψ n
becomes arbitrarily small, we can find some n0 such that for all n ≥ n0 , one has
|x − y + tn ηk | ∈ supp ψ n ⇒ |x − y| ≤ δ. For these n we obtain the estimate
|div(M_n* w − w)(x)| ≤ ε Σ_{k=0}^{K} ϕ_k(x) ∫_{R^d} ψ^n(y) dy + ε Σ_{k=0}^{K} |∇ϕ_k(x)| ∫_{R^d} ψ^n(y) dy ≤ Cε.
The constant C>0 can be chosen independently of n, and hence div M∗n w−
div w∞ → 0 for n → ∞.
Subgoal 3: We easily check that
lim_{n→∞} N_n u = lim_{n→∞} Σ_{k=0}^{K} (T_{t_n η_k}(u ∇ϕ_k)) ∗ ψ^n = lim_{n→∞} Σ_{k=0}^{K} T_{t_n η_k}(u ∇ϕ_k) = u Σ_{k=0}^{K} ∇ϕ_k = 0,
since smoothing with a mollifier converges in L1 () (see Lemma 3.16) and
translation is continuous in L1 () (see Exercise 3.4).
Together with (6.54), our subgoals give the desired convergence of ∫_Ω |∇u_n| dx: for
every ε > 0 we can find some n_0 such that for n ≥ n_0, ‖N_n u‖_1 ≤ ε/3. For these
n and arbitrary w ∈ D(Ω, R^d) with ‖w‖_∞ ≤ 1 we get from (6.54) and the total
variation as defined in (6.52) that

∫_Ω u_n div w dx = ∫_Ω u div(M_n^* w) dx − ∫_Ω (N_n u) · w dx ≤ ‖∇u‖_M + ‖N_n u‖_1 ‖w‖_∞ ≤ ‖∇u‖_M + ε/3.

Now we take, for every n, the supremum over all test functions w and get

∫_Ω |∇u_n| dx ≤ ‖∇u‖_M + ε.
On the other hand, by the supremum definition (6.52) there exists a test function w ∈ D(Ω, R^d)
with ‖w‖_∞ ≤ 1 such that ∫_Ω u div w dx ≥ ‖∇u‖_M − ε/3. For this w we can ensure for some n_1 and
all n ≥ n_1 that

|∫_Ω u div(M_n^* w − w) dx| ≤ ‖u‖_1 ‖div(M_n^* w − w)‖_∞ ≤ ε/3.
This implies

∫_Ω u_n div w dx = ∫_Ω u div w dx + ∫_Ω u div(M_n^* w − w) dx − ∫_Ω (N_n u) · w dx ≥ ‖∇u‖_M − ε/3 − ε/3 − ε/3,

and in particular, by the supremum definition (6.52),

∫_Ω |∇u_n| dx ≥ ‖∇u‖_M − ε.

Since ε > 0 is arbitrary, we have shown the desired convergence lim_{n→∞} ∫_Ω |∇u_n| dx
= ‖∇u‖_M. ∎
Remark 6.107 The above theorem guarantees, for every u ∈ BV(Ω), the existence
of a sequence (u_n) in C^∞(Ω) with u_n → u in L^1(Ω) and ∇u_n ⇀* ∇u in M(Ω, R^d).
By Lemma 6.103, the latter implies only the weaker property

‖∇u‖_M ≤ lim inf_{n→∞} ∫_Ω |∇u_n| dx.
‖u‖_q ≤ lim inf_{n→∞} ‖u_n‖_q ≤ C (lim inf_{n→∞} ‖u_n‖_1 + lim inf_{n→∞} ‖∇u_n‖_M) = C (‖u‖_1 + ‖∇u‖_M),

is bounded in H^{1,1}(Ω), and there exists, since the embedding H^{1,1}(Ω) → L^1(Ω) is compact, a
subsequence (v_{n_k}) with lim_{k→∞} v_{n_k} = v for some v ∈ L^1(Ω). For the respective
subsequence (u_{n_k}), one has by construction that lim_{k→∞} u_{n_k} = v, which shows that
BV(Ω) → L^1(Ω) compactly.
For the case 1 < q < d/(d − 1) we recall Young's inequality for numbers,
which is

ab ≤ a^p/p + b^{p^*}/p^*,   a, b ≥ 0,  p ∈ ]1, ∞[.

We choose r ∈ ]q, d/(d − 1)[ and p = (r − 1)/(q − 1), and get that for all a ≥ 0
and δ > 0,

a^q = (δ^{1/p} a^{r(q−1)/(r−1)}) (δ^{−1/p} a^{(r−q)/(r−1)}) ≤ ((q−1)/(r−1)) δ a^r + ((r−q)/(r−1)) δ^{−p^*/p} a,

that is, with a suitable constant C_δ > 0,

a^q ≤ δ a^r + C_δ a.   (6.56)
Now let again (u_n) be bounded in BV(Ω), hence also bounded in L^r(Ω) with bound
L > 0 on the norm. By the above, we can assume without loss of generality that
u_n → u in L^1(Ω). For every ε > 0 we choose 0 < δ < ε^q/(2L^r) and C_δ > 0 such
that (6.56) holds. Since (u_n) is a Cauchy sequence in L^1(Ω), there exists an n_0 such
that

∫_Ω |u_{n_1} − u_{n_2}| dx ≤ ε^q/(2C_δ)

for all n_1, n_2 ≥ n_0. Consequently, we see that for these n_1 and n_2, using (6.56),

∫_Ω |u_{n_1} − u_{n_2}|^q dx ≤ δ ∫_Ω |u_{n_1} − u_{n_2}|^r dx + C_δ ∫_Ω |u_{n_1} − u_{n_2}| dx ≤ (ε^q/(2L^r)) L^r + C_δ (ε^q/(2C_δ)) = ε^q,
and hence (u_n) is a Cauchy sequence in L^q(Ω) and therefore convergent. By the
continuous embedding L^q(Ω) → L^1(Ω), the limit has to be u.
Assertion 2: We begin the proof for q = 1 with a preliminary remark. If ∇u = 0
for some u ∈ BV(Ω), then obviously u ∈ H^{1,1}(Ω), and by Lemma 6.79 we see
that u is constant. The rest of the argument can be done similarly to Lemma 6.81.
If the inequality did not hold, there would exist a sequence (u_n) in BV(Ω) with
∫_Ω u_n dx = 0, ‖u_n‖_1 = 1, and ‖∇u_n‖_M ≤ 1/n. In particular, we would have
The notion of trace allows one to formulate and prove a property that distin-
guishes BV(Ω) from the spaces H^{1,p}(Ω).
Theorem 6.111 (Zero Extension of BV Functions) For a bounded Lipschitz
domain Ω ⊂ R^d, the zero extension E : BV(Ω) → BV(R^d) is continuous, and
where ν is the outer unit normal and T u is the trace on ∂Ω defined in Theo-
rem 6.110.
Corollary 6.112 For a bounded Lipschitz domain Ω ⊂ R^d and a Lipschitz
subdomain Ω′ ⊂ Ω with Ω′ ⊂⊂ Ω, for u_1 ∈ BV(Ω′), u_2 ∈ BV(Ω\Ω′), and
u = u_1 + u_2 (with implicit zero extension) one has that u ∈ BV(Ω) and
Here we take the trace of u_1 on ∂Ω′ with respect to Ω′, and the trace of u_2 on
∂(Ω\Ω′) with respect to Ω\Ω′.
The following result relates functions of bounded total variation and the perimeter
of their sublevel sets.
Theorem 6.113 (Co-area Formula) For a bounded Lipschitz domain Ω ⊂ R^d
and u ∈ BV(Ω) it holds that

‖∇u‖_M = ∫_Ω 1 d|∇u| = ∫_R Per({x ∈ Ω : u(x) ≤ t}) dt = ∫_R TV(χ_{{u≤t}}) dt.

In other words, the total variation is the integral over all perimeters of the sublevel
sets.
The proofs of the previous three theorems can be found, e.g., in [5] and [61].
Using the co-area formula, one sees that for u ∈ BV(Ω) and h : R → R strictly
increasing and continuously differentiable with ‖h′‖_∞ < ∞, the rescaled versions
h ∘ u are also contained in BV(Ω). The value TV(h ∘ u) depends only on the sublevel
sets of u; see Exercise 6.34.
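The co-area formula can be checked directly in a discrete setting. The sketch below (ours, with an illustrative random integer-valued image) uses the anisotropic (ℓ^1) finite-difference total variation, for which the discrete co-area identity holds exactly: the TV equals the sum of the discrete perimeters of all integer sublevel sets.

```python
import numpy as np

def tv_aniso(u):
    """Anisotropic (l1) finite-difference total variation."""
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def perimeter(mask):
    """Discrete perimeter of a sublevel set: TV of its characteristic function."""
    return tv_aniso(mask.astype(float))

rng = np.random.default_rng(0)
u = rng.integers(0, 5, size=(64, 64)).astype(float)   # integer-valued test image

# co-area formula: TV(u) = sum over thresholds t of Per({u <= t})
thresholds = np.arange(u.min(), u.max())               # t = 0, 1, 2, 3
coarea = sum(perimeter(u <= t) for t in thresholds)
print(tv_aniso(u), coarea)                              # the two values coincide
```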
Now we can start to use TV as an image model in variational problems.
Theorem 6.114 (Existence of Solutions with Total Variation Penalty) Let Ω ⊂
R^d be a bounded Lipschitz domain, q ∈ ]1, ∞[ with q ≤ d/(d − 1), and let Φ :
L^q(Ω) → R_∞ be proper on BV(Ω), convex, lower semicontinuous on L^q(Ω), and
coercive in the following sense:

‖u − (1/|Ω|) ∫_Ω u dx‖_q bounded and |∫_Ω u dx| → ∞   ⇒   Φ(u) → ∞.

Then the functional Φ + TV possesses a minimizer in L^q(Ω).
The assertion is also true if Φ is bounded from below and ϕ is only coercive.
Proof One argues similarly to the proof of Theorem 6.84; the respective properties
for the TV functional follow from Lemmas 6.105 and 6.108 as well as Corol-
lary 6.109. ∎
Analogously to Theorem 6.86, we obtain the following (see also [1, 31]):
Theorem 6.115 (Tikhonov Functionals with Total Variation Penalty) For a
bounded Lipschitz domain Ω, q ∈ ]1, ∞[, a Banach space Y, and A ∈ L(L^q(Ω), Y)
one has the following implication: If
1. q ≤ d/(d − 1) and A does not vanish for constant functions, or
2. A is injective and rg(A) is closed,
then there exists for every u^0 ∈ Y, r ∈ [1, ∞[, and λ > 0 a solution u^* of the
problem

min_{u ∈ L^q(Ω)}  ‖Au − u^0‖_Y^r / r + λ TV(u).   (6.57)
∂ TV = ∇^* ∘ ∂‖·‖_M ∘ ∇.
Since v ∈ L^1_{|μ|}(Ω, R^d) was arbitrary and |μ| is finite, we can identify σ by duality
with an element (σ)_{|μ|} ∈ L^∞_{|μ|}(Ω, R^d) with ‖(σ)_{|μ|}‖_∞ ≤ 1. By the polar
decomposition μ = σ_μ |μ|, σ_μ ∈ L^∞_{|μ|}(Ω, R^d) (see Theorem 2.58), we obtain

⟨σ, μ⟩_{M^*×M} = ‖μ‖_M  ⇐⇒  ∫_Ω (σ)_{|μ|} · σ_μ d|μ| = ∫_Ω 1 d|μ|.

Since ‖(σ)_{|μ|}‖_∞ ≤ 1, it follows that 0 ≤ 1 − (σ)_{|μ|} · σ_μ |μ|-almost everywhere, and
hence the latter is equivalent to (σ)_{|μ|} · σ_μ = 1 |μ|-almost everywhere and hence
to (σ)_{|μ|} = σ_μ |μ|-almost everywhere (by the Cauchy-Schwarz inequality).
We rewrite the result more compactly: since the meaning is clear from context,
we write σ instead of (σ)_{|μ|}, and we also set σ_μ = μ/|μ|, since σ_μ is the |μ|-
almost everywhere uniquely defined density of μ with respect to |μ|. This gives
the characterization

∂‖·‖_M(μ) = { σ ∈ M(Ω, R^d)^* : ‖σ‖_{M^*} ≤ 1, σ = μ/|μ| |μ|-almost everywhere }.
(6.59)
The right-hand side can be estimated by the L^q norm only if σ · ν = 0 on ∂Ω. If this
is the case, then μ ↦ ∫_Ω σ dμ induces a continuous linear map σ̄ on M(Ω, R^d),
which satisfies for every u ∈ C^∞(Ω) that
Since test functions u are dense in L^q(Ω) (see Theorem 6.74), σ̄ ∈ dom ∇^* has
to hold. In some sense, dom ∇^* contains only the elements for which the normal
trace vanishes on the boundary. Since we will use another approach later, we will
not try to formulate a normal trace for certain elements in M(Ω, R^d)^*, but for the
time being, use the slightly sloppy characterization

∇^* = −div,   dom ∇^* = {σ ∈ M(Ω, R^d)^* : div σ ∈ L^{q^*}(Ω), σ · ν = 0 on ∂Ω}.
Collecting the previous results, we can describe the subgradient of the total
variation as

∂ TV(u) = { −div σ : ‖σ‖_{M^*} ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| |∇u|-almost everywhere }.
(6.60)
Unfortunately, some objects in the representation are not simple to deal with. As
already noted, the space M(Ω, R^d)^* does not have a characterization as a function
space (there are attempts to describe the biduals of C(K) for compact Hausdorff
spaces K [82], though). Also, we would like to have a better understanding of the
divergence operator on M(Ω, R^d)^*, especially under what circumstances one can
speak of a vanishing normal trace on the boundary. Therefore, we present a different
approach to characterizing ∂TV, which does not use the dual space of the space of
vector-valued Radon measures but only regular functions.
At the core of the approach lies the following normed space, which can be seen
as a generalization of the sets defined in (6.37) for p = 1:

D_{div,∞} = { σ ∈ L^∞(Ω, R^d) : div σ ∈ L^{q^*}(Ω) exists and there is a sequence (σ^n) in D(Ω, R^d)
with lim_{n→∞} ‖σ^n − σ‖_{q^*} + ‖div(σ^n − σ)‖_{q^*} = 0 },
‖σ‖_{div,∞} = ‖σ‖_∞ + ‖div σ‖_{q^*}.   (6.61)

We omit the dependence on q also for these spaces. In some sense, the elements of
D_{div,∞} satisfy σ · ν = 0 on ∂Ω, cf. Remark 6.89. Let us analyze this space further;
we first prove its completeness.
Lemma 6.116 Let Ω ⊂ R^d be a bounded Lipschitz domain and q ∈ ]1, ∞[. Then
D_{div,∞} according to the definition in (6.61) is a Banach space.
Proof For a Cauchy sequence (σ^n) in D_{div,∞} we have the convergence σ^n → σ
in L^∞(Ω, R^d) as well as div σ^n → w in L^{q^*}(Ω). By the closedness of the weak
divergence we have w = div σ, hence it remains to show the approximation
property from the definition (6.61). To that end, choose for every n a sequence (σ^{n,k})
of test functions (i.e., in D(Ω, R^d)) that approximate σ^n in the sense of (6.61). Now,
for every n there is a k_n such that

‖σ^{n,k_n} − σ^n‖_{q^*} + ‖div(σ^{n,k_n} − σ^n)‖_{q^*} ≤ 1/n.
The Banach space D_{div,∞} is well suited for the description of the subgradient of
TV, as the following lemma will show.
Lemma 6.118 (D_{div,∞}-Vector Fields and ∂ TV) For Ω ⊂ R^d a bounded Lipschitz
domain, q ∈ ]1, ∞[, and u ∈ BV(Ω) ∩ L^q(Ω), we have that w ∈ L^{q^*}(Ω) lies in
∂ TV(u) if and only if there exists σ ∈ D_{div,∞} such that

‖σ‖_∞ ≤ 1,   −div σ = w   and   −∫_Ω u div σ dx = TV(u).
there exists σ ∈ D_{div,∞} with

∫_Ω vw dx ≤ TV(v) for all v ∈ BV(Ω) ∩ L^q(Ω)   ⇐⇒   w = −div σ and ‖σ‖_∞ ≤ 1.

This assertion is true if and only if equality holds for the sets K_1 and K_2 defined by

K_1 = { w ∈ L^{q^*}(Ω) : ∫_Ω vw dx ≤ TV(v) for all v ∈ BV(Ω) ∩ L^q(Ω) },
K_2 = { −div σ : σ ∈ D_{div,∞} with ‖σ‖_∞ ≤ 1 }.
‖σ‖_∞ ≤ lim inf_{n→∞} ‖σ^n‖_∞ ≤ 1.

Using the same argument as in the end of the proof of Proposition 6.88, weak
convergence can be replaced by strong convergence, possibly yielding a different
sequence. Finally, this implies σ ∈ D_{div,∞}, and thus K_2 is closed.
Let us now show that K_2 ⊂ K_1: For w = −div σ ∈ K_2, according to
Remark 6.117, there exists a sequence (σ^n) in D(Ω, R^d) with ‖σ^n‖_∞ ≤ ‖σ‖_∞ ≤ 1
and lim_{n→∞} −div σ^n = −div σ = w in L^{q^*}(Ω). Using the supremum definition
of TV (6.52), this implies for arbitrary v ∈ BV(Ω) ∩ L^q(Ω) that

∫_Ω vw dx = lim_{n→∞} −∫_Ω v div σ^n dx ≤ TV(v),
T_u^ν : D_{div,∞} → L^∞_{|∇u|}(Ω)   with   ‖T_u^ν σ‖_∞ ≤ ‖σ‖_∞,

T_u^ν σ = σ · ∇u/|∇u|   in L^∞_{|∇u|}(Ω),

where ∇u/|∇u| is the sign of the polar decomposition of ∇u. Furthermore, T_u^ν is weakly
continuous in the sense that

σ^n ⇀ σ in L^{q^*}(Ω, R^d)  and  div σ^n ⇀ div σ in L^{q^*}(Ω)   ⇒   T_u^ν σ^n ⇀* T_u^ν σ in L^∞_{|∇u|}(Ω).
where the last equality is due to the convergence in L^{q^*}(Ω) and u ∈ L^q(Ω).
Additionally, ϕσ^n ∈ D(Ω, R^d) with div(ϕσ^n) = ϕ div σ^n + ∇ϕ · σ^n. Due to
u ∈ BV(Ω) and the characterization of TV in (6.52), this implies

|L(ϕ)| = lim_{n→∞} |∫_Ω u div(ϕσ^n) dx| = lim_{n→∞} |∫_Ω ϕσ^n d∇u|
≤ lim inf_{n→∞} ‖σ^n‖_∞ ∫_Ω |ϕ| d|∇u| ≤ ‖σ‖_∞ ‖ϕ‖_1,

where the latter norm is taken in L^1_{|∇u|}(Ω). Note that the set D(Ω) ⊂ C^∞(Ω) is
densely contained in this space (cf. Exercise 6.35). Therefore, L can be uniquely
extended to an element T_u^ν σ ∈ L^1_{|∇u|}(Ω)^* = L^∞_{|∇u|}(Ω) with ‖T_u^ν σ‖_∞ ≤ ‖σ‖_∞,
and hence the linear mapping T_u^ν : D_{div,∞} → L^∞_{|∇u|}(Ω) is continuous.
For σ ∈ D(Ω, R^d), we can choose σ^n = σ, and the construction yields for all
ϕ ∈ C^∞(Ω)

∫_Ω (T_u^ν σ)ϕ d|∇u| = L(ϕ) = −∫_Ω u div(ϕσ) dx = ∫_Ω ϕ σ · ∇u/|∇u| d|∇u|.

Since the test functions are dense in L^1_{|∇u|}(Ω), this implies the identity T_u^ν σ =
σ · ∇u/|∇u| in L^∞_{|∇u|}(Ω).
In order to establish the weak continuity, let (σ^n) and σ in D_{div,∞} be given as in
the assertion. For ϕ ∈ C^∞(Ω), we infer, due to the construction as well as the weak
convergence of (σ^n) and (div σ^n), that

lim_{n→∞} ∫_Ω (T_u^ν σ^n)ϕ d|∇u| = lim_{n→∞} −∫_Ω u(ϕ div σ^n + ∇ϕ · σ^n) dx
= −∫_Ω u(ϕ div σ + ∇ϕ · σ) dx = ∫_Ω (T_u^ν σ)ϕ d|∇u|.

Consequently, T_u^ν σ^n ⇀* T_u^ν σ in L^∞_{|∇u|}(Ω), again due to the density of the test
functions. ∎
Remark 6.120 (T_u^ν as Normal Trace Operator) The mapping T_u^ν is the unique
element in L(D_{div,∞}, L^∞_{|∇u|}(Ω)) that satisfies T_u^ν σ = σ · ∇u/|∇u| for all σ ∈ D(Ω, R^d)
and that exhibits the weak continuity described in Lemma 6.119. (This fact is
implied by the approximation property specified in the definition of D_{div,∞}.) If we
view −∇u/|∇u| as a vector field of outer normals with respect to the level sets of u, we
can also interpret T_u^ν as the normal trace operator.
Thus, we write σ · ∇u/|∇u| = T_u^ν σ for σ ∈ D_{div,∞}. In particular, due to the weak
continuity, the following generalization of the divergence theorem holds:

u ∈ BV(Ω),  σ ∈ D_{div,∞}:   −∫_Ω u div σ dx = ∫_Ω σ · ∇u/|∇u| d|∇u|.
(6.64)
The assertions of Lemmas 6.118 and 6.119 are the crucial ingredients for the
desired characterization of the subdifferential of the total variation.
Theorem 6.121 (Characterization of ∂ TV with Normal Trace) Let Ω ⊂ R^d be
a bounded Lipschitz domain, q ∈ ]1, ∞[, q ≤ d/(d − 1), and u ∈ BV(Ω). Then for
w ∈ L^{q^*}(Ω), one has the equivalence

w ∈ ∂ TV(u)  ⇐⇒  there exists σ ∈ D_{div,∞} with  ‖σ‖_∞ ≤ 1,  −div σ = w,  σ · ∇u/|∇u| = 1,
where σ · ∇u/|∇u| = 1 represents the |∇u|-almost everywhere identity for the normal
trace of σ.
In particular, using the alternative notation (6.62), we have that

∂ TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω and σ · ∇u/|∇u| = 1 |∇u|-almost everywhere },

with σ ∈ L^∞(Ω, R^d) and div σ ∈ L^{q^*}(Ω).
Proof In view of the assertion in Lemma 6.118, we first consider an arbitrary σ ∈
D_{div,∞} with ‖σ‖_∞ ≤ 1. For this σ there exists, according to Lemma 6.119, the
normal trace T_u^ν σ = σ · ∇u/|∇u| ∈ L^∞_{|∇u|}(Ω) with |σ · ∇u/|∇u|(x)| ≤ 1 for |∇u|-almost all
x ∈ Ω. Together with (6.64), this implies

−∫_Ω u div σ dx = ∫_Ω 1 d|∇u|  ⇐⇒  ∫_Ω (1 − σ · ∇u/|∇u|) d|∇u| = 0,

and since the integrand on the right-hand side is |∇u|-almost everywhere nonnega-
tive, we infer the equivalence to σ · ∇u/|∇u| = 1 |∇u|-almost everywhere.
According to Lemma 6.118, w ∈ ∂ TV(u) if and only if there exists σ ∈ D_{div,∞}
with ‖σ‖_∞ ≤ 1 and −div σ = w such that

−∫_Ω u div σ dx = ∫_Ω 1 d|∇u|.
of σ^n ∈ L^∞_{|∇u|}(Ω, R^d) is not greater than 1. Furthermore, due to 1 − |σ^n|^2 ≥ 0,

(1/2)|σ^n − ∇u/|∇u||^2 = (1/2)|σ^n|^2 − σ^n · ∇u/|∇u| + 1/2
≤ 1/2 + (1/2)|σ^n|^2 − σ^n · ∇u/|∇u| + 1/2 − (1/2)|σ^n|^2 = 1 − σ^n · ∇u/|∇u|.
The weak continuity of the normal trace now implies the convergence

lim_{n→∞} ∫_Ω |σ^n − ∇u/|∇u||^2 d|∇u| ≤ lim_{n→∞} 2 ∫_Ω (1 − σ^n · ∇u/|∇u|) d|∇u| = 0,

i.e., we have lim_{n→∞} σ^n = ∇u/|∇u| in L^2_{|∇u|}(Ω, R^d) and, due to the finiteness of the
measure |∇u|, also in L^1_{|∇u|}(Ω, R^d).
We now say that σ ∈ D_{div,∞} has a (full) trace T_u^d σ ∈ L^∞_{|∇u|}(Ω, R^d) if
and only if for every approximating sequence (σ^n) as above, one has σ^n → T_u^d σ
in L^1_{|∇u|}(Ω, R^d); such a trace does not have to exist, nor does the corresponding mapping have to
be continuous.
Using this notion of a trace and writing, slightly abusing notation, σ = T_u^d σ, we
can express ∂ TV(u) by

∂ TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| |∇u|-almost everywhere },

where throughout, we assume σ ∈ L^∞(Ω, R^d), div σ ∈ L^{q^*}(Ω), and the existence
of the full trace of σ in L^∞_{|∇u|}(Ω, R^d).
We now have three equivalent characterizations of the subgradient of the total
variation at hand. In order to distinguish them, let us summarize them here again.
Recall that for u ∈ BV(Ω) ∩ L^q(Ω), ∇u/|∇u| ∈ L^∞_{|∇u|}(Ω, R^d) denotes the unique
element of the polar decomposition of the Radon measure ∇u, i.e., ∇u = (∇u/|∇u|) |∇u|.
σ · ∇u/|∇u| = T_u^ν σ with the continuous normal trace operator T_u^ν : D_{div,∞} →
L^∞_{|∇u|}(Ω) according to Lemma 6.119.
3. D_{div,∞} trace representation (Remark 6.122):

∂ TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| |∇u|-a.e. }.
Remark 6.123 (∂ TV and the Mean Curvature) For u ∈ H^{1,1}(Ω), one has
(∇u)_M = (∇u)_{L^1} L^d, which implies that |(∇u)_M|-almost everywhere is equivalent
to (Lebesgue-)almost everywhere in {(∇u)_{L^1} ≠ 0}. Furthermore, we have the
agreement of the signs

(∇u)_M/|(∇u)_M|(x) = (∇u)_{L^1}(x)/|(∇u)_{L^1}(x)|   for almost all x ∈ {(∇u)_{L^1} ≠ 0}.

Let us further note that the trace T_u^d σ (cf. Remark 6.122) exists for every σ ∈
D_{div,∞}. This implies, writing ∇u = (∇u)_{L^1}, that

∂TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| a.e. in {∇u ≠ 0} }.
κ = −∂ TV(u).
for a solution to

min_{u ∈ L^q(Ω)}  (1/q) ∫_Ω |u − u^0|^q dx + λ TV(u)   (6.65)

are satisfied. Writing κ^* = div σ^* and interpreting this as the mean curvature of the
level sets of u^*, we can consider u^* as the solution of the equation
Remark 6.125 The total variation penalty term was presented in the form of the
denoising problem with quadratic data term for the first time in [121]. For that
reason, the cost functional in (6.65) is also named after the authors as the Rudin-
Osher-Fatemi functional. Since then, TV has become one of the standard models in
image processing.
By means of Eq. (6.66) or rather its interpretation as an equation for the mean cur-
vature, we can gain a qualitative understanding of the solutions of problem (6.65).
For this purpose, we first derive a maximum principle again.
Lemma 6.126 (Maximum Principle for L^q-TV Denoising) If in the situation of
Example 6.124 one has L ≤ u^0 ≤ R almost everywhere in Ω for some L, R ∈ R, then the
solution u^* of (6.65) also satisfies L ≤ u^* ≤ R almost everywhere in Ω.
Proof Let (u^n) be a sequence in C^∞(Ω) such that u^n → u^* in L^q(Ω) as well as
TV(u^n) → TV(u^*). According to Lemma 6.106, such a sequence exists; without
loss of generality, we can assume that u^n → u^* even holds pointwise almost
everywhere in Ω (by applying the theorem of Fischer-Riesz, Proposition 2.48).
As seen in the proof of the maximum principle in Proposition 6.95, we set
v^n = min(R, max(L, u^n)) as well as v^* = min(R, max(L, u^*)). Analogously
to that situation, we see that one always has |v^n − u^0| ≤ |u^n − u^0| almost
everywhere in Ω. Together with the pointwise almost everywhere convergence and
Furthermore, due to the chain rule for Sobolev functions (Lemma 6.75), we have
v^n ∈ H^{1,1}(Ω) with ∇v^n = ∇u^n on {L < u^n < R} and ∇v^n = 0 otherwise, i.e.,

TV(v^n) = ∫_Ω |∇v^n| dx ≤ ∫_Ω |∇u^n| dx = TV(u^n).

If F denotes the cost functional in (6.65), then the choice of (u^n), the properties of
(v^n), and the lower semicontinuity of F in L^q(Ω) imply

F(v^*) ≤ lim inf_{n→∞} F(v^n) ≤ (1/q) ∫_Ω |u^* − u^0|^q dx + lim inf_{n→∞} λ TV(u^n) = F(u^*).

Therefore, v^* is a minimizer, and due to uniqueness, we have u^* =
min(R, max(L, u^*)), which proves the assertion. ∎
Like the method in Application 6.94, the variational denoising presented in
Example 6.124 yields in particular a solution u^* ∈ L^∞(Ω) if u^0 ∈ L^∞(Ω). In
this case, we also obtain |u^0 − u^*|^{q−2}(u^0 − u^*) ∈ L^∞(Ω), and since the Euler-
Lagrange equation (6.66) holds, we have κ^* = div σ^* ∈ L^∞(Ω). Interpreting κ^*
as the mean curvature, we conclude that the mean curvatures of the level sets of u^*
have to be essentially bounded. This condition still allows u^* to exhibit
discontinuities (in contrast to the solutions associated with the Sobolev penalty term
in (6.39)), but corners and objects with high curvature cannot be reproduced.
Figure 6.19 shows some numerical examples of this method. We can see that it
yields very good results for piecewise constant functions, especially in terms of the
reconstruction of the object boundaries while simultaneously removing the noise. If
the image is not piecewise constant, as is usually the case for natural images, artifacts
often arise that make the result appear "blocky" or "staircased." In this context,
one again speaks of "staircasing artifacts" or the "staircasing effect."
The effect of the regularization parameter λ can be observed in
Fig. 6.20 for q = 2: for larger values of λ, the size of the details that are no longer
reconstructed in the smoothed images also increases. Relevant edges, however,
are preserved. An undesired effect, which appears for large λ, is a reduction of
contrast. One can show that this is a consequence of the quadratic error term and
that it is possible to circumvent it by a transition to the L1 -norm (L1 -TV, cf.
[36, 55]). Solutions of this problem even satisfy a variant of the gray value scaling
invariance [GSI] of Chap. 5: for suitable strictly increasing h : R → R and for a
minimizer u∗ of the L1 -TV problem for the data u0 , the scaled version h ◦ u∗ is a
minimizer for the scaled data h ◦ u0 ; see Exercise 6.38 for more details.
Fig. 6.19 Denoising with total variation penalty. Top: Left a noisy natural image (see Fig. 6.11
for the original), right the solution of (6.65). Bottom: The noisy version of a piecewise constant
artificial image (original in Fig. 5.7), right the result of TV denoising. In both cases q = 2 as
well as λ chosen to optimize PSNR. For the natural image we see effects similar to Fig. 6.11 for
p = 1.1. The reconstruction for the artificial image is particularly good: The original has small
total variation and hence, fits exactly to the modeling assumptions for (6.65)
Fig. 6.20 Effect of L2 -TV denoising with varying regularization parameter. Top left: Original.
Second row: Solutions u∗ for different λ. Third row: The difference images (u∗ − u0 )/λ with
contours of the level sets of u∗ . In both rows λ = 0.1, λ = 0.3, and λ = 0.9 have been used. The
Euler-Lagrange equation (6.66) states that the difference images coincide with the curvature of the
level sets, and indeed, this can be seen for the depicted contours
an optimal solution of (6.67) if and only if there exists σ^* ∈ D_{div,∞} such that

|u^* ∗ k − u^0|^{q−2}(u^* ∗ k − u^0) ∗ k̄ = λ div σ^*   in Ω,
σ^* · ν = 0   on ∂Ω,
‖σ^*‖_∞ ≤ 1  and  σ^* = ∇u^*/|∇u^*|  |∇u^*|-almost everywhere,
(6.68)

where k̄ = D_{−id} k.
Fig. 6.21 Deconvolution with total variation penalty. Top: Left the convolved and noisy data u0
from Fig. 6.13, right a minimizer u∗ of the functional (6.67) for q = 2. Bottom: Left convolved and
noisy data (comparable to u0 in Fig. 6.3), right the respective deconvolved version. The parameter
λ has been optimized for maximal PSNR
Qualitatively, the results are comparable with those of the denoising problem
with total variation; in particular, one can see that the mean curvature is essentially
bounded if k ∈ L^q(Ω_0) (cf. the arguments in Application 6.97). Numerical results
for this method are reported in Fig. 6.21. Despite the noise, a considerably sharper
version of the noisy image could be reconstructed. However, details smaller than a
certain size have been lost. Moreover, the staircasing effect is a little stronger
than in Example 6.124 and leads to a blocky appearance, which is typical for total
variation methods.
Example 6.128 (Total Variation Inpainting) Let Ω ⊂ R^d be a bounded Lipschitz
domain and Ω′ with Ω′ ⊂⊂ Ω a Lipschitz subdomain on which the "true image" u† :
Ω → R is to be reconstructed. Moreover, we assume that u†|_{Ω\Ω′} ∈ BV(Ω\Ω′),
and so the zero extension u^0 of u†|_{Ω\Ω′} is, by Theorem 6.111, in BV(Ω). As image
for some given q ∈ ]1, ∞[ with q ≤ d/(d − 1). Using Theorem 6.114 and the same
techniques as in Application 6.98, we can prove the existence of such a u^*; however,
we do not know whether it is unique, since TV is not strictly convex.
For the optimality conditions we analyze the total variation functional on the set

K = { v ∈ L^q(Ω) : v = u^0 almost everywhere on Ω\Ω′ }.
rule for subgradients can be applied in this situation (see Exercise 6.14). If we
denote by A the mapping u ↦ u χ_{Ω′}, we have that rg(A) = X_2, and according
to Exercise 6.15, we can apply the chain rule for subgradients to

F_1 = TV ∘ T_{u^0} ∘ A
where u^0|_{∂(Ω\Ω′)} is the trace of u^0 on ∂(Ω\Ω′) with respect to Ω\Ω′ and u|_{∂Ω′} is the
trace on ∂Ω′ with respect to Ω′. Since div σ is arbitrary on Ω\Ω′, σ plays no role
there, and we can modify the condition for the trace of σ accordingly to

σ = ∇(u|_{Ω′})/|∇(u|_{Ω′})|   |∇(u|_{Ω′})|-almost everywhere,
σ = ν    on {u|_{∂Ω′} < u^0|_{∂(Ω\Ω′)}},
σ = −ν   on {u|_{∂Ω′} > u^0|_{∂(Ω\Ω′)}},
In Ω′ we can see div σ^* as the mean curvature of the level sets of u^*, and the
optimality conditions tell us that it has to vanish there. Hence, by definition,
every level set is a so-called minimal surface, and this notion underlies a whole
theory for BV functions; see, e.g., [65]. These minimal surfaces are connected to
the level sets of u^0 on the boundary ∂Ω′ where the traces of u^*|_{∂Ω′} and u^0|_{∂(Ω\Ω′)}
coincide. In this case we get the impression that u^* indeed connects the boundaries
of objects. However, it can happen that u^* jumps at some parts of ∂Ω′. According
to (6.71) this can happen only under special circumstances, but still, in this case
some level sets "end" at that point, and this gives results in which some objects
appear to be "cut off."
In the case of images, i.e., for d = 2, we even have that vanishing mean
curvature means that TV inpainting connects object boundaries by straight lines
(see Exercise 6.40). In fact, this can be seen in Figs. 6.22 and 6.23.
Fig. 6.22 Illustration of total variation inpainting. Left: Given data; the black region is the region
which is to be restored (see also Fig. 6.14). Right: The solution of the minimization problem (6.69).
Edges of larger objects are reconstructed well, but finer structures are disconnected more often
Fig. 6.23 TV inpainting connects level sets with straight lines. Top row, left: An artificial image
with marked inpainting domains of increasing width. Right and below: Solutions u^* of the
inpainting problem (6.69) for these regions. One clearly sees that some level sets are connected by
straight lines. At the points on the boundary ∂Ω′ where this does not happen, which are the points
where the solutions jump, such a connection would increase the total variation
In conclusion, we note that the TV model has favorable properties for the
reconstruction of image data, but the solutions of TV inpainting problems like (6.69)
may jump at the boundary of the inpainting domain, and hence the inpainting
domain may still be visible after inpainting. Moreover, object boundaries can be
connected only by straight lines which is not necessarily a good fit for the rest of the
object’s shape.
We also note that the solutions of (6.69) obey a maximum principle similar to
inpainting with Sobolev semi-norm (Application 6.98). Moreover, a variant of gray
value scaling invariance [GSI] from Chap. 5 is satisfied, similar to L1 -TV denoising
(Exercise 6.39).
Example 6.129 (Interpolation with Minimal Total Variation) We can also use the
TV functional in the context of Application 6.100. We recall the task: for a discrete
image U^0 ∈ R^{N×M} we want to find a continuous u^* : Ω → R with Ω = ]0, N[ ×
]0, M[ such that u^* is an interpolation of the data U^0. For a linear, continuous, and
surjective sampling operator A : L^q(Ω) → R^{N×M} with q ∈ ]1, 2] we require that
Au^* = U^0 holds. Moreover, u^* should correspond to some image model, in this
case the total variation model. This leads to the minimization problem
Hence, optimal solutions u∗ satisfy an equation for the mean curvature that also has
to lie in the subspace spanned by the {wi,j }. If wi,j ∈ L∞ () for all i, j , then the
mean curvature of the level sets of u∗ has to be essentially bounded.
The actual form of solutions depends on the choice of the functions wi,j , i.e.,
on the sampling operator A; see Fig. 6.24 for some numerical examples. If one
chooses the mean value over squares, i.e. wi,j = χ]i−1,i[×]j −1,j [ , the level sets
necessarily have constant curvature on these squares. The curvature is determined
by λ∗i,j . The level sets of u∗ on ]i − 1, i[ × ]j − 1, j[ are, in the case of λ∗i,j = 0,
line segments (similar to Example 6.128), and are segments of circles in other cases
Fig. 6.24 Examples of TV interpolation with perfect low-pass filter. Outer left and right: Original
images U 0 . Middle: Solutions u∗ of the TV interpolation problem with eightfold magnification.
All images are shown with the same resolution
Fig. 6.25 TV interpolation with mean values over squares leads to solutions with piecewise
constant curvature of the level sets. Left: Original images U^0 (9 × 9 pixels). Middle: Solutions
u^* of the respective TV interpolation problem with 60-fold magnification. Right: Level sets of u^*
together with a grid of the original image
(see Exercise 6.40). Hence, TV interpolation is well suited for images that have
these characteristics. For images with a more complex geometry, however, we may
still hope that the geometry is well approximated by these segments. On the other
hand, it may also happen that straight lines are interpolated by segments of varying
curvature. This effect is strongest if the given data U^0 fits the unknown "true"
image only loosely; see Fig. 6.25.
In the following we will briefly show how the variational methods introduced above
can be extended to color images. We recall: depending on the choice of the color space, a
color image has N components and hence can be modeled as a function u : Ω →
R^N. We already discussed some effects of the choice of the color space, be it RGB or
HSV, in Chap. 5. We also noted that methods that couple the color
components in some way usually lead to better results than methods without such
coupling. Hence, we focus on the development of methods that couple the color
components; moreover, we restrict ourselves to the RGB color space.
As an example, let us discuss the denoising problem with Sobolev semi-norm
or total variation from Application 6.94 and Example 6.124, respectively. Let u^0 ∈
L^q(Ω, R^N), N ≥ 1, be a given noisy color image. The result of the variational
problem (6.39) applied to all color channels separately amounts to the solution of

min_{u ∈ L^q(Ω,R^N)}  (1/q) Σ_{i=1}^{N} ∫_Ω |u_i − u^0_i|^q dx  +  { (λ/p) Σ_{i=1}^{N} ∫_Ω |∇u_i|^p dx   if p > 1,
                                                                  λ Σ_{i=1}^{N} TV(u_i)                if p = 1,
(6.74)
respectively. To couple the color channels, we can choose different vector norms
in R^N for the data term and different matrix norms in R^{N×d} for the penalty term.
We want to do this in a way such that the channels do not separate, i.e., such that
the terms are not both sums over the contributions of the channels i = 1, . . . , N.
We focus on the pointwise matrix norm for ∇u, since one can see the influence of
the norms more easily in this case, and choose the usual pointwise Euclidean vector
norm on L^q(Ω, R^N):

1 ≤ q < ∞:   ‖u‖_q = (∫_Ω (Σ_{i=1}^{N} |u_i(x)|^2)^{q/2} dx)^{1/q},   ‖u‖_∞ = ess sup_{x∈Ω} (Σ_{i=1}^{N} |u_i(x)|^2)^{1/2}.
The analogous choice for the matrix norm, i.e., the sum of the squares of the
entries, seems suitable; this amounts to the so-called Frobenius norm |∇u(x)|_F^2 =
|∇u(x)|^2 = Σ_{i=1}^{N} Σ_{j=1}^{d} |∂_{x_j} u_i(x)|^2, i.e., for 1 ≤ p < ∞,

‖∇u‖_p = (∫_Ω (Σ_{i=1}^{N} Σ_{j=1}^{d} |∂u_i/∂x_j (x)|^2)^{p/2} dx)^{1/p},   ‖∇u‖_∞ = ess sup_{x∈Ω} (Σ_{i=1}^{N} Σ_{j=1}^{d} |∂u_i/∂x_j (x)|^2)^{1/2}.
with K = min(d, N) and (η ⊗ ξ)_{i,j} = η_i ξ_j (see, e.g., [102]). The values σ_k(x) are
uniquely determined up to reordering. In the case N = 1 one such decomposition
is σ_1(x) = |∇u(x)|, ξ_1(x) = ∇u/|∇u|(x), and η_1(x) = 1. For N > 1 we can interpret
ξ_k(x) as a "generalized" normal direction for which the color u(x) changes in the
direction η_k(x) at the rate σ_k(x). If σ_k(x) is large (or small, respectively), then
the color changes in the direction ξ_k(x) a lot (or only slightly, respectively). In
particular, max_{k=1,...,K} σ_k(x) quantifies the intensity of the largest change in color.
The way to define a suitable matrix norm for ∇u is to use this intensity as a norm:

|∇u(x)|_spec = max_{k=1,...,K} σ_k(x).

This is indeed a matrix norm on R^{N×d}, the so-called spectral norm. It coincides
with the operator norm of ∇u(x) as a linear mapping from R^d to R^N. To see this,
note that for z ∈ R^d with |z| ≤ 1 we get, by orthonormality of the η_k(x) and ξ_k(x), the
Pythagorean theorem, and Parseval's identity, that

|∇u(x)z|^2 = Σ_{k=1}^{K} σ_k(x)^2 (ξ_k(x) · z)^2 ≤ |∇u(x)|^2_spec Σ_{k=1}^{K} (ξ_k(x) · z)^2 ≤ |∇u(x)|^2_spec.
This allows us to define respective Sobolev semi-norms and total variation: For
1 ≤ p < ∞ let

‖∇u‖_{p,spec} = (∫_Ω |∇u(x)|^p_spec dx)^{1/p},   ‖∇u‖_{∞,spec} = ess sup_{x∈Ω} |∇u(x)|_spec,

as well as

TV_spec(u) = sup { ∫_Ω u · div v dx : v ∈ D(Ω, R^{N×d}), ‖v‖_{∞,spec} ≤ 1 }.   (6.77)
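The two pointwise matrix norms are easy to compare numerically. The following sketch (ours; the random gradient field is only for illustration) computes the per-pixel Frobenius norm and, via a singular value decomposition, the per-pixel spectral norm of a color-image gradient:

```python
import numpy as np

# per-pixel gradient of an N-channel image on a d-dimensional domain:
# array of shape (height, width, N, d); here a random field for illustration
rng = np.random.default_rng(1)
grad = rng.standard_normal((32, 32, 3, 2))

# Frobenius norm: square root of the sum of the squared entries per pixel
frob = np.sqrt((grad**2).sum(axis=(-2, -1)))

# spectral norm: largest singular value of the N x d matrix per pixel
spec = np.linalg.svd(grad, compute_uv=False)[..., 0]

# the spectral norm never exceeds the Frobenius norm
assert np.all(spec <= frob + 1e-12)
print(frob.mean(), spec.mean())
```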
Fig. 6.26 Illustration of variational L2 -TV denoising of color images. Top: The original u† with
some marked details. Middle: Left the images with additive chromatic noise u0 (PSNR(u0 , u† ) =
16.48 dB), right the solution u∗sep with separable penalty term (6.74) (PSNR(u∗sep , u† ) = 27.84 dB).
Bottom: Left the solution u∗ for the pointwise Frobenius matrix norm (6.76) (PSNR(u∗ , u† ) =
28.46 dB), right the solution u∗spec for the pointwise spectral norm (6.78) (PSNR(u∗spec , u† ) =
28.53 dB)
Fig. 6.27 Solution of the variational deconvolution problem with monochromatic noise and
blurred color data. Top: Original image u†. Bottom, left to right: the convolution kernel k, the
given data u^0, and the reconstruction u^* obtained by L^2-TV deconvolution (with Frobenius matrix
norm)
Fig. 6.28 Inpainting of color images with different models. From left to right: Given data with
inpainting region, solutions for H 1 , TV, and TVspec penalty, respectively
Fig. 6.29 Reconstruction of a color image from information along edges. Top: Left original u†
with two highlighted regions, right the given data u0 along edges and the inpainting region. Bottom:
Reconstruction with H 1 inpainting (left) and TV inpainting (with Frobenius matrix norm, right)
which is characteristic for total variation methods and which we have already seen for gray-value
images; cf. Fig. 6.23.
The use of the total variation with Frobenius matrix norm for the inpainting of
mostly homogeneous regions has some advantages over H 1 inpainting (which is
separable). In general, the reconstruction of edges is better and the “cropping effect”
is not as strong as in the case of scalar gray-valued TV inpainting; see Fig. 6.29.
For the sake of completeness, we mention that similar effects can be observed
for variational interpolation. Figure 6.30 shows a numerical example in which the
TVspec penalty leads to sharper color transitions than the TV penalty with Frobenius
matrix norm.
Fig. 6.30 Interpolation for color images. Top: Left the color image U 0 to be interpolated, right
the sinc interpolation (zoom factor 4). Bottom: Solution of the TV interpolation problem (left) and
the TVspec interpolation problem (right) for fourfold zooming
Our ultimate goal is, as it was in Chap. 5, where we developed methods based
on partial differential equations, to apply the method to concrete images. Since
our variational methods are genuinely minimization problems for functions on
continuous domains, we are faced with the question of appropriate discretization,
but we also need numerical methods to solve the respective optimization problems.
There exists a vast body of work on this topic, some of it developing methods for
special problems in variational imaging while other research develops fairly abstract
optimization concepts. In this section we will mainly introduce tools that allow us
to solve the variational problems we developed in this chapter. Our focus is more on
broad applicability of the tools than on best performance, high speed or efficiency.
However, these latter aspects should not be neglected, but we refer to the original
literature on these topics.
Let us start with the problem to find a solution for the convex minimization
problem
min F (u)
u∈X
A first very simple idea for solving Euler-Lagrange equations for some F is
motivated by the fact that these equations are often partial differential equations
of the form

−G(x, u(x), ∇u(x), ∇^2 u(x)) = 0   in Ω   (6.79)

∂u(t, x)/∂t = G(x, u(t, x), ∇u(t, x), ∇^2 u(t, x))   in ]0, ∞[ × Ω,   u(0, x) = f(x)   in Ω
(6.80)
from Application 6.94 is a nonlinear elliptic equation. The instationary version with
initial value f = u^0 reads

∂u/∂t − div(|∇u|^{p−2} ∇u) = |u^0 − u|^{q−2}(u^0 − u)   in ]0, ∞[ × Ω,
|∇u|^{p−2} ∇u · ν = 0   on ]0, ∞[ × ∂Ω,
u(0, · ) = u^0   in Ω,
and is a nonlinear diffusion equation. The methods from Sect. 5.4.1 allow a
numerical solution as follows. We choose a spatial stepsize h > 0, a time stepsize
τ > 0, denote by U^n the discrete solution at time nτ (and by U^0 the discrete and
noisy data), and we discretize the diffusion coefficient |∇u|^{p−2} by

A(U)_{i±1/2,j} = ((|∇U|_{i±1,j} + |∇U|_{i,j})/2)^{p−2},   |∇U|^2_{i,j} = (U_{i+1,j} − U_{i,j})^2/h^2 + (U_{i,j+1} − U_{i,j})^2/h^2,

and A(U)_{i,j±1/2} similarly, and obtain, with the matrix A(U) from (5.23), the
following semi-implicit method:

U^{n+1} = (id − (τ/h^2) A(U^n))^{-1} (U^n + τ |U^0 − U^n|^{q−2}(U^0 − U^n)).
In every time step we have to solve a linear system of equations, which can be done
efficiently.
In the case p < 2, the entries in A can become arbitrarily large or even infinite. This
leads to numerical problems. A simple workaround is the following trick: we choose a
"small" ε > 0 and replace |∇U| by a regularized variant bounded away from zero (for instance, √(|∇U|^2 + ε^2)).
The discrete images U will be defined on Ω′_h ∪ (∂Ω′)_h. For a given U we denote
by A(U)|_h the restriction of the matrix A(U) from Example 6.130 to Ω′_h, i.e., the
matrix in which we eliminated the rows and columns that belong to the indices that
are not in Ω′_h. A semi-implicit method is then given by the successive solution of
the linear system of equations

(id_h − (τ/h^2) A(U^n)|_h) U^{n+1}|_h = U^n|_h,
id_{(∂Ω′)_h} U^{n+1}|_{(∂Ω′)_h} = U^0|_{(∂Ω′)_h}.
The convolution introduces integral terms that appear in addition to the
differential terms. Hence, we also need to discretize the convolution in addition to
the partial differential equation (which can be done similarly to Example 6.130).
From Sect. 3.3.3 we know how this can be done. We denote the matrix that
implements the convolution with k on discrete images by B, and its adjoint, i.e.,
the matrix for the convolution with k̄, by B^*. The right-hand side of the equation is
affine linear in u, and we discretize it implicitly in the context of the semi-implicit
method, i.e.,

(U^{n+1} − U^n)/τ − (1/h^2) A(U^n) U^{n+1} = B^* U^0 − B^* B U^{n+1}.
∂u
= −JX−1 DF (u) for t > 0, u(0) = u0 ,
∂t
This shows that u(t) reduces the functional values with increasing t. This justifies
the use of gradient flows for minimization processes.
For real Hilbert spaces X we can generalize the notion of gradient flow to
subgradients. If F : X → R∞ is proper, convex, and lower semicontinuous, one
can show that for every u0 ∈ dom ∂F there exists a function u : [0, ∞[ → X that
solves, in some sense, the differential inclusion
−∂u/∂t(t) ∈ ∂F(u(t))   for t > 0,   u(0) = u_0.
The study of such problems is the subject of the theory of nonlinear semigroups and
monotone operators in Hilbert spaces, see, e.g., [21, 130].
The above examples show how one can use the well-developed theory of
numerics of partial differential equations to easily obtain numerical methods to
solve the optimality conditions of the variational problems. As we have seen in
Examples 6.130–6.132, sometimes some modifications are necessary in order to
avoid undesired numerical effects (cf. the case p < 2 in the diffusion equations).
The reason for this is the discontinuity of the differential operators or, more
abstractly, the discontinuity of the subdifferential. This problem occurs, in contrast
to linear maps, even in finite dimensions and leads to problems with numerical
methods.
Hence, we are going to develop another approach that does not depend on the
evaluation of general subdifferentials.
Again we consider the problem to minimize a proper, convex, and lower semicon-
tinuous functional F , now on a real Hilbert space X. In the following we identify,
via the Riesz map, X = X∗ . In particular, we will consider the subdifferential as a
graph in X × X, i.e., ∂F (u) is a subset of X.
Suppose that F is of the form F = F1 + F2 ◦ A, where F1 : X → R∞ is proper,
convex, and lower semicontinuous, A ∈ L(X, Y ) for some Banach space Y , and
F2 : Y → R is convex and continuously differentiable. Moreover, we assume that
F1 has a “simple” structure, which we explain in more detail later. Theorem 6.51
states that the Euler-Lagrange equation for the optimality of u∗ for the problem
is given by
Somewhat surprisingly, it turns out that the operation on the right-hand side of the
last formulation is single-valued.
Lemma 6.134 Let X be a real Hilbert space and F : X → R_∞ proper, convex, and
lower semicontinuous. For every σ > 0, one has that (id + σ∂F)^{-1} is characterized
by the mapping that maps u to the unique minimizer of

min_{v ∈ X}  ‖v − u‖_X^2 / 2 + σ F(v).   (6.83)

Moreover, the map (id + σ∂F)^{-1} is nonexpansive, i.e., for u_1, u_2 ∈ X,

‖(id + σ∂F)^{-1}(u_1) − (id + σ∂F)^{-1}(u_2)‖_X ≤ ‖u_1 − u_2‖_X.
Proof For u ∈ X we consider the minimization problem (6.83). The objective func-
tional is proper, convex, lower semicontinuous, and coercive, and by Theorem 6.31
there exists a minimizer v^* ∈ X. By strict convexity of the norm, this minimizer is
unique. By Theorems 6.43 and 6.51 (the norm is continuous in X), v ∈ X is a minimizer
if and only if

0 ∈ ∂((1/2)‖·‖_X^2 ∘ T_{−u})(v) + σ∂F(v)   ⇐⇒   0 ∈ v − u + σ∂F(v).

The latter is equivalent to v ∈ (id + σ∂F)^{-1}(u), and by uniqueness of the minimizer
we obtain that v ∈ (id + σ∂F)^{-1}(u) if and only if v = v^*. In particular,
(id + σ∂F)^{-1}(u) is single-valued.
To prove the inequality we first show monotonicity of ∂F (cf. Theorem 6.33 and
its proof). Let v^1, v^2 ∈ X and w^1 ∈ ∂F(v^1) as well as w^2 ∈ ∂F(v^2). By the
respective subgradient inequalities we get

(w^1, v^2 − v^1)_X ≤ F(v^2) − F(v^1),   (w^2, v^1 − v^2)_X ≤ F(v^1) − F(v^2),

as desired. ∎
Remark 6.135 The mapping (id + σ∂F)^{-1} is also called the resolvent of ∂F for
σ > 0. Another name is the proximal mapping of σF, denoted by prox_{σF}, but we
will stick to the resolvent notation in this book.
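As a quick numerical illustration of Lemma 6.134 (ours, not from the text), the sketch below compares a closed-form resolvent, namely soft thresholding for F(v) = |v| on R (this formula also appears for p = 1 in Example 6.138 below), with a direct numerical minimization of (6.83):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def resolvent_abs(u, sigma):
    """Closed-form resolvent of F(v) = |v|: soft thresholding."""
    return np.sign(u) * max(abs(u) - sigma, 0.0)

def resolvent_numeric(u, sigma):
    """Resolvent obtained by minimizing (6.83) directly."""
    obj = lambda v: 0.5 * (v - u)**2 + sigma * abs(v)
    return minimize_scalar(obj).x

for u in (-3.0, -0.2, 0.5, 2.0):
    print(resolvent_abs(u, 1.0), resolvent_numeric(u, 1.0))
```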
We revisit the result in (6.82) and observe that we have obtained an equivalent
formulation of the optimality condition for u∗ as a fixed point equation
u∗ = (id +σ ∂F1 )−1 ◦ (id −σ A∗ ◦ DF2 ◦ A) (u∗ ).
This leads to a numerical method immediately, namely to the fixed point iteration
un+1 = T (un ) = (id +σ ∂F1 )−1 ◦ (id −σ A∗ ◦ DF2 ◦ A) (un ). (6.84)
This method is known as “forward backward splitting”, see [46, 93]. It is a special
case of splitting methods, and depending on how the sum ∂F1 + A∗ DF2 A is split
up, one obtains different methods, e.g., the Douglas-Rachford splitting method or
the alternating direction method of multipliers; see [56].
To justify the fixed point iteration, we assume that the iterates (un ) converge
to some u. By assumptions on A and F2 we see that T is continuous, and hence
we also have the convergence T (un ) → T (u). Since T (un ) = un+1 , we see that
T (u) = u, and this gives the optimality of u. Hence, the fixed point iteration is, in
case of convergence, a continuous numerical method for the minimization of sums
of the form F1 + F2 ◦ A with convex functionals under the additional assumption
that F2 is continuously differentiable.
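As an illustrative sketch of the fixed point iteration (6.84) (the model, the step-size choice, and all names here are our assumptions, not taken from the text), consider F_1 = λ‖·‖_1 and F_2(v) = ½‖v − u^0‖^2 with a matrix A, so that the resolvent of σ∂F_1 is componentwise soft thresholding:

```python
import numpy as np

def forward_backward(A, u0, lam, sigma, iters=500):
    """Fixed point iteration (6.84) for min_u lam*||u||_1 + 0.5*||A u - u0||^2."""
    u = np.zeros(A.shape[1])
    for _ in range(iters):
        # forward (explicit) step on F2 o A: gradient A^T (A u - u0)
        v = u - sigma * A.T @ (A @ u - u0)
        # backward (implicit) step: resolvent of sigma*lam*||.||_1 (soft thresholding)
        u = np.sign(v) * np.maximum(np.abs(v) - sigma * lam, 0.0)
    return u

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[[5, 37, 80]] = [1.0, -2.0, 0.5]
u0 = A @ x_true + 0.01 * rng.standard_normal(40)

# a step size below 2/||A||^2 (here 1/||A||^2) keeps the iteration stable
sigma = 1.0 / np.linalg.norm(A, 2)**2
print(forward_backward(A, u0, lam=0.05, sigma=sigma)[[5, 37, 80]])
```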
Let us analyze in more detail the question whether and when (id + σ∂F_1)^{-1} can be computed
by elementary operations. We cannot expect this to be possible for
general functionals, but for several functionals that are interesting in our context,
there are indeed formulas. We begin with some elementary rules of calculus for
resolvents and look at some concrete examples.
Lemma 6.136 (Calculus for Resolvents) Let F_1 : X → R_∞ be a proper, convex,
and lower semicontinuous functional on the real Hilbert space X, with Y another
real Hilbert space and σ > 0.
1. For α ∈ R: if F_2(u) = F_1(u) + α, then (id + σ∂F_2)^{-1}(u) = (id + σ∂F_1)^{-1}(u);
2. for τ, λ > 0: if F_2(u) = τ F_1(λu), then (id + σ∂F_2)^{-1}(u) = λ^{-1}(id + στλ^2 ∂F_1)^{-1}(λu);
3. for u^0 ∈ X, w^0 ∈ X:
F_2 = F_1 ∘ T_{u^0} + (w^0, · )  ⇒  (id + σ∂F_2)^{-1} = T_{−u^0} ∘ (id + σ∂F_1)^{-1} ∘ T_{u^0 − σw^0};
4. for A ∈ L(X, Y) bijective with A^*A = id: if F_2(u) = F_1(Au), then (id + σ∂F_2)^{-1}(u) = A^*(id + σ∂F_1)^{-1}(Au);
5. if F_2 : Y → R_∞ is proper, convex, and lower semicontinuous, then

F_3(u, w) = F_1(u) + F_2(w)  ⇒  (id + σ∂F_3)^{-1}(u, w) = ((id + σ∂F_1)^{-1}(u), (id + σ∂F_2)^{-1}(w)).
Proof Assertions 1–3: The proof of the identities consists of obvious and elemen-
tary steps, which we omit here.
v^* solves  min_{v∈X}  ‖v − u‖_X^2/2 + σ F_1(Av)
⇔ v^* solves  min_{v∈rg(A^*)}  ‖A(v − u)‖_Y^2/2 + σ F_1(Av)
⇔ Av^* solves  min_{w∈Y}  ‖w − Au‖_Y^2/2 + σ F_1(w).

Here we used both the bijectivity of A and A^*A = id. The last formulation is
equivalent to v^* = A^*(id + σ∂F_1)^{-1}(Au), and thus

(id + σ∂F_2)^{-1} = A^* ∘ (id + σ∂F_1)^{-1} ∘ A.
(v^*, ω^*) solves  min_{v∈X, ω∈Y}  ‖v − u‖_X^2/2 + σ F_1(v) + ‖ω − w‖_Y^2/2 + σ F_2(ω)

⇔  v^* solves min_{v∈X} ‖v − u‖_X^2/2 + σ F_1(v),   and   ω^* solves min_{ω∈Y} ‖ω − w‖_Y^2/2 + σ F_2(ω).

The first minimization problem is solved only by v^* = (id + σ∂F_1)^{-1}(u), and the
second only by ω^* = (id + σ∂F_2)^{-1}(w), i.e.,

(id + σ∂F_3)^{-1}(u, w) = ((id + σ∂F_1)^{-1}(u), (id + σ∂F_2)^{-1}(w)),

as desired. ∎
Example 6.137 (Resolvent Maps)
1. Functionals in R
For F : R → R∞ proper, convex, and lower semicontinuous, dom ∂F
has to be an interval (open, half open, or closed), and every ∂F (t) is a closed
interval which we denote by [G− (t), G+ (t)] (where the values ±∞ are explicitly
allowed, but then are excluded from the interval). The functions G− , G+ are
monotonically increasing in the sense that G+ (s) ≤ G− (t) for s < t.
s + σ F (s) = t.
(cf. Example 6.49). Moreover, (u, v)_X = ‖u‖_X ‖v‖_X if and only if v = λu for
some λ ≥ 0, and hence the properties that define the set in the last equation are
equivalent to ‖v‖_X = (id + σ∂ϕ)^{-1}(‖u‖_X) and v = λu for some λ ≥ 0, which in
turn is equivalent to

v = (id + σ∂ϕ)^{-1}(‖u‖_X) u/‖u‖_X   if u ≠ 0,   v = 0   if u = 0.
By the monotonicity of ϕ, we have (id + σ∂ϕ)^{-1}(0) = 0, and we can write the
resolvent, using the convention 0/0 = 1, as

(id + σ∂(ϕ ∘ ‖·‖_X))^{-1}(u) = (id + σ∂ϕ)^{-1}(‖u‖_X) u/‖u‖_X.
F(u) = (Qu, u)_X/2 + (w, u)_X,

which is differentiable with derivative DF(u) = Qu + w. By the assumption
on Q,

F(u) + (DF(u), v − u)_X = (Qu, u)_X/2 + (w, u)_X + (Qu + w, v − u)_X
= −(Qu, u)_X/2 + (Qu + w, v)_X − (Qv, v)_X/2 + (Qv, v)_X/2
= −(Q(u − v), (u − v))_X/2 + (Qv, v)_X/2 + (w, v)_X
≤ F(v)

for all u, v ∈ X, and this implies, using Theorem 6.33, the convexity of F. For
given σ and u ∈ X we would like to calculate the resolvent (id + σ∂F)^{-1}(u).
One has u = v + σQv + σw if and only if v = (id + σQ)^{-1}(u − σw), and hence
min_{v ∈ K}  ‖v − u‖_X^2/2.
Hence, the resolvent is the orthogonal projection P_K onto K,
and in particular, the resolvent does not depend on σ. Without further information
on K we cannot say how the projection can be calculated, but in several special
cases, simple ways to do so exist.
(a) Let K be a nonempty closed interval in R. Depending on the boundedness,
we get the projection by clamping to the respective finite endpoints of the interval.
(b) If K is a finite-dimensional subspace with orthonormal basis {v^1, ..., v^{dim K}}, then

P_K(u) = Σ_{n=1}^{dim K} (v^n, u)_X v^n.

If one has only some basis {v^1, . . . , v^N} for a subspace K with dim K < ∞
and Gram-Schmidt orthonormalization would be too expensive, one can
still calculate the projection P_K(u) by solving the linear system of equations

Mx = b,   x ∈ R^N,   M_{i,j} = (v^i, v^j)_X,   b_j = (u, v^j)_X,

and setting P_K(u) = Σ_{i=1}^{N} x_i v^i. By definition, M is positive definite and
hence invertible. However, depending on the basis {v^i}, it may be badly
conditioned.
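A minimal sketch of these two projections (ours, with illustrative data) clamps to an interval and solves the Gram system for a non-orthonormal basis:

```python
import numpy as np

def project_subspace(u, basis):
    """Project u onto span{basis} by solving the Gram system M x = b."""
    V = np.column_stack(basis)      # columns v^1, ..., v^N (not necessarily orthonormal)
    M = V.T @ V                     # M_ij = (v^i, v^j)
    b = V.T @ u                     # b_j = (u, v^j)
    return V @ np.linalg.solve(M, b)

rng = np.random.default_rng(0)
v1, v2 = rng.standard_normal(5), rng.standard_normal(5)
u = rng.standard_normal(5)
p = project_subspace(u, [v1, v2])

# the residual u - p is orthogonal to the subspace
print(np.dot(u - p, v1), np.dot(u - p, v2))    # both close to zero

# projection onto a closed interval [a, b]: simple clamping
a, b = -1.0, 1.0
print(np.clip(2.5, a, b))                       # 1.0
```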
5. Convex integrands and summands
Let X = L^2(Ω, R^N) be the Lebesgue-Hilbert space with respect to a
measure space (Ω, F, μ) and let ϕ : R^N → R_∞ be proper, convex, and lower
semicontinuous with known resolvent (id + σ∂ϕ)^{-1}. Moreover, assume ϕ ≥ 0
and ϕ(0) = 0 if Ω has infinite measure, and ϕ bounded from below otherwise
(cf. Examples 6.23 and 6.29).
The minimization problem (6.83) corresponding to the resolvent of the
subgradient of

F(u) = ∫_Ω ϕ(u(x)) dx
reads

min_{v ∈ L^2(Ω,R^N)}  ∫_Ω (|v(x) − u(x)|^2/2 + σ ϕ(v(x))) dx.

In the special case of finite sums, we have an even more general result: for a given
family ϕ_1, . . . , ϕ_M : R^N → R_∞ of proper, convex, and lower semicontinuous
functionals one gets for u : {1, . . . , M} → R^N,

F(u) = Σ_{j=1}^{M} ϕ_j(u_j)  ⇒  ((id + σ∂F)^{-1}(u))_j = (id + σ∂ϕ_j)^{-1}(u_j).
Applying the rules of the calculus of resolvents and using the previous examples,
we obtain a fairly large class of functionals for which we can evaluate the resolvent
by elementary means and which can be used for practical numerical methods. One
important case is the pth power of the Lp norms.
Example 6.138 (Resolvent of ∂(1/p)‖·‖_p^p) For some measure space (Ω, F, μ) we
consider X = L^2(Ω, R^N) and the functionals

F(u) = (1/p) ∫_Ω |u(x)|^p dx

norm |·|, and by item 2 in Example 6.137 we have only to determine the resolvent
of ∂ϕ_p with

ϕ_p(t) = t^p/p if t ≥ 0, 0 otherwise,   for p < ∞,   ϕ_∞(t) = I_{]−∞,1]}(t)   for p = ∞,
as a function on R, and moreover only for positive arguments. Let us discuss
this for fixed t ≥ 0 and varying p. In the case p = 1 we can reformulate s =
(id + σ∂ϕ)^{-1}(t) by definition as

t ∈ [0, σ]  if s = 0,   t ∈ {s + σ}  otherwise,

in other words, s = max(0, t − σ) (soft thresholding).
For 1 < p < ∞ we have to solve

t = s + σ s^{p−1}

for s. For p = 2 this is the same as s = t/(1 + σ), and in all other cases this is
a nonlinear equation, but for t = 0 we have that s = 0 is always a solution. If
p > 2 is an integer, the problem is equivalent to finding a root of a polynomial of
degree p − 1. As is known, this problem can be solved in closed form using roots in
the cases p = 3, 4, 5 (with the quadratic formula, Cardano's method, and Ferrari's
method, respectively). By the substitution s^{p−1} = ξ we can extend this method to
the range p ∈ ]1, 2[; in this way, we can also treat the cases p = 3/2, 4/3, 5/4 exactly.
For all other p ∈ ]1, ∞[ we resort to a numerical method. For t > 0 we can employ
Newton's method, for example, and get the iteration

s_{n+1} = s_n + (t − s_n − σ s_n^{p−1}) / (1 + σ(p − 1) s_n^{p−2}).
The iterates are decreasing and converge, giving high precision after a few iterations,
if one initializes the iteration with

s_0 ≥ t   and   s_0 < (t/(σ(2 − p)))^{1/(p−1)}  if p < 2;

see Exercise 6.41. The only case remaining is p = ∞, but this is treated already in
item 4 of Example 6.137 and gives s = min(1, t).
• The case p = ∞

F(u) = I_{{‖v‖_∞ ≤ 1}}(u)  ⇒  (id + σ∂F)^{-1}(u) = min(1, |u|) u/|u| = u/max(1, |u|).

Hence, the resolvent is the pointwise projection onto the closed unit ball in R^N.
• The case 1 < p < ∞
Newton's method amounts to the following procedure:
1. Set v = |u| and choose some v^0 ∈ L^2(Ω) with

v^0(x) ≥ v(x)   almost everywhere in Ω,
v^0(x) < (v(x)/(σ(2 − p)))^{1/(p−1)}   almost everywhere in {v(x) ≠ 0}  if p < 2.
2. Iterate

v^{n+1} = v^n + (v − v^n − σ|v^n|^{p−1}) / (1 + σ(p − 1)|v^n|^{p−2}).

Then, with v^* denoting the pointwise limit of this iteration,

F = (1/p)‖·‖_p^p  ⇒  (id + σ∂F)^{-1}(u) = v^* u/|u|.
In many applications, however, F_2 may be continuous but not continuously differentiable. Hence, we aim for
a method that can treat more general functionals F_2.
At this point, Fenchel duality comes in handy. We assume that F_2 coincides with its Fenchel
biconjugate with respect to a Hilbert space Y, i.e., F_2 = F_2^{**} in Y. In the following
we identify the spaces Y = Y^* also for conjugation, i.e.,

for w ∈ Y:  F_2^*(w) = sup_{v∈Y} (w, v)_Y − F_2(v),   for v ∈ Y:  F_2(v) = sup_{w∈Y} (v, w)_Y − F_2^*(w),

and similarly for conjugation in X.
In the following we also need that the conclusion of Fenchel-Rockafellar duality
(see Theorem 6.68 for sufficient conditions) holds:
Recall that (u∗ , w∗ ) ∈ dom F1 × dom F2∗ is a saddle point of L if and only if for all
(u, w) ∈ dom F1 × dom F2∗ ,
For the Lagrange functional we define for every pair (u^0, w^0) ∈ dom F_1 × dom F_2^*
the restrictions L_{w^0} : X → R_∞, L_{u^0} : Y → R ∪ {−∞} by

L_{w^0}(u) = L(u, w^0) if u ∈ dom F_1, ∞ otherwise;   L_{u^0}(w) = L(u^0, w) if w ∈ dom F_2^*, −∞ otherwise.
It is simple to see, using the notions of Definition 6.60 and the result of Lemma 6.57,
that L_{w^0} ∈ Γ_0(X) and −L_{u^0} ∈ Γ_0(Y). Hence by Lemma 6.134 the resolvents
(id + σ∂L_{w^0})^{-1} and (id + τ∂(−L_{u^0}))^{-1} exist, and with the help of Lemma 6.136
Thus, the property of (u∗ , w∗ ) ∈ dom F1 × dom F2∗ being a saddle point of L is
equivalent to
The optimality conditions from Theorem 6.43 allow, for arbitrary σ, τ > 0, the
following equivalent formulations:

(u^*, w^*) saddle point  ⇔  0 ∈ ∂L_{w^*}(u^*),  0 ∈ ∂(−L_{u^*})(w^*)
⇔  u^* ∈ u^* + σ∂L_{w^*}(u^*),   w^* ∈ w^* + τ∂(−L_{u^*})(w^*)
⇔  u^* = (id + σ∂L_{w̄^*})^{-1}(u^*),   w^* = (id + τ∂(−L_{ū^*}))^{-1}(w^*),   ū^* = u^*,  w̄^* = w^*.
The pair (ū^*, w̄^*) ∈ X × Y has been introduced artificially and will denote the
points at which we take the resolvents of ∂L_{w̄^*} and ∂(−L_{ū^*}). Hence, we have again
a formulation of optimality in terms of a fixed point equation: the saddle points
(u^*, w^*) ∈ dom F_1 × dom F_2^* are exactly the elements of X × Y that satisfy the
coupled fixed point equations

u^* = (id + σ∂F_1)^{-1}(u^* − σA^*w̄^*),
w^* = (id + τ∂F_2^*)^{-1}(w^* + τAū^*),   (6.88)
ū^* = u^*,   w̄^* = w^*.
We obtain a numerical method by fixed point iteration: to realize the right hand side,
we need only the resolvents of ∂F1 and ∂F2∗ and the application of A and its adjoint
A∗ . Since the resolvents are nonexpansive (cf. Lemma 6.134) and A and A∗ are
continuous, the iteration is even Lipschitz continuous. We recall that no assumptions
on differentiability of F2 have been made.
From Eq. (6.88) we can derive a number of numerical methods, and we give a
few (see also [7]).
In practice one can save some memory, since we can overwrite un directly with
un+1 . Moreover, we note that one can interchange the roles of u and w, of course.
• Modified Arrow-Hurwicz method/Extra gradient method
The idea of the modified Arrow-Hurwicz method is not to use ū^n = u^n and
w̄^n = w^n but to carry these on and update them with explicit Arrow-Hurwicz
steps [113]:

u^{n+1} = (id + σ∂F_1)^{-1}(u^n − σA^*w̄^n),
w^{n+1} = (id + τ∂F_2^*)^{-1}(w^n + τAū^n),
ū^{n+1} = (id + σ∂F_1)^{-1}(u^{n+1} − σA^*w̄^n),
w̄^{n+1} = (id + τ∂F_2^*)^{-1}(w^{n+1} + τAū^n).
The earlier extra gradient method, proposed in [89], uses a similar idea, but
differs in that in the calculations of ūn+1 and w̄n+1 we evaluate the operators
A∗ and A at wn+1 and un+1 , respectively.
• Arrow-Hurwicz method with linear primal extra gradient/Chambolle-Pock
method
A method that seems to work well for imaging problems, proposed in [34], is
based on the semi-implicit Arrow-Hurwicz method with a primal "extra gradient
sequence" (ū^n). Instead of an Arrow-Hurwicz step, one uses a well-chosen linear
combination (based on ū^* = 2u^* − u^*):

w^{n+1} = (id + τ∂F_2^*)^{-1}(w^n + τAū^n),
u^{n+1} = (id + σ∂F_1)^{-1}(u^n − σA^*w^{n+1}),   (6.89)
ū^{n+1} = 2u^{n+1} − u^n.
Note that only the primal variable uses an “extra gradient,” and that the order in
which we update the primal and dual variable is important, i.e., we have to update
the dual variable first. Of course we can swap the roles of the primal and dual
variables, and in this case we could speak of a dual extra gradient. The memory
requirements are comparable to those in the explicit Arrow-Hurwicz method.
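A compact sketch of iteration (6.89) for the discrete L^2-TV denoising model min_u 1/(2λ)‖u − u^0‖^2 + TV(u) is given below (ours; the forward-difference gradient A, the parameter values, and the test image are illustrative assumptions). The dual resolvent is the pointwise projection onto the unit ball, and the step sizes satisfy στ‖A‖^2 ≤ 1:

```python
import numpy as np

def grad(u):
    """Discrete gradient A: forward differences with Neumann boundary."""
    g = np.zeros(u.shape + (2,))
    g[:, :-1, 0] = u[:, 1:] - u[:, :-1]
    g[:-1, :, 1] = u[1:, :] - u[:-1, :]
    return g

def div(w):
    """Discrete divergence, the negative adjoint of grad (-A^*)."""
    d = np.zeros(w.shape[:2])
    d[:, :-1] += w[:, :-1, 0]; d[:, 1:] -= w[:, :-1, 0]
    d[:-1, :] += w[:-1, :, 1]; d[1:, :] -= w[:-1, :, 1]
    return d

def tv_denoise_pd(u0, lam, iters=300):
    """Primal-dual iteration (6.89) for min_u 1/(2*lam)||u - u0||^2 + TV(u)."""
    sigma = tau = 0.99 / np.sqrt(8.0)     # sigma*tau*||A||^2 < 1 since ||A||^2 <= 8
    u = u0.copy(); ubar = u0.copy()
    w = np.zeros(u0.shape + (2,))
    for _ in range(iters):
        # dual step: resolvent of tau*dF2* = pointwise projection onto the unit ball
        w = w + tau * grad(ubar)
        w = w / np.maximum(1.0, np.sqrt((w**2).sum(axis=-1, keepdims=True)))
        # primal step: resolvent of sigma*dF1 for F1 = 1/(2*lam)||. - u0||^2
        u_new = (u + sigma * div(w) + (sigma / lam) * u0) / (1.0 + sigma / lam)
        # linear primal extra gradient
        ubar = 2.0 * u_new - u
        u = u_new
    return u

rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
print(np.abs(tv_denoise_pd(noisy, lam=0.15) - clean).mean())
```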
For all the above methods there are conditions on the step sizes σ and τ
that guarantee convergence in some sense. In comparison to the classical Arrow-
Hurwicz method, the modified Arrow-Hurwicz method and the extra gradient
method need weaker conditions to ensure convergence, and hence they are of
practical interest, despite the slightly higher memory requirements. For details we
refer to the original papers and treat only the case of convergence of the Arrow-
Hurwicz method with linear primal extra gradient. We follow the presentation
of [34] and first derive an estimate for one Arrow-Hurwicz step with fixed (ū, w̄).
Lemma 6.139 Let X, Y be real Hilbert spaces, A ∈ L(X, Y), F_1 ∈ Γ_0(X), F_2^* ∈
Γ_0(Y), and σ, τ > 0. Moreover, let Z = X × Y, and for elements z = (u, w),
z̄ = (ū, w̄) in X × Y, let
If z̄, z^n, z^{n+1} ∈ Z with z̄ = (ū, w̄), z^n = (u^n, w^n), and z^{n+1} = (u^{n+1}, w^{n+1}), and
the equations

u^{n+1} = (id + σ∂F_1)^{-1}(u^n − σA^*w̄)
(6.90)
w^{n+1} = (id + τ∂F_2^*)^{-1}(w^n + τAū)
are satisfied, then for every z = (u, w), u ∈ dom F_1, w ∈ dom F_2^*, one has the
estimate
which we rearrange to
Adding the right-hand sides, using the definition of L, and inserting ±(w^{n+1}, Au^{n+1})_Y, we obtain

F_1(u) − F_2^*(w^{n+1}) + F_2^*(w) − F_1(u^{n+1}) + (w̄, A(u − u^{n+1}))_Y − (w − w^{n+1}, Aū)_Y
= L(u, w^{n+1}) − L(u^{n+1}, w) + (w̄, A(u − u^{n+1}))_Y − (w^{n+1}, Au)_Y + (w^{n+1}, Au^{n+1})_Y

Adding the respective left-hand sides, we obtain the scalar product in Z, which we
reformulate as
Moving (1/2)‖z − z^{n+1}‖_Z^2 − (1/2)‖z − z^n‖_Z^2 to the right-hand side and all terms in (6.92)
to the left-hand side, we obtain the desired inequality. ∎
Next we remark that the iteration (6.89) can be represented by (6.90): the choice
ū = 2un − un−1 and w̄ = wn+1 with u−1 = u0 is obviously the correct one,
given that we initialize with ū0 = u0 , which we assume in the following. Hence, we
can use the estimate (6.91) to analyze the convergence. To that end, we analyze the
scalar products in the estimate and aim to estimate them from below. Ultimately we
would like to estimate in a way such that we can combine them with the norms in
the expression.
Lemma 6.140 In the situation of Lemma 6.139, let (z^n) be a sequence in Z = X × Y with components z^n = (u^n, w^n) and u^{−1} = u⁰. If στ‖A‖² ≤ 1, then for all w ∈ Y and M, N ∈ N with M ≤ N, one has
    Σ_{n=M}^{N−1} −( w^{n+1} − w, A(u^{n+1} − 2u^n + u^{n−1}) )_Y ≤ δ_M(w) + στ‖A‖² ‖w^N − w‖²_Y/(2τ)

and

    Σ_{n=M}^{N−1} −( w^{n+1} − w, A(u^{n+1} − 2u^n + u^{n−1}) )_Y
        ≤ δ_M(w) − δ_N(w) + √(στ)‖A‖ Σ_{n=M}^{N−1} ( ‖u^n − u^{n−1}‖²_X/(2σ) + ‖w^{n+1} − w^n‖²_Y/(2τ) ).
We estimate the value −δN (w) also with the Cauchy-Schwarz inequality, the
operator norm, and Young’s inequality, this time with λ = σ to get
    Σ_{n=0}^{N−1} ‖z^{n+1} − z^n‖²_Z/2 − στ‖A‖² ‖w^N − w‖²_Y/(2τ) − √(στ)‖A‖ Σ_{n=0}^{N−2} ‖z^{n+1} − z^n‖²_Z/2

    (1 − √(στ)‖A‖) Σ_{n=0}^{N−2} ‖z^{n+1} − z^n‖²_Z/2 + (1 − στ‖A‖²) ‖z − z^N‖²_Z/2
        + Σ_{n=0}^{N−1} ( L(u^{n+1}, w) − L(u, w^{n+1}) ) ≤ ‖z − z⁰‖²_Z/2.
The first estimate implies lim_{n→∞} ‖z^{n+1} − z^n‖²_Z = 0. Consequently, the sequence (z^n) is bounded, and by the finite dimensionality of Z there exists a convergent subsequence (z^{n_k}) with lim_{k→∞} z^{n_k} = z^* for some z^* = (u^*, w^*) ∈ Z. Moreover, we have convergence of the neighboring subsequences lim_{k→∞} z^{n_k+1} = lim_{k→∞} z^{n_k−1} = z^*, and also, by continuity of A, A*, (id + σ∂F₁)^{−1} and (id + τ∂F₂*)^{−1} (see Lemma 6.134), we conclude that
    w^* = lim_{k→∞} (id + τ∂F₂*)^{−1}(w^{n_k} + τA ū^{n_k}) = (id + τ∂F₂*)^{−1}(w^* + τA u^*),
    u^* = lim_{k→∞} (id + σ∂F₁)^{−1}(u^{n_k} − σA* w^{n_k+1}) = (id + σ∂F₁)^{−1}(u^* − σA* w^*).
Thus, the pair (u∗ , w∗ ) satisfies Eq. (6.88), and this shows that it is indeed a saddle
point of L.
It remains to show that the whole sequence (zn ) converges to z∗ . To that end fix
k ∈ N and N ≥ nk + 1. Summing (6.94) from M = nk to N − 1 and repeating the
above steps, we get
    (1 − √(στ)‖A‖) Σ_{n=n_k}^{N−2} ‖z^{n+1} − z^n‖²_Z/2 + (1 − στ‖A‖²) ‖z − z^N‖²_Z/2
        + Σ_{n=n_k}^{N−1} ( L(u^{n+1}, w) − L(u, w^{n+1}) ) ≤ δ_{n_k}(w) + ‖u^{n_k} − u^{n_k−1}‖²_X/(2σ) + ‖z − z^{n_k}‖²_Z/2.
Plugging in z^*, using that στ‖A‖² < 1 and that (u^*, w^*) is a saddle point, we arrive at
Obviously, lim_{k→∞} δ_{n_k}(w^*) = (w^* − w^*, A(u^* − u^*))_Y = 0, and hence the right-hand side converges to 0 for k → ∞. In particular, for every ε > 0 there exists k such that the right-hand side is smaller than ε². This means that for all N ≥ n_k,
    ‖z^* − z^N‖²_Z ≤ ε²,
    inf_{u∈X} L(u, w^n) ≤ L(ũ, w^n) ≤ L(ũ, w̃) ≤ L(u^n, w̃) ≤ sup_{w∈Y} L(u^n, w).
The infimum on the left-hand side is exactly the dual objective value
−F1∗ (−A∗ wn ) − F2∗ (wn ), while the supremum on the right-hand side is the primal
objective value F1 (un ) + F2 (Aun ). The difference is always nonnegative and
vanishes exactly at the saddle points of L. Hence, one defines the duality gap
G : X × Y → R∞ as follows:
This also shows that G is proper, convex, and lower semicontinuous. In particular,
    G(u^n, w^n) ≥ F₁(u^n) + F₂(Au^n) − min_{u∈X} ( F₁(u) + F₂(Au) ),
    G(u^n, w^n) ≥ max_{w∈Y} ( −F₁*(−A*w) − F₂*(w) ) − ( −F₁*(−A*w^n) − F₂*(w^n) ).        (6.96)
Hence, a small duality gap implies that the differences between the functional values of u^n and w^n and the respective optima of the primal and dual problems are also small. The condition G(u^n, w^n) < ε for some given tolerance ε > 0 is therefore a suitable criterion to terminate the iteration.
If (u^n, w^n) converges to a saddle point (u^*, w^*), then the lower semicontinuity of G gives only 0 ≤ lim inf_{n→∞} G(u^n, w^n), i.e., the duality gap does not necessarily
converge to 0. However, this may be the case, and a necessary condition is the
continuity of F1 , F2 and their Fenchel conjugates. In these cases, G gives a stopping
criterion that guarantees for the primal-dual method (6.89) (given its convergence)
the optimality of the primal and dual objective values up to a given tolerance.
Now we want to apply a discrete version of the method we derived in the previous
subsection to solve the variational problems numerically. To that end, we have
to discretize the respective minimization problems and check whether Fenchel-
Rockafellar duality holds, in order to guarantee the existence of a saddle point
of the Lagrange functional. If we succeed with this, we have to identify how to
implement the steps of the primal-dual method (6.89) and then, by Theorem 6.141,
we have a convergent numerical method to solve our problems. Let us start with the
discretization of the functionals
As in Sect. 5.4 we assume that we have rectangular discrete images, i.e., N × M
matrices (N, M ≥ 1). The discrete indices (i, j ) always satisfy 1 ≤ i ≤ N and
1 ≤ j ≤ M. For a fixed “pixel size” h > 0, the (i, j )th entry corresponds to the
function value at (ih, j h). The associated space is denoted by RN×M and equipped
with the scalar product
    (u, v) = h² Σ_{i=1}^{N} Σ_{j=1}^{M} u_{i,j} v_{i,j}.
To form the discrete gradient we also need images with multidimensional values,
and hence we denote by RN×M×K the space of images with K-dimensional values.
For u, v ∈ RN×M×K we define a pointwise and a global scalar product by
    u_{i,j} · v_{i,j} = Σ_{k=1}^{K} u_{i,j,k} v_{i,j,k},        (u, v) = h² Σ_{i=1}^{N} Σ_{j=1}^{M} u_{i,j} · v_{i,j},
respectively. These definitions give a pointwise absolute value |u_{i,j}| = √(u_{i,j} · u_{i,j}) and a norm ‖u‖ = √((u, u)).
With this notation, we can write the Lp norms simply as “pointwise summation”:
For u ∈ RN×M×K and p ∈ [1, ∞[ we have
    ‖u‖_p = ( h² Σ_{i=1}^{N} Σ_{j=1}^{M} |u_{i,j}|^p )^{1/p},        ‖u‖_∞ = max_{i=1,…,N} max_{j=1,…,M} |u_{i,j}|.
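As a small illustration of these conventions (our own sketch, not from the text), an image with K-dimensional values can be stored as an array of shape (N, M, K); the weighted scalar product and the norms then read:

import numpy as np

def inner(u, v, h):
    # global scalar product (u, v) = h^2 * sum_{i,j} u_{i,j} . v_{i,j}
    return h**2 * np.sum(u * v)

def pointwise_abs(u):
    # |u_{i,j}| = sqrt(u_{i,j} . u_{i,j}) for an array of shape (N, M, K)
    return np.sqrt(np.sum(u**2, axis=-1))

def norm_p(u, p, h):
    # ||u||_p = (h^2 * sum |u_{i,j}|^p)^(1/p); ||u||_inf = max |u_{i,j}|
    a = pointwise_abs(u)
    if np.isinf(p):
        return a.max()
    return (h**2 * np.sum(a**p))**(1.0 / p)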
Lemma 6.142 (Nullspace, Adjoint, and Norm Estimate for ∇h ) The linear map
∇h : RN×M → RN×M×2 has the following properties:
1. The nullspace is ker(∇h ) = span(1) with the constant vector 1 ∈ RN×M ,
2. the adjoint is ∇h∗ = − divh , and
3. the norm satisfies ‖∇_h‖² < 8/h².
    (∇_h u, v) = h Σ_{i=1}^{N−1} Σ_{j=1}^{M} (u_{i+1,j} − u_{i,j}) v_{i,j,1} + h Σ_{i=1}^{N} Σ_{j=1}^{M−1} (u_{i,j+1} − u_{i,j}) v_{i,j,2}
    = h Σ_{j=1}^{M} ( Σ_{i=2}^{N} u_{i,j} v_{i−1,j,1} − Σ_{i=1}^{N−1} u_{i,j} v_{i,j,1} )
      + h Σ_{i=1}^{N} ( Σ_{j=2}^{M} u_{i,j} v_{i,j−1,2} − Σ_{j=1}^{M−1} u_{i,j} v_{i,j,2} )
    = h Σ_{j=1}^{M} ( −v_{1,j,1} u_{1,j} + Σ_{i=2}^{N−1} (v_{i−1,j,1} − v_{i,j,1}) u_{i,j} + v_{N−1,j,1} u_{N,j} )
      + h Σ_{i=1}^{N} ( −v_{i,1,2} u_{i,1} + Σ_{j=2}^{M−1} (v_{i,j−1,2} − v_{i,j,2}) u_{i,j} + v_{i,M−1,2} u_{i,M} )
    = h² Σ_{i=1}^{N} Σ_{j=1}^{M} ( −(∂₁⁻ v¹)_{i,j} − (∂₂⁻ v²)_{i,j} ) u_{i,j} = (u, −div_h v).
    −h² Σ_{i=1}^{N−1} Σ_{j=1}^{M} u_{i+1,j} u_{i,j} < 1,        −h² Σ_{i=1}^{N} Σ_{j=1}^{M−1} u_{i,j+1} u_{i,j} < 1.
Assume that the first inequality is not satisfied. Let vi,j = −ui+1,j if i < N and
v_{N,j} = 0. By the Cauchy-Schwarz inequality we obtain (v, u) ≤ ‖v‖ ‖u‖ ≤ 1, i.e., the scalar product has to be equal to 1. This means that the Cauchy-Schwarz
inequality is tight and hence u = v. In particular, we have uN,j = 0, and recursively
we get for i < N that
    ‖∇_h u‖² = Σ_{i=1}^{N−1} Σ_{j=1}^{M} ( u²_{i+1,j} + u²_{i,j} − 2u_{i+1,j}u_{i,j} ) + Σ_{i=1}^{N} Σ_{j=1}^{M−1} ( u²_{i,j+1} + u²_{i,j} − 2u_{i,j+1}u_{i,j} )
    ≤ 4 Σ_{i=1}^{N} Σ_{j=1}^{M} u²_{i,j} − 2 Σ_{i=1}^{N−1} Σ_{j=1}^{M} u_{i+1,j}u_{i,j} − 2 Σ_{i=1}^{N} Σ_{j=1}^{M−1} u_{i,j+1}u_{i,j}
    < 4/h² + 2/h² + 2/h² = 8/h².
Since the set {‖u‖ = 1} is compact, there exists some u^* with ‖u^*‖ = 1 at which the value of the operator norm is attained, showing that ‖∇_h‖² = ‖∇_h u^*‖² < 8/h². ⊓⊔
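The following NumPy sketch (our own; the names grad_h and div_h are ours) realizes one possible discretization of ∇_h and of div_h = −∇_h* that matches the forward differences and the summation by parts used above.

import numpy as np

def grad_h(u, h):
    # forward differences; zero (homogeneous Neumann) in the last row/column
    g = np.zeros(u.shape + (2,))
    g[:-1, :, 0] = (u[1:, :] - u[:-1, :]) / h
    g[:, :-1, 1] = (u[:, 1:] - u[:, :-1]) / h
    return g

def div_h(v, h):
    # negative adjoint of grad_h with respect to the h^2-weighted scalar products
    d = np.zeros(v.shape[:2])
    d[0, :]    += v[0, :, 0] / h
    d[1:-1, :] += (v[1:-1, :, 0] - v[:-2, :, 0]) / h
    d[-1, :]   += -v[-2, :, 0] / h
    d[:, 0]    += v[:, 0, 1] / h
    d[:, 1:-1] += (v[:, 1:-1, 1] - v[:, :-2, 1]) / h
    d[:, -1]   += -v[:, -2, 1] / h
    return d

The adjoint relation can be checked numerically: for random u and v one should find h²·np.sum(grad_h(u, h) * v) ≈ −h²·np.sum(u * div_h(v, h)).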
The Sobolev semi-norm of u is discretized as ‖∇_h u‖_p, and the total variation corresponds to the case p = 1, i.e., TV_h(u) = ‖∇_h u‖_1. This allows one to discretize the applications with Sobolev penalty from Sect. 6.3.2 and their counterparts with total variation from Sect. 6.3.3. If Fenchel-Rockafellar duality (6.86) holds, method (6.89) yields a saddle point of the Lagrange functional, and the primal component is a solution of the original problem. With a little intuition and some knowledge about the technical feasibility of resolvent maps, it is possible to derive practical algorithms for a large number of convex minimization problems in imaging. To get an impression of how this is done in concrete cases, we discuss the applications from Sects. 6.3.2 and 6.3.3 in detail.
Example 6.143 (Primal-Dual Method for Variational Denoising) For 1 ≤ p < ∞,
1 < q < ∞, X = RN×M , and a discrete, noisy image U 0 ∈ RN×M and λ > 0 the
discrete denoising problem reads
    min_{u∈X}  ‖u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.
and note that the assumptions for Fenchel-Rockafellar duality from Theorem 6.68
are satisfied, and hence the associated Lagrange functional has a saddle point. To
apply method (6.89) we need the resolvents of ∂F1 and ∂F2∗ . Note that Lemma 6.65
and Example 6.64 applied to F2∗ lead to
    F₂ = (λ/p)‖·‖_p^p   ⇒   F₂* = λ( (1/p*)‖·‖_{p*}^{p*} ) ∘ (λ^{−1} id) = (λ^{−p*/p}/p*)‖·‖_{p*}^{p*}   if p > 1,
                            F₂* = λ I_{{‖v‖_∞ ≤ 1}} ∘ (λ^{−1} id) = I_{{‖v‖_∞ ≤ λ}}                        if p = 1.
Now let σ, τ > 0. For the resolvent (id +σ ∂F1 )−1 we have by Lemma 6.136 and
Example 6.138,
    ( (id + σ∂F₁)^{−1}(u) )_{i,j} = U⁰_{i,j} + sgn(u_{i,j} − U⁰_{i,j}) (id + σ|·|^{q−1})^{−1}( |u_{i,j} − U⁰_{i,j}| ).
Table 6.1 Primal-dual method for the numerical solution of the discrete variational denoising problem

Primal-dual method for the solution of the variational denoising problem

    min_{u∈R^{N×M}}  ‖u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.

1. Initialize
   Let n = 0, ū⁰ = u⁰ = U⁰, w⁰ = 0. Choose σ, τ > 0 with στ ≤ h²/8.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τλ^{−p*/p}|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|/λ)   if p = 1,
   for 1 ≤ i ≤ N, 1 ≤ j ≤ M.
3. Primal step
   ũ^{n+1} = u^n + σ div_h w^{n+1},
   u^{n+1}_{i,j} = U⁰_{i,j} + sgn(ũ^{n+1}_{i,j} − U⁰_{i,j}) (id + σ|·|^{q−1})^{−1}( |ũ^{n+1}_{i,j} − U⁰_{i,j}| ),
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iterate
   Update n ← n + 1 and continue with Step 2.
Finally, we note that ∇_h* = −div_h holds and that the step size restriction στ‖∇_h‖² < 1 is satisfied for στ ≤ h²/8; see Lemma 6.142. Now we have all ingredients to fully describe the numerical method for variational denoising; see Table 6.1. By Theorem 6.141, this method yields a convergent sequence ((u^n, w^n)).
Let us briefly discuss a stopping criterion based on the duality gap (6.95). By
Lemma 6.65 and Example 6.64, we have
    F₁ = (1/q)‖·‖_q^q ∘ T_{−U⁰}   ⇒   F₁* = (1/q*)‖·‖_{q*}^{q*} + (U⁰, ·).
    G(u^n, w^n) = ‖u^n − U⁰‖_q^q/q + ‖div_h w^n‖_{q*}^{q*}/q* + (U⁰, div_h w^n) + (λ/p)‖∇_h u^n‖_p^p + (λ^{−p*/p}/p*)‖w^n‖_{p*}^{p*}.

For p = 1 and the iterates w^n of the method (which satisfy ‖w^n‖_∞ ≤ λ by construction), the last two terms reduce to λ TV_h(u^n), so that

    G(u^n, w^n) = ‖u^n − U⁰‖_q^q/q + ‖div_h w^n‖_{q*}^{q*}/q* + (U⁰, div_h w^n) + λ TV_h(u^n),
and the convergence G(u^n, w^n) → 0 holds as well. Thus, if we stop the iteration as soon as G(u^n, w^n) < ε (which has to happen at some point), we arrive at some u^n whose primal objective value approximates the optimal one up to a tolerance of ε.
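For the frequently used case q = 2, p = 1 (total variation denoising), the resolvents in Table 6.1 have simple closed forms, and the duality gap above can be evaluated in every iteration. The following sketch is our own illustration under these assumptions; it reuses the hypothetical helpers grad_h and div_h from the sketch after Lemma 6.142, and the step sizes are chosen such that στ ≤ h²/8.

import numpy as np

def tv_denoise_pd(U0, lam, h=1.0, eps=1e-4, max_iter=500):
    """Sketch of Table 6.1 for q = 2, p = 1, with the duality gap as stopping criterion."""
    sigma = tau = np.sqrt(h**2 / 8.0)          # so that sigma*tau = h^2/8
    u = U0.copy(); u_bar = U0.copy()
    w = np.zeros(U0.shape + (2,))
    for n in range(max_iter):
        # dual step: ascent in w, then projection onto {|w_{i,j}| <= lam}
        w = w + tau * grad_h(u_bar, h)
        mag = np.sqrt(np.sum(w**2, axis=-1))
        w /= np.maximum(1.0, mag / lam)[..., None]
        # primal step: resolvent of sigma*dF1 for F1 = 0.5*||. - U0||^2, then extrapolation
        u_new = (u + sigma * div_h(w, h) + sigma * U0) / (1.0 + sigma)
        u_bar = 2.0 * u_new - u
        u = u_new
        # duality gap for q = 2, p = 1 (formula from the text, with h^2-weighted norms)
        dw = div_h(w, h)
        tv = h**2 * np.sum(np.sqrt(np.sum(grad_h(u, h)**2, axis=-1)))
        gap = (0.5 * h**2 * np.sum((u - U0)**2) + 0.5 * h**2 * np.sum(dw**2)
               + h**2 * np.sum(U0 * dw) + lam * tv)
        if gap < eps:
            break
    return u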
Example 6.144 (Primal-Dual Method for Tikhonov Functionals/Deconvolution)
Now we consider the situation with a discretized linear operator A_h. Let again 1 ≤ p < ∞, 1 ≤ q < ∞ (this time we also allow q = 1) and X = R^{N1×M1}; moreover, let A_h ∈ L(X, Y) with Y = R^{N2×M2} be a forward operator that does not map constant images to zero, and let U⁰ ∈ Y be noisy measurements. The problem is the minimization of the Tikhonov functional

    min_{u∈X}  ‖A_h u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.
Noting that the nullspace of ∇_h consists of the constant images only (see Lemma 6.142), we can argue similarly to Theorems 6.86 and 6.115 and Example 6.143 that minimizers exist. Application 6.97 and Example 6.127 motivate the choice of A_h as a discrete convolution operator as in Sect. 3.3.3, i.e., A_h u = u ∗ k_h with a discretized convolution kernel k_h with nonnegative entries that sum to one. The norm is, in this case, estimated by ‖A_h‖ ≤ 1, and the adjoint A_h* amounts to a convolution with the reflected kernel k̄_h = D_{−id} k_h. In the following, the concrete operator will not be of importance, and we will discuss only the general situation.
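As a small illustration (our own sketch, not from the text; SciPy and an odd-sized kernel are assumed, and we use the simpler variant in which image and data have the same size and the boundary is extended by zero), such a convolution operator and its adjoint can be realized as follows:

import numpy as np
from scipy.ndimage import convolve, correlate

def A_h(u, kernel):
    # discrete convolution A_h u = u * k_h (zero boundary extension)
    return convolve(u, kernel, mode='constant', cval=0.0)

def A_h_adj(v, kernel):
    # the adjoint is a convolution with the reflected kernel k_h(-.),
    # i.e. a correlation with k_h (exact for odd-sized, centered kernels)
    return correlate(v, kernel, mode='constant', cval=0.0)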
Since the minimization problem features both Ah and ∇h , we dualize with Z =
RN×M×2 , but in the following way:
    F₁(u) = 0 for u ∈ X,    F₂(v, w) = (1/q)‖v − U⁰‖_q^q + (λ/p)‖w‖_p^p for (v, w) ∈ Y × Z,    A u = (A_h u, ∇_h u).
Again, the assumptions in Theorem 6.68 are satisfied, and hence we only need to find saddle points of the Lagrange functional. We dualize F₂, which, similarly to Example 6.143, leads to
    F₂*(v̄, w̄) = { (1/q*)‖v̄‖_{q*}^{q*} + (U⁰, v̄)   if q > 1,        { (λ^{−p*/p}/p*)‖w̄‖_{p*}^{p*}   if p > 1,
                { I_{{‖v̄‖_∞ ≤ 1}}(v̄) + (U⁰, v̄)     if q = 1,    +   { I_{{‖w̄‖_∞ ≤ λ}}(w̄)            if p = 1.
By Lemma 6.136, item 5, we can apply the resolvent of ∂F₂* for the step size τ by applying the resolvents for v̄ and w̄ componentwise. We calculated both already in Example 6.143; we only need to translate the variable v̄ by −τU⁰. The resolvent for ∂F₁ is trivial: (id + σ∂F₁)^{−1} = id.
Let us note how στ can be chosen: we get an estimate of ‖A‖² by

    ‖Au‖² = ‖A_h u‖² + ‖∇_h u‖² ≤ (‖A_h‖² + ‖∇_h‖²)‖u‖² < (‖A_h‖² + 8/h²)‖u‖²,

and hence στ ≤ (‖A_h‖² + 8/h²)^{−1} is sufficient for the estimate στ‖A‖² < 1.
The whole method is described in Table 6.2. The duality gap is, in the case of
p > 1 and q > 1, as follows (note that F1∗ = I{0} ):
    G(u^n, v^n, w^n) = I_{{0}}(div_h w^n − A_h* v^n) + (1/q)‖A_h u^n − U⁰‖_q^q + (λ/p)‖∇_h u^n‖_p^p
                       + (1/q*)‖v^n‖_{q*}^{q*} + (U⁰, v^n) + (λ^{−p*/p}/p*)‖w^n‖_{p*}^{p*}.
Since the indicator functional makes G of little use as a numerical stopping criterion (it is infinite whenever div_h w^n − A_h* v^n ≠ 0), one may instead consider the modified duality gap

    G̃(u^n, v^n, w^n) = C‖div_h w^n − A_h* v^n‖ + (1/q)‖A_h u^n − U⁰‖_q^q + (λ/p)‖∇_h u^n‖_p^p
                        + (1/q*)‖v^n‖_{q*}^{q*} + (U⁰, v^n) + (λ^{−p*/p}/p*)‖w^n‖_{p*}^{p*},
Table 6.2 Primal-dual method for the minimization of the Tikhonov functional with Sobolev or total variation penalty

Primal-dual method for the minimization of Tikhonov functionals

    min_{u∈R^{N1×M1}}  ‖A_h u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.

1. Initialization
   Let n = 0, ū⁰ = u⁰ = 0, v⁰ = 0 and w⁰ = 0. Choose σ, τ > 0 such that στ ≤ (‖A_h‖² + 8/h²)^{−1}.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τλ^{−p*/p}|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|/λ)   if p = 1,
   for 1 ≤ i ≤ N1, 1 ≤ j ≤ M1.
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iteration
   Set n ← n + 1 and continue with step 2.
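A compact sketch of Table 6.2 for q = 2 and p = 1 is given below; it is our own illustration and reuses the hypothetical helpers A_h, A_h_adj, grad_h and div_h from the previous sketches. Since F₁ = 0, the primal resolvent is the identity, and the v-resolvent reduces to a simple scaling after the translation by −τU⁰ described above; the step sizes follow στ ≤ (‖A_h‖² + 8/h²)^{−1} with ‖A_h‖ ≤ 1.

import numpy as np

def tv_deconvolve_pd(U0, kernel, lam, h=1.0, iters=300):
    """Sketch of Table 6.2 for q = 2, p = 1 (Tikhonov functional with TV penalty)."""
    st = 1.0 / (1.0 + 8.0 / h**2)              # sigma*tau <= (||A_h||^2 + 8/h^2)^{-1}
    sigma = tau = np.sqrt(st)
    u = np.zeros_like(U0); u_bar = u.copy()
    v = np.zeros_like(U0)
    w = np.zeros(U0.shape + (2,))
    for n in range(iters):
        # dual step for v (data term, q = 2): resolvent of tau*dF2,v* with F2,v* = 0.5||.||^2 + (U0, .)
        v = (v + tau * (A_h(u_bar, kernel) - U0)) / (1.0 + tau)
        # dual step for w (TV term, p = 1): projection onto {|w_{i,j}| <= lam}
        w = w + tau * grad_h(u_bar, h)
        mag = np.sqrt(np.sum(w**2, axis=-1))
        w /= np.maximum(1.0, mag / lam)[..., None]
        # primal step: F1 = 0, so the resolvent is the identity; then extrapolation
        u_new = u - sigma * (A_h_adj(v, kernel) - div_h(w, h))
        u_bar = 2.0 * u_new - u
        u = u_new
    return u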
We want to apply the primal-dual method to this problem. First note that a minimizer exists. This can be shown, as in the previous examples, using arguments similar to those for the existence of solutions in Application 6.98 and Example 6.128.
Again similarly to the continuous case, we use the sum and the chain rule for
subgradients: F2 is continuous everywhere and F1 and F2 ◦∇h satisfy the assumption
in Exercise 6.14. We obtain
and by Remark 6.72 there exists a saddle point of the respective Lagrange functionals. To apply the primal-dual method we need (id + σ∂F₁)^{−1}, and this amounts to the projection onto K = {v ∈ X : v|_{Ω_h∖Ω'_h} = U⁰} (see Example 6.137). To evaluate this we do
1 ≤ j ≤ M2 .
Similarly to Example 6.145 one sees that this problem has a solution. With Z = R^{N×M×2} we write

    F₁(u) = I_{{A_h v = U⁰}}(u) for u ∈ X,    F₂(v) = (λ/p)‖v‖_p^p for v ∈ Z,    A = ∇_h,
Table 6.3 Primal-dual method for the numerical solution of the variational inpainting problem

Primal-dual method for variational inpainting

    min_{u∈R^{N×M}}  ‖∇_h u‖_p^p/p + I_{{v|_{Ω_h∖Ω'_h} = U⁰}}(u).

1. Initialization
   Let n = 0, ū⁰ = u⁰ = U⁰, w⁰ = 0. Choose σ, τ > 0 with στ ≤ h²/8.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τ|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|)   if p = 1,
   for 1 ≤ i ≤ N, 1 ≤ j ≤ M.
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iteration
   Set n ← n + 1 and continue with step 2.
see Example 6.137. Table 6.4 shows the resulting numerical method.
Now let us consider the concrete case of the interpolation problem from
Application 6.100, i.e., the k-fold zooming of an image U 0 ∈ RN×M . Here it is
natural to restrict oneself to positive integers k. Thus, the map Ah models a k-fold
downsizing from N1 × M1 to N2 × M2 with N1 = kN, M1 = kM, N2 = N, and
M2 = M. If we choose, for example, the averaging over k × k squares, then the v^{i,j} are given by

    v^{i,j} = (1/(hk)) χ_{[(i−1)k+1, ik]×[(j−1)k+1, jk]},    1 ≤ i ≤ N, 1 ≤ j ≤ M.

The factor 1/(hk) makes these vectors orthonormal, and hence the matrix A_h A_h* is the identity. Thus, there is no need to solve a linear system, and we can set λ = μ in step 3 of the method in Table 6.4. For the perfect low-pass filter one can leverage the orthogonality of the discrete Fourier transform in a similar way.
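For illustration (our own sketch, written in the plain Euclidean setting rather than the h²-weighted one in which the 1/(hk) scaling makes A_h A_h* exactly the identity), the averaging operator, its adjoint, and the resulting trivial projection onto {A_h u = U⁰} can be realized as follows:

import numpy as np

def downsample_avg(u, k):
    # k-fold downsizing by averaging over k x k squares
    N, M = u.shape[0] // k, u.shape[1] // k
    return u.reshape(N, k, M, k).mean(axis=(1, 3))

def downsample_avg_adj(y, k):
    # adjoint with respect to the unweighted Euclidean products: replicate and scale by 1/k^2
    return np.kron(y, np.ones((k, k))) / k**2

# With this normalization A A^T = (1/k^2) * id, so projecting onto {A u = U0} needs
# no linear solve: P(u) = u + k^2 * A^T (U0 - A u).
def project_onto_data(u, U0, k):
    return u + k**2 * downsample_avg_adj(U0 - downsample_avg(u, k), k)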
Table 6.4 Primal-dual method for the numerical solution of minimization problems with Sobolev semi-norm or total variation and linear equality constraints

Primal-dual method for linear equality constraints

    min_{u∈R^{N1×M1}}  ‖∇_h u‖_p^p/p + I_{{A_h v = U⁰}}(u).

1. Initialization
   Let n = 0, ū⁰ = u⁰ = U⁰, w⁰ = 0. Choose σ, τ > 0 with στ ≤ h²/8.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τ|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|)   if p = 1,
   for 1 ≤ i ≤ N1, 1 ≤ j ≤ M1.
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iteration
   Set n ← n + 1 and continue with step 2.
an edge set Γ ⊂ Ω and a piecewise smooth function u that minimize the following functional:

    E(u, Γ) = ∫_Ω (u − u⁰)² dx + λ ∫_{Ω∖Γ} |∇u|² dx + μ H^{d−1}(Γ),
where λ, μ > 0 are again some regularization parameters. The third term penalizes the length of the edge set and prohibits the trivial and unnatural minimizer Γ = Ω and u = u⁰. The Mumford-Shah functional is a mathematically challenging object, and both its analysis and numerical minimization have been the subject of numerous studies [5, 6, 9, 18, 33, 49, 76, 112]. One big challenge in the analysis of the minimization of E is the appropriate description of the objects Γ and of the functions u ∈ H¹(Ω∖Γ) (with varying Γ) in a suitable functional-analytic context. It turned out that the space of special functions of bounded total variation SBV(Ω) is well suited. It consists of those functions in BV(Ω) whose derivative can be written as
where (u⁺ − u⁻) denotes the jump and ν the measure-theoretic normal along the jump set. The vector space SBV(Ω) is a proper subspace of BV(Ω) (for general BV functions the gradient also contains a so-called Cantor part), and the Mumford-Shah functional is well defined on this space but is not convex. Moreover, it is possible to prove the existence of minimizers of the Mumford-Shah functional in SBV(Ω), but due to the nonconvexity, the proof is more involved than in the cases we treated in this chapter.
A simplified variational model for segmentation with piecewise constant images has been proposed by Chan and Vese in [38]. Formally, it is a limiting case of the Mumford-Shah problem with the parameters (αλ, αμ), where α → ∞, and hence it mainly contains the geometric unknown. In the simplest case of two gray values one has to minimize the functional

    F(Ω', c₁, c₂) = ∫_Ω ( χ_{Ω'} c₁ + χ_{Ω∖Ω'} c₂ − u⁰ )² dx + μ H^{d−1}(Γ),

where Γ = ∂Ω' ∩ Ω. For a fixed Ω', the optimal constants are readily calculated as c₁ = |Ω'|^{−1} ∫_{Ω'} u⁰ dx and c₂ = |Ω∖Ω'|^{−1} ∫_{Ω∖Ω'} u⁰ dx, and the difficult part is to find the edge set Γ. For this problem, so-called level-set methods are popular [105, 128]. Roughly speaking, the main idea is to represent Γ as the zero level set of a function φ : Ω → R that is positive in Ω' and negative in Ω∖Ω'. With the Heaviside function H = χ_{[0,∞[} we can write F as

    F(φ, c₁, c₂) = ∫_Ω ( (c₁ − u⁰)² (H ∘ φ) + (c₂ − u⁰)² (1 − H ∘ φ) ) dx + μ TV(H ∘ φ),
The numerical solution of this equation implicitly defines the evolution of the edge set Γ. If one updates the mean values c₁ and c₂ during the iteration, this approach results, after a few more numerical tricks, in a solution method [38, 73].
For decomposing images into different parts there are approaches more elaborate than the denoising methods we have seen in this chapter, especially with respect to texture. The model behind these methods assumes that the dominant features of an image u⁰ are given by a piecewise smooth “cartoon” part and a texture part. The texture part should contain mainly fine and repetitive structures, but no noise. If one assumes that u⁰ contains noise, one postulates
u⁰ = u_cartoon + u_texture + η.
We mention specifically the so-called G-norm model by Meyer [70, 87, 99]. In this model the texture part u_texture is described by the semi-norm that is dual to the total variation:

    ‖u‖_* = inf { ‖σ‖_∞ : σ ∈ D_{div,∞}, div σ = u }.

The G-norm has several interesting properties and is well suited to describe oscillating repetitive patterns. In this context, u_cartoon is usually modeled with Sobolev semi-norms or the total variation, which we discussed extensively in this chapter. An associated minimization problem can be, for example,

    min_{u_cartoon, u_texture}  (1/q)∫_Ω |u⁰ − u_cartoon − u_texture|^q dx + (λ/p)∫_Ω |∇u_cartoon|^p dx + μ‖u_texture‖_*.
There are also theoretical results and several numerical methods for this approach
available [10, 92, 107].
The problem of determining the optical flow can be cast as a variational problem in different ways; see, for example, [17, 24, 75]. We show the classical approach of [78]. For a given image sequence u : [0, 1] × Ω → R on a domain Ω one aims to find a velocity field v : [0, 1] × Ω → R^d for which v(t, ·) gives the directions and velocities of the objects in each image u(t, ·). The respective variational problem is then derived from the following considerations. If we traced the movement of every point x ∈ Ω in time, the path would follow a trajectory ϕ_x : [0, 1] → Ω with ϕ_x(0) = x. Now we assume that the overall brightness of the sequence u⁰ does not change, and even that the individual points do not change their brightness over time. If we plug this assumption into the trajectory, we see that u(t, ϕ_x(t)) = u(0, x) has to hold. Differentiating this equation with respect to t leads to

    ∂u/∂t(t, ϕ_x(t)) + ∇u(t, ϕ_x(t)) · ϕ_x'(t) = 0.

The derivative ϕ_x'(t) is exactly the velocity v(t, ϕ_x(t)). If we claim that the above equation is satisfied throughout [0, 1] × Ω, we obtain the optical flow constraint

    ∂u/∂t + ∇u · v = 0.        (6.99)
On the other hand, the velocity field v should have a certain smoothness in space, i.e., we expect that the objects mainly follow rigid motions and deform only slightly, e.g., by a change of the viewpoint. The idea from [78] is to enforce the smoothness by an H¹ penalty. Since the brightness will not be exactly constant, one does not enforce the optical flow constraint exactly. This leads to the following minimization problem for the optical flow at time t:

    F(v) = (1/2) ∫_Ω ( ∂u/∂t(t) + ∇u(t) · v )² dx + (λ/2) ∫_Ω |∇v|² dx.
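As a rough illustration of how (6.99) and the H¹ penalty interact numerically, the following sketch (our own, not taken from [78]) minimizes a straightforward discretization of F by gradient descent. Here ux, uy and ut are assumed to be finite-difference approximations of the spatial and temporal derivatives of the image sequence at the time t under consideration; the names and the fixed step size are ours and not tuned.

import numpy as np

def laplace_neumann(v):
    # five-point Laplacian with replicated (Neumann-type) boundary values
    p = np.pad(v, 1, mode='edge')
    return p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2] - 4.0 * v

def horn_schunck(ux, uy, ut, lam, step=0.2, iters=500):
    """Gradient descent on F(v) = 0.5*sum((ut + ux*v1 + uy*v2)^2)
       + 0.5*lam*(||grad v1||^2 + ||grad v2||^2)."""
    v1 = np.zeros_like(ux); v2 = np.zeros_like(ux)
    for _ in range(iters):
        r = ut + ux * v1 + uy * v2              # residual of the optical flow constraint
        v1 -= step * (ux * r - lam * laplace_neumann(v1))
        v2 -= step * (uy * r - lam * laplace_neumann(v2))
    return v1, v2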
There are many variants of this theme. On the one hand, one could use the whole
time interval in the optimization, and hence the minimization problem would
contain an integral over the time interval [0, 1]. On the other hand, one can consider
numerous other data and regularization terms [24].
In practice there are only discrete images. There are several approaches in the
literature that determine the optical flow in the case that only u0 = u(0) and
u1 = u(1) are known. Here one assumes again that u(t) satisfies the optical flow
constraint. If v is known, one can set the initial condition u(0) = u0 and solve
the transport equation (6.99) (e.g., by the method of characteristics; see Chap. 5) to
obtain u(1) = u1 . If this is not satisfied, one can use the discrepancy u(1) − u1
to determine how well the unknown v fits the data. This motivates the optimization
problem
    min_v  (1/2) ∫_Ω |u(1) − u¹|² dx + λ ∫_0^1 ∫_Ω ϕ( ∂v/∂t, ∇v ) dx dt    with    ∂u/∂t + ∇u · v = 0,  u(0) = u⁰;
see [17]. The penalty for v contains a regularization in space, but also the time derivative ∂v/∂t. Another possibility is to fix both endpoints u(0) and u(1), look for an interpolating image sequence u : [0, 1] × Ω → R, and measure the deviation from the optical flow constraint. In this case one optimizes over both u and v; for this approach, see [42, 43, 84]:

    min_{u,v}  (1/2) ∫_0^1 ∫_Ω ( ∂u/∂t + v · ∇u )² + λ ϕ( ∂v/∂t, ∇v ) dx dt    with    u(0) = u⁰,  u(1) = u¹.
Of course, one can use other differential operators for a similar definition, e.g., the Laplace operator:

    ‖Δu‖_M = sup { ∫_Ω u Δv dx : v ∈ D(Ω), ‖v‖_∞ ≤ 1 },

    ‖diag ∇²u‖_M = sup { ∫_Ω u Σ_{i=1}^d ∂²v_i/∂x_i² dx : v ∈ D(Ω, R^d), ‖v‖_∞ ≤ 1 }.
Both approaches are well suited for preserving edges with reduced staircasing,
and the different second-order differential operators lead to slightly different
Fig. 6.31 Illustration of variational denoising with second-order penalties. Left: The original on
top, below the noisy version. Middle and right: The minimizers of the L2 -! denoising problem
with PSNR-optimal parameter λ
    !diag(u) = sup { ∫_Ω u Σ_{i=1}^d ∂²v_i/∂x_i² dx : v ∈ D(Ω, R^d), ‖v‖_∞ ≤ α, ‖diag ∇v‖_∞ ≤ 1 },

which easily generalizes to higher orders [19]. Figure 6.32 shows results for these functionals in a denoising problem.
Fig. 6.32 Illustration of variational denoising with penalties that combine first and second order.
Left: The original on top, below the noisy version. Middle and right: The minimizer of the L2 -!
denoising problems with PSNR-optimal parameters λ
6.6 Exercises
    P_λ(x) = |x|^{1−d/2} / ( (2π)^{d−1} λ^{(d+2)/4} ) · K_{d/2−1}( 2π|x| / √λ )
from [138].
Exercise 6.2 Let Ω ⊂ R^d be a domain and Ω' ⊂ Ω a bounded Lipschitz subdomain such that Ω' ⊂⊂ Ω.
1. Show the identities
    {u ∈ H¹(Ω) : u = 0 on Ω∖Ω'} = {u ∈ H¹(Ω) : u|_{Ω'} ∈ H₀¹(Ω')} = \overline{D(Ω')},
Exercise 6.4 Let X* be the dual space of a separable normed space. Show that a functional F : X* → R∞ that is bounded from below, coercive, and sequentially weak* lower semicontinuous has a minimizer in X*.
Exercise 6.5 Let Φ : X → Y be an affine linear map between vector spaces X, Y, i.e.,
    Φ(λx + (1 − λ)y) = λΦ(x) + (1 − λ)Φ(y)    for all x, y ∈ X and λ ∈ K.
is coercive on X.
2. The operator A is injective, rg(A) is closed, and in particular A−1 : rg(A) → X
is continuous.
Exercise 6.7 Let X and Y be Banach spaces and let X be reflexive. Moreover let
| · |X : X → R be an admissible semi-norm on X, i.e., there exist a linear and
continuous P : X → X and constants 0 < c ≤ C < ∞ such that
2. The assertion remains true if intXi (Ki ) are nonempty and disjoint. Here intXi (Ki )
is the set of all x ∈ Ki such that 0 is a relative interior point of (Ki − x) ∩ Xi in
Xi .
[Hint:] First prove that Ki = intXi (Ki ) (see also Exercise 6.9).
holds.
Exercise 6.15 Show that for real Banach spaces X, Y and A ∈ L(Y, X) such that
rg(A) is closed and there exists a subspace X1 ⊂ X with X = X1 + rg(A), as well
as mappings P1 ∈ L(X, X1 ) and P2 ∈ L(X, rg(A)) with id = P1 + P2 , one has for
convex F : X → R∞ for which there exists a point x 0 ∈ dom F ∩ rg(A) such that
x 1 → F (x 0 + x 1 ), x 1 ∈ X1 is continuous at the origin that
∂(F ◦ A) = A∗ ◦ ∂F ◦ A.
Exercise 6.16 Use the results of Exercise 6.15 to find an alternative proof for the
third point in Theorem 6.51.
Exercise 6.17 Let X, Y be real Banach spaces and A ∈ L(X, Y ) injective and rg(A)
dense in Y . Show:
1. A−1 with dom A−1 = rg(A) is a closed and densely defined linear map,
2. (A∗ )−1 with dom (A∗ )−1 = rg(A∗ ) is a closed and densely defined linear map,
3. it holds that (A−1 )∗ = (A∗ )−1 .
Exercise 6.18 Let Ω ⊂ R^d be a domain and F : L²(Ω) → R convex and Gâteaux-differentiable. Consider the problems of minimizing the functionals
where the supremum of a set that is unbounded from above is ∞ and the infimum
of a set that is unbounded from below is −∞.
Exercise 6.20 Prove the claims in Lemma 6.59.
Show:
1. For p > 1, one has that F* is positively p*-homogeneous with 1/p + 1/p* = 1.
2. For p = 1, one has that F* = I_K with a convex and closed set K ⊂ X*.
3. For “p = ∞,” i.e., F = I_K with K ≠ ∅ positively absorbing, i.e., αK ⊂ K for α ∈ [0, 1], one has that F* is positively 1-homogeneous.
Exercise 6.22 Let X be a real Banach space and F : X → R∞ strongly coercive. Show that the Fenchel conjugate F* : X* → R is continuous.
is bounded.
[Hint:] Use the uniform boundedness principle. To that end, use the fact that for every
(v 1 , v 2 ) ∈ X × X one can find a representation v 1 − αu0 = v 2 − αu1 with α > 0 and
u1 ∈ dom F1 and apply the Fenchel inequality to w 1 , u1 and w 2 , u0 .
2. The infimal convolution F1∗ F2∗ is proper, convex, and lower semicontinuous.
[Hint:] For the lower semicontinuity use the result of the first point to conclude that
for sequences w n → w with (F1∗ F2∗ )(w n ) → t, sequences ((w 1 )n ), ((w 2 )n ) with
(w 1 )n +(w 2 )n = w n are bounded and F1∗ (w 1 )n +F2∗ (w 2 )n ≤ (F1∗ F2∗ )(w n )+ n1 .
Also use the lower semicontinuity of F₁* and F₂*.
3. Moreover, the infimal convolution is exact, i.e., for every w ∈ X∗ the minimum
in
is attained.
[Hint:] Use the first point and the direct method to show that for a given w ∈ X ∗ a
respective minimizing sequence ((w 1 )n , (w 2 )n ) with (w 1 )n + (w 2 )n = w is bounded.
4. If F1 and F2 are convex and lower semicontinuous, then (F1 + F2 )∗ = F1∗ F2∗ .
5. The exactness of F1∗ F2∗ and (F1 + F2 )∗ = F1∗ F2∗ implies that for convex
and lower semicontinuous F1 , F2 , one has that ∂(F1 + F2 ) = ∂F1 + ∂F2 .
Exercise 6.25 Consider the semi-norm

    ‖∇^m u‖_p = ( ∫_{R^d} ( Σ_{|α|=m} (m!/α!) |∂^m u/∂x^α (x)|² )^{p/2} dx )^{1/p}

    ‖∇^m ·‖_p = ‖∇^m ·‖_p ∘ T_{x⁰} ∘ O.

    Σ_{|α|=m} (m!/α!) |∂^m u/∂x^α (x)|² = Σ_{i_m=1}^d ··· Σ_{i_1=1}^d |∂^m u/(∂x_{i_1} ··· ∂x_{i_m}) (x)|²,
Exercise 6.26 Let the assumptions in Theorem 6.86 be satisfied. Moreover, assume
that Y is a Hilbert space. Denote by T : Y → A"m the orthogonal projection
onto the image of the polynomials of degree up to m − 1 under A and further set
S = A−1 T A, where A is inverted on A"m .
Show that every solution u∗ of the problem (6.36) satisfies ASu∗ = T u∗ .
Exercise 6.27 Let Ω be a bounded Lipschitz domain, m ≥ 1, and p, q ∈ ]1, ∞[.
1. Using appropriate conditions on q, show that the denoising problem

    min_{u∈L^q(Ω)}  (1/q)∫_Ω |u − u⁰|^q dx + (λ/p)∫_Ω |∇^m u|^p dx
    M_n^* u = Σ_{k=0}^{K} ϕ_k T_{−t_n η_k}(u ∗ ψ̄_{k,n}),

and show that M_n^* u ∈ D(Ω) as well as M_n^* u|_Ω → u⁰ = u|_Ω in H^{m,p}(Ω).
2. For the second claim you may use Gauss’s theorem (Theorem 2.81) to reduce it
to the first claim.
Exercise 6.31 Let M, N ∈ N with N, M ≥ 1 and let S ∈ RN×N and W ∈ RM×N
be matrices.
is invertible.
Exercise 6.32 Let Ω ⊂ R^d be a domain, N ∈ N, and L : D(Ω, R^N) → R such that there exists a constant C ≥ 0 with |L(ϕ)| ≤ C‖ϕ‖_∞ for every ϕ ∈ D(Ω, R^N). Show that L has a unique continuous extension to C₀(Ω, R^N) and hence that there exists a vector-valued finite Radon measure μ ∈ M(Ω, R^N) such that
    L(ϕ) = ∫_Ω ϕ dμ    for all ϕ ∈ D(Ω, R^N).
[Hint:] Use the definition of C₀(Ω, R^N) as well as Theorems 3.13 and 2.62.
Exercise 6.37 Let Ω, Ω₁, …, Ω_K ⊂ R^d be bounded Lipschitz domains with Ω̄ = ⋃_{k=1}^{K} Ω̄_k and Ω_k mutually disjoint. Moreover, let u : Ω → R be such that every u_k = u|_{Ω_k} can be extended to an element in C¹(Ω̄_k). Show that with Γ_{l,k} = ∂Ω_l ∩ ∂Ω_k ∩ Ω, one has

    TV(u) = Σ_{k=1}^{K} ∫_{Ω_k} |∇u_k| dx + Σ_{l<k} ∫_{Γ_{l,k}} |u_l − u_k| dH^{d−1}.

[Hint:] For every ε > 0 choose a neighborhood U_{l,k} of the interface Γ_{l,k} with |U_{l,k}| < ε. Approximate sgn(u_k − u_l)ν there by smooth functions (similar to Exercise 6.36), and on Ω_k ∖ ⋃_{1≤l<k≤K} U_{l,k} approximate almost everywhere the negative normalized gradient −∇u_k/|∇u_k|. Patch these piecewise functions together by smooth cutoff functions to construct a sequence (ϕ^n) in D(Ω, R^d) with ‖ϕ^n‖_∞ ≤ 1 that converges almost everywhere on Ω_k to −∇u_k/|∇u_k| and
Show that if

    ∇u(x(s), y(s)) ≠ 0    as well as    div( ∇u/|∇u| )(x(s), y(s)) = κ
for some κ ∈ R and all s ∈ ]−ε, ε[, then there exists a ϕ0 ∈ R such that (x, y) can
be written as
x(s) = x0 + sin(κs + ϕ0 ),
y(s) = y0 + cos(κs + ϕ0 ),
for all s ∈ ]−ε, ε[. In particular, (x, y) parameterizes a piece of a line or circle with
curvature κ.
Exercise 6.41 Let t > 0, p ∈ ]1, ∞[, σ > 0, and s₀ be such that

    s₀ ≥ t    and    s₀ < ( t/(σ(2 − p)) )^{1/(p−1)}  if p < 2.

Show that the sequence (s_n) defined by the iteration

    s_{n+1} = s_n + ( t − s_n − σ s_n^{p−1} ) / ( 1 + σ(p − 1) s_n^{p−2} ),

is well defined, fulfills s_n > 0 for all n, and is decreasing. Moreover, it converges to the unique s that fulfills the equation s + σ s^{p−1} = t.
Exercise 6.42 Implement the primal-dual method for variational denoising
(Table 6.1).
Exercise 6.43 For K ≥ 1 let the matrix κ ∈ R^{(2K+1)×(2K+1)} represent a convolution kernel that is indexed by −K ≤ i, j ≤ K and satisfies Σ_{i=−K}^{K} Σ_{j=−K}^{K} κ_{i,j} = 1. Moreover, for N, M ≥ 1 let A_h : R^{(N+2K)×(M+2K)} → R^{N×M} be a discrete convolution operator,

    (A_h u)_{i,j} = Σ_{k=−K}^{K} Σ_{l=−K}^{K} u_{(i+K−k),(j+K−l)} κ_{k,l}.

1. Implement the primal-dual method from Table 6.2 for the solution of the variational deconvolution problem

    min_{u∈R^{(N+2K)×(M+2K)}}  ‖A_h u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p
3. Use the convergence proof of Theorem 6.141 to estimate the norm of the iterates
(un , wn ) of the primal-dual method from Table 6.3.
Exercise 6.45 Implement the primal-dual inpainting method from Table 6.3.
Add-on: Use the results from Exercise 6.44, to derive a modified duality gap G̃
according to Example 6.144. Prove that G̃(un , wn ) → 0 for the iterates (un , wn )
and modify your program such that it terminates if G̃ falls below a certain threshold.
Exercise 6.46 Let 1 ≤ p < ∞, let N₁, N₂, M₁, M₂ ∈ N be positive, let the map A_h : R^{N1×M1} → R^{N2×M2} be linear and surjective, and let A_h 1 ≠ 0. Consider the minimization problem

    min_{u∈R^{N1×M1}}  ‖∇_h u‖_p^p/p + I_{{A_h v = U⁰}}(u)

for some U⁰ ∈ R^{N2×M2}. Define X = R^{N1×M1}, Z = R^{N2×M2} × R^{N1×M1×2} and

    F₁ : X → R, F₁(u) = 0,    F₂ : Z → R, F₂(v, w) = I_{{0}}(v − U⁰) + ‖w‖_p^p/p.
Prove that the minimization problem is equivalent to the saddle point problem for
Derive an alternative method for the minimization of the discrete Sobolev and total
variation semi-norm, respectively, that in contrast to the method from Table 6.4,
does not use the projection onto {Ah u = U 0 } and hence does not need to solve a
linear system.
Exercise 6.47 Let N, M ∈ N be positive, U 0 ∈ RN×M and K ∈ N with K ≥ 1.
For 1 ≤ p < ∞ consider the interpolation problem
    min_{u∈R^{KN×KM}}  ‖∇_h u‖_p^p/p + I_{{A_h u = U⁰}}(u)

with

    (A_h u)_{i,j} = (1/K²) Σ_{k=1}^{K} Σ_{l=1}^{K} u_{((i−1)K+k),((j−1)K+l)}.
1. R. Acar, C.R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems.
Inverse Probl. 10(6), 1217–1229 (1994)
2. R.A. Adams, J.J.F. Fournier, Sobolev Spaces. Pure and Applied Mathematics, vol. 140, 2nd
edn. (Elsevier, Amsterdam, 2003)
3. L. Alvarez, F. Guichard, P.-L. Lions, J.-M. Morel, Axioms and fundamental equations in
image processing. Arch. Ration. Mech. Anal. 123, 199–257 (1993)
4. H. Amann, Time-delayed Perona-Malik type problems. Acta Math. Univ. Comenian. N. Ser.
76(1), 15–38 (2007)
5. L. Ambrosio, N. Fusco, D. Pallara, Functions of Bounded Variation and Free Discontinuity
Problems. Oxford Mathematical Monographs (Oxford University Press, Oxford, 2000)
6. L. Ambrosio, N. Fusco, J.E. Hutchinson, Higher integrability of the gradient and dimension
of the singular set for minimisers of the Mumford-Shah functional. Calc. Var. Partial Differ.
Equ. 16(2), 187–215 (2003)
7. K.J. Arrow, L. Hurwicz, H. Uzawa, Studies in Linear and Non-linear Programming. Stanford
Mathematical Studies in the Social Sciences, 1st edn. (Stanford University Press, Palo Alto,
1958)
8. G. Aubert, P. Kornprobst, Mathematical Problems in Image Processing (Springer, New York,
2002)
9. G. Aubert, L. Blanc-Féraud, R. March, An approximation of the Mumford-Shah energy by
a family of discrete edge-preserving functionals. Nonlinear Anal. Theory Methods Appl. Int.
Multidiscip. J. Ser. A Theory Methods 64(9), 1908–1930 (2006)
10. J.-F. Aujol, A. Chambolle, Dual norms and image decomposition models. Int. J. Comput.
Vis. 63(1), 85–104 (2005)
11. V. Aurich, J. Weule, Non-linear gaussian filters performing edge preserving diffusion, in
Proceedings 17. DAGM-Symposium, Bielefeld (Springer, Heidelberg, 1995), pp. 538–545
12. C. Bär, Elementary Differential Geometry (Cambridge University Press, Cambridge, 2010).
Translated from the 2001 German original by P. Meerkamp
13. M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, Image inpainting, in Proceedings of
SIGGRAPH 2000, New Orleans (2000), pp. 417–424
14. M. Bertero, P. Boccacci, Introduction to Inverse Problems in Imaging (Institute of Physics,
London, 1998)
15. F. Bornemann, T. März, Fast image inpainting based on coherence transport. J. Math. Imaging
Vis. 28(3), 259–278 (2007)
16. J.M. Borwein, A.S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and
Examples. CMS Books in Mathematics, vol. 3, 2nd edn. (Springer, New York, 2006)
17. A. Borzí, K. Ito, K. Kunisch, Optimal control formulation for determining optical flow. SIAM
J. Sci. Comput. 24, 818–847 (2002)
18. B. Bourdin, A. Chambolle, Implementation of an adaptive finite-element approximation of
the Mumford-Shah functional. Numer. Math. 85(4), 609–646 (2000)
19. K. Bredies, K. Kunisch, T. Pock, Total generalized variation. SIAM J. Imaging Sci. 3(3),
492–526 (2010)
20. M. Breuß, J. Weickert, A shock-capturing algorithm for the differential equations of dilation
and erosion. J. Math. Imaging Vis. 25(2), 187–201 (2006)
21. H. Brézis, Operateurs maximaux monotones et semi-groupes de contractions dans les espaces
de Hilbert. North-Holland Mathematics Studies, vol. 5. Notas de Matemática (50) (North-
Holland, Amsterdam; Elsevier, New York, 1973).
22. H. Brézis, Analyse fonctionnelle - Théorie et applications. Collection Mathématiques
Appliquées pour la Maîtrise (Masson, Paris, 1983)
23. T. Brox, O. Kleinschmidt, D. Cremers, Efficient nonlocal means for denoising of textural
patterns. IEEE Trans. Image Process. 17(7), 1083–1092 (2008)
24. A. Bruhn, J. Weickert, C. Schnörr, Lucas/Kanade meets Horn/Schunck: combining local and
global optical flow methods. Int. J. Comput. Vis. 61(3), 211–231 (2005)
25. A. Buades, J.-M. Coll, B. Morel, A review of image denoising algorithms, with a new one.
Multiscale Model. Simul. 4(2), 490–530 (2005)
26. M. Burger, O. Scherzer, Regularization methods for blind deconvolution and blind source
separation problems. Math. Control Signals Syst. 14, 358–383 (2001)
27. E.J. Candès, D.L. Donoho, New tight frames of curvelets and optimal representations of
objects with piecewise c2 singularities. Commun. Pure Appl. Math. 57(2), 219–266 (2004)
28. E. Candès, L. Demanet, D. Donoho, L. Ying, Fast discrete curvelet transforms. Multiscale
Model. Simul. 5(3), 861–899 (2006)
29. J. Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach.
Intell. 8(6), 679–698 (1986)
30. F. Catté, P.-L. Lions, J.-M. Morel, T. Coll, Image selective smoothing and edge detection by
nonlinear diffusion. SIAM J. Numer. Anal. 29(1), 182–193 (1992)
31. A. Chambolle, P.-L. Lions, Image recovery via Total Variation minimization and related
problems. Numer. Math. 76, 167–188 (1997)
32. A. Chambolle, B.J. Lucier, Interpreting translation-invariant wavelet shrinkage as a new
image smoothing scale space. IEEE Trans. Image Process. 10, 993–1000 (2001)
33. A. Chambolle, G.D. Maso, Discrete approximation of the Mumford-Shah functional in
dimension two. Math. Model. Numer. Anal. 33(4), 651–672 (1999)
34. A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with
applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
35. A. Chambolle, R.A. DeVore, N. Lee, B.J. Lucier, Nonlinear wavelet image processing:
variational problems, compression and noise removal through wavelet shrinkage. IEEE Trans.
Image Process. 7, 319–335 (1998)
36. T.F. Chan, S. Esedoglu, Aspects of total variation regularized L1 function approximation.
SIAM J. Appl. Math. 65, 1817 (2005)
37. T.F. Chan, J. Shen, Image Processing And Analysis: Variational, PDE, Wavelet, and
Stochastic Methods (Society for Industrial and Applied Mathematics, Philadelphia, 2005)
38. T.F. Chan, L.A. Vese, Active contours without edges. IEEE Trans. Image Process. 10(2),
266–277 (2001)
39. T.F. Chan, C. Wong, Total variation blind deconvolution. IEEE Trans. Image Process. 7,
370–375 (1998)
40. T.F. Chan, A. Marquina, P. Mulet, High-order total variation-based image restoration. SIAM
J. Sci. Comput. 22(2), 503–516 (2000)
41. T.F. Chan, S. Esedoglu, F.E. Park, A fourth order dual method for staircase reduction in
texture extraction and image restoration problems. Technical report, UCLA CAM Report
05-28 (2005)
42. K. Chen, D.A. Lorenz, Image sequence interpolation using optimal control. J. Math. Imaging
Vis. 41(3), 222–238 (2011)
43. K. Chen, D.A. Lorenz, Image sequence interpolation based on optical flow, segmentation,
and optimal control. IEEE Trans. Image Process. 21(3), 1020–1030 (2012)
44. Y. Chen, K. Zhang, Young measure solutions of the two-dimensional Perona-Malik equation
in image processing. Commun. Pure Appl. Anal. 5(3), 615–635 (2006)
45. U. Clarenz, U. Diewald, M. Rumpf, Processing textured surfaces via anisotropic geometric
diffusion. IEEE Trans. Image Process. 13(2), 248–261 (2004)
46. P.L. Combettes, V.R. Wajs, Signal recovery by proximal forward-backward splitting.
Multiscale Model. Simul. 4(4), 1168–1200 (2005)
47. R. Courant, K.O. Friedrichs, H. Lewy, Über die partiellen Differenzengleichungen der
mathematischen Physik. Math. Ann. 100(1), 32–74 (1928)
48. I. Daubechies, Orthonormal bases of compactly supported wavelets. Commun. Pure Appl.
Math. 41(7), 909–996 (1988)
49. G. David, Singular Sets of Minimizers for the Mumford-Shah Functional. Progress in
Mathematics, vol. 233 (Birkhäuser, Basel, 2005)
50. J. Diestel, J.J. Uhl Jr., Vector Measures. Mathematical Surveys and Monographs, vol. 15
(American Mathematical Society, Providence, 1977)
51. J. Dieudonné, Foundations of Modern Analysis. Pure and Applied Mathematics, vol. 10-I
(Academic, New York, 1969). Enlarged and corrected printing
52. U. Diewald, T. Preußer, M. Rumpf, Anisotropic diffusion in vector field visualization on
Euclidean domains and surfaces. IEEE Trans. Visual. Comput. Graph. 6(2), 139–149 (2000)
53. N. Dinculeanu, Vector Measures. Hochschulbücher für Mathematik, vol. 64 (VEB Deutscher
Verlag der Wissenschaften, Berlin, 1967)
54. D.L. Donoho, Denoising via soft thresholding. IEEE Trans. Inf. Theory 41(3), 613–627
(1995)
55. V. Duval, J.-F. Aujol, Y. Gousseau, The TVL1 model: a geometric point of view. Multiscale
Model. Simul. 8(1), 154–189 (2009)
56. J. Eckstein, D.P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point
algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
57. I. Ekeland, R. Temam, Convex Analysis and Variational Problems. Studies in Mathematics
and Its Applications, vol. 1 (North-Holland, Amsterdam, 1976)
58. H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems. Mathematics and
Its Applications, vol. 375, 1st edn. (Kluwer Academic, Dordrecht, 1996)
59. S. Esedoḡlu, Stability properties of the Perona-Malik scheme. SIAM J. Numer. Anal. 44(3),
1297–1313 (2006)
60. L.C. Evans, A new proof of local C 1,α regularity for solutions of certain degenerate elliptic
P.D.E. J. Differ. Equ. 45, 356–373 (1982)
61. L.C. Evans, R.F. Gariepy, Measure Theory and Fine Properties of Functions (CRC Press,
Boca Raton, 1992)
62. H. Federer, Geometric Measure Theory (Springer, Berlin, 1969)
63. B. Fischer, J. Modersitzki, Ill-posed medicine — an introduction to image registration. Inverse
Probl. 24(3), 034008 (2008)
64. I. Galić, J. Weickert, M. Welk, A. Bruhn, A. Belyaev, H.-P. Seidel, Image compression with
anisotropic diffusion. J. Math. Imaging Vis. 31, 255–269 (2008)
65. E. Giusti, Minimal Surfaces and Functions of Bounded Variation. Monographs in
Mathematics, vol. 80 (Birkhäuser, Boston, 1984)
66. G.H. Golub, C.F. Van Loan, Matrix Computations. Johns Hopkins Studies in the
Mathematical Sciences, 4th edn. (Johns Hopkins University Press, Baltimore, 2013)
67. R.C. Gonzalez, P.A. Wintz, Digital Image Processing (Addison-Wesley, Reading, 1977)
68. K. Gröchenig, Foundations of Time-Frequency Analysis (Birkhäuser, Boston, 2001)
69. F. Guichard, J.-M. Morel, Partial differential equations and image iterative filtering, in The
State of the Art in Numerical Analysis, ed. by I.S. Duff, G.A. Watson. IMA Conference Series
(New Series), vol. 63 (Oxford University Press, Oxford, 1997)
70. A. Haddad, Texture separation BV − G and BV − L1 models. Multiscale Model. Simul.
6(1), 273–286 (electronic) (2007)
71. P.R. Halmos, Measure Theory (D. Van Nostrand, New York, 1950)
72. M. Hanke-Bourgeois, Grundlagen der Numerischen Mathematik und des Wissenschaftlichen
Rechnens, 3rd edn. (Vieweg+Teubner, Wiesbaden, 2009)
73. L. He, S.J. Osher, Solving the Chan-Vese model by a multiphase level set algorithm based on
the topological derivative, in Scale Space and Variational Methods in Computer Vision, ed. by
F. Sgallari, A. Murli, N. Paragios. Lecture Notes in Computer Science, vol. 4485 (Springer,
Berlin, 2010), pp. 777–788
74. W. Hinterberger, O. Scherzer, Variational methods on the space of functions of bounded
Hessian for convexification and denoising. Computing 76, 109–133 (2006)
75. W. Hinterberger, O. Scherzer, C. Schnörr, J. Weickert, Analysis of optical flow models in the
framework of the calculus of variations. Numer. Funct. Anal. Optim. 23(1), 69–89 (2002)
76. M. Hintermüller, W. Ring, An inexact Newton-CG-type active contour approach for the
minimization of the Mumford-Shah functional. J. Math. Imaging Vis. 20(1–2), 19–42 (2004).
Special issue on mathematics and image analysis
77. M. Holler, Theory and numerics for variational imaging — artifact-free JPEG decompression
and DCT based zooming. Master’s thesis, Universität Graz (2010)
78. B.K.P. Horn, B.G. Schunck, Determining optical flow. Artif. Intell. 17, 185–203 (1981)
79. G. Huisken, Flow by mean curvature of convex surfaces into spheres. J. Differ. Geom. 20(1),
237–266 (1984)
80. J. Jost, Partial Differential Equations. Graduate Texts in Mathematics, vol. 214 (Springer,
New York, 2002). Translated and revised from the 1998 German original by the author
81. L.A. Justen, R. Ramlau, A non-iterative regularization approach to blind deconvolution.
Inverse Probl. 22, 771–800 (2006)
82. S. Kakutani, Concrete representation of abstract (m)-spaces (a characterization of the space
of continuous functions). Ann. Math. Second Ser. 42(4), 994–1024 (1941)
83. B. Kawohl, N. Kutev, Maximum and comparison principle for one-dimensional anisotropic
diffusion. Math. Ann. 311, 107–123 (1998)
84. S.L. Keeling, W. Ring, Medical image registration and interpolation by optical flow with
maximal rigidity. J. Math. Imaging Vis. 23, 47–65 (2005)
85. S.L. Keeling, R. Stollberger, Nonlinear anisotropic diffusion filtering for multiscale edge
enhancement. Inverse Probl. 18(1), 175–190 (2002)
86. S. Kichenassamy, The Perona-Malik paradox. SIAM J. Appl. Math. 57, 1328–1342 (1997)
87. S. Kindermann, S.J. Osher, J. Xu, Denoising by BV-duality. J. Sci. Comput. 28(2–3), 411–
444 (2006)
88. J.J. Koenderink, The structure of images. Biol. Cybern. 50(5), 363–370 (1984)
89. G.M. Korpelevič, An extragradient method for finding saddle points and for other problems.
Ékonomika i Matematicheskie Metody 12(4), 747–756 (1976)
90. G. Kutyniok, D. Labate, Construction of regular and irregular shearlets. J. Wavelet Theory
Appl. 1, 1–10 (2007)
91. E.H. Lieb, M. Loss, Analysis. Graduate Studies in Mathematics, vol. 14, 2nd edn. (American
Mathematical Society, Providence, 2001)
92. L.H. Lieu, L. Vese, Image restoration and decomposition via bounded Total Variation and
negative Hilbert-Sobolev spaces. Appl. Math. Optim. 58, 167–193 (2008)
93. P.-L. Lions, B. Mercier, Splitting algorithms for the sum of two nonlinear operators. SIAM
J. Numer. Anal. 16(6), 964–979 (1979)
94. A.K. Louis, P. Maass, A. Rieder, Wavelets: Theory and Applications (Wiley, Chichester,
1997)
95. M. Lysaker, A. Lundervold, X.-C. Tai, Noise removal using fourth-order partial differential
equation with applications to medical magnetic resonance images in space and time. IEEE
Trans. Image Process. 12(12), 1579–1590 (2003)
96. J. Ma, G. Plonka, The curvelet transform: a review of recent applications. IEEE Signal
Process. Mag. 27(2), 118–133 (2010)
97. S. Mallat, A Wavelet Tour of Signal Processing - The Sparse Way, with Contributions from
Gabriel Peyré, 3rd edn. (Elsevier/Academic, Amsterdam, 2009)
98. D. Marr, E. Hildreth, Theory of edge detection. Proc. R. Soc. Lond. 207, 187–217 (1980)
99. Y. Meyer, Oscillating Patterns in Image Processing and Nonlinear Evolution Equations.
University Lecture Series, vol. 22 (American Mathematical Society, Providence, 2001). The
fifteenth Dean Jacqueline B. Lewis memorial lectures
100. J. Modersitzki, FAIR: Flexible Algorithms for Image Registration. Fundamentals of Algo-
rithms, vol. 6 (Society for Industrial and Applied Mathematics, Philadelphia, 2009)
101. D. Mumford, J. Shah, Optimal approximations by piecewise smooth functions and variational
problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
102. H.J. Muthsam, Lineare Algebra und ihre Anwendungen, 1st edn. (Spektrum Akademischer
Verlag, Heidelberg, 2006)
103. F. Natterer, F. Wuebbeling, Mathematical Methods in Image Reconstruction (Society for
Industrial and Applied Mathematics, Philadelphia, 2001)
104. J. Nečas, Les méthodes directes en théorie des équations elliptiques (Masson, Paris, 1967)
105. S.J. Osher, R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces. Applied
Mathematical Sciences, vol. 153 (Springer, Berlin, 2003)
106. S.J. Osher, J.A. Sethian, Fronts propagating with curvature-dependent speed: algorithms
based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
107. S.J. Osher, A. Sole, L. Vese, Image decomposition and restoration using Total Variation
minimization and the h−1 norm. Multiscale Model. Simul. 1(3), 349–370 (2003)
108. S. Paris, P. Kornprobst, J. Tumblin, F. Durand, Bilateral filtering: theory and applications.
Found. Trends Comput. Graph. Vis. 4(1), 1–73 (2009)
109. W.B. Pennebaker, J.L. Mitchell, JPEG: Still Image Data Compression Standard (Springer,
New York, 1992)
110. P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Trans.
Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
111. G. Plonka, G. Steidl, A multiscale wavelet-inspired scheme for nonlinear diffusion. Int. J.
Wavelets Multiresolut. Inf. Process. 4(1), 1–21 (2006)
112. T. Pock, D. Cremers, H. Bischof, A. Chambolle, An algorithm for minimizing the Mumford-
Shah functional, in 2009 IEEE 12th International Conference on Computer Vision (2009),
pp. 1133–1140
113. L.D. Popov, A modification of the Arrow-Hurwicz method for search of saddle points. Math.
Notes 28, 845–848 (1980)
114. W.K. Pratt, Digital Image Processing (Wiley, New York, 1978)
115. T. Preußer, M. Rumpf, An adaptive finite element method for large scale image processing.
J. Vis. Commun. Image Represent. 11(2), 183–195 (2000)
116. J.M.S. Prewitt, Object enhancement and extraction, in Picture Processing and Psychopic-
torics, ed. by B.S. Lipkin, A. Rosenfeld (Academic, New York, 1970)
117. T.W. Ridler, S. Calvard, Picture thresholding using an iterative selection method. IEEE Trans.
Syst. Man Cybern. 8(8), 630–632 (1978)
118. R.T. Rockafellar, Convex Analysis. Princeton Mathematical Series (Princeton University
Press, Princeton, 1970)
119. A. Rosenfeld, A.C. Kak, Digital Picture Processing (Academic, New York, 1976)
120. E. Rouy, A. Tourin, A viscosity solutions approach to shape-from-shading. SIAM J. Numer.
Anal. 29(3), 867–884 (1992)
121. L.I. Rudin, S.J. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms.
Phys. D Nonlinear Phenom. 60(1–4), 259–268 (1992)
All figures not listed in these picture credits only contain our own pictures. Only the
first figure that features a certain image is listed.
Fig. 1.1   Matthew Mendoza@Flickr http://www.flickr.com/photos/mattmendoza/2421196777/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), PerkinElmer (http://www.cellularimaging.com/assays/receptor_activation), CNRS/Université de St-Etienne (France), Labor für Mikrozerspanung (Universität Bremen), p. 3
Fig. 1.5   Last Hero@Flickr http://www.flickr.com/photos/uwe_schubert/4594327195/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), Kai Schreiber@Flickr http://www.flickr.com/photos/genista/1249056653/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), http://grin.hq.nasa.gov/ABSTRACTS/GPN-2002-000064.html, p. 8
Fig. 3.14  huangjiahui@Flickr http://www.flickr.com/photos/huangjiahui/3128463578/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), p. 96
Fig. 5.17  Mike Baird@Flickr http://www.flickr.com/photos/mikebaird/4533794674/ (License: http://creativecommons.org/licenses/by/2.0/legalcode), p. 223
Fig. 5.19  Benson Kua@Flickr http://www.flickr.com/photos/bensonkua/3301838191/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), p. 227
Abbreviations
B(Rd ) space of bounded functions, page 89
B() Borel algebra over , page 32
BC(Rd ) space of bounded and continuous functions, page 173
BUC(Rd ) space of bounded and uniformly continuous functions, page 187
Br (x) ball around x with radius r, page 17
C (U , Y ) space of bounded and uniformly continuous mapping on U with
values in Y , page 19
C (U, Y ) space of continuous mappings on U with values in Y , page 19
C set of complex numbers, page 16
Cα,β (u) semi-norms on the Schwartz space, page 112
Cc (, X) space of continuous function with compact support, page 42
C0 (, X) closure of Cc (, X) in C (, X), page 44
C k () space of k-times continuously differentiable functions, page 49
C ∞ () space of infinitely differentiable functions, page 49
Cb∞ (Rd ) space of infinitely differentiable functions with bounded derivatives,
page 173
cψ constant in the admissibility condition for a wavelet ψ, page 147
DF (x) Fréchet derivative of F at the point x, page 20
Dk F kth derivative of F , page 21
D() space of test functions, page 49
D()∗ space of distributions, page 50
DA u linear coordinate transformation of u : Rd → K using A ∈ Rd×d ,
i.e., DA u(x) = u(Ax), page 56
DCT(u) discrete cosine transform of u, page 140
D^m_div          space of vector fields with mth weak divergence and vanishing trace on the boundary, page 329
Symbols
|x| Euclidean norm of vectors x ∈ Kn , page 16
|x|p p-norm of vector x ∈ Kn , page 16
|α| order of a multi-index α ∈ Nd , page 22
|μ| total variation measure to μ, page 43
|Ω|              Lebesgue measure of a set Ω ∈ B(R^d), page 34
x direction of vector x ∈ R2 , page 78
χB characteristic function of the set B, page 57
δx Dirac measure in x, page 33
⟨x*, x⟩_{X*×X}   duality pairing of x* and x, page 25
η, ξ local image coordinates, page 204
∂^α/∂x^α         αth derivative, page 22
y" ceiling function applied to y, largest integer that is smaller than y,
page 56
medB (u) median filter with B applied to u, page 101
μ restriction of μ to , page 34
∇F gradient of F , page 21
∇ 2F Hessian matrix of F , page 21
∇mF mth derivative of F organized as an m-tensor, page 321
‖μ‖_M            norm of the Radon measure μ, page 43
‖f‖_{m,p}        norm in the Sobolev space H^{m,p}(Ω), page 51
‖f‖_p            norm in the Banach space L^p(Ω, X), page 39
Aberration
  chromatic, 222
Adjoint, 27
  Hilbert space ∼, 32
  of unbounded mappings, 27
  of the weak gradient, 329
Admissibility condition, 147
Algorithm
  Arrow-Hurwicz, 408
  edge detection ∼ according to Canny, 96
  extra gradient ∼, 408
  forward-backward splitting, 397
  isodata ∼, 67
  primal-dual, 316
Alias effect, 60
Aliasing, 125, 128, 133, 137
Almost everywhere, 35
Annihilator, 24, 308
Anti-extensionality
  of opening, 93
Aperture problem, 11
Approximation space, 152
Artifact, 1, 2, 340
  color, 387
  compression ∼, 61
  staircasing ∼, 376
Average
  moving, 68, 84, 247
  nonlocal, 103
Averaging filter
  non-local, 104
Axioms
  ∼ of a scale space, 173
Ball
  closed, 17
  open, 16
Banach space, 23
Bandwidth, 128
Bessel function
  modified, 254
Bessel’s inequality, 31
Bidual space, 25
Bilateral filter, 101
Binomial filter, 84
Black top-hat operator, 95
Borel algebra, 32
Boundary
  topological, 17
Boundary extension, 82
  constant, 82
  periodical, 82
  symmetrical, 82
  zero-∼, 82
Boundary initial value problem, 210
Boundary treatment, 82
Boundedness, 18
Caccioppoli set, 355
Calculus of variations, 263
  direct method of, 263
  fundamental lemma of the, 50
Cauchy problem, 184, 191
CFL condition, 245
Characteristic
  method of ∼s, 240
  of a transport equation, 240, 241
17. Abbate, C. DeCusatis, and P.K. Das: Wavelets and Subbands (ISBN 978-0-
8176-4136-8)
18. O. Bratteli, P. Jorgensen, and B. Treadway: Wavelets Through a Looking Glass
(ISBN 978-0-8176-4280-80)
19. H.G. Feichtinger and T. Strohmer: Advances in Gabor Analysis (ISBN 978-0-
8176-4239-6)
20. O. Christensen: An Introduction to Frames and Riesz Bases (ISBN 978-0-8176-
4295-2)
21. L. Debnath: Wavelets and Signal Processing (ISBN 978-0-8176-4235-8)
22. G. Bi and Y. Zeng: Transforms and Fast Algorithms for Signal Analysis and
Representations (ISBN 978-0-8176-4279-2)
23. J.H. Davis: Methods of Applied Mathematics with a MATLAB Overview (ISBN
978-0-8176-4331-7)
24. J.J. Benedetto and A.I. Zayed: Sampling, Wavelets, and Tomography (ISBN
978-0-8176-4304-1)
25. E. Prestini: The Evolution of Applied Harmonic Analysis (ISBN 978-0-8176-
4125-2)
26. L. Brandolini, L. Colzani, A. Iosevich, and G. Travaglini: Fourier Analysis and
Convexity (ISBN 978-0-8176-3263-2)
27. W. Freeden and V. Michel: Multiscale Potential Theory (ISBN 978-0-8176-
4105-4)
28. O. Christensen and K.L. Christensen: Approximation Theory (ISBN 978-0-
8176-3600-5)
29. O. Calin and D.-C. Chang: Geometric Mechanics on Riemannian Manifolds
(ISBN 978-0-8176-4354-6)
30. J.A. Hogan: Time–Frequency and Time–Scale Methods (ISBN 978-0-8176-
4276-1)
31. Heil: Harmonic Analysis and Applications (ISBN 978-0-8176-3778-1)
32. K. Borre, D.M. Akos, N. Bertelsen, P. Rinder, and S.H. Jensen: A Software-
Defined GPS and Galileo Receiver (ISBN 978-0-8176-4390-4)
33. T. Qian, M.I. Vai, and Y. Xu: Wavelet Analysis and Applications (ISBN 978-3-
7643-7777-9)
34. G.T. Herman and A. Kuba: Advances in Discrete Tomography and Its Applica-
tions (ISBN 978-0-8176-3614-2)
35. M.C. Fu, R.A. Jarrow, J.-Y. Yen, and R.J. Elliott: Advances in Mathematical
Finance (ISBN 978-0-8176-4544-1)
36. O. Christensen: Frames and Bases (ISBN 978-0-8176-4677-6)
37. P.E.T. Jorgensen, J.D. Merrill, and J.A. Packer: Representations, Wavelets, and
Frames (ISBN 978-0-8176-4682-0)
38. M. An, A.K. Brodzik, and R. Tolimieri: Ideal Sequence Design in Time-
Frequency Space (ISBN 978-0-8176-4737-7)
39. S.G. Krantz: Explorations in Harmonic Analysis (ISBN 978-0-8176-4668-4)
40. B. Luong: Fourier Analysis on Finite Abelian Groups (ISBN 978-0-8176-4915-9)
41. G.S. Chirikjian: Stochastic Models, Information Theory, and Lie Groups,
Volume 1 (ISBN 978-0-8176-4802-2)
42. C. Cabrelli and J.L. Torrea: Recent Developments in Real and Harmonic Analysis
(ISBN 978-0-8176-4531-1)
43. M.V. Wickerhauser: Mathematics for Multimedia (ISBN 978-0-8176-4879-4)
44. B. Forster, P. Massopust, O. Christensen, K. Gröchenig, D. Labate, P. Van-
dergheynst, G. Weiss, and Y. Wiaux: Four Short Courses on Harmonic Analysis
(ISBN 978-0-8176-4890-9)
45. O. Christensen: Functions, Spaces, and Expansions (ISBN 978-0-8176-4979-
1)
46. J. Barral and S. Seuret: Recent Developments in Fractals and Related Fields
(ISBN 978-0-8176-4887-9)
47. O. Calin, D.-C. Chang, K. Furutani, and C. Iwasaki: Heat Kernels for
Elliptic and Sub-elliptic Operators (ISBN 978-0-8176-4994-4)
48. C. Heil: A Basis Theory Primer (ISBN 978-0-8176-4686-8)
49. J.R. Klauder: A Modern Approach to Functional Integration (ISBN 978-0-
8176-4790-2)
50. J. Cohen and A.I. Zayed: Wavelets and Multiscale Analysis (ISBN 978-0-8176-
8094-7)
51. D. Joyner and J.-L. Kim: Selected Unsolved Problems in Coding Theory (ISBN
978-0-8176-8255-2)
52. G.S. Chirikjian: Stochastic Models, Information Theory, and Lie Groups,
Volume 2 (ISBN 978-0-8176-4943-2)
53. J.A. Hogan and J.D. Lakey: Duration and Bandwidth Limiting (ISBN 978-0-
8176-8306-1)
54. G. Kutyniok and D. Labate: Shearlets (ISBN 978-0-8176-8315-3)
55. P.G. Casazza and P. Kutyniok: Finite Frames (ISBN 978-0-8176-8372-6)
56. V. Michel: Lectures on Constructive Approximation (ISBN 978-0-8176-8402-
0)
57. D. Mitrea, I. Mitrea, M. Mitrea, and S. Monniaux: Groupoid Metrization
Theory (ISBN 978-0-8176-8396-2)
58. T.D. Andrews, R. Balan, J.J. Benedetto, W. Czaja, and K.A. Okoudjou:
Excursions in Harmonic Analysis, Volume 1 (ISBN 978-0-8176-8375-7)
59. T.D. Andrews, R. Balan, J.J. Benedetto, W. Czaja, and K.A. Okoudjou:
Excursions in Harmonic Analysis, Volume 2 (ISBN 978-0-8176-8378-8)
60. D.V. Cruz-Uribe and A. Fiorenza: Variable Lebesgue Spaces (ISBN 978-3-
0348-0547-6)
61. W. Freeden and M. Gutting: Special Functions of Mathematical (Geo-)Physics
(ISBN 978-3-0348-0562-9)
62. A. I. Saichev and W.A. Woyczyński: Distributions in the Physical and Engi-
neering Sciences, Volume 2: Linear and Nonlinear Dynamics of Continuous
Media (ISBN 978-0-8176-3942-6)
63. S. Foucart and H. Rauhut: A Mathematical Introduction to Compressive Sensing
(ISBN 978-0-8176-4947-0)
64. G.T. Herman and J. Frank: Computational Methods for Three-Dimensional
Microscopy Reconstruction (ISBN 978-1-4614-9520-8)
65. A. Paprotny and M. Thess: Realtime Data Mining: Self-Learning Techniques for
Recommendation Engines (ISBN 978-3-319-01320-6)
66. A.I. Zayed and G. Schmeisser: New Perspectives on Approximation and Sampling
Theory: Festschrift in Honor of Paul Butzer’s 85th Birthday (ISBN 978-3-319-
08800-6)
67. R. Balan, M. Begue, J. Benedetto, W. Czaja, and K.A. Okoudjou: Excursions in
Harmonic Analysis, Volume 3 (ISBN 978-3-319-13229-7)
68. H. Boche, R. Calderbank, G. Kutyniok, and J. Vybiral: Compressed Sensing and its
Applications (ISBN 978-3-319-16041-2)
69. S. Dahlke, F. De Mari, P. Grohs, and D. Labate: Harmonic and Applied
Analysis: From Groups to Signals (ISBN 978-3-319-18862-1)
70. A. Aldroubi: New Trends in Applied Harmonic Analysis (ISBN 978-3-319-27871-
1)
71. M. Ruzhansky: Methods of Fourier Analysis and Approximation Theory (ISBN
978-3-319-27465-2)
72. G. Pfander: Sampling Theory, a Renaissance (ISBN 978-3-319-19748-7)
73. R. Balan, M. Begue, J. Benedetto, W. Czaja, and K.A. Okoudjou: Excursions in
Harmonic Analysis, Volume 4 (ISBN 978-3-319-20187-0)
74. O. Christensen: An Introduction to Frames and Riesz Bases, Second Edition
(ISBN 978-3-319-25611-5)
75. E. Prestini: The Evolution of Applied Harmonic Analysis: Models of the Real
World, Second Edition (ISBN 978-1-4899-7987-2)
76. J.H. Davis: Methods of Applied Mathematics with a Software Overview, Second
Edition (ISBN 978-3-319-43369-1)
77. M. Gilman, E. M. Smith, S. M. Tsynkov: Transionospheric Synthetic Aperture
Imaging (ISBN 978-3-319-52125-1)
78. S. Chanillo, B. Franchi, G. Lu, C. Perez, E.T. Sawyer: Harmonic Analysis,
Partial Differential Equations and Applications (ISBN 978-3-319-52741-3)
79. R. Balan, J. Benedetto, W. Czaja, M. Dellatorre, and K.A. Okoudjou: Excursions
in Harmonic Analysis, Volume 5 (ISBN 978-3-319-54710-7)
80. I. Pesenson, Q.T. Le Gia, A. Mayeli, H. Mhaskar, D.X. Zhou: Frames and Other
Bases in Abstract and Function Spaces: Novel Methods in Harmonic Analysis,
Volume 1 (ISBN 978-3-319-55549-2)
81. I. Pesenson, Q.T. Le Gia, A. Mayeli, H. Mhaskar, D.X. Zhou: Recent Applications
of Harmonic Analysis to Function Spaces, Differential Equations, and Data
Science: Novel Methods in Harmonic Analysis, Volume 2 (ISBN 978-3-319-
55555-3)
82. F. Weisz: Convergence and Summability of Fourier Transforms and Hardy
Spaces (ISBN 978-3-319-56813-3)
83. C. Heil: Metrics, Norms, Inner Products, and Operator Theory (ISBN 978-3-319-
65321-1)
84. S. Waldron: An Introduction to Finite Tight Frames: Theory and Applications
(ISBN 978-0-8176-4814-5)
85. D. Joyner and C.G. Melles: Adventures in Graph Theory: A Bridge to Advanced
Mathematics (ISBN 978-3-319-68381-2)