Mathematical Image Processing 9783030014575 9783030014582
Kristian Bredies
Dirk Lorenz
Mathematical Image Processing
Applied and Numerical Harmonic Analysis
Series Editor
John J. Benedetto
University of Maryland
College Park, MD, USA
Jelena Kovačević
Carnegie Mellon University
Pittsburgh, PA, USA
Mathematical Image Processing
Kristian Bredies Dirk Lorenz
Kristian Bredies
Institute for Mathematics and Scientific Computing
University of Graz
Graz, Austria

Dirk Lorenz
Braunschweig, Germany
This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered
company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ANHA Series Preface
The Applied and Numerical Harmonic Analysis (ANHA) book series aims to
provide the engineering, mathematical, and scientific communities with significant
developments in harmonic analysis, ranging from abstract harmonic analysis to
basic applications. The title of the series reflects the importance of applications
and numerical implementation, but richness and relevance of applications and
implementation depend fundamentally on the structure and depth of theoretical
underpinnings. Thus, from our point of view, the interleaving of theory and
applications and their creative symbiotic evolution is axiomatic.
Harmonic analysis is a wellspring of ideas and applicability that has flourished,
developed, and deepened over time within many disciplines and by means of
creative cross-fertilization with diverse areas. The intricate and fundamental rela-
tionship between harmonic analysis and fields such as signal processing, partial
differential equations (PDEs), and image processing is reflected in our state-of-the-
art ANHA series.
Our vision of modern harmonic analysis includes mathematical areas such as
wavelet theory, Banach algebras, classical Fourier analysis, time-frequency analysis,
and fractal geometry, as well as the diverse topics that impinge on them.
For example, wavelet theory can be considered an appropriate tool to deal with
some basic problems in digital signal processing, speech and image processing,
geophysics, pattern recognition, biomedical engineering, and turbulence. These
areas implement the latest technology from sampling methods on surfaces to fast
algorithms and computer vision methods. The underlying mathematics of wavelet
theory depends not only on classical Fourier analysis, but also on ideas from abstract
harmonic analysis, including von Neumann algebras and the affine group. This leads
to a study of the Heisenberg group and its relationship to Gabor systems, and of the
metaplectic group for a meaningful interaction of signal decomposition methods.
The unifying influence of wavelet theory in the aforementioned topics illustrates the
justification for providing a means for centralizing and disseminating information
from the broader, but still focused, area of harmonic analysis. This will be a key role
of ANHA. We intend to publish with the scope and interaction that such a host of
issues demands.
The above point of view for the ANHA book series is inspired by the history of
Fourier analysis itself, whose tentacles reach into so many fields.
In the last two centuries Fourier analysis has had a major impact on the
development of mathematics, on the understanding of many engineering and
scientific phenomena, and on the solution of some of the most important problems
in mathematics and the sciences. Historically, Fourier series were developed in
the analysis of some of the classical PDEs of mathematical physics; these series
were used to solve such equations. In order to understand Fourier series and the
kinds of solutions they could represent, some of the most basic notions of analysis
were defined, e.g., the concept of “function." Since the coefficients of Fourier
series are integrals, it is no surprise that Riemann integrals were conceived to deal
with uniqueness properties of trigonometric series. Cantor’s set theory was also
developed because of such uniqueness questions.
A basic problem in Fourier analysis is to show how complicated phenomena,
such as sound waves, can be described in terms of elementary harmonics. There are
two aspects of this problem: first, to find, or even define properly, the harmonics or
spectrum of a given phenomenon, e.g., the spectroscopy problem in optics; second,
to determine which phenomena can be constructed from given classes of harmonics,
as done, for example, by the mechanical synthesizers in tidal analysis.
Fourier analysis is also the natural setting for many other problems in engineer-
ing, mathematics, and the sciences. For example, Wiener’s Tauberian theorem in
Fourier analysis not only characterizes the behavior of the prime numbers, but also
provides the proper notion of spectrum for phenomena such as white light; this
latter process leads to the Fourier analysis associated with correlation functions in
filtering and prediction problems, and these problems, in turn, deal naturally with
Hardy spaces in the theory of complex variables.
Nowadays, some of the theory of PDEs has given way to the study of Fourier
integral operators. Problems in antenna theory are studied in terms of unimodular
trigonometric polynomials. Applications of Fourier analysis abound in signal
processing, whether with the fast Fourier transform (FFT), or filter design, or the
adaptive modeling inherent in time-frequency-scale methods such as wavelet theory.

Preface
Mathematical imaging is the treatment of mathematical objects that stand for images
where an “image” is just what is meant in everyday conversation, i.e., a picture of
a real scene, a photograph, or a scan. In this book, we treat images as continuous
objects, i.e., as an image of a continuous scene or, put differently, as a function of a
continuous variable. The closely related field of digital imaging, on the other hand,
treats discrete images, i.e., images that are described by a finite number of values
or pixels. Mathematical imaging is a subfield of computer vision where one tries
to understand how information is stored in images and how information can be
extracted from images in an automatic way. Methods of computer vision usually
use underlying mathematical models for images and the information therein. To
apply methods for continuous images in practice, i.e., to digital images, one has
to discretize the methods. Hence, mathematical imaging and digital imaging are
closely related and often methods in both fields are developed simultaneously. A
method based on a mathematical model is useful only if it can be implemented in an
efficient way and the mathematical treatment of a digital method often reveals the
underlying assumptions and may explain observed effects.
This book emphasizes the mathematical character of imaging and as such is
geared toward students of mathematical subjects. However, students of computer
science, engineering, or natural sciences who have a knack for mathematics may
also find this book useful. We assume knowledge of introductory courses like
linear algebra, calculus, and numerical analysis; some basics of real analysis and
functional analysis are advantageous. The book should be suited for students in their
third year; however, later chapters of the book use some advanced mathematics. In
this book, we give an overview of mathematical imaging; we describe methods and
solutions for standard problems in imaging. We will also introduce elementary tools
such as histograms and linear and morphological filters, since they often suffice to solve
a given task. A special focus is on methods based on multiscale representations,
partial differential equations, and variational methods. In most cases, we illustrate
how the methods can be realized practically, i.e., we derive applicable algorithms.
This book can serve as the basis for a lecture on mathematical imaging, but it is also
possible to use parts of it in lectures on applied mathematics or in advanced seminars.
The introduction of the book outlines the mathematical framework and intro-
duces the basic problems of mathematical imaging. Since we will need mathematics
from quite different fields, there is a chapter on mathematical basics. Advanced
readers may skip this chapter, just use it to brush up their knowledge, or use it as a
reference for the terminology used in this book. The chapter on mathematical basics
does not cover all the basics we will need. Many mathematical facts and concepts are
introduced when they are needed for specific methods. Mathematical imaging itself
is treated in Chaps. 3–6. We organized the chapters according to the methods, and
not according to the problems. In this way, we present a box of tools that serves
as a reservoir of methods so that the user can pick, combine, and develop tools
that seem to be best suited for the problem at hand. These mathematical chapters
conclude with exercises that help to develop a deeper understanding of
the methods and techniques. Some exercises involve programming, and we would
like to encourage all readers to try to implement the methods in their favorite
programming language. As with every book, there are a lot of topics which did
not find their way into the book. We would still like to mention some of these topics
in the sections called “Further developments.”
Finally, we would like to thank all the people who contributed to this book in one
way or another: Matthias Bremer, Jan Hendrik Kobarg, Christian Kruschel, Rainer
Löwen, Peter Maaß, Markus Müller (who did a large part of the translation from the
German edition), Tobias Preusser, and Nadja Worliczek.
Contents

1 Introduction
   1.1 What Are Images?
   1.2 The Basic Tasks of Imaging
2 Mathematical Preliminaries
   2.1 Fundamentals of Functional Analysis
      2.1.1 Analysis on Normed Spaces
      2.1.2 Banach Spaces and Duality
      2.1.3 Aspects of Hilbert Space Theory
   2.2 Elements of Measure and Integration Theory
      2.2.1 Measure and Integral
      2.2.2 Lebesgue Spaces and Vector Spaces of Measures
      2.2.3 Operations on Measures
   2.3 Weak Differentiability and Distributions
3 Basic Tools
   3.1 Continuous and Discrete Images
      3.1.1 Interpolation
      3.1.2 Sampling
      3.1.3 Error Measures
   3.2 Histograms
   3.3 Linear Filters
      3.3.1 Definition and Properties
      3.3.2 Applications
      3.3.3 Discretization of Convolutions
   3.4 Morphological Filters
      3.4.1 Fundamental Operations: Dilation and Erosion
      3.4.2 Concatenated Operations
      3.4.3 Applications
      3.4.4 Discretization of Morphological Operators
   3.5 Further Developments
   3.6 Exercises
References
Notation
Index

Chapter 1
Introduction

1.1 What Are Images?
We omit the philosophical aspect of the question “What are images?” and aim
to answer the question “What kind of images are there?” instead. Images can be
produced in many different ways:
Photography: Photography produces two-dimensional images by projecting a
scene of the real world through some optics onto a two-dimensional image plane.
The optics are focused onto some plane, called the focal plane, and objects appear
more blurred the farther they are from the focal plane. Hence, photos usually have
both sharp and blurred regions.
At first, photography was based on chemical reactions to map the different
values of brightness and color onto photographic film. Then some other chemical
reactions were used to develop the film and to produce photoprints. Each of the
different chemical reactions happens with some slight uncontrolled variations,
and hence the photoprint does not exactly correspond to the real brightness and
color values. In particular, photographic film has a certain granularity, which
amounts to a certain noise in the picture.
Nowadays, most photos are obtained digitally. Here, the brightness and color
are measured digitally at certain places—the pixels, or picture elements. This
results in a matrix of brightness or color values. The process of digital picture
acquisition also results in some noise in the picture.
Scans: To digitize photos one may use a scanner. The scanner illuminates the
photo row by row and measures the brightness or color along the lines. Usually
this does not result in some additional blur. However, a scanner operates at some
resolution, which results in a reduction of information. Moreover, the scanning
process may result in some additional artifacts. Older scans are often pale and
may contain some contamination. The correction of such errors is an important
problem in image processing.
Mathematically, we describe an image as a function

    u : Ω → F,

where Ω is the image domain and F is the set of color values.
Fig. 1.1 Different types of images. First row: Photos. Second row: A scan and a microscopy image
of cells. Third row: An image from indirect measurements (holography image of droplets) and a
generalized image (“elevation profile” of a surface produced by a turning process)
The field of digital image processing treats mostly discrete images, often also
with discrete color space. This is reasonable in the sense that images are most often
generated in discrete form or have to be transformed to a discrete image before
further automatic processing. The methods that are used are often motivated by
continuous considerations. In this book we take the viewpoint that our images are
continuous objects (Ω ⊂ R^d). Hence, we will derive methods for continuous images
with continuous color space. Moreover, we will deal mostly with grayscale images
(F = R or F = [0, 1]).
The mathematical treatment of color images is a delicate subject. For example,
one has to be aware of the question of how to measure distances in the color space:
is the distance from red to blue larger than that from red to yellow? Moreover, the
perception of color is very complex and also subjective. Colors can be represented
in different color spaces and usually they are encoded in different color channels.
For example, there are the RGB space, where colors are mixed additively from the
red, green, and blue channels (as on screens and monitors) and the CMYK space,
where colors are mixed subtractively from the cyan (C), magenta (M), yellow (Y),
and black (K) channels (as is common in print). In the RGB space, color
values are encoded by triplets (R, G, B) ∈ [0, 1]3, where the components represent
the amount of the respective color; (0, 0, 0) represents the color black, (1, 1, 1)
stands for white. This is visualized in the so-called RGB cube; see Fig. 1.2. Also the
colors cyan, magenta, and yellow appear as corners of the color cube. To process
color images one often uses the so-called HSV space: a color is described by the
channels Hue, Saturation, and Value. In the HSV space a color is encoded by a
triplet (H, S, V ) ∈ [0, 360[ × [0, 100] × [0, 100]. The hue H is interpreted as an
angle, the saturation S and the value V as percentages. The HSV space is visualized
as a cylinder; see Fig. 1.3. Processing only the V-channel for the value (and leaving
the other channels untouched) often leads to fairly good results in practice.
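The following short Python sketch illustrates this strategy for a single pixel using the standard-library colorsys module: the color is converted to HSV, only the value channel is modified, and the result is converted back. The brightening factor and the convention that all channels lie in [0, 1] (which is what colorsys uses, rather than degrees and percentages) are choices made here for illustration.

```python
import colorsys

def brighten_value_channel(rgb, factor=1.2):
    """Scale only the V channel of an RGB color with components in [0, 1]."""
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # colorsys uses [0, 1] for H, S and V
    v = min(1.0, v * factor)                 # process V, leave hue and saturation untouched
    return colorsys.hsv_to_rgb(h, s, v)

print(brighten_value_channel((0.5, 0.2, 0.1)))
```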
The goal of image processing is to automate or facilitate the evaluation and
interpretation of images. One speaks of high-level methods if one obtains certain
information from the images (e.g., the number of objects, the viewpoint of the
camera, the size of an object, and even the meaning of the scene). Low-level methods
are methods that produce new and improved images out of given images. This book
treats mainly low-level methods.
For the automatic processing of images one usually focuses on certain properties
and structures of interest. These may be, for example:
Edges, corners: An edge describes the boundary between two different struc-
tures, e.g., between different objects. However, a region in the shade may also be
separated from a lighter area by an edge.
Smooth regions: Objects with uniform color appear as smooth regions. If the
object is curved, the illumination creates a smooth transition of the brightness.
Textures: The word “texture” mostly stands for something like a pattern. This
refers, for example, to the fabric of a cloth, the structure of wallpapers, or fur.
Periodic structures: Textures may feature some periodic structures. These struc-
tures may have different directions and different frequencies and also occur as
the superposition of different periodic structures.
Coherent regions: Coherent regions are regions with a similar orientation of
objects as, for example, in the structure of wood or hair.
If such structures are to be detected or processed automatically, one needs good
models for the structures. Is an edge adequately described by a sudden change of
brightness? Does texture occur where the gray values have a high local variance?
The choice of the model then influences how methods are derived.
1.2 The Basic Tasks of Imaging

Many problems in imaging can be reduced to only a few basic tasks. This section
presents some of the classical basic tasks. The following chapters of this book will
introduce several tools that can be used for these basic tasks. For a specific real-
Fig. 1.4 Unfavorable light conditions lead to noisy images. Left: Photo taken in dim light. Right:
Gray values along the depicted line
world application one usually deals with several problems and tasks and has to
combine and adapt methods or even invent new methods.
Denoising: Digital images contain erroneous information. Modern cameras that
can record images with several megapixels still produce noisy images, see
Fig. 1.4; in fact, it is usually the case that an increase of resolution also results
in a higher noise level. The camera’s chip uses the photon count to measure the
brightness. Since the emission of photons is fundamentally a random process,
the measurement is also a random variable and hence contains some noise. The
presence of noise is an inherent problem in imaging. The task of denoising is:
• Identify and remove the noise but at the same time preserve all important
information and structure.
Noise does not pose a serious problem for the human eye; we have no problem
with images with a high noise level. For computers, however, the situation is different. To successfully
denoise an image, one needs a good model for the noise and the image. Some
reasonable assumptions are, for example:
• The noise is additive.
• The noise is independent of the pixel and comes from some distribution.
• The image consists of piecewise smooth regions that are separated by lines.
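A minimal numerical sketch of the first two assumptions (additive noise that is independent and identically distributed over the pixels) could look as follows; the Gaussian distribution, the noise level sigma, and the synthetic test image are illustrative choices and not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.zeros((64, 64))
u[16:48, 16:48] = 1.0                       # a piecewise constant synthetic image

sigma = 0.1                                 # assumed noise level
noise = rng.normal(0.0, sigma, u.shape)     # i.i.d. at every pixel, same distribution
u_noisy = u + noise                         # the additive noise model
print(float(np.std(u_noisy - u)))           # empirical noise level, close to sigma
```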
In this book we will treat denoising at the following places: Example 3.12,
Example 3.25, Sect. 3.4.4, Sect. 3.5, Example 4.19, Example 5.5, Remark 5.21,
Example 5.39, Example 5.40, Example 6.1, Application 6.94, and Example 6.124.
Image decomposition: This usually refers to an additive decomposition of an
image into different components. The underlying assumption is that an image is
composed additively of parts of a different nature; a typical example is the
decomposition into a "cartoon" part and a "texture" part.
Here, “cartoon” refers to a rough sketch of the image in which textures and
similar components are omitted, and “texture” refers to these textures and other
fine structure.
A decomposition of an image can be successful if one has good models for the
different components.
In this book we will treat image decomposition at these places: Example 4.20,
Sect. 6.5.
Enhancement, deblurring: Besides noise, there are other errors that may be
present in images:
• Blur due to wrong focus: If the focus is not adjusted properly, one point in the
image is mapped to an entire region on the film or chip.
• Blur due to camera motion: The object or the camera may move during
exposure time. One point of the object is mapped to a line on the film or
chip.
• Blur due to turbulence: This occurs, e.g., as “shimmering” of the air above a
hot street, but is also present in the observation of astronomical objects.
• Blur due to erroneous optics: One of the most famous examples is the Hubble
Telescope. Only after the launch of the telescope was it recognized that one
mirror had not been made properly. Since a fix in orbit was not considered
appropriate at the beginning, elaborate digital methods to correct the resulting
errors were developed.
See Fig. 1.5 for illustrations of these defects.
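Both out-of-focus blur and motion blur are commonly modeled as a convolution of the sharp image with a blur kernel, a disk-like kernel for defocus and a short line segment for motion. The sketch below, which assumes SciPy is available, only illustrates this common model with crude stand-in kernels; convolution filters themselves are treated in Sect. 3.3.

```python
import numpy as np
from scipy.ndimage import convolve   # assumes SciPy is installed

u = np.zeros((64, 64))
u[30:34, 30:34] = 1.0                          # a small bright square as test image

box = np.ones((5, 5)) / 25.0                   # crude stand-in for an out-of-focus kernel
motion = np.zeros((5, 5))
motion[2, :] = 1.0 / 5.0                       # horizontal line: simple motion-blur kernel

defocused = convolve(u, box, mode="nearest")
motion_blurred = convolve(u, motion, mode="nearest")
```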
The goal of enhancement is to reduce the blur in images. The more is known
about the type of blur, the better. Noise is a severe problem for enhancement and
deblurring, since usually, noise is also amplified during deblurring or sharpening.
Methods for deblurring are developed at the following places in this book:
Application 3.24, Remark 4.21, Example 6.2, Application 6.97,
and Example 6.127.
Edge detection: One key component to the understanding of images is the
detection of edges:
• Edges separate different objects or an object from the background.
• Edges help to infer the geometry of a scene.
• Edges describe the shape of an object.
Edges pose different questions:
• How to define an edge mathematically?
• Edges exist at different scales (e.g., fine edges describe the shape of bricks,
while coarse edges describe the shape of a house). Which edges are important
and should be detected?
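A common first answer to the question of what an edge is, is a point where the gradient of the image is large in magnitude. The following sketch makes this idea concrete with forward differences; the threshold value is arbitrary and only serves the illustration.

```python
import numpy as np

def gradient_magnitude(u):
    """Forward-difference approximation of |grad u| for a 2D gray-value array."""
    dx = np.diff(u, axis=0, append=u[-1:, :])
    dy = np.diff(u, axis=1, append=u[:, -1:])
    return np.sqrt(dx ** 2 + dy ** 2)

u = np.zeros((32, 32))
u[:, 16:] = 1.0                        # a vertical edge between two constant regions
edges = gradient_magnitude(u) > 0.5    # crude edge indicator with an arbitrary threshold
```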
Fig. 1.5 Blur in images. Top left: Blur due to wrong focus; top right: motion blur due to camera
shake; bottom left: shimmering; bottom right: an image from the Hubble Telescope before the error
was corrected
Optical flow: For image sequences, the optical flow describes the apparent motion
of the gray values from one frame to the next. It does not always coincide with the
real motion of the objects in the scene; for example:
• A ball at rest is illuminated by a moving light source. The real field of motion
is zero, but the optical flow is not.
The correspondence problem is a consequence of the fact that the optical flow
and the real field of motion do not coincide. In some cases different fields of
motion may cause the same difference between two images. Also there may be
some points in one image that may have moved to more than one place in the
other image:
• A regular grid is translated. If we observe only a small part of the grid, we
cannot detect a translation that is approximately a multiple of the grid size.
• Different particles move. If the mean distance between particles is larger than
their movements, we may find a correspondence; if the movement is too large,
we cannot.
• A balloon is inflated. Since the surface of the object increases, one point does
not have a single trajectory, but splits up into multiple points.
The aperture problem is related to the correspondence problem. Since we see only
a part of the whole scene, we may not be able to trace the motion of some object
correctly. If a straight edge moves through the picture, we cannot detect any
motion along the direction of the edge. Similarly, we are unable to detect whether
a circle is rotating. This problem does not occur if the edges of the object have a
varying curvature, as illustrated by this picture:
Registration: The task of registration is to find a transformation that maps one
image onto a second image of the same scene. This problem is related to that of
determining the optical flow. Thus, there are similar
problems, but there are some differences:
• Both images may come from different imaging modalities with different
properties (e.g., the images may have a different range of gray values or
different characteristics of the noise).
• There is no notion of time regularity, since there are only two images.
• In practice, the objects may not be rigid.
As for optical flow, we will not treat registration in this book. Again, the methods
from Chap. 6 can be adapted for this problem, too; see Sect. 6.5. Moreover,
one may consult the book [100].
Restoration (inpainting): Inpainting means the reconstruction of destroyed parts
of an image. Reasons for missing parts of an image may be:
• Scratches in old photos.
• Occlusion of objects by other objects.
• Destroyed artwork.
• Occlusion of an image by text.
• Failure of sensors.
• Errors during transmission of images.
There may be several problems:
• If a line is covered, it is not clear whether there may have been two different,
separated, objects.
• If there is an occluded crossing, one cannot tell which line is in front and
which is in back.
We are going to treat inpainting in this book at the following places: Sect. 5.5,
Example 6.4, Application 6.98, and Example 6.128.
Compression: “A picture is worth a thousand words.” However, it needs even
more disk space:
1 letter = 1 byte
1 word ≈ 8 letters = 8 bytes
1000 words ≈ 8 KB.
1 pixel = 3 bytes
1 picture ≈ 4,000,000 pixels ≈ 12 MB
So one picture is worth about 1,500,000 words!
To transmit an uncompressed image with, say, four megapixels via email with an
upstream capacity of 128 kbit/s, the upload will take about 12 min. However,
image data is usually somewhat redundant, and an appropriate compression
allows for significant reduction of this time. Several different compression
methods have entered our daily lives, e.g. JPEG, PNG, and JPEG2000.
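The back-of-the-envelope figures above can be reproduced with a few lines of Python; the numbers used (4 megapixels, 3 bytes per pixel, an upstream of 128 kbit/s) are the assumptions from the text and not fixed constants.

```python
# Rough size and upload-time estimate for an uncompressed RGB image.
pixels = 4_000_000            # assumed 4-megapixel image
bytes_per_pixel = 3           # one byte each for R, G, B
size_bytes = pixels * bytes_per_pixel            # 12,000,000 bytes, about 12 MB

upstream_bits_per_second = 128_000               # assumed 128 kbit/s upstream
seconds = size_bytes * 8 / upstream_bits_per_second
print(f"size = {size_bytes / 1e6:.0f} MB, upload = {seconds / 60:.1f} min")  # about 12.5 min
```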
One distinguishes between lossless and lossy compression. Lossless compression
allows for a reconstruction of the image that is accurate bit by bit. Lossy
compression, on the other hand, allows for a reconstruction of an image that
is very similar to the original image. Inaccuracies and artifacts are allowed as
long as they are not disturbing to the human observer. This raises several questions, for example:
• How to measure the "similarity" of images?
• Compression should work for a large class of images. However, a simple
reduction of the color values works well for simple graphics or diagrams, but
not for photos.
We will treat compression of images in this book at the following places:
Sect. 3.1.3, Application 4.53, Application 4.73, and Remark 6.5.
Chapter 2
Mathematical Preliminaries
For image processing, mainly those aspects of functional analysis are of interest that
deal with function spaces (as mathematically, images are modeled as functions).
Later, we shall see that, depending on the space in which an image is contained,
Let K denote either the field of real numbers R or complex numbers C. For complex
numbers, the real part, the imaginary part, the conjugate, and the absolute value are
respectively defined by
z = a + ib with a, b ∈ R :  Re z = a,  Im z = b,  z̄ = a − ib,  |z| = √(z z̄).
In order to distinguish norms, we may add the name of the underlying vector
space to it, for instance, · X for the norm on X. It is also common to refer to X
itself as the normed space if the norm used is obvious due to the context. Norms
on finite-dimensional spaces will be denoted by | · | in many cases. Since in finite-
dimensional spaces all norms are equivalent, they play a different role from that in
infinite-dimensional spaces.
Example 2.2 (Normed Spaces) Obviously, the pair (K, | · |) is a normed space. For
N ≥ 1 and 1 ≤ p < ∞,

    |x|_p = ( Σ_{i=1}^{N} |x_i|^p )^{1/p}   and   |x|_∞ = max_{i∈{1,...,N}} |x_i|

define norms on K^N.
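The p-norms of Example 2.2 are available in NumPy, which allows a quick numerical check; the inequality asserted at the end is the standard finite-dimensional estimate |x|_∞ ≤ |x|_p ≤ N^{1/p} |x|_∞ and is stated here only for illustration.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
for p in (1, 2, np.inf):
    print(p, np.linalg.norm(x, p))   # |x|_1 = 8, |x|_2 = sqrt(26), |x|_inf = 4

# finite-dimensional equivalence of |.|_p and |.|_inf (illustration only)
N, p = x.size, 2
assert np.linalg.norm(x, np.inf) <= np.linalg.norm(x, p) <= N ** (1 / p) * np.linalg.norm(x, np.inf)
```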
For x ∈ X and r > 0, we define the open ball around x as the set

    B_r(x) = {y ∈ X : ‖x − y‖ < r}.
A subset U ⊂ X is
• open if it consists of interior points only, i.e., for every x ∈ U , there exists an
ε > 0 such that Bε (x) ⊂ U ,
• a neighborhood of x ∈ X if x is an interior point of U ,
• closed if it contains all of its limit points, i.e., whenever x ∈ X is such that for every
ε > 0 the ball Bε(x) intersects U, one also has x ∈ U,
• compact if every covering of U by a family of open sets has a finite subcover,
i.e.,

    V_i open, i ∈ I, with U ⊂ ⋃_{i∈I} V_i   ⇒   ∃ J ⊂ I, J finite, with U ⊂ ⋃_{j∈J} V_j .
which is why the latter is also referred to as a closed r-ball. We say that a subset
U ⊂ X is dense in X if its closure is all of X. Moreover, X is called separable if it possesses a
countable and dense subset.
Normed spaces are first countable (i.e., each point has a countable neighborhood
base, cf. [122]). That is why we can also describe the terms closed and compact also
by means of sequences and their convergence properties:
• We say that a sequence (xn ) : N → X converges to x ∈ X if (xn − x) is a null
sequence. This is also denoted by xn → x for n → ∞ or x = limn→∞ xn .
• The subset U is closed if and only if for every sequence (xn ) in U with xn → x,
the limit x lies in U as well (sequential closedness).
• The subset U is compact if and only if every sequence (xn ) in U has a convergent
subsequence (sequential compactness).
For nonempty subsets V ⊂ X, we naturally obtain a topology on V through
restriction, which is referred to as the relative topology. The notions introduced
above result simply through substituting X by the subset V in the respective
definitions.
is a normed space.
2. For a subspace U of a normed space (X, · X ), the pair (U, · X ) is a normed
space again. Its topology corresponds to the relative topology on U .
3. If (X, ‖·‖_X) is a normed space and Y ⊂ X is a closed subspace, the quotient
vector space X/Y = {[x] : x ∈ X}, where [x] denotes the equivalence class of x
with respect to x1 ∼ x2 if and only if x1 − x2 ∈ Y, endowed with
‖[x]‖_{X/Y} = inf_{y∈Y} ‖x − y‖_X, is again a normed space.
Continuity in x ∈ U can equivalently be expressed through sequential continuity,
i.e., for every sequence (xn ) in U , one has xn → x ⇒ F (xn ) → F (x) for n → ∞.
The weaker property that F is closed is expressed with sequences as follows: for
every sequence (xn ) in U with xn → x such that F (xn ) → y for some y ∈ Y , we
always have x ∈ dom F and y = F (x).
On normed spaces, a stronger notion of continuity is of importance as well:
Definition 2.5 (Lipschitz Continuity) A mapping F : X ⊃ U → Y between
the normed spaces (X, ‖·‖_X) and (Y, ‖·‖_Y) is called Lipschitz continuous if there
exists a constant C ≥ 0 such that

    ‖F(x1) − F(x2)‖_Y ≤ C ‖x1 − x2‖_X   for all x1, x2 ∈ U.

The infimum over all these constants C is called the Lipschitz constant.
Sets of uniformly continuous mappings can be endowed with a norm structure:
Definition 2.6 (Spaces of Continuous Mappings) Let U ⊂ X be a non-empty
subset of the normed space (X, · X ), endowed with the relative topology, and let
(Y, · Y ) be a normed space.
We denote the vector space of continuous mappings by

    C(U, Y) = {F : U → Y : F continuous}.

The set

    L(X, Y) = {F : X → Y : F linear and continuous},   ‖F‖_{L(X,Y)} = sup_{‖x‖_X ≤ 1} ‖F x‖_Y,

forms the space of linear and continuous mappings between X and Y. We also refer
to the norm given on L(X, Y) as the operator norm.
Linear and continuous mappings are often also called bounded linear mappings.
Note that densely defined and continuous mappings can be extended onto the
whole of X, which is why densely defined linear mappings that are not continuous
are often also called unbounded.
Normed spaces are also the starting point for the definition of the differentiability
of a mapping. Apart from the classical definition of Fréchet differentiability, we will
also introduce the weaker notion of Gâteaux differentiability.
Definition 2.9 (Fréchet Differentiability) Let F : U → Y be a mapping defined
on the open subset U ⊂ X between the normed spaces (X, · X ) and (Y, · Y ).
Then, F is Fréchet differentiable (or differentiable) at x ∈ U if there exists DF (x) ∈
L(X, Y ) such that for every ε > 0, there exists δ > 0 such that
    0 < ‖h‖_X < δ   ⇒   x + h ∈ U   and   ‖F(x + h) − F(x) − DF(x)h‖_Y / ‖h‖_X < ε.
The linear and continuous mapping DF (x) is also called the (Fréchet) derivative at
the point x.
If F is differentiable at every point x ∈ U , then F is called (Fréchet)
differentiable, and DF : U → L(X, Y ), given by DF : x → DF (x), denotes
the (Fréchet) derivative. If DF is continuous, we call F continuously differentiable.
For a k-linear mapping G : X × · · · × X → Y (k times), one sets

    ‖G‖ = inf { M > 0 : ‖G(x1, . . . , xk)‖_Y ≤ M ∏_{i=1}^{k} ‖xi‖_X for all (x1, . . . , xk) ∈ X^k },
where the latter norm coincides with the respective operator norm. Usually, we
regard the kth derivative as an element in Lk (X, Y ) (the space of k-linear continuous
mappings), or equivalently, as a mapping Dk F : U → Lk (X, Y ). If such a derivative
exists in x ∈ U and is continuous at this point, then Dk F (x) is symmetric.
Example 2.10 (Differentiable Mappings)
• A linear and continuous mapping F ∈ L(X, Y ) is infinitely differentiable with
itself as the first derivative and 0 as every higher derivative.
• On KN , every polynomial is infinitely differentiable.
• Functions on U ⊂ KN that possess continuous partial derivatives are also
continuously differentiable.
In the case of functions, i.e., X = RN and Y = K, the following notations are
common:
    ∇F = ( ∂F/∂x_1 , . . . , ∂F/∂x_N ),        ∇²F = ( ∂²F/(∂x_i ∂x_j) )_{i,j=1,...,N} .

By means of that notation and under the assumption that the respective partial
derivatives are continuous, the Fréchet derivatives can be represented by matrix-vector
products:

    DF(x)h = ∇F(x) · h ,        D²F(x)(h_1, h_2) = h_1 · ∇²F(x) h_2 .
For higher derivatives, there are similar summation notations. In the former
situation, the vector field ∇F is called the gradient of F , while the matrix-valued
mapping ∇ 2 F is, slightly abusively, referred to as the Hessian matrix. In the case
of the gradient, it has become common in the literature not to distinguish between a
row vector and a column vector—in the sense that one can multiply the gradient at
a point by a matrix from the left as well. We will also make use of this fact as long
as ambiguities are impossible, but we will point this out again at suitable locations.
Apart from that, let us introduce two specific and frequently used differential
operators: For a vector field F : U → KN with U ⊂ RN a nonempty open subset,
the function

    div F = trace ∇F = Σ_{i=1}^{N} ∂F_i/∂x_i

is called the divergence and the associated operator div the divergence operator. For
functions F : U → K, the operator Δ with

    ΔF = trace ∇²F = Σ_{i=1}^{N} ∂²F/∂x_i²

is called the Laplace operator.
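Gradient, divergence, and Laplace operator reappear later in the book when partial differential equations and variational methods are discretized on the pixel grid. The following sketch shows one possible finite-difference discretization (forward differences for the gradient, backward differences for the divergence, unit grid spacing); the concrete scheme is a choice made here for illustration and not notation from the text.

```python
import numpy as np

def grad(u):
    """Discrete gradient via forward differences (Neumann-type boundary)."""
    ux = np.diff(u, axis=0, append=u[-1:, :])
    uy = np.diff(u, axis=1, append=u[:, -1:])
    return ux, uy

def div(px, py):
    """Discrete divergence via backward differences (zero boundary)."""
    dx = np.diff(px, axis=0, prepend=np.zeros((1, px.shape[1])))
    dy = np.diff(py, axis=1, prepend=np.zeros((py.shape[0], 1)))
    return dx + dy

def laplace(u):
    """Discrete Laplacian as div(grad u), i.e., the trace of the Hessian."""
    return div(*grad(u))

u = np.fromfunction(lambda i, j: (i - 8) ** 2 + (j - 8) ** 2, (17, 17))
print(laplace(u)[8, 8])  # approximately 4, the trace of the constant Hessian diag(2, 2)
```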
For a multi-index α ∈ N^d, we write

    ∂^α/∂x^α = ∂^{α_1}/∂x_1^{α_1} · · · ∂^{α_d}/∂x_d^{α_d} .

We will also use the notation ∂^α = ∂^α/∂x^α.
By |α| = Σ_{k=1}^{d} α_k, we denote the order of the multi-index. By means of multi-indices,
we can, for instance, formulate the Leibniz rule for the higher-order derivatives of a
product in a compact fashion: with α! = ∏_{k=1}^{d} α_k! and (α choose β) = α!/(β!(α−β)!) for
β ≤ α (i.e., β_k ≤ α_k for 1 ≤ k ≤ d), one has

    ∂^α(f g) = Σ_{β≤α} (α choose β) ∂^{α−β}f ∂^β g .
Apart from the concept of normed spaces, the notion of completeness, i.e., the
existence of limits of Cauchy sequences, is essential for various fundamental results.
Definition 2.12 (Banach Space) A normed space (X, · X ) is complete if every
Cauchy sequence converges in X, i.e., for every sequence (xn ) in X with the
property
for all ε > 0, there exists n0 ∈ N such that for all n, m ≥ n0, one has ‖xn − xm‖_X < ε,
there exists a limit x ∈ X with xn → x. A complete normed space is called a Banach space.
The subspaces of X and X∗ can be related in the following way: For a subspace
U ⊂ X, the annihilator is defined as the set
U⊥ = {x∗ ∈ X∗ : x∗(x) = 0 for all x ∈ U} in X∗,
and analogously, for a subspace V ⊂ X∗, as V⊥ = {x ∈ X : x∗(x) = 0 for all x∗ ∈ V} in X.
The sets U ⊥ and V ⊥ are always closed subspaces. Annihilators are used for the
characterizations of dual spaces of subspaces.
Example 2.17 (Dual Spaces)
1. The dual space X∗ of an N-dimensional normed space X is again N-dimensional
and hence isomorphic to X itself. In particular, (K^N, ‖·‖)∗ = (K^N, ‖·‖∗), where ‖·‖∗ is
the norm dual to ‖·‖.
2. The dual space of Y = X1 × · · · × XN of Example 2.3 can be regarded as
Y∗ = X1∗ × · · · × XN∗ with

    ‖(x1∗, . . . , xN∗)‖_{Y∗} = ‖( ‖x1∗‖_{X1∗}, . . . , ‖xN∗‖_{XN∗} )‖∗ .
The consideration of the dual space is the basis for the notion of weak
convergence, for instance, and in many cases, X∗ reflects important properties of
the predual space X. It is common to regard the application of elements in X∗ to
elements in X as a bilinear mapping called the duality pairing:

    ⟨x∗, x⟩_{X∗×X} = x∗(x).

The subscript is often omitted if the spaces are evident from the context. Of course,
one can iterate the generation of the dual space, the next space being the bidual
space X∗∗ . This space naturally contains X, and the canonical injection is given by
Hence, the bidual space is always at least as large as the original space. We can
now consider the closure of J (X) in X∗∗ and naturally obtain a Banach space that
contains X in a certain sense.
Theorem 2.18 (Completion of Normed Spaces) For every normed space
(X, · X ), there exists a Banach space (X̃, · X̃ ) and a norm-preserving mapping
J : X → X̃ such that J (X) is dense in X̃.
Often, one identifies X ⊂ X̃ and considers the completed space X̃ instead of
X. A completion can also be constructed by taking equivalence classes of Cauchy
sequences; according to the inverse mapping theorem, this procedure yields an
equivalent Banach space.
The notion of the bidual space is also essential in another context: If the injection
J : X → X∗∗ is surjective, X is called reflexive. Reflexive spaces play a particular
role in the context of weak convergence:
Definition 2.19 (Weak Convergence, Weak*-Convergence) A sequence (xn) in
a normed space (X, ‖·‖_X) converges weakly to some x ∈ X if for every x∗ ∈ X∗,
one has x∗(xn) → x∗(x) for n → ∞, which we denote by xn ⇀ x. A sequence (xn∗)
in the dual space X∗ converges in the weak* sense to x∗ ∈ X∗ if xn∗(x) → x∗(x)
for every x ∈ X, which we also denote by xn∗ ⇀∗ x∗ for n → ∞.
Note that the definition coincides with the notion of the convergence of sequences
in the weak or weak* topology, respectively. However, we will not go into further
detail here. While the convergence in the norm sense implies convergence in the
weak sense, the converse holds only in finite-dimensional spaces. Also, in the
dual space X∗ , weak* convergence is in general a weaker property than weak
convergence; for reflexive spaces, however, the two notions coincide. According
to Theorem 2.15 (Banach-Steinhaus), weakly or weak*-convergent sequences,
respectively, are at least still bounded, i.e., xn ⇀ x for n → ∞ implies
sup_n ‖xn‖_X < ∞, and the analogue holds for weak* convergence.
Of course, notions such as continuity and closedness of mappings can be
generalized for these types of convergence.
Definition 2.20 (Weak/Weak* Continuity, Closedness) Let X, Y be normed
spaces and U ⊂ X a nonempty subset.
• A mapping F : U → Y is called {strongly, weakly}-{strongly, weakly}
continuous if the {strong, weak} convergence of a sequence (xn ) to some x ∈ U
implies the {strong, weak} convergence of (F (xn )) to F (x).
• The mapping is {strongly, weakly}-{strongly, weakly} closed if the {strong,
weak} convergence of (xn ) to some x ∈ X and the {strong, weak} convergence
of (F (xn )) to some y ∈ Y imply: x ∈ U and y = F (x).
In the case that X or Y is a dual space, the corresponding weak* terms are defined
analogously with weak* convergence.
One of the main reasons to study these types of convergences is compactness
results, whose assertions are similar to the Heine Borel theorem in the finite-
dimensional case.
• A subset U ⊂ X is weakly sequentially compact if every sequence in U possesses
a weakly convergent subsequence with limit in U .
• We say that a subset U ⊂ X∗ is weak*-sequentially compact if the analogue
holds for weak* convergence.
Theorem 2.21 (Banach-Alaoglu for Separable Spaces) Every closed ball in the
dual space of a separable normed space is weak*-sequentially compact.
Theorem 2.22 (Eberlein Šmulyan) A normed space is reflexive if and only if
every closed ball is weakly sequentially compact.
Dual spaces and weak and weak* convergence can naturally be used in connec-
tion with linear and continuous mappings as well. Corresponding examples are the
adjoint as well as the notions of weak and weak* sequential continuity.
Definition 2.23 (Adjoint Mapping) For F ∈ L(X, Y), the adjoint mapping
F∗ ∈ L(Y∗, X∗) is defined by ⟨F∗ y∗, x⟩_{X∗×X} = ⟨y∗, F x⟩_{Y∗×Y} for all x ∈ X and y∗ ∈ Y∗.
The circumstances under which these identities hold for interchanged annihilators
are given in the following theorem on linear mappings with a closed range.
The concept of a Banach space is very general. Thus, it is not surprising that many
desired properties (such as reflexivity, for instance) do not hold in general and have
to be required separately. In the case of a Hilbert space, however, one has additional
structure at hand due to the inner product, which naturally yields several properties.
Let us give a brief summary of the most important of these properties.
Definition 2.30 (Inner Product) Let X be a K-vector space. A mapping ( · , · )X :
X × X → K is called an inner product if it satisfies:
1. (λ1 x1 + λ2 x2, y)_X = λ1 (x1, y)_X + λ2 (x2, y)_X for x1, x2, y ∈ X and λ1, λ2 ∈ K, (linearity)
2. (x, y)_X is the complex conjugate of (y, x)_X for x, y ∈ X, (Hermitian symmetry)
3. (x, x)_X ≥ 0 and (x, x)_X = 0 ⇔ x = 0. (positive definiteness)
An inner product on X induces a norm by ‖x‖_X = √((x, x)_X), which satisfies the
Cauchy-Schwarz inequality:

    |(x, y)_X| ≤ ‖x‖_X ‖y‖_X   for all x, y ∈ X.
Based on that, one can easily deduce the continuity of the inner product as well.
For a K-vector space with inner product and the associated normed space
(X, · X ), the terms inner
product space
and pre-Hilbert space are common. We
also denote this space by (X, (·,·)_X). Together with the notion of completeness,
we are led to the following definition.
Definition 2.31 (Hilbert Space) A Hilbert space is a complete pre-Hilbert space
(X, (·,·)_X). Depending on K = R or K = C, it is called a real or a complex
Hilbert space, respectively.
Example 2.32 For N ≥ 1, the set KN is a finite-dimensional Hilbert space if the
inner product is chosen as

    (x, y)_2 = Σ_{i=1}^{N} x_i ȳ_i .
We call this inner product the Euclidean inner product and also write x ·y = (x, y)2 .
Analogously, the set ℓ² = {x : N → K : Σ_{i=1}^{∞} |x_i|² < ∞}, endowed with the
inner product

    (x, y)_2 = Σ_{i=1}^{∞} x_i ȳ_i ,
yields an infinite-dimensional separable Hilbert space. (Note that the inner product
is well defined as a consequence of the Cauchy-Schwarz inequality.)
For (pre-)Hilbert spaces, the notion of orthogonality is characteristic:
Definition 2.33 (Orthogonality) Let (X, (·,·)_X) be a pre-Hilbert space.
• Two elements x, y ∈ X are called orthogonal if (x, y)_X = 0, which we also
denote by x ⊥ y. A set U ⊂ X whose elements are mutually orthogonal is called
an orthogonal system.
• The subspaces U, V ⊂ X are orthogonal, denoted by U ⊥ V , if orthogonality
holds for every pair (x, y) ∈ U × V . For a subspace W ⊂ X with W = U + V ,
we also write W = U ⊥ V .
• For a subspace U ⊂ X, the subspace of all vectors orthogonal to U,

    U⊥ = {y ∈ X : x ⊥ y for all x ∈ U},

is called the orthogonal complement of U.
The analogous assertion on the square of norms of sums remains true for finite
orthogonal systems as well as countable orthogonal systems whose series converge
in X.
The orthogonal complement U ⊥ is closed and for closed subspaces U in the
Hilbert space X, one has X = U ⊥ U ⊥ . That implies the existence of the
orthogonal projection onto U: if (X, (·,·)_X) is a Hilbert space and U ⊂ X is a
closed subspace, then there exists a unique P ∈ L(X, X) with P x ∈ U and
‖x − P x‖_X = min_{u∈U} ‖x − u‖_X for every x ∈ X.
For an orthonormal system U ⊂ X, the Bessel inequality holds:

    ‖x‖²_X ≥ Σ_{y∈U} |(x, y)_X|²   for all x ∈ X,

where (x, y)_X ≠ 0 is true for at most countably many y ∈ U, i.e., the sum is to be
interpreted as a convergent series.
If U is complete, then equality holds as long as X is a Hilbert space. This
relationship is called Parseval’s identity:
    x ∈ X :   ‖x‖²_X = Σ_{y∈U} |(x, y)_X|² .

The latter can also be interpreted as a special case of the Parseval identity

    x1, x2 ∈ X :   (x1, x2)_X = Σ_{y∈U} (x1, y)_X (x2, y)_X ,

in which, for K = C, the second factor is to be complex conjugated.
One can show that in every Hilbert space, there exists a complete orthonormal
system (cf. [22]). Furthermore, a Hilbert space X is separable if and only if there
exists a countable orthonormal basis in X. Due to the Parseval identity, every separable
Hilbert space is thus isometrically isomorphic to either ℓ² or K^N for some N ≥ 0. In
particular, the Parseval relation implies that every sequence of orthonormal vectors
(x_n) converges weakly to zero.
Let us finally consider the dual spaces of Hilbert spaces. Since the inner product
is continuous, for every y ∈ X we obtain an element JX y ∈ X∗ by means of
⟨JX y, x⟩_{X∗×X} = (x, y)_X. The mapping JX : X → X∗ is semilinear, i.e., we have

    JX(y1 + y2) = JX y1 + JX y2   and   JX(λ y) = λ̄ JX y   for y, y1, y2 ∈ X, λ ∈ K.
The remarkable property of a Hilbert space is now that the range of JX is the whole
of the dual space:
Theorem 2.35 (Riesz Representation Theorem) Let X, ( · , · )X be a Hilbert
space. For every x∗ ∈ X∗, there exists y ∈ X with ‖y‖_X = ‖x∗‖_{X∗} such that
x∗ = JX y, i.e., x∗(x) = (x, y)_X for all x ∈ X.
The notion of the Lebesgue integral is based on measuring the contents of sets by
means of a so-called measure.
Definition 2.36 (Measurable Space) Let Ω be a nonempty set. A family of subsets
F is a σ-algebra if
1. ∅ ∈ F,
2. for all A ∈ F, one has Ω \ A ∈ F,
3. Ai ∈ F, i ∈ N, implies that ⋃_{i∈N} Ai ∈ F.
The pair (Ω, F) is called a measurable space, and the sets in F are called
measurable. The smallest σ-algebra that contains a family of subsets G is the σ-
algebra induced by G. For a topological space Ω, the σ-algebra induced by the
open sets is called a Borel algebra, denoted by B(Ω).
A mapping μ : F → [0, ∞] with μ(∅) = 0 is called a measure if it is σ-additive, i.e., if
for pairwise disjoint Ai ∈ F, i ∈ N, one has

    μ( ⋃_{i∈N} Ai ) = Σ_{i=1}^{∞} μ(Ai).
If μ(Ω) < ∞, then μ is a finite measure; in the special case that μ(Ω) = 1, it is
called a probability measure. If there exists a sequence (Ai) in F for which μ(Ai) <
∞ for all i ∈ N as well as Ω = ⋃_{i∈N} Ai, then the measure is called σ-finite.
The triple (Ω, F, μ) is called a measure space.
Often, the concrete measurable space to which a measure is associated is of
interest: in the case F = B(Ω), we speak of a Borel measure; if additionally one
has μ(K) < ∞ for all compact sets K, we call μ a positive Radon measure.
Example 2.38 (Measures)
1. On every nonempty set Ω, a measure is defined on the power set F = P(Ω) in
an obvious manner:

    μ(A) = card(A) if A is finite,   μ(A) = ∞ otherwise.

This measure is called the counting measure on Ω. One can restrict it to a Borel
measure, but in general, it is not a positive Radon measure: in the important
special case of the standard topology on R^d, there are compact sets with infinite
measure.
2. The following example is of a similar nature: for Ω ⊂ R^d, x ∈ Ω, and A ∈ F =
B(Ω),

    δx(A) = 1 if x ∈ A,   δx(A) = 0 otherwise,

defines a probability measure δx, the Dirac measure in x.
3. For half-open cuboids [a, b[ = {x ∈ R^d : ai ≤ xi < bi} ∈ B(R^d) with a, b ∈
R^d, we define

    L^d([a, b[) = ∏_{i=1}^{d} (bi − ai).

One can show that this function possesses a unique extension to a Radon measure
on B(R^d) (cf. [123]). This measure, denoted by L^d as well, is the d-dimensional
Lebesgue measure. It corresponds to the intuitive idea of the "volume" of a d-
dimensional set Ω, and we also write |Ω| = L^d(Ω).
4. A different approach to the Lebesgue measure assumes that the volume of k-
dimensional unit balls is known: For an integer k ≥ 0, this volume is given by
    ω_k = π^{k/2} / Γ(1 + k/2),        Γ(k) = ∫_0^∞ t^{k−1} e^{−t} dt,

where Γ is known as the gamma function. For k ∈ [0, ∞[, a volume can be
defined even for “fractional dimensions.” For an arbitrary bounded set A ⊂ Rd ,
one now expects that the k-dimensional volume is at most ωk diam(A)k /2k with
diam(A) = sup {|x − y| x, y ∈ A}, diam(∅) = 0.
The k-dimensional Hausdorff measure of A is then defined as

    H^k(A) = lim_{δ→0} (ω_k / 2^k) inf { Σ_{i=1}^{∞} diam(Ai)^k : A ⊂ ⋃_{i∈N} Ai, diam(Ai) < δ }.
A ∈ Fμ ⇔ A = B ∪ N, B ∈ F, N null set,
is the completion of F with respect to μ. Its elements are the μ-measurable sets.
• For A ∈ Fμ , we extend μ(A) = μ(B) using B ∈ F above.
The extension of μ to Fμ results in a measure again, which we tacitly denote by μ
as well. For the Lebesgue measure, the construction B(Rd )Ld yields the Lebesgue
measurable sets. The measure space

    (Ω, B(Ω)_{L^d}, L^d)

associated to Ω ∈ B(R^d)_{L^d} presents the basis for the standard Lebesgue integration
on Ω. If nothing else is mentioned explicitly, the notions of measure and integration
theory refer to this measure space.
For measurable functions with finite range, called step functions, the integral is
defined by

    ∫_Ω f(t) dμ(t) = ∫_Ω f dμ = Σ_{a∈f(Ω)} a · μ(f^{−1}({a})) ∈ [0, ∞].
Otherwise, we set

    ∫_Ω f(t) dμ(t) = ∫_Ω f dμ = lim_{n→∞} ∫_Ω fn dμ

for a sequence (fn) of step functions that increases pointwise to f.
• In connection with the integral sign, several special notations are common: if A is
a μ-measurable subset of Ω, then the integral of a measurable/integrable f over
A is defined by

    ∫_A f dμ = ∫_Ω f̄ dμ,        f̄(t) = f(t) if t ∈ A,   f̄(t) = 0 otherwise;

for the Lebesgue measure one also writes ∫_A f(t) dt,
where the latter notation can lead to misunderstandings and is used only if it is
evident that f is a function of t.
The notions μ-measurability and μ-integrability for non-negative functions as
well as vector-valued mappings present the respective analogues for the completed
measure space (Ω, Fμ, μ). They constitute the basis for the Lebesgue spaces, which
we will introduce now as spaces of equivalence classes of measurable functions.
f ∼g ⇔ f =g almost everywhere.
which yields the triangle inequality for p ∈ [1, ∞[; the case p = ∞ can
be proved in a direct way. Finally, the requirement of the positive definiteness
reflects the fact that a transition to equivalence classes is necessary, since the
integral of a nonnegative, measurable function vanishes if and only if it is zero
almost everywhere.
For sequences of measurable or integrable functions in the sense of Lebesgue,
several convergence results hold. The most important of these findings are as
follows. Proofs can be found in [50, 53], for instance.
Lemma 2.46 (Fatou's Lemma) Let (Ω, Fμ, μ) be a measure space and (fn) a
sequence of nonnegative measurable functions fn : Ω → [0, ∞], n ∈ N. Then,

    ∫_Ω lim inf_{n→∞} fn(t) dμ(t) ≤ lim inf_{n→∞} ∫_Ω fn(t) dμ(t).
    ‖f‖_p = ( Σ_{n=1}^{∞} ‖fn‖_X^p )^{1/p},        ‖f‖_∞ = sup_{n∈N} ‖fn‖_X.
Due to the above results, Lebesgue spaces become part of Banach and Hilbert
space theory. This also motivates further topological considerations, such as the
p
characterization of the dual spaces, for instance: if Lμ (, X) is a Hilbert space,
i.e., p = 2 and X is a Hilbert space, then the Riesz representation theorem
2.2 Elements of Measure and Integration Theory 41
∗
(Theorem 2.35) immediately yields L2μ (, X) = L2μ (, X) in the sense of the
Hilbert space isometry
∗
J : L2μ (, X) → L2μ (, X) , (Jf ∗ )f = (f (t), f ∗ (t)) dμ(t).
is compact in Ω.
We denote the subspace of continuous functions with compact support by
Cc(Ω, X) = {f ∈ C(Ω, X) : f has compact support} ⊂ C(Ω, X).
Theorem 2.55 For 1 ≤ p < ∞, the set Cc(Ω, X) is dense in L^p(Ω, X), i.e., for
every f ∈ L^p(Ω, X) and every ε > 0, there exists some g ∈ Cc(Ω, X) such that
‖f − g‖_p ≤ ε.
Apart from Lebesgue spaces, which contain classes of measurable functions,
we also consider Banach spaces of measures. In particular, we are interested in
the spaces of signed or vector-valued Radon measures. These spaces, in some
sense, contain functions, but additionally also measures that cannot be interpreted
as functions or equivalence classes of functions. Thus, they already present a set
of generalized functions, i.e., of objects that in general can no longer be evaluated
pointwise. In the following, we give a summary of the most important corresponding
results, following the presentation in [5, 61].
Definition 2.56 (Vector-Valued Measure) A function μ : F → X on a measurable
space (Ω, F) into a finite-dimensional Banach space X is called a vector-valued
measure if
1. μ(∅) = 0 and
2. for Ai ∈ F, i ∈ N, with Ai mutually disjoint, one has

    μ( ⋃_{i∈N} Ai ) = Σ_{i=1}^{∞} μ(Ai).
The total variation of a vector-valued measure μ is defined by

    A ∈ F :   |μ|(A) = sup { Σ_{i=1}^{∞} ‖μ(Ai)‖_X : A = ⋃_{i∈N} Ai, Ai ∈ F mutually disjoint }.
Since ‖μf‖_M = ‖f‖_1, this mapping is injective and its range is a closed subspace.
and therefore, it yields a finite, even positive, Radon measure that coincides with the
restriction of the one-dimensional Lebesgue measure to [a, b].
By means of the characterization of M(Ω, X) as a dual space, we immediately
obtain the notion of weak* convergence in the sense of Definition 2.19: in fact,
μn ⇀∗ μ for a sequence (μn) in M(Ω, X) and μ ∈ M(Ω, X) if and only if

    ∫_Ω f dμn → ∫_Ω f dμ   for all f ∈ C0(Ω, X∗).
For two measure spaces given on Ω1 and Ω2, respectively, one can easily construct
a measure on the Cartesian product Ω1 × Ω2. This, for example, is helpful for
integration on R^{d1+d2} = R^{d1} × R^{d2}.
Definition 2.65 (Product Measure) For measure spaces (Ω1, F1, μ1) and
(Ω2, F2, μ2), the product F1 ⊗ F2 denotes the σ-algebra induced by the sets
A × B, A ∈ F1 and B ∈ F2.
A product measure μ1 ⊗ μ2 is a measure given on F1 ⊗ F2 that satisfies

    (μ1 ⊗ μ2)(A × B) = μ1(A) μ2(B)   for all A ∈ F1, B ∈ F2.
Remark 2.67 The respective assertions hold for the completed measure space
(Ω1 × Ω2, (F1 ⊗ F2)_{μ1⊗μ2}, μ1 ⊗ μ2) as well.
Example 2.68 One can show that the product of Lebesgue measures is again a
Lebesgue measure: L^{d−n} ⊗ L^n = L^d for integers 1 ≤ n < d (cf. [146]). This fact
facilitates integration on R^d:

    ∫_{R^d} f(t) dt = ∫_{R^n} ∫_{R^{d−n}} f(t1, t2) dt1 dt2,        t = (t1, t2),
for f ∈ L1 (Rd , X). According to Fubini’s theorem, the value of the integral does
not depend on the order of integration.
For a measure space (Ω1, F1, μ1), a measurable space (Ω2, F2), and a measurable
mapping ϕ : Ω1 → Ω2, one can abstractly define the pushforward measure:

    μ2(A) = μ1(ϕ^{−1}(A)),   A ∈ F2.
Fig. 2.1 The mappings ψj pull the boundary of a Lipschitz domain “locally straight”
    supp ϕj ⊂⊂ Uj,   ϕj(x) ∈ [0, 1] ∀x ∈ R^d,   Σ_{j=0}^{n} ϕj(x) = 1 ∀x ∈ Ω.
    ∫_{∂Ω} f(x) dH^{d−1}(x) = Σ_{j=1}^{n} ∫_{Vj ∩ R^{d−1}} Jj(x) ((ϕj f) ∘ ψj)(x, 0) dL^{d−1}(x),

where Jj(x) = √( det( Dj(x)^T Dj(x) ) ) with Dj(x) corresponding to the first d −
1 columns of the Jacobian matrix ∇ψj(x, 0) and Dj(x)^T corresponding to its
transpose. In particular, the function Jj can be defined L^{d−1}-almost everywhere
in Vj ∩ R^{d−1} and is measurable and essentially bounded there.
3. There exists an H^{d−1}-measurable mapping ν : ∂Ω → R^d, called the outer
normal, with |ν(x)| = 1 H^{d−1}-almost everywhere such that for all vector fields
f ∈ L¹(∂Ω, R^d),

    ∫_{∂Ω} (f · ν)(x) dH^{d−1}(x) = Σ_{j=1}^{n} ∫_{Vj ∩ R^{d−1}} Jj(x) ((ϕj f) ∘ ψj · Ej)(x, 0) dL^{d−1}(x),

where Ej is given by Ej(x) = (∇ψj(x)^{−T} e_d)/|∇ψj(x)^{−T} e_d| and ∇ψj(x)^{−T}
denotes the inverse of the transposed Jacobian matrix ∇ψj(x).
and analogously C^k(Ω), endowed with the norm ‖f‖_{k,∞} = max_{|α|≤k} ‖∂^α f/∂x^α‖_∞.
The space of functions that are infinitely differentiable is given by

    C^∞(Ω) = {f : Ω → K : ∂^α f/∂x^α ∈ C(Ω) for all α ∈ N^d},
∂^α φn → ∂^α φ uniformly in Ω.
The Dirac measures of Example 2.38 are called Dirac distributions or delta
distributions in this context.
If for f ∈ L¹_loc(Ω) there exists g ∈ L¹_loc(Ω) such that ∫_Ω f ∂^α φ/∂x^α dx = (−1)^{|α|} ∫_Ω g φ dx for all test functions φ,
we say that the weak derivative ∂^α f = g exists. If for an integer m ≥ 1 and all
multi-indices α with |α| ≤ m, the weak derivatives ∂ α f exist, then f is m-times
weakly differentiable.
Note that we denote the classical as well as the weak derivative by the same
symbol, which normally will not lead to ambiguities. If necessary, we will explicitly
state which derivative is meant. According to Lemma 2.75 (fundamental lemma
of the calculus of variations), the weak derivative is uniquely determined almost
everywhere.
Definition 2.79 (Sobolev Spaces) Let 1 ≤ p ≤ ∞ and m ∈ N. The Sobolev
space H^{m,p}(Ω) is the set

    H^{m,p}(Ω) = {f ∈ L^p(Ω) : ∂^α f ∈ L^p(Ω) for |α| ≤ m},

endowed with the norm ‖f‖_{m,p} = ( Σ_{|α|≤m} ‖∂^α f‖_p^p )^{1/p} (with the usual modification for p = ∞).
The Sobolev spaces are Banach spaces; for 1 < p < ∞, they are reflexive, and
for p = 2, together with the inner product
(f, g)_{H^m} = Σ_{|α|≤m} (∂^α f, ∂^α g)_2,
property of the trace is the fact that Gauss’s theorem holds for Sobolev functions on
Lipschitz domains:
Theorem 2.81 (Gauss's Theorem, Weak Form) Let Ω be a bounded Lipschitz
domain and f ∈ H^{1,1}(Ω, K^d) = H^{1,1}(Ω)^d a Sobolev vector field. Then
∫_{∂Ω} f · ν dH^{d−1} = ∫_Ω div f dx,
where f ∈ L^1_{H^{d−1}}(∂Ω) is the Sobolev trace of Theorem 2.80 and ν is the outer
normal introduced in Theorem 2.73. In particular, we have for f ∈ H^{1,p}(Ω, K^d)
and g ∈ H^{1,p*}(Ω),
∫_{∂Ω} f g · ν dH^{d−1} = ∫_Ω (g div f + f · ∇g) dx.
The proof is again based on the fact that the assertion holds for smooth functions
and vector fields. Then density arguments transfer the result to the general case. For
the second claim, we additionally use that the product satisfies f g ∈ H^{1,1}(Ω, K^d)
and div(f g) = g div f + f · ∇g.
Chapter 3
Basic Tools
In Sect. 1.1 we considered images with continuous and discrete image domains. In
this book, we essentially work with continuous image domains. However, there are
good reasons to deal with discrete images and in particular with the connection of
discrete and continuous images:
• In practice, images are given in discrete form.
• In order to apply continuous methods to discrete images, the method has to
be discretized. This can, for instance, be achieved by interpolating the discrete
image to a continuous one and then employing the continuous method.
• Also images that are given in discrete form often stem from “continuous
brightness distributions.” For this purpose, the continuous scene was sampled.
What does this sampled image have to do with the real image?
Let us first deal with the interpolation of images.
3.1.1 Interpolation
Suppose, for instance, that we want to rotate an image: we obtain each pixel of the
rotated image by calculating where this pixel was located before the rotation. This
point, however, will in general not be a pixel of the original image, i.e., we have
to evaluate the image at intermediate points. Of course, the same happens for other
geometric transformations such as stretching, shrinking, shearing, and shifting, for
instance. Let us define the following geometric operations on images, which we will
encounter frequently in this book:
Definition 3.1 For y ∈ R^d and A ∈ R^{d×d}, we define
t_y : R^d → R^d,  t_y(x) = x + y,   and   d_A : R^d → R^d,  d_A(x) = Ax.
By means of that, we define the linear operators for translation (shifting) and linear
coordinate transformation (scaling) on C(R^d) by
T_y u = u ∘ t_y   and   D_A u = u ∘ d_A.
Remark 3.2 The operators T_y and D_A act “from the right,” i.e., they are applied
before the use of the function u. For concatenation, one has, for instance,
T_y D_A = D_A T_{Ay}
(cf. Exercise 3.2).
u(x) = U_j  if  j − 1/2 ≤ x < j + 1/2,   i.e.,   u(x) = U_{⌊x + 1/2⌋},
where ⌊y⌋ denotes the greatest integer less than or equal to y. This interpolation
is also called nearest-neighbor interpolation and can be interpreted as a
spline interpolation of zeroth order. For this purpose, we define the vector space
V^0 = {u : [1/2, N + 1/2] → R | u|_{[j−1/2, j+1/2[} is constant for j = 1, . . . , N}.
In this vector space, the plateau functions form a basis. These functions are given
by φ_j^0(x) = T_{−j} φ^0(x) = φ^0(x − j), i.e., translations of
φ^0(x) = χ_{[−1/2, 1/2[}(x) = { 1 if x ∈ [−1/2, 1/2[,  0 else. }
The nearest-neighbor interpolation thus reads
u(x) = Σ_{j=1}^{N} U_j φ_j^0(x).
Obviously, a basis of this vector space is given by the hat functions φ_j^1(x) =
T_{−j} φ^1(x) = φ^1(x − j) with
φ^1(x) = { x + 1 if x ∈ [−1, 0[,   1 − x if x ∈ [0, 1[,   0 otherwise. }
The corresponding piecewise linear interpolation is
u(x) = Σ_{j=1}^{N} U_j φ_j^1(x).
A function that will play a major role in Sect. 4.2.2 is the sinc function:
sinc(x) = { sin(πx)/(πx) if x ≠ 0,   1 if x = 0. }
The corresponding interpolation is given by
u(x) = Σ_{j=1}^{N} U_j T_{−j} sinc(x).
For two-dimensional images U ∈ R^{N×M}, the interpolations are obtained as tensor
products. The nearest-neighbor interpolation reads
u(x, y) = U_{⌊x+1/2⌋, ⌊y+1/2⌋} = Σ_{i=1}^{N} Σ_{j=1}^{M} U_{i,j} φ_i^0(x) φ_j^0(y),
the piecewise bilinear interpolation reads
u(x, y) = Σ_{i=1}^{N} Σ_{j=1}^{M} U_{i,j} φ_i^1(x) φ_j^1(y),
and for a general interpolation function φ, one sets
u(x, y) = Σ_{i=1}^{N} Σ_{j=1}^{M} U_{i,j} T_{−i}φ(x) T_{−j}φ(y).
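The tensor-product interpolations are straightforward to implement. The following is a minimal NumPy sketch (not taken from the text) that evaluates a discrete image U ∈ R^{N×M} at an arbitrary point (x, y) with either φ^0 (nearest neighbor) or φ^1 (bilinear); pixel centers are assumed at the integer coordinates 1, …, N and 1, …, M.

```python
import numpy as np

def phi0(x):
    # plateau function: characteristic function of [-1/2, 1/2)
    return ((x >= -0.5) & (x < 0.5)).astype(float)

def phi1(x):
    # hat function: piecewise linear, supported on [-1, 1]
    return np.maximum(1.0 - np.abs(x), 0.0)

def interpolate(U, x, y, phi):
    # tensor-product interpolation u(x, y) = sum_{i,j} U[i,j] * phi(x - i) * phi(y - j)
    N, M = U.shape
    i = np.arange(1, N + 1)
    j = np.arange(1, M + 1)
    wx = phi(x - i)          # weights in the x-direction
    wy = phi(y - j)          # weights in the y-direction
    return wx @ U @ wy

U = np.array([[0.0, 1.0], [2.0, 3.0]])
print(interpolate(U, 1.3, 1.7, phi0))  # nearest-neighbor value
print(interpolate(U, 1.3, 1.7, phi1))  # bilinear value
```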
3.1.2 Sampling
Using the Dirac measure of Example 2.38, we can view U as a delta comb
U = Σ_i U_i δ_{x_i}.
This approach is somewhat closer to what happens inside a digital camera: the chip
collects photons over a small area. Mathematically, one can argue that an image u
cannot be evaluated pointwise, since it actually corresponds to the “distribution”
of brightness. Slightly more generally, we can define the mean sampling also by
means of a test function φ ∈ D(R^d) with φ ≥ 0 and ∫ φ dx = 1. For this purpose,
let x_i ∈ R^d be the sampling points and
U_i = ∫_{R^d} φ(x − x_i) u(x) dx.
In this case, the mean sampling is justified for distributions as well, since it
corresponds to the application of the distribution to a test function in the sense of
Sect. 2.3.
When an image is sampled, some information is obviously lost. However, it can
even happen that “wrong” or undesired information creeps in; cf. Fig. 3.1. With
both point sampling and mean sampling, errors occur in this way, yet the errors
introduced by mean sampling are less obvious. The sampling of continuous images
(or signals, respectively) is a particular mathematical theory, which we will cover in
Sect. 4.2.2. We can then explain in Sect. 4.2.3 the so-called “alias effect” shown in
Fig. 3.1.
Fig. 3.1 Error due to wrong sampling. Top: Original image in high resolution. Lower left: The
same image after an eight times point subsampling, i.e., in the horizontal and vertical directions,
every eighth value was taken. Lower right: The same image after an eight times mean subsampling,
i.e., the mean was taken of squares of eight by eight pixels, respectively. In order to ease the
comparison, the images were scaled to the same size
Definition 3.4 (Mean Squared Error (MSE)) For two continuous images u, v ∈
L^2(Ω), the mean squared error is given by
MSE(u, v) = (1/|Ω|) ‖u − v‖_2^2 = (1/|Ω|) ∫_Ω (u(x) − v(x))^2 dx.
For discrete images U, V ∈ R^{N×M}, it is analogously given by
MSE(U, V) = (1/(NM)) Σ_{i,j} (U_{i,j} − V_{i,j})^2.
The mean squared error can be used to evaluate the difference of images, for instance
to compare the result of a denoising method with the original image.
In the context of image compression, one compares the uncompressed image
with the compressed one. For this purpose, the “peak signal-to-noise ratio,” PSNR,
is common. Essentially, the PSNR is a scaled version of the MSE; it measures the
ratio of the maximal possible energy of the signal and the energy of the existing
noise. The PSNR is usually given logarithmically (more precisely, in decibels):
Definition 3.5 (Peak Signal-to-Noise Ratio (PSNR)) For two continuous images
u, v ∈ L^2(Ω) with u, v : Ω → [0, 1], the PSNR is given by
PSNR(u, v) = 10 log_{10} (1 / MSE(u, v)) db.
If U, V ∈ R^{N×M} are discrete images with U_{i,j}, V_{i,j} ∈ [0, 255], then we have
PSNR(U, V) = 10 log_{10} (255^2 / MSE(U, V)) db.
Note that a higher PSNR value implies a better image quality. We set PSNR(u, u) =
∞; a PSNR value of over 40db typically means that the difference between the
images cannot be perceived; cf. Fig. 3.2. The PSNR is designed to measure noise
or compression artifacts. It is not suitable for specifying a distance between two
general images that in some sense reflects the “similarity” of these images.
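For discrete images, both quantities are one-liners. A minimal sketch (the peak value 255 is an assumption matching the discrete definition above):

```python
import numpy as np

def mse(U, V):
    # mean squared error of two discrete images of equal size
    return np.mean((U.astype(float) - V.astype(float)) ** 2)

def psnr(U, V, peak=255.0):
    # peak signal-to-noise ratio in decibels; peak is the maximal possible gray value
    m = mse(U, V)
    return np.inf if m == 0 else 10 * np.log10(peak ** 2 / m)

U = np.random.randint(0, 256, (64, 64))
V = np.clip(U + np.random.normal(0, 5, U.shape), 0, 255)
print(psnr(U, V))
```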
3.2 Histograms
A histogram contains some important properties of the image and is very helpful
for several basic applications. Roughly speaking, a histogram specifies how often
the different gray values appear in the image. Before we introduce the histogram for
continuous images, we consider a basic example:
Example 3.6 (Histogram of a Discrete Image) We consider a discrete image u :
Ω → F with Ω = {1, . . . , N} × {1, . . . , M} and F = {0, . . . , n}. The histogram
H_u of u states how often the respective gray values appear in the image:
H_u(k) = #{(i, j) ∈ Ω | u_{i,j} = k}.
By means of the Kronecker delta, we can represent the histogram in a different way:
H_u(k) = Σ_{i=1}^{N} Σ_{j=1}^{M} δ_{k, u_{i,j}}.
with 0 < s_1 < s_2 < s_3, i.e., the color space is essentially discrete. Then the
distribution function reads
G_u(s) = { 0 if s < s_1,   μ(Ω_1) if s_1 ≤ s < s_2,   μ(Ω_1) + μ(Ω_2) if s_2 ≤ s < s_3,   μ(Ω_1) + μ(Ω_2) + μ(Ω_3) = μ(Ω) if s_3 ≤ s. }
the resulting image may exhibit a low contrast. By means of the histogram, a
reasonable contrast improvement method can be motivated.
An image with high contrast usually has gray values in all the range available. If
we assume this range to be the interval [0, 1], the linear gray value spread
s ↦ (s − ess inf u)/(ess sup u − ess inf u)
leads to a full coverage of the range. However, this does not necessarily suffice to
increase the contrast sufficiently in all parts of the image, i.e., in particular areas,
the contrast can still be improved. One possibility is to distribute the gray values
as equally as possible over the whole range of gray values. For this purpose, we look
for a monotonic function Φ : R → [0, 1] such that
Φ(s) = G_u(s)/μ(Ω).
Φ(s_0) = round( (n/μ(Ω)) G_u(s_0) ) = round( (n/μ(Ω)) Σ_{s=0}^{s_0} H_u(s) ).   (3.1)
Fig. 3.3 Histogram equalization according to Application 3.10. Left column: Original image with
histogram. Middle column: Spreading of the gray values to the full range. Right column: Image
with equalized histogram. Lowest row: Respective transformation
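A possible NumPy implementation of the transformation (3.1) for a discrete image with gray values 0, …, n could look as follows; the function name and the example are ours, not the book's.

```python
import numpy as np

def equalize(U, n=255):
    # histogram equalization following (3.1): Phi(s) = round(n * G_u(s) / |Omega|)
    hist = np.bincount(U.ravel(), minlength=n + 1)     # histogram H_u(k)
    G = np.cumsum(hist)                                 # distribution function G_u
    Phi = np.round(n * G / U.size).astype(U.dtype)      # gray value transformation
    return Phi[U]

U = np.random.randint(50, 100, (32, 32), dtype=np.uint8)  # low-contrast test image
V = equalize(U)
print(V.min(), V.max())
```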
The question remains how to determine the threshold. There are numerous corre-
sponding methods. In this simple case, the following idea often works well:
We select the threshold such that it corresponds to the arithmetic mean of the centers of
mass of the histogram above and below the threshold.
Since the center of mass corresponds to the normed first moment, we can express this
idea in a formula as follows: the threshold s_0 satisfies the equation
s_0 = (1/2) ( ∫_0^{s_0} s dH_u(s) / ∫_0^{s_0} 1 dH_u(s) + ∫_{s_0}^{S} s dH_u(s) / ∫_{s_0}^{S} 1 dH_u(s) ).
This equation can be solved by the fixed-point iteration
s_0^{n+1} = f(s_0^n)   with   f(s_0) = (1/2) ( ∫_0^{s_0} s dH_u(s) / ∫_0^{s_0} 1 dH_u(s) + ∫_{s_0}^{S} s dH_u(s) / ∫_{s_0}^{S} 1 dH_u(s) ).
Why does this fixed-point iteration converge? The iteration map f , as a sum of two
increasing functions, is monotonically increasing. Furthermore, we have
f(0) = (1/2) ∫_0^{S} s H_u(s) ds / ∫_0^{S} H_u(s) ds ≥ 0,    f(S) = (1/2) ( ∫_0^{S} s H_u(s) ds / ∫_0^{S} H_u(s) ds + S ) ≤ S.
Due to the monotonicity, there exists at least one fixed point with a slope of less than
one. Thus, the fixed-point iteration converges.
This method is also known as the isodata algorithm and has been used since the
1970s (cf. [117]). An example is given in Fig. 3.4.
Fig. 3.4 Segmentation through thresholding according to Application 3.11. Left: Scanned hand-
writings. Right: Segmentation
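The fixed-point iteration of the isodata method can be sketched as follows; the stopping tolerance and the initial value (the global mean) are our choices, not prescribed by the text.

```python
import numpy as np

def isodata_threshold(U, tol=0.5, max_iter=100):
    # fixed point of f: threshold = mean of the centers of mass below and above it
    s = U.mean()
    for _ in range(max_iter):
        low, high = U[U <= s], U[U > s]
        s_new = 0.5 * (low.mean() + high.mean())
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

# synthetic bimodal gray value distribution
U = np.concatenate([np.random.normal(60, 10, 1000), np.random.normal(180, 15, 1000)])
s0 = isodata_threshold(U)
segmentation = U > s0
print(s0)
```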
3.3 Linear Filters

Linear filters belong to the oldest tools in digital image processing. We first consider
an introductory example:
Example 3.12 (Denoising with the Moving Average) We consider a continuous
image u : Rd → R and expect that this image exhibits some form of noise, i.e., we
assume that the image u results from a real u† by adding noise n:
u = u† + n.
Furthermore, we assume that the noise is distributed in some way uniformly around
zero (this is not a precise mathematical assumption, but we do without a more
precise formulation here). In order to reduce the noise, we take averages over
neighboring values and hope that this procedure will suppress the noise. In formulas,
this reads: for a radius r > 0, we compute
M_r u(x) = (1/L^d(B_r(0))) ∫_{B_r(x)} u(y) dy.
Fig. 3.5 Denoising with moving average. Left: Original image. Middle: Original image with
noise. Right: Application of the moving average. Next to this: The indicator function used
The operation that underlies filtering is the convolution (apart from the sign in the
argument of the filter function).
Ty (u ∗ h) = (u ∗ Ty h) = (Ty u ∗ h).
∂^α(u ∗ φ)/∂x^α = u ∗ ∂^α φ/∂x^α.
φ_ε(x) = (1/ε^d) φ(x/ε).
For p = ∞, we can pull the supremum of |u| out of the integral and obtain the
required estimate. For p < ∞, we integrate the pth power of u ∗ v and get
∫_{R^d} |u ∗ v(x)|^p dx ≤ ∫_{R^d} ( ∫_{R^d} |u(x − y)| |v(y)| dy )^p dx.
The application of Fubini’s theorem is justified here in retrospect since the latter
integrals exist. After taking the pth root, the assertion follows. For q > 1, the
proof is similar but more complicated, cf. [91], for instance.
2. We prove the assertion only for the first partial derivatives; the general case then
follows through repeated application. We consider the difference quotient in the
which can be found using the variable transformation ξ = x/ε. We conclude that
|(u ∗ φ_ε)(x) − u(x)| ≤ ∫_{R^d} |u(x − y) − u(x)| |φ_ε(y)| dy.
We choose ρ > 0 and split the integral into large and small y:
On the one hand, this shows the pointwise convergence; on the other hand, it also
shows the uniform convergence on compact sets. □
Expressed in words, the properties of the convolution read:
1. The convolution is a linear and continuous operation if it is considered between
the function spaces specified.
2. The convolution of functions inherits the smoothness of the smoother function
of the two, and the derivative of the convolution corresponds to the convolution
with the derivative.
3. The convolution of a function u with a “narrow” function φ approximates u in a
certain sense.
For image processing, the second and third property are of particular interest:
Convolving with a smooth function smooths the image. When convolving with a
function φε , for small ε, only a small error occurs.
In fact, the convolution is often even slightly smoother than the smoother one of
the functions. A basic example for this case is given in the following proposition:
Theorem 3.14 Let p > 1 and p* the dual exponent, u ∈ L^p(R^d), and v ∈ L^{p*}(R^d).
Then u ∗ v ∈ C(R^d).
Proof For h ∈ R^d, we estimate by means of Hölder's inequality:
|u ∗ v(x + h) − u ∗ v(x)| ≤ ∫_{R^d} |u(y)| |v(x + h − y) − v(x − y)| dy
≤ ( ∫_{R^d} |u(y)|^p dy )^{1/p} ( ∫_{R^d} |v(x + h − y) − v(x − y)|^{p*} dy )^{1/p*}.
The fact that the last integral converges to 0 for h → 0 corresponds to the assertion
that L^{p*}-functions are continuous in the p*-mean for 1 ≤ p* < ∞, see Exercise 3.4. □
Note that Theorem 3.13 yields only u ∗ v ∈ L∞ (Rd ) in this case.
The smoothing properties of the convolution can also be formulated in several
other ways than in Theorem 3.13, for instance, in Sobolev spaces H m,p (Rd ):
Theorem 3.15 Let m ∈ N, 1 ≤ p, q ≤ ∞, 1/r + 1 = 1/p + 1/q, u ∈ L^p(R^d), and
h ∈ H^{m,q}(R^d). Then u ∗ h ∈ H^{m,r}(R^d), and for the weak derivatives up to order
m, we have
∂^α(u ∗ h) = u ∗ ∂^α h almost everywhere.
Furthermore, let Ω_1 and Ω_2 be domains. The assertion holds with u ∗ h ∈ H^{m,r}(Ω_2)
if u ∈ L^p(Ω_1) and h ∈ H^{m,q}(Ω_2 − Ω_1).
Proof We use the definition of the weak derivative and calculate, similarly to
Theorem 3.13, using Fubini’s theorem:
∫_{R^d} ∂^α(u ∗ h)(x) φ(x) dx = (−1)^{|α|} ∫_{R^d} (u ∗ h)(x) (∂^α φ/∂x^α)(x) dx
= (−1)^{|α|} ∫_{R^d} ∫_{R^d} u(y) h(x − y) dy (∂^α φ/∂x^α)(x) dx
= (−1)^{|α|} ∫_{R^d} u(y) ∫_{R^d} h(x − y) (∂^α φ/∂x^α)(x) dx dy
= ∫_{R^d} u(y) ∫_{R^d} ∂^α h(x − y) φ(x) dx dy
= ∫_{R^d} (u ∗ ∂^α h)(x) φ(x) dx.
Now the asserted rule for the derivative follows from the fundamental lemma of the
calculus of variations, Lemma 2.75. Due to ∂^α h ∈ L^q(R^d), Theorem 3.13 yields
∂^α(u ∗ h) = u ∗ ∂^α h ∈ L^r(R^d) for |α| ≤ m, and hence we have u ∗ h ∈ H^{m,r}(R^d).
In particular, the existence of the integral on the left-hand side above is justified
retrospectively.
The additional claim follows analogously; we remark, however, that for φ ∈
D(Ω_2), extended by zero, and y ∈ Ω_1, we have T_y φ ∈ D(Ω_2 − y). Thus, the
definition of the weak derivative can be applied. □
If the function φ of Theorem 3.13 (3) additionally lies in D(Rd ), the functions
φε are also called mollifiers. This is motivated by the fact that in this case, u ∗
φε is infinitely differentiable and for small ε, it differs only slightly from u. This
situation can be expressed more precisely in various norms, such as the Lp -norms,
for instance:
Lemma 3.16 Let Ω ⊂ R^d be a domain and 1 ≤ p < ∞. Then for every u ∈
L^p(Ω), mollifier φ, and δ > 0, there exists some ε > 0 such that
‖φ_ε ∗ u − u‖_p ≤ ‖φ_ε ∗ u − φ_ε ∗ f‖_p + ‖φ_ε ∗ f − f‖_p + ‖f − u‖_p ≤ 2δ/3 + ‖φ_ε ∗ f − f‖_p,
‖u − φ_ε ∗ f‖_p ≤ ‖u − f‖_p + ‖φ_ε ∗ f − f‖_p < 2δ/3 < δ,
where ε > 0 is chosen sufficiently small such that we have supp f − supp φ_ε ⊂⊂ Ω.
Together with φ_ε ∗ f ∈ D(Ω), this implies the density. □
A similar density result also holds for Sobolev spaces:
Lemma 3.17 Let Ω ⊂ R^d be a domain, m ∈ N with m ≥ 1, 1 ≤ p < ∞, and
u ∈ H^{m,p}(Ω). Then for every δ > 0 and subdomain Ω' with Ω' ⊂⊂ Ω, there exists
f ∈ C^∞(Ω') such that ‖u − f‖_{m,p} < δ on Ω'.
Proof We choose a mollifier φ and ε_0 > 0 such that Ω' − supp φ_ε ⊂⊂ Ω holds
for all ε ∈ ]0, ε_0[. The function f_ε = φ_ε ∗ u now lies in C^∞(Ω') and according
to Theorem 3.15, for every ε ∈ ]0, ε_0[, one has f_ε ∈ H^{m,p}(Ω'), where we have
∂^α f_ε = u ∗ ∂^α φ_ε for every multi-index α with |α| ≤ m. By means of Lemma 3.16,
we can choose ε sufficiently small such that for every multi-index with |α| ≤ m,
one has
‖∂^α(u − f_ε)‖_p = ‖∂^α u − φ_ε ∗ ∂^α u‖_p < δ/M in Ω',
where M denotes the number of multi-indices with |α| ≤ m. Setting f = f_ε and
using the Minkowski inequality yields the desired assertion. □
Theorem 3.18 Let 1 ≤ p < ∞ and m ∈ N. Then the space D(Rd ) is dense in
H m,p (Rd ).
Proof We first show that the set C^∞(R^d) ∩ H^{m,p}(R^d) is dense. Let u ∈ H^{m,p}(R^d)
and φ a mollifier. Since in particular u ∈ Lp (Rd ), we obtain u ∗ φε → u in Lp (Rd )
and for |α| ≤ m, (∂ α u) ∗ φε = ∂ α (u ∗ φε ) converges to ∂ α u in Lp (Rd ), i.e., we have
u ∗ φε → u in H m,p (Rd ).
Now we show that the space D(Rd ) is dense in C ∞ (Rd ) ∩ H m,p (Rd ) (which will
complete the proof). For this purpose, let u ∈ C ∞ (Rd ) ∩ H m,p (Rd ), which implies
in particular that the classical derivatives ∂ α u up to order m are in Lp (Rd ). Now
let η ∈ D(Rd ) with η ≡ 1 in a neighborhood of zero. For R > 0, we consider the
functions uR (x) = u(x)η(x/R). Then uR ∈ D(Rd ), and according to the dominated
convergence theorem, we also have uR → u in Lp (Rd ) for R → ∞. For the partial
derivatives, due to the Leibniz formula, we have
α
(∂ uR )(x) =
α
(∂ α−β u)(x)R −|β| (∂ β η)(x/R).
β
β≤α
Remark 3.21 In image processing, it is more common to speak of filters rather than
convolutions. This corresponds to a convolution with a reflected convolution kernel:
the linear filter for h is defined by u ∗ D− id h. If not stated otherwise, we will use
the term “linear filter” for the operation u ∗ h in this book.
3.3.2 Applications
By means of linear filters, one can create interesting effects and also tackle some
of the fundamental problems of image processing. Exemplarily, we here show three
applications:
Application 3.22 (Effect Filters) Some effects of analog photography can be
realized by means of linear filters:
Duto filter: The Duto filter overlays the image with a smoothed version of itself.
The result gives the impression of blur on the one hand, whereas on the other
hand, the sharpness is maintained. This results in a “dream-like” effect. In
mathematical terms, the Duto filter can be realized by means of a convolution
with a Gaussian function, for instance. For this purpose, let
G_σ(x) = (2πσ)^{−d/2} exp(−|x|²/(2σ))   (3.2)
be the d-dimensional Gaussian function with variance σ . In the case of the Duto
filter, the convolved image is linearly overlayed with the original image. This can
be written as a convex combination with parameter λ ∈ [0, 1]:
λu + (1 − λ)u ∗ Gσ .
Motion blur: If an object (or the camera) moves during the exposure time, a point
of the object is mapped onto a line. For a linear motion along a line segment L of
length l, one defines the measure
μ = (1/l) H^1 ⌞ L
by means of the Hausdorff measure of Example 2.38. The motion blur is then
given by
x ↦ ∫ u(x + y) dμ(y).
In Fig. 3.6, an example for the application of the Duto filter as well as motion
blurring is given.
Fig. 3.6 Effect filters of Application 3.22. Left: Original image. Center: Duto blurrer with λ =
0.5. Right: Motion blurring
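As an illustration of the Duto filter described above, a minimal sketch using SciPy's Gaussian smoothing; note that scipy.ndimage.gaussian_filter takes the standard deviation as parameter, whereas σ in (3.2) denotes the variance, so the two parametrizations differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def duto(u, sigma=2.0, lam=0.5):
    # Duto filter: convex combination of the image with a Gaussian-smoothed copy,
    # lam * u + (1 - lam) * (u * G_sigma)
    smoothed = gaussian_filter(u, sigma=sigma)
    return lam * u + (1 - lam) * smoothed

u = np.random.rand(128, 128)
v = duto(u, sigma=3.0, lam=0.5)
```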
Fig. 3.7 A one-dimensional gray value distribution u and its first two derivatives u′ and u′′
Fig. 3.8 Edge detection by determining the maxima of the smoothed derivative. Left: A noisy edge
at zero. Center and right: On top a filter function, below the corresponding result of the filtering
∂(u ∗ f)(x)/∂x = (u ∗ f′)(x).
Different functions h = f′ lead to results of different quality for the edge detection
by determining the maxima of (u ∗ f′)(x), cf. Fig. 3.8.
Canny [29] presents a lengthy derivation of a class of, in some sense, optimal
functions f . Here, we present a heuristic variant, which leads to the same result.
Edges exist on different scales, i.e., there are “coarse edges” and “fine edges.”
Fine edges belong to small, delicately structured objects and are thus suppressed
by a convolution with a function with a high variance. The convolution with a
function with a small variance changes the image only slightly (cf. Proposition 3.13)
and hence preserves all edges. Therefore, we consider rescaled versions fσ (x) =
σ −1 f (σ −1 x) of a given convolution kernel f . If σ is large, fσ is “wider” and if σ
is small, fσ is “narrower.” For one original image u0 , we thus obtain a whole class
of smoothed images:
u(x, σ ) = u0 ∗ fσ (x).
We now formulate requirements for finding a suitable f . We require that the location
of the edges remain constant for different σ . Furthermore, no new edges shall appear
for larger σ . In view of Fig. 3.7, we hence require that at an edge point x0 , we have
∂²/∂x² u(x_0, σ) > 0 ⟹ ∂/∂σ u(x_0, σ) > 0,
∂²/∂x² u(x_0, σ) = 0 ⟹ ∂/∂σ u(x_0, σ) = 0,
∂²/∂x² u(x_0, σ) < 0 ⟹ ∂/∂σ u(x_0, σ) < 0.
In other words, if the second derivative in the x-direction of u is positive (or
zero/negative), then u will increase (or remain constant/decrease) for increasing σ ,
i.e., for coarser scales. In order to ensure this, we require that the function u solve
the following differential equation:
∂/∂σ u(x, σ) = ∂²/∂x² u(x, σ).   (3.3)
Furthermore, σ = 0 should lead to the function u_0, of course. Thus, we set the initial
value for the differential equation as follows:
u(x, 0) = u_0(x).   (3.4)
The initial value problem (3.3), (3.4) is known from physics, where it models heat
conduction in one dimension. The problem admits a unique solution, which is given
by the convolution with the Gaussian function (3.2) with variance 2σ:
u(x, σ) = (u_0 ∗ G_{2σ})(x).
The points x at which ρ(x) exhibits a local maximum in the direction of the vector
(sin(θ(x)), cos(θ(x))) are then marked as edges. Afterward, a threshold is applied
in order to suppress edges that are not important or induced by noise, i.e., if ρ(x)
is smaller than a given value τ , the corresponding x is removed. The result of the
Fig. 3.9 Edge detection. Top left: Original image with 256×256 pixels. Top center: Edge detection
through thresholding of the gradient after convolution with a Gaussian function with σ = 3.
Bottom: Edge detection with Canny edge detector (from left to right: σ = 1, 2, 3)
Canny detector is shown in Fig. 3.9. For the sake of comparison, the result of a
simple thresholding method is depicted as well. In this case, ρ is calculated as for
the Canny edge detector, and then all points x for which ρ(x) is larger than a given
threshold are marked as edges.
Furthermore, we consider the second derivative of the cross section f and remark
(analogously to Fig. 3.7) that if we subtract from f a small multiple of its second
derivative f′′, we obtain an image in which the edge is steeper than before:
f ↦ f − τ f′′.
Note, however, that edges occur in different directions. Thus, a rotationally invariant
differential operator is necessary. The simplest one is given by the Laplace operator
Δu = ∂²u/∂x² + ∂²u/∂y²,
and the corresponding sharpening operation reads
u − τ Δu.
Note that this is a linear operation. In general, we cannot assume that the images u
are sufficiently smooth, so that the Laplace operator may not be well defined. Again,
a simple remedy is to smoothen the image beforehand—by a convolution with a
Gaussian function, for instance. According to Proposition 3.13, we then obtain
...
In general, however, the edges are overemphasized after some time, i.e., the function
values in a neighborhood of the edge are smaller or larger than in the original image.
Furthermore, noise can be increased by this operation as well. These effects can be
seen in Fig. 3.10.
Fig. 3.10 Laplace sharpening. Top left: Original image with 256 × 256 pixels. Next to it:
Successively applying Laplace sharpening of Application 3.24 (parameter: σ = 0.25, α = 1)
The generalization to higher dimensions is obvious. Dealing with finite images and
finite convolution kernels, we obtain finite sums and are faced with the problem
that we have to evaluate U or H at undefined points. This problem can be tackled
by means of a boundary treatment or a boundary extension, in which it suffices to
extend U . We extend a given discrete and finite image U : {0, . . . , N − 1} → R to
an image Ũ : Z → R in one of the following ways:
• Periodic extension: Tessellate Z with copies of the original image
Ũi = Ui mod N .
For the extension of images of multiple dimensions, the rules are applied to each
dimension separately. Figure 3.11 shows an illustration of the different methods in
two dimensions.
Note that the periodical and zero extensions induce unnatural jumps at the
boundary of the image. The constant and the symmetric extensions do not produce
such additional jumps.
Let us now consider two-dimensional images U : Z2 → R and cover some
classical methods that belong to the very first methods of digital image processing.
In this context, one usually speaks of filter masks rather than convolution kernels. A
filter mask H ∈ R^{(2r+1)×(2s+1)} defines a filter by
(U ⋆ H)_{i,j} = Σ_{k=−r}^{r} Σ_{l=−s}^{s} H_{k,l} U_{i+k,j+l}.
We assume throughout that the filter mask is of odd size and is indexed in the
following way:
H = [ H_{−r,−s} ⋯ H_{−r,s} ;  ⋮ H_{0,0} ⋮ ;  H_{r,−s} ⋯ H_{r,s} ].
Filtering corresponds to a convolution with the reflected filter mask. Due to the
symmetry of the convolution, we observe that
(U ⋆ H) ⋆ G = U ⋆ (H ∗ G) = U ⋆ (G ∗ H) = (U ⋆ G) ⋆ H.
Therefore, the order of applying different filter masks does not matter. We now
introduce some important filters:
Moving average: For odd n, the moving average is given by
M^n = (1/n) [1 ⋯ 1] ∈ R^n.
Gaussian filter: The sampled and renormalized Gaussian filter with parameter σ is
given by
G̃^σ_{k,l} = exp(−(k² + l²)/(2σ²)),    G^σ = G̃^σ / Σ_{k,l} G̃^σ_{k,l}.
Binomial filter: The one-dimensional binomial filters are given by
B^1 = (1/2) [1 1 0],
B^2 = (1/4) [1 2 1],
B^3 = (1/8) [1 3 3 1 0],
B^4 = (1/16) [1 4 6 4 1],
. . .
For large n, the binomial filters present good approximations to Gaussian filters.
Two-dimensional binomial filters are obtained by (B n )T ∗ B n . An important
property of binomial filters is the fact that B n+1 can be obtained by a convolution
of B n with B 1 (up to translation).
Derivative filter according to Prewitt and Sobel: In Application 3.23, we saw
that edge detection can be realized by calculation of derivatives. Discretizing the
derivative in the x and y direction by central difference quotients and normalizing
the distance of two pixels to 1, we obtain the filters
D^x = (1/2) [−1 0 1],    D^y = (D^x)^T = (1/2) [−1 0 1]^T.
Since derivatives amplify noise, it was suggested in the early days of image
processing to complement these derivative filters with a smoothing into the
respectively opposite direction. In the case of the Prewitt filters [116], a moving
average M^3 = [1 1 1]/3 is used:
D^x_Prewitt = (M^3)^T ∗ D^x = (1/6) [−1 0 1; −1 0 1; −1 0 1],
D^y_Prewitt = M^3 ∗ D^y = (1/6) [−1 −1 −1; 0 0 0; 1 1 1].
The Sobel filters [133] use the binomial filter B 2 as a smoothing filter:
D^x_Sobel = (B^2)^T ∗ D^x = (1/8) [−1 0 1; −2 0 2; −1 0 1],
D^y_Sobel = B^2 ∗ D^y = (1/8) [−1 −2 −1; 0 0 0; 1 2 1].
Laplace filter: We have already seen the Laplace operator in Application 3.24,
where it was used for image sharpening. In this case, we need to discretize
second-order derivatives. We realize this by successively applying forward and
backward difference quotients:
∂²u/∂x² ≈ (U ⋆ D^x_−) ⋆ D^x_+ = (U ⋆ [−1 1 0]) ⋆ [0 −1 1] = U ⋆ [1 −2 1].
Therefore, the Laplace filter is obtained by
Δu ≈ (U ⋆ D^x_−) ⋆ D^x_+ + (U ⋆ D^y_−) ⋆ D^y_+ = U ⋆ [0 1 0; 1 −4 1; 0 1 0].
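A direct (unoptimized) implementation of filtering with a mask H and a selectable boundary extension might look as follows; the mapping of the extension types to np.pad modes ("wrap" = periodic, "edge" = constant, "symmetric" = mirrored, "constant" = zero padding) is our choice of illustration.

```python
import numpy as np

def apply_mask(U, H, mode="symmetric"):
    # (U * H)_{i,j} = sum_{k,l} H_{k,l} U_{i+k, j+l} with boundary extension
    r, s = H.shape[0] // 2, H.shape[1] // 2
    Upad = np.pad(U.astype(float), ((r, r), (s, s)), mode=mode)
    out = np.zeros(U.shape, dtype=float)
    for k in range(-r, r + 1):
        for l in range(-s, s + 1):
            out += H[k + r, l + s] * Upad[r + k : r + k + U.shape[0],
                                          s + l : s + l + U.shape[1]]
    return out

U = np.random.rand(64, 64)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8   # Sobel mask D^x
print(apply_mask(U, sobel_x).shape)
```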
Separability: If a filter mask can be written as H = G^T ∗ F with one-dimensional
filter masks F and G, then
U ⋆ H = (U ⋆ G^T) ⋆ F,
and therefore the numerical cost reduces to O((2r + 1) + (2s + 1)). The moving
average, as well as the Laplace, Sobel, Prewitt, and binomial filters, is separable.
Recursive implementation: The moving average can be implemented recur-
sively: if V_i = (U ⋆ M^{2n+1})_i = (1/(2n+1)) Σ_{k=−n}^{n} U_{i+k} is known, then
V_{i+1} = V_i + (U_{i+n+1} − U_{i−n})/(2n + 1),
i.e., each further value requires only one addition, one subtraction, and one
multiplication, independently of the filter size. Binomial filters can be decomposed
into short partial filters, for instance
(1/16) [1 4 6 4 1] = (1/16) [1 1 0] ∗ [0 1 1] ∗ [1 1 0] ∗ [0 1 1].
16 16
Note that each of the partial filters consists of only one addition. Furthermore,
the multiplication by 1/16 presents a bit shift, which can be executed faster than
a multiplication.
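The running-sum idea can be sketched in one dimension as follows; the constant boundary extension and the comparison against a direct convolution are our choices for the example, not prescribed by the text.

```python
import numpy as np

def moving_average_1d(u, n):
    # moving average with window 2n+1 via a running sum
    N = len(u)
    upad = np.concatenate([np.full(n, u[0]), u, np.full(n, u[-1])])  # constant extension
    v = np.empty(N)
    window = upad[:2 * n + 1].sum()
    v[0] = window / (2 * n + 1)
    for i in range(1, N):
        window += upad[i + 2 * n] - upad[i - 1]   # one addition, one subtraction
        v[i] = window / (2 * n + 1)
    return v

u = np.random.rand(1000)
direct = np.convolve(np.pad(u, 3, mode="edge"), np.ones(7) / 7, mode="valid")
print(np.allclose(moving_average_1d(u, 3), direct))   # consistency check
```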
3.4 Morphological Filters

Morphological filters are the main tools of mathematical morphology, i.e., the
theory of the analysis of spatial structures in images (the name is derived from
the Greek word “morphe” = shape). The mathematical theory of morphological
filters traces back to the engineers Georges Matheron and Jean Serra, cf. [134], for
instance. Morphological methods aim mainly at the recognition and transformation
of the shape of objects. We again consider an introductory example:
Example 3.25 (Denoising of Objects) Let us first assume that we have found an
object in a discrete digital image—by means of a suitable segmentation method, for
instance. For the mathematical description of the object, an obvious approach is to
encode the object as a binary image, i.e., as a binary function u : R^d → {0, 1} with
u(x) = { 1 if x belongs to the object,   0 if x does not belong to the object. }
Furthermore, we assume that the object is “perturbed,” i.e., that there are perturba-
tions in the form of “small” objects. Since “1” typically encodes the color white
and “0” the color black, the shape consists of the white part of the image u, and the
perturbations are small additional white points.
Since we know that the perturbations are small, we define a “structure element”
B ⊂ R^d of which we assume that it is just large enough to cover each perturbation.
In order to eliminate the perturbations, we compute a new image v by
v(x) = { 1 if u(x + y) = 1 for all y ∈ B,   0 otherwise. }
This implies that only those points x for which the structure element B shifted by x
lies within the old object completely are part of the new object v. This eliminates all
parts of the objects that are smaller than the structuring element. However, the object
is also changed significantly: the object is “thinner” than before. We try to undo the
“thinning” by means of the following procedure: we compute another image w by
w(x) = { 1 if v(x + y) = 1 for some y ∈ B,   0 otherwise. }
Hence, a point x is part of the new object w if the structure element B shifted by
x touches the old object v. This leads to an enlargement of the object. However, all
remaining perturbations are increased again as well. All in all, we have reached our
goal relatively well: the perturbations are eliminated to a large extent and the object
is changed only slightly; cf. Fig. 3.12.
The methods used in this introductory example are the fundamental methods
in mathematical morphology and will now be introduced systematically.
We will also use the notation u ∨ v and u ∧ v for the supremum and infimum,
respectively. Obtaining the complement corresponds to subtraction from one:
ū(x) = 1 − u(x).
Fig. 3.13 Illustration of dilation (left) and erosion (right) of an object (dashed line) with a circular
disk
Erosion and dilation can be extended to grayscale images in a natural way. The
key to this is provided by the following simple lemma:
Lemma 3.27 Let u : R^d → {0, 1} and B ⊂ R^d nonempty. Then
(u ⊕ B)(x) = sup_{y∈B} u(x + y)   and   (u ⊖ B)(x) = inf_{y∈B} u(x + y).
Proof The proof consists simply in carefully considering the definitions. For the
dilation, we observe that we have supy∈B u(x + y) = 1 if and only if u(x + y) = 1
for some y ∈ B. And for the erosion we have infy∈B u(x + y) = 1 if and only if
u(x + y) = 1 for all y ∈ B. $
#
The formulation of erosion and dilation in this lemma does not use the fact that u(x)
attains only the values 0 and 1. Therefore, we can use the formulas for real-valued
functions u analogously. In order to avoid the values ±∞ in the supremum and
infimum, we assume in the following that u is bounded, i.e., we work in the vector
space of bounded functions:
B(Rd ) = {u : Rd → R u bounded}.
Duality
−(u ⊕ B) = (−u) ⊖ B.
Translation invariance
(T_y u) ⊖ B = T_y(u ⊖ B),   (T_y u) ⊕ B = T_y(u ⊕ B).
Monotonicity
u ≤ v ⟹ u ⊖ B ≤ v ⊖ B and u ⊕ B ≤ v ⊕ B.
Distributivity
(u ∧ v) ⊖ B = (u ⊖ B) ∧ (v ⊖ B),   (u ∨ v) ⊕ B = (u ⊕ B) ∨ (v ⊕ B).
Composition   For B + C = {x + y ∈ R^d | x ∈ B, y ∈ C}, one has
(u ⊖ B) ⊖ C = u ⊖ (B + C),   (u ⊕ B) ⊕ C = u ⊕ (B + C).
Proof The proofs of these assertions rely on the respective properties of the
supremum and infimum. For instance, we can show duality as follows:
The further proofs are a good exercise in understanding the respective notions. □
Dilation and erosion obey a further fundamental property: among all operations
on binary images, they are the only ones that are translation invariant and satisfy
the distributivity of Proposition 3.29, i.e., dilation and erosion are characterized by
these properties, as the following theorem shows:
Theorem 3.30 Let D be a translation invariant operator on binary images with
D(0) = 0 such that for each set of binary images u_i ⊂ R^d,
D(∪_i u_i) = ∪_i D(u_i).
Then there exists a structure element B ⊂ R^d such that
D(u) = u ⊕ B.
Analogously, a translation invariant operator E that distributes over intersections
is an erosion:
E(u) = u ⊖ B.
Proof Since D is translation invariant, translation invariant images are mapped onto
translation invariant images again. Since 0 and χRd are the only translation invariant
images, we must have either D(0) = 0 or D(0) = χRd . The second case, in which
D would not be a dilation, is excluded by definition.
Since we can write every binary image u ⊂ R^d as a union of its elements χ_{{y}},
we have
D u = D( ∪_{y∈u} χ_{{y}} ) = ∪_{y∈u} D χ_{{y}},
and we obtain
Du = u ⊕ Dχ{0} (− · ),
D̃(u) = u ⊕ B,
A further property of erosion and dilation that makes them interesting for image
processing is their contrast invariance:
Theorem 3.31 Let B ⊂ R^d be nonempty. Erosion and dilation with structure
element B are invariant with respect to changes of contrast, i.e., for every continuous,
monotonically increasing grayscale transformation Φ : R → R and every u ∈ B(R^d),
Φ(u) ⊖ B = Φ(u ⊖ B)   and   Φ(u) ⊕ B = Φ(u ⊕ B).
Proof The assertion is a direct consequence of the fact that continuous, monotoni-
cally increasing functions can be interchanged with supremum and infimum; shown
exemplarily for the dilation,
(Φ(u) ⊕ B)(x) = sup_{y∈B} Φ(u(x + y)) = Φ(sup_{y∈B} u(x + y)) = Φ((u ⊕ B)(x)). □
Erosion and dilation can be combined in order to achieve specific effects. We already
saw this procedure in the introductory Example 3.25: By means of erosion followed
by a dilation we could remove small perturbations. By erosion of binary images, all
objects that are “smaller” than the structure element B (in the sense of set inclusion)
are eliminated. The eroded image contains only the larger structures, albeit “shrunk”
by B. A subsequent dilation with −B then has the effect that the object is suitably
enlarged again.
On the other hand, dilation fills up “holes” in the object that are smaller than
B. However, the result is enlarged by B, which can analogously be corrected by a
subsequent erosion with −B. In this way, we can “close” holes that are smaller than
B. In fact, the presented procedures are well-known methods in morphology.
Definition 3.32 (Opening and Closing) Let B ⊂ R^d be a nonempty structure
element and u ∈ B(R^d) an image. Set −B = {−y | y ∈ B}. The operator
u ◦ B = (u ⊖ B) ⊕ (−B)
is called opening, and the operator
u • B = (u ⊕ B) ⊖ (−B)
is called closing.
The operators inherit many properties from the basic operators, yet in contrast to
erosion and dilation, it is less reasonable to iterate them.
Theorem 3.33 Let B be a nonempty structure element, u, v ∈ B(R^d) images, and
y ∈ R^d. Then the following properties hold:
Translation invariance
(T_y u) ◦ B = T_y(u ◦ B),   (T_y u) • B = T_y(u • B).
Duality
−(u • B) = (−u) ◦ B.
Monotonicity
u ≤ v ⟹ u ◦ B ≤ v ◦ B and u • B ≤ v • B.
Anti-extensionality and extensionality
u ◦ B ≤ u,   u ≤ u • B.
Idempotence
(u ◦ B) ◦ B = u ◦ B,
(u • B) • B = u • B.
which apparently does not apply for y = z. We obtain a contradiction and hence
conclude that u ◦ B ≤ u. Analogously, we can deduce u • B ≥ u.
In order to show the idempotence of the opening, we remark that due to the anti-
extensionality of the opening, we have
(u ◦ B) ◦ B ≤ u ◦ B.
On the other hand, the monotonicity of the erosion and the extensionality of the
closing imply
(u ◦ B) ⊖ B = ((u ⊖ B) ⊕ (−B)) ⊖ B = (u ⊖ B) • (−B) ≥ u ⊖ B.
u # B = u − u ◦ B,
while
u$B =u•B −u
u # B ≥ 0, u $ B ≥ 0,
3.4.3 Applications
Morphological operators can be used to solve a multitude of tasks. Often, the oper-
ators introduced here are combined in an intelligent and creative way. Therefore, it
is difficult to tell in general for which type of problems morphology can provide a
solution. The following application examples primarily illustrate the possibilities of
applying morphological filters and serve as an inspiration for developing one’s own
methods.
If there are many lines with direction α in the image, there will remain many points;
otherwise, there will not. Considering the area of the remaining object, we can thus
determine the dominant directions by means of the largest local maxima (also see
Fig. 3.14).
Fig. 3.14 Example for the detection of dominant directions. Upper left: Underlying image. Upper
right: Result of edge detection by the algorithm according to Canny. Lower left: Amount of pixels
in the eroded edge image depending on the angle α. Lower right: Image overlaid with the three
dominant directions determined by maxima detection
The application of the hit-or-miss operator now results in an image in which exactly
those letters are marked that exhibit a serif at the bottom in the form of the hit mask;
see Fig. 3.15.
Fig. 3.15 Example for a simple application of the hit-or-miss operator to select a downward
oriented serif B. The “hit operator” u * B also selects the cross bar of “t”, which is excluded by
applying the “miss operator” u * C. The combination finally leads to exactly the serif described
by B and C
Fig. 3.16 Correction of an irregular background to improve segmentation. The structure element
is chosen to be a square. The lower two images show the respective results of the automatic
segmentation introduced in Application 3.11
where B_{i,j} = 1 denotes that (i, j) belongs to the structure element and B_{i,j} = 0
implies that (i, j) is not part of the structure element. Erosion and dilation are then
defined as follows:
(u ⊖ B)_{i,j} = min{u_{i+k,j+l} | (k, l) with B_{k,l} = 1},
(u ⊕ B)_{i,j} = max{u_{i+k,j+l} | (k, l) with B_{k,l} = 1}.
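A direct implementation of the discrete erosion and dilation could look like the following sketch; padding the boundary with the maximal (respectively minimal) gray value so that it does not influence the result is one possible convention, not prescribed by the text.

```python
import numpy as np

def _morph(U, B, reduce_fn):
    r, s = B.shape[0] // 2, B.shape[1] // 2
    pad_val = U.max() if reduce_fn is np.min else U.min()
    Upad = np.pad(U, ((r, r), (s, s)), mode="constant", constant_values=pad_val)
    shifted = [Upad[r + k : r + k + U.shape[0], s + l : s + l + U.shape[1]]
               for k in range(-r, r + 1) for l in range(-s, s + 1)
               if B[k + r, l + s] == 1]
    return reduce_fn(np.stack(shifted), axis=0)

def erode(U, B):
    # minimum over the pixels hit by the structure element B
    return _morph(U, B, np.min)

def dilate(U, B):
    # maximum over the pixels hit by the structure element B
    return _morph(U, B, np.max)

U = (np.random.rand(64, 64) > 0.5).astype(np.uint8)   # binary image
B = np.ones((3, 3), dtype=np.uint8)                    # symmetric structure element
opened = dilate(erode(U, B), B)                        # opening (here -B = B)
```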
The discrete erosion and dilation satisfy all properties of their continuous variants
presented in Theorem 3.29. For the composition property
(u ⊖ B) ⊖ C = u ⊖ (B + C),   (u ⊕ B) ⊕ C = u ⊕ (B + C),
to find the minimum (or maximum) of n numbers; which can be achieved with
n − 1 pairwise comparisons. Let us further assume that the image consists of NM
pixels and that the boundary extension is of negligible cost. Then we observe that
the application of the erosion or dilation requires (n − 1)NM pairwise comparisons.
Due to the composition property in Theorem 3.29, we can increase the efficiency in
certain cases:
If B and C consist of n and m elements, respectively, then B + C can have at most
nm elements. Hence, for the calculation of u ⊖ (B + C), we need (nm − 1)NM
pairwise comparisons in the worst case. However, to compute (u ⊖ B) ⊖ C, we
need only (n + m − 2)NM pairwise comparisons. Already for moderately large
structure elements, this can make a significant difference, as the following example
demonstrates. Therein, we omit the zeros at the boundary of the structure element
and denote the center of a structure element by an underscore:
B_1 + B_2 + B_3 = [1 1] + [1 0 1] + [1 0 0 0 1] = [1 1 1 1 1 1 1 1],
B_1 + B_2 + B_3 + (B_1^T + B_2^T + B_3^T) = the 8 × 8 structure element with all entries equal to 1.
The erosion with an 8 × 8 square (64 elements, 63 pairwise comparisons) can hence
be reduced to 6 erosions with structure elements containing two elements each (6
pairwise comparisons).
The other morphological operators (opening, closing, hit-or-miss, top-hat trans-
formations) are obtained by combination. Their properties are analogous to the
continuous versions.
In the discrete variant, there is an effective generalization of erosion and dilation:
for erosion with a structure element with n elements, for each pixel of the image, the
smallest gray value is taken that is hit by the structure element (for dilation, we take
the largest one). The idea of this generalization is to sort the image values masked
by the structure element and the subsequent replacement by the nth value within
this rank order. These filters are called rank-order filters.
Definition 3.42 Let B ∈ {0, 1}^{(2r+1)×(2s+1)} be a structure element with n ≥ 1
elements. The elements will be indexed by I = {(k_1, l_1), . . . , (k_n, l_n)}, i.e., we have
B_{k,l} = 1 if and only if (k, l) ∈ I. By sort(a_1, . . . , a_n) we denote the nondecreasing
reordering of the vector (a_1, . . . , a_n) ∈ R^n. The mth rank-order filter of a bounded
In particular, the first and the nth rank-order filters coincide with erosion and
dilation, respectively: for m = 1 one obtains u ⊖ B, and for m = n one obtains u ⊕ B.
The operator performs a type of averaging that is more robust with respect to outliers
than the computation of the arithmetic mean. It is thus well suited for denoising in
case of so-called impulsive noise, i.e., the pixels are not all perturbed additively, but
random pixels have a random value that is independent of the original gray value;
refer to Fig. 3.17.
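A simple (non-optimized) rank-order filter in NumPy; the median filter is the special case m = (n + 1)/2 for odd n. The edge padding is our choice for the example.

```python
import numpy as np

def rank_order_filter(U, B, m):
    # m-th rank-order filter: sort the gray values covered by the structure element B
    # at every pixel and take the m-th smallest (m=1: erosion, m=n: dilation)
    r, s = B.shape[0] // 2, B.shape[1] // 2
    Upad = np.pad(U, ((r, r), (s, s)), mode="edge")
    offsets = [(k, l) for k in range(-r, r + 1) for l in range(-s, s + 1)
               if B[k + r, l + s] == 1]
    stack = np.stack([Upad[r + k : r + k + U.shape[0], s + l : s + l + U.shape[1]]
                      for (k, l) in offsets])
    return np.sort(stack, axis=0)[m - 1]

U = np.random.rand(64, 64)
B = np.ones((3, 3), dtype=int)
median = rank_order_filter(U, B, (B.sum() + 1) // 2)   # median filter, 3x3 mask
```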
3.5 Further Developments

Even though this chapter treats basic methods, it is worthwhile to cover further
developments in this field in the following. In particular, in the area of linear filters,
there are noteworthy further developments. An initial motivation for linear filters
was the denoising of images. However, we quickly saw that this does not work
particularly well with linear filters, since in particular, edges cannot be preserved. In
order to remedy this, we recall the idea in Example 3.12. Therein, the noise should
be reduced by computing local averages. The blurring of the edges can then be
explained by the fact that the average near an edge considers the gray values on both
sides of the edge. The influence of the pixels on “the other side of the edge” can be
resolved by considering not only the spatial proximity, but also the proximity of the
gray values during the averaging process. The so-called bilateral filter achieves
this as follows: for a given image u : R^d → R and two functions h : R^d → R and
g : R → R, it computes a new image by
B_{h,g} u(x) = ( ∫_{R^d} u(y) h(x − y) g(u(x) − u(y)) dy ) / ( ∫_{R^d} h(x − y) g(u(x) − u(y)) dy ).
Fig. 3.17 Denoising in case of impulsive noise. Upper left: Original image. Upper right: Image
perturbed by impulsive noise; 10% of the pixels were randomly replaced by black or white pixels
(PSNR of 8.7 db). Lower left: Application of the moving average with a small circular disk with
a radius of seven pixels (PSNR of 22.0 db). Lower right: Application of the median filter with a
circular structure element with a radius of two pixels (PSNR of 31.4 db). The values of the radii
were determined such that the respective PSNR is maximal
We observe that the function h denotes the weight of the gray value u(y) depending
on the distance x − y, while the function g presents the weight of the gray value
depending on the similarity of the gray values u(x) and u(y). The factor
( ∫_{R^d} h(x − y) g(u(x) − u(y)) dy )^{−1} is a normalization factor that ensures that the
weights integrate to one at every point x. For linear filters (in this case, when there
is no function g), it does not depend on x, and usually ∫_{R^d} h(y) dy = 1 is required.
The name “bilateral filter” traces back to Tomasi and Manduchi [136]. If we choose
h and g to be Gaussian functions, the filters are also called “nonlinear Gaussian
filters” after Aurich and Weule [11]. In the case of characteristic functions h and
g, the filter is also known as SUSAN [132]. The earliest reference for this kind of
filters is probably Yaroslavski [145]. The bilateral filter exhibits excellent properties
for edge-preserving denoising; see Fig. 3.18. A naive discretization of the integrals,
however, reveals a disadvantage: the numerical cost is significantly higher than for
linear filters, since the normalization factor has to be recalculated for every point
Fig. 3.18 The bilateral filter with Gaussian functions h and g, i.e., h(x) = exp(−|x|2 /(2σh2 )) and
g(x) = exp(−|x|2 /(2σg2 )) applied to the original image in Fig. 3.17. The range of gray values is
[0, 1], i.e., the distance of black to white amounts to one
x. Methods to increase the efficiency are covered in the overview article [108], for
instance.
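A naive discretization of the bilateral filter with Gaussian weights illustrates the per-pixel normalization and its cost; window radius and the parameter values are example choices, not taken from the text.

```python
import numpy as np

def bilateral(u, sigma_h=3.0, sigma_g=0.1, radius=6):
    # direct (slow) bilateral filter with h(x) = exp(-|x|^2/(2 sigma_h^2)) and
    # g(t) = exp(-t^2/(2 sigma_g^2))
    N, M = u.shape
    upad = np.pad(u, radius, mode="edge")
    out = np.empty_like(u)
    k = np.arange(-radius, radius + 1)
    KX, KY = np.meshgrid(k, k, indexing="ij")
    h = np.exp(-(KX ** 2 + KY ** 2) / (2 * sigma_h ** 2))   # spatial weight
    for i in range(N):
        for j in range(M):
            patch = upad[i : i + 2 * radius + 1, j : j + 2 * radius + 1]
            w = h * np.exp(-(u[i, j] - patch) ** 2 / (2 * sigma_g ** 2))
            out[i, j] = np.sum(w * patch) / np.sum(w)        # normalized average
    return out

u = np.random.rand(64, 64)
v = bilateral(u)
```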
Progressing a step further than bilateral filters, the so-called nonlocal aver-
ages [25] take averages not over values that are close and have a similar gray value,
but over values that have a similar neighborhood. Mathematically, this is realized
as follows: for an image u : Rd → R and a function g : Rd → R, we define the
function
h(x, y) = ∫_{R^d} g(t) |u(x + t) − u(y + t)|² dt.
For the choice of the function g, again a Gaussian function or the characteristic
function of a ball around the origin is suitable, for instance. The function h exhibits
a small value in (x, y) if the functions t → u(x + t) and t → u(y + t) are similar
in a neighborhood of the origin. If they are dissimilar, the value is large. Hence, the
values x and y have similar neighborhoods in this sense if h(x, y) is small. This
motivates the following definition of the nonlocal averaging filter:
NL u(x) = ( ∫_{R^d} u(y) e^{−h(x,y)} dy ) / ( ∫_{R^d} e^{−h(x,y)} dy ).
Nonlocal averaging filters are particularly well suited for denoising of regions
with textures. Their naive discretization is even more costly than for the bilateral
filter, since for every value x, we first have to determine h(x, y) by means of an
integration. For ideas regarding an efficient implementation, we refer to [23].
The median filter that we introduced in Sect. 3.4.4 was defined only for images
with discrete image domain. It was based on the idea of ordering the neighboring
pixels. A generalization to images u : Rd → R is given by the following: for a
measurable set B ⊂ Rd let
For this filter, there is a connection to the mean curvature flow, which we will cover
in Sect. 5.2.2. Roughly speaking, the iterated application of the median filter with
B = Bh (0) asymptotically corresponds (for h → 0) to a movement of the level
lines of the image into the direction of the normal with a velocity proportional to the
average curvature of the contour lines. For details, we refer to [69].
In this form, the median filter is based on the ordering of the real numbers. For
an image u : → F with discrete support but non-ordered color space F , the
concept of the median based on ordering cannot be transferred. Non-ordered color
spaces, for instance, arise in the case of color images (e.g. F = R3 ) or in so-called
diffusion tensor imaging; in which F is the set of symmetric matrices. If there is a
distance defined on F , we can use the following idea: the median of real numbers
a_1, . . . , a_n is a minimizer of the functional
F(a) = Σ_{i=1}^{n} |a − a_i|
(cf. Exercise 3.13). If there is a distance ‖·‖ defined on F, we can define the median
of n “color values” A_1, . . . , A_n as a minimizer of
F(A) = Σ_{i=1}^{n} ‖A − A_i‖.
In [143], it was shown that this procedure defines reasonable “medians” in case of
matrices Ai depending on the matrix norm, for instance.
3.6 Exercises
Ty DA = DA TAy .
Exercise 3.4 (Lp -Functions Are Continuous in the pth Mean) Let 1 ≤ p < ∞ and
u ∈ L^p(R^d). Show that
‖T_h u − u‖_p → 0   as   h → 0.
Exercise 3.5 (Solution of the Heat Equation) Let Gσ be the d-dimensional Gaus-
sian function defined in (3.2) and
∂_t F = ΔF.
Δ = Σ_{i=1}^{d} ∂²/∂x_i²
does not exhibit any terms of odd order of differentiation, i.e., one has cα = 0 for
all multi-indices α with |α| odd.
Exercise 3.7 (Separability Test) We call a discrete two-dimensional filter mask
H ∈ R(2r+1)×(2r+1) separable if for some one-dimensional filter masks F, G ∈
R2r+1 , one has H = F ⊗ G (i.e., Hi,j = Fi Gj ).
Derive a method that for every H ∈ R(2r+1)×(2r+1), provides an n ≥ 0 as well
as separable filter masks Hk ∈ R(2r+1)×(2r+1), 1 ≤ k ≤ n, such that there holds
H = H1 + H2 + · · · + Hn
and n is minimal.
Exercise 3.8 (Proofs in the Morphology Section) Prove the remaining parts of
Theorems 3.29, 3.33, and 3.37.
Exercise 3.9 (Lipschitz Constants for Erosion and Dilation) Let B ⊂ Rd be a
nonempty structure element and u ∈ B(Rd ) Lipschitz continuous with constant
L > 0. Show that u * B and u ⊕ B are also Lipschitz continuous and that their
Lipschitz constants are less than or equal to L.
‖u ⊕ B − v ⊕ B‖_∞ ≤ ‖u − v‖_∞,
‖u ⊖ B − v ⊖ B‖_∞ ≤ ‖u − v‖_∞.
min_{a∈R} Σ_{i=1}^{n} |a − a_i|.
Furthermore, show that the arithmetic mean ā = (a1 + · · · + an )/n is the unique
solution to the minimization problem
min_{a∈R} Σ_{i=1}^{n} |a − a_i|².
Chapter 4
Frequency and Multiscale Methods
Like the methods covered in Chap. 3, the methods based on frequency or scale-
space decompositions belong to the older methods in image processing. In this
case, the basic idea is to transform an image into a different representation in order
to determine its properties or carry out manipulations. In this context, the Fourier
transformation plays an important role.
4.1 The Fourier Transform

Throughout this chapter, we consider complex-valued images u : R^d → C.
In this section, we will define the Fourier transform for certain Lebesgue spaces and
measures of Sect. 2.2.2 as well as distributions of Sect. 2.3. We begin by defining
the Fourier transform on the space L1 (Rd ).
lim_{n→∞} û(ξ_n) = û(ξ),
and hence the continuity of û. The linearity of F is obvious, and the continuity
results from the estimate
|û(ξ)| = (2π)^{−d/2} | ∫_{R^d} u(x) e^{−ix·ξ} dx | ≤ (2π)^{−d/2} ∫_{R^d} |u(x)| dx = (2π)^{−d/2} ‖u‖_1,
which implies ‖û‖_∞ ≤ (2π)^{−d/2} ‖u‖_1. □
Furthermore, the minus sign in the exponent may be omitted. Therefore, caution is
advised when using tables of Fourier transforms or looking up calculation rules.
The Fourier transform goes well with translations Ty and linear coordinate
transformations DA . Furthermore, it also goes well with modulations, which we
will define now.
Definition 4.4 For y ∈ Rd , we set
my : Rd → C, my (x) = eix·y
My : L1 (Rd ) → L1 (Rd ), My u = my u.
F(T_y u) = M_y(F u),
F(M_y u) = T_{−y}(F u),
F(D_A u) = |det A|^{−1} D_{A^{−T}}(F u),
F(ū) = D_{−id} \overline{F u}.
Proof One should first assure oneself that the operators Ty , My , and DA map
both L1 (Rd ) and C (Rd ) onto themselves, i.e., all occurring terms are well defined.
According to the transformation formula for integrals,
1
(F Mω Ty u)(ξ ) = u(x + y)e−ix·(ξ −ω) dx
(2π)d/2 Rd
1
= ei(ξ −ω)·y u(z)e−iz·(ξ −ω) dz
(2π)d/2 Rd
= (T−ω My F u)(ξ ).
u real-valued ⟹ û(ξ) = \overline{û(−ξ)},
u imaginary-valued ⟹ û(ξ) = −\overline{û(−ξ)},
û real-valued ⟹ u(x) = \overline{u(−x)},
û imaginary-valued ⟹ u(x) = −\overline{u(−x)}.
However, this is not allowed at this instance, since in Definition 4.1, we defined
the Fourier transform for L1 -functions only. This was for a good reason, since for
L2 -functions, it cannot be readily ensured that the defining integral exists. Anyhow,
it appears desirable and will prove to be truly helpful to have access to the Fourier
transform not only on the (not even reflexive) Banach space L1 (Rd ), but also on the
Hilbert space L2 (Rd ).
The extension of the Fourier transform to the space L2 (Rd ) requires some further
work. As a first step, we define a “small” function space on which the Fourier
transform exhibits some further interesting properties—the Schwartz space:
Definition 4.9 The Schwartz space of rapidly decreasing functions is defined by
S(R^d) = { u ∈ C^∞(R^d) | ∀α, β ∈ N^d : C_{α,β}(u) = sup_{x∈R^d} |x^α (∂^β/∂x^β) u(x)| < ∞ }.
Roughly speaking, the Schwartz space contains smooth functions that tend to
zero faster than polynomials tend to infinity. It can be verified elementarily that
the Schwartz space is a vector space. In order to make it accessible for analytical
methods, we endow it with a topology. We describe this topology by defining a
notion of convergence for sequences of functions.
Definition 4.10 A sequence (u_n) in the Schwartz space converges to u if and only
if for all multi-indices α, β, one has C_{α,β}(u_n − u) → 0 as n → ∞.
F(p_α u) = i^{|α|} (∂^α/∂ξ^α) F(u).
(∂^α/∂x^α)(e^{−ix·ξ}) = (−i)^{|α|} ξ^α e^{−ix·ξ}   and   x^α e^{−ix·ξ} = i^{|α|} (∂^α/∂ξ^α)(e^{−ix·ξ}).
= i|α| pα (ξ )F u(ξ ).
= i|α| ( ∂ξ
α
α F u)(ξ ).
∂
Both of the previous arguments are valid, since the integrands are infinitely
differentiable with respect to ξ and integrable with respect to x. $
#
We thus observe that the Fourier transform transforms a differentiation into a
multiplication and vice versa. This lets us assume already that the Schwartz space
S(Rd ) is mapped onto itself by the Fourier transform. In order to show this, we state
the following lemma:
Lemma 4.14 For the Gaussian function G(x) = e^{−|x|²/2}, one has
Ĝ(ξ) = G(ξ).
Ĝ(ξ) = (2π)^{−d/2} ∫_{R^d} Π_{k=1}^{d} g(x_k) e^{−i x_k ξ_k} dx = Π_{k=1}^{d} ĝ(ξ_k).
g and ĝ satisfy the same differential equation with the same initial value. By the Picard-
Lindelöf theorem on uniqueness of solutions of initial value problems, they thus
have to coincide, which proves the assertion. □
Theorem 4.15 The Fourier transform is a continuous and bijective mapping of the
Schwartz space into itself. For u ∈ S(Rd ), we have the inversion formula
(F^{−1} F u)(x) = (û)^∨(x) = (2π)^{−d/2} ∫_{R^d} û(ξ) e^{ix·ξ} dξ = u(x).
|ξ^α (∂^β/∂ξ^β) û(ξ)| = |F((∂^α/∂x^α)(p_β u))(ξ)| ≤ (2π)^{−d/2} ‖(∂^α/∂x^α)(p_β u)‖_1.   (4.1)
Therefore, for u ∈ S(R^d), we also have û ∈ S(R^d). Since the Fourier transform is
linear, it is sufficient to show the continuity at zero. We thus consider a null sequence
(u_n) in the Schwartz space, i.e., as n → ∞, C_{α,β}(u_n) → 0. That is, (u_n), as well
as (∂^α p_β u_n) for all α, β, converges to zero uniformly. This implies that the right-
hand side in (4.1) tends to zero. In particular, we obtain that C_{α,β}(û_n) → 0, which
implies that (û_n) is a null sequence, proving continuity.
In order to prove the inversion formula, we for now consider two arbitrary
functions u, φ ∈ S(Rd ). By means of Lemma 4.8 and the calculation rules for
translation and modulation given in Lemma 4.5, we infer for the convolution of 0 u
and φ that
(0
u ∗ φ)(x) = 0
u(y)φ(x − y) dy = u(y)eix·y 0
0 φ (−y) dy
Rd Rd
= u(y)0
φ(−x − y) dy = (u ∗ 0
φ)(−x).
Rd
φ_ε(x) = ε^{−d} (D_{ε^{−1} id} G)(x) = ε^{−d} e^{−|x|²/(2ε²)}.
0
u ∗ φε (x) → 0
u(x) and u ∗ φε (−x) → u(−x).
\hat{\hat{u}}(x) = u(−x).
Note that we can state the inversion formula for the Fourier transform in the
following way as well:
\check{u} = \overline{F \bar{u}}.
According to the calculation rule for conjugation in Lemma 4.5, we infer that \check{u} =
D_{−id} û, and substituting û for u, this altogether results in
\check{\hat{u}} = D_{−id} \hat{\hat{u}} = u. □
(û, v̂)_2 = (u, v)_2
and in particular u2 = F u2 . Thus, the Fourier transform is an isometry defined
on a dense subset of L2 (Rd ). Hence, there exists a unique continuous extension onto
the whole space. Due to the symmetry between F and F −1 , an analogous argument
yields the remainder of the assertion. □
Remark 4.17 As remarked earlier, the formula
F(u)(ξ) = (2π)^{−d/2} ∫_{R^d} u(x) e^{−iξ·x} dx
cannot be applied to a function u ∈ L2 (Rd ), since the integral does not necessarily
exist. However, for u ∈ L2 (Rd ), there holds that the function
ψ_R(ξ) = (2π)^{−d/2} ∫_{|x|≤R} u(x) e^{−iξ·x} dx
This is to say that the Fourier transform of h indicates in what way the frequency
components of u are damped, amplified, or modulated. We also call ĥ the transfer
function in this context. A convolution kernel h whose transfer function ĥ is zero
(or attains small values) for large ξ is called a low-pass filter, since it lets low
frequencies pass. Analogously, we call h a high-pass filter if ĥ(ξ) is zero (or small)
for small ξ . Since noise contains many high-frequency components, one can try to
reduce noise by a low-pass filter. For image processing, it is a disadvantage in this
context that edges also exhibit many high-frequency components. Hence, a low-pass
filter necessarily blurs the edges as well. It turns out that edge-preserving denoising
cannot be accomplished with linear filters; cf. Fig. 4.1 as well.
Fig. 4.1 High- and low-pass filters applied to an image. The Fourier transform of the low-pass
filter g is a characteristic function of a ball around the origin, and the Fourier transform of the
high-pass filter h is a characteristic function of an annulus around the origin. Note that the filters
oscillate slightly, which is noticeable in the images as well
low-pass filter, i.e., for a radius r > 0, the Fourier transform of h is given by
$$\hat h = \frac{1}{(2\pi)^{d/2}}\,\chi_{B_r(0)}.$$
The low- and high-frequency components of the image u are then respectively given by
$$u_{\mathrm{low}} = u*h \qquad\text{and}\qquad u_{\mathrm{high}} = u - u*h.$$
In particular,
$$\hat u_{\mathrm{high}} = \hat u\,\big(1-(2\pi)^{d/2}\hat h\big) = \hat u\cdot\chi_{\{|\xi|>r\}}.$$
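This separation into low- and high-frequency parts is easy to reproduce numerically. The following minimal NumPy sketch is only an illustration (the function name, the use of np.fft.fftfreq, and the meaning of the radius r in normalized discrete frequency units are choices of this example, not taken from the text):

```python
import numpy as np

def ideal_split(u, r):
    """Split an image into low- and high-frequency parts using an ideal
    (characteristic-function) low-pass filter of radius r in the discrete
    frequency grid given by np.fft.fftfreq."""
    U = np.fft.fft2(u)
    xi1 = np.fft.fftfreq(u.shape[0])[:, None]
    xi2 = np.fft.fftfreq(u.shape[1])[None, :]
    mask = (xi1**2 + xi2**2) <= r**2          # characteristic function of a ball
    u_low = np.real(np.fft.ifft2(U * mask))   # low-pass filtered image
    u_high = u - u_low                        # complementary high-frequency part
    return u_low, u_high
```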
In Fig. 4.2, we observe that the separation of textured components by this method
has its limitations: the low-frequency component now contains almost no texture of
the fabric, whereas this is contained in the high-frequency component. However,
we also find essential portions of the edges, i.e., the separation of texture from
nontextured components is not very good.
Remark 4.21 (Deconvolution with the Fourier Transform) The “out-of-focus” and
motion blurring as well as other models of blurring assume that the blurring is
modeled by a linear filter, i.e., by a convolution. A deblurring can in this case be
achieved by a deconvolution: For a blurred image u given by
$$u = u_0*h$$
with an unknown image $u_0$ and a known convolution kernel h, one has $\hat u = 2\pi\,\hat u_0\,\hat h$ (in the two-dimensional case), and we obtain the unknown image by
$$u_0 = \mathcal F^{-1}\Big(\frac{\hat u}{2\pi\,\hat h}\Big).$$
Fig. 4.3 Deconvolution with the Fourier transform. The convolution kernel h models a motion
blurring. The degraded image ũ results from a quantization of u into 256 gray values (a difference
the eye cannot perceive). After the deconvolution, the error becomes unpleasantly apparent
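The naive deconvolution formula can be reproduced with the FFT. The sketch below is only an illustration of this remark, not the book's procedure: the factor 2π of the continuous formula is absorbed by the discrete transform, periodic boundary conditions are assumed, and the parameter eps is a hypothetical safeguard against division by transfer-function values close to zero — precisely the instability that the quantization error in Fig. 4.3 makes visible.

```python
import numpy as np

def fourier_deconvolve(u, h, eps=1e-3):
    """Naive deconvolution u0 = F^{-1}(u_hat / h_hat) for a periodic blur u = u0 * h.
    eps regularizes the division where the transfer function is close to zero;
    without it, tiny perturbations (e.g. quantization) are amplified enormously."""
    U = np.fft.fft2(u)
    H = np.fft.fft2(h, s=u.shape)                # transfer function of the kernel
    H_safe = np.where(np.abs(H) < eps, eps, H)   # avoid division by ~0
    return np.real(np.fft.ifft2(U / H_safe))
```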
We can not only extend the Fourier transform from the Schwartz space S(Rd ) to
L2 (Rd ), but even define it for certain distributions as well. For this purpose, we
define a specific space of distributions:
Definition 4.22 By S(Rd )∗ we denote the dual space of S(Rd ), i.e., the space of
all linear and continuous functionals T : S(Rd ) → C. We call this space the space
of tempered distributions.
Tempered distributions are distributions in the sense of Sect. 2.3. Furthermore,
there are both regular and non-regular tempered distributions. The delta-distribution
is a non-regular tempered distribution, for example. In particular, every function
u ∈ S(Rd ) induces a regular tempered distribution Tu :
$$T_u(\phi) = \int_{\mathbb R^d}u(x)\,\phi(x)\,dx.$$
Remark 4.23 We use the notation Tu for the distribution induced by u and the
similar notation Ty for the translation by y as long as there cannot be any confusion.
Our goal is to define a Fourier transform for tempered distributions. Since one
often does not distinguish between a function and the induced distribution, it is
reasonable to denote the Fourier transform of $T_u$ by $\widehat{T_u} = T_{\hat u}$. According to Lemma 4.8,
$$\widehat{T_u}(\phi) = \int_{\mathbb R^d}\hat u(\xi)\,\phi(\xi)\,d\xi = \int_{\mathbb R^d}u(\xi)\,\hat\phi(\xi)\,d\xi = T_u(\hat\phi).$$
Definition 4.24 For a tempered distribution T, the Fourier transform $\hat T$ and the inverse Fourier transform $\check T$ are defined by
$$\hat T(\phi) = T(\hat\phi),\qquad \check T(\phi) = T(\check\phi).$$
Since the Fourier transform is bijective from the Schwartz space to itself, the
same holds if we view the Fourier transform as a map from the space of tempered
distributions to itself.
Theorem 4.25 As a mapping of the space of tempered distributions into itself, the Fourier transform $T\mapsto\hat T$ is bijective and is inverted by $T\mapsto\check T$.
Since according to the Riesz-Markov representation theorem (Theorem 2.62),
Radon measures are elements of the dual space of continuous functions, they are in
particular tempered distributions as well. Hence, by Definition 4.24, we have defined
a Fourier transform for Radon measures as well.
Example 4.26 The distribution belonging to the Dirac measure $\delta_x$ of Example 2.38 is the delta distribution, denoted by $\delta_x$ as well:
$$\delta_x(\phi) = \int_{\mathbb R^d}\phi\,d\delta_x = \phi(x).$$
$$\widehat{u*v}(\xi) = (2\pi)^{d/2}\,\hat u(\xi)\,\hat v(\xi).$$
The computation rules for Fourier transforms and derivatives in Lemma 4.13
hold analogously for weak derivatives:
Lemma 4.28 Let u ∈ L2 (Rd ) and α ∈ Nd be such that the weak derivative ∂ α u
lies in L2 (Rd ) as well. Then
F (∂ α u) = i|α| pα F (u).
Proof As above, we show the equation in the distributional sense. We use integration by parts, Lemma 4.13, and the Plancherel formula (4.2) to obtain, for a Schwartz function φ,
$$\widehat{T_{\partial^\alpha u}}(\phi) = T_{\partial^\alpha u}(\hat\phi) = \int_{\mathbb R^d}\partial^\alpha u(x)\,\hat\phi(x)\,dx = (-1)^{|\alpha|}\int_{\mathbb R^d}u(x)\,\partial^\alpha\hat\phi(x)\,dx$$
$$= (-1)^{|\alpha|}\int_{\mathbb R^d}u(x)\,\big((-i)^{|\alpha|}\,\widehat{p^\alpha\phi}\big)(x)\,dx = \int_{\mathbb R^d}\hat u(x)\,i^{|\alpha|}p^\alpha(x)\,\phi(x)\,dx = T_{i^{|\alpha|}p^\alpha\hat u}(\phi).$$
The asserted equivalence now follows from the fact that the functions $h(\xi) = \sum_{|\alpha|\le k}|\xi^\alpha|^2$ and $g(\xi) = (1+|\xi|^2)^k$ are comparable, i.e., they can be estimated against each other by constants that depend on k and d only, cf. Exercise 4.9. This shows in particular that
$$\Big(\int_{\mathbb R^d}(1+|\xi|^2)^k\,|\hat u(\xi)|^2\,d\xi\Big)^{1/2}$$
is an equivalent norm on $H^k(\mathbb R^d)$. □
Another way to put the previous theorem is that the Sobolev space $H^k(\mathbb R^d)$ is the Fourier transform of the weighted Lebesgue space $L^2\big((1+|\,\cdot\,|^2)^k\mathcal L^d\big)(\mathbb R^d)$.
Fig. 4.4 Illustration to Example 4.30: the smoothness of a function is reflected in the rapid decay
of the Fourier transform (and vice versa)
depicted in Fig. 4.4. The Fourier transform of u exhibits a decay rate at infinity like $|\xi|^{-1}$; in particular, the function $\xi\mapsto|\xi|^2\hat u(\xi)$ is not in $L^2(\mathbb R)$. For v and w, however, the Fourier transforms decay exponentially (cf. Exercise 4.4); in particular, $\xi\mapsto|\xi|^k\hat v(\xi)$ is an $L^2(\mathbb R)$-function for every $k\in\mathbb N$ (just as it is for w). Conversely, the slow decay of w is reflected in the non-differentiability of $\hat w$.
The relationship between smoothness and decay is of fundamental importance
for image processing: images with discontinuities never have a rapidly decaying
Fourier transform. This demonstrates again that in filtering with low-pass filters,
edges are necessarily smoothed and hence become blurred (cf. Example 4.19).
The equivalence in Theorem 4.29 motivates us to define Sobolev spaces for
arbitrary smoothness s ∈ R as well:
Definition 4.31 The fractional Sobolev space to $s\in\mathbb R$ is defined by
$$u\in H^s(\mathbb R^d)\iff \int_{\mathbb R^d}(1+|\xi|^2)^s\,|\hat u(\xi)|^2\,d\xi<\infty.$$
Remark 4.32 In fractional Sobolev spaces, “nonsmooth” functions can still exhibit
a certain smoothness. For instance, the characteristic function u(x) = χ[−1,1] (x)
lies in the space H s (R) for every s ∈ [0, 1/2[ as one should convince oneself in
Exercise 4.10.
Apart from the Fourier transform on $L^1(\mathbb R^d)$, $L^2(\mathbb R^d)$, $\mathcal S(\mathbb R^d)$, and $\mathcal S(\mathbb R^d)^*$, analogous transformations for functions f on rectangles $\prod_{k=1}^d[a_k,b_k]\subset\mathbb R^d$ are of interest for image processing as well. This leads to the so-called Fourier series. By
means of these, we will prove the sampling theorem, which explains the connection
of a continuous image to its discrete sampled version. Furthermore, we will be able
to explain the aliasing in Fig. 3.1 that results from incorrect sampling.
Since the continuous functions are dense in L2 ([−π, π]), every L2 -function can also
be approximated by trigonometric polynomials arbitrarily well, and we conclude
that $(e_k)_k$ forms a basis. □
The values
$$(u,e_k)_{[-\pi,\pi]} = \frac{1}{2\pi}\int_{-\pi}^{\pi}u(x)\,e^{-ikx}\,dx$$
are called the Fourier coefficients of u.
In this case, the functions $e_k(x) = e^{ik\frac{\pi}{B}x}$ form an orthonormal basis and, together with the Fourier coefficients
$$(u,e_k)_{[-B,B]} = \frac{1}{2B}\int_{-B}^{B}u(x)\,e^{-ik\frac{\pi}{B}x}\,dx$$
On a d-dimensional rectangle $\Omega = \prod_{l=1}^d[-B_l,B_l]$, we define the functions $(e_{k,\Omega})_{k\in\mathbb Z^d}$ by
$$e_{k,\Omega}(x) = \prod_{l=1}^d e^{ik_l\frac{\pi}{B_l}x_l}$$
and obtain an orthonormal basis in $L^2(\Omega)$ with respect to the inner product
$$(u,v)_\Omega = \frac{1}{2^d\prod_{l=1}^dB_l}\int_\Omega u(x)\,\overline{v(x)}\,dx.$$
Fig. 4.5 Sampling of the function u(x) = sin(5x). While sampling with rate T = 0.1 (crosses)
reflects the function well, the sampling rate T = 1.2 (dots) yields a totally wrong impression, since
it suggests a much too low frequency
Proof In this proof, we make use of the trick that 0u can be regarded as an element in
L2 (R) as well as in L2 ([−B, B]). Thus, we can consider both the Fourier transform
and the Fourier series of 0
u.
Since 0
u lies in L2 ([−B, B]), it lies in L1 ([−B, B]) as well. Thus, u is continuous
and the evaluation of u at a point is well defined. We use the inversion formula of
the Fourier transform, and with the basis functions $e_k(x) = e^{ik\frac{\pi}{B}x}$, we obtain
$$u\Big(\frac{k\pi}{B}\Big) = \frac{1}{\sqrt{2\pi}}\int_{-B}^{B}\hat u(\xi)\,e^{i\xi\frac{k\pi}{B}}\,d\xi = \sqrt{\frac{2}{\pi}}\,B\,(\hat u,e_{-k})_{[-B,B]}.$$
Hence, the values $u(\frac{k\pi}{B})$ determine the coefficients $(\hat u,e_{-k})_{[-B,B]}$, and due to $\hat u\in L^2([-B,B])$, they actually determine the whole function $\hat u$. This proves that u is determined by the values $\big(u(\frac{k\pi}{B})\big)_{k\in\mathbb Z}$.
In order to prove the reconstruction formula, we develop $\hat u$ into its Fourier series and note that for $\xi\in\mathbb R$, we need to restrict the result by means of the characteristic function $\chi_{[-B,B]}$:
$$\hat u(\xi) = \sum_{k\in\mathbb Z}(\hat u,e_k)_{[-B,B]}\,e_k(\xi)\,\chi_{[-B,B]}(\xi) = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(-\frac{k\pi}{B}\Big)\,e_k(\xi)\,\chi_{[-B,B]}(\xi).$$
Since the inverse Fourier transform is continuous, we can pull it inside the series and obtain
$$u = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(-\frac{k\pi}{B}\Big)\,\mathcal F^{-1}\big(e_k\chi_{[-B,B]}\big).$$
By means of the calculation rules in Lemma 4.5 and Exercise 4.3, we infer
$$\mathcal F^{-1}\big(e_k\chi_{[-B,B]}\big)(x) = \sqrt{\frac{2}{\pi}}\,B\,\operatorname{sinc}\Big(\frac{B}{\pi}\Big(x+\frac{k\pi}{B}\Big)\Big).$$
Inserting this expression into the previous equation yields the assertion. □
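The reconstruction formula can be tried out directly. The following NumPy sketch is only an illustration (truncation to finitely many samples is an assumption of the example; np.sinc uses the convention sinc(t) = sin(πt)/(πt) appearing above), so it is accurate only away from the ends of the finite sample window:

```python
import numpy as np

def sinc_reconstruct(samples, B, x):
    """Evaluate u(x) = sum_k u(k*pi/B) * sinc(B/pi * (x - k*pi/B))
    from finitely many samples u(k*pi/B), k = 0, ..., len(samples)-1."""
    samples = np.asarray(samples, dtype=float)
    k = np.arange(len(samples))
    t = B / np.pi * (np.asarray(x, dtype=float)[:, None] - k[None, :] * np.pi / B)
    return np.sum(samples[None, :] * np.sinc(t), axis=1)

# Example: u(x) = sin(3x) has bandwidth 3 < B = 5, so the rate pi/B suffices.
B = 5.0
k = np.arange(64)
u_rec = sinc_reconstruct(np.sin(3 * k * np.pi / B), B, np.linspace(5, 30, 200))
# u_rec approximates sin(3x) up to the truncation error of the finite sum.
```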
Remark 4.36 In the above case, we call B the bandwidth of the signal. This
bandwidth indicates the highest frequency contained in the signal. Expressed in
words, the sampling theorem reads:
A signal with bandwidth B has to be sampled with sampling rate π/B in order to store all information of the signal.
We here use the word “frequency” not in the sense in which it is often used in engineering. In this context, typically the frequency f with $B = 2\pi f$ is used instead of the angular frequency B. Also, the variant of the Fourier transform in Remark 4.3 including the term $e^{-2\pi ix\cdot\xi}$ is common there. For this situation, the assertion of the sampling theorem reads:
If a signal exhibits frequencies up to a maximal frequency f, it has to be sampled with the sampling rate 1/(2f) in order to store all information of the signal.
That is, one has to sample twice as fast as the highest frequency. The sampling frequency 2f is also called the Nyquist rate or Nyquist frequency.
4.2.3 Aliasing
Aliasing is what we observe in Figs. 3.1 and 4.5: the discrete image or signal does
not match the original signal, since in the discrete version, frequencies arise that are
not contained in the original. As “aliases,” they stand for the actual frequencies.
In the previous subsection, we saw that this effect cannot occur if the signal is
sampled at a sufficiently high rate. In this subsection, we desire to understand how
exactly aliasing arises and how we can eliminate it.
For this purpose, we need an additional tool:
Suppose that either the function $\sum_{k\in\mathbb Z}\hat u(\,\cdot\,+2Bk)\in L^2([-B,B])$ or the series $\sum_{k\in\mathbb Z}\big|u\big(\frac{k\pi}{B}\big)\big|$ converges. Then, for almost all $\xi\in\mathbb R$,
$$\sum_{k\in\mathbb Z}\hat u(\xi+2Bk) = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(\frac{k\pi}{B}\Big)\,e^{-i\frac{k\pi}{B}\xi}.$$
For $\phi\in L^2([-B,B])$, we can represent the function by its Fourier series. The Fourier coefficients are given by
$$(\phi,e_k)_{[-B,B]} = \frac{1}{2B}\int_{-B}^{B}\phi(\xi)\,e^{-i\frac{k\pi}{B}\xi}\,d\xi = \frac{1}{2B}\int_{-B}^{B}\sum_{l\in\mathbb Z}\hat u(\xi+2Bl)\,e^{-i\frac{k\pi}{B}\xi}\,d\xi$$
$$= \frac{1}{2B}\int_{-B}^{B}\sum_{l\in\mathbb Z}\hat u(\xi+2Bl)\,e^{-i\frac{k\pi}{B}(\xi+2Bl)}\,d\xi = \frac{1}{2B}\int_{\mathbb R}\hat u(\xi)\,e^{-i\frac{k\pi}{B}\xi}\,d\xi = \frac{\sqrt{2\pi}}{2B}\,u\Big(-\frac{k\pi}{B}\Big).$$
Therefore, the Fourier series reads
$$\phi(\xi) = \sum_{k\in\mathbb Z}(\phi,e_k)_{[-B,B]}\,e_k(\xi) = \frac{\sqrt{2\pi}}{2B}\sum_{k\in\mathbb Z}u\Big(-\frac{k\pi}{B}\Big)\,e^{i\frac{k\pi}{B}\xi}.$$
The connection between u and $u_d$ can be clarified via the Fourier transform:
Lemma 4.39 For almost all $\xi\in\mathbb R$, one has
$$\hat u_d(\xi) = \frac{B}{\pi}\sum_{k\in\mathbb Z}\hat u(\xi+2Bk).$$
Proof The Fourier transform of the Dirac measure is
$$\mathcal F\big(\delta_{\frac{k\pi}{B}}\big)(\xi) = \frac{1}{\sqrt{2\pi}}\,e^{-i\frac{k\pi}{B}\xi},$$
and hence
$$\hat u_d(\xi) = \frac{1}{\sqrt{2\pi}}\sum_{k\in\mathbb Z}u\Big(\frac{k\pi}{B}\Big)\,e^{-i\frac{k\pi}{B}\xi} = \frac{B}{\pi}\sum_{n\in\mathbb Z}\hat u(\xi+2Bn),$$
where the last equality is the identity established above. □
Expressed in words, the lemma states that the Fourier transform of the sampled
signal corresponds to a periodization of the Fourier transform of the original signal
with period 2B.
In this way of speaking, we can interpret the reconstruction formula in the sampling theorem (Theorem 4.35) as a convolution as well:
$$u(x) = \sum_{k\in\mathbb Z}u\Big(\frac{k\pi}{B}\Big)\,\operatorname{sinc}\Big(\frac B\pi\Big(x-\frac{k\pi}{B}\Big)\Big) = \Big(u_d*\operatorname{sinc}\big(\tfrac B\pi\,\cdot\,\big)\Big)(x).$$
Equivalently, in the frequency domain,
$$\hat u(\xi) = \hat u_d(\xi)\,\frac{\pi}{B}\,\chi_{[-B,B]}(\xi).$$
If the support of $\hat u$ is contained in the interval $[-B,B]$, then no overlap occurs during periodization, and $\hat u_d\,\frac{\pi}{B}\,\chi_{[-B,B]}$ corresponds to $\hat u$ exactly. This procedure is depicted in Fig. 4.6.
However, if $\hat u$ has a larger support, then the support of $\hat u(\,\cdot\,+2Bk)$ exhibits a nonempty intersection with $[-B,B]$ for several k. This “folding” in the frequency domain is responsible for aliasing; cf. Fig. 4.7.
Fig. 4.6 Reconstruction of a discretized signal by means of the reconstruction formula in the
sampling theorem. First row: A signal u and its Fourier transform. Second row: Sampling the
function renders the Fourier transform periodic. Third row: The sinc convolution kernel and its
Fourier transform. Fourth row: Convoluting with the sinc function reconstructs the signal perfectly
$$u(x) = \cos(\xi_0x) = \frac{e^{i\xi_0x}+e^{-i\xi_0x}}{2}.$$
Its Fourier transform is given by
$$\hat u = \sqrt{\frac{\pi}{2}}\,\big(\delta_{\xi_0}+\delta_{-\xi_0}\big).$$
Fig. 4.7 Illustration of aliasing. First row: A signal u and its Fourier transform. Second row:
Sampling of the function renders the Fourier transform periodic and produces an overlap. Third
row: The sinc convolution kernel and its Fourier transform. Fourth row: Convoluting with the sinc
function reconstructs a signal in which the high-frequency components are represented by low
frequencies
We denote by
$$u_d = \sum_{k\in\mathbb Z^2}u(k_1T_1,k_2T_2)\,\delta_{(k_1T_1,k_2T_2)}$$
an image that is discretely sampled on a rectangular grid with sampling rates $T_1$ and $T_2$. By means of the Fourier transform, the connection to the continuous image u can be expressed as
$$\hat u_d(\xi) = \frac{B_1B_2}{\pi^2}\sum_{k\in\mathbb Z^2}\hat u(\xi_1+2B_1k_1,\ \xi_2+2B_2k_2).$$
Also in this case, aliasing occurs if the image does not have finite bandwidth or is
sampled too slowly. In addition to the change of frequency, a change in the direction
also may occur here:
$\sum_{k\in\mathbb Z^2}u_{lk}\,\delta_{lk}$. Also during this undersampling, aliasing arises; cf. Fig. 3.1. In order to prevent this, a low-pass filter h should be applied before the undersampling in order to eliminate those frequencies that, due to the aliasing, would be reconstructed as incorrect frequencies. It suggests itself to choose this filter as the perfect low-pass filter with width π/l, i.e., we have $\hat h = \chi_{[-\pi/l,\pi/l]^2}$. This prevents aliasing; cf. Fig. 4.8.
Fig. 4.8 Preventing aliasing by low-pass filtering. For better comparability, the subsampled
images are rescaled to the original size
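A minimal sketch of this anti-aliasing strategy is given below. It is only an illustration under simplifying assumptions (periodic boundary conditions, an integer subsampling step l, and frequencies measured on the discrete FFT grid); the function name is hypothetical:

```python
import numpy as np

def subsample_with_prefilter(u, l):
    """Subsample an image by the factor l, applying the perfect low-pass filter
    with transfer function chi_{[-pi/l, pi/l]^2} beforehand to prevent aliasing."""
    U = np.fft.fft2(u)
    w1 = 2 * np.pi * np.fft.fftfreq(u.shape[0])[:, None]   # frequencies in [-pi, pi)
    w2 = 2 * np.pi * np.fft.fftfreq(u.shape[1])[None, :]
    mask = (np.abs(w1) <= np.pi / l) & (np.abs(w2) <= np.pi / l)
    u_filtered = np.real(np.fft.ifft2(U * mask))
    return u_filtered[::l, ::l]                             # keep every l-th sample
```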
For the numerical realization of frequency methods, we need the discrete Fourier
transform. Also in this case, we shall study one-dimensional discrete images for
now and will obtain the higher-dimensional version later as a tensor product. Hence,
we consider
u : {0, . . . , N − 1} → C.
These images form an N-dimensional vector space $\mathbb C^N$, which by means of the inner product
$$(u,v) = \sum_{n=0}^{N-1}u_n\,\overline{v_n}$$
becomes a Hilbert space. The discrete Fourier transform of u is defined by
$$\hat u_k = \frac{1}{N}\sum_{n=0}^{N-1}u_n\exp\Big(\frac{-2\pi ink}{N}\Big),$$
i.e., in matrix-vector notation,
$$\hat u = \frac{1}{N}\,\mathcal Bu,$$
where $\mathcal B$ denotes the Fourier matrix with entries $\exp(-2\pi ink/N)$. The transform is inverted by
$$u_n = \sum_{k=0}^{N-1}\hat u_k\exp\Big(\frac{2\pi ink}{N}\Big),\qquad\text{i.e.,}\qquad u = N\mathcal B^{-1}\hat u = \mathcal B^*\hat u.$$
Also in the discrete case, we denote the inverse of the Fourier transform by ǔ.
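Note that software libraries usually place the normalization differently. A short sketch of the correspondence with NumPy (whose fft is unnormalized, while the transform above carries the factor 1/N in the forward direction) is the following illustration:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0, -1.0])
N = len(u)

u_hat = np.fft.fft(u) / N        # hat{u}_k = (1/N) sum_n u_n exp(-2*pi*i*n*k/N)
u_back = np.fft.ifft(u_hat) * N  # u_n = sum_k hat{u}_k exp(2*pi*i*n*k/N)

assert np.allclose(u_back, u)    # the two conventions are consistent
```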
The generalization to two-dimensional images is simple:
Remark 4.45 (Two-Dimensional Discrete Fourier Transform) The two-dimensional discrete Fourier transform $\hat u\in\mathbb C^{N\times M}$ of $u\in\mathbb C^{N\times M}$ is defined by
$$\hat u_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}u_{n,m}\exp\Big(\frac{-2\pi ink}{N}\Big)\exp\Big(\frac{-2\pi iml}{M}\Big)$$
and is inverted by
$$u_{n,m} = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1}\hat u_{k,l}\exp\Big(\frac{2\pi ink}{N}\Big)\exp\Big(\frac{2\pi iml}{M}\Big).$$
Remark 4.46 (Periodicity of the Discrete Fourier Transform) The vectors $b^n$ can also be regarded as N-periodic, i.e.,
$$b^n_{k+N} = \exp\Big(\frac{2\pi in(k+N)}{N}\Big) = \exp\Big(\frac{2\pi ink}{N}\Big) = b^n_k.$$
Furthermore, one also has $b^{n+N} = b^n$. In other words, for the discrete Fourier transform, all signals are N-periodic, since we have
$$\hat u_{k+N} = \hat u_k.$$
This observation is important for the interpretation of the Fourier coefficients. The
basis vectors $b^{N-n}$ correspond to the vectors $b^{-n}$, i.e., the entries $\hat u_{N-k}$ with small k correspond to the low-frequency vectors $b^{-k}$. This can also be observed in Fig. 4.9:
the “high” basis vectors have low frequencies, while the “middle” basis vectors
have the highest frequencies. Another explanation for this situation is given by the
sampling theorem as well as the alias effect: when sampling with rate 1, the highest
There is also a convolution theorem for the discrete Fourier transform. In view
of the previous remark, it is not surprising that it holds for periodic convolution:
Definition 4.47 Let $u,v\in\mathbb C^N$. The periodic convolution of u with v is defined by
$$(u\circledast v)_n = \sum_{k=0}^{N-1}v_k\,u_{(n-k)\bmod N}.$$
For the discrete Fourier transform, one has the convolution theorem
$$\widehat{(u\circledast v)}_n = N\,\hat u_n\,\hat v_n.$$
Proof Using the periodicity of the complex exponential function, the equation can be verified directly:
$$\widehat{(u\circledast v)}_n = \frac{1}{N}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1}v_l\,u_{(k-l)\bmod N}\exp\Big(\frac{-2\pi ink}{N}\Big)$$
$$= \frac{1}{N}\sum_{l=0}^{N-1}v_l\exp\Big(\frac{-2\pi inl}{N}\Big)\sum_{k=0}^{N-1}u_{(k-l)\bmod N}\exp\Big(\frac{-2\pi in(k-l)}{N}\Big) = N\,\hat v_n\,\hat u_n. \qquad\square$$
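The statement is easy to verify numerically. The following small sketch is only an illustration (the direct O(N²) sum serves as reference, and the hats of the text correspond to fft divided by N):

```python
import numpy as np

def periodic_conv(u, v):
    """Periodic convolution (u * v)_n = sum_k v_k u_{(n-k) mod N}, computed directly."""
    N = len(u)
    return np.array([sum(v[k] * u[(n - k) % N] for k in range(N)) for n in range(N)])

u, v = np.random.rand(8), np.random.rand(8)
N = len(u)

# Convolution theorem in the normalization of the text (forward DFT carries 1/N):
lhs = np.fft.fft(periodic_conv(u, v)) / N
rhs = N * (np.fft.fft(u) / N) * (np.fft.fft(v) / N)
assert np.allclose(lhs, rhs)
```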
Also in the discrete case, convolution can be expressed via the multiplication of
the Fourier transform. And here we again call the Fourier transform of a convolution
kernel the transfer function:
$$\hat h_k = \frac{1}{N}\sum_{n=-r}^{r}h_n\exp\Big(-\frac{2\pi ink}{N}\Big).$$
Note that the Fourier transform of a convolution kernel depends on the period N
of the signal! Furthermore, it is important that we use here the periodic boundary
extension of Sect. 3.3.3 throughout.
Example 4.50 We consider some of the filters introduced in Sect. 3.3.3.
For the moving average filter M 3 = [1 1 1]/3, the transfer function is given by
$$\widehat{M^3}_k = \frac{1}{3N}\sum_{n=-1}^{1}\exp\Big(-\frac{2\pi ink}{N}\Big) = \frac{1}{3N}\Big(1+\exp\Big(\frac{2\pi ik}{N}\Big)+\exp\Big(-\frac{2\pi ik}{N}\Big)\Big) = \frac{1}{3N}\Big(1+2\cos\Big(\frac{2\pi k}{N}\Big)\Big).$$
The transfer function of this filter is nonnegative throughout, i.e., no frequencies are
“flipped.”
In this light, Sobel filters appear more reasonable than Prewitt filters.
Remark 4.51 (Fast Fourier Transform and Fast Convolution) Evaluating the sums
directly, the discrete Fourier transform needs O(N 2 ) operations. By making use
of symmetries, the cost can be significantly reduced to O(N log2 N), cf. [97],
for instance. By means of the convolution theorem, this can be utilized for a fast
convolution; cf. Exercises 4.11 and 4.12.
Remark 4.52 For real-valued signals, the discrete Fourier transform satisfies the following symmetry relation:
$$\hat u_k = \frac{1}{N}\sum_{n=0}^{N-1}u_n\exp\Big(-\frac{2\pi ink}{N}\Big) = \overline{\frac{1}{N}\sum_{n=0}^{N-1}u_n\exp\Big(-\frac{2\pi in(N-k)}{N}\Big)} = \overline{\hat u_{N-k}}.$$
$$\mathrm{DCT}(u)_k = \sqrt{\frac{2\lambda_k}{N}}\,\sum_{n=0}^{N-1}u_n\cos\Big(\frac{k\pi}{N}\big(n+\tfrac12\big)\Big)$$
and is inverted by
$$u_n = \mathrm{IDCT}\big(\mathrm{DCT}(u)\big)_n = \sum_{k=0}^{N-1}\sqrt{\frac{2\lambda_k}{N}}\,\mathrm{DCT}(u)_k\cos\Big(\frac{k\pi}{N}\big(n+\tfrac12\big)\Big).$$
Like the discrete Fourier transform, the DCT can be computed with complexity
O(N log2 N). There are three further variants of the discrete cosine transform;
cf. [97] for details, for instance.
Application 4.53 (Compression with the DCT: JPEG) The discrete cosine
transform is a crucial part of the JPEG compression standard, which is based on
the idea of transform coding, whereby the image is transformed into a different
representation that is better suited for compression through quantization. One can
observe that the discrete cosine transform of an image typically exhibits many small
coefficients, which mostly belong to the high frequencies. Additionally, there is a
physiological observation that the eye perceives gray value variations less well for
higher frequencies. Together with further techniques, this constitutes the foundation
of the JPEG standard. For grayscale images, this standard consists of the following
steps:
• Partition the image into 8 × 8-blocks and apply the discrete cosine transform to
these blocks.
• Quantize the transformed values (this is the lossy part).
• Reorder the quantized values.
• Apply entropy coding to the resulting numerical sequence.
The compression potential of the blockwise two-dimensional DCT is illustrated in
Fig. 4.10. For a detailed description of the functioning of JPEG, we refer to [109].
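The compression idea can be imitated in a few lines. The following sketch is a toy illustration, not the JPEG standard: it uses scipy.fft.dctn/idctn with orthonormal scaling and simply keeps a fraction of the largest coefficients instead of the quantization, reordering, and entropy-coding steps listed above.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_dct_compress(u, keep=0.1, block=8):
    """Toy transform coding: blockwise 8x8 DCT, keep only the fraction `keep`
    of the largest coefficients (per image), then reconstruct."""
    H, W = u.shape
    c = np.zeros_like(u, dtype=float)
    for i in range(0, H - H % block, block):
        for j in range(0, W - W % block, block):
            c[i:i+block, j:j+block] = dctn(u[i:i+block, j:j+block], norm='ortho')
    thresh = np.quantile(np.abs(c), 1 - keep)   # keep the largest coefficients
    c[np.abs(c) < thresh] = 0.0
    out = np.zeros_like(c)
    for i in range(0, H - H % block, block):
        for j in range(0, W - W % block, block):
            out[i:i+block, j:j+block] = idctn(c[i:i+block, j:j+block], norm='ortho')
    return out
```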
In the previous sections, we discussed in detail that the Fourier transform yields the
frequency representation of a signal or an image. However, any information about
location is not encoded in an obvious way. In particular, a local alteration of the
signal or image at only one location results in a global change of the whole Fourier
transform. In other words, the Fourier transform is a global transformation in the
sense that the value $\hat u(\xi)$ depends on all values of u. In several circumstances,
transformations that are “local” in a certain sense appear desirable. Before we
introduce the wavelet transform, we present another alternative to obtain “localized”
frequency information: the windowed Fourier transform.
An intuitive idea for localizing the Fourier transform is the following modification.
We use a window function $g:\mathbb R^d\to\mathbb C$, which is nothing more than a function that is “localized around the origin,” i.e., a function that assumes small values for large |x|. For σ > 0, two such examples are given by the following functions.
Fig. 4.10 Illustration of the compression potential of the two-dimensional DCT on 8 × 8 blocks.
From upper left to lower right: Original image, reconstruction based on 10%, 5% and 2% of the
DCT coefficients, respectively. The resulting artifacts are disturbing only in the last case
The first alternative again explains the name “windowed” Fourier transform:
through multiplication by the shifted window g, the function u is localized prior
to the Fourier transform.
Note that the windowed Fourier transform is a function of 2d variables: Gg u :
R2d → C. If the window function g is a Gaussian function, i.e., g(x) =
(2πσ )−d/2 exp(−|x|2/(2σ )) for some σ > 0, the transform is also called the Gabor
transform.
Thanks to the previous work in Sects. 4.1.1–4.1.3, the analysis of the elementary
properties of the windowed Fourier transform is straightforward.
Lemma 4.55 Let u, v, g ∈ L2 (Rd ). Then we have Gg u ∈ L2 (R2d ) and
Proof In order to prove the equality of the inner products, we use the isometry
property of the Fourier transform and in particular the Plancherel formula (4.2).
With Ft denoting the Fourier transform with respect to the variable t, we use one
of the alternative representations of the windowed Fourier transform as well as the
convolution theorem to obtain
$$\mathcal F_t\big(G_gu(\xi,\,\cdot\,)\big)(\omega) = \mathcal F_t\big((2\pi)^{-d/2}(M_{-\xi}u*D_{-\mathrm{id}}g)\big)(\omega) = \mathcal F(M_{-\xi}u)(\omega)\,\big(\mathcal FD_{-\mathrm{id}}g\big)(\omega) = \hat u(\omega+\xi)\,\hat g(\omega).$$
Computing the inner product of $G_gu$ and $G_gv$ with the Plancherel formula (first with respect to ω, then with respect to ξ) therefore gives
$$(G_gu,G_gv)_{L^2(\mathbb R^{2d})} = \|\hat g\|_2^2\,(\hat u,\hat v)_2 = \|g\|_2^2\,(u,v)_2. \qquad\square$$
We thereby see that the windowed Fourier transform is an isometry, and as such,
it is inverted on its range by its adjoint (up to a constant). In particular, we have the
following.
Corollary 4.56 For $u,g\in L^2(\mathbb R^d)$ with $\|g\|_2 = 1$, we have the inversion formula
$$u(x) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^d}\int_{\mathbb R^d}G_gu(\xi,t)\,g(x-t)\,e^{ix\cdot\xi}\,d\xi\,dt\qquad\text{for almost all }x,$$
which implies
$$G_g^*F(x) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^{2d}}F(\xi,t)\,e^{ix\cdot\xi}\,g(x-t)\,d\xi\,dt. \qquad\square$$
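A direct discretization makes the localization of the windowed Fourier transform visible. The sketch below is a slow Riemann-sum illustration with a Gaussian window, not an efficient short-time Fourier transform; the grids and the parameter σ are choices of this example:

```python
import numpy as np

def windowed_fourier(u, sigma, xi_grid, t_grid, x_grid):
    """Riemann-sum discretization of the Gabor transform
    G_g u(xi, t) = (2*pi)^(-1/2) * integral u(x) g(x - t) exp(-i x xi) dx
    with the Gaussian window g(x) = (2*pi*sigma)^(-1/2) exp(-x^2/(2*sigma))."""
    dx = x_grid[1] - x_grid[0]
    G = np.zeros((len(xi_grid), len(t_grid)), dtype=complex)
    for i, xi in enumerate(xi_grid):
        for j, t in enumerate(t_grid):
            g = np.exp(-(x_grid - t)**2 / (2 * sigma)) / np.sqrt(2 * np.pi * sigma)
            G[i, j] = np.sum(u * g * np.exp(-1j * xi * x_grid)) * dx / np.sqrt(2 * np.pi)
    return G
```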
Another transformation that analyzes local behavior is given by the wavelet trans-
form. This transform has found broad applications in signal and image processing, in
particular due to its particularly elegant discretization and its numerical efficiency.
Fig. 4.11 Localization during windowed Fourier transform (left) and during wavelet transform
(right)
We will follow a similar path as for the Fourier transform: we first introduce the
continuous wavelet transform as well as the wavelet series, and finally, we cover the
discrete wavelet transform.
While the windowed Fourier transform uses a fixed window in order to localize
the function of interest, the wavelet transform uses functions of varying widths,
cf. Fig. 4.11. In case of dimensions higher than one, the wavelet transform can be
defined in various ways. We here cover the one-dimensional case of real-valued
functions:
Definition 4.57 Let $u,\psi\in L^2(\mathbb R,\mathbb R)$. For $b\in\mathbb R$ and $a>0$, the wavelet transform of u with ψ is defined by
$$L_\psi u(a,b) = \int_{\mathbb R}u(x)\,\frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big)\,dx.$$
By means of the translation and dilation operators, this can also be written as
$$L_\psi u(a,b) = \frac{1}{\sqrt a}\,(u,\,T_{-b}D_{1/a}\psi)_{L^2(\mathbb R)} = \frac{1}{\sqrt a}\,(u*D_{-1/a}\psi)(b),$$
where we have omitted the usual transition to equivalence classes (cf. Sect. 2.2.2).
The inner product in this space is given by
$$(F,G)_{L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})} = \int_{\mathbb R}\int_0^\infty F(a,b)\,\overline{G(a,b)}\,\frac{da\,db}{a^2}.$$
Then
$$L_\psi: L^2(\mathbb R)\to L^2\Big([0,\infty[\times\mathbb R,\ \tfrac{da\,db}{a^2}\Big)$$
Proof We use the inner product representation of the wavelet transform, the
calculation rules for the Fourier transform, and the Plancherel formula (4.2) to
obtain
$$L_\psi u(a,b) = \frac{1}{\sqrt a}\,(u,\,T_{-b}D_{1/a}\psi)_{L^2(\mathbb R)} = \frac{1}{\sqrt a}\,\big(\hat u,\,\mathcal F(T_{-b}D_{1/a}\psi)\big)_{L^2(\mathbb R)} = \frac{1}{\sqrt a}\,\big(\hat u,\,a\,M_{-b}D_a\hat\psi\big)_{L^2(\mathbb R)}$$
$$= \sqrt a\int_{\mathbb R}\hat u(\xi)\,e^{ib\xi}\,\hat\psi(a\xi)\,d\xi = \sqrt a\,\sqrt{2\pi}\,\mathcal F^{-1}\big(\hat u\,D_a\hat\psi\big)(b).$$
Now we compute the inner product of Lψ u and Lψ v, again using the calculation
rules for the Fourier transform and the Plancherel formula with respect to the
variable b:
$$(L_\psi u,L_\psi v)_{L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})} = \int_{\mathbb R}\int_0^\infty L_\psi u(a,b)\,\overline{L_\psi v(a,b)}\,\frac{da\,db}{a^2}$$
$$= 2\pi\int_0^\infty\int_{\mathbb R}a\,\mathcal F^{-1}\big(\hat u\,D_a\hat\psi\big)(b)\,\overline{\mathcal F^{-1}\big(\hat v\,D_a\hat\psi\big)(b)}\,db\,\frac{da}{a^2}$$
$$= 2\pi\int_0^\infty\int_{\mathbb R}a\,\hat u(\xi)\,\hat\psi(a\xi)\,\overline{\hat v(\xi)\,\hat\psi(a\xi)}\,d\xi\,\frac{da}{a^2} = 2\pi\int_{\mathbb R}\hat u(\xi)\,\overline{\hat v(\xi)}\int_0^\infty\frac{|\hat\psi(a\xi)|^2}{a}\,da\,d\xi.$$
Then a change of variables and $|\hat\psi(-\xi)| = |\hat\psi(\xi)|$ lead to
$$\int_0^\infty\frac{|\hat\psi(a\xi)|^2}{a}\,da = \int_0^\infty\frac{|\hat\psi(a|\xi|)|^2}{a}\,da = \int_0^\infty\frac{|\hat\psi(\omega)|^2}{\omega}\,d\omega = \frac{c_\psi}{2\pi}.$$
Applying the Plancherel formula again yields the assertion. □
The condition cψ < ∞ ensures that Lψ is a continuous mapping, while cψ > 0
guarantees the stable invertibility on the range of Lψ .
Definition 4.59 The condition
$$0 < c_\psi = 2\pi\int_0^\infty\frac{|\hat\psi(\xi)|^2}{\xi}\,d\xi < \infty \qquad(4.3)$$
is called the admissibility condition, and the functions ψ that satisfy it are called wavelets.
The admissibility condition says in particular that around zero, the Fourier transform of a wavelet must tend to zero sufficiently fast; roughly speaking, $\hat\psi(0) = 0$. This implies that the average of a wavelet vanishes.
Analogously to Corollary 4.56 for the windowed Fourier transform, we derive
that the wavelet transform is inverted on its range by its adjoint (up to a constant).
Corollary 4.60 Let $u,\psi\in L^2(\mathbb R)$ and $c_\psi = 1$. Then,
$$u(x) = \int_{\mathbb R}\int_0^\infty L_\psi u(a,b)\,\frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big)\,\frac{da\,db}{a^2}.$$
Proof Due to the normalization, we only have to compute the adjoint of the wavelet transform. For $u\in L^2(\mathbb R)$ and $F\in L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})$,
$$(L_\psi u,F)_{L^2([0,\infty[\times\mathbb R,\frac{da\,db}{a^2})} = \int_{\mathbb R}\int_0^\infty\Big(\int_{\mathbb R}u(x)\,\tfrac{1}{\sqrt a}\,\psi\big(\tfrac{x-b}{a}\big)\,dx\Big)\,\overline{F(a,b)}\,\frac{da\,db}{a^2}$$
$$= \int_{\mathbb R}u(x)\,\overline{\int_{\mathbb R}\int_0^\infty F(a,b)\,\tfrac{1}{\sqrt a}\,\psi\big(\tfrac{x-b}{a}\big)\,\frac{da\,db}{a^2}}\,dx.$$
This implies
$$L_\psi^*F(x) = \int_{\mathbb R}\int_0^\infty F(a,b)\,\frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big)\,\frac{da\,db}{a^2},$$
which yields the assertion. □
Derivative of a Gaussian: Consider $\psi(x) = -\frac{d}{dx}G(x) = x\,e^{-x^2/2}$ with $G(x) = e^{-x^2/2}$. This function is a wavelet, since $\hat\psi(\xi) = i\xi\,e^{-\xi^2/2}$, which implies
$$c_\psi = 2\pi\int_0^\infty\frac{|\hat\psi(\xi)|^2}{\xi}\,d\xi = \pi.$$
We can express the wavelet transform by means of the convolution and obtain in this case
$$L_\psi u(a,b) = \frac{1}{\sqrt a}\,(u*D_{-1/a}\psi)(b) = -\frac{1}{\sqrt a}\,(u*D_{-1/a}G')(b) = \sqrt a\,\big(u*(D_{-1/a}G)'\big)(b) = \sqrt a\,\frac{d}{db}\,(u*D_{-1/a}G)(b).$$
Here we can recognize an analogy to edge detection according to Canny in
Application 3.23, whereby the image was convolved with scaled Gaussian
functions as well.
A similar construction leads to the so-called Mexican hat function: $\psi(x) = -\frac{d^2}{dx^2}G(x) = (1-x^2)\,e^{-x^2/2}$. Here we have $\hat\psi(\xi) = \xi^2\,e^{-\xi^2/2}$ and $c_\psi = \pi/2$. The Mexican hat function is named after its shape.
Haar wavelet: A different kind of wavelet is given by the Haar wavelet, defined by
$$\psi(x) = \begin{cases}1 & \text{if } 0\le x<\tfrac12,\\ -1 & \text{if } \tfrac12\le x<1,\\ 0 & \text{otherwise.}\end{cases}$$
The wavelets that result from derivatives of the Gaussian function are very
smooth (infinitely differentiable) and decay rapidly. In particular, they (as well as
their Fourier transforms) are well localized. For a discrete implementation, compact
support would furthermore be desirable, since then, the integrals that need to be
computed would be finite. As stated above, the Haar wavelet exhibits compact
support, but it is discontinuous. In the following subsection, we will see that it is
particularly well suited for the discrete wavelet transform. In this context, we will
also encounter further wavelets.
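The continuous wavelet transform itself is easy to discretize by a Riemann sum. The following sketch is illustrative only: it uses the Mexican hat from above and evaluates the defining integral on a grid, which is far less efficient than the fast transform developed in the next subsection.

```python
import numpy as np

def cwt_mexican_hat(u, x, scales):
    """Riemann-sum discretization of L_psi u(a,b) = integral u(x) a^{-1/2} psi((x-b)/a) dx
    with the Mexican hat psi(x) = (1 - x^2) exp(-x^2/2); b runs over the grid x."""
    dx = x[1] - x[0]
    out = np.zeros((len(scales), len(x)))
    for i, a in enumerate(scales):
        for j, b in enumerate(x):
            psi = (1 - ((x - b) / a)**2) * np.exp(-((x - b) / a)**2 / 2)
            out[i, j] = np.sum(u * psi) * dx / np.sqrt(a)
    return out
```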
form an orthonormal basis of L2 (R). To comprehend this will require quite a bit
of work. We first introduce the central notion for wavelet series and the discrete
wavelet transform, namely the notion of a “multiscale analysis.”
For all $j,k\in\mathbb Z$: $u\in V_j\iff T_{2^jk}u\in V_j$.
$V_{j+1}\subset V_j$.
$u\in V_j\iff D_{1/2}u\in V_{j+1}$.
Trivial intersection: $\bigcap_{j\in\mathbb Z}V_j = \{0\}$.
Completeness: $\overline{\bigcup_{j\in\mathbb Z}V_j} = L^2(\mathbb R)$.
Orthonormal basis: There is a function $\phi\in V_0$ such that the functions $\{T_k\phi\mid k\in\mathbb Z\}$ form an orthonormal basis of $V_0$.
The function φ is called a generator or scaling function of the multiscale analysis.
Let us make some remarks regarding this definition: The spaces Vj are translation
invariant with respect to the dyadic translations by 2j . Furthermore, they are nested
into each other and become smaller with increasing j . If we denote the orthogonal
projection onto Vj by PVj , then we have for every u that
Since $\phi\in V_0\subset V_{-1}$ and $\{\phi_{-1,k}\mid k\in\mathbb Z\}$ is an orthonormal basis of $V_{-1}$, the generator satisfies
$$\phi = \sum_{k\in\mathbb Z}h_k\,\phi_{-1,k},\qquad\text{i.e.,}\qquad \phi(x) = \sqrt2\sum_{k\in\mathbb Z}h_k\,\phi(2x-k), \qquad(4.4)$$
with $h_k = (\phi,\phi_{-1,k})$. Equation (4.4) is called the scaling equation, and it explains the name scaling function for φ. The functions $\phi_{j,k}$ already remind us of the continuous wavelet transform with discrete values $a = 2^j$ and $b = 2^jk$. We find the wavelets in the following construction again:
Definition 4.65 (Approximation and Detail Spaces) Let (Vj )j ∈Z be a multiscale
analysis. Let the spaces Wj be defined as orthogonal complements of Vj in Vj −1 ,
i.e.,
$$V_{j-1} = V_j\oplus W_j,\qquad V_j\perp W_j.$$
Fig. 4.12 Representation of a function in the piecewise constant multiscale analysis. Left:
Generator function φ and φj,k for j = −1, k = 3. Right: The function u(x) = cos(πx) and
its representation in the spaces V−1 and V−2
The space Vj is called the approximation space to the scale j ; the space Wj is called
the detail space or wavelet space to the scale j .
The definition of the spaces $W_j$ immediately implies
$$V_j = \bigoplus_{m\ge j+1}W_m.$$
These equations justify the name “multiscale analysis”: the spaces Vj allow a
systematic approximation of functions on different scales.
Example 4.66 (Detail Spaces Wj for Piecewise Constant Multiscale Analysis) Let us investigate what the spaces $W_j$ look like in Example 4.64. We construct the spaces by means of the projection $P_{V_j}$. For $x\in[k2^j,(k+1)2^j[$, we have
$$P_{V_j}u(x) = 2^{-j}\int_{k2^j}^{(k+1)2^j}u(y)\,dy.$$
In order to obtain $P_{W_{j+1}} = P_{V_j}-P_{V_{j+1}}$, we use the scaling equation for φ. In this case, we have
cf. Fig. 4.13. Therefore, also the spaces Wj have orthonormal bases, namely
(ψj,k )k∈Z . The function ψ is just the Haar wavelet in Example 4.61 again.
The above example shows that for piecewise constant multiscale analysis, there is
actually a wavelet (the Haar wavelet) that yields an orthonormal basis of the wavelet
spaces Wj . A similar construction also works in general, as the following theorem
demonstrates:
Theorem 4.67 Let (Vj ) be a multiscale analysis with generator φ such that φ
satisfies the scaling equation (4.4) with a sequence (hk ). Furthermore, let ψ ∈ V−1
be defined by
$$\psi(x) = \sqrt2\sum_{k\in\mathbb Z}(-1)^k\,h_{1-k}\,\phi(2x-k).$$
Then:
1. The set $\{\psi_{j,k}\mid k\in\mathbb Z\}$ is an orthonormal basis of $W_j$.
2. The set $\{\psi_{j,k}\mid j,k\in\mathbb Z\}$ is an orthonormal basis of $L^2(\mathbb R)$.
3. The function ψ is a wavelet with $c_\psi = 2\ln 2$.
$$(\psi,\phi_{k,0}) = 0,\qquad (\psi,\psi_{k,0}) = \delta_{0,k}.$$
In Exercise 4.16, you will prove that in particular $\sum_{k\in\mathbb Z}h_k^2 = 1$, and due to $\|\phi_{-1,0}\| = 1$, it follows that the system $\{\phi_{k,0},\psi_{k,0}\mid k\in\mathbb Z\}$ is complete in $V_{-1}$. The assertions 1 and 2 can now be shown by simple arguments (cf. Exercise 4.14). For assertion 3, we refer to [94]. □
In the context of Theorem 4.67, we also note that the set $\{\psi_{j,k}\mid j,k\in\mathbb Z\}$ is an orthonormal wavelet basis of $L^2(\mathbb R)$.
Given a sequence of subspaces (Vj ) that satisfies all the further requirements
for a multiscale analysis, it is in general not easy to come up with an orthonormal
basis for V0 (another example is given in Exercise 4.15). However, it can be shown
that under the assumption in Remark 4.63 (the translates of φ form a Riesz basis of
V0 ), a generator can be constructed whose translates form an orthonormal basis of
V0 , cf. [94], for instance. Theorem 4.67 expresses what the corresponding wavelet
looks like. By now, a large variety of multiscale analyses and wavelets with different
properties have been constructed. Here, we only give some examples:
Example 4.68 (Daubechies Wavelets and Symlets) We consider two important
examples of multiscale analysis:
Daubechies Wavelets: The Daubechies wavelets (named after Ingrid Daubechies,
cf. [48]) are wavelets with compact support, a certain degree of smoothness, and a certain number of vanishing moments, i.e., the integrals $\int_{\mathbb R}x^l\psi(x)\,dx$ are all zero for $l = 0,\dots,k$ up to a certain k. There is a whole scale of these wavelets, and for a given support, these wavelets are those that exhibit the most vanishing moments, i.e., k is maximal. The so-called db2-wavelet ψ (featuring two vanishing moments) and the corresponding scaling function φ look as follows:
For the db2-wavelet, the analytical coefficients of the scaling equation are given
by
$$h_0 = \frac{1-\sqrt3}{4\sqrt2},\qquad h_1 = \frac{3-\sqrt3}{4\sqrt2},\qquad h_2 = \frac{3+\sqrt3}{4\sqrt2},\qquad h_3 = \frac{1+\sqrt3}{4\sqrt2}.$$
For the other Daubechies wavelets, the values are tabulated in [48], for instance.
Symlets: Symlets also trace back to Ingrid Daubechies. They are similar to
Daubechies wavelets, but more “symmetric.” Also in this case, there is a scale of
symlets. The coefficients of the scaling equation are tabulated but not available
in analytical form. The so-called sym4-wavelet (exhibiting four vanishing
moments) and the corresponding scaling function look as follows:
The scaling equation (4.4) and the definition of the wavelet in Theorem 4.67 are
the key to a fast wavelet transform. We recall that since $\{\psi_{j,k}\mid j,k\in\mathbb Z\}$ forms an orthonormal basis of $L^2(\mathbb R)$, one has
$$u = \sum_{j,k\in\mathbb Z}(u,\psi_{j,k})\,\psi_{j,k}.$$
The equation for $\psi_{j,k}$ can be shown analogously, and the equations for the inner products are an immediate consequence. □
Starting from the values $(u,\phi_{0,k})$, we now have recurrence formulas for the coefficients on coarser scales $j > 0$. By means of the abbreviations $c^j_k = (u,\phi_{j,k})$ and $d^j_k = (u,\psi_{j,k})$, these read
$$c^j_k = \sum_{l\in\mathbb Z}h_l\,c^{j-1}_{2k+l}\qquad\text{and}\qquad d^j_k = \sum_{l\in\mathbb Z}(-1)^l\,h_{1-l}\,c^{j-1}_{2k+l}.$$
Based on the projection PVj u, the fast wavelet transform computes the coarser
projection PVj+1 u in the approximation space Vj +1 and the wavelet component
PWj+1 u in the detail space Wj +1 . Note that in the case of a finite coefficient sequence
h, the summation processes are finite. In the case of short coefficient sequences, only
few calculations are necessary in each recursive step.
For the reconstruction, the projection PVj u is computed based on the coarser
approximation PVj+1 u and the details PWj+1 u, as the following lemma describes:
Lemma 4.70 For the coefficient sequences $d^j = ((u,\psi_{j,k}))_{k\in\mathbb Z}$ and $c^j = ((u,\phi_{j,k}))_{k\in\mathbb Z}$, we have the following recurrence formula:
$$c^j_k = \sum_{l\in\mathbb Z}c^{j+1}_l\,h_{k-2l} + \sum_{l\in\mathbb Z}d^{j+1}_l\,(-1)^{k-2l}\,h_{1-(k-2l)}.$$
Proof Since the space $V_j$ is orthogonally decomposed into the spaces $V_{j+1}$ and $W_{j+1}$, one has $P_{V_j}u = P_{V_{j+1}}u + P_{W_{j+1}}u$. Expressing the projections by means of the respective orthonormal bases yields
$$P_{V_j}u = P_{V_{j+1}}u + P_{W_{j+1}}u = \sum_{l\in\mathbb Z}c^{j+1}_l\,\phi_{j+1,l} + \sum_{l\in\mathbb Z}d^{j+1}_l\,\psi_{j+1,l}$$
$$= \sum_{l\in\mathbb Z}\sum_{n\in\mathbb Z}c^{j+1}_l\,h_n\,\phi_{j,n+2l} + \sum_{l\in\mathbb Z}\sum_{n\in\mathbb Z}d^{j+1}_l\,(-1)^n\,h_{1-n}\,\phi_{j,n+2l}$$
$$= \sum_{l\in\mathbb Z}\sum_{k\in\mathbb Z}c^{j+1}_l\,h_{k-2l}\,\phi_{j,k} + \sum_{l\in\mathbb Z}\sum_{k\in\mathbb Z}d^{j+1}_l\,(-1)^{k-2l}\,h_{1-(k-2l)}\,\phi_{j,k}.$$
Swapping the sums and comparing the coefficients yields the assertion. □
In order to denote the decomposition and the reconstruction in a concise way, we introduce the following operators:
$$H:\ell^2(\mathbb Z)\to\ell^2(\mathbb Z),\qquad (Hc)_k = \sum_{l\in\mathbb Z}h_l\,c_{2k+l},$$
$$G:\ell^2(\mathbb Z)\to\ell^2(\mathbb Z),\qquad (Gc)_k = \sum_{l\in\mathbb Z}(-1)^l\,h_{1-l}\,c_{2k+l}.$$
Thereby, we obtain
$$(H^*c)_k = \sum_{l\in\mathbb Z}h_{k-2l}\,c_l = \sum_{n\in\mathbb Z}h_{k-n}\,\tilde c_n = (\tilde c*h)_k,$$
where $\tilde c$ denotes the sequence c upsampled by inserting zeros at the odd positions.
Schematically, one decomposition step convolves $c^j$ with $D_{-1}h$ and $D_{-1}g$ and downsamples the results by the factor 2 to obtain $c^{j+1}$ and $d^{j+1}$; the reconstruction step upsamples $c^{j+1}$ and $d^{j+1}$ by the factor 2, convolves with h and g, respectively, and adds the results to recover $c^j$. Here, ↓2 refers to downsampling with the factor 2, i.e., omitting every second value. Analogously, ↑2 denotes upsampling with the factor 2, i.e., the extension of the vector by filling in zeros at every other place.
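One decomposition step is only a convolution followed by downsampling. The following sketch is an illustration with periodic boundary extension (the filter h is passed as the finite sequence h_0, …, h_{L−1}, and the Haar filter serves as example); it implements the operators H and G directly:

```python
import numpy as np

def dwt_step(c, h):
    """One decomposition step of the fast wavelet transform with periodic
    boundary extension.  The wavelet coefficients are g_l = (-1)^l h_{1-l}.
    Returns (coarse, detail), each of half the length of c."""
    c = np.asarray(c, dtype=float)
    N, L = len(c), len(h)
    c_out, d_out = np.zeros(N // 2), np.zeros(N // 2)
    for k in range(N // 2):
        # (H c)_k = sum_l h_l c_{2k+l}
        c_out[k] = sum(h[l] * c[(2 * k + l) % N] for l in range(L))
        # (G c)_k = sum_l (-1)^l h_{1-l} c_{2k+l}, nonzero for l = 2-L, ..., 1
        d_out[k] = sum((-1) ** l * h[1 - l] * c[(2 * k + l) % N]
                       for l in range(2 - L, 2))
    return c_out, d_out

# Haar filter h = (1/sqrt(2), 1/sqrt(2)): averages and differences of pairs.
c1, d1 = dwt_step([4.0, 2.0, 5.0, 5.0], [1 / np.sqrt(2), 1 / np.sqrt(2)])
```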
It remains to discuss how to obtain an initial sequence cJ . Let us assume that the
signal of interest u lies in a certain approximation space VJ , i.e.,
$$u = \sum_{k\in\mathbb Z}c^J_k\,\phi_{J,k},$$
Fig. 4.14 One-dimensional wavelet transform of a signal with the wavelet sym4 (cf. Exam-
ple 4.68). Bottom row: Signal of interest u : [0, 1] → R, sampled with T = 2−8 corresponding
to approximately 2−4 c−8 . Upper graphs: Wavelet and approximation coefficients, respectively.
Note that the jumps and singularities of the signal evoke large coefficients on the fine scales. In
contrast, the smooth parts of the signal can be represented nearly exclusively by the approximation
coefficients
For a finite signal u, the sampling results in a finite sequence. Since the discrete
wavelet transform consists of convolutions, again the problem arises that the
sequences have to be evaluated at undefined points. Also in this case, boundary
extension strategies are of help. The simplest method is given by the periodic
boundary extension: the convolutions are replaced by periodic convolutions. This
corresponds to the periodization of the wavelet functions ψj,k . A drawback of this
method is the fact that the periodization typically induces a jump discontinuity
that leads to unnaturally large wavelet coefficients at the boundary. Other methods
such as symmetric extension or zero extension are somewhat more complex to
implement; cf. [97], for instance.
The numerical cost of a decomposition or reconstruction step is proportional to
the length of the filter h and the length of the signal. For a finite sequence c0 , the
length of the sequence is cut by half due to the downsampling in every decompo-
sition step (up to boundary extension effects). Since the boundary extension effects
are of the magnitude of the filter length, the total complexity of the decomposition
of a signal of length N = 2M into M levels with a filter h of length n is given by
O(nN). The same complexity holds for the reconstruction. For short filters h, this
cost is even lower than for the fast Fourier transform.
$$V_j^2 = V_j\otimes V_j\subset L^2(\mathbb R^2),$$
and the functions $\phi_{j,k_1}\otimes\phi_{j,k_2}$, $k\in\mathbb Z^2$, form an orthonormal basis of $V_j^2$. This construction is also called a tensor product of separable Hilbert spaces; cf. [141], for instance.
In the two-dimensional case, the wavelet spaces, i.e., the orthogonal comple-
ments of Vj2 in Vj2−1 , exhibit a little more structure. We define the wavelet space
Wj2 by
(where the superscripted number two in one case denotes a tensor product and in the
other just represents a name). On the other hand, Vj −1 = Vj ⊕ Wj , and we obtain
can be expressed as
$$\{\psi^m_{j,k}\mid m = 1,2,3,\ k\in\mathbb Z^2,\ j\in\mathbb Z\}$$
the horizontal details on the scale j (i.e., the details in the x1 -direction), the spaces
Sj2 the vertical details (in the x2 -direction) and the spaces Dj2 the diagonal details;
cf. Fig. 4.15.
Fig. 4.15 Two-dimensional wavelet transform of an image by means of the Haar wavelet. The
image itself is interpreted as the finest wavelet representation in the space V0 . Based on this, the
components in the coarser approximation and detail spaces are computed
Schematically, one decomposition step of the two-dimensional transform first convolves the rows of $c^j$ with $D_{-1}h$ and $D_{-1}g$ and downsamples by 2, and then does the same with the columns of the two results; this yields the coarse part $c^{j+1}$ and the three detail parts $d^{1,j+1}$, $d^{2,j+1}$, $d^{3,j+1}$.
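Since the two-dimensional transform is separable, one level can be built from the one-dimensional step. The sketch below is an illustration reusing dwt_step from the one-dimensional sketch above; which of the three detail parts is called horizontal, vertical, or diagonal is a matter of convention.

```python
import numpy as np

def dwt2_step(c, h):
    """One level of the separable 2D wavelet transform: apply dwt_step along the
    rows, then along the columns of both results.  Returns the coarse part and
    the three detail parts."""
    rows_low, rows_high = zip(*(dwt_step(row, h) for row in c))
    rows_low, rows_high = np.array(rows_low), np.array(rows_high)
    ll, lh = zip(*(dwt_step(col, h) for col in rows_low.T))
    hl, hh = zip(*(dwt_step(col, h) for col in rows_high.T))
    return np.array(ll).T, np.array(lh).T, np.array(hl).T, np.array(hh).T
```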
Application 4.73 (JPEG2000) The DCT being a foundation of the JPEG stan-
dard, the discrete wavelet transform constitutes a foundation of the JPEG2000
standard. Apart from numerous further differences between JPEG and JPEG2000,
using the wavelet transform instead of the blockwise DCT is the most profound
distinction. This procedure has several advantages:
• Higher compression rates, but preserving the same subjective visual quality.
• More “pleasant” artifacts.
• Stepwise image buildup through stepwise decoding of the scales (of advantage
in transferring with a low data rate).
Figure 4.16 shows the compression potential of the discrete wavelet transform
(compare to the DCT case in Fig. 4.10). For a detailed description of JPEG2000,
we refer to [109] again.
Fig. 4.16 Illustration of the compression potential of the two-dimensional wavelet transform.
From upper left to lower right: Original image, reconstruction based on 10%, 5% and 2% of the
wavelet coefficients, respectively
in order to define
$$\gamma_{a,\theta,b}(x) = a^{\frac{3}{2}}\,\gamma(S_aR_\theta x - b).$$
The functions $\gamma_{a,\theta,b}$ hence result from γ through translation, rotation, and parabolic scaling. The continuous curvelet transform is then given by
$$\Gamma u(a,\theta,b) = \int_{\mathbb R^2}u(x)\,\gamma_{a,\theta,b}(x)\,dx;$$
cf. [96] as well. It is remarkable about the construction of the curvelets that they
allow a discretization that nearly results in an orthonormal basis, cf. [27]. Apart from
that, curvelets are in a certain sense nearly optimally suited to represent functions
that are piecewise twice continuously differentiable and whose discontinuities occur
on sets that can be parameterized twice continuously differentiably. A curvelet
decomposition and reconstruction can be implemented efficiently, cf. [28]. In
comparison to Fourier or wavelet decompositions, however, it is still quite involved.
Another ansatz is given by the so-called shearlets; cf. [90], for instance. These
functions are also based on translations and parabolic scalings. In contrast to
curvelets, however, shearings are used instead of rotations. Specifically, for $a,s>0$, we set
$$M_{a,s} = \begin{pmatrix}1 & s\\ 0 & 1\end{pmatrix}\begin{pmatrix}a & 0\\ 0 & \sqrt a\end{pmatrix} = \begin{pmatrix}a & s\sqrt a\\ 0 & \sqrt a\end{pmatrix},$$
$$\psi_{a,s,b}(x) = a^{-\frac{3}{4}}\,\psi\big(M_{a,s}^{-1}(x-b)\big),$$
Also, the shearlet transform allows a systematic discretization and can be imple-
mented by efficient algorithms.
Apart from curvelets and shearlets, there are numerous other procedures to
decompose images into elementary components that reflect the structure of the
image as well as possible: Ridgelets, edgelets, bandlets, brushlets, beamlets, or
platelets are just a few of these approaches. In view of the abundance of “-lets,”
some people also speak of ∗-lets (read “starlets”).
4.6 Exercises
$$M_\xi T_y = e^{i\xi\cdot y}\,T_yM_\xi,\qquad M_\xi D_A = D_AM_{A^T\xi}.$$
$$f(x) = \frac{1}{1+x^2}.$$
$$f_a(x) = \frac{1}{a\pi}\,\frac{1}{1+\frac{x^2}{a^2}},$$
$$f_a*f_b = f_{a+b}.$$
$$g_a(x) = \frac{1}{(4\pi a)^{d/2}}\,e^{-\frac{|x|^2}{4a}}$$
$$g_a*g_b = g_{a+b}.$$
Exercise 4.9 (Regarding Equivalent Sobolev Norms, cf. Theorem 4.29) Let $k,d\in\mathbb N$ and define $f,g:\mathbb R^d\to\mathbb R$ by
$$f(\xi) = \sum_{|\alpha|\le k}|\xi^\alpha|^2,\qquad g(\xi) = (1+|\xi|^2)^k.$$
Show that there exist constants $c,C>0$ (which may depend on k and d) such that
$$cf\le g\le Cf.$$
$$\hat y_{2k} = \frac{1}{N}\sum_{n=0}^{N/2-1}\big(y_n+y_{n+N/2}\big)\,e^{-\frac{2\pi ikn}{N/2}},$$
$$\hat y_{2k+1} = \frac{1}{N}\sum_{n=0}^{N/2-1}e^{-\frac{2\pi in}{N}}\big(y_n-y_{n+N/2}\big)\,e^{-\frac{2\pi ikn}{N/2}}.$$
4. Test the algorithm fft2conv with convolution kernels of your choice. Compare
the results and execution times with an implementation of the direct calculation
of the sums according to Sect. 3.3.3 (also in the light of the complexity estimates).
Exercise 4.13 (Overlap-Add Convolution) For a signal $u\in\mathbb C^N$ and a convolution kernel $h\in\mathbb C^M$ that is significantly shorter than the signal, the convolution of u with h can be computed considerably more efficiently than in Exercise 4.12. For M a factor of N, we partition u into N/M blocks of length M:
$$u_n = \sum_{r=0}^{N/M-1}u^r_{n-rM}\qquad\text{with}\qquad u^r_n = \begin{cases}u_{rM+n} & \text{if } 0\le n\le M-1,\\ 0 & \text{otherwise,}\end{cases}$$
i.e., the blocks are $u^0 = (u_0,\dots,u_{M-1})$, $u^1 = (u_M,\dots,u_{2M-1})$, and so on. With $v^r$ denoting the contribution of the r-th block to the convolution (obtained by convolving $u^r$ with h and shifting appropriately), one has
$$u*h = \sum_{r=0}^{N/M-1}v^r.$$
Note that $v^r$ and $v^{r+1}$ overlap. This procedure is also called the overlap-add method.
1. Show that the complexity of the overlap-add convolution is given by
O(N log M).
2. Develop, implement, and document an algorithm fftconv_oa that computes
the convolution of u and v on the whole support by means of the overlap-add
method. Input: Two vectors u ∈ CN and h ∈ CM where M is a factor of N.
Output: The result w ∈ CN+M−1 of the convolution of u and v.
3. For suitable test examples, compare the results and execution times of your
algorithm with those of the algorithm fftconv in Exercise 4.12.
Exercise 4.14 (Scaled Bases of a Multiscale Analysis) Let $(V_j)$ be a multiscale analysis with generator φ. Show that the set $\{\phi_{j,k}\mid k\in\mathbb Z\}$ forms an orthonormal basis of $V_j$.
Exercise 4.15 (Multiscale Analysis of Bandlimited Functions) Let
$$V_j = \{u\in L^2(\mathbb R)\mid \operatorname{supp}\hat u\subset[-2^{-j}\pi,2^{-j}\pi]\}.$$
1. Show that (Vj ) together with the generator φ(x) = sinc(x) forms a multiscale
analysis of L2 (R).
2. Determine the coefficient sequence (hk ) with which φ satisfies the scaling
equation (4.4) and calculate the corresponding wavelet.
Exercise 4.16 (Properties of ψ in Theorem 4.67) Let φ : R → R be the generator
of a multiscale analysis and let ψ be defined as in Theorem 4.67.
1. Show that the coefficients $(h_k)$ of the scaling equation are real and satisfy the following condition: For all $l\in\mathbb Z$,
$$\sum_{k\in\mathbb Z}h_kh_{k+2l} = \begin{cases}1 & \text{if } l = 0,\\ 0 & \text{if } l\ne 0.\end{cases}$$
(Use the fact that the function φ is orthogonal to the functions $\phi(\,\cdot\,+m)$ for $m\in\mathbb Z$, $m\ne 0$.)
2. Show:
(a) For all $l\in\mathbb Z$, $(\psi,\psi(\,\cdot\,-l)) = \begin{cases}1 & \text{if } l = 0,\\ 0 & \text{if } l\ne 0.\end{cases}$
(b) For all $l\in\mathbb Z$, $(\phi,\psi(\,\cdot\,-l)) = 0$.
Chapter 5
Partial Differential Equations in Image Processing
Our first encounter with a partial differential equation in this book was Application 3.23 on edge detection according to Canny: we obtained a smoothed image by
solving the heat equation. The underlying idea was that images contain information
on different spatial scales and one should not fix one scale a priori. The perception of
an image depends crucially on the resolution of the image. If you consider a satellite
image, you may note the shape of coastlines or mountains. For an aerial photograph
taken from a plane, these features are replaced by structures on smaller scales such
as woods, settlements, or roads. We see that there is no notion of an absolute scale
and that the scale depends on the aims of the analysis. Hence, we ask ourselves
whether there is a mathematical model of this concept of scale. Our aim is to develop
a scale-independent representation of an image. This aim is the motivation behind
the notion of scale space and the multiscale description of images [88, 98, 144].
The notion “scale space” does not refer to a vector space or a similar structure,
but to a scale space representation or multiscale representation: for a given image $u_0:\Omega\to\mathbb R$ one defines a function $u:[0,\infty[\times\Omega\to\mathbb R$, and the new positive parameter describes the scale. The scale space representation for scale parameter equal to 0 will be the original image:
u(0, x) = u0 (x).
For larger scale parameters we should obtain images on “coarser scales.” We could also view the introduction of the new scale variable as follows: We consider the original image $u_0$ as an element in a suitable space X (e.g., a space of functions $\Omega\to\mathbb R$). The scale space representation is then a map $u:[0,\infty[\to X$, i.e., a path through the space X. This view is equivalent to the previous one (setting $u(0) = u_0$ and “$u(\sigma)(x) = u(\sigma,x)$”).
An alternative way would be to set $u(\sigma) = u_0\ominus\sigma B$ and $u(\sigma) = u_0\oplus\sigma B$, respectively, to get $u:[0,\infty[\to B(\mathbb R^d)$.
One could easily produce more examples of scale spaces but one should ask, which
of these are meaningful. There is an axiomatic approach to this question from [3],
which, starting from a certain set of axioms, arrives at a restricted set of scale spaces
or multiscale analyses. In this chapter we will take this approach, which will lead
us to partial differential equations. This allows for a characterization of scale spaces
and to build further methods on top of these.
The idea behind an axiomatic approach is to characterize and build methods for
image processing by specifying certain obvious properties that can be postulated. In
the following we will develop a theory, starting from a fairly small set of axioms,
and this theory will show that the corresponding methods indeed correspond to the
solution of certain partial differential equations. This provides the foundation of the
widely developed theory of partial differential equations in image processing. The
starting point for scale space theory is the notion of multiscale analysis according
to [3].
We start with the notion of scale space. Roughly speaking, a scale space is a family
of maps. We define the following spaces of functions:
Cb∞ (Rd ) = {u : Rd → R u ∈ C ∞ (Rd ), ∂ α u bounded for all α ∈ Nd },
BC(Rd ) = {u : Rd → R u ∈ C 0 (Rd ), u bounded}.
Architectural Axioms
For all $u\in X$ and $s,t\ge 0$:
$$T_0(u) = u,\qquad T_s\circ T_t(u) = T_{s+t}(u). \qquad\text{[REC]}$$
The concatenation of two transforms in a scale space should give another trans-
formation of said scale space (associated with the sum of the scale parameters).
This implies that one can obtain Tt (u) from any representation Ts (u) with s < t.
Hence, Ts (u) contains all information to generate the coarser representations
Tt (u) in some sense. Put differently, the amount of information in the images
decreases. Moreover, one can calculate all representations on an equidistant
discrete scale by iterating a single operator Tt /N .
There is small a technical problem: the range of the operators Tt may not be
contained in the respective domains of definition. Hence, the concatenation of Tt
and Ts is not defined in general. For now we resort to saying that [REC] will be
satisfied whenever Ts ◦ Tt (u) is defined. In Lemma 5.8 we will see a more elegant
solution of this problem.
Regularity:
For all $v\in X$ and $t\in[0,1]$ there exists $C(v)>0$ such that $\|T_tv-v\|_\infty\le C(v)\,t$. [REG]
Locality:
Roughly speaking this axiom says that the value Tt u(x) depends only on the
behavior of u in a neighborhood of x if t is small.
Stability
If one image is brighter than another, this will be preserved under the scale space.
If Tt is linear, this is equivalent to Tt u ≥ 0 for u ≥ 0.
Morphological Axioms
The architectural axioms and stability do not say much about what actually happens
to images under the scale space. The morphological axioms describe properties that
are natural from the point of view of image processing.
Gray-level-shift invariance:
This axiom says the one does not have any a priori assumption on the range of
gray values of the image.
Gray-scale invariance: (also contrast invariance; contains [GLSI], but is
stronger)
The map h rescales the gray values but preserves their order. The axiom says
that the scale space will depend only on the shape of the levelsets and not on the
contrast.
Translation invariance:
All points in Rd are treated equally, i.e., the action of the operators Tt does not
depend on the location of objects.
Isometry invariance:
In some sense, the scale space should be invariant with respect to zooming of
the images. Otherwise, it would depend on unknown distance of the object to the
camera.
The notion of scale space and its axiomatic description contains a broad class of the
methods that we already know. We present some of these as examples.
Example 5.2 (Coordinate Transformations) Let $A\in\mathbb R^{d\times d}$ be a matrix. Then the linear coordinate transformation
$$(T_tu)(x) = u\big(\exp(At)\,x\big) \qquad(5.1)$$
is a scale space.
The operators Tt are linear and the scale space satisfies the axioms [REC], [REG],
[LOC], [COMP], [GSI], and [SCALE], but not [TRANS] or [ISO] (in general).
This can be verified by direct computation. We show two examples. For [REC]: Let $s,t\ge 0$. Then for $u_t(x) = (T_tu)(x) = u(\exp(At)x)$, one has
$$(T_sT_tu)(x) = (T_su_t)(x) = u_t\big(\exp(As)x\big) = u\big(\exp(As)\exp(At)x\big) = u\big(\exp(A(s+t))x\big) = (T_{s+t}u)(x).$$
Moreover, exp(A0) = id, and hence (T0 u) = u. For [LOC] we consider the Taylor
expansion for u ∈ X:
The properties of the matrix exponential imply that exp At − id = O(t) and thus
(Tt u − Tt v)(x) = o(t).
In general, one gets semigroups of transformations also by solving ordinary dif-
ferential equations. We consider a vector field v ∈ C ∞ (Rd , Rd ). The corresponding
integral curves j are defined by
$$\partial_t j(t,x) = v\big(j(t,x)\big),\qquad j(0,x) = x.$$
In the special case v(x) = Ax this reduces to the above coordinate transfor-
mation (5.1). Analogously, this class of scale space inherits the listed properties
of (5.1). The action of (5.1) and (5.2) on an image is shown in Fig. 5.1.
$$(T_tu) = \begin{cases}u*\varphi_t & \text{if } t>0,\\ u & \text{if } t = 0,\end{cases} \qquad(5.3)$$
with
$$\varphi(x) = \frac{1}{(4\pi)^{d/2}}\,e^{-\frac{|x|^2}{4}},\qquad \tau(t) = \sqrt t.$$
Hence, the validity of this axiom depends crucially on the kernel and the time
scaling.
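For the Gaussian kernel, the multiscale convolution can be realized in the Fourier domain, since convolution with $\varphi_t$ corresponds to multiplication of the Fourier transform with $e^{-t|\xi|^2}$. The following sketch is only an illustration (periodic boundary conditions, frequencies in pixel units, d = 2), not the book's implementation:

```python
import numpy as np

def gaussian_scale_space(u, t):
    """Multiscale convolution (5.3): multiply the FFT of u with exp(-t*|xi|^2)."""
    if t == 0:
        return u
    xi1 = 2 * np.pi * np.fft.fftfreq(u.shape[0])[:, None]
    xi2 = 2 * np.pi * np.fft.fftfreq(u.shape[1])[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.exp(-t * (xi1**2 + xi2**2))))
```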
[REG]: To check this axiom, we assume further properties of the kernel and the
time scaling:
$$\int_{\mathbb R^d}x\,\varphi(x)\,dx = 0,\qquad \int_{\mathbb R^d}|x|^2\,|\varphi(x)|\,dx<\infty,\qquad \tau(t)\le C\sqrt t. \qquad(5.4)$$
$$A = \begin{pmatrix}-1 & 2\\ 2 & -3\end{pmatrix}$$
Fig. 5.1 Example of a scale space given by a coordinate transformation. The first row shows the
image and the matrix that generates the scale space according to (5.1), as show in the second
row. Similarly, the third row shows the image and the vector field, and the fourth row shows the
applications of Tt according to (5.2)
Fig. 5.2 Illustration of multiscale convolution with a Gaussian kernel on different scales
$$\cdots = C\,\tau(t)^2\le Ct.$$
[LOC]: To check locality, we can use linearity and restrict our attention to the case of u with $\partial^\alpha u(x) = 0$ for all α. Without loss of generality we can also assume $x = 0$. Moreover, we assume that the kernel φ and the time scaling τ satisfy the assumptions in (5.4) and, on top of that, also
$$\int_{|x|>R}|\varphi(x)|\,dx\le CR^{-\alpha},\qquad \alpha>2.$$
Now we estimate:
$$|(T_tu)(0)| = \Big|\int_{\mathbb R^d}u(y)\,\varphi_t(y)\,dy\Big| \le \sup_{|y|\le\varepsilon}|\nabla^2u(y)|\int_{|y|\le\varepsilon}|y|^2\,|\varphi_t(y)|\,dy + \|u\|_\infty\int_{|y|>\varepsilon}|\varphi_t(y)|\,dy$$
$$\le \sup_{|y|\le\varepsilon}|\nabla^2u(y)|\,\tau(t)^2\int_{\mathbb R^d}|x|^2\,|\varphi(x)|\,dx + C\,\|u\|_\infty\int_{|x|\ge\varepsilon/\tau(t)}|\varphi(x)|\,dx \le C\Big(\sup_{|y|\le\varepsilon}|\nabla^2u(y)|\,t + \varepsilon^{-\alpha}t^{\alpha/2}\Big).$$
Now let δ > 0 be given. Since $\nabla^2u(0) = 0$ and $\nabla^2u$ is continuous, we see that $\sup_{|y|\le\varepsilon}|\nabla^2u(y)|\to 0$ for ε → 0. Hence, we can choose ε > 0 small enough to ensure that
$$C\sup_{|y|\le\varepsilon}|\nabla^2u(y)|\,t\le\frac{\delta}{2}\,t.$$
Moreover, for t > 0 small enough, we also have
$$C\varepsilon^{-\alpha}t^{\alpha/2}\le\frac{\delta}{2}\,t.$$
Putting things together, we see that for every δ > 0, we have (if t is small enough)
$$|(T_tu)(0)|\le\delta t,$$
and this means that $|(T_tu)(0)| = o(t)$, which was our aim.
[COMP]: Again we can use the linearity of $T_t$ to show that this axiom is satisfied: it is enough to show $T_tu\ge 0$ for all $u\ge 0$. This is fulfilled if $\varphi\ge 0$ almost everywhere, since then,
$$(T_tu)(x) = \int_{\mathbb R^d}\underbrace{u(y)}_{\ge 0}\,\underbrace{\varphi_t(x-y)}_{\ge 0\text{ a.e.}}\,dy\ge 0.$$
Hence, isometry invariance holds for kernels that are rotationally invariant.
[SCALE]: For λ ≥ 0 we can write
$$\big(T_t(D_{\lambda\,\mathrm{id}}u)\big)(x) = \int_{\mathbb R^d}u(\lambda y)\,\frac{1}{\tau(t)^d}\,\varphi\Big(\frac{x-y}{\tau(t)}\Big)\,dy = \int_{\mathbb R^d}u(z)\,\frac{1}{\lambda^d\tau(t)^d}\,\varphi\Big(\frac{\lambda x-z}{\lambda\tau(t)}\Big)\,dz = \big(D_{\lambda\,\mathrm{id}}(T_{t'}u)\big)(x),$$
where $t'$ is chosen such that $\tau(t') = \lambda\tau(t)$.
$$(T_tu) = u\oplus tB. \qquad(5.5)$$
Similarly one could define a multiscale erosion related to B; see Fig. 5.3. We restrict
our attention to the case of dilation.
In contrast to Examples 5.2 and 5.3 above, the scale space in (5.5) is in general
nonlinear. We discuss some axioms in greater detail:
[REC]: The axiom [REC] is satisfied if B is convex. You should check this in
Exercise 5.4.
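On a pixel grid, the multiscale dilation can be computed directly from the definition. The sketch below is only an illustration: the structure element B is taken to be a discrete disk, t is rounded to whole pixels, and boundary values are replicated.

```python
import numpy as np

def dilation(u, t):
    """Multiscale dilation (5.5) on a grid: (u (+) tB)(x) = sup_{y in tB} u(x + y),
    with B a discrete disk of radius t (in pixels)."""
    t = int(round(t))
    if t == 0:
        return u.copy()
    H, W = u.shape
    out = np.full_like(u, -np.inf, dtype=float)
    offsets = [(dy, dx) for dy in range(-t, t + 1) for dx in range(-t, t + 1)
               if dy * dy + dx * dx <= t * t]          # discrete disk tB
    padded = np.pad(u.astype(float), t, mode='edge')   # replicate boundary values
    for dy, dx in offsets:
        out = np.maximum(out, padded[t + dy:t + dy + H, t + dx:t + dx + W])
    return out
```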
Fig. 5.3 Example of multiscale erosion (second row) and dilation (third row). The structure
element is an octagon centered at the origin
[REG]: We estimate
If we assume again that B is bounded and $\partial^\alpha u(x) = \partial^\alpha v(x)$ for $|\alpha|\le 1$, we get
$$\sup_{y\in tB}\big[u(x+y)-v(x+y)\big] = o(t)\qquad\text{and}\qquad \sup_{y\in tB}\big[v(x+y)-u(x+y)\big] = o(t).$$
Again we deduce that multiscale dilation satisfies the axiom [LOC] if the
structure element is bounded.
[COMP]: We have already seen that the comparison principle is satisfied in
Theorem 3.29 (under the name “monotonicity”).
[TRANS]: Translation invariance has also been shown in Theorem 3.29.
[GSI]: Gray-scale invariance has been shown in Theorem 3.31 under the name
“contrast invariance.”
[ISO]: Isometry invariance is satisfied if the structure element is invariant under
rotations.
[SCALE]: Scale invariance can be seen as follows:
Example 5.5 (Fourier Soft Thresholding) A somewhat unusual scale space is given by the following construction. Let $S_t$ be the operator that applies the complex soft thresholding function
$$S_t(x) = \begin{cases}\frac{x}{|x|}\,(|x|-t) & \text{if } |x|>t,\\ 0 & \text{if } |x|\le t,\end{cases}$$
pointwise, i.e., $\big(S_t(u)\big)(x) = S_t\big(u(x)\big)$. We apply soft thresholding to the Fourier transform, i.e.,
$$T_t(u) = \mathcal F^{-1}\big(S_t(\mathcal Fu)\big). \qquad(5.6)$$
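The scale space (5.6) is straightforward to implement with the FFT. The following sketch is only an illustration for two-dimensional images; in the discrete setting the threshold t refers to the unnormalized FFT coefficients, so its numerical meaning differs from the continuous formula.

```python
import numpy as np

def soft_threshold(x, t):
    """Complex soft thresholding S_t(x) = x/|x| * (|x| - t) for |x| > t, else 0."""
    mag = np.abs(x)
    return np.where(mag > t, x / np.maximum(mag, 1e-30) * (mag - t), 0.0)

def fourier_soft_threshold(u, t):
    """Scale space (5.6): T_t(u) = F^{-1}( S_t( F u ) ), here with the 2D FFT."""
    return np.real(np.fft.ifft2(soft_threshold(np.fft.fft2(u), t)))
```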
Fig. 5.4 Illustration of Fourier soft thresholding from Example 5.5 and different scales t
If we assume that there exists a unique solution of (5.7) for every initial value u0 ∈
Cb∞ (Rd ), then we can define
Tt u0 = u(t, ·),
Fig. 5.5 Denoising by Fourier and wavelet soft thresholding from Example 5.5. The parameter t
has been chosen to maximize the PSNR
$$\|T_tu - T_tv\|_\infty\le\|v-u\|_\infty$$
holds. Swapping the roles of u and v, we obtain the claimed property [CONT]. □
As we have seen in Example 5.4, the scale space for dilation satisfies the
axioms [COMP] and [GLSI]. So we have shown again that [CONT] holds in this
case, a result we already derived in Exercise 3.10.
The next lemma allows us to extend a scale space from Cb∞ (Rd ) to a larger space,
namely to the space
$$\mathrm{BUC}(\mathbb{R}^d) = \{u : \mathbb{R}^d \to \mathbb{R} \mid u \text{ bounded and uniformly continuous}\}.$$
Lemma 5.8 If [CONT] and [TRANS] hold for (Tt ), one can extend every Tt
uniquely to a mapping
Tt : BUC(Rd ) → BUC(Rd ).
Proof By Lipschitz continuity, [CONT], and the density of $C_b^\infty(\mathbb{R}^d)$ in the space $\mathrm{BUC}(\mathbb{R}^d)$, we can extend $T_t$ uniquely to a map $T_t : \mathrm{BUC}(\mathbb{R}^d) \to \mathrm{BC}(\mathbb{R}^d)$. It remains to show the uniform continuity of $T_t u$ for $u \in \mathrm{BUC}(\mathbb{R}^d)$.
We choose for arbitrary ε > 0 a δ > 0 such that for all x ∈ Rd and |h| < δ, one
has |u(x) − u(x + h)| < ε. With v = Th u and because of [TRANS] and [CONT]
we get
$$\bigl|(T_t u)(x) - (T_t u)(x+h)\bigr| = \bigl|(T_t u)(x) - (T_t v)(x)\bigr| \le \|u - v\|_\infty = \sup_{x\in\mathbb{R}^d}\bigl|u(x) - u(x+h)\bigr| \le \varepsilon,$$
$$\|\delta_t(u)\|_\infty = \frac{1}{t}\,\bigl\|T_t(0 + 1\cdot u) - T_t(0) - 1\cdot u\bigr\|_\infty \le C(u),$$
where, for h → 0, one has vh ∈ Cb∞ (Rd ). Now one easily sees that all vh are in a
suitable set Q from (5.8). The estimate (5.9) gives the desired Lipschitz inequality
$$\|\delta_t(u)(\,\cdot\, + hy) - \delta_t(u)\|_\infty = \|\delta_t(u + h v_h) - \delta_t(u)\|_\infty \le Ch,$$
(which also contains a detailed argument for the uniform convergence A[un ] →
A[u]). □
Our next step is to note that the operator A can be written as a (degenerate)
elliptic differential operator of second order.
Definition 5.10 Denote by $S^{d\times d}$ the space of symmetric d × d matrices. We write $X - Y \succeq 0$ or $X \succeq Y$ if $X - Y$ is positive semi-definite. A function $f : S^{d\times d} \to \mathbb{R}$ is called elliptic if $f(X) \ge f(Y)$ for $X \succeq Y$. If $f(X) > f(Y)$ for $X \succeq Y$ with $X \ne Y$, then f is called strictly elliptic, and degenerate elliptic otherwise.
Theorem 5.11 Let (Tt )t ≥0 be a scale space that satisfies the axioms [GEN],
[COMP], and [LOC]. Then there exists a continuous function F : Rd × R × Rd ×
S d×d → R such that F (x, c, p, · ) is elliptic for all (x, c, p) ∈ Rd × R × Rd and
$$A[u](x) = F\bigl(x, u(x), \nabla u(x), \nabla^2 u(x)\bigr)$$
w ≥ 0, w = 1 on Bσ (x0 ), (5.10)
and use wε (x) = w((x − x0 )/ε + x0 ) (see Fig. 5.6) to construct the functions
$$\bar u_\varepsilon(x) = w_\varepsilon(x)\, u_\varepsilon(x) + \bigl(1 - w_\varepsilon(x)\bigr) v(x).$$
These functions have the property that ∂ α ūε (x) = ∂ α uε (x) for all α ∈ Nd as well as
ūε (x) ≥ v(x) on the whole Rd . By [COMP] this implies Tt ūε (x0 ) ≥ Tt v(x0 ), and
by the monotonicity of the limit also A[ūε ](x0) ≥ A[v](x0 ). Moreover, A[ūε ](x0 ) =
A[u_ε](x_0) by construction of w and, again, by [LOC]. Letting ε → 0, the continuity of A then gives A[u](x_0) ≥ A[v](x_0). Switching the sign in the previous argument, we also get
A[u](x0) ≤ A[v](x0 ) and hence A[u](x0) = A[v](x0 ). We conclude that
$$A[u](x) = F\bigl(x, u(x), \nabla u(x), \nabla^2 u(x)\bigr),$$
as desired.
It remains to show the continuity of F and that F is elliptic in its last component.
The latter follows from the following consideration: Construct, using w from (5.10),
the functions
$$u(x) = \Bigl(c + p\cdot(x-x_0) + \tfrac12 (x-x_0)^T X (x-x_0)\Bigr) w(x), \qquad v(x) = \Bigl(c + p\cdot(x-x_0) + \tfrac12 (x-x_0)^T Y (x-x_0)\Bigr) w(x),$$
converge uniformly to
$$u(x) = \Bigl(c + p\cdot(x-x_0) + \tfrac12 (x-x_0)^T X (x-x_0)\Bigr) w(x),$$
and all their derivatives converge to the respective derivatives, too. By the conclusion
of Theorem 5.9 we get that A[un ] → A[u] uniformly. This implies
Remark 5.12 The proof also reveals the reason behind the fact that the order of the
differential operator has to be two. The auxiliary function η from the proof is, in a
neighborhood of zero, a polynomial of degree two. Every other positive polynomial
of higher degree would also work, but would imply a dependence on higher
derivatives. However, there is no polynomial of degree one that is strictly positive in a punctured neighborhood of zero. Hence, degree two is the lowest degree for which the argument in the proof works, and hence the order of the differential operator
is two.
If we add further morphological axioms to the setting of Theorem 5.11, we obtain
an even simpler form of F .
Lemma 5.13 Assume that the assumptions in Theorem 5.11 hold.
1. If, additionally, [TRANS] holds, then
Proof The proof is based on the fact that the properties [TRANS] and [GLSI] are
transferred from $T_t$ to A, and you should work out the rest in Exercise 5.5. □
By Theorem 5.11 one may say that for u0 ∈ Cb∞ (Rd ), one has that u(t, x) =
(Tt u0 )(x) solves the Cauchy problem
$$\frac{\partial u}{\partial t}(t, x) = F\bigl(x, u(t,x), \nabla u(t,x), \nabla^2 u(t,x)\bigr), \qquad u(0, x) = u_0(x)$$
in some sense, but only at time t = 0 (and the time derivative is only a one-sided
limit).
Remark 5.14 In the above formula for the Cauchy problem we used the widely adopted convention that for functions with a distinguished "time coordinate" (t in this case), the operators ∇ and ∇² act only on the "spatial variable" x. We will keep using this convention in the following.
To show that the equation is also satisfied for t > 0, we can argue as follows:
By [REC], we should have
$$\frac{\partial u}{\partial t}(t, x) = \lim_{s\to 0^+} \frac{T_{t+s}(u_0)(x) - T_t(u_0)(x)}{s} = \lim_{s\to 0^+} \frac{\bigl(T_s T_t(u_0)\bigr)(x) - T_t(u_0)(x)}{s} = A\bigl[T_t(u_0)\bigr](x) = A\bigl[u(t,\cdot)\bigr](x) = F\bigl(x, u(t,x), \nabla u(t,x), \nabla^2 u(t,x)\bigr).$$
This would imply that u satisfies the differential equation for all times. However,
there is a problem with this argument: $T_t(u_0)$ is not necessarily an element of $C_b^\infty(\mathbb{R}^d)$, and the conclusion $\lim_{s\to 0^+} \frac{1}{s}\bigl(T_s(T_t(u_0)) - T_t(u_0)\bigr) = A\bigl[T_t(u_0)\bigr]$ is not valid.
The lack of regularity is a central problem in the theory of partial differential
equations. An approach that often helps is to introduce a suitable notion of weak
solutions. This means a generalized notion of solution that requires less regularity
than the original equation requires. In the context of scale space theory, the notion
of viscosity solutions is appropriate to define weak solutions with the desired
properties. In the following we give a short introduction to the wide field of viscosity
solutions but do not go into great detail.
The notion of viscosity solution is based on the following important observation:
Theorem 5.15 Let F : [0, ∞[ × Rd × R × Rd × S d×d → R be a continuous
function that is elliptic, i.e., $F(t,x,u,p,X) \ge F(t,x,u,p,Y)$ for $X \succeq Y$. Then $u \in C^2([0,\infty[\,\times \mathbb{R}^d)$ is a solution of the partial differential equation
$$\frac{\partial u}{\partial t}(t, x) = F\bigl(t, x, u(t,x), \nabla u(t,x), \nabla^2 u(t,x)\bigr) \tag{5.11}$$
if and only if the following two conditions hold:
1. For all $\varphi \in C^2([0,\infty[\,\times \mathbb{R}^d)$ and for all local maxima $(t_0, x_0)$ of the function $u - \varphi$,
$$\frac{\partial \varphi}{\partial t}(t_0, x_0) \le F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr).$$
2. For all $\varphi \in C^2([0,\infty[\,\times \mathbb{R}^d)$ and for all local minima $(t_0, x_0)$ of the function $u - \varphi$,
$$\frac{\partial \varphi}{\partial t}(t_0, x_0) \ge F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr).$$
Proof Let ϕ ∈ C 2 ([0, ∞[ × Rd ) and let (t0 , x0 ) be a local maximum of the function
u − ϕ. By the classical necessary conditions for a maximum, we obtain
Hence, by ellipticity,
$$\frac{\partial \varphi}{\partial t}(t_0,x_0) = \frac{\partial u}{\partial t}(t_0,x_0) = F\bigl(t_0, x_0, u(t_0,x_0), \nabla u(t_0,x_0), \nabla^2 u(t_0,x_0)\bigr) \le F\bigl(t_0, x_0, u(t_0,x_0), \nabla u(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr) = F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr).$$
$$\frac{\partial \varphi}{\partial t}(t_0,x_0) \ge F\bigl(t_0, x_0, u(t_0,x_0), \nabla \varphi(t_0,x_0), \nabla^2 \varphi(t_0,x_0)\bigr),$$
3. u is a viscosity solution if u is both a viscosity sub-solution and viscosity super-
solution.
For the special case that the function F depends only on ∇u and ∇ 2 u, we have
the following helpful lemma:
Lemma 5.17 Let $F : \mathbb{R}^d \times S^{d\times d} \to \mathbb{R}$ be continuous and elliptic, i.e., $F(p, X) \ge F(p, Y)$ for $X \succeq Y$. A function $u \in C([0,\infty[\,\times \mathbb{R}^d)$ is a viscosity sub- or super-
solution, respectively, if for all f ∈ Cb∞ (Rd ) and g ∈ Cb∞ (R), the respective part in
Definition 5.16 holds for ϕ(t, x) = f (x) + g(t).
Proof We show the case of a viscosity sub-solution and assume that Definition 5.16
holds for all ϕ of the form ϕ(t, x) = f (x) + g(t) with f, g as given. Without
loss of generality we assume that (t0 , x0 ) = (0, 0) and consider a function ϕ ∈
C 2 ([0, ∞[ × Rd ) such that u − ϕ has a maximum in (0, 0). Hence, we have to show
that
$$\frac{\partial \varphi}{\partial t}(0, 0) \le F\bigl(\nabla\varphi(0,0), \nabla^2\varphi(0,0)\bigr).$$
We consider the Taylor expansion of ϕ at (0, 0) and get, with $a = \varphi(0,0)$, $b = \frac{\partial\varphi}{\partial t}(0,0)$, $p = \nabla\varphi(0,0)$, $c = \tfrac12 \frac{\partial^2\varphi}{\partial t^2}(0,0)$, $Q = \tfrac12 \nabla^2\varphi(0,0)$, and $q = \frac{\partial}{\partial t}\nabla\varphi(0,0)$, that
$$\varphi(t, x) = a + bt + p\cdot x + ct^2 + x^T Q x + t q\cdot x + o(|x|^2 + t^2).$$
We define, for all ε > 0, the functions f ∈ Cb∞ (Rd ) and g ∈ Cb∞ (R) for small
values of x and t by
$$f(x) = a + p\cdot x + x^T Q x + \varepsilon\Bigl(1 + \tfrac{|q|}{2}\Bigr)|x|^2, \qquad g(t) = bt + \Bigl(\tfrac{|q|}{2\varepsilon} + \varepsilon + c\Bigr)t^2.$$
The boundedness of f and g can be ensured with the help of a suitable cutoff
function w as in (5.10) in the proof of Theorem 5.11. Hence, for small values of
x and t, we have
$$\varphi(t,x) = f(x) + g(t) - \Bigl(\tfrac{\varepsilon|q|}{2}|x|^2 + \tfrac{|q|}{2\varepsilon}t^2 - tq\cdot x\Bigr) - \varepsilon(|x|^2 + t^2) + o(|x|^2 + t^2).$$
Because of
$$\frac{\varepsilon|q|}{2}|x|^2 + \frac{|q|}{2\varepsilon}t^2 - tq\cdot x \;\ge\; \frac{\varepsilon|q|}{2}|x|^2 + \frac{|q|}{2\varepsilon}t^2 - t|q||x| \;=\; \Bigl(\sqrt{\tfrac{|q|\varepsilon}{2}}\,|x| - \sqrt{\tfrac{|q|}{2\varepsilon}}\,t\Bigr)^2 \;\ge\; 0$$
we obtain, for small x and t, that ϕ(t, x) ≤ f (x) + g(t). Hence, in a neighborhood
of (0, 0), we also get u(t, x) − ϕ(t, x) ≥ u(t, x) − f (x) − g(t), and in particular
we see that u − f − g has a local maximum at (0, 0). By assumption we get
$$\frac{\partial (f+g)}{\partial t}(0, 0) \le F\bigl(\nabla(f+g)(0,0), \nabla^2(f+g)(0,0)\bigr).$$
Now we note that
$$\frac{\partial (f+g)}{\partial t}(0,0) = \frac{\partial \varphi}{\partial t}(0,0), \qquad \nabla(f+g)(0,0) = \nabla\varphi(0,0), \qquad \nabla^2(f+g)(0,0) = \nabla^2\varphi(0,0) + 2\varepsilon\Bigl(1 + \tfrac{|q|}{2}\Bigr)\,\mathrm{id},$$
$$\frac{\partial u}{\partial t}(t, x) = F\bigl(\nabla u(t,x), \nabla^2 u(t,x)\bigr)$$
with initial condition u(0, x) = u0 (x).
Proof Theorem 5.11 and Lemma 5.13 ensure that the generator has the stated form.
Now we show that u is a viscosity sub-solution. The proof that u is also a viscosity
super-solution is similar. Let ϕ ∈ C 2 ([0, ∞[×Rd ) be such that (t0 , x0 ) with t0 > 0 is
a local maximum of u−ϕ. Without loss of generality we can assume that u(t0 , x0 ) =
Rearranging gives
$$\frac{g(t_0) - g(t_0 - h)}{h} \le \frac{T_h(f) - f}{h}(x_0).$$
Since $f \in C_b^\infty(\mathbb{R}^d)$ and g are differentiable, we can pass to the limit h → 0 and get, by Theorem 5.11, $g'(t_0) \le F\bigl(\nabla f(x_0), \nabla^2 f(x_0)\bigr)$. Since $\varphi(t, x) = f(x) + g(t)$, we also have
$$\frac{\partial \varphi}{\partial t}(t_0, x_0) \le F\bigl(\nabla\varphi(t_0,x_0), \nabla^2\varphi(t_0,x_0)\bigr).$$
Hence, by definition, u is a viscosity sub-solution. □
We have now completed our structural analysis of scale spaces. We have seen that
scale spaces naturally (i.e., given the specific axioms) lead to functions F : Rd ×
R × Rd × S d×d → R and that these functions characterize the respective scale
space. In this section we will see how the morphological axioms influence F and
discover some differential equations that are important in imaging.
The main result of this chapter will be that among the linear scale spaces there is
essentially only the heat equation.
$$\partial_t u - c\Delta u = 0 \quad\text{in } \mathbb{R}_+ \times \mathbb{R}^d, \qquad u(0, \cdot) = u_0 \quad\text{in } \mathbb{R}^d.$$
Proof By Theorem 5.9 the infinitesimal generator A exists, and by Theorem 5.11
and Lemma 5.13 it has the form
In particular, we have
$$A[D_R u] = D_R A[u],$$
$$F_1(R^T p) = F_1(p), \tag{A}$$
$$F_2(R X R^T) = F_2(X). \tag{B}$$
3. Now (A) implies that F1 (p) = f (|p|). Linearity of F implies F1 is also linear
and thus, F1 (p) ≡ 0. From (B) we deduce that F2 depends only on quantities
that are invariant under similarity transforms. This is the set of eigenvalues of X
with their multiplicities. Since all eigenvalues have the same role, linearity leads
to
$$F_2(X) = c \operatorname{trace} X$$
for some c ∈ R.
4. By [COMP] we get that for $X \succeq Y$, we have $F(p, X) \ge F(p, Y)$. This implies that for $X \succeq Y$, we have
$$c \operatorname{trace} X \ge c \operatorname{trace} Y, \quad\text{i.e.,}\quad c \operatorname{trace}(X - Y) \ge 0.$$
Fig. 5.7 Dislocation of information by the heat equation, illustrated by the movement of edges. Left: original image, middle: image after application of the heat equation, right: level lines of the middle image
Among the linear scale spaces, there is essentially only the heat equation. In
Remark 5.20 we saw that linearity and contrast invariance are mutually exclusive.
In this section we study the consequences of contrast invariance, i.e., of the
axiom [GSI]. This means that we look for scale spaces that are invariant under
contrast changes. In other words, the scale space depends only on the level sets,
i.e., only on the shape of objects and not on the particular gray values, hence the
name “morphological equations.”
Lemma 5.22 Let F : Rd \ {0} × S d×d → R be continuous. A scale space
∂t u = F (∇u, ∇ 2 u)
satisfies [GSI] if and only if F satisfies the following invariance: for all $p \ne 0$, $X \in S^{d\times d}$, and all $\lambda \in \mathbb{R}$, $\mu \ge 0$, one has (with $p \otimes p = pp^T$) that
∂t u = F (∇u, ∇ 2 u),
(I)
u(0) = u0 ,
$$\nabla(h\circ u) = h'(u)\,\nabla u, \qquad \nabla^2(h\circ u) = h'(u)\,\nabla^2 u + h''(u)\,\nabla u \otimes \nabla u, \qquad \partial_t v = h'(u)\,\partial_t u = h'(u)\,F(\nabla u, \nabla^2 u).$$
By (∗) we get, again using the chain rule (similarly to the previous part), that
In particular, we get from Lemma 5.22, that F (p, · ) does not depend on the entry
Xd,d . It remains to show that the entries Xd,i with 1 ≤ i ≤ d − 1 also do not play a
role. To that end, we define $M = X_{d,1}^2 + \cdots + X_{d,d-1}^2$ and
$$I_\varepsilon = \begin{pmatrix} \varepsilon & & & \\ & \ddots & & \\ & & \varepsilon & \\ & & & \frac{M}{\varepsilon} \end{pmatrix}.$$
$$Q_p X Q_p \preceq X + I_\varepsilon, \qquad X \preceq Q_p X Q_p + I_\varepsilon.$$
Axiom [COMP] implies ellipticity of F . Since F does not depend on the entry Xd,d ,
we get, on the one hand, that
Proof First we note that we have seen in step 2 of the proof of Theorem 5.19
that [ISO] implies that for every isometry R ∈ Rd×d ,
Hence, the function F can depend only on the eigenvalues of Qp XQp on the
space orthogonal to p. Since p is an eigenvector of Qp for the eigenvalue zero,
there is a function G such that
2. Now let R denote any isometry and set q = R T p. Then also Rq = p and
|p| = |q|. We calculate
$$R Q_q = R\Bigl(\mathrm{id} - \frac{q q^T}{|q|^2}\Bigr) = R - \frac{p q^T}{|q|^2} = \Bigl(\mathrm{id} - \frac{p p^T}{|p|^2}\Bigr) R = Q_p R,$$
We consider the unit ball B1 (0) ⊂ Rd as a structure element and the corresponding
scale spaces for erosion and dilation:
$$E_t u_0 = u_0 \ominus tB, \qquad D_t u_0 = u_0 \oplus tB.$$
By Theorem 5.15 we immediately get that these differential equations are solved
by erosion and dilation, respectively, in the viscosity sense. Since erosion and
dilation produce non-differentiable functions in general, one cannot define classical
solutions here, and we see how helpful the notion of viscosity solutions is.
Remark 5.25 (Interpretation as a Transport Equation) To understand the equations
in a qualitative way, we interpret them as transport equations. We write the
infinitesimal generator for the dilation as
$$|\nabla u| = \frac{\nabla u}{|\nabla u|}\cdot\nabla u.$$
We see that the invariance from Lemma 5.22 is satisfied. The differential operator
related to F is
$$F(\nabla u, \nabla^2 u) = \operatorname{trace}\Bigl(\bigl(\mathrm{id} - \tfrac{\nabla u\otimes\nabla u}{|\nabla u|^2}\bigr)\nabla^2 u\Bigr).$$
To better understand this quantity we use ∇u⊗∇u = ∇u∇uT , linearity of the trace,
and the formula trace(A B C) = trace(C A B) (invariance under cyclic shifts, if the
dimensions fit):
$$F(\nabla u, \nabla^2 u) = \Delta u - \frac{1}{|\nabla u|^2}\operatorname{trace}(\nabla u\,\nabla u^T\,\nabla^2 u) = \Delta u - \frac{1}{|\nabla u|^2}\operatorname{trace}(\nabla u^T\,\nabla^2 u\,\nabla u) = \Delta u - \frac{1}{|\nabla u|^2}\,\nabla u^T\,\nabla^2 u\,\nabla u.$$
We set $\eta = \frac{\nabla u}{|\nabla u|}$ and denote by $\partial_\eta u$ the first derivative of u in the direction η and by $\partial_{\eta\eta} u$ the second derivative, i.e.,
$$\partial_\eta u = |\nabla u|, \qquad \partial_{\eta\eta} u = \eta^T \nabla^2 u\, \eta.$$
Hence,
$$F(\nabla u, \nabla^2 u) = \Delta u - \partial_{\eta\eta} u.$$
We recall that the Laplace operator is rotationally invariant and note that $\Delta u - \partial_{\eta\eta}u$ is the sum of the second derivatives of u in the d − 1 directions perpendicular to the gradient. Since the gradient is normal to the level sets of u, we see that $\Delta u - \partial_{\eta\eta}u$ is the projection of the Laplace operator onto the space tangential to the level sets. Hence, the differential equation
$$\partial_t u = \operatorname{trace}(Q_{\nabla u}\,\nabla^2 u) = \Delta u - \partial_{\eta\eta} u$$
is also called the heat equation on the tangent space or the morphological equivalent
of the heat equation.
The case d = 2 is even easier, since the space orthogonal to η is one-dimensional.
We define
$$\xi = \eta^\perp = \begin{pmatrix} -\eta_2 \\ \eta_1 \end{pmatrix}$$
and obtain
$$\partial_\xi u = 0, \qquad \partial_{\xi\xi} u = \xi^T \nabla^2 u\, \xi.$$
Hence, $\operatorname{trace}(Q_{\nabla u}\,\nabla^2 u) = \partial_{\xi\xi} u.$
In Exercise 5.9 you will show that $\partial_{\xi\xi}u$ is related to the curvature κ of the level set; more precisely,
$$\partial_{\xi\xi} u = |\nabla u|\,\kappa.$$
We summarize: the differential equation of the generator (5.13) has the form
∂t u = |∇u|κ.
Equivalently,
$$\partial_t u = \kappa\,\frac{\nabla u}{|\nabla u|}\cdot\nabla u,$$
and we see that the initial value (as in the case of dilation and erosion) is shifted
in the direction of the negative gradient. In this case, the velocity is proportional to
the curvature of the level sets, and the resulting scale space is also called curvature motion. As a result, the level sets are "straightened"; this explains the alternative
name curve shortening flow. One can even show that the boundaries of convex
and compact sets shrink to a point in finite time and that they look like a circle,
asymptotically; see, for example, [79]. The action of the curvature motion on an
image is shown in Fig. 5.8.
In higher dimensions, similar claims are true: here trace(Q∇u ∇ 2 u) = |∇u|κ
with the mean curvature κ; this motivates the name mean curvature motion. For the
definition and further properties of the mean curvature we refer to [12].
The heat equation satisfies the most axioms among the linear methods, but it is
not well suited to denoise images. Its main drawback is the heavy smoothing of
edges, see Remark 5.21. In this section we will treat denoising methods that are
variations of the heat equation. We build the modifications of the heat equation on
its interpretation as a diffusion process: Let u describe some quantity (e.g., the temperature of a metal or the concentration of salt in a liquid) in d dimensions. A concentration gradient induces a flow j from high to low concentration (Fick's law):
$$j = -A\nabla u.$$
The matrix A is called a diffusion tensor. If A is a multiple of the identity, one speaks
of isotropic diffusion, if not, of anisotropic diffusion. The diffusion tensor controls
how fast and in which direction the flow goes.
Moreover, we assume that the overall concentration remains constant, i.e., that
no quantity appears or vanishes. Mathematically we describe this as follows. For
some volume V we consider the change of the overall concentration in this volume:
$$\int_V \partial_t u(x)\,\mathrm{d}x.$$
If the total concentration stays the same, this change has to be equal to the flow across the boundary of V by the flow j, i.e.,
$$\int_V \partial_t u(x)\,\mathrm{d}x = \int_{\partial V} j\cdot(-\nu)\,\mathrm{d}\mathcal{H}^{d-1}.$$
Interchanging integration and differentiation on the left-hand side and using the
divergence theorem (Theorem 2.81) on the right-hand side, we obtain
$$\int_V \partial_t u(x)\,\mathrm{d}x = -\int_V (\operatorname{div} j)(x)\,\mathrm{d}x.$$
Since this holds for all volumes, we get that the integrands are equal at almost every point:
$$\partial_t u = -\operatorname{div} j.$$
Plugging Fick's law into this continuity equation, we obtain the following equation for u:
$$\partial_t u = \operatorname{div}(A\nabla u).$$
The idea of Perona and Malik in [110] was to slow down diffusion at edges. As we
have seen in Application 3.23, edges can be described as points where the gradient
has a large magnitude. Hence, we steer the diffusion tensor A in such a way that it
slows down diffusion where gradients are large. Since we don’t have any reason for
anisotropic diffusion yet, we set
A = g(|∇u|) id
Fig. 5.9 Functions g from (5.14) in the diffusion coefficient of the Perona-Malik equation
with some function g : [0, ∞[ → [0, ∞[ that is close to one for small arguments
and monotonically decreasing to zero. Consequently, diffusion acts as it does in
the heat equation at places with small gradient, and diffusion is slower at places
with large gradient. Two widely used examples of such functions, depending on a
parameter λ > 0, are
$$g_1(s) = \frac{1}{1 + \frac{s^2}{\lambda^2}}, \qquad g_2(s) = e^{-\frac{s^2}{2\lambda^2}}, \tag{5.14}$$
and the Perona-Malik equation reads
$$\partial_t u = \operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr), \qquad u(0, x) = u_0(x). \tag{5.15}$$
Figure 5.10 illustrates that the Perona-Malik equation indeed has the desired effect.
We begin our analysis of the Perona-Malik equation with the following observa-
tion:
Lemma 5.26 We have
$$\operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr) = \frac{g'(|\nabla u|)}{|\nabla u|}\,\nabla u^T \nabla^2 u\,\nabla u + g(|\nabla u|)\,\Delta u,$$
and thus the Perona-Malik equation has the following infinitesimal generator:
$$F(p, X) = \frac{g'(|p|)}{|p|}\,p^T X p + g(|p|)\operatorname{trace} X.$$
Fig. 5.10 Results for the Perona-Malik equation. Top left: original image. Second row: Function
g1 with λ = 0.02. Third row: function g1 with λ = 0.005. Fourth row: Function g2 with λ = 0.02.
The images show the results at times t = 5, 50, 500, respectively
and orthogonal to η. For that reason some authors call the Perona-Malik equation
“anisotropic diffusion.”
The function F with g1 and g2 from (5.14) is not elliptic. This leads to a problem,
since our current results do not imply that the Perona-Malik equation has any
solution, even in the viscosity sense. In fact, one can even show that the equation
does not possess any solution for certain initial values [83, 86]. Roughly speaking,
the reason for this is that one cannot expect or show that the gradients remain
bounded, and hence the diffusion coefficient cannot be bounded away from zero.
Some experience with parabolic equations hints that this should lead to problems
with existence of solutions.
We leave the problem aside for now and assume that the Perona-Malik equation
has solutions, at least for small time.
Theorem 5.28 The Perona-Malik method satisfies the axioms [REC], [GLSI],
[TRANS], [ISO]. The axiom [SCALE] holds if g is a monomial.
Proof Recursivity [REC] holds, since the operators Tt are solution operators of a
differential equation, and hence possess the needed semi-group property.
For gray value shift invariance [GLSI] we note that if u solves the differential
equation (5.15), then u + c solves the same differential equation with initial value
u0 (x) + c. This implies Tt (u0 + c) = Tt (u0 ) + c as desired. (In other words, the
differential operator is invariant with respect to linear gray level shifts.)
Translation invariance [TRANS] and isometry invariance [ISO] can be seen
similarly (the differential operator is invariant with respect to translations and
rotations).
If g is a monomial, i.e., g(s) = s p , then v(t, x) = u(t, λx) satisfies
as claimed. □
Since the map F is not elliptic in general, we cannot use the theory of viscosity
solutions in this case. However, we can consider the equation on some domain Ω ⊂
Rd (typically on a rectangle in R2 ) and add boundary conditions, which leads us to a
so-called boundary initial value problem. (One can also consider viscosity solutions
for differential equations on domains; however, the formulation of boundary values
is intricate.) For the boundary initial value problem for the Perona-Malik equation
one can show that a maximum principle is satisfied:
$$\begin{aligned} \partial_t u &= \operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr) && \text{in } [0,\infty[\,\times\Omega,\\ \partial_\nu u &= 0 && \text{on } [0,\infty[\,\times\partial\Omega,\\ u(0,x) &= u_0(x) && \text{for } x\in\Omega. \end{aligned}$$
and moreover,
$$\int_\Omega u(t, x)\,\mathrm{d}x = \int_\Omega u_0(x)\,\mathrm{d}x.$$
Proof For p < ∞ we set $h(t) = \int_\Omega |u(t,x)|^p\,\mathrm{d}x$ and differentiate h:
$$h'(t) = \frac{\mathrm{d}}{\mathrm{d}t}\int_\Omega |u(t,x)|^p\,\mathrm{d}x = \int_\Omega p|u(t,x)|^{p-2}u(t,x)\,\partial_t u(t,x)\,\mathrm{d}x = \int_\Omega p|u(t,x)|^{p-2}u(t,x)\operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr)(t,x)\,\mathrm{d}x.$$
The boundary integral vanishes due to the boundary condition, and the remaining integral is nonnegative. Hence, $h'(t) \le 0$, i.e., h is decreasing, which proves the claim for $p < \infty$; moreover, $h(t)^{1/p} = \|u(t,\cdot)\|_p$ is decreasing for all $p \in [2,\infty[$. The case $p = \infty$ now follows by letting $p \to \infty$.
For the second claim we argue similarly and differentiate the function $\mu(t) = \int_\Omega u(t,x)\,\mathrm{d}x$, again using the divergence theorem to get
$$\mu'(t) = \int_\Omega \partial_t u(t,x)\,\mathrm{d}x = \int_\Omega 1\cdot\operatorname{div}\bigl(g(|\nabla u|)\nabla u\bigr)(t,x)\,\mathrm{d}x = \int_{\partial\Omega} g(|\nabla u(t,x)|)\,\partial_\nu u(t,x)\,\mathrm{d}\mathcal{H}^{d-1} - \int_\Omega (\nabla 1)\cdot g(|\nabla u(t,x)|)\nabla u(t,x)\,\mathrm{d}x.$$
$$f_1(s) = \frac{s}{1 + \frac{s^2}{\lambda^2}}, \qquad f_2(s) = s\, e^{-\frac{s^2}{2\lambda^2}}; \tag{5.16}$$
Fig. 5.11 Flux functions f₁ and f₂ from (5.16) corresponding to the functions g₁ and g₂ from (5.14)
lines we see that the coefficient g(|∇u|) is small for large gradient, leading to slow
diffusion. We may conjecture three things:
• The Perona-Malik equation has the ability to make edges steeper (i.e., sharper).
• The Perona-Malik equation is unstable, since it has parts that behave like
backward diffusion.
• At steep edges, noise reduction may not be good.
We try to get some rigorous results and ask what happens to solutions of the Perona-Malik equation at edges. To answer this question we follow [85] and consider only the one-dimensional equation. We denote by u′ the derivative with respect to x and consider
Hence, the one-dimensional equation behaves like the equation in higher dimensions
in the direction perpendicular to the level lines, and we can use results for the one-
dimensional equation to deduce similar properties for the case in higher dimensions.
To analyze what happens at edges under the Perona-Malik equation, we define an
edge as follows:
Definition 5.30 We say that $u : \mathbb{R} \to \mathbb{R}$ has an edge at $x_0$ if
1. $|u'(x_0)| > 0$;
2. $u''(x_0) = 0$;
3. $u'''(x_0)\,u'(x_0) < 0$.
The first condition says that the image is not flat, while the second and the third
conditions guarantee a certain kind of inflection point; see Fig. 5.12.
Since in one dimension we have
Fig. 5.12 Left: inflection point of the first kind (edge at x₀), right: inflection point of the second kind (no edge at x₀)
In the following we will use the compact notation uη = ∂η u, uηη = ∂ηη u, etc. The
next theorem shows what happens locally at an edge.
Theorem 5.31 Let u be a five times continuously differentiable solution of the
Perona-Malik equation (5.17). Moreover, let x0 be an edge of u(t0 , · ) where
additionally uηηηη (t0 , x0 ) = 0 is satisfied. Then at (t0 , x0 ),
1. $\partial_t u_\eta = f'(u_\eta)\,u_{\eta\eta\eta}$,
2. $\partial_t u_{\eta\eta} = 0$, and
3. $\partial_t u_{\eta\eta\eta} = 3 f''(u_\eta)\,u_{\eta\eta\eta}^2 + f'(u_\eta)\,u_{\eta\eta\eta\eta\eta}$.
Proof We calculate the derivatives by swapping the order of differentiation:
and
Since there is an edge at $x_0$, all terms $u_{\eta\eta}$ and $u_{\eta\eta\eta\eta}$ vanish. With $\partial_\eta(f(u_\eta)) = f'(u_\eta)u_{\eta\eta}$ and $\partial_{\eta\eta}(f(u_\eta)) = f''(u_\eta)u_{\eta\eta}^2 + f'(u_\eta)u_{\eta\eta\eta}$ we obtain the claim. □
The second assertion of the theorem says that edges remain inflection points. The
first assertion indicates whether the edge gets steeper or less steep with increasing
t. The third assertion tells us, loosely speaking, whether the inflection point tries to
change its type. The specific behavior depends on the functions f and g.
Corollary 5.32 In addition to the assumptions of Theorem 5.31 assume that the
diffusion coefficient g is given by one of the functions (5.14). Moreover, assume that
Since $u_{\eta\eta\eta\eta\eta} > 0$, we conclude that $f_2'(u_\eta) < 0$ for $u_\eta > \lambda$ and $f_2''(u_\eta) < 0$ for $u_\eta < \sqrt{3}\,\lambda$, which implies assertion 3.
For $g_1$ from (5.14) and $f_1$ from (5.16) we have that $f_1'(u_\eta) < 0$ if and only if $u_\eta > \lambda$ (which proves assertion 1), and moreover,
$$f_1''(s) = \frac{\frac{2s}{\lambda^2}\Bigl(\frac{s^2}{\lambda^2} - 3\Bigr)}{\Bigl(1 + \frac{s^2}{\lambda^2}\Bigr)^3}.$$
Similarly to the case of $g_2$, we also have $f_1'(u_\eta) < 0$ for $u_\eta > \lambda$ and $f_1''(u_\eta) < 0$ for $u_\eta < \sqrt{3}\,\lambda$. □
We may interpret the corollary as follows:
• The second point says that inflection points remain inflection points.
• The first point says that steep edges get steeper and flat edges become flatter.
More precisely: Edges no steeper than λ become flatter.
• If an edge is steeper than λ, there are two possibilities: If it is no steeper than $\sqrt{3}\,\lambda$, the inflection point remains an inflection point of the first kind, i.e., the edge stays an edge. If the edge is steeper than $\sqrt{3}\,\lambda$, the inflection point may try to change its type ($u_{\eta\eta\eta}$ can grow and become positive). This potentially leads to the so-called staircasing effect; see Fig. 5.13.
Using a similar technique, one can derive a condition for a maximum principle of
the one-dimensional Perona-Malik equation; see Exercise 5.11.
Fig. 5.13 Illustration of the staircasing effect, sharpening and smoothing by the Perona-Malik equation with the function g₂ and λ = 1.6. Left: initial value u₀ of the Perona-Malik equation and its derivative. Right: solution of the Perona-Malik equation at times t = 0.5, 2, 3.5, 5. We see all predicted effects from Corollary 5.32: the first edge is steeper than √3 λ and indeed we observe staircasing. The middle edge has a slope between λ and √3 λ and becomes steeper, while the right edge has a slope below λ and is flattened out
Fig. 5.14 Detection of edges in noisy images. Left column: Noisy image with gray values in
[0, 1] and the edges detected by Canny’s edge detector from Application 3.23 (parameters: σ = 2,
τ = 0.01). Middle column: Presmoothing by the heat equation, final time T = 20. Right column:
Presmoothing by the Perona-Malik equation with the function g1 , final time T = 20 and λ = 0.05
The only difference from the original Perona-Malik equation is the smoothing of u
in the argument of g. To show existence of a solution, we need a different notion
from that of viscosity solutions, namely the notion of weak solutions. This notion is
based on an observation that is similar to that in Theorem 5.15.
Theorem 5.34 Let $A : \Omega \to \mathbb{R}^{d\times d}$ be differentiable and T > 0. Then $u \in C^2([0,\infty[\,\times\Omega)$ with $u(t,\cdot) \in L^2(\Omega)$ is a solution of the partial differential equation
$$\partial_t u = \operatorname{div}(A\nabla u) \quad\text{in } [0,T]\times\Omega, \qquad A\nabla u\cdot\nu = 0 \quad\text{on } [0,T]\times\partial\Omega$$
if and only if for every function $v \in H^1(\Omega)$ and every $t \in [0,T]$ one has
$$\int_\Omega \bigl(\partial_t u(t,x)\bigr)v(x)\,\mathrm{d}x = -\int_\Omega \bigl(A(x)\nabla u(t,x)\bigr)\cdot\nabla v(x)\,\mathrm{d}x.$$
Proof Let u be a solution of the differential equation with the desired regularity and
$v \in H^1(\Omega)$. Multiplying both sides of the differential equation by v, integrating
The boundary integral vanishes because of the boundary condition, and this implies
the claim.
Conversely, let the equation for the integral be satisfied for all $v \in H^1(\Omega)$. Similarly to the above calculation we get
$$\int_\Omega \bigl(\partial_t u - \operatorname{div}(A\nabla u)\bigr)(t,x)\,v(x)\,\mathrm{d}x = -\int_{\partial\Omega} v(x)\bigl(A(x)\nabla u(t,x)\bigr)\cdot\nu\,\mathrm{d}\mathcal{H}^{d-1}.$$
Since v is arbitrary, we can conclude that the integrals on both sides have to
vanish. Invoking the fundamental lemma of the calculus of variations (Lemma 2.75)
establishes the claim. □
The characterization of solutions in the above theorem does not use the assump-
tion $u \in C^2([0,\infty[\,\times\Omega)$, and also differentiability of A is not needed. The equality of the integrals can be formulated for functions $u \in C^1([0,T], L^2(\Omega)) \cap L^2(0,T; H^1(\Omega))$. The following reformulation allows us to get rid of the time derivative of u: we define a bilinear form $a : H^1(\Omega)\times H^1(\Omega) \to \mathbb{R}$ by
$$a(u, v) = \int_\Omega (A\nabla u)\cdot\nabla v\,\mathrm{d}x.$$
The initial-boundary value problem
$$\partial_t u = \operatorname{div}(A\nabla u) \quad\text{in } [0,T]\times\Omega, \qquad A\nabla u\cdot\nu = 0 \quad\text{on } [0,T]\times\partial\Omega, \qquad u(0) = u_0,$$
is then replaced by the problem of finding u such that for all $v \in H^1(\Omega)$,
$$\frac{\mathrm{d}}{\mathrm{d}t}\bigl(u(t), v\bigr) + a\bigl(u(t), v\bigr) = 0, \qquad u(0) = u_0.$$
This form of the initial-boundary value problem is called the weak formulation.
Remark 5.36 The time derivative in the weak formulation has to be understood in
the weak sense as described in Sect. 2.3. In more detail: the first equation of the
weak formulation states that for all $v \in H^1(\Omega)$ and $\phi \in \mathcal{D}(]0,T[)$,
$$\int_0^T \bigl[a(u(t), v)\phi(t) - (u(t), v)\phi'(t)\bigr]\,\mathrm{d}t = 0.$$
Example 5.40 (Color Images and the Perona-Malik Equation) We use the example
of nonlinear diffusion according to Perona-Malik to illustrate some issues that arise
in the processing of color images. We consider a color image with three color
channels $u_0 : \Omega \to \mathbb{R}^3$ (cf. Sect. 1.1). Alternatively, we could also consider three separate images or $u_0 : \Omega\times\{1,2,3\} \to \mathbb{R}$, where the image $u_0(\cdot, k)$ is the kth
color channel. If we want to apply the Perona-Malik equation to this image, we
have several possibilities for choosing the diffusion coefficient. One naive approach
would be to apply the Perona-Malik equation to all channels separately:
$$\partial_t u(t,x,k) = \operatorname{div}\bigl(g(|\nabla u(\cdot,\cdot,k)|)\nabla u(\cdot,\cdot,k)\bigr)(t,x), \qquad k = 1, 2, 3.$$
This may lead to problems, since it is not clear whether the different color channels
have their edges at the same positions. This can be seen clearly in Fig. 5.16. The
image there consists of a superposition of slightly shifted, blurred circles in the RGB
channels. After the Perona-Malik equation has been applied to all color channels,
one can clearly see the shifts. One way to get around this problem is to use the
HSV color system. However, there is another problem: edges do not need to have
the same slope in the different channels, and this may lead to further errors in the
colors. In the HSV system, the V-channel carries the most information, and often it
Fig. 5.16 Nonlinear Perona-Malik diffusion for color images. The original image consists of
slightly shifted and blurred circles with different intensities in the three color RGB channels. The
color system and the choice of the diffusion coefficient play a significant role. The best results are
achieved when the diffusion coefficients are coupled among the channels
is enough just to denoise this channel; however, there may still be errors. All these
effects can be seen in Fig. 5.16.
Another possibility is to remember the role of the diffusion coefficient as an edge
detector. Since being an edge is not a property of a single color channel, edges
should be at the same places in all channels. Hence, one should choose the diffusion
coefficient to be equal in all channels, for example by using the average of the
magnitudes of the gradients:
$$\partial_t u(t,x,k) = \operatorname{div}\Bigl(g\Bigl(\frac{1}{3}\sum_{i=1}^3 |\nabla u(\cdot,\cdot,i)|\Bigr)\nabla u(\cdot,\cdot,k)\Bigr)(t,x), \qquad k = 1, 2, 3.$$
Hence, the diffusion coefficient is coupled among the channels. Usually this gives
the best results. The effects can also be seen in real images, e.g., in images where
so-called chromatic aberration is present. This refers to the effect that is caused by
the fact that light rays of different wavelengths are refracted differently. This effect
can be observed, for example, in lenses of low quality; see Fig. 5.17.
The Perona-Malik equation has shown good performance for denoising and simul-
taneous preservation of edges. Smoothing along edges has not been that good,
though. This drawback can be overcome by switching to an anisotropic model. The
basic idea is to design a diffusion tensor that enables Perona-Malik-like diffusion
perpendicular to the edges but linear diffusion along the edges. We restrict ourselves
to the two-dimensional case, since edges are curves in this case and there is only
one direction along the edges. The development of methods based on anisotropic
diffusion goes back to [140].
The diffusion tensor should encode as much local image information as possible.
We follow the modified model (5.18) and take ∇uσ as an edge detector. As
preparation for the following, we define the structure tensor:
Definition 5.41 (Structure Tensor) The structure tensor for $u : \mathbb{R}^2 \to \mathbb{R}$ and noise level σ > 0 is the matrix-valued function $J_0(\nabla u_\sigma) : \mathbb{R}^2 \to \mathbb{R}^{2\times 2}$ defined by
$$J_0(\nabla u_\sigma) = \nabla u_\sigma\, \nabla u_\sigma^T.$$
It is obvious that the structure tensor does not contain any more information than
the smoothed gradient ∇uσ , namely the information on the local direction of the
image structure and the rate of the intensity change. We can find this information in
the structure tensor as follows:
Fig. 5.17 Nonlinear Perona-Malik diffusion for a color image degraded by chromatic aberration.
If one treats the color channels separately in either the RGB or HSV system, some color errors
along the edges occur. Only denoising of the V channels shows good results; coupling of the color
channels gives best results
Lemma 5.42 The structure tensor has two orthonormal eigenvectors $v_1 \parallel \nabla u_\sigma$ and $v_2 \perp \nabla u_\sigma$. The corresponding eigenvalues are $|\nabla u_\sigma|^2$ and zero.
Proof For $v_1 = c\nabla u_\sigma$, one has $J_0(\nabla u_\sigma)v_1 = \nabla u_\sigma \nabla u_\sigma^T (c\nabla u_\sigma) = \nabla u_\sigma\, c|\nabla u_\sigma|^2 = |\nabla u_\sigma|^2 v_1$. Similarly, one sees that $J_0(\nabla u_\sigma)v_2 = 0$. □
Thus, the directional information is encoded in the eigenvectors of the structure tensor. The eigenvalues correspond, roughly speaking, to the contrast in the
As a consequence of the above lemma, Jρ (∇uσ ) also has orthonormal eigenvectors
v1 , v2 and corresponding nonnegative eigenvalues μ1 ≥ μ2 ≥ 0. We interpret these
quantities as follows:
• The eigenvalues μ1 and μ2 are the “averaged contrasts” in the directions v1 and
v2 , respectively.
• The vector v1 points in the direction of “largest averaged gray value variation.”
• The vector v2 points in the direction of “average local direction of an edge.” In
other words, v2 is the “averaged direction of coherence.”
Starting from this interpretation we can use the eigenvalues μ1 and μ2 to discrimi-
nate different regions of an image:
• μ1 , μ2 small: There is no direction of significant change in gray values. Hence,
this is a flat region.
• μ1 large, μ2 small: There is a large gray value variation in one direction, but not
in the orthogonal direction. Hence, this is an edge.
• μ1, μ2 both large: There are two orthogonal directions of significant gray value change, and this is a corner.
Fig. 5.18 The structure tensor Jρ(∇uσ) encodes information about flat regions, edges, and corners. In the bottom row, the respective regions have been colored in white. In this example the noise level is σ = 4 and the spatial scale is ρ = 2
See Fig. 5.18 for an illustration. We observe that the structure tensor Jρ (∇uσ )
indeed contains more information than J0 (∇uσ ): For the latter, there is always one
eigenvalue zero, and thus it cannot see corners. The matrix Jρ (∇uσ ) is capable of
doing this, since direction information from some neighborhood is used.
Before we develop special methods for anisotropic diffusion, we cite a theorem
about the existence of solutions of anisotropic diffusion equations where the diffu-
sion tensor is based on the structure tensor. The theorem is due to Weickert [140]
and is a direct generalization of Theorem 5.38.
Theorem 5.45 Let u0 ∈ L∞ (), ρ ≥ 0, σ > 0, T > 0, and let D : R2×2 → R2×2
satisfy the following properties:
• $D \in C^\infty(\mathbb{R}^{2\times 2}, \mathbb{R}^{2\times 2})$.
• $D(J)$ is symmetric for every symmetric J.
• For every bounded function $w \in L^\infty(\Omega, \mathbb{R}^2)$ with $\|w\|_\infty \le K$ there exists a constant $\nu(K) > 0$ such that the eigenvalues of $D(J_\rho(w))$ are larger than $\nu(K)$.
λ1 = g(μ1 ), λ2 = 1.
The eigenvectors of D are obviously the vectors v1 and v2 , and the corresponding
eigenvalues are λ₁ and λ₂. Hence, the equation
$$\partial_t u = \operatorname{div}\bigl(D(J_\rho(\nabla u_\sigma))\nabla u\bigr)$$
should indeed lead to linear diffusion as in the heat equation along the edges and
to Perona-Malik-like diffusion perpendicular to each edge; see Fig. 5.19. For results
on existence and uniqueness of solutions we refer to [140].
Fig. 5.19 Effect of anisotropic diffusion according to Examples 5.46 and 5.47 based on the
structure tensor Jρ (∇uσ ) (parameters σ = 0.5, ρ = 2). Top left: original image. Middle row:
edge-enhancing diffusion with function g2 and parameter λ = 0.0005. Bottom row: coherence-
enhancing diffusion with function g2 and parameters λ = 0.001, α = 0.001. Images are shown at
times t = 25, 100, 500
the same eigenvectors v1 and v2 as the structure tensor, and the eigenvalue for v2
should become larger for higher local coherence |μ1 − μ2 |. With a small parameter
α > 0 and a function g as in (5.14) we use the following eigenvalues:
The parameter α > 0 is needed to ensure that the diffusion tensor is positive definite.
As in the previous example we use the model
$$\partial_t u = \operatorname{div}\bigl(D(J_\rho(\nabla u_\sigma))\nabla u\bigr).$$
The function g is used in a way that the eigenvalue λ2 is small for low coherence (in
the order of α) and close to one for high coherence. In Fig. 5.19 one sees that this
equation indeed enhances coherent structures. For further theory we refer to [140].
Application 5.48 (Visualization of Vector Fields) Vector fields appear in several
applications, for example vector fields that describe flows such as winds in the weather forecast or fluid flows around an object. However, the visualization of vector fields for visual inspection is not easy. Here are some methods: On the one hand, one can visualize a vector field $v : \Omega \to \mathbb{R}^d$ by plotting small arrows $v(x)$ at some grid points $x \in \Omega$. Another method is to plot so-called integral curves, i.e., curves $\gamma : [0,T] \to \Omega$ such that the vector field is tangential to the curve, which means $\gamma'(t) = v(\gamma(t))$. The first variant can quickly lead to messy pictures, and the choice of the grid plays a crucial role. For the second variant one has to choose a set of integral curves, and it may happen that these curves accumulate in one region of the
image while other regions may be basically empty.
Another method for the visualization of a vector field, building on anisotropic
diffusion, has been proposed in [52]. The idea is to design a diffusion tensor that
allows diffusion along the vector field, but not orthogonal to the field. The resulting
diffusion process is then applied to an image consisting of pure noise. In more detail, the method goes as follows: for a continuous vector field $v : \Omega \to \mathbb{R}^d$ that is not zero on Ω, there exists a continuous map $B(v)$ that maps a point $x \in \Omega$ to a rotation matrix $B(v)(x)$ that rotates the vector $v(x)$ to the first unit vector $e_1$: $B(v)v = |v|e_1$.
Using an increasing and positive mapping α : [0, ∞[ → [0, ∞[ and a decreasing
map G : [0, ∞[ → [0, ∞[ with G(r) → 0 for r → ∞, we define the matrix
$$A(v, r) = B(v)^T \begin{pmatrix} \alpha(|v|) & 0 \\ 0 & G(r)\,\mathrm{id}_{d-1} \end{pmatrix} B(v).$$
For some initial image $u_0 : \Omega \to [0,1]$ and σ > 0 we consider the following
differential equation:
Since this equation drives all initial images to a constant image in the limit t → ∞ (as all the diffusion equations in this chapter do), the authors of [52] proposed to add a source term. For concreteness let $f : [0,1] \to \mathbb{R}$ be continuous such that $f(0) = f(1) = 0$, $f < 0$ on $]0, 0.5[$, and $f > 0$ on $]0.5, 1[$. This leads to the
modified differential equation
The new term f (u) will push the gray values toward 0 and 1, respectively, and thus
leads to higher contrast. This can be accomplished, for example, with the following
function:
$$f(u) = \Bigl(u - \tfrac12\Bigr)\Bigl(\bigl(\tfrac12\bigr)^2 - \bigl(u - \tfrac12\bigr)^2\Bigr).$$
Figure 5.20 shows the effect of Eq. (5.19) with this f and a random initial value.
One can easily extract the directions of the vector field, even in regions where the
vector field has a small magnitude.
To produce images from the equations of the previous sections of this chapter,
we have to solve these equations. In general this is not possible analytically, and
numerical methods are used. The situation in image processing is a little special in
some respects:
• Typically, images are given on a rectangular equidistant grid, and one aims to
produce images of a similar shape. Hence, it is natural to use these grids.
• The visual quality of the images is more important than solving the equations as accurately as possible. Thus, methods of lower order are acceptable if they produce "good images."
• Some of the partial differential equations we have treated preserve edges or create
“kinks.” Two examples are the equations for erosion and dilation (5.12). This
poses a special challenge for numerical methods, since the solutions that will be
approximated are not differentiable.
We consider an introductory example:
Fig. 5.20 Visualization of vector fields by anisotropic diffusion as in Application 5.48. Top left: the vector field. Below: the initial value and solutions of Eq. (5.19) at different times
Example 5.49 (Discretization of the Heat Equation) We want to solve the follow-
ing initial-boundary value problem for the heat equation
$$\begin{aligned} \partial_t u &= \Delta u && \text{in } [0,T]\times\Omega,\\ \partial_\nu u &= 0 && \text{on } [0,T]\times\partial\Omega,\\ u(0) &= u_0, \end{aligned}\tag{5.20}$$
[Figure: the rectangular domain is discretized by an equidistant grid with spacing h, with M grid points in the x₁ direction and N grid points in the x₂ direction.]
For the time variable we proceed similarly and discretize it with step-size τ .
Using u for the solution of the initial-boundary value problem (5.20), we want to
find uni,j as an approximation to u(nτ, (i − 1)h, (j − 1)h), i.e., all three equations
in (5.20) have to be satisfied. The initial condition $u(0) = u_0$ is expressed by the equation $u^0_{i,j} = u_0\bigl((i-1)h, (j-1)h\bigr)$. Replacing the Laplacian and the time derivative by difference quotients gives the following equation for the discrete values $u^n_{i,j}$:
$$\frac{u^{n+1}_{i,j} - u^n_{i,j}}{\tau} = \frac{u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} - 4u^n_{i,j}}{h^2}. \tag{5.21}$$
There is a problem at the points with i = 1, N or j = 1, M: the terms that involve
i = 0, N + 1 or j = 0, M + 1, respectively, are not defined. To deal with this
issue we take the boundary condition ∂ν u = 0 into account. In this example, the
domain is a rectangle, and the boundary condition has to be enforced for values
with i = 1, N and j = 1, M. We add the auxiliary points un0,j , unN+1,j , uni,0 , and
uni,M+1 and replace the derivative by a central difference quotient and get
Thus, the boundary condition is realized by mirroring the values over the boundary.
The discretized equation (5.21) for i = 1, for example, has the following form:
$$\frac{u^{n+1}_{1,j} - u^n_{1,j}}{\tau} = \frac{2u^n_{2,j} + u^n_{1,j+1} + u^n_{1,j-1} - 4u^n_{1,j}}{h^2}.$$
We can circumvent the distinction of different cases by the following notation: we
solve Eq. (5.21) for $u^{n+1}_{i,j}$ and get
$$u^{n+1}_{i,j} = u^n_{i,j} + \frac{\tau}{h^2}\bigl(u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} - 4u^n_{i,j}\bigr).$$
This can be realized by a discrete convolution as in Sect. 3.3.3:
$$u^{n+1} = u^n + \frac{\tau}{h^2}\, u^n \ast \begin{bmatrix} 0 & 1 & 0\\ 1 & -4 & 1\\ 0 & 1 & 0 \end{bmatrix}.$$
One advantage of this formulation is that we can realize the boundary condition
by a symmetric extension over the boundary. Starting from the initial value u0i,j =
u0 ((i − 1)h, (j − 1)h) we can use this to calculate an approximate solution uni,j
iteratively for every n.
We call the resulting scheme explicit, since we can calculate the values un+1
directly from the values un . One reason for this is that we discretized the time
derivative ∂t u by a forward difference quotient. If we use a backward difference
quotient, we get
$$\frac{u^n_{i,j} - u^{n-1}_{i,j}}{\tau} = \frac{u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} - 4u^n_{i,j}}{h^2}$$
or
$$u^n - \frac{\tau}{h^2}\, u^n \ast \begin{bmatrix} 0 & 1 & 0\\ 1 & -4 & 1\\ 0 & 1 & 0 \end{bmatrix} = u^{n-1}$$
(again, with a symmetric extension over the boundary to take care of the boundary
condition). This is a linear system of equations for un , and we call this scheme
implicit.
This initial example illustrates a simple approach to constructing numerical
methods for differential equations:
• Approximate the time derivative by forward or backwards difference quotients.
• Approximate the spatial derivative by suitable difference quotients and use
symmetric boundary extension to treat the boundary condition.
• Solve the resulting equation for un+1 .
A little more abstractly: we can solve a partial differential equation of the form
∂t u(t, x) = L(u)(t, x)
with a differential operator L that acts on the spatial variable x only by so-called
semi-discretization:
• Treat the equation in a suitable space X, i.e., u : [0, T ] → X, such that ∂t u =
L(u).
• Discretize the operator L: Choose a spatial discretization of the domain of the
x variable and thus, an approximation of the space X. Define an operator L
that operates on the discretized space and approximates L. Thus, this partial
differential equation turns into a system of ordinary differential equations
∂t u = L(u).
• Solve the system of ordinary differential equations with some method known
from numerical analysis (see, e.g., [135]).
In imaging, one often faces the special case that the domain for the x variable
is a rectangle. Moreover, the initial image u0 is often given on an equidistant grid.
This gives a natural discretization of the domain. Hence, many methods in imaging
replace the differential operators by difference quotients. This is generally known
as the method of finite differences.
The equations from Sects. 5.2 and 5.3 can be roughly divided into two groups:
equations of diffusion type (heat equation, nonlinear diffusion) and equations of
transport type (erosion, dilation, mean curvature motion). These types have to be
treated differently.
∂t u = div(A∇u),
i.e., the differential operator L(u) = div(A∇u). First we treat the case of isotropic
diffusion, i.e., $A : \Omega \to \mathbb{R}$ is a scalar function. We start with the approximation of
the differential operator div(A∇u) = ∂x1 (A∂x1 u)+∂x2 (A∂x2 u) by finite differences.
Obviously it is enough to consider the term ∂x1 (A∂x1 u). At some point (i, j ) we
proceed as follows:
$$\partial_{x_1}(A\partial_{x_1}u) \approx \frac{1}{h}\Bigl((A\partial_{x_1}u)_{i+\frac12,j} - (A\partial_{x_1}u)_{i-\frac12,j}\Bigr)$$
with
$$(A\partial_{x_1}u)_{i+\frac12,j} = A_{i+\frac12,j}\,\frac{u_{i+1,j} - u_{i,j}}{h}, \qquad (A\partial_{x_1}u)_{i-\frac12,j} = A_{i-\frac12,j}\,\frac{u_{i,j} - u_{i-1,j}}{h}.$$
$$\operatorname{div}(A\nabla u) \approx \frac{1}{h^2}\Bigl(A_{i,j-\frac12}u_{i,j-1} + A_{i,j+\frac12}u_{i,j+1} + A_{i-\frac12,j}u_{i-1,j} + A_{i+\frac12,j}u_{i+1,j} - \bigl(A_{i,j-\frac12} + A_{i,j+\frac12} + A_{i-\frac12,j} + A_{i+\frac12,j}\bigr)u_{i,j}\Bigr). \tag{5.22}$$
We can arrange this efficiently in matrix notation. To that end, we arrange the matrix
$u \in \mathbb{R}^{N\times M}$ in a vector $U \in \mathbb{R}^{NM}$ by stacking the rows into a vector.¹
1 Of course, we could also stack the columns into a vector, and indeed, some software packages
have this as a default operation. The only difference between these two approaches is the direction
of the x1 and x2 coordinates.
If we denote the right-hand side in (5.22) by vi,j and define V(i,j ) = vi,j and
U(i,j ) = ui,j , we get V = AU with a matrix A ∈ RNM×NM defined by
$$A_{(i,j),(k,l)} = \begin{cases} -\bigl(A_{i,j-\frac12} + A_{i,j+\frac12} + A_{i-\frac12,j} + A_{i+\frac12,j}\bigr) & \text{if } i = k,\ j = l,\\ A_{i\pm\frac12,j} & \text{if } i \pm 1 = k,\ j = l,\\ A_{i,j\pm\frac12} & \text{if } i = k,\ j \pm 1 = l,\\ 0 & \text{otherwise.} \end{cases} \tag{5.23}$$
$$\partial_t U = \frac{1}{h^2} A U.$$
This is a system of linear ordinary differential equations. Up to now we did not
incorporate that the diffusion coefficient A may depend on u (or on the gradient of
u, respectively). In this case A depends on u and we obtain the nonlinear system
$$\partial_t U = \frac{1}{h^2} A(U)\, U.$$
Explicit:
$$\frac{U^{n+1} - U^n}{\tau} = \frac{1}{h^2} A(U^n)\, U^n.$$
Implicit:
$$\frac{U^{n+1} - U^n}{\tau} = \frac{1}{h^2} A(U^{n+1})\, U^{n+1}.$$
Semi-implicit:
$$\frac{U^{n+1} - U^n}{\tau} = \frac{1}{h^2} A(U^n)\, U^{n+1}.$$
The implicit variant leads to a nonlinear system of equations, and its solution may
pose a significant challenge. Hence, implicit methods are usually not the method of
choice. The explicit method can be written as
$$U^{n+1} = \Bigl(\mathrm{id} + \frac{\tau}{h^2} A(U^n)\Bigr) U^n \tag{5.24}$$
and it requires only one discrete convolution per iteration. The semi-implicit method leads to
$$\Bigl(\mathrm{id} - \frac{\tau}{h^2} A(U^n)\Bigr) U^{n+1} = U^n. \tag{5.25}$$
This is a linear system of equations for U n+1 , i.e., in every iteration we need to solve
one such system.
Now we analyze properties of the explicit and semi-implicit methods.
Theorem 5.50 Let $A_{i\pm\frac12,j} \ge 0$, $A_{i,j\pm\frac12} \ge 0$, and let $A(U^n)$ be given according to Eq. (5.23). For the explicit method (5.24) assume that the step-size restriction
$$\tau \le \frac{h^2}{\max_I |A(U^n)_{I,I}|}$$
holds, while for the semi-implicit method (5.25) no upper bound on τ is assumed. Then the iterates $U^n$ of (5.24) and (5.25), respectively, satisfy the discrete maximum principle, i.e., for all I,
$$\min_J U^0_J \le U^n_I \le \max_J U^0_J,$$
and moreover
$$\sum_{J=1}^{NM} U^n_J = \sum_{J=1}^{NM} U^0_J$$
for both the explicit and semi-implicit methods, i.e., the mean gray value is preserved.
Proof First we consider the explicit iteration (5.24) and set $Q(U^n) = \mathrm{id} + \frac{\tau}{h^2}A(U^n)$. Then the explicit iteration reads $U^{n+1} = Q(U^n)U^n$. By definition of $A(U^n)$, we have $\sum_{J=1}^{NM} A(U^n)_{I,J} = 0$ for all I (note the treatment of the boundary condition), and hence
$$\sum_{J=1}^{NM} Q(U^n)_{I,J} = 1.$$
This immediately implies the preservation of the mean gray value, since
$$\sum_{J=1}^{NM} U^{n+1}_J = \sum_{J=1}^{NM}\sum_{I=1}^{NM} Q(U^n)_{I,J}\, U^n_I = \sum_{I=1}^{NM}\Bigl(\sum_{J=1}^{NM} Q(U^n)_{I,J}\Bigr) U^n_I = \sum_{I=1}^{NM} U^n_I,$$
The step-size restriction implies $Q(U^n)_{I,I} \ge 0$, which shows that the matrix $Q(U^n)$ has nonnegative components. We deduce that
$$U^{n+1}_I = \sum_{J=1}^{NM} Q(U^n)_{I,J}\, U^n_J \le \max_K U^n_K \underbrace{\sum_{J=1}^{NM} Q(U^n)_{I,J}}_{=1} = \max_K U^n_K.$$
For the semi-implicit method, set $R(U^n) = \mathrm{id} - \frac{\tau}{h^2}A(U^n)$. By construction,
$$A(U^n)_{I,I} = -\sum_{J\ne I} A(U^n)_{I,J},$$
and thus
$$R(U^n)_{I,I} = 1 - \frac{\tau}{h^2}A(U^n)_{I,I} = 1 + \frac{\tau}{h^2}\sum_{J\ne I}A(U^n)_{I,J} > \frac{\tau}{h^2}\sum_{J\ne I}A(U^n)_{I,J} = \sum_{J\ne I}\bigl|R(U^n)_{I,J}\bigr|.$$
The property $R(U^n)_{I,I} > \sum_{J\ne I}|R(U^n)_{I,J}|$ is called "strict diagonal dominance." This property implies that $R(U^n)$ is invertible and that the inverse matrix $R(U^n)^{-1}$ has nonnegative entries (cf. [72]). Moreover, with $e = (1,\dots,1)^T \in \mathbb{R}^{NM}$, we have $R(U^n)e = e$, which implies, by invertibility of $R(U^n)$, that $R(U^n)^{-1}e = e$ holds, too. We conclude that
$$\sum_{J=1}^{NM} \bigl(R(U^n)^{-1}\bigr)_{I,J} = 1.$$
Similarly to the explicit case, we deduce the preservation of the mean gray value from the nonnegativity of $R(U^n)^{-1}$, and
$$U^{n+1}_I = \sum_{J=1}^{NM} \bigl(R(U^n)^{-1}\bigr)_{I,J}\, U^n_J \le \max_K U^n_K \underbrace{\sum_{J=1}^{NM} \bigl(R(U^n)^{-1}\bigr)_{I,J}}_{=1} = \max_K U^n_K$$
$$A(u) = g(|\nabla u|)\,\mathrm{id}.$$
$$A_{i\pm\frac12,j} = g\bigl(|\nabla u|_{i\pm\frac12,j}\bigr).$$
$$|\nabla u|_{i\pm\frac12,j} = \frac{|\nabla u|_{i,j} + |\nabla u|_{i\pm1,j}}{2}.$$
The gradients at these integer places can be approximated by finite differences. For
the modified Perona-Malik equation we have
$$A_{i\pm\frac12,j} = g\bigl(|\nabla u_\sigma|_{i\pm\frac12,j}\bigr).$$
Then the entries of A are calculated in exactly the same way, after an initial
presmoothing. If the function g is nonnegative, the entries $A_{i\pm\frac12,j}$ and $A_{i,j\pm\frac12}$ are nonnegative, and Theorem 5.50 applies. This discretization was used to generate the
images in Fig. 5.10. Alternative methods for the discretization of isotropic nonlinear
diffusion are described, for example, in [142] and [111].
Remark 5.53 (Anisotropic Equations) In the case of anisotropic diffusion with
symmetric diffusion tensor
$$A = \begin{pmatrix} B & C\\ C & D \end{pmatrix},$$
there are mixed second derivatives in the divergence. For example, in two dimen-
sions,
If we form the matrix A similarly to Eqs. (5.22) and (5.23) by finite differences, it is not clear a priori how one can ensure that the entries $A_{i\pm\frac12,j}$ and $A_{i,j\pm\frac12}$ are
nonnegative. In fact, this is nontrivial, and we refer to [140, Section 3.4.2]. One
alternative to finite differences is the method of finite elements, and we refer to [52,
115] for details.
Transport equations are a special challenge. To see why this is so, we begin with a
simple one-dimensional example of a transport equation. For $a \ne 0$ we consider
$$\partial_t u + a\,\partial_x u = 0, \qquad t > 0,\ x \in \mathbb{R},$$
with initial value u(0, x) = u0 (x). It is simple to check that the solution is just the
initial value transported with velocity a, i.e.,
u(t, x) = u0 (x − at).
$$X'(t) = a, \qquad X(0) = x_0.$$
Method of Characteristics
$$\partial_t u + a\cdot\nabla u = 0, \qquad u(0, x) = u_0(x).$$
$$X' = a(X), \qquad X(0) = x_0,$$
Proof We consider u along the solutions X of the initial value problems and take
the derivative with respect to t:
$$\frac{\mathrm{d}}{\mathrm{d}t}\, u(t, X(t)) = \partial_t u(t, X(t)) + a(X(t))\cdot\nabla u(t, X(t)) = 0.$$
Thus, u is constant along X, and by the initial value for X we get at t = 0 that
(cf. Exercise 5.6). Hence, the scale space is described by the differential equation
∂t u − v · ∇u = 0, u(0, x) = u0 (x).
with some suitable routine up to time T . Here one can use, for example, the Runge-
Kutta methods, see, e.g., [72]. If v is given only on a discrete set of points, one can
use interpolation as in Sect. 3.1.1 to evaluate v at intermediate points. Then one gets
u(T , X(T )) = u0 (x0 ) (where one may need interpolation again to obtain u(T , · ) at
the grid points). This method was used to generate the images in Fig. 5.1.
Application 5.56 (Erosion, Dilation, and Mean Curvature Motion) The equa-
tions for erosion, dilation, and mean curvature motion can be interpreted as transport
equations, cf. Remark 5.25 and Sect. 5.2.2. However, the vector field v depends on
u in these cases, i.e.,
∂t u − v(u) · ∇u = 0.
Hence, the method of characteristics from Application 5.55 cannot be applied in its
plain form. One still obtains reasonable results if the function u is kept fixed for
the computation of the vector field v(u) for some time. In the example of mean curvature motion, for instance, the vector field is given by
$$v\bigl(u(t_n, x)\bigr) = \kappa(t_n, x)\,\frac{\nabla u(t_n, x)}{|\nabla u(t_n, x)|},$$
where the curvature is given by
$$\kappa = \operatorname{div}\Bigl(\frac{\nabla u}{|\nabla u|}\Bigr).$$
Thus we may proceed as follows.
• First calculate the unit vector field $\nu(t_n, x) = \frac{\nabla u(t_n,x)}{|\nabla u(t_n,x)|}$, e.g., by finite differences (avoiding division by zero, e.g., by $|\nabla u(t_n,x)| \approx \sqrt{|\nabla u(t_n,x)|^2 + \varepsilon^2}$ with some small ε > 0). Compute $v_{t_n}(x) = (\operatorname{div}\nu)(t_n, x)\,\nu(t_n, x)$, e.g., again by finite differences.
• Solve the equation
∂t u − vtn · ∇u = 0
with initial value u(tn , x) up to time tn+1 = tn + T with T not too large by the
method of characteristics and go back to the previous step.
This method was used to produce the images in Fig. 5.8.
Similarly one can apply the method for the equations of erosion and dilation with
a circular structure element
∂t u ± |∇u| = 0;
see Fig. 5.21 for the case of dilation. One notes some additional smoothing that
results from the interpolation.
In this application we did not treat the nonlinearity in a rigorous way. For a nonlinear transport equation of the form $\partial_t u + \operatorname{div}(F(u)) = 0$ one can still define characteristics, but it may occur that two characteristics intersect or are not well
defined. The first case leads to so-called “shocks,” while the second case leads to
nonunique solutions. Our method does not consider these cases and hence may run
into problems.
Fig. 5.21 Solutions of the equation for the dilation by the method of characteristics according to Application 5.56, at times t = 4 and t = 6
Example 5.57 (Stability Analysis for the One-Dimensional Case) Again we begin
with a one-dimensional example
We use forward differences in the t direction and a central difference quotient in the x direction and get, with notation similar to Example 5.49, the explicit scheme
$$u^{n+1}_j = u^n_j + a\,\frac{\tau}{2h}\bigl(u^n_{j+1} - u^n_{j-1}\bigr). \tag{5.26}$$
To see that this method is not useful we use the so-called “von Neumann stability
analysis.” To that end, we consider the method on a finite interval with periodic
boundary conditions, i.e., j = 1, . . . , M and unj+M = unj . We make a special ansatz
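As a brief illustration of how such an ansatz is used (a sketch of the standard computation, with θ = 2πkh denoting the discrete frequency): inserting u_j^n = g^n e^{ijθ} into (5.26) gives
g^{n+1} e^{ijθ} = g^n e^{ijθ} + a (τ/(2h)) g^n (e^{i(j+1)θ} − e^{i(j−1)θ}),
hence the amplification factor is g = 1 + i (aτ/h) sin θ with |g|² = 1 + (aτ/h)² sin² θ > 1 whenever sin θ ≠ 0. Every nonconstant Fourier mode is amplified in each step, no matter how small τ is chosen, which is why the scheme (5.26) is not useful.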
or more compactly,
u_j^{n+1} = u_j^n + (τ/h) [ max(0, a_j)(u_{j+1}^n − u_j^n) + min(0, a_j)(u_j^n − u_{j−1}^n) ].
Depending on the sign of ∂xi u, we choose either the forward or backward difference,
i.e.,
(∂_{x1} u)²_{i,j} ≈ (1/h²) max(0, u_{i+1,j} − u_{i,j}, −(u_{i,j} − u_{i−1,j}))²,
(∂_{x2} u)²_{i,j} ≈ (1/h²) max(0, u_{i,j+1} − u_{i,j}, −(u_{i,j} − u_{i,j−1}))².
The resulting method is known as Rouy-Tourin method [120]. Results for this
method are shown in Fig. 5.22. Again we note, similar to the method of characteris-
tics from Application 5.56, that a certain blur occurs. This phenomenon is called
numerical viscosity. Finite difference methods with less numerical viscosity are
proposed, for example, in [20].
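For illustration, one explicit Rouy-Tourin time step for the dilation equation ∂t u = |∇u| might look as follows (a sketch under the assumptions of a periodic grid with spacing h and a sufficiently small time step τ; not the implementation used for Fig. 5.22):

    import numpy as np

    def rouy_tourin_dilation_step(u, tau, h=1.0):
        # one explicit step of u_t = |grad u| with the Rouy-Tourin upwind
        # discretization of the squared partial derivatives
        fwd1 = np.roll(u, -1, axis=0) - u      # u_{i+1,j} - u_{i,j}
        bwd1 = u - np.roll(u, 1, axis=0)       # u_{i,j}   - u_{i-1,j}
        fwd2 = np.roll(u, -1, axis=1) - u
        bwd2 = u - np.roll(u, 1, axis=1)
        dx1_sq = np.maximum(0.0, np.maximum(fwd1, -bwd1))**2 / h**2
        dx2_sq = np.maximum(0.0, np.maximum(fwd2, -bwd2))**2 / h**2
        return u + tau * np.sqrt(dx1_sq + dx2_sq)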
Remark 5.59 (Upwind Method According to Osher and Sethian) The authors
of [106] propose the following different upwind method:
(∂_{x1} u)²_{i,j} ≈ (1/h²) [ max(0, u_{i+1,j} − u_{i,j})² + max(0, u_{i−1,j} − u_{i,j})² ],
(∂_{x2} u)²_{i,j} ≈ (1/h²) [ max(0, u_{i,j+1} − u_{i,j})² + max(0, u_{i,j−1} − u_{i,j})² ].
This method gives results that are quite similar to those of the Rouy-Tourin method, and hence we
do not show extra pictures. In particular, some numerical viscosity can be observed as well.
Fig. 5.22 Solution of the dilation equation with the upwind method according to Rouy-Tourin
from Application 5.58
Partial differential equations can be used for many further tasks, e.g., also for
inpainting; cf. Sect. 1.2. An idea proposed by Bertalmio [13], is, to “transport” the
information of the image into the inpainting domain. Bornemann and März [15]
provide the following motivation for this approach: in two dimensions we denote by
∇⊥u the gradient of u rotated by π/2 to the left,
∇⊥u = [ 0  −1 ; 1  0 ] ∇u = ( −∂_{x2} u , ∂_{x1} u )ᵀ.
∂t u = −∇ ⊥ (u) · ∇u.
As illustrated in Application 3.23, the level lines of u roughly follow the edges.
Hence, the vector ∇ ⊥ (u) is also tangential to the edges, and thus the equation
realizes some transport along the edges of the image. This is, roughly speaking,
the same one would do by hand to fill up a missing piece of an image: take the
edges and extend them into the missing part and then fill up the domain with the
right colors. Bornemann and März [15] propose, on the one hand, to calculate the
transport direction ∇ ⊥ (uσ ) with a presmoothed uσ and, on the other hand, get
further improved results by replacing the transport direction with the eigenvector
corresponding to the smaller eigenvalue of the structure tensor Jρ (∇uσ ).
Methods that are based on diffusion can be applied to objects different from
images; we can also “denoise” surfaces. Here a surface is a manifold, and the
Laplace operator has to be replaced by the so-called Laplace-Beltrami operator. Also
one can adapt the ideas of anisotropic diffusion to make them work on surfaces, too;
see, e.g., [45].
The Perona-Malik equation and its analytical properties are still subject to
research. Amann [4] describes a regularization of the Perona-Malik equation that
uses a temporal smoothing instead of the spatial smoothing used in the modified
model (5.18). This can be interpreted as a continuous analogue of the semi-
implicit method (5.25). Chen and Zhang [44] provide a new interpretation of the
nonexistence of solutions of the Perona-Malik equation in the context of Young
measures. Esedoglu [59] develops a finer analysis of the stability of the discretized
Perona-Malik method and proves a maximum principle for certain initial values.
5.6 Exercises
ϕ(x) = (Γ(1 + d/2)/π^{d/2}) χ_{B1(0)}(x)
(cf. Example 2.38) and ϕt(x) = τ(t)^{−d} ϕ(x/τ(t)). We consider the scale space
Tt u = u ∗ ϕt if t > 0,  and  Tt u = u if t = 0.
Which scale space axioms are satisfied, and which are not? Can you show that an
infinitesimal generator exists?
Exercise 5.4 (Recursivity of Scaled Dilation) Let B ⊂ Rd be nonempty.
1. Show that
2. Show that the multiscale dilation from Example 5.4 satisfies the axiom [REC] if
B is convex. Which assumptions are needed for the reverse implication?
Exercise 5.5 (Properties of the Infinitesimal Generator) Let the assumptions of
Theorem 5.11 be fulfilled. Show the following:
1. If axiom [TRANS] is satisfied in addition, then
∂j/∂t (t, x) = v(j(t, x)), j(0, x) = x.
Show that the infinitesimal generator of (Tt ) is given by
∇(h ◦ u) = (h′ ◦ u) ∇u,
∇²(h ◦ u) = (h′ ◦ u) ∇²u + (h″ ◦ u) ∇u ⊗ ∇u.
Exercise 5.8 (Auxiliary Calculation for Theorem 5.23) Let X ∈ S^{d×d} with x_{d,d} = 0 and M = Σ_{i=1}^{d−1} x_{d,i}². Moreover, let ε > 0 and
Q = diag(1, . . . , 1, 0),   Iε = diag(ε, . . . , ε, M/ε).
Show that
QXQ ≤ X + Iε,
X ≤ QXQ + Iε.
κ = (x′y″ − x″y′) / ((x′)² + (y′)²)^{3/2}.
Let u : R² → R be such that the zero level set {(x, y) | u(x, y) = 0} is parameterized by such a curve c.
Show that on this zero level set, at points with ∇u ≠ 0, one has
κ = div(∇u/|∇u|).
F(p, X) = (g′(|p|)/|p|) pᵀXp + g(|p|) trace X
(In this case one says that the solution satisfies a maximum principle.)
Exercise 5.12 (Decrease of Energy and Preservation of the Mean Gray Value for the Modified Perona-Malik Equation) Let Ω ⊂ Rd, u0 ∈ L∞(Ω), g : [0, ∞[ → [0, ∞[ infinitely differentiable, and u : [0, T] × Ω → R a solution of the modified Perona-Malik equation, i.e., a solution of the corresponding initial-boundary value problem. Show that the associated energy is nonincreasing in time and that the mean gray value t ↦ ∫_Ω u(t, x) dx is constant.
Chapter 6
Variational Methods
u0 = u† + η,
where the noise η is unknown. As we have already seen, there are different
approaches to solve the denoising problem, e.g., the application of the moving
average, morphological opening, the median filter, and the solution of the Perona-
Malik equation.
Since we do not know the noise η, we need to make assumptions on u† and η and
hope that these assumptions are indeed satisfied for the given data u0 . Let us make
some basic observations:
• The noise η = u0 − u† is a function whose value at every point is independent of
the values in the neighborhood of that point. There is no special spatial structure
in η.
• The function u† represents an image that has some spatial structure. Hence, it is
possible to make assertions about the behavior of the image in the neighborhood
of a point.
In a little bit more abstract terms, the image and the noise will have different
characteristics that allow one to discriminate between these two; in this case these
characteristics are given by the local behavior. These assumptions, however, do
not lead to a mathematical model and of course not to a denoising method. The
basic idea of variational methods in mathematical imaging is to express the above
assumptions in quantitative expressions. Usually, these expressions say how “well”
a function “fits” the modeling assumption; it should be small for a good fit, and large
for a bad fit.
With this is mind, we can reformulate the above points as follows:
• There is a real-valued function Φ that gives, for every “noise” function η, the “size” of the noise. The function should use only point-wise information. “Large” noise, or the presence of spatial structure, should lead to large values.
• There is a real-valued function Ψ that says how much an image u looks like a “natural image.” The function should use information from neighborhoods and should be “large” for “unnatural” images and “small” for “natural” images.
These assumptions are based on the hope that the quantification of the local behavior is sufficient to discriminate image information from noise. For suitable functions Φ and Ψ one chooses a weight λ > 0, and this leads, for every image u (and, consequently, for every noise η = u − u0), to the expression
Φ(u0 − u) + λΨ(u),
which gives a value that says how well both requirements are fulfilled; the smaller, the better. Thus, it is natural to look for an image u that minimizes the expression, i.e., we are looking for u∗ for which
Φ(u0 − u∗) + λΨ(u∗) = min_u Φ(u0 − u) + λΨ(u)
holds.
Within the model given by Φ, Ψ, and λ, the resulting u∗ is optimal and gives the denoised image. A characteristic feature of these methods is the solution of a minimization problem. Since one varies over all u to search for an optimum u∗, these methods are called variational methods or variational problems. The function to be minimized in a variational problem is called an objective functional. Since Φ measures the difference u0 − u, it is often called a discrepancy functional or discrepancy term. In this context, Ψ is also called a penalty functional.
The following example, chosen such that the calculations remain simple, gives
an impression as to how the model assumption can be transferred to a functional
and what mathematical questions can arise in this context.
Example 6.1 (L²-H¹ Denoising) Consider the whole space Rd and a (complex-valued) noisy function u0 ∈ L²(Rd). It appears natural to choose for Φ the squared norm
Φ(u) = (1/2) ∫_{Rd} |u(x)|² dx.
The penalty functional
Ψ(u) = (1/2) ∫_{Rd} |∇u(x)|² dx
uses the gradient of u and hence uses in some sense also information from a neighborhood. The corresponding variational problem reads
min_{u∈H¹(Rd)} (1/2) ∫_{Rd} |u0(x) − u(x)|² dx + (λ/2) ∫_{Rd} |∇u(x)|² dx.   (6.1)
Note that the gradient has to be understood in the weak sense, and hence the
functional is well defined in the space H 1 (Rd ). Something that is not clear a priori
is the existence of a minimizer and hence the justification to use a minimum instead
of an infimum.
We will treat the question of existence of minimizers in greater detail later in this
chapter and content ourselves with a formal solution of the minimization problem:
by the Plancherel formula (4.2) and the rules for derivatives from Lemma 4.28, we
can reformulate the problem (6.1) as
min_{u∈H¹(Rd)} (1/2) ∫_{Rd} |û0(ξ) − û(ξ)|² dξ + (λ/2) ∫_{Rd} |ξ|² |û(ξ)|² dξ.
We see that this is now a minimization problem for û in which we aim to minimize an integral that depends only on û(ξ). For this “point-wise problem,” and we will
argue in more detail later, the overall minimization is achieved by “point-wise
almost everywhere minimization.” The point-wise minimizer u∗ satisfies for almost
all ξ ,
û∗(ξ) = arg min_{z∈C} (1/2)|û0(ξ) − z|² + (λ/2)|ξ|²|z|².
One has
(1/2)|û0(ξ) − z|² + (λ/2)|ξ|²|z|² = (1/2)(1 + λ|ξ|²)|z|² + (1/2)|û0(ξ)|² − |z| Re(sgn(z) û0(ξ)),
and hence the minimization with respect to the argument sgn(z) yields sgn(z) = sgn(û0(ξ)). This leads to
(1/2)|û0(ξ) − z|² + (λ/2)|ξ|²|z|² = (1/2)(1 + λ|ξ|²)|z|² − |z||û0(ξ)| + (1/2)|û0(ξ)|²,
which we minimize with respect to the absolute value |z| and obtain |z| = |û0(ξ)|/(1 + λ|ξ|²). In total we get z = û0(ξ)/(1 + λ|ξ|²), and hence û∗ is unique and given by
û∗(ξ) = û0(ξ)/(1 + λ|ξ|²) for almost every ξ ∈ Rd.
Transforming back, the solution is obtained by a convolution, u∗ = u0 ∗ Pλ.
Using the (d/2 − 1)th modified Bessel function of the second kind K_{d/2−1}, we can write Pλ as
Pλ(x) = (|x|^{1−d/2} / ((2π)^{d−1} λ^{(d+2)/4})) K_{d/2−1}(2π|x|/√λ).   (6.2)
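For discrete images, the closed-form solution can be evaluated with the FFT; the following sketch (not from the book) assumes periodic boundary conditions and uses the discrete frequencies from np.fft.fftfreq in place of ξ:

    import numpy as np

    def h1_denoise(u0, lam):
        # minimize 1/2 ||u - u0||^2 + lam/2 ||grad u||^2 via the Fourier formula
        # u_hat(xi) = u0_hat(xi) / (1 + lam |xi|^2)
        n1, n2 = u0.shape
        xi1 = 2 * np.pi * np.fft.fftfreq(n1)
        xi2 = 2 * np.pi * np.fft.fftfreq(n2)
        XI1, XI2 = np.meshgrid(xi1, xi2, indexing='ij')
        u0_hat = np.fft.fft2(u0)
        u_hat = u0_hat / (1.0 + lam * (XI1**2 + XI2**2))
        return np.real(np.fft.ifft2(u_hat))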
u0 = u† ∗ k + η and η = u0 − u† ∗ k, respectively.
Fig. 6.1 Denoising by solving problem (6.1). Upper left: Original image u† with 256 × 256
pixels, upper right: Noisy version u0 (PSNR(u0 , u† ) = 19.98 dB). Bottom row: Denoised images
by solving the minimization problem (6.1) u1 (left, PSNR(u1 , u† ) = 26.21 dB) and u2 (right,
PSNR(u2 , u† ) = 24.30 dB). The regularization parameters are λ1 = 25 × 10−5 and λ2 =
75 × 10−5 , respectively and u0 has been extended to R2 by 0
Similar to Example 6.1 we get, using the convolution theorem this time (Theo-
rem 4.27), that the minimization is equivalent to
min_{u∈H¹(Rd)} (1/2) ∫_{Rd} |û0(ξ) − (2π)^{d/2} k̂(ξ) û(ξ)|² dξ + (λ/2) ∫_{Rd} |ξ|² |û(ξ)|² dξ,
with the point-wise solution
û∗(ξ) = ((2π)^{d/2} k̂(ξ) û0(ξ)) / ((2π)^d |k̂(ξ)|² + λ|ξ|²)   for almost all ξ ∈ Rd,
and hence the solution is again obtained by convolution, this time with the kernel
u∗ = u0 ∗ kλ,   kλ = F^{−1}( k̂ / ((2π)^d |k̂|² + λ|·|²) ).   (6.4)
We note that the assumptions k ∈ L¹(Rd) ∩ L²(Rd) and ∫_{Rd} k dx = 1 guarantee that the denominator is continuous and bounded away from zero, and hence we have kλ ∈ L²(Rd). For λ → 0 it follows that (2π)^{d/2} k̂λ → (2π)^{−d/2} k̂^{−1} point-wise, and hence we can say that the convolution with kλ is in some sense a regularization of the division by (2π)^{d/2} k̂, which would be “exact” deconvolution.
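A discrete counterpart of (6.4) can again be evaluated with the FFT; the sketch below is illustrative only, assumes periodic boundary conditions and a kernel image k centered in its array, absorbs the (2π)^{d/2} normalization of the continuous convolution theorem into the discrete transforms, and writes the complex conjugate of k̂ explicitly (which makes no difference for real, symmetric kernels):

    import numpy as np

    def deconvolve_h1(u0, k, lam):
        # minimize 1/2 ||k * u - u0||^2 + lam/2 ||grad u||^2 in Fourier space:
        # u_hat = conj(k_hat) u0_hat / (|k_hat|^2 + lam |xi|^2)
        n1, n2 = u0.shape
        xi1 = 2 * np.pi * np.fft.fftfreq(n1)
        xi2 = 2 * np.pi * np.fft.fftfreq(n2)
        XI1, XI2 = np.meshgrid(xi1, xi2, indexing='ij')
        k_hat = np.fft.fft2(np.fft.ifftshift(k))   # kernel moved to the origin
        u0_hat = np.fft.fft2(u0)
        u_hat = np.conj(k_hat) * u0_hat / (np.abs(k_hat)**2 + lam * (XI1**2 + XI2**2))
        return np.real(np.fft.ifft2(u_hat))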
The numerical implementation can be done similarly to Example 6.1. Figures 6.2
and 6.3 show some results for this method. In contrast to Remark 4.21 we used
Fig. 6.2 Solutions of the deblurring problem in (6.3). Top left: Original image u†, extended by zero (264 × 264 pixels). Bottom left: Convolution with an out-of-focus kernel (diameter of 8 pixels) and quantized to 256 gray values (not visually noticeable). Bottom right: Reconstruction with (6.4) (PSNR(u∗, u†) = 32.60 dB)
Fig. 6.3 Illustration of the influence of noise in u0 on the deconvolution according to (6.3). Top left: Original image u†. Top right: Convolved image u0, corrupted by additive normally distributed noise (visually noticeable; PSNR(u0, u† ∗ k) = 32.03 dB). Bottom row: Reconstruction with different parameters (λ = 5 × 10−6 left, λ = 10−6 right). Smaller parameters amplify the artifacts produced by noise
a convolution kernel for which the Fourier transform is equal to zero at many
places. Hence, division in the Fourier space is not an option. This problem can be
solved by the variational approach (6.3): even after quantization to 256 gray levels
we can achieve a satisfactory, albeit not perfect, deblurring (Fig. 6.2; cf. Fig. 4.3).
However, the method has its limitations. If we distort the image by additive noise,
this will be amplified during the reconstruction (Fig. 6.3). We obtain images that
look somehow sharper, but they contain clearly perceivable artifacts. Unfortunately,
both “sharpness” and artifacts increase for smaller λ, and the best results (in the
sense of visual perception) are obtained with a not-too-small parameter.
This phenomenon is due to the fact that deconvolution is an ill-posed inverse
problem, i.e., minimal distortions in the data lead to arbitrarily large distortions in
the solution. A clever choice of λ can lead to an improvement of the image, but
some essential information seems to be lost. To some extent, this information is
amended by the “artificial” term λΨ in the minimization problem. The penalty term
Ψ, however, is part of our model that we have for our original image; hence, the
result depends on how well u† is represented by that model. Two questions come
up: what reconstruction quality can be expected, despite the loss of information, and
how does the choice of the minimization problem influence this quality?
As a consequence, the theory of inverse problems [58] is closely related to
mathematical imaging [126]. There one deals with the question how to overcome
the problems induced by ill-posedness in greater detail and also answers, to some
extent, the question how to choose the parameter λ. In this book we do not discuss
questions of parameter choice and assume that the parameter λ is given.
Remark 6.3 The assumption that the convolution kernel k is known is a quite strong
assumption in practice. If a blurred image u0 is given, k can usually not be deduced
from the image. Hence, one faces the problem of reconstructing both u† and k
simultaneously, a problem called blind deconvolution. The task is highly ambiguous
since one sees by the convolution theorem that for every u0 there exist many pairs of
u and k such that (2π)^{d/2} k̂ û = û0. This renders blind deconvolution considerably
more difficult and we will restrict ourselves to the, already quite difficult, task of
“non-blind” deconvolution. For blind deconvolution we refer to [14, 26, 39, 81].
As a final introductory example we consider a problem that seems to have a
different flavor at first sight: inpainting.
Example 6.4 (Harmonic Inpainting) The task to fill in a missing part of an image
in a natural way, called inpainting, can also be written as a minimization problem.
We assume that the “true,” real-valued image u† is given on a domain Ω ⊂ Rd, but on a proper subset Ω′ ⊂ Ω with Ω′ ⊂⊂ Ω it is not known. Hence, the given data consists of Ω′ and u0 = u†|_{Ω\Ω′}.
Since we have to “invent” suitable data, the model that we have for an image is of great importance. Again we note that the actual brightness value of an image is not so important in comparison to the behavior of the image in a local neighborhood. As before, we take the Sobolev space H¹(Ω) as a model (this time real-valued) and postulate u† ∈ H¹(Ω). In particular, we assume that the corresponding (semi-)norm measures how well an element u ∈ H¹(Ω) resembles a natural image. The task of inpainting is then formulated as the minimization problem
min_{u∈H¹(Ω), u=u0 on Ω\Ω′} (1/2) ∫_Ω |∇u(x)|² dx.   (6.5)
The set
{u − u∗ ∈ H¹(Ω) | u = u0 in Ω\Ω′} = {v ∈ H¹(Ω) | v = 0 in Ω\Ω′}
is a subspace of H¹(Ω), and it can be shown that it is equal to H¹0(Ω′) (see Exercise 6.2). Hence, u∗ has to satisfy
∫_Ω ∇u∗(x) · ∇v(x) dx = 0 for all v ∈ H¹0(Ω′).   (6.6)
This is the weak form of the so-called Euler-Lagrange equation associated to (6.5).
In fact, (6.6) is the weak form of a partial differential equation. If u∗ is twice continuously differentiable in Ω′, one has for every v ∈ D(Ω′) that
∫_{Ω′} ∇u∗(x) · ∇v(x) dx = − ∫_{Ω′} Δu∗(x) v(x) dx = 0,
and by the fundamental lemma of the calculus of variations (Lemma 2.75) we obtain Δu∗ = 0 in Ω′, i.e. the function u∗ is harmonic there (see [80] for an introduction to the theory of harmonic functions). It happens that u∗ is indeed always twice differentiable, since it is weakly harmonic, i.e., it satisfies
∫_{Ω′} u∗(x) Δv(x) dx = 0 for all v ∈ D(Ω′).
By Weyl’s lemma [148, Theorem 18.G] such functions are infinitely differentiable in Ω′. If we further assume that u∗|_{Ω′} has a trace on ∂Ω′, it has to be the same as the trace of u0 (which is given only on Ω\Ω′). This leads to the conditions
Δu∗ = 0 in Ω′,  u∗ = u0 on ∂Ω′,
which is the strong form of the Euler-Lagrange equation for (6.5). Hence, the inpainting problem with the H¹ norm leads to the solution of the Laplace equation with so-called Dirichlet boundary values, and is also called harmonic inpainting.
Some properties of u∗ are immediate. On the one hand, harmonic functions satisfy the maximum principle, which says that nonconstant u∗ do not have local maxima and minima in Ω′. This implies that the solution is unique and that harmonic inpainting cannot introduce new structures. This seems like a reasonable property. On the other hand, u∗ satisfies the mean-value property
u∗(x) = (1/L^d(Br(0))) ∫_{Br(0)} u∗(x − y) dy
Fig. 6.4 Inpainting by minimization of the H¹ norm. Left column: The original image (top, 256 × 256 pixels) contains some homogeneous regions that should be reconstructed by inpainting (middle, Ω′ is the checkerboard pattern). Bottom: result of the minimization of (6.5). Right column: The original image (top, 256 × 256 pixels) contains some fine structures with high contrast, which have been removed in the middle picture. The bottom shows the result of harmonic inpainting
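A simple way to approximate harmonic inpainting on a pixel grid is a Jacobi iteration for the discrete Laplace equation, keeping the known pixels fixed as Dirichlet data; the following sketch (not the implementation behind Fig. 6.4) assumes a boolean array mask marking Ω′ away from the image boundary:

    import numpy as np

    def harmonic_inpaint(u0, mask, iterations=5000):
        # Laplace(u) = 0 on the masked region, u = u0 elsewhere:
        # Jacobi iteration replaces each unknown pixel by the mean of its
        # four neighbors (a discrete version of the mean-value property).
        u = u0.copy()
        for _ in range(iterations):
            neighbors = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
                         np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1)) / 4.0
            u[mask] = neighbors[mask]
        return u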
One of the most widely used proof techniques to show the existence of minimizers
of some functional is the direct method in the calculus of variations. Its line of
argumentation is very simple and essentially follows three abstract steps.
Before we treat the method we recall the definition of the extended real numbers:
R∞ = ]−∞, ∞] = R ∪ {∞}.
Conversely, for every sequence (un ) in X with limn→∞ un = u there exists a subse-
quence (unk ) such that for tn = F (un ), one has limk→∞ tnk = lim infn→∞ F (un ).
We conclude that F (u) ≤ lim infn→∞ F (un ).
With these notions in hand, we can describe the direct method as follows:
To Show
The functional F : X → R∞, defined on a topological space X, has a minimizer u∗,
i.e., there exists u∗ ∈ X with F(u∗) = min_{u∈X} F(u).
Remark 6.9 (Compactness in Examples 6.1–6.4) For the functional in (6.1), (6.3),
and (6.5) one can show only that every minimizing sequence (un ) is bounded in
X = H 1 (Rd ) or X = H 1 (), respectively. For the functionals in (6.1) and (6.3),
for example, the form of the objective functional implies that there exists a C > 0
such that for all n,
‖∇un‖²₂ ≤ (2/λ) F(un) ≤ C,
For any subsequence nk , for which F (unk ) converges, we get by monotonicity and
lower semicontinuity of ϕ
ϕ(lim inf_{n→∞} F(un)) ≤ ϕ(lim_{k→∞} F(unk)) ≤ lim inf_{k→∞} ϕ(F(unk)).
Since we can argue as above for any subsequence, we obtain the claim.
Assertion 4: For un ⇀ u in Y one has Λ(un) ⇀ Λ(u) in X, and thus
F(Λ(u)) ≤ lim inf_{n→∞} F(Λ(un)).
Assertion 5: For every n ∈ N and i ∈ I one has Fi(un) ≤ sup_{i∈I} Fi(un), and hence we conclude that
Fi(u) ≤ lim inf_{n→∞} Fi(un) ≤ lim inf_{n→∞} sup_{i∈I} Fi(un)  ⇒  sup_{i∈I} Fi(u) ≤ lim inf_{n→∞} sup_{i∈I} Fi(un)
and hence the norm is, by Lemma 6.14, items 5 and 6, weakly lower semicontinuous. The claim follows by item 3 of that lemma. □
Example 6.16 (Weak Lower Semicontinuity for Examples 6.1–6.4) Now we have
all the ingredients to prove weak lower semicontinuity of the functionals in the
examples from the beginning of this chapter.
1. We write the functional F1(u) = (1/2) ∫_{Rd} |u0 − u|² dx with ϕ(x) = (1/2)x² and Λ(u) = u − u0 as
F1 = ϕ ∘ ‖·‖_{L²} ∘ Λ.
The weak lower semicontinuity of F2(u) = (λ/2) ∫_{Rd} |∇u|² dx is shown similarly: we use ϕ(x) = (λ/2)x² and Λ = ∇ and write
F2 = ϕ ∘ ‖·‖_{L²} ∘ Λ.
It is easy to see (Exercise 6.2) that H¹0(Ω′) is a closed subspace of H¹(Ω). For every sequence (un) in H¹0(Ω′) ⊂ H¹(Ω) with weak limit u ∈ H¹(Ω), one has u ∈ H¹0(Ω′), and this shows that I_{H¹0(Ω′)} is weakly lower semicontinuous. With F2 from item 1 and F = F1 + F2 we obtain the weak lower semicontinuity of the functional in (6.5).
This settles the question of existence of minimizing elements for the introductory
problems. We note the general procedure in the following theorem:
Theorem 6.17 (The Direct Method in Banach Spaces) Let X be a reflexive
Banach space and let F : X → R∞ be bounded from below, coercive, and weakly
lower semicontinuous. Then the problem
min F (u)
u∈X
has a solution in X.
For dual spaces X∗ of separable normed spaces (not necessarily reflexive) one
can prove a similar claim under the assumption that F is weakly* lower semicon-
tinuous (Exercise 6.4). As we have seen, the notion of weak lower semicontinuity
is a central element of the argument. Hence, the question as to which functionals
have this property is well investigated within the calculus of variations. However,
there is no general answer to this question. The next example highlights some of the
difficulties that can arise with weak lower semicontinuity.
Fig. 6.5 Visualization of the functions from Example 6.18. (a) The pointwise “energy functions” ϕ1 and ϕ2. (b) The first elements of the sequence (un) and its weak limit u
that ϕ1 is convex, while ϕ2 is not. This is explained within the theory of convex
analysis, which we treat in the next section.
During the course of the chapter we will come back to the notion of weak lower
semicontinuity of functionals. But for now, we end this discussion with a remark.
Remark 6.19 For every functional F : X → R∞ , for which there exists a weakly
lower semicontinuous F0 : X → R∞ such that F0 ≤ F holds pointwise, one can
consider the following construction:
F (u) = sup {G(u) G : X → R∞ , G ≤ F, G weakly lower semicontinuous}.
This and the following subsection give an overview of the basic ideas of convex
analysis, where we focus on the applications to variational problems in mathemati-
cal imaging. The results, more details, and further material can be found in standard
references on convex analysis such as [16, 57, 118]. We focus our study of convex
analysis on convex functionals and recall their definition.
Definition 6.20 (Convexity of Functionals) A functional F : X → R∞ on a normed space X is called convex if for all u, v ∈ X and λ ∈ [0, 1], one has
F(λu + (1 − λ)v) ≤ λF(u) + (1 − λ)F(v).
It is called strictly convex if for all u, v ∈ X with u ≠ v and λ ∈ ]0, 1[, one has
F(λu + (1 − λ)v) < λF(u) + (1 − λ)F(v).
We will study convex functionals in depth in the following, and we will see that
they have several nice properties. These properties make them particularly well
suited for minimization problems. Let us start with fairly obvious constructions
and identify some general examples of convex functionals. The method from
Lemma 6.14 can be applied to convexity in a similar way.
Lemma 6.21 (Construction of Convex Functionals) Let X, Y be normed spaces
and F : X → R∞ convex. Then we have the following:
1. For α ≥ 0 the functional αF is convex.
2. If G : X → R∞ is convex, then so is F + G.
[Figure: graphs and epigraphs epi ϕ1, epi ϕ2, epi ϕ3 of the functions
ϕ1(s) = (3/10)s² for s ≤ 0,  ϕ1(s) = (1/2)s² + s for s > 0,
ϕ2(s) = 0 for s ∈ [−1, 1],  ϕ2(s) = ∞ otherwise,
ϕ3(x) = (1/2)(x1² + x2²).]
2. Norms
Every norm ‖·‖X on a normed space is convex, since for all u, v ∈ X and λ ∈ [0, 1],
‖λu + (1 − λ)v‖X ≤ λ‖u‖X + (1 − λ)‖v‖X.
We remark that a norm on a nontrivial normed space is never strictly convex (due to positive homogeneity). The situation is different for strictly convex functions of norms: for ϕ : [0, ∞[ → R∞ strictly monotonically increasing and strictly convex, the functional F(u) = ϕ(‖u‖X) is strictly convex if and only if the norm in X is strictly convex, i.e. for all u, v ∈ X with ‖u‖X = ‖v‖X = 1, u ≠ v, and for all λ ∈ ]0, 1[, one has ‖λu + (1 − λ)v‖X < 1.
The norm in a Hilbert space is always strictly convex, since for ‖u‖X = ‖v‖X = 1 and u ≠ v, the function F : λ ↦ ‖λu + (1 − λ)v‖²X is twice continuously differentiable with F″(λ) = 2‖u − v‖²X > 0, and hence strictly convex.
3. Indicator functionals
The indicator functional of a set K ⊂ X, i.e.,
IK(u) = 0 if u ∈ K and IK(u) = ∞ otherwise,
is convex if and only if K is convex.
4. Functionals in X∗
An element x∗ ∈ X∗ and a convex function ϕ : K → R∞ always lead to a composition ϕ ∘ ⟨x∗, ·⟩_{X∗×X} that is convex.
5. Composition with a linear map F ∘ A
If A : dom A ⊂ Y → X is a linear map defined on a subspace dom A and F : X → R∞ is convex, then the functional F ∘ A : Y → R∞ with
(F ∘ A)(y) = F(Ay) if y ∈ dom A and (F ∘ A)(y) = ∞ otherwise,
is also convex.
6. Convex functions in integrals
Let (Ω, F, μ) be a measure space, X = Lp(Ω, K^N) for some N ≥ 1, and ϕ : K^N → R∞ convex and lower semicontinuous. Then
F(u) = ∫_Ω ϕ(u(x)) dx
is convex, at least at the points where the integral exists (it may happen that |·| ∘ ϕ ∘ u is not integrable; the function ϕ ∘ u is, due to lower semicontinuity of ϕ, always measurable). A similar claim holds for strict convexity of ϕ, since for nonnegative f ∈ Lp(Ω), it is always the case that ∫_Ω f dx = 0 implies that f = 0 almost everywhere.
In particular, the norms ‖u‖p = (∫_Ω |u(x)|^p dx)^{1/p} in Lp(Ω, K^N) are strictly convex norms for p ∈ ]1, ∞[ if the vector norm |·| on K^N is strictly convex.
Remark 6.24 (Convexity in the Introductory Examples) The functional in (6.1) from
Examples 6.1 is strictly convex: We see this by Remark 6.22 or Lemma 6.21 and
item 2 of Example 6.23. Similarly one sees strict convexity of the functionals in (6.3)
and (6.5) for Examples 6.2–6.4.
Convex functions satisfy some continuity properties. These can be deduced, quite
remarkably, only from assumptions on boundedness. Convexity allows us to transfer
local properties to global properties.
Theorem 6.25 If F : X → R∞ is convex and there exists u0 ∈ X such that F is
bounded from above in a neighborhood of u0 , then F is locally Lipschitz continuous
at every interior point of dom F .
Proof We begin with the proof of the following claim: If F is bounded from above in a neighborhood of u0 ∈ X, then it is Lipschitz continuous in a neighborhood of u0. By assumption, there exist δ0 > 0 and R > 0 such that F(u) ≤ R for u ∈ Bδ0(u0). Moreover, F is bounded from below on Bδ0(u0), say by −L: for u ∈ Bδ0(u0) we have u0 = (1/2)u + (1/2)(2u0 − u), and consequently
F(u0) ≤ (1/2)F(u) + (1/2)F(2u0 − u),
i.e., F(u) ≥ 2F(u0) − R.
For distinct u, v ∈ Bδ0/2(u0) the vector w = u + α^{−1}(u − v) with α = 2‖u − v‖X/δ0 is still in Bδ0(u0), since
‖w − u0‖X ≤ ‖u − u0‖X + α^{−1}‖v − u‖X < δ0/2 + (δ0/(2‖u − v‖X))‖u − v‖X = δ0.
We write u = (1/(1 + α))v + (α/(1 + α))w as a convex combination, and conclude that
F(u) − F(v) ≤ (1/(1 + α))F(v) + (α/(1 + α))F(w) − F(v) = (α/(1 + α))(F(w) − F(v)) ≤ α(R + L) = C‖u − v‖X
(here we used the boundedness of F in Bδ0(u0) from above and below and the definition of α). Swapping the roles of u and v in this argument, we obtain the Lipschitz estimate |F(u) − F(v)| ≤ C‖u − v‖X in Bδ0/2(u0).
Finally, we show that every interior point u1 of dom F has a neighborhood on which F is bounded from above. To that end, let λ ∈ ]0, 1[ be such that F(λ^{−1}(u1 − (1 − λ)u0)) = S < ∞. Such a λ exists, since the mapping λ ↦ λ^{−1}(u1 − (1 − λ)u0) is continuous at λ = 1 and has the value u1 there.
Furthermore, for a given v ∈ B(1−λ)δ0(u1) we choose the vector u = u0 + (v − u1)/(1 − λ) (which is also in Bδ0(u0)). This v is a convex combination, since v = λ(λ^{−1}(u1 − (1 − λ)u0)) + (1 − λ)u, and we conclude that
Moreover, · Y is convex (cf. Example 6.23), and hence also weakly sequen-
tially lower semicontinuous.
The claim also holds if we assume that Y is the dual space of a separable
normed space (by weak* sequential compactness of the unit ball, see Theo-
rem 2.21) and that the embedding into X is weakly*-to-strongly closed.
2. Composition with a linear map F ∘ A
For reflexive Y, F : Y → R∞ convex, lower semicontinuous, and coercive and a strongly-to-weakly closed linear mapping A : X ⊃ dom A → Y, the composition F ∘ A is convex (see Example 6.23) and also lower semicontinuous: if un → u in X and lim inf_{n→∞} F(Aun) < ∞, then by coercivity, (‖Aun‖Y) is bounded. Hence, there exists a weakly convergent subsequence Aunk ⇀ v with v ∈ Y, and without loss of generality, we can assume that lim_{k→∞} F(Aunk) = lim inf_{n→∞} F(Aun). Moreover, unk → u, and we conclude that v = Au and thus
which together with Fatou’s lemma (Lemma 2.46) implies the inequality
F(u) = ∫_Ω ϕ(u(x)) dx ≤ ∫_Ω lim inf_{n→∞} ϕ(un(x)) dx ≤ lim inf_{n→∞} ∫_Ω ϕ(un(x)) dx = lim inf_{n→∞} F(un).
This proves the lower semicontinuity of F and, together with convexity, the weak sequential lower semicontinuity (Corollary 6.28).
Lower semicontinuity of convex functionals yields not only weak lower semicontinuity but also strong continuity in the interior of the effective domain.
Theorem 6.30 A convex and lower semicontinuous functional F : X → R∞ on a
Banach space X is continuous at every interior point of dom F .
Proof For the nontrivial case it is, by Theorem 6.25, enough to show that F is bounded from above in a neighborhood. We choose u0 ∈ int(dom F), R > F(u0) and define V = {u ∈ X | F(u) ≤ R}, so that the sets
Vn = {u ∈ X | u0 + (u − u0)/n ∈ V},
n ≥ 1, are a sequence of convex and closed sets (since F is convex and lower semicontinuous; see Remark 6.26). Moreover, Vn0 ⊂ Vn1 for n0 ≤ n1, since u0 + n1^{−1}(u − u0) = u0 + (n0/n1) n0^{−1}(u − u0) is, by convexity, contained in V if u0 + n0^{−1}(u − u0) ∈ V.
Finally, we see that for all u ∈ X, the convex function Fu : t ↦ F(u0 + t(u − u0)) is finite in a neighborhood of 0 (otherwise, u0 would not be an interior point of dom F). Without loss of generality, we assume that Fu is continuous even in this neighborhood, see Theorem 6.25. Hence, there exists n ≥ 1 such that u0 + n^{−1}(u − u0) ∈ V, i.e., u ∈ Vn, and consequently X = ∪_{n≥1} Vn.
By the Baire category theorem (Theorem 2.14), some Vn has an interior point, and hence V has an interior point, which implies the boundedness of F in a neighborhood. □
Now we state the direct method for the minimization of convex functionals.
Theorem 6.31 (The Direct Method for Convex Functionals in a Banach Space)
Let X be a reflexive Banach space and F : X → R∞ a convex, lower semicontinu-
ous, and coercive functional. Then, there is a solution of the minimization problem
min F (u).
u∈X
Re⟨x∗, u⟩ + t∗ F(u) ≥ λ ∀u ∈ X
and
hold. This shows that λ − t∗ < λ, i.e., t∗ > 0. For all R > 0 we get, for u ∈ X with ‖u‖X ≤ R, the estimate
(λ − R‖x∗‖X∗)/t∗ ≤ (λ − Re⟨x∗, u⟩)/t∗ ≤ F(u).
Coercivity of F implies the existence of some R > 0 such that F(u) ≥ 0 for all ‖u‖X ≥ R. This shows that F is bounded from below, and Theorem 6.17 shows that a minimizer exists.
If we now assume strict convexity of F and let u∗ ≠ u∗∗ be two minimizers of F, we obtain
min_{u∈X} F(u) = (1/2)F(u∗) + (1/2)F(u∗∗) > F((u∗ + u∗∗)/2) ≥ min_{u∈X} F(u),
a contradiction; hence the minimizer is unique in this case.
Φ(v) = (1/p)‖v‖Y^p,   Ψ(u) = (1/q)‖u‖X^q
with some λ > 0. Functionals F of this form are also called Tikhonov functionals.
They play an important role in the theory of ill-posed problems.
Using Lemma 6.21 and the considerations in Example 6.23, it is not hard to show that F : X → R∞ is convex. The functional is finite on all of X, hence continuous (Theorem 6.25), in particular lower semicontinuous. The term (λ/q)‖u‖X^q implies the coercivity of F (see Remark 6.13). Hence, we can apply Theorem 6.31 and see that there exists a minimizer u∗ ∈ X. Note that the penalty λΨ is crucial for this claim, since u ↦ Φ(Au − u0) is not coercive in general. If it were, there would exist A^{−1} on rg(A) and it would be continuous; in the context of inverse problems, Au = u0 would not be ill-posed (see also Exercise 6.6).
For the uniqueness of the minimizer we immediately see two sufficient conditions. On the one hand, strict convexity of ‖·‖X^q (q > 1) implies strict convexity of Ψ and also of F. On the other hand, an injective A and strictly convex ‖·‖Y^p (p > 1) lead to strict convexity of u ↦ Φ(Au − u0) and also of F. In both cases we obtain
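In the simplest finite-dimensional case X = Rn, Y = Rm with p = q = 2, the minimizer of the Tikhonov functional (1/2)‖Au − u0‖² + (λ/2)‖u‖² can be computed directly from the normal equations; a minimal sketch (illustrative only, not part of the general theory above):

    import numpy as np

    def tikhonov_l2(A, u0, lam):
        # minimizer of 1/2 ||A u - u0||^2 + lam/2 ||u||^2,
        # i.e., the solution of (A^T A + lam I) u = A^T u0
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ u0)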
and thus (6.8). Now assume that (6.8) holds. Swapping the roles of u and v and
adding the inequalities gives
⟨w, v̄⟩ ≤ lim_{t→0+} (F(u + t v̄) − F(u))/t = ⟨DF(u), v̄⟩,
and similarly for t < 0, we get ⟨w, v̄⟩ = ⟨DF(u), v̄⟩. This shows that w = DF(u). □
Fig. 6.7 Left: Interpretation of the derivative of a convex function at some point as the slope of the respective affine support function. Right: The characterization does not hold outside of the interior of the domain K. Both G1(v) = F(u) + DF(u)(v − u) and G2(v) = F(u) + w(v − u) with w > DF(u) are estimated by F in K
Remark 6.34 For convex Gâteaux differentiable functionals we can characterize the
derivative at any interior point u of K by the inequality (6.9). More geometrically,
this means that there is an affine linear support functional at the point u, that is tight
at F (u) and below F on the whole of K. The “slope” of this functional must be
equal to DF (u); see Fig. 6.7.
Corollary 6.35 If in Theorem 6.33 the functional F is also convex and if DF(u∗) = 0 holds for some u∗ ∈ K, then u∗ is a minimizer of F in K.
Proof Just plug u∗ into (6.8). □
In the special case K = X we obtain that
if F : X → R is convex and Gâteaux differentiable, then u∗ is a minimizer if and only if
DF (u∗ ) = 0.
Example 6.36 (Euler-Lagrange Equation for Example 6.1) Let us consider again Example 6.1, now with real-valued u0 and respective real spaces L²(Rd) and H¹(Rd). The functional F from (6.1) is Gâteaux differentiable, and it is easy to compute the derivative as
DF(u)(v) = ∫_{Rd} (u(x) − u0(x)) v(x) dx + λ ∫_{Rd} ∇u(x) · ∇v(x) dx.
A minimizer u∗ is therefore characterized by DF(u∗)(v) = 0 for all v ∈ H¹(Rd). This is a weak formulation (see also Example 6.4) of the equation
u∗ − λΔu∗ = u0 in Rd.
min F (u),
u∈[−1,1]
−DF(u∗) = lim_{t→0+} (F(1 − t) − F(1))/t ≥ 0,
and similarly, in the case u∗ = −1 that DF (u∗ ) ≥ 0. By (6.8) these conditions are
also sufficient for u∗ to be a minimizer. Hence, we can characterize optimality of u∗
as follows: there exists μ∗ ∈ R such that
The variable μ∗ is called the Lagrange multiplier for the constraint |u|2 − 1 ≤ 0. In
the next section we will investigate this notion in more detail.
Since we are interested in minimization problems in (infinite-dimensional)
Banach spaces, we ask ourselves how we can transfer the above technique to this
setting. For a domain Ω ⊂ Rd and F : L²(Ω) → R convex and differentiable on the real Hilbert space L²(Ω) and u∗ a solution of
min_{u∈L²(Ω), ‖u‖2≤1} F(u),
min_{u∈L²(Ω), ‖u‖∞≤1} F(u).
If we have a minimizer with ‖u∗‖∞ < 1, we cannot conclude that DF(u∗) = 0, since the set {u ∈ L²(Ω) | ‖u‖∞ ≤ 1} has empty interior (otherwise, the embedding
It is easy to see that a unique solution exists. The functional F is convex, continuous,
and coercive, but not differentiable. Nonetheless, a suitable case distinction allows
us to determine the solution. As an example, we treat the case a = 1, b = 0, and
c = 1.
1. If u∗1 ≠ 0 and u∗2 ≠ 0, then DF(u∗) = 0 has to hold, i.e.,
u∗1 − f1 + sgn(u∗1) = 0 and u∗2 − f2 + sgn(u∗2) = 0, i.e., u∗1 + sgn(u∗1) = f1 and u∗2 + sgn(u∗2) = f2,
and hence |f1| > 1 as well as |f2| > 1 (since we would get a contradiction otherwise). It is easy to check that the solution is
u∗ = (f1 − sgn(f1), f2 − sgn(f2))
in this case.
2. If u∗1 = 0 and u∗2 ≠ 0, then F is still differentiable with respect to u2, and we obtain u∗2 + sgn(u∗2) = f2, and hence |f2| > 1 and u∗2 = f2 − sgn(f2).
3. For u∗1 ≠ 0 and u∗2 = 0 we obtain similarly |f1| > 1 and u∗1 = f1 − sgn(f1).
4. The case u∗1 = u∗2 = 0 does not lead to any new conclusion.
All in all, we get
u∗ = (0, 0) if |f1| ≤ 1, |f2| ≤ 1,
u∗ = (f1 − sgn(f1), 0) if |f1| > 1, |f2| ≤ 1,
u∗ = (0, f2 − sgn(f2)) if |f1| ≤ 1, |f2| > 1,
u∗ = (f1 − sgn(f1), f2 − sgn(f2)) if |f1| > 1, |f2| > 1,
since anything else would contradict the conclusions of the above cases 1–3.
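For a = c = 1 and b = 0 the case distinction above amounts to component-wise soft-thresholding of f; the following sketch (illustrative, with scipy's Nelder-Mead as an independent numerical check) compares the closed-form solution with a direct minimization of the objective:

    import numpy as np
    from scipy.optimize import minimize

    def soft_threshold(f, t=1.0):
        # closed-form minimizer of 1/2|u|^2 - f.u + t(|u_1| + |u_2|)
        # (the case a = c = 1, b = 0 treated above)
        return np.sign(f) * np.maximum(np.abs(f) - t, 0.0)

    f = np.array([2.3, -0.7])               # |f_1| > 1, |f_2| <= 1
    F = lambda u: 0.5 * np.dot(u, u) - np.dot(f, u) + np.abs(u).sum()
    u_numeric = minimize(F, x0=np.zeros(2), method='Nelder-Mead').x
    print(soft_threshold(f), u_numeric)     # both close to [1.3, 0.0]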
In the case of general a, b, c, the computations get a little bit more involved, and
a similar claim holds in higher dimensions. In the case of infinite dimensions with
A ∈ L(ℓ², ℓ²) symmetric and positive definite, f ∈ ℓ², however, we cannot apply the above reasoning to the problem
min_{u∈ℓ²} (u, Au)/2 − (f, u) + Σ_{i=1}^∞ |ui|,
since the objective functional is nowhere continuous, and consequently not differ-
entiable.
The above example shows again that a unified treatment of minimization
problems with nondifferentiable (or even better, noncontinuous) convex objectives
is desirable. The subdifferential is an appropriate tool in these situations.
Some preparations are in order before we define subgradients and the subdifferen-
tial.
Lemma 6.39 Let X be a complex normed space. Then there exist a real normed space XR and norm-preserving maps iX : X → XR and jX∗ : X∗ → XR∗ such that
⟨jX∗ x∗, iX x⟩ = Re⟨x∗, x⟩ for all x ∈ X and x∗ ∈ X∗.
Proof The complex vector space X turns into a real one XR by restricting the scalar multiplication to real numbers. Then ‖·‖XR = ‖·‖X is a norm on XR, and hence iX = id maps X → XR and preserves the norm. We define jX∗ : X∗ → XR∗ via
⟨jX∗ x∗, x⟩_{XR∗×XR} = Re⟨x∗, iX^{−1} x⟩_{X∗×X} ∀x ∈ XR.
It remains to show that jX∗ preserves the norm. On the one hand, we have
‖jX∗ x∗‖_{XR∗} = sup_{‖x‖XR ≤ 1} |Re⟨x∗, iX^{−1} x⟩| ≤ sup_{‖x‖X ≤ 1} |⟨x∗, x⟩| = ‖x∗‖X∗.
On the other hand, we can choose for every sequence (xn) in X with ‖xn‖X ≤ 1 and |⟨x∗, xn⟩| → ‖x∗‖X∗ the sequence x̄n = iX(sgn(⟨x∗, xn⟩) xn) in XR, which also
1. A set-valued mapping F : X ⇒ Y, or graph, is a subset F ⊂ X × Y. We write F(x) = {y ∈ Y | (x, y) ∈ F} and use y ∈ F(x) synonymously to (x, y) ∈ F.
2. For every mapping F : X → Y we denote its graph also by F = {(x, F(x)) | x ∈ X} and use F(x) = y and F(x) = {y} interchangeably.
3. For set-valued mappings F, G : X ⇒ Y and λ ∈ R let
(F + G)(x) = {y1 + y2 | y1 ∈ F(x), y2 ∈ G(x)},
(λF)(x) = {λy | y ∈ F(x)}.
The first function is differentiable on R\{0}. It has a kink at the origin, and it is easy to see that st ≤ (1/2)t² + t for all t ≥ 0 if and only if s ≤ 1. Similarly, st ≤ (3/10)t² − (1/4)t holds for all t ≤ 0 if and only if s ≥ −1/4. Hence, ∂ϕ1(0) = [−1/4, 1].
The function ϕ2 is constant on ]−1, 1[, hence differentiable there with derivative
0. At the point 1 we note that s(t −1) ≤ 0 for all t ∈ [−1, 1] if and only if s ≥ 0. This
shows that ∂ϕ2 (1) = [0, ∞[. A similar argument shows that ∂ϕ2 (−1) = ]−∞, 0],
and the subgradient is empty at all other points. In conclusion we get
∂ϕ1(t) = {(6/10)t − 1/4} if t < 0,  ∂ϕ1(0) = [−1/4, 1],  ∂ϕ1(t) = {t + 1} if t > 0,
∂ϕ2(t) = ]−∞, 0] if t = −1,  ∂ϕ2(t) = {0} if t ∈ ]−1, 1[,  ∂ϕ2(t) = [0, ∞[ if t = 1,  ∂ϕ2(t) = ∅ otherwise.
Fig. 6.8 Example of subdifferentials of convex functions. The top row shows the graphs, the bottom row the respective subdifferentials. The gray affine supporting functionals correspond to the gray points in the subdifferential. The function ϕ1 is differentiable except at the origin, and there the subgradient is a compact interval; the subgradient of the indicator functional ϕ2 is the nonpositive semiaxis at −1 and the nonnegative semiaxis at 1. Outside of [−1, 1] it is empty
But since the closure of int(A) equals the closure of A (Exercise 6.9), there exists for every x ∈ A a sequence (xn) in int(A) converging to x, and hence Re⟨x∗, x⟩ = lim_{n→∞} Re⟨x∗, xn⟩ ≤ λ. □
Now we collect some fundamental properties of the subdifferential.
Theorem 6.46 Let F : X → R∞ be a convex function on a real normed space X.
Then the subdifferential ∂F satisfies the following:
1. For every u, the set ∂F (u) is a convex weakly* closed subset of X∗ .
2. If F is also lower semicontinuous, then ∂F is a strongly-weakly* and also
weakly-strongly closed subset of X × X∗, i.e., for sequences ((un, wn)) in ∂F,
un → u in X, wn ⇀∗ w in X∗   or   un ⇀ u in X, wn → w in X∗   ⇒   (u, w) ∈ ∂F.
Taking the supremum over ‖v‖X < 1, we obtain ‖w‖X∗ ≤ δ^{−1}, and hence ∂F(u) is bounded.
To show that ∂F(u) is not empty, note that epi F has nonempty interior, since the open set Bδ(u) × ]F(u) + 1, ∞[ is a subset of the epigraph of F. Moreover, (u, F(u)) is not in int(epi F), since every (v, t) ∈ int(epi(F)) satisfies t > F(v). Now we choose A = epi F and B = {(u, F(u))} in Lemma 6.45 to get 0 ≠ (w0, t0) ∈ X∗ × R,
F (u) + w · (v − u) ≤ F (v),
then w ∈ ∂F (u).
Proof We show that for every v 0 ∈ dom F there exists a sequence (v n ) in V with
limn→∞ v n = v 0 and lim supn→∞ F (v n ) ≤ F (v 0 ). Then the claim follows by
taking limits in the subgradient inequality.
Let v0 ∈ dom F and set
K = max {k ∈ N | ∃ u1, . . . , uk ∈ dom F, u1 − v0, . . . , uk − v0 linearly independent}.
Then there exist u1, . . . , uK ∈ dom F with
dom F ⊂ U = v0 + span(u1 − v0, . . . , uK − v0).
Consider the sets
Sn = { v0 + Σ_{i=1}^K λi(ui − v0) | Σ_{i=1}^K λi ≤ 1/n, λ1, . . . , λK ≥ 0 }
for n ≥ 1 and note that their interior with respect to the relative topology on U is not empty. Hence, for every n ≥ 1 there exists some vn ∈ Sn ∩ V, since V is also dense in Sn. We have
vn = v0 + Σ_{i=1}^K λni(ui − v0) = (1 − Σ_{i=1}^K λni) v0 + Σ_{i=1}^K λni ui
for suitable λn1, . . . , λnK ≥ 0 with Σ_{i=1}^K λni ≤ 1/n. Thus, lim_{n→∞} vn = v0 and by convexity of F
lim sup_{n→∞} F(vn) ≤ lim sup_{n→∞} ((1 − Σ_{i=1}^K λni) F(v0) + Σ_{i=1}^K λni F(ui)) = F(v0). □
It is easy to see that ∂IK(u) is a convex cone: for w1, w2 ∈ ∂IK(u) we also have w1 + w2 ∈ ∂IK(u), and for w ∈ ∂IK(u), α ≥ 0, also αw ∈ ∂IK(u). This cone is called the normal cone (of K at u). Moreover, for u ∈ K it is always the case that 0 ∈ ∂IK(u), i.e., the subgradient is nonempty exactly on K. The special case K = U + u0, with a closed subspace U and u0 ∈ X, leads to ∂IK(u) = U⊥ for all u ∈ K.
If K ≠ ∅ satisfies
K = {u ∈ X | G(u) ≤ 0},  G : X → R convex and Gâteaux differentiable,
For G(u) < 0 we need to show that ∂IK(u) = {0}. By continuity of G we have for a suitable δ > 0 that G(u + v) < 0 for all ‖v‖X < δ, and hence, every w ∈ ∂IK(u) satisfies the inequality ⟨w, v⟩ = ⟨w, u + v − u⟩ ≤ 0 for all ‖v‖X < δ; thus w = 0 has to hold. Consequently, {0} is the only element in ∂IK(u).
For G(u) = 0 the claim is ∂IK(u) = {μ DG(u) | μ ≥ 0}. We argue that DG(u) ≠ 0 has to hold: otherwise, u would be a minimizer of G and the functional could not take on negative values. Now choose w ∈ ∂IK(u). For every v ∈ X with ⟨DG(u), v⟩ = −αv < 0, Gâteaux differentiability enables us to find some t > 0 such that
G(u + tv) − G(u) − ⟨DG(u), tv⟩ ≤ (αv/2) t.
This implies G(u + tv) ≤ t(αv/2 + ⟨DG(u), v⟩) = −t αv/2 < 0 and hence u + tv ∈ K. Plugging this into the subgradient inequality we get
Fig. 6.9 Visualization of the normal cone associated with convex constraints. Left: The normal cone for the set K = {G ≤ 0} with Gâteaux differentiable G consists of the nonnegative multiples of the derivative DG(u). The plane ⟨DG(u), v − u⟩ = 0 is “tangential” at u, and K is contained in the corresponding nonpositive halfspace ⟨DG(u), v − u⟩ ≤ 0. Right: An example of a convex set for which the normal cone ∂IK(u) at some point u contains more than one linearly independent direction
To prove this claim, let w ∈ ∂F(u) for ‖u‖X ≤ R. For every vector v ∈ X with ‖v‖X = ‖u‖X, the subgradient inequality (6.10) implies
ϕ(‖u‖X) + ⟨w, v − u⟩ ≤ ϕ(‖u‖X)  ⇒  ⟨w, v⟩ ≤ ⟨w, u⟩ ≤ ‖w‖X∗ ‖u‖X.
Taking the supremum over all ‖v‖X = ‖u‖X, we obtain ⟨w, u⟩ = ‖w‖X∗ ‖u‖X.
For u = 0, we get, additionally, by the subgradient inequality that for fixed t ≥ 0 and all ‖v‖X = t, one has
ϕ(0) + ⟨w, v − 0⟩ ≤ ϕ(‖v‖X) = ϕ(t)  ⇒  ϕ(0) + ‖w‖X∗ t ≤ ϕ(t).
And since t ≥ 0 is arbitrary, we get ‖w‖X∗ ∈ ∂ϕ(‖u‖X). (For the latter claim we have implicitly extended ϕ by ϕ(t) = inf_{s≥0} ϕ(s) for t < 0.) For the case u ≠ 0 we plug v = t‖u‖X^{−1} u for some t ≥ 0 into the subgradient inequality,
ϕ(t) = ϕ(‖v‖X) ≥ ϕ(‖u‖X) + ⟨w, v − u⟩ = ϕ(‖u‖X) + ‖w‖X∗ (t − ‖u‖X),
and conclude that ‖w‖X∗ ∈ ∂ϕ(‖u‖X) also in this case.
To prove the reverse inclusion, let w ∈ X∗ be such that ⟨w, u⟩ = ‖w‖X∗ ‖u‖X and ‖w‖X∗ ∈ ∂ϕ(‖u‖X). For all v ∈ X, we have
ϕ(‖u‖X) + ⟨w, v − u⟩ ≤ ϕ(‖u‖X) + ‖w‖X∗ (‖v‖X − ‖u‖X) ≤ ϕ(‖v‖X),
i.e., w ∈ ∂F(u).
Theorem 6.46 says that ∂F(u) ≠ ∅ for all ‖u‖X < R. If, additionally, ∂ϕ(R) ≠ ∅ holds, we also have ∂F(u) ≠ ∅ for all ‖u‖X ≤ R, since there always exists w ∈ X∗ that satisfies ⟨w, u⟩ = ‖w‖X∗ ‖u‖X if the norm ‖w‖X∗ is prescribed.
If X is a Hilbert space, we can describe ∂F a little bit more concretely. Using the Riesz map JX^{−1} we argue as follows: For u ≠ 0, one has ⟨w, u⟩ = ‖w‖X∗ ‖u‖X if and only if (u, JX^{−1} w) = ‖JX^{−1} w‖X ‖u‖X, which in turn is equivalent to the existence of some λ ≥ 0 such that JX^{−1} w = λu holds. Then, the condition ‖w‖X ∈ ∂ϕ(‖u‖X) becomes λ ∈ ∂ϕ(‖u‖X)/‖u‖X, and thus the subgradient is given by λ JX u with λ ∈ ∂ϕ(‖u‖X)/‖u‖X. For u = 0, it consists exactly of those JX v ∈ X∗ with ‖v‖X ∈ ∂ϕ(0). In conclusion, we get
∂F(u) = (∂ϕ(‖u‖X)/‖u‖X) JX u if u ≠ 0,   and   ∂F(0) = ∂ϕ(0) JX {‖v‖X = 1}.
This can be seen as follows: If w ∈ Lp∗(Ω, R^N) = Lp(Ω, R^N)∗ satisfies the condition w(x) ∈ ∂ϕ(u(x)) almost everywhere, we take any v ∈ Lp(Ω, R^N) and plug v(x) for almost every x into the subgradient inequality for ϕ and get, after integration,
∫_Ω ϕ(u(x)) dx + ⟨w, v − u⟩_{Lp∗×Lp} ≤ ∫_Ω ϕ(v(x)) dx,
i.e., w ∈ ∂F(u). For the reverse inclusion, let w ∈ ∂F(u). Then for every v ∈ Lp(Ω, R^N), we have
∫_Ω ( ϕ(v(x)) − ϕ(u(x)) − w(x) · (v(x) − u(x)) ) dx ≥ 0.
Now choose an at most countable set V ⊂ dom ϕ, which is dense in dom ϕ. For every v̄ ∈ V and every measurable A ⊂ Ω with μ(A) < ∞ we can plug vA(x) = χA v̄ + χ_{Ω\A} u in the subgradient inequality and get
∫_A ( ϕ(v̄) − ϕ(u(x)) − w(x) · (v̄ − u(x)) ) dx ≥ 0
and consequently
ϕ(u(x)) + w(x) · (v̄ − u(x)) ≤ ϕ(v̄) for almost every x ∈ Ω.
Since V is countable, the union of all sets where the above does not hold is still a nullset, and hence we get that for almost every x ∈ Ω,
ϕ(u(x)) + w(x) · (v̄ − u(x)) ≤ ϕ(v̄) for all v̄ ∈ V.
By Lemma 6.47 we finally conclude that w(x) ∈ ∂ϕ(u(x)) for almost every x ∈ Ω.
Now we prove some useful rules for subdifferential calculus. Most rules are
straightforward generalizations of the respective rules for the classical derivatives,
sometimes with additional assumptions.
In this context we denote translations as follows: Tu0 u = u+u0 . Since this notion
is used in this chapter exclusively, there will be no confusion with the distributions
Tu0 induced by u0 (cf. Sect. 2.3). The rules of subgradient calculus are particularly
useful to find minimizers (cf. Theorem 6.43). Note that these rules often require
additional continuity assumptions.
Theorem 6.51 (Calculus for Subdifferentials) Let X, Y be real normed spaces,
F, G : X → R∞ proper convex functionals, and A : Y → X linear and continuous.
The subdifferential obeys the following rules:
1. ∂(λF ) = λ∂F for λ > 0,
2. ∂(F ◦ Tu0 )(u) = ∂F (u + u0 ) for u0 ∈ X,
3. ∂(F + G) ⊃ ∂F + ∂G and ∂(F + G) = ∂F + ∂G if F is continuous at some
point u0 ∈ dom F ∩ dom G,
4. ∂(F ◦ A) ⊃ A∗ ◦ ∂F ◦ A and ∂(F ◦ A) = A∗ ◦ ∂F ◦ A if F is continuous at some
point u0 ∈ rg(A) ∩ dom F .
Proof Assertions 1 and 2: It is simple to check the rules by direct application of the
definition.
Assertion 3: The inclusion is immediate: for u ∈ X, w1 ∈ ∂F (u) and w2 ∈
∂G(u) the subgradient inequality (6.10) implies
F (u) + G(u) + w1 + w2 , v − u
= F (u) + w1 , v − u + G(u) + w2 , v − u ≤ F (v) + G(v)
for all v ∈ X.
For the reverse inclusion, let w ∈ ∂(F + G)(u), which implies u ∈ dom F ∩
dom G and hence
With F̄ (v) = F (v) − w, v the inequality becomes F̄ (v) − F̄ (u) ≥ G(u) − G(v).
Now we aim to find a suitable linear functional that “fits” between this inequality,
i.e., some w2 ∈ X∗ for which
We note that K1 , K2 are nonempty convex sets, and moreover, int(K1 ) is not empty
(the latter due to continuity of F̄ in u0 ). Also we note that int(K1 ) ∩ K2 = ∅,
since (v, t) ∈ int(K1 ) implies t > F̄ (v) − F̄ (u) while (v, t) ∈ K2 means that
G(u) − G(v) ≥ t. If both were satisfied we would get a contradiction to (6.11).
Lemma 6.45 implies that there exist 0 ≠ (w0, t0) ∈ X∗ × R and λ ∈ R such that
⟨w0, v⟩ + t0 (t − F̄(u)) ≤ λ  ∀v ∈ dom F, F̄(v) ≤ t,
⟨w0, v⟩ + t0 (G(u) − t) ≥ λ  ∀v ∈ dom G, G(v) ≤ t.
Now we show that t0 < 0. The case t0 > 0 leads to a contradiction by letting v = u,
t > F̄ (u) and t → ∞. In the case t0 = 0, we would get w0 , v ≤ λ for all v ∈
dom F and especially w0 , u0 < λ, since u0 is in the interior of dom F . However,
since u0 ∈ dom G, we also get w0 , u0 ≥ λ, which is again a contradiction.
With t = F̄ (v) and t = G(v), respectively, we obtain
We aim to introduce a separating linear functional into this inequality that amounts to a separation of the nonempty convex sets
K1 = epi F,   K2 = {(Av, F(Au) + ⟨w, v − u⟩) ∈ X × R | v ∈ Y}.
The case t0 > 0 cannot occur, and t0 ≠ 0 follows from the continuity of F at u0 and u0 ∈ dom F ∩ rg(A). If we set v̄ = Au, t = F(v̄) and v = u, we also conclude that λ = ⟨w0, Au⟩ + t0 F(Au). By the second inequality in (6.14) we get
0 ≤ (Fv(t) − Fv(0))/t − ⟨w, v⟩.
On the other hand, for every ε > 0, there exists tε > 0 such that Fv(tε) < Fv(0) + tε ⟨w, v⟩ + tε ε (note that ⟨w, v⟩ + ε ∉ ∂Fv(0)). By convexity of Fv, for every t ∈ [0, tε] one has
Fv(t) ≤ (t/tε) Fv(tε) + ((tε − t)/tε) Fv(0) ≤ Fv(0) + t ⟨w, v⟩ + t ε,
and hence
(Fv(t) − Fv(0))/t − ⟨w, v⟩ ≤ ε.
Since ε > 0 was arbitrary, it follows that lim_{t→0+} (1/t)(Fv(t) − Fv(0)) = ⟨w, v⟩, which proves Gâteaux differentiability and DF(u0) = w. □
Remark 6.53
• The assertion in item 4 of Theorem 6.51 remains valid in other situations as well.
If, for example, rg(A) = X and F is convex, then ∂(F ◦ A) = A∗ ◦ ∂F ◦ A
without additional assumptions on continuity (cf. Exercise 6.11).
As another example, in the case of a densely defined linear mapping A :
dom A ⊂ Y → X, dom A = Y , we obtain the same formula for ∂(F ◦ A)
(cf. Example 6.23) when the adjoint has to be understood in the sense of
Definition 2.25 (Exercise 6.12).
• One can also generalize the continuity assumptions. Loosely speaking, the sum
rule holds if continuity of F and G at one point holds relatively to some subspaces
whose sum is the whole of X and on which one can project continuously.
Analogously, the continuity of F with respect to a subspace that contains the
complement of rg(A) is sufficient for the chain rule to hold (cf. Exercises 6.13–
6.16).
On suitable spaces, there are considerably more “subdifferentiable” convex
functions than Gâteaux-differentiable or continuous ones. We prove this claim using
the sum rule.
Theorem 6.54 (Existence of Nonempty Subdifferentials) Let F : X → R∞
be proper, convex and lower semicontinuous on a reflexive Banach space X. Then
∂F = ∅.
Proof Choose some u0 ∈ X with F (u0 ) < ∞ and t0 < F (u0 ) and consider the
minimization problem
min_{(v,t)∈X×R} ‖v − u0‖²X + (t − t0)² + Iepi F(v, t).   (6.15)
Since F is convex and lower semicontinuous, Iepi F is convex and lower semicontinuous (Remark 6.26 and Example 6.29). The functional (v, t) ↦ (‖v‖²X + t²)^{1/2} is a norm on X × R, and hence convex, continuous, and coercive (Example 6.23 and Remark 6.13), and obviously, the same holds for the functional G defined by G(v, t) = ‖(v − u0, t − t0)‖². Hence, problem (6.15) satisfies all assumptions of Theorem 6.31 (see also Lemmas 6.14 and 6.21) and consequently, it has a minimizer (u, τ) ∈ epi F.
We prove that τ ≠ t0. If τ = t0 held, we would get u ≠ u0, and also that the points (u + λ(u0 − u), t0 + λ(F(u0) − t0)), λ ∈ [0, 1], lie in epi F; for these we would get
G(u + λ(u0 − u), t0 + λ(F(u0) − t0)) = (λ − 1)² ‖u − u0‖²X + λ² (F(u0) − t0)².
Setting a = ‖u − u0‖²X and b = (F(u0) − t0)², we calculate that the right-hand side is minimal for λ = a/(a + b) ∈ ]0, 1[, which leads to
and
for all (v, t) ∈ X × R. One has s ≤ 0, since for s > 0 we obtain a contradiction to (6.16) by letting v = u, t > τ and t → ∞. The case s = 0 can also not occur: We choose v = u and t = t0 in (6.17), and since τ ≠ t0 we obtain
inequality τ ≤ F (u), moreover, since (u, τ ) ∈ epi F , we even get τ = F (u). For
v ∈ dom F and t = F (v) we finally get
w, v − u + s F (v) − F (u) ≤ 0 ∀v ∈ dom F,
min F (u)
u∈K
are exactly the u∗ for which 0 ∈ ∂(F + IK )(u∗ ). By Theorem 6.51 we can write
∂(F + IK )(u∗ ) = ∂F (u∗ ) + ∂IK (u∗ ) = DF (u∗ ) + ∂IK (u∗ ). Using the result of
Example 6.48 we get the optimality condition
K = ∩_{m=1}^M Km,   Km = {u ∈ X | Gm(u) ≤ 0},
∂IK(u) = Σ_{m=1}^M ∂IKm(u) = { Σ_{m=1}^M μm DGm(u) | μm ≥ 0, μm Gm(u) = 0 }  if u ∈ K,  and  ∂IK(u) = ∅ otherwise,
u∗ ∈ K :  DF(u∗) + Σ_{m=1}^M μ∗m DGm(u∗) = 0,  μ∗m ≥ 0,  μ∗m Gm(u∗) = 0  for m = 1, . . . , M.   (6.19)
In this context, one calls the variables μ∗m ≥ 0 the Lagrange multipliers for
the constraints {Gm ≤ 0}. These have to exist for every minimizer u∗ .
The subdifferential calculus provides an alternative approach to optimality
for general convex constraints. The w∗ in (6.18) corresponds to the linear
combination of the derivatives DGm (u∗ ) in (6.19), and the existence of Lagrange
multipliers μ∗m is abstracted by the condition that w∗ is in the normal cone of K.
2. Tikhonov functionals
We consider an example similar to Example 6.32. Let X be a real Banach
space, Y a Hilbert space, A ∈ L(X, Y ), u0 ∈ Y , and p ∈ ]1, ∞[. Moreover, let
Z → X be a real Hilbert space that is densely and continuously embedded in
X, λ > 0, and q ∈ ]1, ∞[. We aim at optimality conditions for the minimization
of the Tikhonov functional (6.7). We begin with the functional Φ(v) = (1/p)‖v‖Y^p,
∂Φ(v) = (JY v) ‖v‖Y^{p−2}.
The objective function F from (6.7) is continuous, and hence we can apply the
sum rule for subgradients to obtain
q
To calculate ∂!, we note that we can write ! as a concatenation of q1 · Z and
the inverse embedding i −1 from X to Z with domain of definition dom i −1 = Z.
By construction, i −1 is a closed and densely defined mapping. By continuity
of the norm in Z, the respective chain rule (Remark 6.53) holds, hence ∂! =
(i −1 )∗ ◦ ∂ q1 · Z ◦ i −1 . The space X∗ is densely and continuously embedded
q
For the minimizer u∗ of the Tikhonov functional F , it must be the case that
u∗ ∈ Z, (JZ u∗ ) ∈ X∗ , and
min_{u∈X} (1/2)‖u − u0‖²X + λ‖u‖Y   (6.20)
for some λ > 0. This could, for example, model a denoising problem (see
Example 6.1, where X = L2 (Rd ), but the penalty is a squared seminorm in
H 1 (Rd )). To reformulate the problem, we identify X = X∗ and write Y ⊂ X =
X∗ ⊂ Y ∗ . Every u ∈ Y is mapped via w = j ∗ j u to some w ∈ Y ∗ , where
j ∗ : X∗ → Y ∗ denotes the adjoint of the continuous embedding. It is easy to
see that X = X∗ → Y ∗ densely, and hence
λ‖u‖Y = sup {⟨w, u⟩ | w ∈ Y∗, ‖w‖Y∗ ≤ λ} = sup {(w, u) | w ∈ X, ‖w‖Y∗ ≤ λ}.
min_{u∈X} sup_{‖w‖Y∗ ≤ λ} (1/2)‖u − u0‖²X + (w, u).
Now assume that we can swap the minimum and the supremum (in general one has
only “sup inf ≤ inf sup”, see Exercise 6.19) to obtain
inf_{u∈X} sup_{‖w‖Y∗ ≤ λ} (1/2)‖u − u0‖²X + (w, u) = sup_{‖w‖Y∗ ≤ λ} inf_{u∈X} (1/2)‖u − u0‖²X + (w, u).
and hence is minimal for u = u0 − w. Plugging this into the functional, we obtain
The maximization problem on the right-hand side is the dual problem to (6.20).
Obviously, it is equivalent (in the sense that the solutions are the same) to the
projection problem
min_{w ∈ X}  ‖u0 − w‖²_X / 2 + I_{{‖·‖_{Y*} ≤ λ}}(w).

The latter has the unique solution w* = P_{{‖·‖_{Y*} ≤ λ}}(u0), since projections onto
nonempty, convex, and closed sets in Hilbert spaces are well defined (note that
{‖w‖_{Y*} ≤ λ} is closed by the continuous embedding j*: X* → Y*).
For ‖w‖_{Y*} ≤ λ and u ∈ X, one has
where equality holds if and only if one plugs in the respective optimal solutions u∗
and w∗ of the primal problem (6.20) and the dual problem (6.21). We rewrite the
last inequality as
pointwise.
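The following Python sketch illustrates this primal–dual relation in a finite-dimensional setting (an illustration of ours, assuming X = Y = R^n with the Euclidean norm): w* is obtained by projecting u0 onto the ball of radius λ, and u* = u0 − w* agrees with the known closed-form minimizer of (6.20) in this case.

```python
import numpy as np

def solve_primal_via_dual(u0, lam):
    """min_u 0.5*||u - u0||^2 + lam*||u||  with X = Y = R^n (Euclidean norm).
    Dual problem: project u0 onto the ball {||w|| <= lam}; then u* = u0 - w*."""
    norm_u0 = np.linalg.norm(u0)
    w_star = u0 if norm_u0 <= lam else lam * u0 / norm_u0
    return u0 - w_star, w_star

u0, lam = np.array([3.0, -4.0, 1.0]), 2.0
u_star, w_star = solve_primal_via_dual(u0, lam)

# Closed form for comparison: block soft-thresholding of u0.
u_ref = max(0.0, 1.0 - lam / np.linalg.norm(u0)) * u0
print(np.allclose(u_star, u_ref))   # True
```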
Proof For F ≡ ∞, we just set K0 = X∗ × R. Otherwise, we construct in the
following for every pair (u, t) ∈ X × R with t < F (u) a pair (w, s) ∈ X∗ × R such
that
⟨w, u⟩_{X*×X} + s > t   and   ⟨w, v⟩_{X*×X} + s < F(v) for all v ∈ X.
If we set K0 as the collection of all these (w, s), we obtain the claim by
and the observation that for every fixed u ∈ X and all t < F (u), one has
So let (u, t) with t < F (u) < ∞ be given (such a pair exists in this case).
Analogously to Theorem 6.31 we separate {(u, t)} from the closed, convex, and
nonempty set epi F by some (w0 , s0 ) ∈ X∗ × R, i.e., for some λ ∈ R and ε > 0,
Hence, w = −s0−1 w0 and s = s0−1 λ with τ = F (v) gives the desired inequalities
w, u + s > t and w, v + s < F (v) for all v ∈ X. In particular, K0 is not empty.
Now we treat the case t < F (u) = ∞, where we can also find some (w0 , s0 ) ∈
X* × R and λ ∈ R, ε > 0 with the above properties. If there exists v ∈ dom F with
⟨w0, u − v⟩ > −2ε, we plug in v and τ > max{t, F(v)} and obtain
and thus s0 > 0; moreover, we can choose (w, s) as above. If such a v does not
exist, then w0 , v − u + 2ε ≤ 0 for all v ∈ dom F . By the above consideration
there exists a pair (w∗ , s ∗ ) ∈ X∗ × R with w∗ , · + s∗ < F . For this, one has with
c > 0 that
Analogous claims hold for functionals on the dual space, i.e., for G : X∗ → R∞
with
spt(G) = {(u, t) ∈ X × R : ⟨·, u⟩_{X*×X} + t ≤ G}.
equivalent to
⟨w, u⟩ − F(u) ≤ −s for all u ∈ X   ⇔   s ≤ − sup_{u∈X} (⟨w, u⟩ − F(u)),

and hence F*(w) = sup_{u∈X} ⟨w, u⟩ − F(u). This motivates the following definition.
Definition 6.60 (Dual Functional) Let F : X → R∞ be a proper functional on a
real Banach space X. Then the dual functional (or Fenchel conjugate) F*: X* → R∞ is defined by F*(w) = sup_{u∈X} ⟨w, u⟩_{X*×X} − F(u).
Lemma 6.63 Let X be a real Banach space. The Fenchel conjugation ∗: Γ₀(X) → Γ₀(X*) is invertible with inverse ∗: Γ₀(X*) → Γ₀(X).
Proof First note that the conjugation on Γ₀(X) and Γ₀(X*), respectively, maps by
definition to the respective sets (Remark 6.62). For F ∈ Γ₀(X), there exists ∅ ≠
K0 ⊂ X* × R with F = sup_{(w,s)∈K0} ⟨w, ·⟩ + s. By definition, K0 is contained in
spt(F), and thus, by Remark 6.62
A similar claim is true for the conjugate of G = I{wX∗ ≤λ} . The latter situation
occurred in Example 6.56.
More generally, for F(u) = ϕ(‖u‖_X) with a proper and even ϕ : R → R∞
(i.e., ϕ(x) = ϕ(−x)), one has F*(w) = ϕ*(‖w‖_{X*}). For instance, for ϕ(t) = (1/p)|t|^p with p ∈ ]1, ∞[, Young's inequality gives

st ≤ (1/p)|t|^p + (1/p*)|s|^{p*}   ⇒   ϕ*(s) ≤ (1/p*)|s|^{p*} = ψ(s),

and on the other hand, we obtain for t = sgn(s)|s|^{1/(p−1)} that

st − (1/p)|t|^p = (1 − 1/p)|s|^{p/(p−1)} = (1/p*)|s|^{p*},

hence ϕ*(s) = (1/p*)|s|^{p*}. Analogously,

ϕ(t) = |t| ⇒ ϕ*(s) = I_{[−1,1]}(s),   ϕ(t) = I_{[−1,1]}(t) ⇒ ϕ*(s) = |s|.
Fig. 6.10 Graphical visualization of Fenchel duality in dimension one. Left: A convex (and
continuous) function F . Right: Its conjugate F ∗ (with inverted s-axis). The plotted lines are
maximal affine linear functionals below the graphs, their intersections with the respective s-axis
correspond to the negative values of F ∗ and F , respectively (suggested by dashed lines). For some
slopes (smaller than −1) there are no affine linear functionals below F , and consequently, F ∗
equals ∞ there
In other words, mw = w, · − F ∗ (w) is, for given w ∈ X∗ , the largest affine linear
functional below F . We have mw (0) = −F ∗ (w), i.e. the intersection of the graph
of mw and the s-axis is the negative value of the dual functional at w. This allows
us to construct the conjugate in a graphical way, see Fig. 6.10.
We collect some obvious rules for Fenchel conjugation.
Lemma 6.65 (Calculus for Fenchel Conjugation) Let F1 : X → R∞ be a
proper functional on a real Banach space X.
1. For λ ∈ R and F2 = F1 + λ, we have F2∗ = F1∗ − λ,
2. for λ > 0 and F2 = λF1 , we have F2∗ = λF1∗ ◦ λ−1 id,
3. for u0 ∈ X, w0 ∈ X∗ , and F2 = F1 ◦ Tu0 + w0 , · we have F2∗ = F1∗ −
· , u0 ◦ T−w0 ,
4. for a real Banach space Y and K ∈ L(Y, X) continuously invertible, we have for
F2 = F1 ◦ K that F2∗ = F1∗ ◦ (K −1 )∗ .
Proof Assertion 1: For w ∈ X∗ we get by definition that
F2*(w) = sup_{u∈X} (⟨w, u⟩ − F1(u) − λ) = sup_{u∈X} (⟨w, u⟩ − F1(u)) − λ = F1*(w) − λ.
= F1∗ (w − w0 ) − w − w0 , u0 .
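These rules can be checked numerically with a discrete Legendre–Fenchel transform. The following sketch (our own illustration, one-dimensional and grid-based, so only an approximation of the conjugate) verifies rules 1 and 2 of Lemma 6.65 for F1(u) = u²/2.

```python
import numpy as np

# Discrete Legendre-Fenchel transform on a grid (an approximation of F*).
u = np.linspace(-5, 5, 2001)
w = np.linspace(-3, 3, 601)

def conjugate(F_vals):
    # F*(w) = sup_u (w*u - F(u)), evaluated on the grid.
    return np.max(np.outer(w, u) - F_vals[None, :], axis=1)

F1 = 0.5 * u**2              # F1(u) = u^2/2, whose conjugate is w^2/2
lam = 0.7

# Rule 1: (F1 + lam)* = F1* - lam
lhs = conjugate(F1 + lam)
rhs = conjugate(F1) - lam
print(np.max(np.abs(lhs - rhs)))                 # ~0

# Rule 2: (lam*F1)*(w) = lam * F1*(w/lam); exact value here is w^2/(2*lam).
lhs2 = conjugate(lam * F1)
print(np.max(np.abs(lhs2 - w**2 / (2 * lam))))   # small grid error
```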
In view of Lemmas 6.14 and 6.21, we may ask how conjugation acts for
pointwise suprema and sums. Similarly for the calculus of the subdifferential
(Theorem 6.51), this question is a little delicate. Let us first look at pointwise
suprema. Let {Fi}, i ∈ I ≠ ∅, be a family of proper functionals Fi : X → R∞
with ⋂_{i∈I} dom Fi ≠ ∅. For w ∈ X* we deduce:

(sup_{i∈I} Fi)*(w) = sup_{u∈X} (⟨w, u⟩ − sup_{i∈I} Fi(u)) = sup_{u∈X} inf_{i∈I} {⟨w, u⟩ − Fi(u)}
                   ≤ inf_{i∈I} sup_{u∈X} (⟨w, u⟩ − Fi(u)) = inf_{i∈I} Fi*(w).
It is natural to ask whether equality holds, i.e., whether infimum and supremum can
be swapped. Unfortunately, this is not true in general: we know that the Fi∗ are con-
vex and lower semicontinuous, but these properties are not preserved by pointwise
infima, i.e. infi∈I Fi∗ is in general neither convex nor lower semicontinuous. Hence,
this functional is not even a conjugate in general. However, it still contains enough
information to extract the desired conjugate:
Theorem 6.66 (Conjugation of Suprema) Let I ≠ ∅ and Fi : X → R∞, i ∈ I,
and sup_{i∈I} Fi be proper on a real Banach space X. Then

(sup_{i∈I} Fi)* = (inf_{i∈I} Fi*)**.
= F1∗ (w 1 ) + F2∗ (w 2 ).
(F1 + F2)*(w) ≤ inf_{w = w1 + w2} (F1*(w1) + F2*(w2)) = (F1* F2*)(w).   (6.23)
If we assume that we can swap infimum and supremum and that the supremum is
actually assumed, then this turns into
sup_{w∈Y*} inf_{u∈X} ⟨A*w, u⟩ + F1(u) − F2*(w) = max_{w∈Y*} −F1*(−A*w) − F2*(w).
It remains to clarify whether infimum and supremum can be swapped and whether
the supremum is realized. In other words, we have to show that the minimum (6.24)
equals the maximum in (6.25). The following theorem gives a sufficient criterion
for this to hold.
Theorem 6.68 (Fenchel-Rockafellar Duality) Let F1 : X → R∞ , F2 : Y →
R∞ be proper, convex, and lower semicontinuous on the real Banach spaces X and
Y , respectively. Further, let A : X → Y be linear and continuous, and suppose that
the minimization problem
This means that there exists w∗ ∈ Y ∗ such that −A∗ w∗ ∈ ∂F1 (u∗ ) and w∗ ∈
∂F2 (Au∗ ). Now we reformulate the subgradient inequality for −A∗ w∗ ∈ ∂F1 (u∗ ):
= F1∗ (−A∗ w∗ ).
Similarly we obtain w∗ , Au∗ − F2 (Au∗ ) ≥ F2∗ (w∗ ). Adding these inequalities,
we get
and hence
On the other hand, we have by Remark 6.62 (see also Exercise 6.19) that
We conclude that
which is the desired equality for the supremum. We also see that it is assumed at w∗ .
□
Remark 6.69 The assumption that there exists some u0 ∈ X such that F1 (u0 ) <
∞, F2 (Au0 ) < ∞ and that F2 is continuous at Au0 is used only to apply the
sum rule and the chain rule for subdifferentials. Hence, we can replace it with the
assumption that ∂(F1 + F2 ◦ A) = ∂F1 + A* ◦ ∂F2 ◦ A. See Exercises 6.11–6.15
for more general sufficient conditions for this to hold.
The previous proof hinges on the applicability of rules for subdifferential
calculus and hence fundamentally relies on the separation of suitable convex sets. A
closer inspection of the proof reveals the following primal-dual optimality system:
Corollary 6.70 (Fenchel-Rockafellar Optimality System) If for proper, convex,
and lower semicontinuous functionals F1 : X → R∞ and F2 : Y → R∞ it is the
case that
This is equivalent to
−A∗ w∗ , u∗ = F1 (u∗ ) + F1∗ (−A∗ w∗ ) and w∗ , Au∗ = F2 (Au∗ ) + F2∗ (w∗ ).
F1(u) = λ‖u‖_X,    F2(v) = (1/2)‖v − u0‖²_Y,

and obtain as primal problem (6.24)

min_{u ∈ X}  ‖Au − u0‖²_Y / 2 + λ‖u‖_X,
Substituting w̄ = −w, flipping the sign, and dropping terms independent of w̄, we
see that the problem is equivalent to a projection problem in a Hilbert space:
w̄* = arg min_{‖A*w̄‖_{X*} ≤ λ}  ‖u0 − w̄‖²_Y / 2   ⇔   w̄* = P_{{‖A*·‖_{X*} ≤ λ}}(u0).
The optimal w∗ = −w̄∗ and every solution u∗ of the primal problem satisfy (6.27),
and in particular we have
and we see that u0 − w̄∗ lies in the image of A even if this image is not closed. If
A is injective, we can apply its inverse (which is not necessarily continuous) and
obtain
u* = A⁻¹(u0 − P_{{‖A*·‖_{X*} ≤ λ}}(u0)).
This is a formula for the solution of the minimization problem (however, often of
limited use in practice), and also we have deduced that the result from Example 6.56
in the case A = I was correct.
At the end of this section we give a more geometric interpretation of the solution
of the primal-dual problem.
Remark 6.72 (Primal-Dual Solutions and Saddle Points) We can interpret the
simultaneous solution of the primal and dual problem as follows. We define the
Lagrange functional L : dom F1 × dom F2∗ → R by
and observe that every optimal pair (u∗ , w∗ ) ∈ X × Y ∗ , in the situation of (6.26),
has to satisfy the inequalities
which means that (u∗ , w∗ ) is a solution of the primal-dual problem. Hence, the
saddle points of L are exactly the primal-dual solutions.
This fact will be useful in deriving so-called primal-dual algorithms for the
numerical solution of the primal problem (6.24). The basic idea of these methods is
to find minimizers of the Lagrange functional in the primal direction and respective
maximizers in the dual direction. In doing so, one can leverage the fact that L has
a simpler structure than the primal and dual problems. More details on that can be
found in Sect. 6.4.
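As a small preview (a sketch of our own under the assumption of a finite-dimensional, discretized setting; the algorithms themselves are the subject of Sect. 6.4), the following primal-dual iteration of Chambolle–Pock type alternates proximal steps on the Lagrange functional for F1(u) = λ‖u‖₁, F2(v) = ½‖v − u0‖², A = id, a case whose exact minimizer is soft-thresholding.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# L(u, w) = <u, w> + lam*||u||_1 - (0.5*||w||^2 + <w, u0>)  with A = id,
# i.e. F1(u) = lam*||u||_1 and F2(v) = 0.5*||v - u0||^2.
u0 = np.array([3.0, -0.5, 1.2, -2.0])
lam, tau, sigma = 1.0, 0.9, 0.9          # tau*sigma*||A||^2 < 1

u = np.zeros_like(u0); w = np.zeros_like(u0); u_bar = u.copy()
for _ in range(500):
    w = (w + sigma * (u_bar - u0)) / (1.0 + sigma)   # prox of sigma*F2*
    u_new = soft(u - tau * w, tau * lam)             # prox of tau*F1
    u_bar = 2.0 * u_new - u
    u = u_new

print(u)                       # approximately the minimizer of the primal problem
print(soft(u0, lam))           # exact minimizer: soft-thresholding of u0
```

The iteration only evaluates simple proximal mappings of F1 and F2*, which is precisely the structural advantage of the saddle-point formulation mentioned above.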
Now we pursue the goal to apply the theory we developed in the previous sections
to convex variational problems in imaging. As described in the introduction and
in Examples 6.1–6.4, it is important to choose a good model for images; in our
terminology, to choose the penalty ! appropriately. Starting from the space H 1 ()
we will consider Sobolev spaces with more general exponents. However, we will
see that these spaces are not satisfactory for many imaging tasks, since they do not
allow a proper treatment of discontinuities, i.e. jumps, in the gray values as they
occur on object borders. This matter is notably different in the space of functions
with bounded total variation, and consequently, this space plays an important role
in mathematical imaging. We will develop the basic theory for this Banach space in
the context of convex analysis and apply it to some concrete problems.
We begin with the analysis of problems involving the Sobolev seminorm in
H^{m,p}(Ω). To that end, we first collect some important notions and results from the
theory of Sobolev spaces.
Lemma 6.73 Let Ω ⊂ R^d be a domain, α ∈ N^d a multiindex, and 1 ≤ p, q < ∞.
Then the weak partial derivative ∂^α/∂x^α, viewed as the mapping

∂^α/∂x^α : dom(∂^α/∂x^α) = { u ∈ L^p(Ω) : ∂^α u/∂x^α ∈ L^q(Ω) } → L^q(Ω),

is densely defined and strongly-to-weakly closed.

Proof Let (u_n) be a sequence in dom(∂^α/∂x^α) with u_n → u in L^p(Ω) and ∂^α u_n/∂x^α ⇀ v in L^q(Ω). Every test function ϕ ∈ D(Ω) lies in the dual
spaces L^{p*}(Ω) and L^{q*}(Ω), and hence, by definition of the weak derivative,

∫_Ω (∂^α ϕ/∂x^α) u dx = lim_{n→∞} ∫_Ω (∂^α ϕ/∂x^α) u_n dx = lim_{n→∞} (−1)^{|α|} ∫_Ω (∂^α u_n/∂x^α) ϕ dx = (−1)^{|α|} ∫_Ω vϕ dx.

Since this holds for every test function, we see that v = ∂^α u/∂x^α as desired.
We show that the mapping ∂^α/∂x^α is densely defined: every u ∈ D(Ω) also lies in
L^p(Ω) and has a continuous (strong) derivative of order α; in particular, ∂^α u/∂x^α ∈
L^q(Ω). This shows that D(Ω) ⊂ dom(∂^α/∂x^α), and using Lemma 3.16 we obtain the
claim. □
In Sect. 3.3 on linear filters we have seen that smooth functions are dense in
Sobolev spaces with = Rd . The density of C ∞ () for bounded , however,
needs some regularity of the boundary of .
Theorem 6.74 Let ⊂ Rd be a bounded Lipschitz domain, m ≥ 1, and
1 ≤ p < ∞. Then there exists a sequence of linear operators (Mn ) in
L(H m,p (), H m,p ()) such that rg(Mn ) ⊂ C ∞ () for all n ≥ 1 and the property
that for every u ∈ H m,p (),
for every multiindex |α| ≤ m. The following argument for m = 1 can be easily
extended by induction for the full proof. For i = 1, . . . , d and v ∈ D(), one has
ϕ_k (∂v/∂x_i) ∈ D(Ω) and

ϕ_k (∂v/∂x_i) = ∂/∂x_i (ϕ_k v) − (∂ϕ_k/∂x_i) v.
Hence,

∫_Ω u ϕ_k (∂v/∂x_i) dx = ∫_Ω u ∂/∂x_i (ϕ_k v) − u (∂ϕ_k/∂x_i) v dx = −∫_Ω (ϕ_k (∂u/∂x_i) + u (∂ϕ_k/∂x_i)) v dx,

so that ϕ_k u possesses the weak derivative ∂(ϕ_k u)/∂x_i = ϕ_k ∂u/∂x_i + (∂ϕ_k/∂x_i) u. Iterating this argument and applying the Leibniz rule yields

‖∂^α(ϕ_k u)‖_p ≤ Σ_{β ≤ α} (α over β) ‖∂^{α−β} ϕ_k‖_∞ ‖∂^β u‖_p ≤ C‖u‖_{m,p},
and hence u → ϕk u is a linear and continuous map from H m,p () to itself.
In the following we write uk = ϕk u and set η0 = 0. Our aim is to translate
uk in the direction of ηk from the segment condition and to convolve the translated
function such that uk is evaluated only on a compact subset of . To that end,
choose (tn ) in ]0, 1] with tn → 0 such that Vk − tn ηk ⊂⊂ Uk holds for all k and
n. Let ψ ∈ D(Rd ) be a mollifier and choose, for every n, an εn > 0 such that the
support of the scaled function ψ n = ψεn satisfies
( − supp ψ n + tn ηk ) ∩ Vk ⊂⊂
for all k = 0, . . . , K. This is possible, since by the segment condition and the choice
of tn , we have
( + tn ηk ) ∩ Vk = ∩ (Vk − tn ηk ) + tn ηk ⊂⊂ .
M_n u = Σ_{k=0}^{K} (T_{t_n η_k}(ϕ_k u)) ∗ ψ^n,   (6.30)
which is, by the above consideration, in L(H m,p (), H m,p ()). Since ψ is a
mollifier, we also have Mn u ∈ C ∞ () for all n and u ∈ H m,p (). Note that
the construction of Mn is indeed independent of m and p. It remains to show that
Mn u − um,p → 0 for n → ∞.
To that end, let ε > 0. Since ϕ0 , . . . , ϕK is a partition of unity, it follows that
‖M_n u − u‖_{m,p} ≤ Σ_{k=0}^{K} ‖(T_{t_n η_k} u_k) ∗ ψ^n − u_k‖_{m,p}.

Since translation is continuous in L^p(Ω), for n large enough one has

‖∂^α(T_{t_n η_k} u_k − u_k)‖_p < ε / (2M(K + 1))
for all multiindices with |α| ≤ m, and we denote the number of these multiindices
by M. With vk,n = Ttn ηk uk we get, by Theorem 3.15, the property that translation
and the weak derivative commute, and by Lemma 3.16 that for n large enough,
The claim is still true for the functions ϕ(t) = min(a, t) and ϕ(t) = max(a, t) with
some a ∈ R (by abuse of notation we set ϕ (a) = 0 here).
Proof For u ∈ C ∞ (), the result follows from the usual chain rule. For general
u ∈ H 1,p () we choose a sequence (un ) in C ∞ () that converges to u in the
Hence, ϕ (un )∇un ϕ (u)∇u in Lp (, Rd ), and the claim follows from the
strong-to-weak closedness of the weak derivative (Lemma 6.73).
Now let ϕ(t) = min(a, t) and choose, for ε > 0,
ϕ_ε(t) = √((t − a)² + ε²) − ε + a   if t > a,   and   ϕ_ε(t) = a   otherwise,
where, of course, the derivatives are weak derivatives. In the following we also
understand ∇^m u as an R^{d^m}-valued mapping, i.e., ∇^m u(x) is a d × d × ··· × d
tuple with

(∇^m u)_{i_1,i_2,…,i_m} = (∂/∂x_{i_1})(∂/∂x_{i_2}) ··· (∂/∂x_{i_m}) u.

By the symmetry of higher derivatives, one has for every permutation π :
{1, …, m} → {1, …, m} that (∇^m u)_{i_1,…,i_m} = (∇^m u)_{i_{π(1)},…,i_{π(m)}}, so that each component is determined by a multiindex α with |α| = m, and

|ξ| = ( Σ_{i_1,…,i_m=1}^{d} |ξ_{i_1,…,i_m}|² )^{1/2} = ( Σ_{|α|=m} (m over α) |ξ_α|² )^{1/2}.   (6.32)
Remark 6.78
The Sobolev seminorm in (6.31) is slightly different from the usual
definition Σ_{|α|=m} ‖∂^α u‖_p, but both are equivalent. We choose the form (6.31) to
ensure that the norm is both translation-invariant and also rotation-invariant; see
Exercise 6.25.
To construct these mappings Pm we start with the calculation of the kernel of the
seminorms ∇ m · p .
Lemma 6.79 Let be a bounded domain, m ≥ 1, and 1 ≤ p, q < ∞. Then for
m
u ∈ Lq () and ∇ m u ∈ Lp (, Rd ), one has that ∇ m up = 0 if and only if
u ∈ "m () = {u : → R u is a polynomial of degree < m}.
is called the projection onto "m ; the mapping Pm : Lq () → Lq () defined by
Pm = id −Qm is the projection onto the complement of "m .
It is clear that the set of monomials {x → x α |α| < m} is a basis for "m ().
Hence, the projection is well defined.
One easily sees that Q2m = Qm . The map Pm = id −Qm is also a projection with
ker(Pm ) = "m and should be a map suitable for (6.33) to hold. The upper estimate
is clear: since Qm u ∈ "m holds and the seminorm can be estimated by the Sobolev
norm, it follows that
for all u ∈ H m,p (). The following lemma establishes the other inequality:
Lemma 6.81 Let ⊂ Rd be a bounded Lipschitz domain, 1 < p < ∞, m ≥ 1.
There exists a constant C > 0 such that for all u ∈ H m,p () with Qm u = 0 one
has
Proof Let us assume that the inequality is wrong, i.e., that there exists a sequence
(un ) in H m,p () with Qm un = 0, un m−1,p = 1, such that ∇ m un p ≤ n1 for all
n, i.e., ∇ m un → 0 for n → ∞. Since H m,p () is reflexive and un is bounded in
the respective norm, we can also assume that un u for some u ∈ H m,p () with
Qm u = 0. By the weak closedness of ∇ m we see that also ∇ m u = 0 has to hold. By
Lemma 6.79 we have u ∈ "m and consequently u = 0, since Qm u = 0.
By the compact embedding into H m−1,p () (see Theorem 6.76) we obtain
Pm un m−1,p → 0 for n → ∞. This is a contradiction to Pm un m−1,p = 1. $
#
Corollary 6.82 (Poincaré-Wirtinger Inequality) In the above situation, for k =
0, . . . , m one has
Proof First, let k = m. We plug Pm u into (6.34), and get Pm um−1,p ≤
C∇ m Pm up = C∇ m up . Adding ∇ m Pm up = ∇ m up on both sides and
using the fact that · m−1,p +∇ m · p is equivalent to the norm in H m,p () yields
the claim.
The case k < m follows from the estimate uk,p ≤ um,p for all u ∈
H m,p (). $
#
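A quick numerical illustration of the Poincaré–Wirtinger inequality for m = 1, k = 0 (our own sketch, on Ω = ]0, 1[ with zero-mean trigonometric polynomials, so the functions satisfy Q_1 u = 0): the ratio ‖u‖_p / ‖u'‖_p stays bounded by a fixed constant.

```python
import numpy as np

# Poincare-Wirtinger on ]0,1[: ||u||_p <= C ||u'||_p whenever the mean of u vanishes.
# Estimate the ratio for random zero-mean trigonometric polynomials.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 4001)
p = 1.5
norm = lambda f: (np.mean(np.abs(f) ** p)) ** (1.0 / p)   # L^p norm, |Omega| = 1

ratios = []
for _ in range(200):
    c = rng.normal(size=5)
    k = np.arange(1, 6)
    u = (c[:, None] * np.cos(np.pi * k[:, None] * x)).sum(axis=0)    # zero mean
    du = (-c[:, None] * np.pi * k[:, None] * np.sin(np.pi * k[:, None] * x)).sum(axis=0)
    ratios.append(norm(u) / norm(du))

print(max(ratios))   # bounded uniformly over all samples
```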
Now we prove the desired properties of the Sobolev penalty.
Lemma 6.83 Let ⊂ Rd be a bounded Lipschitz domain and let ϕ : [0, ∞[ →
R∞ be proper, convex, lower semicontinuous, and non-decreasing. Further, let 1 <
p < ∞ and m ≥ 0. Then the functional
!(u) = ϕ(‖∇^m u‖_p)   if u ∈ H^{m,p}(Ω),   and   !(u) = ∞   otherwise,
By Lemma 3.16, ∇ m is densely defined, and we aim to show that it is also strongly-
to-weakly closed. To that end, let (un ) be a sequence in Lq () ∩ H m,p (),
converging in Lq () to some u ∈ Lq () that also satisfies ∇ m un v in
m
Lp (, Rd ). By Lemma 6.73 we get that v = ∇ m u, but it remains to show that
u∈H m,p ().
Here we apply (6.35) and conclude that the sequence (Pm un ) in H m,p ()
is bounded. By reflexivity we can assume, by moving to a subsequence, that
Pm un w for some w ∈ H m,p (), and by the compact embedding from
Theorem 6.76 we get Pm un → w and, as a consequence of the embedding of the
Lebesgue spaces, un → u in L1 (). This shows strong convergence of the sequence
Qm un = un − Pm un → u − w in the finite dimensional space "m . We can view this
space as a subspace of H m,p () and hence, we have Qm un → u − w in H m,p ()
by equivalence of norms. This gives un = Pm un + Qm un w + u − w = u in
H m,p (); in particular, u is contained in this Sobolev space.
Since ∇ m is strongly-to-weakly closed, we get from Example 6.23 and
Lemma 6.21 the convexity and by Example 6.29 and Lemmas 6.28 and 6.14
the (weak) lower semicontinuity of !.
To prove coercivity, let q ≤ pd/(d − mp) if mp < d and let (un ) be a sequence
in Lq () with Pm un q → ∞. Now assume that there exists a L > 0 such that
∇ m un p ≤ L for infinitely many n. For these n, one has un ∈ H m,p () and by
the Poincaré-Wirtinger inequality (6.35) and the continuous embedding of H m,p ()
into Lq () by Theorem 6.76, we obtain
Now we have all the ingredients for proving the existence of solutions of
minimization problems with Sobolev seminorms as penalty.
Theorem 6.84 (Existence of Solutions with Sobolev Penalty) Let ⊂ Rd be a
bounded Lipschitz domain, m ∈ N, m ≥ 1, 1 < p, q < ∞ with q ≤ pd/(d − mp)
if mp < d. Moreover, let : Lq () → R∞ be proper on H m,p (), convex, lower-
semicontinuous on Lq (), and coercive on "m in the sense that
Moreover, set
!(u) = ϕ(‖∇^m u‖_p)   if u ∈ H^{m,p}(Ω),   and   !(u) = ∞   otherwise.
If ϕ is strictly convex, any two solutions u∗ and u∗∗ differ only in "m .
Proof By assumption and using Lemma 6.83 we see that F = + λ! is proper,
convex, and lower semicontinuous on the reflexive Banach space Lq (). To apply
Theorem 6.31, we need only to show coercivity.
First, note that is bounded from below by an affine linear functional, and by
∗
Theorem 6.54 we can even choose u0 ∈ Lq (), w0 ∈ Lq () such that (u0 ) +
w0 , u − u0 ≤ (u) for all u ∈ Lq (). In particular, we obtain the boundedness
of from below on bounded sets. Now assume that un q → ∞ for a sequence
(un ) in Lq (). For an arbitrary subsequence (unk ) consider the sequences (Pm unk )
and (Qm unk ). We distinguish two cases. First, if Pm unk q is bounded, Qm unk q
has to be unbounded, and by moving to a subsequence we get Qm unk q → ∞. By
assumption we get (unk ) → ∞ and, since !(unk ) ≥ 0, also F (unk ) → ∞.
Second, if Pm unk q is unbounded, we get by Lemma 6.83 (again moving to a
subsequence if necessary) !(unk ) → ∞. If, moreover, Qm unk q is bounded, then
since the term in parentheses goes to infinity by the strong coercivity of ! (again,
cf. Lemma 6.83). In the case that Qm unk q is unbounded, we obtain (again moving
to a subsequence if necessary) (unk ) → ∞ and hence F (unk ) → ∞.
Since the above reasoning holds for every subsequence, we see that for the whole
sequence we must have F (un ) → ∞, i.e., F is coercive. By Theorem 6.31 there
exists a minimizer u∗ ∈ Lq ().
Finally, let u* and u** be minimizers with u* − u** ∉ "m. Then ∇^m u* ≠
∇^m u**, and since ‖·‖_p on L^p(Ω, R^{d^m}) is based on a Euclidean norm on R^{d^m}
(see Definition (6.31) and explanations in Example 6.23 for norms and convex
integrands), for strictly convex ϕ we have

ϕ(‖∇^m(u* + u**)/2‖_p) < (1/2) ϕ(‖∇^m u*‖_p) + (1/2) ϕ(‖∇^m u**‖_p),

and hence F((u* + u**)/2) < (1/2)F(u*) + (1/2)F(u**), a contradiction. We conclude that
u∗ − u∗∗ ∈ "m . $
#
Remark 6.85
• One can omit the assumption q ≤ pd/(d − mp) for mp < d if is coercive on
the whole space Lq ().
• Strong coercivity ϕ can be replaced by mere coercivity if is bounded from
below.
• If is strictly convex, minimizers are unique without further assumptions on ϕ.
We can apply the above existence result to Tikhonov functionals that are
associated with the inversion of linear and continuous mappings.
Theorem 6.86 (Tikhonov Functionals with Sobolev Penalty) Let , d, m, p be
as in Theorem 6.84, q ∈ ]1, ∞[, and Y a Banach space and A ∈ L(Lq (), Y ). If
one of the conditions
1. q ≤ pd/(d − mp) for mp < d and A injective on "m
2. A injective and rg(A) closed
is satisfied, then there exists for every u0 ∈ Y , r ∈ [1, ∞[, and λ > 0 a solution for
the minimization problem
min_{u ∈ L^q(Ω)}  ‖Au − u0‖_Y^r / r + λ ‖∇^m u‖_p^p / p.   (6.36)
In the case r > 1 and strictly convex norm in Y , the solution is unique.
Proof To apply Theorem 6.84 in the first case, it suffices to show coercivity in the
sense that Pm un q bounded and Qm uq → ∞ ⇒ 1r Au − u0 rY → ∞. All
other conditions are satisfied by the assumptions or are simple consequences of them
(see also Example 6.32).
Now consider A restricted to the finite-dimensional space "m and note that this
restriction is, by assumption, injective and hence boundedly invertible on rg(A|"m ).
Hence, there is a C > 0 such that uq ≤ CAuY for all u ∈ "m . Now let
(un ) be a sequence in Lq () with Pm un q bounded and Qm un q → ∞. Then
(APm un − u0 Y ) is also bounded (by some L > 0), and since Qm projects onto
"m , one has
in that space. We use the inner product coming from the norm (6.32), namely

a · b = Σ_{i_1,…,i_m=1}^{d} a_{i_1,…,i_m} b_{i_1,…,i_m}   for a, b ∈ R^{d^m}.

What is (∇^m)* w for w ∈ D(Ω, R^{d^m})? We test with u ∈ C^∞(Ω) ⊂ dom ∇^m, and get

∫_Ω w · ∇^m u dx = Σ_{i_1,…,i_m=1}^{d} ∫_Ω w_{i_1,…,i_m} ∂_{i_1} ··· ∂_{i_m} u dx
                 = (−1)^m Σ_{i_1,…,i_m=1}^{d} ∫_Ω (∂_{i_1} ··· ∂_{i_m} w_{i_1,…,i_m}) u dx.

Since C^∞(Ω) is dense in L^p(Ω), we see that the adjoint is the differential operator
on the right-hand side. In the case m = 1 this amounts to ∇* = −div, and hence
we write

(∇^m)* = (−1)^m div^m = (−1)^m Σ_{i_1,…,i_m=1}^{d} ∂_{i_1} ··· ∂_{i_m}.
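The identity ∇* = −div also has a simple discrete counterpart, which the following sketch (our own, one-dimensional, with forward differences) makes explicit: the matrix of the discrete divergence is exactly the negative transpose of the discrete gradient.

```python
import numpy as np

n = 6
# Forward-difference gradient on a 1D grid:
# (grad u)_i = u_{i+1} - u_i for i < n-1, and 0 in the last row.
grad = np.zeros((n, n))
for i in range(n - 1):
    grad[i, i], grad[i, i + 1] = -1.0, 1.0

div = -grad.T   # discrete analogue of div = -(grad)^*

# Adjointness: <grad u, w> = <u, grad^* w> = -<u, div w>, i.e. the sum below is 0.
rng = np.random.default_rng(1)
u, w = rng.normal(size=n), rng.normal(size=n)
print(np.dot(grad @ u, w) + np.dot(u, div @ w))   # 0 up to rounding
```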
We write v = divm w if the weak divergence exists. Similarly to Lemma 6.73 one
∗ m ∗
can show that this defines a closed operator between Lp (, Rd ) and Lq ().
Since (∇ m )∗ is also closed, we obtain for (wn ) in D(, Rd ) with limn→∞ wn = w
m
∗ ∗
in Lp (, Rd ) and limn→∞ (−1)m divm wn = v in Lq () also (∇ m )∗ w = v =
m
(−1)^m div^m w with the weak mth divergence. We have shown that for

D^m_div = { w ∈ L^{p*}(Ω, R^{d^m}) : ∃ div^m w ∈ L^{q*}(Ω) and a sequence (w_n) in D(Ω, R^{d^m})
            with lim_{n→∞} ‖w_n − w‖_{p*} + ‖div^m(w_n − w)‖_{q*} = 0 },   (6.37)
w_n = M_n* w = Σ_{k=0}^{K} ϕ_k (T_{−t_n η_k}(w ∗ ψ̄^n)),   and set   Ω_n = ⋃_{k=0}^{K} (Ω − supp ψ^n + t_n η_k) ∩ V_k,

and by the fundamental lemma of the calculus of variations (Lemma 2.75) we also
get w_n = 0 on Ω∖Ω_n. This shows that w_n ∈ D(Ω, R^d).
∗
The sequence wn converges weakly in Lp (, Rd ) to w: for u ∈ Lp (, Rd ),
one has Mn u → u in Lp (, Rd ) and hence wn , u = w, Mn u → w, u.
Moreover, for u ∈ dom ∇, one has
M_n ∇u = ∇(M_n u) − Σ_{k=0}^{K} (T_{t_n η_k}(u ∇ϕ_k)) ∗ ψ^n = ∇(M_n u) − N_n u
lim_{n→∞} N_n u = Σ_{k=0}^{K} u ∇ϕ_k = 0   in L^q(Ω, R^d)
holds, since (ϕk ) is a partition of unity. This implies that for every u ∈ Lq (),
since Nn up ≤ CNn uq because we have p ≤ q. By the uniform boundedness
∗
principle (Theorem 2.15) we get that (− div wn ) is bounded in Lq (, Rd ), and
hence there exists a weakly convergent subsequence. By the weak closedness of
− div, the weak limit of every weakly convergent subsequence has to coincide with
− div w; consequently, the whole sequence converges, i.e., − div wn − div w as
n → ∞.
We have shown that
dom ∇* = { w ∈ L^{p*}(Ω, R^d) : ∃ div w ∈ L^{q*}(Ω) and a sequence (w_n) in D(Ω, R^d)
           with w_n ⇀ w in L^{p*}(Ω, R^d) and div w_n ⇀ div w in L^{q*}(Ω) }.
Now assume that there exist ε > 0 and w0 ∈ dom ∇ ∗ for which w − w0 p∗ ≥ ε
or div(w − w0 )q ∗ ≥ ε for every w ∈ D(, Rd ). Then by the definition of the
dual norm as a supremum, there exists v ∈ Lp (, Rd ), vp ≤ 1, or u ∈ Lq (),
uq ≤ 1, with
⟨w − w0, v⟩ ≥ ε/2   or   ⟨div(w − w0), u⟩ ≥ ε/2   for all w ∈ D(Ω, R^d).
This is a contradiction, and hence we can replace weak by strong convergence and
get dom ∇* = D¹_div, as desired. □
Remark 6.89 The domain of definition of the adjoint ∇ ∗ can be interpreted in a
different way. To that end, we consider certain boundary values on ∂.
and with Theorem 2.73 and the fundamental lemma of the calculus of variations
(Lemma 2.75) we get that v − w · ν = 0 Hd−1 -almost everywhere on ∂. This
motivates a more general definition of the so-called normal trace on the boundary.
We say that v ∈ L¹_{H^{d−1}}(∂Ω) is the normal trace of the vector field w ∈
L^{p*}(Ω, R^d) with div w ∈ L^{q*}(Ω) if there exists a sequence (w_n) in C^∞(Ω̄, R^d)
with w_n → w in L^{p*}(Ω, R^d), div w_n → div w in L^{q*}(Ω), and w_n · ν → v in
L¹_{H^{d−1}}(∂Ω), such that for all u ∈ C^∞(Ω̄),

∫_{∂Ω} u v dH^{d−1} = ∫_Ω u div w + ∇u · w dx.
Remark 6.90 The closed operator ∇ m from Lq () to Lp (, Rd ) on the bounded
Lipschitz domain with q ≤ pd/(d − mp) if mp < d has a closed range: If (un )
m
is a sequence in Lq () such that limn→∞ ∇ m un = v for some v ∈ Lp (, Rd ),
then (Pm un ) is a Cauchy sequence, since the Poincaré-Wirtinger inequality (6.35)
and the embedding into Lq () (Theorem 6.76) lead to
Proof The convex functional F(v) = (1/p)‖v‖_p^p defined on L^p(Ω, R^d) is, as a pth
power of a norm, continuous everywhere, in particular at every point of rg(∇).
Since ! = F ∘ ∇, we can apply the identity ∂! = ∇* ∘ ∂F ∘ ∇ in the sense
of Definition 6.41 (see Exercise 6.12). By the rule for subdifferentials for convex
integrands (Example 6.50) as well as Gâteaux differentiability of ξ ↦ (1/p)|ξ|^p, we
get

F(v) = (1/p) ∫_Ω |v(x)|^p dx   ⇒   ∂F(v) = {|v|^{p−2} v},
p
and it holds ∂!(u) = ∅ if and only if u ∈ dom ∇ = H 1,p () and |∇u|p−2 ∇u ∈
dom ∇*. By Theorem 6.88 and Remark 6.89, respectively, we can express the latter
by div(|∇u|^{p−2}∇u) ∈ L^{q*}(Ω) with |∇u|^{p−2}∇u · ν = 0 on ∂Ω. In this case we get
∂!(u) = ∇* ∂F(∇u), which shows the desired identity. □
Remark 6.92 It is easy to see that the case p = 2 leads to the negative Laplace
operator for functions u that satisfy ∇u · ν = 0 on the boundary: ∂((1/2)‖∇ · ‖₂²) = −Δ.
Thus, the generalization for p ∈ ]1, ∞[ is called the p-Laplace operator; hence, one
can say that the subgradient ∂((1/p)‖∇ · ‖_p^p) is the p-Laplace operator for functions with
the boundary conditions |∇u|^{p−2}∇u · ν = 0.
Example 6.93 (Solution of the p-Laplace Equation) An immediate application of
the above result is the proof of existence and uniqueness of the p-Laplace equation.
For a bounded Lipschitz domain Ω, 1 < p ≤ q < ∞, q ≤ d/(d − p) if p < d, and
f ∈ L^{q*}(Ω), we consider the minimization problem

min_{u ∈ L^q(Ω)}  (1/p) ∫_Ω |∇u|^p dx − ∫_Ω f u dx + I_{{v ∈ L^q(Ω) : ∫_Ω v dx = 0}}(u).   (6.38)
and ϕ(t) = p1 t p . To that end we note that (0) = 0, and hence, is proper
on H 1,p (). Convexity and lower semicontinuity are immediate, and coercivity
follows from the fact that for u ∈ L^q(Ω) with ∫_Ω u dx = 0 and v ∈ "1 (i.e., v is constant) with
v ≠ 0, one has (u + v) = ∞, since ∫_Ω (u + v) dx ≠ 0. Thus, the assumptions in
Theorem 6.84 are satisfied, and the minimization problem has a solution u∗ , which
is unique up to contributions from "1 . A solution u∗∗ different from u∗ would
satisfy u** = u* + v with v ∈ "1, v ≠ 0, and this would imply (u**) = ∞,
a contradiction. Hence, the minimizer is unique.
Let us deduce the optimality conditions for u∗ . Here we face a difficulty, since
neither the Sobolev term ! nor is continuous, i.e., the assumption for the sum
rule in Theorem 6.51 are not satisfied. However, we see that both u → !(Q1 u) and
u → (P1 u) are continuous. Since for all u ∈ Lq () we have u = P1 u + Q1 u,
we can apply the conclusion from Exercise 6.14 and get 0 ∈ ∂!(u∗ ) + ∂(u∗ ).
By Lemma 6.91 we know that ∂!; let us compute ∂(u∗ ). Since u → f u dx is
continuous, Theorem 6.51 and Example 6.48 lead to
−f + "1 if u dx = 0,
∂(u) =
∅ otherwise,
⊥
since ("1 )⊥ = "1 . Thus, u∗ is optimal if and only if there exists some λ∗ ∈ R
such that
−div(|∇u*|^{p−2} ∇u*) = f − λ* 1   in Ω,
|∇u*|^{p−2} ∇u* · ν = 0   on ∂Ω,
∫_Ω u* dx = 0,
where 1 ∈ L^{q*}(Ω) denotes the function that is constantly 1. It is easy to calculate the
value λ*: we integrate the equation in Ω on both sides to get

∫_Ω f dx − λ*|Ω| = ∫_Ω ∇*(|∇u*|^{p−2} ∇u*) dx = 0,

and hence λ* = |Ω|^{−1} ∫_Ω f dx, which is the mean value of f. In conclusion, we
have shown that for every f ∈ L^{q*}(Ω) with ∫_Ω f dx = 0, there exists a unique
solution u* with ∫_Ω u* dx = 0 of the Neumann problem for the p-Laplace equation.
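In one space dimension the optimality system can even be integrated directly, which the following sketch uses (a discrete illustration of ours, not part of the original example): the flux |u'|^{p−2}u' is an antiderivative of −f, from which u is recovered up to the zero-mean constraint.

```python
import numpy as np

# 1D p-Laplace problem: -(|u'|^{p-2} u')' = f, zero flux at the boundary,
# zero mean of u and of f. Hence |u'|^{p-2} u' = -F with F an antiderivative of f.
n, p = 2000, 3.0
x = np.linspace(0.0, 1.0, n)
f = np.cos(2 * np.pi * x)                       # zero-mean datum
F = np.cumsum(f) * (x[1] - x[0]); F -= F[0]     # F(0) = 0, F(1) ~ 0
flux = -F                                       # = |u'|^{p-2} u'
du = np.sign(flux) * np.abs(flux) ** (1.0 / (p - 1.0))
u = np.cumsum(du) * (x[1] - x[0])
u -= u.mean()                                   # enforce the zero-mean constraint

# Check the Euler-Lagrange equation in the interior:
residual = -np.gradient(np.abs(du) ** (p - 2) * du, x) - f
print(np.max(np.abs(residual[5:-5])))           # small (discretization error only)
```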
The theory of convex minimization with Sobolev penalty that we have developed up
to now gives a unified framework to treat the motivating examples from Sect. 6.1. In
the following we revisit these problems and present some additional examples.
Application 6.94 (Denoising with Lq -Data and H 1,p -Penalty) Consider the
denoising problem on a bounded Lipschitz domain ⊂ Rd . Further we assume
that 1 < p ≤ q < ∞. Let u0 ∈ Lq () be a noisy image and let λ > 0 be given. We
aim to denoise u0 by solving the minimization problem
min_{u ∈ L^q(Ω)}  (1/q) ∫_Ω |u − u0|^q dx + (λ/p) ∫_Ω |∇u|^p dx.   (6.39)
It is easy to see that this problem has a unique solution: the identity A = id is
injective and has closed image, and since for r = q the norm on Lr () is strictly
convex, we obtain uniqueness and existence from Theorem 6.86.
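For the special case p = q = 2 the optimality condition of a discrete analogue of (6.39) is a linear system, which the following sketch (our own illustration on a 1D signal; the forward-difference matrix D and the parameter values are assumptions) solves directly.

```python
import numpy as np

# Discrete analogue of (6.39) for p = q = 2 on a 1D signal: the optimality
# condition is the linear system (I + lam * D^T D) u = u0, where D is a
# forward-difference matrix.
rng = np.random.default_rng(3)
n, lam = 200, 5.0
t = np.linspace(0, 1, n)
clean = (t > 0.3).astype(float) + 0.5 * np.sin(6 * np.pi * t)
u0 = clean + 0.2 * rng.normal(size=n)            # noisy signal

D = np.diff(np.eye(n), axis=0)                   # (n-1) x n forward differences
u = np.linalg.solve(np.eye(n) + lam * D.T @ D, u0)

print("distance of noisy data to clean signal:", np.linalg.norm(u0 - clean))
print("distance of minimizer  to clean signal:", np.linalg.norm(u - clean))
```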
Let us analyze the solutions u∗ of (6.39) further. For example, it is simple to see
that the mean values of u∗ and u0 are equal in the case q = 2, i.e., Q1 u∗ = Q1 u0
(see Exercise 6.27, which treats a more general case). It is a little more subtle to
show that a maximum principle holds. We can derive this fact directly from the
properties of the minimization problem:
Theorem 6.95 Let ⊂ Rd be a bounded Lipschitz domain and let 1 < p ≤ q <
∞. Moreover, let u0 ∈ L∞ () with L ≤ u0 ≤ R almost everywhere and λ > 0.
Then the solution u∗ of (6.39) also satisfies L ≤ u∗ ≤ R almost everywhere.
Moreover, by Lemma 6.75 we get u ∈ H 1,p (), and also that ∇u = ∇u∗ almost
everywhere in {L ≤ u∗ ≤ R} and ∇u = 0 almost everywhere else. Hence,
(1/p) ∫_Ω |∇u|^p dx ≤ (1/p) ∫_Ω |∇u*|^p dx,
This shows, at least formally, that the solution u∗ satisfies the nonlinear partial
differential equation
−G(x, u*(x), ∇u*(x), ∇²u*(x)) = 0   in Ω,

where

G(x, u, ξ, Q) = |u − u0(x)|^{q−2}(u0(x) − u) + λ|ξ|^{p−2} trace( (id + (p − 2)(ξ/|ξ|) ⊗ (ξ/|ξ|)) Q ).
(Panels: original u†; noisy version u0 = u† + ε, PSNR = 20.00 db.)
Fig. 6.11 Illustration of the denoising capabilities of variational denoising with Sobolev penalty.
Top: Left the original, right its noisy version. Middle and bottom: The minimizer of (6.39) for
q = 2 and different Sobolev exponents p. To allow for a comparison, the parameter λ has been
chosen to maximize the PSNR with respect to the original image. Note that the remaining noise
and the blurring of edges is less for p = 1.1 and p = 1.2 than it is for p = 2
(Panels: original u†; noisy version u0 = u† + ε, PSNR = 12.02 db.)
Fig. 6.12 Illustration of the influence of the exponent q in the data term of Application 6.94. Top:
Left the original, right a version with strong noise. Middle and bottom: The minimizer of (6.39)
for p = 2 and different exponents q, again with λ optimized with respect to the PSNR. For larger
exponents q we see some “impulsive” noise artifacts, which again get less for q = 6, q = 12 due
to the choice of λ. The image sharpness does not vary much
By Theorem 3.13, A maps Lq () to Lq ( ) linearly and continuously for every
1 ≤ q < ∞. Moreover, A maps constant functions in to constant functions in ,
and hence A is injective on "1 .
If we choose 1 < p ≤ q < ∞ and q ≤ pd/(d − p) if p < d, then Theorem 6.86
implies the existence of a unique minimizer of the Tikhonov functional
min_{u ∈ L^q(Ω)}  (1/q) ∫_Ω |u ∗ k − u0|^q dx + (λ/p) ∫_Ω |∇u|^p dx   (6.41)
(Panels: original u†; measured data u0 with PSNR(u0, u† ∗ k) = 33.96 db; convolution kernel k.)
Fig. 6.13 Illustration of the method (6.41) for joint denoising and deblurring. Top: Left the
original (320 × 320 pixels), right the measured data (310 × 310 pixels) obtained by convolution
with an out-of-focus kernel (right, diameter of 11 pixels) and addition of noise. Middle and bottom:
The minimizer of (6.41) for q = 2 and different exponents p with λ optimized for PSNR. For p
close to 1 one sees, similarly to Fig. 6.11, a reduction of noise, fewer oscillating artifacts, and a
sharper reconstruction of edges
u and u coincide on ∂ . If, conversely, for u ∈ H ( ), the trace of u equals the
0 1,p
trace of u0 on ∂ , then Lemma 6.99 implies that u − u0 ∈ H0 () has to hold.
1,p
= {u ∈ Lq () u|\ = u0 |\ , u| ∈ H 1,p ( ), u|∂ = u0 |∂ },
where we have understood the restriction onto ∂ as taking the trace with respect
to . This motivates the definition of a linear map ∇0 from Lq ( ) to Lp ( ):
The space Lq ( ) contains H0 ( ) as a dense subspace, and thus ∇0 is densely
1,p
defined. The map is also closed: To see this, let un ∈ H0 ( ) with un → u in
1,p
is continuous on the affine subspace u0 + X1 , X1 = {v ∈ Lq () v| = 0}. Also
is continuous on the subspace u0 + X2 , X2 = {v ∈ Lq () v|\ = 0}. Since the
restrictions P1 : u → uχ\ and P2 : u → uχ , respectively, are continuous and
they sum to the identity, we can apply the result of Exercise 6.14 and get ∂(F1 +
F2 ) = ∂F1 + ∂F2 .
If we denote by A : Lq () → Lq ( ) the restriction to and by E :
L ( , Rd ) → Lp (, Rd ) the zero padding, we get
p
p
F1 = p · p
1
◦ T∇u0 ◦ E ◦ ∇0 ◦ A ◦ T−u0 .
The map A is surjective, and by the results of Exercises 6.11 and 6.12 for A and
∇0 , respectively, as well as Theorem 6.51 for T−u0 , E, and T∇u0 , the subgradient
satisfies
A∗ ∇0∗ E ∗ Jp ∇u0 + ∇(u − u0 )|
1,p
if (u − u0 )| ∈ H0 (),
∂F1 (u) =
∅ otherwise,
and this shows that ∇0∗ w = − div w in the sense of the weak divergence. Conversely,
∗ ∗
let w ∈ Lp ( , Rd ) such that − div w ∈ Lq ( ). Then by the definition of
H0 ( ), we can choose for every u ∈ H0 ( ) a sequence (un ) in D( ) such
1,p 1,p
w · ∇u dx = lim w · ∇un dx = − lim (div w)un dx = − (div w)u dx,
n→∞ n→∞
and hence w ∈ dom ∇0∗ and ∇0∗ = − div w. We have shown that
∗ ∗
∇0∗ = − div, dom ∇0∗ = {w ∈ Lp ( , Rd ) div w ∈ Lq ( )}
in other words, the adjoint of the gradient with zero boundary conditions is the weak
divergence. In contrast to ∇ ∗ , ∇0∗ operates on all vector fields for which the weak
divergence exists and not only on those for which the normal trace vanishes at the
boundary (cf. Theorem 6.88 and Remark 6.89).
For the subgradients of F1 we get, using the convention that gradient and
divergence are considered on and the divergence will be extended by zero, that
if (u − u0 )| ∈ H0 ( ),
1,p
− div ∇(u| )|∇(u| )|p−2
∂F1 (u) =
∅ otherwise.
Since the divergence of \ is extended by zero and (u∗ − u0 )| ∈ H0 () if
1,p
and only if u∗ | ∈ H 1,p ( ) with u∗ |∂ = u0 |∂ in the sense of the trace, we
conclude the characterization
u* = u0   in Ω∖Ω′ (where Ω′ ⊂ Ω denotes the inpainting domain),
−div(∇u* |∇u*|^{p−2}) = 0   in Ω′,   (6.46)
u* = u0   on ∂Ω′.
Note that the last equality has to be understood in the sense of the trace of u0 on
∂ with respect to . In principle, this could depend on the values of u0 in the
inpainting domain . However, it is simple to see that the traces of ∂ with respect
to and \ coincide for Sobolev functions u0 ∈ H 1,p (). Hence, the solution
of the inpainting problem is independent of the auxiliary function u0 .
Again, the optimality conditions (6.46) show that u∗ has to be locally smooth in
: by the same argument as in Applications 6.94 and 6.97, we get u∗ ∈ H 2,p ( )
for all ⊂⊂ , if p < 2. For p = 2 we even get that the solution u∗ in
is harmonic (Example 6.4), and hence, u∗ ∈ C ∞ ( ). The case p > 2 is
treated in some original papers (see, e.g., [60] and the references therein) and at
least gives u∗ ∈ C 1 ( ). Thus, the two-dimensional case (d = 2) always leads to
continuous solutions, and this says that this method of inpainting is suited only for
the reconstruction of homogeneous regions; see also Fig. 6.14.
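For p = 2 the inpainted values are harmonic with the surrounding image as boundary data, which can be computed by a simple relaxation scheme. The following sketch (our own discrete illustration; the toy image and mask are assumptions) fills a rectangular hole by repeatedly averaging each unknown pixel over its four neighbours.

```python
import numpy as np

# Harmonic inpainting (p = 2): Jacobi iteration on the unknown pixels only.
u = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))      # toy "image": a linear ramp
mask = np.zeros_like(u, dtype=bool)
mask[20:40, 25:50] = True                            # region to be inpainted
u_damaged = u.copy(); u_damaged[mask] = 0.0

v = u_damaged.copy()
for _ in range(2000):
    avg = 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                  + np.roll(v, 1, 1) + np.roll(v, -1, 1))
    v[mask] = avg[mask]                              # update only unknown pixels

print(np.max(np.abs(v[mask] - u[mask])))             # close to 0 for this ramp
```

For the linear ramp the exact harmonic extension is the ramp itself, so the reconstruction error tends to zero; for images with edges crossing the hole the same scheme produces the blurring visible in Fig. 6.14.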
Fig. 6.14 Inpainting by solving (6.43) or (6.44), respectively. Top: Left the original u† together
with two enlarged details (according to the marked regions in the image), right the given u0 on
\ ( is given by the black region), again with details. Middle and bottom: The minimizer of
the inpainting functional for different p together with enlarged details. While the reconstruction of
homogeneous regions is good, edges get blurred in general. As the details show, this effect is more
prominent for larger p; on the other hand, for p = 1.1 some edges are extended in a sharp way (the
edge of the arrow in the left, lower detail), but the geometry is not always reconstructed correctly
(disconnected boundaries of the border of the sign in the upper right detail)
holds for all u ∈ L^q(Ω). The surjectivity of A is equivalent to the linear independence of the w_{i,j}; the assumption that constant functions are not in the kernel of
A can be expressed with the vector w̄ ∈ R^{N×M}, defined by w̄_{i,j} = ∫_Ω w_{i,j} dx,
simply as w̄ ≠ 0. In view of the above, the choice w_{i,j}(x1, x2) = k(i − x1, j − x2)
with suitable k ∈ L^{q*}(R²) seems natural. This amounts to a convolution with
subsequent point sampling, and hence k should be a kind of low-pass filter; see
Sect. 4.2.3. It is not hard to check that, for example, k = χ_{]0,1[×]0,1[} satisfies the
assumptions for the map A. In this case the map A is nothing else than averaging u
over the squares ]i − 1, i[ × ]j − 1, j [. Using k(x1 , x2 ) = sinc(x1 − 12 ) sinc(x2 − 12 )
leads to the perfect low-pass filter for the sampling rate 1 with respect to the
midpoints of the squares (i − 12 , j − 12 ); see Theorem 4.35.
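A discrete version of the block-averaging choice k = χ_{]0,1[×]0,1[} and of its adjoint is easy to write down; the sketch below (our own illustration; the block size s and test data are assumptions) also verifies the adjointness relation ⟨Au, U⟩ = ⟨u, A*U⟩.

```python
import numpy as np

# Sampling operator A for k = chi_{]0,1[x]0,1[}: averaging over s x s blocks.
# Its adjoint distributes each sample value evenly over the corresponding block.
def A(u, s):
    N, M = u.shape[0] // s, u.shape[1] // s
    return u.reshape(N, s, M, s).mean(axis=(1, 3))

def A_adjoint(U, s):
    return np.kron(U, np.ones((s, s))) / (s * s)

rng = np.random.default_rng(0)
u = rng.normal(size=(12, 12)); U = rng.normal(size=(4, 4)); s = 3
# Adjointness check: <A u, U> = <u, A* U>
print(np.allclose(np.sum(A(u, s) * U), np.sum(u * A_adjoint(U, s))))  # True
```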
The assumption that the image u is an interpolation for the data U0 is now
expressed as Au = U 0 . However, this is true for many images; indeed it is true
for an infinite-dimensional affine subspace of Lq (). We aim to find the image that
is best suited for a given image model. Again we use the Sobolev space H 1,p ()
for p ∈ ]1, ∞[ as a model, where we assume that q ≤ 2/(2 − p) holds if p < 2.
This leads to the minimization problem
min_{u ∈ L^q(Ω)}  (1/p) ∫_Ω |∇u|^p dx + I_{{v ∈ L^q(Ω) : Av = U⁰}}(u).   (6.47)
by the embedding H 1,p () → Lq () (see Theorem 6.76). Now assume that A :
H 1,p () → RN×M is not surjective. Since RN×M is finite-dimensional, the image
of A is a closed subspace, and hence it must be that Au − U 0 ≥ ε for all u ∈
H 1,p () and some ε > 0. However, H 1,p () is dense in Lq (), and hence there
has to be some ū ∈ H 1,p () with ū − u0 q < 2ε A−1 , and thus
which is a contradiction. Hence, the operator A has to map H 1,p () onto RN×M ,
and thus there is some u1 ∈ H 1,p () with Au1 = U 0 . In particular is proper on
H 1,p ().
Convexity of is obvious, and the lower semicontinuity follows from the
continuity of A and the representation K = A−1 ({U 0 }). Finally, we assumed that
A does not map constant functions to zero, i.e., for u ∈ K and v ∈ "1 with v = 0,
we have u + v ∈ / K. This shows the needed coercivity of . We conclude with
Theorem 6.84 that there exists a minimizer u∗ of the functional in (6.47), and one
can argue along the lines of Application 6.98 that it is unique.
If we try to apply convex analysis to study the minimizer u∗ , we face similar
problems to the one in Application 6.98: The restriction is not continuous, and
hence we cannot use the sum rule for subdifferentials without additional work.
However, there is a remedy.
To see this, we note that by surjectivity of A : H 1,p () → RN×M there are NM
linear independent vectors ui,j ∈ H 1,p () (1 ≤ i ≤ N and 1 ≤ j ≤ M) such that
the restriction of A to V = span(u_{i,j}) ⊂ L^q(Ω) is bijective. Hence, there exists
A_V^{−1}, and for T1 = A_V^{−1} A one has T1² = T1 and ker(T1) = ker(A). For T2 = id − T1
this implies that rg(T2) = ker(A). With the above u1, the function

u ∈ V :  u ↦ (1/p) ∫_Ω |∇(u1 + u)|^p dx
first term is again the p-Laplace operator, while for the second we have
∂I_K(u) = ker(A)^⊥   if Au = U⁰,   and   ∂I_K(u) = ∅   otherwise.
Since A is surjective, the closed range theorem (Theorem 2.26) implies that
ker A⊥ = rg(A∗ ) = span(wi,j ), the latter since wi,j = A∗ ei,j for 1 ≤ i ≤ N
and 1 ≤ j ≤ M. Hence, the optimality conditions say that u∗ ∈ Lq () is a solution
Σ_{i=1}^{N} Σ_{j=1}^{M} λ*_{i,j} ∫_Ω w_{i,j} dx = ∫_Ω ∇*(|∇u*|^{p−2} ∇u*) dx = 0,
i.e., λ∗ · w̄ = 0 with the above defined vector of integrals w̄. Similar to the previous
applications one can use some theory for the p-Laplace equation to show that the
solution u∗ has to be continuous if the functions wi,j are in L∞ ().
While (6.48) is a nonlinear partial differential equation for p = 2 that is coupled
with linear equalities and the Lagrange multipliers, the case p = 2 leads to a linear
system of equalities. This can be solved as follows. In the first step, we solve for
every (i, j ) the equation
−Δz_{i,j} = w_{i,j} − |Ω|^{−1} ∫_Ω w_{i,j} dx   in Ω,   (6.49)

with homogeneous Neumann boundary conditions and ∫_Ω z_{i,j} dx = 0. In the second step, the Lagrange multipliers are determined from

Σ_{k=1}^{N} Σ_{l=1}^{M} λ*_{k,l} ∫_Ω w_{k,l} z_{i,j} dx + λ*_0 w̄_{i,j} = U⁰_{i,j}.
Here we used that ∇zi,j and ∇u∗ are contained in dom ∇ ∗ . We can simplify the
scalar product further: using zi,j dx = 0, we get
∫_Ω w_{k,l} z_{i,j} dx = ∫_Ω (w_{k,l} − |Ω|^{−1} ∫_Ω w_{k,l} dy) z_{i,j} dx = ∫_Ω (−Δz_{k,l}) z_{i,j} dx = ∫_Ω ∇z_{k,l} · ∇z_{i,j} dx.
Setting S_{(i,j),(k,l)} = ∫_Ω ∇z_{k,l} · ∇z_{i,j} dx and using the constraint λ* · w̄ = 0, we
obtain the finite-dimensional linear system of equations

Sλ* + w̄ λ*_0 = U⁰,
w̄ᵀ λ* = 0,   (6.50)
for the Lagrange multipliers. This system has a unique solution (see Exercise 6.31),
and hence λ∗ and λ∗0 can be computed. The identity for −u∗ in (6.48) gives
uniqueness of u∗ up to constant functions, and the constant offset is determined
by λ∗0 , namely
u* = Σ_{i=1}^{N} Σ_{j=1}^{M} λ*_{i,j} z_{i,j} + λ*_0 1,   (6.51)
where 1 denotes the function that is equal to 1 on . In conclusion, the method for
H 1 interpolation reads as follows
1. For all (i, j ) solve Eq. (6.49).
2. Calculate the matrix S_{(i,j),(k,l)} = ∫_Ω ∇z_{k,l} · ∇z_{i,j} dx and the vector w̄_{i,j} = ∫_Ω w_{i,j} dx, and solve the linear system (6.50).
3. Calculate the solution u∗ by plugging λ∗ and λ∗0 into (6.51).
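A small discrete analogue of the case p = 2 can be solved in one stroke instead of via steps 1–3: minimizing ½‖∇u‖² subject to Au = U⁰ is an equality-constrained quadratic program whose KKT system is linear. The following sketch (our own illustration; grid sizes, the block-averaging A, and the data U⁰ are assumptions) assembles and solves that system directly, which exhibits the same Lagrange-multiplier structure as (6.48)–(6.51).

```python
import numpy as np

# Discrete H^1 interpolation for p = 2: minimize 0.5*||grad u||^2 s.t. A u = U0,
# solved via the KKT system [grad^T grad, A^T; A, 0] [u; mu] = [0; U0].
def diff_matrix(n):
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return D

s, N = 4, 6                      # zoom factor and number of samples per axis
n = s * N                        # fine grid size
Dx = np.kron(np.eye(n), diff_matrix(n))      # horizontal differences
Dy = np.kron(diff_matrix(n), np.eye(n))      # vertical differences
G = np.vstack([Dx, Dy])                      # discrete gradient acting on u.ravel()

# Sampling operator: averaging over s x s blocks (cf. k = chi_{]0,1[x]0,1[}).
A = np.zeros((N * N, n * n))
for I in range(N):
    for J in range(N):
        block = np.zeros((n, n))
        block[I*s:(I+1)*s, J*s:(J+1)*s] = 1.0 / (s * s)
        A[I * N + J] = block.ravel()

rng = np.random.default_rng(0)
U0 = rng.normal(size=N * N)                  # given low-resolution data

K = np.block([[G.T @ G, A.T], [A, np.zeros((N * N, N * N))]])
rhs = np.concatenate([np.zeros(n * n), U0])
sol = np.linalg.solve(K, rhs)
u = sol[:n * n].reshape(n, n)
print(np.max(np.abs(A @ u.ravel() - U0)))    # constraint satisfied up to rounding
```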
In practice, the solution of (6.47) and (6.48) is done numerically, i.e., the domain
is discretized accordingly. This allows one to obtain images of arbitrary resolution
from the given image u0 . Figure 6.15 compares the variational interpolation with
the classical methods from Sect. 3.1.1. One notes that variational interpolation deals
favorably with strong edges. Figure 6.16 illustrates the influence on the sampling
operator A. The choice of this operator is especially important if the data U 0
does not exactly match the original image u†. In this case, straight edges are not
reconstructed exactly, but change their shape in dependence of the sampling operator
(most prominently seen for p close to 1).
One possibility to reduce these unwanted effects is to allow for Au ≠ U⁰.
For example, one can replace problem (6.47) by the minimization of the Tikhonov
functional
min_{u ∈ L^q(Ω)}  ‖Au − U⁰‖₂² / 2 + (λ/p) ∫_Ω |∇u|^p dx
(Panel labels: u_constant, u_linear, u_sinc in the left column; u* for p = 2, p = 1.5, and p = 1.1 in the right column.)
Fig. 6.15 Comparison of classical methods for interpolation and variational interpolation for
eightfold zooming. Left column: the interpolation methods from Sect. 3.1.1 of an image with
128 × 96 pixels to 1024 × 768 pixels. Specifically, we show the result of constant interpolation
(top), piecewise bilinear interpolation (middle), and tensor product interpolation with the sinc
function (bottom). Right column: The minimizers of the variational interpolation method (6.47)
for different exponents p. The used sampling operator A is the perfect low-pass filter. Constant
and bilinear interpolation have problems with the line structures. These are handled satisfactorily,
up to oscillation at the boundary, by usinc , and also u∗ with p = 2 is comparable. Smaller p allows
for sharp and almost perfect reconstruction of the dark lines, but at the expense of smaller details
like the lighter lines between the dark lines
for some λ > 0 (with the norm ‖v‖₂² = Σ_{i=1}^{N} Σ_{j=1}^{M} |v_{i,j}|² on the finite-dimensional space R^{N×M}).
Fig. 6.16 Comparison of different sampling operators A for variational interpolation (6.47). Top
left: Given data U 0 (96 × 96 pixel). Same row: Results of eightfold zooming by averaging over
squares (i.e. kconst = χ]0,1[×]0,1[ ) for different p. Bottom row: Respective results for sampling with
the perfect low-pass filter ksinc (x1 , x2 ) = sinc(x1 − 12 ) sinc x2 − 12 . The operator that averages
over squares favors solutions with “square structures”; the images look “blocky” at some places.
This effect is not present for the perfect lowpass filter but there are some oscillation artifacts due
to the non-locality of the sinc function
The applications in the previous section always used p > 1. One reason for this
constraint was that the image space of ∇^m, namely L^p(Ω, R^{d^m}), is a reflexive Banach space,
and for F convex, lower semicontinuous and coercive, one also could deduce that
F ◦ ∇ m is lower semicontinuous (see Example 6.29). However, the illustrations
indicated that p → 1 leads to interesting effects. On the one hand, edges are more
emphasized, and on the other hand, solutions appear to have more “linear” regions,
which is a favorable property for images with homogeneous regions. Hence, the
question whether p = 1 can be used for a Sobolev penalty suggests itself, i.e.
whether H 1,1 () can be used as an image model. Unfortunately, this leads to
problems in the direct method:
Theorem 6.101 (Failure of Lower Semicontinuity for the H 1,1 Semi-norm) Let
⊂ Rd be a domain and q ∈ [1, ∞[. Then the functional ! : Lq () → R∞ given
by
!(u) = ∫_Ω |∇u| dx   if u ∈ H^{1,1}(Ω),   and   !(u) = ∞   otherwise,

is not lower semicontinuous.
where ∇ϕ_{n^{−1}}(x) = n^{d+1} ∇ϕ(nx). Young's inequality for convolutions and the
identities |B_r(0)| = r^d |B_1(0)| and (1 − n^{−1})^d = Σ_{k=0}^{d} (d over k)(−1)^k n^{−k} lead to

∫_Ω |∇u_n| dx ≤ n‖∇ϕ‖_1 ∫_{{1−n^{−1} ≤ |x| ≤ 1}} 1 dx
             ≤ Cn (1 − (1 − n^{−1})^d) = Cn Σ_{k=1}^{d} (d over k)(−1)^{k+1} n^{−k} ≤ C Σ_{k=0}^{d−1} (d over k+1) n^{−k}.
The right-hand side is bounded for n → ∞, and hence lim infn→∞ !(un ) < ∞.
However, u ∈ H 1,1 () cannot be true. If it were, we could test with φ ∈
D(B1 (0)) for i = 1, . . . , d and get
∫_Ω u (∂φ/∂x_i) dx = ∫_{B_1(0)} (∂φ/∂x_i) dx = ∫_{∂B_1(0)} φ ν_i dH^{d−1} = 0.
By the fundamental lemma of the calculus of variations we would get ∇u|B1 (0) = 0.
Similarly one could conclude that ∇u|\B1 (0) = 0, i.e., ∇u = 0 almost everywhere
in . This would imply that u is constant in (see Lemma 6.79), a contradiction.
By definition of ! this means !(u) = ∞.
In other words, we have found a sequence un → u, with !(u) >
lim inf_{n→∞} !(u_n), and hence ! is not lower semicontinuous. □
Fig. 6.17 Illustration of a sequence (u_n) in H^{1,1}(]0, 1[) that violates the defining property of
lower semicontinuity for u ↦ ∫_0^1 |u′| dt in L^q(]0, 1[). If the ramp part of the functions u_n gets
arbitrarily steep while the total increase remains constant, the derivatives ((u_n)′) form a bounded
sequence in L^1(]0, 1[), but the L^q limit u of (u_n) is only in L^q(]0, 1[) and not in H^{1,1}(]0, 1[).
In particular, ((u_n)′) does not converge in L^1(]0, 1[), and one wonders in what sense there is still
some limit
See also Fig. 6.17 for an illustration of a slightly different counterexample for the
lower semicontinuity of the H 1,1 semi-norm in dimension one.
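The one-dimensional counterexample of Fig. 6.17 is easy to reproduce numerically; the sketch below (our own, with an assumed ramp at ½) shows that the L¹ norms of the derivatives remain equal to 1 for every ramp width, while the limit is a jump function.

```python
import numpy as np

# The ramp sequence from Fig. 6.17: u_n rises from 0 to 1 on [1/2, 1/2 + 1/n].
# The L^1 norms of the derivatives stay equal to 1, but the L^q limit is a jump
# function that does not belong to H^{1,1}(]0,1[).
x = np.linspace(0.0, 1.0, 100001)
h = x[1] - x[0]
for n in [2, 10, 100, 1000]:
    u_n = np.clip((x - 0.5) * n, 0.0, 1.0)
    du_n = np.diff(u_n) / h
    print(n, np.sum(np.abs(du_n)) * h)        # ~1 for every n
```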
The above result prohibits direct generalizations of Theorems 6.84 and 6.86 to the
case p = 1. If we have a closer look at the proof of lower semicontinuity of F ◦A for
F : Y → R∞ convex, lower semicontinuous, coercive and A : X ⊃ dom A → Y
strongly-weakly closed in Example 6.29 we note that an essential ingredient is
to deduce the existence of a weakly convergent subsequence of (Aun ) from the
boundedness of F (Aun ). However, this very property fails for ∇ as a strongly-
weakly closed operator between Lq () and L1 (, Rd ).
However, we can use this failure as a starting point to define a functional that is, in
some sense, a generalization of the integral ∫_Ω |∇u| dx. More precisely, we replace
L1 (, Rd ) by the space of vector-valued Radon measures M(, Rd ), which is, on
the one hand, the dual space of a separable Banach space: by the Riesz-Markov
theorem (Theorem 2.62) M(, Rd ) = C0 (, Rd )∗ . On the other hand, L1 (, Rd )
is isometrically embedded in M(, Rd ) by the map u → uLd , i.e., uLd M =
u1 for all u ∈ L1 (, Rd ) (see Example 2.60).
Since the norm on M(, Rd ) is convex, weakly* lower-semicontinuous and
coercive, it is natural to define the weak gradient ∇ on a subspace of Lq () with
values in M(, Rd ) and to consider the concatenation · M ◦ ∇. How can this
weak gradient be defined? We simply use the most general notion of derivative
that we have, namely the distributional gradient, and claim that this should have a
representation as a finite vector-valued Radon measure.
Definition 6.102 (Weak Gradient in M(, Rd )/Total Variation) Let ⊂ Rd
be a domain. Some μ ∈ M(Ω, R^d) is the weak gradient of some u ∈ L¹_loc(Ω) if for
every ϕ ∈ D(Ω, R^d)

∫_Ω u div ϕ dx = − ∫_Ω ϕ · dμ.
If it exists, we denote μ = ∇u and call its norm, denoted by TV(u) = ∇uM , the
total variation of u. If there does not exist a μ ∈ M(, Rd ) such that μ = ∇u, we
define TV(u) = ∞.
It turns out that this definition is useful, and we can deduce several pleasant
properties.
Lemma 6.103 Let Ω be a domain and q ∈ [1, ∞[. Then for u ∈ L^q(Ω) one has the
following.
1. If there exists a μ ∈ M(Ω, R^d) with μ = ∇u as in Definition 6.102, it is unique.
2. One has ∇u ∈ M(Ω, R^d) with ‖∇u‖_M ≤ C if and only if for all ϕ ∈ D(Ω, R^d),
   |∫_Ω u div ϕ dx| ≤ C‖ϕ‖_∞.
In particular, we obtain

TV(u) = sup { ∫_Ω u div ϕ dx : ϕ ∈ D(Ω, R^d), ‖ϕ‖_∞ ≤ 1 }.   (6.52)
2. Characteristic functions
Let Ω′ be a bounded Lipschitz subdomain of Ω and u = χ_{Ω′}. Then by the
divergence theorem (Theorem 2.81), one has for ϕ ∈ D(Ω, R^d) with ‖ϕ‖_∞ ≤ 1
that

∫_Ω u div ϕ dx = ∫_{Ω′} div ϕ dx = ∫_{∂Ω′} ϕ · ν dH^{d−1}.

This means simply that ∇u = −ν H^{d−1} ⌞ ∂Ω′, and this is a measure in M(Ω, R^d),
since for every ϕ ∈ C_0(Ω, R^d), ϕ · ν is H^{d−1}-integrable on ∂Ω′, and

|−∫_{∂Ω′} ϕ · ν dH^{d−1}| ≤ H^{d−1}(∂Ω′)‖ϕ‖_∞,

i.e., ∇u ∈ C_0(Ω, R^d)*, and the claim follows from the Riesz–Markov theorem
(Theorem 2.62). We also see that ‖∇u‖_M ≤ H^{d−1}(∂Ω′) has to hold. In fact, we
even have equality (see Exercise 6.36), i.e., TV(u) = H^{d−1}(∂Ω′).
In other words, the total variation of the characteristic function of a Lipschitz subdomain equals its perimeter. This motivates the following generalization of the
perimeter for measurable sets Ω′ ⊂ Ω:

Per(Ω′) = TV(χ_{Ω′}).
In this sense, bounded Lipschitz domains have finite perimeter. The study of
sets with finite perimeter leads to the notion of Caccioppoli sets, which are a
further slight generalization.
3. Piecewise smooth functions
Let u ∈ Lq () be piecewise smooth, i.e., we can write as a union of some
k with mutually disjoint bounded Lipschitz domains 1 , . . . , K , and for every
k = 1, . . . , K one has uk = u|k ∈ C 1 (k ), i.e., u restricted to k can be
extended to a differentiable function uk on k . Let us see, whether the weak
gradient is indeed a Radon measure.
To begin with, it is clear that u ∈ H 1,1 (1 ∪· · ·∪K ) with (∇u)L1 |k = ∇uk .
For ϕ ∈ D(, Rd ) we note that
∫_Ω u div ϕ dx = Σ_{k=1}^{K} ( ∫_{∂Ω_k ∩ Ω} ϕ · u_k ν^k dH^{d−1} − ∫_{Ω_k} ϕ · ∇u_k dx ),   (6.53)
where ν^k denotes the outer unit normal to Ω_k. We can rewrite this as follows. For
pairs 1 ≤ l < k ≤ K, let Γ_{l,k} = ∂Ω_l ∩ ∂Ω_k ∩ Ω and ν = ν^k on Γ_{l,k}. Then
ν^k = ν on Γ_{l,k} ∩ ∂Ω_k, ν^l = −ν on Γ_{l,k} ∩ ∂Ω_l, and

∫_{∂Ω_k ∩ Ω} ϕ · u_k ν^k dH^{d−1} = Σ_{l=1}^{k−1} ∫_{Γ_{l,k}} ϕ · u_k ν dH^{d−1} − Σ_{l=k+1}^{K} ∫_{Γ_{k,l}} ϕ · u_k ν dH^{d−1}.

Summing over k, we obtain

∫_Ω u div ϕ dx = − Σ_{k=1}^{K} Σ_{l=1}^{k−1} ∫_{Γ_{l,k}} ϕ · (u_l − u_k) ν dH^{d−1} − ∫_Ω ϕ · (∇u)_{L^1} dx,
Note that this representation is independent of the order of the k : the product
(ul − uk )ν stays the same if we swap k and l . It is easy to see (Exercise 6.37),
that ∇u is a finite vector-valued Radon measure, and for the norm, one has

TV(u) = ‖∇u‖_M = ‖(∇u)_{L^1}‖_1 + Σ_{l<k} ∫_{Γ_{l,k}} |u_l − u_k| dH^{d−1}.
Hence, the weak gradient is measured in the L1 norm, and the “jumps” ul −uk of
the function u are integrated along the interfaces l,k ; see Fig. 6.18 for a simple
example.
These considerations show that the total variation, when used as a penalty in
minimization problems, still allows functions with discontinuities. For images,
this is an exceptionally favorable property, for then we may view images, some-
what oversimplified, as smooth within objects k and discontinuous along object
boundaries (i.e., edges) ∂k . We expect that solutions of suitable optimization
problems have exactly these properties.
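The following short sketch (our own, using the usual anisotropic discrete analogue of the total variation, i.e., sums of absolute neighbour differences) reproduces this jump formula for a piecewise constant image with a single axis-aligned edge.

```python
import numpy as np

# Discrete (anisotropic) total variation of a piecewise constant image: the sum
# of absolute differences between neighbouring pixels. For this image it equals
# jump height times interface length, as in the formula above.
n = 100
u = np.zeros((n, n))
u[:, n // 2:] = 2.0                    # jump of height 2 across a vertical edge

tv = np.sum(np.abs(np.diff(u, axis=0))) + np.sum(np.abs(np.diff(u, axis=1)))
print(tv)                              # = 2 * 100 (jump height x edge length in pixels)
```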
Prepared with the results of Lemma 6.103, we see that the total variation has
properties relevant for the direct method, that the functional |∇ · | dx does not
have.
Lemma 6.105 Let ∈ Rd be bounded. The space of functions of bounded total
variation
BV() = {u ∈ L1 () ∇u ∈ M(, Rd )}
Fig. 6.18 A piecewise constant function u and its gradient as Radon measure. It holds that ∇u =
σ H1 , where σ is vector field on the set of discontinuity of the function in the left image
(visualized in the right image). The point at the T-junction in has H1 measure zero and ∇u is not
defined there
‖∇(u_{n_k} − u)‖_M ≤ lim inf_{m→∞} ‖∇(u_{n_k} − u_m)‖_M ≤ 1/k,
Proof We reuse the operators Mn from Theorem 6.74, and show that the sequence
un = Mn u for u ∈ BV() ∩ Lq () has the desired properties. Let us recall their
definition:
M_n u = Σ_{k=0}^{K} (T_{t_n η_k}(ϕ_k u)) ∗ ψ^n
for a smooth partition of unity (ϕk ), translation vectors ηk , step sizes tn , and scaled
mollifiers ψ n . By the arguments of Theorem 6.74 we get convergence un → u in
Lq ().
Now we choose w ∈ D(, Rd ) with w∞ ≤ 1 as a test function, and obtain
∫_Ω u_n div w dx = ∫_Ω u (M_n* div w) dx = ∫_Ω u div(M_n* w) dx − ∫_Ω u (N_n* w) dx
                 = ∫_Ω u div(M_n* w) dx − ∫_Ω (N_n u) · w dx,   (6.54)
where
M_n* w = Σ_{k=0}^{K} ϕ_k (T_{−t_n η_k}(w ∗ ψ̄^n)),    N_n* w = Σ_{k=0}^{K} ∇ϕ_k · (T_{−t_n η_k}(w ∗ ψ̄^n)).

Moreover,

|(M_n* w)(x)| ≤ Σ_{k=0}^{K} ϕ_k(x) ∫ |w(y)| ψ^n(y − x + t_n η_k) dy ≤ ‖w‖_∞ Σ_{k=0}^{K} ϕ_k(x) ∫_{R^d} ψ^n(y) dy ≤ ‖w‖_∞ ≤ 1.   (6.55)
(div(M_n* w − w))(x) = Σ_{k=0}^{K} ϕ_k(x) ∫ (div w(y) − div w(x)) ψ^n(y − x + t_n η_k) dy
                      + Σ_{k=0}^{K} ∇ϕ_k(x) · ∫ (w(y) − w(x)) ψ^n(y − x + t_n η_k) dy.
Since both w and div w are uniformly continuous in , we can find, for every
ε > 0, a δ > 0 such that |w(x) − w(y)| ≤ ε and |div w(x) − div w(y)| ≤ ε
holds for all x, y ∈ with |x − y| ≤ δ. Since (tn ) converges to zero and supp ψ n
becomes arbitrarily small, we can find some n0 such that for all n ≥ n0 , one has
|x − y + tn ηk | ∈ supp ψ n ⇒ |x − y| ≤ δ. For these n we obtain the estimate
|div(M_n* w − w)(x)| ≤ ε Σ_{k=0}^{K} ϕ_k(x) ∫_{R^d} ψ^n(y) dy + ε Σ_{k=0}^{K} |∇ϕ_k(x)| ∫_{R^d} ψ^n(y) dy ≤ Cε.
The constant C>0 can be chosen independently of n, and hence div M∗n w−
div w∞ → 0 for n → ∞.
Subgoal 3: We easily check that
lim_{n→∞} N_n u = lim_{n→∞} Σ_{k=0}^{K} (T_{t_n η_k}(u ∇ϕ_k)) ∗ ψ^n = lim_{n→∞} Σ_{k=0}^{K} T_{t_n η_k}(u ∇ϕ_k) = u Σ_{k=0}^{K} ∇ϕ_k = 0,
since smoothing with a mollifier converges in L1 () (see Lemma 3.16) and
translation is continuous in L1 () (see Exercise 3.4).
Together with (6.54), our subgoals give the desired convergence of ∫_Ω |∇u_n| dx: for
every ε > 0 we can find some n_0 such that for n ≥ n_0, ‖N_n u‖_1 ≤ ε/3. For these
n and arbitrary w ∈ D(Ω, R^d) with ‖w‖_∞ ≤ 1 we get from (6.54) and the total
variation as defined in (6.52) that

∫_Ω u_n div w dx = ∫_Ω u div(M_n^* w) dx − ∫_Ω (N_n u) · w dx ≤ ‖∇u‖_M + ‖N_n u‖_1 ‖w‖_∞ ≤ ‖∇u‖_M + ε/3.

Now we take, for every n, the supremum over all test functions w and get

∫_Ω |∇u_n| dx ≤ ‖∇u‖_M + ε.
On the other hand, by the supremum definition (6.52) there exists a test function w ∈ D(Ω, R^d)
with ‖w‖_∞ ≤ 1 such that ∫_Ω u div w dx ≥ ‖∇u‖_M − ε/3. For this w we can ensure for some n_1 and
all n ≥ n_1 that

|∫_Ω u div(M_n^* w − w) dx| ≤ ‖u‖_1 ‖div(M_n^* w − w)‖_∞ ≤ ε/3.
This implies

∫_Ω u_n div w dx = ∫_Ω u div w dx + ∫_Ω u div(M_n^* w − w) dx − ∫_Ω (N_n u) · w dx ≥ ‖∇u‖_M − ε/3 − ε/3 − ε/3,

and in particular, by the supremum definition (6.52),

∫_Ω |∇u_n| dx ≥ ‖∇u‖_M − ε.

Since ε > 0 is arbitrary, we have shown the desired convergence lim_{n→∞} ∫_Ω |∇u_n| dx
= ‖∇u‖_M. ∎
Remark 6.107 The above theorem guarantees, for every u ∈ BV(Ω), the existence
of a sequence (u_n) in C^∞(Ω) with u_n → u in L^1(Ω) and ∇u_n ⇀* ∇u in M(Ω, R^d).
By Lemma 6.103, the latter implies only the weaker property

‖∇u‖_M ≤ lim inf_{n→∞} ∫_Ω |∇u_n| dx.
‖u‖_q ≤ lim inf_{n→∞} ‖u_n‖_q ≤ C (lim inf_{n→∞} ‖u_n‖_1 + lim inf_{n→∞} ‖∇u_n‖_M) = C (‖u‖_1 + ‖∇u‖_M),

is bounded in H^{1,1}(Ω), and there exists, since the embedding H^{1,1}(Ω) → L^1(Ω) is compact, a
subsequence (v_{n_k}) with lim_{k→∞} v_{n_k} = v for some v ∈ L^1(Ω). For the respective
subsequence (u_{n_k}), one has by construction that lim_{k→∞} u_{n_k} = v, which shows that
BV(Ω) → L^1(Ω) compactly.
For the case 1 < q < d/(d − 1) we recall Young's inequality for numbers,
which is

ab ≤ a^p/p + b^{p^*}/p^*,   a, b ≥ 0,  p ∈ ]1, ∞[.

We choose r ∈ ]q, d/(d − 1)[ and p = (r − 1)/(q − 1), and get that for all a ≥ 0
and δ > 0,

a^q = (δ^{1/p} a^{r(q−1)/(r−1)}) (δ^{−1/p} a^{(r−q)/(r−1)}) ≤ ((q−1)/(r−1)) δ a^r + ((r−q)/(r−1)) δ^{−p^*/p} a,

that is, with a suitable constant C_δ > 0,

a^q ≤ δ a^r + C_δ a.   (6.56)
Now let again (u_n) be bounded in BV(Ω), hence also bounded in L^r(Ω) with bound
L > 0 on the norm. By the above, we can assume without loss of generality that
u_n → u in L^1(Ω). For every ε > 0 we choose 0 < δ < ε^q/(2L^r) and C_δ > 0 such
that (6.56) holds. Since (u_n) is a Cauchy sequence in L^1(Ω), there exists an n_0 such
that

∫_Ω |u_{n_1} − u_{n_2}| dx ≤ ε^q/(2C_δ)

for all n_1, n_2 ≥ n_0. Consequently, we see that for these n_1 and n_2, using (6.56),

∫_Ω |u_{n_1} − u_{n_2}|^q dx ≤ δ ∫_Ω |u_{n_1} − u_{n_2}|^r dx + C_δ ∫_Ω |u_{n_1} − u_{n_2}| dx ≤ (ε^q/(2L^r)) L^r + C_δ (ε^q/(2C_δ)) = ε^q,
and hence (u_n) is a Cauchy sequence in L^q(Ω) and therefore convergent. By the
continuous embedding L^q(Ω) → L^1(Ω), the limit has to be u.
Assertion 2: We begin the proof for q = 1 with a preliminary remark. If ∇u = 0
for some u ∈ BV(Ω), then obviously u ∈ H^{1,1}(Ω), and by Lemma 6.79 we see
that u is constant. The rest of the argument can be done similarly to Lemma 6.81.
If the inequality did not hold, there would exist a sequence (u_n) in BV(Ω) with
∫_Ω u_n dx = 0, ‖u_n‖_1 = 1, and ‖∇u_n‖_M ≤ 1/n. In particular, we would have
The notion of trace allows one to formulate and prove a property that distin-
guishes BV(Ω) from the spaces H^{1,p}(Ω).
Theorem 6.111 (Zero Extension of BV Functions) For a bounded Lipschitz
domain Ω ⊂ R^d, the zero extension E : BV(Ω) → BV(R^d) is continuous, and
where ν is the outer unit normal and T u is the trace on ∂Ω defined in Theo-
rem 6.110.
Corollary 6.112 For a bounded Lipschitz domain Ω ⊂ R^d and a Lipschitz
subdomain Ω′ ⊂ Ω with Ω′ ⊂⊂ Ω, for u_1 ∈ BV(Ω′), u_2 ∈ BV(Ω\Ω′), and
u = u_1 + u_2 (with implicit zero extension) one has that u ∈ BV(Ω) and
Here we take the trace of u_1 on ∂Ω′ with respect to Ω′, and the trace of u_2 on
∂(Ω\Ω′) with respect to Ω\Ω′.
The following result relates functions of bounded total variation and the perimeter
of their sublevel sets.
Theorem 6.113 (Co-area Formula) For a bounded Lipschitz domain Ω ⊂ R^d
and u ∈ BV(Ω) it holds that

‖∇u‖_M = ∫_Ω 1 d|∇u| = ∫_R Per({x ∈ Ω : u(x) ≤ t}) dt = ∫_R TV(χ_{{u≤t}}) dt.

In other words, the total variation is the integral over all perimeters of the sublevel
sets.
The proofs of the previous three theorems can be found, e.g., in [5] and [61].
Using the co-area formula, one sees that for u ∈ BV(Ω) and h : R → R strictly
increasing and continuously differentiable with ‖h′‖_∞ < ∞, the rescaled versions
h ∘ u are also contained in BV(Ω). The value TV(h ∘ u) depends only on the sublevel
sets of u; see Exercise 6.34.
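The co-area formula can be checked directly in a discrete setting. The sketch below (ours, with an illustrative random integer-valued image) uses the anisotropic (ℓ^1) finite-difference total variation, for which the discrete co-area identity holds exactly: the TV equals the sum of the discrete perimeters of all integer sublevel sets.

```python
import numpy as np

def tv_aniso(u):
    """Anisotropic (l1) finite-difference total variation."""
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def perimeter(mask):
    """Discrete perimeter of a sublevel set: TV of its characteristic function."""
    return tv_aniso(mask.astype(float))

rng = np.random.default_rng(0)
u = rng.integers(0, 5, size=(64, 64)).astype(float)   # integer-valued test image

# co-area formula: TV(u) = sum over thresholds t of Per({u <= t})
thresholds = np.arange(u.min(), u.max())               # t = 0, 1, 2, 3
coarea = sum(perimeter(u <= t) for t in thresholds)
print(tv_aniso(u), coarea)                              # the two values coincide
```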
Now we can start to use TV as an image model in variational problems.
Theorem 6.114 (Existence of Solutions with Total Variation Penalty) Let Ω ⊂
R^d be a bounded Lipschitz domain, q ∈ ]1, ∞[ with q ≤ d/(d − 1), and let Φ :
L^q(Ω) → R_∞ be proper on BV(Ω), convex, lower semicontinuous on L^q(Ω), and
coercive in the following sense:

‖u − (1/|Ω|) ∫_Ω u dx‖_q bounded and |∫_Ω u dx| → ∞   ⇒   Φ(u) → ∞.

Then the functional Φ + TV possesses a minimizer in L^q(Ω).
The assertion is also true if Φ is bounded from below and ϕ is only coercive.
Proof One argues similarly to the proof of Theorem 6.84; the respective properties
for the TV functional follow from Lemmas 6.105 and 6.108 as well as Corol-
lary 6.109. ∎
Analogously to Theorem 6.86, we obtain the following (see also [1, 31]):
Theorem 6.115 (Tikhonov Functionals with Total Variation Penalty) For a
bounded Lipschitz domain Ω, q ∈ ]1, ∞[, a Banach space Y, and A ∈ L(L^q(Ω), Y)
one has the following implication: If
1. q ≤ d/(d − 1) and A does not vanish for constant functions, or
2. A is injective and rg(A) is closed,
then there exists for every u^0 ∈ Y, r ∈ [1, ∞[, and λ > 0 a solution u^* of the
problem

min_{u ∈ L^q(Ω)}  ‖Au − u^0‖_Y^r / r + λ TV(u).   (6.57)
∂ TV = ∇^* ∘ ∂‖·‖_M ∘ ∇.
Since v ∈ L^1_{|μ|}(Ω, R^d) was arbitrary and |μ| is finite, we can identify σ by duality
with an element (σ)_{|μ|} ∈ L^∞_{|μ|}(Ω, R^d) with ‖(σ)_{|μ|}‖_∞ ≤ 1. By the polar
decomposition μ = σ_μ |μ|, σ_μ ∈ L^∞_{|μ|}(Ω, R^d) (see Theorem 2.58), we obtain

⟨σ, μ⟩_{M^*×M} = ‖μ‖_M  ⇐⇒  ∫_Ω (σ)_{|μ|} · σ_μ d|μ| = ∫_Ω 1 d|μ|.

Since ‖(σ)_{|μ|}‖_∞ ≤ 1, it follows that 0 ≤ 1 − (σ)_{|μ|} · σ_μ |μ|-almost everywhere, and
hence the latter is equivalent to (σ)_{|μ|} · σ_μ = 1 |μ|-almost everywhere and hence
to (σ)_{|μ|} = σ_μ |μ|-almost everywhere (by the Cauchy-Schwarz inequality).
We rewrite the result more compactly: since the meaning is clear from context,
we write σ instead of (σ)_{|μ|}, and we also set σ_μ = μ/|μ|, since σ_μ is the |μ|-
almost everywhere uniquely defined density of μ with respect to |μ|. This gives
the characterization

∂‖·‖_M(μ) = { σ ∈ M(Ω, R^d)^* : ‖σ‖_{M^*} ≤ 1, σ = μ/|μ| |μ|-almost everywhere }.
(6.59)
The right-hand side can be estimated by the L^q norm only if σ · ν = 0 on ∂Ω. If this
is the case, then μ ↦ ∫_Ω σ dμ induces a continuous linear map σ̄ on M(Ω, R^d),
which satisfies for every u ∈ C^∞(Ω) that
Since test functions u are dense in L^q(Ω) (see Theorem 6.74), σ̄ ∈ dom ∇^* has
to hold. In some sense, dom ∇^* contains only the elements for which the normal
trace vanishes on the boundary. Since we will use another approach later, we will
not try to formulate a normal trace for certain elements in M(Ω, R^d)^*, but for the
time being, use the slightly sloppy characterization

∇^* = −div,   dom ∇^* = {σ ∈ M(Ω, R^d)^* : div σ ∈ L^{q^*}(Ω), σ · ν = 0 on ∂Ω}.
Collecting the previous results, we can describe the subgradient of the total
variation as

∂ TV(u) = { −div σ : ‖σ‖_{M^*} ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| |∇u|-almost everywhere }.
(6.60)
Unfortunately, some objects in the representation are not simple to deal with. As
already noted, the space M(Ω, R^d)^* does not have a characterization as a function
space (there are attempts to describe the biduals of C(K) for compact Hausdorff
spaces K [82], though). Also, we would like to have a better understanding of the
divergence operator on M(Ω, R^d)^*, especially under what circumstances one can
speak of a vanishing normal trace on the boundary. Therefore, we present a different
approach to characterizing ∂TV, which does not use the dual space of the space of
vector-valued Radon measures but only regular functions.
At the core of the approach lies the following normed space, which can be seen
as a generalization of the sets defined in (6.37) for p = 1:

D_{div,∞} = { σ ∈ L^∞(Ω, R^d) : div σ ∈ L^{q^*}(Ω) exists and there is a sequence (σ^n) in D(Ω, R^d)
with lim_{n→∞} ‖σ^n − σ‖_{q^*} + ‖div(σ^n − σ)‖_{q^*} = 0 },
‖σ‖_{div,∞} = ‖σ‖_∞ + ‖div σ‖_{q^*}.   (6.61)

We omit the dependence on q also for these spaces. In some sense, the elements of
D_{div,∞} satisfy σ · ν = 0 on ∂Ω, cf. Remark 6.89. Let us analyze this space further;
we first prove its completeness.
Lemma 6.116 Let Ω ⊂ R^d be a bounded Lipschitz domain and q ∈ ]1, ∞[. Then
D_{div,∞} according to the definition in (6.61) is a Banach space.
Proof For a Cauchy sequence (σ^n) in D_{div,∞} we have the convergence σ^n → σ
in L^∞(Ω, R^d) as well as div σ^n → w in L^{q^*}(Ω). By the closedness of the weak
divergence we have w = div σ, hence it remains to show the approximation
property from the definition (6.61). To that end, choose for every n a sequence (σ^{n,k})
of test functions (i.e., in D(Ω, R^d)) that approximate σ^n in the sense of (6.61). Now,
for every n there is a k_n such that

‖σ^{n,k_n} − σ^n‖_{q^*} + ‖div(σ^{n,k_n} − σ^n)‖_{q^*} ≤ 1/n.
The Banach space D_{div,∞} is well suited for the description of the subgradient of
TV, as the following lemma will show.
Lemma 6.118 (D_{div,∞}-Vector Fields and ∂ TV) For Ω ⊂ R^d a bounded Lipschitz
domain, q ∈ ]1, ∞[, and u ∈ BV(Ω) ∩ L^q(Ω), we have that w ∈ L^{q^*}(Ω) lies in
∂ TV(u) if and only if there exists σ ∈ D_{div,∞} such that

‖σ‖_∞ ≤ 1,   −div σ = w   and   −∫_Ω u div σ dx = TV(u).
there exists σ ∈ D_{div,∞} with

∫_Ω vw dx ≤ TV(v) for all v ∈ BV(Ω) ∩ L^q(Ω)   ⇐⇒   w = −div σ and ‖σ‖_∞ ≤ 1.

This assertion is true if and only if equality holds for the sets K_1 and K_2 defined by

K_1 = { w ∈ L^{q^*}(Ω) : ∫_Ω vw dx ≤ TV(v) for all v ∈ BV(Ω) ∩ L^q(Ω) },
K_2 = { −div σ : σ ∈ D_{div,∞} with ‖σ‖_∞ ≤ 1 }.
‖σ‖_∞ ≤ lim inf_{n→∞} ‖σ^n‖_∞ ≤ 1.

Using the same argument as in the end of the proof of Proposition 6.88, weak
convergence can be replaced by strong convergence, possibly yielding a different
sequence. Finally, this implies σ ∈ D_{div,∞}, and thus K_2 is closed.
Let us now show that K_2 ⊂ K_1: For w = −div σ ∈ K_2, according to
Remark 6.117, there exists a sequence (σ^n) in D(Ω, R^d) with ‖σ^n‖_∞ ≤ ‖σ‖_∞ ≤ 1
and lim_{n→∞} −div σ^n = −div σ = w in L^{q^*}(Ω). Using the supremum definition
of TV (6.52), this implies for arbitrary v ∈ BV(Ω) ∩ L^q(Ω) that

∫_Ω vw dx = lim_{n→∞} −∫_Ω v div σ^n dx ≤ TV(v),
T_u^ν : D_{div,∞} → L^∞_{|∇u|}(Ω)   with   ‖T_u^ν σ‖_∞ ≤ ‖σ‖_∞,

T_u^ν σ = σ · ∇u/|∇u|   in L^∞_{|∇u|}(Ω),

where ∇u/|∇u| is the sign of the polar decomposition of ∇u. Furthermore, T_u^ν is weakly
continuous in the sense that

σ^n ⇀ σ in L^{q^*}(Ω, R^d)  and  div σ^n ⇀ div σ in L^{q^*}(Ω)   ⇒   T_u^ν σ^n ⇀* T_u^ν σ in L^∞_{|∇u|}(Ω).
where the last equality is due to the convergence in L^{q^*}(Ω) and u ∈ L^q(Ω).
Additionally, ϕσ^n ∈ D(Ω, R^d) with div(ϕσ^n) = ϕ div σ^n + ∇ϕ · σ^n. Due to
u ∈ BV(Ω) and the characterization of TV in (6.52), this implies

|L(ϕ)| = lim_{n→∞} |∫_Ω u div(ϕσ^n) dx| = lim_{n→∞} |∫_Ω ϕσ^n d∇u|
≤ lim inf_{n→∞} ‖σ^n‖_∞ ∫_Ω |ϕ| d|∇u| ≤ ‖σ‖_∞ ‖ϕ‖_1,

where the latter norm is taken in L^1_{|∇u|}(Ω). Note that the set D(Ω) ⊂ C^∞(Ω) is
densely contained in this space (cf. Exercise 6.35). Therefore, L can be uniquely
extended to an element T_u^ν σ ∈ L^1_{|∇u|}(Ω)^* = L^∞_{|∇u|}(Ω) with ‖T_u^ν σ‖_∞ ≤ ‖σ‖_∞,
and hence the linear mapping T_u^ν : D_{div,∞} → L^∞_{|∇u|}(Ω) is continuous.
For σ ∈ D(Ω, R^d), we can choose σ^n = σ, and the construction yields for all
ϕ ∈ C^∞(Ω)

∫_Ω (T_u^ν σ)ϕ d|∇u| = L(ϕ) = −∫_Ω u div(ϕσ) dx = ∫_Ω ϕ σ · ∇u/|∇u| d|∇u|.

Since the test functions are dense in L^1_{|∇u|}(Ω), this implies the identity T_u^ν σ =
σ · ∇u/|∇u| in L^∞_{|∇u|}(Ω).
In order to establish the weak continuity, let (σ^n) and σ in D_{div,∞} be given as in
the assertion. For ϕ ∈ C^∞(Ω), we infer, due to the construction as well as the weak
convergence of (σ^n) and (div σ^n), that

lim_{n→∞} ∫_Ω (T_u^ν σ^n)ϕ d|∇u| = lim_{n→∞} −∫_Ω u(ϕ div σ^n + ∇ϕ · σ^n) dx
= −∫_Ω u(ϕ div σ + ∇ϕ · σ) dx = ∫_Ω (T_u^ν σ)ϕ d|∇u|.

Consequently, T_u^ν σ^n ⇀* T_u^ν σ in L^∞_{|∇u|}(Ω), again due to the density of the test
functions. ∎
Remark 6.120 (T_u^ν as Normal Trace Operator) The mapping T_u^ν is the unique
element in L(D_{div,∞}, L^∞_{|∇u|}(Ω)) that satisfies T_u^ν σ = σ · ∇u/|∇u| for all σ ∈ D(Ω, R^d)
and that exhibits the weak continuity described in Lemma 6.119. (This fact is
implied by the approximation property specified in the definition of D_{div,∞}.) If we
view −∇u/|∇u| as a vector field of outer normals with respect to the level sets of u, we
can also interpret T_u^ν as the normal trace operator.
Thus, we write σ · ∇u/|∇u| = T_u^ν σ for σ ∈ D_{div,∞}. In particular, due to the weak
continuity, the following generalization of the divergence theorem holds:

u ∈ BV(Ω),  σ ∈ D_{div,∞}:   −∫_Ω u div σ dx = ∫_Ω σ · ∇u/|∇u| d|∇u|.
(6.64)
The assertions of Lemmas 6.118 and 6.119 are the crucial ingredients for the
desired characterization of the subdifferential of the total variation.
Theorem 6.121 (Characterization of ∂ TV with Normal Trace) Let Ω ⊂ R^d be
a bounded Lipschitz domain, q ∈ ]1, ∞[, q ≤ d/(d − 1), and u ∈ BV(Ω). Then for
w ∈ L^{q^*}(Ω), one has the equivalence

w ∈ ∂ TV(u)  ⇐⇒  there exists σ ∈ D_{div,∞} with  ‖σ‖_∞ ≤ 1,  −div σ = w,  σ · ∇u/|∇u| = 1,
where σ · ∇u/|∇u| = 1 represents the |∇u|-almost everywhere identity for the normal
trace of σ.
In particular, using the alternative notation (6.62), we have that

∂ TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω and σ · ∇u/|∇u| = 1 |∇u|-almost everywhere },

with σ ∈ L^∞(Ω, R^d) and div σ ∈ L^{q^*}(Ω).
Proof In view of the assertion in Lemma 6.118, we first consider an arbitrary σ ∈
D_{div,∞} with ‖σ‖_∞ ≤ 1. For this σ there exists, according to Lemma 6.119, the
normal trace T_u^ν σ = σ · ∇u/|∇u| ∈ L^∞_{|∇u|}(Ω) with |σ · ∇u/|∇u|(x)| ≤ 1 for |∇u|-almost all
x ∈ Ω. Together with (6.64), this implies

−∫_Ω u div σ dx = ∫_Ω 1 d|∇u|  ⇐⇒  ∫_Ω (1 − σ · ∇u/|∇u|) d|∇u| = 0,

and since the integrand on the right-hand side is |∇u|-almost everywhere nonnega-
tive, we infer the equivalence to σ · ∇u/|∇u| = 1 |∇u|-almost everywhere.
According to Lemma 6.118, w ∈ ∂ TV(u) if and only if there exists σ ∈ D_{div,∞}
with ‖σ‖_∞ ≤ 1 and −div σ = w such that

−∫_Ω u div σ dx = ∫_Ω 1 d|∇u|.
of σ^n ∈ L^∞_{|∇u|}(Ω, R^d) is not greater than 1. Furthermore, due to 1 − |σ^n|^2 ≥ 0,

(1/2)|σ^n − ∇u/|∇u||^2 = (1/2)|σ^n|^2 − σ^n · ∇u/|∇u| + 1/2
≤ 1/2 + (1/2)|σ^n|^2 − σ^n · ∇u/|∇u| + 1/2 − (1/2)|σ^n|^2 = 1 − σ^n · ∇u/|∇u|.
The weak continuity of the normal trace now implies the convergence

lim_{n→∞} ∫_Ω |σ^n − ∇u/|∇u||^2 d|∇u| ≤ lim_{n→∞} 2 ∫_Ω (1 − σ^n · ∇u/|∇u|) d|∇u| = 0,

i.e., we have lim_{n→∞} σ^n = ∇u/|∇u| in L^2_{|∇u|}(Ω, R^d) and, due to the finiteness of the
measure |∇u|, also in L^1_{|∇u|}(Ω, R^d).
We now say that σ ∈ D_{div,∞} has a (full) trace T_u^d σ ∈ L^∞_{|∇u|}(Ω, R^d) if
and only if for every approximating sequence (σ^n) as above, one has σ^n → T_u^d σ
in L^1_{|∇u|}(Ω, R^d); such a trace does not have to exist, nor does the corresponding mapping have to
be continuous.
Using this notion of a trace and writing, slightly abusing notation, σ = T_u^d σ, we
can express ∂ TV(u) by

∂ TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| |∇u|-almost everywhere },

where throughout, we assume σ ∈ L^∞(Ω, R^d), div σ ∈ L^{q^*}(Ω), and the existence
of the full trace of σ in L^∞_{|∇u|}(Ω, R^d).
We now have three equivalent characterizations of the subgradient of the total
variation at hand. In order to distinguish them, let us summarize them here again.
Recall that for u ∈ BV(Ω) ∩ L^q(Ω), ∇u/|∇u| ∈ L^∞_{|∇u|}(Ω, R^d) denotes the unique
element of the polar decomposition of the Radon measure ∇u, i.e., ∇u = (∇u/|∇u|) |∇u|.
σ · ∇u/|∇u| = T_u^ν σ with the continuous normal trace operator T_u^ν : D_{div,∞} →
L^∞_{|∇u|}(Ω) according to Lemma 6.119.
3. D_{div,∞} trace representation (Remark 6.122):

∂ TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| |∇u|-a.e. }.
Remark 6.123 (∂ TV and the Mean Curvature) For u ∈ H^{1,1}(Ω), one has
(∇u)_M = (∇u)_{L^1} L^d, which implies that |(∇u)_M|-almost everywhere is equivalent
to (Lebesgue-)almost everywhere in {(∇u)_{L^1} ≠ 0}. Furthermore, we have the
agreement of the signs

(∇u)_M/|(∇u)_M|(x) = (∇u)_{L^1}(x)/|(∇u)_{L^1}(x)|   for almost all x ∈ {(∇u)_{L^1} ≠ 0}.

Let us further note that the trace T_u^d σ (cf. Remark 6.122) exists for every σ ∈
D_{div,∞}. This implies, writing ∇u = (∇u)_{L^1}, that

∂TV(u) = { −div σ : ‖σ‖_∞ ≤ 1, σ · ν = 0 on ∂Ω, σ = ∇u/|∇u| a.e. in {∇u ≠ 0} }.
κ = −∂ TV(u).
for a solution to

min_{u ∈ L^q(Ω)}  (1/q) ∫_Ω |u − u^0|^q dx + λ TV(u)   (6.65)

are satisfied. Writing κ^* = div σ^* and interpreting this as the mean curvature of the
level sets of u^*, we can consider u^* as the solution of the equation
Remark 6.125 The total variation penalty term was presented in the form of the
denoising problem with quadratic data term for the first time in [121]. For that
reason, the cost functional in (6.65) is also named after the authors as the Rudin-
Osher-Fatemi functional. Since then, TV has become one of the standard models in
image processing.
By means of Eq. (6.66) or rather its interpretation as an equation for the mean cur-
vature, we can gain a qualitative understanding of the solutions of problem (6.65).
For this purpose, we first derive a maximum principle again.
Lemma 6.126 (Maximum Principle for L^q-TV Denoising) If in the situation of
Example 6.124 one has L ≤ u^0 ≤ R almost everywhere in Ω for some L, R ∈ R, then the
solution u^* of (6.65) also satisfies L ≤ u^* ≤ R almost everywhere in Ω.
Proof Let (u^n) be a sequence in C^∞(Ω) such that u^n → u^* in L^q(Ω) as well as
TV(u^n) → TV(u^*). According to Lemma 6.106, such a sequence exists; without
loss of generality, we can assume that u^n → u^* even holds pointwise almost
everywhere in Ω (by applying the theorem of Fischer-Riesz, Proposition 2.48).
As seen in the proof of the maximum principle in Proposition 6.95, we set
v^n = min(R, max(L, u^n)) as well as v^* = min(R, max(L, u^*)). Analogously
to that situation, we see that one always has |v^n − u^0| ≤ |u^n − u^0| almost
everywhere in Ω. Together with the pointwise almost everywhere convergence and
Furthermore, due to the chain rule for Sobolev functions (Lemma 6.75), we have
v^n ∈ H^{1,1}(Ω) with ∇v^n = ∇u^n on {L < u^n < R} and ∇v^n = 0 otherwise, i.e.,

TV(v^n) = ∫_Ω |∇v^n| dx ≤ ∫_Ω |∇u^n| dx = TV(u^n).

If F denotes the cost functional in (6.65), then the choice of (u^n), the properties of
(v^n), and the lower semicontinuity of F in L^q(Ω) imply

F(v^*) ≤ lim inf_{n→∞} F(v^n) ≤ (1/q) ∫_Ω |u^* − u^0|^q dx + lim inf_{n→∞} λ TV(u^n) = F(u^*).

Therefore, v^* is a minimizer, and due to uniqueness, we have u^* =
min(R, max(L, u^*)), which proves the assertion. ∎
Like the method in Application 6.94, the variational denoising presented in
Example 6.124 yields in particular a solution u^* ∈ L^∞(Ω) if u^0 ∈ L^∞(Ω). In
this case, we also obtain |u^0 − u^*|^{q−2}(u^0 − u^*) ∈ L^∞(Ω), and since the Euler-
Lagrange equation (6.66) holds, we have κ^* = div σ^* ∈ L^∞(Ω). Interpreting κ^*
as the mean curvature, we conclude that the mean curvatures of the level sets of u^*
have to be essentially bounded. This condition still allows u^* to exhibit
discontinuities (in contrast to the solutions associated with the Sobolev penalty term
in (6.39)), but corners and objects with high curvature cannot be reproduced.
Figure 6.19 shows some numerical examples of this method. We can see that it
yields very good results for piecewise constant functions, especially in terms of the
reconstruction of the object boundaries while simultaneously removing the noise. If
the image is not piecewise constant, as is usually the case for natural images, artifacts
often arise that make the result appear "blocky" or "staircased." In this context,
one again speaks of "staircasing artifacts" or the "staircasing effect."
The effect of the regularization parameter λ can be observed in
Fig. 6.20 for q = 2: for larger values of λ, the size of the details that are no longer
reconstructed in the smoothed images also increases. Relevant edges, however,
are preserved. An undesired effect, which appears for large λ, is a reduction of
contrast. One can show that this is a consequence of the quadratic error term and
that it is possible to circumvent it by a transition to the L1 -norm (L1 -TV, cf.
[36, 55]). Solutions of this problem even satisfy a variant of the gray value scaling
invariance [GSI] of Chap. 5: for suitable strictly increasing h : R → R and for a
minimizer u∗ of the L1 -TV problem for the data u0 , the scaled version h ◦ u∗ is a
minimizer for the scaled data h ◦ u0 ; see Exercise 6.38 for more details.
Fig. 6.19 Denoising with total variation penalty. Top: Left a noisy natural image (see Fig. 6.11
for the original), right the solution of (6.65). Bottom: The noisy version of a piecewise constant
artificial image (original in Fig. 5.7), right the result of TV denoising. In both cases q = 2 as
well as λ chosen to optimize PSNR. For the natural image we see effects similar to Fig. 6.11 for
p = 1.1. The reconstruction for the artificial image is particularly good: The original has small
total variation and hence, fits exactly to the modeling assumptions for (6.65)
Fig. 6.20 Effect of L2 -TV denoising with varying regularization parameter. Top left: Original.
Second row: Solutions u∗ for different λ. Third row: The difference images (u∗ − u0 )/λ with
contours of the level sets of u∗ . In both rows λ = 0.1, λ = 0.3, and λ = 0.9 have been used. The
Euler-Lagrange equation (6.66) states that the difference images coincide with the curvature of the
level sets, and indeed, this can be seen for the depicted contours
an optimal solution of (6.67) if and only if there exists σ^* ∈ D_{div,∞} such that

|u^* ∗ k − u^0|^{q−2}(u^* ∗ k − u^0) ∗ k̄ = λ div σ^*   in Ω,
σ^* · ν = 0   on ∂Ω,
‖σ^*‖_∞ ≤ 1  and  σ^* = ∇u^*/|∇u^*|  |∇u^*|-almost everywhere,
(6.68)

where k̄ = D_{−id} k.
Fig. 6.21 Deconvolution with total variation penalty. Top: Left the convolved and noisy data u0
from Fig. 6.13, right a minimizer u∗ of the functional (6.67) for q = 2. Bottom: Left convolved and
noisy data (comparable to u0 in Fig. 6.3), right the respective deconvolved version. The parameter
λ has been optimized for maximal PSNR
Qualitatively, the results are comparable with those of the denoising problem
with total variation; in particular, one can see that the mean curvature is essentially
bounded if k ∈ L^q(Ω_0) (cf. the arguments in Application 6.97). Numerical results
for this method are reported in Fig. 6.21. Despite the noise, a considerably sharper
version of the noisy image could be reconstructed. However, details smaller than a
certain size have been lost. Moreover, the staircasing effect is a little stronger
than in Example 6.124 and leads to a blocky appearance, which is typical for total
variation methods.
Example 6.128 (Total Variation Inpainting) Let Ω ⊂ R^d be a bounded Lipschitz
domain and Ω′ with Ω′ ⊂⊂ Ω a Lipschitz subdomain on which the "true image" u† :
Ω → R is to be reconstructed. Moreover, we assume that u†|_{Ω\Ω′} ∈ BV(Ω\Ω′),
and so the zero extension u^0 of u†|_{Ω\Ω′} is, by Theorem 6.111, in BV(Ω). As image
for some given q ∈ ]1, ∞[ with q ≤ d/(d − 1). Using Theorem 6.114 and the same
techniques as in Application 6.98, we can prove the existence of such a u^*; however,
we do not know whether it is unique, since TV is not strictly convex.
For the optimality conditions we analyze the total variation functional on the set

K = { v ∈ L^q(Ω) : v = u^0 almost everywhere on Ω\Ω′ }.
rule for subgradients can be applied in this situation (see Exercise 6.14). If we
denote by A the mapping u ↦ u χ_{Ω′}, we have that rg(A) = X_2, and according
to Exercise 6.15, we can apply the chain rule for subgradients to

F_1 = TV ∘ T_{u^0} ∘ A
where u^0|_{∂(Ω\Ω′)} is the trace of u^0 on ∂(Ω\Ω′) with respect to Ω\Ω′ and u|_{∂Ω′} is the
trace on ∂Ω′ with respect to Ω′. Since div σ is arbitrary on Ω\Ω′, σ plays no role
there, and we can modify the condition for the trace of σ accordingly to

σ = ∇(u|_{Ω′})/|∇(u|_{Ω′})|   |∇(u|_{Ω′})|-almost everywhere,
σ = ν    on {u|_{∂Ω′} < u^0|_{∂(Ω\Ω′)}},
σ = −ν   on {u|_{∂Ω′} > u^0|_{∂(Ω\Ω′)}},
In Ω′ we can see div σ^* as the mean curvature of the level sets of u^*, and the
optimality conditions tell us that it has to vanish there. Hence, by definition,
every level set is a so-called minimal surface, and this notion underlies a whole
theory for BV functions; see, e.g., [65]. These minimal surfaces are connected to
the level sets of u^0 on the boundary ∂Ω′ where the traces of u^*|_{∂Ω′} and u^0|_{∂(Ω\Ω′)}
coincide. In this case we get the impression that u^* indeed connects the boundaries
of objects. However, it can happen that u^* jumps at some parts of ∂Ω′. According
to (6.71) this can happen only under special circumstances, but still, in this case
some level sets "end" at that point, and this gives results in which some objects
appear to be "cut off."
In the case of images, i.e., for d = 2, we even have that vanishing mean
curvature means that TV inpainting connects object boundaries by straight lines
(see Exercise 6.40). In fact, this can be seen in Figs. 6.22 and 6.23.
Fig. 6.22 Illustration of total variation inpainting. Left: Given data; the black region is the region
which is to be restored (see also Fig. 6.14). Right: The solution of the minimization problem (6.69).
Edges of larger objects are reconstructed well, but finer structures are disconnected more often
Fig. 6.23 TV inpainting connects level sets with straight lines. Top row, left: An artificial image
with marked inpainting domains of increasing width. Right and below: Solutions u^* of the
inpainting problem (6.69) for these regions. One clearly sees that some level sets are connected by
straight lines. At the points on the boundary ∂Ω′ where this does not happen, which are the points
where the solutions jump, such a connection would increase the total variation
In conclusion, we note that the TV model has favorable properties for the
reconstruction of image data, but the solutions of TV inpainting problems like (6.69)
may jump at the boundary of the inpainting domain, and hence the inpainting
domain may still be visible after inpainting. Moreover, object boundaries can be
connected only by straight lines which is not necessarily a good fit for the rest of the
object’s shape.
We also note that the solutions of (6.69) obey a maximum principle similar to
inpainting with Sobolev semi-norm (Application 6.98). Moreover, a variant of gray
value scaling invariance [GSI] from Chap. 5 is satisfied, similar to L1 -TV denoising
(Exercise 6.39).
Example 6.129 (Interpolation with Minimal Total Variation) We can also use the
TV functional in the context of Application 6.100. We recall the task: for a discrete
image U^0 ∈ R^{N×M} we want to find a continuous u^* : Ω → R with Ω = ]0, N[ ×
]0, M[ such that u^* is an interpolation of the data U^0. For a linear, continuous, and
surjective sampling operator A : L^q(Ω) → R^{N×M} with q ∈ ]1, 2] we require that
Au^* = U^0 holds. Moreover, u^* should correspond to some image model, in this
case the total variation model. This leads to the minimization problem
Hence, optimal solutions u∗ satisfy an equation for the mean curvature that also has
to lie in the subspace spanned by the {wi,j }. If wi,j ∈ L∞ () for all i, j , then the
mean curvature of the level sets of u∗ has to be essentially bounded.
The actual form of solutions depends on the choice of the functions wi,j , i.e.,
on the sampling operator A; see Fig. 6.24 for some numerical examples. If one
chooses the mean value over squares, i.e. wi,j = χ]i−1,i[×]j −1,j [ , the level sets
necessarily have constant curvature on these squares. The curvature is determined
by λ∗i,j . The level sets of u∗ on ]i − 1, i[ × ]j − 1, j[ are, in the case of λ∗i,j = 0,
line segments (similar to Example 6.128), and are segments of circles in other cases
Fig. 6.24 Examples of TV interpolation with perfect low-pass filter. Outer left and right: Original
images U 0 . Middle: Solutions u∗ of the TV interpolation problem with eightfold magnification.
All images are shown with the same resolution
Fig. 6.25 TV interpolation with mean values over squares leads to solutions with piecewise
constant curvature of the level sets. Left: Original images U^0 (9 × 9 pixels). Middle: Solutions
u^* of the respective TV interpolation problem with 60-fold magnification. Right: Level sets of u^*
together with a grid of the original image
(see Exercise 6.40). Hence, TV interpolation is well suited for images that have
these characteristics. For images with a more complex geometry, however, we may
still hope that the geometry is well approximated by these segments. On the other
hand, it may also happen that straight lines are interpolated by segments of varying
curvature. This effect is strongest if the given data U^0 fits the unknown "true"
image only loosely; see Fig. 6.25.
In the following we will briefly show how the variational methods introduced above
can be extended to color images. We recall: depending on the choice of the color space, a
color image has N components and hence can be modeled as a function u : Ω →
R^N. We already discussed some effects of the choice of the color space, be it RGB or
HSV, in Chap. 5. We also noted that methods that couple the color
components in some way usually lead to better results than methods without such
coupling. Hence, we focus on the development of methods that couple the color
components; moreover, we restrict ourselves to the RGB color space.
As an example, let us discuss the denoising problem with Sobolev semi-norm
or total variation from Application 6.94 and Example 6.124, respectively. Let u^0 ∈
L^q(Ω, R^N), N ≥ 1, be a given noisy color image. The result of the variational
problem (6.39) applied to all color channels separately amounts to the solution of

min_{u ∈ L^q(Ω,R^N)}  (1/q) Σ_{i=1}^{N} ∫_Ω |u_i − u^0_i|^q dx  +  { (λ/p) Σ_{i=1}^{N} ∫_Ω |∇u_i|^p dx   if p > 1,
                                                                  λ Σ_{i=1}^{N} TV(u_i)                if p = 1,
(6.74)
respectively. To couple the color channels, we can choose different vector norms
in R^N for the data term and different matrix norms in R^{N×d} for the penalty term.
We want to do this in a way such that the channels do not separate, i.e., such that
the terms are not both sums over the contributions of the channels i = 1, . . . , N.
We focus on the pointwise matrix norm for ∇u, since one can see the influence of
the norms more easily in this case, and choose the usual pointwise Euclidean vector
norm on L^q(Ω, R^N):

1 ≤ q < ∞:   ‖u‖_q = (∫_Ω (Σ_{i=1}^{N} |u_i(x)|^2)^{q/2} dx)^{1/q},   ‖u‖_∞ = ess sup_{x∈Ω} (Σ_{i=1}^{N} |u_i(x)|^2)^{1/2}.
The analogous choice for the matrix norm, i.e., the sum of the squares of the
entries, seems suitable; this amounts to the so-called Frobenius norm |∇u(x)|_F^2 =
|∇u(x)|^2 = Σ_{i=1}^{N} Σ_{j=1}^{d} |∂_{x_j} u_i(x)|^2, i.e., for 1 ≤ p < ∞,

‖∇u‖_p = (∫_Ω (Σ_{i=1}^{N} Σ_{j=1}^{d} |∂u_i/∂x_j (x)|^2)^{p/2} dx)^{1/p},   ‖∇u‖_∞ = ess sup_{x∈Ω} (Σ_{i=1}^{N} Σ_{j=1}^{d} |∂u_i/∂x_j (x)|^2)^{1/2}.
with K = min(d, N) and (η ⊗ ξ)_{i,j} = η_i ξ_j (see, e.g., [102]). The values σ_k(x) are
uniquely determined up to reordering. In the case N = 1 one such decomposition
is σ_1(x) = |∇u(x)|, ξ_1(x) = ∇u/|∇u|(x), and η_1(x) = 1. For N > 1 we can interpret
ξ_k(x) as a "generalized" normal direction for which the color u(x) changes in the
direction η_k(x) at the rate σ_k(x). If σ_k(x) is large (or small, respectively), then
the color changes in the direction ξ_k(x) a lot (or only slightly, respectively). In
particular, max_{k=1,...,K} σ_k(x) quantifies the intensity of the largest change in color.
The way to define a suitable matrix norm for ∇u is to use this intensity as a norm:

|∇u(x)|_spec = max_{k=1,...,K} σ_k(x).

This is indeed a matrix norm on R^{N×d}, the so-called spectral norm. It coincides
with the operator norm of ∇u(x) as a linear mapping from R^d to R^N. To see this,
note that for z ∈ R^d with |z| ≤ 1 we get, by orthonormality of the η_k(x) and ξ_k(x), the
Pythagorean theorem, and Parseval's identity, that

|∇u(x)z|^2 = Σ_{k=1}^{K} σ_k(x)^2 (ξ_k(x) · z)^2 ≤ |∇u(x)|^2_spec Σ_{k=1}^{K} (ξ_k(x) · z)^2 ≤ |∇u(x)|^2_spec.
This allows us to define respective Sobolev semi-norms and total variation: For
1 ≤ p < ∞ let

‖∇u‖_{p,spec} = (∫_Ω |∇u(x)|^p_spec dx)^{1/p},   ‖∇u‖_{∞,spec} = ess sup_{x∈Ω} |∇u(x)|_spec,

as well as

TV_spec(u) = sup { ∫_Ω u · div v dx : v ∈ D(Ω, R^{N×d}), ‖v‖_{∞,spec} ≤ 1 }.   (6.77)
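The two pointwise matrix norms are easy to compare numerically. The following sketch (ours; the random gradient field is only for illustration) computes the per-pixel Frobenius norm and, via a singular value decomposition, the per-pixel spectral norm of a color-image gradient:

```python
import numpy as np

# per-pixel gradient of an N-channel image on a d-dimensional domain:
# array of shape (height, width, N, d); here a random field for illustration
rng = np.random.default_rng(1)
grad = rng.standard_normal((32, 32, 3, 2))

# Frobenius norm: square root of the sum of the squared entries per pixel
frob = np.sqrt((grad**2).sum(axis=(-2, -1)))

# spectral norm: largest singular value of the N x d matrix per pixel
spec = np.linalg.svd(grad, compute_uv=False)[..., 0]

# the spectral norm never exceeds the Frobenius norm
assert np.all(spec <= frob + 1e-12)
print(frob.mean(), spec.mean())
```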
Fig. 6.26 Illustration of variational L2 -TV denoising of color images. Top: The original u† with
some marked details. Middle: Left the images with additive chromatic noise u0 (PSNR(u0 , u† ) =
16.48 dB), right the solution u∗sep with separable penalty term (6.74) (PSNR(u∗sep , u† ) = 27.84 dB).
Bottom: Left the solution u∗ for the pointwise Frobenius matrix norm (6.76) (PSNR(u∗ , u† ) =
28.46 dB), right the solution u∗spec for the pointwise spectral norm (6.78) (PSNR(u∗spec , u† ) =
28.53 dB)
Fig. 6.27 Solution of the variational deconvolution problem with monochromatic noise and
blurred color data. Top: Original image u†. Bottom, left to right: the convolution kernel k, the
given data u^0, and the reconstruction u^* obtained by L^2-TV deconvolution (with Frobenius matrix
norm)
Fig. 6.28 Inpainting of color images with different models. From left to right: Given data with
inpainting region, solutions for H 1 , TV, and TVspec penalty, respectively
Fig. 6.29 Reconstruction of a color image from information along edges. Top: Left original u†
with two highlighted regions, right the given data u0 along edges and the inpainting region. Bottom:
Reconstruction with H 1 inpainting (left) and TV inpainting (with Frobenius matrix norm, right)
which is characteristic for total variation methods and which we have already seen for gray-value
images; cf. Fig. 6.23.
The use of the total variation with Frobenius matrix norm for the inpainting of
mostly homogeneous regions has some advantages over H 1 inpainting (which is
separable). In general, the reconstruction of edges is better and the “cropping effect”
is not as strong as in the case of scalar gray-valued TV inpainting; see Fig. 6.29.
For the sake of completeness, we mention that similar effects can be observed
for variational interpolation. Figure 6.30 shows a numerical example in which the
TVspec penalty leads to sharper color transitions than the TV penalty with Frobenius
matrix norm.
Fig. 6.30 Interpolation for color images. Top: Left the color image U 0 to be interpolated, right
the sinc interpolation (zoom factor 4). Bottom: Solution of the TV interpolation problem (left) and
the TVspec interpolation problem (right) for fourfold zooming
Our ultimate goal is, as it was in Chap. 5, where we developed methods based
on partial differential equations, to apply the method to concrete images. Since
our variational methods are genuinely minimization problems for functions on
continuous domains, we are faced with the question of appropriate discretization,
but we also need numerical methods to solve the respective optimization problems.
There exists a vast body of work on this topic, some of it developing methods for
special problems in variational imaging while other research develops fairly abstract
optimization concepts. In this section we will mainly introduce tools that allow us
to solve the variational problems we developed in this chapter. Our focus is more on
broad applicability of the tools than on best performance, high speed or efficiency.
However, these latter aspects should not be neglected, but we refer to the original
literature on these topics.
Let us start with the problem to find a solution for the convex minimization
problem
min F (u)
u∈X
A first very simple idea for solving Euler-Lagrange equations for some F is
motivated by the fact that these equations are often partial differential equations
of the form

−G(x, u(x), ∇u(x), ∇^2 u(x)) = 0   in Ω   (6.79)

∂u(t, x)/∂t = G(x, u(t, x), ∇u(t, x), ∇^2 u(t, x))   in ]0, ∞[ × Ω,   u(0, x) = f(x)   in Ω
(6.80)
from Application 6.94 is a nonlinear elliptic equation. The instationary version with
initial value f = u^0 reads

∂u/∂t − div(|∇u|^{p−2} ∇u) = |u^0 − u|^{q−2}(u^0 − u)   in ]0, ∞[ × Ω,
|∇u|^{p−2} ∇u · ν = 0   on ]0, ∞[ × ∂Ω,
u(0, · ) = u^0   in Ω,
and is a nonlinear diffusion equation. The methods from Sect. 5.4.1 allow a
numerical solution as follows. We choose a spatial stepsize h > 0, a time stepsize
τ > 0, denote by U^n the discrete solution at time nτ (and by U^0 the discrete and
noisy data), and we discretize the diffusion coefficient |∇u|^{p−2} by

A(U)_{i±1/2,j} = ((|∇U|_{i±1,j} + |∇U|_{i,j})/2)^{p−2},   |∇U|^2_{i,j} = (U_{i+1,j} − U_{i,j})^2/h^2 + (U_{i,j+1} − U_{i,j})^2/h^2,

and A(U)_{i,j±1/2} similarly, and obtain, with the matrix A(U) from (5.23), the
following semi-implicit method:

U^{n+1} = (id − (τ/h^2) A(U^n))^{-1} (U^n + τ |U^0 − U^n|^{q−2}(U^0 − U^n)).
In every time step we have to solve a linear system of equations, which can be done
efficiently.
In the case p < 2, the entries in A can become arbitrarily large or even infinite. This
leads to numerical problems. A simple workaround is the following trick: we choose a
"small" ε > 0 and replace |∇U| by a regularized variant bounded away from zero (for instance, √(|∇U|^2 + ε^2)).
The discrete images U will be defined on Ω′_h ∪ (∂Ω′)_h. For a given U we denote
by A(U)|_h the restriction of the matrix A(U) from Example 6.130 to Ω′_h, i.e., the
matrix in which we eliminated the rows and columns that belong to the indices that
are not in Ω′_h. A semi-implicit method is then given by the successive solution of
the linear system of equations

(id_h − (τ/h^2) A(U^n)|_h) U^{n+1}|_h = U^n|_h,
id_{(∂Ω′)_h} U^{n+1}|_{(∂Ω′)_h} = U^0|_{(∂Ω′)_h}.
The convolution introduces integral terms that appear in addition to the
differential terms. Hence, we also need to discretize the convolution in addition to
the partial differential equation (which can be done similarly to Example 6.130).
From Sect. 3.3.3 we know how this can be done. We denote the matrix that
implements the convolution with k on discrete images by B, and its adjoint, i.e.,
the matrix for the convolution with k̄, by B^*. The right-hand side of the equation is
affine linear in u, and we discretize it implicitly in the context of the semi-implicit
method, i.e.,

(U^{n+1} − U^n)/τ − (1/h^2) A(U^n) U^{n+1} = B^* U^0 − B^* B U^{n+1}.
∂u
= −JX−1 DF (u) for t > 0, u(0) = u0 ,
∂t
This shows that u(t) reduces the functional values with increasing t. This justifies
the use of gradient flows for minimization processes.
For real Hilbert spaces X we can generalize the notion of gradient flow to
subgradients. If F : X → R∞ is proper, convex, and lower semicontinuous, one
can show that for every u0 ∈ dom ∂F there exists a function u : [0, ∞[ → X that
solves, in some sense, the differential inclusion
−∂u/∂t(t) ∈ ∂F(u(t))   for t > 0,   u(0) = u_0.
The study of such problems is the subject of the theory of nonlinear semigroups and
monotone operators in Hilbert spaces, see, e.g., [21, 130].
The above examples show how one can use the well-developed theory of
numerics of partial differential equations to easily obtain numerical methods to
solve the optimality conditions of the variational problems. As we have seen in
Examples 6.130–6.132, sometimes some modifications are necessary in order to
avoid undesired numerical effects (cf. the case p < 2 in the diffusion equations).
The reason for this is the discontinuity of the differential operators or, more
abstractly, the discontinuity of the subdifferential. This problem occurs, in contrast
to linear maps, even in finite dimensions and leads to problems with numerical
methods.
Hence, we are going to develop another approach that does not depend on the
evaluation of general subdifferentials.
Again we consider the problem to minimize a proper, convex, and lower semicon-
tinuous functional F , now on a real Hilbert space X. In the following we identify,
via the Riesz map, X = X∗ . In particular, we will consider the subdifferential as a
graph in X × X, i.e., ∂F (u) is a subset of X.
Suppose that F is of the form F = F1 + F2 ◦ A, where F1 : X → R∞ is proper,
convex, and lower semicontinuous, A ∈ L(X, Y ) for some Banach space Y , and
F2 : Y → R is convex and continuously differentiable. Moreover, we assume that
F1 has a “simple” structure, which we explain in more detail later. Theorem 6.51
states that the Euler-Lagrange equation for the optimality of u∗ for the problem
is given by
Somewhat surprisingly, it turns out that the operation on the right-hand side of the
last formulation is single-valued.
Lemma 6.134 Let X be a real Hilbert space and F : X → R_∞ proper, convex, and
lower semicontinuous. For every σ > 0, one has that (id + σ∂F)^{-1} is characterized
by the mapping that maps u to the unique minimizer of

min_{v ∈ X}  ‖v − u‖_X^2 / 2 + σ F(v).   (6.83)

Moreover, the map (id + σ∂F)^{-1} is nonexpansive, i.e., for u_1, u_2 ∈ X,

‖(id + σ∂F)^{-1}(u_1) − (id + σ∂F)^{-1}(u_2)‖_X ≤ ‖u_1 − u_2‖_X.
Proof For u ∈ X we consider the minimization problem (6.83). The objective func-
tional is proper, convex, lower semicontinuous, and coercive, and by Theorem 6.31
there exists a minimizer v^* ∈ X. By strict convexity of the norm, this minimizer is
unique. By Theorems 6.43 and 6.51 (the norm is continuous in X), v ∈ X is a minimizer
if and only if

0 ∈ ∂((1/2)‖·‖_X^2 ∘ T_{−u})(v) + σ∂F(v)   ⇐⇒   0 ∈ v − u + σ∂F(v).

The latter is equivalent to v ∈ (id + σ∂F)^{-1}(u), and by uniqueness of the minimizer
we obtain that v ∈ (id + σ∂F)^{-1}(u) if and only if v = v^*. In particular,
(id + σ∂F)^{-1}(u) is single-valued.
To prove the inequality we first show monotonicity of ∂F (cf. Theorem 6.33 and
its proof). Let v^1, v^2 ∈ X and w^1 ∈ ∂F(v^1) as well as w^2 ∈ ∂F(v^2). By the
respective subgradient inequalities we get

(w^1, v^2 − v^1)_X ≤ F(v^2) − F(v^1),   (w^2, v^1 − v^2)_X ≤ F(v^1) − F(v^2),

as desired. ∎
Remark 6.135 The mapping (id + σ∂F)^{-1} is also called the resolvent of ∂F for
σ > 0. Another name is the proximal mapping of σF, denoted by prox_{σF}, but we
will stick to the resolvent notation in this book.
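As a quick numerical illustration of Lemma 6.134 (ours, not from the text), the sketch below compares a closed-form resolvent, namely soft thresholding for F(v) = |v| on R (this formula also appears for p = 1 in Example 6.138 below), with a direct numerical minimization of (6.83):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def resolvent_abs(u, sigma):
    """Closed-form resolvent of F(v) = |v|: soft thresholding."""
    return np.sign(u) * max(abs(u) - sigma, 0.0)

def resolvent_numeric(u, sigma):
    """Resolvent obtained by minimizing (6.83) directly."""
    obj = lambda v: 0.5 * (v - u)**2 + sigma * abs(v)
    return minimize_scalar(obj).x

for u in (-3.0, -0.2, 0.5, 2.0):
    print(resolvent_abs(u, 1.0), resolvent_numeric(u, 1.0))
```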
We revisit the result in (6.82) and observe that we have obtained an equivalent
formulation of the optimality condition for u∗ as a fixed point equation
u∗ = (id +σ ∂F1 )−1 ◦ (id −σ A∗ ◦ DF2 ◦ A) (u∗ ).
This leads to a numerical method immediately, namely to the fixed point iteration
un+1 = T (un ) = (id +σ ∂F1 )−1 ◦ (id −σ A∗ ◦ DF2 ◦ A) (un ). (6.84)
This method is known as “forward backward splitting”, see [46, 93]. It is a special
case of splitting methods, and depending on how the sum ∂F1 + A∗ DF2 A is split
up, one obtains different methods, e.g., the Douglas-Rachford splitting method or
the alternating direction method of multipliers; see [56].
To justify the fixed point iteration, we assume that the iterates (un ) converge
to some u. By assumptions on A and F2 we see that T is continuous, and hence
we also have the convergence T (un ) → T (u). Since T (un ) = un+1 , we see that
T (u) = u, and this gives the optimality of u. Hence, the fixed point iteration is, in
case of convergence, a continuous numerical method for the minimization of sums
of the form F1 + F2 ◦ A with convex functionals under the additional assumption
that F2 is continuously differentiable.
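As an illustrative sketch of the fixed point iteration (6.84) (the model, the step-size choice, and all names here are our assumptions, not taken from the text), consider F_1 = λ‖·‖_1 and F_2(v) = ½‖v − u^0‖^2 with a matrix A, so that the resolvent of σ∂F_1 is componentwise soft thresholding:

```python
import numpy as np

def forward_backward(A, u0, lam, sigma, iters=500):
    """Fixed point iteration (6.84) for min_u lam*||u||_1 + 0.5*||A u - u0||^2."""
    u = np.zeros(A.shape[1])
    for _ in range(iters):
        # forward (explicit) step on F2 o A: gradient A^T (A u - u0)
        v = u - sigma * A.T @ (A @ u - u0)
        # backward (implicit) step: resolvent of sigma*lam*||.||_1 (soft thresholding)
        u = np.sign(v) * np.maximum(np.abs(v) - sigma * lam, 0.0)
    return u

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[[5, 37, 80]] = [1.0, -2.0, 0.5]
u0 = A @ x_true + 0.01 * rng.standard_normal(40)

# a step size below 2/||A||^2 (here 1/||A||^2) keeps the iteration stable
sigma = 1.0 / np.linalg.norm(A, 2)**2
print(forward_backward(A, u0, lam=0.05, sigma=sigma)[[5, 37, 80]])
```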
Let us analyze in more detail the question whether and when (id + σ∂F_1)^{-1} can be computed
by elementary operations. We cannot expect this to be possible for
general functionals, but for several functionals that are interesting in our context,
there are indeed formulas. We begin with some elementary rules of calculus for
resolvents and look at some concrete examples.
Lemma 6.136 (Calculus for Resolvents) Let F_1 : X → R_∞ be a proper, convex,
and lower semicontinuous functional on the real Hilbert space X, with Y another
real Hilbert space and σ > 0.
1. For α ∈ R: if F_2(u) = F_1(u) + α, then (id + σ∂F_2)^{-1}(u) = (id + σ∂F_1)^{-1}(u);
2. for τ, λ > 0: if F_2(u) = τ F_1(λu), then (id + σ∂F_2)^{-1}(u) = λ^{-1}(id + στλ^2 ∂F_1)^{-1}(λu);
3. for u^0 ∈ X, w^0 ∈ X:
F_2 = F_1 ∘ T_{u^0} + (w^0, · )  ⇒  (id + σ∂F_2)^{-1} = T_{−u^0} ∘ (id + σ∂F_1)^{-1} ∘ T_{u^0 − σw^0};
4. for A ∈ L(X, Y) bijective with A^*A = id: if F_2(u) = F_1(Au), then (id + σ∂F_2)^{-1}(u) = A^*(id + σ∂F_1)^{-1}(Au);
5. if F_2 : Y → R_∞ is proper, convex, and lower semicontinuous, then

F_3(u, w) = F_1(u) + F_2(w)  ⇒  (id + σ∂F_3)^{-1}(u, w) = ((id + σ∂F_1)^{-1}(u), (id + σ∂F_2)^{-1}(w)).
Proof Assertions 1–3: The proof of the identities consists of obvious and elemen-
tary steps, which we omit here.
v^* solves  min_{v∈X}  ‖v − u‖_X^2/2 + σ F_1(Av)
⇔ v^* solves  min_{v∈rg(A^*)}  ‖A(v − u)‖_Y^2/2 + σ F_1(Av)
⇔ Av^* solves  min_{w∈Y}  ‖w − Au‖_Y^2/2 + σ F_1(w).

Here we used both the bijectivity of A and A^*A = id. The last formulation is
equivalent to v^* = A^*(id + σ∂F_1)^{-1}(Au), and thus

(id + σ∂F_2)^{-1} = A^* ∘ (id + σ∂F_1)^{-1} ∘ A.
(v^*, ω^*) solves  min_{v∈X, ω∈Y}  ‖v − u‖_X^2/2 + σ F_1(v) + ‖ω − w‖_Y^2/2 + σ F_2(ω)

⇔  v^* solves min_{v∈X} ‖v − u‖_X^2/2 + σ F_1(v),   and   ω^* solves min_{ω∈Y} ‖ω − w‖_Y^2/2 + σ F_2(ω).

The first minimization problem is solved only by v^* = (id + σ∂F_1)^{-1}(u), and the
second only by ω^* = (id + σ∂F_2)^{-1}(w), i.e.,

(id + σ∂F_3)^{-1}(u, w) = ((id + σ∂F_1)^{-1}(u), (id + σ∂F_2)^{-1}(w)),

as desired. ∎
Example 6.137 (Resolvent Maps)
1. Functionals in R
For F : R → R∞ proper, convex, and lower semicontinuous, dom ∂F
has to be an interval (open, half open, or closed), and every ∂F (t) is a closed
interval which we denote by [G− (t), G+ (t)] (where the values ±∞ are explicitly
allowed, but then are excluded from the interval). The functions G− , G+ are
monotonically increasing in the sense that G+ (s) ≤ G− (t) for s < t.
s + σ F (s) = t.
(cf. Example 6.49). Moreover, (u, v)_X = ‖u‖_X ‖v‖_X if and only if v = λu for
some λ ≥ 0, and hence the properties that define the set in the last equation are
equivalent to ‖v‖_X = (id + σ∂ϕ)^{-1}(‖u‖_X) and v = λu for some λ ≥ 0, which in
turn is equivalent to

v = (id + σ∂ϕ)^{-1}(‖u‖_X) u/‖u‖_X   if u ≠ 0,   v = 0   if u = 0.
By the monotonicity of ϕ, we have (id + σ∂ϕ)^{-1}(0) = 0, and we can write the
resolvent, using the convention 0/0 = 1, as

(id + σ∂(ϕ ∘ ‖·‖_X))^{-1}(u) = (id + σ∂ϕ)^{-1}(‖u‖_X) u/‖u‖_X.
F(u) = (Qu, u)_X/2 + (w, u)_X,

which is differentiable with derivative DF(u) = Qu + w. By the assumption
on Q,

F(u) + (DF(u), v − u)_X = (Qu, u)_X/2 + (w, u)_X + (Qu + w, v − u)_X
= −(Qu, u)_X/2 + (Qu + w, v)_X − (Qv, v)_X/2 + (Qv, v)_X/2
= −(Q(u − v), (u − v))_X/2 + (Qv, v)_X/2 + (w, v)_X
≤ F(v)

for all u, v ∈ X, and this implies, using Theorem 6.33, the convexity of F. For
given σ and u ∈ X we would like to calculate the resolvent (id + σ∂F)^{-1}(u).
One has u = v + σQv + σw if and only if v = (id + σQ)^{-1}(u − σw), and hence
min_{v ∈ K}  ‖v − u‖_X^2/2.
Hence, the resolvent is the orthogonal projection P_K onto K,
and in particular, the resolvent does not depend on σ. Without further information
on K we cannot say how the projection can be calculated, but in several special
cases, simple ways to do so exist.
(a) Let K be a nonempty closed interval in R. Depending on the boundedness,
we get the projection by clamping to the respective finite endpoints of the interval.
(b) If K is a finite-dimensional subspace with orthonormal basis {v^1, ..., v^{dim K}}, then

P_K(u) = Σ_{n=1}^{dim K} (v^n, u)_X v^n.

If one has only some basis {v^1, . . . , v^N} for a subspace K with dim K < ∞
and Gram-Schmidt orthonormalization would be too expensive, one can
still calculate the projection P_K(u) by solving the linear system of equations

Mx = b,   x ∈ R^N,   M_{i,j} = (v^i, v^j)_X,   b_j = (u, v^j)_X,

and setting P_K(u) = Σ_{i=1}^{N} x_i v^i. By definition, M is positive definite and
hence invertible. However, depending on the basis {v^i}, it may be badly
conditioned.
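A minimal sketch of these two projections (ours, with illustrative data) clamps to an interval and solves the Gram system for a non-orthonormal basis:

```python
import numpy as np

def project_subspace(u, basis):
    """Project u onto span{basis} by solving the Gram system M x = b."""
    V = np.column_stack(basis)      # columns v^1, ..., v^N (not necessarily orthonormal)
    M = V.T @ V                     # M_ij = (v^i, v^j)
    b = V.T @ u                     # b_j = (u, v^j)
    return V @ np.linalg.solve(M, b)

rng = np.random.default_rng(0)
v1, v2 = rng.standard_normal(5), rng.standard_normal(5)
u = rng.standard_normal(5)
p = project_subspace(u, [v1, v2])

# the residual u - p is orthogonal to the subspace
print(np.dot(u - p, v1), np.dot(u - p, v2))    # both close to zero

# projection onto a closed interval [a, b]: simple clamping
a, b = -1.0, 1.0
print(np.clip(2.5, a, b))                       # 1.0
```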
5. Convex integrands and summands
Let X = L^2(Ω, R^N) be the Lebesgue-Hilbert space with respect to a
measure space (Ω, F, μ) and let ϕ : R^N → R_∞ be proper, convex, and lower
semicontinuous with known resolvent (id + σ∂ϕ)^{-1}. Moreover, assume ϕ ≥ 0
and ϕ(0) = 0 if Ω has infinite measure, and ϕ bounded from below otherwise
(cf. Examples 6.23 and 6.29).
The minimization problem (6.83) corresponding to the resolvent of the
subgradient of

F(u) = ∫_Ω ϕ(u(x)) dx
reads

min_{v ∈ L^2(Ω,R^N)}  ∫_Ω (|v(x) − u(x)|^2/2 + σ ϕ(v(x))) dx.

In the special case of finite sums, we have an even more general result: for a given
family ϕ_1, . . . , ϕ_M : R^N → R_∞ of proper, convex, and lower semicontinuous
functionals one gets for u : {1, . . . , M} → R^N,

F(u) = Σ_{j=1}^{M} ϕ_j(u_j)  ⇒  ((id + σ∂F)^{-1}(u))_j = (id + σ∂ϕ_j)^{-1}(u_j).
Applying the rules of the calculus of resolvents and using the previous examples,
we obtain a fairly large class of functionals for which we can evaluate the resolvent
by elementary means and which can be used for practical numerical methods. One
important case is the pth power of the Lp norms.
Example 6.138 (Resolvent of ∂(1/p)‖·‖_p^p) For some measure space (Ω, F, μ) we
consider X = L^2(Ω, R^N) and the functionals

F(u) = (1/p) ∫_Ω |u(x)|^p dx

norm |·|, and by item 2 in Example 6.137 we have only to determine the resolvent
of ∂ϕ_p with

ϕ_p(t) = t^p/p if t ≥ 0, 0 otherwise,   for p < ∞,   ϕ_∞(t) = I_{]−∞,1]}(t)   for p = ∞,
as a function on R, and moreover only for positive arguments. Let us discuss
this for fixed t ≥ 0 and varying p. In the case p = 1 we can reformulate s =
(id + σ∂ϕ)^{-1}(t) by definition as

t ∈ [0, σ]  if s = 0,   t ∈ {s + σ}  otherwise,

in other words, s = max(0, t − σ) (soft thresholding).
For 1 < p < ∞ we have to solve

t = s + σ s^{p−1}

for s. For p = 2 this is the same as s = t/(1 + σ), and in all other cases this is
a nonlinear equation, but for t = 0 we have that s = 0 is always a solution. If
p > 2 is an integer, the problem is equivalent to finding a root of a polynomial of
degree p − 1. As is known, this problem can be solved in closed form using roots in
the cases p = 3, 4, 5 (with the quadratic formula, Cardano's method, and Ferrari's
method, respectively). By the substitution s^{p−1} = ξ we can extend this method to
the range p ∈ ]1, 2[; in this way, we can also treat the cases p = 3/2, 4/3, 5/4 exactly.
For all other p ∈ ]1, ∞[ we resort to a numerical method. For t > 0 we can employ
Newton's method, for example, and get the iteration

s_{n+1} = s_n + (t − s_n − σ s_n^{p−1}) / (1 + σ(p − 1) s_n^{p−2}).
The iterates are decreasing and converge, giving high precision after a few iterations,
if one initializes the iteration with

s_0 ≥ t   and   s_0 < (t/(σ(2 − p)))^{1/(p−1)}  if p < 2;

see Exercise 6.41. The only case remaining is p = ∞, but this is treated already in
item 4 of Example 6.137 and gives s = min(1, t).
• The case p = ∞

F(u) = I_{{‖v‖_∞ ≤ 1}}(u)  ⇒  (id + σ∂F)^{-1}(u) = min(1, |u|) u/|u| = u/max(1, |u|).

Hence, the resolvent is the pointwise projection onto the closed unit ball in R^N.
• The case 1 < p < ∞
Newton's method amounts to the following procedure:
1. Set v = |u| and choose some v^0 ∈ L^2(Ω) with

v^0(x) ≥ v(x)   almost everywhere in Ω,
v^0(x) < (v(x)/(σ(2 − p)))^{1/(p−1)}   almost everywhere in {v(x) ≠ 0}  if p < 2.
2. Iterate

v^{n+1} = v^n + (v − v^n − σ|v^n|^{p−1}) / (1 + σ(p − 1)|v^n|^{p−2}).

Then, with v^* denoting the pointwise limit of this iteration,

F = (1/p)‖·‖_p^p  ⇒  (id + σ∂F)^{-1}(u) = v^* u/|u|.
In many applications, however, F_2 may be continuous but not continuously differentiable. Hence, we aim for
a method that can treat more general functionals F_2.
At this point, Fenchel duality comes in handy. We assume that F_2 coincides with its Fenchel
biconjugate with respect to a Hilbert space Y, i.e., F_2 = F_2^{**} in Y. In the following
we identify the spaces Y = Y^* also for conjugation, i.e.,

for w ∈ Y:  F_2^*(w) = sup_{v∈Y} (w, v)_Y − F_2(v),   for v ∈ Y:  F_2(v) = sup_{w∈Y} (v, w)_Y − F_2^*(w),

and similarly for conjugation in X.
In the following we also need that the conclusion of Fenchel-Rockafellar duality
(see Theorem 6.68 for sufficient conditions) holds:
Recall that (u∗ , w∗ ) ∈ dom F1 × dom F2∗ is a saddle point of L if and only if for all
(u, w) ∈ dom F1 × dom F2∗ ,
For the Lagrange functional we define for every pair (u^0, w^0) ∈ dom F_1 × dom F_2^*
the restrictions L_{w^0} : X → R_∞, L_{u^0} : Y → R ∪ {−∞} by

L_{w^0}(u) = L(u, w^0) if u ∈ dom F_1, ∞ otherwise;   L_{u^0}(w) = L(u^0, w) if w ∈ dom F_2^*, −∞ otherwise.
It is simple to see, using the notions of Definition 6.60 and the result of Lemma 6.57,
that L_{w^0} ∈ Γ_0(X) and −L_{u^0} ∈ Γ_0(Y). Hence by Lemma 6.134 the resolvents
(id + σ∂L_{w^0})^{-1} and (id + τ∂(−L_{u^0}))^{-1} exist, and with the help of Lemma 6.136
Thus, the property of (u∗ , w∗ ) ∈ dom F1 × dom F2∗ being a saddle point of L is
equivalent to
The optimality conditions from Theorem 6.43 allow, for arbitrary σ, τ > 0, the
following equivalent formulations:

(u^*, w^*) saddle point  ⇔  0 ∈ ∂L_{w^*}(u^*),  0 ∈ ∂(−L_{u^*})(w^*)
⇔  u^* ∈ u^* + σ∂L_{w^*}(u^*),   w^* ∈ w^* + τ∂(−L_{u^*})(w^*)
⇔  u^* = (id + σ∂L_{w̄^*})^{-1}(u^*),   w^* = (id + τ∂(−L_{ū^*}))^{-1}(w^*),   ū^* = u^*,  w̄^* = w^*.
The pair (ū^*, w̄^*) ∈ X × Y has been introduced artificially and will denote the
points at which we take the resolvents of ∂L_{w̄^*} and ∂(−L_{ū^*}). Hence, we have again
a formulation of optimality in terms of a fixed point equation: the saddle points
(u^*, w^*) ∈ dom F_1 × dom F_2^* are exactly the elements of X × Y that satisfy the
coupled fixed point equations

u^* = (id + σ∂F_1)^{-1}(u^* − σA^*w̄^*),
w^* = (id + τ∂F_2^*)^{-1}(w^* + τAū^*),   (6.88)
ū^* = u^*,   w̄^* = w^*.
We obtain a numerical method by fixed point iteration: to realize the right hand side,
we need only the resolvents of ∂F1 and ∂F2∗ and the application of A and its adjoint
A∗ . Since the resolvents are nonexpansive (cf. Lemma 6.134) and A and A∗ are
continuous, the iteration is even Lipschitz continuous. We recall that no assumptions
on differentiability of F2 have been made.
From Eq. (6.88) we can derive a number of numerical methods, and we give a
few (see also [7]).
In practice one can save some memory, since we can overwrite un directly with
un+1 . Moreover, we note that one can interchange the roles of u and w, of course.
• Modified Arrow-Hurwicz method/Extra gradient method
The idea of the modified Arrow-Hurwicz method is not to use ū^n = u^n and
w̄^n = w^n but to carry these on and update them with explicit Arrow-Hurwicz
steps [113]:

u^{n+1} = (id + σ∂F_1)^{-1}(u^n − σA^*w̄^n),
w^{n+1} = (id + τ∂F_2^*)^{-1}(w^n + τAū^n),
ū^{n+1} = (id + σ∂F_1)^{-1}(u^{n+1} − σA^*w̄^n),
w̄^{n+1} = (id + τ∂F_2^*)^{-1}(w^{n+1} + τAū^n).
The earlier extra gradient method, proposed in [89], uses a similar idea, but
differs in that in the calculations of ūn+1 and w̄n+1 we evaluate the operators
A∗ and A at wn+1 and un+1 , respectively.
• Arrow-Hurwicz method with linear primal extra gradient/Chambolle-Pock
method
A method that seems to work well for imaging problems, proposed in [34], is
based on the semi-implicit Arrow-Hurwicz method with a primal "extra gradient
sequence" (ū^n). Instead of an Arrow-Hurwicz step, one uses a well-chosen linear
combination (based on ū^* = 2u^* − u^*):

w^{n+1} = (id + τ∂F_2^*)^{-1}(w^n + τAū^n),
u^{n+1} = (id + σ∂F_1)^{-1}(u^n − σA^*w^{n+1}),   (6.89)
ū^{n+1} = 2u^{n+1} − u^n.
Note that only the primal variable uses an “extra gradient,” and that the order in
which we update the primal and dual variable is important, i.e., we have to update
the dual variable first. Of course we can swap the roles of the primal and dual
variables, and in this case we could speak of a dual extra gradient. The memory
requirements are comparable to those in the explicit Arrow-Hurwicz method.
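A compact sketch of iteration (6.89) for the discrete L^2-TV denoising model min_u 1/(2λ)‖u − u^0‖^2 + TV(u) is given below (ours; the forward-difference gradient A, the parameter values, and the test image are illustrative assumptions). The dual resolvent is the pointwise projection onto the unit ball, and the step sizes satisfy στ‖A‖^2 ≤ 1:

```python
import numpy as np

def grad(u):
    """Discrete gradient A: forward differences with Neumann boundary."""
    g = np.zeros(u.shape + (2,))
    g[:, :-1, 0] = u[:, 1:] - u[:, :-1]
    g[:-1, :, 1] = u[1:, :] - u[:-1, :]
    return g

def div(w):
    """Discrete divergence, the negative adjoint of grad (-A^*)."""
    d = np.zeros(w.shape[:2])
    d[:, :-1] += w[:, :-1, 0]; d[:, 1:] -= w[:, :-1, 0]
    d[:-1, :] += w[:-1, :, 1]; d[1:, :] -= w[:-1, :, 1]
    return d

def tv_denoise_pd(u0, lam, iters=300):
    """Primal-dual iteration (6.89) for min_u 1/(2*lam)||u - u0||^2 + TV(u)."""
    sigma = tau = 0.99 / np.sqrt(8.0)     # sigma*tau*||A||^2 < 1 since ||A||^2 <= 8
    u = u0.copy(); ubar = u0.copy()
    w = np.zeros(u0.shape + (2,))
    for _ in range(iters):
        # dual step: resolvent of tau*dF2* = pointwise projection onto the unit ball
        w = w + tau * grad(ubar)
        w = w / np.maximum(1.0, np.sqrt((w**2).sum(axis=-1, keepdims=True)))
        # primal step: resolvent of sigma*dF1 for F1 = 1/(2*lam)||. - u0||^2
        u_new = (u + sigma * div(w) + (sigma / lam) * u0) / (1.0 + sigma / lam)
        # linear primal extra gradient
        ubar = 2.0 * u_new - u
        u = u_new
    return u

rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
print(np.abs(tv_denoise_pd(noisy, lam=0.15) - clean).mean())
```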
For all the above methods there are conditions on the step sizes σ and τ
that guarantee convergence in some sense. In comparison to the classical Arrow-
Hurwicz method, the modified Arrow-Hurwicz method and the extra gradient
method need weaker conditions to ensure convergence, and hence they are of
practical interest, despite the slightly higher memory requirements. For details we
refer to the original papers and treat only the case of convergence of the Arrow-
Hurwicz method with linear primal extra gradient. We follow the presentation
of [34] and first derive an estimate for one Arrow-Hurwicz step with fixed (ū, w̄).
Lemma 6.139 Let X, Y be real Hilbert spaces, A ∈ L(X, Y), F_1 ∈ Γ_0(X), F_2^* ∈
Γ_0(Y), and σ, τ > 0. Moreover, let Z = X × Y, and for elements z = (u, w),
z̄ = (ū, w̄) in X × Y, let
If z̄, z^n, z^{n+1} ∈ Z with z̄ = (ū, w̄), z^n = (u^n, w^n), and z^{n+1} = (u^{n+1}, w^{n+1}), and
the equations

u^{n+1} = (id + σ∂F_1)^{-1}(u^n − σA^*w̄)
(6.90)
w^{n+1} = (id + τ∂F_2^*)^{-1}(w^n + τAū)
are satisfied, then for every z = (u, w), u ∈ dom F_1, w ∈ dom F_2^*, one has the
estimate
which we rearrange to
Adding the right-hand sides, using the definition of L, and inserting ±(w^{n+1}, Au^{n+1})_Y, we obtain

F_1(u) − F_2^*(w^{n+1}) + F_2^*(w) − F_1(u^{n+1}) + (w̄, A(u − u^{n+1}))_Y − (w − w^{n+1}, Aū)_Y
= L(u, w^{n+1}) − L(u^{n+1}, w) + (w̄, A(u − u^{n+1}))_Y − (w^{n+1}, Au)_Y + (w^{n+1}, Au^{n+1})_Y

Adding the respective left-hand sides, we obtain the scalar product in Z, which we
reformulate as
Moving (1/2)‖z − z^{n+1}‖_Z^2 − (1/2)‖z − z^n‖_Z^2 to the right-hand side and all terms in (6.92)
to the left-hand side, we obtain the desired inequality. ∎
Next we remark that the iteration (6.89) can be represented by (6.90): the choice
ū = 2un − un−1 and w̄ = wn+1 with u−1 = u0 is obviously the correct one,
given that we initialize with ū0 = u0 , which we assume in the following. Hence, we
can use the estimate (6.91) to analyze the convergence. To that end, we analyze the
scalar products in the estimate and aim to estimate them from below. Ultimately we
would like to estimate in a way such that we can combine them with the norms in
the expression.
Lemma 6.140 In the situation of Lemma 6.139, let (z^n) be a sequence in Z = X × Y with components z^n = (u^n, w^n) and u^{−1} = u⁰. If στ‖A‖² ≤ 1, then for all w ∈ Y and M, N ∈ N with M ≤ N, one has
    Σ_{n=M}^{N−1} −( w^{n+1} − w, A(u^{n+1} − 2u^n + u^{n−1}) )_Y ≤ δ_M(w) + στ‖A‖² ‖w^N − w‖²_Y/(2τ)

and

    Σ_{n=M}^{N−1} −( w^{n+1} − w, A(u^{n+1} − 2u^n + u^{n−1}) )_Y
        ≤ δ_M(w) − δ_N(w) + √(στ)‖A‖ Σ_{n=M}^{N−1} ( ‖u^n − u^{n−1}‖²_X/(2σ) + ‖w^{n+1} − w^n‖²_Y/(2τ) ).
We estimate the value −δN (w) also with the Cauchy-Schwarz inequality, the
operator norm, and Young’s inequality, this time with λ = σ to get
    Σ_{n=0}^{N−1} ‖z^{n+1} − z^n‖²_Z/2 − στ‖A‖² ‖w^N − w‖²_Y/(2τ) − √(στ)‖A‖ Σ_{n=0}^{N−2} ‖z^{n+1} − z^n‖²_Z/2

    (1 − √(στ)‖A‖) Σ_{n=0}^{N−2} ‖z^{n+1} − z^n‖²_Z/2 + (1 − στ‖A‖²) ‖z − z^N‖²_Z/2
        + Σ_{n=0}^{N−1} ( L(u^{n+1}, w) − L(u, w^{n+1}) ) ≤ ‖z − z⁰‖²_Z/2.
The first estimate implies lim_{n→∞} ‖z^{n+1} − z^n‖²_Z = 0. Consequently, the sequence (z^n) is bounded, and by the finite dimensionality of Z there exists a convergent subsequence (z^{n_k}) with lim_{k→∞} z^{n_k} = z^* for some z^* = (u^*, w^*) ∈ Z. Moreover, we have convergence of the neighboring subsequences lim_{k→∞} z^{n_k+1} = lim_{k→∞} z^{n_k−1} = z^*, and also, by continuity of A, A*, (id + σ∂F₁)^{−1} and (id + τ∂F₂*)^{−1} (see Lemma 6.134), we conclude that
    w^* = lim_{k→∞} (id + τ∂F₂*)^{−1}(w^{n_k} + τA ū^{n_k}) = (id + τ∂F₂*)^{−1}(w^* + τA u^*),
    u^* = lim_{k→∞} (id + σ∂F₁)^{−1}(u^{n_k} − σA* w^{n_k+1}) = (id + σ∂F₁)^{−1}(u^* − σA* w^*).
Thus, the pair (u∗ , w∗ ) satisfies Eq. (6.88), and this shows that it is indeed a saddle
point of L.
It remains to show that the whole sequence (zn ) converges to z∗ . To that end fix
k ∈ N and N ≥ nk + 1. Summing (6.94) from M = nk to N − 1 and repeating the
above steps, we get
    (1 − √(στ)‖A‖) Σ_{n=n_k}^{N−2} ‖z^{n+1} − z^n‖²_Z/2 + (1 − στ‖A‖²) ‖z − z^N‖²_Z/2
        + Σ_{n=n_k}^{N−1} ( L(u^{n+1}, w) − L(u, w^{n+1}) ) ≤ δ_{n_k}(w) + ‖u^{n_k} − u^{n_k−1}‖²_X/(2σ) + ‖z − z^{n_k}‖²_Z/2.
Plugging in z^*, using that στ‖A‖² < 1 and that (u^*, w^*) is a saddle point, we arrive at
Obviously, lim_{k→∞} δ_{n_k}(w^*) = (w^* − w^*, A(u^* − u^*))_Y = 0, and hence the right-hand side converges to 0 for k → ∞. In particular, for every ε > 0 there exists k such that the right-hand side is smaller than ε². This means that for all N ≥ n_k,
    ‖z^* − z^N‖²_Z ≤ ε²,
    inf_{u∈X} L(u, w^n) ≤ L(ũ, w^n) ≤ L(ũ, w̃) ≤ L(u^n, w̃) ≤ sup_{w∈Y} L(u^n, w).
The infimum on the left-hand side is exactly the dual objective value
−F1∗ (−A∗ wn ) − F2∗ (wn ), while the supremum on the right-hand side is the primal
objective value F1 (un ) + F2 (Aun ). The difference is always nonnegative and
vanishes exactly at the saddle points of L. Hence, one defines the duality gap
G : X × Y → R∞ as follows:
This also shows that G is proper, convex, and lower semicontinuous. In particular,
    G(u^n, w^n) ≥ F₁(u^n) + F₂(Au^n) − min_{u∈X} ( F₁(u) + F₂(Au) ),
    G(u^n, w^n) ≥ max_{w∈Y} ( −F₁*(−A*w) − F₂*(w) ) − ( −F₁*(−A*w^n) − F₂*(w^n) ).        (6.96)
Hence, a small duality gap implies that the differences between the functional values of u^n and w^n and the respective optima of the primal and dual problems are also small. The condition G(u^n, w^n) < ε for some given tolerance ε > 0 is therefore a suitable criterion to terminate the iteration.
If (u^n, w^n) converges to a saddle point (u^*, w^*), then the lower semicontinuity of G gives only 0 ≤ lim inf_{n→∞} G(u^n, w^n), i.e., the duality gap does not necessarily
converge to 0. However, this may be the case, and a necessary condition is the
continuity of F1 , F2 and their Fenchel conjugates. In these cases, G gives a stopping
criterion that guarantees for the primal-dual method (6.89) (given its convergence)
the optimality of the primal and dual objective values up to a given tolerance.
Now we want to apply a discrete version of the method we derived in the previous
subsection to solve the variational problems numerically. To that end, we have
to discretize the respective minimization problems and check whether Fenchel-
Rockafellar duality holds, in order to guarantee the existence of a saddle point
of the Lagrange functional. If we succeed with this, we have to identify how to
implement the steps of the primal-dual method (6.89) and then, by Theorem 6.141,
we have a convergent numerical method to solve our problems. Let us start with the
discretization of the functionals
As in Sect. 5.4 we assume that we have rectangular discrete images, i.e., N × M
matrices (N, M ≥ 1). The discrete indices (i, j ) always satisfy 1 ≤ i ≤ N and
1 ≤ j ≤ M. For a fixed “pixel size” h > 0, the (i, j )th entry corresponds to the
function value at (ih, j h). The associated space is denoted by RN×M and equipped
with the scalar product
    (u, v) = h² Σ_{i=1}^{N} Σ_{j=1}^{M} u_{i,j} v_{i,j}.
To form the discrete gradient we also need images with multidimensional values,
and hence we denote by RN×M×K the space of images with K-dimensional values.
For u, v ∈ RN×M×K we define a pointwise and a global scalar product by
    u_{i,j} · v_{i,j} = Σ_{k=1}^{K} u_{i,j,k} v_{i,j,k},        (u, v) = h² Σ_{i=1}^{N} Σ_{j=1}^{M} u_{i,j} · v_{i,j},
respectively. These definitions give a pointwise absolute value |u_{i,j}| = √(u_{i,j} · u_{i,j}) and a norm ‖u‖ = √((u, u)).
With this notation, we can write the Lp norms simply as “pointwise summation”:
For u ∈ RN×M×K and p ∈ [1, ∞[ we have
    ‖u‖_p = ( h² Σ_{i=1}^{N} Σ_{j=1}^{M} |u_{i,j}|^p )^{1/p},        ‖u‖_∞ = max_{i=1,…,N} max_{j=1,…,M} |u_{i,j}|.
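As a small illustration of these conventions (our own sketch, not from the text), an image with K-dimensional values can be stored as an array of shape (N, M, K); the weighted scalar product and the norms then read:

import numpy as np

def inner(u, v, h):
    # global scalar product (u, v) = h^2 * sum_{i,j} u_{i,j} . v_{i,j}
    return h**2 * np.sum(u * v)

def pointwise_abs(u):
    # |u_{i,j}| = sqrt(u_{i,j} . u_{i,j}) for an array of shape (N, M, K)
    return np.sqrt(np.sum(u**2, axis=-1))

def norm_p(u, p, h):
    # ||u||_p = (h^2 * sum |u_{i,j}|^p)^(1/p); ||u||_inf = max |u_{i,j}|
    a = pointwise_abs(u)
    if np.isinf(p):
        return a.max()
    return (h**2 * np.sum(a**p))**(1.0 / p)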
Lemma 6.142 (Nullspace, Adjoint, and Norm Estimate for ∇h ) The linear map
∇h : RN×M → RN×M×2 has the following properties:
1. The nullspace is ker(∇h ) = span(1) with the constant vector 1 ∈ RN×M ,
2. the adjoint is ∇h∗ = − divh , and
3. the norm satisfies ‖∇_h‖² < 8/h².
    (∇_h u, v) = h Σ_{i=1}^{N−1} Σ_{j=1}^{M} (u_{i+1,j} − u_{i,j}) v_{i,j,1} + h Σ_{i=1}^{N} Σ_{j=1}^{M−1} (u_{i,j+1} − u_{i,j}) v_{i,j,2}
    = h Σ_{j=1}^{M} ( Σ_{i=2}^{N} u_{i,j} v_{i−1,j,1} − Σ_{i=1}^{N−1} u_{i,j} v_{i,j,1} )
      + h Σ_{i=1}^{N} ( Σ_{j=2}^{M} u_{i,j} v_{i,j−1,2} − Σ_{j=1}^{M−1} u_{i,j} v_{i,j,2} )
    = h Σ_{j=1}^{M} ( −v_{1,j,1} u_{1,j} + Σ_{i=2}^{N−1} (v_{i−1,j,1} − v_{i,j,1}) u_{i,j} + v_{N−1,j,1} u_{N,j} )
      + h Σ_{i=1}^{N} ( −v_{i,1,2} u_{i,1} + Σ_{j=2}^{M−1} (v_{i,j−1,2} − v_{i,j,2}) u_{i,j} + v_{i,M−1,2} u_{i,M} )
    = h² Σ_{i=1}^{N} Σ_{j=1}^{M} ( −(∂₁⁻ v¹)_{i,j} − (∂₂⁻ v²)_{i,j} ) u_{i,j} = (u, −div_h v).
    −h² Σ_{i=1}^{N−1} Σ_{j=1}^{M} u_{i+1,j} u_{i,j} < 1,        −h² Σ_{i=1}^{N} Σ_{j=1}^{M−1} u_{i,j+1} u_{i,j} < 1.
Assume that the first inequality is not satisfied. Let vi,j = −ui+1,j if i < N and
v_{N,j} = 0. By the Cauchy-Schwarz inequality we obtain (v, u) ≤ ‖v‖ ‖u‖ ≤ 1, i.e., the scalar product has to be equal to 1. This means that the Cauchy-Schwarz
inequality is tight and hence u = v. In particular, we have uN,j = 0, and recursively
we get for i < N that
    ‖∇_h u‖² = Σ_{i=1}^{N−1} Σ_{j=1}^{M} ( u²_{i+1,j} + u²_{i,j} − 2u_{i+1,j}u_{i,j} ) + Σ_{i=1}^{N} Σ_{j=1}^{M−1} ( u²_{i,j+1} + u²_{i,j} − 2u_{i,j+1}u_{i,j} )
    ≤ 4 Σ_{i=1}^{N} Σ_{j=1}^{M} u²_{i,j} − 2 Σ_{i=1}^{N−1} Σ_{j=1}^{M} u_{i+1,j}u_{i,j} − 2 Σ_{i=1}^{N} Σ_{j=1}^{M−1} u_{i,j+1}u_{i,j}
    < 4/h² + 2/h² + 2/h² = 8/h².
Since the set {‖u‖ = 1} is compact, there exists some u^* with ‖u^*‖ = 1 at which the value of the operator norm is attained, showing that ‖∇_h‖² = ‖∇_h u^*‖² < 8/h². ⊓⊔
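The following NumPy sketch (our own; the names grad_h and div_h are ours) realizes one possible discretization of ∇_h and of div_h = −∇_h* that matches the forward differences and the summation by parts used above.

import numpy as np

def grad_h(u, h):
    # forward differences; zero (homogeneous Neumann) in the last row/column
    g = np.zeros(u.shape + (2,))
    g[:-1, :, 0] = (u[1:, :] - u[:-1, :]) / h
    g[:, :-1, 1] = (u[:, 1:] - u[:, :-1]) / h
    return g

def div_h(v, h):
    # negative adjoint of grad_h with respect to the h^2-weighted scalar products
    d = np.zeros(v.shape[:2])
    d[0, :]    += v[0, :, 0] / h
    d[1:-1, :] += (v[1:-1, :, 0] - v[:-2, :, 0]) / h
    d[-1, :]   += -v[-2, :, 0] / h
    d[:, 0]    += v[:, 0, 1] / h
    d[:, 1:-1] += (v[:, 1:-1, 1] - v[:, :-2, 1]) / h
    d[:, -1]   += -v[:, -2, 1] / h
    return d

The adjoint relation can be checked numerically: for random u and v one should find h²·np.sum(grad_h(u, h) * v) ≈ −h²·np.sum(u * div_h(v, h)).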
The Sobolev semi-norm of u is discretized as ‖∇_h u‖_p, and the total variation corresponds to the case p = 1, i.e., TV_h(u) = ‖∇_h u‖_1. This allows one to discretize the applications with Sobolev penalty from Sect. 6.3.2 and their counterparts with total variation from Sect. 6.3.3. If Fenchel-Rockafellar duality (6.86) holds, method (6.89) yields a saddle point of the Lagrange functional, and the primal component is a solution of the original problem. With a little intuition and some knowledge about the technical feasibility of resolvent maps, it is possible to derive practical algorithms for a large number of convex minimization problems in imaging. To get an impression of how this is done in concrete cases, we discuss the applications from Sects. 6.3.2 and 6.3.3 in detail.
Example 6.143 (Primal-Dual Method for Variational Denoising) For 1 ≤ p < ∞,
1 < q < ∞, X = RN×M , and a discrete, noisy image U 0 ∈ RN×M and λ > 0 the
discrete denoising problem reads
    min_{u∈X}  ‖u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.
and note that the assumptions for Fenchel-Rockafellar duality from Theorem 6.68
are satisfied, and hence the associated Lagrange functional has a saddle point. To
apply method (6.89) we need the resolvents of ∂F1 and ∂F2∗ . Note that Lemma 6.65
and Example 6.64 applied to F2∗ lead to
    F₂ = (λ/p)‖·‖_p^p   ⇒   F₂* = λ( (1/p*)‖·‖_{p*}^{p*} ) ∘ (λ^{−1} id) = (λ^{−p*/p}/p*)‖·‖_{p*}^{p*}   if p > 1,
                            F₂* = λ I_{{‖v‖_∞ ≤ 1}} ∘ (λ^{−1} id) = I_{{‖v‖_∞ ≤ λ}}                        if p = 1.
Now let σ, τ > 0. For the resolvent (id +σ ∂F1 )−1 we have by Lemma 6.136 and
Example 6.138,
    ( (id + σ∂F₁)^{−1}(u) )_{i,j} = U⁰_{i,j} + sgn(u_{i,j} − U⁰_{i,j}) (id + σ|·|^{q−1})^{−1}( |u_{i,j} − U⁰_{i,j}| ).
Table 6.1 Primal-dual method for the numerical solution of the discrete variational denoising problem

Primal-dual method for the solution of the variational denoising problem

    min_{u∈R^{N×M}}  ‖u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.

1. Initialize
   Let n = 0, ū⁰ = u⁰ = U⁰, w⁰ = 0. Choose σ, τ > 0 with στ ≤ h²/8.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τλ^{−p*/p}|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|/λ)   if p = 1,
   for 1 ≤ i ≤ N, 1 ≤ j ≤ M.
3. Primal step
   ũ^{n+1} = u^n + σ div_h w^{n+1},
   u^{n+1}_{i,j} = U⁰_{i,j} + sgn(ũ^{n+1}_{i,j} − U⁰_{i,j}) (id + σ|·|^{q−1})^{−1}( |ũ^{n+1}_{i,j} − U⁰_{i,j}| ),
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iterate
   Update n ← n + 1 and continue with Step 2.
Finally, we note that ∇_h* = −div_h holds and that the step size restriction στ‖∇_h‖² < 1 is satisfied for στ ≤ h²/8; see Lemma 6.142. Now we have all ingredients to fully describe the numerical method for variational denoising; see Table 6.1. By Theorem 6.141, this method yields a convergent sequence ((u^n, w^n)).
Let us briefly discuss a stopping criterion based on the duality gap (6.95). By
Lemma 6.65 and Example 6.64, we have
    F₁ = (1/q)‖·‖_q^q ∘ T_{−U⁰}   ⇒   F₁* = (1/q*)‖·‖_{q*}^{q*} + (U⁰, ·).
    G(u^n, w^n) = ‖u^n − U⁰‖_q^q/q + ‖div_h w^n‖_{q*}^{q*}/q* + (U⁰, div_h w^n) + (λ/p)‖∇_h u^n‖_p^p + (λ^{−p*/p}/p*)‖w^n‖_{p*}^{p*}.

For p = 1 and the iterates w^n of the method (which satisfy ‖w^n‖_∞ ≤ λ by construction), the last two terms reduce to λ TV_h(u^n), so that

    G(u^n, w^n) = ‖u^n − U⁰‖_q^q/q + ‖div_h w^n‖_{q*}^{q*}/q* + (U⁰, div_h w^n) + λ TV_h(u^n),
and the convergence G(u^n, w^n) → 0 holds as well. Thus, if we stop the iteration as soon as G(u^n, w^n) < ε (which has to happen at some point), we arrive at some u^n whose primal objective value approximates the optimal one up to a tolerance of ε.
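For the frequently used case q = 2, p = 1 (total variation denoising), the resolvents in Table 6.1 have simple closed forms, and the duality gap above can be evaluated in every iteration. The following sketch is our own illustration under these assumptions; it reuses the hypothetical helpers grad_h and div_h from the sketch after Lemma 6.142, and the step sizes are chosen such that στ ≤ h²/8.

import numpy as np

def tv_denoise_pd(U0, lam, h=1.0, eps=1e-4, max_iter=500):
    """Sketch of Table 6.1 for q = 2, p = 1, with the duality gap as stopping criterion."""
    sigma = tau = np.sqrt(h**2 / 8.0)          # so that sigma*tau = h^2/8
    u = U0.copy(); u_bar = U0.copy()
    w = np.zeros(U0.shape + (2,))
    for n in range(max_iter):
        # dual step: ascent in w, then projection onto {|w_{i,j}| <= lam}
        w = w + tau * grad_h(u_bar, h)
        mag = np.sqrt(np.sum(w**2, axis=-1))
        w /= np.maximum(1.0, mag / lam)[..., None]
        # primal step: resolvent of sigma*dF1 for F1 = 0.5*||. - U0||^2, then extrapolation
        u_new = (u + sigma * div_h(w, h) + sigma * U0) / (1.0 + sigma)
        u_bar = 2.0 * u_new - u
        u = u_new
        # duality gap for q = 2, p = 1 (formula from the text, with h^2-weighted norms)
        dw = div_h(w, h)
        tv = h**2 * np.sum(np.sqrt(np.sum(grad_h(u, h)**2, axis=-1)))
        gap = (0.5 * h**2 * np.sum((u - U0)**2) + 0.5 * h**2 * np.sum(dw**2)
               + h**2 * np.sum(U0 * dw) + lam * tv)
        if gap < eps:
            break
    return u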
Example 6.144 (Primal-Dual Method for Tikhonov Functionals/Deconvolution)
Now we consider the situation with a discretized linear operator A_h. Let again 1 ≤ p < ∞, 1 ≤ q < ∞ (this time we also allow q = 1) and X = R^{N1×M1}; moreover, let A_h ∈ L(X, Y) with Y = R^{N2×M2} be a forward operator that does not map constant images to zero, and let U⁰ ∈ Y be noisy measurements. The problem is the minimization of the Tikhonov functional

    min_{u∈X}  ‖A_h u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.
Noting that the nullspace of ∇_h consists of the constant images only (see Lemma 6.142), we can argue similarly to Theorems 6.86 and 6.115 and Example 6.143 that minimizers exist. Application 6.97 and Example 6.127 motivate the choice of A_h as a discrete convolution operator as in Sect. 3.3.3, i.e., A_h u = u ∗ k_h with a discretized convolution kernel k_h with nonnegative entries that sum to one. The norm is, in this case, estimated by ‖A_h‖ ≤ 1, and the adjoint A_h* amounts to a convolution with the reflected kernel k̄_h = D_{−id} k_h. In the following, the concrete operator will not be of importance, and we will discuss only the general situation.
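As a small illustration (our own sketch, not from the text; SciPy and an odd-sized kernel are assumed, and we use the simpler variant in which image and data have the same size and the boundary is extended by zero), such a convolution operator and its adjoint can be realized as follows:

import numpy as np
from scipy.ndimage import convolve, correlate

def A_h(u, kernel):
    # discrete convolution A_h u = u * k_h (zero boundary extension)
    return convolve(u, kernel, mode='constant', cval=0.0)

def A_h_adj(v, kernel):
    # the adjoint is a convolution with the reflected kernel k_h(-.),
    # i.e. a correlation with k_h (exact for odd-sized, centered kernels)
    return correlate(v, kernel, mode='constant', cval=0.0)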
Since the minimization problem features both Ah and ∇h , we dualize with Z =
RN×M×2 , but in the following way:
    F₁(u) = 0 for u ∈ X,    F₂(v, w) = (1/q)‖v − U⁰‖_q^q + (λ/p)‖w‖_p^p for (v, w) ∈ Y × Z,    A u = (A_h u, ∇_h u).
Again, the assumptions in Theorem 6.68 are satisfied, and hence we only need to find saddle points of the Lagrange functional. We dualize F₂, which, similarly to Example 6.143, leads to
    F₂*(v̄, w̄) = { (1/q*)‖v̄‖_{q*}^{q*} + (U⁰, v̄)   if q > 1,        { (λ^{−p*/p}/p*)‖w̄‖_{p*}^{p*}   if p > 1,
                { I_{{‖v̄‖_∞ ≤ 1}}(v̄) + (U⁰, v̄)     if q = 1,    +   { I_{{‖w̄‖_∞ ≤ λ}}(w̄)            if p = 1.
By Lemma 6.136, item 5, we can apply the resolvent of ∂F₂* for the step size τ by applying the resolvents for v̄ and w̄ componentwise. We calculated both already in Example 6.143; we only need to translate the variable v̄ by −τU⁰. The resolvent for ∂F₁ is trivial: (id + σ∂F₁)^{−1} = id.
Let us note how στ can be chosen: we get an estimate of ‖A‖² by

    ‖Au‖² = ‖A_h u‖² + ‖∇_h u‖² ≤ (‖A_h‖² + ‖∇_h‖²)‖u‖² < (‖A_h‖² + 8/h²)‖u‖²,

and hence στ ≤ (‖A_h‖² + 8/h²)^{−1} is sufficient for the estimate στ‖A‖² < 1.
The whole method is described in Table 6.2. The duality gap is, in the case of
p > 1 and q > 1, as follows (note that F1∗ = I{0} ):
    G(u^n, v^n, w^n) = I_{{0}}(div_h w^n − A_h* v^n) + (1/q)‖A_h u^n − U⁰‖_q^q + (λ/p)‖∇_h u^n‖_p^p
                       + (1/q*)‖v^n‖_{q*}^{q*} + (U⁰, v^n) + (λ^{−p*/p}/p*)‖w^n‖_{p*}^{p*}.
Since the indicator functional makes G of little use as a numerical stopping criterion (it is infinite whenever div_h w^n − A_h* v^n ≠ 0), one may instead consider the modified duality gap

    G̃(u^n, v^n, w^n) = C‖div_h w^n − A_h* v^n‖ + (1/q)‖A_h u^n − U⁰‖_q^q + (λ/p)‖∇_h u^n‖_p^p
                        + (1/q*)‖v^n‖_{q*}^{q*} + (U⁰, v^n) + (λ^{−p*/p}/p*)‖w^n‖_{p*}^{p*},
Table 6.2 Primal-dual method for the minimization of the Tikhonov functional with Sobolev or total variation penalty

Primal-dual method for the minimization of Tikhonov functionals

    min_{u∈R^{N1×M1}}  ‖A_h u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p.

1. Initialization
   Let n = 0, ū⁰ = u⁰ = 0, v⁰ = 0 and w⁰ = 0. Choose σ, τ > 0 such that στ ≤ (‖A_h‖² + 8/h²)^{−1}.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τλ^{−p*/p}|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|/λ)   if p = 1,
   for 1 ≤ i ≤ N1, 1 ≤ j ≤ M1.
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iteration
   Set n ← n + 1 and continue with step 2.
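A compact sketch of Table 6.2 for q = 2 and p = 1 is given below; it is our own illustration and reuses the hypothetical helpers A_h, A_h_adj, grad_h and div_h from the previous sketches. Since F₁ = 0, the primal resolvent is the identity, and the v-resolvent reduces to a simple scaling after the translation by −τU⁰ described above; the step sizes follow στ ≤ (‖A_h‖² + 8/h²)^{−1} with ‖A_h‖ ≤ 1.

import numpy as np

def tv_deconvolve_pd(U0, kernel, lam, h=1.0, iters=300):
    """Sketch of Table 6.2 for q = 2, p = 1 (Tikhonov functional with TV penalty)."""
    st = 1.0 / (1.0 + 8.0 / h**2)              # sigma*tau <= (||A_h||^2 + 8/h^2)^{-1}
    sigma = tau = np.sqrt(st)
    u = np.zeros_like(U0); u_bar = u.copy()
    v = np.zeros_like(U0)
    w = np.zeros(U0.shape + (2,))
    for n in range(iters):
        # dual step for v (data term, q = 2): resolvent of tau*dF2,v* with F2,v* = 0.5||.||^2 + (U0, .)
        v = (v + tau * (A_h(u_bar, kernel) - U0)) / (1.0 + tau)
        # dual step for w (TV term, p = 1): projection onto {|w_{i,j}| <= lam}
        w = w + tau * grad_h(u_bar, h)
        mag = np.sqrt(np.sum(w**2, axis=-1))
        w /= np.maximum(1.0, mag / lam)[..., None]
        # primal step: F1 = 0, so the resolvent is the identity; then extrapolation
        u_new = u - sigma * (A_h_adj(v, kernel) - div_h(w, h))
        u_bar = 2.0 * u_new - u
        u = u_new
    return u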
We want to apply the primal-dual method to this problem. First note that a minimizer exists. This can be shown, as in the previous examples, using arguments similar to those for the existence of solutions in Application 6.98 and Example 6.128.
Again similarly to the continuous case, we use the sum and the chain rule for
subgradients: F2 is continuous everywhere and F1 and F2 ◦∇h satisfy the assumption
in Exercise 6.14. We obtain
and by Remark 6.72 there exists a saddle point of the respective Lagrange functionals. To apply the primal-dual method we need (id + σ∂F₁)^{−1}, and this amounts to the projection onto K = {v ∈ X : v|_{Ω_h∖Ω'_h} = U⁰} (see Example 6.137). To evaluate this we do
1 ≤ j ≤ M2 .
Similarly to Example 6.145 one sees that this problem has a solution. With Z = R^{N×M×2} we write

    F₁(u) = I_{{A_h v = U⁰}}(u) for u ∈ X,    F₂(v) = (λ/p)‖v‖_p^p for v ∈ Z,    A = ∇_h,
Table 6.3 Primal-dual method for the numerical solution of the variational inpainting problem

Primal-dual method for variational inpainting

    min_{u∈R^{N×M}}  ‖∇_h u‖_p^p/p + I_{{v|_{Ω_h∖Ω'_h} = U⁰}}(u).

1. Initialization
   Let n = 0, ū⁰ = u⁰ = U⁰, w⁰ = 0. Choose σ, τ > 0 with στ ≤ h²/8.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τ|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|)   if p = 1,
   for 1 ≤ i ≤ N, 1 ≤ j ≤ M.
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iteration
   Set n ← n + 1 and continue with step 2.
see Example 6.137. Table 6.4 shows the resulting numerical method.
Now let us consider the concrete case of the interpolation problem from
Application 6.100, i.e., the k-fold zooming of an image U 0 ∈ RN×M . Here it is
natural to restrict oneself to positive integers k. Thus, the map Ah models a k-fold
downsizing from N1 × M1 to N2 × M2 with N1 = kN, M1 = kM, N2 = N, and
M2 = M. If we choose, for example, the averaging over k × k squares, then the v^{i,j} are given by

    v^{i,j} = (1/(hk)) χ_{[(i−1)k+1, ik]×[(j−1)k+1, jk]},    1 ≤ i ≤ N, 1 ≤ j ≤ M.

The factor 1/(hk) makes these vectors orthonormal, and hence the matrix A_h A_h* is the identity. Thus, there is no need to solve a linear system, and we can set λ = μ in step 3 of the method in Table 6.4. For the perfect low-pass filter one can leverage the orthogonality of the discrete Fourier transform in a similar way.
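For illustration (our own sketch, written in the plain Euclidean setting rather than the h²-weighted one in which the 1/(hk) scaling makes A_h A_h* exactly the identity), the averaging operator, its adjoint, and the resulting trivial projection onto {A_h u = U⁰} can be realized as follows:

import numpy as np

def downsample_avg(u, k):
    # k-fold downsizing by averaging over k x k squares
    N, M = u.shape[0] // k, u.shape[1] // k
    return u.reshape(N, k, M, k).mean(axis=(1, 3))

def downsample_avg_adj(y, k):
    # adjoint with respect to the unweighted Euclidean products: replicate and scale by 1/k^2
    return np.kron(y, np.ones((k, k))) / k**2

# With this normalization A A^T = (1/k^2) * id, so projecting onto {A u = U0} needs
# no linear solve: P(u) = u + k^2 * A^T (U0 - A u).
def project_onto_data(u, U0, k):
    return u + k**2 * downsample_avg_adj(U0 - downsample_avg(u, k), k)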
Table 6.4 Primal-dual method for the numerical solution of minimization problems with Sobolev semi-norm or total variation and linear equality constraints

Primal-dual method for linear equality constraints

    min_{u∈R^{N1×M1}}  ‖∇_h u‖_p^p/p + I_{{A_h v = U⁰}}(u).

1. Initialization
   Let n = 0, ū⁰ = u⁰ = U⁰, w⁰ = 0. Choose σ, τ > 0 with στ ≤ h²/8.
2. Dual step
   w̄^{n+1} = w^n + τ∇_h ū^n,
   w^{n+1}_{i,j} = (id + τ|·|^{p*−1})^{−1}( |w̄^{n+1}_{i,j}| ) · w̄^{n+1}_{i,j}/|w̄^{n+1}_{i,j}|   if p > 1,
   w^{n+1}_{i,j} = w̄^{n+1}_{i,j} / max(1, |w̄^{n+1}_{i,j}|)   if p = 1,
   for 1 ≤ i ≤ N1, 1 ≤ j ≤ M1.
   ū^{n+1} = 2u^{n+1} − u^n.
4. Iteration
   Set n ← n + 1 and continue with step 2.
an edge set Γ ⊂ Ω and a piecewise smooth function u that minimize the following functional:

    E(u, Γ) = ∫_Ω (u − u⁰)² dx + λ ∫_{Ω∖Γ} |∇u|² dx + μ H^{d−1}(Γ),
where λ, μ > 0 are again some regularization parameters. The third term penalizes the length of the edge set and prohibits the trivial and unnatural minimizer Γ = Ω and u = u⁰. The Mumford-Shah functional is a mathematically challenging object, and both its analysis and numerical minimization have been the subject of numerous studies [5, 6, 9, 18, 33, 49, 76, 112]. One big challenge in the analysis of the minimization of E is the appropriate description of the objects Γ and of the functions u ∈ H¹(Ω∖Γ) (with varying Γ) in a suitable functional-analytic context. It turned out that the space of special functions of bounded total variation SBV(Ω) is well suited. It consists of those functions in BV(Ω) whose derivative can be written as
where (u⁺ − u⁻) denotes the jump and ν the measure-theoretic normal along the jump set. The vector space SBV(Ω) is a proper subspace of BV(Ω) (for general BV functions the gradient also contains a so-called Cantor part), and the Mumford-Shah functional is well defined on this space but is not convex. Moreover, it is possible to prove the existence of minimizers of the Mumford-Shah functional in SBV(Ω), but due to the nonconvexity, the proof is more involved than in the cases we treated in this chapter.
A simplified variational model for segmentation with piecewise constant images has been proposed by Chan and Vese in [38]. Formally, it is a limiting case of the Mumford-Shah problem with the parameters (αλ, αμ), where α → ∞, and hence it mainly contains the geometric unknown. In the simplest case of two gray values one has to minimize the functional

    F(Ω', c₁, c₂) = ∫_Ω ( χ_{Ω'} c₁ + χ_{Ω∖Ω'} c₂ − u⁰ )² dx + μ H^{d−1}(Γ),

where Γ = ∂Ω' ∩ Ω. For a fixed Ω', the optimal constants are readily calculated as c₁ = |Ω'|^{−1} ∫_{Ω'} u⁰ dx and c₂ = |Ω∖Ω'|^{−1} ∫_{Ω∖Ω'} u⁰ dx, and the difficult part is to find the edge set Γ. For this problem, so-called level-set methods are popular [105, 128]. Roughly speaking, the main idea is to represent Γ as the zero level set of a function φ : Ω → R that is positive in Ω' and negative in Ω∖Ω'. With the Heaviside function H = χ_{[0,∞[} we can write F as

    F(φ, c₁, c₂) = ∫_Ω ( (c₁ − u⁰)² (H ∘ φ) + (c₂ − u⁰)² (1 − H ∘ φ) ) dx + μ TV(H ∘ φ),
The numerical solution of this equation implicitly defines the evolution of the edge set Γ. If one updates the mean values c₁ and c₂ during the iteration, this approach results, after a few more numerical tricks, in a solution method [38, 73].
For decomposing images into different parts there are approaches more elaborate than the denoising methods we have seen in this chapter, especially with respect to texture. The model behind these methods assumes that the dominant features of an image u⁰ are given by a piecewise smooth “cartoon” part and a texture part. The texture part should contain mainly fine and repetitive structures, but no noise. If one assumes that u⁰ contains noise, one postulates
u⁰ = u_cartoon + u_texture + η.
We mention specifically the so-called G-norm model by Meyer [70, 87, 99]. In this model the texture part u_texture is described by the semi-norm that is dual to the total variation:

    ‖u‖_* = inf { ‖σ‖_∞ : σ ∈ D_{div,∞}, div σ = u }.

The G-norm has several interesting properties and is well suited to describe oscillating repetitive patterns. In this context, u_cartoon is usually modeled with Sobolev semi-norms or the total variation, which we discussed extensively in this chapter. An associated minimization problem can be, for example,

    min_{u_cartoon, u_texture}  (1/q)∫_Ω |u⁰ − u_cartoon − u_texture|^q dx + (λ/p)∫_Ω |∇u_cartoon|^p dx + μ‖u_texture‖_*.
There are also theoretical results and several numerical methods for this approach
available [10, 92, 107].
The problem of determining the optical flow can be cast as a variational problem in different ways; see, for example, [17, 24, 75]. We show the classical approach of [78]. For a given image sequence u : [0, 1] × Ω → R on a domain Ω one aims to find a velocity field v : [0, 1] × Ω → R^d for which v(t, ·) gives the directions and velocities of the objects in each image u(t, ·). The respective variational problem is then derived from the following considerations. If we traced the movement of every point x ∈ Ω in time, the path would follow a trajectory ϕ_x : [0, 1] → Ω with ϕ_x(0) = x. Now we assume that the overall brightness of the sequence u⁰ does not change, and even that the individual points do not change their brightness over time. If we plug this assumption into the trajectory, we see that u(t, ϕ_x(t)) = u(0, x) has to hold. Differentiating this equation with respect to t leads to

    ∂u/∂t(t, ϕ_x(t)) + ∇u(t, ϕ_x(t)) · ϕ_x'(t) = 0.

The derivative ϕ_x'(t) is exactly the velocity v(t, ϕ_x(t)). If we claim that the above equation is satisfied throughout [0, 1] × Ω, we obtain the optical flow constraint

    ∂u/∂t + ∇u · v = 0.        (6.99)
On the other hand, the velocity field v should have a certain smoothness in space, i.e., we expect that the objects mainly follow rigid motions and deform only slightly, e.g., by a change of the viewpoint. The idea from [78] is to enforce the smoothness by an H¹ penalty. Since the brightness will not be exactly constant, one does not enforce the optical flow constraint exactly. This leads to the following minimization problem for the optical flow at time t:

    F(v) = (1/2) ∫_Ω ( ∂u/∂t(t) + ∇u(t) · v )² dx + (λ/2) ∫_Ω |∇v|² dx.
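As a rough illustration of how (6.99) and the H¹ penalty interact numerically, the following sketch (our own, not taken from [78]) minimizes a straightforward discretization of F by gradient descent. Here ux, uy and ut are assumed to be finite-difference approximations of the spatial and temporal derivatives of the image sequence at the time t under consideration; the names and the fixed step size are ours and not tuned.

import numpy as np

def laplace_neumann(v):
    # five-point Laplacian with replicated (Neumann-type) boundary values
    p = np.pad(v, 1, mode='edge')
    return p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2] - 4.0 * v

def horn_schunck(ux, uy, ut, lam, step=0.2, iters=500):
    """Gradient descent on F(v) = 0.5*sum((ut + ux*v1 + uy*v2)^2)
       + 0.5*lam*(||grad v1||^2 + ||grad v2||^2)."""
    v1 = np.zeros_like(ux); v2 = np.zeros_like(ux)
    for _ in range(iters):
        r = ut + ux * v1 + uy * v2              # residual of the optical flow constraint
        v1 -= step * (ux * r - lam * laplace_neumann(v1))
        v2 -= step * (uy * r - lam * laplace_neumann(v2))
    return v1, v2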
There are many variants of this theme. On the one hand, one could use the whole
time interval in the optimization, and hence the minimization problem would
contain an integral over the time interval [0, 1]. On the other hand, one can consider
numerous other data and regularization terms [24].
In practice there are only discrete images. There are several approaches in the
literature that determine the optical flow in the case that only u0 = u(0) and
u1 = u(1) are known. Here one assumes again that u(t) satisfies the optical flow
constraint. If v is known, one can set the initial condition u(0) = u0 and solve
the transport equation (6.99) (e.g., by the method of characteristics; see Chap. 5) to
obtain u(1) = u1 . If this is not satisfied, one can use the discrepancy u(1) − u1
to determine how well the unknown v fits the data. This motivates the optimization
problem
    min_v  (1/2) ∫_Ω |u(1) − u¹|² dx + λ ∫_0^1 ∫_Ω ϕ( ∂v/∂t, ∇v ) dx dt    with    ∂u/∂t + ∇u · v = 0,  u(0) = u⁰;
see [17]. The penalty for v contains a regularization in space, but also the time derivative ∂v/∂t. Another possibility is to fix both endpoints u(0) and u(1), look for an interpolating image sequence u : [0, 1] × Ω → R, and measure the deviation from the optical flow constraint. In this case one optimizes over both u and v; for this approach, see [42, 43, 84]:

    min_{u,v}  (1/2) ∫_0^1 ∫_Ω ( ∂u/∂t + v · ∇u )² + λ ϕ( ∂v/∂t, ∇v ) dx dt    with    u(0) = u⁰,  u(1) = u¹.
Of course, one can use other differential operators for a similar definition, e.g., the Laplace operator:

    ‖Δu‖_M = sup { ∫_Ω u Δv dx : v ∈ D(Ω), ‖v‖_∞ ≤ 1 },

    ‖diag ∇²u‖_M = sup { ∫_Ω u Σ_{i=1}^d ∂²v_i/∂x_i² dx : v ∈ D(Ω, R^d), ‖v‖_∞ ≤ 1 }.
Both approaches are well suited for preserving edges with reduced staircasing,
and the different second-order differential operators lead to slightly different
Fig. 6.31 Illustration of variational denoising with second-order penalties. Left: The original on
top, below the noisy version. Middle and right: The minimizers of the L2 -! denoising problem
with PSNR-optimal parameter λ
    !diag(u) = sup { ∫_Ω u Σ_{i=1}^d ∂²v_i/∂x_i² dx : v ∈ D(Ω, R^d), ‖v‖_∞ ≤ α, ‖diag ∇v‖_∞ ≤ 1 },

which easily generalizes to higher orders [19]. Figure 6.32 shows results for these functionals in a denoising problem.
Fig. 6.32 Illustration of variational denoising with penalties that combine first and second order.
Left: The original on top, below the noisy version. Middle and right: The minimizer of the L2 -!
denoising problems with PSNR-optimal parameters λ
6.6 Exercises
    P_λ(x) = |x|^{1−d/2} / ( (2π)^{d−1} λ^{(d+2)/4} ) · K_{d/2−1}( 2π|x| / √λ )
from [138].
Exercise 6.2 Let Ω ⊂ R^d be a domain and Ω' ⊂ Ω a bounded Lipschitz subdomain such that Ω' ⊂⊂ Ω.
1. Show the identities
    {u ∈ H¹(Ω) : u = 0 on Ω∖Ω'} = {u ∈ H¹(Ω) : u|_{Ω'} ∈ H₀¹(Ω')} = \overline{D(Ω')},
Exercise 6.4 Let X* be the dual space of a separable normed space. Show that a functional F : X* → R∞ that is bounded from below, coercive, and sequentially weak* lower semicontinuous has a minimizer in X*.
Exercise 6.5 Let Φ : X → Y be an affine linear map between vector spaces X, Y, i.e.,
    Φ(λx + (1 − λ)y) = λΦ(x) + (1 − λ)Φ(y)    for all x, y ∈ X and λ ∈ K.
is coercive on X.
2. The operator A is injective, rg(A) is closed, and in particular A−1 : rg(A) → X
is continuous.
Exercise 6.7 Let X and Y be Banach spaces and let X be reflexive. Moreover let
| · |X : X → R be an admissible semi-norm on X, i.e., there exist a linear and
continuous P : X → X and constants 0 < c ≤ C < ∞ such that
2. The assertion remains true if intXi (Ki ) are nonempty and disjoint. Here intXi (Ki )
is the set of all x ∈ Ki such that 0 is a relative interior point of (Ki − x) ∩ Xi in
Xi .
[Hint:] First prove that Ki = intXi (Ki ) (see also Exercise 6.9).
holds.
Exercise 6.15 Show that for real Banach spaces X, Y and A ∈ L(Y, X) such that
rg(A) is closed and there exists a subspace X1 ⊂ X with X = X1 + rg(A), as well
as mappings P1 ∈ L(X, X1 ) and P2 ∈ L(X, rg(A)) with id = P1 + P2 , one has for
convex F : X → R∞ for which there exists a point x 0 ∈ dom F ∩ rg(A) such that
x 1 → F (x 0 + x 1 ), x 1 ∈ X1 is continuous at the origin that
∂(F ◦ A) = A∗ ◦ ∂F ◦ A.
Exercise 6.16 Use the results of Exercise 6.15 to find an alternative proof for the
third point in Theorem 6.51.
Exercise 6.17 Let X, Y be real Banach spaces and A ∈ L(X, Y ) injective and rg(A)
dense in Y . Show:
1. A−1 with dom A−1 = rg(A) is a closed and densely defined linear map,
2. (A∗ )−1 with dom (A∗ )−1 = rg(A∗ ) is a closed and densely defined linear map,
3. it holds that (A−1 )∗ = (A∗ )−1 .
Exercise 6.18 Let Ω ⊂ R^d be a domain and F : L²(Ω) → R convex and Gâteaux-differentiable. Consider the problems of minimizing the functionals
where the supremum of a set that is unbounded from above is ∞ and the infimum
of a set that is unbounded from below is −∞.
Exercise 6.20 Prove the claims in Lemma 6.59.
Show:
1. For p > 1, one has that F* is positively p*-homogeneous with 1/p + 1/p* = 1.
2. For p = 1, one has that F* = I_K with a convex and closed set K ⊂ X*.
3. For “p = ∞,” i.e., F = I_K with K ≠ ∅ positively absorbing, i.e., αK ⊂ K for α ∈ [0, 1], one has that F* is positively 1-homogeneous.
Exercise 6.22 Let X be a real Banach space and F : X → R∞ strongly coercive. Show that the Fenchel conjugate F* : X* → R is continuous.
is bounded.
[Hint:] Use the uniform boundedness principle. To that end, use the fact that for every
(v 1 , v 2 ) ∈ X × X one can find a representation v 1 − αu0 = v 2 − αu1 with α > 0 and
u1 ∈ dom F1 and apply the Fenchel inequality to w 1 , u1 and w 2 , u0 .
2. The infimal convolution F1∗ F2∗ is proper, convex, and lower semicontinuous.
[Hint:] For the lower semicontinuity use the result of the first point to conclude that
for sequences w n → w with (F1∗ F2∗ )(w n ) → t, sequences ((w 1 )n ), ((w 2 )n ) with
(w 1 )n +(w 2 )n = w n are bounded and F1∗ (w 1 )n +F2∗ (w 2 )n ≤ (F1∗ F2∗ )(w n )+ n1 .
Also use the lower semicontinuity of F₁* and F₂*.
3. Moreover, the infimal convolution is exact, i.e., for every w ∈ X∗ the minimum
in
is attained.
[Hint:] Use the first point and the direct method to show that for a given w ∈ X ∗ a
respective minimizing sequence ((w 1 )n , (w 2 )n ) with (w 1 )n + (w 2 )n = w is bounded.
4. If F1 and F2 are convex and lower semicontinuous, then (F1 + F2 )∗ = F1∗ F2∗ .
5. The exactness of F1∗ F2∗ and (F1 + F2 )∗ = F1∗ F2∗ implies that for convex
and lower semicontinuous F1 , F2 , one has that ∂(F1 + F2 ) = ∂F1 + ∂F2 .
Exercise 6.25 Consider the semi-norm

    ‖∇^m u‖_p = ( ∫_{R^d} ( Σ_{|α|=m} (m!/α!) |∂^m u/∂x^α (x)|² )^{p/2} dx )^{1/p}

    ‖∇^m ·‖_p = ‖∇^m ·‖_p ∘ T_{x⁰} ∘ O.

    Σ_{|α|=m} (m!/α!) |∂^m u/∂x^α (x)|² = Σ_{i_m=1}^d ··· Σ_{i_1=1}^d |∂^m u/(∂x_{i_1} ··· ∂x_{i_m}) (x)|²,
Exercise 6.26 Let the assumptions in Theorem 6.86 be satisfied. Moreover, assume
that Y is a Hilbert space. Denote by T : Y → A"m the orthogonal projection
onto the image of the polynomials of degree up to m − 1 under A and further set
S = A−1 T A, where A is inverted on A"m .
Show that every solution u∗ of the problem (6.36) satisfies ASu∗ = T u∗ .
Exercise 6.27 Let Ω be a bounded Lipschitz domain, m ≥ 1, and p, q ∈ ]1, ∞[.
1. Using appropriate conditions on q, show that the denoising problem

    min_{u∈L^q(Ω)}  (1/q)∫_Ω |u − u⁰|^q dx + (λ/p)∫_Ω |∇^m u|^p dx
    M_n^* u = Σ_{k=0}^{K} ϕ_k T_{−t_n η_k}(u ∗ ψ̄_{k,n}),

and show that M_n^* u ∈ D(Ω) as well as M_n^* u|_Ω → u⁰ = u|_Ω in H^{m,p}(Ω).
2. For the second claim you may use Gauss’s theorem (Theorem 2.81) to reduce it
to the first claim.
Exercise 6.31 Let M, N ∈ N with N, M ≥ 1 and let S ∈ RN×N and W ∈ RM×N
be matrices.
is invertible.
Exercise 6.32 Let Ω ⊂ R^d be a domain, N ∈ N, and L : D(Ω, R^N) → R such that there exists a constant C ≥ 0 with |L(ϕ)| ≤ C‖ϕ‖_∞ for every ϕ ∈ D(Ω, R^N). Show that L has a unique continuous extension to C₀(Ω, R^N) and hence that there exists a vector-valued finite Radon measure μ ∈ M(Ω, R^N) such that
    L(ϕ) = ∫_Ω ϕ dμ    for all ϕ ∈ D(Ω, R^N).
[Hint:] Use the definition of C₀(Ω, R^N) as well as Theorems 3.13 and 2.62.
Exercise 6.37 Let Ω, Ω₁, …, Ω_K ⊂ R^d be bounded Lipschitz domains with Ω̄ = ⋃_{k=1}^{K} Ω̄_k and Ω_k mutually disjoint. Moreover, let u : Ω → R be such that every u_k = u|_{Ω_k} can be extended to an element in C¹(Ω̄_k). Show that with Γ_{l,k} = ∂Ω_l ∩ ∂Ω_k ∩ Ω, one has

    TV(u) = Σ_{k=1}^{K} ∫_{Ω_k} |∇u_k| dx + Σ_{l<k} ∫_{Γ_{l,k}} |u_l − u_k| dH^{d−1}.

[Hint:] For every ε > 0 choose a neighborhood U_{l,k} of the interface Γ_{l,k} with |U_{l,k}| < ε. Approximate sgn(u_k − u_l)ν there by smooth functions (similar to Exercise 6.36), and on Ω_k ∖ ⋃_{1≤l<k≤K} U_{l,k} approximate almost everywhere the negative normalized gradient −∇u_k/|∇u_k|. Patch these piecewise functions together by smooth cutoff functions to construct a sequence (ϕ^n) in D(Ω, R^d) with ‖ϕ^n‖_∞ ≤ 1 that converges almost everywhere on Ω_k to −∇u_k/|∇u_k| and
Show that if

    ∇u(x(s), y(s)) ≠ 0    as well as    div( ∇u/|∇u| )(x(s), y(s)) = κ
for some κ ∈ R and all s ∈ ]−ε, ε[, then there exists a ϕ0 ∈ R such that (x, y) can
be written as
x(s) = x0 + sin(κs + ϕ0 ),
y(s) = y0 + cos(κs + ϕ0 ),
for all s ∈ ]−ε, ε[. In particular, (x, y) parameterizes a piece of a line or circle with
curvature κ.
Exercise 6.41 Let t > 0, p ∈ ]1, ∞[, σ > 0, and s₀ be such that

    s₀ ≥ t    and    s₀ < ( t/(σ(2 − p)) )^{1/(p−1)}  if p < 2.

Show that the sequence (s_n) defined by the iteration

    s_{n+1} = s_n + ( t − s_n − σ s_n^{p−1} ) / ( 1 + σ(p − 1) s_n^{p−2} ),

is well defined, fulfills s_n > 0 for all n, and is decreasing. Moreover, it converges to the unique s that fulfills the equation s + σ s^{p−1} = t.
Exercise 6.42 Implement the primal-dual method for variational denoising
(Table 6.1).
Exercise 6.43 For K ≥ 1 let the matrix κ ∈ R^{(2K+1)×(2K+1)} represent a convolution kernel that is indexed by −K ≤ i, j ≤ K and satisfies Σ_{i=−K}^{K} Σ_{j=−K}^{K} κ_{i,j} = 1. Moreover, for N, M ≥ 1 let A_h : R^{(N+2K)×(M+2K)} → R^{N×M} be a discrete convolution operator,

    (A_h u)_{i,j} = Σ_{k=−K}^{K} Σ_{l=−K}^{K} u_{(i+K−k),(j+K−l)} κ_{k,l}.

1. Implement the primal-dual method from Table 6.2 for the solution of the variational deconvolution problem

    min_{u∈R^{(N+2K)×(M+2K)}}  ‖A_h u − U⁰‖_q^q/q + λ‖∇_h u‖_p^p/p
3. Use the convergence proof of Theorem 6.141 to estimate the norm of the iterates
(un , wn ) of the primal-dual method from Table 6.3.
Exercise 6.45 Implement the primal-dual inpainting method from Table 6.3.
Add-on: Use the results from Exercise 6.44, to derive a modified duality gap G̃
according to Example 6.144. Prove that G̃(un , wn ) → 0 for the iterates (un , wn )
and modify your program such that it terminates if G̃ falls below a certain threshold.
Exercise 6.46 Let 1 ≤ p < ∞, let N₁, N₂, M₁, M₂ ∈ N be positive, let the map A_h : R^{N1×M1} → R^{N2×M2} be linear and surjective, and let A_h 1 ≠ 0. Consider the minimization problem

    min_{u∈R^{N1×M1}}  ‖∇_h u‖_p^p/p + I_{{A_h v = U⁰}}(u)

for some U⁰ ∈ R^{N2×M2}. Define X = R^{N1×M1}, Z = R^{N2×M2} × R^{N1×M1×2} and

    F₁ : X → R, F₁(u) = 0,    F₂ : Z → R, F₂(v, w) = I_{{0}}(v − U⁰) + ‖w‖_p^p/p.
Prove that the minimization problem is equivalent to the saddle point problem for
Derive an alternative method for the minimization of the discrete Sobolev and total
variation semi-norm, respectively, that in contrast to the method from Table 6.4,
does not use the projection onto {Ah u = U 0 } and hence does not need to solve a
linear system.
Exercise 6.47 Let N, M ∈ N be positive, U 0 ∈ RN×M and K ∈ N with K ≥ 1.
For 1 ≤ p < ∞ consider the interpolation problem
    min_{u∈R^{KN×KM}}  ‖∇_h u‖_p^p/p + I_{{A_h u = U⁰}}(u)

with

    (A_h u)_{i,j} = (1/K²) Σ_{k=1}^{K} Σ_{l=1}^{K} u_{((i−1)K+k),((j−1)K+l)}.
1. R. Acar, C.R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems.
Inverse Probl. 10(6), 1217–1229 (1994)
2. R.A. Adams, J.J.F. Fournier, Sobolev Spaces. Pure and Applied Mathematics, vol. 140, 2nd
edn. (Elsevier, Amsterdam, 2003)
3. L. Alvarez, F. Guichard, P.-L. Lions, J.-M. Morel, Axioms and fundamental equations in
image processing. Arch. Ration. Mech. Anal. 123, 199–257 (1993)
4. H. Amann, Time-delayed Perona-Malik type problems. Acta Math. Univ. Comenian. N. Ser.
76(1), 15–38 (2007)
5. L. Ambrosio, N. Fusco, D. Pallara, Functions of Bounded Variation and Free Discontinuity
Problems. Oxford Mathematical Monographs (Oxford University Press, Oxford, 2000)
6. L. Ambrosio, N. Fusco, J.E. Hutchinson, Higher integrability of the gradient and dimension
of the singular set for minimisers of the Mumford-Shah functional. Calc. Var. Partial Differ.
Equ. 16(2), 187–215 (2003)
7. K.J. Arrow, L. Hurwicz, H. Uzawa, Studies in Linear and Non-linear Programming. Stanford
Mathematical Studies in the Social Sciences, 1st edn. (Stanford University Press, Palo Alto,
1958)
8. G. Aubert, P. Kornprobst, Mathematical Problems in Image Processing (Springer, New York,
2002)
9. G. Aubert, L. Blanc-Féraud, R. March, An approximation of the Mumford-Shah energy by
a family of discrete edge-preserving functionals. Nonlinear Anal. Theory Methods Appl. Int.
Multidiscip. J. Ser. A Theory Methods 64(9), 1908–1930 (2006)
10. J.-F. Aujol, A. Chambolle, Dual norms and image decomposition models. Int. J. Comput.
Vis. 63(1), 85–104 (2005)
11. V. Aurich, J. Weule, Non-linear gaussian filters performing edge preserving diffusion, in
Proceedings 17. DAGM-Symposium, Bielefeld (Springer, Heidelberg, 1995), pp. 538–545
12. C. Bär, Elementary Differential Geometry (Cambridge University Press, Cambridge, 2010).
Translated from the 2001 German original by P. Meerkamp
13. M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, Image inpainting, in Proceedings of
SIGGRAPH 2000, New Orleans (2000), pp. 417–424
14. M. Bertero, P. Boccacci, Introduction to Inverse Problems in Imaging (Institute of Physics,
London, 1998)
15. F. Bornemann, T. März, Fast image inpainting based on coherence transport. J. Math. Imaging
Vis. 28(3), 259–278 (2007)
16. J.M. Borwein, A.S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and
Examples. CMS Books in Mathematics, vol. 3, 2nd edn. (Springer, New York, 2006)
17. A. Borzí, K. Ito, K. Kunisch, Optimal control formulation for determining optical flow. SIAM
J. Sci. Comput. 24, 818–847 (2002)
18. B. Bourdin, A. Chambolle, Implementation of an adaptive finite-element approximation of
the Mumford-Shah functional. Numer. Math. 85(4), 609–646 (2000)
19. K. Bredies, K. Kunisch, T. Pock, Total generalized variation. SIAM J. Imaging Sci. 3(3),
492–526 (2010)
20. M. Breuß, J. Weickert, A shock-capturing algorithm for the differential equations of dilation
and erosion. J. Math. Imaging Vis. 25(2), 187–201 (2006)
21. H. Brézis, Operateurs maximaux monotones et semi-groupes de contractions dans les espaces
de Hilbert. North-Holland Mathematics Studies, vol. 5. Notas de Matemática (50) (North-
Holland, Amsterdam; Elsevier, New York, 1973).
22. H. Brézis, Analyse fonctionnelle - Théorie et applications. Collection Mathématiques
Appliquées pour la Maîtrise (Masson, Paris, 1983)
23. T. Brox, O. Kleinschmidt, D. Cremers, Efficient nonlocal means for denoising of textural
patterns. IEEE Trans. Image Process. 17(7), 1083–1092 (2008)
24. A. Bruhn, J. Weickert, C. Schnörr, Lucas/Kanade meets Horn/Schunck: combining local and
global optical flow methods. Int. J. Comput. Vis. 61(3), 211–231 (2005)
25. A. Buades, J.-M. Coll, B. Morel, A review of image denoising algorithms, with a new one.
Multiscale Model. Simul. 4(2), 490–530 (2005)
26. M. Burger, O. Scherzer, Regularization methods for blind deconvolution and blind source
separation problems. Math. Control Signals Syst. 14, 358–383 (2001)
27. E.J. Candès, D.L. Donoho, New tight frames of curvelets and optimal representations of
objects with piecewise c2 singularities. Commun. Pure Appl. Math. 57(2), 219–266 (2004)
28. E. Candès, L. Demanet, D. Donoho, L. Ying, Fast discrete curvelet transforms. Multiscale
Model. Simul. 5(3), 861–899 (2006)
29. J. Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach.
Intell. 8(6), 679–698 (1986)
30. F. Catté, P.-L. Lions, J.-M. Morel, T. Coll, Image selective smoothing and edge detection by
nonlinear diffusion. SIAM J. Numer. Anal. 29(1), 182–193 (1992)
31. A. Chambolle, P.-L. Lions, Image recovery via Total Variation minimization and related
problems. Numer. Math. 76, 167–188 (1997)
32. A. Chambolle, B.J. Lucier, Interpreting translation-invariant wavelet shrinkage as a new
image smoothing scale space. IEEE Trans. Image Process. 10, 993–1000 (2001)
33. A. Chambolle, G.D. Maso, Discrete approximation of the Mumford-Shah functional in
dimension two. Math. Model. Numer. Anal. 33(4), 651–672 (1999)
34. A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with
applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
35. A. Chambolle, R.A. DeVore, N. Lee, B.J. Lucier, Nonlinear wavelet image processing:
variational problems, compression and noise removal through wavelet shrinkage. IEEE Trans.
Image Process. 7, 319–335 (1998)
36. T.F. Chan, S. Esedoglu, Aspects of total variation regularized L1 function approximation.
SIAM J. Appl. Math. 65, 1817 (2005)
37. T.F. Chan, J. Shen, Image Processing And Analysis: Variational, PDE, Wavelet, and
Stochastic Methods (Society for Industrial and Applied Mathematics, Philadelphia, 2005)
38. T.F. Chan, L.A. Vese, Active contours without edges. IEEE Trans. Image Process. 10(2),
266–277 (2001)
39. T.F. Chan, C. Wong, Total variation blind deconvolution. IEEE Trans. Image Process. 7,
370–375 (1998)
40. T.F. Chan, A. Marquina, P. Mulet, High-order total variation-based image restoration. SIAM
J. Sci. Comput. 22(2), 503–516 (2000)
41. T.F. Chan, S. Esedoglu, F.E. Park, A fourth order dual method for staircase reduction in
texture extraction and image restoration problems. Technical report, UCLA CAM Report
05-28 (2005)
42. K. Chen, D.A. Lorenz, Image sequence interpolation using optimal control. J. Math. Imaging
Vis. 41(3), 222–238 (2011)
43. K. Chen, D.A. Lorenz, Image sequence interpolation based on optical flow, segmentation,
and optimal control. IEEE Trans. Image Process. 21(3), 1020–1030 (2012)
44. Y. Chen, K. Zhang, Young measure solutions of the two-dimensional Perona-Malik equation
in image processing. Commun. Pure Appl. Anal. 5(3), 615–635 (2006)
45. U. Clarenz, U. Diewald, M. Rumpf, Processing textured surfaces via anisotropic geometric
diffusion. IEEE Trans. Image Process. 13(2), 248–261 (2004)
46. P.L. Combettes, V.R. Wajs, Signal recovery by proximal forward-backward splitting.
Multiscale Model. Simul. 4(4), 1168–1200 (2005)
47. R. Courant, K.O. Friedrichs, H. Lewy, Über die partiellen Differenzengleichungen der
mathematischen Physik. Math. Ann. 100(1), 32–74 (1928)
48. I. Daubechies, Orthonormal bases of compactly supported wavelets. Commun. Pure Appl.
Math. 41(7), 909–996 (1988)
49. G. David, Singular Sets of Minimizers for the Mumford-Shah Functional. Progress in
Mathematics, vol. 233 (Birkhäuser, Basel, 2005)
50. J. Diestel, J.J. Uhl Jr., Vector Measures. Mathematical Surveys and Monographs, vol. 15
(American Mathematical Society, Providence, 1977)
51. J. Dieudonné, Foundations of Modern Analysis. Pure and Applied Mathematics, vol. 10-I
(Academic, New York, 1969). Enlarged and corrected printing
52. U. Diewald, T. Preußer, M. Rumpf, Anisotropic diffusion in vector field visualization on
Euclidean domains and surfaces. IEEE Trans. Visual. Comput. Graph. 6(2), 139–149 (2000)
53. N. Dinculeanu, Vector Measures. Hochschulbücher für Mathematik, vol. 64 (VEB Deutscher
Verlag der Wissenschaften, Berlin, 1967)
54. D.L. Donoho, Denoising via soft thresholding. IEEE Trans. Inf. Theory 41(3), 613–627
(1995)
55. V. Duval, J.-F. Aujol, Y. Gousseau, The TVL1 model: a geometric point of view. Multiscale
Model. Simul. 8(1), 154–189 (2009)
56. J. Eckstein, D.P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point
algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
57. I. Ekeland, R. Temam, Convex Analysis and Variational Problems. Studies in Mathematics
and Its Applications, vol. 1 (North-Holland, Amsterdam, 1976)
58. H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems. Mathematics and
Its Applications, vol. 375, 1st edn. (Kluwer Academic, Dordrecht, 1996)
59. S. Esedoḡlu, Stability properties of the Perona-Malik scheme. SIAM J. Numer. Anal. 44(3),
1297–1313 (2006)
60. L.C. Evans, A new proof of local C 1,α regularity for solutions of certain degenerate elliptic
P.D.E. J. Differ. Equ. 45, 356–373 (1982)
61. L.C. Evans, R.F. Gariepy, Measure Theory and Fine Properties of Functions (CRC Press,
Boca Raton, 1992)
62. H. Federer, Geometric Measure Theory (Springer, Berlin, 1969)
63. B. Fischer, J. Modersitzki, Ill-posed medicine — an introduction to image registration. Inverse
Probl. 24(3), 034008 (2008)
64. I. Galić, J. Weickert, M. Welk, A. Bruhn, A. Belyaev, H.-P. Seidel, Image compression with
anisotropic diffusion. J. Math. Imaging Vis. 31, 255–269 (2008)
65. E. Giusti, Minimal Surfaces and Functions of Bounded Variation. Monographs in
Mathematics, vol. 80 (Birkhäuser, Boston, 1984)
66. G.H. Golub, C.F. Van Loan, Matrix Computations. Johns Hopkins Studies in the
Mathematical Sciences, 4th edn. (Johns Hopkins University Press, Baltimore, 2013)
67. R.C. Gonzalez, P.A. Wintz, Digital Image Processing (Addison-Wesley, Reading, 1977)
68. K. Gröchenig, Foundations of Time-Frequency Analysis (Birkhäuser, Boston, 2001)
69. F. Guichard, J.-M. Morel, Partial differential equations and image iterative filtering, in The
State of the Art in Numerical Analysis, ed. by I.S. Duff, G.A. Watson. IMA Conference Series
(New Series), vol. 63 (Oxford University Press, Oxford, 1997)
70. A. Haddad, Texture separation BV − G and BV − L1 models. Multiscale Model. Simul.
6(1), 273–286 (electronic) (2007)
71. P.R. Halmos, Measure Theory (D. Van Nostrand, New York, 1950)
72. M. Hanke-Bourgeois, Grundlagen der Numerischen Mathematik und des Wissenschaftlichen
Rechnens, 3rd edn. (Vieweg+Teubner, Wiesbaden, 2009)
73. L. He, S.J. Osher, Solving the Chan-Vese model by a multiphase level set algorithm based on
the topological derivative, in Scale Space and Variational Methods in Computer Vision, ed. by
F. Sgallari, A. Murli, N. Paragios. Lecture Notes in Computer Science, vol. 4485 (Springer,
Berlin, 2010), pp. 777–788
74. W. Hinterberger, O. Scherzer, Variational methods on the space of functions of bounded
Hessian for convexification and denoising. Computing 76, 109–133 (2006)
75. W. Hinterberger, O. Scherzer, C. Schnörr, J. Weickert, Analysis of optical flow models in the
framework of the calculus of variations. Numer. Funct. Anal. Optim. 23(1), 69–89 (2002)
76. M. Hintermüller, W. Ring, An inexact Newton-CG-type active contour approach for the
minimization of the Mumford-Shah functional. J. Math. Imaging Vis. 20(1–2), 19–42 (2004).
Special issue on mathematics and image analysis
77. M. Holler, Theory and numerics for variational imaging — artifact-free JPEG decompression
and DCT based zooming. Master’s thesis, Universität Graz (2010)
78. B.K.P. Horn, B.G. Schunck, Determining optical flow. Artif. Intell. 17, 185–203 (1981)
79. G. Huisken, Flow by mean curvature of convex surfaces into spheres. J. Differ. Geom. 20(1),
237–266 (1984)
80. J. Jost, Partial Differential Equations. Graduate Texts in Mathematics, vol. 214 (Springer,
New York, 2002). Translated and revised from the 1998 German original by the author
81. L.A. Justen, R. Ramlau, A non-iterative regularization approach to blind deconvolution.
Inverse Probl. 22, 771–800 (2006)
82. S. Kakutani, Concrete representation of abstract (m)-spaces (a characterization of the space
of continuous functions). Ann. Math. Second Ser. 42(4), 994–1024 (1941)
83. B. Kawohl, N. Kutev, Maximum and comparison principle for one-dimensional anisotropic
diffusion. Math. Ann. 311, 107–123 (1998)
84. S.L. Keeling, W. Ring, Medical image registration and interpolation by optical flow with
maximal rigidity. J. Math. Imaging Vis. 23, 47–65 (2005)
85. S.L. Keeling, R. Stollberger, Nonlinear anisotropic diffusion filtering for multiscale edge
enhancement. Inverse Probl. 18(1), 175–190 (2002)
86. S. Kichenassamy, The Perona-Malik paradox. SIAM J. Appl. Math. 57, 1328–1342 (1997)
87. S. Kindermann, S.J. Osher, J. Xu, Denoising by BV-duality. J. Sci. Comput. 28(2–3), 411–
444 (2006)
88. J.J. Koenderink, The structure of images. Biol. Cybern. 50(5), 363–370 (1984)
89. G.M. Korpelevič, An extragradient method for finding saddle points and for other problems.
Ékonomika i Matematicheskie Metody 12(4), 747–756 (1976)
90. G. Kutyniok, D. Labate, Construction of regular and irregular shearlets. J. Wavelet Theory
Appl. 1, 1–10 (2007)
91. E.H. Lieb, M. Loss, Analysis. Graduate Studies in Mathematics, vol. 14, 2nd edn. (American
Mathematical Society, Providence, 2001)
92. L.H. Lieu, L. Vese, Image restoration and decomposition via bounded Total Variation and
negative Hilbert-Sobolev spaces. Appl. Math. Optim. 58, 167–193 (2008)
93. P.-L. Lions, B. Mercier, Splitting algorithms for the sum of two nonlinear operators. SIAM
J. Numer. Anal. 16(6), 964–979 (1979)
94. A.K. Louis, P. Maass, A. Rieder, Wavelets: Theory and Applications (Wiley, Chichester,
1997)
95. M. Lysaker, A. Lundervold, X.-C. Tai, Noise removal using fourth-order partial differential
equation with applications to medical magnetic resonance images in space and time. IEEE
Trans. Image Process. 12(12), 1579–1590 (2003)
96. J. Ma, G. Plonka, The curvelet transform: a review of recent applications. IEEE Signal
Process. Mag. 27(2), 118–133 (2010)
97. S. Mallat, A Wavelet Tour of Signal Processing - The Sparse Way, with Contributions from
Gabriel Peyré, 3rd edn. (Elsevier/Academic, Amsterdam, 2009)
98. D. Marr, E. Hildreth, Theory of edge detection. Proc. R. Soc. Lond. 207, 187–217 (1980)
99. Y. Meyer, Oscillating Patterns in Image Processing and Nonlinear Evolution Equations.
University Lecture Series, vol. 22 (American Mathematical Society, Providence, 2001). The
fifteenth Dean Jacqueline B. Lewis memorial lectures
100. J. Modersitzki, FAIR: Flexible Algorithms for Image Registration. Fundamentals of Algo-
rithms, vol. 6 (Society for Industrial and Applied Mathematics, Philadelphia, 2009)
101. D. Mumford, J. Shah, Optimal approximations by piecewise smooth functions and variational
problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
102. H.J. Muthsam, Lineare Algebra und ihre Anwendungen, 1st edn. (Spektrum Akademischer
Verlag, Heidelberg, 2006)
103. F. Natterer, F. Wuebbeling, Mathematical Methods in Image Reconstruction (Society for
Industrial and Applied Mathematics, Philadelphia, 2001)
104. J. Nečas, Les méthodes directes en théorie des équations elliptiques (Masson, Paris, 1967)
105. S.J. Osher, R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces. Applied
Mathematical Sciences, vol. 153 (Springer, Berlin, 2003)
106. S.J. Osher, J.A. Sethian, Fronts propagating with curvature-dependent speed: algorithms
based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
107. S.J. Osher, A. Sole, L. Vese, Image decomposition and restoration using Total Variation
minimization and the h−1 norm. Multiscale Model. Simul. 1(3), 349–370 (2003)
108. S. Paris, P. Kornprobst, J. Tumblin, F. Durand, Bilateral filtering: theory and applications.
Found. Trends Comput. Graph. Vis. 4(1), 1–73 (2009)
109. W.B. Pennebaker, J.L. Mitchell, JPEG: Still Image Data Compression Standard (Springer,
New York, 1992)
110. P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Trans.
Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
111. G. Plonka, G. Steidl, A multiscale wavelet-inspired scheme for nonlinear diffusion. Int. J.
Wavelets Multiresolut. Inf. Process. 4(1), 1–21 (2006)
112. T. Pock, D. Cremers, H. Bischof, A. Chambolle, An algorithm for minimizing the Mumford-
Shah functional, in 2009 IEEE 12th International Conference on Computer Vision (2009),
pp. 1133–1140
113. L.D. Popov, A modification of the Arrow-Hurwicz method for search of saddle points. Math.
Notes 28, 845–848 (1980)
114. W.K. Pratt, Digital Image Processing (Wiley, New York, 1978)
115. T. Preußer, M. Rumpf, An adaptive finite element method for large scale image processing.
J. Vis. Commun. Image Represent. 11(2), 183–195 (2000)
116. J.M.S. Prewitt, Object enhancement and extraction, in Picture Processing and Psychopic-
torics, ed. by B.S. Lipkin, A. Rosenfeld (Academic, New York, 1970)
117. T.W. Ridler, S. Calvard, Picture thresholding using an iterative selection method. IEEE Trans.
Syst. Man Cybern. 8(8), 630–632 (1978)
118. R.T. Rockafellar, Convex Analysis. Princeton Mathematical Series (Princeton University
Press, Princeton, 1970)
119. A. Rosenfeld, A.C. Kak, Digital Picture Processing (Academic, New York, 1976)
120. E. Rouy, A. Tourin, A viscosity solutions approach to shape-from-shading. SIAM J. Numer.
Anal. 29(3), 867–884 (1992)
121. L.I. Rudin, S.J. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms.
Phys. D Nonlinear Phenom. 60(1–4), 259–268 (1992)
All figures not listed in these picture credits only contain our own pictures. Only the
first figure that features a certain image is listed.
Fig. 1.1   Matthew Mendoza@Flickr http://www.flickr.com/photos/mattmendoza/2421196777/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), PerkinElmer (http://www.cellularimaging.com/assays/receptor_activation), CNRS/Université de St-Etienne (France), Labor für Mikrozerspanung (Universität Bremen), p. 3
Fig. 1.5   Last Hero@Flickr http://www.flickr.com/photos/uwe_schubert/4594327195/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), Kai Schreiber@Flickr http://www.flickr.com/photos/genista/1249056653/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), http://grin.hq.nasa.gov/ABSTRACTS/GPN-2002-000064.html, p. 8
Fig. 3.14  huangjiahui@Flickr http://www.flickr.com/photos/huangjiahui/3128463578/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), p. 96
Fig. 5.17  Mike Baird@Flickr http://www.flickr.com/photos/mikebaird/4533794674/ (License: http://creativecommons.org/licenses/by/2.0/legalcode), p. 223
Fig. 5.19  Benson Kua@Flickr http://www.flickr.com/photos/bensonkua/3301838191/ (License: http://creativecommons.org/licenses/by-sa/2.0/legalcode), p. 227
Abbreviations
B(Rd ) space of bounded functions, page 89
B() Borel algebra over , page 32
BC(Rd ) space of bounded and continuous functions, page 173
BUC(Rd ) space of bounded and uniformly continuous functions, page 187
Br (x) ball around x with radius r, page 17
C (U , Y ) space of bounded and uniformly continuous mapping on U with
values in Y , page 19
C (U, Y ) space of continuous mappings on U with values in Y , page 19
C set of complex numbers, page 16
Cα,β (u) semi-norms on the Schwartz space, page 112
Cc (, X) space of continuous function with compact support, page 42
C0 (, X) closure of Cc (, X) in C (, X), page 44
C k () space of k-times continuously differentiable functions, page 49
C ∞ () space of infinitely differentiable functions, page 49
Cb∞ (Rd ) space of infinitely differentiable functions with bounded derivatives,
page 173
cψ constant in the admissibility condition for a wavelet ψ, page 147
DF (x) Fréchet derivative of F at the point x, page 20
Dk F kth derivative of F , page 21
D() space of test functions, page 49
D()∗ space of distributions, page 50
DA u linear coordinate transformation of u : Rd → K using A ∈ Rd×d ,
i.e., DA u(x) = u(Ax), page 56
DCT(u) discrete cosine transform of u, page 140
D^m_div          space of vector fields with mth weak divergence and vanishing trace on the boundary, page 329
Symbols
|x| Euclidean norm of vectors x ∈ Kn , page 16
|x|p p-norm of vector x ∈ Kn , page 16
|α| order of a multi-index α ∈ Nd , page 22
|μ| total variation measure to μ, page 43
|Ω|              Lebesgue measure of a set Ω ∈ B(R^d), page 34
x direction of vector x ∈ R2 , page 78
χB characteristic function of the set B, page 57
δx Dirac measure in x, page 33
⟨x*, x⟩_{X*×X}   duality pairing of x* and x, page 25
η, ξ local image coordinates, page 204
∂^α/∂x^α         αth derivative, page 22
y" ceiling function applied to y, largest integer that is smaller than y,
page 56
medB (u) median filter with B applied to u, page 101
μ restriction of μ to , page 34
∇F gradient of F , page 21
∇ 2F Hessian matrix of F , page 21
∇mF mth derivative of F organized as an m-tensor, page 321
‖μ‖_M            norm of the Radon measure μ, page 43
‖f‖_{m,p}        norm in the Sobolev space H^{m,p}(Ω), page 51
‖f‖_p            norm in the Banach space L^p(Ω, X), page 39
Aberration
  chromatic, 222
Adjoint, 27
  Hilbert space ∼, 32
  of unbounded mappings, 27
  of the weak gradient, 329
Admissibility condition, 147
Algorithm
  Arrow-Hurwicz, 408
  edge detection ∼ according to Canny, 96
  extra gradient ∼, 408
  forward-backward splitting, 397
  isodata ∼, 67
  primal-dual, 316
Alias effect, 60
Aliasing, 125, 128, 133, 137
Almost everywhere, 35
Annihilator, 24, 308
Anti-extensionality
  of opening, 93
Aperture problem, 11
Approximation space, 152
Artifact, 1, 2, 340
  color, 387
  compression ∼, 61
  staircasing ∼, 376
Average
  moving, 68, 84, 247
  nonlocal, 103
Averaging filter
  non-local, 104
Axioms
  ∼ of a scale space, 173
Ball
  closed, 17
  open, 16
Banach space, 23
Bandwidth, 128
Bessel function
  modified, 254
Bessel’s inequality, 31
Bidual space, 25
Bilateral filter, 101
Binomial filter, 84
Black top-hat operator, 95
Borel algebra, 32
Boundary
  topological, 17
Boundary extension, 82
  constant, 82
  periodical, 82
  symmetrical, 82
  zero-∼, 82
Boundary initial value problem, 210
Boundary treatment, 82
Boundedness, 18
Caccioppoli set, 355
Calculus of variations, 263
  direct method of, 263
  fundamental lemma of the, 50
Cauchy problem, 184, 191
CFL condition, 245
Characteristic
  method of ∼s, 240
  of a transport equation, 240, 241
17. Abbate, C. DeCusatis, and P.K. Das: Wavelets and Subbands (ISBN 978-0-
8176-4136-8)
18. O. Bratteli, P. Jorgensen, and B. Treadway: Wavelets Through a Looking Glass
(ISBN 978-0-8176-4280-80)
19. H.G. Feichtinger and T. Strohmer: Advances in Gabor Analysis (ISBN 978-0-
8176-4239-6)
20. O. Christensen: An Introduction to Frames and Riesz Bases (ISBN 978-0-8176-
4295-2)
21. L. Debnath: Wavelets and Signal Processing (ISBN 978-0-8176-4235-8)
22. G. Bi and Y. Zeng: Transforms and Fast Algorithms for Signal Analysis and
Representations (ISBN 978-0-8176-4279-2)
23. J.H. Davis: Methods of Applied Mathematics with a MATLAB Overview (ISBN
978-0-8176-4331-7)
24. J.J. Benedetto and A.I. Zayed: Sampling, Wavelets, and Tomography (ISBN
978-0-8176-4304-1)
25. E. Prestini: The Evolution of Applied Harmonic Analysis (ISBN 978-0-8176-
4125-2)
26. L. Brandolini, L. Colzani, A. Iosevich, and G. Travaglini: Fourier Analysis and
Convexity (ISBN 978-0-8176-3263-2)
27. W. Freeden and V. Michel: Multiscale Potential Theory (ISBN 978-0-8176-
4105-4)
28. O. Christensen and K.L. Christensen: Approximation Theory (ISBN 978-0-
8176-3600-5)
29. O. Calin and D.-C. Chang: Geometric Mechanics on Riemannian Manifolds
(ISBN 978-0-8176-4354-6)
30. J.A. Hogan: Time–Frequency and Time–Scale Methods (ISBN 978-0-8176-
4276-1)
31. Heil: Harmonic Analysis and Applications (ISBN 978-0-8176-3778-1)
32. K. Borre, D.M. Akos, N. Bertelsen, P. Rinder, and S.H. Jensen: A Software-
Defined GPS and Galileo Receiver (ISBN 978-0-8176-4390-4)
33. T. Qian, M.I. Vai, and Y. Xu: Wavelet Analysis and Applications (ISBN 978-3-
7643-7777-9)
34. G.T. Herman and A. Kuba: Advances in Discrete Tomography and Its Applica-
tions (ISBN 978-0-8176-3614-2)
35. M.C. Fu, R.A. Jarrow, J.-Y. Yen, and R.J. Elliott: Advances in Mathematical
Finance (ISBN 978-0-8176-4544-1)
36. O. Christensen: Frames and Bases (ISBN 978-0-8176-4677-6)
37. P.E.T. Jorgensen, J.D. Merrill, and J.A. Packer: Representations, Wavelets, and
Frames (ISBN 978-0-8176-4682-0)
38. M. An, A.K. Brodzik, and R. Tolimieri: Ideal Sequence Design in Time-
Frequency Space (ISBN 978-0-8176-4737-7)
39. S.G. Krantz: Explorations in Harmonic Analysis (ISBN 978-0-8176-4668-4)
40. B. Luong: Fourier Analysis on Finite Abelian Groups (ISBN 978-0-8176-4915-9)
41. G.S. Chirikjian: Stochastic Models, Information Theory, and Lie Groups,
Volume 1 (ISBN 978-0-8176-4802-2)
42. C. Cabrelli and J.L. Torrea: Recent Developments in Real and Harmonic Analysis
(ISBN 978-0-8176-4531-1)
43. M.V. Wickerhauser: Mathematics for Multimedia (ISBN 978-0-8176-4879-4)
44. B. Forster, P. Massopust, O. Christensen, K. Gröchenig, D. Labate, P. Van-
dergheynst, G. Weiss, and Y. Wiaux: Four Short Courses on Harmonic Analysis
(ISBN 978-0-8176-4890-9)
45. O. Christensen: Functions, Spaces, and Expansions (ISBN 978-0-8176-4979-
1)
46. J. Barral and S. Seuret: Recent Developments in Fractals and Related Fields
(ISBN 978-0-8176-4887-9)
47. O. Calin, D.-C. Chang, K. Furutani, and C. Iwasaki: Heat Kernels for
Elliptic and Sub-elliptic Operators (ISBN 978-0-8176-4994-4)
48. C. Heil: A Basis Theory Primer (ISBN 978-0-8176-4686-8)
49. J.R. Klauder: A Modern Approach to Functional Integration (ISBN 978-0-
8176-4790-2)
50. J. Cohen and A.I. Zayed: Wavelets and Multiscale Analysis (ISBN 978-0-8176-
8094-7)
51. D. Joyner and J.-L. Kim: Selected Unsolved Problems in Coding Theory (ISBN
978-0-8176-8255-2)
52. G.S. Chirikjian: Stochastic Models, Information Theory, and Lie Groups,
Volume 2 (ISBN 978-0-8176-4943-2)
53. J.A. Hogan and J.D. Lakey: Duration and Bandwidth Limiting (ISBN 978-0-
8176-8306-1)
54. G. Kutyniok and D. Labate: Shearlets (ISBN 978-0-8176-8315-3)
55. P.G. Casazza and P. Kutyniok: Finite Frames (ISBN 978-0-8176-8372-6)
56. V. Michel: Lectures on Constructive Approximation (ISBN 978-0-8176-8402-
0)
57. D. Mitrea, I. Mitrea, M. Mitrea, and S. Monniaux: Groupoid Metrization
Theory (ISBN 978-0-8176-8396-2)
58. T.D. Andrews, R. Balan, J.J. Benedetto, W. Czaja, and K.A. Okoudjou:
Excursions in Harmonic Analysis, Volume 1 (ISBN 978-0-8176-8375-7)
59. T.D. Andrews, R. Balan, J.J. Benedetto, W. Czaja, and K.A. Okoudjou:
Excursions in Harmonic Analysis, Volume 2 (ISBN 978-0-8176-8378-8)
60. D.V. Cruz-Uribe and A. Fiorenza: Variable Lebesgue Spaces (ISBN 978-3-
0348-0547-6)
61. W. Freeden and M. Gutting: Special Functions of Mathematical (Geo-)Physics
(ISBN 978-3-0348-0562-9)
62. A. I. Saichev and W.A. Woyczyński: Distributions in the Physical and Engi-
neering Sciences, Volume 2: Linear and Nonlinear Dynamics of Continuous
Media (ISBN 978-0-8176-3942-6)
63. S. Foucart and H. Rauhut: A Mathematical Introduction to Compressive Sensing
(ISBN 978-0-8176-4947-0)
64. G.T. Herman and J. Frank: Computational Methods for Three-Dimensional
Microscopy Reconstruction (ISBN 978-1-4614-9520-8)
65. A. Paprotny and M. Thess: Realtime Data Mining: Self-Learning Techniques for
Recommendation Engines (ISBN 978-3-319-01320-6)
66. A.I. Zayed and G. Schmeisser: New Perspectives on Approximation and Sampling
Theory: Festschrift in Honor of Paul Butzer’s 85th Birthday (ISBN 978-3-319-
08800-6)
67. R. Balan, M. Begue, J. Benedetto, W. Czaja, and K.A. Okoudjou: Excursions in
Harmonic Analysis, Volume 3 (ISBN 978-3-319-13229-7)
68. H. Boche, R. Calderbank, G. Kutyniok, and J. Vybiral: Compressed Sensing and its
Applications (ISBN 978-3-319-16041-2)
69. S. Dahlke, F. De Mari, P. Grohs, and D. Labate: Harmonic and Applied
Analysis: From Groups to Signals (ISBN 978-3-319-18862-1)
70. A. Aldroubi: New Trends in Applied Harmonic Analysis (ISBN 978-3-319-27871-
1)
71. M. Ruzhansky: Methods of Fourier Analysis and Approximation Theory (ISBN
978-3-319-27465-2)
72. G. Pfander: Sampling Theory, a Renaissance (ISBN 978-3-319-19748-7)
73. R. Balan, M. Begue, J. Benedetto, W. Czaja, and K.A. Okoudjou: Excursions in
Harmonic Analysis, Volume 4 (ISBN 978-3-319-20187-0)
74. O. Christensen: An Introduction to Frames and Riesz Bases, Second Edition
(ISBN 978-3-319-25611-5)
75. E. Prestini: The Evolution of Applied Harmonic Analysis: Models of the Real
World, Second Edition (ISBN 978-1-4899-7987-2)
76. J.H. Davis: Methods of Applied Mathematics with a Software Overview, Second
Edition (ISBN 978-3-319-43369-1)
77. M. Gilman, E. M. Smith, S. M. Tsynkov: Transionospheric Synthetic Aperture
Imaging (ISBN 978-3-319-52125-1)
78. S. Chanillo, B. Franchi, G. Lu, C. Perez, E.T. Sawyer: Harmonic Analysis,
Partial Differential Equations and Applications (ISBN 978-3-319-52741-3)
79. R. Balan, J. Benedetto, W. Czaja, M. Dellatorre, and K.A. Okoudjou: Excursions
in Harmonic Analysis, Volume 5 (ISBN 978-3-319-54710-7)
80. I. Pesenson, Q.T. Le Gia, A. Mayeli, H. Mhaskar, D.X. Zhou: Frames and Other
Bases in Abstract and Function Spaces: Novel Methods in Harmonic Analysis,
Volume 1 (ISBN 978-3-319-55549-2)
81. I. Pesenson, Q.T. Le Gia, A. Mayeli, H. Mhaskar, D.X. Zhou: Recent Applications
of Harmonic Analysis to Function Spaces, Differential Equations, and Data
Science: Novel Methods in Harmonic Analysis, Volume 2 (ISBN 978-3-319-
55555-3)
82. F. Weisz: Convergence and Summability of Fourier Transforms and Hardy
Spaces (ISBN 978-3-319-56813-3)
83. C. Heil: Metrics, Norms, Inner Products, and Operator Theory (ISBN 978-3-319-
65321-1)
84. S. Waldron: An Introduction to Finite Tight Frames: Theory and Applications
(ISBN 978-0-8176-4814-5)
85. D. Joyner and C.G. Melles: Adventures in Graph Theory: A Bridge to Advanced
Mathematics (ISBN 978-3-319-68381-2)