
Springer Undergraduate Texts

in Mathematics and Technology

Gjerrit Meinsma
Arjan van der Schaft

A Course on
Optimal Control
Springer Undergraduate Texts
in Mathematics and Technology

Series Editors
Helge Holden, Department of Mathematical Sciences, Norwegian University
of Science and Technology, Trondheim, Norway
Keri A. Kornelson, Department of Mathematics, University of Oklahoma,
Norman, OK, USA

Editorial Board
Lisa Goldberg, Department of Statistics, University of California, Berkeley,
Berkeley, CA, USA
Armin Iske, Department of Mathematics, University of Hamburg, Hamburg,
Germany
Palle E.T. Jorgensen, Department of Mathematics, University of Iowa, Iowa
City, IA, USA
Springer Undergraduate Texts in Mathematics and Technology (SUMAT)
publishes textbooks aimed primarily at the undergraduate. Each text is
designed principally for students who are considering careers either in the
mathematical sciences or in technology-based areas such as engineering,
finance, information technology and computer science, bioscience and
medicine, optimization or industry. Texts aim to be accessible introductions
to a wide range of core mathematical disciplines and their practical,
real-world applications; and are fashioned both for course use and for
independent study.
Gjerrit Meinsma Arjan van der Schaft

A Course on
Optimal Control

Gjerrit Meinsma
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
Enschede, The Netherlands

Arjan van der Schaft
Department of Mathematics, Bernoulli Institute
University of Groningen
Groningen, The Netherlands

ISSN 1867-5506 ISSN 1867-5514 (electronic)


Springer Undergraduate Texts in Mathematics and Technology
ISBN 978-3-031-36654-3 ISBN 978-3-031-36655-0 (eBook)
https://doi.org/10.1007/978-3-031-36655-0

Mathematics Subject Classification: 34Dxx, 37C75, 37N35, 49-01, 49J15, 49K15, 49L12, 49L20, 49N05, 49N10, 93D05, 93D15, 93D30

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher,
whether the whole or part of the material is concerned, specifically the rights of translation,
reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher nor
the authors or the editors give a warranty, expressed or implied, with respect to the material
contained herein or for any errors or omissions that may have been made. The publisher remains
neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This book reflects a long history of teaching optimal control for students in mathe-
matics and engineering at the universities of Twente and Groningen, the Netherlands.
In fact, the book has grown out of lecture notes that were tested, adapted, and expanded
over many years of teaching.
The present book provides a self-contained treatment of what the undersigned con-
sider to be the core topics of optimal control for finite-dimensional deterministic
dynamical systems. The style of writing aims at carefully guiding the students through the
mathematical developments, and emphasizes motivational examples, either of a mathematical nature or motivated by applications of optimal control.
Chapter 1 covers the basics of the classical calculus of variations, including
second-order conditions and integral constraints. This directly motivates the intro-
duction of the minimum principle (more commonly known as the maximum principle)
in Chapter 2. Although the presentation of the book aims at minimizing generalities, the
treatment of the minimum principle as given in Chapter 2 is self-contained, and suited
for a basic course on optimal control. Chapter 3 continues with the dynamic pro-
gramming approach to optimal control, culminating in the Hamilton-Jacobi-Bellman
equation. The connection with the minimum principle is discussed, as well as the
relation between infinite horizon optimal control and Lyapunov functions. In Chapter 4,
the theory of Chapters 2 and 3 is applied, and specialized, to linear control systems with
quadratic cost criteria (LQ-optimal control). This includes a succinct but detailed
treatment of Riccati differential equations and algebraic Riccati equations. The chapter
is concluded with a section on controller design based on LQ-optimal control.
In our experience, the material of Chapters 1–4 provides a good coverage of an
introductory, but mathematically self-contained, course on optimal control for final
year BSc (or beginning MSc) students in (applied) mathematics and beginning MSc
students in engineering. We have taught the course for such an audience for many years
as an 8-week course of 4 lecture hours and 2 tutorial hours per week (5 ECTS in the
European credit system). Of course, the contents of the course can still be adapted. For
example, in some of the editions of the course we did not cover Section 1.7 on integral
constraints, but instead we paid more attention to Lyapunov stability theory as detailed
in Appendix B.
Required background for the course is linear algebra and calculus, basic knowledge
of differential equations, and (rudimentary) acquaintance with control systems. Some
mathematical background is summarized in Appendix A for easy recollection and for
bringing students to the same mathematical level. Appendix B goes further: it not only recalls some of the basics of differential equations, but also provides a rather
detailed treatment of Lyapunov stability theory including LaSalle’s invariance principle,
as occasionally used in Chapters 3 and 4 of the book.
Chapter 5 of the book is of a different nature. It is not considered to be part of the
basic material for a course on optimal control. Instead, it provides brief outlooks to a
number of (somewhat arbitrarily chosen) topics that are related to optimal control, in
order to raise interest of students. As such Chapter 5 is written differently from Chapters
1–4. In particular the treatment of the covered topics is not always self-contained.


At the end of each of Chapters 1–4, as well as of Appendix B, there is a rich collection
of exercises, including a number of instructive examples of applications of optimal
control. Solutions to the odd-numbered exercises are provided.
Main contributors to the first versions of the lecture notes (developed since the
1990s) were Hans Zwart, Jan Willem Polderman (both University of Twente), and Henk
Nijmeijer (University of Twente, currently Eindhoven University of Technology). We
thank them for their initial contributions.
In 2006–2008 Arjan van der Schaft (University of Groningen) made a number of
substantial revisions and modifications to the then available lecture notes. In the period
2010–2018 Gjerrit Meinsma (University of Twente) rewrote much of the material, and
added more theory, examples, and illustrations. Finally, in 2021–2023 the book took its
final shape. We thank the students and teaching assistants for providing us with con-
stant feedback and encouragement over the years.
We have profitably used many books and papers in the writing of this book. Some
of these books are listed in the References at the end. In particular some of our
examples and exercises are based on those in Bryson and Ho (1975) and Seierstad and
Sydsaeter (1987). We thank Leonid Mirkin (Technion, Haifa) for Example 4.6.2.
Unavoidably, there will be remaining typos and errors in the book. Reports of them are highly welcome. A list of errata will be maintained at the
website https://people.utwente.nl/g.meinsma?tab=projects.
This website can also be reached by scanning the QR code below.

Enschede, The Netherlands        Gjerrit Meinsma
Groningen, The Netherlands       Arjan van der Schaft
May 2023
Contents

Notation and Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

1 Calculus of Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Euler-Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Beltrami Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Higher-Order Euler-Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Relaxed Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.6 Second-Order Conditions for Minimality . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.7 Integral Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2 Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1 Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2 Quick Summary of the Classic Lagrange Multiplier Method . . . . . . . . . . . 49
2.3 First-order Conditions for Unbounded and Smooth Controls . . . . . . . . . . 50
2.4 Towards the Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5 Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.6 Optimal Control with Final Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.7 Free Final Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.8 Convexity and the Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3 Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2 Principle of Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3 Discrete-Time Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4 Hamilton-Jacobi-Bellman Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Connection with the Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.6 Infinite Horizon Optimal Control and Lyapunov Functions . . . . . . . . . . . . 107
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

4 Linear Quadratic Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121


4.1 Linear Systems with Quadratic Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2 Finite Horizon LQ: Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3 Finite Horizon LQ: Dynamic Programming. . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.4 Riccati Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.5 Infinite Horizon LQ and Algebraic Riccati Equations . . . . . . . . . . . . . . . . . 137
4.6 Controller Design with LQ Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159


5 Glimpses of Related Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169


5.1 H∞ Theory and Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.2 Dissipative Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.3 Invariant Lagrangian Subspaces and Riccati . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.4 Model Predictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A Background Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
A.1 Positive Definite Functions and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
A.2 A Notation for Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.3 Separation of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
A.4 Linear Constant-Coefficient DE’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
A.5 Systems of Linear Time-Invariant DE’s. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
A.6 Stabilizability and Detectability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.7 Convex Sets and Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
A.8 Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

B Differential Equations and Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . 205


B.1 Existence and Uniqueness of Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
B.2 Definitions of Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
B.3 Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
B.4 LaSalle’s Invariance Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
B.5 Cost-to-Go Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
B.6 Lyapunov’s First Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
B.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

Solutions to Odd-Numbered Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Notation and Conventions

While most of the notation used in this book is standard, there are a few conventions we
would like to emphasize.
Notation for vectors and functions of time. We frequently switch from functions

x : R → Rn

to vectors

x ∈ Rn

and back to functions, and this can be confusing upon first reading. To highlight the difference we typeset functions of time usually in an upright math font, e.g., x, instead of the standard italic math font, e.g., x. This convention is used in differential equations

    d/dt x(t) = a x(t),   x(0) = x₀,

and solutions of them, e.g., x(t) = e^{at} x₀. But it is used mainly to avoid possibly ambiguous expressions. For instance, whenever we use V(x) we mean that V is a function of x ∈ Rn and not of the whole time function x : R → Rn. We still use the italic math font for functions of time if they only play a minor role, such as a(t) in

    d/dt x(t) = a(t) x(t) + u(t).

In equations like these, a : R → R is typically just a given function.


Notation for derivatives. The material in this book requires partial derivatives, total
derivatives, and derivatives with respect to vectors. There is no universally accepted
notation for this. In this book, we use the following notation.
For functions of a single real variable, g : R → Rk, we denote its derivative at t ∈ R as

    ġ(t)   or   g′(t)   or   d/dt g(t)   or   dg(t)/dt.

Now if, for example, F : R³ → R and x : R → R, then

    d/dt ∂/∂ẋ F(t, x(t), ẋ(t))

means the total derivative with respect to t of the partial derivative ∂/∂v F(t, x, v) evaluated at (t, x, v) = (t, x(t), ẋ(t)). For instance, if F(t, x, v) = t v³ then

    d/dt ∂/∂ẋ F(t, x(t), ẋ(t)) = d/dt ( ∂(t ẋ³(t))/∂ẋ ) = d(3t ẋ²(t))/dt = 3 ẋ²(t) + 6t ẋ(t) ẍ(t).
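For readers who want to double-check such manipulations with a computer algebra system, here is a minimal sketch of the worked example above in Python/sympy (the use of sympy is our illustrative assumption; the book itself does not rely on it):

```python
from sympy import Function, symbols, diff

t, v = symbols('t v')
x = Function('x')

F = t * v**3                                # F(t, x, v) = t*v^3 (x does not appear)

# First the partial derivative with respect to v, evaluated at v = xdot(t), ...
partial = diff(F, v).subs(v, x(t).diff(t))  # 3*t*xdot(t)**2
# ... then the total derivative with respect to t.
print(diff(partial, t))                     # 3*xdot(t)**2 + 6*t*xdot(t)*xddot(t)
```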
Notation for differentiation with respect to a column or a row vector. For functions
f : Rn → R we think of its gradient at some x ∈ Rn as a column vector, and we denote it as ∂f(x)/∂x, so

    ∂f(x)/∂x = ( ∂f(x)/∂x₁, ∂f(x)/∂x₂, …, ∂f(x)/∂xₙ )^T ∈ Rn.

In fact, throughout we think of Rn as the linear space of n-dimensional column vectors. The above is a derivative with respect to a column vector x ∈ Rn, and the outcome (the gradient) is then a column vector as well. By the same logic, if we differentiate f with respect to a row vector x^T ∈ R^{1×n} (mind the transpose) then we mean the gradient seen as a row vector,

    ∂f(x)/∂x^T = ( ∂f(x)/∂x₁  ∂f(x)/∂x₂  …  ∂f(x)/∂xₙ ) ∈ R^{1×n}.

The previous two conventions are also combined: if F : R × Rn × Rn → R and x : R → Rn, then

    d/dt ∂/∂ẋ F(t, x(t), ẋ(t))

means the total derivative with respect to t of the column vector with n entries ∂/∂ẋ F(t, x(t), ẋ(t)). Then d/dt ∂/∂ẋ F(t, x(t), ẋ(t)) is a column vector with n entries as well.
Occasionally we also need second-order derivatives with respect to vectors x, such as
Hessian matrices. Their notation is discussed in Appendix A.2.
Chapter 1

Calculus of Variations

1.1 Introduction

Optimal control theory is deeply rooted in the classical mathematical subject


referred to as the calculus of variations; the name of which seems to go back to
the famous mathematician Leonhard Euler (1707–1783). Calculus of variations
deals with minimization of expressions of the form
    ∫_0^T F(t, x(t), ẋ(t)) dt

over all functions

x : [0, T ] → Rn .

Here F : R × Rn × Rn → R is some given function. Recall that ẋ (t ) denotes the


derivative of x with respect to its argument, i.e., ẋ(t) = dx(t)/dt. In contrast to
basic optimization—where we optimize over a finite number of variables—we
minimize in principle over all (sufficiently smooth) functions x : [0, T ] → Rn .
Many fruitful applications of the calculus of variations have been developed in
physics, in particular, in connection with Hamilton’s principle of least action.
Also in other sciences such as economics, biology, and chemistry, the calculus
of variations has led to many useful applications.
We start with some motivating examples. The first example is the cel-
ebrated brachistochrone problem. This problem was introduced in 1696 by
Johann Bernoulli and it was one of the first problems of this type1 . When
Bernoulli formulated the problem it immediately attracted a lot of attention and
several mathematicians submitted solutions to this problem, including Leibniz,
Newton, l’Hôpital, and Johann Bernoulli’s older brother Jakob.
1 Newton’s minimal resistance problem can also be seen as a calculus of variations problem,

and it predates the brachistochrone problem by almost 10 years.

Figure 1.1: Four paths from A to B. Which is fastest? See Example 1.1.1.

Figure 1.2: In this case a positive y means a negative altitude. See Example 1.1.1.

Figure 1.3: ds = √(1 + ẏ²(x)) dx. Here ẏ(x) := dy(x)/dx.

Example 1.1.1 (Brachistochrone). A present-day formulation² of the brachistochrone problem is as follows. Consider two points A = (x₀, y₀) and B = (x₁, y₁)
in R2 . The problem is to find a path (a function) y : [x 0 , x 1 ] → R through A and B
such that a point mass released with zero speed at A and that glides along this
path without friction reaches B in minimal time. It is assumed that the gravi-
tational acceleration g is constant. Figure 1.1 depicts four possible such paths
and, actually, one of them is the optimal solution. Can you guess which one?
This is a hard problem, and we will solve this problem step by step in a series of
examples (Examples 1.2.4 and 1.3.1). First we set it up as a calculus of variations
problem. It is convenient to take the vertical displacement y to increase when
going down (i.e., y points downwards, in the same direction as the gravitational
force, see Fig. 1.2). Also, without loss of generality we take (x 0 , y 0 ) = (x 0 , 0). That
is, we release the point mass at zero altitude. As the mass moves friction free we
have that kinetic plus potential energy is constant, i.e.,

    ½ m v² − m g y = c
for some constant c. Here v is the speed of the mass. We release the mass at
zero altitude and with zero speed, so c = 0. Hence the speed v follows uniquely
from y as

    v = √(2g y).

By the Pythagorean theorem, an infinitesimal horizontal displacement dx corresponds to a displacement along the curve y(x) of ds := √(1 + ẏ²(x)) dx, see
Fig. 1.3. The amount of time that this takes is

    dt = ds/v = √( (1 + ẏ²(x)) / (2g y(x)) ) dx.

This way the time T needed to travel from (x₀, 0) to (x₁, y₁) can be seen as an integral over x,

    T = ∫_0^T 1 dt = ∫_{x₀}^{x₁} √( (1 + ẏ²(x)) / (2g y(x)) ) dx.        (1.1)

Thus the brachistochrone problem is to minimize the integral (1.1) over all func-
tions y : [x 0 , x 1 ] → R subject to y (x 0 ) = y 0 = 0 and y (x 1 ) = y 1 . 
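To get a feel for the cost functional (1.1), the sketch below evaluates the travel time numerically for two candidate paths from A = (0, 0) to B = (1, 1) (with y pointing downwards as in Fig. 1.2): the straight line and a parabola that drops faster initially. The specific endpoint and paths are our own illustrative assumptions; the true optimum, a cycloid, is derived in Example 1.3.1.

```python
import numpy as np
from scipy.integrate import quad

g = 9.81  # gravitational acceleration

def travel_time(y, dy, x0, x1, eps=1e-12):
    """Numerically evaluate the travel-time integral (1.1) for a path y(x) with derivative dy(x)."""
    integrand = lambda x: np.sqrt((1 + dy(x)**2) / (2 * g * max(y(x), eps)))
    T, _ = quad(integrand, x0, x1, limit=200)
    return T

# Candidate 1: straight line y(x) = x from (0, 0) to (1, 1).
# Candidate 2: parabola y(x) = 2x - x^2 through the same endpoints.
print("straight line:", travel_time(lambda x: x, lambda x: 1.0, 0.0, 1.0))
print("parabola     :", travel_time(lambda x: 2*x - x**2, lambda x: 2 - 2*x, 0.0, 1.0))
```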

Example 1.1.2 (Optimal economic investment). This example is based on an


example from Seierstad and Sydsaeter (1987). We consider a simple model of
2 For more on the history of the brachistochrone problem and subsequent developments see

H.J. Sussmann and J.C. Willems. 300 years of optimal control: from the brachistochrone to the
maximum principle. IEEE Control Systems Magazine, 17:32–44, 1997.
Figure 1.4: Left: national product y is concave in capital stock x. Right: utility functions u are strictly concave in consumption c. See Example 1.1.2.

an economy of some country. We distinguish its capital stock x (t ) in, say, euros,
which is a measure of the physical capital in the country at time t . We also need
the net national product y (t ) in euros per unit time, which is the value of all
that is produced at time t per unit time in the country. The derivative ẋ (t ) of
capital stock with respect to t is the increase in physical capital, and it is called
investment. Therefore, what is left for consumption (euros per unit time) at time
t is the difference between national product and investment,

c (t ) := y (t ) − ẋ (t ).

This is called consumption. It is assumed that the national product y follows


from capital stock x, so at all times t

y (t ) = φ( x (t )) (1.2)

for some function φ (which we assume to be twice continuously differentiable).


It is a standard assumption that φ is strictly increasing and concave, that is,
φ′(x) > 0 and φ″(x) ≤ 0 for all x > 0, see Fig. 1.4(a). This captures the not unrea-
sonable assumption that the national product increases with increasing capital
stock, but that the rate of this increase reduces as x gets larger.
Suppose the economy at t = 0 has a certain initial capital stock

x (0) = x0 . (1.3)

Then, given an arbitrary investment function ẋ (t ), all variables in our model are
determined since
    x(t) = x₀ + ∫_0^t ẋ(τ) dτ.        (1.4)

The question now is: what is a good investment function ẋ (t )? A way to answer
this question is as follows. Suppose we have a utility function u(c) that mod-
els the enjoyment of consuming c. Standard assumptions on utility functions
are that they are strictly increasing, strictly concave, and twice continuously
differentiable, so u′(c) > 0, u″(c) < 0 for all c > 0, see Fig. 1.4(b). This is just to
say that additional enjoyment of additional consumption flattens at high levels
of consumption.
An investment function ẋ(t) is now considered optimal if it maximizes the integrated utility ∫_0^T u(c(t)) e^{−αt} dt, that is, if it maximizes

    ∫_0^T u(c(t)) e^{−αt} dt = ∫_0^T u( φ(x(t)) − ẋ(t) ) e^{−αt} dt        (1.5)

over all investment functions ẋ (t ) or, equivalently, over all functions x (t ) sat-
isfying (1.3). The term e−αt is a so-called discount factor (and α is a discount
rate, assumed positive). This is included to express that the importance of the
future utility u( c (t )) is considered to be declining with t further in the future.
The optimization problem is of the same type as before apart from the fact that
we are maximizing instead of minimizing. Clearly, maximizing the integrated
utility (1.5) is equivalent to minimizing its negation
    ∫_0^T −u(c(t)) e^{−αt} dt = ∫_0^T −u( φ(x(t)) − ẋ(t) ) e^{−αt} dt.

The end time of the planning period is denoted as T , and we will assume in
addition that

x (T ) = xT (1.6)

for some given desired capital stock x T . This type of model for optimal eco-
nomic growth was initiated by F.P. Ramsey in 1928. 

Example 1.1.3 (Cheese production). A cheesemaker is to deliver an amount of


x T kilos of cheese at a delivery time T . The cheesemaker wants to find a pro-
duction schedule for completing the order with minimal costs. Let x (t ) denote
the amount of cheese at time t . We assume that both producing and storing
cheese is costly. The total cost might be modeled as
    ∫_0^T α ẋ²(t) + β x(t) dt,        (1.7)

where β x (t ) models the storage cost per unit time and α ẋ 2 (t ) models the pro-
duction cost per unit time. The constants α, β are positive numbers. The objec-
tive of the cheesemaker is to determine a production profile x (t ) that minimizes
the above cost, subject to the conditions

x (0) = 0, x (T ) = xT , ẋ (t ) ≥ 0. (1.8)

Example 1.1.4 (Shortest path). What is the shortest path between two points
(x 0 , y 0 ) and (x 1 , y 1 ) in R2 ? Of course we know the answer but let us anyway for-
mulate this problem in more detail.

Clearly the path is characterized by a function y : [x₀, x₁] → R. As explained in Example 1.1.1, the length ds of an infinitesimal part of the path follows from an infinitesimal part dx as ds = √(1 + ẏ²(x)) dx, see Fig. 1.3. So the total length of the path is

    ∫_{x₀}^{x₁} √(1 + ẏ²(x)) dx.        (1.9)

This has to be minimized subject to

y (x0 ) = y 0 , y (x1 ) = y 1 . (1.10)

Note that this problem is different from the brachistochrone problem. 

With the exception of the final example, the optimal solution—if one exists
at all—is not easy to find.

1.2 Euler-Lagrange Equation

The examples given in the preceding section are instances of what is called the
simplest problem in the calculus of variations:

Definition 1.2.1 (Simplest problem in the calculus of variations). Given a final


time T > 0 and a function F : [0, T ] × Rn × Rn → R, and x 0 , x T ∈ Rn , the sim-
plest problem in the calculus of variations is to minimize the cost J defined as
    J(x) = ∫_0^T F(t, x(t), ẋ(t)) dt        (1.11)

over all functions x : [0, T ] → Rn that satisfy the boundary conditions

x (0) = x0 , x (T ) = xT . (1.12)

The function J is called the cost (function) or cost criterion, and the inte-
grand F of this cost is called the running cost or the Lagrangian. For n = 1
the problem is visualized in Fig. 1.5: given the two points (0, x 0 ) and (T, x T )
each smooth function x that connects the two points determines a cost J ( x )
as defined in (1.11), and the problem is to find the function x that minimizes
this cost.
The calculus of variations problem can be regarded as an infinite-
dimensional version of the basic optimization problem of finding a z ∗ ∈ Rn
that minimizes a function K : Rn → R. The difference is that the function K
is replaced by an integral expression J , while vectors z ∈ Rn are replaced by
functions x : [0, T ] → Rn .
Figure 1.5: Two functions x, x̃ : [0, T] → R that satisfy the boundary conditions (1.12).

Mathematically, Definition 1.2.1 is not complete. We have to be more precise


about the class of functions x over which we want to minimize the cost (1.11).
A minimal requirement is that x is differentiable. Also, optimization prob-
lems usually require some degree of smoothness on the cost function, and this
imposes further restrictions on x as well as on F . Most of the times we assume
that F (t , x, ẋ) and x (t ) are either once or twice continuously differentiable in all
their arguments. This is abbreviated to C 1 (for once continuously differentiable)
and C 2 (for twice continuously differentiable).
We next derive a differential equation that every solution to the simplest
problem in the calculus of variations must satisfy. This differential equation is
the generalization of the well-known first-order condition in basic optimization
that the gradient vector ∂K∂z (z ∗ )
must be equal to zero for every z ∗ ∈ Rn that min-
imizes a differentiable function K : Rn → R.

Figure 1.6: A function x∗(t) and a possible perturbed function x∗(t) + αδx(t). At t = 0 and t = T the perturbation αδx(t) is zero. See the proof of Theorem 1.2.2.

Theorem 1.2.2 (Euler-Lagrange equation—necessary first-order condition for


optimality). Suppose that F is C 1 . Necessary for a C 1 function x ∗ to mini-
mize (1.11) subject to (1.12) is that it satisfies the differential equation
    ( ∂/∂x − d/dt ∂/∂ẋ ) F(t, x∗(t), ẋ∗(t)) = 0   for all t ∈ [0, T].        (1.13)

(Recall page ix for an explanation of the notation.)

Proof. Suppose x ∗ is a C 1 solution to the simplest problem in the calculus of


variations, and let δx : [0, T ] → Rn be an arbitrary C 1 function on [0, T ] that van-
ishes at the boundaries,

δx (0) = δx (T ) = 0. (1.14)

We use it to form a variation of the optimal solution

x (t ) = x ∗ (t ) + αδx (t ),

in which α ∈ R. Notice that this x for every α ∈ R satisfies the boundary condi-
tions x (0) = x ∗ (0) = x 0 and x (T ) = x ∗ (T ) = x T , see Fig. 1.6. Since x ∗ is a mini-
mizing solution for our problem we have that

J ( x ∗ ) ≤ J ( x ∗ + αδx ) for all α ∈ R. (1.15)

For every fixed function δx the cost J ( x ∗ + αδx ) is a function of the scalar vari-
able α,

J¯(α) := J ( x ∗ + αδx ), α ∈ R.

The minimality condition (1.15) thus implies that J̄(0) ≤ J̄(α) for all α ∈ R. Given that x∗, δx and F are all assumed C¹, it follows that J̄(α) is differentiable as a function of α, and so the above implies that J̄′(0) = 0. This derivative is³

    J̄′(0) = d/dα ∫_0^T F(t, x∗(t) + αδx(t), ẋ∗(t) + αδ̇x(t)) dt |_{α=0}
          = ∫_0^T [ ∂F(t, x∗(t), ẋ∗(t))/∂x^T δx(t) + ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δ̇x(t) ] dt.        (1.16)

In the rest of the proof we assume that F and x ∗ and δx are C 2 . (The case when
they are only C 1 is slightly more involved; this is covered in Exercise 1.7.) Inte-
gration by parts of the second term in (1.16) yields⁴

    ∫_0^T ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δ̇x(t) dt
        = [ ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δx(t) ]_0^T − ∫_0^T d/dt ( ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T ) δx(t) dt.        (1.17)

 
³ Leibniz' integral rule says that d/dα ∫ G(α, t) dt = ∫ ∂G(α, t)/∂α dt if G(α, t) and ∂G(α, t)/∂α are continuous in t and α. Here they are continuous because F and δx are assumed C¹.
⁴ The integration by parts rule holds if ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T and δx(t) are C¹ with respect to time. This holds if F, x∗, δx are C² in all their arguments.

Plugging (1.17) into (1.16) and using that J̄′(0) = 0 we find that

    0 = [ ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δx(t) ]_0^T + ∫_0^T [ ( ∂/∂x − d/dt ∂/∂ẋ ) F(t, x∗(t), ẋ∗(t)) ]^T δx(t) dt.        (1.18)

The first term on the right-hand side is actually zero because of the boundary conditions (1.14). Hence we have

    0 = ∫_0^T [ ( ∂/∂x − d/dt ∂/∂ẋ ) F(t, x∗(t), ẋ∗(t)) ]^T δx(t) dt.        (1.19)
So far the perturbation δx in our derivation was some fixed function. However
since δx can be arbitrarily chosen, the equality (1.19) must hold for every C 2
perturbation δx that satisfies (1.14). But this implies, via the result presented
next (Lemma 1.2.3), that the term in between the square brackets in (1.19) is
zero for all t ∈ [0, T ], i.e., that (1.13) holds. ■

Figure 1.7: The function δx(t) defined in (1.21).

Lemma 1.2.3 (Fundamental lemma (or Lagrange’s lemma)). A continuous


function φ : [0, T ] → Rn has the property that
    ∫_0^T φ^T(t) δx(t) dt = 0        (1.20)

for every C 2 function δx : [0, T ] → Rn satisfying (1.14) iff φ(t ) = 0 for all t ∈ [0, T ].

Proof. We prove it for n = 1. Figure 1.7 explains it all: suppose that φ is not the
zero function, i.e., that φ(t̄ ) is nonzero for some t̄ ∈ [0, T ]. For example, φ(t̄ ) > 0.
Then, by continuity, φ(t ) is positive on some interval [a, b] around t̄ (with 0 ≤
a < b ≤ T ). In order to provide a formal proof consider the function δx defined
as

((t − a)(b − t ))3 t ∈ [a, b],
δx (t ) = (1.21)
0 elsewhere,

see Figure 1.7. Clearly this δx fulfills the requirements of (1.14), but it vio-
lates (1.20) because both φ and δx are positive on [a, b], and hence the integral
in (1.20) is positive as well. A similar argument works for φ(t̄ ) < 0. The assump-
tion that φ(t̄) ≠ 0 at some t̄ ∈ [0, T] hence is wrong. ■

Theorem 1.2.2 was derived independently by Euler and Lagrange, and in


honor of its inventors Equation (1.13) is nowadays called the Euler-Lagrange
equation (or the Euler equation).
We want to stress that the Euler-Lagrange equation is only a necessary con-
dition for optimality. All it guarantees is that a “small” perturbation of x ∗ results
in a “very small” change in cost. To put it more mathematically, solutions x ∗ of
the Euler-Lagrange equation are precisely those functions for which for every
allowable function δx and α ∈ R we have

J ( x ∗ + αδx ) = J ( x ∗ ) + o(α),

with o some little-o function5 . Such solutions x ∗ are referred to as stationary


solutions. They might be minimizing J ( x ), or maximizing J ( x ), or neither.
Interestingly, the Euler-Lagrange equation does not depend on the initial or
final values x 0 , x T . More on this in § 1.5.

Example 1.2.4 (Brachistochrone; Example 1.1.1 continued). The Euler-


Lagrange equation for the brachistochrone problem, see (1.1), reads

    ( ∂/∂y − d/dx ∂/∂ẏ ) √( (1 + ẏ²(x)) / (2g y(x)) ) = 0,        (1.22)
with the boundary conditions y (x 0 ) = y 0 and y (x 1 ) = y 1 . One may expand (1.22)
but in this form the problem is still rather complicated, defying an explicit
solution. In the following section, we use a more sophisticated approach. 

Example 1.2.5 (Shortest path; Example 1.1.4 continued). The Euler-Lagrange


equation for the shortest path problem described by (1.9) and (1.10) is

    0 = ( ∂/∂y − d/dx ∂/∂ẏ ) √(1 + ẏ²(x)),        (1.23)


with boundary conditions y(x₀) = y₀ and y(x₁) = y₁. Since ∂/∂y √(1 + ẏ²(x)) is zero, we obtain from (1.23) that

    0 = d/dx ∂/∂ẏ √(1 + ẏ²(x)) = d/dx ( ẏ(x) / √(1 + ẏ²(x)) ) = ÿ(x) (1 + ẏ²(x))^{−3/2}.        (1.24)

Clearly, the solution of (1.24) is given by the differential equation

ÿ (x) = 0,
which is another way of saying that y (x) is a straight line. In light of the bound-
ary conditions y (x 0 ) = y 0 and y (x 1 ) = y 1 , it has the unique solution
    y∗(x) = y₀ + ((y₁ − y₀)/(x₁ − x₀)) (x − x₀).

⁵ A little-o function o : Rm → Rk is any function with the property that lim_{y→0} o(y)/y = 0.

This solution is not surprising. It is of course the solution, although formally


we may not yet draw this conclusion because the theory presented so far only
claims that solutions of (1.24) are stationary solutions, not necessarily optimal
solutions. Optimality is proved later (Example 1.6.8). 
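As a quick sanity check, a computer algebra system reproduces (1.23)-(1.24) symbolically. A minimal sketch using sympy's euler_equations helper (an assumption of this sketch, not a tool used in the book):

```python
from sympy import Function, symbols, sqrt, simplify
from sympy.calculus.euler import euler_equations

x = symbols('x')
y = Function('y')

# Running cost of the shortest-path problem: F(y, y') = sqrt(1 + y'(x)^2).
F = sqrt(1 + y(x).diff(x)**2)

# Euler-Lagrange equation (1.23); sympy returns a list of Eq objects.
(el,) = euler_equations(F, [y(x)], [x])
print(simplify(el))   # equivalent to y''(x) = 0, i.e., straight lines
```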

Example 1.2.6 (Economic investment; Example 1.1.2 continued). For the


problem of Example 1.1.2 the Euler-Lagrange equation (1.13) takes the form

    ( ∂/∂x − d/dt ∂/∂ẋ ) [ u( φ(x(t)) − ẋ(t) ) e^{−αt} ] = 0,

which is the same as

    u′( φ(x(t)) − ẋ(t) ) φ′(x(t)) e^{−αt} − d/dt [ −u′( φ(x(t)) − ẋ(t) ) e^{−αt} ] = 0,        (1.25)

where u′ and φ′ denote the usual derivatives of functions of one variable. Taking the time derivative in (1.25) yields

    u′( φ(x(t)) − ẋ(t) ) φ′(x(t)) e^{−αt} + u″( φ(x(t)) − ẋ(t) ) ( φ′(x(t)) ẋ(t) − ẍ(t) ) e^{−αt} − α u′( φ(x(t)) − ẋ(t) ) e^{−αt} = 0.

Dividing by e^{−αt} (and omitting the time argument) we obtain

    u′(φ(x) − ẋ) φ′(x) + u″(φ(x) − ẋ)( φ′(x) ẋ − ẍ ) − α u′(φ(x) − ẋ) = 0.

This, together with the boundary conditions (1.3) and (1.6), has to be solved
for the unknown function x (t ), or—see also (1.4)—for the unknown investment
function ẋ (t ). This can be done once the utility function u(c) and the consump-
tion function φ(x) are specified. 
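For concreteness, the Euler-Lagrange equation can be generated symbolically once u and φ are chosen. The sketch below uses the illustrative choices u(c) = ln c and φ(x) = √x (assumptions of ours, not of the book):

```python
from sympy import Function, symbols, log, sqrt, exp, simplify
from sympy.calculus.euler import euler_equations

t, alpha = symbols('t alpha', positive=True)
x = Function('x')

# Hypothetical model data: utility u(c) = log(c), production phi(x) = sqrt(x).
c = sqrt(x(t)) - x(t).diff(t)          # consumption c = phi(x) - xdot
F = -log(c) * exp(-alpha * t)          # running cost to be minimized

# Euler-Lagrange equation (1.13) for this running cost.
(el,) = euler_equations(F, [x(t)], [t])
print(simplify(el))
```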

Example 1.2.7 (Cheese production; Example 1.1.3 continued). Corresponding


to the criterion to be minimized, (1.7), we find the Euler-Lagrange equation

    0 = ( ∂/∂x − d/dt ∂/∂ẋ ) ( α ẋ²(t) + β x(t) ) = β − d/dt ( 2α ẋ(t) ) = β − 2α ẍ(t).

So ẍ(t) = β/(2α), that is,

    x(t) = β/(4α) t² + ẋ₀ t + x₀.        (1.26)
The constants x 0 and ẋ 0 follow from the boundary conditions x (0) = 0 and
x (T ) = xT , i.e., x0 = 0 and ẋ0 = xT /T − βT /(4α). Of course, it still remains to be
seen whether the x (t ) defined in (1.26) is indeed minimizing (1.7). Notice that
the extra constraint, ẋ (t ) ≥ 0, from (1.8) puts a further restriction on the total
amount of x T and the final time T . 
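The stationary schedule (1.26) is easy to inspect numerically. By (1.26), ẋ(0) = ẋ₀ = x_T/T − βT/(4α) and ẋ is increasing, so the constraint ẋ(t) ≥ 0 from (1.8) holds precisely when x_T ≥ βT²/(4α). The numbers below are illustrative assumptions:

```python
import numpy as np

# Illustrative data: production weight alpha, storage weight beta, horizon T, order size xT.
alpha, beta, T, xT = 2.0, 1.0, 4.0, 10.0

xdot0 = xT / T - beta * T / (4 * alpha)                  # from x(T) = xT in (1.26)
x    = lambda t: beta / (4 * alpha) * t**2 + xdot0 * t   # schedule (1.26), x(0) = 0
xdot = lambda t: beta / (2 * alpha) * t + xdot0

t = np.linspace(0.0, T, 5)
print("x(t)    =", np.round(x(t), 3))            # ends at xT
print("xdot(t) =", np.round(xdot(t), 3))         # nonnegative iff xT >= beta*T^2/(4*alpha)
```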

All examples so far considered scalar-valued functions x , but the theory


holds for general vector-valued functions. Here is an example.

Example 1.2.8 (Two-dimensional problem). Consider minimization of the


integral

    J(x₁, x₂) := ∫_0^{π/2} ẋ₁²(t) + ẋ₂²(t) − 2 x₁(t) x₂(t) dt

over all functions x₁, x₂ : [0, π/2] → R subject to the boundary conditions

    x₁(0) = 0,  x₂(0) = 0,  x₁(π/2) = 1,  x₂(π/2) = 1.

Since the minimization is over a vector x = (x₁, x₂)^T with two components, the Euler-Lagrange equation is given by a two-dimensional system of differential equations

    ( −2x₂(t), −2x₁(t) )^T − d/dt ( 2ẋ₁(t), 2ẋ₂(t) )^T = ( 0, 0 )^T,

that is, ẍ₁(t) = −x₂(t) and ẍ₂(t) = −x₁(t). This yields fourth-order differential equations for each of the components, d⁴x₁(t)/dt⁴ = x₁(t) and d⁴x₂(t)/dt⁴ = x₂(t).

These are linear differential equations with constant coefficients, and they can
be solved with standard methods (see Appendix A.4). The general solution is

    x₁(t) = a e^t + b e^{−t} + c cos(t) + d sin(t),
    x₂(t) = −ẍ₁(t) = −a e^t − b e^{−t} + c cos(t) + d sin(t),

with a, b, c, d ∈ R. The given boundary conditions are satisfied iff a = b = c = 0


and d = 1, that is,

x ∗1 (t ) = x ∗2 (t ) = sin(t ).
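A short symbolic check of this example (a sympy sketch):

```python
from sympy import Function, symbols, sin, pi, simplify
from sympy.calculus.euler import euler_equations

t = symbols('t')
x1, x2 = Function('x1'), Function('x2')

# Running cost of Example 1.2.8.
F = x1(t).diff(t)**2 + x2(t).diff(t)**2 - 2*x1(t)*x2(t)

# Coupled Euler-Lagrange equations: xddot1 = -x2 and xddot2 = -x1.
print(euler_equations(F, [x1(t), x2(t)], [t]))

# Verify that x1 = x2 = sin(t) satisfies them and the boundary conditions.
cand = sin(t)
print(simplify(cand.diff(t, 2) + cand))          # 0
print(cand.subs(t, 0), cand.subs(t, pi/2))       # 0, 1
```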

1.3 Beltrami Identity

In many applications, the running cost F (t , x, ẋ) does not depend on t and thus
has the form

F (x, ẋ).
Obviously the partial derivative ∂F(x, ẋ)/∂t is zero now. An interesting consequence is that then

    F(x(t), ẋ(t)) − ẋ^T(t) ∂F(x(t), ẋ(t))/∂ẋ

is constant over time for every solution x of the Euler-Lagrange equation. To see
this, we differentiate the above expression with respect to time (and for ease of
notation we momentarily write x (t ) simply as x ):

    d/dt [ F(x, ẋ) − ẋ^T ∂F(x, ẋ)/∂ẋ ]
        = d/dt F(x, ẋ) − d/dt [ ẋ^T ∂F(x, ẋ)/∂ẋ ]
        = ẋ^T ∂F(x, ẋ)/∂x + ẍ^T ∂F(x, ẋ)/∂ẋ − ( ẍ^T ∂F(x, ẋ)/∂ẋ + ẋ^T d/dt ∂F(x, ẋ)/∂ẋ )
        = ẋ^T ( ∂F(x, ẋ)/∂x − d/dt ∂F(x, ẋ)/∂ẋ ).        (1.27)

This is zero for every solution x of the Euler-Lagrange equation. Hence every
stationary solution x ∗ has the property that

    F(x∗(t), ẋ∗(t)) − ẋ∗^T(t) ∂F(x∗(t), ẋ∗(t))/∂ẋ = C   for all t ∈ [0, T]
for some integration constant C . This identity is known as the Beltrami iden-
tity. We illustrate the usefulness of this identity by explicitly solving the brachis-
tochrone problem. It is good to realize, though, that the Beltrami identity is
not equivalent to the Euler-Lagrange equation. Indeed, every constant function
x (t ) satisfies the Beltrami identity. The Beltrami identity and the Euler-Lagrange
equation are equivalent for scalar functions x : [0, T ] → R if ẋ (t ) is nonzero for
almost all t , as can be seen from (1.27).
Figure 1.8: Top: shown in red is the cycloid x(φ) = (c²/2)(φ − sin(φ)), y(φ) = (c²/2)(1 − cos(φ)) for φ ∈ [0, 2π]. It is the curve that a point on a rolling disk of radius c²/2 traces out. Bottom: a downwards facing cycloid (solution of the brachistochrone problem). See Example 1.3.1.
Figure 1.9: Cycloids (1.29) for various c > 0. Given a B to the right and below A = (0, 0) there is a unique cycloid that joins A and B. See Example 1.3.1.

Example 1.3.1 (Brachistochrone; Example 1.1.1 continued). The running cost


F (x, y, ẏ) of the brachistochrone problem is

1 + ẏ 2
F (y, ẏ) =  .
2g y

It does not depend on x, so Beltrami applies which says that the solution of the
brachistochrone problem makes the following function constant (as a function
of x):

    F(y(x), ẏ(x)) − ẏ(x) ∂F(y(x), ẏ(x))/∂ẏ = √( (1 + ẏ²(x)) / (2g y(x)) ) − ẏ²(x) / √( 2g y(x)(1 + ẏ²(x)) )
                                            = 1 / √( 2g y(x)(1 + ẏ²(x)) ).

Denote this constant as C . Squaring and inverting both sides gives

y (x)(1 + ẏ 2 (x)) = c 2 , (1.28)

where c² = 1/(2gC²). This equation can be solved parametrically by⁶

    x(φ) = (c²/2)(φ − sin(φ)),   y(φ) = (c²/2)(1 − cos(φ)).        (1.29)
The curve ( x (φ), y (φ)) is known as the cycloid. It is the curve that a fixed point
on the boundary of a wheel with radius c 2 /2 traces out while rolling with-
6 Quick derivation: since the cotangent cos(φ/2)/ sin(φ/2) for φ ∈ [0, 2π] ranges over all

real numbers once (including ±∞) it follows that any dy/dx can uniquely be written
as dy/dx = cos(φ/2)/ sin(φ/2) with φ ∈ [0, 2π]. Then (1.28) implies that y (φ) = c 2 /(1 +
cos2 (φ/2)/ sin2 (φ/2)) = c 2 sin2 (φ/2) = c 2 (1 − cos(φ))/2 and then dx/dφ = (dy/dφ)/(dy/dx) =
[c 2 sin(φ/2) cos(φ/2)]/[cos(φ/2)/ sin(φ/2)] = c 2 sin2 (φ/2) = c 2 (1 − cos(φ))/2. Integrating this
expression shows that x (φ) = c 2 (φ − sin(φ))/2 + d where d is some integration constant. This d
equals zero because (x, y) :=(0, 0) is on the curve. (See Exercise 1.4 for more details.)

out slipping on a horizontal line (think of the valve on your bike’s wheel), see
Fig. 1.8. For the cycloid, the Beltrami identity and the Euler-Lagrange equation
are equivalent because ẏ (x) is nonzero almost everywhere. Hence all sufficiently
smooth stationary solutions of the brachistochrone problem are precisely these
cycloids.
Varying c in (1.29) generates a family of cycloids, see Fig. 1.9. Given a desti-
nation point B to the right and below A = (0, 0) there is a unique cycloid that
connects A and B , and the solution of the brachistochrone problem is that
segment of the cycloid. Notice that for certain final destinations B the curve
extends below the final destination! 
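Numerically, the cycloid through a given B is easy to compute, and its travel time can be compared with that of the straight line. Along the cycloid (1.29) one finds dt = √(c²/(2g)) dφ, and the straight line from A = (0, 0) to B = (x₁, y₁) takes time √(2(x₁² + y₁²)/(g y₁)); both follow from (1.1) and (1.29) by short computations not spelled out above. The sketch below uses the illustrative endpoint B = (1, 1):

```python
import numpy as np
from scipy.optimize import brentq

g = 9.81
x1, y1 = 1.0, 1.0          # target point B (y measured downwards, as in Fig. 1.2)

# Cycloid parameter phi1 at B, from y1/x1 = (1 - cos(phi))/(phi - sin(phi)).
f = lambda phi: (1 - np.cos(phi)) / (phi - np.sin(phi)) - y1 / x1
phi1 = brentq(f, 1e-6, 2*np.pi - 1e-6)
c2 = 2 * y1 / (1 - np.cos(phi1))                 # c^2 in (1.29)

T_cycloid = phi1 * np.sqrt(c2 / (2 * g))         # time along the cycloid
T_line = np.sqrt(2 * (x1**2 + y1**2) / (g * y1)) # time along the straight line
print(f"cycloid: {T_cycloid:.4f} s,  straight line: {T_line:.4f} s")
```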

Figure 1.10: Given a nonnegative function r : [−1, 1] → [0, ∞) and its surface of revolution, the infinitesimal dx-strip of this surface has area 2π r(x) √(1 + ṙ²(x)) dx. See Example 1.3.2.

Example 1.3.2 (Minimal surface). This is an elaborate example. We want to


determine a nonnegative radius r : [−1, 1] → [0, ∞) for which the surface of rev-
olution about the x-axis,
{(x, y, z) | x ∈ [−1, 1], y 2 + z 2 = r 2 (x)},
has minimal area, see Fig. 1.10. We assume that the radii at the endpoints are
the same and equal to a given ρ > 0,
r (−1) = r (+1) = ρ.
The area of the surface of revolution over an infinitesimal dx-strip at x equals
2π r (x) 1 + ṙ 2 (x) dx (see Fig. 1.10) and therefore the total area J ( r ) of the sur-
face of revolution is
1 
J (r) = 2π r (x) 1 + ṙ 2 (x) dx.
−1

Figure 1.11: (a) The endpoint radius r_a(±1) := a cosh(1/a) of the catenoid as a function of a. Its minimal value is ρ∗ = 1.509 (attained at a∗ = 0.834); (b) the area of the catenoid as a function of endpoint radius ρ; (c) the area of the catenoid (in red) and of the Goldschmidt solution (in yellow) as a function of endpoint radius ρ. The two areas are the same at ρ_G = 1.895. This ρ_G corresponds to a_G = 1.564 (see part (a) of this figure). See Example 1.3.2.

Beltrami applies and it gives us that

    2π r(x) √(1 + ṙ²(x)) − ṙ(x) · 2π r(x) ṙ(x)/√(1 + ṙ²(x)) = C

for some constant C. Multiplying left and right by the nonzero √(1 + ṙ²(x))/(2π) turns this into

    r(x)(1 + ṙ²(x)) − r(x) ṙ²(x) = (C/2π) √(1 + ṙ²(x)),

that is,

    r(x) = (C/2π) √(1 + ṙ²(x)).
Since the radius r (x) is nonnegative we have that C ≥ 0, and thus a :=C /(2π) is
nonnegative as well. Squaring left- and right-hand side we end up with

r 2 (x) = a 2 (1 + ṙ 2 (x)). (1.30)

The nonnegative even solutions of this differential equation are7

r a (x) := a cosh(x/a), a ≥ 0. (1.31)

Figure 1.10 shows an example of such a hyperbolic cosine. The two-dimensional


surface of revolution of a hyperbolic cosine is called catenoid. From the shape
of hyperbolic cosines, it will be clear that for every a > 0 the derivative ṙ (x) is
nonzero almost everywhere, and so the Beltrami identity and Euler-Lagrange
equation are equivalent.
But are such hyperbolic cosines optimal solutions? Not necessarily, and Fig-
ure 1.11(a) confirms this. It depicts the endpoint radius ρ of the hyperbolic
cosine solution

r a (±1) = a cosh(1/a)

as a function of a (notice the flipped axes in Figure 1.11(a)). The figure demon-
strates that the endpoint radius has a minimum, and the minimum is ρ ∗ =
1.509, and it is attained at a ∗ = 0.834. So if we choose an endpoint radius ρ less
than ρ ∗ then none of these hyperbolic cosines r a is the solution to our prob-
lem! Also, if ρ > ρ ∗ then apparently there are two hyperbolic cosines that meet
the endpoint condition, r a (±1) = ρ, and at most one of them is the optimal
solution. It can be shown that the area of the catenoid is

    J(r_a) = 2πa² ( 1/a + sinh(1/a) cosh(1/a) ).

7 This hyperbolic cosine solution can be derived using separation of variables (see

Appendix A.3). However, there is a technicality in this derivation that is often overlooked, see
Exercise 1.6, but we need not worry about that now.

It is interesting to plot this against r a (±1) = a cosh(1/a). This is done in


Fig. 1.11(b). The blue curve is for a < a ∗ , and the red curve is for a > a ∗ . The
plot reveals that for a given r a (±1) > ρ ∗ the area of the catenoid is the smallest
for the largest of the two a’s. Thus we need to only consider a ≥ a ∗ .
Now the case that ρ < ρ ∗ . Then no hyperbolic cosine meets the endpoint
condition. What does it mean? It means that no smooth function r (x) exists
that is stationary and satisfies r (±1) < ρ ∗ . A deeper analysis shows that the
only other stationary surface of revolution is the so-called Goldschmidt solu-
tion, see Fig. 1.12. The Goldschmidt solution consists of the two disks with
radius ρ at respective centers (x, y, z) = (±1, 0, 0), and the line of radius zero,
{(x, y, z) | x ∈ (−1, 1), y = z = 0}, that connects the two disks. The area of the
Goldschmidt solution is the sum of the areas of the two disks at the endpoints,
2 × πρ 2 . (The line does not contribute to the area.) This set can not be written
as the surface of revolution of a graph (x, r (x)) of a function r , thus it is not a
surprise that it does not show up in our analysis.
It can be shown that a global optimal solution exists, and since it must be
stationary it is either the Goldschmidt solution or the catenoid for an appropri-
ate a ≥ a ∗ . If ρ < ρ ∗ then clearly the Goldschmidt solution is the only stationary
solution, hence is optimal. For the other case, ρ > ρ ∗ , something odd occurs:
Fig. 1.11(c) gives us the area of the surface of revolution of the Goldschmidt
solution as well as that of the catenoid. We see that there is an endpoint radius,
ρ G = 1.895, at which the Goldschmidt and catenoid solutions have the same
area. This point is attained at a G = 1.564. For ρ > ρ G the catenoid (for the corre-
sponding a > a G ) has the smallest area, hence is optimal, but for ρ < ρ G it is the
Goldschmidt solution that is globally optimal. The conclusion is that the opti-
mal shape depends discontinuously on the endpoint radius ρ! 
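The numerical values quoted in this example (a∗, ρ∗, a_G, ρ_G) are easy to reproduce from the formulas above; a sketch using scipy:

```python
import numpy as np
from scipy.optimize import minimize_scalar, brentq

rho = lambda a: a * np.cosh(1 / a)                          # endpoint radius r_a(+-1)
area_catenoid = lambda a: 2*np.pi*a**2 * (1/a + np.sinh(1/a)*np.cosh(1/a))
area_goldschmidt = lambda a: 2 * np.pi * rho(a)**2          # two disks of radius rho(a)

# Smallest endpoint radius reachable by a catenoid: minimize rho(a) over a > 0.
res = minimize_scalar(rho, bounds=(0.1, 5.0), method='bounded')
a_star, rho_star = res.x, res.fun

# Radius at which the catenoid and Goldschmidt areas coincide (search above a_star).
a_G = brentq(lambda a: area_catenoid(a) - area_goldschmidt(a), a_star, 10.0)

print(f"a* = {a_star:.3f},  rho* = {rho_star:.3f}")         # 0.834, 1.509
print(f"a_G = {a_G:.3f},  rho_G = {rho(a_G):.3f}")          # 1.564, 1.895
```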

Figure 1.12: The Goldschmidt solution is the union of disks around the
two endpoints, combined with a line that connects the centers of the two
disks. See Example 1.3.2.

Example 1.3.3 (Lagrangian mechanics). Consider the one-dimensional motion


of a mass m attached to a linear spring with spring constant k, see Fig. 1.13.
Denote the extension of the spring caused by the mass by q ∈ R. Remarkably

Figure 1.13: A mass m attached to a linear spring with spring constant k. See Example 1.3.3.

enough, the dynamics of the mass is given by the Euler-Lagrange equation cor-
responding to

    F(q, q̇) := ½ m q̇² − ½ k q²,

that is, the difference of the kinetic energy ½ m q̇² of the mass and the potential energy ½ k q² of the spring. Indeed, the Euler-Lagrange equation corresponding to this F(q, q̇) is

    0 = ( ∂/∂q − d/dt ∂/∂q̇ ) ( ½ m q̇²(t) − ½ k q²(t) ) = −k q(t) − d/dt ( m q̇(t) ) = −k q(t) − m q̈(t),

which can be recognized as Newton’s law (mass times acceleration, m q̈ (t ),


equals the force impressed on the mass by the spring, −k q (t )). Hence according
to Beltrami the quantity

    q̇(t) ∂F(q(t), q̇(t))/∂q̇ − F(q(t), q̇(t)) = m q̇²(t) − ( ½ m q̇²(t) − ½ k q²(t) )
                                             = ½ m q̇²(t) + ½ k q²(t)

is constant over time. This quantity is nothing else than the total energy, that is,
kinetic plus potential energy. Thus the Beltrami identity is in this case the well-
known conservation of energy of a mechanical system with conservative forces
(in this case the spring force).
In general, in classical mechanics the difference of the kinetic and potential
energy F ( q (t ), q̇ (t )) is referred to as the Lagrangian, while the integral
    ∫_0^T F(q(t), q̇(t)) dt

is referred to as the action integral. The stationary property of the action integral
is known as Hamilton’s principle; see, e.g., Lanczos (1986) for the close connec-
tion between the calculus of variations and classical mechanics. 
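A small symbolic check of this example, with m and k kept symbolic (a sympy sketch):

```python
from sympy import Function, symbols, Rational, simplify, dsolve
from sympy.calculus.euler import euler_equations

t, m, k = symbols('t m k', positive=True)
q = Function('q')

# Lagrangian: kinetic minus potential energy.
F = Rational(1, 2)*m*q(t).diff(t)**2 - Rational(1, 2)*k*q(t)**2

# Euler-Lagrange equation: Newton's law  -k*q - m*q'' = 0.
(el,) = euler_equations(F, [q(t)], [t])
print(el)

# The Beltrami quantity (total energy) is constant along solutions of the EL equation.
sol = dsolve(el, q(t)).rhs                       # general solution C1*sin + C2*cos
energy = Rational(1, 2)*m*sol.diff(t)**2 + Rational(1, 2)*k*sol**2
print(simplify(energy.diff(t)))                  # 0
```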

1.4 Higher-Order Euler-Lagrange Equation

The Euler-Lagrange equation can directly be extended to the case that the inte-
gral J ( x ) depends on higher-order derivatives of x . Let us state explicitly the
second-order case.

Proposition 1.4.1 (Higher-order Euler-Lagrange equation). Consider the prob-


lem of minimizing
    J(x) := ∫_0^T F(t, x(t), ẋ(t), ẍ(t)) dt        (1.32)

over all C² functions x : [0, T] → Rn that satisfy the boundary conditions

    x(0) = x₀,    x(T) = x_T,
    ẋ(0) = x₀^d,  ẋ(T) = x_T^d,        (1.33)
for given x 0 , x 0d , x T , x Td ∈ Rn . Suppose F is C 2 . A necessary condition that a C 2
function x ∗ minimizes (1.32) and satisfies (1.33) is that x ∗ is a solution of the
differential equation
    ( ∂/∂x − d/dt ∂/∂ẋ + d²/dt² ∂/∂ẍ ) F(t, x∗(t), ẋ∗(t), ẍ∗(t)) = 0   for all t ∈ [0, T].        (1.34)

Proof. We prove it for the case that F and x ∗ are C 3 . (If they are only C 2
then one can use the lemma of du Bois-Reymond as explained for the stan-
dard problem in Exercise 1.7.) Define J¯(α) = J ( x ∗ + αδx ) where δx : [0, T ] → Rn
is a C 3 perturbation that satisfies the boundary conditions δx (0) = δx (T ) = 0
and δ̇x(0) = δ̇x(T) = 0. Then, as before, the derivative J̄′(0) is zero. Analogously
to (1.16) we compute J̄′(0). For ease of exposition we momentarily omit all time
arguments in x ∗ (t ) and δx (t ) and, sometimes, F :

d T
0 = J¯ (0) = F (t , x ∗ + αδx , ẋ ∗ + αδ̇x , ẍ ∗ + αδ̈x ) dt
dα 0 α=0
T
∂F ∂F ∂F
= δ + T δ̇x + T δ̈x dt .
T x
(1.35)
0 ∂x ∂ẋ ∂ẍ
Integration by parts of the second term of the integrand yields
    ∫_0^T ∂F/∂ẋ^T δ̇x dt = [ ∂F/∂ẋ^T δx ]_0^T − ∫_0^T d/dt ( ∂F/∂ẋ^T ) δx dt = −∫_0^T d/dt ( ∂F/∂ẋ^T ) δx dt.

The last equality follows from the boundary condition that δx (0) = δx (T ) = 0.
Integration by parts of the third term in (1.35) similarly gives
    ∫_0^T ∂F/∂ẍ^T δ̈x dt = [ ∂F/∂ẍ^T δ̇x ]_0^T − ∫_0^T d/dt ( ∂F/∂ẍ^T ) δ̇x dt = −∫_0^T d/dt ( ∂F/∂ẍ^T ) δ̇x dt,        (1.36)

where now the second equality is the result of the boundary conditions that
δ̇x (0) = δ̇x (T ) = 0. In fact, we can apply integration by parts again on the final
term of (1.36) to obtain
    ∫_0^T ∂F/∂ẍ^T δ̈x dt = −∫_0^T d/dt ( ∂F/∂ẍ^T ) δ̇x dt = −[ d/dt ( ∂F/∂ẍ^T ) δx ]_0^T + ∫_0^T d²/dt² ( ∂F/∂ẍ^T ) δx dt,

where the boundary term is again zero because δx(0) = δx(T) = 0. Thus (1.35) equals

    0 = J̄′(0) = ∫_0^T [ ∂F/∂x^T − d/dt ∂F/∂ẋ^T + d²/dt² ∂F/∂ẍ^T ] δx dt.
As before, Lemma 1.2.3 now yields (1.34). ■

Figure 1.14: Elastic bar. See Example 1.4.2.

Example 1.4.2 (Elastic bar). Consider an elastic bar clamped at its two ends,
see Fig. 1.14. The bar bends under the influence of gravity. The horizontal and
vertical positions we denote by x and y, respectively. The shape of the bar is
modeled with the function y (x). We assume the bar has a uniform cross section
(independent of x). If the curvature of the elastic bar is not too large then the
potential energy due to elastic forces can be considered, up to first order, to be
proportional to the square of the second derivative,

    V₁ := (k/2) ∫_0^ℓ ( d²y(x)/dx² )² dx,

where k is a constant depending on the elasticity of the bar and ℓ is the length of the bar. Furthermore, the
potential energy due to gravity is given by

    V₂ := g ∫_0^ℓ ρ(x) y(x) dx.
Here, ρ(x) is the mass density of the bar at x, and, again, we assume that the
curvature is small. The total potential energy thus is

    ∫_0^ℓ [ (k/2) ( d²y(x)/dx² )² + g ρ(x) y(x) ] dx.
The minimal potential energy solution satisfies the Euler-Lagrange equa-
tion (1.34), and this gives the fourth-order differential equation
d4 y (x)
k = −g ρ(x) ∀x ∈ [0, ].
dx 4

If ρ(x) is constant then y (x) is a polynomial of degree 4. Figure 1.14 depicts
a solution for constant ρ and boundary conditions y (0) = y (ℓ) = 0 and ẏ (0) =
ẏ (ℓ) = 0. In this case, the solution is y (x) = − (gρ/(4! k)) x²(x − ℓ)².  □
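The computation above is easy to verify with a computer algebra system. The following minimal sketch (our own, assuming Python with SymPy is available; the symbol names k, g, rho, ell are chosen here) checks that the stated polynomial satisfies k d⁴y/dx⁴ = −gρ and the four clamped-end boundary conditions.

    # Sketch (our own): symbolic check of the constant-density elastic bar solution.
    import sympy as sp

    x, k, g, rho, ell = sp.symbols('x k g rho ell', positive=True)
    y = -g*rho/(sp.factorial(4)*k) * x**2 * (x - ell)**2   # candidate minimizer

    print(sp.simplify(k*sp.diff(y, x, 4) + g*rho))          # 0, so k y'''' = -g*rho
    print([sp.simplify(e) for e in (y.subs(x, 0), y.subs(x, ell),
                                    sp.diff(y, x).subs(x, 0), sp.diff(y, x).subs(x, ell))])
    # [0, 0, 0, 0]: the boundary conditions y(0) = y(l) = 0 and y'(0) = y'(l) = 0 hold.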

1.5 Relaxed Boundary Conditions

In the problems considered so far, the initial x (0) and final x (T ) were fixed. A
useful extension is obtained by removing some of these conditions. This means
that we allow more functions x to optimize over, and, consequently, we expect
that the Euler-Lagrange equation still holds for the optimal solution. To get an
idea we first look at an example.
Suppose x has three components and that the first component of x (0) and
the last component of x (T ) are free to choose:
x (0) = ( free, fixed, fixed )ᵀ ,   x (T ) = ( fixed, fixed, free )ᵀ .                 (1.37)
In the proof of Theorem 1.2.2 we found the following necessary first-order con-
dition for optimality (Eqn. (1.18)):

[ ∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋᵀ δx (t ) ]₀ᵀ + ∫₀ᵀ ( ( ∂/∂x − d/dt ∂/∂ẋ ) F (t , x ∗ (t ), ẋ ∗ (t )) )ᵀ δx (t ) dt = 0.   (1.38)

This equality needs to hold for every possible perturbation δx . In particular, it


needs to hold for every perturbation δx that is zero at t = 0 and t = T . For these
special perturbations, the first-order condition (1.38) reduces to that of the stan-
dard problem, i.e., that
∫₀ᵀ ( ( ∂/∂x − d/dt ∂/∂ẋ ) F (t , x ∗ (t ), ẋ ∗ (t )) )ᵀ δx (t ) dt = 0
for all such special δx . It proves that also for relaxed boundary conditions the
Euler-Lagrange equation holds (as was to be expected). Knowing this, the first-
order condition (1.38) simplifies to
[ ∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋᵀ δx (t ) ]₀ᵀ = 0.                                      (1.39)

When is this equal to zero for every allowable perturbation? Since the perturbed
x ∗ (t ) + αδx (t ) for our example must obey the boundary condition (1.37) it fol-
lows that the allowable perturbations are exactly those that satisfy
δx (0) = ( free, 0, 0 )ᵀ ,   δx (T ) = ( 0, 0, free )ᵀ .

Clearly, the first-order condition (1.39) holds for all such δx iff

∂F (0, x (0), ẋ (0))/∂ẋ = ( 0, free, free )ᵀ ,   ∂F (T, x (T ), ẋ (T ))/∂ẋ = ( free, free, 0 )ᵀ .

This example demonstrates that to every initial or final entry of x that is free to
choose there corresponds a condition on the derivative of F with respect to that
component of ẋ . Incidentally, by allowing functions x with free entries at initial
and/or final time, it can now make sense to include an initial- and/or final cost
to the cost function:
J ( x ) = ∫₀ᵀ F (t , x (t ), ẋ (t )) dt + G( x (0)) + K ( x (T )).                       (1.40)

Here G( x (0)) denotes an initial cost, and K ( x (T )) a final cost (also known as ter-
minal cost). The addition of these two costs does not complicate matters much,
as detailed in the next proposition.

Proposition 1.5.1 (Relaxed boundary conditions). Let T > 0, and suppose F :


[0, T ] × Rn × Rn → R is C 1 , and that K ,G : Rn → R are C 1 . Let I0 , IT be subsets of
{1, . . . , n}, and consider the functions x : [0, T ] → Rn whose initial x (0) and final
x (T ) are fixed except for the components
x i (0) = free ∀i ∈ I0 and x j (T ) = free ∀ j ∈ IT .
Among these functions, a C 1 function x ∗ is a stationary solution of the
cost (1.40) iff it satisfies the Euler-Lagrange equation (1.13) together with
∂F (0, x ∗ (0), ẋ ∗ (0))/∂ẋ_i − ∂G( x ∗ (0))/∂x_i = 0   ∀i ∈ I0 ,                       (1.41)
∂F (T, x ∗ (T ), ẋ ∗ (T ))/∂ẋ_j + ∂K ( x ∗ (T ))/∂x_j = 0   ∀ j ∈ IT .                   (1.42)

Proof. See Exercise 1.10. ■

This general result is needed in the next chapter when we tackle the optimal
control problem. A common special case is the free endpoint problem, which is
when x (0) is completely fixed and x (T ) is completely free. In the terminology
of Proposition 1.5.1 this means I0 = ∅ and IT = {1, . . . , n}. In this case Proposi-
tion 1.5.1 simplifies as follows.

Corollary 1.5.2 (Free endpoint). Let T > 0, x 0 ∈ Rn , and suppose both F : [0, T ] ×
Rn × Rn → R and K : Rn → R are C 1 . Necessary for a C 1 function x ∗ : [0, T ] → Rn
to minimize
J (x) = ∫₀ᵀ F (t , x (t ), ẋ (t )) dt + K ( x (T ))

over all functions with x (0) = x 0 is that it satisfies the Euler-Lagrange equa-
tion (1.13) together with the free endpoint boundary condition
∂F (T, x ∗ (T ), ẋ ∗ (T ))/∂ẋ + ∂K ( x ∗ (T ))/∂x = 0 ∈ Rn .                             (1.43)


Example 1.5.3 (Quadratic cost with fixed and free endpoint). Let α ∈ R, and
consider minimization of

∫₋₁¹ ( α²x²(t ) + ẋ²(t ) ) dt                                                          (1.44)

over all functions x : [−1, 1] → R. First we solve the standard problem, so where
both x (0) and x (T ) are fixed. For instance, assume that

x (−1) = 1, x (1) = 1. (1.45)

The running cost α2 x 2 (t ) + ẋ 2 (t ) is a sum of two squares, so with minimization


we would like both terms small. But one depends on the other. The parameter
α models a trade-off between small ẋ 2 (t ) and small x 2 (t ). Whatever α is, the
optimal solution x needs to satisfy the Euler-Lagrange equation,
0 = ( ∂/∂x − d/dt ∂/∂ẋ ) ( α²x²(t ) + ẋ²(t ) ) = 2α²x (t ) − d/dt (2 ẋ (t )) = 2α²x (t ) − 2 ẍ (t ).
Therefore

ẍ (t ) = α2 x (t ).
This differential equation can be solved using characteristic equations (do this
yourself, see Appendix A.4), and the general solution is

x (t ) = c eαt +d e−αt (1.46)

with c, d two arbitrary constants. The two constants follow from the two bound-
ary conditions (1.45):

1 = x (−1) = c e^{−α} + d e^{α} ,
1 = x (1) = c e^{α} + d e^{−α} .

The solution is c = d = 1/(e^{α} + e^{−α}). That c equals d is expected because of the
symmetry of the boundary conditions. We see that there is exactly one func-
tion x that satisfies the Euler-Lagrange equation and that meets the boundary
conditions, namely x ∗ (t ) = ( e^{αt} + e^{−αt} ) / ( e^{α} + e^{−α} ).

For α = 0 the solution is a constant, x ∗ (t ) = 1, which, in hindsight, is not a


surprise because for α = 0 the running cost is just F (t , x (t ), ẋ (t )) = ẋ 2 (t ) and
then clearly a zero derivative (a constant x (t )) is optimal. For large values of
α, on the other hand, the term x 2 (t ) is penalized strongly in the running cost,
ẋ 2 (t ) + α2 x 2 (t ), so then it pays to take x (t ) close to zero, even if that is at the
expense of some increase of ẋ 2 (t ). Indeed this is what happens.
Consider next the free endpoint problem with

x (−1) = 1 but where x (1) is free.

We stick to the same cost function (1.44). In the terminology of (1.40) this means
we take the initial and final costs equal to zero, G(x) = K (x) = 0. Hence ∂K ( x (T ))/∂x =
0, and the free endpoint boundary condition (1.43) thus becomes

0 = ∂F (T, x (T ), ẋ (T ))/∂ẋ + ∂K ( x (T ))/∂x = ∂( α²x²(1) + ẋ²(1) )/∂ẋ + 0 = 2 ẋ (1).
The parameters c, d in (1.46) now follow from the initial condition x (−1) = 1 and
the above boundary condition 0 = ẋ (1):

1 = x (−1) = c e^{−α} + d e^{α} ,
0 = ẋ (1) = cα e^{α} − dα e^{−α} .

The solution is

c = e^{−α}/(e^{2α} + e^{−2α}) ,   d = e^{α}/(e^{2α} + e^{−2α}) ,
(check it for yourself ). We see that also in this case the first-order conditions
together with the boundary condition have a unique solution,

x ∗ (t ) = ( e^{α(t−1)} + e^{−α(t−1)} ) / ( e^{2α} + e^{−2α} ).

The free endpoint condition is that the derivative of x is zero at the final time.
Again we see that the solution approaches zero fast if α is large.  □
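Both cases of this example are easy to double-check numerically. The sketch below (our own, assuming NumPy; the value α = 2 is an arbitrary choice) solves the two 2 × 2 linear systems for (c, d) derived above and confirms the free-endpoint condition ẋ(1) = 0.

    # Sketch (our own): numerical check of Example 1.5.3 for alpha = 2.
    import numpy as np

    alpha = 2.0
    e = np.exp

    # Fixed endpoints x(-1) = x(1) = 1:
    A = np.array([[e(-alpha), e(alpha)], [e(alpha), e(-alpha)]])
    c_fix, d_fix = np.linalg.solve(A, [1.0, 1.0])
    print(c_fix, d_fix, 1/(e(alpha) + e(-alpha)))     # c = d = 1/(e^a + e^-a)

    # Free endpoint x(-1) = 1, xdot(1) = 0:
    B = np.array([[e(-alpha), e(alpha)], [alpha*e(alpha), -alpha*e(-alpha)]])
    c_fr, d_fr = np.linalg.solve(B, [1.0, 0.0])
    xdot1 = alpha*(c_fr*e(alpha) - d_fr*e(-alpha))
    print(c_fr, d_fr, xdot1)                          # xdot(1) is (numerically) zero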

1.6 Second-Order Conditions for Minimality

The Euler-Lagrange equation was derived from the condition that minimizing
solutions x ∗ are necessarily stationary solutions, i.e., solutions for which

J ( x ∗ + αδx ) = J ( x ∗ ) + o(α)

for every fixed admissible perturbation function δx and all scalars α. But not all
stationary solutions are minimizing solutions. To be minimizing the above term
“o(α)” needs to be nonnegative in a neighborhood of α = 0. In this section we
analyze this problem. We derive a necessary condition and a sufficient condition
for stationary solutions to be minimizing. These conditions are second-order
conditions and they require a second-order Taylor series expansion of F (t , x, y)
for fixed t around (x, y) ∈ Rn × Rn :
F (t , x + δx , y + δ y ) = F (t , x, y) + ∂F (t , x, y)/∂xᵀ δx + ∂F (t , x, y)/∂yᵀ δ y
    + (1/2) [ δxᵀ  δ yᵀ ]  [ ∂²F (t , x, y)/∂x∂xᵀ   ∂²F (t , x, y)/∂x∂yᵀ ]  [ δx ]
                           [ ∂²F (t , x, y)/∂y∂xᵀ   ∂²F (t , x, y)/∂y∂yᵀ ]  [ δ y ]
    + o( ‖ (δx , δ y ) ‖² ),                                                          (1.47)

where the 2n × 2n matrix in the quadratic term is the Hessian of F .

(The role of the transpose is explained on page x. More details about this nota-
tion can be found in Appendix A.2.) We assume that F (t , x, y) is C 2 so the above
Taylor series is valid, and the 2n × 2n Hessian of F exists and is symmetric.

Theorem 1.6.1 (Legendre condition—second-order necessary condition).


Consider the simplest problem in the calculus of variations, and suppose that F
is C 2 . Let x ∗ be a C 2 solution of the Euler-Lagrange equation (1.13) and which
satisfies the boundary conditions (1.12). Necessary for x ∗ to be minimizing is
that
∂²F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ∂ẋᵀ ≥ 0   ∀t ∈ [0, T ].                                  (1.48)
Proof. For ease of notation we prove it for the case that x has one component.
Similar to the proof of Theorem 1.2.2, let δx be a C 2 -perturbation on [0, T ] that
satisfies the boundary condition (1.14). Let α ∈ R and define J¯(α) as

J¯(α) := J ( x ∗ + αδx ).

By construction we have that every solution x ∗ of the Euler-Lagrange equation
achieves J̄′(0) = 0. For simplicity of notation we omit time arguments in what
follows. With the help of (1.47) we find that J̄″(0) equals the integral over [0, T ]
of the quadratic form defined by the Hessian of F at (t , x ∗ , ẋ ∗ ) acting on (δx , δ̇x ),
that is,

J̄″(0) = ∫₀ᵀ ( ∂²F/∂x² δx² + 2 ∂²F/∂x∂ẋ δx δ̇x + ∂²F/∂ẋ² δ̇x² ) dt .                     (1.49)

If x ∗ is optimal then this has to be nonnegative for every allowable δx . This
does not necessarily mean that the Hessian is positive semi-definite because
δx and δ̇x are related. Indeed, using integration by parts, the cross term can be
rewritten as

∫₀ᵀ 2 (∂²F/∂x∂ẋ) δx δ̇x dt = ∫₀ᵀ (∂²F/∂x∂ẋ) d/dt (δx²) dt
                          = [ (∂²F/∂x∂ẋ) δx² ]₀ᵀ − ∫₀ᵀ ( d/dt ∂²F/∂x∂ẋ ) δx² dt
                          = − ∫₀ᵀ ( d/dt ∂²F/∂x∂ẋ ) δx² dt ,

where the boundary term vanishes because δx (0) = δx (T ) = 0. Therefore

J̄″(0) = ∫₀ᵀ ( ( ∂²F/∂x² − d/dt ∂²F/∂x∂ẋ ) δx² + (∂²F/∂ẋ²) δ̇x² ) dt .                   (1.50)

If x ∗ is optimal then J̄″(0) ≥ 0 for every allowable perturbation δx . Lemma 1.6.2
(presented next) applied to (1.50) shows that this implies that ∂²F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ²
is nonnegative for all time, i.e., that (1.48) holds. ■

The above proof uses the following lemma.

Lemma 1.6.2 (Technical lemma). Let φ and ψ be continuous functions from
[0, T ] to R, and suppose that

∫₀ᵀ ( φ(t ) δx²(t ) + ψ(t ) δ̇x²(t ) ) dt ≥ 0                                           (1.51)

for every C² function δx : [0, T ] → R with δx (0) = δx (T ) = 0. Then

ψ(t ) ≥ 0   ∀t ∈ [0, T ].

Proof. Suppose, on the contrary, that ψ(t̄ ) < 0 for some t̄ ∈ [0, T ]. Then for every
ε > 0 we can construct a possibly small interval [a, b] about t̄ in [0, T ] and a C²
function δx on [0, T ] that is zero for t ∉ [a, b] and that satisfies

∫ₐᵇ δx²(t ) dt < ε   and   ∫ₐᵇ δ̇x²(t ) dt > 1.

This may be clear from Figure 1.15. Such a δx satisfies all the conditions of the
lemma but renders the integral in (1.51) negative for small enough ε > 0. That is
a contradiction, and so the assumption that ψ(t̄ ) < 0 is wrong. ■

FIGURE 1.15: About the construction of a δx (t ) that violates (1.51). See the
proof of Lemma 1.6.2.
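The construction sketched in Figure 1.15 can be made concrete numerically. The following sketch (our own choice of bump and of all numbers, assuming NumPy) squeezes a C² bump of fixed height into ever smaller intervals [a, b] inside [0, 1] and shows that ∫ δx²(t) dt shrinks while ∫ δ̇x²(t) dt grows without bound, exactly as the proof requires.

    # Sketch (our own): a C^2 bump with small int(dx^2) and large int(dxdot^2).
    import numpy as np

    def bump_integrals(width, n=20001):
        a, b = 0.5 - width/2, 0.5 + width/2          # small interval inside [0, 1]
        t = np.linspace(a, b, n)
        s = (t - a)/(b - a)
        dx  = np.sin(np.pi*s)**4                     # zero (with two derivatives) outside [a, b]
        ddx = 4*np.pi/(b - a)*np.sin(np.pi*s)**3*np.cos(np.pi*s)
        return np.trapz(dx**2, t), np.trapz(ddx**2, t)

    for width in (0.5, 0.1, 0.02):
        print(width, bump_integrals(width))
    # int(dx^2) decreases proportionally to the width; int(dxdot^2) blows up like 1/width.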

This second-order condition (1.48) is known as the Legendre condition.
Notice that the inequality (1.48) means that ∂²F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ∂ẋᵀ (which is
an n × n matrix if x has n components) is a symmetric positive semi-definite
matrix at every moment in time.

Example 1.6.3 (Example 1.1.3 continued). The running cost of Example 1.1.3 is

F (t , x, ẋ) = αẋ² + βx,

and so the second derivative with respect to ẋ is ∂²F (t , x, ẋ)/∂ẋ² = 2α. It is given
that α > 0, hence the Legendre condition,

∂²F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ² ≥ 0   ∀t ∈ [0, T ],

trivially holds for the solution x ∗ of the Euler-Lagrange equation.  □

Example 1.6.4 (Example 1.5.3 continued). The running cost of Example 1.5.3 is
F (t , x, ẋ) = α2 x 2 + ẋ 2 . Therefore ∂2 F (t , x (t ), ẋ (t ))/∂ẋ 2 = 2 ≥ 0 for all functions x
and all t . This holds in particular for x ∗ , so the Legendre condition holds. 

Example 1.6.5 (Optimal investment, Example 1.1.2 continued). The running
cost F for the optimal investment application of Example 1.1.2 is

F (t , x, ẋ) = −u( φ(x) − ẋ ) e^{−αt} .

This is derived from (1.5), but we added a minus sign because the application is
about maximization, not minimization. Now

∂²F (t , x, ẋ)/∂ẋ² = −u″( φ(x) − ẋ ) e^{−αt} ,

and this is nonnegative for every t , x, ẋ since the utility function u is assumed
to be concave, i.e., u″(c) ≤ 0 for all c > 0. So, apart from the standard economic
interpretation that utility functions are concave, this assumption is also crucial
for the optimization problem to have a solution.  □

In the preceding examples, the Legendre condition was easy to verify


because the second derivative of F with respect to ẋ turned out to be trivially
nonnegative for all x, ẋ and all time, and not just for the optimal x ∗ (t ), ẋ ∗ (t ).
The Euler-Lagrange condition together with the Legendre condition is nec-
essary but is still not sufficient for minimality. This is illustrated by the next
example.

Example 1.6.6 (Stationary solution, but not a minimizer). The Euler-Lagrange


equation for the minimization of

∫₀¹ ( ( ẋ (t )/(2π) )² − x²(t ) ) dt                                                   (1.52)

is the differential equation (2π)²x (t ) + ẍ (t ) = 0. Assuming the boundary condi-


tions

x (0) = x (1) = 0,

it is easy to see that the stationary solutions are

x ∗ (t ) = A sin(2πt ), A ∈ R.

Each such solution x ∗ satisfies the Legendre condition (1.48) since

∂²F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ² = 2/(2π)² > 0.
Also, each such x ∗ renders the integral in (1.52) equal to zero. There are how-
ever many other functions x that satisfy x (0) = x (1) = 0 but for which the inte-
gral (1.52) takes a negative value. For example x (t ) = −t 2 + t . By scaling this last
function with a constant we can make the cost as negative as we desire. Thus in
this example there is no optimal solution x ∗ . 
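This blow-up is easy to see numerically. A minimal sketch (our own, assuming NumPy; the values of c and the grid are arbitrary choices) evaluates the cost (1.52) for x (t ) = c(−t² + t) and increasing c:

    # Sketch (our own): cost (1.52) for x(t) = c(-t^2 + t) grows unboundedly negative.
    import numpy as np

    t = np.linspace(0.0, 1.0, 100001)
    for c in (1.0, 10.0, 100.0):
        x, xdot = c*(-t**2 + t), c*(1 - 2*t)
        print(c, np.trapz((xdot/(2*np.pi))**2 - x**2, t))
    # The cost behaves like -0.025*c^2, so no minimizer exists, even though
    # x*(t) = A sin(2*pi*t) is stationary and satisfies the Legendre condition.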

A closer look at the proof of Theorem 1.6.1 actually provides us with an ele-
gant sufficient condition for optimality, in fact for global optimality. If the Hes-
sian of F , defined earlier as
H (t , x, y) :=  [ ∂²F (t , x, y)/∂x∂xᵀ   ∂²F (t , x, y)/∂x∂yᵀ ]
                 [ ∂²F (t , x, y)/∂y∂xᵀ   ∂²F (t , x, y)/∂y∂yᵀ ] ,                      (1.53)

for each t is positive semi-definite for all x ∈ Rn and all y ∈ Rn , then at each t the
running cost F (t , x, ẋ) is convex in x, ẋ (see Appendix A.7). For convex functions
it is known that stationarity implies global optimality:

Theorem 1.6.7 (Convexity—global optimal solutions). Consider the simplest


problem in the calculus of variations, and suppose that F is C 2 . If the Hes-
sian (1.53) is positive semi-definite8 for all x, y ∈ Rn and all t ∈ [0, T ] then every
C 1 solution x ∗ of the Euler-Lagrange equation that meets the boundary condi-
tions is a global optimal solution.
If the Hessian is positive definite for all x, y ∈ Rn and all t ∈ [0, T ] then this
x ∗ is the unique optimal solution.

Proof. Suppose that the Hessian is positive semi-definite. Let x ∗ , x be two func-
tions that satisfy the boundary conditions, and suppose x ∗ satisfies the Euler-
Lagrange equation. Define the function δ = x − x ∗ and J¯(α) = J ( x ∗ + αδ). This
way J¯(0) = J ( x ∗ ) while J¯(1) = J ( x ). We need to prove that J¯(1) ≥ J¯(0).
8 The relation between positive semi-definite Hessians and convexity is explained in

Appendix A.7.

As before, we have that J̄′(0) is zero by the fact that x ∗ satisfies the Euler-
Lagrange equation.
The second derivative of J̄(α) with respect to α is (omitting time arguments)

J̄″(α) = ∫₀ᵀ [ δᵀ  δ̇ᵀ ] H (t , x ∗ + αδ, ẋ ∗ + αδ̇) [ δ ; δ̇ ] dt .

Since H (t , x, y) is positive semi-definite for all x, y ∈ Rn and all t , we see that
J̄″(α) ≥ 0 for all α ∈ R. Therefore for every β ≥ 0 there holds

J̄′(β) = J̄′(0) + ∫₀^β J̄″(α) dα ≥ J̄′(0) = 0.

But then J̄(1) = J̄(0) + ∫₀¹ J̄′(β) dβ ≥ J̄(0), which is what we had to prove.
Next suppose that H (t , x, y) is positive definite and that x ≠ x ∗ . Then δ := x −
x ∗ is not the zero function and so by positive definiteness of H (t , x, y) we have
J̄″(α) > 0 for every α ∈ [0, 1]. Then J ( x ) = J̄(1) > J̄(0) = J ( x ∗ ). ■

This result produces a lot, but also requires a lot. Indeed the convexity
assumption fails in many cases of interest. Here are a couple examples where
the convexity assumption is satisfied.

Example 1.6.8 (Shortest path; Example 1.2.5 continued). In the notation of the
shortest path Example 1.1.4 we have F (x, y, ẏ) = √(1 + ẏ²), and so we find that

∂F (x, y, ẏ)/∂ẏ = ẏ / (1 + ẏ²)^{1/2} ,

and

∂²F (x, y, ẏ)/∂ẏ² = 1 / (1 + ẏ²)^{3/2} .

Clearly, this second derivative is positive for all y, ẏ ∈ R. This implies that
the solution y ∗ found in Example 1.2.5—namely, the straight line through the
points (x 0 , y 0 ) and (x 1 , y 1 )—satisfies the Legendre condition.
The Hessian (1.53) is

H (x, y, ẏ) =  [ 0   0                  ]
               [ 0   1/(1 + ẏ²)^{3/2}  ]  ≥ 0.

It is positive semi-definite, and, hence, the straight-line solution y ∗ is globally
optimal.  □
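Such convexity checks can be delegated to a computer algebra system. A small sketch (our own, assuming SymPy; the symbol names are ours) recomputes the Hessian of F (x, y, ẏ) = √(1 + ẏ²) with respect to (y, ẏ):

    # Sketch (our own): Hessian of the shortest-path running cost.
    import sympy as sp

    y, ydot = sp.symbols('y ydot', real=True)
    F = sp.sqrt(1 + ydot**2)
    H = sp.hessian(F, (y, ydot))
    print(sp.simplify(H))        # Matrix([[0, 0], [0, (ydot**2 + 1)**(-3/2)]])
    print(H.eigenvals())         # eigenvalues 0 and (1 + ydot^2)^(-3/2), both nonnegative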

Example 1.6.9 (Quadratic cost; Example 1.5.3 continued). For the quadratic
cost

J ( x ) := ∫₋₁¹ ( α²x²(t ) + ẋ²(t ) ) dt ,

as used in Example 1.5.3, the Hessian is constant,

H (t , x, ẋ) =  [ 2α²   0 ]
                [ 0     2 ] .

This Hessian is positive definite for every α ≠ 0 and, hence, the solution x ∗ of
the Euler-Lagrange equation found in Example 1.5.3 is the unique optimal solu-
tion of the problem. For α = 0, the Hessian is positive semi-definite, so Theo-
rem 1.6.7 guarantees that x ∗ is optimal, but possibly not unique. (Actually, for
α = 0 the solution x ∗ found in Example 1.5.3 is the unique differentiable optimal
solution because it achieves a zero cost, J ( x ∗ ) = 0, and for all other differentiable
x the cost is positive).  □

The Legendre condition (1.48) is only one of several necessary conditions for
optimality. Additional necessary conditions go under the names of Weierstrass
and Jacobi. Actually, the necessary condition of Weierstrass follows nicely from
the dynamic programming approach as explained in Chapter 3, Exercise 3.10
(p. 114).
One can pose many different types of problems in the calculus of varia-
tions by giving different boundary conditions, for instance, involving ẋ (T ), or by
imposing further constraints on the required solution. An example of the latter
we saw in (1.8) where ẋ (t ) needs to be nonnegative for all time. Also, in Exer-
cise 1.18, we explain what to do if x (T ) needs to satisfy an inequality. Another
variation is considered in the next section.

1.7 Integral Constraints

FIGURE 1.16: Three areas enclosed by ropes of the same length. See § 1.7.

An interesting extension is when the function x that is to minimize the cost

J ( x ) := ∫₀ᵀ F (t , x (t ), ẋ (t )) dt

is not free to choose, but is subject to an integral constraint

C ( x ) := ∫₀ᵀ M (t , x (t ), ẋ (t )) dt = c 0 .

The standard example of this type is Queen Dido’s isoperimetric problem. This
is the problem of determining an area as large as possible that is enclosed by
a rope of a given length. Intuition tells us that the optimal area is a disk (the

right-most option in Fig. 1.16). To put it more mathematically, in this prob-


lem we have to find a function x : [0, T ] → R with given boundary values x (0) =
x 0 , x (T ) = x T , that maximizes the area
J (x) = ∫₀ᵀ x (t ) dt

subject to the constraint that

∫₀ᵀ √(1 + ẋ²(t )) dt = ℓ

for a given ℓ.
How to solve such constrained minimization problems? A quick-and-dirty
argument goes as follows: from calculus it is known that the solution of a min-
imization problem of some function J ( x ) subject to the constraint C ( x ) − c 0 = 0
is a stationary solution of the augmented function J defined as
J( x , μ) := J ( x ) + μ(C ( x ) − c 0 ) = ∫₀ᵀ ( F (t , x (t ), ẋ (t )) + μM (t , x (t ), ẋ (t )) ) dt − μc 0

for some Lagrange multiplier⁹ μ ∈ R. The stationary solutions ( x ∗ , μ∗ ) of J( x , μ)
must satisfy the Euler-Lagrange equation,

( ∂/∂x − d/dt ∂/∂ẋ ) ( F (t , x ∗ (t ), ẋ ∗ (t )) + μ∗ M (t , x ∗ (t ), ẋ ∗ (t )) ) = 0.
Below we formally prove that this argument is essentially correct. This may
sound a bit vague, but it does put us on the right track. The theorem presented
next is motivated by the above, but the proof is given from scratch. The proof
assumes knowledge of the inverse function theorem.

Theorem 1.7.1 (Euler-Lagrange for integral-constrained minimization). Let c 0
be some constant. Suppose that F and M are C¹ in all of their components, and
that x ∗ is a minimizer of

∫₀ᵀ F (t , x (t ), ẋ (t )) dt

subject to boundary conditions x (0) = x 0 , x (T ) = x T and integral constraint

∫₀ᵀ M (t , x (t ), ẋ (t )) dt = c 0 ,

and that x ∗ is C². Then either there is a Lagrange multiplier μ∗ ∈ R such that

( ∂/∂x − d/dt ∂/∂ẋ ) ( F (t , x ∗ (t ), ẋ ∗ (t )) + μ∗ M (t , x ∗ (t ), ẋ ∗ (t )) ) = 0      (1.54)

9 Lagrange multipliers are usually denoted as λ. We use μ in order to avoid a confusion in the

next chapter.

for all t ∈ [0, T ], or M satisfies the Euler-Lagrange equation itself,

( ∂/∂x − d/dt ∂/∂ẋ ) M (t , x ∗ (t ), ẋ ∗ (t )) = 0   ∀t ∈ [0, T ].                      (1.55)

Proof. This is not an easy proof. Suppose x ∗ solves the constrained minimiza-
tion problem, and fix two C² functions δx , εx that vanish at the boundaries,

δx (0) = 0 = εx (0),   δx (T ) = 0 = εx (T ).

Define J ( x ) = ∫₀ᵀ F (t , x (t ), ẋ (t )) dt and C ( x ) = ∫₀ᵀ M (t , x (t ), ẋ (t )) dt and consider
the mapping that sends two real numbers (α, β) to the two real numbers

( J̄(α, β), C̄ (α, β) ) := ( J ( x ∗ + αδx + βεx ), C ( x ∗ + αδx + βεx ) ).

The mapping from (α, β) to ( J̄(α, β), C̄ (α, β)) is C¹. So if the Jacobian at (α, β) =
(0, 0),

D :=  [ ∂ J̄(α, β)/∂α    ∂ J̄(α, β)/∂β  ]
      [ ∂C̄ (α, β)/∂α    ∂C̄ (α, β)/∂β ]   evaluated at (α, β) = (0, 0),                 (1.56)

of this mapping is nonsingular then by the inverse function theorem there is
a neighborhood of (α, β) = (0, 0) on which the mapping is invertible. In par-
ticular, we then can find small enough α, β such that C̄ (α, β) = C̄ (0, 0) = c 0 —
hence satisfying the integral constraint—but rendering a cost J̄(α, β) smaller
than J̄(0, 0) = J ( x ∗ ). This contradicts that x ∗ is minimizing. Conclusion: at an
optimal x ∗ the Jacobian (1.56) is singular for all allowable perturbation func-
tions δx , εx .
We rewrite the Jacobian (1.56) in terms of F and M . To this end define the
functions f and m as

f (t ) = ( ∂/∂x − d/dt ∂/∂ẋ ) F (t , x ∗ (t ), ẋ ∗ (t )),
m (t ) = ( ∂/∂x − d/dt ∂/∂ẋ ) M (t , x ∗ (t ), ẋ ∗ (t )).

This way the Jacobian (1.56) becomes (verify this for yourself )

D =  [ ∫₀ᵀ f (t )δx (t ) dt     ∫₀ᵀ f (t )εx (t ) dt  ]
     [ ∫₀ᵀ m (t )δx (t ) dt     ∫₀ᵀ m (t )εx (t ) dt ] .                                (1.57)

If m (t ) = 0 for all t then (1.55) holds and the proof is complete. Remains to con-
sider the case that m (t 0 ) ≠ 0 for at least one t 0 . Suppose, to obtain a contradiction,
that given such a t 0 there is a t for which

[ f (t 0 )   f (t ) ]
[ m (t 0 )   m (t ) ]                                                                  (1.58)

is nonsingular. Now take δx to have support around t 0 and εx to have support
around t . Then by nonsingularity of (1.58) also (1.57) is nonsingular if the sup-
port is taken small enough. However nonsingularity of the Jacobian is impossi-
ble by the fact that x ∗ solves the minimization problem. Therefore we conclude
that (1.58) is singular at every t . This means that

f (t 0 ) m (t ) − f (t ) m (t 0 ) = 0   ∀t .

In other words f (t ) + μ∗ m (t ) = 0 for all t if we take μ∗ = − f (t 0 )/ m (t 0 ). ■

The theorem says that the solution x ∗ satisfies either (1.54) or (1.55). The
first of these two is called the normal case, and the second the abnormal case.
Notice that the abnormal case completely neglects the running cost F . The next
example indicates that we usually have the normal case.

Example 1.7.2 (Normal and abnormal Euler-Lagrange equation). Consider
minimizing ∫₀¹ x (t ) dt subject to the boundary conditions x (0) = 0, x (1) = 1 and
integral constraint

∫₀¹ ẋ²(t ) dt = C                                                                     (1.59)

for some given C . The (normal) Euler-Lagrange equation (1.54) becomes

0 = ( ∂/∂x − d/dt ∂/∂ẋ ) ( x ∗ (t ) + μ ẋ∗²(t ) ) = 1 − 2μ d/dt ẋ ∗ (t ) = 1 − 2μ ẍ ∗ (t ).

The general solution of this equation is x ∗ (t ) = t²/(4μ) + bt + c. The constants b, c
are determined by the boundary conditions x (0) = 0, x (1) = 1, leading to

x ∗ (t ) = t²/(4μ) + (1 − 1/(4μ)) t .

With this form the integral constraint (1.59) becomes

C = ∫₀¹ ẋ∗²(t ) dt = ∫₀¹ ( t /(2μ) + 1 − 1/(4μ) )² dt = 1 + 1/(48μ²).                    (1.60)

If C < 1 then clearly no solution μ exists, and it is not hard to see that then
no smooth function with x (0) = 0 and x (1) = 1 exists that meets the integral
constraint (see Exercise 1.21). For C > 1 there are two μ’s that satisfy (1.60):

μ∗ = ±1 / √(48(C − 1)) ,

and each of these two values of μ∗ gives a corresponding parabola x ∗ .

Clearly, out of these two, the cost J ( x ∗ ) := ∫₀¹ x ∗ (t ) dt is minimal for the positive
solution μ∗ .
In the abnormal case, (1.55), we have that

0 = ( ∂/∂x − d/dt ∂/∂ẋ ) ẋ∗²(t ) = −2 ẍ ∗ (t ).

Hence x ∗ (t ) = bt + c for some b, c. Given the boundary conditions x (0) =
0, x (1) = 1 it is immediate that this allows for only one solution: x ∗ (t ) = t .
Now ẋ ∗ (t ) = 1, and the constant C in the integral constraint necessarily equals
C = ∫₀¹ ẋ∗²(t ) dt = 1. This corresponds to μ = ∞. In this case the integral con-
straint together with the boundary conditions is tight. There are, so to say, no
degrees of freedom left to shape the function. In particular, there is no feasi-
ble variation, x = x ∗ + αδx , and since the standard Euler-Lagrange equation was
derived from such a variation, it is no surprise that the standard Euler-Lagrange
equation does not apply in this case.  □
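The normal case of this example is easily reproduced numerically. The sketch below (our own, assuming NumPy; C = 2 is an arbitrary choice) builds the two candidate parabolas, checks the integral constraint, and compares their costs.

    # Sketch (our own): the two normal-case solutions of Example 1.7.2 for C = 2.
    import numpy as np

    C = 2.0
    t = np.linspace(0.0, 1.0, 200001)
    for sign in (+1.0, -1.0):
        mu = sign/np.sqrt(48*(C - 1))
        x    = t**2/(4*mu) + (1 - 1/(4*mu))*t            # satisfies x(0) = 0, x(1) = 1
        xdot = t/(2*mu) + 1 - 1/(4*mu)
        print(mu, np.trapz(xdot**2, t), np.trapz(x, t))  # constraint value, then cost
    # The constraint evaluates to C for both, and the positive mu gives the smaller cost.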

1.8 Exercises

1.1 Determine all solutions x : [0, T ] → R of the Euler-Lagrange equation for
the cost J ( x ) = ∫₀ᵀ F (t , x (t ), ẋ (t )) dt with

(a) F (t , x, ẋ) = ẋ 2 − α2 x 2 .
(b) F (t , x, ẋ) = ẋ 2 + 2x.
(c) F (t , x, ẋ) = ẋ 2 + 4t ẋ.
(d) F (t , x, ẋ) = ẋ 2 + x ẋ + x 2 .
(e) F (t , x, ẋ) = x 2 + 2t x ẋ (this one is curious).

1.2 Consider minimization of


∫₀¹ ( ẋ²(t ) + 12t x (t ) ) dt

over all functions x : [0, 1] → R that satisfy the boundary conditions


x (0) = 0, x (1) = 1.

(a) Determine the Euler-Lagrange equation for this problem.


(b) Determine the solution x ∗ of the Euler-Lagrange equation and that
satisfies the boundary conditions.

1.3 Trivial running cost. Consider minimization of


J ( x ) := ∫₀ᵀ F (t , x (t ), ẋ (t )) dt

over all functions x : [0, T ] → R with given boundary conditions x (0) =
x 0 , x (T ) = x T . Assume that the running cost has the particular form,

F (t , x (t ), ẋ (t )) = d/dt G(t , x (t ))

for some C² function G(t , x).

(a) Derive the Euler-Lagrange equation for this problem.


(b) Show that every differentiable function x : [0, T ] → R satisfies the
Euler-Lagrange equation.
(c) Explain this remarkable phenomenon by expressing J ( x ) in terms of
the function G and boundary values x 0 , x T .

1.4 Technical problem: the lack of Lipschitz continuity in the Beltrami identity
for the brachistochrone problem, and how to circumvent it. The footnote
of Example 1.3.1 derives the cycloid equations (1.29) from

c 2 = y (x)(1 + ẏ 2 (x)), y (0) = 0. (1.61)

The derivation was quick, and this exercise shows that it was a bit dirty as
well.
(a) Let x (φ), y (φ) be the cycloid solution (1.29). Use the identity
dy/dx = (dy/dφ)/(dx/dφ) to show that they satisfy (1.61).
(b) The curve of this cycloid solution for φ ∈ [0, 2π] is

From this solution we construct a new solution by inserting in the


middle a constant part of some length Δ ≥ 0:

Argue that for every Δ ≥ 0 also this new function satisfies the Bel-
trami identity (1.61) for all x ∈ (0, c 2 π + Δ).
(c) This is not what the footnote of Example 1.3.1 says. What goes wrong
in this footnote?
(d) This new function y (x) is constant over the interval [c²π/2, c²π/2 + Δ].

Show that a constant function y (x) does not satisfy the Euler-
Lagrange equation of the brachistochrone problem.
(e) It can be shown that y (x) solves (1.61) iff it is of this new form for
some Δ ≥ 0 (possibly Δ = ∞). Argue that the only function that sat-
isfies the Euler-Lagrange equation with y (0) = 0 is the cycloid solu-
tion (1.29).

FIGURE 1.17: Solid of least resistance. See Exercise 1.5.

1.5 A simplified Newton’s minimal resistance problem. Consider a solid of rev-
olution with diameter y (x) as shown in Fig. 1.17. At x = 0 the diameter is
0, and at x = x 1 it is y 1 > 0. If the air flows with a constant speed v, then
the total air resistance (force) can be modeled as

4πρv² ∫₀^{x 1} y (x) ẏ³(x) / (1 + ẏ²(x)) dx.

Here ρ is the air density. The question is: given y (0) = 0 and y (x 1 ) = y 1 > 0,
for which function y : [0, x 1 ] → R is the resistance minimal? Now we are
going to cheat! To make the problem a lot easier we discard the quadratic
term in the denominator of the running cost, that is, we consider instead
the cost function

J ( y ) := 4πρv² ∫₀^{x 1} y (x) ẏ³(x) dx.

Given the boundary conditions y (0) = 0 and y (x 1 ) = y 1 > 0, show that

y (x) = y 1 (x/x 1 )^{3/4}

is a solution of the Beltrami identity with the given boundary conditions.
(This function y is depicted in Fig. 1.17.)

1.6 Technical problem: the lack of Lipschitz continuity in the minimal-surface


problem, and how to circumvent it. In Example 1.3.2 we claimed that
r a (x) := a cosh(x/a) is the only positive even solution of (1.30). That is
not completely correct. In this exercise we see that the differential equa-
tion (1.30), as derived from the Beltrami identity, has more solutions, but
that r a (x) is the only even solution that satisfies the Euler-Lagrange equa-
tion. We assume that a > 0.

(a) Show that the function

f (r ) := √(r²/a² − 1)                                                                (1.62)

is not Lipschitz continuous at r = a (see Appendix B.1). Hence we
can expect multiple solutions of the differential equation
dr (x)/dx = √(r²(x)/a² − 1) if r (x) = a.

(b) Show that (1.30) can be separated as

dr (x) / √(r²(x)/a² − 1) = dx.

(c) If r (x 0 ) > a, show that r (x) = a cosh((x − c)/a) around x = x 0 for


some c.
(d) Argue that r (x) is a solution of (1.30) iff it is pieced together from a
hyperbolic cosine, a constant, and a hyperbolic cosine again, as in

Here c ≤ d . (Notice that for x ∈ [c, d ] the value of r (x) equals a, so at


that point the function f as defined in (1.62) is not Lipschitz contin-
uous.)
(e) If c < d then on the strip [c, d ] the function r (x) is a constant (equal
to a > 0). Show that this r (x) does not satisfy the Euler-Lagrange
equation. (Recall that the Beltrami identity may have more solutions
than the Euler-Lagrange equation.)
(f ) Verify that r a (x) := a cosh(x/a) is the only function that satisfies the
Euler-Lagrange equation of the minimal-surface problem (Exam-
ple 1.3.2) and that has the symmetry property that r (−1) = r (+1).

1.7 Lemma of du Bois-Reymond. The proof of Theorem 1.2.2 at some point


assumes that both x ∗ and F are C 2 . The lemma of du Bois-Reymond that
we explore in this exercise shows that the result also holds if x ∗ and F are
merely C 1 . Throughout this exercise we assume that x ∗ and F are C 1 .

(a) Lemma of du Bois-Reymond. Let f : [0, T ] → R be a continuous func-
tion, and suppose that ∫₀ᵀ f (t )φ(t ) dt = 0 for all continuous functions
φ : [0, T ] → R for which ∫₀ᵀ φ(t ) dt = 0. Show that f (t ) is constant on
[0, T ].
[Hint: If f is not constant then a, b ∈ [0, T ] exist for which f (a) ≠
f (b). Then construct a φ for which ∫₀ᵀ f (t )φ(t ) dt ≠ 0.]
(b) We showed in the proof of Theorem 1.2.2 that C¹ optimal solutions
x ∗ satisfy

∫₀ᵀ ( ∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂xᵀ δx (t ) + ∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋᵀ δ̇x (t ) ) dt = 0   (1.63)

for all C¹ functions δx : [0, T ] → Rn with δx (0) = δx (T ) = 0. In the
proof of Theorem 1.2.2, we performed integration by parts on the
second term of the integral in (1.63). Now, instead, we perform
integration by parts on the first term in (1.63). Use that to show
that (1.63) holds iff

∫₀ᵀ ( − ∫₀ᵗ ∂F (τ, x ∗ (τ), ẋ ∗ (τ))/∂xᵀ dτ + ∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋᵀ ) δ̇x (t ) dt = 0

for all C¹ functions δx : [0, T ] → Rn with δx (0) = δx (T ) = 0.


(c) Use the lemma of du Bois-Reymond to show that C¹ optimal solu-
tions x ∗ satisfy

∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ = c + ∫₀ᵗ ∂F (τ, x ∗ (τ), ẋ ∗ (τ))/∂x dτ   ∀t ∈ [0, T ]

for some constant c ∈ Rn . [Hint: ∫₀ᵀ δ̇x (t ) dt = 0.]
(d) Show that for C¹ optimal solutions x ∗ the expression

d/dt ( ∂F (t , x ∗ (t ), ẋ ∗ (t ))/∂ẋ )

is well defined and continuous at every t ∈ [0, T ].
(e) Show that C 1 optimal solutions x ∗ satisfy the Euler-Lagrange equa-
tion (1.13).

1.8 Free endpoint. Minimize

x²(1) + ∫₀¹ ẋ²(t ) dt

over all functions x subject to x (0) = 1 and free endpoint x (1).

1.9 Free endpoint. Consider minimization of

J (x) = ∫₀¹ ( ẋ²(t ) − 2 x (t ) ẋ (t ) − ẋ (t ) ) dt

with initial condition x (0) = 1 and free endpoint x (1).



(a) Show that no function x exists that satisfies the Euler-Lagrange


equation with x (0) = 1 and the free endpoint boundary condi-
tion (1.43).
(b) Conclude that there is no C 1 function x that minimizes J ( x ) subject
to x (0) = 1 with free endpoint.
(c) Determine all functions x that satisfy the Euler-Lagrange equation
and such that x (0) = 1. Then compute J ( x ) explicitly and conclude,
once more, that the free endpoint problem has no solution.

1.10 Relaxed boundary conditions. In this exercise we prove Proposition 1.5.1.

(a) For G(x) = K (x) = 0 the first-order conditions are that (1.38) holds for
all possible perturbations. Adapt this equation for the case that G(x)
and K (x) are arbitrary C 1 functions.
(b) Prove that this equality implies that the Euler-Lagrange equation
holds.
(c) Finish the proof of Proposition 1.5.1.

1.11 Show that the minimal surface example (Example 1.3.2) satisfies the Leg-
endre second-order necessary condition of Theorem 1.6.1.

1.12 Smoothness assumptions in Legendre’s necessary condition. Theorem 1.6.1


assumes that F is C 2 , but looking at the proof it seems we need F to be C 3
(see Eqn. (1.50)). However, C 2 is sufficient: argue that the integral in (1.49)
is nonnegative for all allowable δx only if the Legendre condition holds.
[Hint: Formulate a lemma similar to Lemma 1.6.2.]

1.13 Show that the minimization problem in Example 1.2.8 satisfies the Legen-
dre condition. [Hint: The condition now involves a 2 × 2 matrix.]

1.14 The optimal solar challenge. A solar vehicle receives power from solar
radiation. This power p(x, t ) depends on position x (due to clouds) and
on time t (due to moving clouds and the sun’s angle of inclination). Driv-
ing at some speed ẋ also consumes power. Denote this power loss by f (ẋ).
This assumes that it is a function of speed alone, which is reasonable if we
do not change speed aggressively and if friction depends only on speed.
Driving at higher speed requires more energy per meter than driving at
lower speed. This means that f is convex, in fact

f (ẋ) ≥ 0,   f ′(ẋ) > 0,   f ″(ẋ) > 0.

Suppose the solar team starts at

x (0) = 0,

and at time T it wants to be at some position x (T ) = x T , and, of course,


all that using minimal net energy
∫₀ᵀ ( f ( ẋ (t )) − p( x (t ), t ) ) dt .

(a) Derive the Euler-Lagrange equation for this problem.


(b) Argue from the Euler-Lagrange equation that we should speed up if
we drive into a cloud.
(c) Is Legendre’s second-order condition satisfied?
(d) From now on assume that

f (ẋ) = ẋ 2

(this is actually quite reasonable, modulo scaling) and that p(x, t )


does not depend on time,

p(x, t ) = q(x),

i.e., that the sun’s angle does not change much over our time window
[0, T ] and that clouds are not moving. Use the Beltrami identity to
express ẋ (t ) in terms of q( x (t )) and the initial speed ẋ (0) and initial
q(0).
(e) Argue once again (but now using the explicit relation of the previous
part) that we should speed up if we drive into a cloud.
(f ) (A computer might be useful for this part.) Continue with f (ẋ) = ẋ 2
and p(x, t ) = q(x). Suppose that up to position x = 20 the sky is clear
but that from x = 20 onwards heavy clouds limit the power input:

q(x) = 100 for x < 20,   and   q(x) = 4 for x > 20.

Determine the optimal speed ẋ ∗ (t ), t ∈ [0, 7] that brings us from


x (0) = 0 to x (7) = 90.

1.15 Consider minimization of


∫₀¹ ( ẋ²(t ) − x (t ) ) dt

over all functions x : [0, 1]→R that satisfy the boundary conditions x (0) = 0,
x (1) = 1.

(a) Determine the Euler-Lagrange equation for this problem.


(b) Determine the solution x ∗ of the Euler-Lagrange equation and that
satisfies the boundary conditions.

(c) Does the x ∗ found in (b) satisfy Legendre’s second-order condition?


(d) Is the convexity condition (Theorem 1.6.7) satisfied?
(e) Show that the solution x ∗ found in (b) is globally optimal.

1.16 Convex quadratic cost. Consider minimization of the quadratic cost


J (x) = ∫₀¹ ( ẋ²(t ) + x²(t ) + 2t x (t ) ) dt

with boundary conditions

x (0) = 0, x (1) = 1

over all functions x : [0, 1] → R.

(a) Determine the Euler-Lagrange equation for this problem.


(b) Determine the function x ∗ that satisfies the Euler-Lagrange equation
and the given boundary conditions.
(c) Does x ∗ satisfy Legendre’s second-order condition?
(d) Show that
J ( x ∗ + δx ) = J ( x ∗ ) + ∫₀¹ ( δx²(t ) + δ̇x²(t ) ) dt

for every continuously differentiable function δx with δx (0) = δx (1) =


0, and conclude that x ∗ is globally optimal.
(e) Is the convexity condition (Theorem 1.6.7) satisfied?

1.17 Smoothness. This exercise is from Liberzon (2012). It shows that smooth
running costs F may result in non-smooth optimal solutions x ∗ . Consider
minimization of
J (x) = ∫₋₁¹ (1 − ẋ (t ))² x²(t ) dt

subject to the boundary conditions

x (−1) = 0, x (1) = 1.

(a) Show that J ( x ) ≥ 0 for every function x .


(b) Determine a continuous optimal solution x ∗ and argue that it is
unique. (Hint: J ( x ∗ ) = 0 and do not use Euler-Lagrange or Beltrami.)
(c) Argue that there is no continuously differentiable optimal solu-
tion x ∗ .

1.18 Inequalities. The calculus of variations problems considered in this
chapter all assume that the entries of x (0) and x (T ) are either fixed
or completely free. But what if we demand an inequality? Consider,
as an example, the calculus of variations problem with standard cost
∫₀ᵀ F (t , x (t ), ẋ (t )) dt and standard initial condition, x (0) = x 0 , but whose
final condition is an inequality,

x (T ) ≥ x T .

Assume sufficient smoothness of all functions involved.

(a) Show that optimal solutions x ∗ must obey the Euler-Lagrange equa-
tion, and the inequality

∂F ( x ∗ (T ), ẋ ∗ (T ), T )/∂ẋ ≥ 0.

(b) Verify this statement for the cost ∫₀¹ ( x (t ) − ẋ (t ))² dt with x (0) =
1, x (1) ≥ x T , and distinguish the cases x T ≤ e and x T > e.

1.19 The hanging cable. Every hanging cable eventually comes to a halt in a
position of minimal energy. What is the shape of this minimal energy
position? When hanging still it has no kinetic energy, it only has potential
energy. If the cable is very flexible then the potential energy is only due
to its height y. We assume that the cable is very thin, does not stretch and
that it has a constant mass per unit length. In a constant gravitational field
with gravitational acceleration g the potential energy J ( y ) equals

J (y) = ρg ∫_{x 0}^{x 1} y (x) √(1 + ẏ²(x)) dx,

with ρ the mass per unit length of the cable. We want to minimize the
potential energy over all functions y : [x 0 , x 1 ] → R, subject to y (x 0 ) =
y 0 , y (x 1 ) = y 1 and such that the length of the cable is ℓ. The length of the
cable can be expressed as

∫_{x 0}^{x 1} √(1 + ẏ²(x)) dx = ℓ.

To solve this problem we use Theorem 1.7.1.



(a) Consider first the normal case, and the associated Euler-Lagrange
equation (1.54). Analyze the Beltrami identity of this case to show
that the minimal energy solution y ∗ satisfies

y ∗ (x) + μ∗ /(ρg) = a √(1 + ẏ²(x))

for some constant a and Lagrange multiplier μ∗ . (Hint: We con-
sidered a similar problem in Example 1.3.2.) It can be shown that
the general solution of the above differential equation is
y ∗ (x) = a cosh((x − b)/a) − μ∗ /(ρg) with b ∈ R.

(b) Show that the minimal energy solution y ∗ (if it exists) is of the form

y ∗ (x) = a cosh((x − b)/a) − μ∗ /(ρg)   in the normal case (Eqn. (1.54)),
y ∗ (x) = cx + d                         in the abnormal case (Eqn. (1.55)),

for certain constants a, b, c, d ∈ R and Lagrange multiplier μ∗ ∈ R.

(c) Describe in terms of ℓ and x 0 , x 1 , y 0 , y 1 when we have the normal
case, the abnormal case, or no solution at all.
1.20 Integral constraint. Minimize ∫₀^π ẋ²(t ) dt subject to x (0) = x (π) = 0 and
∫₀^π x²(t ) dt = 1.

1.21 Consider Example 1.7.2. Prove that for C < 1 there is no smooth function
that satisfies the boundary conditions and integral constraint.

1.22 Discrete calculus of variations. A discrete version of the simplest problem


in the calculus of variations (Definition 1.2.1) can be formulated as fol-
lows. Consider a final time T , a function F : {0, 1, . . . , T − 1} × Rn × Rn → R,
denoted as F (t , x 1 , x 2 ), and fixed x 0 , x T ∈ Rn . Consider the problem of
minimizing
∑_{t=0}^{T−1} F (t , x (t ), x (t + 1))

over all sequences x (0), x (1), x (2), . . . , x (T − 1), x (T ) with x (0) = x 0 , x (T ) =


x T (fixed initial and final conditions). In order to derive a discrete version
of the Euler-Lagrange equation for this problem we proceed as follows.
Let

( x ∗ (0), x ∗ (1), . . . , x ∗ (T − 1), x ∗ (T ))

be a minimizing sequence with x ∗ (0) = x 0 , x ∗ (T ) = x T , and consider vari-


ations

( x ∗ (0), x ∗ (1), . . . , x ∗ (T − 1), x ∗ (T )) + (δx (0), δx (1), . . . , δx (T − 1), δx (T ))

with δx (0) = δx (T ) = 0.

(a) Show that this implies that

∑_{t=0}^{T−1} ∂F (t , x ∗ (t ), x ∗ (t + 1))/∂x 1ᵀ δx (t ) + ∑_{t=0}^{T−1} ∂F (t , x ∗ (t ), x ∗ (t + 1))/∂x 2ᵀ δx (t + 1) = 0

for all δx (t ) with δx (0) = δx (T ) = 0.

(b) Rearrange this equation (partly changing the summation index) so
as to obtain the equivalent condition

∑_{t=1}^{T−1} ( ∂F (t , x ∗ (t ), x ∗ (t + 1))/∂x 1ᵀ + ∂F (t − 1, x ∗ (t − 1), x ∗ (t ))/∂x 2ᵀ ) δx (t ) = 0,

and show that this implies

∂F (t , x ∗ (t ), x ∗ (t + 1))/∂x 1 + ∂F (t − 1, x ∗ (t − 1), x ∗ (t ))/∂x 2 = 0

for all t = 1, . . . , T − 1. This system of equations can be called the dis-
crete Euler-Lagrange equation.

(c) Extend this to the minimization of (with S( x (T )) some final cost)

∑_{t=0}^{T−1} F (t , x (t ), x (t + 1)) + S( x (T ))

over all sequences x (0), x (1), . . . , x (T ) with x (0) = x 0 .


(d) Show how this could be used for obtaining numerical schemes solv-
ing the ordinary Euler-Lagrange equation (1.13). For example, given
a running cost F̃ (t , x (t ), ẋ (t )), t ∈ [0, T ], replace ẋ (t ) by its approxi-
mation x (t + 1) − x (t ) so as to obtain the discretized running cost

F (t , x (t ), x (t + 1)) := F̃ (t , x (t ), x (t + 1) − x (t )).

Write out the discrete Euler-Lagrange equation in this case.


Chapter 2

Minimum Principle

2.1 Optimal Control

In the solar challenge problem (Exercise 1.14) we assumed that we could choose
the speed ẋ of the car at will, but in reality, the speed is limited by the dynamics
of the car. For instance, the acceleration of the car is bounded. In this chap-
ter, we take such dynamical constraints into account. We assume that the state
x : [0, T ] → Rn satisfies a system of differential equations with initial state

ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 , (2.1)

and that we can not choose x directly but only can choose u , which is known
as the input of the system. Furthermore, the input is restricted to take values in
some given set U ⊆ Rm , that is,

u : [0, T ] → U. (2.2)

For instance, in a car-parking problem, the input u might be the throttle open-
ing and this takes values in between u = 0 (fully closed) and u = 1 (fully open),
so then U = [0, 1]. For a given U and given (2.1), the optimal control problem is
to determine an input u : [0, T ] → U that minimizes a given cost function of the
form
J ( u ) := ∫₀ᵀ L( x (t ), u (t )) dt + K ( x (T )).                                       (2.3)

Here, K : Rn → R and L : Rn × U → R. The part K ( x (T )) is called the terminal cost


or final cost, and L( x (t ), u (t )) is commonly called the running cost. The optimal
u is referred to as the optimal input or optimal control, and it is often denoted
with a star, i.e., u ∗ .
A variation of the optimal control problem is to fix the final state x (T ) to a
given x T . Clearly, in this case, there is no need for a final cost K ( x (T )) in that
every allowable input results in the same final cost. In this case, the optimal


control problem is to find, for a given system (2.1) and x T and U, an input u :
[0, T ] → U that minimizes
∫₀ᵀ L( x (t ), u (t )) dt

subject to x (T ) = x T . Later in this chapter, we consider the optimal control


problem where also the final time T is variable, and where the cost is to be
minimized over all allowable inputs u as well as all T ≥ 0.
As is clear from the definition, optimal control problems are more general
than calculus of variations problems. This is also reflected by the fact that opti-
mal controls u ∗ may even be discontinuous as a function of time, as illustrated
by the next example.

FIGURE 2.1: Reachable states and candidate optimal states for the optimal
control problem of Example 2.1.1.

Example 2.1.1 (A bang-bang control example). Let

ẋ (t ) = u (t ), x (0) = 0, U = [0, 1].

We want to determine the control u : [0, T ] → U that minimizes


J ( u ) := ∫₀ᵀ x (t ) dt − x (T )   for some given T ≥ 1.

An ad hoc “solution by picture” goes as follows. Since ẋ (t ) = u (t ) ∈ [0, 1] it fol-


lows that the final state x T := x (T ) is an element of [ x (0), x (0) + T ] = [0, T ], see
Fig. 2.1(a). There are usually many ways to reach a given final state x T by choice
of u . Some possible state trajectories x , all reaching the same x T , are shown
in Fig. 2.1(b). Now, for any fixed x T ∈ [0, T ], it is clear that J ( u ) is minimal iff
∫₀ᵀ x (t ) dt is minimal, but this integral is the area under the curve. So for the
fixed x T of Fig. 2.1(b), the optimal state is the one shown in Fig. 2.1(c), and then
the area under the curve is ∫₀ᵀ x (t ) dt = ½ x T². Hence, the minimal cost for a fixed
x T is

½ x T² − x T .

This cost equals ½(x T − 1)² − ½ and, therefore, x T = 1 achieves the smallest pos-
sible cost over all x T . Now we solved the problem: optimal is x T = 1, and the
optimal state x ∗ (t ) is zero for all t ≤ T − x T = T − 1, and increases with ẋ ∗ (t ) = +1
for all t > T − 1. Therefore we conclude that the optimal control u ∗ is

u ∗ (t ) = 0 if t < T − 1,   and   u ∗ (t ) = 1 if t > T − 1.                           (2.4)

In particular, the optimal control is discontinuous as a function of time.  □
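The conclusion can also be cross-checked by brute force. The sketch below (our own discretization, assuming NumPy; T = 3 is an arbitrary choice) simulates ẋ = u for controls that switch once from 0 to 1 and evaluates the cost J ( u ):

    # Sketch (our own): brute-force check of the bang-bang example for T = 3.
    import numpy as np

    T = 3.0
    t = np.linspace(0.0, T, 30001)

    def cost(switch):
        u = (t >= switch).astype(float)               # u = 0 before the switch, 1 after
        x = np.concatenate(([0.0], np.cumsum(0.5*(u[1:] + u[:-1])*np.diff(t))))  # xdot = u
        return np.trapz(x, t) - x[-1]

    for switch in (0.0, 1.0, T - 1.0, 2.5, T):
        print(switch, cost(switch))
    # The smallest cost (-1/2) is obtained for the switch at t = T - 1, i.e., control (2.4).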

The derivation in this example is ad hoc. We want a theory that can deal
with optimal control problems systematically, including problems whose solu-
tion is discontinuous. To develop this theory we first assume that all functions
involved are sufficiently smooth, and that U = Rm . Combined with the classic
method of Lagrange multipliers we can then employ the theory of calculus of
variations, and this provides first-order conditions that optimal controls must
satisfy. This is derived in § 2.3. Motivated by these first-order conditions, we
then formulate and prove the truly fabulous minimum principle of Pontrya-
gin (§ 2.5). This result shocked the scientific community when it was presented
in the late fifties of the previous century. The minimum principle is very gen-
eral, and it provides necessary conditions for a control to be optimal, even if
the optimal control is discontinuous. In many applications, these conditions are
numerically tractable and allow us to construct the optimal control, assuming
one exists. But be warned: the proof of the minimum principle is involved.

2.2 Quick Summary of the Classic Lagrange Multiplier Method

Optimal control problems are minimization problems subject to dynamical


constraints. The classic way of dealing with constraints is to introduce Lagrange
multipliers. This short section provides a quick summary of this method; more
details can be found in Appendix A.8.
Consider minimizing a function J : Rn → R over a constrained set of Rn
defined as the zero set of some function G : Rn → Rk :

min_{z∈Rn} J (z)   subject to G(z) = 0.                                               (2.5)

The method of Lagrange multipliers can help to find minimizers. In short, the
idea is to associate with this constrained problem in z an unconstrained prob-
lem in (z, λ) with cost function

J(z, λ) := λT G(z) + J (z).


This function J : Rn × Rk → R is sometimes called the augmented cost function1 ,
1 In optimization, the function J is often called “Lagrangian,” but in the calculus of variations,
as well as in classical mechanics, that terminology is normally reserved for F (t , x, ẋ), see Chap-
ter 1.

and the components of the vector λ are known as Lagrange multipliers. Assum-
ing J is sufficiently smooth, a pair (z ∗ , λ∗ ) is a stationary solution of the uncon-
strained cost J(z, λ) over all z and λ iff both gradients vanish,
∂J(z ∗ , λ∗ )/∂z = 0,   ∂J(z ∗ , λ∗ )/∂λ = 0.                                           (2.6)
The gradient of J(z, λ) with respect to λ is G T (z). Hence, stationary solutions
(z ∗ , λ∗ ) of J(z, λ) necessarily satisfy G(z ∗ ) = 0, and, therefore, J(z ∗ , λ∗ ) = J (z ∗ ).
In fact, under mild assumptions, the unconstrained first-order conditions (2.6)
are equivalent to the first-order conditions of the constrained minimization
problem (2.5), see Appendix A.8 for details.
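As a toy numerical illustration of (2.5) and (2.6) (our own example, assuming SciPy is available; the particular functions J and G below are invented for the illustration), one can solve the stationarity conditions directly:

    # Sketch (our own): minimize J(z) = z1^2 + z2^2 subject to G(z) = z1 + z2 - 1 = 0.
    import numpy as np
    from scipy.optimize import fsolve

    def conditions(w):
        z, lam = w[:2], w[2]
        grad_z = 2*z + lam*np.array([1.0, 1.0])       # gradient in z of lam*G(z) + J(z)
        return np.concatenate((grad_z, [z[0] + z[1] - 1.0]))  # gradient in lam gives G(z) = 0

    print(fsolve(conditions, np.zeros(3)))            # [0.5, 0.5, -1.0]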
For the optimal control problem, we take a similar approach, however with
the complication that we are not dealing with a minimization over a finite num-
ber of variables z ∈ Rn , but over uncountably many functions u , x , and the con-
straints are the dynamical constraints ẋ (t ) = f ( x (t ), u (t )), and these need to be
satisfied for all t ∈ [0, T ].

2.3 First-order Conditions for Unbounded and Smooth Controls

We return to the optimal control problem of minimizing a cost


J ( u ) := ∫₀ᵀ L( x (t ), u (t )) dt + K ( x (T )),

subject to

ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 . (2.7)

In this section, we do not restrict the inputs, i.e.,

U = Rm ,

and we further assume for now that all functions involved are sufficiently
smooth.
The optimal control problem can be regarded as a constrained optimization
problem, with (2.7) being the dynamical constraint. This observation provides a
clue to its solution: introduce Lagrange multiplier functions p : [0, T ] → Rn cor-
responding to these dynamical constraints. Analogous to the classic Lagrange
multiplier method, we introduce an augmented running cost L : Rn × Rn × U ×
Rn → R, defined as

L(x, ẋ, u, p) = p T ( f (x, u) − ẋ) + L(x, u), (2.8)

and analyze the first-order conditions for the corresponding cost. That is, we
want to know which conditions are satisfied by stationary solutions

q ∗ :=( x ∗ , p ∗ , u ∗ )

of the unconstrained problem with cost


J( q ) := ∫₀ᵀ L( x (t ), ẋ (t ), u (t ), p (t )) dt + K ( x (T )).                        (2.9)

Before we delve into the resulting Euler-Lagrange equation, it is interesting to


first see what the Beltrami identity gives us. Indeed, the L defined in (2.8) is
of the form L(q, q̇) and so does not depend on time. As a result, the Beltrami
identity holds, which says that the function
L( q (t ), q̇ (t )) − q̇ ᵀ(t ) ∂L( q (t ), q̇ (t ))/∂q̇

is constant over time for every stationary solution q of the cost (2.9). For our L
we have

L(q, q̇) − q̇ᵀ ∂L(q, q̇)/∂q̇ = L(q, q̇) − ( ẋᵀ ∂L(q, q̇)/∂ẋ + ṗᵀ ∂L(q, q̇)/∂ṗ + u̇ᵀ ∂L(q, q̇)/∂u̇ )
                          = pᵀ( f (x, u) − ẋ) + L(x, u) − (−ẋᵀ p + 0 + 0)
                          = pᵀ f (x, u) + L(x, u).                                     (2.10)

The final function is known as the (optimal control) Hamiltonian and it plays a
central role in optimal control. First, we use it to formulate the necessary first-
order conditions for the augmented problem:

Lemma 2.3.1 (Hamiltonian equations). Let U = Rm , x 0 ∈ Rn , and consider L as
defined in (2.8), and assume f (x, u), L(x, u), K (x) are C¹. Then sufficiently smooth
functions x ∗ , p ∗ , u ∗ are stationary solutions of the cost (2.9) with x ∗ (0) = x 0 , iff
they satisfy

ẋ ∗ (t ) = ∂H ( x ∗ (t ), p ∗ (t ), u ∗ (t ))/∂p ,    x ∗ (0) = x 0 ,                    (2.11a)
ṗ ∗ (t ) = −∂H ( x ∗ (t ), p ∗ (t ), u ∗ (t ))/∂x ,   p ∗ (T ) = ∂K ( x ∗ (T ))/∂x ,      (2.11b)
0 = ∂H ( x ∗ (t ), p ∗ (t ), u ∗ (t ))/∂u .                                             (2.11c)

Here H : Rn × Rn × U → R is the (optimal control) Hamiltonian defined as

H (x, p, u) = pᵀ f (x, u) + L(x, u).                                                   (2.12)

Proof. The triple ( x ∗ , p ∗ , u ∗ ) is a stationary solution iff it satisfies the Euler-


Lagrange equation together with the boundary conditions of Proposition 1.5.1
(p. 23). Define L(q, q̇) as in (2.8) with q :=(x, p, u), and notice that L(q, q̇) in
terms of the Hamiltonian H (x, p, u) is

L(q, q̇) = H (q) − p T ẋ.



For ease of exposition, we momentarily denote x (t ) simply as x , etc. The Euler-
Lagrange equation, 0 = ( ∂/∂q − d/dt ∂/∂q̇ )L( q , q̇ ), is a vector equation, that is
to say it holds component-wise. For component x , it says

0 = ( ∂/∂x − d/dt ∂/∂ẋ )(H ( x , p , u ) − p ᵀ ẋ ) = ∂H ( x , p , u )/∂x + ṗ .

Hence, ṗ = −∂H ( x , p , u )/∂x . For component p , it says

0 = ( ∂/∂p − d/dt ∂/∂ṗ )(H ( x , p , u ) − p ᵀ ẋ ) = ∂H ( x , p , u )/∂p − ẋ .

Hence, ẋ = ∂H ( x , p , u )/∂p . For component u , it says

0 = ( ∂/∂u − d/dt ∂/∂u̇ )(H ( x , p , u ) − p ᵀ ẋ ) = ∂H ( x , p , u )/∂u .

The free final point (also known as free endpoint) conditions (1.42) become
0 = ∂L( q (T ), q̇ (T ))/∂q̇ + ∂K ( x (T ))/∂q , and per component this is

0 = ∂L( q (T ), q̇ (T ))/∂ẋ + ∂K ( x (T ))/∂x = − p (T ) + ∂K ( x (T ))/∂x ,
0 = ∂L( q (T ), q̇ (T ))/∂ṗ + ∂K ( x (T ))/∂p = 0 + 0,
0 = ∂L( q (T ), q̇ (T ))/∂u̇ + ∂K ( x (T ))/∂u = 0 + 0.

The first says that p (T ) = ∂K ( x (T ))/∂x , and the other two are void.
Since we have an initial condition on x but not on p and u , the free initial-
point conditions (1.41) on q need to hold for the components p and u (see
Proposition 1.5.1). The initial-point conditions become 0 = ∂L( q (0), q̇ (0))/∂q̇ ,
and for the respective components p and u , this gives

0 = ∂L( q (0), q̇ (0))/∂ṗ = 0   and   0 = ∂L( q (0), q̇ (0))/∂u̇ = 0.

These conditions are void. ■

The differential equations (2.11a, 2.11b) are known as the Hamiltonian
equations. Note that

∂H ( x (t ), p (t ), u (t ))/∂p = f ( x (t ), u (t )).

Therefore, the first Hamiltonian equation (2.11a) is nothing else than the given
system equation: ẋ (t ) = f ( x (t ), u (t )), x (0) = x 0 .
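For a concrete feel for Lemma 2.3.1, the sketch below (our own toy problem, assuming SymPy; the scalar system ẋ = ax + bu with running cost L = x² + u² is chosen here only for illustration) forms the Hamiltonian (2.12), writes out the three conditions (2.11), and eliminates u via ∂H/∂u = 0:

    # Sketch (our own): Hamiltonian equations for xdot = a*x + b*u, L = x^2 + u^2.
    import sympy as sp

    x, p, u, a, b = sp.symbols('x p u a b', real=True)
    H = p*(a*x + b*u) + x**2 + u**2                    # Hamiltonian (2.12)

    xdot   = sp.diff(H, p)                             # (2.11a): a*x + b*u, the system equation
    pdot   = -sp.diff(H, x)                            # (2.11b): -(a*p + 2*x)
    u_star = sp.solve(sp.diff(H, u), u)[0]             # (2.11c): dH/du = 0  =>  u = -b*p/2

    print(xdot, pdot, u_star)
    print(sp.expand(xdot.subs(u, u_star)))             # a*x - b**2*p/2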
The Lagrange multiplier p is called the costate (because mathematically, it
lives in a dual space to the variations of the state x ). In examples it often has
interesting interpretations—shadow prices in economics and contact forces in
mechanical systems—in terms of the sensitivity of the minimized cost function.
This is already illustrated by the condition p ∗ (T ) = ∂K ( x ∗ (T ))/∂x , which means
that p ∗ (T ) equals the sensitivity of the final time cost with respect to variations
in the optimal state at the final time. In Chapter 3 we show that

p ∗ (0) = dJ ( u ∗ )/dx 0 ,

where u ∗ now means the optimal input depending on the initial state x 0 , see
§ 3.5. A large p ∗ (0) hence means that the optimal cost is sensitive to changes in
the initial state.

2.4 Towards the Minimum Principle

Based on the previous section, one would conjecture that smooth optimal con-
trols, for U = Rm , must satisfy the first-order conditions of the augmented prob-
lem (Lemma 2.3.1). Specifically, if u ∗ is an optimal control, and x ∗ is the result-
ing optimal state, then one would conjecture the existence of a function p ∗ that
satisfies
ṗ ∗ (t ) = −∂H ( x ∗ (t ), p ∗ (t ), u ∗ (t ))/∂x ,   p ∗ (T ) = ∂K ( x ∗ (T ))/∂x ,
and such that ( x ∗ , p ∗ , u ∗ ) satisfies (2.11c). We will soon see that that is indeed
the case (under some mild smoothness assumption). In fact, it holds in a far
more general setting. To motivate this general result, it is instructive to rewrite
the Legendre condition of calculus of variations problems in terms of Hamilto-
nians:

Example 2.4.1 (Legendre condition in terms of Hamiltonians). Consider the


calculus of variations problem with free endpoint, and where F (t , x, ẋ) does not
depend on t :
min_{x:[0,T]→Rⁿ} ∫₀^T F(x(t), ẋ(t)) dt   subject to x(0) = x₀.

Clearly, this equals the optimal control problem with

ẋ (t ) = u (t ), x (0) = x0 , U = Rn , L(x, u) = F (x, u).

The Hamiltonian in this case is

H (x, p, u) = p T u + F (x, u).

The Legendre condition says that optimal solutions of this calculus of vari-
ations problem must satisfy
∂²F(x∗(t), ẋ∗(t))/∂ẋ∂ẋᵀ ≥ 0   ∀t ∈ [0, T].

From the equality H (x, p, u) = p T u +F (x, u), it follows immediately that the Leg-
endre condition for our problem, in terms of the Hamiltonian, is that
∂²H(x∗(t), p(t), u∗(t))/∂u∂uᵀ ≥ 0   ∀t ∈ [0, T],    (2.13)
for whatever p (t ). 

Condition (2.13) is particularly interesting if we take p (t ) = p ∗ (t ) because we


also have condition (2.11c), which says that
∂H(x∗(t), p∗(t), u∗(t))/∂u = 0.
These two conditions combined suggest that H ( x ∗ (t ), p ∗ (t ), u) at each point in
time is minimized by the optimal control u = u ∗ (t ). Could it be? In the next
section, we see that the answer is yes, and that every optimal control problem
has this pointwise minimality property!

2.5 Minimum Principle

The following celebrated theorem by Pontryagin and coworkers provides a nec-


essary condition for solutions of the true minimization problem (not just sta-
tionary ones), and it can even deal with restricted sets U and discontinuous
controls. The basic feature is that it replaces the first-order optimality condi-
tion (2.11c) with a pointwise minimization condition. Here is the famous result.
It is very general and it is the central result of this chapter.

Theorem 2.5.1 (Minimum principle). Consider the optimal control problem


defined by (2.1, 2.2, 2.3), and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u)
and ∂L(x, u)/∂x are continuous in x and u, and that K (x) and ∂K (x)/∂x are con-
tinuous in x.
Suppose u ∗ : [0, T ] → U is a solution of the optimal control problem, and
assume it is piecewise continuous2 , and let x ∗ : [0, T ] → Rn be the resulting opti-
mal state. Then there is a unique function p ∗ : [0, T ] → Rn that satisfies
ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t))/∂p,    x∗(0) = x₀,    (2.14a)
ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,    p∗(T) = ∂K(x∗(T))/∂x,    (2.14b)
and along the solution ( x ∗ (t ), p ∗ (t )), the input u ∗ (t ) minimizes the Hamilto-
nian,
H(x∗(t), p∗(t), u∗(t)) = min_{u∈U} H(x∗(t), p∗(t), u),    (2.15)
at every t ∈ [0, T ] where u ∗ (t ) is continuous.
2 A function u is piecewise continuous on [0, T] if it is continuous everywhere except at finitely many instants tᵢ ∈ (0, T), the limits lim_{t↑tᵢ} u(t) and lim_{t↓tᵢ} u(t) exist at every point of discontinuity tᵢ ∈ (0, T), and the limits lim_{t↓0} u(t) and lim_{t↑T} u(t) exist as well.

Proof. (This proof requires a couple of easy-to-believe but technical results


regarding continuity of solutions of differential equations. Upon first reading these can be taken for granted, but for a full understanding one should have a look at
Appendix B.)
Let u ∗ be an optimal input, and let x ∗ be the corresponding optimal state.
First notice that the costate equations are linear in the costate:

ṗ ∗ (t ) = A(t ) p ∗ (t ) + b(t ), p ∗ (T ) = ∂K ( x ∗ (T ))/∂x

for A(t ) := −∂ f ( x ∗ (t ), u ∗ (t ))/∂x T and b(t ) := −∂L( x ∗ (t ), u ∗ (t ))/∂x. By assump-


tion, both A(t ) and b(t ) are piecewise continuous, and so the solution p ∗ (t )
exists for all t ∈ [0, T ], is continuous and is unique.
Now assume, to obtain a contradiction, that at some time t̄ ∈ (0, T ) where
the input is continuous, a û ∈ U exists that achieves a smaller value of the Hamil-
tonian H ( x ∗ (t̄ ), p ∗ (t̄ ), û) than u ∗ (t̄ ) does. That is, c defined as

c = H ( x ∗ (t̄ ), p ∗ (t̄ ), û) − H ( x ∗ (t̄ ), p ∗ (t̄ ), u ∗ (t̄ ))

is negative. Then, by continuity, for some small enough ε > 0 the function defined as

   ū(t) = û        if t ∈ [t̄, t̄ + ε],
   ū(t) = u∗(t)    elsewhere,

achieves a smaller (or equal) value of the Hamiltonian for all time, and
∫₀^T H(x∗(t), p∗(t), ū(t)) − H(x∗(t), p∗(t), u∗(t)) dt = εc + o(ε).

Now write ū (t ) as a perturbation of the optimal input,

ū (t ) = u ∗ (t ) + δu (t ).

The so defined perturbation δu(t) = ū(t) − u∗(t) has a support of length ε: its graph is zero everywhere except on the short interval [t̄, t̄ + ε], where it equals û − u∗(t). In the rest of the proof we fix this perturbation and we consider only very small and positive ε. Such perturbations are called “needle” perturbations.
By perturbing the input, ū = u ∗ + δu , the solution of ẋ (t ) = f ( x (t ), u ∗ (t ) +
δu (t )) for t > t̄ perturbs as well. Denote the perturbed state as x (t ) = x ∗ (t ) +
δx (t ). The perturbation δx (t ) is probably not a needle but at each t > t̄ it is of

order3 ε. To avoid clutter, we now drop all time arguments, that is, x(t) is simply
denoted as x , etc. The derivative of δx with respect to time satisfies

δ̇x = ( ẋ ∗ + δ̇x ) − ẋ ∗ = f ( x ∗ + δx , u ∗ + δu ) − f ( x ∗ , u ∗ ). (2.16)

This expression we soon need.


Let Δ be the change in cost, Δ := J ( u ∗ + δu ) − J ( u ∗ ). We have

Δ = J(u∗ + δu) − J(u∗)
   = K(x∗(T) + δx(T)) − K(x∗(T)) + ∫₀^T L(x∗ + δx, u∗ + δu) − L(x∗, u∗) dt
   = ∂K(x∗(T))/∂xᵀ δx(T) + ∫₀^T L(x∗ + δx, u∗ + δu) − L(x∗, u∗) dt + o(ε).

Next use that L(x, u) = −p T f (x, u) + H (x, p, u), and substitute p for the optimal
costate p ∗ :
Δ = p∗ᵀ(T)δx(T) + ∫₀^T −p∗ᵀ[f(x∗ + δx, u∗ + δu) − f(x∗, u∗)] dt
      + ∫₀^T H(x∗ + δx, p∗, u∗ + δu) − H(x∗, p∗, u∗) dt + o(ε).

The term in between square brackets according to (2.16) is δ̇x , so


Δ = p∗ᵀ(T)δx(T) + ∫₀^T −p∗ᵀδ̇x + H(x∗ + δx, p∗, u∗ + δu) − H(x∗, p∗, u∗ + δu) dt
      + ∫₀^T H(x∗, p∗, u∗ + δu) − H(x∗, p∗, u∗) dt + o(ε).

Here, we also subtracted and added a term H ( x ∗ , p ∗ , u ∗ +δu ). The reason is that
now the difference of the first two Hamiltonian terms can be recognized as an
approximate partial derivative with respect to x, and the difference of the final
two Hamiltonian terms is what we considered earlier (it equals εc + o(ε)). So:

   Δ = p∗ᵀ(T)δx(T) + ∫₀^T −p∗ᵀδ̇x + (∂H(x∗, p∗, u∗ + δu)/∂xᵀ) δx dt + εc + o(ε).
Notice that the partial derivative ∂H(x∗, p∗, u∗ + δu)/∂x equals −ṗ∗ = ∂H(x∗, p∗, u∗)/∂x everywhere except for ε units of time (for t ∈ [t̄, t̄ + ε]). This, combined with the fact that δx at each moment in time is also of order ε, allows us to conclude that
Δ = p∗ᵀ(T)δx(T) + ∫₀^T −p∗ᵀδ̇x − ṗ∗ᵀδx dt + εc + o(ε).

3 For t ≤ t̄, we have δx(t) = 0. For t ∈ [t̄, t̄ + ε], we have ‖δx(t)‖ = ‖x(t) − x∗(t)‖ = ‖x(t) − x(t̄) − (x∗(t) − x(t̄))‖ ≤ ‖x(t) − x(t̄)‖ + ‖x∗(t) − x(t̄)‖ = ‖(t − t̄) f(x(t̄), ū(t̄))‖ + ‖(t − t̄) f(x(t̄), u∗(t̄))‖ + o(t − t̄) ≤ Mε for some M > 0 and all small enough ε > 0. So at t = t̄ + ε the solutions x(t) and x∗(t) differ, in norm, at most Mε. Now for t > t̄ + ε, apply Lemma B.1.6 with g(t) = 0.

The integrand − p ∗T δ̇x − ṗ ∗T δx we recognize as the total derivative of − p ∗T δx with


respect to time. Now it is better to add the time dependence again:

   Δ = p∗ᵀ(T)δx(T) + [−p∗ᵀ(t)δx(t)]₀^T + εc + o(ε)
     = p∗ᵀ(0)δx(0) + εc + o(ε)
     = εc + o(ε).

Here we used that δx (0) = 0. This is because of the initial condition x (0) = x 0 .
Since c < 0 we see that Δ is negative for small enough ε. But that would mean that ū for small enough ε achieves a smaller cost than optimal. Not possible.
Hence, the assumption that u ∗ (t ) does not minimize the Hamiltonian at every
t where u ∗ (t ) is continuous, is wrong. ■

The theory of the minimum principle was developed during the 1950s in the
former Soviet Union by a group of mathematicians led by Lev Pontryagin, and
in honor of him it is called Pontryagin’s minimum principle. Actually, Pontrya-
gin followed the classical mechanics sign convention, p T f (x, u) minus L(x, u).
Hence, the principle is better known as Pontryagin’s maximum principle.
The principle assumes the existence of an optimal control u ∗ , and then guar-
antees that u ∗ minimizes the Hamiltonian at each moment in time. In practical
situations, this pointwise minimization is used to determine the optimal con-
trol, tacitly assuming an optimal control exists. Hence, one could say that the
principle provides necessary conditions for optimality. In Section 2.8, we discuss
under which conditions these conditions are sufficient as well; see also Chap-
ter 3 for the alternative approach offered by dynamic programming.

Example 2.5.2 (Simple problem). Consider the system and cost


ẋ(t) = u(t),    x(0) = x₀,    J(u) = ∫₀¹ x(t) dt.

So we have that f (x, u) = u, K (x) = 0, and L(x, u) = x. As input set we take

U = [−1, 1].

The Hamiltonian for this problem becomes

H (x, p, u) = pu + x,

and the equation for the costate hence is


ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x = −1,    p∗(1) = ∂K(x∗(T))/∂x = 0.
Clearly, this means that the costate equals

p ∗ (t ) = 1 − t .

The optimal input u ∗ (t )—assuming it exists—at each t ∈ [0, 1] minimizes the


Hamiltonian p ∗ (t ) u (t ) + x ∗ (t ). Since p ∗ (t ) = 1 − t > 0 for all t ∈ [0, 1) the value of
the optimal input is the minimal value in U,

u ∗ (t ) = −1 ∀t ∈ [0, 1].
This makes perfect sense: to minimize ∫₀¹ x(t) dt, we want x(t) to go down as
fast as possible, which, given the system dynamics ẋ (t ) = u (t ), means taking
u (t ) as small (negative) as possible. 

Example 2.5.3 (Switching inputs). We consider again the integrator system

ẋ (t ) = u (t ), x (0) = x0 , U = [−1, 1],

but now we add a final cost −½x(1) to the cost function,

   J(u) = ∫₀¹ x(t) dt − ½x(1).
In this case it is not obvious what to do with u(t) because the faster x(t) goes down, the larger the final cost −½x(1) is going to be. So possibly u(t) = −1 is no longer optimal. In fact, we will see that it is not. We have H(x, p, u) = pu + x and K(x) = −½x, and so the costate equation now is

ṗ∗(t) = −1,    p∗(1) = ∂K(x∗(1))/∂x = −½.
Hence,
p∗(t) = ½ − t.

The costate is positive for 0 ≤ t < ½ but negative for ½ < t ≤ 1. The optimal
control minimizes the Hamiltonian p ∗ (t ) u ∗ (t ) + x ∗ (t ), and, because of the sign
change in p ∗ (t ) at t = 1/2, we see that the optimal input switches sign at t = 1/2:

u∗(t) = −1   if 0 ≤ t < ½,
u∗(t) = +1   if ½ < t ≤ 1.

Assuming an optimal control exists, it must be this one. Apparently, it is now


optimal to move x (t ) down as fast as possible over the first half of the time inter-
val and then back up as fast as possible over the second half. 
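
For a quick numerical sanity check, here is a small sketch of ours (plain Python, forward-Euler discretization, with x₀ = 0 as an arbitrary choice); it compares the cost of the two constant controls with the switching control derived above, and the switching control indeed gives the smallest of the three costs.

    def cost(u_func, x0=0.0, N=100_000):
        # Forward-Euler simulation of x'(t) = u(t) on [0, 1] together with
        # the cost J(u) = int_0^1 x(t) dt - x(1)/2.
        dt, x, integral = 1.0 / N, x0, 0.0
        for k in range(N):
            integral += x * dt
            x += u_func(k * dt) * dt
        return integral - 0.5 * x

    print(cost(lambda t: -1.0))                        # u = -1 throughout
    print(cost(lambda t: +1.0))                        # u = +1 throughout
    print(cost(lambda t: -1.0 if t < 0.5 else +1.0))   # the switching control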

Example 2.5.4 (Linear system with quadratic cost). Consider the system with
cost
ẋ(t) = u(t),    x(0) = x₀,    J(u) = ∫₀¹ x²(t) + u²(t) dt.

Now we allow every u (t ) ∈ R, that is, U = R. Notice that this is the same cost func-
tion as in Example 1.5.3 (for α = 1) because u (t ) = ẋ (t ). The associated Hamilto-
nian is

H (x, p, u) = pu + x 2 + u 2 .

Since H (x, p, u) is quadratic in u, and U = R, the minimizing u is the one at


which the gradient of H (x, p, u) with respect to u is zero. This yields

u∗(t) = −½ p∗(t),    (2.17)

and then the Hamiltonian equations (2.14) become

ẋ∗(t) = −½ p∗(t),    x∗(0) = x₀,
ṗ∗(t) = −2x∗(t),    p∗(1) = 0.
This implies that p̈ ∗ (t ) = p ∗ (t ). The general solution of this second-order differ-
ential equation is

p ∗ (t ) = c1 et +c2 e−t ,
and since x∗(t) = −½ ṗ∗(t), we find that

x∗(t) = −½c₁eᵗ + ½c₂e⁻ᵗ.
The two constants c 1 , c 2 , follow uniquely from the two boundary conditions
x ∗ (0) = x0 and p ∗ (1) = 0, and it gives (verify this yourself )
x∗(t) = (x₀/(e + e⁻¹)) (e^{1−t} + e^{t−1}),
p∗(t) = (2x₀/(e + e⁻¹)) (e^{1−t} − e^{t−1}).
The optimal control u ∗ (t ) now follows from (2.17). 
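
The same two-point boundary value problem can be solved numerically. The sketch below is ours and not part of the book; it assumes SciPy is available and takes x₀ = 1 as an arbitrary initial state. It hands the Hamiltonian equations to scipy.integrate.solve_bvp and compares the result with the closed-form expressions above.

    import numpy as np
    from scipy.integrate import solve_bvp

    x0 = 1.0   # arbitrary initial state, only for this illustration

    def rhs(t, y):
        # Hamiltonian equations: x' = -p/2, p' = -2x.
        x, p = y
        return np.vstack((-0.5 * p, -2.0 * x))

    def bc(ya, yb):
        # Boundary conditions x(0) = x0 and p(1) = 0.
        return np.array([ya[0] - x0, yb[1]])

    t = np.linspace(0.0, 1.0, 50)
    sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))

    x_exact = x0 / (np.e + 1 / np.e) * (np.exp(1 - t) + np.exp(t - 1))
    p_exact = 2 * x0 / (np.e + 1 / np.e) * (np.exp(1 - t) - np.exp(t - 1))

    print(np.max(np.abs(sol.sol(t)[0] - x_exact)))   # both errors are small
    print(np.max(np.abs(sol.sol(t)[1] - p_exact)))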

Example 2.5.5 (Optimal reinvestment). Let x (t ) be the rate of production of,


say, gold of some mining company. At each moment in time a fraction u (t ) ∈
[0, 1] of the produced gold is used for reinvestment so as to increase the pro-
duction rate. This is modeled as

ẋ (t ) = α u (t ) x (t ), x (0) = x0 , u (t ) ∈ [0, 1],


where α is some positive parameter that models the success of investment.
Since we reinvest u(t)x(t), the net production rate is (1 − u(t))x(t). After T units of time we want the net total production ∫₀^T (1 − u(t))x(t) dt to be as large as possible. In our setup (with minimization), it means that we want the quantity with
the opposite sign
J(u) := ∫₀^T (u(t) − 1) x(t) dt

to be as small as possible. The Hamiltonian in this case is

H (x, p, u) = pαux + (u − 1)x = ux(1 + αp) − x, (2.18)

and the Hamiltonian equations become

ẋ (t ) = α u (t ) x (t ), x (0) = x0 ,
ṗ (t ) = (1 − u (t )) − p (t )α u (t ), p (T ) = 0.

These differential equations are, in their present form, still hard to solve. How-
ever, the Hamiltonian (2.18) is linear in u, so the minimizer u of the Hamilto-
nian (2.18) depends solely on the sign of x(1 + αp). In fact, since the production
rate x (t ) is inherently positive (because x (0) = x 0 > 0 and ẋ (t ) = α u (t ) x (t ) ≥ 0),
the Hamiltonian at each moment in time is minimal for

0 if 1 + α p ∗ (t ) > 0,
u ∗ (t ) =
1 if 1 + α p ∗ (t ) < 0.

The value of the costate p ∗ (t ) where this u ∗ (t ) switches is p ∗ (t ) = −1/α, see


Fig. 2.2 (left). Now at t = T we have p ∗ (T ) = 0, so near the final time T we have
u ∗ (t ) = 0 (invest nothing, sell all), and then the Hamiltonian dynamics reduces
to ẋ ∗ (t ) = 0 and

ṗ ∗ (t ) = 1 near t = T , and p ∗ (T ) = 0.

That is, p ∗ (t ) = t − T near t = T , see Fig. 2.2. Solving backwards in time, starting
at t = T , we see that the costate reduces linearly, until at time

t s := T − 1/α

it reaches the level p∗(t_s) = −1/α < 0, at which point u∗(t) switches. Since
ṗ (t ) > 0 for every input, the value of p ∗ (t ) is less than −1/α for all t < ts , which
implies that u ∗ (t ) = 1 all t < t s . For this case, the Hamiltonian dynamics simplify
to

ẋ ∗ (t ) = α x ∗ (t ), ṗ ∗ (t ) = −α p ∗ (t ) if t < t s .

Both x ∗ (t ) and p ∗ (t ) now have exponential solutions. The combination of


before-and-after-switch is shown in Fig. 2.2. This settles x ∗ (t ), p ∗ (t ), u ∗ (t ) for
all t ∈ [0, T ].
Notice that if t s < 0 then on the time window [0, T ] no switch takes place.
It is then optimal to invest nothing and sell everything throughout [0, T ]. This
happens if α < 1/T , and the interpretation is that α is then too small to benefit
from investment. If, on the other hand, α > 1/T then t s > 0 and then investment
is beneficial and the above shows that it is optimal to first invest everything, and
in the final 1/α time units to sell everything. Of course, this model is a simplifi-
cation of reality. 
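
The switching policy is easy to check by simulation. In the sketch below (ours; the values α = 0.5, T = 10 and x₀ = 1 are arbitrary choices) the invest-then-sell policy with switching time t_s = T − 1/α is compared with the two constant policies.

    alpha, T, x0 = 0.5, 10.0, 1.0        # arbitrary illustrative values
    ts = max(T - 1.0 / alpha, 0.0)       # switching time from the analysis above

    def net_production(u_func, N=100_000):
        # Forward-Euler simulation of x' = alpha*u*x and of int_0^T (1-u)*x dt.
        dt, x, total = T / N, x0, 0.0
        for k in range(N):
            u = u_func(k * dt)
            total += (1.0 - u) * x * dt
            x += alpha * u * x * dt
        return total

    print(net_production(lambda t: 1.0 if t < ts else 0.0))  # invest, then sell
    print(net_production(lambda t: 0.0))                     # never invest
    print(net_production(lambda t: 1.0))                     # always invest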

FIGURE 2.2: Optimal costate p∗(t), optimal input u∗(t) and optimal state x∗(t). See Example 2.5.5.

The Hamiltonian was derived from the Beltrami identity (see Eqn. (2.10)).
Hence, we could expect that H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) is constant as a function of
time. For the unconstrained inputs (U = Rm ) and smooth enough solutions,
this may easily be verified directly from the first-order equations for optimal-
ity expressed in Lemma 2.3.1. Indeed, if ( x ∗ , p ∗ , u ∗ ) are a smooth triple satisfy-
ing (2.11), then a direct computation yields (and for the sake of exposition, we
momentarily drop here the arguments of H and other functions)

d/dt H(x∗(t), p∗(t), u∗(t)) = (∂H/∂xᵀ) ẋ∗ + (∂H/∂pᵀ) ṗ∗ + (∂H/∂uᵀ) u̇∗
    = (∂H/∂xᵀ)(∂H/∂p) + (∂H/∂pᵀ)(−∂H/∂x) + (∂H/∂uᵀ) u̇∗
    = (∂H/∂uᵀ) u̇∗                                  (2.19)
    = 0.

The final equality follows from (2.11c). Actually, the constancy of the Hamil-
tonian H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) also holds for restricted input sets U (such as U =
[0, 1], etc.). This is remarkable because in such cases the input quite often is not
even continuous.

FIGURE 2.3: Suppose a function h : R → R is not continuous at some t₀, but the limits from the left and from the right at t₀ are the same: lim_{t↑t₀} h(t) = lim_{s↓t₀} h(s). Then the discontinuity of h at t₀ is said to be removable. See the proof of Theorem 2.5.6.

Theorem 2.5.6 (Constancy of the Hamiltonian). Let all the assumptions


of Theorem 2.5.1 be satisfied. Suppose, in addition, that ∂ f (x, u)/∂u and
∂L(x, u)/∂u are continuous, and that u ∗ is C 1 at all but finitely many t ∈ [0, T ].
Then a constant H∗ exists such that

H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) = H∗

at every t where u ∗ (t ) is continuous.

Proof. Clearly, H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) as a function of time is continuous wher-


ever u ∗ (t ) is continuous. So, the Hamiltonian has finitely many discontinu-
ities because u ∗ (t ) has finitely many discontinuities. We first prove that all dis-
continuities in H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) are removable. This notion is illustrated
in Fig. 2.3. It means that by suitably redefining the function at the discon-
tinuities the function becomes continuous everywhere. Let t 0 ∈ (0, T ) be a
point of discontinuity of u ∗ (t ), and realize that x ∗ (t ) and p ∗ (t ) are continu-
ous at t 0 . Because of the minimality property of the Hamiltonian, we have that
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) ≤ H ( x ∗ (t ), p ∗ (t ), u ∗ (s)) at every t where u ∗ (t ) is continu-
ous, and at every s ∈ [0, T ]. This also holds in the limit t ↑ t 0 and s ↓ t 0 :

lim_{t↑t₀} H(x∗(t), p∗(t), u∗(t)) ≤ lim_{t↑t₀, s↓t₀} H(x∗(t), p∗(t), u∗(s))    (2.20)
                                  = lim_{s↓t₀} H(x∗(s), p∗(s), u∗(s)).         (2.21)

The final equality is because x ∗ (t ), p ∗ (t ) are continuous functions, also at


t 0 . Again by the minimality property, the Hamiltonian in (2.21) satisfies
H ( x ∗ (s), p ∗ (s), u ∗ (s)) ≤ H ( x ∗ (s), p ∗ (s), u ∗ (τ)) at every s where u ∗ (s) is contin-
uous, and at every τ ∈ [0, T ]. Thus, we also have

lim_{s↓t₀} H(x∗(s), p∗(s), u∗(s)) ≤ lim_{s↓t₀, τ↑t₀} H(x∗(s), p∗(s), u∗(τ))    (2.22)
                                  = lim_{τ↑t₀} H(x∗(τ), p∗(τ), u∗(τ)).         (2.23)

The last identity is again by continuity of x ∗ , p ∗ at t 0 . Finally, notice that (2.23)


equals the first limit in (2.20). Therefore, all the above limits in (2.20)–(2.23)

are the same. In particular, the limits in (2.21) and (2.23) are the same. It
means that the possible discontinuity at t 0 is removable. This shows that
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) equals some continuous function H∗ (t ) at all but finitely
many t .
Now let t be a time at which u ∗ is C 1 . Since x ∗ and p ∗ are C 1 at t , the equal-
ity (2.19) gives

dH ( x ∗ (t ), p ∗ (t ), u ∗ (t )) ∂H ( x ∗ (t ), p ∗ (t ), u ∗ (t ))
q := = u̇ ∗ (t ). (2.24)
dt ∂u T
This q is zero by the fact that u ∗ (t ) minimizes the Hamiltonian. To see this more
clearly, realize that this q also enters the following Taylor series:

H(x∗(t), p∗(t), u∗(t + ε)) = H(x∗(t), p∗(t), u∗(t)) + qε + o(ε).

Since ε ↦ H(x∗(t), p∗(t), u∗(t + ε)) is minimized at ε = 0 (because u∗(t) minimizes the Hamiltonian), the above Taylor series reveals that q must indeed be zero. Hence, (2.24) is zero at every t where u∗ is C¹.
Summarizing: H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) equals some continuous function H∗ (t )
at all but finitely many t , and it has zero derivative at all but finitely many
t . Hence, also the continuous function H∗ (t ) has zero derivative at all but
finitely many t . But that means that H∗ (t ) is constant, H∗ (t ) = H∗ . The func-
tion H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) is continuous at every t where u ∗ (t ) is continuous, so
it equals this constant H∗ at all such t . ■

We note that Theorem 2.5.6 can be proved under weaker smoothness


assumptions. However, the above assumptions hold in all practical cases, and
avoid further technicalities in the proof.
The following example illustrates the constancy property of the Hamiltonian
for a case where the optimal input is not even continuous.

Example 2.5.7 (Example 2.5.3 continued). In Example 2.5.3, we considered ẋ(t) = u(t) with initial condition x(0) = x₀ and cost J(u) = ∫₀¹ x(t) dt − ½x(1). We found that the optimal costate trajectory equals

   p∗(t) = ½ − t,

and that the optimal input switches halfway,

   u∗(t) = −1 for 0 ≤ t < ½,    u∗(t) = +1 for ½ < t ≤ 1.

Therefore, the description of the optimal state trajectory also switches halfway: from ẋ(t) = u∗(t) it follows that

   x∗(t) = x₀ − t for t ≤ ½,    x∗(t) = x₀ − 1 + t for t ≥ ½.

Based on this, it seems unlikely that the Hamiltonian along the optimal solution is constant, but realize that p∗(t)u∗(t) equals

   p∗(t)u∗(t) = −(½ − t) for t < ½,    p∗(t)u∗(t) = (½ − t) for t ≥ ½,

and, therefore, that the Hamiltonian is constant as a function of time,

   H(x∗(t), p∗(t), u∗(t)) = p∗(t)u∗(t) + x∗(t)
                          = −(½ − t) + (x₀ − t)      if t < 1/2
                          = (½ − t) + (x₀ − 1 + t)   if t ≥ 1/2
                          = x₀ − ½   for all t ∈ [0, 1]. 
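
The constancy can also be checked numerically. The following sketch is ours (it assumes NumPy, and x₀ = 2 is an arbitrary choice); it evaluates H(x∗(t), p∗(t), u∗(t)) on a time grid and all entries come out equal to x₀ − ½.

    import numpy as np

    x0 = 2.0
    t = np.linspace(0.0, 1.0, 11)

    p = 0.5 - t                                   # optimal costate
    u = np.where(t < 0.5, -1.0, 1.0)              # switching optimal input
    x = np.where(t < 0.5, x0 - t, x0 - 1.0 + t)   # optimal state

    H = p * u + x
    print(H)   # every entry equals x0 - 0.5, despite the jump in u at t = 1/2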

2.6 Optimal Control with Final Constraints

So far in this chapter the final state x (T ) was not constrained. In quite a few
applications, however, there are constraints on the final state x (T ). (Note fur-
thermore that in Chapter 1, calculus of variations, we actually started with a
fully constrained x (T ).) In the car parking application, for instance, we obvi-
ously want the speed of the car to equal zero at the final time. Let r denote the
number of components of the final state that are constrained. Without loss of
generality, we assume these to be the first r components. So consider the sys-
tem with initial and final conditions

ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 , x i (T ) = x̂i , i = 1, . . . , r. (2.25)

Keep in mind that no conditions are imposed on the remaining final state com-
ponents x r +1 (T ), . . . , x n (T ). As before we take a cost of the form
J(u) = ∫₀^T L(x(t), u(t)) dt + K(x(T)).    (2.26)

Lemma 2.3.1 (the first-order conditions for U = Rm ) can be generalized to this


case as follows. In the proof of this lemma, the conditions on the final costate
p∗(T) = ∂K(x∗(T))/∂x

were derived from the free endpoint condition (1.42), but in Proposition 1.5.1
we saw that these conditions are absent if the final state is constrained. With
that in mind, it will be no surprise that fixing the first r components of the final
state, x i (T ), i = 1, . . . , r , implies that the conditions on the corresponding first r
components of the final costate are absent, i.e., only the remaining components
of p ∗ (T ) are constrained:

p∗ᵢ(T) = ∂K(x∗(T))/∂xᵢ,    i = r + 1, . . . , n.

That is indeed the case. However, there is a catch: the first-order conditions were derived using a perturbation of the solution x∗, but if we have constraints on both the initial state and the final state, then it may happen that nonzero perturbations
do not exist. An example is

ẋ (t ) = u 2 (t ), x (0) = 0, x (1) = 0.

Clearly, in this case, there is only one feasible control and one feasible state:
the zero function. (This system will be the starting point of Example 2.6.2.) We
have seen similar difficulties in the calculus of variations problems subject to
integral constraints, with its “normal” and “abnormal” Euler-Lagrange equation
(have a look at § 1.7, in particular, Example 1.7.2). Also now we make a dis-
tinction between a normal and an abnormal case, but the proof of the resulting
theorem is involved, and it would take too long to explain the details here. The
interested reader might want to consult the excellent book (Liberzon, 2012). We
just provide the solution. It involves the modified Hamiltonian defined as

Hλ (x, p, u) = p T f (x, u) + λL(x, u).

It is the Hamiltonian but with an extra parameter λ, and this parameter is either
zero or one,

λ ∈ {0, 1}.

Observe that H1 (x, p, u) is the “normal” Hamiltonian, while H0 (x, p, u) com-


pletely neglects the running cost L(x, u). Optimal control problems where we
need H0 are referred to as “abnormal” problems, indicating that they are not
likely to happen. With this modified Hamiltonian, the minimum principle (The-
orem 2.5.1) generalizes as follows.

Theorem 2.6.1 (Minimum principle for constrained final state). Con-


sider (2.25) with standard cost (2.26), and assume that f (x, u) and ∂ f (x, u)/∂x
and L(x, u) and ∂L(x, u)/∂x are all continuous in x and u, and that K (x) and
∂K (x)/∂x are continuous in x.
Suppose u ∗ : [0, T ] → U is a solution of the optimal control problem, and
assume it is piecewise continuous, and let x ∗ : [0, T ] → Rn be the resulting opti-

mal state. Then there is a function p∗ : [0, T] → Rⁿ and a constant λ ∈ {0, 1} such that (λ, p∗(t)) ≠ (0, 0) for all t ∈ [0, T], and

ẋ∗(t) = ∂Hλ(x∗(t), p∗(t), u∗(t))/∂p,    x∗(0) = x₀,    x∗ᵢ(T) = x̂ᵢ, i = 1, . . . , r,    (2.27a)
ṗ∗(t) = −∂Hλ(x∗(t), p∗(t), u∗(t))/∂x,    p∗ᵢ(T) = ∂K(x∗(T))/∂xᵢ, i = r + 1, . . . , n,    (2.27b)

and along the solution x ∗ (t ), p ∗ (t ), the input u ∗ (t ) minimizes the modified


Hamiltonian,

Hλ(x∗(t), p∗(t), u∗(t)) = min_{u∈U} Hλ(x∗(t), p∗(t), u),    (2.28)

at every t ∈ [0, T ] where u ∗ (t ) is continuous. 

Example 2.6.2 (Singular optimal control—an abnormal case). Consider the


system with given initial and final states

ẋ (t ) = u 2 (t ), x (0) = 0, x (1) = 0,

and with U = R and cost


J(u) = ∫₀¹ u(t) dt.

As mentioned before, the only feasible control is the zero function. So the min-
imal cost is 0, and x ∗ (t ) = u ∗ (t ) = 0 for all time. The modified Hamiltonian is

Hλ (x, p, u) = pu 2 + λu, λ ∈ {0, 1}.

If we try to solve the normal Hamiltonian equations (2.27a, 2.28) (so for λ =
1), we find that the costate is constant and that u ∗ (t ) at every t minimizes
p ∗ (t ) u 2 (t )+ u (t ). But the true optimal control is u ∗ (t ) = 0 and this does not min-
imize p ∗ (t ) u 2 (t ) + u (t ).
If we take λ = 0 (the abnormal case), then the Hamiltonian simplifies to
H0 (x, p, u) = pu 2 . This again implies that the costate is constant, p ∗ (t ) = p̂. The
input u (t ) that minimizes the Hamiltonian p̂ u 2 (t ) now is either not defined (if
p̂ < 0), or is non-unique (if p̂ = 0), or equals zero (if p̂ > 0). This last case (the
zero input) is the true optimal input. 

One more abnormal case is discussed in Exercise 2.15. All other examples in
this chapter are normal.

Example 2.6.3 (Shortest path—a normal case). In the previous chapter (Exam-
ple 1.1.4 and Example 1.2.5), we solved the (trivial) shortest path problem by

formulating it as an example of the simplest problem in the calculus of varia-


tions. We now formulate it as an optimal control problem with final condition.
Let x : [0, T ] → R be a function through the points x (0) = a and x (T ) = b, and
assume T > 0. The length of the curve of the function is
J̃(x) := ∫₀^T √(1 + ẋ²(t)) dt.

We want to minimize J˜( x ). This can be seen as an optimal control problem for
the system

ẋ (t ) = u (t ), x (0) = x0 , x (T ) = xT ,

with cost
J(u) = ∫₀^T √(1 + u²(t)) dt.

Its normal Hamiltonian is



H₁(x, p, u) = pu + √(1 + u²).

If we apply Theorem 2.6.1, we find that p ∗ (t ) is constant. We denote this con-


stant as p̂. Since u ∗ (t ) minimizes the Hamiltonian, we necessarily have that
0 = ∂H(x∗(t), p̂, u∗(t))/∂u = p̂ + u∗(t)/√(1 + u∗²(t)),    (2.29)

whenever u ∗ (t ) is finite. After some rearrangements, this yields the following


candidates for the optimal input (verify this yourself ):


u∗(t) = −∞                  if p̂ ≥ 1,
u∗(t) = −p̂/√(1 − p̂²)       if −1 < p̂ < 1,
u∗(t) = +∞                  if p̂ ≤ −1.

We can strike off the first and the last candidates, because they clearly fail to
achieve the final condition x (T ) = x T . The second candidate says that u ∗ (t ) is
some constant. But for a constant input û := u ∗ (t ), the solution x (t ) of the dif-
ferential equation is x (t ) = ût + x 0 , which is a straight line. From the initial and
final conditions, it follows that û = (x T − x 0 )/T . Hence, as expected,
x∗(t) = x₀ + ((x_T − x₀)/T) t,    u∗(t) = (x_T − x₀)/T.
The constant costate then follows from (2.29),
p∗(t) = p̂ = −u∗(t)/√(1 + u∗²(t)) = −(x_T − x₀)/√(T² + (x_T − x₀)²).

It is interesting to compare this with the optimal cost (the minimal length of the
curve)

J(u∗) = √(T² + (x_T − x₀)²).

We see that p∗(0) equals dJ(u∗)/dx₀. That is, p∗(0) expresses how strongly the optimal cost changes if x₀ changes. We return to this sensitivity property of the costate in § 3.5. 
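
This sensitivity interpretation is easy to verify numerically. In the sketch below (ours; T = 2, x_T = 1 and x₀ = 0.3 are arbitrary numbers) a central finite difference of the optimal cost with respect to x₀ is compared with the value of p∗(0) found above.

    import numpy as np

    T, xT = 2.0, 1.0
    J = lambda x0: np.sqrt(T**2 + (xT - x0)**2)     # optimal cost as a function of x0

    x0, h = 0.3, 1e-6
    dJ_dx0 = (J(x0 + h) - J(x0 - h)) / (2 * h)      # central finite difference
    p0 = -(xT - x0) / np.sqrt(T**2 + (xT - x0)**2)  # p*(0) from the formula above

    print(dJ_dx0, p0)   # the two numbers agree up to the finite-difference error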

Example 2.6.4 (Integrator system with fixed initial and final states). Consider
the system with bounded derivative,

ẋ (t ) = u (t ), u (t ) ∈ [−1, 1],

and with cost


J(u) = ∫₀^T x(t) dt.

In Example 2.5.2, we analyzed the same system and cost (for T = 1), but now we
fix both the initial and final states,

x (0) = x (T ) = 0.

To minimize the cost we want x (t ) as small (negative) as possible, yet it needs


to start at zero, x (0) = 0, and needs to end at zero, x (T ) = 0. The normal Hamil-
tonian is

H1 (x, p, u) = pu + x,

and therefore the costate equations become

ṗ ∗ (t ) = −1.

Notice that since the state is fixed at the final time, x (T ) = 0, there is no condi-
tion on the costate at the final time. So all we know, for now, about the costate
is that its derivative is −1, i.e.,

   p∗(t) = c − t

for some as yet unknown constant c. Given this p∗(t) = c − t, the minimizer u∗(t) of the Hamiltonian is

   u∗(t) = −1 if t < c,    u∗(t) = +1 if t > c.

This function switches sign (from negative to positive) at t = c, and, as a result, the state x∗(t) is piecewise linear. First, it goes down, and from t = c on it goes up:

   x∗(t) = −t for t ≤ c,    x∗(t) = t − 2c for t ≥ c.
It will be clear that the only value of c for which x ∗ (T ) is zero, is

c = T /2.

This completely settles the optimal control problem. In the first half, [0, T /2],
we have ẋ ∗ (t ) = −1 and, in the second half, [T /2, T ], we have ẋ ∗ (t ) = +1. The
optimal cost is J(u∗) = ∫₀^T x∗(t) dt = −T²/4. 
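
A short simulation of ours (T = 3 is an arbitrary choice) confirms that the bang-bang input returns the state to zero at the final time and that the cost equals −T²/4.

    import numpy as np

    T, N = 3.0, 100_000
    dt = T / N
    t = np.arange(N) * dt
    u = np.where(t < T / 2, -1.0, 1.0)   # the optimal input found above
    x = np.cumsum(u) * dt                # forward Euler for x' = u, x(0) = 0

    print(x[-1])              # approximately 0: the final condition x(T) = 0 holds
    print(np.sum(x) * dt)     # approximately -T**2/4, the optimal cost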

2.7 Free Final Time

So far, the final time T in the optimal control problem was fixed. Now we extend
the optimal control problem by minimizing the cost over all inputs as well as
over all final times T ≥ 0. As before we assume a cost of the form
J_T(u) := ∫₀^T L(x(t), u(t)) dt + K(x(T)).

Since we now have one extra degree of freedom, we expect that the minimum
principle still holds but with one extra condition. This turns out to be true, and
the extra condition is quite elegant:

Theorem 2.7.1 (Minimum principle with free final time). Consider the sys-
tem (2.25) with cost (2.26), and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u)
and ∂L(x, u)/∂x are continuous in x and u, and that K (x) and ∂K (x)/∂x are con-
tinuous in x.
Suppose ( u ∗ , T∗ ) is a solution of the optimal control problem with free final
time, and that u ∗ is piecewise continuous on [0, T∗ ], and that 0 ≤ T∗ < ∞. Then
all conditions of Theorem 2.6.1 hold (with T = T∗ ), and, in addition,

Hλ ( x ∗ (T∗ ), p ∗ (T∗ ), u ∗ (T∗ )) = 0. (2.30)

Proof. We prove it only for the normal case (λ = 1). If the pair ( u ∗ , T∗ ) is opti-
mal, then u ∗ is also optimal for the fixed final time T = T∗ ; hence, all conditions
of Theorem 2.6.1 hold.
Since u ∗ is assumed to be piecewise continuous, the limit u T∗ := limt ↑T∗ u ∗ (t )
exists. The given u ∗ is defined on [0, T∗ ], and we now extend its definition by
letting u ∗ (t ) = u T∗ for all t ≥ T∗ . That way u ∗ is continuous at T = T∗ , and the

cost becomes differentiable with respect to T at T = T∗ . By the fact that T = T∗


is time-optimal, we have that

dJ(u∗, T∗)/dT = 0.
This derivative equals

dJ(u∗, T∗)/dT = (∂K(x∗(T∗))/∂xᵀ) ẋ∗(T∗) + L(x∗(T∗), u∗(T∗))
              = p∗ᵀ(T∗) f(x∗(T∗), u∗(T∗)) + L(x∗(T∗), u∗(T∗))
              = H₁(x∗(T∗), p∗(T∗), u∗(T∗)).                                   ■

The remarks about λ made in the previous section also apply to this situa-
tion. Also the constancy property of the Hamiltonian (Theorem 2.5.6) remains.
This is interesting because it shows that for final time problems the Hamiltonian
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) is actually zero for all time!
An important special case is when L(x, u) = 1 and K(x) = 0. Then the cost function equals J_T(u) = ∫₀^T 1 dt = T, that is, the task of the control input is to
realize given boundary conditions in minimal time. This only makes sense if
we have both initial and final conditions. Such problems are known as time-
optimal control problems. A classic time-optimal control problem is the prob-
lem of Zermelo.

Example 2.7.2 (Zermelo). Consider a boat on a river as depicted in Fig. 2.4.


The problem of Zermelo is to steer the boat in minimal time from a given point
on one side of the river to a given point on the other side. The coordinates of
the boat are denoted by x 1 , x 2 . The boat starts at ( x 1 (0), x 2 (0)) = (0, 0), and the
destination point is ( x 1 (T ), x 2 (T )) = (a, b) for some given a, b, where b is the
width of the river. The complicating factor is the flow velocity of the water in
the river. We assume the flow velocity to be parallel to the river banks, and to be
of the form w(x 2 ), where x 2 is the distance to one of the river banks, see Fig. 2.4.
We assume that the speed of the boat with respect to the water is constant and
equal to 1. The control u of the boat is the angle between the boat’s principal
axis and the x 1 -axis, see Fig. 2.4. This leads to the following equations of motion:

ẋ 1 (t ) = cos( u (t )) + w( x 2 (t )), x 1 (0) = 0, x 1 (T ) = a,


ẋ 2 (t ) = sin( u (t )), x 2 (0) = 0, x 2 (T ) = b. (2.31)

As we want to cross the river in minimal time, the cost to be minimized is


J_T(u) = ∫₀^T 1 dt = T.

FIGURE 2.4: The problem of Zermelo. See Example 2.7.2.

FIGURE 2.5: The problem of Zermelo. We assume that the speed of the boat with respect to the water is v = 1, that the flow velocity is w(x₂) = 0.5, and that (a, b) = (−1, 2). Then it is optimal to take u constant such that (cos(u), sin(u)) = (−0.8, +0.6) (shown in red), for then the sum of (cos(u), sin(u)) and (0.5, 0) (shown in blue) is (−0.3, 0.6) (shown in yellow), and this direction brings the boat to (a, b) = (−1, 2) as required. It takes T = b/0.6 = 3⅓ units of time. See Example 2.7.2.

FIGURE 2.6: The problem of Zermelo. We assume that the speed of the boat
with respect to the water is v = 1 and that the flow velocity is w(x 2 ) =
x 2 (1 − x 2 /b) with (a, b) = (−1, 2). The optimal trajectory x ∗1 (t ), x ∗2 (t ) as
shown here was computed by iterating over u 0 in (2.36). The optimal value
is u 0 = 2.5937, and the optimal (minimal) time is T = 2.7570. The optimal
angle u ∗ (t ) of the boat does not vary much with time. See Example 2.7.2.

With this cost, the normal Hamiltonian is given by

H1 (x, p, u) = p 1 cos(u) + p 1 w(x 2 ) + p 2 sin(u) + 1.

The optimal control u∗ minimizes the Hamiltonian, so we need ∂H₁(x∗, p∗, u∗)/∂u to be equal to zero for all time. This gives

− p ∗1 sin( u ∗ ) + p ∗2 cos( u ∗ ) = 0. (2.32)

(To avoid clutter, we drop the argument t in most of the equations.) Also, the
free final time condition (2.30) has to hold, and this means that

p ∗1 cos( u ∗ ) + p ∗1 w( x ∗2 ) + p ∗2 sin( u ∗ ) + 1 = 0 (2.33)

for all time. Equations (2.32) and (2.33) are two linear equations in p∗₁, p∗₂, so they are easy to solve:

   (p∗₁, p∗₂) = (−1/(cos(u∗)w(x∗₂) + 1)) · (cos(u∗), sin(u∗)).    (2.34)

Incidentally, since we assumed the speed of the boat to be 1, we can not


expect to be able to reach every destination point (a, b) if the flow velocity
w(x 2 ) exceeds 1 everywhere. We assume from now on that |w(x 2 )| < 1 for every
x 2 ∈ [0, b] even though this assumption is stronger than needed for the problem
to have a solution. With this assumption, the denominator in (2.34) is guaranteed to be nonzero.
We have not yet exploited the costate equations. From the Hamiltonian, we
readily get the costate equations

ṗ∗₁ = 0,
ṗ∗₂ = −p∗₁ w′(x∗₂),    (2.35)

in which w′ is the derivative of w. Notice that we do not have final conditions


on the costate. Interestingly, ṗ ∗1 is zero. Using the formula for p ∗1 in (2.34), we
find that this derivative is

ṗ∗₁ = sin(u∗)(u̇∗ + cos²(u∗) w′(x∗₂)) / (cos(u∗)w(x∗₂) + 1)².

(Verify this yourself.) This needs to be zero for all time, so either sin(u∗) is zero or u̇∗ = −cos²(u∗) w′(x∗₂). Likewise it can be shown that the costate equation for p∗₂ holds iff u̇∗ = −cos²(u∗) w′(x∗₂) or cos(u∗) + w(x∗₂) = 0. Since sin(u∗) and cos(u∗) + w(x∗₂) cannot be zero simultaneously (because we assumed |w(x₂)| < 1), we conclude that both costate equations hold iff

   u̇∗ = −cos²(u∗) w′(x∗₂).

Summarizing, we have the following three coupled differential equations:

ẋ ∗1 (t ) = cos( u ∗ (t )) + w( x ∗2 (t )), x ∗1 (0) = 0, (2.36a)


ẋ ∗2 (t ) = sin( u ∗ (t )), x ∗2 (0) = 0, (2.36b)

u̇∗(t) = −cos²(u∗(t)) w′(x∗₂(t)),    u∗(0) = u₀,    (2.36c)

and its solution by construction makes the costate defined in (2.34) satisfy the
Hamiltonian equations and makes the Hamiltonian equal to zero for all time.
The game is now to determine the initial condition u 0 of the control for which
( x 1 (T ), x 2 (T )) equals (a, b) for some T > 0. Without further assumptions on the
flow velocity w(x 2 ), there does not seem to be an easy answer to this problem.
For the special case of a constant flow velocity, w(x 2 ) = w 0 , however, we see that
u ∗ (t ) is constant, and then ( x ∗1 (t ), x ∗2 (t )) is a straight line. A particular instance
is shown in Fig. 2.5. A more realistic scenario is when the flow velocity w(x 2 ) is
small near the banks. One such example is depicted in Fig. 2.6. The solution
shown in this figure was determined numerically (by iterating over u 0 ). 
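
The iteration over u₀ mentioned above can be organized as a simple shooting method. The sketch below is ours and not the book's code; it assumes NumPy, uses the flow profile w(x₂) = x₂(1 − x₂/b) of Fig. 2.6, and the step size and bisection bracket are ad hoc choices. It integrates (2.36) until the far bank is reached and bisects on u₀ until x₁(T) hits a = −1, reproducing u₀ ≈ 2.59 and T ≈ 2.76 from the caption of Fig. 2.6.

    import numpy as np

    a, b = -1.0, 2.0
    w  = lambda x2: x2 * (1.0 - x2 / b)     # flow profile of Fig. 2.6
    dw = lambda x2: 1.0 - 2.0 * x2 / b      # its derivative w'(x2)

    def shoot(u0, dt=1e-4):
        # Integrate (2.36) with forward Euler until the far bank x2 = b is reached;
        # return the crossing point x1(T) and the crossing time T.
        x1, x2, u, t = 0.0, 0.0, u0, 0.0
        while x2 < b:
            x1 += (np.cos(u) + w(x2)) * dt
            x2 += np.sin(u) * dt
            u  += -np.cos(u) ** 2 * dw(x2) * dt
            t  += dt
        return x1, t

    # Bisection on u0: a larger u0 steers the boat further upstream (smaller x1(T)).
    lo, hi = np.pi / 2, 3.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        x1T, T = shoot(mid)
        if x1T < a:
            hi = mid        # overshot upstream: decrease u0
        else:
            lo = mid        # not far enough upstream: increase u0

    print(mid, T)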

Example 2.7.3 (Minimal time car parking problem). This is an elegant and
classic application. We want to steer a car into a parking spot, and we want to
do it in minimal time. To keep things manageable, we assume that we can steer
the car in one dimension only (like a cart on a rail). The position of the car is
denoted as x 1 and its speed as x 2 . The acceleration u is bounded, specifically,
u (t ) ∈ [−1, 1] for all t . The equations thus are
ẋ 1 (t ) = x 2 (t ), x 1 (0) = x01 ,
ẋ 2 (t ) = u (t ), x 2 (0) = x02 , u (t ) ∈ [−1, 1].
The parking spot we take to be x 1 = 0, and the time we reach the parking spot
we denote by T , and, of course, at that moment our speed should become zero.
So the final conditions are

x 1 (T ) = 0, x 2 (T ) = 0.
We want to achieve this in minimal time, thus we take as cost J_T(u) = ∫₀^T 1 dt.
The normal Hamiltonian for this problem is

H1 (x, p, u) = p 1 x 2 + p 2 u + 1.

From the Hamiltonian, the costate equations follow as

ṗ 1 (t ) = 0,
ṗ 2 (t ) = − p 1 (t ).
Since both components of the final state x (T ) are fixed, the final conditions on
both components of the costate are absent. Therefore, in principle, every con-
stant p 1 (t ) = −a is allowed and, consequently, every linear function

p 2 (t ) = at + b.

We can not have a = b = 0 because that contradicts the fact that the Hamiltonian
p 1 (t ) x 2 (t ) + p 2 (t ) u (t ) + 1 is zero along optimal solutions (Theorem 2.7.1). As a
result, the second costate entry, p 2 (t ), is not the zero function. This, in turn,
implies that p 2 (t ) switches sign at most once. Why is this important? Well, the
optimal u ∗ (t ) minimizes the Hamiltonian, p 1 (t ) x 2 (t ) + p 2 (t ) u (t ) + 1, and since
u ∗ (t ) ∈ [−1, 1] this yields
u ∗ (t ) = − sgn( p 2 (t )).
This is well defined because p 2 (t ) is nontrivial. In fact, as p 2 (t ) switches sign at
most once, also

u ∗ (t ) switches sign at most once.


Let t s be the time of switching. Then the input for t > t s by definition does
not switch any more and so is either +1 throughout or −1 throughout. Now for
u (t ) = +1, the system equations become ẋ 2 (t ) = 1, ẋ 1 (t ) = x 2 (t ), so x 2 (t ) = t + c
and x₁(t) = ½(t + c)² + d = ½x₂²(t) + d. Hence, the trajectories (x₁(t), x₂(t)) are
shifted parabolas, shown here in red:

(The arrows indicate the direction of the trajectory as time increases.) Likewise,
if u (t ) = −1, then all possible ( x 1 (t ), x 2 (t )) are the shifted “reversed” parabolas,
shown here in blue:

Since on (t s , T ] the input does not change, and since we demand that x (T ) =
(0, 0), it must be that on (t s , T ] the state either follows this red or blue parabola:

After all, these two are the only two trajectories that end up at the desired final
state x (T ) = (0, 0). Before the moment of switching, the input u (t ) had the oppo-
site sign. For instance, if after the switch we have u (t ) = +1 (the red trajectory),
then before the switch we have u (t ) = −1, i.e., any of the blue parabolas. These
have to end up at the above red parabola at t = t s . Inspection shows that the
possible trajectories are any of these:
(Figure: the (x₁, x₂) phase plane, split by the thick switching parabolas into a region where u∗(t) = −1 and a region where u∗(t) = +1.)

This solves the problem for every initial state ( x 1 (0), x 2 (0)). If before the switch
the trajectory follows a blue parabola, then, when it reaches the thick red
parabola, the input switches sign, and the trajectory continues along the thick
red parabola, ending up at (0, 0). Likewise, if it first follows a red parabola then,
when it reaches the thick blue parabola, the input switches sign, and the trajec-
tory continues along the thick blue parabola, ending up at (0, 0). 
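
The thick parabolas together form the switching curve x₁ = −½ x₂|x₂|, and the discussion above amounts to a feedback law: apply u = −1 when the state lies to the right of this curve and u = +1 when it lies to the left. The sketch below (ours, plain Python; step size and stopping tolerance are arbitrary) simulates this feedback and reports the parking time.

    def park(x1, x2, dt=1e-4, t_max=20.0):
        # Bang-bang feedback based on the switching curve x1 = -x2*|x2|/2.
        t = 0.0
        while (abs(x1) > 1e-3 or abs(x2) > 1e-3) and t < t_max:
            curve = -0.5 * x2 * abs(x2)
            u = -1.0 if x1 > curve or (x1 == curve and x2 > 0.0) else 1.0
            x1 += x2 * dt
            x2 += u * dt
            t += dt
        return t

    print(park(1.0, 0.0))    # about 2 time units, with the switch near t = 1
    print(park(-1.0, 0.0))   # the mirrored manoeuvre takes the same time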

2.8 Convexity and the Minimum Principle

The minimum principle assumes the existence of an optimal control, and then
derives some conditions for it: (2.14) and (2.15). These conditions are necessary
for optimality, but in general not sufficient (see Exercise 2.16). If, however, the
problem has certain convexity properties then the necessary conditions are suf-
ficient. That is what the following theorem is about. It requires some knowledge
of convex sets and functions as discussed in Appendix A.7.

Theorem 2.8.1 (Mangasarian). Consider the optimal control problem defined


by (2.1), (2.2), (2.3), and assume that f (x, u), L(x, u), and K (x) are C 1 . Suppose
( x ∗ , p ∗ , u ∗ ) are piecewise continuous functions that satisfy (2.14) and (2.15), and
that

• U is a convex set,

• H(x, p∗(t), u) for every t ∈ [0, T] is convex in (x, u) ∈ Rⁿ × U,

• K (x) is convex in x ∈ Rn .

Then ( x ∗ , p ∗ , u ∗ ) is an optimal triple; in particular, u ∗ is an optimal control.

Proof. In order not to digress too much, we allow ourselves here some “proofs
by picture”. Details can be found in Appendix A.7.
Convexity of the Hamiltonian in (x, u) means that

H(x, p∗(t), u) ≥ H(x̄, p∗(t), ū) + (∂H(x̄, p∗(t), ū)/∂xᵀ)(x − x̄) + (∂H(x̄, p∗(t), ū)/∂uᵀ)(u − ū)
for all x, x̄ ∈ Rn , u, ū ∈ U. This is illustrated in Fig. 2.7 (left). By convexity of U and
the fact that u ∗ (t ) minimizes the Hamiltonian for almost all times, we have that

(∂H(x∗(t), p∗(t), u∗(t))/∂uᵀ)(u − u∗(t)) ≥ 0   ∀u ∈ U
for almost all times. This property is illustrated in Fig. 2.7 (right). The above two inequalities, combined with the fact that ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x, show that

H (x, p ∗ (t ), u) ≥ H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) − ṗ ∗T (t )(x − x ∗ (t ))

for all x ∈ Rn , u ∈ U, for almost all times. For simplicity, we now assume that the
terminal cost is absent, K = 0. (Exercise 2.19 considers nonzero K .) The final
inequality gives us that

J(x₀, u) − J(x₀, u∗) = ∫₀^T L(x, u) dt − ∫₀^T L(x∗, u∗) dt
   = ∫₀^T (H(x, p∗, u) − p∗ᵀẋ) − (H(x∗, p∗, u∗) − p∗ᵀẋ∗) dt
   = ∫₀^T H(x, p∗, u) − H(x∗, p∗, u∗) − p∗ᵀ(ẋ − ẋ∗) dt
   ≥ ∫₀^T −ṗ∗ᵀ(x − x∗) − p∗ᵀ(ẋ − ẋ∗) dt
   = [−p∗ᵀ(t)(x(t) − x∗(t))]₀^T = 0.

(In the last equality, we used that p ∗ (T ) = 0 and that x (0) = x ∗ (0) = x 0 .) There-
fore J (x 0 , u ) ≥ J (x 0 , u ∗ ). So u ∗ is optimal. ■

Many of the examples considered in the chapter satisfy the above convexity
properties, see Exercises 2.20 and 2.21 for illustrations.

FIGURE 2.7: Left: a C¹ function h : R → R is convex iff h(x) ≥ h(x̄) + (∂h(x̄)/∂x)(x − x̄) for all x̄, x ∈ R. Right: suppose H : R² → R is C¹ and that U ⊆ R². If H(u∗) = min_{u∈U} H(u) for some u∗ ∈ U, and U is convex, then (∂H(u∗)/∂uᵀ)(u − u∗) ≥ 0 for all u ∈ U. This is used in the proof of Theorem 2.8.1. Appendix A.7 has more details.

2.9 Exercises

2.1 Consider the scalar system

ẋ (t ) = x (t ) u (t ), x (0) = x0 = 1,
with U = R and cost function J(u) = 2x(T) + ∫₀^T x²(t) + u²(t) dt.

(a) Determine the Hamiltonian H (x, p, u) and the differential equation


for the costate.
(b) Determine the optimal input u ∗ (t ) as a function of x ∗ (t ), p ∗ (t ).

(c) Show that H ( x ∗ (T ), p ∗ (T ), u ∗ (T )) is zero.


(d) Show that p ∗ (t ) = 2 (the constant function 2) satisfies the costate
equation.
(e) Express the optimal u ∗ (t ) as a function of x ∗ (t ), and then determine
x ∗ (t ). [Hint: see Example B.1.5.]

In the next chapter, we analyze this problem from another perspective


(Exercise 3.3).

2.2 Consider the system

ẋ (t ) = u (t ), x (0) = x0 ,
and cost function J(u) = ∫₀^T x²(t) dt. We want to minimize this cost over
all u : [0, T ] → [0, 1].

(a) Give the Hamiltonian and the differential equation for the costate.
(b) Argue from the Hamiltonian that u ∗ (t ) most likely assumes just one
or two values.
(c) Argue that if x 0 > 0, then x ∗ (t ) > 0 for all t ∈ [0, T ].
(d) Prove that p ∗ (t ) under the conditions stated in (c) has at most one
sign change. What does this mean for u ∗ (t )?
(e) Solve the optimization problem for x 0 > 0. Also give the solution for
p ∗ (t ).
(f ) Determine an optimal input for the case that x 0 < 0, and verify that it
satisfies the Hamiltonian equations (2.14), (2.15). [Hint: directly for-
mulate an optimal input u ∗ without using the minimum principle.]

2.3 Consider the scalar system ẋ(t) = x(t) + u(t) with initial condition x(0) = x₀. Determine the input u : [0, 1] → [0, 4] that minimizes ∫₀¹ ½u²(t) − 2u(t) − 2x(t) dt over all functions u : [0, 1] → [0, 4].

2.4 Optimal potential energy. Consider

      ẋ₁(t) = x₂(t),
      ẋ₂(t) = −k x₁(t) + u(t).

    This is the standard linear model for a point mass of mass m = 1, displacement from the equilibrium x₁, velocity x₂, and which is subject to an external force u and a spring force with spring constant k > 0.
We assume that | u (t )| ≤ 1 for all t .

(a) Show that without external force u, the mass follows a harmonic motion with period T = 2π/√k.

The system has to be controlled in such a way that after a time T = 2π/√k the potential energy ½k x₁²(T) of the mass is maximal.

(b) Formulate the associated optimal control problem. [Hint: use


L(x, u) = 0.]
(c) Derive the equations for the costate, including final conditions.
Show that the optimal control only depends on the second com-
ponent p 2 of the costate. How are the control and p 2 connected?
(d) Derive, by elimination of the first component p 1 , a differential equa-
tion for p 2 , including final condition. Determine all possible solu-
tions p 2 , and, from this, derive all possible optimal controls.
(e) Determine an optimal displacement x ∗1 (t ) as a function of t for the
case that x ∗1 (0) = x ∗2 (0) = 0. What is the maximal potential energy
at the final time in this case?

2.5 Initial and final conditions. Consider ẋ(t) = x(t) + u(t) with an initial and a final condition, x(0) = 1, x(3) = 0, and U = R. As cost we take ∫₀³ ¼u⁴(t) dt. Determine the optimal state x∗(t) and optimal control u∗(t).

2.6 Initial and final conditions. Let u , y : [0, T ] → R. Consider the second-order
differential equation

ÿ (t ) + y (t ) = u (t ), y (0) = y 0 , ẏ (0) = ẏ 0 ,

with cost
J(u) = ½ ∫₀^T u²(t) dt.

Determine the optimal control u ∗ that drives the system from the initial
state y (0) = y 0 , ẏ (0) = ẏ 0 to the final state y (T ) = ẏ (T ) = 0.

2.7 Maximal distance. We want to move a mass in T seconds, beginning and


ending with zero speed, using bounded acceleration. With x 1 its position
and x 2 its speed, a model for this problem is

ẋ 1 (t ) = x 2 (t ), x 1 (0) = 0,
ẋ 2 (t ) = u (t ), x 2 (0) = 0, x 2 (T ) = 0.

Here u is the acceleration which we take to be bounded in magnitude


by one, that is, u (t ) ∈ [−1, 1] for all t . We want to maximize the traveled
distance x ∗1 (T ).

(a) Determine the Hamiltonian H (x, p, u).



(b) Determine the Hamiltonian equations in x (t ) and p (t ) as used in the


minimum principle, including all initial and final conditions.
(c) Determine the general solution of the costate p (t ) for t ∈ [0, T ].
(d) Determine the optimal input u ∗ (t ) for all t ∈ [0, T ] and compute the
maximal distance x ∗1 (T ).

FIGURE 2.8: A pendulum with a torque u. See Exercise 2.8.

2.8 Control of pendula via torques. Consider a mass m hanging from a ceiling on a thin massless rod of length ℓ, see Fig. 2.8. We can control the pendulum with a torque u exerted around the suspension point. The differential equation describing the pendulum without damping is

      mℓ²φ̈(t) + gmℓ sin(φ(t)) = u(t),

where φ is the angle with respect to the stable equilibrium state (the ver-
tical hanging position). The objective is to minimize the cost
J(u) := mℓ²φ̇²(T) − 2mgℓ cos(φ(T)) + ∫₀^T φ̇²(t) + u²(t) dt.

It is convenient to use x 1 := φ and x 2 := φ̇.

(a) Determine the state differential equation ẋ (t ) = f ( x (t ), u (t )).


(b) Determine the Hamiltonian H (x, p, u) and the differential equation
for the costate, including final conditions.
(c) Calculate ∫₀^T φ̇(t)u(t) dt. What do you see?
(d) Express the optimal control in terms of φ and/or φ̇.

2.9 Optimal capacitor charging. Consider the RC -circuit of Fig. 2.9. We want
to determine at any moment in time the voltage u (t ) of the voltage source
that charges the capacitor in T seconds from zero voltage, x (0) = 0,
to a certain desired voltage, x (T ) = x desired , with minimal dissipation
of energy through the resistor. The voltage v (t ) across the resistor is
given by Kirchhoff’s voltage law as v (t ) = u (t ) − x (t ). Hence the current
i(t) through the resistor with resistance R equals i(t) = (1/R)(u(t) − x(t)),


FIGURE 2.9: An RC-circuit with resistance R and capacitance C. The voltage source is denoted by u, and the voltage across the capacitor is denoted by x. See Exercise 2.9.

and thus the dynamics of the charge q(t) at the capacitor is given as q̇(t) = i(t) = (1/R)(u(t) − x(t)). For a linear capacitor with capacitance C the charge q(t) equals Cx(t). This leads to the model

      ẋ(t) = −(1/(RC)) x(t) + (1/(RC)) u(t).    (2.37)
Furthermore, the power dissipated in the resistor is given as v(t)i(t) = (1/R)v²(t), and, hence, the total energy loss is

      J(u) = ∫₀^T (1/R)(u(t) − x(t))² dt.

For the rest of this exercise we take R = 1 (ohm), and C = 1 (farad), and
x desired = 1 (volt). We further assume that x (0) = 0, and that U = R.

(a) Determine the solution ( x ∗ , p ∗ , u ∗ ) of the normal Hamiltonian equa-


tions (2.27a), (2.28) explicitly as functions of time.
(b) Are the convexity assumptions of Theorem 2.8.1 satisfied?
(c) Determine the minimal cost J ( u ∗ ). It turns out that the minimal cost
decreases as T grows. Explain why this makes sense.

2.10 Soft landing on the Moon. We consider the problem of optimal and safe
landing on the Moon. The situation is depicted in Fig. 2.10. We assume
the lunar ship only moves in the vertical direction. Its position relative to
the Moon’s surface is denoted by y (t ), and its mass is denoted by m (t ).
The ship can generate an upwards force by thrusting out gasses down-
wards (in the direction of the Moon). We assume it does so with a con-
stant velocity c, but that it can control the rate − ṁ (t ) at which it expels the
mass. This results in an upwards force of −c ṁ (t ). The gravitational pull on
the lunar ship is −g m (t ). (On the Moon the gravitational acceleration g is
1.624 m/s2 .) The altitude y (t ) of the ship satisfies the differential equation

m (t ) ÿ (t ) = −g m (t ) − c ṁ (t ).

FIGURE 2.10: Soft landing on the Moon. See Exercise 2.10.

As mentioned earlier, we can control the rate of expulsion

u (t ) := − ṁ (t ).

Clearly, u (t ) is bounded and nonnegative. Specifically we assume that


u (t ) ∈ U :=[0, 1]. The total expelled mass over [0, T ] equals
J(u) = ∫₀^T u(t) dt.

The objective is to determine the control u and final time T > 0 that min-
imizes the total expelled mass, J ( u ), while achieving a safe, soft landing
on the Moon at final time T . The latter means that y (T ) = 0 and ẏ (T ) = 0.
With the state variables x 1 := y , x 2 := ẏ , x 3 := m we can rewrite the differen-
tial equations as

ẋ 1 (t ) = x 2 (t ), x 1 (0) = y 0 , x 1 (T ) = 0,
ẋ₂(t) = c u(t)/x₃(t) − g,    x₂(0) = ẏ₀,    x₂(T) = 0,
ẋ 3 (t ) = − u (t ), x 3 (0) = m0 .

(a) Explain in words why in this application we need x 1 (t ) ≥ 0, x 3 (t ) > 0


for all t ∈ [0, T ], and that ẋ 2 (T ) ≥ 0.
(b) Determine the Hamiltonian and the differential equation for the
costate p = ( p 1 , p 2 , p 3 ), including possible final conditions.
(c) Show that z(t) := ∂H(x(t), p(t), u(t))/∂u equals

      z(t) = c p₂(t)/x₃(t) − p₃(t) + 1,

    and show that

      ż(t) = −c p₁(T)/x₃(t).

Also, use the fact that H(x∗(t), p∗(t), u∗(t)) is the zero function for time-


optimal problems to show that z ∗ (t ) cannot be the zero function.
(d) Conclude from (c) that u ∗ (t ) is of the form

u∗(t) = 0   if t < t_s,
u∗(t) = 1   if t > t_s,

for some t s < T . Thus it is optimal to thrust out gasses only during
the final stage of descent, and then to do so at maximal rate.

2.11 Initial and final conditions. Consider the system

ẋ(t) = x(t)(1 − u(t)),    x(0) = 1,    x(1) = ½e

with cost
J(u) = −∫₀¹ ln(x(t)u(t)) dt.

Since x (0) > 0 we have that x (t ) ≥ 0 for all t . For a well-defined cost we
hence need u (t ) ∈ [0, ∞) but for the moment we allow any u (t ) ∈ R and
later verify that the optimal u ∗ (t ) is in fact positive for all t ∈ [0, 1].

(a) Determine the Hamiltonian.


(b) Determine the Hamiltonian equations (2.11).
(c) Show that u (t ) = −1/( p (t ) x (t )) is the candidate optimal control.
(d) Substitute this u (t ) = −1/( p (t ) x (t )) into the Hamiltonian equations
and solve for p ∗ (t ) and then x ∗ (t ) and subsequently u ∗ (t ).
(e) Is u ∗ (t ) > 0 for all t ∈ [0, 1]?

2.12 Running cost depending on time. Consider the second-order system with
mixed initial and final conditions

ẋ 1 (t ) = u (t ), x 1 (0) = 0, x 1 (1) = 1,
ẋ 2 (t ) = 1, x 2 (0) = 0,

and with cost


J(u) = ∫₀¹ u²(t) + 12 x₂(t) x₁(t) dt.

(Notice that x 2 (t ) = t . This is a simple strategy to allow running costs that


depend on t , here L( x (t ), u (t )) = u 2 (t ) + 12t x 1 (t ).)
We assume that the input is not restricted, i.e., U = R.

(a) Determine the Hamiltonian for this problem.



(b) Determine the differential equations for state x (t ) and costate p (t ),


including the boundary conditions.
(c) Express the candidate minimizing u ∗ (t ) as a function of x (t ), p (t ).
(d) Solve the equations for x ∗ , p ∗ , u ∗ (that is, determine x ∗ (t ), p ∗ (t ), u ∗ (t )
as explicit functions of time t ∈ [0, 1]).

2.13 Two-sector economy. Consider an economy consisting of two sectors


where sector 1 produces investment goods and sector 2 produces con-
sumption goods. Let x i (t ), i = 1, 2, denote the production rate in the i -th
sector at time t , and let u (t ) be the fraction of investments allocated to
sector 1. Suppose the dynamics of the x i (t ) are given by

ẋ 1 (t ) = a u (t ) x 1 (t ),
ẋ 2 (t ) = a(1 − u (t )) x 1 (t ),
where a is a positive constant. Hence, the increase in production per unit
of time in each sector is assumed to be proportional to the investment
allocated to the sector. By definition we have

0 ≤ u (t ) ≤ 1 for all t ∈ [0, T ],

where [0, T ] denotes the planning period. As optimal control problem we


may consider the problem of maximizing the total consumption in the
given planning period [0, T ], thus our problem is to maximize
    J̃(u) := ∫_0^T x₂(t) dt
subject to

x 1 (0) = x01 , x 1 (T ) = free,


x 2 (0) = x02 , x 2 (T ) = free,
in which x 01 > 0, x 02 ≥ 0.

(a) Argue that x 1 (t ) > 0 for all time.


(b) Determine an optimal input using the minimum principle. [Hint: it
may help to realize that ṗ 1 (T ) − ṗ 2 (T ) < 0.]

2.14 Consider the second-order system with mixed initial and final conditions

ẋ 1 (t ) = u (t ), x 1 (0) = 0, x 1 (1) = 2,
ẋ 2 (t ) = 1, x 2 (0) = 0,
and with cost
    J(u) = ∫_0^1 ( u²(t) + 4 x₂(t) u(t) ) dt.
The input u : [0, 1] → R is not restricted, i.e., u (t ) can take on any real
value.

(a) Determine the Hamiltonian for this problem.


(b) Determine the differential equations for the costate p (t ), including
the boundary conditions.
(c) Express the candidate minimizing u ∗ (t ) as a function of x ∗ (t ), p ∗ (t ).
(d) Solve the equations for x ∗ , p ∗ , u ∗ (that is, determine x ∗ (t ), p ∗ (t ), u ∗ (t )
as explicit functions of time t ∈ [0, 1]).

2.15 Integral constraints. Let us return to the calculus of variations problem of


minimizing
    ∫_0^T F(x(t), ẋ(t)) dt
over all functions x : [0, T ] → Rn that satisfy an integral constraint
    ∫_0^T M(x(t), ẋ(t)) dt = c₀.
Theorem 1.7.1 (p. 32) says that the optimal solution satisfies either (1.54)
for some μ∗ ∈ R, or satisfies (1.55). This problem can also be cast as an
optimal control problem with a final condition, and then Theorem 2.6.1
gives us the same two conditions (depending on whether the Hamiltonian
is normal or abnormal):

(a) Let ẋ (t ) = u (t ) and define ż n+1 (t ) = M ( x (t ), u (t )) and z :=( x , z n+1 ).


Formulate the above calculus of variations problem as an optimal
control problem with a final condition on state z and with U = Rn .
(I.e., express f (z), L(z, u), K (z) in terms of F, M , c 0 .)
(b) Since z :=( x , z n+1 ) has n + 1 components, also the corresponding
costate p has n + 1 components. Show that p n+1 (t ) is constant for
the normal Hamiltonian H1 (z, p, u) as well as the abnormal Hamil-
tonian H0 (z, p, u).
(c) For the normal Hamiltonian H1 (z, p, u), show that the existence of
a solution of the Hamiltonian equations (2.27a) and (2.28) imply
that (1.54) holds for μ∗ = p n+1 .
(d) For the abnormal Hamiltonian H0 (z, p, u), show that the existence of
a solution of the Hamiltonian equations (2.27a) and (2.28) with
p n+1 = 0 implies that (1.55) holds.
2.16 The minimum principle is not sufficient for optimality. The minimum
principle is necessary for optimality, but it is not sufficient. That is to say,
if we are able to solve the Hamiltonian equations (2.14) (including the
pointwise minimization (2.15)) then it is not guaranteed that the so found
input is optimal. Here is an example: consider the system and cost
    ẋ(t) = u(t),    x(0) = 1,    J(u) = ∫_0^{2π} ( −½ x²(t) + ½ u²(t) ) dt.

We allow every input, so U = R.

(a) Solve the equations (2.14), (2.15).


(b) Compute J ( u ∗ ) for the input u ∗ found in the previous part.
(c) Find an input u for which J ( u ) is less than J ( u ∗ ). (Just guess a simple
one; many inputs u will do the job.)

2.17 Time-optimal control. Consider the optimal control problem of Exam-


ple 2.1.1.

(a) Solve this problem using the minimum principle.


(b) For every T ≥ 0 determine H ( x ∗ (t ), p ∗ (t ), u ∗ (t )).
(c) For every T ≥ 0 determine the optimal cost J ( u ∗ ).
(d) Now suppose we also optimize the cost over all final times T ≥ 0.
Which T ’s are optimal, and does this agree with Theorem 2.7.1?

2.18 Calculus of variations. Consider the calculus of variations problem of


Example 2.4.1.

(a) Use (2.11c) to express p ∗ (t ) explicitly in terms of L, x ∗ , u ∗ , t and


derivatives.
(b) Given the above form of p ∗ (t ), and (2.11a), show that the costate
equation (2.11b) is equivalent to the Euler-Lagrange equation.

2.19 Convexity. The proof of Theorem 2.8.1 assumes that the terminal cost is
absent, i.e., K (x) = 0. Now consider more general K (x). Assume K (x) and
∂K (x)
∂x are continuous, and that K (x) is a convex function.

(a) Adapt the proof of Theorem 2.8.1 so that it also works for nonzero
convex K (x). [Hint: have a look at Lemma A.7.1 (p. 198).]
(b) Theorem 2.8.1 considers the standard (free endpoint) optimal con-
trol problem of Theorem 2.5.1. Show that Theorem 2.8.1 remains
valid for the case that the final state is constrained as in Theo-
rem 2.6.1.

2.20 Convexity. Does Example 2.5.3 satisfy the convexity assumptions of Theo-
rem 2.8.1?

2.21 Convexity. Does Example 2.5.4 satisfy the convexity assumptions of Theo-
rem 2.8.1?
Chapter 3

Dynamic Programming

3.1 Introduction

The minimum principle was developed in the Soviet Union in the late fifties of
the previous century. At about the same time Richard Bellman in the USA devel-
oped an entirely different approach to optimal control, called dynamic pro-
gramming. In this chapter, we deal with dynamic programming. As in the previ-
ous chapter, we assume that the state satisfies a system of differential equations

ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 , (3.1a)

in which x : [0, T ] → Rn , and x 0 ∈ Rn , and that the input u at each moment in


time takes values in a given subset U of Rm ,

u : [0, T ] → U. (3.1b)

As before, we associate with system (3.1a) a cost over a finite time horizon [0, T ]
of the form
    J_{[0,T]}(x₀, u) := ∫_0^T L(x(t), u(t)) dt + K(x(T)).    (3.1c)

This cost depends on the input u , but in dynamic programming it is convenient


to also emphasize the dependence of this cost on x 0 and the time interval [0, T ].
The final time T and the functions K : Rn → R and L : Rn × U → R are assumed
as given.
The crux of dynamic programming is to associate with this single cost over
time horizon [0, T ] a whole family of costs over subsets of this time horizon,
    J_{[τ,T]}(x, u) := ∫_τ^T L(x(t), u(t)) dt + K(x(T)),    (3.2)

for each initial time τ ∈ [0, T ] and for each initial state x (τ) = x, and then to
establish a dynamic relation between the family of optimal costs (hence the


name dynamic programming). On the one hand, this complicates the problem
because many optimal control problems need to be considered. On the other
hand, if this dynamic relation can be solved, then it turns out to produce suffi-
cient conditions for optimality.

3.2 Principle of Optimality

FIGURE 3.1: Principle of optimality, see § 3.2.

The principle of optimality is a simple yet powerful result in optimal control.


Roughly speaking, it says that every tail of an optimal control is optimal. We
formalize this result. Figure 3.1 should be instructive here. It depicts an optimal
control u ∗ on [0, T ] and an alternative control û on a restricted time window
[τ, T ] for some τ ∈ [0, T ]. The optimal control u ∗ steers the state from x (0) = x 0
to some value x ∗ (τ) at time τ. Is it now possible that the alternative control û
achieves a smaller cost-to-go J [τ,T ] ( x ∗ (τ), u ) over the remaining window [τ, T ]
than u ∗ ? That is, is it possible that

J [τ,T ] ( x ∗ (τ), û ) < J [τ,T ] ( x ∗ (τ), u ∗ )?

No, because if it would, then the new input ũ constructed from u ∗ over the ini-
tial interval [0, τ], and from û over the remaining [τ, T ], would improve on u ∗
over the entire interval:

    J_{[0,T]}(x₀, ũ) = ∫_0^τ L(x(t), ũ(t)) dt + J_{[τ,T]}(x(τ), ũ)
                     = ∫_0^τ L(x∗(t), u∗(t)) dt + J_{[τ,T]}(x∗(τ), û)
                     < ∫_0^τ L(x∗(t), u∗(t)) dt + J_{[τ,T]}(x∗(τ), u∗) = J_{[0,T]}(x₀, u∗),

and this contradicts the assumed optimality of u ∗ . So we conclude: if u ∗ :


[0, T ] → U is optimal for J [0,T ] (x 0 , u ) then for every τ ∈ [0, T ] this u ∗ restricted
to [τ, T ] is optimal for J [τ,T ] ( x ∗ (τ), u ) as well. This is the principle of optimality.
It will be of great help in the analysis to come. Notice that this principle hinges
on the causality property of the system that x (t ) for t < τ does not depend on
u(t′) for t′ > τ. Also, the additivity property of the cost is crucial, which is that
the “cost-to-go” J_{[τ,T]}(x∗(τ), u) adds to the “cost-so-far” ∫_0^τ L(x∗(t), u(t)) dt.

3.3 Discrete-Time Dynamic Programming

The main idea of dynamic programming and the reason for its popularity is
explained best for systems that evolve over discrete time—as opposed to the
systems that evolve over continuous time, which we normally consider in this
book. Thus, for the time being, consider a discrete-time system

x (t + 1) = f ( x (t ), u (t )), x (0) = x0 , (3.3)

on some discrete finite time horizon

t ∈ {0, 1, . . . , T − 1},

with x 0 given, and T a given positive integer. We want to find a control sequence
u ∗ = ( u ∗ (0), u ∗ (1), . . . , u ∗ (T − 1)), called optimal control (sequence), and resulting
state sequence ( x ∗ (0), x ∗ (1), . . . , x ∗ (T )), that minimizes a cost of the form
    J_{[0,T]}(x₀, u) = Σ_{t=0}^{T−1} L(x(t), u(t)) + K(x(T)).    (3.4)

Incidentally, in discrete-time systems there is no need to restrict the state space


X to some set on which derivatives are defined, like our default Rn . Indeed, the
state space in applications is often a finite set. The same is true for the input set
U. In what follows, the number of elements of a set X is denoted as |X|.

FIGURE 3.2: A discrete-time system with 7 states. See Example 3.3.1.

Example 3.3.1 (Naive optimization). Suppose the state space X consists of the
7 integer elements

X = {0, 1, 2, . . . , 6}.

Align the states in a circle (see Fig. 3.2), and suppose that at each moment in
time, the state can either move one step counter-clockwise, or stay where it
is. Thus, at each moment in time, we have a choice of two. The input space
U hence has two elements. If we take

U = {0, 1}

then the transition from one state to the next is modeled by the discrete system
x (t + 1) = x (t ) + u (t ), u (t ) ∈ U, t ∈ {0, 1, . . . , T − 1}
(counting modulo 7, so 6 + 1 = 0). Each transition from one state x (t ) to the
next x (t + 1) is assumed to cost a certain amount L( x (t ), u (t )), and the final
state x (T ) costs an additional K ( x (T )). The total cost hence is (3.4). The naive
approach to determine the optimal control ( u (0), . . . , u (T −1)) and resulting opti-
mal state sequence ( x (1), . . . , x (T )) is to just explore them all and pick the best.
As we can move in two different ways at each moment in time, this naive
approach requires 2^T sequences (x(1), . . . , x(T)) to explore. Since each sequence
has length T, the evaluation of the cost for each sequence is (roughly) linear
in T, and, therefore, the total number of operations required in this naive
approach is of order

    T × 2^T.

It is not hard to see that for arbitrary systems (3.3) the total number of oper-
ations that the naive approach requires is of order
    T × |U|^T.
Thus the total number of operations is exponential in T .
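To make the operation count concrete, here is a small Python sketch (an illustration added here, not part of the original text) of the naive enumeration for the circle system of Example 3.3.1, using the cost functions K(x) = x² and L(x, u) = u that are introduced in Example 3.3.2 below; the function names and parameter values are our own choices.

    from itertools import product

    def naive_optimal_control(T, x0, f, L, K, U):
        """Enumerate all |U|**T input sequences and return the cheapest one."""
        best_cost, best_seq = float("inf"), None
        for u_seq in product(U, repeat=T):          # |U|**T sequences to explore
            x, cost = x0, 0.0
            for u in u_seq:                         # roughly T operations per sequence
                cost += L(x, u)
                x = f(x, u)
            cost += K(x)                            # final cost
            if cost < best_cost:
                best_cost, best_seq = cost, u_seq
        return best_cost, best_seq

    f = lambda x, u: (x + u) % 7                    # move counter-clockwise or stay
    print(naive_optimal_control(T=5, x0=5, f=f,
                                L=lambda x, u: u, K=lambda x: x**2, U=(0, 1)))

For T = 5 and x₀ = 5 this returns the optimal cost 2 (in agreement with Example 3.3.2 below), but only after visiting all 2^5 = 32 input sequences; the work grows exponentially with T.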
In general, in dynamic programming we solve the minimization backwards
in time. This may at first sight seem to complicate the analysis, but it allows us
to exploit the principle of optimality. The following example explains it all.

Example 3.3.2 (Dynamic programming). Continue with the system of Exam-


ple 3.3.1:
x (t + 1) = x (t ) + u (t ), u (t ) ∈ {0, 1}, t ∈ {0, 1, . . . , T − 1}
with x (t ) ∈ X :={0, 1, . . . , 6}, and, to make it more explicit, assume that the final
cost is x 2 and that each counter-clockwise move costs 1, i.e.,
K (x) = x 2 and L(x, u) = u ∈ U :={0, 1}.
This system over a given time horizon we now visualize as a grid of vertices and
edges (the figure is not reproduced here).

(Here we took T = 5.) The horizontal axis represents the time, t = 0, 1, . . . , T ,


and the vertical axis represents the states, x = 0, 1, . . . , 6. Vertices (dots) denote
pairs (t , x), and lines (edges) between vertices represent possible transitions. For
instance, the line connecting (t , x) = (0, 6) with (t , x) = (1, 0) says that we can
move from x = 6 to x = 0 in one time step.
Let us first figure out the cost at the final time T . Since we do not know
in which final state, x (T ), we end up, we have to determine this cost for every
element of the state space. This cost we denote as V (x, T ), and clearly this is
simply the final cost, so V(x, T) = K(x) = x²; that is, V(x, T) = 0, 1, 4, 9, 16, 25, 36
for x = 0, 1, . . . , 6.
Now that V (x, T ) is known, consider the optimal cost-to-go from t = T − 1
onwards. This optimal cost-to-go we denote by V (x, T − 1) and it is defined as
the minimal cost from t = T − 1 onwards if at time t = T − 1 we are at state
x (T − 1) = x. It satisfies
 
    V(x, T − 1) = min_{u(T−1)∈{0,1}} [ L(x, u(T − 1)) + K(x(T)) ],

because L(x, u (T − 1)) is the cost of the transition if we apply input u (T − 1),
and K ( x (T )) is the final cost. Since x (T ) = f (x, u (T − 1)), this final cost equals
V ( f (x, u (T − 1)), T ), so we can also write
 
    V(x, T − 1) = min_{u∈{0,1}} [ L(x, u) + V(f(x, u), T) ].

With V(x, T) already established for all x, this minimization requires |U| =
|{0, 1}| = 2 inputs to explore at each state, and, hence, the total number of oper-
ations that this requires is of order |X| × |U|. The so determined V (x, T − 1),
together with V (x, T ), are shown here:
             V(x, T−1)   V(x, T)
    x = 6        1          36
    x = 5       25          25
    x = 4       16          16
    x = 3        9           9
    x = 2        4           4
    x = 1        1           1
    x = 0        0           0

Along the way we also determined for each state x (T − 1) an optimal control
u ∗ (T − 1), indicated in the figure by the thick edges. Notice that none of the
states x (T −1) switch to x (T ) = 6. We can continue in this fashion and determine
backwards in time—for t = T −2, then t = T −3, etc., till t = 0—the optimal cost-
to-go from t onwards for any state x (t ) = x. At this stage, we exploit the principle
of optimality: since every tail of an optimal control is optimal, the optimal cost-
to-go V (x, t ), defined as the optimal cost from t onwards starting at x (t ) = x,
satisfies the equation:
 
    V(x, t) = min_{u∈{0,1}} [ L(x, u) + V(f(x, u), t + 1) ].

This equation expresses that the optimal cost from t onwards, starting at x (t ) =
x, is the cost of the transition, L(x, u), plus the optimal cost from t + 1 onwards.
Once V (x, t + 1) is known for all x, this easily gives us V (x, t ) for all x. For

T =5

we end up this way with the following complete solution:


             t=0   t=1   t=2   t=3   t=4   t=5
    x = 6     1     1     1     1     1    36
    x = 5     2     2     2     2    25    25
    x = 4     3     3     3    16    16    16
    x = 3     4     4     9     9     9     9
    x = 2     4     4     4     4     4     4
    x = 1     1     1     1     1     1     1
    x = 0     0     0     0     0     0     0
This solves the optimal control problem for every initial state x 0 . For some
initial states the optimal control sequence, u ∗ = ( u (0), u (1), u (2), u (3), u (4)), is
actually not unique. For instance, the control sequence u∗ = (1, 0, 1, 0, 0) is
one of several optimal controls for x₀ = 5. The optimal cost-
to-go V (x, t ) of course is unique. 

In general, in dynamic programming, we compute the optimal cost-to-go


V (x, t ) via the recursion
 
    V(x, t) = min_{u∈U} [ L(x, u) + V(f(x, u), t + 1) ],    t ∈ {0, 1, . . . , T − 1},    (3.5)

starting at the final time where V (x, T ) = K (x) for all states, and then subse-
quently going backwards in time, t = T − 1, t = T − 2, . . . , until we reach t = 0.
In this way, the optimal control problem is split into T ordinary minimization
problems. To determine the final cost V (x, T ) = K (x) for all x ∈ X requires order
|X| operations. Then determining V (x, T − 1) for all x ∈ X requires |X| times the

number of inputs |U| to explore, etc., and so the total number of operations over
all t ∈ {0, 1, . . . , T − 1} is of order
T × |U| × |X|.
If the number of states is modest or if T is large, then this typically outperforms
the naive approach (which requires order T × |U|^T operations). Equation (3.5) is
called Bellman’s equation of dynamic programming.
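As an illustration (added here, not part of the original text), the following Python sketch implements the backward recursion (3.5) for the system of Example 3.3.2 and reproduces the table of optimal costs-to-go shown there; the helper names are our own choices.

    def dynamic_programming(T, X, U, f, L, K):
        """Bellman recursion (3.5): returns V[t][x] and a minimizing policy."""
        V = [dict() for _ in range(T + 1)]
        policy = [dict() for _ in range(T)]
        for x in X:
            V[T][x] = K(x)                          # V(x,T) = K(x)
        for t in range(T - 1, -1, -1):              # backwards in time
            for x in X:                             # order T * |X| * |U| operations
                u_best = min(U, key=lambda u: L(x, u) + V[t + 1][f(x, u)])
                policy[t][x] = u_best
                V[t][x] = L(x, u_best) + V[t + 1][f(x, u_best)]
        return V, policy

    X, U, T = range(7), (0, 1), 5
    f = lambda x, u: (x + u) % 7
    V, policy = dynamic_programming(T, X, U, f, L=lambda x, u: u, K=lambda x: x**2)
    for x in reversed(X):                           # compare with the table above
        print(f"x = {x}:", [V[t][x] for t in range(T + 1)])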
In continuous time the same basic idea survives, except for the results
regarding computational complexity. Note that, in the continuous time case, the
optimization is over a set of input functions on the time interval [0, T ], which
is an infinite-dimensional space. Furthermore, it is clear that, contrary to the
discrete-time case, we will not be able to split the problem into a series of finite-
dimensional minimization problems.

3.4 Hamilton-Jacobi-Bellman Equation

We return to the continuous time. In dynamic programming in continuous time


we minimize all costs J [τ,T ] (x, u )—for all τ ∈ [0, T ] and all states x (τ) = x—and
not just the one cost J [0,T ] (x 0 , u ) that we are asked to minimize. To tackle this
problem, we will again exploit the principle of optimality, and we will again
need the notion of optimal cost-to-go, also known as the value function.

Definition 3.4.1 (Value function/optimal cost-to-go). Consider the optimal


control problem (3.1). The value function V : Rn × [0, T ] → R at state x and time
τ is defined as the optimal cost-to-go over time horizon [τ, T ] with initial state
x (τ) = x, that is,
    V(x, τ) = inf_{u:[τ,T]→U} J_{[τ,T]}(x, u),    (3.6)

with J [τ,T ] as defined in (3.2). 

In most cases of interest the infimum in (3.6) is attained by some u ∗ , in


which case the infimum (3.6) is a minimum. In general, though, a minimizer
need not exist, while the infimum does exist (but it might be ±∞).

Example 3.4.2 (Integrator with linear cost). Consider once again the optimal
control problem of Example 2.5.2:
ẋ (t ) = u (t ), x (0) = x0 , (3.7)
with bounded inputs
U = [−1, 1],
and with cost
    J_{[0,1]}(x₀, u) = ∫_0^1 x(t) dt.

From the fact that ẋ (t ) = u (t ) ∈ [−1, 1], it is immediate that the optimal control
is u ∗ (t ) = −1 and, hence, x ∗ (t ) = x 0 − t . Therefore, the value function at τ = 0 is
    V(x₀, 0) = J_{[0,1]}(x₀, u∗) = ∫_0^1 ( x₀ − t ) dt = x₀ − 1/2.

Next, we determine the value function at the other time instances. It is easy to
see that u ∗ (t ) = −1 is optimal for J [τ,1] (x, u ) for every τ > 0 and every x (τ) = x.
Hence, in this case, x ∗ (t ) = x − (t − τ) and
    V(x, τ) = ∫_τ^1 ( x − (t − τ) ) dt = [ xt − ½(t − τ)² ]_{t=τ}^{t=1} = x(1 − τ) − ½(1 − τ)².

As expected, the value function is zero at the final time τ = 1. It is not necessarily
monotonic in τ, see Fig. 3.3. Indeed, for x = 1/2, the value function is zero at
τ = 0 and at τ = 1, yet it is positive in between. 

FIGURE 3.3: The value function V(x, τ) of the problem of Example 3.4.2 for
various x (x = −0.5, 0, 0.5, 1, 1.5) as a function of τ ∈ [0, 1].

Now it is time to derive, or rather motivate, the continuous-time version of


Bellman’s equation of dynamic programming (3.5). For any input u —optimal or
not—the cost-to-go from τ onwards equals the cost over [τ, τ + ε] plus the cost
over the remaining [τ + ε, T], that is

    J_{[τ,T]}(x, u) = ∫_τ^{τ+ε} L(x(t), u(t)) dt + J_{[τ+ε,T]}(x(τ + ε), u)    (3.8)

with initial state x (τ) = x. The value function is defined as the infimum of this
cost over all inputs. Suppose the infimum is attained by some input, i.e., that the

infimum is a minimum. Taking the minimum over all u of the left- and right-
hand sides of (3.8) shows that
    V(x, τ) = min_{u:[τ,T]→U} [ ∫_τ^{τ+ε} L(x(t), u(t)) dt + J_{[τ+ε,T]}(x(τ + ε), u) ].

By the principle of optimality, any optimal control over [τ, T] is optimal for
J_{[τ+ε,T]}(x(τ + ε), u) as well. The right-hand side of the above equality can thus
be simplified to

    V(x, τ) = min_{u:[τ,τ+ε]→U} [ ∫_τ^{τ+ε} L(x(t), u(t)) dt + V(x(τ + ε), τ + ε) ]

with initial state x(τ) = x. Notice that, in this last equation, we need only opti-
mize over inputs defined on the time window [τ, τ + ε] because the optimization
over the remaining time window [τ + ε, T] is incorporated in the value function
V(x(τ + ε), τ + ε). For further analysis, it is beneficial to move the term V(x, τ) to
the right-hand side and to scale the equation by ε,

    0 = min_{u:[τ,τ+ε]→U} ( ∫_τ^{τ+ε} L(x(t), u(t)) dt + V(x(τ + ε), τ + ε) − V(x, τ) ) / ε.    (3.9)

In this form we can take the limit ε → 0. It is plausible that functions u : [τ, τ + ε] → U
in the limit can be identified with constants u ∈ U, and that the difference
between the two value functions in (3.9) converges for ε → 0 to the total
derivative of V(x(τ), τ) with respect to τ. Thus,

    0 = min_{u∈U} [ L(x(τ), u) + dV(x(τ), τ)/dτ ]    (3.10)

for all τ ∈ [0, T ] and all x (τ) = x ∈ Rn . Incidentally, this identity is reminiscent of
the cost-to-go (B.14) as explained in Section B.5 of Appendix B. The total deriva-
tive of V ( x (τ), τ) with respect to τ is

    dV(x(τ), τ)/dτ = (∂V(x(τ), τ)/∂xᵀ) f(x(τ), u(τ)) + ∂V(x(τ), τ)/∂τ.
Inserting this into (3.10), and using u = u (τ), x = x (τ), we arrive at the partial
differential equation:

    0 = min_{u∈U} [ L(x, u) + (∂V(x, τ)/∂xᵀ) f(x, u) + ∂V(x, τ)/∂τ ]

for all τ ∈ [0, T ] and all x ∈ Rn . The partial derivative of V (x, τ) with respect to
τ does not depend on u and so does not contribute to the minimization. This,
finally, brings us to the famous equation

    ∂V(x, τ)/∂τ + min_{u∈U} [ (∂V(x, τ)/∂xᵀ) f(x, u) + L(x, u) ] = 0.    (3.11)

This equation is known as the Hamilton-Jacobi-Bellman equation—or just


HJB equation—because it extends the Hamilton-Jacobi equation from classi-
cal mechanics (see Lanczos (1986)).
What did we do so far? We made it plausible that the relation between the
value functions at neighboring points in state x and time τ is the partial differ-
ential equation (3.11). We need to stress here the word “plausible”, because we
have “derived” (3.11) only under several technical assumptions including exis-
tence of an optimal control, existence of a value function, and existence of some
limits. However, we can turn the analysis around, and show that (3.11) provides
sufficient conditions for optimality. This is the following theorem, and it is the
central result of this chapter. In this formulation, the time τ is called t again,
and the solution of the partial differential equation is denoted by V (a symbol
kept distinct from the one used for the value function), because the solution V of
the partial differential equation is not always the value function (although in
most cases it is).

Theorem 3.4.3 (Hamilton-Jacobi-Bellman equations). Consider the optimal


control problem (3.1). Suppose V : Rn × [0, T ] → R is a continuously differen-
tiable function that satisfies the partial differential equation

    ∂V(x, t)/∂t + min_{u∈U} [ (∂V(x, t)/∂xᵀ) f(x, u) + L(x, u) ] = 0    (3.12a)

for all x ∈ Rn and all t ∈ [0, T ], and that satisfies the final time condition

V (x, T ) = K (x) (3.12b)

for all x ∈ Rn . Then

1. V (x, τ) is a lower bound of the cost over [τ, T ] starting at x (τ) = x, that is,

J [τ,T ] (x, u ) ≥ V (x, τ) for every input u .

2. Suppose u ∗ : [0, T ] → U is such that the solution x of ẋ (t ) = f ( x (t ), u ∗ (t ))


with x (0) = x 0 is well defined on [0, T ], and that at almost every t ∈ [0, T ]
the vector u ∗ (t ) minimizes

    (∂V(x(t), t)/∂xᵀ) f(x(t), u) + L(x(t), u)
over all u ∈ U. Then u ∗ is a solution of the optimal control problem and
the optimal cost is

J [0,T ] (x 0 , u ∗ ) = V (x 0 , 0). (3.13)

Furthermore, in this case, V (x 0 , 0) equals the value function V (x 0 , 0). (Note


however that for other states and times, the solution V (x, t ) may differ
from the value function V (x, t ).)

3. Suppose the minimization problem in (3.12a) for each x ∈ Rn and each t ∈


[0, T ] has a (possibly non-unique) solution u. Denote one such solution
as u (x, t ). If for every x ∈ Rn and every τ ∈ [0, T ] the solution x of ẋ (t ) =
f ( x (t ), u ( x (t ), t )) with initial condition x (τ) = x is well defined for all t ∈
[τ, T ], then V equals the value function,

V (x, t ) = V (x, t ) ∀x ∈ Rn , t ∈ [0, T ],

and u ∗ (t ) := u ( x (t ), t ) is an optimal control for J [τ,T ] (x, u ) for every x ∈


Rn , τ ∈ [0, T ].

Proof.

1. Let x (τ) = x. We have that


    J_{[τ,T]}(x, u) = ∫_τ^T L(x(t), u(t)) dt + K(x(T))

                    = ∫_τ^T [ (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) + L(x(t), u(t)) ] dt
                      − ∫_τ^T (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) dt + K(x(T))

                    ≥ ∫_τ^T min_{u∈U} [ (∂V(x(t), t)/∂xᵀ) f(x(t), u) + L(x(t), u) ] dt    (3.14)
                      − ∫_τ^T (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) dt + K(x(T))

                    = ∫_τ^T [ −∂V(x(t), t)/∂t − (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) ] dt + K(x(T))

                    = −∫_τ^T ( dV(x(t), t)/dt ) dt + V(x(T), T)

                    = −[ V(x(t), t) ]_{t=τ}^{t=T} + V(x(T), T) = V(x(τ), τ) = V(x, τ).

2. By assumption, x (t ) is well defined for all t ∈ [0, T ]. Let x = x 0 and


τ = 0. For the input u ∗ , the inequality in (3.14) is an equality. Hence,
J [0,T ] (x 0 , u ∗ ) = V (x 0 , 0), and we already established that no control
achieves a smaller cost.

3. Similar to part 2: by assumption for every τ ∈ [0, T ] and every x (τ) =


x ∈ Rn the solution x is well defined for all t ∈ [τ, T ]. For the input u ∗ ,
the inequality in (3.14) is an equality. Hence, the optimal cost equals
J [τ,T ] (x, u ∗ ) = V (x, τ) and it is attained by u ∗ . Since this holds for every
x ∈ Rn and every τ ∈ [0, T ] the function V (x, τ) equals the value function
V (x, τ) at every x ∈ Rn and every τ ∈ [0, T ].



The reasoning in the proof of this theorem (especially Part 1) is very similar
to the one used by Caratheodory in his approach to the calculus of variations1 .
This approach was called the “royal road of the calculus of variations2 ”.
Parts 2 and 3 are technical but this is needed because the input found by
solving the minimization problem (3.12a) pointwise (for each x and each t ) does
not always give us an input u ( x (t ), t ) for which x (t ) is well defined for all t ∈
[0, T ], see Exercise 3.3(c) and Exercise 3.7(e). Such cases are ruled out in parts
2 and 3. In most applications this problem does not occur, and then the above
says that the so determined input is the optimal control and that V (x, t ) is the
value function V (x, t ).
Theorem 3.4.3 provides a sufficient condition for optimality: if we can solve
the Hamilton-Jacobi-Bellman equations (3.12) and if the conditions of Theo-
rem 3.4.3 are satisfied, then it is guaranteed that u ∗ is an optimal control. Recall,
on the other hand, that the conditions formulated in the minimum principle
(Theorem 2.5.1) are necessary for optimality. So in a sense, dynamic program-
ming and the minimum principle complement each other.
Another difference between the two methods is that an optimal control u ∗
derived from the minimum principle is given as a function of state x ∗ and
costate p ∗ , which, after solving the Hamiltonian equations, gives us u ∗ (t ) as a
function of time, while in dynamic programming the optimal control is given
in state feedback form, u (x, t ). Applying the feedback u (x, t ) to the system gives,
what is called, the closed-loop system3
ẋ ∗ (t ) = f ( x ∗ (t ), u ( x ∗ (t ), t )), x ∗ (0) = x0 ,
and its solution (if it exists) determines x ∗ (t ) and the optimal control
u ∗ (t ) := u ( x ∗ (t ), t ). In applications, the state feedback form is often preferred,
because its implementation is way more robust. For example, if the evolution
of the state is affected by disturbances, then the optimal control as a function
of time, u ∗ (t ), derived from the undisturbed case can easily be very differ-
ent from the true optimal control, whereas the optimal control given in state
feedback form, u ( x (t ), t ), will automatically keep track of possible disturbances
in the system dynamics. Most of the following examples exhibit this feedback
property.

Example 3.4.4 (Integrator with quadratic cost). Consider


ẋ (t ) = u (t ), x (0) = x0
1 C. Carathéodory. Variationsrechnung und partielle Differentialgleichungen erster Ordnung.

B.G. Teubner, Leipzig, 1935.


2 “Königsweg der Variationsrechnung” in H. Boerner, Caratheodorys Eingang zur variation-

srechnung, Jahresbericht der Deutschen Mathematiker Vereinigung, 56 (1953), 31—58; see H.J.
Pesch, Caratheodory’s royal road of the Calculus of Variations: Missed exits to the Maximum Prin-
ciple of Optimal Control Theory, AIMS.
3 Controlling a system with an input u (optimal or not) that depends on x is known as closed-

loop control, and the resulting system is known as the closed-loop system. Controlling the system
with a given time function u (t ) is called open-loop control.

with cost
    J_{[0,T]}(x₀, u) = x²(T) + ∫_0^T r u²(t) dt

for some r > 0. We allow every input, that is, U = R. Then the HJB equa-
tions (3.12) become
    ∂V(x, t)/∂t + min_{u∈R} [ (∂V(x, t)/∂x) u + r u² ] = 0,    V(x, T) = x².    (3.15)
Since the term to be minimized is quadratic in u (and r > 0), the optimal u is
where the derivative of (∂V(x, t)/∂x) u + r u² with respect to u is zero. This u
depends on x and t,

    u(x, t) = −(1/(2r)) ∂V(x, t)/∂x,    (3.16)
and thereby reduces the HJB equations (3.15) to
    ∂V(x, t)/∂t − (1/(4r)) ( ∂V(x, t)/∂x )² = 0,    V(x, T) = x².
Motivated by the boundary condition we try a V (x, t ) that is quadratic in x for
all time, so of the form V (x, t ) = x 2 P (t ). (Granted, this is a magic step because
at this point it is not clear that a quadratic form works.) This way the HJB equa-
tions simplify to
    x² Ṗ(t) − (1/(4r)) ( 2xP(t) )² = 0,    x² P(T) = x².
It has a common quadratic term x 2 . Canceling this quadratic term x 2 gives

Ṗ (t ) = P 2 (t )/r, P (T ) = 1.

This is an ordinary differential equation and its solution can be found with sep-
aration of variables. The solution is
    P(t) = r / (r + T − t).
It is well defined throughout t ∈ [0, T ] and, therefore,
    V(x, t) = x² r / (r + T − t)    (3.17)
is a solution of the HJB equations (3.15). Now that V (x, t ) is known we can com-
pute the optimal input (3.16). It is expressed in feedback form, i.e., depending
on x (t ) (as well as on t ),
    u∗(t) = u(x(t), t) = −(1/(2r)) ∂V(x(t), t)/∂x = −2x(t)P(t)/(2r) = −x(t)/(r + T − t).

The optimal state x ∗ therefore satisfies the closed-loop differential equation

    ẋ∗(t) = u∗(t) = −x∗(t)/(r + T − t).
This is a linear differential equation which has a well-defined solution x ∗ (t )
for all t ∈ [0, T ] and all initial states, and, hence, also the above u ∗ (t ) is well
defined for all t ∈ [0, T ]. This, finally, allows us to conclude that (3.17) is the
value function, that the above u ∗ is the optimal input, and that the optimal cost
is J [0,T ] (x 0 , u ∗ ) = V (x 0 , 0) = x 02 /(1 + T /r ). 
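As a numerical sanity check (a sketch added here, not from the original text), one can simulate the closed-loop system with a simple Euler scheme and compare the resulting cost with V(x₀, 0); the values of r, T, x₀ and the step size below are arbitrary choices.

    r, T, x0, N = 2.0, 1.0, 3.0, 100_000
    dt = T / N

    x, cost = x0, 0.0
    for k in range(N):                      # Euler simulation of the closed loop
        t = k * dt
        u = -x / (r + T - t)                # feedback u*(t) = -x(t)/(r + T - t)
        cost += r * u**2 * dt               # running cost r u^2
        x += u * dt                         # dynamics x' = u
    cost += x**2                            # terminal cost x(T)^2

    print("simulated cost:", cost)
    print("V(x0, 0)      :", x0**2 * r / (r + T))   # = x0^2 / (1 + T/r)

The two printed numbers agree up to the discretization error of the Euler scheme.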

The next example is a minor variation of the previous example.

Example 3.4.5 (Quadratic control). Consider the linear system

ẋ (t ) = u (t ), x (0) = x0

with U = R and cost


    J_{[0,T]}(x₀, u) = ∫_0^T ( x²(t) + ρ² u²(t) ) dt

for some ρ > 0. For this problem, the HJB equations (3.12) are

    ∂V(x, t)/∂t + min_{u∈R} [ (∂V(x, t)/∂x) u + x² + ρ² u² ] = 0,    V(x, T) = 0.

The term to be minimized is quadratic in u. Hence, it is minimal only if the


derivative with respect to u is zero. This gives

    u = −(1/(2ρ²)) ∂V(x, t)/∂x.

So, we can rewrite the HJB equations as


    ∂V(x, t)/∂t + x² − (1/(4ρ²)) ( ∂V(x, t)/∂x )² = 0,    V(x, T) = 0.    (3.18)

This is a nonlinear partial differential equation, and this might be complicated.


But it has an interesting physical dimension4 property, which implies that

V (x, t ) = P (t )x 2 .

4 Outside the scope of this book, but still: let [x] denote the dimension of a quantity x. For

example, [t ] = time. From ẋ = u , it follows that [u] = [x][t ]−1 . Also, the expression x 2 + ρ 2 u 2
implies that ρ 2 u 2 has the same dimension as x 2 . Hence, [ρ] = [t ], and then [V ] = [J ] = [x]2 [t ].
This suggests that V (x, t ) = x 2 P (t ). In fact, application of the Buckingham π-theorem (not part of
this course) shows that V (x, t ) must have the form V (x, t ) = x 2 ρG((t −T )/ρ) for some dimension-
less function G : R → R.

This form turns (3.18) into


    Ṗ(t) x² + x² − (1/(4ρ²)) ( 2P(t)x )² = 0,    P(T) = 0.

It has a common factor x 2 . Division by x 2 yields the ordinary differential equa-


tion
    Ṗ(t) + 1 − (1/ρ²) P²(t) = 0,    P(T) = 0.
This type of differential equation is discussed at length in the next chapter. The
solution is

    P(t) = ρ ( e^{(T−t)/ρ} − e^{−(T−t)/ρ} ) / ( e^{(T−t)/ρ} + e^{−(T−t)/ρ} ).    (3.19)

So for this P (t ), the function V (x, t ) := P (t )x 2 solves the HJB equations (3.18).
The candidate optimal control thus takes the form u(x(t), t) = −(1/(2ρ²)) ∂V(x(t), t)/∂x =
−(1/ρ²) P(t) x(t), and the candidate optimal state satisfies the linear time-varying
differential equation
    ẋ∗(t) = u(x∗(t), t) = −(1/ρ²) P(t) x∗(t),    x∗(0) = x₀.
Since P (t ) is well defined and bounded, it is clear that the solution x ∗ (t ) is well
defined for all t ∈ [0, T ]. In fact the solution is
    x∗(t) = e^{−(1/ρ²) ∫_0^t P(τ) dτ} x₀.

Having a well-defined solution for all t ∈ [0, T ] allows us to conclude that x ∗ (t )


is the optimal state, that
    u∗(t) := −(1/ρ²) P(t) x∗(t)

is the optimal control, and that V (x 0 , 0) = P (0)x 02 is the optimal cost. 

Example 3.4.6 (Quartic control). This is an uncommon application, but inter-


esting. We again consider the integrator system ẋ (t ) = u (t ), x (0) = x 0 , but now
with the cost equal to a sum of quartics
    J_{[0,T]}(x₀, u) = ∫_0^T ( x⁴(t) + u⁴(t) ) dt.

Again we assume that the input is not restricted: U = R. The HJB equations
become
    ∂V(x, t)/∂t + min_{u∈R} [ (∂V(x, t)/∂x) u + x⁴ + u⁴ ] = 0,    V(x, T) = 0.

Encouraged by the previous example, we try a V (x, t ) of the form

V (x, t ) = x 4 P (t ). (3.20)

(We will soon see that this form works.) Substitution of this form in the HJB
equations yields
 
    x⁴ Ṗ(t) + min_{u∈R} [ 4x³P(t)u + x⁴ + u⁴ ] = 0,    x⁴ P(T) = 0.

The minimizing u is u = −P^{1/3}(t) x. This can be obtained by setting the gradient
of 4x³P(t)u + x⁴ + u⁴ with respect to u equal to zero (verify this yourself). This
reduces the HJB equations to

x 4 Ṗ (t ) − 4x 4 P 4/3 (t ) + x 4 + x 4 P 4/3 (t ) = 0, x 4 P (T ) = 0.

Canceling the common factor x 4 leaves us with

Ṗ (t ) = 3P 4/3 (t ) − 1, P (T ) = 0. (3.21)

The equation here is a simple first-order differential equation, except that no


closed-form solution appears to be known. The solution is easily obtained numerically;
its graph (not reproduced here) shows P(t) increasing from P(T) = 0 as t decreases
and levelling off at a constant value.
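The following Python sketch (an illustration added here, not from the original text) integrates (3.21) backwards from the final condition P(T) = 0; in terms of s := T − t it integrates dP/ds = 1 − 3P^{4/3} forward from P = 0. The step size and the range of s are arbitrary choices.

    import numpy as np

    ds, S = 1e-4, 10.0                       # step size and range for s = T - t
    s_grid = np.arange(0.0, S, ds)
    P = np.empty_like(s_grid)
    P[0] = 0.0                               # P(T) = 0
    for k in range(len(s_grid) - 1):         # explicit Euler for dP/ds = 1 - 3 P^{4/3}
        P[k + 1] = P[k] + ds * (1.0 - 3.0 * P[k] ** (4.0 / 3.0))

    for s in (0.5, 1.0, 2.0, 5.0):
        print(f"P at T - t = {s}:", round(float(np.interp(s, s_grid, P)), 5))
    print("limit 3**(-3/4) =", round(3 ** (-0.75), 5))

The computed values level off at 3^{−3/4} ≈ 0.43869, in line with the discussion below.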

Clearly, P (t ) is well defined and bounded on (−∞, T ]. This shows that the HJB
equations have a solution of the quartic form (3.20). As t → −∞ the solution
P (t ) converges to the equilibrium solution where 0 = 3P 4/3 − 1, i.e., P = 3−3/4 ≈
0.43869. For now the function V (x, t ) = x 4 P (t ) is just a candidate value function.
The resulting candidate optimal control

    u∗(t) = −P^{1/3}(t) x(t)    (3.22)

is linear in x (t ), and thus the optimal closed-loop system is linear as well,



    ẋ∗(t) = −P^{1/3}(t) x∗(t),    x∗(0) = x₀.

Since P(t) is bounded, also −P^{1/3}(t) is bounded. Therefore the closed-loop sys-


tem has a well-defined solution x ∗ (t ) for every initial condition x 0 and all
t ∈ [0, T ]. We thus conclude that V (x, t ) = x 4 P (t ) is the value function, that (3.22)
is the optimal control and that x 04 P (0) is the optimal cost. 

In the above examples, the functions V all turned out to be true value func-
tions: V = V . We need to stress that examples exist where this is not the case,
see Exercise 3.3(c). The next example is one where U is bounded (while again
V = V ).

Example 3.4.7 (Example 3.4.2 extended). We consider the system of Exam-


ple 3.4.2, that is, ẋ (t ) = u (t ), x (0) = x 0 , with the input taking values in U = [−1, 1].
The cost, however, we extend with a final cost,
    J_{[0,T]}(x₀, u) = ∫_0^T x(t) dt − α x(T).

We assume that α > 0. The HJB equations (3.12) become

    ∂V(x, t)/∂t + min_{u∈[−1,1]} [ (∂V(x, t)/∂x) u + x ] = 0,    V(x, T) = −αx.

The function to be minimized, (∂V(x, t)/∂x) u + x, is linear in u. So the minimum is
attained at one of the boundaries of [−1, 1]. One way to proceed would be to
analyze the HJB equations for the two cases u = ±1. But the equations are partial
differential equations and these are often very hard to solve. We take another
route: in Example 3.4.2 we analyzed a similar problem and ended up with a
value function of the form

V (x, t ) = xP (t ) + Q(t )

for certain functions P (t ),Q(t ). We will demonstrate that this form also works
for the present problem. The HJB equations for this form simplify to

    x Ṗ(t) + Q̇(t) + min_{u∈[−1,1]} ( P(t)u + x ) = 0,    x P(T) + Q(T) = −αx.

This has to hold for all x and all t so the HJB equations hold iff

    Ṗ(t) = −1,    Q̇(t) = −min_{u∈[−1,1]} P(t)u,    P(T) = −α,    Q(T) = 0.    (3.23)

This settles P(t): since Ṗ(t) = −1 and P(T) = −α, we have

    P(t) = T − α − t.
Thus, P (t ) is positive for t < T −α and negative for t > T −α. The u ∈ [−1, 1] that
minimizes P (t )u + x hence is

−1 if t < T − α
u ∗ (t ) = . (3.24)
+1 if t > T − α

This, in turn, specializes the differential equation for Q(t) as given in (3.23) to

        Q̇(t) = |T − α − t|,

that is, Q̇(t) = T − α − t for t < T − α and Q̇(t) = t − (T − α) for t > T − α. Since
Q(T) = 0, it follows that

        Q(t) = ½ (t − T + α)² − ½ α²       for t ≥ T − α,
        Q(t) = −½ (T − α − t)² − ½ α²      for t ≤ T − α.
This function is continuously differentiable. Now all conditions of (3.23) are met
and, therefore, V (x, t ) := xP (t ) +Q(t ) satisfies the HJB equations. Along the way,
we also determined the candidate optimal input: (3.24). Clearly, for this input,
the solution x ∗ (t ) of the closed loop ẋ ∗ (t ) = u ∗ (t ), x ∗ (0) = x 0 is well defined for
all t ∈ [0, T ]. Hence, (3.24) is the optimal input, the above V (x, t ) is the value
function, and V (x 0 , 0) = x 0 P (0) + Q(0) is the optimal cost.
Does it agree with the minimum principle? The Hamiltonian is H (x, p, u) =
pu + x, and so the Hamiltonian equation for the costate is ṗ ∗ (t ) = −1, p ∗ (T ) =
−α. Clearly, this means that p ∗ (t ) = T −α−t . Now the input u ∗ (t ) that minimizes
the Hamiltonian

H ( x ∗ (t ), p ∗ (t ), u) = (T − α − t )u + x ∗ (t )

agrees with what we found earlier: (3.24). But, of course, the fundamental differ-
ence is that the minimum principle assumes the existence of an optimal control,
whereas satisfaction of the HJB equations proves that the control is optimal. 
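As a numerical check (a sketch added here, not part of the original text), one can integrate the equations (3.23) backwards, simulate the system with the bang-bang input (3.24), and compare the simulated cost with V(x₀, 0) = x₀P(0) + Q(0). The values of T, α and x₀ below are arbitrary choices with 0 < α < T.

    import numpy as np

    T, alpha, x0, N = 2.0, 0.5, 1.0, 200_000
    dt = T / N
    t = np.linspace(0.0, T, N + 1)

    # Backward integration of (3.23): P' = -1 and Q' = -min_{u in [-1,1]} P(t)u = |P(t)|.
    P = np.empty(N + 1); Q = np.empty(N + 1)
    P[N], Q[N] = -alpha, 0.0
    for k in range(N, 0, -1):
        P[k - 1] = P[k] + dt                 # stepping backwards: P(t - dt) = P(t) + dt
        Q[k - 1] = Q[k] - dt * abs(P[k])     # Q(t - dt) = Q(t) - dt |P(t)|

    # Forward simulation with the bang-bang input (3.24).
    x, cost = x0, 0.0
    for k in range(N):
        u = -1.0 if t[k] < T - alpha else 1.0
        cost += x * dt                       # running cost x(t)
        x += u * dt
    cost -= alpha * x                        # final cost -alpha x(T)

    print("simulated cost :", cost)
    print("x0 P(0) + Q(0) :", x0 * P[0] + Q[0])

Both numbers agree up to the discretization error.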

The examples might give the impression that dynamic programming is


superior to the minimum principle. In applications, however, it is often the
other way around. The thing is that the equations needed in the minimum
principle (i.e., the Hamiltonian equations) are ordinary differential equations,
and numerical routines exist that are quite efficient in solving these equations.
The HJB equations, in contrast, consist of a partial differential equation and a
boundary condition. Furthermore, the standard HJB theory requires a higher
degree of smoothness than the minimum principle.
For restricted sets such as U = [0, 1], the value function often is continuously
differentiable everywhere (see Example 3.4.7 and Exercise 3.7) but there are also
cases where it is not even differentiable everywhere (let alone continuously dif-
ferentiable). Here is an example:

Example 3.4.8 (Non-smooth value functions). Let

ẋ (t ) = − x (t ) u (t ), x (0) = x0 , U = [0, 1],

and take J [0,T ] (x 0 , u ) = x (T ). So, we have to make x (T ) as small (negative) as


possible. From the differential equation we can deduce that one optimal input
as a function of state x and time t is

        u(x, t) = { 1   if x > 0
                  { 0   if x ≤ 0,

and then the value function follows as

        V(x, t) = x(T) = { e^{t−T} x   if x > 0
                         { x           if x ≤ 0.

This value function is not continuously differentiable with respect to x at x =


0, and, therefore, the standard theory does not apply. It does satisfy the HJB
equation (3.12a) at all x where it is continuously differentiable (at all x ≠ 0):

        ∂V(x, t)/∂t + min_{u∈[0,1]} (∂V(x, t)/∂x)(−xu) = { e^{t−T}x + e^{t−T}(−x) = 0   if x > 0
                                                         { 0 + 0 = 0                    if x < 0.

3.5 Connection with the Minimum Principle

In Chapter 2, we claimed that the initial costate p ∗ (0) measures the sensitivity of
the optimal cost with respect to changes in the initial state (end of § 2.3, see also
Example 2.6.3). This connection can now be proved. In fact, it is a by-product of
a more general connection between the solution of the HJB equations (assum-
ing it equals the value function) and the costate of the minimum principle. First
of all, we note that the HJB equation (3.12a) can be expressed in terms of the
Hamiltonian H (x, p, u) := p T f (x, u) + L(x, u) as

    ∂V(x, t)/∂t + min_{u∈U} H( x, ∂V(x, t)/∂x, u ) = 0,
(whence the name Hamilton-Jacobi). This suggests that the costate is closely
related to ∂V (x, t )/∂x. In fact, since we know that p ∗ (T ) = ∂K ( x ∗ (T ))/∂x =
∂V ( x ∗ (T ), T )/∂x we conjecture that

    p∗(t) = ∂V(x∗(t), t)/∂x    for all t ∈ [0, T].
Under mild assumptions that is indeed the case. To avoid technicalities, we
derive this connection only for U = Rm and value functions that are C 2 .

Theorem 3.5.1 (Connection between costate and value function). Assume


f (x, u), L(x, u), K (x) are all C 1 . Let U = Rm and suppose there is a C 2 function
V : Rn × [0, T ] → R that satisfies the HJB equations
    ∂V(x, t)/∂t + min_{u∈U} H( x, ∂V(x, t)/∂x, u ) = 0,    V(x, T) = K(x)
for all x ∈ Rn and all t ∈ [0, T ]. Denote, for each x, t , one possible minimizer
as u ∗ (x, t ), and assume that all conditions of Theorem 3.4.3 are satisfied. In
particular that V (x, t ) equals the value function V (x, t ) and that the differen-
tial equation ẋ ∗ (t ) = f ( x ∗ (t ), u ∗ ( x ∗ (t ), t )) has a well-defined solution x ∗ (t ) for
all t ∈ [τ, T ] and all x (τ) ∈ Rn . Then p ∗ (t ) defined as
    p∗(t) = ∂V(x∗(t), t)/∂x    (3.25)
is the solution of the Hamiltonian costate equation
    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(x∗(t), t))/∂x,    p∗(T) = ∂K(x∗(T))/∂x.    (3.26)
In particular, p ∗ (0) = ∂V (x 0 , 0)/∂x 0 .

Proof. Let H (x, p, u) = p T f (x, u) + L(x, u). By definition, the minimizing u ∗ (x, t )
satisfies the HJB equation
    ∂V(x, t)/∂t + H( x, ∂V(x, t)/∂x, u∗(x, t) ) = 0.

In the rest of this proof, we drop all function arguments. The partial derivative
of the previous expression with respect to (row vector) xᵀ yields

    ∂²V/(∂xᵀ∂t) + ∂H/∂xᵀ + (∂H/∂pᵀ) ∂²V/(∂xᵀ∂x) + (∂H/∂uᵀ) ∂u∗/∂xᵀ = 0.

The last term is zero because u∗ minimizes the Hamiltonian. Using this
expression and the fact that ∂H/∂p = f, we find that

    d/dt [ ∂V(x(t), t)/∂x ] = ∂²V/(∂t∂x) + ∂²V/(∂x∂xᵀ) f = ∂²V/(∂t∂x) + ∂²V/(∂x∂xᵀ) ∂H/∂p = −∂H/∂x.

Because V(x, T) = K(x) for all x, we also have ∂V(x(T), T)/∂x = ∂K(x(T))/∂x. Hence,
p∗(t) := ∂V(x∗(t), t)/∂x satisfies the costate equation (3.26) for all time. ■

Example 3.5.2. We apply Theorem 3.5.1 to the optimal control problem of


Example 3.4.5. For simplicity, we take ρ = T = 1. Then Example 3.4.5 says that

    V(x, t) = x² P(t)    where    P(t) = ( e^{1−t} − e^{t−1} ) / ( e^{1−t} + e^{t−1} ).

Using this and the formula for x ∗ (t ) (determined in Example 2.5.4), we find that

    p∗(t) = ∂V(x∗(t), t)/∂x = 2 x∗(t) P(t)
          = 2 ( x₀/(e + e⁻¹) ) ( e^{1−t} + e^{t−1} ) · ( e^{1−t} − e^{t−1} ) / ( e^{1−t} + e^{t−1} )
          = 2 ( x₀/(e + e⁻¹) ) ( e^{1−t} − e^{t−1} ).
This equals the p ∗ (t ) as determined in Example 2.5.4. 
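The agreement can also be checked numerically. The Python sketch below (an illustration added here, not from the original text) uses ρ = T = 1 and x∗(t) = x₀ cosh(1 − t)/cosh(1), which is the trajectory consistent with the expression for p∗(t) above, and verifies by finite differences that p∗(t) = 2x∗(t)P(t) satisfies the costate equation ṗ∗(t) = −∂H/∂x = −2x∗(t).

    import numpy as np

    x0 = 0.7
    t = np.linspace(0.0, 1.0, 2001)

    P = np.tanh(1.0 - t)                       # P(t) = (e^{1-t} - e^{t-1}) / (e^{1-t} + e^{t-1})
    xstar = x0 * np.cosh(1.0 - t) / np.cosh(1.0)
    pstar = 2.0 * xstar * P                    # p*(t) = dV(x*(t), t)/dx

    pdot = np.gradient(pstar, t)               # numerical derivative of p*
    print("max |p*' + 2 x*| =", float(np.max(np.abs(pdot + 2.0 * xstar))))

The printed residual is small (limited only by the finite-difference accuracy), confirming the relation (3.25)-(3.26) for this example.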

3.6 Infinite Horizon Optimal Control and Lyapunov Functions

For infinite horizon optimal control problems, there is an interesting connec-


tion with Lyapunov functions and stabilizing inputs. Given a system of differen-
tial equations and a class of inputs

ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 , u : [0, ∞) → U, (3.27a)

the infinite horizon optimal control problem is to minimize over all inputs the
infinite horizon cost
    J_{[0,∞)}(x₀, u) := ∫_0^∞ L(x(t), u(t)) dt.    (3.27b)

The only difference with the previous formulation is the cost function. The inte-
gral that defines the cost is now over all t > 0, and the “final” cost K ( x (∞)) has
been dropped because in applications we normally send the state to a unique
equilibrium x (∞) := limt →∞ x (t ), and thus all such controls achieve the same
final cost (i.e., the final cost would not affect the optimal control).
As before we define the value function as

    V(x, τ) = inf_{u:[τ,∞)→U} J_{[τ,∞)}(x, u)    (3.28)

in which J_{[τ,∞)}(x, u) := ∫_τ^∞ L(x(t), u(t)) dt for x(τ) = x. Because of the infi-
nite horizon, however, the value function no longer depends on τ (see Exer-
cise 3.14(a)), and so we can simply write

    V(x) = inf_{u:[0,∞)→U} J_{[0,∞)}(x, u).    (3.29)

The derivative of V (x) with respect to time vanishes, and thus the HJB equa-
tion (3.12a) simplifies to

    min_{u∈U} [ (∂V(x)/∂xᵀ) f(x, u) + L(x, u) ] = 0.    (3.30)

As we will soon see, this equation typically has more than one solution V , and,
clearly, at most one of them will be the value function V . The next example sug-
gests that the “right” solution gives us a stabilizing input 5 , and a Lyapunov func-
tion for that equilibrium.

Example 3.6.1 (Quartic control—design of optimal stabilizing inputs and


Lyapunov function). Consider the infinite horizon optimal control problem
with
    ẋ(t) = u(t),    x(0) = x₀,    U = R,    J_{[0,∞)}(x₀, u) = ∫_0^∞ ( x⁴(t) + u⁴(t) ) dt.

For this problem the infinite horizon HJB equation (3.30) is


    min_{u∈R} [ (∂V(x)/∂x) u + x⁴ + u⁴ ] = 0.
The solution u of the above minimization problem is
    u = −( (1/4) ∂V(x)/∂x )^{1/3}    (3.31)
(verify this yourself ), and then the HJB equation becomes
    ( 1/4 − 1 ) (1/4)^{1/3} ( ∂V(x)/∂x )^{4/3} + x⁴ = 0.
This looks rather ugly but actually it says that
    ∂V(x)/∂x = ±4 (3^{−3/4}) x³,
and, therefore, all possible solutions are

    Vall(x) = ±3^{−3/4} x⁴ + d    (3.32)

with d some integration constant. Which Vall (x) gives us a stabilizing


input (3.31)? First of all, we can take d = 0 without loss of generality in the sense
that it does not affect the control (3.31). Since L(x, u) = x 4 + u 4 ≥ 0 we know that
the value function V defined in (3.29) is nonnegative; hence we consider the
nonnegative option of (3.32), that is,

    V(x) = 3^{−3/4} x⁴.

We claim that this V is a Lyapunov function for the equilibrium x̄ = 0 of the


closed-loop system ẋ (t ) = u (t ) for the control equal to the candidate optimal
control defined in (3.31),

    u∗(t) = −( (1/4) ∂V(x(t))/∂x )^{1/3} = −3^{−1/4} x(t).

5 A stabilizing input is an input that steers the state to a given equilibrium. Better would have

been to call it “asymptotically stabilizing” input or “attracting” input, but “stabilizing” is the stan-
dard in the literature.

Indeed, x̄ = 0 is an equilibrium of the closed-loop system, and V clearly is C 1


and positive definite, and by construction the HJB equation (3.30) gives us that
∂V ( x (t ))
V̇ ( x (t )) = f ( x (t ), u ∗ (t )) = −L( x (t ), u ∗ (t )) = −( x 4 (t ) + u 4∗ (t )) (3.33)
∂x T
which is < 0 for all x (t ) = 0. Hence V is a strong Lyapunov function of the closed-
loop system with equilibrium x̄ = 0 and, therefore, it is asymptotically stable at
x̄ = 0 (see Theorem B.3.2). The closed-loop system is even globally asymptot-
ically stable because all conditions of Theorem B.3.5 (p. 216) are met. For this
reason the control input u ∗ is called a stabilizing input. In fact it is the input
that minimizes the cost J [0,∞) (x 0 , u ) over all inputs that stabilize the system!
Indeed, for every input u that steers the state to zero we have the inequality
    J_{[0,∞)}(x₀, u) = ∫_0^∞ L(x(t), u(t)) dt
                     ≥ −∫_0^∞ (∂V(x(t))/∂xᵀ) f(x(t), u(t)) dt        (because of (3.30))
                     = −∫_0^∞ V̇(x(t)) dt = V(x₀) − V(x(∞)) = V(x₀) = 3^{−3/4} x₀⁴

(using that V(x(∞)) = 0 since x(t) → 0),

while in view of (3.33) equality holds if u = u ∗ . 
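As a numerical illustration (a sketch added here, not from the original text), the following Python code simulates the closed loop ẋ = −kx for the optimal gain k = 3^{−1/4} and for two other stabilizing gains, and compares the (truncated) infinite-horizon costs with V(x₀) = 3^{−3/4}x₀⁴. The horizon length, step size and gains are arbitrary choices.

    def cost_of_feedback(gain, x0, T=40.0, N=400_000):
        """Euler simulation of x' = u = -gain*x; returns the truncated cost integral."""
        dt, x, J = T / N, x0, 0.0
        for _ in range(N):
            u = -gain * x
            J += (x**4 + u**4) * dt          # running cost x^4 + u^4
            x += u * dt
        return J

    x0 = 1.3
    k_opt = 3 ** (-0.25)                     # the stabilizing gain u = -3^{-1/4} x
    print("cost with optimal gain:", cost_of_feedback(k_opt, x0))
    print("V(x0) = 3^{-3/4} x0^4 :", 3 ** (-0.75) * x0**4)
    print("cost with gain 1.0    :", cost_of_feedback(1.0, x0))
    print("cost with gain 0.3    :", cost_of_feedback(0.3, x0))

The cost for the optimal gain matches V(x₀), while the other stabilizing gains give strictly larger costs.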

In this example we demonstrated that u (t ) := −3−1/4 x (t ) minimizes the cost


J [0,∞) (x 0 , u ) over all stabilizing inputs, not necessarily over all inputs (see, how-
ever, Exercise 18). In applications closed-loop stability is such a crucial property
that we prefer to consider only stabilizing inputs. We thus define:

Definition 3.6.2 (Optimal control with stability). Given a system ẋ (t ) =


f ( x (t ), u (t )), x (0) = x 0 and candidate closed-loop equilibrium x̄, the infinite
horizon optimal control problem with stability is to minimize J [0,∞) (x 0 , u ) over
all inputs u : [0, ∞) → U that stabilize the system (meaning the solution x (t ) of
ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 is defined for all t > 0, and limt →∞ x (t ) = x̄). 

The following proposition is a generalization of the previous example. We


assume in this result that the equilibrium is the origin, x̄ = 0.

Proposition 3.6.3 (Optimal control with stability & Lyapunov functions). Con-
sider the optimal control problem (3.27), and assume that f (x, u) and L(x, u) are
C 1 and that f (0, 0) = 0, L(0, 0) = 0, and L(x, u) ≥ 0 for all x ∈ Rn , u ∈ U. Then

1. V(0) = 0, and V(x) ≥ 0 for all x ≠ 0. (Possibly V(x) = ∞.)

2. Suppose V is a C 1 solution of the HJB equation (3.30) and that V (0) = 0


and V (x) > 0 for all x = 0. Let û (x) be a minimizer of (3.30), i.e.,
    (∂V(x)/∂xᵀ) f(x, û(x)) + L(x, û(x)) = 0    ∀x ∈ Rⁿ.

If f(x, û(x)) is Lipschitz continuous on a neighborhood of x̄ = 0, then the
closed-loop system ẋ(t) = f(x(t), û(x(t))), x(0) = x₀, has a well-defined
solution x(t) for all t > 0 whenever x₀ is sufficiently close to x̄ = 0, and the
closed-loop system at equilibrium x̄ = 0 is stable.

3. Suppose, in addition, that L(x, û(x)) > 0 for all x ≠ 0. Then the closed-loop
system is asymptotically stable, and for all x 0 sufficiently close to x̄ = 0 the
input u ∗ (t ) := û ( x (t )) solves the infinite horizon optimal control problem
with stability. Moreover the optimal cost then equals V (x 0 ).

Proof. This proof refers to several definitions and results from Appendix B.
1. Since L(x, u) ≥ 0 it is immediate that V (x) ≥ 0. Also V (0) = 0 because for
x 0 = 0 the control u (t ) = 0 achieves x (t ) = 0 for all time, and, hence,
L( x (t ), u (t )) = 0 for all time.

2. Lipschitz continuity assures existence and uniqueness of the solution x (t )


(for x 0 sufficiently close to 0), see Theorem B.1.3. The function V is a
Lyapunov function for the equilibrium x̄ = 0 because it is C 1 and positive
definite (by assumption), and V̇ (x) ≤ 0 because
    V̇(x(t)) = (∂V(x(t))/∂xᵀ) f(x(t), û(x(t))) = −L(x(t), û(x(t))) ≤ 0.
Theorem B.3.2 now guarantees that the equilibrium is stable.

3. Then V is a strong Lyapunov function, so then the equilibrium is asymp-


totically stable (Theorem B.3.2). As in the previous example we have for
every stabilizing input u that
    J_{[0,∞)}(x₀, u) = ∫_0^∞ L(x(t), u(t)) dt
                     ≥ −∫_0^∞ (∂V(x(t))/∂xᵀ) f(x(t), u(t)) dt        (because of (3.30))
                     = −∫_0^∞ V̇(x(t)) dt = V(x₀) − V(x(∞)) = V(x₀)

(using that V(x(∞)) = 0 since x(t) → x̄ = 0),

and equality holds if u (t ) = û ( x (t )). ■

3.7 Exercises

3.1 Maximization. Consider the system

ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 .
We want to maximize the cost
    ∫_0^T L₀(x(t), u(t)) dt + K₀(x(T)).

Find a new cost such that the maximization problem becomes a mini-
mization problem, in the sense that an input u solves the minimization
problem iff it solves the maximization problem. Also comment on how the
two associated costs are related. (Note: this exercise is trivial.)

3.2 An optimal control problem that has no solution. Not every optimal con-
trol problem is solvable. Consider the system ẋ (t ) = u (t ), x 0 = 1 with cost
    J_{[0,T]}(x₀, u) = ∫_0^1 x²(t) dt,
and U = R.

(a) Determine the value function (from the definition, not from the HJB
equations).
(b) Show that the value function does not satisfy the HJB equations.

3.3 The solution V need not equal the value function V . Consider again the
optimal control problem of Exercise 2.1:
ẋ (t ) = x (t ) u (t ), x (0) = x0
with cost function
    J_{[0,T]}(x₀, u) = ∫_0^T ( x²(t) + u²(t) ) dt + 2 x(T),
and with the input free to choose: U = R.

(a) Determine a solution V (x, t ) of (3.12), and a candidate optimal con-


trol u ∗ ( x (t ), t ) (possibly still depending on x (t )).
[Hint: assume that V (x, t ) does not depend on t , i.e., that it has the
form V (x, t ) = Q(x) for some function Q.]
(b) Now let x 0 = 1 and T > 0. Show that V (x 0 , 0) is the optimal cost and
determine the optimal control u ∗ (t ) explicitly as a function of time.
[Hint: have a look at Example B.1.5.]
(c) Now let x 0 = −1 and T = 2. Show that V (x, t ) and u ∗ ( x (t ), t ) are not
the value function and not the optimal input! (In other words: what
condition of Theorem 3.4.3 fails here?)

3.4 Direct solution. Even though dynamic programming and the HJB
equations are powerful concepts, we should always aim for simpler
approaches. Consider the system
ẋ (t ) = u (t ), x (0) = x0
with cost function J_{[0,T]}(x₀, u) = ∫_0^T x²(t) dt. The problem is to minimize
this with bounded inputs
0 ≤ u (t ) ≤ 1.

(a) Use your common sense to solve the problem for x 0 ≥ 0.


(b) What is the cost for the optimal control found in (a)?
(c) Use (b) to find a candidate solution V (x, t ) of the HJB equations for
x ≥ 0. Verify that this candidate solution satisfies (3.12) for x > 0.
(d) Use your common sense to solve the minimization problem for x 0 <
0. What are the minimal costs now?

3.5 Economic application. The capital x (t ) ≥ 0 of an economy at any moment


t is divided into two parts: u (t ) x (t ) and (1 − u (t )) x (t ), with

u (t ) ∈ [0, 1].

The first part, u (t ) x (t ), is for investments and contributes to the increase


of capital according to the formula

ẋ (t ) = u (t ) x (t ), x (0) > 0.

The other part, (1 − u (t )) x (t ), is for consumption and is evaluated by the


“satisfaction”
    Ĵ_{[0,3]}(x₀, u) := −x(3) + ∫_0^3 ( 1 − u(t) ) x(t) dt.

We want to maximize the satisfaction.

(a) Let V be a function of the form V (x, t ) = Q(t )x, and with it determine
the HJB equations.
(b) Express the candidate optimal u ∗ (t ) as a function of Q(t ) [Hint: x (t )
is always positive].
(c) Determine Q(t ) for all t ∈ [0, 3].
(d) Determine the optimal u ∗ (t ) explicitly as a function of time, and
argue that this is the true optimal control (so not just the candidate
optimal control).
(e) What is the maximal satisfaction Jˆ[0,3] (x 0 , u ∗ )?

3.6 Weird problem. Consider the system

ẋ (t ) = x (t ) + u (t ), x (0) = x0 , U=R

with cost
    J_{[0,T]}(x₀, u) = ½ x²(T) + ∫_0^T ( −x²(t) − x(t) u(t) ) dt.    (3.34)

(a) Solve the HJB equations. [Hint: try the special form V (x, t ) = Q(x).]
(b) Determine an optimal input u (t ).

(c) Determine the optimal cost.


(d) Show directly from (3.34) that every input results in the same cost,
J_{[0,T]}(x₀, u) = ½ x₀². [Hint: use that u(t) = ẋ(t) − x(t).]
(e) About the technicalities of Theorem 3.4.3. This optimal control prob-
lem is a good one to illustrate why we have to be so technical in parts
2 and 3 of Theorem 3.4.3. The HJB equation (3.12a) for this problem
is that minu∈R 0 = 0 for every x, t . Clearly this means that every u (x, t )
solves the HJB equation (3.12a), for instance

        u(x, t) = { −x + 1/(T − t)   if 0 ≤ t < T
                  { 0                 if t = T,

and

u (x, t ) = −x − x 2 ,
and

        u(x, t) = { 1   if t is a rational number
                  { 0   if t is an irrational number.

Why are these three choices problematic? (For the second input
Example B.1.5 may be useful.)

3.7 Value function. Let T > 0 and consider the system with bounded input

ẋ (t ) = u (t ), x (0) = x0 , u (t ) ∈ [−1, 1],


and define the family of costs

J [τ,T ] ( x (τ), u ) = x 2 (T ), τ ∈ [0, T ].

(a) Argue that

            u∗(t) := { +1   if x(t) < 0
                     {  0   if x(t) = 0
                     { −1   if x(t) > 0

is an optimal control for J [τ,T ] ( x (τ), u ) for every τ ∈ [0, T ] and x (τ).
(b) Use the above optimal input to determine the value function V (x, t ).
(Use the definition of value function, do not use the HJB equations.)
(c) Verify that this V (x, t ) satisfies the HJB equations (3.12).

3.8 Quartic control. Consider the problem of Example 3.4.6 on quartic con-
trol. Argue that
        (∂V(x∗(t), t)/∂x) u∗(t) + x∗⁴(t) + u∗⁴(t)

    equals x∗⁴(T) for all t ∈ [0, T].

3.9 Exploiting physical dimensions. Consider the nonlinear system

ẋ (t ) = x (t ) u (t ), x (0) = x0 , U = R,
with a quadratic cost
    J_{[0,T]}(x₀, u) = ∫_0^T ( x²(t) + ρ² u²(t) ) dt

that depends on a positive number ρ. The HJB equation (3.12a) is a partial
differential equation in V(x, t) which is in general hard to solve. However,
by exploiting physical dimensions one can sometimes get an idea of the
form of V(x, t). With some basic knowledge of dimensional analysis this
goes as follows: let [x] be the dimension of a quantity x, for instance [t]
is time. The identity ẋ = xu implies that u has dimension [u] = [t]⁻¹. Furthermore,
in order for the sum x² + ρ²u² to be well defined, we need that
x² and ρ²u² have the same dimension. Hence [ρ] = [x][t], and then we
have [V] = [J] = [x]²[t]. Thus, both V/(xρ) and (t − T)x/ρ are dimensionless.
The Buckingham π-theorem (not covered in this book) claims that
V/(xρ) is a function of (t − T)x/ρ. That is to say, it claims that the value
function V(x, t) for this problem must be of the special form

    V(x, t) = xρ G((t − T)x/ρ)

for some function G : R → R. We can verify that this is indeed correct:

(a) Show that the HJB equations (3.12) for this form reduce to an ordi-
nary differential equation with “initial” condition,
    G′(z) + 1 − ¼ (G(z) + zG′(z))² = 0,    G(0) = 0.

An analytic solution of this ODE is difficult to obtain. Figure 3.4 shows
the graphs of G(z) and G(z) + zG′(z) obtained numerically. Both G(z) and
G(z) + zG′(z) converge to +2 as z → −∞ and to −2 as z → ∞.

(b) Express the optimal control u∗(t) in terms of x(t), ρ, G and z := (t − T) x(t)/ρ.
(c) Argue, using the graphs, that the closed-loop system ẋ ∗ (t ) =
x ∗ (t ) u ∗ (t ), x ∗ (0) = x0 has a well-defined solution on [0, T ], and
then conclude that the optimal cost is V (x 0 , 0) = x 0 ρG(−T x 0 /ρ).
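The implicit ODE of part (a) of Exercise 3.9 can also be integrated numerically. The sketch below (in Python with SciPy; it is not part of the exercise) solves the quadratic equation in G′(z) explicitly and takes the root that stays finite at z = 0; this branch choice is our own assumption, easily checked against Fig. 3.4.

    import numpy as np
    from scipy.integrate import solve_ivp

    def G_prime(z, y):
        # G'(z) solved from the quadratic (z^2/4)G'^2 + (zG/2 - 1)G' + (G^2/4 - 1) = 0,
        # taking the root that remains finite at z = 0 (assumed branch).
        G = y[0]
        if abs(z) < 1e-8:
            return [0.25 * G**2 - 1.0]          # limiting value of that root at z = 0
        disc = 1.0 - z * G + z**2               # discriminant; positive along the solution
        return [(2.0 - z * G - 2.0 * np.sqrt(disc)) / z**2]

    # integrate from z = 0 (where G(0) = 0) to the right and to the left
    right = solve_ivp(G_prime, [0.0, 8.0], [0.0], dense_output=True, rtol=1e-8, atol=1e-10)
    left = solve_ivp(G_prime, [0.0, -8.0], [0.0], dense_output=True, rtol=1e-8, atol=1e-10)

    print(right.sol(8.0)[0], left.sol(-8.0)[0])  # slowly approaching -2 and +2, as in Fig. 3.4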

3.10 Weierstrass necessary condition. As we saw in Example 2.4.1, the calculus


of variations problem with free endpoint equals the optimal control prob-
lem with

ẋ (t ) = u (t ), x (0) = x0 , U = R, L(x, u) = F (x, u).

In what follows we assume that the solution V (x, t ) of the HJB equa-
tions (3.12) exists, and that it is the value function, and that the optimal
x ∗ , u ∗ are sufficiently smooth.

FIGURE 3.4: Graphs of G(z) and G(z) + zG′(z) of the solution of the differential
equation G′(z) + 1 − ¼(G(z) + zG′(z))² = 0, G(0) = 0. See Exercise 3.9.

(a) Determine the HJB equations (3.12) for this calculus of variations
problem formulated as an optimal control problem.
(b) Show that
    ∂V(x∗(t), t)/∂x + ∂F(x∗(t), ẋ∗(t))/∂ẋ = 0.

(c) Show that


    F(x∗(t), u) − F(x∗(t), ẋ∗(t)) − (u − ẋ∗(t))ᵀ ∂F(x∗(t), ẋ∗(t))/∂ẋ ≥ 0    (3.35)
for all u ∈ Rn and all t ∈ (0, T ).

Inequality (3.35) is known as the Weierstrass necessary condition for optimality of calculus of variations problems.

3.11 Optimal capacitor charging. In Exercise 2.9 we determined a voltage input


u that charges a capacitor from zero voltage, x (0) = 0, to a desired voltage,
x (T ) = 1, with minimal energy loss. We continue with this problem. As
in Exercise 2.9, we take U = R, and we assume that capacitance and
resistance are both equal to one. In that case the relation between x
and u is

ẋ (t ) = − x (t ) + u (t ), x (0) = 0. (3.36)

As cost we take
    J[0,T](0, u) = (x(T) − 1)²/β + ∫₀ᵀ (u(t) − x(t))² dt.    (3.37)

Here, as before, the term ∫₀ᵀ (u(t) − x(t))² dt is the energy loss, but now we
also added a final cost, ( x (T ) − 1)2 /β. This final cost depends on a positive
tuning parameter β that we can choose. We do not insist on having x (T ) =
1, but the final cost puts a penalty on the deviation of the final voltage
x (T ) from 1 (our desired voltage). For instance, if β ≈ 0 then we expect
x (T ) ≈ 1.

(a) Assume the solution of the HJB equations (3.12) is of the form

V (x, t ) = (x − 1)2 P (t ).

Derive an ordinary differential equation for P (t ) in terms of


P, Ṗ , t , T, β (and nothing else).
(b) Verify that

    P(t) = 1/(β + T − t),

and that

    x∗(t) = t/(β + T),    u∗(t) = (t + 1)/(β + T),    and    J[0,T](0, u∗) = 1/(β + T),

respectively, are the optimal state, optimal control, and optimal cost.
(c) Determine limβ↓0 x ∗ (T ), and explain on the basis of the cost (3.37)
why this makes sense.
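For part (b) of Exercise 3.11, a quick symbolic sanity check (a Python/SymPy sketch, not part of the exercise) confirms that the stated x∗(t) and u∗(t) are consistent with the dynamics (3.36) and reproduce the claimed cost:

    import sympy as sp

    t, T, beta = sp.symbols('t T beta', positive=True)
    x = t / (beta + T)                  # claimed optimal state x*(t)
    u = (t + 1) / (beta + T)            # claimed optimal input u*(t)

    print(sp.simplify(sp.diff(x, t) - (-x + u)))   # 0: the dynamics (3.36) hold
    print(x.subs(t, 0))                            # 0: the initial condition holds

    J = (x.subs(t, T) - 1)**2 / beta + sp.integrate((u - x)**2, (t, 0, T))
    print(sp.simplify(J - 1 / (beta + T)))         # 0: the cost (3.37) equals 1/(beta+T)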

FIGURE 3.5: A pendulum with a torque u. See Exercise 3.12.

3.12 Optimal stabilization of a pendulum. Consider a mass m hanging from a


ceiling on a thin massless rod of length ℓ, see Fig. 3.5. We can control
the pendulum with a torque. The standard mathematical model in the
absence of damping is

    mℓ²φ̈(t) + gmℓ sin(φ(t)) = u(t),

where φ is the angle between pendulum and the vertical hanging position,
u is the applied torque, m is the mass of the pendulum, ℓ is the length of
the pendulum, and g is the gravitational acceleration.
The objective is to determine a torque u that stabilizes the pendulum to
the vertical hanging equilibrium φ = 2kπ, φ̇ = 0. This, by definition, means
that u is such that

    lim_{t→∞} φ(t) = 2kπ,    lim_{t→∞} φ̇(t) = 0.

We consider the stabilization “optimal” if the input stabilizes and minimizes

    J[0,∞)(x0, u) = ∫₀^∞ φ̇²(t) + u²(t) dt

over all stabilizing inputs.


(a) Prove that if u stabilizes the system, then ∫₀^∞ u(t)φ̇(t) dt only
depends on the initial conditions φ(0), φ̇(0).
[Hint: there is an explicit anti-derivative of the product u φ̇.]
(b) Solve the optimal control problem.
[Hint: work out (φ̇ ± u )2 and use (a).]
(c) Verify that your optimal solution renders the closed-loop asymp-
totically stable. [Hint: you probably need Lyapunov functions and
LaSalle’s invariance principle, see § B.4 (p. 218).]

3.13 Connection between value function and costate. Consider Example 3.4.8.

(a) Determine the costate directly from the minimum principle.


(b) Argue that for x 0 = 0 there are many optimal inputs and many corre-
sponding costates p ∗ (t ).
(c) Determine the costate via Theorem 3.5.1 for the cases that x 0 > 0 and
x 0 < 0.

3.14 Infinite horizon. Consider a system ẋ (t ) = f ( x (t ), u (t )), x (0) = x 0 , and infi-


nite horizon cost
    J[τ,∞)(x(τ), u) = ∫_τ^∞ L(x(t), u(t)) dt.

(a) Argue that the value function V (x, τ) defined in (3.28) does not
depend on τ.
(b) Suppose V (x) is a continuously differentiable function that solves
the HJB equation (3.30). Show that for every input for which
V ( x (∞)) = 0 we have that

J [0,∞) (x 0 , u ) ≥ V (x 0 ).

(c) Consider the integrator ẋ (t ) = u (t ), x (0) = x 0 , and assume that u (t ) is


free to choose (so u (t ) ∈ R), and that the cost is
    J[0,∞)(x0, u) = ∫₀^∞ x²(t) + u²(t) dt.

There are two continuously differentiable solutions V of the HJB


equation (3.30) with the property that V (0) = 0. Determine both.

(d) Continue with the system and cost of (c). Find the input u ∗ : [0, ∞) →
R that minimizes J [0,∞) (x 0 , u ) over all inputs that steer the state to
zero (i.e., such that limt →∞ x (t ) = 0).

3.15 Infinite horizon optimal control. Determine the input u : [0, ∞) → R that
stabilizes the system ẋ(t) = u(t), x(0) = x0 (meaning lim_{t→∞} x(t) = 0) and
that minimizes ∫₀^∞ x⁴(t) + u²(t) dt over all inputs that stabilize the system.

3.16 Infinite horizon optimal control. Consider the nonlinear system with infi-
nite horizon quadratic cost
    ẋ(t) = x(t)u(t),  x(0) = x0,  U = R,    J[0,∞)(x0, u) = ∫₀^∞ x²(t) + ρ²u²(t) dt.

We assume that ρ > 0.

(a) Determine all nonnegative functions V that satisfy (3.30) and such
that V (0) = 0.
(b) Determine the input u : [0, ∞) → R that stabilizes the system (mean-
ing lim_{t→∞} x(t) = 0) and that minimizes ∫₀^∞ x²(t) + ρ²u²(t) dt over all
inputs that stabilize the system.
(c) Express the optimal cost V (x 0 ) in terms of x 0 , ρ, and show that this
equals limT →∞ V (x 0 , 0) where V (x 0 , 0) is the value function of the
finite horizon case as given in Exercise 3.9(c) and Fig. 3.4.

3.17 Infinite horizon optimal control with and without stability.

(a) Determine the input u : [0, ∞) → R that stabilizes the system ẋ (t ) =


x(t) + u(t), x(0) = 1 (meaning lim_{t→∞} x(t) = 0) and that minimizes
∫₀^∞ u⁴(t) dt over all inputs that stabilize the system.
(b) Show that the optimal input differs if we do not insist on sta-
bility. That is, argue that the input (stabilizing or not) that minimizes
∫₀^∞ u⁴(t) dt does not stabilize the given system ẋ(t) = x(t) +
u (t ), x (0) = 1.
3.18 Infinite horizon optimal control with and without stability. Example 3.6.1
shows that the minimal value of the cost over all stabilizing inputs is
3^(−3/4) x₀⁴. Argue that this equals the minimal value of the cost over all
inputs (stabilizing or not). [Hint: have a look at the finite horizon case as
discussed in Example 3.4.6.]

3.19 Free final time. Consider the standard optimal control problem (3.1), but
now we optimize over all u : [0, T ] → U as well as all final times T ≥ 0. The
definition of the value function changes accordingly:

    V(x, τ) = inf_{T ≥ τ, u: [τ,T]→U} J[τ,T](x, u),    τ ≥ 0.

(a) Show that V (x, τ) does not depend on τ. Hence the value function is
of the form V (x).
(b) Assume that the value function V (x) is well defined and that it is C 1 .
Let u ∗ (t ) be an optimal control for a given x 0 , and let x ∗ (t ) be the
resulting optimal state, and assume that L( x ∗ (t ), u ∗ (t )) is continuous
in t . Show that
    dV(x∗(t))/dt = ∂V(x∗(t))/∂xᵀ f(x∗(t), u∗(t)) = −L(x∗(t), u∗(t)).

(c) Let V and u ∗ and x ∗ be as in part (b) of this exercise. Show that

H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) = 0 ∀t ∈ [0, T ],
for p∗(t) := ∂V(x∗(t))/∂x. Which two theorems from Chapter 2 come to
mind?
Chapter 4

Linear Quadratic Control

4.1 Linear Systems with Quadratic Costs

Optimal control theory took shape in the late 1950s, among others stimulated
by the space programs in the Soviet Union and the USA. At about the same time
there was a clear paradigm shift in control theory. Till about 1960 the trans-
fer function was the de-facto standard representation of linear time-invariant
dynamical systems with inputs and outputs. This changed in the sixties when
Kalman (and others) advocated the use of state space representations. These
developments are not unrelated since optimal control theory is based on the
representation of dynamical systems by systems of differential equations, i.e.,
state space models. Furthermore, the introduction of the Kalman-Bucy filter in
the early 1960s, again based on state representations and replacing the Wiener
filter, contributed to the rise of state space descriptions.
Kalman also introduced and solved the optimal control problem for lin-
ear systems with quadratic costs. This has become a standard tool in the con-
trol of linear systems, and it also paved the way for the highly influential state
space H∞ control theory as it emerged in the eighties and nineties (see Chap-
ter 5). In this chapter we study optimal control problems for linear systems with
quadratic costs, close to Kalman’s original problem. Specifically, we consider the
minimization of quadratic costs of the form
    J[0,T](x0, u) := ∫₀ᵀ xᵀ(t)Qx(t) + uᵀ(t)Ru(t) dt + xᵀ(T)Sx(T)    (4.1)

over all inputs u : [0, T ] → Rm and states x : [0, T ] → Rn that are governed by a
linear system with given initial state,

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 . (4.2)

This problem is known as the finite horizon linear quadratic optimal control
problem, or LQ problem for short. Later in this chapter we also consider the


infinite horizon case, which is when T = ∞. No restrictions are imposed on u (t ),


that is, at any time t ∈ [0, T ] the input can take any value in Rm .
The matrix B in (4.2) has n rows and m columns, and A is an n × n matrix.
The weighting matrices Q and S in (4.1) are n×n matrices, and they are assumed
to be positive semi-definite, and we assume that R is an m × m positive definite
matrix:
S ≥ 0, Q ≥ 0, R > 0. (4.3)
This class of linear systems and quadratic costs is broad, yet is simple enough
to allow for explicit solutions. Especially for the infinite horizon case there are
efficient numerical routines that solve the problem completely, and they can
be found in various software packages. The theory is popular because it is a
complete and elegant theory, but also because it provides the pragmatic con-
trol engineer with a systematic tool to determine stabilizing controllers, and it
allows to tune the controller by changing meaningful parameters. As an exam-
ple, suppose we have the scalar system ẋ (t ) = a x (t ) + b u (t ) and that we want to
steer the state x to zero “quickly” by choice of input u , but using only “small”
inputs u . LQ control has the potential to solve such problems. The idea is to
minimize over all inputs the quadratic cost
    ∫₀^∞ x²(t) + ρ²u²(t) dt.    (4.4)
Here ρ is a tuning parameter that we choose. If ρ is large then we put a large
penalty on the input u in the cost function, so the input that minimizes the
cost is probably going to be “small”. Conversely, if ρ is small (close to zero) then
inputs u are “cheap” and then the optimal input is probably going to be “large”
and possibly it is able to steer the state x to zero “quickly”. By tuning ρ we can
now hope to come up with a good compromise between small u and small x .
In this example we have a single parameter to choose, ρ. In the general case we
can choose the matrices Q, R and S; see § 4.6 for a number of examples.
We solve the finite horizon LQ problem in detail, first using Pontryagin’s
minimum principle and then using dynamic programming. Both methods
reveal that the optimal cost is quadratic in the initial state, that is,
    min_u J[0,T](x0, u) = x₀ᵀPx₀    (4.5)
for some matrix P . Furthermore, the optimal input can be implemented as
a linear time-varying state feedback, u ∗ (t ) = −F (t ) x (t ), for some matrix F (t )
depending on time t . Then we tackle the infinite horizon LQ problem. Also in
that case the optimal cost turns out to be quadratic in the initial state, and,
interestingly, the optimal input u ∗ (if it exists) can be implemented as a linear
state feedback
u ∗ (t ) = −F x (t )
for some constant matrix F . Note that the linear state feedback form is not
imposed on the LQ problem. It will be a result.

4.2 Finite Horizon LQ: Minimum Principle

The Hamiltonian (2.12) for system (4.2) and cost (4.1) is H (x, p, u) = p T (Ax +
Bu) + x T Qx + u T Ru. The computations to come clean up considerably if we
express the costate as 2p, thus our new costate p is half of the standard costate.
This way the Hamiltonian becomes

H (x, 2p, u) = 2p T (Ax + Bu) + x T Qx + u T Ru.

Working out the Hamiltonian equations (2.14) for the state x and the halved
costate p , we obtain

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 , (4.6)
    2ṗ(t) = −Aᵀ2p(t) − 2Qx(t),        2p(T) = 2Sx(T).

According to the minimum principle, the optimal input at each t minimizes


the Hamiltonian. Since the Hamiltonian is quadratic in u with positive definite
quadratic term (since R is assumed to be positive definite), it is minimal iff its
gradient with respect to u is zero. This gradient is

∂H (x, 2p, u)
= B T 2p + 2Ru,
∂u
and so the input that minimizes the Hamiltonian is

u ∗ (t ) := −R −1 B T p (t ). (4.7)

Hence if we can additionally compute p (t ) then this settles the optimal control.
Substitution of (4.7) into the Hamiltonian equations (4.6) yields the system of
coupled differential equations

    [ ẋ∗(t) ]   [ A    −BR⁻¹Bᵀ ] [ x∗(t) ]        x∗(0) = x0,
    [ ṗ∗(t) ] = [ −Q   −Aᵀ     ] [ p∗(t) ] ,      p∗(T) = Sx∗(T).    (4.8)

The (2n) × (2n) matrix here is called a Hamiltonian matrix and we denote it by
H,

    H := [ A    −BR⁻¹Bᵀ ]
         [ −Q   −Aᵀ     ] .    (4.9)

Remark 4.2.1. Another way to obtain (4.8), instead of halving the costate as
done above, is to halve the cost criterion (4.1). Clearly, this does not change the
optimal control u ∗ , it only scales the cost with a factor 1/2. This approach leads
to the Hamiltonian pᵀ(Ax + Bu) + ½xᵀQx + ½uᵀRu which is half of the expression
H (x, 2p, u) as considered above, and it also leads to (4.8). 

The coupled differential equations (4.8) form a linear system of differential


equations in x ∗ and p ∗ . If we would have had only an initial or only a final
condition on x ∗ and p ∗ , then we could easily solve (4.8). Here, though, we have
partly an initial condition (on x ∗ ) and partly a final condition (on p ∗ and x ∗ ), so
it is not immediately clear how to solve the above equation. In fact, at this point
it is not even clear that the above differential equation, with its mixed boundary
conditions, has a solution at all, or, if it has, that its solution is unique. Later
on in this section we will see that it indeed has a unique solution, owing to
our assumptions on Q, R and S. This result exploits the following remarkable
connection between state and costate and the optimal cost. This connection
may come as a surprise but can be understood from the dynamic programming
solution presented further on in this chapter.

Lemma 4.2.2 (Optimal cost). For every solution x ∗ , p ∗ of (4.8), the cost (4.1) for
input u ∗ := −R −1 B T p ∗ equals

J [0,T ] (x 0 , u ∗ ) = p ∗T (0)x 0 .

Proof. Consider first the identity (and here we momentarily skip time argu-
ments)

    d/dt (p∗ᵀx∗) = p∗ᵀẋ∗ + ṗ∗ᵀx∗ = p∗ᵀ(Ax∗ − BR⁻¹Bᵀp∗) + (−Qx∗ − Aᵀp∗)ᵀx∗
                 = −p∗ᵀBR⁻¹Bᵀp∗ − x∗ᵀQx∗
                 = −(u∗ᵀRu∗ + x∗ᵀQx∗).

With this we can express the cost (4.1) as

    J[0,T](x0, u∗) = x∗ᵀ(T)Sx∗(T) − ∫₀ᵀ d/dt (p∗ᵀ(t)x∗(t)) dt
                   = x∗ᵀ(T)Sx∗(T) − [p∗ᵀ(t)x∗(t)]₀ᵀ
                   = x∗ᵀ(T)Sx∗(T) − p∗ᵀ(T)x∗(T) + p∗ᵀ(0)x∗(0) = p∗ᵀ(0)x0.

In the final identity we used the final condition p ∗ (T ) = S x ∗ (T ). ■

Example 4.2.3 (First-order system). For the standard integrator system ẋ (t ) =


u (t ) with quadratic cost
    J[0,T](x0, u) = ∫₀ᵀ x²(t) + u²(t) dt

the Hamiltonian matrix (4.9) is

    H = [  0  −1 ]
        [ −1   0 ] .

This matrix is simple enough to allow for an explicit solution of its matrix exponential,

    e^{Ht} = ½ [  e^t + e^{−t}   −e^t + e^{−t} ]
               [ −e^t + e^{−t}    e^t + e^{−t} ] .

The state and costate thus equal

    [ x∗(t) ]     [  e^t + e^{−t}   −e^t + e^{−t} ] [ x0    ]
    [ p∗(t) ] = ½ [ −e^t + e^{−t}    e^t + e^{−t} ] [ p∗(0) ]

for an, as yet unknown, initial costate p ∗ (0). This p ∗ (0) must be chosen such
that x ∗ (T ), p ∗ (T ) match the final condition p ∗ (T ) = S x ∗ (T ) = 0 x ∗ (T ) = 0. It is
not hard to see that this requires

    p∗(0) = (e^T − e^{−T})/(e^T + e^{−T}) x0.
This then fully determines the state and costate for all t ∈ [0, T] as

    [ x∗(t) ]     [  e^t + e^{−t}   −e^t + e^{−t} ] [ 1                             ]
    [ p∗(t) ] = ½ [ −e^t + e^{−t}    e^t + e^{−t} ] [ (e^T − e^{−T})/(e^T + e^{−T}) ] x0 .

The initial costate p ∗ (0) is linear in x 0 , and therefore the entire state and costate
( x ∗ , p ∗ ) is linear in x 0 . Hence the optimal cost is quadratic in x 0 ,

    J[0,T](x0, u∗) = p∗(0)x0 = (e^T − e^{−T})/(e^T + e^{−T}) x₀².
Furthermore, the optimal input is linear in the costate,

u ∗ (t ) = −R −1 B T p ∗ (t ) = − p ∗ (t ),
and since the costate is linear in x 0 , the optimal input is also linear in x 0 . 

In the above example we managed to transform the final condition, p ∗ (T ) =


S x ∗ (T ), into an equivalent initial condition on p ∗ (0), and this demonstrated
that the solution of the Hamiltonian equation exists and is unique. We will
shortly see that this always works. The general procedure is as follows. First con-
sider the (2n) × (2n) matrix exponential of the Hamiltonian matrix H , and split
it into four n × n blocks:

    [ Σ11(t)  Σ12(t) ]
    [ Σ21(t)  Σ22(t) ] := e^{Ht}.    (4.10)

Now the state-costate solution as a function of (known) x0 and (unknown) p∗(0) is

    [ x∗(t) ]   [ Σ11(t)  Σ12(t) ] [ x0    ]
    [ p∗(t) ] = [ Σ21(t)  Σ22(t) ] [ p∗(0) ] .

With this expression the final condition, p ∗ (T ) = S x ∗ (T ), can be written as

    0 = Sx∗(T) − p∗(T)
      = [ S  −I ] [ x∗(T) ]
                  [ p∗(T) ]
      = [ S  −I ] [ Σ11(T)  Σ12(T) ] [ x0    ]
                  [ Σ21(T)  Σ22(T) ] [ p∗(0) ]
      = (SΣ11(T) − Σ21(T)) x0 + (SΣ12(T) − Σ22(T)) p∗(0).    (4.11)

This final condition has a unique solution p ∗ (0) iff SΣ12 (T )−Σ22 (T ) is invertible,
and then

p ∗ (0) = M x0

where M is defined as

    M = −(SΣ12(T) − Σ22(T))⁻¹ (SΣ11(T) − Σ21(T)).    (4.12)

Hence the question is: does the inverse of SΣ12 (T ) − Σ22 (T ) always exist? The
answer is yes:

Theorem 4.2.4 (Existence and uniqueness of solution). Suppose Q ≥ 0, S ≥


0, R > 0. Then the matrix M in (4.12) is well defined. Hence the linear system
with mixed boundary conditions (4.8) has a unique solution x ∗ , p ∗ on the time
interval [0, T], and it is given by

    [ x∗(t) ]   [ Σ11(t)  Σ12(t) ] [ I ]
    [ p∗(t) ] = [ Σ21(t)  Σ22(t) ] [ M ] x0    for all t ∈ [0, T].    (4.13)

Proof. First take x 0 = 0, and realize that (4.8) has at least the trivial solution
x ∗ = p ∗ = 0. Lemma 4.2.2 showed that for every possible solution x ∗ , p ∗ we have
    x∗ᵀ(T)Sx∗(T) + ∫₀ᵀ x∗ᵀ(t)Qx∗(t) + u∗ᵀ(t)Ru∗(t) dt = p∗ᵀ(0)x0,

and here that is zero because we took x 0 = 0. Since all terms on the left-hand
side of the above equation are nonnegative, it must be that all these terms are
zero. In particular u ∗T (t )R u ∗ (t ) = 0. Now R > 0, so necessarily u ∗ (t ) = 0. This, in
turn, implies that ẋ ∗ (t ) = A x ∗ (t ) + B u ∗ (t ) = A x ∗ (t ). Given that x ∗ (0) = x 0 = 0 we
get x ∗ (t ) = 0 for all time and, as a result, ṗ ∗ (t ) = −Q x ∗ (t ) − A T p ∗ (t ) = −A T p ∗ (t )
and p ∗ (T ) = S x ∗ (T ) = 0. This shows that p ∗ (t ) is zero for all time as well.
Conclusion: for x 0 = 0 the solution ( x ∗ , p ∗ ) of (4.8) exists and is unique. This
implies that SΣ12 (T ) − Σ22 (T ) is nonsingular for otherwise there would have
existed multiple p ∗ (0) that satisfy the boundary condition (4.11). Invertibility
of SΣ12 (T ) − Σ22 (T ) in turn shows that the final condition (4.11) has a unique
solution, p ∗ (0) = M x 0 , for every x 0 . ■
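The construction (4.10)–(4.13) is easy to carry out numerically. The following sketch (Python with SciPy; the data are those of Example 4.2.3 and the code is an illustration, not part of the proof) builds H, splits e^{HT} into the blocks Σij, and forms M from (4.12):

    import numpy as np
    from scipy.linalg import expm

    # data of Example 4.2.3: xdot = u, running cost x^2 + u^2, no terminal cost
    A = np.array([[0.0]]); B = np.array([[1.0]])
    Q = np.array([[1.0]]); R = np.array([[1.0]]); S = np.array([[0.0]])
    T = 2.0
    n = A.shape[0]

    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
                  [-Q, -A.T]])                        # Hamiltonian matrix (4.9)
    Sig = expm(H * T)
    S11, S12 = Sig[:n, :n], Sig[:n, n:]
    S21, S22 = Sig[n:, :n], Sig[n:, n:]

    M = -np.linalg.solve(S @ S12 - S22, S @ S11 - S21)   # formula (4.12)
    print(M, np.tanh(T))          # both about 0.964: M equals tanh(T), as in Example 4.2.3

    x0 = np.array([[1.0]])
    print(float((M @ x0).T @ x0)) # optimal cost p*(0)^T x0 = x0^T M x0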

It gets better: the LQ problem satisfies the convexity conditions of Theo-


rem 2.8.1 if S ≥ 0,Q ≥ 0, R > 0. So solvability of the Hamiltonian equations—
which we just proved—is not only necessary for optimality, it is also sufficient.
That is, u ∗ (t ) := −R −1 B T p ∗ (t ) is the optimal control. A more direct proof of opti-
mality is discussed in Exercise 4.3. The assumptions that R is positive definite
and S and Q positive semi-definite are crucial. Without these assumptions, M
might not exist, see Exercise 4.2.
Note that p ∗ (0) according to (4.13) is linear in the initial state, p ∗ (0) = M x 0 .
Hence, as follows from Lemma 4.2.2, the optimal cost is quadratic in the ini-
tial state. There is also an elegant elementary argument why the optimal cost is
quadratic in the state, see Exercise 4.5.

Example 4.2.5 (Integrator, see also Example 3.4.4). Consider again the scalar
integrator system ẋ (t ) = u (t ), x (0) = x 0 and take as cost
    J[0,T](x0, u) = x²(T) + ∫₀ᵀ R u²(t) dt    (4.14)

where R is some positive number. Then

    H = [ 0  −1/R ]
        [ 0   0   ] ,

and, thus,

    e^{Ht} = [ 1  −t/R ]
             [ 0   1   ] .
The final condition on p ∗ (T ) can be transformed into a unique initial condition
on p ∗ (0). Indeed the final condition is met iff

    0 = Sx∗(T) − p∗(T)
      = [ S  −1 ] e^{HT} [ x0    ]
                         [ p∗(0) ]
      = [ 1  −1 ] [ 1  −T/R ] [ x0    ]
                  [ 0   1   ] [ p∗(0) ]
      = x0 − (T/R + 1) p∗(0).

This is the case iff

    p∗(0) = x0 / (T/R + 1).
It is linear in x0 (as predicted), and the above inverse (T/R + 1)⁻¹ exists (as predicted)
because T/R ≥ 0 so T/R + 1 ≠ 0. The optimal cost is quadratic in x0 (predicted as well), in fact,

    J[0,T](x0, u∗) = p∗(0)x0 = x₀² / (T/R + 1).

Special about this example is that the costate is constant, p ∗ (t ) = p ∗ (0). The
optimal control is therefore constant as well,
    u∗(t) = −(1/R) p∗(t) = −p∗(0)/R = −x0/(T + R).

For R ≫ T the optimal control u∗(t) is small, which is to be expected because
for large R the input is penalized strongly in the cost (4.14). If R ≈ 0 then control
is cheap. In this case the control is not necessarily large, u ∗ (t ) ≈ −x 0 /T , but
large enough to steer the final state x ∗ (T ) to something close to zero, x ∗ (T ) =
x 0 (1 − T /(R + T )) = x 0 R/(T + R) ≈ 0. 

Example 4.2.6 (Second-order system with mixed boundary condition). This is


a laborious example. Consider the system with initial condition

    [ ẋ1(t) ]   [ 0  1 ] [ x1(t) ]   [ 0 ]              [ x1(0) ]   [ 1 ]
    [ ẋ2(t) ] = [ 0  0 ] [ x2(t) ] + [ 1 ] u(t),        [ x2(0) ] = [ 0 ] ,

and with cost


    J[0,3](x0, u) = x₁²(3) + ∫₀³ u²(t) dt.

The Hamiltonian equations (4.8) then become (verify this yourself)

    [ ẋ∗1(t) ]   [ 0  1   0   0 ] [ x∗1(t) ]
    [ ẋ∗2(t) ]   [ 0  0   0  −1 ] [ x∗2(t) ]
    [ ṗ∗1(t) ] = [ 0  0   0   0 ] [ p∗1(t) ]    (4.15)
    [ ṗ∗2(t) ]   [ 0  0  −1   0 ] [ p∗2(t) ]

with boundary conditions

    [ x∗1(0) ]   [ 1 ]        [ p∗1(3) ]   [ x∗1(3) ]
    [ x∗2(0) ] = [ 0 ] ,      [ p∗2(3) ] = [ 0      ] .

Now we try to solve (4.15). The differential equation for p ∗1 (t ) simply is ṗ ∗1 (t ) =


0, p ∗1 (3) = x ∗1 (3), and therefore it has a constant solution,

p ∗1 (t ) = x ∗1 (3). (4.16)

The differential equation for p ∗2 (t ) now is

ṗ ∗2 (t ) = − p ∗1 (t ) = − x ∗1 (3), p ∗2 (3) = 0,

so that

p ∗2 (t ) = (3 − t ) x ∗1 (3). (4.17)

With this solution the differential equation for x ∗2 (t ) becomes

ẋ ∗2 (t ) = − p ∗2 (t ) = (t − 3) x ∗1 (3), x ∗2 (0) = 0.

This equation, too, is not difficult to solve,

x ∗2 (t ) = ( 12 t 2 − 3t ) x ∗1 (3). (4.18)

Finally, we have to solve the differential equation for x ∗1 (t ), given by

ẋ ∗1 (t ) = x ∗2 (t ) = ( 12 t 2 − 3t ) x ∗1 (3), x ∗1 (0) = 1.

Its solution is
1 
x ∗1 (t ) = 6t
3
− 32 t 2 x ∗1 (3) + 1. (4.19)

The only unknown left is x∗1(3). From (4.19) it follows that

    x∗1(3) = (9/2 − 27/2) x∗1(3) + 1,

i.e.,

    x∗1(3) = 1/10.    (4.20)
Now we have solved the differential equation (4.15), and the solution is given
by (4.16)–(4.19), with x ∗1 (3) equal to 1/10, see (4.20). Hence, the optimal con-
trol (4.7) is
    u∗(t) = −R⁻¹Bᵀp∗(t) = −Bᵀp∗(t) = −[ 0  1 ] p∗(t) = −p∗2(t) = (1/10)(t − 3),

and the optimal cost is

    [ p∗1(0) ]ᵀ [ x∗1(0) ]   [ 1/10 ]ᵀ [ 1 ]
    [ p∗2(0) ]  [ x∗2(0) ] = [ 3/10 ]  [ 0 ]  =  1/10 .
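A quick numerical cross-check of this laborious computation, reusing the Σ/M recipe of (4.10)–(4.12), might look as follows (a sketch, not from the book):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [0.0, 0.0]]); B = np.array([[0.0], [1.0]])
    Q = np.zeros((2, 2)); R = np.array([[1.0]]); S = np.diag([1.0, 0.0])
    T = 3.0; n = 2; x0 = np.array([[1.0], [0.0]])

    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
    Sig = expm(H * T)
    M = -np.linalg.solve(S @ Sig[:n, n:] - Sig[n:, n:],
                         S @ Sig[:n, :n] - Sig[n:, :n])
    p0 = M @ x0
    print(p0.ravel())        # about [0.1, 0.3], the initial costate found above
    print(float(p0.T @ x0))  # about 0.1, the optimal cost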


4.3 Finite Horizon LQ: Dynamic Programming

The LQ problem can be solved with the aid of dynamic programming as well.
The equation that has to be solved in dynamic programming is a partial differ-
ential equation—the Hamilton-Jacobi-Bellman (HJB) equation—and that is, in
general, not an easy task. For LQ it can be done, however.
So consider again a linear system

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0

with a quadratic cost


    J[0,T](x0, u) = ∫₀ᵀ xᵀ(t)Qx(t) + uᵀ(t)Ru(t) dt + xᵀ(T)Sx(T).

Here, as before, S and Q are symmetric n × n positive semi-definite matrices,


and R is an m × m positive definite matrix: S ≥ 0,Q ≥ 0, R > 0. The HJB equa-
tions (3.12) for this problem are

    ∂V(x, t)/∂t + min_{u∈R^m} { ∂V(x, t)/∂xᵀ (Ax + Bu) + xᵀQx + uᵀRu } = 0,    V(x, T) = xᵀSx.
We determine a solution V (x, t ) of this equation. Because the optimal cost
according to the minimum principle is quadratic, we expect the value function
to be quadratic in x as well. Based on this we restrict our V (x, t ) to functions of
the form

V (x, t ) = x T P (t )x

with P (t ) an n × n symmetric matrix depending on t . Using this quadratic


V(x, t), the above HJB equations become

    xᵀṖ(t)x + min_{u∈R^m} { 2xᵀP(t)(Ax + Bu) + xᵀQx + uᵀRu } = 0,    xᵀP(T)x = xᵀSx.

The minimization over u can, like in the previous section, be solved by setting
the gradient of 2x T P (t )(Ax + Bu) + x T Qx + u T Ru with respect to u equal to zero.
This gives for each x and each t as input

u = −R −1 B T P (t )x

(verify this yourself), and thereby the HJB equations reduce to

    xᵀ( Ṗ(t) + P(t)A + AᵀP(t) + Q − P(t)BR⁻¹BᵀP(t) )x = 0,    xᵀP(T)x = xᵀSx.    (4.21)

Here we used the fact that x T P (t )Ax is the same as x T A T P (t )x. All terms in (4.21)
have a factor x T (on the left) and a factor x (on the right), and the matrix inside
the brackets is symmetric. Hence (4.21) holds for all x ∈ Rn iff the equation in
which x T and x are removed holds. Thus

Ṗ (t ) = −P (t )A − A T P (t ) + P (t )B R −1 B T P (t ) − Q, P (T ) = S . (4.22)

This is a nonlinear, n × n-matrix-valued differential equation, called Riccati dif-


ferential equation (or RDE for short) because of its (loose) connection with cer-
tain quadratic differential equations studied by the Italian nobleman, Count
Jacopo Riccati (1676–1754).
The existence of a solution P (t ) of this RDE is not straightforward, but if
there is a continuous solution on [0, T ] then the candidate optimal control

u ∗ (t ) = −R −1 B T P (t ) x (t )
makes the closed-loop system satisfy

ẋ (t ) = A x (t ) + B u ∗ (t ) = (A − B R −1 B T P (t )) x (t ), x (0) = x0 .

This is a linear differential equation, so if P (t ) is continuous on [0, T ] then it


has a unique solution x ∗ (t ) for every x 0 and all t ∈ [0, T ]. Theorem 3.4.3 in that
case guarantees that this u ∗ is the optimal input, and that V (x, t ) = x T P (t )x is
the value function. In particular V (x 0 , 0) = x 0T P (0)x 0 is the optimal cost. Thus we
proved:

Proposition 4.3.1 (Solution of the finite horizon LQ problem). Let Q, R, S be


symmetric, and suppose that R > 0. If the RDE (4.22) has a continuously differ-
entiable solution P : [0, T ] → Rn×n , then the LQ problem (4.1)–(4.2) has a solu-
tion for every x 0 ∈ Rn . In particular

u ∗ (t ) := −R −1 B T P (t ) x (t ) (4.23)

is the optimal input, and the optimal cost is

J [0,T ] (x 0 , u ∗ ) = x 0T P (0)x 0 ,

and V (x, t ) := x T P (t )x is its value function. 

Proof. It is an immediate consequence of Theorem 3.4.3 of the previous chap-


ter. But it also has a direct proof. This proof resembles that of Lemma 4.2.2, and
for the case that R = I it goes as follows (for ease of exposition we omit here the
time arguments):

    d/dt (xᵀPx) = ẋᵀPx + xᵀṖx + xᵀPẋ
                = (Ax + Bu)ᵀPx + xᵀ(−PA − AᵀP + PBBᵀP − Q)x + xᵀP(Ax + Bu)
                = uᵀBᵀPx + xᵀPBu + xᵀPBBᵀPx − xᵀQx
                = (u + BᵀPx)ᵀ(u + BᵀPx) − xᵀQx − uᵀu.

(Verify the final identity yourself.) From this it follows that the cost can also be
expressed as (where, again, we omit the time arguments),
    J[0,T](x0, u) = xᵀ(T)Sx(T) + ∫₀ᵀ xᵀQx + uᵀu dt
                  = xᵀ(T)Sx(T) + ∫₀ᵀ −d/dt(xᵀPx) + (u + BᵀPx)ᵀ(u + BᵀPx) dt
                  = xᵀ(T)Sx(T) − [xᵀ(t)P(t)x(t)]₀ᵀ + ∫₀ᵀ (u + BᵀPx)ᵀ(u + BᵀPx) dt
                  = x₀ᵀP(0)x0 + ∫₀ᵀ (u + BᵀPx)ᵀ(u + BᵀPx) dt.

Clearly, the final integral is nonnegative for every u , and it is minimal if we take
u (t ) = −B T P (t ) x (t ), and thus the optimal cost is x0T P (0)x0 . For R = I the proof is
similar, see Exercise 4.10. ■

Notice that Proposition 4.3.1 assumes symmetry of S and Q, but not that
they are positive semi-definite. Also notice that the optimal control (4.23) is of
a special form: first we have to determine the solution P (t ) of the RDE, but this
can be done irrespective of x 0 . Once P (t ) is determined the optimal control can
be implemented as a linear time-varying state feedback (4.23). Thus the “gain”
matrix F (t ) := R −1 B T P (t ) in the optimal feedback can be computed “off-line”,
i.e., based on the knowledge of the system matrices A, B and the cost criterion
matrices Q, R, S only.

Example 4.3.2 (Example 4.2.5 continued). Consider again the integrator system
ẋ (t ) = u (t ), x (0) = x0 of Example 4.2.5 with
    J[0,T](x0, u) = x²(T) + ∫₀ᵀ R u²(t) dt

for some R > 0. Here S = 1 and Q = 0, and thus the RDE (4.22) becomes

Ṗ (t ) = P 2 (t )/R, P (T ) = 1.

The solution of this RDE can be found with separation of variables,

    P(t) = R/(R + T − t) = 1/(1 + (T − t)/R).
Since t ∈ [0, T ] and R > 0 we see that R +T − t > 0 throughout, and so P (t ) is well
defined on [0, T ]. Hence P (t )x 2 is the value function with optimal cost

    J[0,T](x0, u∗) = P(0)x₀² = x₀²/(1 + T/R),
and the optimal input is

P (t ) x (t ) x (t )
u ∗ (t ) = −R −1 B T P (t ) x (t ) = − =− . (4.24)
R R +T −t
In this example the optimal control u ∗ is given in state feedback form, while
in Example 4.2.5 (where we handled the same LQ problem) the control input is
given as a function of time. The feedback form is often preferred in applications,
but for this particular problem the feedback form (4.24) blurs the fact that the
optimal state and optimal control are just linear functions of time, see Exam-
ple 4.2.5. 
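The RDE (4.22) can also be integrated numerically, backwards in time from P(T) = S. A minimal sketch for Example 4.3.2 (Python with SciPy, not from the book), compared against the closed-form solution:

    import numpy as np
    from scipy.integrate import solve_ivp

    R, T = 0.5, 2.0

    def rde(t, P):                       # scalar RDE of Example 4.3.2: Pdot = P^2/R, P(T) = 1
        return [P[0]**2 / R]

    sol = solve_ivp(rde, [T, 0.0], [1.0], dense_output=True, rtol=1e-9, atol=1e-12)
    ts = np.linspace(0.0, T, 5)
    print(sol.sol(ts)[0])                # numerical  P(t)
    print(R / (R + T - ts))              # analytic   P(t) = R/(R + T - t); the values agree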

4.4 Riccati Differential Equations

Proposition 4.3.1 assumes the existence of a solution P (t ) of the RDE, but does
not require S and Q to be positive semi-definite. If S ≥ 0,Q ≥ 0 (and R > 0) then
existence of P (t ) can in fact be guaranteed. So for standard LQ problems we
have a complete solution:

Theorem 4.4.1 (Existence of solution of RDE’s). If S = S T ≥ 0,Q = Q T ≥ 0 and


R = R T > 0, then the RDE (4.22) has a unique continuously differentiable solu-
tion P (t ) on [0, T ], and P (t ) is symmetric and positive semi-definite at every
t ∈ [0, T ].
Consequently, u ∗ (t ) := −R −1 B T P (t ) x (t ) is a solution of the LQ problem,
V (x, t ) := x T P (t )x is its value function, and the optimal cost is J [0,T ] (x 0 , u ∗ ) =
x 0T P (0)x 0 .

Proof. The RDE (4.22) is equivalent to a system of n 2 differential equations in


the entries p i j (t ), i , j = 1, . . . , n of P (t ). The right-hand side of this equation con-
sists of polynomials in p i j (t ), and hence it is continuously differentiable and
therefore is locally Lipschitz. We conclude from Theorem B.1.3 (p. 207) that the
solution P (t ) exists and is unique on an interval (t esc , T ] for some t esc < T . It is
easy to see that also P T (t ) is a solution, so, being unique, we have that P (t ) is
symmetric on (t esc , T ].
Now suppose that it has an escape time t esc ∈ [0, T ). Theorem B.1.4 (p. 207)
then says that as t ↓ t esc , the norm of the vector with entries p i j (t ), i , j = 1, . . . , n,
diverges to infinity. This implies that at least one entry, say p i j (t ), is unbounded
as t ↓ t esc . We now show that this leads to a contradiction. Let e i denote the i -th
basis vector of Rn . Because P (t ) is symmetric, it follows that

(e i + e j )T P (t )(e i + e j ) − (e i − e j )T P (t )(e i − e j ) = 4p i j (t ).

Since p i j (t ) is unbounded, either (e i + e j )T P (t )(e i + e j ) or (e i − e j )T P (t )(e i − e j )


is unbounded (or both are unbounded). Now, choose the initial state z equal to
e i + e j or e i − e j , whichever results in an unbounded z T P (t )z as t ↓ t esc . From
the preceding discussion we know that z T P (t )z is the value function V (z, t ) for
t ∈ (t esc , T ] but the value function for sure is bounded from above by the cost
that we make with the zero input:

    ∀t ∈ (t esc , T] :   zᵀP(t)z = min_u J[t,T](z, u) ≤ J[t,T](z, 0) ≤ J[t esc ,T](z, 0) < ∞.

Our zᵀP(t)z can therefore not escape to +∞ as t ↓ t esc . Furthermore, it can also
not escape to −∞ because zᵀP(t)z = min_u J[t,T](z, u) ≥ 0. This yields a contra-


diction, so there is no t esc ∈ [0, T ). Lemma B.1.4 (p. 207) then guarantees that
the differential equation has a unique solution P (t ) on the entire time interval
[0, T ].
Now that existence of P (t ) is proved, Proposition 4.3.1 tells us that the given
u ∗ is optimal, and that x T P (t )x is the value function and x0T P (0)x0 is the optimal
cost. Finally, we showed at the beginning of the proof that P (t ) is symmetric. It
is also positive semi-definite because the cost is nonnegative for every x. ■

In this proof non-negativity of Q, S and positivity of R is crucially used.


These assumptions are standard in LQ, but it is interesting to see what happens
if one of these assumptions is violated. Then the solution P (t ) might escape in
finite time. The following example demonstrates this for negative Q.

Example 4.4.2 (Negative Q, finite escape time). Consider the integrator system
ẋ (t ) = u (t ) with initial state x (0) = x0 and cost
    J[0,T](x0, u) = ∫₀ᵀ −x²(t) + u²(t) dt.

This is a non-standard LQ problem because Q = −1 < 0. The RDE (4.22) simpli-


fies to

Ṗ (t ) = P 2 (t ) + 1, P (T ) = 0.

Using separation of variables one can show that its solution is

    P(t) = tan(t − T),    t ∈ (T − π/2, T].

This solution P (t ) escapes at

    t esc = T − π/2,

see Fig. 4.1. If T < π/2 then there is no escape time in [0, T] and, hence,
P (t ) := tan(t − T ) is then well defined on the entire horizon [0, T ], and, con-
sequently,

V (x, t ) = x 2 tan(t − T )

is the value function, and

u ∗ (t ) = −R −1 B T P (t ) x (t ) = − tan(t − T ) x (t )

is the optimal state feedback.


However, if T ≥ π/2 then the escape time t esc is in [0, T], see Fig. 4.1 (right). In
this case the optimal cost is unbounded from below. That is, it can be made as
close to −∞ as we desire. To see this, take

    u_ε(t) = { 0                   if t ≤ t esc + ε
             { −tan(t − T) x(t)    if t > t esc + ε

for some small ε > 0. For this input the state x(t) is constant over [0, t esc + ε], and
continues optimally over [t esc + ε, T]. The cost for this input thus is

    J[0,T](x0, u_ε) = ∫₀^{t esc + ε} −x²(t) + u²(t) dt + V(x0, t esc + ε)
                    = −(t esc + ε)x₀² + tan(−π/2 + ε)x₀².

It diverges to −∞ as ε ↓ 0. 

FIGURE 4.1: Graph of tan(t − T) for t ∈ [0, T]. Left: if 0 < T < π/2. In that case
tan(t − T) is defined for all t ∈ [0, T]. Right: if T ≥ π/2. Then tan(t − T) is not
defined at T − π/2 ∈ [0, T]. See Example 4.4.2.
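Numerically, the finite escape time of Example 4.4.2 shows up as blow-up of the backward integration of the RDE. A small illustration (Python with SciPy, not from the book):

    import numpy as np
    from scipy.integrate import solve_ivp

    T = 2.0                               # here T > pi/2, so t_esc = T - pi/2 lies in [0, T]
    t_end = T - np.pi / 2 + 0.02          # stop just before the escape time
    sol = solve_ivp(lambda t, P: [P[0]**2 + 1.0], [T, t_end], [0.0],
                    rtol=1e-10, atol=1e-12)
    print(sol.y[0, -1], np.tan(t_end - T))   # both about -50: P(t) = tan(t - T) is blowing up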

Connection between Hamiltonians and RDE’s

In Theorem 3.5.1 we established a connection between value functions and
standard costates: p∗(t) = ∂V(x∗(t), t)/∂x. Given that the standard costate is twice the
costate as used in the LQ problem—see the beginning of § 4.2—this connection
for the LQ problem becomes

    2p∗(t) = ∂V(x∗(t), t)/∂x.

For the LQ problem with quadratic value functions V(x, t) = xᵀP(t)x we have
∂V(x∗(t), t)/∂x = 2P(t)x∗(t). Therefore the connection is

p ∗ (t ) = P (t ) x ∗ (t ). (4.25)

Incidentally, this re-proves Lemma 4.2.2 because p ∗T (0)x 0 = x 0T P (0)x 0 = V (x 0 , 0).


Equation (4.25) expresses the costate p ∗ (t ) in terms of the solution P (t ) of the
RDE, but it can also be used to determine P (t ) using the states and costates.
This goes as follows. In Theorem 4.2.4 we saw that the optimal x ∗ and p ∗ follow
uniquely from x0 as

    [ x∗(t) ]   [ Σ11(t)  Σ12(t) ] [ I ]
    [ p∗(t) ] = [ Σ21(t)  Σ22(t) ] [ M ] x0    (4.26)

for M := −(SΣ12 (T ) − Σ22 (T ))−1 (SΣ11 (T ) − Σ21 (T )). Consider the mapping x 0 →
x ∗ (t ) given by the upper part of (4.26), i.e., x ∗ (t ) = (Σ11 (t ) + Σ12 (t )M )x0 . If this
mapping is invertible at every t ∈ [0, T ] then x 0 follows uniquely from x ∗ (t ) as
x 0 = (Σ11 (t ) + Σ12 (t )M )−1 x ∗ (t ) and, consequently, p ∗ (t ) also follows uniquely

from x ∗ (t ):

p ∗ (t ) = (Σ21 (t ) + Σ22 (t )M )x0


= (Σ21 (t ) + Σ22 (t )M )(Σ11 (t ) + Σ12 (t )M )−1 x ∗ (t ).

Comparing this with (4.25) suggests the following explicit formula for P (t ).

Lemma 4.4.3 (Solution of RDE’s using the Hamiltonian). Let S,Q be positive
semi-definite n ×n matrices, and R = R T > 0. Then the solution P (t ), t ∈ [0, T ], of
the RDE

Ṗ (t ) = −P (t )A − A T P (t ) + P (t )B R −1 B T P (t ) − Q, P (T ) = S,

is

    P(t) = (Σ21(t) + Σ22(t)M)(Σ11(t) + Σ12(t)M)⁻¹.    (4.27)

Here M := −(SΣ12 (T )−Σ22 (T ))−1 (SΣ11 (T )−Σ21 (T )), and Σi j are n ×n sub-blocks
of the matrix exponential eH t as defined in (4.10).

Proof. Recall that the solution P (t ) of the RDE exists. If Σ11 (t ) + Σ12 (t )M would
have been singular at some t = t̄ , then any nonzero x 0 in the null space of
Σ11 (t̄ ) + Σ12 (t̄ )M renders x ∗ (t̄ ) = 0 while p ∗ (t̄ ) is nonzero (because Σ(t ) := eH t
is invertible). This contradicts the fact that p ∗ (t ) = P (t ) x ∗ (t ). Hence Σ11 (t ) +
Σ12 (t )M is invertible for every t ∈ [0, T ] and, consequently, the mapping from
x ∗ (t ) to p ∗ (t ) follows uniquely from (4.26) and it equals (4.27). ■
Example 4.4.4. In Example 4.2.3 we tackled the minimization of ∫₀ᵀ x²(t) + u²(t) dt
for ẋ(t) = u(t) using Hamiltonians, and we found that

    [ Σ11(t)  Σ12(t) ]     [  e^t + e^{−t}   −e^t + e^{−t} ]
    [ Σ21(t)  Σ22(t) ] = ½ [ −e^t + e^{−t}    e^t + e^{−t} ] ,    M = (e^T − e^{−T})/(e^T + e^{−T}).

The RDE for this problem is

    Ṗ(t) = P²(t) − 1,    P(T) = 0.

According to (4.27) the solution of this RDE is

    P(t) = (Σ21(t) + Σ22(t)M) / (Σ11(t) + Σ12(t)M)
         = (−e^t + e^{−t} + (e^t + e^{−t})M) / (e^t + e^{−t} + (−e^t + e^{−t})M)
         = ((−e^t + e^{−t})(e^T + e^{−T}) + (e^t + e^{−t})(e^T − e^{−T})) / ((e^t + e^{−t})(e^T + e^{−T}) + (−e^t + e^{−t})(e^T − e^{−T}))
         = (e^{T−t} − e^{−(T−t)}) / (e^{T−t} + e^{−(T−t)}) = tanh(T − t).

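Formula (4.27) also lends itself to a direct numerical check. The sketch below (Python with SciPy, not from the book) compares, for an arbitrarily chosen system with Q ≥ 0, R > 0, S ≥ 0, the P(t) obtained from (4.27) with the backward solution of the RDE (4.22):

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng(0)
    n, T = 2, 1.5
    A = rng.normal(size=(n, n)); B = rng.normal(size=(n, 1))
    Q = np.eye(n); R = np.array([[1.0]]); S = np.eye(n)

    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
    Sig_T = expm(H * T)
    M = -np.linalg.solve(S @ Sig_T[:n, n:] - Sig_T[n:, n:],
                         S @ Sig_T[:n, :n] - Sig_T[n:, :n])

    def P_hamiltonian(t):                # formula (4.27)
        Sig = expm(H * t)
        W1 = Sig[:n, :n] + Sig[:n, n:] @ M
        W2 = Sig[n:, :n] + Sig[n:, n:] @ M
        return W2 @ np.linalg.inv(W1)

    def rde(t, p):                       # RDE (4.22), integrated backwards from P(T) = S
        P = p.reshape(n, n)
        dP = -P @ A - A.T @ P + P @ B @ np.linalg.inv(R) @ B.T @ P - Q
        return dP.ravel()

    sol = solve_ivp(rde, [T, 0.0], S.ravel(), dense_output=True, rtol=1e-10, atol=1e-12)
    t_check = 0.7
    print(np.allclose(sol.sol(t_check).reshape(n, n), P_hamiltonian(t_check), atol=1e-6))  # True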

4.5 Infinite Horizon LQ and Algebraic Riccati Equations

Now we turn to the infinite horizon LQ problem. This is the problem of mini-
mizing
    J[0,∞)(x0, u) := ∫₀^∞ xᵀ(t)Qx(t) + uᵀ(t)Ru(t) dt    (4.28)

over all u : [0, ∞) → Rm under the dynamical constraint

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 .

As before, we assume that R is positive definite and that Q is positive semi-


definite. The terminal cost x T (∞)S x (∞) is absent. (For the problems that we
have in mind the state converges to zero so the terminal cost would not con-
tribute anyway.)
We first approach the infinite horizon LQ problem as the limit as T → ∞ of
the finite horizon LQ problem over the time window [0, T ]. To make the depen-
dence on T explicit we add a subscript T to the solution of the RDE (4.22),
that is,

Ṗ T (t ) = −P T (t )A − A T P T (t ) + P T (t )B R −1 B T P T (t ) − Q, P T (T ) = 0. (4.29)

Example 4.5.1. Consider again the integrator system

ẋ (t ) = u (t ), x (0) = x0 ,

but still with a finite horizon cost


    J[0,T](x0, u) = ∫₀ᵀ x²(t) + u²(t) dt.

The associated RDE (4.29) is

Ṗ T (t ) = P T2 (t ) − 1, P T (T ) = 0.

Its solution was derived in Example 4.4.4,

    P_T(t) = tanh(T − t) = (e^{T−t} − e^{−(T−t)}) / (e^{T−t} + e^{−(T−t)}).

Clearly, as T goes to infinity, the solution P T (t ) converges to P := 1 and, in par-


ticular, it no longer depends on t . It is now tempting to conclude that the con-
stant state feedback u ∗ (t ) := −R −1 B T P x (t ) = − x (t ) is the optimal solution of the
infinite horizon LQ problem. It is, as we shall soon see. 

The example suggests that P T (t ) converges to a constant P as the horizon


T goes to ∞. It also suggests that limT →∞ Ṗ T (t ) = 0, which in turn suggests that
the Riccati differential equation in the limit reduces to an algebraic equation,
known as the algebraic Riccati equation (or ARE for short) of LQ:

0 = A T P + P A − P B R −1 B T P + Q . (4.30)
The following theorem shows that all this is indeed the case. It requires just one
extra condition (apart from the standard conditions Q ≥ 0, R > 0): for each x 0
there needs to exist at least one input that renders the cost J [0,∞) (x 0 , u ) finite.

Theorem 4.5.2 (Solution of ARE via limit of solution of RDE). Consider ẋ (t ) =


A x (t ) + B u (t ), x (0) = x 0 , and suppose Q ≥ 0, R > 0, and that for every x 0 an input
exists that renders the cost (4.28) finite. Then the solution P T (t ) of (4.29) con-
verges to a matrix independent of t as the final time T goes to infinity. That is,
a constant matrix P exists such that
lim P T (t ) = P ∀t > 0.
T →∞

This P is symmetric, positive semi-definite, and it satisfies the ARE (4.30).

Proof. For every fixed x 0 the expression x 0T P T (t )x 0 is nondecreasing with T


because the longer the horizon the higher the cost. Indeed, for every ε > 0 and
initial x(t) = z we have

    zᵀP_{T+ε}(t)z = ∫_t^{T+ε} x∗ᵀ(t)Qx∗(t) + u∗ᵀ(t)Ru∗(t) dt
                  ≥ ∫_t^{T} x∗ᵀ(t)Qx∗(t) + u∗ᵀ(t)Ru∗(t) dt ≥ zᵀP_T(t)z.
Besides being nondecreasing, it is, for any given z, also bounded from above
because by assumption for at least one input u z the infinite horizon cost is
finite, so that
z T P T (t )z ≤ J [t ,T ] (z, u z ) ≤ J [t ,∞) (z, u z ) < ∞.
Bounded and nondecreasing implies that z T P T (t )z converges as T → ∞. Next
we prove that in fact the entire matrix P T (t ) converges as T → ∞. Let e i be the
i -th unit vector in Rn , so e i = (0, . . . , 0, 1, 0, . . . , 0)T , with a 1 on the i-th position.
The preceding discussion shows that for each z = e i , the limit
    p_ii := lim_{T→∞} e_iᵀ P_T(t) e_i

exists. The diagonal entries of P T (t ) hence converge. For the off-diagonal entries
we use that
    lim_{T→∞} (e_i + e_j)ᵀP_T(t)(e_i + e_j) = lim_{T→∞} e_iᵀP_T(t)e_i + e_jᵀP_T(t)e_j + 2e_iᵀP_T(t)e_j
                                            = p_ii + p_jj + lim_{T→∞} 2e_iᵀP_T(t)e_j.

The limit on the left-hand side exists, so the limit p i j := limT →∞ e iT P T (t )e j exists
as well. Therefore all entries of P T (t ) converge as T → ∞. The limit is indepen-
dent of t because P T (t ) = P T −t (0).
Clearly, P ≥ 0 because it is the limit of P T (t ) ≥ 0.
Since P T (t ) converges to a constant matrix, also Ṗ T (t ) = −P T (t )A− A T P T (t )+
P T (t )B R −1 B T P T (t ) − Q converges to a constant matrix as T → ∞. This constant
matrix must be zero because ∫_t^{t+1} Ṗ_T(τ) dτ = P_T(t + 1) − P_T(t) → 0 as T → ∞. ■

LQ with stability

The classic infinite-horizon LQ problem does not consider asymptotic stability


of the closed-loop system. For instance, if we choose as cost ∫₀^∞ u²(t) dt then
optimal is to take u ∗ (t ) = 0, even if it would render the closed-loop system
unstable, such as when ẋ (t ) = x (t ) + u (t ). In applications closed-loop asymp-
totic stability is crucial. Classically, closed-loop asymptotic stability is incorpo-
rated in LQ by imposing conditions on Q. For example, if Q = I then the cost
contains a term ∫₀^∞ xᵀ(t)x(t) dt, and then the optimal control turns out to nec-
essarily stabilize the system. An alternative approach is to include asymptotic
stability in the problem definition. This we explore now.

Definition 4.5.3 (Infinite-horizon LQ problem with stability). Suppose Q ≥


0, R > 0, and consider the linear system with given initial state ẋ (t ) = A x (t ) +
B u (t ), x (0) = x 0 . The LQ problem with stability is to minimize
    J[0,∞)(x0, u) := ∫₀^∞ xᵀ(t)Qx(t) + uᵀ(t)Ru(t) dt    (4.31)

over all stabilizing inputs u , meaning inputs that achieve limt →∞ x (t ) = 0. 

The next example shows that in some cases the LQ problem with stability
has an easy solution.

Example 4.5.4 (LQ with stability). Consider the problem of Example 4.5.1:
    ẋ(t) = u(t),  x(0) = x0,    J[0,∞)(x0, u) = ∫₀^∞ x²(t) + u²(t) dt.

The running cost, x² + u², can also be written as

    x² + u² = (x + u)² − 2xu = (x + u)² − 2xẋ.
Interestingly, the term −2 x ẋ has an explicit antiderivative, namely − x 2 , so

    x² + u² = d/dt (−x²) + (x + u)².

Integrating this over t ∈ [0, ∞) we see that the cost for stabilizing inputs equals
    J[0,∞)(x0, u) = x₀² + ∫₀^∞ (x(t) + u(t))² dt.    (4.32)

Here we used that limt →∞ x (t ) = 0, since u is assumed to stabilize the system. It


is immediate from (4.32) that the cost for every stabilizing input is at least x 02 ,
and it equals the minimal value x 02 iff

u = −x.

Since the state feedback u ∗ := − x indeed stabilizes (because the closed-loop sys-
tem becomes ẋ = − x ) we conclude that this state feedback is the optimal con-
trol, and that the optimal (minimal) cost is

J [0,∞) (x 0 , u ∗ ) = x 02 .

In Example 4.5.1 we conjectured that u ∗ := − x is optimal. Now we know it is


optimal, or at least optimal with respect to all stabilizing inputs. 

In this example, and also in the general finite horizon LQ problem, we have
that the optimal cost is quadratic in the initial state, and that the optimal input
can be implemented as a state feedback. Inspired by this we expect that every
infinite horizon LQ problem has these properties. That is, we conjecture that
the optimal cost is of the form

x 0T P x 0

for some matrix P , and that the optimal input equals u ∗ (t ) := −F x (t ) for some
matrix F . This leads to the following central result.

Theorem 4.5.5 (Solution of the LQ problem with stability). There is at most


one matrix P ∈ Rn×n that satisfies the algebraic Riccati equation (ARE)

A T P + P A − P B R −1 B T P + Q = 0 (4.33)

with the property that

A − B R −1 B T P is asymptotically stable. (4.34)

Such a P is called a stabilizing solution of the ARE. In that case P is symmetric,


and the linear state feedback

u ∗ (t ) := −R −1 B T P x (t )

is the solution of the LQ problem with stability, and the optimal cost is x 0T P x 0 .

Proof. If P satisfies the ARE then it can be verified that P − P T satisfies

(A − B R −1 B T P )T (P − P T ) + (P − P T )(A − B R −1 B T P ) = −Q + Q T = 0.

Using this identity, Corollary B.5.3 (p. 227) shows that for asymptotically sta-
ble A − B R −1 B T P we necessarily have P − P T = 0, i.e., stabilizing solutions P of

the ARE are symmetric. To show that there is at most one stabilizing solution
we proceed as follows. Suppose P 1 , P 2 are two stabilizing solutions of the ARE
(hence P 1 and P 2 are symmetric), and let x 1 , x 2 be solutions of the correspond-
ing ẋ 1 = (A − B R −1 B T P 1 ) x 1 and ẋ 2 = (A − B R −1 B T P 2 ) x 2 . Then
    d/dt (x₁ᵀ(P₁ − P₂)x₂)
      = x₁ᵀ( (A − BR⁻¹BᵀP₁)ᵀ(P₁ − P₂) + (P₁ − P₂)(A − BR⁻¹BᵀP₂) ) x₂
      = x₁ᵀ( AᵀP₁ − AᵀP₂ − P₁BR⁻¹BᵀP₁ + P₁BR⁻¹BᵀP₂
              + P₁A − P₂A − P₁BR⁻¹BᵀP₂ + P₂BR⁻¹BᵀP₂ ) x₂
      = x₁ᵀ(−Q + Q)x₂ = 0.

Hence x₁ᵀ(t)(P₁ − P₂)x₂(t) is constant as a function of time. By asymptotic stability
we have that lim_{t→∞} x₁(t) = lim_{t→∞} x₂(t) = 0. Therefore x₁ᵀ(t)(P₁ − P₂)x₂(t) is,
in fact, zero for all time. Since this holds for every initial condition x₁(0), x₂(0),
we conclude that P₁ = P₂.
In the rest of the proof we assume that P is the symmetric stabilizing solu-
tion of the ARE. We expect the optimal u to be a linear state feedback u = −F x
for some F , so with that in mind define v := F x + u . (If our hunch is correct then
optimal means v = 0.) Next we write x T Q x + u T R u and v T R v and dt d
( x T P x ) as
quadratic expressions in ( x , u ):
  
Q 0 x
x TQ x + u T R u = x T u T ,
0 R u
 T  
F RF F T R x
v R v = ( x F + u )R(F x + u ) = x u
T T T T T T
,
RF R u
dt ( x P x ) = ẋ T P x + x T P ẋ = ( x T A T + u T B T )P x + x T P (A x + B u )
d T

  
ATP + P A PB x
= xT uT .
B TP 0 u
Therefore
  
T A T P + P A + Q −F T RF P B −F T R x
x Q x+u R u−v
T T T
R v + dt
d
(x T P x) = x u .
B T P − RF 0 u
Since P is symmetric and satisfies (4.33), the choice F := R −1 B T P makes the
above matrix on the right-hand side equal to zero. So then
x T Q x + u T R u = − dt
d
(x T P x) + v T R v,
and, hence, the cost (4.31) equals
∞
J [0,∞) (x 0 , u ) = x 0 P x 0 +
T
v (t )T R v (t ) dt ,
0
whenever the input stabilizes the system. Given x 0 the above cost is mini-
mal for v = 0, provided it stabilizes. It does: since v := u + F x = u + R −1 B T P x
we have v = 0 iff u = −F x = −R −1 B T P x , and so the closed-loop system is
ẋ = (A − B R −1 B T P ) x , which, by assumption on P , is asymptotically stable. ■

The theorem does not say that the ARE has a stabilizing solution. It only says
that if a stabilizing solution P exists, then it is unique and symmetric, and the
LQ problem with stability is solved, with u ∗ (t ) := −R −1 B T P x (t ) being the opti-
mal control. It is not yet clear under what conditions there exists a stabilizing
solution P of the ARE (4.33). This will be addressed by considering the solution
P T (t ) of the finite horizon problem, and showing how under stabilizability and
detectability assumptions1 limT →∞ P T (t ) exists and defines such a solution:

Theorem 4.5.6 (Three ways to solve the LQ problem with stability). Consider
the LQ problem with stability as formulated in Definition 4.5.3, and consider
the associated ARE (4.33). (In particular assume Q ≥ 0, R > 0.)
If (A, B ) is stabilizable and (Q, A) detectable, then there is a unique
stabilizing solution P of the ARE, and this P is symmetric. Consequently,
u ∗ (t ) := −R −1 B T P x (t ) solves the infinite horizon LQ problem with stability,
and x 0T P x 0 is the minimal cost. Moreover this unique P can be determined in
the following three equivalent ways:

1. P equals limT →∞ P T (t ) where P T (t ) is the solution of RDE (4.29),

2. P is the unique symmetric, positive semi-definite solution of the ARE,

3. P is the unique stabilizing solution of the ARE.

Proof. This proof assumes knowledge of detectability and stabilizability as


explained in Appendix A.6. First we prove equivalence of the three ways of com-
puting P , and later we comment on the uniqueness.
(1 =⇒ 2). Since (A, B ) is stabilizable, there is a state feedback u = −F x
that steers the state to zero exponentially fast for every x 0 , and, so, renders
the cost finite. Therefore the conditions of Theorem 4.5.2 are met. That is,
P := limT →∞ P T (t ) exists and it satisfies the ARE, and it is positive semi-definite.
(2 =⇒ 3). Assume P is a positive semi-definite solution of the ARE, and let
x be an eigenvector of A − B R −1 B T P with eigenvalue λ. We show that Re(λ) < 0.
The trick is to rewrite the ARE as

(A − B R −1 B T P )T P + P (A − B R −1 B T P ) + Q + P B R −1 B T P = 0.

Next, postmultiply this equation with the eigenvector x, and premultiply with
its complex conjugate transpose x*:

    x*( (A − BR⁻¹BᵀP)ᵀP + P(A − BR⁻¹BᵀP) + Q + PBR⁻¹BᵀP )x = 0.

Since x is an eigenvector of A − B R −1 B T P the above simplifies to a sum of three


terms, the last two of which are nonnegative,

(λ∗ + λ)(x ∗ P x) + x ∗Qx + x ∗ P B R −1 B T P x = 0.

1 Stabilizability and detectability are discussed in Appendix A.6.



If Re(λ) ≥ 0 then (λ∗ +λ)x ∗ P x ≥ 0, implying that all the above three terms are in
fact zero: (λ∗ + λ)x ∗ P x = 0, Qx = 0, and B T P x = 0 (and, consequently, Ax = λx).
This contradicts detectability. So it cannot be that Re(λ) ≥ 0. It must be that
A − B R −1 B T P is asymptotically stable.
(Uniqueness & 3 =⇒ 1). Theorem 4.5.5 shows that there is at most one sta-
bilizing solution P of the ARE. Now P := limT →∞ P T (t ) is one solution that stabi-
lizes (because 1. =⇒ 2. =⇒ 3.). Hence the stabilizing solution of the ARE exists
and is unique, and it equals the matrices P from Item 1 and 2, which, hence, are
unique as well.
Theorem 4.5.5 then guarantees that u ∗ = −R −1 B T P x solves the LQ problem
with stability, and that x 0T P x 0 is the optimal cost. ■

Theorem 4.5.6 shows that we have several ways to determine the solution
P that solves the LQ problem with stability, namely (a) limT →∞ P T (t ), (b) the
unique symmetric positive semi-definite solution of the ARE, and (c) the unique
stabilizing solution of the ARE.

Example 4.5.7 (LQ problem with stability of the integrator system solved in
three ways). Consider again the integrator system

ẋ (t ) = u (t ), x (0) = x0
and cost
    J[0,∞)(x0, u) = ∫₀^∞ x²(t) + u²(t) dt.

This system is stabilizable, and (Q, A) = (1, 0) is detectable. We determine the LQ


solution P in the three different ways as explained in Theorem 4.5.6:

1. In Example 4.5.1 we handled the finite horizon case of this problem, and
we found that P := limT →∞ P T (t ) = 1.

2. We could have gone as well for the unique symmetric, positive semi-
definite solution of the ARE. The ARE in this case is

−P 2 + 1 = 0,

and, clearly, the only (symmetric) positive semi-definite solution is P = 1.

3. The ARE has two solutions, P = ±1, and Theorem 4.5.6 guarantees that
precisely one of them is stabilizing. The solution P is stabilizing if A −
B R −1 B T P = −P is less than zero. Clearly this, again, gives P = 1.


While for low-order systems the 2nd option (that P is positive semi-definite)
is often the easiest way to determine P , general numerical recipes usually
exploit the 3rd option. This is explained in the final part of this section where
we examine the connection with Hamiltonian matrices.
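In practice one rarely computes P by hand; standard software returns the stabilizing solution of the ARE directly. A minimal sketch (Python with SciPy, not from the book) for the system of Example 4.5.7:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0]]); B = np.array([[1.0]])
    Q = np.array([[1.0]]); R = np.array([[1.0]])

    P = solve_continuous_are(A, B, Q, R)     # stabilizing solution of the ARE (4.33)
    F = np.linalg.solve(R, B.T @ P)          # optimal feedback gain, u = -F x
    print(P, F)                              # both equal [[1.]]
    print(A - B @ F)                         # closed loop [[-1.]]: asymptotically stable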

Connection between Hamiltonians and ARE’s

For the finite horizon LQ problem we established in Lemma 4.4.3 a tight con-
nection between solutions P (t ) of RDE’s and Hamiltonian matrices H . For the
infinite horizon case a similar connection exists. This we explore now.
A matrix P satisfies the ARE

P A + A T P − P B R −1 B T P + Q = 0

iff

−Q − A T P = P (A − B R −1 B T P ),

that is, iff


    
   [ A   −B R^{-1} B^T ; −Q   −A^T ] [ I ; P ] = [ I ; P ] (A − B R^{-1} B^T P),      (4.35)

where the 2n × 2n matrix on the left is the Hamiltonian matrix H.

This is interesting because in the case that all matrices here are numbers (and
the Hamiltonian matrix H hence a 2 × 2 matrix) then it says that [ I ; P ] is an eigen-
vector of the Hamiltonian matrix, and that A − B R −1 B T P is its eigenvalue. This
connection between P and eigenvectors/eigenvalues of the Hamiltonian matrix
H is the key to most numerical routines for computation of P . This central
result is formulated in the following theorem. The subsequent examples show
how the result can be used to find P concretely.

Theorem 4.5.8 (Computation of P ). Define H ∈ R(2n)×(2n) as in (4.35), and


assume that Q ≥ 0, R > 0. If (A, B ) is stabilizable and (Q, A) detectable, then

1. H has no imaginary eigenvalues, and it has n asymptotically stable eigen-


values and n unstable eigenvalues. Also, λ is an eigenvalue of H iff so is
−λ,

2. matrices V ∈ R(2n)×n of rank n exist that satisfy H V = V Λ for some


asymptotically stable Λ ∈ Rn×n ,
3. for any such V ∈ R^{(2n)×n}, if we partition V as V = [ V_1 ; V_2 ] with V_1, V_2 ∈ R^{n×n},
then V_1 is invertible,

4. the ARE (4.33) has a unique stabilizing solution P . In fact

P :=V2V1−1 ,

is the unique answer, and it is symmetric.



Proof. This proof is involved. We assume familiarity with detectability and sta-
bilizability as explained in Appendix A.6. The proof again exploits the remark-
able property that solutions of the associated Hamiltonian system (now with
initial conditions, possibly complex-valued),
        
   [ ẋ(t) ; ṗ(t) ] = [ A   −B R^{-1} B^T ; −Q   −A^T ] [ x(t) ; p(t) ],      [ x(0) ; p(0) ] = [ x_0 ; p_0 ] ∈ C^{2n}      (4.36)

satisfy

   (d/dt)( p^* x ) = −( x^* Q x + p^* B R^{-1} B^T p ),      (4.37)

(see the proof of Lemma 4.2.2). Note that we consider the system of differential
equations over C2n , instead of over R2n , and here p ∗ means the complex con-
jugate transpose of p . The reason is that eigenvalues and eigenvectors may be
complex-valued. Integrating (4.37) over t ∈ [0, ∞) tells us that
   ∫_0^∞ x^*(t) Q x(t) + p^*(t) B R^{-1} B^T p(t) dt = p_0^* x_0 − lim_{t→∞} p^*(t) x(t),      (4.38)

provided the limit exists. In what follows we denote by [ x(t) ; p(t) ] the solution
of (4.36).
1. Suppose [ x_0 ; p_0 ] is an eigenvector of H with imaginary eigenvalue λ. Then
[ x(t) ; p(t) ] = e^{λt} [ x_0 ; p_0 ]. Now p^*(t) x(t) is constant, hence both sides of (4.37) are
zero for all time. So both x^*(t) Q x(t) and B^T p(t) are zero for all time.
Inserting this into (4.36) shows that λ x_0 = A x_0 and λ p_0 = −A^T p_0. Thus
[ A − λI ; Q ] x_0 = 0 and p_0^* [ A + λI   B ] = 0. Stabilizability and detectability imply
that then x_0 = 0, p_0 = 0, but [ x_0 ; p_0 ] is an eigenvector, so nonzero. Contradic-
tion, hence H has no imaginary eigenvalues.
Exercise 4.19 shows that r (λ) := det(λI − H ) equals r (−λ). So H has as
many (asymptotically) stable eigenvalues as unstable eigenvalues.

2. Since H has no imaginary eigenvalues and has n asymptotically stable


eigenvalues, linear algebra tells us that a (2n)×n matrix V exists of rank n
such that H V = V Λ with Λ asymptotically stable. (If all n asymptotically
stable eigenvalues are distinct then we can simply take V = v 1 · · · v n
where v 1 , . . . , v n are eigenvectors corresponding to the asymptotically sta-
ble eigenvalues λ1 , . . . , λn of H , and then Λ is the diagonal matrix with
these eigenvalues on the diagonal. If some eigenvalues coincide then one
might need a Jordan normal form and use generalized eigenvectors.)

3. Suppose, to obtain a contradiction, that V has rank n but that V_1 is
singular. Then the subspace spanned by [ V_1 ; V_2 ] contains an [ x_0 ; p_0 ] with
x_0 = 0, p_0 ≠ 0. The solution [ x(t) ; p(t) ] for this initial condition converges to

zero2 . Hence the integral in (4.38) equals p 0∗ x 0 = 0. That can only be if


Q x (t ) and B T p (t ) are zero for all time. Equation (4.36) then implies that
ṗ (t ) = −A T p (t ), p (0) = p 0 . We claim that this contradicts stabilizability.
Indeed, since B T p (t ) = 0 for all time, we have

   ṗ(t) = −(A^T − L B^T) p(t),      p(0) = p_0 ≠ 0      (4.39)

for every L. By stabilizability there is an L such that A − B L T is asymp-


totically stable. Then all eigenvalues of −(A T − LB T ) are anti-stable, and
thus the solution p (t ) of (4.39) diverges. But we know that limt →∞ p (t ) = 0.
Contradiction, so the assumption that V1 is singular is wrong.

4. Let P = V_2 V_1^{-1}. Since H V = V Λ we have that H [ I ; P ] = [ I ; P ] V_1 Λ V_1^{-1}. Also
V_1 Λ V_1^{-1} is asymptotically stable because it has the same eigenvalues as Λ
(assumed asymptotically stable). Hence

   [ A   −B R^{-1} B^T ; −Q   −A^T ] [ I ; P ] = [ I ; P ] Λ̂      (4.40)

for some asymptotically stable Λ̂ ∈ R^{n×n}. Premultiplying (4.40) from the
left with [ −P   I ] shows that

   [ −P   I ] [ A   −B R^{-1} B^T ; −Q   −A^T ] [ I ; P ] = 0.

This equation is nothing else than the ARE (verify this for yourself ). And P
is a stabilizing solution because A −B R −1 B T P = Λ̂ is asymptotically stable.
Uniqueness and symmetry of P we showed earlier (Theorem 4.5.5).

Realize that any V ∈ R(2n)×n of rank n for which H V = V Λ does the job if Λ
is asymptotically stable. That is, even though there are many such V , we always
have that V1 is invertible and that P follows uniquely as P = V2V1−1 . As already
mentioned in the above proof, in case H has n distinct asymptotically stable
eigenvalues λ1 , . . . , λn , with eigenvectors v 1 , . . . , v n , then we can take

   V = [ v_1   v_2   · · ·   v_n ]

for then Λ is the diagonal matrix

   Λ = diag( λ_1, λ_2, . . . , λ_n ),

and this matrix clearly is asymptotically stable.


² If [ x_0 ; p_0 ] = V z_0 for some z_0, then [ x(t) ; p(t) ] = V z(t) where z(t) is the solution of ż(t) = Λ z(t), z(0) = z_0. If Λ is asymptotically stable then z(t) → 0 as t → ∞.
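The eigenvector construction of Theorem 4.5.8 is easy to mimic numerically. The Python sketch below is only an illustration (it assumes NumPy and presumes the n stable eigenvalues of H are diagonalizable, whereas production solvers use an ordered Schur decomposition instead): it builds H, collects a basis of its stable eigenspace, and returns P = V_2 V_1^{-1}. For the data of Example 4.5.9 below (A = 0, B = Q = R = 1) it returns P = 1.

    import numpy as np

    def care_via_hamiltonian(A, B, Q, R):
        """Stabilizing ARE solution P = V2 V1^{-1} from the stable eigenspace of H."""
        n = A.shape[0]
        S = B @ np.linalg.solve(R, B.T)           # B R^{-1} B^T
        H = np.block([[A, -S], [-Q, -A.T]])       # Hamiltonian matrix as in (4.35)
        eigvals, eigvecs = np.linalg.eig(H)
        V = eigvecs[:, eigvals.real < 0]          # n asymptotically stable eigenvectors
        V1, V2 = V[:n, :], V[n:, :]
        P = V2 @ np.linalg.inv(V1)                # P = V2 V1^{-1}
        return np.real(P + P.conj().T) / 2        # symmetrize away rounding errors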

Example 4.5.9 (n = 1). Consider once more the integrator system ẋ(t) = u(t)
and cost ∫_0^∞ x²(t) + u²(t) dt. That is, A = 0, B = Q = R = 1. The Hamiltonian
matrix for this case is

   H = [ 0   −1 ; −1   0 ].

Its characteristic polynomial is λ² − 1, and the eigenvalues are λ_{1,2} = ±1. Its
asymptotically stable eigenvalue is λ_as = −1, and it is easy to verify that v is an
eigenvector corresponding to this asymptotically stable eigenvalue iff

   v := [ v_1 ; v_2 ] = [ 1 ; 1 ] c,      c ≠ 0.

According to Theorem 4.5.8 the stabilizing solution P of the ARE is

   P = v_2 v_1^{-1} = v_2 / v_1 = c / c = 1.

This agrees with what we found in Example 4.5.7. As predicted, P does not
depend on the choice of eigenvector (the choice of c). Also, the (eigen)value of
A − B R −1 B T P = −1 as predicted equals the asymptotically stable eigenvalue of
the Hamiltonian matrix, λas = −1. The optimal control is u ∗ = −R −1 B T P x = − x .


Example 4.5.10 (n = 2). Consider the stabilizable system


   
   ẋ(t) = [ 0   1 ; 0   0 ] x(t) + [ 0 ; 1 ] u(t),

with standard cost


   ∫_0^∞ x_1²(t) + x_2²(t) + u²(t) dt.

The associated Hamiltonian matrix is (verify this yourself )


        [  0    1    0    0 ]
   H =  [  0    0    0   −1 ] .
        [ −1    0    0    0 ]
        [  0   −1   −1    0 ]

Its characteristic polynomial is λ4 − λ2 + 1, and the four eigenvalues turn out to


be

   λ_{1,2} = −(1/2)√3 ± (1/2) i,      λ_{3,4} = +(1/2)√3 ± (1/2) i.

The first two eigenvalues, λ1,2 , are asymptotically stable so we need eigenvec-
tors corresponding to these two. Not very enlightening manipulation shows that
we can take
   v_{1,2} = [ −λ_{1,2} ; −λ_{1,2}² ; 1 ; λ_{1,2}³ ].

Now V ∈ C4×2 defined as


   V = [ v_1   v_2 ] = [ −λ_1   −λ_2 ; −λ_1²   −λ_2² ; 1   1 ; λ_1³   λ_2³ ]

is the V we need. (Note that this matrix is complex; this is not a problem.) With
V known, it is easy to compute the stabilizing solution of the ARE,
   P = V_2 V_1^{-1} = [ 1   1 ; λ_1³   λ_2³ ] [ −λ_1   −λ_2 ; −λ_1²   −λ_2² ]^{−1} = [ √3   1 ; 1   √3 ].

The optimal input is u_* = −R^{-1} B^T P x = −p_{21} x_1 − p_{22} x_2 = −x_1 − √3 x_2. The LQ-
optimal closed-loop system is described by

   ẋ_*(t) = (A − B R^{-1} B^T P) x_*(t) = [ 0   1 ; −1   −√3 ] x_*(t),

and its eigenvalues are λ_{1,2} = −(1/2)√3 ± (1/2) i (which, as predicted, are the asymptot-
ically stable eigenvalues of H ). 

In the above example the characteristic polynomial λ4 − λ2 + 1 is of degree


4, but by letting μ = λ2 it reduces to the polynomial μ2 − μ + 1 of degree 2. This
works for every Hamiltonian matrix, see Exercise 4.19.
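A numerical cross-check of Example 4.5.10 (a sketch assuming NumPy/SciPy; solve_continuous_are produces the stabilizing solution directly, so the complex eigenvectors need not be formed explicitly):

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)                             # penalizes x1^2 + x2^2
    R = np.array([[1.0]])                     # penalizes u^2

    P = solve_continuous_are(A, B, Q, R)
    print(P)                                  # approx [[1.732, 1], [1, 1.732]] = [[sqrt(3), 1], [1, sqrt(3)]]
    F = np.linalg.inv(R) @ B.T @ P            # u* = -F x = -x1 - sqrt(3) x2
    print(np.linalg.eigvals(A - B @ F))       # approx -0.866 +/- 0.5i = -(1/2)sqrt(3) +/- (1/2)i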

Example 4.5.11. In Example 4.5.10 we found the solution


 
   P = [ √3   1 ; 1   √3 ]

via the eigenvectors of the Hamiltonian, which, by construction, gives us the


stabilizing solution of the ARE. This solution is positive semi-definite according

to Theorem 4.5.6. Let us verify. Clearly P is symmetric, and since p_{11} = √3 > 0
and det(P) = 2 > 0 it indeed is positive semi-definite (in fact, positive definite).


4.6 Controller Design with LQ Optimal Control

In five examples we explore the use of infinite horizon LQ theory for the design
of controllers. The first two examples discuss the effect of tuning parameters on
the control and cost. The final three examples are about control of cars.

Example 4.6.1 (Tuning the controller). Consider the system with output,

ẋ (t ) = u (t ), x (0) = x0 = 1,
y (t ) = 2 x (t ),
and suppose the task is to steer the output y to zero “quickly” but without using
“excessive” inputs u . A way to resolve this problem is by considering the LQ
problem with stability with cost
   ∫_0^∞ y²(t) + ρ² u²(t) dt = ∫_0^∞ 4 x²(t) + ρ² u²(t) dt.

This cost includes a tuning parameter

ρ > 0,

which we will choose so as to achieve an acceptable compromise between


“small” y and “small” u . For large values of ρ we put a strong penalty on u
in the cost function, hence we expect the optimal u ∗ to be small in that case.
Conversely, for ρ close to zero the input is “cheap”, and the optimal input in
that case is probably going to be “large” and is able to steer to output y to zero
quickly.
We have A = 0, B = 1, R = ρ 2 ,Q = 4, thus the ARE (4.33) and optimal input for
this problem are
   4 − (1/ρ²) P² = 0,      u_* = −(1/ρ²) P x.
Clearly this means P = ±2ρ. Since A − B R −1 B T P = ∓2/ρ needs to be negative,
we find that
   P = +2ρ,      u_* = −(2/ρ) x,

and the optimal closed-loop system is ẋ_*(t) = −(2/ρ) x_*(t). In particular we have


   y_*(t) = 2 e^{−2t/ρ},      u_*(t) = −(2/ρ) e^{−2t/ρ}.
Let us consider several different values and ranges of the tuning parameter ρ:
• If ρ = 1 then the input u is “as cheap” or “as expensive” as y ; they
are equally weighted. The closed-loop eigenvalue in that case is A −
B R −1 B T P = −2, and the optimal u ∗ and y ∗ have the same magnitude:
| u ∗ (t )| = | y ∗ (t )| = 2 e−2t . (See the red graphs of Fig. 4.2.)

• If 0 < ρ ≪ 1 then the control input u is “cheap”. The closed-loop system is
fast now (the closed-loop eigenvalue is −2/ρ ≪ −2 < 0), and both u_*, y_*
converge to zero quickly, but u_* initially is relatively large (in magnitude):
|u_*(0)| = 2/ρ = |y_*(0)|/ρ ≫ |y_*(0)|. That is to be expected since control is
cheap. (See the yellow graphs of Fig. 4.2.)

• Conversely, if ρ ≫ 1 then the input u is “expensive”. The closed-loop


system is now slow (the closed-loop eigenvalue is −2/ρ ≈ 0), and both
u ∗ , y ∗ converge to zero slowly, although u initially is already small: u ∗ (0) =
−2/ρ ≈ 0. That is to be expected since control is expensive. (See the blue
graphs of Fig. 4.2.)

It is not hard to see that the optimal solutions satisfy


   ∫_0^∞ y_*²(t) dt = ρ      and      ∫_0^∞ u_*²(t) dt = 1/ρ.

Hence ∫_0^∞ y_*²(t) dt = 1 / ∫_0^∞ u_*²(t) dt. This relation establishes once more that
small inputs result in large outputs, and that large inputs result in small out-
puts. 
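These closed-form expressions are easy to verify numerically. A small Python sketch (assuming NumPy; the time grid and the horizon of 40 time units are arbitrary choices, long enough for the exponentials to have died out):

    import numpy as np

    rho = 2.0
    t = np.linspace(0.0, 40.0, 400001)
    dt = t[1] - t[0]
    y = 2.0 * np.exp(-2.0 * t / rho)             # y*(t) = 2 e^{-2t/rho}
    u = -(2.0 / rho) * np.exp(-2.0 * t / rho)    # u*(t) = -(2/rho) e^{-2t/rho}

    print(np.sum(y**2) * dt)                     # ~ rho   (= 2.0)
    print(np.sum(u**2) * dt)                     # ~ 1/rho (= 0.5)
    print(np.sum(y**2 + rho**2 * u**2) * dt)     # ~ 2 rho = P x0^2, the optimal cost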

FIGURE 4.2: Graphs of optimal y_* and u_* for ρ = 1/2 (yellow), for ρ = 1 (red), and ρ = 2 (blue). The larger ρ is the slower the system is and the smaller |u_*(0)| is. See Example 4.6.1.

Example 4.6.2 (Two tuning parameters). Consider the third-order system


   ẋ(t) = [ 0   1   0 ; 0   0   1 ; 0   −1   −0.1 ] x(t) + [ 0 ; 0 ; 1 ] u(t),      x(0) = x_0 = [ 1 ; 0 ; 0 ],
   y(t) = [ 1   1   0 ] x(t).

We want to steer the output y to zero quickly but not too steeply, so ẏ should
be small as well, and all that using small u . This requires a cost function with
two tuning parameters,
   ∫_0^∞ σ² y²(t) + (1 − σ²) ẏ²(t) + ρ² u²(t) dt.
The parameter σ ∈ [0, 1] defines the trade-off between small y and small ẏ ,
and the parameter ρ > 0 defines the trade-off between small ( y , ẏ ) and small u .
Given σ and ρ the LQ solution can be determined numerically using the
eigenvalues and eigenvectors of the corresponding Hamiltonian matrix, but we
skip the details. In what follows we take as initial state x 0 = (1, 0, 0). Figure 4.3
shows the response of the optimal u ∗ and resulting y ∗ for various combinations
of σ and ρ. For σ = 1 the term ẏ is not included in the cost, so we can expect
“steep” behavior in the output. For σ ≈ 0 the output converges slowly to zero. As
for ρ, we see that smaller ρ means larger controls u ∗ and faster convergence to
zero of the output y ∗ .
Assuming we can live with inputs u of at most 2 (in magnitude) then ρ 2 = 0.2
is a reasonable choice (the red graphs in Fig. 4.3). Given that, a value of σ2 =
0.75 may be a good compromise between overshoot and settling time in the
response y ∗ . For this ρ 2 = 0.2, σ2 = 0.75, the optimal control turns out to be
u ∗ = −R −1 B T P x = −(1.9365 x 1 + 3.0656 x 2 + 2.6187 x 3 ),
and the eigenvalues of A − B R −1 B T P are −0.7468 and −0.9859 ± 1.2732i. 
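The numbers quoted in this example can be reproduced with a few lines of Python (a sketch assuming NumPy/SciPy; the matrices Q and R below encode the cost σ²y² + (1 − σ²)ẏ² + ρ²u², using y = x_1 + x_2 and ẏ = x_2 + x_3):

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, -1.0, -0.1]])
    B = np.array([[0.0], [0.0], [1.0]])

    sigma2, rho2 = 0.75, 0.2
    Cy  = np.array([[1.0, 1.0, 0.0]])            # y     = x1 + x2
    Cdy = np.array([[0.0, 1.0, 1.0]])            # dy/dt = x2 + x3
    Q = sigma2 * Cy.T @ Cy + (1 - sigma2) * Cdy.T @ Cdy
    R = np.array([[rho2]])

    P = solve_continuous_are(A, B, Q, R)
    F = np.linalg.inv(R) @ B.T @ P
    print(F)                                     # approx [1.9365, 3.0656, 2.6187]
    print(np.linalg.eigvals(A - B @ F))          # approx -0.7468 and -0.9859 +/- 1.2732i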

Example 4.6.3 (Control of a car connected to a wall via a spring). Consider


a car of mass m connected to a wall via a spring with spring constant k, see
Fig. 4.4. The position of the car is denoted by y. Assume the car is controlled by
an external force u. Newton’s second law says that
m ÿ (t ) + k y (t ) = u (t ).
To keep matters simple we take k = 1 and m = 1, so
ÿ (t ) + y (t ) = u (t ).
For zero input the car continues to oscillate around y = 0. The task of the con-
troller is to bring the car quickly to a stand still at position y = 0 but without
using excessive force u . We propose to take as cost
   ∫_0^∞ y²(t) + (1/3) u²(t) dt.
A state representation of the system is
   
   ẋ(t) = [ 0   1 ; −1   0 ] x(t) + [ 0 ; 1 ] u(t),
   y(t) = [ 1   0 ] x(t).

FIGURE 4.3: LQ-optimal responses of u_* (left) and y_* (right) for various combinations of σ ∈ [0, 1] and ρ > 0. See Example 4.6.2.

FIGURE 4.4: Car connected to a wall via a spring and with a force control u. See Example 4.6.3.

FIGURE 4.5: Top: a car connected to a wall via a spring (on the left). The car is controlled with an LQ-optimal force u_*(t) = −k_LQ y(t) − c_LQ ẏ(t) implemented as spring/damper system (on the right). Bottom: responses u_*(t) and y(t) = x_1(t) for initial state x(0) = (1, 0). See Example 4.6.3.

Here, the first state component is x_1 = y and the second is x_2 = ẏ. This way the
cost becomes ∫_0^∞ x_1²(t) + (1/3) u²(t) dt, and the stabilizing solution of the ARE turns
out to be³

   P = (1/3) [ 2√2   1 ; 1   √2 ],

and the eigenvalues of A − B R^{-1} B^T P are −(1/2)√2 ± √(3/2) i, while

   u_*(t) = −3 B^T P x(t) = −[ 1   √2 ] x(t) = −y(t) − √2 ẏ(t).

An analog implementation of this control law is a spring with spring constant
k_LQ = 1 parallel to a damper with friction coefficient c_LQ = √2, see Fig. 4.5 (top).
For x_0 = (1, 0) the LQ-optimal input and output converge to zero quickly, although
there is some overshoot, see Fig. 4.5 (bottom). 
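A numerical check of this example (a sketch assuming NumPy/SciPy):

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.array([[1.0, 0.0], [0.0, 0.0]])       # penalizes y^2 = x1^2
    R = np.array([[1.0 / 3.0]])

    P = solve_continuous_are(A, B, Q, R)
    print(3 * P)                                 # approx [[2.828, 1], [1, 1.414]] = [[2 sqrt(2), 1], [1, sqrt(2)]]
    F = np.linalg.inv(R) @ B.T @ P
    print(F)                                     # approx [[1, 1.414]]  ->  u* = -y - sqrt(2) dy/dt
    print(np.linalg.eigvals(A - B @ F))          # approx -0.707 +/- 1.225i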

FIGURE 4.6: A car at position y subject to a friction force −c ẏ and external force u. See Example 4.6.4.

Example 4.6.4 (Control of a car subject to friction). In the previous example we


considered a car connected to a wall via a spring. Now we consider a car that is
subject to damping (e.g., linear friction). As in the previous example, m denotes
the mass of the car, and y is its position. The input u is an external force, and
−c ẏ models the friction force, see Fig. 4.6. The model is

m ÿ (t ) + c ẏ (t ) = u (t ), c > 0. (4.41)

We take the mass equal to m = 1, and we leave the friction coefficient c arbitrary
(but positive). As state we take x :=( y , ẏ ). Then (4.41) becomes
   
0 1 0
ẋ (t ) = A x (t ) + B u (t ) with A = , B= .
0 −c 1
The aim is again to bring the mass to rest but without using excessive control
effort. A possible solution is to minimize the cost
∞
J [0,∞) (x 0 , u ) = y 2 (t ) + ρ 2 u 2 (t ) dt .
0

³ This is the reason we took R = 1/3. Other values yield more complicated expressions for P.

Again the parameter ρ > 0 defines a trade-off between small y and small u . The
matrices Q and R for this cost are
 
   Q = [ 1   0 ; 0   0 ],      R = ρ².

(It can be verified that (A, B ) is stabilizable and (Q, A) is detectable.) The ARE
becomes
         
   P [ 0   1 ; 0   −c ] + [ 0   0 ; 1   −c ] P − P [ 0   0 ; 0   ρ^{−2} ] P + [ 1   0 ; 0   0 ] = [ 0   0 ; 0   0 ].

This matrix equation is effectively a set of three scalar equations in three


unknowns. Indeed, the matrix P is symmetric so is characterized by three num-
bers, P = [ p_11   p_12 ; p_12   p_22 ], and then the above left-hand side is symmetric so it equals
zero iff its (1, 1)-element, (1, 2)-element and (2, 2)-element are zero. This gives

   0 = 1 − ρ^{−2} p_12²,
   0 = p_11 − c p_12 − ρ^{−2} p_12 p_22,
   0 = 2 p_12 − 2 c p_22 − ρ^{−2} p_22².

From the first equation we find that p_12 = ±ρ. If p_12 = +ρ then the third equa-
tion gives two possible p_22 = ρ²( −c ± √(c² + 2/ρ) ). One is positive, the other is
negative. We need the positive solution because positive semi-definiteness of P
requires p 22 ≥ 0. Now that p 12 and p 22 are known, the second equation settles
p 11 . This turns out to give

   P = ρ [ √(c² + 2/ρ)   1 ; 1   ρ( −c + √(c² + 2/ρ) ) ].      (4.42)

(Similarly, for p 12 = −ρ the resulting P turns out not to be positive semi-


definite.) Conclusion: the P of (4.42) is the unique positive semi-definite solu-
tion P of the ARE. Hence it is the solution we seek. The optimal control is

u ∗ (t ) = −ρ −2 B T P x (t )
  
= − ρ1 c− c 2 + 2/ρ x (t )
  
= − ρ y (t ) + c − c 2 + 2/ρ ẏ (t ).
1

This optimal control is a linear combination of the displacement y (t ) and veloc-


ity ẏ (t ); similarly to the solution found in Example 4.6.3. These two terms can
be interpreted as a spring and friction force in parallel, connected to a wall, see
Fig. 4.7. 
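The elimination carried out by hand above can also be left to a computer algebra system. A SymPy sketch (only illustrative, assuming SymPy is installed; it solves the three scalar equations symbolically and prints all candidate matrices P, of which exactly one is positive semi-definite and coincides with (4.42)):

    import sympy as sp

    c, rho = sp.symbols('c rho', positive=True)
    p11, p12, p22 = sp.symbols('p11 p12 p22', real=True)

    A = sp.Matrix([[0, 1], [0, -c]])
    B = sp.Matrix([[0], [1]])
    Q = sp.Matrix([[1, 0], [0, 0]])
    P = sp.Matrix([[p11, p12], [p12, p22]])

    # Left-hand side of the ARE  P A + A^T P - P B R^{-1} B^T P + Q  with R = rho^2.
    ARE = P * A + A.T * P - P * B * B.T * P / rho**2 + Q
    sols = sp.solve([ARE[0, 0], ARE[0, 1], ARE[1, 1]], [p11, p12, p22], dict=True)
    for s in sols:
        print(sp.simplify(sp.Matrix([[s[p11], s[p12]], [s[p12], s[p22]]])))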

FIGURE 4.7: A car at position y subject to a friction force −c ẏ. It is optimally controlled with a spring with spring constant 1/ρ and a damper with friction coefficient √(c² + 2/ρ) − c. See Example 4.6.4.

FIGURE 4.8: Two connected cars. The purpose is to control the second car with a force u that acts on the first car. See Example 4.6.5.

Example 4.6.5 (Connected cars). In this example we consider an application of


two connected cars. The state dimension in this case is four which is too high to
easily determine the solution of the Riccati equation by hand. The solution will
be determined numerically.
The two cars are connected to each other with springs and dampers, and
with the car on the left connected to a wall, see Fig. 4.8. The two spring con-
stants are denoted k 1 and k 2 , and the two friction coefficients c 1 and c 2 . The
horizontal positions of the two cars relative to the equilibrium positions are
denoted q 1 and q 2 respectively, and the two masses are m 1 and m 2 . We can
control the first car with an additional force u, but we want to control the posi-
tion q 2 of the second car. This application represents a common situation (for
instance in a robotics context) where the control action is physically separated
from the part that needs to be controlled.
The standard model for this system is
          
   [ m_1   0 ; 0   m_2 ] [ q̈_1(t) ; q̈_2(t) ] + [ c_1+c_2   −c_2 ; −c_2   c_2 ] [ q̇_1(t) ; q̇_2(t) ] + [ k_1+k_2   −k_2 ; −k_2   k_2 ] [ q_1(t) ; q_2(t) ] = [ u(t) ; 0 ].

For simplicity we take all masses and spring constants equal to one, m_1 = m_2 =
1, k_1 = k_2 = 1, and we take the friction coefficients small and equal: c_1 = c_2 =
0.1. Then the linear model in terms of the state x defined as x = ( q_1, q_2, q̇_1, q̇_2 )

FIGURE 4.9: Top: positions of the uncontrolled cars. Middle: positions of the controlled cars. Bottom: control force u_* for the controlled car. For all cases the initial state is q_1(0) = 0, q_2(0) = 1, q̇_1(0) = 0, q̇_2(0) = 0. See Example 4.6.5.

becomes
           [  0    0    1     0   ]          [ 0 ]
   ẋ(t) =  [  0    0    0     1   ] x(t)  +  [ 0 ] u(t).
           [ −2    1   −0.2   0.1 ]          [ 1 ]
           [  1   −1    0.1  −0.1 ]          [ 0 ]

As the friction coefficients are small one may expect sizeable oscillations when
no control is applied. Indeed, the above A matrix has two eigenvalues close to
the imaginary axis (at −0.0119±i0.6177 and −0.1309±i1.6127), and for the initial
state x 0 = (0, 1, 0, 0) and u = 0 the positions q 1 , q 2 of the two cars oscillate for a
long time, see Fig. 4.9(top).
To control the second car with the force u we propose the solution of the
infinite horizon LQ problem with cost
   ∫_0^∞ q_2²(t) + R u²(t) dt.

The value of R was set, somewhat arbitrarily, to R = 0.2. Since A is asymptoti-


cally stable we know that (A, B ) is stabilizable and (Q, A) is detectable whatever
B and Q we have. Therefore the conditions of Theorem 4.5.6 are met, and so we
are guaranteed that the stabilizing solution P of the Riccati equation exists and
is unique. The solution, obtained numerically, is
        [ 0.4126   0.2286   0.2126   0.5381 ]
   P =  [ 0.2286   0.9375   0.0773   0.5624 ] ,
        [ 0.2126   0.0773   0.2830   0.4430 ]
        [ 0.5381   0.5624   0.4430   1.1607 ]

and the optimal state feedback control u ∗ (t ) = −R −1 B T P x (t ) follows as

   u_*(t) = −[ 1.0628   0.3867   1.4151   2.2150 ] x(t).


Under this control the response for the initial state x 0 = (0, 1, 0, 0) is damped
much stronger than without control, see Fig. 4.9(middle). The eigenvalues of
the closed loop system ẋ ∗ (t ) = (A − B R −1 B T P ) x ∗ (t ) are −0.5925 ± 0.6847i and
−0.2651 ± 1.7081i, and these are considerably further away from the imaginary
axis than the eigenvalues of A, and the imaginary parts are almost the same.
This confirms the stronger damping in the controlled system.
All this is achieved with a control force u ∗ (t ) that never exceeds 0.4 in mag-
nitude for this initial state, see Fig. 4.9 (bottom). Notice that the optimal control
u ∗ (t ) starts out negative but turns positive way before q 2 (t ) becomes zero for
the first time. So apparently it is optimal to initially speed up the first car away
from the second car, but only for a very short period of time, and then for the
next couple of seconds to move the first car towards the second car.
For the initial state x_0 = (0, 1, 0, 0) the optimal cost x_0^T P x_0 follows from the
(2, 2)-element of P: ∫_0^∞ q_{2*}²(t) + R u_*²(t) dt = 0.9375. 
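The matrix P and the feedback gain of this example were obtained with a numerical ARE solver. A minimal Python sketch that reproduces them (assuming NumPy/SciPy):

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [-2.0, 1.0, -0.2, 0.1],
                  [1.0, -1.0, 0.1, -0.1]])
    B = np.array([[0.0], [0.0], [1.0], [0.0]])
    Q = np.diag([0.0, 1.0, 0.0, 0.0])            # penalizes q2^2 only
    R = np.array([[0.2]])

    P = solve_continuous_are(A, B, Q, R)
    F = np.linalg.inv(R) @ B.T @ P
    print(F)                                     # approx [1.0628, 0.3867, 1.4151, 2.2150]
    print(np.linalg.eigvals(A - B @ F))          # approx -0.5925 +/- 0.6847i, -0.2651 +/- 1.7081i
    x0 = np.array([0.0, 1.0, 0.0, 0.0])
    print(x0 @ P @ x0)                           # approx 0.9375, the optimal cost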

4.7 Exercises

4.1 Hamiltonian matrix. Let T > 0 and consider the system

ẋ (t ) = 3 x (t ) + 2 u (t ), x (0) = x0

with cost
   J_{[0,T]}(x_0, u) = ∫_0^T 4 x²(t) + u²(t) dt.

(a) Determine the Hamiltonian matrix H .


(b) It can be shown that
 
      e^{Ht} = (1/5) [ 4 e^{5t} + e^{−5t}   −2 e^{5t} + 2 e^{−5t} ; −2 e^{5t} + 2 e^{−5t}   e^{5t} + 4 e^{−5t} ].

For arbitrary T > 0 determine the optimal x ∗ (t ), u ∗ (t ), p ∗ (t ) and the


optimal cost.

4.2 Hamiltonian equations for an LQ problem with negative weight. Consider


the system and cost of Example 4.4.2. Special about the example is that
Q < 0. This makes it a non-standard LQ problem. In the example we
found an optimal control only if 0 ≤ T < π/2. For T = π/2 the method
failed. In this exercise we use Hamiltonian equations to analyze the case
T = π/2.

(a) Determine the Hamiltonian matrix H for this problem.


(b) It can be shown that
 
      e^{Ht} = [ cos(t)   −sin(t) ; sin(t)   cos(t) ].

Use this to confirm the claim that for T = π/2 the Hamiltonian equa-
tions (4.8) have no solution if x_0 ≠ 0.
(c) Does Pontryagin’s minimum principle allow us to conclude that for
T = π/2 and x_0 ≠ 0 no optimal control u_* exists?
(d) A Wirtinger inequality. Show that ∫_0^{π/2} ẋ²(t) dt ≥ ∫_0^{π/2} x²(t) dt for all
smooth x for which x(0) = x_0 := 0, and show that equality holds iff
x(t) = A sin(t).

4.3 A direct proof of why solutions of the Hamiltonian equations in the LQ


problem are optimal solutions. Following Theorem 4.2.4 we argued that
the LQ problem satisfies the convexity conditions of Theorem 2.8.1, so
satisfaction of the Hamiltonian equations is both necessary and sufficient
for optimality. There is also a direct proof of this result. It exploits the

linear/quadratic nature of the LQ problem. To simplify matters a bit we


assume in this exercise that
R = I, S = 0,
and, as always, Q ≥ 0.

(a) Show that the finite horizon LQ problem satisfies the convexity
assumptions of Theorem 2.8.1. [Hint: Appendix A.7 may be useful.]
(b) Consider a solution ( x ∗ , p ∗ ) of (4.8), and define u ∗ = −B T p ∗ . Now
consider an arbitrary input u and corresponding state x, and define
v = u − u∗.
i. Show that z := x − x ∗ satisfies ż (t ) = A z (t ) + B v (t ), z (0) = 0.
ii. Show that
      J_{[0,T]}(x_0, u) − J_{[0,T]}(x_0, u_*) = ∫_0^T z^T Q z + v^T v + 2 z^T Q x_* + 2 u_*^T v dt.
(For readability we dropped here the time argument.)
iii. Show that (d/dt)( p_*^T(t) z(t) ) = −z^T(t) Q x_*(t) − u_*^T(t) v(t).
iv. Show that
      J_{[0,T]}(x_0, u) − J_{[0,T]}(x_0, u_*) = ∫_0^T z^T(t) Q z(t) + v^T(t) v(t) dt,
and argue that u ∗ is the optimal control.

4.4 There are RDE’s whose solution is constant. Let T > 0 and consider
ẋ (t ) = x (t ) + u (t ), x (0) = x0 := 1,
with cost
   J_{[0,T]}(x_0, u) = 2 x²(T) + ∫_0^T u²(t) dt.
(a) Determine the RDE.
(b) Solve the RDE. [Hint: the solution happens to be constant.]
(c) Determine the optimal state x ∗ (t ) and input u ∗ (t ) explicitly as func-
tions of time.
(d) Verify that J [0,T ] (1, u ∗ ) = P (0).

4.5 Why LQ-optimal inputs are linear in the state, and costs are quadratic in
the state. In this exercise we prove, using only elementary arguments (but
not easy arguments), that the optimal control in LQ control is linear in
the state, and that the value function is quadratic in the state. Consider
ẋ (t ) = A x (t )+B u (t ) with the standard LQ cost over the time window [t , T ],
   J_{[t,T]}( x(t), u ) = x^T(T) S x(T) + ∫_t^T x^T(τ) Q x(τ) + u^T(τ) R u(τ) dτ,
and let V (x, t ) be its value function.

(a) Exploit the quadratic nature of the cost to prove that for every λ ∈ R,
every two x, z ∈ Rn and every two inputs u , w we have

J [t ,T ] (λx, λ u ) = λ2 J [t ,T ] (x, u ),
J [t ,T ] (x + z, u + w ) + J [t ,T ] (x − z, u − w ) = 2J [t ,T ] (x, u ) + 2J [t ,T ] (z, w ).
(4.43)

(The second identity is known as the parallelogram law.)


(b) Prove that V (λx, t ) = λ2 V (x, t ), and that input λ u ∗ is optimal for ini-
tial state λx if u ∗ is optimal for initial state x.
(c) Conclude that

V (x + z, t ) + V (x − z, t ) ≤ 2 V (x, t ) + 2 V (z, t ).

[Hint: minimize the right-hand side of (4.43) over all u , w .]


(d) Likewise conclude that

V (x + z, t ) + V (x − z, t ) ≥ 2 V (x, t ) + 2 V (z, t ).

[Hint: minimize the left-hand side of (4.43) over all u + w , u − w .]


(e) Suppose u x is the optimal input for x, and w z is the optimal input
for z. Use (a,c,d) to show that

J [t ,T ] (x + z, u x + w z ) − V (x + z, t ) = V (x − z, t ) − J [t ,T ] (x − z, u x − w z ).

(f ) Let λ ∈ R. Prove that if u x is the optimal input for x, and w z is the


optimal input for z, then u x + λ w z is the optimal input for x + λz.
[Hint: both sides of the identity of the previous part are zero. Why?]
(g) The previous part shows that the optimal control u ∗ : [t , T ] → Rm for
J [t ,T ] ( x (t ), u ) is linear in x (t ). Show that this implies that at each t
the optimal control u ∗ (t ) is linear in x (t ).
(h) Argue that V (x, t ) for each t is quadratic in the state, i.e., that
V (x, t ) = x T P (t )x for some matrix P (t ) ∈ Rn×n . [Hint: it is quadratic
iff V (x +λz, t )+V (x −λz, t ) = 2 V (x, t )+λ2 V (z, t ) for all x, z and scalars
λ.]

4.6 Solution of scalar RDE. Consider the scalar system ẋ (t ) = A x (t ) + B u (t ),


that is, with x having just one entry. Then A, B,Q, R are numbers. As
always we assume that Q ≥ 0 and R > 0. The RDE in the scalar case may
be solved explicitly, as we show in this exercise.

(a) Suppose B ≠ 0. Show that the RDE (4.22) is of the form

Ṗ (t ) = γ(P (t ) − α)(P (t ) − β).

for some nonzero real numbers α, β, γ and with γ > 0.



(b) Consider the above scalar differential equation in P(t). Prove that G(t) := 1/( P(t) − α ) satisfies

      Ġ(t) = γ(β − α) G(t) − γ.

(c) The above assumes that P(t) ≠ α. Solve the differential equation of


(a) directly for the case that P (t̄ ) = α for some t̄ ∈ [0, T ]. (Yes, the
answer is easy.)
(d) Solve the RDE (4.22) for A = −1, B = 2,Q = 4, R = 2, S = 0, and final
time T = 1.
(e) Solve the RDE (4.22) for A = −1, B = 2,Q = 4, R = 2, S = 1, and final
time T = 1.
(f ) Solve the RDE (4.22) for A = 0, B = 1,Q = 1, R = 1, S = 0, and final time
T = 1.

FIGURE 4.10: Graph of P(t) for several values of P(T) = s. If s = 3 then P(t) is constant (shown in red). See Exercise 4.7.

4.7 Dependence of P (t ) and x ∗ (t ) on the final cost. Consider the scalar system
ẋ (t ) = x (t ) + u (t ), x (0) = x0 ,
and cost
   J_{[0,T]}(x_0, u) = s x²(T) + ∫_0^T 3 x²(t) + u²(t) dt.
Here s is some nonnegative number.

(a) Determine the associated RDE (4.22), and verify that the solution is
given by
      P(t) = ( 3 − d e^{4(t−T)} ) / ( 1 + d e^{4(t−T)} )      for   d := (3 − s)/(1 + s).
[Hint: Exercise 4.6 is useful.]

(b) Figure 4.10 depicts the graph of P (t ) for several s ≥ 0. The graphs
suggest that P (t ) is an increasing function if s > 3, and a decreasing
function if 0 ≤ s < 3. Use the RDE to formally verify this property.
(c) It appears that for s = 0 the function P (t ) is decreasing. Argue from
the value function why it is immediate that P (t ) is decreasing if s = 0.
[Hint: P (t ) decreases iff for a fixed t the function P (t ) increases as a
function of final time T .]
(d) Figure 4.11 shows graphs of the optimal state x ∗ (t ) for T = 1, T = 2
and various s. The initial condition is x 0 = 1 in all cases. The plot
on the left considers T = 1 and s = 0, 1, 2, 3, 4, 5. The plot on the right
T = 2 and the same s = 0, 1, 2, 3, 4, 5. Explain which of the graphs cor-
respond to which value of s, and also explain from the system equa-
tion ẋ (t ) = x (t )+ u (t ) and cost why it can happen that for some s the
optimal x ∗ (t ) increases for t near T .

FIGURE 4.11: Graphs of optimal x_* with x_0 = 1. Left: for T = 1 and s = 0, 1, 2, 3, 4, 5. Right: for T = 2 and s = 0, 1, 2, 3, 4, 5. See Exercise 4.7.

4.8 State transformation. Sometimes a transformation of the state variables


can facilitate solving the optimal control problem. With z (t ) defined as
z (t ) = E −1 x (t ), show that the LQ problem for ẋ (t ) = A x (t ) + B u (t ), x (0) =
x 0 with cost
   J_{[0,T]}(x_0, u) = ∫_0^T x^T(t) Q x(t) + u^T(t) R u(t) dt

yields the problem

ż (t ) = Ã z (t ) + B̃ u (t ), z (0) = z0 := E −1 x0

with cost
   J̃_{[0,T]}(z_0, u) = ∫_0^T z^T(t) Q̃ z(t) + u^T(t) R u(t) dt,

where à = E −1 AE , B̃ = E −1 B and Q̃ = E T QE .
Also, what is the relationship between the value functions for both prob-
lems?

4.9 State transformation. Consider the infinite horizon LQ problem with sta-
bility for the system
   
   ẋ(t) = [ −1   1 ; 1   −1 ] x(t) + [ 1   0 ; 0   1 ] u(t),      x(0) = x_0
and cost
   J_{[0,∞)}(x_0, u) = ∫_0^∞ x^T(t) [ 7   2 ; 2   7 ] x(t) + u^T(t) u(t) dt.
(a) Show that the optimal control is u ∗ (t ) = −P x (t ), where P is the sta-
bilizing solution of the ARE.
(b) To find the solution P , perform the state transformation z = E −1 x
for a suitable matrix E . Choose E such that E −1 AE is diagonal, and
use it to determine P . [Hint: use Exercise 4.8.]

4.10 Direct proof of optimality. The proof of Proposition 4.3.1 assumes that R =
I . Develop a similar proof for the case that R is an arbitrary m ×m positive
definite matrix.

4.11 Discount factor. Consider the linear system

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0
with A ∈ Rn×n and B ∈ Rn×m , and suppose that the cost function contains
an exponential factor,
   J_{[0,T]}(x_0, u) = ∫_0^T e^{−2αt} ( x^T(t) Q x(t) + u^T(t) R u(t) ) dt,

for a given constant α. (For α > 0 the factor e−2αt is known as a discount
factor, rendering running costs further in the future less important.) We
also assume that Q ≥ 0 and R > 0.

(a) Write the above cost function and system equations ẋ (t ) = A x (t ) +


B u (t ) in terms of the new variables z (t ), v (t ) defined as

z (t ) = e−αt x (t ), v (t ) = e−αt u (t ).
(The new version of the cost function and system equations should
no longer contain x and u .)
(b) With the aid of (a) determine the solution u ∗ (t ) in terms of x ∗ (t ), of
the optimal control problem of the scalar system

ẋ (t ) = x (t ) + u (t ), x (0) = x0 := 1
with cost
      J_{[0,T]}(x_0, u) = ∫_0^1 e^{−2t} ( x²(t) + u²(t) ) dt.

4.12 Solving the infinite horizon LQ problem via the eigenvectors of the Hamil-
tonian. (This exercise assumes knowledge of basic linear algebra.) Con-
sider the infinite horizon LQ problem of Exercise 4.9.

(a) Verify that the Hamiltonian matrix H has eigenvalues ±3. [Hint: per-
form row operations on H + 3I .]
(b) Determine two linearly independent eigenvectors v 1 , v 2 of H that
both have eigenvalue −3, and use these to construct the stabilizing
solution P of the ARE.

4.13 Consider the LQ problem of Example 4.6.3.

(a) Verify that P satisfies the ARE, and that P > 0.


(b) Verify that

      A − B R^{-1} B^T P = [ 0   1 ; −2   −√2 ],

and show that the closed loop is asymptotically stable.

4.14 Set-point regulation. Consider the linear system

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0

with A ∈ Rn×n and B ∈ Rn×m . In applications we often want to steer the


state to a “set-point” x̄ ∈ Rn which is not necessarily zero (think of the
heating system in your house). For constant inputs, u (t ) = ū, a set-point
x̄ is an equilibrium iff

0 = A x̄ + B ū.

Given such a pair (x̄, ū) we define a cost relative to them as


   J̄_{[0,∞)}(x_0, u) = ∫_0^∞ ( x(t) − x̄ )^T Q ( x(t) − x̄ ) + ( u(t) − ū )^T R ( u(t) − ū ) dt,

in which Q ≥ 0 and R > 0.

(a) Show that the transformation

z := x − x̄, v := u − ū

reduces the above optimal control problem to a standard LQ prob-


lem.
(b) Under what general conditions on A, B,Q, R are we guaranteed that
the problem of minimizing the above J¯ over all inputs that steer
( x , u ) to (x̄, ū) has a solution?

(c) What is the general form of the optimal input u ∗ in terms of


R, B, P, x , x̄, ū?
(d) Apply the above to the optimal control problem for the scalar system

ẋ (t ) = − x (t ) + u (t ), x (0) = 0

with cost
      ∫_0^∞ ( x(t) − 1 )² + ( u(t) − 1 )² dt,

and compute the optimal control u ∗ .

4.15 Infinite horizon LQ. We consider the same system and cost as in Exer-
cise 4.1, but now with T = ∞:
   ẋ(t) = 3 x(t) + 2 u(t),   x(0) = x_0,      J_{[0,∞)}(x_0, u) = ∫_0^∞ 4 x²(t) + u²(t) dt.

(a) Determine the nonnegative solution P of the ARE.


(b) Verify that A − B R −1 B T P is asymptotically stable.
(c) Determine the input that solves the LQ problem with stability, and
write it in the form u (t ) = −F x (t ) (that is, find F ).
(d) Determine the optimal cost.
(e) Determine the eigenvalues of the Hamiltonian matrix H without
determining the Hamiltonian matrix!
(f) Determine the Hamiltonian matrix and verify that [ 1 ; P ] is an eigen-
vector of the Hamiltonian matrix.

4.16 Infinite horizon LQ problems with and without stability. Consider the sys-
tem

ẋ (t ) = x (t ) + u (t ), x (0) = x0 , u (t ) ∈ R

with the infinite horizon cost


   J_{[0,∞)}(x_0, u) = ∫_0^∞ g² x²(t) + u²(t) dt.

Here g is some nonzero real number.

(a) Determine all solutions P of the ARE.


(b) Which solution P do we need for the infinite horizon LQ problem
with stability?
(c) Determine the solutions u ∗ (t ), x ∗ (t ) explicitly as a function of time
of the infinite horizon LQ problem with stability.

(d) Now take g = 0. Argue that the input that minimizes the cost over all
stabilizing inputs is not the same as the one that minimizes the cost
over all inputs (stabilizing or not).

4.17 Lyapunov equation. Let Q ≥ 0, R > 0 and suppose that B = 0. Consider The-
orem 4.5.6.

(a) Given that B = 0, under what conditions on A are the assumptions


of Theorem 4.5.6 satisfied?
(b) Determine the ARE for this case.
(c) What result from § B.5 comes to mind?

4.18 Quadratic cost consisting of two terms. Consider the system

ẋ (t ) = u (t ), x (0) = x0 , u (t ) ∈ R,

and the infinite horizon quadratic cost composed of two terms,


   J_{[0,∞)}(x_0, u) = ∫_0^1 u²(t) dt + ∫_1^∞ 4 x²(t) + u²(t) dt.
(a) Assume first that x(1) is given. Minimize ∫_1^∞ 4 x²(t) + u²(t) dt over all
stabilizing inputs.
(b) Express the optimal cost J_{[0,∞)}(x_0, u) as J_{[0,∞)}(x_0, u) = ∫_0^1 u²(t) dt +
S x²(1), that is, determine S.
(c) Solve the optimal control problem, that is, minimize J [0,∞) (x 0 , u )
over all stabilizing inputs, and express the optimal input u ∗ (t ) as a
function of x (t ). [Hint: use separation of variables, see § A.3.]

4.19 Properties of the Hamiltonian Matrix. Let H be the Hamiltonian matrix


as defined in (4.9). Define the characteristic polynomial r (λ) as
 
   r(λ) = det( λI − H ) = det [ λI − A   B R^{-1} B^T ; Q   λI + A^T ].

(a) Argue that

      r(λ) = det [ −Q   −λI − A^T ; λI − A   B R^{-1} B^T ].

(b) Show that r (λ) = r (−λ).


(c) Argue that r (λ) is a polynomial in λ2 .

4.20 Control of a car when the input is either cheap or expensive. Consider
Example 4.6.4. In this exercise we analyze what happens if ρ is either very
large or very small (positive but close to zero). Keep in mind that c > 0.

(a) Show that


 
      lim_{ρ→∞} (1/ρ) P = [ c   1 ; 1   1/c ]

and

      lim_{ρ→∞} ρ u_*(t) = −y(t) − (1/c) ẏ(t).

(b) Assume that x_0 = (1, 0). Determine what happens with the optimal


control, the optimal cost, and the eigenvalues of the closed-loop sys-
tem as ρ → ∞?
(c) Argue that for ρ ≈ 0 (but positive) we have
 
      P ≈ [ √(2ρ)   ρ ; ρ   ρ√(2ρ) ]

and

      u_*(t) ≈ −(1/ρ) y(t) − √(2/ρ) ẏ(t).

(d) Assume that x_0 = (1, 0). Determine what happens with the optimal


control, the optimal cost, and the eigenvalues of the closed-loop sys-
tem as ρ ↓ 0.
Chapter 5

Glimpses of Related Topics

This chapter provides an outlook on a number of (rather arbitrarily chosen) top-


ics that are related to the main contents of this book. These brief glimpses are
meant to raise interest and are written in a style that is different from the rest of
the book. Each section is concluded with a few key references, which offer an
entrance to the literature for further study.

5.1 H∞ Theory and Robustness

The LQ optimal control problem can be seen as an L2 -norm minimization prob-


lem. This point of view became widespread in the eighties of the previous cen-
tury, in part because of its connection with the very popular H∞ optimal con-
trol problem which emerged in that same decade. The L2 -norm of a function
y : [0, ∞) → Rp is defined as

   ‖ y ‖_{L2} = √( ∫_0^∞ ‖ y(t) ‖² dt )      where   ‖y‖ := √( y^T y ),

and L2 is the Hilbert space of functions whose L2 -norm is finite. In this section,
we briefly review L2 -norm inequalities, and we make a connection with H∞ the-
ory. The starting point is a system that includes an output y ,

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 ,
(5.1)
y (t ) = C x (t ).
Suppose we want to minimize the indefinite cost

   −‖ y ‖²_{L2} + γ² ‖ u ‖²_{L2}      (5.2)

for some given γ > 0 over all stabilizing inputs. In terms of the state and input,
this cost takes the form ∫_0^∞ x^T(t)(−C^T C) x(t) + γ² u^T(t) u(t) dt, and, therefore, the LQ
Riccati equation (4.33) and stability condition (4.34) become

   A^T P̃ + P̃ A − (1/γ²) P̃ B B^T P̃ − C^T C = 0   and   A − (1/γ²) B B^T P̃ asymptotically stable.   (5.3)


However, since −C T C ≤ 0 this is not a standard LQ problem, and, as such, the


existence of P̃ is not ensured. Assume, for now, that a symmetric solution P̃
of (5.3) does exist, and also assume that A is asymptotically stable. Actually, it is
customary in this case to express the Riccati equation in terms of its negation,
P := −P̃ . That way (5.3) becomes

   A^T P + P A + (1/γ²) P B B^T P + C^T C = 0   and   A + (1/γ²) B B^T P asymptotically stable.   (5.4)

Exactly as in the proof of Theorem 4.5.5 it now follows that


 
   −‖ y(t) ‖² + γ² ‖ u(t) ‖² = (d/dt)( x^T(t) P x(t) ) + γ² ‖ u(t) − (1/γ²) B^T P x(t) ‖².   (5.5)

In particular, it reveals that every stabilizing input satisfies the equality

   −‖ y ‖²_{L2} + γ² ‖ u ‖²_{L2} = −x_0^T P x_0 + γ² ‖ u − (1/γ²) B^T P x ‖²_{L2}.   (5.6)

Consequently, like in the standard LQ problem, the stabilizing input that min-
imizes (5.6) is u_* = (1/γ²) B^T P x, and the optimal (minimal) cost −‖ y ‖²_{L2} + γ² ‖ u ‖²_{L2}
equals −x_0^T P x_0. It also shows that P is positive semi-definite, because for u = 0
(which is stabilizing since we assumed A to be asymptotically stable) the cost is
−‖ y ‖²_{L2} + γ² ‖ u ‖²_{L2} = −‖ y ‖²_{L2} ≤ 0, so the minimal cost −x_0^T P x_0 is less than or equal
to zero for every x_0. Knowing this, the identity (5.5) for zero initial state, x_0 = 0,
gives us the inequality

   −‖ y ‖²_{L2} + γ² ‖ u ‖²_{L2} = x^T(∞) P x(∞) + γ² ‖ u − (1/γ²) B^T P x ‖²_{L2} ≥ 0.   (5.7)

This is a key observation: in asymptotically stable systems (5.1) that start at rest,
x_0 = 0, the norm ‖ y ‖_{L2} never exceeds γ ‖ u ‖_{L2} if there exists a symmetric P that
satisfies (5.4). It is a central result in H∞ theory that the existence of such a P
is both necessary and sufficient for the norm inequality (5.7) to hold (in a strict
sense):

Theorem 5.1.1 (Characterization of the L2 -gain). Let A ∈ Rn×n , B ∈ Rn×m ,C ∈


Rp×n , and assume A is asymptotically stable. Consider the system with input u
and output y , and zero initial state,

ẋ (t ) = A x (t ) + B u (t ), x (0) = 0,
y (t ) = C x (t ).

For every γ > 0, the following four conditions are equivalent.

1. sup_{u ∈ L2, u ≠ 0} ‖ y ‖_{L2} / ‖ u ‖_{L2} < γ.

2. The Hamiltonian matrix [ A   (1/γ²) B B^T ; −C^T C   −A^T ] has no imaginary eigenvalues.

3. The Riccati equation A^T P + P A + (1/γ²) P B B^T P + C^T C = 0 has a unique solution
P ∈ R^{n×n} for which A + (1/γ²) B B^T P is asymptotically stable, and P is symmet-
ric and positive semi-definite.

4. The Riccati equation A Q + Q A^T + (1/γ²) Q C^T C Q + B B^T = 0 has a unique solu-
tion Q ∈ R^{n×n} for which A + (1/γ²) Q C^T C is asymptotically stable, and Q is
symmetric and positive semi-definite.

Proof. The equivalence of the first three is standard and can be found in sev-
eral books, e.g. (Zhou et al., 1996, Corollary 13.24)1 . The Riccati equation in
Condition 4 we recognize as the Riccati equation of the “transposed” system
x̂˙ (t ) = A T x̂ (t ) + C T û (t ), ŷ (t ) = B T x̂ (t ). Condition 4 is equivalent to Condition 2
because the Hamiltonian matrix H̃ of the transposed system is similar to the
transpose of the Hamiltonian matrix H of Condition 2:

   [ −γ² I   0 ; 0   I ]^{−1} H^T [ −γ² I   0 ; 0   I ]
      = [ −γ² I   0 ; 0   I ]^{−1} [ A^T   −C^T C ; (1/γ²) B B^T   −A ] [ −γ² I   0 ; 0   I ]
      = [ A^T   (1/γ²) C^T C ; −B B^T   −A ] = H̃,

and, thus, H̃ and H have the same eigenvalues. ■

There are many variations of this theorem, for instance, for systems that in-
clude a direct feedthrough term, y(t) = C x(t) + D u(t). In general, the expression

   sup_{u ∈ L2, u ≠ 0} ‖ y ‖_{L2} / ‖ u ‖_{L2}
is known as the L2 -gain of the system. Theorem 5.1.1 shows that the L2 -gain
equals the largest γ > 0 for which the Hamiltonian matrix has imaginary eigen-
values. Thus by iterating over γ, we can calculate the L2 -gain. Also, as γ → ∞,
the eigenvalues of the Hamiltonian matrix converge to the eigenvalues of A and
−A. Hence, the L2 -gain is finite whenever A is asymptotically stable. An inter-
esting by-product of Theorem 5.1.1 is that every L2 input is a stabilizing input if
the system is asymptotically stable:

Lemma 5.1.2 (Stability for L2 inputs). Let A ∈ Rn×n , B ∈ Rn×m and consider
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 . If A is asymptotically stable and u ∈ L2 , then
• both x and ẋ are in L2 ,

• limt →∞ x (t ) = 0.

Proof. Take y = x , that is, C = I . For large enough γ, the Hamiltonian matrix of
Theorem 5.1.1 does not have imaginary eigenvalues, so Condition 3 of Theo-
rem 5.1.1 holds for some large enough γ. Thus, given γ large enough, (5.5) says
that
   γ² ‖ u ‖²_{L2} + x_0^T P x_0 = ‖ x ‖²_{L2} + x^T(∞) P x(∞) + γ² ‖ u − (1/γ²) B^T P x ‖²_{L2}.

1 See References and further reading on page 175.



All terms on the right-hand side are nonnegative, hence if u ∈ L2 , then all
these terms are finite. In particular, x ∈ L2 . Consequently also ẋ = A x +
B u ∈ L2. Now the Cauchy-Schwarz inequality guarantees that | x_i²(b) − x_i²(a) | =
| ∫_a^b 2 ẋ_i(t) x_i(t) dt | ≤ 2 √( ∫_a^b ẋ_i²(t) dt ) √( ∫_a^b x_i²(t) dt ) → 0 as a, b → ∞. So x_i²(t) con-
verges as t → ∞. Since x_i ∈ L2, it must in fact converge to zero. ■
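The γ-iteration mentioned after Theorem 5.1.1 is easy to sketch in code. The Python fragment below (assuming NumPy; it is a plain bisection for illustration only, whereas dedicated H∞-norm algorithms are considerably more refined) estimates the L2-gain by testing Condition 2 of Theorem 5.1.1:

    import numpy as np

    def l2_gain(A, B, C, tol=1e-6):
        """Estimate the L2-gain by bisection on gamma (Condition 2 of Theorem 5.1.1)."""
        def has_imaginary_eigenvalue(gamma):
            H = np.block([[A, B @ B.T / gamma**2],
                          [-C.T @ C, -A.T]])
            return np.any(np.abs(np.linalg.eigvals(H).real) < 1e-8)

        lo, hi = 0.0, 1.0
        while has_imaginary_eigenvalue(hi):      # find an upper bound for the gain
            hi *= 2.0
        while hi - lo > tol:                     # gain < gamma iff no imaginary eigenvalues
            mid = 0.5 * (lo + hi)
            if has_imaginary_eigenvalue(mid):
                lo = mid
            else:
                hi = mid
        return hi

    # Example: dx/dt = -x + u, y = x, whose L2-gain (H-infinity norm of 1/(s+1)) is 1.
    A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
    print(l2_gain(A, B, C))                      # approx 1.0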

The standard H∞ problem

LQ optimal control theory was very successful, and it still is, but it has not been
easy to incorporate model uncertainties in this approach. In the late seventies,
this omission led Zames2 to the idea of using H∞ -optimization as an alternative
to LQ optimal control. It was the starting point of the wonder years of H∞ the-
ory. It attracted the attention of operator theoreticians, and for over a decade
there was a very fruitful cooperation between operator theory and systems and
control theory. At the end of the eighties the first truly satisfactory solutions of
what is called the “standard H∞ problem” were obtained.
The H∞ -norm of a linear time-invariant mapping from w to z is, strictly
speaking, defined as a property of its transfer function / matrix H z/w (s). How-
ever, for systems described by

ẋ (t ) = A x (t ) + B w (t ), x (0) = 0,
z (t ) = C x (t ),

the H∞-norm of H_{z/w} coincides with its L2-gain,

   ‖ H_{z/w} ‖_{H∞} = sup_{w ∈ L2, w ≠ 0} ‖ z ‖_{L2} / ‖ w ‖_{L2}.

Thus, the H∞ -norm is an induced norm (on a Banach space of bounded lin-
ear operators). Hence, we have the well-known contraction theorem (aka small
gain theorem) which for linear mappings H : L2 → L2 says that I − H is invertible
on L2 if ‖H‖_{H∞} < 1. This elegant result is very powerful and can be utilized to
design controllers for a whole family of systems, or for systems whose models
are uncertain in some specific way. The game in H∞ optimal control is to exploit
the freedom in the design to minimize the H∞ -norm of a mapping H z/w that we
select. In this context, w is often called the “disturbance” and z the “error signal”.
Even though the popularity stems mainly from its ability to deal with dynamic
uncertainties, we illustrate it only for a problem with signal uncertainties:

Example 5.1.3 (H∞ filtering). Suppose we have a function q : R → R that we


wish to estimate/reconstruct on the basis of another signal y : R → R that we
can measure. This assumes that q and y are in some way related. For instance,
q might be the noise in an airplane as it enters your ear, and y is the noise
2 See References and further reading on page 175.

FIGURE 5.1: Filtering configuration. See Example 5.1.3.

picked up somewhere else by a microphone. We model q and y as the outputs of


systems G q/w and G y/w driven by a common noise source input w , see Fig. 5.1.
Let u be the estimate of q that we determine based on y :
u = K ( y ).
In this context, K is usually called a filter, and it is the part that we have to
design. The configuration of this problem is shown in Fig. 5.1. Ideally, u equals
q (perfect reconstruction), and then the reconstruction error z := q − u is zero. In
practice, that will hardly ever be possible. An option is then to try to minimize
the effect of w on z , for instance, by minimizing the H∞ -norm of the mapping
from w to z over all stable causal filters K . The mapping from w to z is
H z/w :=G q/w − K G y/w .
Minimizing its H∞ -norm over all stable causal filters K is a typical H∞ optimal
control problem.

FIGURE 5.2: Configuration for the standard H∞ problem.

Having Theorem 5.1.1, it will be no surprise that the theory of H∞ opti-


mal control has strong ties with LQ theory and the machinery of Riccati equa-
tions. The above H∞ filtering problem, and many other H∞ problems, are spe-
cial cases of what is called the “standard H∞ (optimal control) problem”. In this
problem, we are given a system G with two sets of inputs ( w , u ) and two sets of
outputs ( z , y ), described by, say,

        ẋ(t) = A x(t) + B_w w(t) + B_u u(t),   x(0) = 0,
   G:   z(t) = C_z x(t) + D_{z/u} u(t),                              (5.8)
        y(t) = C_y x(t) + D_{y/w} w(t),

for certain matrices A, B w , B u ,C z ,C y , D z/u , D y/w . Given this system, the standard
H∞ problem is to minimize the H∞ -norm of the mapping from w to z over all
stabilizing causal mappings u = K ( y ), see Fig. 5.2. The mapping K is usually
called controller, and it is the part that we have to design. Over the years, many
solutions have been put forward, but it is fair to say that the best known solution
and best supported in software packages is as follows. It assumes that the state
representation (5.8) is such that
• (A, B u ) is stabilizable,

• (C y , A) is detectable,
 
A − iωI Bu
• has full column rank for all ω ∈ R,
Cz D z/u
 
A − iωI Bw
• has full row rank for all ω ∈ R,
Cy D y/w

• D z/u has full column rank, and D y/w has full row rank.
Here is the famous result:

Theorem 5.1.4 (γ-optimal solution of the standard H∞ problem). Let γ > 0.


Suppose the above 5 assumptions hold, and that in addition (for reasons of aes-
thetics only)
       
   D_{y/w} [ B_w^T   D_{y/w}^T ] = [ 0   I ]   and   D_{z/u}^T [ C_z   D_{z/u} ] = [ 0   I ].      (5.9)

Then there exists a causal stabilizing controller K for which the H∞ -norm of the
mapping from w to z is less than γ iff the following three conditions hold:
1. The Riccati equation A^T P + P A + P ( (1/γ²) B_w B_w^T − B_u B_u^T ) P + C_z^T C_z = 0 has a
unique solution P for which A + ( (1/γ²) B_w B_w^T − B_u B_u^T ) P is asymptotically sta-
ble, and P is symmetric and positive semi-definite,

2. The Riccati equation A Q + Q A^T + Q ( (1/γ²) C_z^T C_z − C_y^T C_y ) Q + B_w B_w^T = 0 has a
unique solution Q for which A + Q ( (1/γ²) C_z^T C_z − C_y^T C_y ) is asymptotically stable,
and Q is symmetric and positive semi-definite,

3. All eigenvalues of QP have magnitude less than γ2 .


In that case, one (out of many) causal stabilizing controllers for which the H∞ -
norm of the mapping from w to z is less than γ is the mapping u = K ( y ) defined
by
   x̂˙ = ( A + [ (1/γ²) B_w B_w^T − B_u B_u^T ] P ) x̂ + ( I − (1/γ²) Q P )^{−1} Q C_y^T ( y − C_y x̂ ),   x̂(0) = 0,
   u = −B_u^T P x̂.

The solutions P and Q of the above two Riccati equations can be con-
structed from the asymptotically stable eigenvalues and eigenvectors of the cor-
responding Hamiltonian matrices, much like what we did in the final examples
of § 4.5. We need to stress that the additional assumptions (5.9) are for ease of
exposition only. Without it, the problem is perfectly solvable but the formulae
become unwieldy.

References and further reading

• G. Zames. Feedback and optimal sensitivity: Model reference transforma-


tions, multiplicative seminorms, and approximate inverses. IEEE Trans.
Automat. Control, 26(2): 301–320, 1981.

• S. Skogestad and I. Postlethwaite. Multivariable Feedback Control. Anal-


ysis and Design. John Wiley and Sons Ltd., Chichester, Sussex, UK, 2nd
edition, 2005.

• K. Zhou, J.C. Doyle, and K. Glover. Robust and Optimal Control. Prentice
Hall: Upper Saddle River, New Jersey, 1996.

5.2 Dissipative Systems

An important approach to the analysis of input-state-output systems

ẋ (t ) = f ( x (t ), u (t )), x (t ) ∈ X = Rn , u (t ) ∈ U = Rm ,
(5.10)
y (t ) = h( x (t ), u (t )), y (t ) ∈ Y = Rp ,

is the theory of dissipative systems; as initiated and developed by Willems3 . In


particular, this theory unifies, and generalizes, the classical passivity and small-
gain theorems for feedback interconnections of systems. Perhaps surprisingly,
it is intimately related to optimal control.
Consider a system (5.10), together with a function s : U×Y → R, called a sup-
ply rate. The system is said to be dissipative (with respect to the supply rate s)
if there exists a nonnegative function S : X → [0, ∞) such that

   S( x(τ) ) ≤ S(x_0) + ∫_0^τ s( u(t), y(t) ) dt      (5.11)

for all initial conditions x (0) = x 0 , all τ ≥ 0, and all u : [0, τ] → U, where x (τ)
denotes the state at time τ and y (t ) the output at time t resulting from initial
condition x (0) = x 0 and input function u . The nonnegative function S is called
the storage function (corresponding to the supply rate s), and (5.11) is called the
dissipation inequality. Clearly, if S(x) satisfies (5.11), then so does the function
S(x)−c for any constant c. Hence, if S(x) is a storage function, then so is S(x)−c
3 See References and further reading on page 179.

for any c such that S(x) − c is a nonnegative function. However, in many cases,
the non-uniqueness of storage functions goes much further than this. Two key
examples of supply rates s(u, y) are

passivity supply rate : s(u, y) = y T u, (assuming p = m),

and

L2-gain supply rate : s(u, y) = γ² ‖u‖² − ‖y‖², with γ ≥ 0.

Here, ‖u‖, ‖y‖ denote the standard Euclidean norms, ‖u‖ = √(u^T u), ‖y‖ = √(y^T y).
The passivity supply rate typically has the interpretation of the supplied
power, with u, y denoting, for example, generalized forces and velocities (in
the mechanical domain), or currents and voltages (in the electrical domain).
Then S(x) has the interpretation of the energy stored in the system if it is at
state x, and (5.11) expresses the property that for all initial conditions x 0 and all
input functions the energy stored at any future time instant τ is always less than
or equal to the amount of energy stored at time 0 plus the total energy that is
supplied to the system by its surroundings during the time-interval [0, τ]. That
is, the energy of the system can only increase due to supply from outside. Said
differently, the system itself cannot create energy; it can only dissipate. Systems
that are dissipative with respect to the supply rate s(u, y) = y T u are also called
passive.
Dissipativity with respect to the L2 -gain supply rate means that
   S( x(τ) ) ≤ S(x_0) + γ² ∫_0^τ ‖ u(t) ‖² dt − ∫_0^τ ‖ y(t) ‖² dt.

Since S( x (τ)) ≥ 0 this implies


   ∫_0^τ ‖ y(t) ‖² dt ≤ γ² ∫_0^τ ‖ u(t) ‖² dt + S(x_0).

(This is similar to the infinite horizon case as discussed in § 5.1 for linear sys-
tems.) Thus, the L2 -norm of the output on [0, τ] is bounded by γ times the L2 -
norm of the input on [0, τ], plus a constant (only depending on the initial con-
dition); for all τ ≥ 0. The infimal value of γ for which this holds is the L2 -gain
of the system; it measures the amplification from input to output functions. For
linear time-invariant systems, the L2 -gain equals the H∞ -norm of its transfer
matrix; see Section 5.1.
When is the system (5.10) dissipative for a given supply rate s? It turns out
that the answer to this question is given by an optimal control problem! Con-
sider the extended function S a : X → [0, ∞] (“extended” because the value ∞ is
allowed) which is defined by the free final time optimal control problem
   S_a(x_0) = sup_{τ, u:[0,τ]→U} ( − ∫_0^τ s( u(t), y(t) ) dt ) = − inf_{τ, u:[0,τ]→U} ∫_0^τ s( u(t), y(t) ) dt      (5.12)

for any initial condition x 0 , where y (t ), t ∈ [0, τ], denotes the output resulting
from initial condition x (0) = x 0 and input u : [0, τ] → U. Note that by construc-
tion (take τ = 0) S a (x 0 ) ≥ 0 for all x 0 ∈ X.

Theorem 5.2.1. The system (5.10) is dissipative with respect to the supply rate
s iff S a (x 0 ) < ∞ for all x 0 , i.e., S a : X → [0, ∞). Furthermore, if this is the case,
then S a is a storage function, and any other storage function S satisfies S a (x 0 ) ≤
S(x 0 ) − infx S(x) for all x 0 ∈ X. Finally, infx S a (x) = 0.

Proof. Suppose S_a(x_0) < ∞ for all x_0, and thus S_a : X → [0, ∞). Consider any u : [0, τ] → U and x_0. Then in general u will be a suboptimal input for the optimal control problem (5.12). Hence,

S_a(x_0) ≥ − ∫_0^τ s(u(t), y(t)) dt + S_a(x(τ)),

which is the same as the dissipation inequality (5.11). Thus, (5.10) is dissipative with storage function S_a. Conversely, let (5.10) be dissipative, i.e., there exists a nonnegative S satisfying (5.11). Then for any u : [0, τ] → U and x_0

S(x_0) + ∫_0^τ s(u(t), y(t)) dt ≥ S(x(τ)) ≥ 0,

and thus S(x_0) ≥ − ∫_0^τ s(u(t), y(t)) dt, and hence

S(x_0) ≥ sup_{τ, u:[0,τ]→U} − ∫_0^τ s(u(t), y(t)) dt = S_a(x_0).

For any storage function S, the function Ŝ(x_0) := S(x_0) − inf_x S(x), x_0 ∈ X, is a storage function as well, so applying the inequality just derived to Ŝ gives S_a(x_0) ≤ Ŝ(x_0) = S(x_0) − inf_x S(x). Finally, since 0 ≤ S_a ≤ Ŝ and inf_x Ŝ(x) = 0, also inf_x S_a(x) = 0. ■

In the passivity case, S a (x 0 ) has the interpretation of the “maximal” energy


that can be extracted from the system being at time 0 at x 0 ; this quantity should
be finite in order that the system is passive. Hence, S a is called the available
energy.
Now return to the dissipation inequality (5.11), where we additionally
assume that S is differentiable. Furthermore, in order to obtain simple formulas,
let us restrict attention to systems of input-affine form and without feedthrough
term:
ẋ(t) = f(x(t)) + G(x(t)) u(t),   x(t) ∈ X = R^n,   u(t) ∈ U = R^m,
y(t) = h(x(t)),   y(t) ∈ Y = R^p,    (5.13)
where G(x) is an n × m matrix. Bringing S(x 0 ) to the left-hand side of (5.11),
dividing both sides by τ, and letting τ → 0, turns the dissipation inequality (5.11)
into the differential dissipation inequality
(∂S(x)/∂x^T) ( f(x) + G(x)u ) ≤ s(u, h(x))    (5.14)
for all x ∈ X, u ∈ U. The connection of (5.14) with Lyapunov function theory


(Appendix B.3) is clear: if the supply rate s is such that s(0, y) ≤ 0 for all y (which
is, e.g., the case for the passivity and L2 -gain supply rate), then the nonnegative
function S satisfies (∂S(x)/∂x^T) f(x) ≤ 0, and thus is a candidate Lyapunov function
for the uncontrolled system ẋ (t ) = f ( x (t )). In this sense, one could say that the
theory of dissipative systems generalizes Lyapunov function theory to systems
with inputs and outputs.
In case of the L2-gain supply rate, the optimal control problem (5.12) has an indefinite cost criterion. (This was already noted in § 5.1.) Furthermore, the differential dissipation inequality (5.14) amounts to

(∂S(x)/∂x^T) ( f(x) + G(x)u ) − γ²‖u‖² + ‖y‖² ≤ 0    (5.15)

for all x, u and y = h(x). For any x the maximum over all u of the left-hand side of (5.15) is attained at u = (1/(2γ²)) G^T(x) ∂S(x)/∂x, and by substitution into (5.15), it follows that (5.15) is satisfied for all x, u iff

(∂S(x)/∂x^T) f(x) + (1/(4γ²)) (∂S(x)/∂x^T) G(x) G^T(x) (∂S(x)/∂x) + h^T(x) h(x) ≤ 0    (5.16)
for all x ∈ X. On the other hand, the Hamilton-Jacobi-Bellman equation (3.12a) (p. 96) for the optimal control problem (5.12) with s(u, y) = γ²‖u‖² − ‖y‖² is readily computed as

(∂S_a(x)/∂x^T) f(x) + (1/(4γ²)) (∂S_a(x)/∂x^T) G(x) G^T(x) (∂S_a(x)/∂x) + h^T(x) h(x) = 0.    (5.17)
Hence, we will call (5.16) the Hamilton-Jacobi inequality. It thus follows that S a
satisfies the Hamilton-Jacobi equation (5.17), while any other storage function
S satisfies the Hamilton-Jacobi inequality (5.16). In general, any infinite horizon optimal control problem of minimizing ∫_0^∞ L(x(t), u(t)) dt over all input functions to a control system ẋ(t) = f(x(t), u(t)), with L(x, u) an arbitrary running cost, leads to the following inequality involving Bellman's value function V:

(∂V(x)/∂x^T) f(x, u) + L(x, u) ≥ 0,   x ∈ X, u ∈ U.    (5.18)
Thus, (5.18) could be regarded as a reversed dissipation inequality.
The optimal control problem (5.12) for the passivity supply rate s(u, y) = y^T u is much harder: by linearity in u, it is singular. On the other hand, the differential dissipation inequality (5.14) takes the simple form

(∂S(x)/∂x^T) ( f(x) + G(x)u ) ≤ h^T(x) u

for all x, u, or equivalently, for all x ∈ X,

(∂S(x)/∂x^T) f(x) ≤ 0,   h(x) = G^T(x) ∂S(x)/∂x.    (5.19)
Let us finally consider (5.19) in case the system (5.13) is a linear system ẋ(t) = A x(t) + B u(t), y(t) = C x(t), that is, f(x) = Ax, G(x) = B, h(x) = Cx. Using the same argumentation as in Exercise 4.5, it follows that S_a is a quadratic function of the state; i.e., S_a(x) = ½ x^T Q_a x for some matrix Q_a = Q_a^T ≥ 0. Restricting attention to quadratic storage functions S(x) = ½ x^T Q x, with Q = Q^T ≥ 0, the differential dissipation inequality (5.19) takes the form x^T Q A x ≤ 0, Cx = B^T Q x for all x, that is,

A^T Q + Q A ≤ 0,   C = B^T Q.    (5.20)

This is the famous linear matrix inequality (LMI) of the classical Kalman-
Yakubovich-Popov lemma. Note that Q a is the minimal solution of (5.20).
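As a small numerical illustration (the system matrices and the candidate Q below are assumptions made for this example, not data from the book), the following Python sketch verifies the linear matrix inequality (5.20) for a spring-mass-damper with force input and velocity output, taking the stored mechanical energy ½ x^T Q x as storage function.

```python
import numpy as np

# Spring-mass-damper with force input and velocity output (illustrative choice):
A = np.array([[0., 1.], [-2., -3.]])
B = np.array([[0.], [1.]])
C = np.array([[0., 1.]])

# Candidate storage S(x) = 1/2 x^T Q x: the physical energy with spring constant 2,
# i.e. Q = diag(2, 1) (a hand-picked candidate, not computed by this sketch).
Q = np.diag([2., 1.])

lmi = A.T @ Q + Q @ A
print(np.linalg.eigvalsh(lmi))   # all eigenvalues <= 0, so A^T Q + Q A <= 0
print(C - B.T @ Q)               # zero, so C = B^T Q: the LMI (5.20) holds and the system is passive
```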

References and further reading

• J.C. Willems, Dissipative dynamical systems. Part I: General theory. Arch.


Rat. Mech. Anal. 1972, 45, 321–351.

• J.C. Willems, Dissipative dynamical systems. Part II: Linear systems with
quadratic supply rates. Arch. Rat. Mech. Anal. 1972, 45, 352–393.

• A.J. van der Schaft, L 2 -Gain and Passivity Techniques in Nonlinear Con-
trol, 3rd ed.; Springer: Cham, Switzerland, 2017.

5.3 Invariant Lagrangian Subspaces and Riccati

In this section, we will explore some of the geometry behind the algebraic Riccati
equation, using the geometric theory of Hamiltonian dynamics.
Consider a state space X with elements x; first regarded as a finite-
dimensional abstract linear space. Let X∗ be its dual space, with elements
denoted by p. Denote the duality product between X and X∗ by 〈p, x〉 ∈ R,
for x ∈ X, p ∈ X∗ . After choosing a basis for X, one identifies X ∼
= Rn for some
n. Taking the dual basis for X∗ , thereby also identifying X∗ ∼ = Rn , the duality
product reduces to the vector product p x. On the product space X × X∗ , one
T

defines the symplectic form

 (x 1 , p 1 ), (x 2 , p 2 )  :=〈p 1 , x 2 〉 − 〈p 2 , x 1 〉, (5.21)

which, after choosing a basis for X and dual basis for X∗ , is identified with the
matrix
 
0 −I n
J := . (5.22)
In 0
Denoting z_i := (x_i ; p_i) (the column vector with x_i stacked above p_i), i = 1, 2, the expression (5.21) thus equals z_1^T J z_2. Now con-
sider any differentiable function H : X × X∗ → R, called a Hamiltonian function.
Its gradient will be denoted by the 2n-dimensional vector

e_H(x, p) := [ ∂H(x,p)/∂x ]
             [ ∂H(x,p)/∂p ].

Furthermore, the Hamiltonian vector field v_H on X × X^* is defined as

J v_H(x, p) = e_H(x, p),   equivalently,   v_H(x, p) = [  ∂H(x,p)/∂p ]
                                                       [ −∂H(x,p)/∂x ].

In case of a quadratic Hamiltonian

H(z) = H(x, p) = p^T A x + ½ x^T F x + ½ p^T G p,   F = F^T, G = G^T,    (5.23)

the Hamiltonian vector field v_H corresponds to the linear Hamiltonian differential equations

[ ẋ(t) ]   [  A     G   ] [ x(t) ]
[ ṗ(t) ] = [ −F    −A^T ] [ p(t) ],    (5.24)

where H, the 2n × 2n coefficient matrix in (5.24), is the Hamiltonian matrix corresponding to the Hamiltonian (5.23).


Next, we come to the definition of a Lagrangian subspace.

Definition 5.3.1 (Lagrangian subspace). A subspace L ⊂ X × X∗ is Lagrangian


if L = L ⊥⊥ , where L ⊥⊥ is defined as L ⊥⊥ = {z ∈ X×X∗ | z T J v = 0 for all v ∈ L }.

It follows that any Lagrangian subspace L satisfies z_1^T J z_2 = 0 for all z_1, z_2 ∈ L, while dim L = dim X = n. Examples of Lagrangian subspaces are L = Im [ I ; P ] (the identity stacked above P)
for P symmetric. Another example is the generalized stable eigenspace N − of
a Hamiltonian matrix H in case H does not have purely imaginary eigen-
values. In fact, as already noticed in the previous chapter (Exercise 4.19), if λ
is an eigenvalue of a Hamiltonian matrix H , then so is −λ. Thus, if H has
no purely imaginary eigenvalues, then the number (counting multiplicities) of
eigenvalues in the open left half-plane is equal to n = dim X. Furthermore, for
any z_1, z_2 ∈ N^−, let z_1(·), z_2(·) be the solutions of (5.24) with z_1(0) = z_1, z_2(0) = z_2. Any Hamiltonian matrix H as in (5.24) satisfies H^T J + J H = 0, and thus

(d/dt) ( z_1^T(t) J z_2(t) ) = ż_1^T(t) J z_2(t) + z_1^T(t) J ż_2(t) = z_1^T(t) ( H^T J + J H ) z_2(t) = 0,

implying that z_1^T(t) J z_2(t) is constant as a function of t. Because lim_{t→∞} z_1(t) = 0 = lim_{t→∞} z_2(t), this yields z_1^T(t) J z_2(t) = 0 for all t, proving that N^− is indeed
a Lagrangian subspace. By time-reversal, the same is shown for the generalized
unstable eigenspace N + .
A Lagrangian subspace L is called invariant for the Hamiltonian vector


field v H if v H (z) ∈ L for all z = (x, p) ∈ L . This means that the Hamiltonian
dynamics ż = v H ( z ) leaves L invariant: starting in L the solution remains in
L . Obviously, both N − and N + are invariant for (5.24). With respect to general
invariant Lagrangian subspaces and general Hamiltonians, we have the follow-
ing result.

Proposition 5.3.2. Consider a differentiable Hamiltonian H : X × X∗ → R and


a Lagrangian subspace L ⊂ X × X∗ . Then L is invariant for the Hamiltonian
vector field v H iff H is zero restricted to L .

Proof. Let L be such that v H (z) ∈ L for all z ∈ L . Then for all v, z ∈ L ,
0 = v T J v H (z) = v T e H (z), which implies that H is constant on L . Thus, since
H (0) = 0, H is zero on L . Conversely, let H be zero on L , implying that
0 = v T e H (z) = v T J v H (z) for all v, z ∈ L . By L = L ⊥⊥ this implies v H (z) ∈ L
for all z ∈ L , and thus L is invariant for v H . ■

Restricting to quadratic Hamiltonians H as in (5.23), and taking a basis for X and dual basis for X^*, this results in the following equations. Any n-dimensional subspace L ⊂ X × X^* can be written as

L = Im [ U ; V ],   rank [ U ; V ] = n,    (5.25)

for some square matrices U, V (here [ U ; V ] denotes the 2n × n matrix with U stacked above V). Furthermore, L given by (5.25) is Lagrangian iff U^T V = V^T U. It follows that any z = (x ; p) ∈ L is given as x = U w, p = V w for some w ∈ R^n. Hence, H given by (5.23) is zero on L iff w^T V^T A U w + ½ w^T U^T F U w + ½ w^T V^T G V w = 0 for all w ∈ R^n, or equivalently

U^T A^T V + V^T A U + U^T F U + V^T G V = 0.    (5.26)

This will be called the generalized algebraic Riccati equation. In case U is invertible, the Lagrangian subspace L given by (5.25) equals

L = Im [ U ; V ] = Im [ I ; V U^{-1} ].

Hence, H is zero on L iff P := V U^{-1} satisfies the standard algebraic Riccati equation

A^T P + P^T A + F + P^T G P = 0.

Also, the condition U^T V = V^T U for any Lagrangian subspace (5.25) yields P = P^T.
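The construction P = V U^{-1} can be carried out numerically from the stable eigenspace N^− of the Hamiltonian matrix. The Python sketch below uses LQR-type data chosen purely for illustration (F = I as state weight and G = −BB^T, so that the generalized Riccati equation reduces to the standard LQR Riccati equation); SciPy's ARE solver is used only as a cross-check.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative LQR-type data (double integrator, F = I, G = -B B^T).
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
F = np.eye(2)
G = -B @ B.T

H = np.block([[A, G], [-F, -A.T]])     # Hamiltonian matrix of (5.24)
w, Z = np.linalg.eig(H)
NS = Z[:, w.real < 0]                  # basis of the stable eigenspace N^-
U, V = NS[:2, :], NS[2:, :]
P = np.real(V @ np.linalg.inv(U))      # P = V U^{-1}

print(np.allclose(P, P.T, atol=1e-6))                                  # P is symmetric
print(np.allclose(A.T @ P + P @ A + F + P @ G @ P, 0, atol=1e-6))      # P solves the ARE
print(np.allclose(P, solve_continuous_are(A, B, np.eye(2), np.eye(1)), atol=1e-6))  # matches SciPy
```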
Invertibility of U can be guaranteed for several cases of interest. One case is
exploited in Theorem 5.1.1. Another elegant result is the following.
Proposition 5.3.3 (Kučera, 1991). Consider H given by (5.23), and a Lagrangian


subspace L given by (5.25) which is invariant for the Hamiltonian vector field
v H corresponding to the Hamiltonian matrix H as in (5.24). If (A,G) is control-
lable, then U is invertible.

Remark 5.3.4. Dually, if (F, A) is observable, then V is invertible.

The dynamics restricted to any invariant Lagrangian subspace L can be expressed as follows. By U^T V = V^T U and the rank condition on U, V in (5.25), the Lagrangian subspace L satisfies

L = Im [ U ; V ] = ker [ V^T   −U^T ].

Furthermore, in view of (5.26), premultiplication of

[  A     G   ] [ U ]   [  A U + G V    ]
[ −F    −A^T ] [ V ] = [ −F U − A^T V ]    (5.27)

by [ V^T   −U^T ] is seen to be zero. This implies that the subspace spanned by the right-hand side of (5.27) is indeed contained in Im [ U ; V ], and thus

[  A     G   ] [ U ]   [ U ]
[ −F    −A^T ] [ V ] = [ V ] L

for some matrix L describing the dynamics on L. In fact,

L = ( U^T U + V^T V )^{-1} ( U^T A U + U^T G V − V^T A^T V − V^T F U ).

Many of the above geometric considerations for algebraic Riccati equations extend to the Hamilton-Jacobi equation

H( x, ∂V(x)/∂x ) = 0,   x ∈ X,    (5.28)

with X a general n-dimensional smooth manifold. (Compare with the infinite horizon Hamilton-Jacobi-Bellman equation (3.30).) In this case the co-tangent bundle T^*X is endowed with a symplectic form ω, which in natural cotangent bundle coordinates (x, p) is given by the same matrix expression J as in (5.22). Furthermore, for any differentiable V : X → R, the submanifold

L := { (x, p) | p = ∂V(x)/∂x }    (5.29)
is a Lagrangian submanifold, i.e., a submanifold of dimension equal to dim X on
which ω is zero. Such a Lagrangian submanifold L is invariant for the Hamil-
tonian vector field v H on T ∗ X for an arbitrary differentiable Hamiltonian H iff
H is constant on L . Hence, if H is zero at some point of L given by (5.29),
then the Hamilton-Jacobi equation (5.28) corresponds to invariance of L . Fur-


thermore, tangent spaces to Lagrangian submanifolds are Lagrangian subspaces,
while invariance of Lagrangian submanifolds with respect to nonlinear Hamil-
tonian dynamics implies invariance with respect to their linearized Hamiltonian
dynamics, see the last reference below.
Finally, the consideration of invariant Lagrangian subspaces/manifolds L
can be extended to the dynamics of Lagrangian subspaces/manifolds resulting
from a Hamiltonian that is not constant on L . In the linear case, this corre-
sponds to the Riccati differential equation (RDE), and in the nonlinear case
to the time-variant Hamilton-Jacobi equation. In fact, the Lagrangian submanifold at time t is given by L(t) = { (x, p) | p = ∂V(x,t)/∂x } with V a solution of

∂V(x,t)/∂t + H( x, ∂V(x,t)/∂x ) = 0.

References and further reading

• R.A. Abraham, and J.E. Marsden, Foundations of Mechanics, 2nd ed.; Ben-
jamin/Cummings: Reading, MA, USA, 1978.

• V.I. Arnold, Mathematical Methods of Classical Mechanics; 2nd ed.;


Springer: Berlin/Heidelberg, Germany, 1989.

• S. Bittanti, A.J. Laub, and J.C. Willems (Eds.), The Riccati Equation,
Springer-Verlag, Berlin-Heidelberg, 1991. (Chapter 3 of this book is by
V. Kučera and it contains a proof of Proposition 5.3.3.)

• A.J. van der Schaft, L 2 -Gain and Passivity Techniques in Nonlinear Con-
trol, 3rd ed.; Springer: Cham, Switzerland, 2017.

5.4 Model Predictive Control

Model predictive control (MPC) is usually formulated for systems in discrete


time (see (3.3) in Chapter 3)

x (t + 1) = f ( x (t ), u (t )), t ∈ Z = {. . . , −1, 0, 1, . . . }. (5.30)

Its key idea is the following. Let the state of the system at some discrete time
instant t ∈ Z be equal to x t . Then consider for given integer N > 0 the optimal
control problem of minimizing over all control sequences u (t ), . . . , u (t + N − 1)
the cost criterion
J_{[t, t+N−1]}(x_t, u) = Σ_{k=0}^{N−1} L( x(t+k), u(t+k) ).    (5.31)

Here, L(x, u) is the running cost, and x (t + k) denotes the state at time t + k
resulting from initial condition x (t ) = x t and control sequence u (t ), . . . , u (t + k −
1). Note that this is the same optimal control problem as considered in § 3.3,
apart from the fact that the minimization is done over the time horizon t , t +
1, . . . , t + N − 1, instead of 0, 1, . . . , T − 1, and that there is no terminal cost.
Let u ∗ (t ), u ∗ (t + 1), . . . , u ∗ (t + N − 1) be the computed optimal control
sequence. The basic difference between model predictive control and stan-
dard optimal control now shows up as follows. Consider just the first control
value u ∗ (t ), and apply this to the system. Now consider the same optimal con-
trol problem, but shifted in time by 1. That is, minimize, for the observed initial
condition x t +1 at time t + 1, the cost criterion
J_{[t+1, t+N]}(x_{t+1}, u) = Σ_{k=0}^{N−1} L( x(t+1+k), u(t+1+k) )    (5.32)

over the shifted time horizon t + 1, t + 2, . . . , t + N . This yields an optimal control


sequence u ∗∗ (t + 1), u ∗∗ (t + 2), . . . , u ∗∗ (t + N ). Then, again, only apply the first
value u ∗∗ (t + 1) of the now obtained optimal control sequence, consider the
observed state x t +2 , and continue with the next shifted optimal control prob-
lem. And so on.
What are the characteristics of this MPC strategy? First, we do not consider
one single optimal control problem over a fixed horizon, but instead at every
subsequent time instant, we consider an optimal control problem with a shifted
time horizon. This is why MPC is also called “receding horizon” optimal con-
trol (and in this sense bears similarity with infinite horizon optimal control).
Second, we do not use the full optimal control sequence computed at each sub-
sequent time instant t , t + 1, . . . , but instead only its first value; thus yielding for
all t the same feedback expression in x t .
Furthermore, what are the possible advantages of MPC over standard opti-
mal control? Clearly, this very much depends on the aims and on the application
context. Model predictive control is often applied in situations where the sys-
tem model (5.30) is uncertain and incomplete. In such a situation, f (x t , u ∗ (t ))
resulting from the optimal input u ∗ (t ) can be rather different from the actual,
observed, value of x t +1 at time t + 1. Furthermore, such a discrepancy between
computed values of the next state and the observed ones will generally only
increase for subsequent time instants t + 2, . . . . This is the reason that in MPC
only the first value of the optimal control sequence is used, and that the initial
value of the optimal control problem (5.32) is rather taken to be the observed
value of x t +1 , instead of its computed (or predicted) value f (x t , u ∗ (t )). Thus, the
uncertain system model (5.30) is only used as the “available knowledge” for pre-
dicting the future behavior of the system, in order to compute a current control
value that is optimal with respect to this available knowledge and a cost crite-
rion with time horizon length N .
The computation of the optimal control problems (5.31), (5.32), etc., can be
done with the same optimal control techniques (but in discrete time; see § 3.3)
as treated in the present book. However, often it is done in a more basic fash-
ion. For example, the optimal control problem (5.31) can be written as a static
optimization problem over the vectors u (t ), . . . , u (t + N − 1), taking into account


the system model (5.30). (In fact, this is not so much different from what was
done in § 3.3.) This means that in case there are equality and/or inequality con-
straints on the input and state vectors, these can be handled using the standard
powerful techniques from static optimization theory.
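A minimal receding-horizon loop along these lines is sketched below in Python. It is only an illustration: the model, running cost, horizon, and the use of SciPy's general-purpose minimizer (standing in for "standard techniques from static optimization") are all assumptions made here. At each step the stacked input sequence is optimized, only the first input is applied, and the optimization is repeated from the next state.

```python
import numpy as np
from scipy.optimize import minimize

# Receding-horizon sketch for a discrete-time double integrator (illustrative data):
# the horizon cost (5.31) is minimized as a static optimization problem over the
# stacked input sequence, and only the first optimal input is applied.
f = lambda x, u: np.array([x[0] + 0.1*x[1], x[1] + 0.1*u])   # model (5.30)
L = lambda x, u: x @ x + 0.1*u**2                            # running cost
N = 15                                                       # prediction horizon

def horizon_cost(useq, x0):
    x, cost = np.array(x0), 0.0
    for u in useq:                       # simulate the model over the horizon
        cost += L(x, u)
        x = f(x, u)
    return cost

x = np.array([2.0, 0.0])
for t in range(60):                      # closed-loop simulation
    useq = minimize(horizon_cost, np.zeros(N), args=(x,)).x
    x = f(x, useq[0])                    # apply only the first input; in practice the
                                         # next state would be the *observed* one
print(np.round(x, 3))                    # the state is driven toward the origin
```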
Contrary to infinite horizon optimal control (§ 4.5), MPC does not give a pri-
ori guarantees for (asymptotic) stability. In fact, quite some research has been
devoted to enforcing stability in MPC; either by adding suitable terminal cost
functions to the optimal control problems (5.31), (5.32), etc., or by supplement-
ing these optimal control problems with terminal constraints; see the references
below.
Finally, note that MPC can be also extended to continuous time systems
ẋ (t ) = f ( x (t ), u (t )), by considering for any state x t at time t ∈ Z the optimal
control problem of minimizing
J_{[t, t+T]}(x_t, u) = ∫_t^{t+T} L( x(τ), u(τ) ) dτ

for some finite horizon length T , with T a positive integer. Solving this optimal
control problem yields an optimal control function u ∗ : [t , t +T ] → U, which can
be implemented during the restricted time interval [t , t + 1). The observed state
x t +1 at the next discrete time instant t + 1 then serves as initial condition for
a shifted optimal control problem on [t + 1, t + 1 + T ], and so on, just as in the
discrete-time case. (Obviously, all this can be extended to arbitrary time instants
t ∈ R, arbitrary horizon T ∈ (0, ∞), and an arbitrary implementation time inter-
val [t , t + δ], T ≥ δ > 0.)

References and further reading

• L. Grüne and J. Pannek. Nonlinear Model Predictive Control: Theory and


Algorithms. Springer, London, 2011.

• J.B. Rawlings and D.Q. Mayne. Model Predictive Control: Theory and
Design. Nob Hill Publishing, Madison, 2009.
Appendix A

Background Material

This appendix contains summaries of a number of topics that play a role in opti-
mal control. Each section covers one topic, and most of them can be read inde-
pendently from the other sections. The topics are standard and are covered in
one form or another in calculus courses, a course on differential equations, or a
first course on systems theory. Nonlinear differential equations and stability of
their equilibria are discussed in Appendix B.

A.1 Positive Definite Functions and Matrices

Let Ω ⊂ Rn , and suppose Ω is a neighborhood of some x̄ ∈ Rn . A continuously


differentiable function V : Ω → R is said to be positive definite relative to x̄ if

V (x) > 0 ∀x ∈ Ω\{x̄} and V (x̄) = 0.

It is positive semi-definite—also known as nonnegative definite—if

V (x) ≥ 0 ∀x ∈ Ω and V (x̄) = 0.

A symmetric matrix P ∈ Rn×n is said to be positive definite if V (x) := x T P x is


a positive definite function relative to x̄ = 0 ∈ Rn . In this case the neighborhood
Ω is irrelevant and we may as well take Ω = Rn . So a symmetric P ∈ Rn×n is
positive definite if

x^T P x > 0   ∀x ∈ R^n, x ≠ 0.

It is positive semi-definite (or nonnegative definite) if

x T P x ≥ 0 ∀x ∈ Rn .

The notation V > 0 and P > 0 means that the function/matrix is positive
definite. Interestingly, real symmetric matrices have real eigenvalues only, and
there exist simple tests for positive definiteness:

Lemma A.1.1 (Tests for positive definiteness). Suppose P is an n × n real sym-


metric matrix. The following six statements are equivalent.

1. P > 0.

2. All leading principal minors are positive: det(P 1:k,1:k ) > 0 for all k ∈
{1, 2, . . . , n}. Here P 1:k,1:k is the k × k sub-matrix of P composed of the
first k rows and first k columns of P .

3. All eigenvalues of P are real and larger than zero.

4. There is a nonsingular matrix X such that P = X T X .

5. Cholesky factorization: P = X T X for some (unique) upper-triangular


matrix X with positive entries on the diagonal.

6. For whatever partition of P,

   P = [ P_11     P_12 ]
       [ P_12^T   P_22 ]

   with P_11 square (hence P_22 square), we have

   P_11 > 0   and   P_22 − P_12^T P_11^{-1} P_12 > 0.

   (That is, both P_11 and its so-called Schur complement P_22 − P_12^T P_11^{-1} P_12 are positive definite.)


For positive semi-definite matrices similar tests exist, except for the princi-
pal minor test which is now more involved:

Lemma A.1.2 (Tests for positive semi-definiteness). Let P = P T ∈ Rn×n . The fol-
lowing five statements are equivalent.

1. P ≥ 0.

2. All principal minors (not just the leading ones) are nonnegative. That is,
det(P I,I ) ≥ 0 for every subset I of {1, . . . , n}. Here P I,I is the square matrix
composed from all rows i ∈ I and columns j ∈ I.

3. All eigenvalues of P are real and nonnegative.

4. There is a matrix X such that P = X T X .

5. Cholesky factorization: P = X T X for a unique upper-triangular matrix X


with nonnegative diagonal entries.
Moreover, if for some partition

P = [ P_11     P_12 ]
    [ P_12^T   P_22 ]

the matrix P_11 is square and invertible, then P ≥ 0 iff P_11 > 0 and P_22 − P_12^T P_11^{-1} P_12 ≥ 0.
Example A.1.3. P = [ 0  0 ; 0  −1 ] is not positive semi-definite because the principal minor det(P_{2,2}) = −1 is not nonnegative.
P = [ 0  0 ; 0  1 ] is positive semi-definite because all three principal minors, det(0), det(1), det(P), are nonnegative.
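The equivalent tests in Lemmas A.1.1 and A.1.2 are easy to try out numerically. A short Python sketch (the test matrix is an arbitrary illustrative choice):

```python
import numpy as np

P = np.array([[4., 1.], [1., 3.]])                 # a symmetric test matrix

print(np.all(np.linalg.eigvalsh(P) > 0))           # test 3: all eigenvalues positive
print(all(np.linalg.det(P[:k, :k]) > 0 for k in range(1, 3)))  # test 2: leading principal minors
L = np.linalg.cholesky(P)                          # test 5: NumPy returns a lower-triangular L with
print(np.allclose(L @ L.T, P))                     #         P = L L^T (so X = L^T in the lemma); it raises if P is not > 0
```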

A.2 A Notation for Partial Derivatives

We introduce a notation for partial derivatives of functions f : Rn → Rk .


First the case k = 1, so f : R^n → R. Then ∂f(x)/∂x is a vector of partial derivatives of the same dimension as x. For the standard choice of column vectors x (with n entries) this means the column vector

∂f(x)/∂x := [ ∂f(x)/∂x_1 ;  ∂f(x)/∂x_2 ;  … ;  ∂f(x)/∂x_n ] ∈ R^n.

With the same logic we get a row vector if we differentiate with respect to a row vector,

∂f(x)/∂x^T := [ ∂f(x)/∂x_1   ∂f(x)/∂x_2   · · ·   ∂f(x)/∂x_n ] ∈ R^{1×n}.

Now the case k ≥ 1. If f(x) ∈ R^k is itself vectorial (column) then we similarly define

∂f(x)/∂x^T := the k × n matrix whose (i, j) entry is ∂f_i(x)/∂x_j,   ∂f(x)/∂x^T ∈ R^{k×n},

and

∂f^T(x)/∂x := the n × k matrix whose (i, j) entry is ∂f_j(x)/∂x_i,   ∂f^T(x)/∂x ∈ R^{n×k}.
The first is the Jacobian, the second is its transpose. Convenient about this nota-
tion is that the n × n Hessian of a function f : Rn → R can now compactly be
denoted as

∂²f(x)/∂x∂x^T := (∂/∂x)( ∂f(x)/∂x^T ),   the n × n matrix whose (i, j) entry is ∂²f(x)/∂x_i∂x_j.

Indeed, we first differentiate with respect to the row x^T, and subsequently, differentiate the outcome (a row) with respect to the column x, resulting in an n × n matrix of second-order partial derivatives. If f(x) is twice continuously differentiable then the order in which we determine the second-order derivatives does not matter, so then

∂²f(x)/∂x∂x^T = ∂²f(x)/∂x^T∂x.
Hence the Hessian is symmetric.
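These conventions can be mirrored with symbolic software. A short SymPy sketch (the function f is an arbitrary illustrative choice):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + sp.sin(x2)                   # an arbitrary scalar function of x = (x1, x2)

grad_row = sp.Matrix([f]).jacobian([x1, x2])  # ∂f/∂x^T : 1 x n row vector (the Jacobian)
grad_col = grad_row.T                         # ∂f/∂x   : n x 1 column vector
hess = sp.hessian(f, (x1, x2))                # ∂²f/∂x∂x^T : symmetric n x n Hessian
print(grad_row)
print(hess)
```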

A.3 Separation of Variables

Let x : R → R, and consider the differential equation

ẋ(t) = g(t) / h(x(t))

for some given continuous functions g, h : R → R with h(x(t)) ≠ 0. Let G, H denote anti-derivatives of g, h. The differential equation is equivalent to

h(x(t)) ẋ(t) = g(t),
and we see that the left-hand side is the derivative of H ( x (t )) with respect to t ,
and the right-hand side is the derivative of G(t ) with respect to t . So it must be
that

H ( x (t )) = G(t ) + c 0

for some integration constant c 0 . That is

x (t ) = H −1 (G(t ) + c0 ).
This derivation assumes that H is invertible. The value of c 0 is typically used to
match an initial condition x (t 0 ).

Example A.3.1. We solve the differential equation

ẋ (t ) = − x 2 (t ), x (0) = x0
of Example B.1.5 using separation of variables. We split the solution in two
columns; the first column is the example, the second column makes a connec-
tion with the general procedure:

ẋ(t) = −x²(t)                          h(x(t)) = 1/x(t)²,  g(t) = −1
ẋ(t)/x²(t) = −1                        h(x(t)) ẋ(t) = g(t)
−1/x(t) = −t + c_0                     H(x(t)) = G(t) + c_0
x(t) = 1/(t − c_0)                     x(t) = H^{-1}(G(t) + c_0)

In this example the inverse exists as long as t ≠ c_0. Now x_0 = x(0) = −1/c_0 so c_0 can be expressed in terms of x_0 as c_0 = −1/x_0 and the above solution then becomes

x(t) = 1/(t + 1/x_0) = x_0/(x_0 t + 1).    (A.1)

The solution x(t) escapes at t = −1/x_0. (For the escape time we refer to Example B.1.5.)

Example A.3.2. Suppose that

ẋ (t ) = a x (t ), x (0) = x0 ,
and that x(t) > 0. Then we may divide by x(t) to obtain

ẋ(t)/x(t) = a.
Integrating both sides and using that x (t ) > 0, we find that

ln( x (t )) = at + c 0 .
The logarithm is invertible, yielding

x(t) = e^{at + c_0} = x_0 e^{at}.

For x(t) < 0 the same solution x_0 e^{at} results (verify this yourself), and if x(t) = 0 for some time t then x(t) = 0 for all time, which is also of the form x(t) = x_0 e^{at} (since x_0 = 0). In summary, for every x_0 ∈ R the solution is x(t) = x_0 e^{at}.

A.4 Linear Constant-Coefficient DE’s

On the basis of a few examples we briefly refresh the method of characteristic


equations for solving linear differential equations (DE’s) with constant coeffi-
cients. Several exercises and examples in this book assume familiarity with this
method.
To determine all solutions y : R → R of the homogeneous DE

y⃛(t) − ÿ(t) − 5ẏ(t) − 3y(t) = 0    (A.2)

we first determine its characteristic equation

λ³ − λ² − 5λ − 3 = 0.

The function λ³ − λ² − 5λ − 3 is known as the characteristic polynomial of the DE, and in this case it happens to equal (λ + 1)²(λ − 3). Thus the characteristic roots of this equation (over the complex numbers) are

λ = −1 (of multiplicity two), and λ = +3.

To each characteristic root, λ, there corresponds an exponential solution, y(t) = e^{λt}, of the differential equation, and the general solution y of (A.2) follows as

y(t) = (c_1 + c_2 t) e^{−t} + c_3 e^{+3t}

with c 1 , c 2 , c 3 arbitrary constants. The number of degrees of freedom per expo-


nential function equals the multiplicity of the corresponding characteristic root.
For inhomogeneous equations with an exponential term on the right, say,

y⃛(t) − ÿ(t) − 5ẏ(t) − 3y(t) = 2u̇(t) + 3u(t),   u(t) = e^{st},    (A.3)

one can find a particular solution, y_part(t), of the same exponential form, y_part(t) = A e^{st}. The constant A follows easily by equating left and right-hand sides of (A.3). For this example it gives

y_part(t) = (2s + 3)/(s³ − s² − 5s − 3) · e^{st}.
Then the general solution is obtained by adding the general solution of the homogeneous equation to the particular solution,

y(t) = (2s + 3)/(s³ − s² − 5s − 3) · e^{st} + (c_1 + c_2 t) e^{−t} + c_3 e^{+3t}.

If s equals a characteristic root (s = −1 or s = +3 in our example) then the above particular solution is invalid due to a division by zero. Then a particular solution exists of the form

y_part(t) = A t^k e^{st}

for some constant A and large enough integer k.


If the function u(t) in (A.3) is polynomial in t, then a polynomial particular solution y_part(t) = A_k t^k + · · · + A_1 t + A_0 of sufficiently high degree k exists.
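The characteristic roots and the particular-solution coefficient are easily computed numerically; a small NumPy sketch (the value s = 1 is an arbitrary choice that is not a characteristic root):

```python
import numpy as np

print(np.roots([1, -1, -5, -3]))    # roots of λ^3 - λ^2 - 5λ - 3: approximately 3, -1, -1

s = 1.0                              # u(t) = e^{st} with s not a characteristic root
A = (2*s + 3) / (s**3 - s**2 - 5*s - 3)
print(A)                             # coefficient of the particular solution y_part(t) = A e^{st}
```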

A.5 Systems of Linear Time-Invariant DE’s

Let A ∈ Rn×n and B ∈ Rn×m . For every x 0 ∈ Rn and piecewise continuous u : R →


Rm the solution x : R → Rn of the system of DE’s

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0

follows uniquely as

x(t) = e^{At} x_0 + ∫_0^t e^{A(t−τ)} B u(τ) dτ,   t ∈ R.    (A.4)

Piecewise continuity of u is for technical reasons only. Here e^A is the matrix exponential. It exists for square matrices A and can, for instance, be defined in analogy with the Taylor series expansion of e^a as

e^A = Σ_{k=0}^∞ (1/k!) A^k = I + A + (1/2!) A² + (1/3!) A³ + · · · .

This series is convergent for every square matrix A. Some characteristic proper-
ties of the matrix exponential are:

Lemma A.5.1 (Matrix exponential properties). Let A, P ∈ Rn×n . Then

1. e0 = I for the zero matrix 0 ∈ Rn×n .

2. e A is invertible, and (e A )−1 = e−A .

3. If A = P ΛP −1 for some matrix Λ, then e A = P eΛ P −1 .


4. Let t ∈ R. Then (d/dt) e^{At} = A e^{At} = e^{At} A.


For the zero signal u (t ) = 0, the above equation (A.4) says that the general
solution of

ẋ (t ) = A x (t ), x (0) = x0 ∈ Rn

is

x (t ) = e At x0 .

For diagonal matrices Λ = diag(λ_1, λ_2, . . . , λ_n), the matrix exponential is simply the diagonal matrix of scalar exponentials,

e^{Λt} = diag( e^{λ_1 t}, . . . , e^{λ_n t} ).    (A.5)

If A is diagonalizable—meaning Cn has a basis {v 1 , . . . , v n } of eigenvectors of


 
A—then the matrix of eigenvectors, P := v 1 v 2 · · · v n , is invertible and

AP = P Λ,

with Λ the diagonal matrix of eigenvalues of A. Thus A = P ΛP −1 which yields

e At = P eΛt P −1

with eΛt as in (A.5). This shows that for diagonalizable matrices A, every entry
of e At is a linear combination of eλi t , i = 1, . . . , n. However, not every matrix is
diagonalizable. Using Jordan forms it can be shown that:

Lemma A.5.2. Let A ∈ Rn×n and denote its eigenvalues as λ1 , λ2 , . . . , λn (possibly


some coinciding). Then every entry of e At is a finite linear combination of t k eλi t
with k ∈ N and i = 1, 2, . . . , n. Moreover, the following statements are equivalent.

1. Every entry of e At converges to zero as t → ∞.

2. Every entry of e At converges to zero exponentially fast as t → ∞ (meaning


for every entry w(t ) of e At there is a δ > 0 such that limt →∞ w(t ) eδt = 0).

3. All eigenvalues of A have negative real part: Re(λi ) < 0 ∀i = 1, . . . , n.

In that case we say that A is an asymptotically stable matrix. 
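A quick numerical check of the diagonalization formula and of Lemma A.5.2 can be done with SciPy's matrix exponential (the matrix A below is an illustrative choice with eigenvalues −1 and −2):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.], [-2., -3.]])    # eigenvalues -1 and -2: asymptotically stable
t = 1.5

lam, P = np.linalg.eig(A)
print(np.allclose(expm(A*t), P @ np.diag(np.exp(lam*t)) @ np.linalg.inv(P)))  # e^{At} = P e^{Λt} P^{-1}
print(expm(A*20.0))                     # entries essentially zero: e^{At} -> 0 as t -> infinity
```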


A.6 Stabilizability and Detectability

Consider the following system of differential equations,

ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 , t > 0. (A.6)

Here A ∈ Rn×n and B ∈ Rn×m . The function u : [0, ∞) → Rm is often called the
(control) input, and the interpretation is that this u is for us to choose, and that
the state x : [0, ∞) → Rn follows. A natural question is how well the state can be
controlled by choice of u :

Definition A.6.1 (Controllability). A system ẋ (t ) = A x (t ) + B u (t ) is controllable


if for every pair of states x 0 , x 1 ∈ Rn , there is a time T > 0 and an input u : [0, T ] →
Rm such that the solution x with x (0) = x 0 satisfies x (T ) = x 1 . 

We then say “(A, B ) is controllable”. Controllability can be tested in many


ways. The following four conditions are equivalent:

1. (A, B ) is controllable;
 
2. [ B   AB   · · ·   A^{n−1}B ] ∈ R^{n×(mn)} has rank n;

3. [ A − sI   B ] has rank n for every s ∈ C;

4. for every set {λ1 , λ2 , . . . , λn } of n points in the complex plane, symmetric


with respect to the real axis, there exists a matrix F ∈ Rm×n such that the
eigenvalues of A − B F are equal to {λ1 , λ2 , . . . , λn }.

A weaker form is stabilizability:

Definition A.6.2 (Stabilizability). A system ẋ (t ) = A x (t ) + B u (t ) is stabilizable if


for every x (0) ∈ Rn there is a u : [0, ∞) → Rm such that limt →∞ x (t ) = 0. 

The following three conditions are equivalent:

1. (A, B ) is stabilizable;
 
2. [ A − sI   B ] has rank n for every s ∈ C with Re(s) ≥ 0;

3. there is an F ∈ Rm×n such that A − B F is asymptotically stable.

The final condition is interesting because it implies that u (t ) := −F x (t ) is a sta-


bilizing input for (A.6), irrespective of the initial condition x 0 .
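These rank tests and the feedback construction are easy to try out numerically. A Python sketch (the double-integrator data and the chosen closed-loop poles are illustrative assumptions):

```python
import numpy as np
from scipy.signal import place_poles

A = np.array([[0., 1.], [0., 0.]])          # double integrator
B = np.array([[0.], [1.]])

ctrb = np.hstack([B, A @ B])                # [B  AB] for n = 2
print(np.linalg.matrix_rank(ctrb) == 2)     # controllable, hence in particular stabilizable

F = place_poles(A, B, [-1.0, -2.0]).gain_matrix
print(np.linalg.eigvals(A - B @ F))         # eigenvalues of A - BF placed at -1 and -2
```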
Now consider a system with an output y :

ẋ (t ) = A x (t ), x (0) = x0 , t > 0,
(A.7)
y (t ) = C x (t ).

Here A is the same as in (A.6), and C is a (constant) k × n matrix. The function


y : [0, ∞) → Rk is called the output, and in applications y often is the part of the
state that can be measured. It is a natural question to ask how much informa-
tion the output provides about the state. For example, if we know the output,
can we reconstruct the state? For linear systems one might define observability
as follows.

Definition A.6.3 (Observability). A system (A.7) is observable if a T > 0 exists


such that x 0 follows uniquely from y : [0, T ] → Rk . 

We then say “(C , A) is observable”. Of course, if x 0 follows uniquely then the


state x (t ) = e At x 0 follows uniquely for all time. There are many ways to test for
observability. The following five conditions are equivalent:

1. (C , A) is observable;
2. the stacked matrix [ C ; CA ; … ; CA^{n−1} ] ∈ R^{(kn)×n} has rank n;

3. [ C ; A − sI ] (C stacked above A − sI) has rank n for every s ∈ C;

4. for every set {λ1 , λ2 , . . . , λn } of n points in the complex plane, symmetric


with respect to the real axis, there is a matrix L ∈ Rn×k such that the eigen-
values of A − LC are equal to {λ1 , λ2 , . . . , λn };

5. the “transposed” system x̃˙ (t ) = A T x̃ (t ) + C T ũ (t ) is controllable.

A weaker form of observability is “detectability”. A possible definition is as fol-


lows (from this definition it is not clear that it is weaker than observability):

Definition A.6.4 (Detectability). A system (A.7) is detectable if limt →∞ y (t ) = 0


iff limt →∞ x (t ) = 0. 

Detectability thus means that a possible instability of ẋ (t ) = A x (t ) can be


detected from y (t ). The following four statements are equivalent:

1. (C , A) is detectable;
 
2. [ C ; A − sI ] has rank n for every s ∈ C with Re(s) ≥ 0;

3. there is an L ∈ Rn×k such that A − LC is asymptotically stable;

4. the “transposed” system x̃˙ (t ) = A T x̃ (t ) + C T ũ (t ) is stabilizable.


FIGURE A.1: Left: two line segments in R². Right: one line segment in R³. See § A.7.

FIGURE A.2: Three subsets of R². Sets X_1, X_2 are convex. The third set, X_3, is not convex because one of its line segments is not contained in X_3. See § A.7.

FIGURE A.3: A function g : R → R is convex (on R) if for every x_0, x_1 ∈ R and every μ ∈ [0, 1] we have g((1−μ)x_0 + μx_1) ≤ (1−μ)g(x_0) + μg(x_1). (In the plot we took μ = 1/4.) See § A.7.
A.7 Convex Sets and Convex Functions

To explain convex functions we first need to know what line segments and con-
vex sets are. Let X be a set. A line segment with endpoints x 0 , x 1 ∈ X is the set

{x 0 + μ(x 1 − x 0 ) | μ ∈ R, 0 ≤ μ ≤ 1}, (A.8)

and the entries of this set are known as the convex combinations of x 0 and x 1 .
For X = R the line segments are the closed intervals [x 0 , x 1 ]. Figure A.1 depicts a
couple of line segments in R2 and R3 . In order to convey the symmetry in x 0 , x 1 ,
line segments are usually denoted as

{(1 − μ)x 0 + μx 1 | 0 ≤ μ ≤ 1}.

This is the same as (A.8). A set X is said to be convex if it contains all its line
segments. That is, if for every x 0 , x 1 ∈ X also (1−μ)x 0 +μx 1 ∈ X for every μ ∈ [0, 1].
Figure A.2 depicts a couple of convex sets in R2 , and also one that is not convex.
Now that convex sets are defined, we can define convex functions. Such
functions are only defined on convex sets. Let X be a convex set. A function
g : X → R is said to be a convex function (on X) if for every x 0 , x 1 ∈ X the graph
of the function with endpoints (x 0 , g (x 0 )) and (x 1 , g (x 1 )) is on or below the line
segment between these two points. More concrete, it is a convex function if for
every x 0 , x 1 ∈ X we have

g ((1 − μ)x 0 + μx 1 ) ≤ (1 − μ)g (x 0 ) + μg (x 1 ) ∀μ ∈ [0, 1].

This is illustrated in Figure A.3. A function g on a convex set is said to be concave


if −g is convex. Convex functions enjoy many fantastic properties. We need the
following three results. The first of these is illustrated in Fig. A.4.

FIGURE A.4: A C¹ function g : R → R is convex iff g(x) ≥ g(x̄) + (∂g(x̄)/∂x)(x − x̄) ∀x̄, x ∈ R. See Lemma A.7.1.

Lemma A.7.1 (First-order inequality for convexity). Let g : R^n → R, and suppose g is C¹. Then g is convex on R^n iff

g(x) ≥ g(x̄) + (∂g(x̄)/∂x^T)(x − x̄)   ∀x̄, x ∈ R^n,    (A.9)

see Fig. A.4.
Proof. If g is convex then g(x̄ + μ(x − x̄)) ≤ g(x̄) + μ(g(x) − g(x̄)) for all μ ∈ [0, 1]. This inequality we can rearrange as

g(x) − g(x̄) ≥ ( g(x̄ + μ(x − x̄)) − g(x̄) ) / μ

assuming μ ∈ (0, 1]. The right-hand side of the above inequality converges to (∂g(x̄)/∂x^T)(x − x̄) as μ ↓ 0. So (A.9) follows.
Conversely, suppose (A.9) holds. Then it also holds for x_μ := (1 − μ)x̄ + μx for arbitrary μ ∈ [0, 1]. That is,

g(x̄) ≥ g(x_μ) + (∂g(x_μ)/∂x^T)(x̄ − x_μ) = g(x_μ) − μ (∂g(x_μ)/∂x^T)(x − x̄),
g(x) ≥ g(x_μ) + (∂g(x_μ)/∂x^T)(x − x_μ) = g(x_μ) + (1 − μ) (∂g(x_μ)/∂x^T)(x − x̄).

Adding the first inequality times 1 − μ to the second times μ cancels the derivative and yields (1 − μ)g(x̄) + μg(x) ≥ g(x_μ) = g((1 − μ)x̄ + μx). ■

Lemma A.7.2 (Second-order inequality for convexity). Let g : R^n → R, and suppose it is C². Then g is convex on R^n iff the n × n Hessian G(x), defined as the matrix whose (i, j) entry is ∂²g(x)/∂x_i∂x_j, is positive semi-definite for every x ∈ R^n.

Proof. The Hessian is symmetric. By Taylor's formula we have that g(x) = g(x̄) + (dg(x̄)/dx^T)(x − x̄) + ½ (x − x̄)^T G(z)(x − x̄) for some convex combination z of x̄, x. Hence G(z) ≥ 0 implies the inequality of (A.9) which, in turn, implies convexity.
Conversely, if G(x̄) is not positive semi-definite for some x̄ ∈ R^n then a w ∈ R^n exists such that C := w^T G(x̄) w < 0. Now g(x̄ + μw) = g(x̄) + (dg(x̄)/dx^T)(μw) + ½ μ²C + o(μ²) < g(x̄) + (dg(x̄)/dx^T)(μw) for some small enough μ > 0. This contradicts convexity.

Lemma A.7.3. Let X be a convex subset of R^n, and suppose f : R^n → R is some C¹ function (not necessarily convex). If x_* minimizes f(x) over all x ∈ X, then

(∂f(x_*)/∂x^T)(x − x_*) ≥ 0   ∀x ∈ X,    (A.10)

see Fig. A.5. If f in addition is a convex function then x_* minimizes f(x) over all x ∈ X iff (A.10) holds.
FIGURE A.5: Let f : R² → R, and X ⊆ R². If X is convex, and x_* minimizes f(x) over all x ∈ X, then (∂f(x_*)/∂x^T)(x − x_*) ≥ 0 ∀x ∈ X. See Lemma A.7.3.

Proof. Suppose (A.10) does not hold, i.e., (∂f(x_*)/∂x^T)(x − x_*) < 0 for some x ∈ X. Then f(x_* + μ(x − x_*)) = f(x_*) + μ (∂f(x_*)/∂x^T)(x − x_*) + o(μ) < f(x_*) for small enough μ ∈ (0, 1]. So x_* does not minimize f(x).
If f is convex then by Lemma A.7.1 we have that f(x) ≥ f(x_*) + (∂f(x_*)/∂x^T)(x − x_*). If (A.10) holds then the latter clearly implies f(x) ≥ f(x_*). Hence x_* is the solution of the minimization problem. ■

A.8 Lagrange Multipliers

We recall in this section the notion of Lagrange multipliers for finite-dimen-


sional optimization problems and its connection with first-order conditions of
constrained minimization.
Let J : Rn → R. Roughly speaking, the first-order condition for unconstrained
minimization,

min_{z ∈ R^n} J(z),

is that no small perturbation z = z ∗ + δ of the candidate minimizer z ∗ decreases


J (z). This idea leads to the classic first-order condition that the gradient vector
∂J (z)/∂z of a differentiable function J must be zero at a minimizer z ∗ . A similar
idea applies to the constrained minimization problem

min_{z ∈ R^n} J(z)   subject to   G(z) = 0,    (A.11)

where G : Rn → Rk . Notice that G(z) has k rows, so G(z) = 0 is a collection of


k constraints. To motivate the general case, consider first the situation of one
constraint, k = 1, and two parameters, n = 2:

Example A.8.1 (Geometric interpretation of the first-order necessary condi-


tion for k = 1 and n = 2). An example is depicted in Fig. A.6. Let J : R2 → R
FIGURE A.6: Let J(z) = z_1² + z_2². Its level sets {z = (z_1, z_2) | J(z) = c} are circles (shown in red). Suppose the blue curve is where G(z) = 0 and the gray region is where G(z) < 0. The z_0 in (a) is not a local minimizer of J(z) subject to G(z) = 0. The z_* in (b) is a local minimizer, and it satisfies the first-order condition that the gradients ∂J(z)/∂z and ∂G(z)/∂z are aligned at z = z_*. See § A.8.
be a smooth function that we want to minimize over all z ∈ R2 subject to the


constraint G(z) = 0, with G : R2 → R some differentiable function. Intuition tells
us that z 0 in Fig. A.6(a) is not a local minimizer because moving up along the
constraint curve brings us to a lower value of J (z). Another way to say this is that
the tangent of the constraint curve at z 0 is not tangent to the level set of J (z)
through z 0 . The first-order condition for (local) minimality is that every pertur-
bation δ ∈ R2 in the “tangent of the constraint”
∂G(z ∗ )
{δ ∈ R2 | δ = 0}
∂z T
is also tangent to the level set through z ∗ , meaning
∂J (z ∗ )
δ = 0.
∂z T
∂J (z) ∂G(z)
The geometric interpretation for n = 2 is that the gradients ∂z and ∂z are
aligned at the minimizer z ∗ , see Fig. A.6(b). 

The findings of the above example generalize as follows: under regularity


assumptions1 one can show that a solution z ∗ of the general constrained mini-
mization problem (A.11) necessarily has the properties

G(z ∗ ) = 0, (A.12a)

and for all δ ∈ R^n

if (∂G(z_*)/∂z^T) δ = 0 then (∂J(z_*)/∂z^T) δ = 0.    (A.12b)
Now the brilliant idea of Lagrange is to associate with the constrained min-
imization problem (A.11) an augmented function defined as

J(z, λ) := J (z) + λT G(z). (A.13)

This is a function of z ∈ Rn as well as λ ∈ Rk , and the components of λ are


known as the Lagrange multipliers. The standard unconstrained first-order con-
ditions for stationarity of the augmented function J(z, λ) are that the gradient
with respect to both z and λ are zero at (z ∗ , λ∗ ):
∂J(z_*)/∂z^T + λ_*^T ∂G(z_*)/∂z^T = 0   and   G(z_*) = 0.    (A.14)
The second half of these equations (the first-order conditions with respect to
λ) are just the constraint equations themselves. The first half of these equa-
tions (the first-order conditions with respect to z) tell us that at a stationary
z_* the gradient vector ∂J(z_*)/∂z^T is a linear combination of the k rows of ∂G(z_*)/∂z^T, see
Fig. A.6(b). The classic result is that the first-order conditions for the uncon-
strained problem in the augmented set of variables (z, λ) are equivalent to the
above first-order conditions for the original constrained problem in z:
1 That J and G are continuously differentiable, and ∂G(z)/∂z^T has full row rank at z = z_*.

Lemma A.8.2 (First-order condition). Let J : Rn → R, G : Rn → Rk , and assume


that they are both continuously differentiable. Then z ∗ ∈ Rn satisfies the first-
order condition (A.12) iff (A.14) holds for some λ∗ ∈ Rk .

Proof. This is an application of the theorem of alternatives:

Given A ∈ Rk×n and μ ∈ Rn there is a λ ∈ Rk such that λT A = μT iff


for every δ ∈ Rn such that Aδ = 0 we have μT δ = 0.

(proof: The only-if part is easy: if μT = λT A and Aδ = 0 then μT δ = λT Aδ = 0.


For the if-part we note that the condition that (Aδ = 0 =⇒ μT δ = 0) implies that
ker A ⊆ ker μT . This is equivalent to Im μ ⊆ Im A T . Since μ ∈ Im μ this implies the
existence of a λ ∈ Rn such that A T λ = μ.)
Apply the theorem of alternatives with A = ∂G(z ∗ )/∂z T and μ = −∂J (z ∗ )/∂z.

Appendix B

Differential Equations and


Lyapunov Functions

This appendix first reviews the existence and uniqueness of solutions x 1 , . . . , x n :


R → R of n coupled differential equations

ẋ_1(t) = f_1( x_1(t), . . . , x_n(t) ),   x_1(0) = x_{01},
   ⋮
ẋ_n(t) = f_n( x_1(t), . . . , x_n(t) ),   x_n(0) = x_{0n},   t ≥ 0,    (B.1)

and later in this appendix its stability properties are analyzed. Here x 01 , . . . , x 0n ∈
R are given initial conditions, and the f i : Rn → R are given functions. The vector
x := [ x_1 ; … ; x_n ] : R → R^n

is called the state, and


x_0 := [ x_{01} ; … ; x_{0n} ] ∈ R^n,

the initial state. Define the vector field f : Rn → Rn with components f 1 , . . . , f n .


Then (B.1) may be written succinctly as

ẋ (t ) = f ( x (t )), x (0) = x0 , t ≥ 0. (B.2)

The solution we normally write as x or x (t ) but sometimes we use x (t ; x 0 ) if we


want to emphasize the dependence on the initial state.

B.1 Existence and Uniqueness of Solutions

There are differential equations (B.2) whose solution x does not follow uniquely
from the initial state:

Example B.1.1 (Non-unique solution). A standard example of a differential


equation with a non-unique solution is

ẋ(t) = √(x(t)),   x(0) = 0,   t ≥ 0.

Clearly the zero function, x(t) = 0 ∀t, is one solution, but it is easy to verify that for every c > 0 the function

x(t) = 0 for 0 ≤ t ≤ c,   x(t) = (t − c)²/4 for t > c,

is a solution as well! Weird. It is as if the state x —like Baron Münchhausen—is


able to lift itself by pulling on its own hair. 

The vector field in this example is f(x) = √x and it has unbounded deriva-
tive around x = 0. We will soon see that if f (x) does not increase “too quickly”
then uniqueness of solutions x of (B.2) is ensured. A measure for the rate of
increase is the Lipschitz constant.

Definition B.1.2 (Lipschitz continuity). Let Ω ⊆ R^n, and let ‖·‖ be some norm on R^n (e.g., the standard Euclidean norm). A function f : Ω → R^n is Lipschitz continuous on Ω if a Lipschitz constant K ≥ 0 exists such that

‖f(x) − f(z)‖ ≤ K ‖x − z‖    (B.3)

for all x, z ∈ Ω. It is Lipschitz continuous at x 0 if it is Lipschitz continuous on


some neighborhood Ω of x 0 , and it is locally Lipschitz continuous if it is Lip-
schitz continuous at every x 0 ∈ Rn . 

Figure B.1 illustrates Lipschitz continuity for f : R → R. For the linear f (x) =
kx, with k ∈ R, the Lipschitz constant is obviously K = |k|, and the solution of
the corresponding differential equation

ẋ (t ) = k x (t ), x (0) = x0

is x(t) = e^{kt} x_0. Given x_0, this solution exists and is unique. The idea is now that for arbitrary Lipschitz continuous f : R^n → R^n the solution of ẋ(t) = f(x(t)), x(0) = x_0 exists and is unique (on some neighborhood of x_0) and that the solution increases at most exponentially fast as a function of t, with the exponent K equal to the Lipschitz constant (on that neighborhood):
FIGURE B.1: A function f : R → R is Lipschitz continuous on some interval Ω = [a, b] ⊂ R if at each z ∈ [a, b] the graph (x, f(x)) is completely contained in a steep-enough "bow tie" through the point (z, f(z)). The slope of the steepest bow tie needed over all z ∈ [a, b] is a possible Lipschitz constant. See Definition B.1.2.

Theorem B.1.3 (Existence and uniqueness of solution). Let x 0 ∈ Rn and f :


Rn → Rn . If f is Lipschitz continuous at x 0 then, for some T > 0, the differential
equation (B.2) has a unique solution x (t ; x 0 ) for all t ∈ [0, T ), and then for every
fixed t ∈ [0, T ) the solution x (t ; x) is continuous at x = x 0 . Specifically, if x (t ; x 0 )
and x (t ; z 0 ) are two solutions which for all t ∈ [0, T ) live in some neighborhood
Ω, and if f on this neighborhood has a Lipschitz constant K , then

 x (t ; x 0 ) − x (t ; z 0 ) ≤ x 0 − z 0  eK t ∀t ∈ [0, T ).

Proof. The proof can be found in many textbooks, e.g., (Khalil, 1996, Thm. 2.2
& Thm. 2.5). ■

If a single Lipschitz constant K ≥ 0 exists such that (B.3) holds for all x, z ∈ Rn
then f is said to satisfy a global Lipschitz condition.
It follows from the above theorem that the solution x (t ) can be uniquely
continued at every t if f is locally Lipschitz continuous. This is such a desir-
able property that one normally assumes that f is locally Lipschitz continuous.
Every continuously differentiable f is locally Lipschitz continuous, so then we
can uniquely continue the solution x (t ) at every t . However, the solution may
escape in finite time:

Theorem B.1.4 (Escape time). Suppose that f : Rn → Rn is locally Lipschitz


continuous. Then for every x (0) = x 0 there is a unique t (x 0 ) > 0 (possibly
t (x 0 ) = ∞) such that the solution x (t ; x 0 ) of (B.2) exists and is unique on the
half-open time interval [0, t (x 0 )) but does not exist for t > t (x 0 ).
Moreover if t(x_0) < ∞ then lim_{t ↑ t(x_0)} ‖x(t; x_0)‖ = ∞.
If f is globally Lipschitz continuous then t (x 0 ) = ∞, i.e., the solution x (t ; x 0 )


then exists and is unique for all t ≥ 0.

Proof. See (Khalil 1996, p. 74–75). ■

This t (x 0 )—whenever finite—is known as the escape time.

Example B.1.5 (Escape time). Consider the scalar differential equation

ẋ (t ) = − x 2 (t ), x (0) = x0 .

The vector field f (x) := −x 2 is locally Lipschitz continuous because it is continu-


ously differentiable. (It is not globally Lipschitz continuous however.) Hence for
every initial condition x 0 there is a unique solution on some non-empty inter-
val [0, t (x 0 )) but t (x 0 ) might be finite. In fact, for this example we can solve the
differential equation explicitly (see Appendix A.3), and the solution is
x(t) = x_0 / (t x_0 + 1).

If x 0 ≥ 0 then x (t ) is well defined for every t > 0, so then t (x 0 ) = ∞. If, however,


x 0 < 0 then the solution escapes at finite time t (x 0 ) = −1/x 0 , see Fig. B.2. 
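The finite escape time also shows up numerically. A short SciPy sketch for x_0 = −1 (so the escape time is t(x_0) = 1); the integration interval [0, 0.9] is chosen to stop just before the escape:

```python
import numpy as np
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, x: -x**2, (0.0, 0.9), [-1.0], rtol=1e-10, atol=1e-12)
exact = -1.0/(1.0 - sol.t)                    # closed-form solution x0/(t*x0 + 1) with x0 = -1
print(np.max(np.abs(sol.y[0] - exact)))       # small: numerical and exact solutions agree
print(exact[-1])                              # already -10 at t = 0.9; |x(t)| -> infinity as t -> 1
```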

We conclude this section with a result about the continuity of solutions that
we need in the proof of the minimum principle (Theorem 2.5.1). Here we take
the standard Euclidean norm:

Lemma B.1.6 (Continuity of solutions). Consider the two differential equations


in x and in z :

ẋ (t ) = f ( x (t )), x (0) = x0 ,
ż (t ) = f ( z (t )) + g (t ), z (0) = z0 .

Let T > 0. If Ω is a set such that x (t ), z (t ) ∈ Ω for all t ∈ [0, T ) and if f on Ω has
Lipschitz constant K , then
‖x(t) − z(t)‖ ≤ e^{Kt} ( ‖x_0 − z_0‖ + ∫_0^t ‖g(τ)‖ dτ )   ∀t ∈ [0, T ).

Proof. Let Δ(t) = x(t) − z(t). Then Δ̇(t) = f(x(t)) − f(z(t)) − g(t). By Cauchy-Schwarz we have |(d/dt)‖Δ(t)‖| ≤ K‖Δ(t)‖ + ‖g(t)‖. From (A.4) it follows that then ‖Δ(t)‖ ≤ e^{Kt} ( ‖Δ(0)‖ + ∫_0^t ‖g(τ)‖ dτ ). ■
FIGURE B.2: Solutions x(t; x_0) of ẋ(t) = −x²(t) for various initial states x(0) = x_0. If x_0 < 0 then the solution escapes at t = −1/x_0. See Example B.1.5.

FIGURE B.3: Illustration of stability for systems with two state components, x = (x_1, x_2). See § B.2.
B.2 Definitions of Stability

Asymptotic stability of ẋ (t ) = f ( x (t )) means, loosely speaking, that solutions


x (t ) “come to rest”, and stability means that solutions x (t ) remain “close to rest”.
In order to formalize this, we first have to define the “points of rest”. These are
the constant solutions, x (t ) = x̄, of the differential equation, so solutions x̄ of
f (x̄) = 0.

Definition B.2.1 (Equilibrium). x̄ ∈ Rn is an equilibrium (point) of (B.2) if


f (x̄) = 0. 

Different possibilities for the behavior of the system near an equilibrium


point are described in the following definition (see also Fig. B.3).

Definition B.2.2 (Stability of equilibria). An equilibrium point x̄ of a differen-


tial equation ẋ (t ) = f ( x (t )), x (0) = x 0 is called
1. stable if ∀ε > 0 ∃δ > 0 such that ‖x_0 − x̄‖ < δ implies ‖x(t; x_0) − x̄‖ < ε ∀t ≥ 0.

2. attractive if ∃δ_1 > 0 such that ‖x_0 − x̄‖ < δ_1 implies that lim_{t→∞} x(t; x_0) = x̄.

3. asymptotically stable if it is stable and attractive.

4. globally attractive if lim_{t→∞} x(t; x_0) = x̄ for every x_0 ∈ R^n.

5. globally asymptotically stable if it is stable and globally attractive.

6. unstable if x̄ is not stable. This means that ∃ε > 0 such that ∀δ > 0 an x_0 and a t_1 ≥ 0 exist for which ‖x_0 − x̄‖ < δ yet ‖x(t_1; x_0) − x̄‖ ≥ ε.


In particular, an equilibrium x̄ is unstable if every neighborhood of it con-


tains an x 0 that has finite escape time, t (x 0 ) < ∞. Surprisingly, perhaps, we
have that attractive equilibria need not be stable, see Exercise B.24. However,
it is easy to see that in the case of linear dynamics, ẋ (t ) = A x (t ), attractivity
implies global attractivity and global asymptotic stability. In fact, in the linear
case, attractivity, asymptotic stability, global attractivity and global asymptotic
stability are all equivalent.
Instead of (in)stability of equilibria, one may also study (in)stability of a spe-
cific trajectory x (t ), t ∈ R, in particular of a periodic orbit. We do not explicitly
deal with this problem.
There are many ways to analyze stability properties of equilibria. Of par-
ticular importance are those methods that do not rely on explicit forms of the
solutions x (t ) since explicit forms are in general hard to find. Two methods,
both attributed to Lyapunov, that do not require explicit knowledge of x (t ) are
linearization, also known as Lyapunov’s first method, and the method of Lya-
punov functions, also known as Lyapunov’s second method. An advantage of the
second method over the first is that the first can be proved elegantly with the
second. This is why Lyapunov’s second method is covered first.
B.3 Lyapunov Functions

Lyapunov’s second method mimics the well-known physical property that a sys-
tem that continually loses energy eventually comes to a halt. Of course, in a
mathematical context one may bypass the notion of physical energy, but it is a
helpful interpretation.
Suppose we have a function V : Rn → R that does not increase along any
solution x of the differential equation, i.e., that

V ( x (t + h)) ≤ V ( x (t )) ∀h > 0, ∀t (B.4)

for every solution of ẋ (t ) = f ( x (t )). If V ( x (t )) is differentiable with respect to


time t then it is non-increasing for all solutions iff its derivative with respect to
time is non-positive everywhere,

V̇ ( x (t )) ≤ 0 ∀t . (B.5)

This condition can be checked for solutions x of ẋ (t ) = f ( x (t )) without explicitly


computing the solutions. Indeed, using the chain rule, we have that
V̇(x(t)) = dV(x(t))/dt = ∂V(x(t))/∂x₁ · ẋ₁(t) + ··· + ∂V(x(t))/∂xₙ · ẋₙ(t)
         = ∂V(x(t))/∂x₁ · f₁(x(t)) + ··· + ∂V(x(t))/∂xₙ · fₙ(x(t)),
and, hence, (B.5) holds for all solutions x (t ) iff
∂V(x)/∂x₁ · f₁(x) + ··· + ∂V(x)/∂xₙ · fₙ(x) ≤ 0   ∀x ∈ Rⁿ.     (B.6)
This final inequality no longer involves time, let alone solutions x (t ) depending
on time. This is a key result. We clean up the notation a bit. The left-hand side
of (B.6) can be seen as the column vector f (x) premultiplied by the gradient of
V (x) seen as a row vector,
 
∂V(x)/∂x^T := [ ∂V(x)/∂x₁   ∂V(x)/∂x₂   ···   ∂V(x)/∂xₙ ].
(Appendix A.2 explains this notation, in particular the role of the transpose.)
Thus (B.6) by definition is the same as ∂V(x)/∂x^T f(x) ≤ 0. With slight abuse of nota-
tion we now use V̇ (x) to mean
V̇(x) = ∂V(x)/∂x^T · f(x),
and we conclude that (B.5) holds for all solutions x of the differential equation
iff V̇ (x) ≤ 0 for all x ∈ Rn . In order to deduce stability from the existence of a
non-increasing function V ( x (t )) we additionally require that V has a minimum
at the equilibrium. Furthermore, we also require a certain degree of differentia-
bility of the function. We formalize these properties in the following definition
and theorem. As in Appendix A.1 we define:

Definition B.3.1 (Positive and negative (semi-)definite). Let Ω ⊆ Rn and


assume it is a neighborhood of some x̄ ∈ Rn . A continuously differentiable
function V : Ω → R is positive definite on Ω relative to x̄ if

V (x̄) = 0 while V (x) > 0 for all x ∈ Ω \ {x̄}.

It is positive semi-definite if V (x̄) = 0 and V (x) ≥ 0 for all other x. And V is nega-
tive (semi-)definite if −V is positive (semi-)definite. 

Positive definite implies that V has a unique minimum on Ω, and that the
minimum is attained at x̄. The assumption that the minimum is zero, V (x̄) = 0,
is a convenient normalization. Figure B.4 shows an example of each of the four
types of “definite” functions.

Figure B.4: Examples of graphs of positive/negative (semi-)definite functions V : R → R. See Definition B.3.1.

The following famous result can now be proved:

Theorem B.3.2 (Lyapunov’s second stability theorem). Consider the differen-


tial equation ẋ (t ) = f ( x (t )) and assume that f : Rn → Rn is locally Lipschitz con-
tinuous. Let x̄ be an equilibrium of this differential equation. If there is a neigh-
borhood Ω of x̄ and a function V : Ω → R such that on Ω

1. V is continuously differentiable,

2. V is positive definite relative to x̄,

3. V̇ is negative semi-definite relative to x̄,

then x̄ is a stable equilibrium and we call V a Lyapunov function.


If in addition V̇ is negative definite on Ω relative to x̄ (so not just negative
semi-definite) then x̄ is asymptotically stable and we call V a strong Lyapunov
function.

Figure B.5: Four inclusions of regions (used in the proof of Theorem B.3.2).

Proof. We denote the open ball with radius r and center x̄ by B (x̄, r ), i.e.,

B(x̄, r) := {x ∈ Rⁿ | ‖x − x̄‖ < r}.

We first consider the stability property. For every ε > 0 we have to find a δ > 0 such that x₀ ∈ B(x̄, δ) implies x(t) ∈ B(x̄, ε) for all t > 0. To this end we construct a series of inclusions, see Fig. B.5.
Because Ω is a neighborhood of x̄, there exists an ε₁ > 0 such that B(x̄, ε₁) ⊂ Ω. Without loss of generality we can take it so small that ε₁ ≤ ε. Because V is continuous on Ω and because the boundary of B(x̄, ε₁) is a compact set, the function V has a minimum on the boundary of B(x̄, ε₁). We call this minimum α, and realize that α > 0. Now define

Ω₁ := {x ∈ B(x̄, ε₁) | V(x) < α}.

This set Ω₁ is open because V is continuous. By definition, Ω₁ is contained in B(x̄, ε₁). Clearly, x̄ is an element of Ω₁ because V(x̄) = 0. Thus, as Ω₁ is open, there exists a δ > 0 such that B(x̄, δ) ⊂ Ω₁. We prove that this δ satisfies the requirements.
If x₀ ∈ B(x̄, δ) we find, because V̇ is negative semi-definite, that V(x(t; x₀)) ≤ V(x₀) < α for all t ≥ 0. This means that it is impossible that x(t; x₀), with initial condition in B(x̄, δ), reaches the boundary of B(x̄, ε₁), because on this boundary we have, by definition, that V(x) ≥ α. Hence, ‖x(t; x₀) − x̄‖ < ε₁ ≤ ε for all time and the system, therefore, is stable.
Next we prove that the stronger inequality V̇(x) < 0 ∀x ∈ Ω\{x̄} assures asymptotic stability. Specifically we prove that for every x₀ ∈ B(x̄, δ) the solution x(t; x₀) → x̄ as t → ∞. First note that, because of stability, the state trajectory x(t; x₀) remains within the bounded set B(x̄, ε₁) for all time. Now, to obtain a contradiction, assume that x(t; x₀) does not converge to x̄. This implies that there is a μ > 0 and time instants t_k, with lim_{k→∞} t_k = ∞, such that

‖x(t_k; x₀) − x̄‖ > μ > 0   ∀k ∈ N.



As x (t k ; x 0 ) is a bounded sequence, the theorem of Bolzano-Weierstrass guaran-


tees the existence of a subsequence x (t k j ; x 0 ) that converges to some element
x∞. Clearly x∞ ≠ x̄. Since V(x(t)) is non-increasing we have for every t > 0 that

V(x(t_{k_j}; x₀)) ≥ V(x(t_{k_j} + t; x₀)) ≥ V(x(t_{k_j+m}; x₀)),

where m is chosen such that t_{k_j} + t < t_{k_j+m}. The term in the middle equals V(x(t; x(t_{k_j}; x₀))). So we also have

V(x(t_{k_j}; x₀)) ≥ V(x(t; x(t_{k_j}; x₀))) ≥ V(x(t_{k_j+m}; x₀)).

In the limit j → ∞ the above inequality becomes

V (x ∞ ) ≥ V ( x (t ; x ∞ )) ≥ V (x ∞ ).

(Let us be precise here: since the differential equation is locally Lipschitz con-
tinuous we have, by Theorem B.1.3, that x (t ; x) depends continuously on x. For
that reason we are allowed to say that lim j →∞ x (t ; x (t k j ; x 0 )) = x (t ; x ∞ ).) The
above shows that V ( x (t ; x ∞ )) = V (x ∞ ) for all t > 0. In particular we see that
V ( x (t ; x ∞ )) is constant. But that would mean that V̇ (x ∞ ) = 0, and this violates
the fact that V̇ is negative definite and x∞ ≠ x̄. Therefore the assumption that
x (t ) does not converge to x̄ is wrong. The system is asymptotically stable. ■

By definition, a Lyapunov function V ( x (t )) never increases over time (on Ω),


while a strong Lyapunov function V ( x (t )) always decreases on Ω unless we are
at the equilibrium x̄.

Figure B.6: Graph of (1 − x²)/(1 + x²). See Example B.3.3.

Example B.3.3 (First-order system). The scalar system

ẋ(t) = (1 − x²(t)) / (1 + x²(t))

has two equilibria, x̄ = ±1, see Fig. B.6. For equilibrium x̄ = 1 we propose the
candidate Lyapunov function

V (x) = (x − 1)2 .

It is positive definite relative to x̄ = 1 and it is continuously differentiable. On


Ω = (−1, ∞) it is a Lyapunov function because then also the third condition of
Theorem B.3.2 holds:
V̇(x) = ∂V(x)/∂x · f(x) = 2(x − 1) · (1 − x²)/(1 + x²) = −2 (1 − x)²(1 + x)/(1 + x²) ≤ 0   ∀x ∈ (−1, ∞).
Actually V̇ (x) < 0 for all x ∈ (−1, ∞) \ {1}, so it is in fact a strong Lyapunov func-
tion for equilibrium x̄ = 1 and, hence, the equilibrium is asymptotically stable.
The other equilibrium, x̄ = −1, is unstable. 
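A quick numerical check of this example is possible with standard tools. The sketch below is our own illustration (it assumes NumPy and SciPy are available, which the text itself does not require): it integrates ẋ = (1 − x²)/(1 + x²) from a few initial states in Ω = (−1, ∞) and confirms that V(x(t)) = (x(t) − 1)² does not increase along the computed solutions.

    # Numerical check of Example B.3.3 (assumes NumPy and SciPy).
    import numpy as np
    from scipy.integrate import solve_ivp

    f = lambda t, x: (1 - x**2) / (1 + x**2)   # right-hand side of the ODE
    V = lambda x: (x - 1)**2                   # candidate Lyapunov function

    for x0 in [-0.9, 0.0, 3.0]:                # initial states in (-1, infinity)
        sol = solve_ivp(f, (0.0, 20.0), [x0], max_step=0.01)
        v = V(sol.y[0])
        # the largest increase of V should be zero up to round-off,
        # and x(t) should approach the equilibrium 1
        print(x0, np.max(np.diff(v)), sol.y[0, -1])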

Figure B.7: Left: pendulum. Right: level sets of its mechanical energy V(x). See Example B.3.4.

Example B.3.4 (Undamped pendulum). The standard equation of motion of a


pendulum without damping is

ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)).

Here x₁ is the angular displacement, x₂ is the angular velocity, g is the gravitational acceleration, and ℓ is the length of the pendulum, see Fig. B.7(left). The mechanical energy of the pendulum with mass m is

V(x) = ½ mℓ² x₂² + mgℓ (1 − cos(x₁)).
This energy is zero at (x 1 , x 2 ) = (2kπ, 0), k ∈ Z and it is positive elsewhere. To turn
this into a Lyapunov function for the hanging position x̄ = (0, 0) we simply take,
say,

Ω = {x ∈ R2 | −2π < x 1 < 2π}.

This way V on Ω has a unique minimum at equilibrium x̄ = (0, 0). Hence V


is positive definite relative to this x̄ for this Ω. Clearly, V is also continuously

differentiable, and V̇ (x) equals


V̇(x) = ∂V(x)/∂x^T · f(x) = ∂V(x)/∂x₁ · f₁(x) + ∂V(x)/∂x₂ · f₂(x)
      = mgℓ sin(x₁) x₂ − mℓ² x₂ (g/ℓ) sin(x₁) = 0.

Apparently the mechanical energy is constant over time. Therefore, using Theo-
rem B.3.2, we may draw the conclusion that the system is stable, but not neces-
sarily asymptotically stable. The fact that V ( x (t )) is constant actually implies it
is not asymptotically stable. Indeed if we start at a nonzero state x 0 ∈ Ω—so with
V (x 0 ) > 0—then V ( x (t )) = V (x 0 ) for all time, and x (t ) thus does not converge to
(0, 0) as t → ∞. Figure B.7(right) depicts level sets {(x 1 , x 2 )|V (x 1 , x 2 ) = c} of the
mechanical energy in the phase plane for several levels c > 0. Solutions x (t ; x 0 )
remain within its level set {x|V (x) = V (x 0 )} . 
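The constancy of V along solutions can also be observed numerically. The following sketch is an added illustration (it assumes SciPy and uses g = 9.81, ℓ = 1, m = 1 purely as example values): it integrates the undamped pendulum and reports how little the mechanical energy drifts; the small drift that remains is due to the numerical integrator, not the model.

    # Energy conservation check for the undamped pendulum of Example B.3.4.
    import numpy as np
    from scipy.integrate import solve_ivp

    g, ell, m = 9.81, 1.0, 1.0                      # illustrative parameter values

    def pend(t, x):
        return [x[1], -(g / ell) * np.sin(x[0])]

    def V(x1, x2):
        return 0.5 * m * ell**2 * x2**2 + m * g * ell * (1 - np.cos(x1))

    sol = solve_ivp(pend, (0.0, 10.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
    energy = V(sol.y[0], sol.y[1])
    print("relative energy drift:", np.max(np.abs(energy - energy[0])) / energy[0])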

For strong Lyapunov functions, Theorem B.3.2 states that x (t ; x 0 ) → x̄ for ini-
tial states x₀ that are sufficiently close to the equilibrium. At first sight it seems
reasonable to expect that the “bigger” the set Ω the “bigger” the region of attrac-
tion. Alas, as demonstrated in Exercise B.4, having a strong Lyapunov function
on the entire state space Ω = Rn does not imply that x (t ; x 0 ) → x̄ for all initial
conditions x 0 ∈ Rn . The question that thus arises is: what is the region of attrac-
tion of the equilibrium x̄ in case it is asymptotically stable, and under which
conditions is this region of attraction the entire state space Rn ?
The proof of Theorem B.3.2 gives some insight into the region of attrac-
tion. In fact, it follows that the region of attraction of x̄ includes the largest ball
around x̄ that is contained in Ω₁ := {x ∈ B(x̄, ε₁) | V(x) < α}, see Fig. B.5. We use
this observation to formulate an extra condition on V that guarantees global
asymptotic stability.

Theorem B.3.5 (Global asymptotic stability). Suppose all conditions of Theo-


rem B.3.2 are met with Ω = Rn . If V : Rn → R is a strong Lyapunov function with
the additional property that

V(x) → ∞ as ‖x‖ → ∞,     (B.7)

then the system is globally asymptotically stable. (Property (B.7) is known as


radial unboundedness.)

Proof. The proof of Theorem B.3.2 shows that x(t) → x̄ as t → ∞ whenever x₀ ∈ B(x̄, δ), where δ is as indicated in Fig. B.5. It remains to show that every x₀ is in this ball B(x̄, δ), that is, that δ can be chosen arbitrarily large. We will construct the various regions of Fig. B.5 starting with the smallest and step-by-step working towards the biggest.
Take an arbitrary x₀ ∈ Rⁿ and let δ := 2‖x̄ − x₀‖. Then by construction we have that x₀ ∈ B(x̄, δ). Define α = max_{‖x−x̄‖≤δ} V(x). This α is finite and positive. Next let Ω₁ = {x ∈ Rⁿ | V(x) < α}. This set is bounded because V(x) is radially

unbounded. (This is the reason we require radial unboundedness.) As a result,


ε₁ defined as ε₁ = sup_{x∈Ω₁} ‖x − x̄‖ is finite. For every ε > ε₁ the conditions of The-
orem B.3.2 are met, and since x 0 ∈ B (x̄, δ), the proof of Theorem B.3.2 says that
x (t ) → x̄ as t → ∞. This works for every x0 so the system is globally attractive.
Together with asymptotic stability this means it is globally asymptotically stable.

Figure B.8: Phase portrait of the system of Example B.3.6. The origin is
globally asymptotically stable.

Example B.3.6 (Global asymptotic stability). Consider the system

ẋ 1 (t ) = − x 1 (t ) + x 22 (t ),
ẋ 2 (t ) = − x 2 (t ) x 1 (t ) − x 2 (t ).

Clearly the origin (0, 0) is an equilibrium of this system. We choose

V(x) = x₁² + x₂².

This V is radially unbounded and it is a strong Lyapunov function on R2 because


it is positive definite and continuously differentiable and

V̇(x) = 2x₁(−x₁ + x₂²) + 2x₂(−x₂x₁ − x₂) = −2(x₁² + x₂²) < 0   ∀x ≠ 0.

Since V is radially unbounded the equilibrium (0, 0) is globally asymptotically


stable. This also implies that (0, 0) is the only equilibrium. Its phase portrait is
shown in Fig. B.8. 
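As an added numerical illustration (not part of the original example), one can integrate this system from initial states far from the origin and watch V(x(t)) = x₁² + x₂² decay, in agreement with V̇(x) = −2(x₁² + x₂²).

    # Simulation of the system of Example B.3.6 (assumes SciPy).
    import numpy as np
    from scipy.integrate import solve_ivp

    def f(t, x):
        x1, x2 = x
        return [-x1 + x2**2, -x2 * x1 - x2]

    for x0 in [(5.0, -7.0), (-20.0, 30.0)]:
        sol = solve_ivp(f, (0.0, 25.0), x0, max_step=0.01)
        V = sol.y[0]**2 + sol.y[1]**2
        # V should decrease along the trajectory and the state should end near (0, 0)
        print(x0, "V(0) =", V[0], " V(25) =", V[-1], " final state:", sol.y[:, -1])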

Powerful as the theory may be, it does not really tell us how to find a Lya-
punov function, assuming one exists. Systematic design of Lyapunov functions
is hard, but it does work for linear systems, as discussed in § B.5. In physical sys-
tems the construction of Lyapunov functions is often facilitated by the knowl-
edge of existence of conserved quantities, like total energy or total momentum.

B.4 LaSalle’s Invariance Principle

Theorem B.3.2 guarantees asymptotic stability when V̇ (x) < 0 everywhere


except at the equilibrium. However, in many cases of interest the natural Lya-
punov function does not satisfy this condition, while the equilibrium may be
asymptotically stable nonetheless. Examples include physical systems whose
energy decreases almost everywhere but not everywhere. A case in point is the
pendulum with damping:

Example B.4.1 (Damped pendulum). The equations of motion of a pendulum


subject to damping due to a friction force are

ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)) − (d/m) x₂(t),
where x 1 is the angular displacement, and x 2 is the angular velocity. The param-
eter d is a positive friction coefficient. The time-derivative of the mechanical
energy V(x) = ½ mℓ² x₂² + mgℓ (1 − cos(x₁)) is

V̇(x) = mgℓ sin(x₁) x₂ − mℓ² x₂ (g/ℓ) sin(x₁) − dℓ² x₂² = −dℓ² x₂² ≤ 0.

Thus the mechanical energy decreases everywhere except if the angular velocity
x 2 is zero. Using Theorem B.3.2 we may draw the conclusion that the system is
stable, but not that it is asymptotically stable, because V̇ (x) is also zero at points
other than the equilibrium (it is zero at every x = (x 1 , 0)). However from physical
considerations we feel that (0, 0) is asymptotically stable. 

In the above example we would still like to infer asymptotic stability


(since we do know from experience that the hanging position of the pendulum
with friction is asymptotically stable). If we were to use the theory from the pre-
vious section, we would have to find a new Lyapunov function (different from
the mechanical energy), but this is not an easy task. In this section we discuss a
method that allows us to prove asymptotic stability without having to construct
a new Lyapunov function.
From the above pendulum example one might be tempted to conclude that
asymptotic stability follows as long as V̇ (x) < 0 “almost everywhere” in state
space. That is not necessarily the case as the following basic example demon-
strates.

Example B.4.2 (Simple system). Consider

ẋ 1 (t ) = 0,
ẋ 2 (t ) = − x 2 (t ).

Figure B.9: Simple system. All solutions converge to the x₁-axis. See Example B.4.2.

Clearly, x 1 is constant and x 2 converges exponentially fast to zero (see the vec-
tor field of Fig. B.9). Now

V(x) := x₁² + x₂²

is a Lyapunov function for x̄ = (0, 0) because it is positive definite and continu-


ously differentiable, and

V̇(x) = 2x₁ẋ₁ + 2x₂ẋ₂ = −2x₂² ≤ 0.

The set of states x where V̇ (x) = 0 is where x 2 = 0 (i.e., the x 1 -axis) and every-
where else in the plane we have V̇ (x) < 0. In that sense V̇ (x) is negative “almost
everywhere”. The origin is however not asymptotically stable because every
(x̄ 1 , 0) on the x 1 -axis is an equilibrium, so no matter how small we take δ > 0,
there is always an initial state x 0 = (δ/2, 0) less than δ away from (0, 0) for which
the solution x (t ; x 0 ) is constant, and so does not converge to (0, 0). 

We set up a generalized Lyapunov theory that allows us to prove that the hang-
ing position in the damped pendulum example (Example B.4.1) is asymptoti-
cally stable, and that in Example B.4.2 all solutions converge to the x 1 -axis. It
requires a bit of terminology.

Definition B.4.3 (Orbit). The orbit O (x 0 ) with initial condition x 0 is defined as


O (x 0 ) = {y ∈ Rn | y = x (t ; x 0 ) for some t ≥ 0}. 

The orbit of x 0 is just the set of states that x (t ; x 0 ) traces out as t varies over
all t ≥ 0.

Definition B.4.4 (Invariant set). A set G ⊆ Rn is called a (forward) invariant


set for (B.2) if every solution x (t ; x 0 ) of (B.2) with initial condition x 0 in G , is
contained in G for all t > 0. 

Figure B.10: Phase portrait. See Example B.4.6.

So once the state is in an invariant set it never leaves it. Note that every
orbit is an example of an invariant set. In particular every equilibrium point is
an invariant set.

Example B.4.5. The x 1 -axis is an invariant set for the system of Example B.4.2.
In fact every element x = (x 1 , 0) of this axis is an invariant set because they all
are equilibria. The general solution is x (t ) = (x 10 , x 20 e−t ). This shows that for
instance also the x 2 -axis {(0, x 2 )|x 2 ∈ R} is an invariant set. 

The union of two invariant sets is invariant. In fact, the union of an arbitrary
number (finite, infinite, countable, uncountable) of invariant sets is invariant.

Example B.4.6 (Rotation invariant phase portrait). The phase portrait of


Fig. B.10 is that of
 
ẋ₁(t) = x₂(t) + x₁(t) (1 − x₁²(t) − x₂²(t)),
ẋ₂(t) = −x₁(t) + x₂(t) (1 − x₁²(t) − x₂²(t)).     (B.8)
Inspired by the rotation-invariant phase portrait (see Fig. B.10) we analyze first
how the squared radius

r(t) := x₁²(t) + x₂²(t)

changes over time,

ṙ(t) = d/dt (x₁²(t) + x₂²(t))
     = 2x₁(t)ẋ₁(t) + 2x₂(t)ẋ₂(t)
     = 2x₁(t)x₂(t) + 2x₁²(t)(1 − x₁²(t) − x₂²(t)) − 2x₂(t)x₁(t) + 2x₂²(t)(1 − x₁²(t) − x₂²(t))
     = 2(x₁²(t) + x₂²(t))(1 − x₁²(t) − x₂²(t)).

Therefore

ṙ (t ) = 2 r (t )(1 − r (t )). (B.9)



If r (0) = 1 then r (t ) is always equal to one, so the unit circle is an invariant set.
Furthermore, Eqn. (B.9) shows that if 0 ≤ r (0) < 1, then 0 ≤ r (t ) < 1 for all time.
Hence the open unit disc is also invariant. Using similar arguments, we find that
also the complement of the unit disc is invariant. This system has many more
invariant sets. 
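The invariance of the unit circle and the attraction towards it can be probed numerically. The sketch below is our own illustration (assuming SciPy): it integrates (B.8) from initial states inside, on, and outside the unit circle and prints the radius at the end of the simulation.

    # Radius behaviour for system (B.8); see Example B.4.6.
    import numpy as np
    from scipy.integrate import solve_ivp

    def f(t, x):
        x1, x2 = x
        s = 1 - x1**2 - x2**2
        return [x2 + x1 * s, -x1 + x2 * s]

    for x0 in [(0.1, 0.0), (1.0, 0.0), (3.0, 0.0)]:
        sol = solve_ivp(f, (0.0, 20.0), x0, max_step=0.01)
        r_end = np.hypot(sol.y[0, -1], sol.y[1, -1])
        print("x(0) =", x0, "  ||x(20)|| =", round(r_end, 6))   # tends to 1 for x(0) != 0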

In the previous example the state does not always converge to a single ele-
ment, but to a set (e.g., the unit circle in Example B.4.6). We use dist(x, G ) to
denote the (infimal) distance between a point x ∈ Rn and a set G ⊆ Rn , thus

dist(x, G) := inf_{g∈G} ‖x − g‖,

and we say that x converges to a set G if limt →∞ dist( x (t ), G ) = 0. The following


extension of Lyapunov’s stability theorem can now be proved.

Theorem B.4.7 (LaSalle’s invariance principle). Let x̄ be an equilibrium of a


locally Lipschitz continuous differential equation ẋ (t ) = f ( x (t )), and suppose
that V is a Lyapunov function for this system on some neighborhood Ω of x̄.
Then Ω contains a closed and bounded invariant neighborhood K of x̄, and
for every x 0 ∈ K the solution x (t ) as t → ∞ converges to the set

G :={x ∈ K | V̇ ( x (t ; x)) = 0 ∀t ≥ 0}.

This set is invariant and non-empty. In particular, if G = {x̄} then x̄ is an asymp-


totically stable equilibrium.

Proof. The construction of K is very similar to that of Ω1 in the proof of Theo-


rem B.3.2. Since Ω is a neighborhood of x̄ there is, by definition, a small enough
ball B(x̄, ε) completely contained in Ω. Let α = min_{‖x−x̄‖=ε} V(x). This α is larger than zero. Then K := {x ∈ B(x̄, ε) | V(x) ≤ α/2} does the job. Indeed it is bounded,
it is closed, and since V̇ (x) ≤ 0 it is also invariant. And, finally, it is a neighbor-
hood of x̄.
The set G is non-empty (it contains x̄). Let x be an element of G . Then
by invariance of K for every t > 0 the element y := x (t ; x) is in K . Also since
V̇ ( x (s; y)) = V̇ ( x (t + s; x)) = 0 this orbit is in G . Hence G is invariant.
Next let x 0 ∈ K . Since K is invariant, the entire orbit x (t ; x 0 ) is in K for
all time. Now suppose, to obtain a contradiction, that x (t ) does not converge to
G . Then, as x (t ) is bounded, there is a sequence t n of time with limn→∞ t n = ∞
for which x(t_n; x₀) converges to some x∞ ∉ G. This x∞ is in K because K is
closed. We claim that V ( x (t ; x ∞ )) is constant as a function of time. To see this
we need the inequality

V ( x (t n ; x 0 )) ≥ V ( x (t n + t ; x 0 )) ≥ V (x ∞ ) ∀t ≥ 0. (B.10)

(The first inequality holds because V̇ (x) ≤ 0 and the second inequality follows
from V̇ (x) ≤ 0 combined with the fact that t n +t < t n+k for some large enough k,

so that V ( x (t n + t )) ≥ V ( x (t n+k )) ≥ V (x ∞ ).) Taking the limit n → ∞ turns (B.10)


into

V (x ∞ ) ≥ V ( x (t ; x ∞ )) ≥ V (x ∞ ).

Hence V ( x (t ; x ∞ )) is constant for all time, that is, V̇ ( x (t ; x ∞ )) = 0. But then x ∞ ∈


G (by definition of G ) which is a contradiction. Therefore the assumption that
x (t ) does not converge to G is wrong. ■

The proof also provides an explicit description of the set K . But if we only
want to establish asymptotic stability then we can normally avoid this descrip-
tion. Its existence is enough.

Example B.4.8. Consider the system

ẋ₁(t) = x₂³(t),
ẋ₂(t) = −x₁³(t) − x₂(t).     (B.11)
Clearly, the origin (0, 0) is an equilibrium. For this equilibrium we propose the
Lyapunov function

V(x) = x₁⁴ + x₂⁴.

This function is indeed a Lyapunov function (on Ω = R2 ) because it is continu-


ously differentiable, it is positive definite and

V̇(x) = 4x₁³(x₂³) + 4x₂³(−x₁³ − x₂) = −4x₂⁴ ≤ 0.

This implies that the origin is stable, but not necessarily asymptotically stable.
To prove asymptotic stability we use Theorem B.4.7. This theorem says that a
bounded, closed invariant neighborhood K of the equilibrium (0, 0) exists, but
we need not worry about its form. The set of interest is G . It contains those
initial states x ∈ K whose solution x (t ; x) satisfies the system equations (B.11)
and at the same time is such that V̇ ( x (t )) = 0 for all time. For our example the
latter means

x 2 (t ) = 0 ∀t .
Substituting this into the system equations (B.11) gives

ẋ₁(t) = 0,
0 = −x₁³(t) − 0,   ∀t.

Clearly then x 1 (t ) = 0 for all time as well, and so

G = {(0, 0)}.

LaSalle’s invariance principle proves that for every x 0 ∈ K the solution x (t ) con-
verges to (0, 0) as t → ∞, and, hence, that (0, 0) is an asymptotically stable equi-
librium of this system. 
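A simulation supports this conclusion. The sketch below (added here for illustration, assuming SciPy) integrates (B.11) and shows that V(x(t)) = x₁⁴ + x₂⁴ keeps decreasing even though V̇ vanishes on the whole x₁-axis; the convergence to the origin is slow but unmistakable.

    # Simulation of system (B.11); see Example B.4.8.
    import numpy as np
    from scipy.integrate import solve_ivp

    def f(t, x):
        x1, x2 = x
        return [x2**3, -x1**3 - x2]

    sol = solve_ivp(f, (0.0, 200.0), [2.0, -1.0], max_step=0.05)
    V = sol.y[0]**4 + sol.y[1]**4
    print("V at t=0:  ", V[0])
    print("V at t=200:", V[-1])          # much smaller: the state creeps towards (0, 0)
    print("V increased anywhere (beyond solver noise):", bool(np.any(np.diff(V) > 1e-6)))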

Example B.4.9 (Example B.4.1 continued). Consider again the damped pendu-
lum from Example B.4.1,

ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)) − (d/m) x₂(t).     (B.12)

We found that the mechanical energy V(x) := ½ mℓ² x₂² + mgℓ (1 − cos(x₁)) is a


Lyapunov function on some small enough neighborhood Ω of the hanging equi-
librium x̄ = (0, 0), and we also found that

V̇(x) = −dℓ² x₂².

The equality V̇ ( x (t )) = 0 hence holds for all time iff x 2 (t ) = 0 for all time, and
the LaSalle set G therefore is

G = {x ∈ K | x (t ; x) satisfies (B.12) and x 2 (t ; x) = 0 ∀t }.

We comment on K later. Since x 2 (t ) ≡ 0 the equations (B.12) reduce to ẋ 1 (t ) = 0


and 0 = −(g/ℓ) sin(x₁(t)). This implies that x₁(t) is constant and sin(x₁(t)) = 0, so

G = {x ∈ K | x 1 = kπ, k ∈ Z, x 2 = 0}.

This set contains at most two physically different solutions: the hanging downwards solution x = (0, 0) and the standing upwards solution x = (π, 0). To rule out the upwards solution it suffices to take the neighborhood Ω of x̄ = (0, 0) so small that (π, 0) ∉ Ω. For example

Ω = {x ∈ R2 | −π < x 1 < π}.

LaSalle’s invariance principle now guarantees the existence of an invariant,


closed, bounded neighborhood K of x̄ in Ω. Clearly, this K does not contain (π, 0) either, so then

G = {(0, 0)},

and thus we have asymptotic stability of the hanging position.


Although not strictly needed, it may be interesting to know that by conti-
nuity of V we can always take K equal to the set of states x close enough to
x̄ = (0, 0) and whose energy V (x) is less than or equal to some small enough
positive number, such as

K = {x ∈ R² | −π < x₁ < π, V(x) ≤ 0.9 V((π, 0))}.

Since the energy does not increase over time it is immediate that this set is
invariant. It is also closed and bounded, and it is a neighborhood of (0, 0). 
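Numerically the convergence predicted by LaSalle's invariance principle is easy to observe. The following sketch is our own addition (it assumes SciPy and uses arbitrary illustrative parameter values): it integrates the damped pendulum (B.12) from a state inside Ω and prints the final state and the decrease of the mechanical energy.

    # Damped pendulum (B.12): convergence to the hanging position.
    import numpy as np
    from scipy.integrate import solve_ivp

    g, ell, m, d = 9.81, 1.0, 1.0, 0.5           # illustrative parameter values

    def pend(t, x):
        return [x[1], -(g / ell) * np.sin(x[0]) - (d / m) * x[1]]

    def V(x1, x2):
        return 0.5 * m * ell**2 * x2**2 + m * g * ell * (1 - np.cos(x1))

    sol = solve_ivp(pend, (0.0, 40.0), [2.5, 0.0], max_step=0.01)
    print("final state:", sol.y[:, -1])                          # close to (0, 0)
    print("energy: from", V(2.5, 0.0), "to", V(sol.y[0, -1], sol.y[1, -1]))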

Example B.4.10 (Example B.4.2 continued). Consider again the system

ẋ 1 (t ) = 0,
ẋ 2 (t ) = − x 2 (t ),

with equilibrium x̄ = (0, 0) and Lyapunov function V (x) = x 12 + x 22 . In Exam-


ple B.4.2 we found that V (x) is a Lyapunov function and that

V̇(x) = −2x₂² ≤ 0.

Substitution of x 2 (t ) = 0 in the system equations reduces the system equations


to

ẋ 1 (t ) = 0.

Hence x 1 (t ) is constant (besides x 2 (t ) = 0), so

G = {(x 1 , x 2 ) ∈ K | x 2 = 0}.

This is a part of the x 1 -axis. Now LaSalle’s invariance principle says that all states
that start in K converge to the x 1 -axis as t → ∞. For K we can take for instance
K = {x ∈ R2 |V (x) ≤ 1000}. 

B.5 Cost-to-Go Lyapunov Functions

It is in general hard to come up with a Lyapunov function for a given ẋ (t ) =


f ( x (t )) and equilibrium point x̄. An elegant attempt, with interesting interpre-
tations, goes as follows. Suppose we have to pay an amount

L(x) ≥ 0

per unit time, when we are at state x. As time progresses we move as dictated
by the differential equation, and so the cost L( x (t )) typically changes with time.
The cost-to-go V (x 0 ) is now defined as the total payment over the infinite future
if we start at x 0 , that is, it is the integral of L( x (t )) over positive time,
V(x₀) := ∫₀^∞ L(x(τ)) dτ   for x(0) = x₀.     (B.13)

If L(x(t)) decreases quickly enough as we approach the equilibrium x̄ then the


cost-to-go may be well defined (finite), and possibly it is going to be contin-
uously differentiable in x 0 as well. These are technical considerations and they
might be hard to verify. The interesting property of the cost-to-go V ( x (t )) is that
it decays as t increases. In fact

V̇ (x) = −L(x) (B.14)



whenever V (x) as defined in (B.13) is convergent. To see this split the cost-to-
go into an integral over the first h units of time and an integral over the time
beyond h,
V(x(t)) = ∫_t^{t+h} L(x(τ)) dτ + ∫_{t+h}^∞ L(x(τ)) dτ = ∫_t^{t+h} L(x(τ)) dτ + V(x(t + h)).

Therefore
V̇(x(t)) = lim_{h→0} [V(x(t + h)) − V(x(t))] / h = lim_{h→0} [−∫_t^{t+h} L(x(τ)) dτ] / h = −L(x(t))
if L( x (t )) is continuous. An interpretation of (B.14) is that the current cost-to-
go minus the cost-to-go from tomorrow onwards, is what we pay today. The
function L(x) is called the running cost. In physical applications L(x) is often
the dissipated power, and then V (x) is the total dissipated energy.
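For a concrete feel for (B.13) and (B.14), consider the scalar system ẋ(t) = −x(t) with running cost L(x) = x². Then x(t) = x₀ e^{−t}, so V(x₀) = ∫₀^∞ x₀² e^{−2τ} dτ = x₀²/2, and indeed V̇(x) = ∂V(x)/∂x · f(x) = x · (−x) = −x² = −L(x). The snippet below is an added illustration (it assumes SymPy) that reproduces this computation symbolically.

    # Symbolic check that the cost-to-go satisfies V_dot = -L for xdot = -x, L(x) = x^2.
    import sympy as sp

    tau, x0 = sp.symbols('tau x0', real=True)
    x = x0 * sp.exp(-tau)                       # solution of xdot = -x with x(0) = x0
    V = sp.integrate(x**2, (tau, 0, sp.oo))     # cost-to-go; evaluates to x0**2/2
    Vdot = sp.diff(V, x0) * (-x0)               # dV/dx * f(x), evaluated at x = x0
    print(V, Vdot)                              # x0**2/2   and   -x0**2, i.e. -L(x0)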

Example B.5.1 (Damped pendulum). Consider once more the pendulum as


shown in Fig. B.7 (p. 215). Here x₁ is the angular displacement, g is the gravitational acceleration, ℓ is the length of the pendulum, and m its mass. The force balance on the mass is

mℓ ẍ₁(t) = −mg sin(x₁(t)) − dℓ ẋ₁(t).

The final term, dℓ ẋ₁(t), is a friction force, and d is a positive number known as the friction coefficient. The dissipated power due to this friction equals the friction force times the velocity of the mass, ℓ ẋ₁(t). Hence we take this product as our running cost,

L(x₁(t), ẋ₁(t)) = dℓ ẋ₁(t) · ℓ ẋ₁(t) = dℓ² ẋ₁²(t).

Notice that L ≥ 0 because d > 0. The cost-to-go is the total dissipated energy,

V(x₁(0), ẋ₁(0)) := ∫₀^∞ dℓ² ẋ₁²(t) dt
  = ∫₀^∞ dℓ² ẋ₁(t) · (−mg sin(x₁(t)) − mℓ ẍ₁(t)) / (dℓ) dt
  = ∫₀^∞ −mgℓ sin(x₁(t)) ẋ₁(t) − mℓ² ẍ₁(t) ẋ₁(t) dt
  = [ mgℓ cos(x₁(t)) − ½ mℓ² ẋ₁²(t) ]₀^∞
  = mgℓ (1 − cos(x₁(0))) + ½ mℓ² ẋ₁²(0).     (B.15)

In the final equality we used that x 1 (t ) and ẋ 1 (t ) converge to zero as t → ∞. The


first term in (B.15) is the potential energy at t = 0, and the second term is the
kinetic energy at t = 0. Thus the cost-to-go V is the mechanical energy. This is
the Lyapunov function that we used in Example B.3.4 and Example B.4.1. 
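Equation (B.15) can also be checked numerically: integrate the damped pendulum, accumulate the dissipated power dℓ² ẋ₁²(t), and compare the total with the initial mechanical energy. The sketch below is our own illustration (assuming SciPy, with arbitrary parameter values); the tiny mismatch stems from the finite simulation horizon and integration error.

    # Numerical check of (B.15): total dissipated energy equals initial mechanical energy.
    import numpy as np
    from scipy.integrate import solve_ivp

    g, ell, m, d = 9.81, 1.0, 1.0, 0.4           # illustrative parameter values
    x10, v10 = 1.2, 0.3                          # initial angle and angular velocity

    def rhs(t, z):
        x1, x2, diss = z
        return [x2,
                -(g / ell) * np.sin(x1) - (d / m) * x2,
                d * ell**2 * x2**2]              # running cost L, integrated on the fly

    sol = solve_ivp(rhs, (0.0, 200.0), [x10, v10, 0.0], rtol=1e-9, atol=1e-12)
    V0 = m * g * ell * (1 - np.cos(x10)) + 0.5 * m * ell**2 * v10**2
    print("dissipated energy:", sol.y[2, -1], "   initial energy:", V0)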

As mentioned earlier, the only obstacle is that the integral (B.13) has to be
well defined and continuously differentiable at x 0 . If the system dynamics is
linear of the form

ẋ (t ) = A x (t )

then these obstacles can be overcome and we end up with a very useful result.
It is a classic result in systems theory. In this result we take the running cost to
be quadratic in the state,

L(x) = x T Qx,

with Q ∈ Rn×n a symmetric positive definite matrix (see Appendix A.1).

Theorem B.5.2 (Lyapunov equation). Let A ∈ Rn×n and consider

ẋ (t ) = A x (t ), x (0) = x0 . (B.16)

Suppose Q ∈ Rn×n is symmetric positive definite, and let


V(x₀) := ∫₀^∞ x^T(t) Q x(t) dt.     (B.17)

The following four statements are equivalent.

1. x̄ = 0 is a globally asymptotically stable equilibrium of (B.16).

2. x̄ = 0 is an asymptotically stable equilibrium of (B.16).

3. V (x) defined in (B.17) exists for every x ∈ Rn , and it is a strong Lyapunov


function for (B.16) with equilibrium x̄ = 0. In fact V (x) is quadratic, V (x) =
x T P x, with P ∈ Rn×n the well-defined positive definite matrix
P := ∫₀^∞ e^{A^T t} Q e^{At} dt.     (B.18)

4. The linear matrix equation (known as the Lyapunov equation)

A T P + P A = −Q (B.19)

has a unique solution P ∈ Rn×n , and P is symmetric positive definite.

In that case the P of (B.18) and (B.19) are the same.

Proof. We prove the cycle of implications 1. =⇒ 2. =⇒ 3. =⇒ 4. =⇒ 1.

1. =⇒ 2. Trivial.

2. =⇒ 3. The solution of ẋ (t ) = A x (t ) is x (t ) = e At x 0 . By asymptotic stability


the entire transition matrix converges to zero, limt →∞ e At = 0 ∈ Rn×n . Now
V(x₀) = ∫₀^∞ (e^{At} x₀)^T Q (e^{At} x₀) dt = ∫₀^∞ x₀^T (e^{A^T t} Q e^{At}) x₀ dt = x₀^T P x₀

for P := ∫₀^∞ e^{A^T t} Q e^{At} dt. This P is well defined because e^{At} converges to
zero exponentially fast. This P is positive definite because it is the integral
of a positive definite matrix.
So V (x 0 ) is well defined and quadratic and, hence, continuously differ-
entiable. It has a unique minimum at x 0 = 0 and, as we showed earlier,
V̇ (x) = −L(x) := −x T Qx ≤ 0. Hence V is a Lyapunov function. In fact it is a
strong Lyapunov function because −x^T Q x < 0 for every x ≠ 0.

3. =⇒ 4. Since V is a strong Lyapunov function, it follows that A is asymptoti-


cally stable. Take P as defined in (B.18). Then
A^T P + P A = ∫₀^∞ ( A^T e^{A^T t} Q e^{At} + e^{A^T t} Q e^{At} A ) dt = [ e^{A^T t} Q e^{At} ]₀^∞ = −Q.

This shows that for every Q ∈ Rn×n (symmetric or not) there is a P ∈ Rn×n
for which A T P + P A = −Q. This means that the linear mapping from P ∈
Rn×n to A T P + P A ∈ Rn×n is surjective. It is a standard result from linear
algebra that a surjective linear mapping from a finite-dimensional vector
space to the same vector space is in fact invertible. Hence, for every Q ∈
Rn×n (symmetric or not) the solution P of A T P + P A = −Q exists and is
unique. Our Q is symmetric and positive definite, so the solution P as
constructed in (B.18) is symmetric and positive definite.

4. =⇒ 1. Then V(x) := x^T P x satisfies V̇(x(t)) = d/dt x^T(t) P x(t) = ẋ^T(t) P x(t) + x^T(t) P ẋ(t) = x^T(t)(A^T P + P A) x(t) = −x^T(t) Q x(t). So V is a strong Lyapunov function with V̇(x) < 0 for all x ≠ 0. It is radially unbounded, hence
the equilibrium is globally asymptotically stable (Theorem B.3.5).

The proof of the above theorem actually shows that for asymptotically stable
matrices A ∈ Rn×n , the Lyapunov equation

A T P + P A = −Q

for every matrix Q ∈ Rn×n (not necessarily symmetric or positive definite) has
a unique solution P ∈ Rn×n . In particular, it immediately yields the following
result.

Corollary B.5.3 (Lyapunov equation). If A ∈ Rn×n is asymptotically stable, then


A T P + P A = 0 ∈ Rn×n iff P = 0 ∈ Rn×n . 

Example B.5.4 (Scalar Lyapunov equation). The system

ẋ (t ) = −2 x (t )

is globally asymptotically stable because for q = 1 > 0 the Lyapunov equation

−2p − 2p = −q = −1

has a unique solution, p = 1/4, and the solution is positive. Note that Theo-
rem B.5.2 says that we may take any q > 0 that we like. Indeed, whatever pos-
itive q we take, we have that the solution of the Lyapunov equation is unique
and positive: p = q/4 > 0. 

It is good to realize that (B.19) is a linear equation in the entries of P , and it


is therefore easily solved (it requires a finite number of operations). Also positive
definiteness of a matrix can be tested in a finite number of steps (Appendix A.1).
Thus asymptotic stability of ẋ (t ) = A x (t ) can be tested in a finite number of
steps.
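To make the remark about a finite number of operations concrete: with the column-stacking operator vec, equation (B.19) can be written as (I ⊗ A^T + A^T ⊗ I) vec(P) = −vec(Q), an ordinary linear system in the n² entries of P. The sketch below is an added illustration (assuming NumPy; the matrix A used is just an example) that solves the equation this way and then tests positive definiteness via the eigenvalues of the solution.

    # Lyapunov equation A^T P + P A = -Q solved as a linear system (assumes NumPy).
    import numpy as np

    def lyap(A, Q):
        n = A.shape[0]
        I = np.eye(n)
        # vec(A^T P + P A) = (I kron A^T + A^T kron I) vec(P), with column-stacking vec
        M = np.kron(I, A.T) + np.kron(A.T, I)
        vecP = np.linalg.solve(M, -Q.reshape(-1, order='F'))
        return vecP.reshape((n, n), order='F')

    A = np.array([[-1.0, 1.0], [0.0, -2.0]])     # an example of an asymptotically stable A
    P = lyap(A, np.eye(2))
    print(P)
    print(np.linalg.eigvalsh((P + P.T) / 2))     # all positive, so P is positive definite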

Example B.5.5 (2 × 2 Lyapunov equation). Consider


 
ẋ(t) = [ −1  2 ; 0  −1 ] x(t).

We take Q equal to the 2 × 2 identity matrix,


 
Q = [ 1  0 ; 0  1 ].
The candidate solution P of the Lyapunov equation we write as P = [ α  β ; β  γ ]. The Lyapunov equation (B.19) then reads

[ −1  0 ; 2  −1 ] [ α  β ; β  γ ] + [ α  β ; β  γ ] [ −1  2 ; 0  −1 ] = [ −1  0 ; 0  −1 ].

Working out the matrix products on the left-hand side leaves us with

[ −2α   2α − 2β ; 2α − 2β   4β − 2γ ] = [ −1  0 ; 0  −1 ].

By symmetry, the upper-right and lower-left entries are identical so the above
matrix equation is effectively three scalar equations in the three unknowns
α, β, γ:

−2α = −1,
2α − 2β = 0,
4β − 2γ = −1.
B.6 LYAPUNOV ’ S F IRST M ETHOD 229

This gives α = β = 1/2 and γ = 3/2, that is,


 
P = (1/2) [ 1  1 ; 1  3 ].

This matrix is positive definite because P₁₁ = 1/2 > 0 and det(P) = 1/2 > 0 (see
Appendix A.1). Therefore the differential equation with equilibrium x̄ = (0, 0) is
globally asymptotically stable. 
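The hand computation of this example can be double-checked with standard software. SciPy's solve_continuous_lyapunov solves equations of the form a X + X a^H = q, so (B.19) has to be passed to it with a = A^T and q = −Q. This check is our own addition.

    # Verifying Example B.5.5 with SciPy's Lyapunov solver.
    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A = np.array([[-1.0, 2.0], [0.0, -1.0]])
    Q = np.eye(2)
    P = solve_continuous_lyapunov(A.T, -Q)        # solves A^T P + P A = -Q
    print(P)                                      # [[0.5, 0.5], [0.5, 1.5]] = (1/2)[1 1; 1 3]
    print(np.all(np.linalg.eigvalsh(P) > 0))      # True: P is positive definite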

B.6 Lyapunov’s First Method

Through a process called linearization we can approximate a nonlinear system


around an equilibrium with a linear system. Often the stability properties of the
nonlinear system and the so determined linear system are alike. In fact, as we
will see, they often share the same Lyapunov function.
We assume that the vector field f : Rn → Rn is differentiable at the given
equilibrium x̄. This is to say that f (x) is of the form

f (x̄ + δx ) = Aδx + o(δx ) (B.20)

with A some n × n matrix, and o : Rn → Rn some little-o function which means


a function having the property that

 o(δx )
lim = 0. (B.21)
δx →0 δx 

We think of little-o functions as functions that are “extremely small” around the
origin.
To analyze the behavior of the state x (t ) relative to an equilibrium x̄, it
makes sense to analyze δx (t ) defined as the difference between state and equi-
librium,

δx (t ) := x (t ) − x̄.

This difference obeys the differential equation

δ̇x (t ) = ẋ (t ) − x̄˙ = ẋ (t ) = f ( x (t )) = f (x̄ + δx (t )) = Aδx (t ) + o(δx (t )).

The linearized system of ẋ (t ) = f ( x (t )) at equilibrium x̄ is defined as the system


in which the little-o term, o(δx (t )), is deleted:

δ̇x (t ) = Aδx (t ).

It constitutes a linear approximation of the original nonlinear system but we


expect it to be an accurate approximation as long as δx (t ) is “small”. The matrix

A equals the Jacobian matrix at x̄ defined as


A = ∂f(x̄)/∂x^T := [ ∂f₁(x̄)/∂x₁  ···  ∂f₁(x̄)/∂xₙ ; ⋮  ⋱  ⋮ ; ∂fₙ(x̄)/∂x₁  ···  ∂fₙ(x̄)/∂xₙ ].     (B.22)
(See Appendix A.2 for an explanation of this notation.)

Figure B.11: Nonlinear f(x) (left) and its linear approximation Aδ_x (right).

Example B.6.1. Consider the nonlinear differential equation

ẋ (t ) = − sin(2 x (t )). (B.23)

The function f (x) = − sin(2x) has many zeros, among which is

x̄ = 0.

The idea of linearization is that around x̄ the function f (x) is almost indistin-
guishable from its tangent with slope
A = ∂f(x̄)/∂x = −2 cos(2x̄) = −2 cos(0) = −2,
see Fig. B.11, and so the solutions x (t ) of (B.23) will probably be quite similar to
x̄ + δx (t ) = δx (t ) with δx (t ) the solution of the linear system

δ̇x (t ) = −2δx (t ) (B.24)

provided that δx (t ) is small. The above linear system (B.24) is the linearized sys-
tem of (B.23) at equilibrium x̄ = 0. 
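How good the linear approximation is can be probed numerically. The sketch below is an added illustration (assuming SciPy): it integrates the nonlinear equation (B.23) and its linearization (B.24) from the same small initial condition and prints the largest deviation between the two solutions.

    # Nonlinear (B.23) versus linearized (B.24) dynamics near the equilibrium 0.
    import numpy as np
    from scipy.integrate import solve_ivp

    t_eval = np.linspace(0.0, 5.0, 200)
    x0 = 0.1                                     # small initial deviation from 0

    nonlin = solve_ivp(lambda t, x: -np.sin(2 * x), (0.0, 5.0), [x0],
                       t_eval=t_eval, rtol=1e-9, atol=1e-12)
    lin = solve_ivp(lambda t, d: -2 * d, (0.0, 5.0), [x0],
                    t_eval=t_eval, rtol=1e-9, atol=1e-12)

    print("max |x(t) - delta_x(t)| =", np.max(np.abs(nonlin.y[0] - lin.y[0])))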

Lyapunov’s first method presented next, roughly speaking says that the non-
linear system and the linearized system have the same asymptotic stability
properties. The only exception to this rule is if the eigenvalue with the largest
real part is on the imaginary axis (so its real part is zero). The proof of this result
relies on the fact that every asymptotically stable linear system has a Lyapunov
function (namely its cost-to-go) which turns out to be a Lyapunov function for
the nonlinear system as well:

Theorem B.6.2 (Lyapunov’s first method). Let f : Rn → Rn be a continuously


differentiable function and let x̄ be an equilibrium of ẋ (t ) = f ( x (t )).
1. If all eigenvalues of the Jacobian (B.22) have negative real part, then x̄ is
an asymptotically stable equilibrium of the nonlinear system.

2. If there is an eigenvalue of the Jacobian (B.22) with positive real part, then
x̄ is an unstable equilibrium of the nonlinear system.

Proof. (First realize that continuous differentiability of f implies Lipschitz con-


tinuity, and so Lyapunov theory is applicable.) Write f (x) as in (B.20). Without
loss of generality we assume that x̄ = 0, and we define A as in (B.22).
1. By the assumptions on the eigenvalues the linearized system δ̇x (t ) =
Aδx (t ) is asymptotically stable. So Theorem B.5.2 guarantees the existence
of a positive definite matrix P that satisfies

A T P + P A = −I ,

and that V (δx ) = δxT P δx is a strong Lyapunov function for the linear sys-
tem δ̇x (t ) = Aδx (t ). We prove that V (x) := x T P x is also a strong Lyapunov
function for ẋ (t ) = f ( x (t )) on some neighborhood Ω of x̄ = 0. Clearly, this
V is positive definite and continuously differentiable.
We have that

V̇ (x) = ẋ T P x + x T P ẋ
= f (x)T P x + x T P f (x)
= [Ax + o(x)]T P x + x T P [Ax + o(x)]
= x T (A T P + P A)x + o(x)T P x + x T P o(x)
= −x T x + 2 o(x)T P x
= −‖x‖² + 2 o(x)^T P x.

The term 2 o(x)^T P x we recognize as the standard inner product of 2 o(x) and P x, so by the Cauchy-Schwarz inequality we can bound it from above by 2 ‖o(x)‖ ‖P x‖, hence

V̇(x) ≤ −‖x‖² + 2 ‖o(x)‖ ‖P x‖.

Based on this we now choose Ω as


Ω := {x ∈ Rⁿ | 2 ‖o(x)‖ ‖P x‖ ≤ ½ ‖x‖²}.
From (B.21) it follows that this Ω is a neighborhood of x̄ = 0. Then, finally,
we find that
V̇(x) ≤ −½ ‖x‖²   ∀x ∈ Ω.

Therefore V̇(x) < 0 for all x ∈ Ω, x ≠ x̄, so V is a strong Lyapunov function
for the nonlinear system.

2. See (Khalil, 1996, Thm. 3.7).

The two cases of Theorem B.6.2 cover all possible eigenvalue configura-
tions, except when some eigenvalues have zero real part and none have pos-
itive real part. In fact, if there are eigenvalues on the imaginary axis then the
dynamical behavior crucially depends on the higher-order terms o(δx ), which
are neglected in the linearization. For example, the three systems

ẋ (t ) = x 2 (t ),
ẋ (t ) = − x 2 (t ),
ẋ (t ) = x 3 (t ),

all have the same linearization at x̄ = 0, but their dynamical properties are very
different. See also Exercise B.5.

Example B.6.3. Consider the system

ẋ₁(t) = x₁(t) + x₁(t) x₂²(t),
ẋ₂(t) = −x₂(t) + x₁²(t) x₂(t).

The system has equilibrium x̄ := (0, 0), and the Jacobian A at this equilibrium is

A = ∂f(x̄)/∂x^T = [ 1 + x₂²   2x₁x₂ ; 2x₁x₂   −1 + x₁² ] |_{x=(0,0)} = [ 1  0 ; 0  −1 ].

Clearly it has eigenvalues ±1. In particular it has a positive eigenvalue. Lya-


punov’s first method hence proves that the system at this equilibrium is unsta-
ble. 
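The Jacobian and its eigenvalues can also be obtained symbolically. The following sketch is our own addition (it assumes SymPy); it reproduces the matrix A of this example and confirms the eigenvalues ±1.

    # Jacobian of Example B.6.3 at the origin, computed symbolically (assumes SymPy).
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f = sp.Matrix([x1 + x1 * x2**2, -x2 + x1**2 * x2])
    A = f.jacobian([x1, x2]).subs({x1: 0, x2: 0})
    print(A)              # Matrix([[1, 0], [0, -1]])
    print(A.eigenvals())  # {1: 1, -1: 1}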

B.7 Exercises

B.1 Equilibria.

(a) Let x̄ be an equilibrium of system (B.2). Show that every continu-


ously differentiable function V : Rn → R satisfies V̇ (x̄) = 0.
(b) Prove that if a system of the form (B.2) has more than one equi-
librium point, then none of these equilibrium points are globally
asymptotically stable.
(c) Consider the linear system

ẋ (t ) = A x (t ),

with A an n × n matrix. Argue that this system either has exactly one
equilibrium, or infinitely many equilibria.

B.2 Investigate the stability of the origin for the following two systems (that is,
check all six stability types as mentioned in Definition B.2.2). Use a suit-
able Lyapunov function.

(a)

ẋ₁(t) = −x₁³(t) − x₂²(t),
ẋ₂(t) = x₁(t) x₂(t) − x₂³(t).

[Hint: take the “standard” V (x).]


(b)

ẋ₁(t) = x₂(t),
ẋ₂(t) = −x₁³(t).
[Hint: try V(x₁, x₂) = x₁^α + c x₂^β and then determine suitable α, β, c.]

B.3 Adaptive Control. The following problem from adaptive control illustrates
an extension of the theory of Lyapunov functions to functions that are,
strictly speaking, no longer Lyapunov functions. This problem concerns
the stabilization of a system of which the parameters are not (completely)
known. Consider the scalar system

ẋ (t ) = a x (t ) + u (t ), x (0) = x0 , (B.25)

where a is a constant, and where u : [0, ∞) → R is an input that we have


to choose in such a way that limt →∞ x (t ) = 0. If we know a then u (t ) =
−k x (t ), with k > a, would solve the problem. However, we assume that
a is unknown but that we can measure x (t ). Contemplate the following
dynamic state feedback

u(t) = −k(t) x(t)   where   k̇(t) = x²(t),  k(0) = 0.     (B.26)

Here, the term x²(t) ensures that k(t) grows if x²(t) is not close to


zero. The idea is that k (t ) keeps on growing until it is so large that
u (t ) := − k (t ) x (t ) stabilizes the system, so until x (t ) is equal to zero.

(a) Write (B.25)–(B.26) as one system with state ( x , k ) and determine all
equilibrium points.
(b) Consider the function V(x, k) := x² + (k − a)². Prove that V̇(x(t), k(t)) =
0 for all x , k . For which equilibrium point is this a Lyapunov func-
tion?
(c) Prove, using the above, that k (t ) is bounded.
(d) Prove, using (B.26), that k (t ) converges as t → ∞.
(e) Prove that limt →∞ x (t ; x 0 ) = 0.

(f ) Determine limt →∞ k (t ).

B.4 This exercise is based on an exercise in Khalil (1996) who, in turn, took it
from a book by Hahn1 , and it appears that Hahn was inspired by a paper
by Barbashin and Krasovskı̆2 . Consider the system

ẋ₁(t) = (−x₁(t) + x₂(t)(1 + x₁²(t))²) / (1 + x₁²(t))²,
ẋ₂(t) = (−x₁(t) − x₂(t)) / (1 + x₁²(t))²,

and define V : R2 → R as

V(x) = x₁²/(1 + x₁²) + x₂².

(a) Show that (0, 0) is the only equilibrium point.


(b) Show that V is a strong Lyapunov function on the entire state space
Ω = R2 .
(c) Show that the level sets {x ∈ R2 | V (x) = c} of the Lyapunov function
are unbounded if c ≥ 1. Hence the Lyapunov function is not radially
unbounded. (Figure B.12 depicts several level sets.)
(d) Figure B.12 also depicts the curve x 2 = 1/x 1 and the region to the
right of it, so where x 1 x 2 > 1. The phase portrait suggests that
x 1 (t ) x 2 (t ) increases if x 2 (t ) = 1/ x 1 (t ). Indeed. Show that
d(x₁(t) x₂(t))/dt = 1 / ( x₁²(t)(1 + x₁²(t))² ) > 0

whenever x 2 (t ) = 1/ x 1 (t ) > 0.
(e) Use the above to prove that the origin is not globally asymptotically
stable.

B.5 Linearization. Consider the scalar system

ẋ(t) = a x³(t)

with a ∈ R.

(a) Prove that the linearization of this system about its equilibrium
point is independent of a.
1 W. Hahn. Stability of Motion, volume 138 of Die Grundlehren der mathematischen Wis-

senschaften. Springer-Verlag, New York, 1967.


2 E.A. Barbashin and N.N. Krasovskı̆. Ob ustoichivosti dvizheniya vtzelom. Dokl. Akad. Nauk.

USSR, 86(3): 453–456, 1952. (Russian). English title: “On the stability of motion in the large”.

Figure B.12: A phase portrait of the system of Exercise B.4. The red dashed lines are level sets of V(x). The boundary of the shaded region {(x₁, x₂) | x₁, x₂ > 0, x₁x₂ > 1} is where x₂ = 1/x₁ > 0.

(b) Sketch the graph of ax 3 as a function of x, and use it to argue that


the equilibrium is
• asymptotically stable if a < 0,
• stable if a = 0,
• unstable if a > 0.
(c) Determine a Lyapunov function for the cases that the system is sta-
ble.
(d) Determine a strong Lyapunov function for the cases that the system
is asymptotically stable.

B.6 Consider the system

ẋ₁(t) = −x₁⁵(t) − x₂(t),
ẋ₂(t) = x₁(t) − 2x₂³(t).
(a) Determine all points of equilibrium.
(b) Determine a Lyapunov function for the equilibrium x̄ = (0, 0), and
discuss the type of stability that follows from this Lyapunov function
(stable? asymptotically stable? globally asymptotically stable?)

B.7 Suppose that

ẋ₁(t) = x₂(t) − x₁(t),
ẋ₂(t) = −x₁³(t),

and use the candidate Lyapunov function V(x₁, x₂) = x₁⁴ + 2x₂². The equi-


librium is x̄ = (0, 0).

(a) Is this a Lyapunov function?


(b) Is this a strong Lyapunov function?
(c) Investigate the nature of stability of this equilibrium with LaSalle’s
invariance principle.

B.8 Consider the Van der Pol equation

ÿ(t) − ε (1 − y²(t)) ẏ(t) + y(t) = 0.

This equation occurs in the study of vacuum tubes and then ε is positive.
However, in this exercise we take ε < 0.

(a) Rewrite this equation in the standard form (B.2) with x 1 := y and
x 2 := ẏ .
(b) Use linearization to show that the origin (x 1 , x 2 ) = (0, 0) is an asymp-
totically stable equilibrium (recall that ε < 0).
(c) Determine a neighborhood Ω of the origin for which V (x 1 , x 2 ) = x 12 +
x 22 is a Lyapunov function for x̄ = (0, 0).
(d) Let V (x 1 , x 2 ) and Ω be as in the previous part. Which stability prop-
erties can be concluded from LaSalle’s invariance principle?

B.9 The well-known Lotka-Volterra model describes the interaction between


a population of predators (of size x 1 ) and prey (of size x 2 ), and is given by
the equations

ẋ 1 (t ) = −a x 1 (t ) + b x 1 (t ) x 2 (t ), x 1 (0) ≥ 0,
ẋ 2 (t ) = c x 2 (t ) − d x 1 (t ) x 2 (t ), x 2 (0) ≥ 0.

The first term on the right-hand side of the first equation models that
predators become extinct without food, while the second term models
that the growth of the number of predators is proportional to the number
of prey. Likewise, the term on the right-hand side of the second equation
models that without predators the population of prey increases, while its
decrease is proportional to the number of predators. For convenience we
choose a = b = c = d = 1.

(a) Show that, apart from (0, 0), the system has a second equilibrium
point.
(b) Investigate the stability of both equilibrium points using lineariza-
tion.

(c) Investigate the stability of the nonzero equilibrium point using the
function

V (x 1 , x 2 ) = x 1 + x 2 − ln(x 1 x 2 ) − 2.

Here, ln is the natural logarithm.

B.10 The equations of motion of the pendulum with friction are

ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)) − (d/m) x₂(t).     (B.27)
Here x 1 is the angular displacement, x 2 is the angular velocity, g is the
gravitational acceleration, ℓ is the length of the pendulum, m is the mass
of the pendulum, and d is a friction coefficient. All constants g, ℓ, d, m are
positive.

(a) Prove, using Theorem B.6.2, that the origin is an asymptotically sta-
ble equilibrium point.
(b) In Example B.4.9 we verified asymptotic stability using LaSalle’s
invariance principle. Here we want to construct a strong Lyapunov
function to show asymptotic stability using Theorem B.3.2: deter-
mine a symmetric matrix P > 0 such that the function
 
V(x) := x^T P x + g (1 − cos(x₁))

is a strong Lyapunov function for (B.27) on some neighborhood Ω of


the origin. (This exercise assumes knowledge of Appendix A.1.)

B.11 Consider the system

ẋ 1 (t ) = −2 x 1 (t )( x 1 (t ) − 1)(2 x 1 (t ) − 1),
(B.28)
ẋ 2 (t ) = −2 x 2 (t ).

(a) Determine all equilibrium points of the system (B.28).


(b) Prove that there are two asymptotically stable equilibrium points.
(c) Investigate the stability of the other equilibrium point(s).

B.12 Determine all equilibrium points of

ẋ₁(t) = x₁(t)(1 − x₂²(t)),
ẋ₂(t) = x₂(t)(1 − x₁²(t)).

For each of the equilibrium points determine the linearization and the
nature of stability of the linearization.

Figure B.13: A spinning rigid body. See Exercise B.13.

B.13 The equations of motion of a rigid body spinning around its center of
mass are

I 1 ω̇1 (t ) = (I 2 − I 3 )ω2 (t )ω3 (t ),


I 2 ω̇2 (t ) = (I 3 − I 1 )ω1 (t )ω3 (t ),
I 3 ω̇3 (t ) = (I 1 − I 2 )ω1 (t )ω2 (t ),

where ω :=(ω1 , ω2 , ω3 ) is the vector of angular velocities around the


three principal axes of the rigid body, and I 1 , I 2 , I 3 > 0 are the princi-
pal moments of inertia. This is depicted in Fig. B.13. The kinetic energy
(due to rotation) is
½ (I₁ω₁² + I₂ω₂² + I₃ω₃²).
(a) Prove that the origin ω = (0, 0, 0) is a stable equilibrium.
(b) Prove that the origin ω = (0, 0, 0) is not asymptotically stable.

Now assume that the moments of inertias are ordered as

0 < I1 < I2 < I3.

(This implies a certain lack of symmetry of the rigid body, e.g., it is not a
unit cube. An example where 0 < I 1 < I 2 < I 3 is shown in Fig. B.13.)

(c) The origin (0, 0, 0) is just one equilibrium. Determine all equilibria
and explain what this implies about the stability properties.
(d) Determine the linearization around each of the equilibria.
(e) Use linearization to prove that steady spinning around the second
principal axis (0, ω̄2 , 0) is unstable if ω̄2 = 0.

(f ) This is a tricky question. Prove that both the kinetic energy


½ (I₁ω₁² + I₂ω₂² + I₃ω₃²),

and the squared total angular momentum

I₁²ω₁² + I₂²ω₂² + I₃²ω₃²

are constant over time, and use this to prove that steady spinning
around the first and third principal axes is stable, but not asymptot-
ically stable.
Remark: A spinning body spins stably both around the principal axis
with the smallest moment of inertia and the principal axis with the
largest moment of inertia. But around the other principal axis it is
not stable. This can be demonstrated by (carefully) spinning this
book in the air. You will see that you can get it to spin nicely around
the axis with largest inertia – like a discus – and around the axis with
smallest inertia – like a spear – but you will probably fail to make it
spin around the other axis.

B.14 Consider the system ẋ (t ) = f ( x (t )) with equilibrium point x̄, and let Ω
be a neighborhood of x̄. Suppose a Lyapunov function exists such that
V̇ (x) = 0 for all x ∈ Ω. Prove that this system with equilibrium x̄ is not
asymptotically stable.

B.15 Let x (t ; x 0 ) be a solution of the differential equation ẋ (t ) = f ( x (t )), x (0) =


x 0 . Prove that O (x 0 ) := { x (t ; x 0 ) | t ≥ 0} is an invariant set for ẋ (t ) = f ( x (t )).

B.16 Consider a system ẋ (t ) = f ( x (t )), and assume that f is locally Lipschitz


continuous. A trajectory x (t ; x 0 ) is closed if x (t + s; x 0 ) = x (t ; x 0 ) for some
t and some s > 0. Let x (t ; x 0 ) be a closed trajectory of this system, and
suppose that V : Rn → R is a C 1 Lyapunov function for this system on the
entire state space Rn . Prove that V̇ ( x (t ; x 0 )) = 0 for all t ≥ 0.

B.17 In this exercise we look at variations of the system (B.8) from Exam-
ple B.4.6. We investigate the system
 
ẋ₁(t) = x₂(t) + x₁(t) (γ − x₁²(t) − x₂²(t)),
ẋ₂(t) = −x₁(t) + x₂(t) (γ − x₁²(t) − x₂²(t)),
with γ ∈ R. Prove that the origin is an asymptotically stable equilibrium
point if γ ≤ 0, and that it is an unstable equilibrium point if γ > 0.

B.18 (Assumes knowledge of Appendix A.1.) Determine all α, β ∈ R for which


[ α  0  0 ; 0  1  β ; 0  β  4 ]

(a) is positive definite.


(b) is positive semi-definite but not positive definite.

B.19 (Assumes knowledge of Appendix A.1.) Let the matrices A and Q be given
by
   
A = [ 0  1 ; −2  −3 ],   Q = [ 4  6 ; 6  10 ],

(a) Determine a matrix P such that A T P + P A = −Q.


(b) Show that P and Q are positive definite and conclude that ẋ (t ) =
Ax(t ) is asymptotically stable.

B.20 (Assumes knowledge of Appendix A.1.) Consider the matrix


⎡ ⎤
−2 1 0
A = ⎣ 0 −1 0 ⎦ .
0 1 −2

(a) Use a computer to determine the solution P of the Lyapunov equa-


tion

A T P + P A = −I .

(b) Check (without using a computer) that this solution P is positive def-
inite.

B.21 Consider the linear differential equation

ẋ 1 (t ) = x 1 (t ) + 2 x 2 (t ),
ẋ 2 (t ) = −α x 1 (t ) + (1 − α) x 2 (t ).

Determine all α’s for which this differential equation is asymptotically sta-
ble around x̄ = (0, 0).

B.22 The blue phase portrait of Fig. B.14 is that of ẋ (t ) = A x (t ) with


 
A = [ −1   −(1/3)·π/2 ; 3π/2   −1 ].
(a) Determine a diagonal positive definite matrix P of the form P = [ p  0 ; 0  1 ] for which also A^T P + P A is diagonal.
(b) Show that x T P x is a strong Lyapunov function for this system (with
equilibrium x̄ = 0).
(c) Sketch in Fig. B.14 a couple of level sets {x ∈ R2 | x T P x = c}, and
explain from this figure why indeed V̇ (x) < 0 for all nonzero x.

Figure B.14: The blue phase portrait is that of the system of Exercise B.22. This is also the phase portrait of the system ẋ(t) = A_even x(t) of Exercise B.23. In red is the phase portrait of ẋ(t) = A_odd x(t) of Exercise B.23. All trajectories (blue and red) converge to zero as t → ∞.

B.23 Notice that the results we derived in this chapter are formulated only for
time-invariant systems ẋ (t ) = f ( x (t )). For time-varying systems ẋ (t ) =
f ( x (t ), t ) the story is quite different, even if the system is linear of the form

ẋ (t ) = A(t ) x (t ). (B.29)

For such linear systems one might be tempted to conclude that it is


asymptotically stable if for every t all eigenvalues of A(t ) have negative
real part. In this exercise we will see that this is wrong. Consider the sys-
tem (B.29) where

A(t) = A_even if ⌊t⌋ is even,   A(t) = A_odd if ⌊t⌋ is odd,     (B.30)

in which
   
A_even = [ −1   −(1/3)·π/2 ; 3π/2   −1 ],   A_odd = [ −1   −3π/2 ; (1/3)·π/2   −1 ].
Here ⌊t⌋ denotes the floor of t (the largest integer less than or equal to t).
The system hence switches dynamics at every t ∈ Z.

(a) Show that the eigenvalues of A(t ) at every t are −1 ± iπ/2.

In particular all eigenvalues of A(t ) have negative real part at every t . At


this point it is interesting to have a look at the phase portraits of ẋ (t ) =
A even x (t ) and ẋ (t ) = A odd x (t ), see Fig. B.14. The blue phase portrait is that
of ẋ (t ) = A even x (t ), and the red phase portrait is that of ẋ (t ) = A odd x (t ). It
can be shown that
 
e^{A_even t} = e^{−t} [ cos(πt/2)   −(1/3) sin(πt/2) ; 3 sin(πt/2)   cos(πt/2) ],

e^{A_odd (t−1)} = e^{−(t−1)} [ cos(π(t−1)/2)   −3 sin(π(t−1)/2) ; (1/3) sin(π(t−1)/2)   cos(π(t−1)/2) ].

(b) Verify that x(t) = e^{A_even t} x(0) for all t ∈ [0, 1], and x(t) = e^{A_odd (t−1)} x(1)
for all t ∈ [1, 2].
(c) Show that
 
x(2k + 2) = [ −(3/e)²   0 ; 0   −1/(3e)² ] x(2k)
for all k ∈ Z, and use it to conclude that the time-varying sys-
tem (B.29) is not asymptotically stable.
(d) Use the above to sketch in Fig. B.14 the trajectory x (t ) for t > 0 with
 
initial condition x (0) = 10 , and argue that this trajectory diverges as
t → ∞.

Remark: another “counterexample” can be described in words as follows:


consider a mass attached to a spring where the positive spring constant
k depends on time t . If the function k(t ) is constructed in such a way
that k(t ) is small whenever the mass is far from the rest position but large
whenever the mass passes through the rest position, then it is physically
clear that the time-varying mass-spring system is unstable even in the
presence of small damping. On the other hand, the eigenvalues at each
time t have real part strictly less than zero.
Figure B.15: Stable or not? Globally attractive or not? See Exercise B.24.

B.24 This exercise is based on an example from a paper by Ryan and Sontag3 .
It is about a system whose equilibrium is globally attractive yet not stable!
Consider the system ẋ (t ) = f ( x (t )) with
f(x) = [ −x₁(1 − 1/‖x‖) − 2x₂(1 − x₁/‖x‖),   −x₂(1 − 1/‖x‖) + 2x₁(1 − x₁/‖x‖) ]    if ‖x‖ ≥ 1,

f(x) = [ 2(x₁ − 1)x₂,   −(x₁ − 1)² + x₂² ]    if ‖x‖ < 1.

Notice that f inside the unit disc is defined differently than outside the
unit disc. Nevertheless, f (x) is locally Lipschitz continuous, also on the
unit circle. Inside the unit circle, the orbits are arcs (parts of circles) that
converge to x̄ = (1, 0), see Fig. B.15. Outside, ‖x‖ ≥ 1, the system is easier
to comprehend in polar coordinates (x₁, x₂) = (r cos(θ), r sin(θ)) with r = √(x₁² + x₂²). This gives

³ E.P. Ryan and E.D. Sontag. Well-defined steady-state response does not imply CICS. Systems & Control Letters, 55: 707–710, 2006.

ṙ (t ) = 1 − r (t ),
(B.31)
θ̇(t ) = 4 sin2 (θ(t )/2) = 2(1 − cos(θ(t ))).

(a) Derive (B.31).


(b) Show that x̄ :=(1, 0) is the unique point of equilibrium.
(c) Argue that for ‖x(0)‖ > 1 its phase portrait is as in Fig. B.15.
(d) Argue that the equilibrium x̄ is globally attractive but not stable.

B.25 Let A ∈ Rn×n and suppose that A + A T is negative definite. Is the origin a
stable equilibrium of ẋ (t ) = A x (t )?
Solutions to Odd-Numbered Exercises

To avoid clutter in this appendix we often write x (t ) simply as x .

Chapter 1


1.1 (a) 0 = (∂/∂x − d/dt ∂/∂ẋ)(ẋ² − α²x²) = −2α²x − d/dt(2ẋ) = −2α²x − 2ẍ. So
        ẍ + α²x = 0. Its solution (using characteristic polynomials) is
        x(t) = c e^{iαt} + d e^{−iαt} with c, d arbitrary constants. Equivalently,
        x(t) = a cos(αt) + b sin(αt) with a, b arbitrary constants.
    (b) 0 = 2 − d/dt(2ẋ) = 2 − 2ẍ. So x(t) = ½t² + at + b.
    (c) 0 = −d/dt(2ẋ + 4t) = −(2ẍ + 4). So x(t) = −t² + at + b.
    (d) 0 = ẋ + 2x − d/dt(2ẋ + x) = 2x − 2ẍ. So x(t) = a eᵗ + b e⁻ᵗ.
    (e) 0 = (∂/∂x − d/dt ∂/∂ẋ)(x² + 2t x ẋ) = 2x + 2t ẋ − d/dt(2t x) = 2x + 2t ẋ − (2x + 2t ẋ) = 0.
        The Euler-Lagrange equation hence is 0 = 0. Every function x satisfies this, so every
        function is stationary. (Incidentally, it is not too hard to show that J(x) equals
        T x²(T), so J is constant if we specify the endpoint x(T) = x_T. This explains why
        all x are stationary.)

1.3 (a) 0 = (∂/∂x − d/dt ∂/∂ẋ)(d/dt G(t, x(t))).
    (b) d/dt G(t, x(t)) equals ∂G(t, x(t))/∂t + ẋᵀ(t) ∂G(t, x(t))/∂x. So the Euler-Lagrange
        equation becomes

        0 = ∂/∂x (d/dt G(t, x(t))) − d/dt ∂/∂ẋ (∂G(t, x(t))/∂t + ẋᵀ(t) ∂G(t, x(t))/∂x)
          = ∂/∂x (d/dt G(t, x(t))) − d/dt (∂G(t, x(t))/∂x)
          = 0.

        This holds for all x. In the last equality we used that d/dt(∂G(t, x(t))/∂x) equals
        ∂/∂x (d/dt G(t, x(t))).
    (c) ∫₀ᵀ F(t, x(t), ẋ(t)) dt = ∫₀ᵀ d/dt G(t, x(t)) dt = G(T, x_T) − G(0, x₀), so the outcome
        is the same for all functions x that satisfy the given boundary conditions.

1.5 The constant 4πρv² plays no role. So take F(y, ẏ) = y ẏ³. Beltrami gives

    C = y ẏ³ − ẏ ∂/∂ẏ(y ẏ³) = y ẏ³ − 3y ẏ³ = −2y ẏ³.

    Hence y ẏ³ is constant. Now y(x) := y₁(x/x₁)^{3/4} satisfies this equation (verify this)
    and, in addition, then y(0) = 0 and y(x₁) = y₁ as required. (By the way, the function
    y(x) = y₁(x/x₁)^{3/4} is not differentiable at x = 0.)

1.7 (a) By contradiction. If f is not constant then f(a) ≠ f(b) for some a, b ∈ (0, T). Let φ
        be a continuous function with φ(a) = +1 and φ(b) = −1 of the form of two narrow
        "tents" around a and b with opposite sign and equal area (figure omitted). This
        function satisfies ∫₀ᵀ φ(t) dt = 0, and for small enough "tents" around a, b the
        integral ∫₀ᵀ f(t)φ(t) dt is nonzero because f(t) is continuous and f(a) ≠ f(b).
        Contradiction. Hence f is constant.
    (b) Momentarily denote ∂F(t, x∗(t), ẋ∗(t))/∂x simply as F_x(t), and let G_x(t) be an
        antiderivative of F_x(t). Now ∫₀ᵀ F_xᵀ(t)δ_x(t) dt equals [G_xᵀ(t)δ_x(t)]₀ᵀ −
        ∫₀ᵀ G_xᵀ(t)δ̇_x(t) dt. The term [G_xᵀ(t)δ_x(t)]₀ᵀ is zero because δ_x(0) = δ_x(T) = 0.
        Therefore (1.63) is the same as

        ∫₀ᵀ (−G_xᵀ(t) + ∂F(t, x∗(t), ẋ∗(t))/∂ẋᵀ) δ̇_x(t) dt.

        Clearly, one possible antiderivative is G_x(t) = ∫₀ᵗ ∂F(τ, x∗(τ), ẋ∗(τ))/∂x dτ.
    (c) Follows immediately from (a) and (b) [take φ = δ̇_x and realize that this φ is
        continuous and ∫₀ᵀ φ(t) dt = [δ_x(t)]₀ᵀ = 0 − 0 = 0.]
    (d) From (c) and continuity of ∂F(t, x∗(t), ẋ∗(t))/∂x it follows that
        ∂F(t, x∗(t), ẋ∗(t))/∂ẋ is differentiable w.r.t. t and that
        d/dt ∂F(t, x∗(t), ẋ∗(t))/∂ẋ = ∂F(t, x∗(t), ẋ∗(t))/∂x.
    (e) Follows from the solution of (d). (Since F and x∗ are C¹ it is guaranteed that the
        derivative exists and is continuous.)

1.9 (a) 0 = (∂/∂x − d/dt ∂/∂ẋ)(ẋ² − 2x ẋ − ẋ) = −2ẋ − d/dt(2ẋ − 2x − 1) = −2ẋ − (2ẍ − 2ẋ) = −2ẍ.

        Thus x(t) = at + x(0) = at + 1. According to (1.43) we need that 2ẋ − 2x − 1 = 0 at
        T = 1. For our x(t) = at + 1 this means 2a − 2(a + 1) − 1 = 0, i.e., −3 = 0. Impossible.
    (b) Follows from the previous part.
    (c) For x(t) = at + 1 we get J(x) = ∫₀¹ a² − 2(at + 1)a − a dt = [a²t − (at + 1)² − at]₀¹ =
        a² − (a + 1)² − a + 1 = −3a. This cost is unbounded from below (as a function of a),
        so does not have a minimum (not even a local minimum).

1.11 In Example 1.6.8 we saw that the 2nd derivative of √(1 + ẏ²) w.r.t. ẏ is 1/(1 + ẏ²)^{3/2}.
     So here the Legendre condition is that 2πr(x)/(1 + ṙ²(x))^{3/2} has to be nonnegative for
     all x. That is the case because the hyperbolic cosine solution r_a(x) is ≥ 0.

1.13 The matrix of second derivatives of F w.r.t. (ẋ₁, ẋ₂) is

     [[∂²F/∂ẋ₁², ∂²F/∂ẋ₁∂ẋ₂], [∂²F/∂ẋ₂∂ẋ₁, ∂²F/∂ẋ₂²]] = [[2, 0], [0, 2]] > 0.

     Hence the Legendre condition is satisfied.



1.15 (a) 0 = (∂/∂x − d/dt ∂/∂ẋ)(ẋ² − x) = −1 − d/dt(2ẋ) = −1 − 2ẍ.
     (b) So x∗(t) = −t²/4 + at + b. Given x(0) = 0, x(1) = 1 it follows that
         x∗(t) = −t²/4 + (5/4)t.
     (c) ∂²F(t, x, ẋ)/∂ẋ² = 2 ≥ 0 so yes.
     (d) The Hessian H(t, x, y) is

         H(t, x, y) = [[0, 0], [0, 2]].

         It is positive semi-definite, so the condition is satisfied.


(e) Since H ≥ 0 our x ∗ is globally optimal.
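Remark (a numerical sanity check, not part of the original solution): the optimality of x∗(t) = −t²/4 + (5/4)t can be confirmed by comparing the cost with that of perturbed curves keeping the same endpoints. A minimal Python sketch:

# Compare J(x) = int_0^1 (xdot^2 - x) dt for x* with perturbed curves x* + eps*sin(pi t)
# (the perturbation vanishes at t = 0 and t = 1, so the endpoints stay fixed).
import numpy as np

t = np.linspace(0.0, 1.0, 2001)

def cost(x):
    xdot = np.gradient(x, t)
    integrand = xdot**2 - x
    return float(np.sum((integrand[:-1] + integrand[1:]) * np.diff(t) / 2))

x_star = -t**2/4 + 5*t/4
for eps in [0.0, 0.1, -0.1, 0.3]:
    print(eps, cost(x_star + eps*np.sin(np.pi*t)))   # eps = 0 gives the smallest value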

1.17 (a) Because (1 − ẋ(t))² x²(t) ≥ 0 for every x and t.
     (b) J(x) = 0 only if at every moment in time x(t) = 0 or ẋ(t) = 1. So either x(t) = 0 or
         any of the functions that have slope 1 whenever they are nonzero (figure omitted).
         Among these there is only one continuous function for which x(−1) = 0 and x(1) = 1,
         namely x(t) = 0 for t ≤ 0 and x(t) = t for t ∈ [0, 1] (figure omitted).

(c) (Idea suffices.) Among the continuous functions no C 1 function is


optimal because the unique optimal solution is not C 1 . (One can
also understand this question as: among the C 1 functions is there
an optimal one? The answer is no: take a sequence of C 1 func-
tions x n that “converge” to the optimal continuous function. Then
limn→∞ J ( x n ) = 0. So inf x isC 1 J ( x ) = 0 while no C 1 function x achieves
J ( x ) = 0.)

1.19 (a) The function F + μM as used in Theorem 1.7.1 becomes

         F(x, y, ẏ) + μM(x, y, ẏ) = (ρg y + μ)√(1 + ẏ²).

         Since it does not depend on x we can apply Beltrami. This gives
         y(x) + μ/(ρg) = a√(1 + ẏ²(x)) for some integration constant a.
     (b) The normal case is given in (a). For the abnormal case: Euler-Lagrange on √(1 + ẏ²)
         gives ÿ(x) = 0 (see Example 1.2.5). Hence a straight line.
     (c) We have the normal case if ℓ exceeds the distance between (x₀, y(x₀)) and (x₁, y(x₁)).
         The abnormal case if ℓ equals this distance (and if ℓ is less than this distance then
         no solution exists).
1.21 The function that minimizes ∫₀¹ ẋ²(t) dt satisfies the Euler-Lagrange equation:
     0 = −2ẍ(t), so it is a function of the form x(t) = bt + c. Given the boundary conditions
     (x(0) = 0, x(1) = 1) this gives b = 1, c = 0, and, so, x(t) = t. The Hessian is positive
     semi-definite. Therefore x∗(t) = t is an optimal solution, and, consequently, the minimal
     C is C = ∫₀¹ ẋ∗²(t) dt = ∫₀¹ 1 dt = 1.

Chapter 2

2.1 (a) H (x, p, u) = pxu + x 2 + u 2 and, hence, ṗ = − pu − 2 x , p (T ) = 2.


(b) The u at any moment in time minimizes the Hamiltonian. Since the
Hamiltonian is a convex parabola in u , the minimizing u follows
        from 0 = ∂H/∂u = px + 2u, i.e., u∗ = −(px)/2.

    (c) The final condition on p is p(T) = 2. Also, for u∗ = −px/2 we have
        H(x, p, u∗) = pxu∗ + x² + u∗² = −px(px/2) + x² + (px/2)² = x²(1 − p²/4).
        So H(x∗(T), p∗(T), u∗(T)) is zero.
    (d) For u∗ = −px/2 the costate equation becomes ṗ = (½p² − 2)x with final condition
        p(T) = 2. Clearly the constant p∗(t) = 2 satisfies the final condition, and also the
        DE because then ṗ∗ = 0 = (½p∗² − 2)x.
    (e) For p = 2 (constant) we have u∗ = −x so ẋ = −x². See Example B.1.5: x(t) = 1/(t + 1).

2.3 The Hamiltonian is H = p(x + u) + ½u² − 2u − 2x. If u would have been free to choose, then
    the minimizing u would have been the one that achieves 0 = ∂H/∂u = p + u − 2. This gives
    u = 2 − p. Given that U = [0, 4] and the fact that H is a convex parabola in u it is easy
    to see that the minimizing u is the element of [0, 4] that is closest to û := 2 − p. The
    costate equation is ṗ = 2 − p, p(1) = 0. Therefore p(t) = 2(1 − e^{1−t}). Thus
    û(t) := 2 − p(t) = 2 e^{1−t}. We need u(t) ∈ [0, 4]. Now û(t) = 4 at t = 1 − ln(2) ≈ 0.3069,
    and û(t) > 4 if t < 1 − ln(2), and û(t) ∈ [0, 4] if t ≥ 1 − ln(2). Therefore

        u∗(t) = 4 if t ∈ [0, 1 − ln(2)),    u∗(t) = 2 e^{1−t} if t ∈ [1 − ln(2), 1].

The optimal control is continuous but not differentiable.
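Remark (a numerical check, not part of the original solution): the costate formula and the switching time t = 1 − ln 2 are easy to confirm by integrating the costate equation backwards. A short Python sketch:

# Integrate pdot = 2 - p, p(1) = 0 backwards and verify the costate and switching time.
import numpy as np
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, p: 2 - p, (1.0, 0.0), [0.0], dense_output=True,
                rtol=1e-10, atol=1e-12)
t = np.linspace(0.0, 1.0, 11)
p = sol.sol(t)[0]
print(np.max(np.abs(p - 2*(1 - np.exp(1 - t)))))   # ~0: matches p(t) = 2(1 - e^{1-t})
print(1 - np.log(2))                               # switching time ~ 0.3069
print(2*np.exp(1 - (1 - np.log(2))))               # u_hat there equals the bound 4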

2.5 The Hamiltonian is p(x + u) + ¼u⁴. So ṗ = −p (without final condition), and the u
    minimizes iff u³ = −p, so u∗ = −p^{1/3}. The costate has the form p(t) = c e⁻ᵗ. Hence u∗
    has the form d e^{−t/3}. From ẋ(t) = x(t) + d e^{−t/3} it follows that
    x(t) = α eᵗ + β e^{−t/3} for certain α, β. (The c, d, α, β are related but let us worry
    about that later.) The initial and final conditions become 1 = x(0) = α + β and
    0 = x(3) = α e³ + β e⁻¹. So α = −e⁻⁴β and 1 = β(1 − e⁻⁴). That is, β = 1/(1 − e⁻⁴) and
    α = −e⁻⁴/(1 − e⁻⁴). Now that α, β are known the x∗ = α eᵗ + β e^{−t/3} follows, and also
    u∗ = ẋ∗ − x∗ = −(4/3)β e^{−t/3}.

2.7 The cost to be minimized is J (x 0 , u ) = − x 1 (T ), so K (x) = −x 1 and


L(x, u ) = 0.

(a) H (x, p, u) = p T f (x, u) + L(x, u) = p 1 x 2 + p 2 u.


(b) ṗ 1 (t ) = 0, ṗ 2 (t ) = − p 1 (t ). Since x 2 (t ) has a final condition, the cor-
responding final condition on p 2 (t ) is absent. We just have p 1 (T ) =
∂K ( x (T ))/∂x 1 = −1. The Hamiltonian equations in x are simply the
given equations: ẋ 1 (t ) = x 2 (t ), ẋ 2 (t ) = u (t ) and the given initial and
final condition.
(c) p 1 (t ) = p 1 (T ) = −1. Then ṗ 2 (t ) = +1, so p 2 (t ) = t + c.

    (d) Since u(t) minimizes the Hamiltonian we have u(t) = −sgn(p₂(t)). Since p₂(t) is an
        increasing function, our u(t) switches sign at most once, from u(t) = 1 to u(t) = −1.
        As ∫₀ᵀ u(t) dt = x₂(T) = 0 it must be that u(t) switches sign half-way, at t = T/2:

            u(t) = +1 for t < T/2,    u(t) = −1 for t > T/2.

        (By the way, this means that the constant c in p₂(t) = t + c is c = −T/2.) Then the
        speed x₂(t) is a "tent" function of time (figure omitted), and the traveled distance
        x₁(T) is the area under the "tent": x₁(T) = T²/4.
    By the way, we may also choose J(x₀, u) = ∫₀ᵀ −x₂(t) dt because that, too, equals −x₁(T).
    That choice works fine as well.
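Remark (a numerical check, not part of the original solution): the "tent" solution can be confirmed by simulating the double integrator with the bang-bang input. A short Python sketch:

# With u = +1 on [0, T/2] and u = -1 on [T/2, T], starting from x1(0) = x2(0) = 0,
# the final speed x2(T) is 0 and the traveled distance x1(T) equals T^2/4.
import numpy as np
from scipy.integrate import solve_ivp

T = 2.0
u = lambda t: 1.0 if t < T/2 else -1.0
rhs = lambda t, x: [x[1], u(t)]
sol = solve_ivp(rhs, (0.0, T), [0.0, 0.0], max_step=1e-3)
print(sol.y[0, -1], T**2/4)   # x1(T) ~ 1.0 = T^2/4
print(sol.y[1, -1])           # x2(T) ~ 0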

2.9 (a) H(x, p, u) = p(−x + u) + (x − u)², so ṗ = p − 2(x − u) (and no final condition on p).
        The u that minimizes the Hamiltonian satisfies 0 = ∂H/∂u = p − 2(x − u). So
        p∗ = 2(x∗ − u∗). Inserting this into the costate equation gives ṗ∗ = 0, hence the
        costate is constant, p∗. The system equation then becomes ẋ = −½p∗, indicating that x
        grows linearly with slope −½p∗. Given x(0) = 0, x(T) = 1 it follows that x(t) = t/T.
        Hence p∗(t) = p∗ = −2/T and u∗(t) = (t + 1)/T.
    (b) Yes, because the Hessian of H w.r.t. (x, u) is [[2, −2], [−2, 2]] ≥ 0, so H is convex
        in (x, u). (See Appendix A.7.) Also U := R is a convex set.
    (c) x∗ − u∗ = −p∗/2 = 1/T so ∫₀ᵀ (x − u)² dt = 1/T. The longer it takes the "cheaper" it
        is. This is to be expected because if for some final time T₁ < T it would be cheaper,
        then taking u = x for t > T₁ would achieve this cheaper (lower) cost also for final
        time T. (Notice that taking u = x "costs nothing".)

2.11 (a) H(p, x, u) = px(1 − u) − ln(xu).
     (b) ẋ = x(1 − u), x(0) = 1, x(1) = ½e, and ṗ = −p(1 − u) + 1/x. Setting ∂H(x, p, u)/∂u
         to zero gives 0 = −px − 1/u.
     (c) 0 = −px − 1/u gives u∗ = −1/(p∗x∗).
     (d) ṗ = −p(1 + 1/(px)) + 1/x = −p so p(t) = c e⁻ᵗ. Also, ẋ = x(1 + 1/(px)) = x + 1/p =
         x + eᵗ/c. Hence x(t) = (t/c + d) eᵗ. The conditions x(0) = 1 and x(1) = ½e determine
         d = 1, c = −2.
     (e) Since x(0) > 0 and ẋ ≥ 0 we have x(t) > 0 for all t > 0. Now
         u∗(t) = −1/(p(t)x(t)) = 1/(2 − t) > 0. So yes.

2.13 (a) Once x(t) > 0 it can only increase because a > 0, u(t) ≥ 0, so then
         ẋ(t) = a u(t) x(t) ≥ 0.
     (b) Since J(u) = −∫₀ᵀ x₂(t) dt we have H = p₁aux₁ + p₂a(1 − u)x₁ − x₂. The costate
         equations become

         ṗ₁ = −a u p₁ − a(1 − u) p₂,   p₁(T) = 0,
         ṗ₂ = 1,                       p₂(T) = 0.

         So p₂(t) = t − T. Write H as H = uax₁(p₁ − p₂) + ···, thus the optimal u only depends
         on the sign of p₁ − p₂. Now the clever trick: at the final time we have ṗ₁(T) = 0,
         and since ṗ₂ = 1 it follows that we have p₁(t) − p₂(t) < 0 near T. Therefore
         u(t) = 0 near T. Now, as in Example 2.5.5, solve the equations backwards in time:

         p₁(t) = −½a(t − T)²    ∀t ∈ [t_s, T].

         Here t_s is the switching time (where p₁(t_s) = p₂(t_s), i.e., −½a(t − T)² = t − T,
         so t_s = T − 2/a). We expect that p₁(t) < p₂(t) for t < t_s and so u(t) = 1 for
         t < t_s. Again this requires some "ad hoc" argument: since at t = t_s we have
         p₁(t) = p₂(t), we see that ṗ₁(t_s) = −a p₂(t_s) = 2 > 1 (independent of u). It shows
         that p₁ increases faster than p₂ around t = t_s, so for t < t_s but close to t_s we
         have p₁(t) < p₂(t). Then

         u∗(t) = 1 if t < T − 2/a,    u∗(t) = 0 if t > T − 2/a.

         For T = 5, a = 0.5 this gives t_s = 1, and the solutions p₁(t) and p₂(t) are as in
         the accompanying figure (omitted). (The plot of p₁(t) is deceivingly smooth. We
         already saw that ṗ₁(t) → 2 both for t ↑ t_s and t ↓ t_s. The behavior of p₁ is,
         however, quite different for t < t_s and t > t_s. One is exponential in t, the other
         is polynomial in t.)

2.15 (We use p_{1:n} to denote the first n entries of a vector p.)

     (a) f(z) = [u; M(x, u)], L(z, u) = F(x, u), K(z) = 0, with the extra final condition
         z_{n+1}(T) = c₀.
     (b) H_λ(z, p, u) = p_{1:n}ᵀ u + p_{n+1} M(x, u) + λF(x, u). So

         ṗ_{n+1} = −∂H/∂z_{n+1} = 0.

     (c) First note that the first n entries of the costate satisfy

         ṗ_{1:n} = −∂H/∂x = −∂F/∂x − p_{n+1} ∂M/∂x                                    (B.32)

         (if λ = 1). The optimal input satisfies ∂H/∂u = 0. This means

         p_{1:n} + p_{n+1} ∂M/∂u + ∂F/∂u = 0.                                          (B.33)

         Now (1.54) for μ∗ := p_{n+1} (constant) is satisfied because

         ∂(F + p_{n+1}M)/∂x − d/dt ∂(F + p_{n+1}M)/∂u
            = ∂F/∂x + p_{n+1} ∂M/∂x − d/dt(−p_{1:n})                    because of (B.33)
            = ∂F/∂x + p_{n+1} ∂M/∂x − (∂F/∂x + p_{n+1} ∂M/∂x)           because of (B.32)
            = 0.

     (d) In the abnormal case we have H₀ = p_{1:n}ᵀ u + p_{n+1} M(x, u). The costate equation
         then gives us ṗ_{1:n} = −p_{n+1} ∂M/∂x, and the optimal u makes ∂H/∂u zero, so
         p_{1:n} + p_{n+1} ∂M/∂u = 0. Then the abnormal equation (1.55) holds because
         p_{n+1} ≠ 0 (which we were allowed to assume), and

         p_{n+1}(∂M/∂x − d/dt(∂M/∂u)) = p_{n+1} ∂M/∂x + d/dt p_{1:n} = 0.

         (Actually it can be shown that p_{n+1} is indeed nonzero, for if p_{n+1} were zero
         then the minimality property of the Hamiltonian would yield p_{1:n} = 0, but
         Thm. 2.6.1 guarantees that a zero costate is impossible in the abnormal case, so
         p_{n+1} ≠ 0.)

2.17 (a) H (x, p, u) = pu + x so ṗ = −1, p (T ) = −1. Therefore p (t ) = T − 1 − t .


This costate swaps sign at t = T − 1. Since u minimizes pu + x we
have u (t ) = 0 (minimal) for all t < T − 1, and u (t ) = 1 for all t > T − 1.
(b) H = 0 if T ≥ 1, while H = T − 1 if T < 1.
    (c) J = −1/2 if T ≥ 1, while J = T²/2 − T if T < 1 (figure omitted).

(d) Every T ≥ 1 is optimal. It agrees with Thm. 2.7.1 because then H = 0.

2.19 (a) The same as the proof of Theorem 2.8.1 but with addition of the red parts (the terms
         involving K):

         J(x₀, u) − J(x₀, u∗)
           = ∫₀ᵀ L(x, u) − L(x∗, u∗) dt + K(x(T)) − K(x∗(T))
           = ∫₀ᵀ (H(x, p∗, u) − p∗ᵀẋ) − (H(x∗, p∗, u∗) − p∗ᵀẋ∗) dt + K(x(T)) − K(x∗(T))
           = ∫₀ᵀ (H(x, p∗, u) − H(x∗, p∗, u∗)) − p∗ᵀ(ẋ − ẋ∗) dt + K(x(T)) − K(x∗(T))
           ≥ ∫₀ᵀ −ṗ∗ᵀ(x − x∗) − p∗ᵀ(ẋ − ẋ∗) dt + K(x(T)) − K(x∗(T))
           = −[p∗ᵀ(t)(x(t) − x∗(t))]₀ᵀ + K(x(T)) − K(x∗(T))
           ≥ −p∗ᵀ(T)(x(T) − x∗(T)) + ∂K(x∗(T))/∂xᵀ (x(T) − x∗(T))                      (B.34)
           = 0    because p∗(T) = ∂K(x∗(T))/∂x.

         (Inequality (B.34) is because of convexity of K.)
     (b) For every constrained state entry xᵢ we trivially have that (B.34) is zero (because
         xᵢ(T) = x∗ᵢ(T) for every such entry).

2.21 Yes: U = R is a convex set; H(x, p(t), u) = p(t)u + x² + u² is a convex parabola, so
     definitely convex in (x, u) [at every t]; K(x) = 0 so a convex function. Convexity of H
     can also be concluded from the fact that its Hessian w.r.t. (x, u) is positive
     (semi-)definite:

     [[∂²H/∂x², ∂²H/∂x∂u], [∂²H/∂u∂x, ∂²H/∂u²]] = [[2, 0], [0, 2]] > 0.

Chapter 3

3.1 K = −K 0 , L = −L 0 , J = −J 0 .

3.3 (a) For V(x, t) = Q(x) the HJB equations become

        0 + min_{u∈R} [Q′(x)xu + x² + u²] = 0,    Q(x) = 2x.

        So the final condition V(x, T) = K(x) := 2x uniquely establishes Q(x)! Remarkable.
        For this Q(x) = 2x the HJB equation simplifies to

        0 + min_{u∈R} (2xu + x² + u²) = 0.

        It is indeed satisfied because 2xu + x² + u² = (x + u)² so its minimum is zero
        (attained at u = −x).
The V (x, t ) = 2x hence is our candidate value function, and u (t ) =
− x (t ) our candidate optimal control with candidate optimal cost
V (x 0 , 0) = 2x 0 = 2.
(b) They are just candidate solutions because we still need to verify
that the resulting closed loop ẋ ∗ (t ) = x ∗ (t ) u ∗ (t ) = − x 2∗ (t ) has a well
defined solution. For x (0) = 1 that is the case (see Example B.1.5):
x ∗ (t ) = 1/(t + 1). Now that x ∗ (t ) is well defined, Thm. 3.4.3 (Item 2)
says that u ∗ (t ) = − x ∗ (t ) is the optimal control and that V (x 0 , 0) =
2x 0 = 2 is the optimal cost.
(c) For x (0) = −1 the candidate optimal input makes the closed-loop sys-
tem satisfy ẋ ∗ (t ) = − x 2∗ (t ) with x ∗ (0) = −1. In Example B.1.5 we saw
that x ∗ (t ) = −1/(−t + 1) so this solution escapes at t = 1 < T = 2.
Hence the candidate u ∗ (t ) = − x ∗ (t ) is not optimal after all. (One can
show that in this case the cost is unbounded from below.)

3.5 To turn maximization into minimization we need to swap the sign of the cost,

    J_{[0,3]}(x₀, u) := x(3) + ∫₀³ (u(t) − 1) x(t) dt.

    (a) Q̇(t)x + min_{u∈[0,1]} (Q(t)ux + (u − 1)x) = 0 and Q(3)x = x.
    (b) Since x > 0 we may cancel x from the HJB equations to obtain
        Q̇(t) + min_{u∈[0,1]} (Q(t)u + u − 1) = 0, Q(3) = 1, which is

        Q̇(t) − 1 + min_{u∈[0,1]} (Q(t) + 1)u = 0,    Q(3) = 1.

        So

        u(t) = 0 if Q(t) + 1 > 0,    u(t) = 1 if Q(t) + 1 < 0.

(c) As Q(3) = 1 we have Q(t ) + 1 > 0 near the final time. So then u = 0
which turns the HJB equations into Q̇(t )−1 = 0,Q(3) = 1. Thus Q(t ) =
t − 2. This is the solution on [1, 3] for then we still have Q(t ) + 1 > 0.
On [0, 1] we have u (t ) = 1 so then the HJB equations become Q̇(t ) +
Q(t ) = 0 which, given Q(1) = −1, implies that Q(t ) = − e1−t .
(d) u (t ) = 1 on [0, 1], and u (t ) = 0 on [1, 3]. Then x (t ) satisfies ẋ (t ) =
x (t ) u (t ) which is well defined for all t ∈ [0, 3]. So the candidate opti-
mal solution is truly optimal, and the candidate value function is the
true value function.
(e) The optimal (minimal) cost is V (x 0 , 0) = Q(0)x 0 = − e x 0 , so the max-
imal satisfaction is + e x 0 .

3.7 (a) We want the final state to be as close as possible to zero. So,
whenever x (t ) is nonzero, we steer optimally fast to zero: u (t ) =
− sgn( x (t )). Once x (t ) is zero, we take u (t ) = 0.
    (b) A couple of state trajectories x(t), as a function of time and for various initial
        conditions, sweep out a shaded triangle in the (t, x)-plane (figure omitted). Clearly,
        the shaded triangle is determined by

        |x| ≤ T − t

        and for any (x, t) in this triangle the final state x(T) is zero, so then V(x, t) = 0.
        For any (x, t) above the triangle we have x(T) = x − (T − t) = x + t − T, hence
        V(x, t) = (x + t − T)². Likewise for any (x, t) below the triangle we have
        x(T) = x + (T − t) = x − t + T, hence V(x, t) = (x − t + T)². As a formula:

        V(x, t) = 0               if |x| ≤ T − t,
        V(x, t) = (x + t − T)²    if x > T − t,
        V(x, t) = (x − t + T)²    if x < t − T.

        Does it satisfy the HJB equations? The final condition V(x, T) = K(x) = x² is
        satisfied by construction. For (x, t) above the triangle, the HJB partial differential
        equation becomes

        2(x + t − T) + min_{u∈[−1,1]} (2(x + t − T)u) = 0.

        The equality holds because in this region x + t − T > 0 so the minimizer is u = −1 and
        this renders the left-hand side of the HJB equation indeed equal to zero. On the
        triangle, the HJB equation is rather trivial,

        0 + min_{u∈[−1,1]} 0 = 0.

        Below the triangle the HJB equation reads

        −2(x − t + T) + min_{u∈[−1,1]} (2(x − t + T)u) = 0,

        which, too, is correct because now x − t + T < 0 so u = +1 is the minimizer. On the
        boundary |x| = T − t of the triangle, the function V(x, t) is continuously
        differentiable, and so the HJB equations hold for all x and all t ∈ [0, T].
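Remark (a numerical check, not part of the original solution): simulating ẋ = u with the feedback u = −sgn(x) confirms that the realized cost x(T)² coincides with the value function found above. A short Python sketch with a simple Euler discretization:

# Simulate xdot = u with u = -sign(x) (and u = 0 once x is essentially zero),
# then compare the realized cost x(T)^2 with V(x0, 0).
import numpy as np

T, dt = 1.0, 1e-4

def V(x, t):
    if abs(x) <= T - t:
        return 0.0
    return (x + t - T)**2 if x > T - t else (x - t + T)**2

for x0 in [-2.0, -0.5, 0.3, 1.7]:
    x = x0
    for k in range(int(T/dt)):
        u = -np.sign(x) if abs(x) > dt else 0.0
        x += dt*u
    print(x0, x**2, V(x0, 0.0))   # realized cost matches V(x0, 0)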

3.9 We momentarily use V_x to mean ∂V(x, t)/∂x, and likewise V_t for ∂V(x, t)/∂t.

    (a) The u that minimizes V_x xu + x² + ρ²u² is u = −V_x x/(2ρ²). So the HJB equations
        (3.12) become

        V_t + V_x x(−V_x x/(2ρ²)) + x² + ρ²(V_x x)²/(4ρ⁴) = 0,    V(x, T) = 0.

        Since V(x, t) = xρG(z) with z := (t − T)x/ρ, we have

        V_t = xρG′(z)x/ρ = x²G′(z),
        V_x = ρG(z) + xρG′(z)(t − T)/ρ = ρ(G(z) + zG′(z)).

        Thus the HJB equations become

        x²G′(z) − (G(z) + zG′(z))²x²/2 + x² + x²(G(z) + zG′(z))²/4 = 0,

        together with G(0) = 0. After cancelling the common term x² we get

        G′(z) − ¼(G(z) + zG′(z))² + 1 = 0.

    (b) u = −V_x x/(2ρ²) = −(x/(2ρ))(G(z) + zG′(z)). Hence u∗(t) = −(x(t)/(2ρ))(G(z) + zG′(z))
        in which z = (t − T)x(t)/ρ.
    (c) Thus the closed loop satisfies

        ẋ∗(t) = −(1/(2ρ)) x∗(t)[G(z) + zG′(z)]    in which z = (t − T)x∗(t)/ρ.

        Recall that ρ > 0. If x(t) > 0 then z ≤ 0 for every t ∈ [0, T] so (see graphs)
        G(z) + zG′(z) ≥ 0 and, thus, ẋ∗(t) ≤ 0. Likewise, if x(t) < 0 then z ≥ 0 and
        G(z) + zG′(z) ≤ 0 and ẋ∗(t) ≥ 0. In both cases we see that |x∗(t)| decreases.
        Therefore, it does not escape on [0, T], that is, the solution x∗(t) is well defined
        for all t ∈ [0, T]. Hence V is the value function, and the optimal cost is
        V(x₀, 0) = x₀ρG(−T x₀/ρ).

3.11 (a) The HJB equations become

         (x − 1)²Ṗ(t) + min_{u∈R} [2(x − 1)P(t)(u − x) + (u − x)²] = 0,
         P(T)(x − 1)² = (1/β)(x − 1)².

         The minimizing u satisfies 2(x − 1)P(t) + 2(u − x) = 0, so u − x = −(x − 1)P. Thus
         the HJB equations become

         (x − 1)²Ṗ(t) − 2(x − 1)²P²(t) + (x − 1)²P²(t) = 0,
         P(T)(x − 1)² = (1/β)(x − 1)².

         Cancel the common factor (x − 1)² and we find the simple ODE

         Ṗ(t) = P²(t),    P(T) = 1/β.

     (b) For P(t) := 1/(β + T − t) we have P(T) = 1/β and Ṗ(t) = 1/(β + T − t)² = P²(t).
         Correct. The previous part gives
         u∗(t) = x(t) + P(t)(1 − x(t)) = x(t) + (1 − x(t))/(β + T − t). Thus the closed-loop
         system satisfies

         ẋ(t) = (1 − x(t))/(β + T − t),    x(0) = 0.

         For x∗(t) := t/(β + T) we have ẋ∗(t) = 1/(β + T) and
         (1 − x∗(t))/(β + T − t) = (1 − t/(β + T))/(β + T − t) = 1/(β + T) = ẋ∗(t). Correct.
         Consequently, u∗(t) = ẋ∗(t) + x∗(t) = 1/(β + T) + t/(β + T) = (t + 1)/(β + T).
         Correct. The optimal cost is
         (1/β)(T/(β + T) − 1)² + ∫₀ᵀ ẋ∗²(t) dt = (1/β)(−β/(β + T))² + T(1/(β + T))² = 1/(β + T).
     (c) lim_{β↓0} x(T) = T/T = 1. So it equals the desired voltage 1. This is to be expected
         because the quadratic term (x(T) − 1)²/β in the cost function blows up as β ↓ 0
         unless x(T) → 1.
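Remark (a numerical check, not part of the original solution): simulating the closed loop confirms both the trajectory x∗(t) = t/(β + T) and the optimal cost 1/(β + T). A short Python sketch for one (arbitrary) choice of β and T:

# Simulate xdot = (1 - x)/(beta + T - t), x(0) = 0, and verify trajectory and cost.
import numpy as np
from scipy.integrate import solve_ivp

beta, T = 0.5, 2.0
rhs = lambda t, x: [(1 - x[0])/(beta + T - t)]
sol = solve_ivp(rhs, (0.0, T), [0.0], dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, T, 1001)
x = sol.sol(t)[0]
print(np.max(np.abs(x - t/(beta + T))))                     # ~0
xdot = (1 - x)/(beta + T - t)
cost = (x[-1] - 1)**2/beta + np.sum((xdot[:-1]**2 + xdot[1:]**2)/2)*(t[1] - t[0])
print(cost, 1/(beta + T))                                   # both ~ 0.4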

3.13 (a) H = −pxu so ṗ = pu. We also have p(T) = 1.
         If x₀ > 0 then x(t) is positive for all t > 0, so then H is minimal for u∗(t) = 1
         near t = T (because p(t) ≈ 1 > 0 near the final time). This gives ṗ∗ = p∗. Since
         p∗(T) = 1 we get

         p∗(t) = e^{t−T}    if x > 0.

         If x₀ < 0 then x(t) < 0 for all t > 0. Then near t = T optimal is u∗(t) = 0, so
         p∗(t) is constant,

         p∗(t) = 1    if x < 0.

(b) For x 0 = 0 the solution of ẋ = xu is zero for all time (not dependent
on bounded u ). Thus every u results in the same cost J .
(c) If x ∗ (t ) > 0 then ∂V ( x ∗ (t ), t )/∂x = et −T . It agrees with the above
p ∗ (t ).
If x ∗ (t ) < 0 then ∂V ( x ∗ (t ), t )/∂x = 1. It agrees with the above p ∗ (t ).

3.15 The infinite horizon HJB equation (3.30) becomes

     min_{u∈R} [V′(x)u + x⁴ + u²] = 0.

     The minimizing u satisfies V′(x) + 2u = 0. Using this V′(x) = −2u the HJB equation
     becomes 0 = −2u² + x⁴ + u². Hence u = ∓x² and, so, V′(x) = ±2x². Clearly, V(x) = (2/3)|x|³
     is the unique solution of the HJB equation that is nonnegative and such that V(0) = 0.
     The corresponding input is u∗(t) = −½V′(x(t)) = −x²(t) if x(t) > 0, and u∗(t) = +x²(t) if
     x(t) < 0. It is a stabilizing input. Exactly as in Example 3.6.1 we have for every
     stabilizing u that

     J_{[0,∞)}(x₀, u) = ∫₀^∞ L(x(t), u(t)) dt
                      ≥ ∫₀^∞ −(∂V(x(t))/∂xᵀ) f(x(t), u(t)) dt       because of (3.30)
                      = ∫₀^∞ −V̇(x(t)) dt = V(x₀) − V(x(∞)) = V(x₀)    (since V(x(∞)) = 0).

     And since equality holds for u = u∗ (and u∗ stabilizes) we see that u∗ is the solution
     we seek.

3.17 (a) It is a variation of Example 3.6.1. The infinite horizon HJB equation (3.30) becomes

         min_{u∈R} [V′(x)(x + u) + u⁴] = 0.

         The minimizing u satisfies V′ + 4u³ = 0. Using this V′ = −4u³ the HJB equation (3.30)
         becomes 0 = −4u³(x + u) + u⁴ = −4u³x − 3u⁴. Hence, either 0 = u = V′ or u = −(4/3)x.
         For the latter, V′(x) = 4((4/3)x)³. So V(x) = (3/4)((4/3)x)⁴ is a possible solution.
         Notice that u∗ := −(4/3)x stabilizes because then ẋ = x + u = −(1/3)x. Now, exactly
         as in Example 3.6.1, we have for every stabilizing u that

         J_{[0,∞)}(x₀, u) = ∫₀^∞ L(x(t), u(t)) dt
                          ≥ ∫₀^∞ −(∂V(x(t))/∂xᵀ) f(x(t), u(t)) dt       because of (3.30)
                          = ∫₀^∞ −V̇(x(t)) dt = V(x₀) − V(x(∞)) = V(x₀)    (since V(x(∞)) = 0).

         And since equality holds for u = u∗ (and u∗ stabilizes) we see that u∗ is the
         solution we seek.
     (b) Clearly u∗(t) = 0 minimizes the cost. It renders the closed loop unstable:
         ẋ(t) = x(t), x(0) = 1 (so x(t) = eᵗ).

3.19 (a) Practically the same as Exercise 3.14(a): let t₁, δ ∈ R. If x(t) satisfies

         ẋ(t) = f(x(t), u(t)),    x(t₁) = x,

         then, by time-invariance, the shifted x̃(t) := x(t − δ), ũ(t) := u(t − δ) satisfy the
         same differential equation but with shifted-time initial condition

         x̃˙(t) = f(x̃(t), ũ(t)),    x̃(t₁ + δ) = x.

         Therefore

         J_{[t₁,T]}(x, u) = J_{[t₁+δ,T+δ]}(x, ũ).

         Hence, any cost that can be achieved over [t₁, T] starting at x(t₁) = x, can also be
         achieved over [t₁ + δ, T′] starting at x(t₁ + δ) = x for some T′ (namely T′ = T + δ).
         So also the optimal cost-to-go (where we also optimize over the final time T)
         starting at x does not depend on the initial time.
     (b) Notice that we cannot use (3.30) because we are given the value function V and not a
         solution V of (3.30). On the one hand we have by the chain rule that
         dV(x∗(t))/dt = (∂V(x∗(t))/∂xᵀ) f(x∗(t), u∗(t)), and on the other hand we have exactly
         as in § B.5 that dV(x∗(t))/dt = −L(x∗(t), u∗(t)).
     (c) The second identity in the displayed equation in (b) immediately yields
         H(x∗(t), p∗(t), u∗(t)) = 0 for p∗(t) = ∂V(x∗(t))/∂x. It is in line with Theorem 2.7.1
         (at the final time) and the constancy of Hamiltonians (Theorem 2.5.6).

Chapter 4

4.1 So A = 3, B = 2, Q = 4, R = 1, S = 0.

    (a) H = [[3, −4], [−4, −3]].
    (b) Now

        (x∗(t), p∗(t)) = (1/5) [[4e^{5t} + e^{−5t}, −2e^{5t} + 2e^{−5t}],
                                [−2e^{5t} + 2e^{−5t}, e^{5t} + 4e^{−5t}]] (x₀, p∗(0)).

        Given that p∗(T) = 0 we can determine p∗(0) (as a function of x₀):

        p∗(0) = ((2e^{5T} − 2e^{−5T})/(e^{5T} + 4e^{−5T})) x₀.

        This determines p∗(0) and, therefore, determines x∗, p∗ for all time. Finally,
        u∗ = −R⁻¹Bᵀp∗ = −2p∗. So also u∗ is determined for all time. The optimal cost is

        p∗(0)x₀ = ((2e^{5T} − 2e^{−5T})/(e^{5T} + 4e^{−5T})) x₀².
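Remark (a numerical cross-check, not part of the original solution): the same coefficient p∗(0)/x₀ is obtained by integrating the Riccati differential equation backwards from P(T) = S = 0; the optimal cost coefficient is P(0). A short Python sketch:

# Integrate the RDE backwards for A=3, B=2, Q=4, R=1, S=0 and compare P(0) with the
# closed-form coefficient (2e^{5T} - 2e^{-5T})/(e^{5T} + 4e^{-5T}).
import numpy as np
from scipy.integrate import solve_ivp

A, B, Q, R, T = 3.0, 2.0, 4.0, 1.0, 1.0
rde = lambda t, P: [-(2*A*P[0] - (B*P[0])**2/R + Q)]   # Pdot = -(A'P + PA - PBR^{-1}B'P + Q)
sol = solve_ivp(rde, (T, 0.0), [0.0], rtol=1e-10, atol=1e-12)
P0 = sol.y[0, -1]
closed_form = (2*np.exp(5*T) - 2*np.exp(-5*T))/(np.exp(5*T) + 4*np.exp(-5*T))
print(P0, closed_form)   # both ~ 2.0 for T = 1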

4.3 (a) H(x, 2p, u) = 2pᵀ(Ax + Bu) + xᵀQx + uᵀu. So the Hessian of H w.r.t. (x, u) is
        [[2Q, 0], [0, 2I]] ≥ 0.
    (b)
        ẋ  = Ax + B(u∗ + v),   x(0) = x₀,
        ẋ∗ = Ax∗ + Bu∗,        x∗(0) = x₀,
        ⟹  ż = Az + Bv,        z(0) = 0.

        J  = ∫₀ᵀ xᵀQx + (u∗ + v)ᵀ(u∗ + v) dt,
        J∗ = ∫₀ᵀ x∗ᵀQx∗ + u∗ᵀu∗ dt,
        ⟹  J − J∗ = ∫₀ᵀ (x∗ + z)ᵀQ(x∗ + z) − x∗ᵀQx∗ + vᵀv + 2u∗ᵀv dt
                  = ∫₀ᵀ (zᵀQz + vᵀv) + 2zᵀQx∗ + 2u∗ᵀv dt.

        d/dt(p∗ᵀz) = ṗ∗ᵀz + p∗ᵀż
                   = (−Qx∗ − Aᵀp∗)ᵀz + p∗ᵀ(Az + Bv)
                   = −zᵀQx∗ + p∗ᵀBv
                   = −zᵀQx∗ − u∗ᵀv.

        So ∫₀ᵀ zᵀ(t)Qx∗(t) + u∗ᵀ(t)v(t) dt = [−p∗ᵀ(t)z(t)]₀ᵀ = 0 and, therefore,
        J − J∗ = ∫₀ᵀ zᵀ(t)Qz(t) + vᵀ(t)v(t) dt ≥ 0.

4.5 The solution x of ẋ = Ax + Bu is linear in both x₀ and u, so we can write x = L(x₀, u)
    for some linear mapping L.

    (a) J_{[t,T]}(λx, λu) = ∫ₜᵀ (λL(x, u))ᵀQ(λL(x, u)) + λ²uᵀRu dt = λ²J_{[t,T]}(x, u).
        As for the second equality, use that J_{[t,T]}(x + z, u + w) equals

        ∫ₜᵀ (L(x, u) + L(z, w))ᵀQ(L(x, u) + L(z, w)) + (uᵀ + wᵀ)R(u + w) dt

        and that J_{[t,T]}(x − z, u − w) equals

        ∫ₜᵀ (L(x, u) − L(z, w))ᵀQ(L(x, u) − L(z, w)) + (uᵀ − wᵀ)R(u − w) dt.

        The sum of these two cancels all cross terms and leaves

        ∫ₜᵀ 2L(x, u)ᵀQL(x, u) + 2L(z, w)ᵀQL(z, w) + 2uᵀRu + 2wᵀRw dt.

    (b) V(λx, t) = min_u J_{[t,T]}(λx, u) = min_u J_{[t,T]}(λx, λu/λ) and because of (a) this
        equals λ² min_u J_{[t,T]}(x, u/λ) = λ²V(x, t).
        We know that J_{[t,T]}(λx, λu) = λ²J_{[t,T]}(x, u). Since λ is constant we see that u
        minimizes J_{[t,T]}(x, u) iff u minimizes J_{[t,T]}(λx, λu), i.e., iff w := λu
        minimizes J_{[t,T]}(λx, w).

(c) Let u x , w z be the minimizers of the right-hand side of (4.43).


Then (4.43) by definition of value function becomes

J [t ,T ] (x + z, u x + w z ) + J [t ,T ] (x − z, u x − w z ) = 2 V (x, t ) + 2 V (z, t ).

The result follows because, again by definition of value function,


V (x + z, t ) ≤ J [t ,T ] (x + z, u x + w z ) and V (x − z, t ) ≤ J [t ,T ] (x − z, u x − w z ).
(d) Essentially the same: define û = u + w , ŵ = u − w . Then (4.43) becomes

J [t ,T ] (x+z, û )+J [t ,T ] (x−z, ŵ ) = 2J [t ,T ] (x, ( û + ŵ )/2)+2J [t ,T ] (z, ( û − ŵ )/2).

Minimizing the left-hand side over all û , ŵ by definition of value


function gives

V (x +z, t )+V (x −z, t ) = 2J [t ,T ] (x, ( û ∗ + ŵ ∗ )/2)+2J [t ,T ] (z, ( û ∗ − ŵ ∗ )/2),

and the right-hand side by definition is at most 2 V (x, t ) + 2 V (z, t ).


    (e) We saw earlier that (4.43) for u = u_x, w = w_z says

J [t ,T ] (x + z, u x + w z ) + J [t ,T ] (x − z, u x − w z ) = 2 V (x, t ) + 2 V (z, t ).

The previous two parts show that the above right-hand side equals
V (x + z, t ) + V (x − z, t ).
(f ) V is the minimal cost, so the left-hand side of the equality of the pre-
vious part is nonnegative, while the right-hand side is non-positive.
So they must both be zero! Hence J [t ,T ] (x + z, u x + w z ) = V (x + z, t ). It
shows that u x + w z is optimal for x +z. Scaling z with a factor λ shows
the result.
(g) Trivial: if u ∗ is linear in x (t ) then so is u ∗ (t ) at every t .
(h) Follows from (a) and the fact that V (x +λz, t ) = J [t ,T ] (x +λz, u x +λ w z ).

4.7 (a) The RDE is Ṗ = −2P + P² − 3, P(T) = s. We use Exercise 4.6: we have
        Ṗ = (P + 1)(P − 3), so G := 1/(P + 1) satisfies Ġ = 4G − 1. Hence
        G(t) = 1/4 + c e^{4t} for some c. Exercise 4.6 now says that

        P(t) = −1 + 1/(c e^{4t} + 1/4) = (3 − c e^{4t})/(1 + c e^{4t})

        (with the constant c renamed). We need P(T) = s, so c = e^{−4T}(3 − s)/(1 + s). Write
        c as c = e^{−4T}d; then the result follows.
(b) Since Ṗ = (P − 3)(P + 1) it is immediate that if P (t ) > 3 then Ṗ (t ) > 0
so P (t ) increases. Conversely, if −1 < P (t ) < 3 then Ṗ (t ) < 0 so P (t )
decreases.
(c) See the first sentence of the proof of Theorem 4.5.2.

(d) The larger s is the higher the penalty on the final state, so for both
plots a “small” final value x (T ) corresponds to a “large” s. A bit
vague: if s is small and x is close to zero then the dominant term
in the cost function is u 2 . So (as long as x does not change too much
over the rest of time) it is “optimal” to take u small, but then ẋ ≈ x
so x starts to increase in magnitude.
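Remark (a numerical check, not part of the original solution): the closed-form solution of part (a) can be compared directly with a backward integration of the RDE. A short Python sketch for one arbitrary choice of T and s:

# Integrate Pdot = -2P + P^2 - 3 backwards from P(T) = s and compare with the
# closed form P(t) = (3 - d e^{4(t-T)})/(1 + d e^{4(t-T)}), d = (3-s)/(1+s).
import numpy as np
from scipy.integrate import solve_ivp

T, s = 1.0, 0.5
sol = solve_ivp(lambda t, P: [-2*P[0] + P[0]**2 - 3], (T, 0.0), [s],
                dense_output=True, rtol=1e-10, atol=1e-12)
t = np.linspace(0.0, T, 101)
d = (3 - s)/(1 + s)
closed_form = (3 - d*np.exp(4*(t - T)))/(1 + d*np.exp(4*(t - T)))
print(np.max(np.abs(sol.sol(t)[0] - closed_form)))   # ~0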

4.9 (a) We have B = R = I₂ so it is clear that u∗ = −R⁻¹BᵀPx = −Px.
    (b) The matrix E = [[1, −1], [1, 1]] achieves Q̃ = [[18, 0], [0, 10]],
        Ã = [[0, 0], [0, −2]], B̃ = ½[[1, 1], [−1, 1]]. Then B̃B̃ᵀ = ½I₂, so the ARE becomes

        [[0, 0], [0, −2]]P̃ + P̃[[0, 0], [0, −2]] − ½P̃² + [[18, 0], [0, 10]] = 0.

        Clearly a diagonal P̃ suffices. It is easy to see that P̃ = [[6, 0], [0, 2]] does the
        job, and since P̃ ≥ 0 it is the solution we need. Then
        P = E^{−T}P̃E^{−1} = [[2, 1], [1, 2]].
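Remark (a numerical check, not part of the original solution): the claim of part (b) is easy to verify with a few matrix products, using the transformed matrices as reconstructed above. A short Python sketch:

# ARE residual in the transformed coordinates and transformation back to P
# (matrices as reconstructed above; adapt if the exercise data differ).
import numpy as np

E     = np.array([[1.0, -1.0], [1.0, 1.0]])
A_til = np.diag([0.0, -2.0])
BBt   = 0.5*np.eye(2)                   # B_tilde @ B_tilde.T
Q_til = np.diag([18.0, 10.0])
P_til = np.diag([6.0, 2.0])

print(A_til.T @ P_til + P_til @ A_til - P_til @ BBt @ P_til + Q_til)   # zero matrix

P = np.linalg.inv(E).T @ P_til @ np.linalg.inv(E)
print(P)                                # [[2, 1], [1, 2]]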

4.11 (a) ż = −α e^{−αt}x + e^{−αt}ẋ = −αz + e^{−αt}(Ax + Bu) = (−αI + A)z + Bv, z(0) = x₀,
         and J = ∫₀ᵀ zᵀQz + vᵀRv dt.
     (b) So α = 1 and ż = v and, hence, A = 0, Q = R = B = 1: Ṗ = P² − 1 and P(T) = 0.
         Example 4.5.1 says P(t) = tanh(T − t). Now

         u∗(t) = e^{+αt}v∗(t) = −e^{+αt}R⁻¹BᵀP(t)z(t) = −P(t)x(t).
4.13 (a) Note that PB for B = [0; 1] is the 2nd column of P, so PB = (1/3)[1; √2]. Then

         AᵀP + PA − (PB)R⁻¹(PB)ᵀ + Q
           = (1/3)([[0, −1], [1, 0]][[2√2, 1], [1, √2]] + [[2√2, 1], [1, √2]][[0, 1], [−1, 0]])
             − (1/3)[1; √2] · 3 · (1/3)[1, √2] + [[1, 0], [0, 0]]
           = (1/3)[[−2, √2], [√2, 2]] − (1/3)[[1, √2], [√2, 2]] + [[1, 0], [0, 0]]
           = [[−1, 0], [0, 0]] + [[1, 0], [0, 0]] = [[0, 0], [0, 0]].

     (b) A − BR⁻¹BᵀP = [[0, 1], [−1, 0]] − [0; 1] · 3 · (1/3)[1, √2]
                     = [[0, 1], [−1, 0]] − [[0, 0], [1, √2]] = [[0, 1], [−2, −√2]].
         The standard stability test uses eigenvalues. The eigenvalues are the zeros λ of
         det(λI − (A − BR⁻¹BᵀP)) = det [[λ, −1], [2, λ + √2]] = λ(λ + √2) + 2. So
         λ₁,₂ = −½√2 ± √(3/2) i. They have negative real part, hence the closed loop is
         asymptotically stable.
         (An easier test for stability is to verify that V(x) := xᵀPx is a strong Lyapunov
         function for the closed-loop system. Indeed P > 0, and
         V̇(x) = −xᵀ(Q + PBR⁻¹BᵀP)x and −(Q + PBR⁻¹BᵀP) < 0.)
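Remark (a numerical cross-check, not part of the original solution): the stabilizing ARE solution can also be computed with scipy. The data A, B, Q, R below are the ones implicit in the computation above; if the exercise uses different data, adapt accordingly. A short Python sketch:

# Cross-check with scipy's ARE solver (data inferred from the computation above).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[1.0/3.0]])

P = solve_continuous_are(A, B, Q, R)
print(P)                                                        # ~ (1/3)[[2*sqrt(2), 1], [1, sqrt(2)]]
print(np.array([[2*np.sqrt(2), 1.0], [1.0, np.sqrt(2)]])/3)
print(np.linalg.eigvals(A - B @ np.linalg.inv(R) @ B.T @ P))    # ~ -0.707 +/- 1.22i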

4.15 (a) S = 0, Q = 4, R = 1, A = 3, B = 2. The ARE is −4P² + 6P + 4 = 0. The solutions P of
         the ARE are P = (−6 ± √(6² + 4³))/(−8), that is, P = −1/2 and P = 2. So we need P = 2.
     (b) (By the way, Lemma 4.5.6 guarantees that A − BR⁻¹BᵀP is asymptotically stable because
         (A, B) is stabilizable and (Q, A) detectable.) We have A − BR⁻¹BᵀP = 3 − 4 × 2 =
         3 − 8 = −5. Asymptotically stable indeed.
     (c) F = −R⁻¹BᵀP = −4.
     (d) Px₀² = 2x₀².
     (e) The eigenvalues must be ±5 because A − BR⁻¹BᵀP = −5.
     (f) H = [[3, −4], [−4, −3]], P = 2. Then H(1, 2)ᵀ = (−5, −10)ᵀ = −5·(1, 2)ᵀ.
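Remark (a numerical check, not part of the original solution): the scalar ARE solution P = 2, the closed-loop pole −5 and the gain F = −4 can be reproduced with scipy. A short Python sketch:

# Scalar LQ data A=3, B=2, Q=4, R=1 as in the exercise.
import numpy as np
from scipy.linalg import solve_continuous_are

A, B, Q, R = 3.0, 2.0, 4.0, 1.0
P = solve_continuous_are(np.array([[A]]), np.array([[B]]), np.array([[Q]]), np.array([[R]]))
print(P)                      # [[2.0]], the stabilizing solution of -4P^2 + 6P + 4 = 0
print(A - B**2/R*P[0, 0])     # closed-loop pole -5
print(-B/R*P[0, 0])           # feedback gain F = -4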

4.17 (a) A needs to be asymptotically stable (only then is (A, B ) stabilizable,


and as a result (Q, A) is detectable).
(b) P A + A T P + Q = 0.
(c) That P A + A T P +Q = 0 has a unique positive semi-definite solution P
if A is asymptotically stable and Q ≥ 0. That is part 4 of Thm. B.5.2.

4.19 (a) Swapping n rows means the determinant gains a factor (−1)n . Multi-
plying n rows with −1 means another (−1)n . Hence, in total a factor
(−1)2n = 1. So the sign of determinant does not change.
    (b) Let Z(λ) be the matrix of (a). Clearly, (Z(−λ))ᵀ = Z(λ). Hence
        r(λ) = det(Z(λ)) = det((Z(−λ))ᵀ) = r(−λ).
    (c) For every zero λ ≠ 0 also −λ is a zero. Suppose it has 2m nonzero zeros λ₁, ..., λ_{2m}.
        Then r(λ) = cλ^{2n−2m} ∏_{i=1}^{m} (λ − λᵢ)(λ + λᵢ) = c(λ²)^{n−m} ∏_{i=1}^{m} (λ² − λᵢ²).
        It is a function of λ².

Appendix B

B.1 (a) V̇(x̄) = (∂V(x̄)/∂xᵀ) f(x̄) = (∂V(x̄)/∂xᵀ) 0 = 0.
(b) Let x̄ 1 , x̄ 2 be two equilibria. Global asymptotic stability of x̄ 1 means
that also x (t ; x̄ 2 ) would have to converge to x̄ 1 , but it does not
because, by definition of equilibrium, we have x (t ; x̄ 2 ) = x̄ 2 for all t .
(c) x̄ is an equilibrium iff A x̄ = 0. If A is nonsingular then x̄ = A −1 0 = 0.
If A is singular then any element in the null space of A is an equilib-
rium (and the null space then has infinitely many elements).
B.3 (a) ẋ = (a − k)x, k̇ = x², with x(0) = x₀, k(0) = 0. It is an equilibrium iff
        (x̄, k̄) = (0, k̄) with k̄ free to choose.
(b) V̇ (x, k) = 2x(a − k)x + 2(k − a)x 2 = 0. This V is C 1 , is positive definite
relative to (x, k) = (0, a). So V is a Lyapunov function (on Ω = R2 ) for
equilibrium (0, a).

(c) Since V̇ (x, k) = 0 we have that x 2 +( k − a)2 is constant. Hence ( k (t )−


a)2 = C − x 2 (t ) ≤ C , which implies k (t ) is bounded. (Notice that C =
x 02 + a 2 .)
    (d) Math question: since k̇(t) ≥ 0 we have that k(t) is (besides bounded) also
        nondecreasing. This shows that k(t) converges as t → ∞. Since k(t) converges and
        k̇ = x² we would expect x(t) → 0 as t → ∞. Better: since k̇ = x² we have that
        k̇(t) + (k(t) − a)² = V(x(t), k(t)), which is constant. So as k converges, also k̇
        converges. And then k̇ obviously converges to 0. Since k̇ = x² we thus conclude that
        x converges to zero as t → ∞.
    (e) The positive solution of x₀² + a² = (k − a)² is k = a + √(a² + x₀²).

B.5 (a) If α = 0 then x̄ = 0, and if α = 0 then every x̄ is an equilibrium. In any


event, the linearization is δ̇x = Aδx with A = ∂ f (x̄)/∂x = 3a x̄ 2 which
is zero, so δ̇x = 0.
(b)

If a < 0 then the graph shows that x (t ) > 0 implies ẋ (t ) < 0, so


x (t ) then decreases (in the direction of 0). Likewise, if x (t ) < 0 then
ẋ (t ) > 0 so x (t ) then increases (again in the direction of 0). It keeps
on moving until it reaches 0 (in the limit). It is asymptotically stable.
If a = 0 then the graph obviously shows that ẋ (t ) = 0. Hence x (t ) is
constant. It does not converge to 0 if x 0 = 0 no matter how close x 0
is to zero.
If a > 0 then the graph shows that x (t ) > 0 implies ẋ (t ) > 0, so x (t )
increases (further away from 0). Likewise, x (t ) < 0 then ẋ (t ) < 0 so
x (t ) decreases (again further away from 0). The system is unstable.
(c) If α = 0 then every x̄ is an equilibrium. Take V (x) = (x − x̄)2 . It is
C 1 and positive definite (relative to x̄) and V̇ (x) = 0. So this V is a
Lyapunov function and, consequently, every x̄ ∈ R is a stable equilib-
rium.
    (d) If α < 0 then x̄ = 0 is asymptotically stable because V(x) = x² is a strong Lyapunov
        function: it is C¹, it is positive definite, and V̇(x) = 2x f(x) = 2αx⁴ < 0 for all
        x ≠ 0.

B.7 Let Ω = R².

    (a) V(x) > 0 (relative to x̄ = (0, 0)) and it is C¹, and
        V̇(x) = 4x₁³(x₂ − x₁) + 4x₂(−x₁³) = −4x₁⁴. It is ≤ 0, so V is a Lyapunov function (on
        whatever neighborhood of x̄).
    (b) It is not a strong Lyapunov function because V̇(x) = 0 for all x = (0, x₂).
    (c) According to LaSalle there is a closed, bounded invariant neighborhood K of x̄, and
        for every x₀ ∈ K the state x(t) for t → ∞ converges to

        G = {x ∈ K | V̇(x(t; x)) = 0} = {x ∈ K | x₁(t; x) = 0, ẋ(t; x) = f(x(t; x))}.

        For x₁(t) = 0 the system dynamics ẋ(t) = f(x(t)) become 0 = x₂(t), ẋ₂(t) = 0, which
        means x₂(t) = 0. Hence G = {(0, 0)} and we therefore have asymptotic stability.

B.9 (a) We need to solve 0 = x 1 (−1 + x 2 ), 0 = x 2 (1 − x 1 ). The first equation


implies x 1 = 0 or x 2 = 1. If x 1 = 0 then the second equation holds iff
x 2 = 0. If x 2 = 1 then the second equation holds iff x 1 = 1. So (0, 0)
and (1, 1) are the only two equilibria.
    (b) The Jacobian (for arbitrary x) is

        [[−1 + x₂, x₁], [−x₂, 1 − x₁]].

        At (0, 0) and (1, 1) they respectively are

        [[−1, 0], [0, 1]]    and    [[0, 1], [−1, 0]].

The first one is unstable (eigenvalue +1) so also the nonlinear system
is unstable at (0, 0). The second one has imaginary eigenvalues only,
so this says nothing about the stability of the nonlinear system.
(c) The function y − ln(y) − 1 ≥ 0 for all y ≥ 0 (proof: its derivative is zero
at y = 1 and its second derivative is > 0 so the function is minimal at
y = 1 and then y −ln(y)−1 = 0 ≥ 0). Likewise V ( x 1 , x 2 ) = ( x 1 +ln( x 1 )−
1) + ( x 2 − ln( x 2 ) − 1) is nonnegative and is minimal at (x 1 , x 2 ) = (1, 1)
(where V = 0). So V is positive definite relative to (1, 1). Clearly it is
also C 1 for all x 1 , x 2 > 0. (Yes, our state space is x 1 , x 2 > 0.) Remains
to analyze V̇ (x):

V̇ (x) = (1 − 1/x 1 )x 1 (−1 + x 2 ) + (1 − 1/x 2 )x 2 (1 − x 1 )


= 0.

So V (x) is preserved over time. Hence (1, 1) is stable, but not asymp-
totically stable.

B.11 (a) The first equation says x 1 = 0 or x 1 = 1 or x 1 = 1/2. The second equa-
tion says x 2 = 0. So three equilibria: (0, 0), (1, 0), and (1/2, 0).
    (b) The Jacobian (for general x) is

        [[−2(x₁ − 1)(2x₁ − 1) − 2x₁(2x₁ − 1) − 2x₁(x₁ − 1), 0], [0, −2]].

        At (0, 0), (1, 0), (1/2, 0) this becomes

        [[−2, 0], [0, −2]],    [[−2, 0], [0, −2]],    [[1/2, 0], [0, −2]].

        The first two have stable eigenvalues only (so the nonlinear system is asymptotically
        stable at (0, 0) and (1, 0)).
    (c) The third has an unstable eigenvalue (1/2 > 0) so the nonlinear system at (1/2, 0) is
        unstable.

B.13 (a) Since the model does not include damping we expect the kinetic energy to be constant,
         and V(ω) := ½(I₁ω₁² + I₂ω₂² + I₃ω₃²) clearly is positive definite, and is C¹. Now

         V̇(ω) = ω₁(I₂ − I₃)ω₂ω₃ + ω₂(I₃ − I₁)ω₁ω₃ + ω₃(I₁ − I₂)ω₁ω₂
               = ω₁ω₂ω₃[(I₂ − I₃) + (I₃ − I₁) + (I₁ − I₂)] = 0.

         So V is a Lyapunov function: the origin is stable.
     (b) In fact the origin is not asymptotically stable because the above V is constant over
         time.
     (c) For every ω̄₁, ω̄₂, ω̄₃ ∈ R the three points (ω̄₁, 0, 0), (0, ω̄₂, 0), (0, 0, ω̄₃) are
         equilibria.
     (d) For these three types of equilibria the Jacobians respectively are

         [[0, 0, 0], [0, 0, ((I₃−I₁)/I₂)ω̄₁], [0, ((I₁−I₂)/I₃)ω̄₁, 0]],
         [[0, 0, ((I₂−I₃)/I₁)ω̄₂], [0, 0, 0], [((I₁−I₂)/I₃)ω̄₂, 0, 0]],
         [[0, ((I₂−I₃)/I₁)ω̄₃, 0], [((I₃−I₁)/I₂)ω̄₃, 0, 0], [0, 0, 0]].

     (e) From the inequality 0 < I₁ < I₂ < I₃ it follows that the Jacobian is of the form
         (with ω̄₂ ≠ 0):

         [[0, 0, a ω̄₂], [0, 0, 0], [b ω̄₂, 0, 0]]

         for some a, b < 0. Its characteristic polynomial is λ³ − λ ab ω̄₂². Since ab ω̄₂² > 0
         we have an unstable real eigenvalue: λ₁ = √(ab ω̄₂²) > 0. Conclusion: also the
         nonlinear system is unstable.

     (f) In the first part we already showed that V(ω) := ½(I₁ω₁² + I₂ω₂² + I₃ω₃²) is constant
         over time. Likewise we have for W(ω) defined as

         W(ω) = I₁²ω₁² + I₂²ω₂² + I₃²ω₃²

         that

         Ẇ(ω) = (I₁(I₂ − I₃) + I₂(I₃ − I₁) + I₃(I₁ − I₂)) 2ω₁ω₂ω₃ = 0.

         Verifying stability is very technical. Consider an equilibrium x̄ := (ω̄₁, 0, 0), and
         take an initial state x₀ ∈ B(x̄, δ), that is, x₀ = (ω̄₁ + ω₁(0), ω₂(0), ω₃(0)) with

         ω₁²(0) + ω₂²(0) + ω₃²(0) < δ².

         Since both V(x(t; x₀)) and W(x(t; x₀)) are constant (over time) also

         g(ω₂, ω₃) := W(x) − 2I₁V(x) = (I₂ − I₁)I₂ω₂² + (I₃ − I₁)I₃ω₃²

         is constant over time. (Notice that both (I₂ − I₁)I₂ and (I₃ − I₁)I₃ are positive.)
         Let C = min((I₂ − I₁)I₂, (I₃ − I₁)I₃) > 0. Then

         ω₂²(t) + ω₃²(t) ≤ g(ω₂(t), ω₃(t))/C = g(ω₂(0), ω₃(0))/C ≤ Dδ²    ∀t,

         for some constant D (not depending on δ). Furthermore, since also V(x(t)) is constant
         over time, we infer that |ω₁(t) − ω̄₁|² is ≤ Eδ² for all time for some constant E
         (not depending on δ). Given ε > 0 we can thus choose δ > 0 so small that
         ‖(ω₁(t) − ω̄₁, ω₂(t), ω₃(t))‖ < ε for all t > 0 whenever x₀ ∈ B(x̄, δ). So the
         equilibrium x̄ := (ω̄₁, 0, 0) is stable. It is not asymptotically stable because the
         kinetic energy V(x(t)) is constant over time, so the kinetic energy does not converge
         to zero as t → ∞.
         For the other equilibrium, (0, 0, ω̄₃), a similar argument works.

B.15 Almost by definition: if y ∈ O (x 0 ) then y = x (t 1 ; x 0 ) for some t 1 . Then


x (t ; y) = x (t + t1 ; x0 ) ∈ O (x0 ).

B.17 For γ > 0 it is essentially the same as Example B.4.6. Anyway, for r := x₁² + x₂² one has
     ṙ = 2x₁(x₂ + x₁(γ − x₁² − x₂²)) + 2x₂(−x₁ + x₂(γ − x₁² − x₂²)) = 2(x₁² + x₂²)(γ − x₁² − x₂²)
     = 2r(γ − r). If γ ≤ 0 then ṙ ≤ −2r², so asymptotically stable (take, e.g., Lyapunov
     function V(r) = r²). If γ > 0 then the linearization around r = 0 is ṙ = (2γ)r, which is
     unstable (with eigenvalue 2γ > 0). So also the nonlinear system is unstable in that case.
 
B.19 (a) P = [[1, 1], [1, 2]].

(b) P > 0 because p 11 = 1 > 0 and det P = 1 > 0. Q > 0 because q 11 =


4 > 0 and detQ = 4 > 0, so Thm. B.5.2 guarantees that ẋ (t ) = A x (t ) is
asymptotically stable.

B.21 This is a linear system

     ẋ(t) = [[1, 2], [−α, 1 − α]] x(t).

     Method 1: the characteristic polynomial of A is

     det(λI − A) = λ² + (α − 2)λ + 1 + α.

     The Routh-Hurwitz test says that a degree-two polynomial is asymptotically stable iff all
     its coefficients have the same sign, so asymptotically stable iff α > 2. Ready.
     Method 2: you might have forgotten about Routh-Hurwitz. Then compute its zeros (roots):

     λ₁,₂ = (2 − α ± √((2 − α)² − 4(1 + α)))/2.

     The sum of the λ₁,₂ is 2 − α. Hence for stability we need α > 2. If α > 2 then
     (2 − α)² − 4(α + 1) < (2 − α)², so √((α − 2)² − 4(1 + α)) < |2 − α|. Then
     2 − α ± √((2 − α)² − 4(1 + α)) both have negative real part.
     Method 3 (really laborious): it is asymptotically stable iff AᵀP + PA = −I has a unique
     symmetric solution P, and P > 0. Write P as P = [[p, q], [q, r]]. The Lyapunov equation
     becomes

     [[1, −α], [2, 1 − α]][[p, q], [q, r]] + [[p, q], [q, r]][[1, 2], [−α, 1 − α]] = [[−1, 0], [0, −1]].

     The three equations are

     2(p − αq) = −1,
     2p + (2 − α)q − αr = 0,
     2(2q + (1 − α)r) = −1.

     This turns out to give (yes, tricky):

     p = (2α² − α + 2)/(2(α² − α − 2)),   q = (3α − 2)/(2(α² − α − 2)),   r = (6 + α)/(2(α² − α − 2)).

     The P = [[p, q], [q, r]] needs to exist, so we need α ≠ −1 and α ≠ 2. As 2α² − α + 2 > 0
     for all α, we have p₁₁ > 0 iff α² − α − 2 > 0. This is the case iff α > 2 or α < −1. The
     determinant of P is (α + 1)(α² + 4)/(2(α² − α − 2)²). It is positive iff α > −1. This
     combined with (α < −1 or α > 2) shows that P exists and is unique with P > 0 iff α > 2.
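Remark (a numerical check, not part of the original solution): the stability boundary α = 2 can be confirmed by simply scanning the eigenvalues of A as a function of α. A short Python sketch:

# Both eigenvalues of A(alpha) have negative real part exactly when alpha > 2.
import numpy as np

for alpha in [-2.0, 0.0, 1.9, 2.0, 2.1, 5.0]:
    A = np.array([[1.0, 2.0], [-alpha, 1.0 - alpha]])
    eig = np.linalg.eigvals(A)
    print(alpha, eig, np.all(eig.real < 0))   # True only for alpha > 2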

B.23 (a) Both matrices have characteristic polynomial det(λI − A) = (λ + 1)² + π²/2². Its
         zeros are −1 ± iπ/2.
     (b) Trivial.
     (c) Follows from (b): x(1) = e⁻¹[[0, −1/3], [3, 0]] x(0) and
         x(2) = e⁻¹[[0, −3], [1/3, 0]] x(1), so x(2) = e⁻²[[−9, 0], [0, −1/9]] x(0). Et cetera.
         Since 3/e > 1 we have lim_{k→∞} ‖x(2k)‖ = ∞ whenever x₁(0) ≠ 0.
     (d) Per quadrant follow the blue or red phase portraits (alternating), see Fig. B.14.

B.25 So Q := Aᵀ + A < 0. Then V(x) := xᵀx satisfies V̇(x) = xᵀQx < 0, so yes.
     Alternative proof: if Av = λv and v ≠ 0 then v*(A + Aᵀ)v < 0 but
     v*(A + Aᵀ)v = (2 Re(λ))‖v‖². Hence Re(λ) < 0.
Bibliography

M. Athans and P.L. Falb. Optimal Control: An Introduction to the Theory and Its
Applications. McGraw-Hill, New York, 1966.

R.W. Brockett. Finite Dimensional Linear Systems. Wiley, New York, 1970.

A.E. Bryson and Y.C. Ho. Applied Optimal Control: Optimization, Estimation and
Control. Taylor & Francis Group, New York, 1975.

H.K. Khalil. Nonlinear Systems. Macmillan Publisher Company, New York, 2nd
edition, 1996.

H. Kwakernaak and R. Sivan. Linear Optimal Control Systems. Wiley-


Interscience, New York, 1972.

C. Lanczos. The Variational Principles of Mechanics. Dover Books On Physics.


Dover Publications, 1986.

D. Liberzon. Calculus of Variations and Optimal Control Theory: A Concise Intro-


duction. Princeton University Press, Princeton, 2012.

A. Seierstad and K. Sydsaeter. Optimal Control Theory with Economic Applica-


tions. Elsevier Science, Amsterdam, 3rd edition, 1987.

Index

B (x, r ), 213 C
C 1, 7 catenoid, 17
C 2, 7 characteristic
H (x, p, u), 51 equation, 192
Hλ (x, p, u), 65 polynomial, 192
J T ( u ), 69 root, 192
J [0,T ] (x 0 , u ), 87 closed-loop system, 98
V (x, τ), 93 closed trajectory, 239
V̇ (x), 211 concave function, 198
∂f(x)/∂x, 189
o, 10 closed-loop, 98
H∞ -norm, 172 open-loop, 98
H∞ optimization, 172 optimal, 47
L2 , 169 controllability, 195
L2 -gain, 171, 178 controller, 174
L2 -norm, 169 convex
calculus of variations, 29
A combination, 198
action integral, 19 function, 198
algebraic Riccati equation, 138 minimum principle, 76
ARE, 138 set, 198
stabilizing solution, 140 cost
asymptotically stable augmented, 50
equilibrium, 210 criterion, 6
matrix, 194 final, 23, 47
attractive equilibrium, 210 function, 6
augmented initial, 23
cost, 49 running, 6, 47
function, 202 terminal, 23, 47
running cost, 50 cost-to-go, 224
available energy, 177 discrete time, 92
optimal, 93
B
costate, 52, 106
Bellman, 87
cycloid, 14
Beltrami identity, 13
brachistochrone problem, 3, 14


D Hamilton-Jacobi-Bellman, 96
detectability, 196 Hamiltonian, 51
Dido’s isoperimetric problem, 31 (ab)normal, 65
differential dissipation inequality, 177 equations, 52
discount for LQ, 123
factor, 5 matrix, 123
rate, 5 modified, 65
dissipation inequality, 175 Hessian, 190
dissipativity, 175 HJB, 96
du Bois-Reymond, 39 infinite horizon, 107
dynamic programming, 87
I
E infinite horizon
endpoint LQ problem, 137
free, 23 optimal control problem, 107
energy, 176 initial
equilibrium, 210 cost, 23
asymptotically stable, 210 state, 205
attractive, 210 input, 47, 195
stable, 210 stabilizing, 109, 139
unstable, 210 integral constraint, 31
escape time, 208 invariant set, 219
Euler, 10
Euler-Lagrange equation, 10 J
discrete time, 45 Jacobian, 190, 230
higher-order, 20
L
Euler equation, 10
Lagrange, 10
F lemma, 9
filter, 173 multiplier, 32, 50, 202
final cost, 23, 47 Lagrangian, 6, 19
final time, 69 submanifold, 182
floor, 242 subspace, 180
free endpoint, 23 LaSalle’s invariance principle, 221
free final time, 69 Legendre condition, 28
lemma of du Bois-Reymond, 39
G linear quadratic optimal control, 121
global linearized system, 229
asymptotic stability, 210 line segment, 198
attractive, 210 Lipschitz
Lipschitz condition, 207 constant, 206
Goldschmidt solution, 18 continuity, 206
global, 207
H local, 206
Hamilton’s principle, 19

little-o, 10, 229 definite function, 187, 212


Lotka Volterra, 236 definite matrix, 187
LQ problem, 121 semi-definite function, 187, 212
infinite horizon, 137 semi-definite matrix, 187
with stability, 139 power, 176
Lyapunov principle of optimality, 88
equation, 226
first method, 229 R
function, 212 radial unboundedness, 216
second method, 211 RDE, 130
strong, 212 removable discontinuity, 62
Riccati
M algebraic equation, 138
Mangasarian, 76 differential equation, 130
matrix exponential, 193 running cost, 6, 47, 225
maximum principle, 57
minimum principle, 54 S
model predictive control, 183 Schur complement, 188
MPC, 183 second-order condition, 26
set-point, 165
N simplest problem in the calculus of vari-
negative semi-definite, 212 ations, 6
stability
O asymptotic, 210
observability, 196 equilibrium, 210
open-loop control, 98 globally asymptotic, 210
optimal stabilizability, 195
cost-to-go, 93 stabilizing
input, 47 input, 108, 109, 139
optimal control, 47 solution of ARE, 140
infinite horizon, 109 stable matrix, 194
linear quadratic, 121 standard H∞ problem, 174
problem, 47 state, 205
sequence, 89 feedback, 98
time, 70 initial, 205
orbit, 219 stationary, 10
output, 195 storage function, 175
supply rate, 175
P
L2 , 176
partial derivative, 189
passivity, 176
particular solution, 192
symplectic form, 179
passivity, 176
system
piecewise continuity, 54
closed-loop, 98
Pontryagin, 57
positive

T V
terminal cost, 47 value function, 93
theorem of alternatives, 203 infinite horizon, 107
time Van der Pol equation, 236
escape, 208
optimal control, 70 W
tuning parameter, 149 Weierstrass necessary condition, 115
Wirtinger inequality, 159
U
unstable equilibrium, 210 Z
Zermelo, 70
