Gjerrit Meinsma
Arjan van der Schaft
A Course on
Optimal Control
Springer Undergraduate Texts
in Mathematics and Technology
Series Editors
Helge Holden, Department of Mathematical Sciences, Norwegian University
of Science and Technology, Trondheim, Norway
Keri A. Kornelson, Department of Mathematics, University of Oklahoma,
Norman, OK, USA
Editorial Board
Lisa Goldberg, Department of Statistics, University of California, Berkeley,
Berkeley, CA, USA
Armin Iske, Department of Mathematics, University of Hamburg, Hamburg,
Germany
Palle E.T. Jorgensen, Department of Mathematics, University of Iowa, Iowa
City, IA, USA
Springer Undergraduate Texts in Mathematics and Technology (SUMAT)
publishes textbooks aimed primarily at the undergraduate. Each text is
designed principally for students who are considering careers either in the
mathematical sciences or in technology-based areas such as engineering,
finance, information technology and computer science, bioscience and
medicine, optimization or industry. Texts aim to be accessible introductions
to a wide range of core mathematical disciplines and their practical,
real-world applications; and are fashioned both for course use and for
independent study.
Gjerrit Meinsma • Arjan van der Schaft

A Course on Optimal Control

Springer

Gjerrit Meinsma
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
Enschede, The Netherlands

Arjan van der Schaft
Department of Mathematics, Bernoulli Institute
University of Groningen
Groningen, The Netherlands
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher,
whether the whole or part of the material is concerned, specifically the rights of translation,
reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher nor
the authors or the editors give a warranty, expressed or implied, with respect to the material
contained herein or for any errors or omissions that may have been made. The publisher remains
neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This book reflects a long history of teaching optimal control for students in mathe-
matics and engineering at the universities of Twente and Groningen, the Netherlands.
In fact, the book has grown out of lecture notes that were tested, adapted, and expanded
over many years of teaching.
The present book provides a self-contained treatment of what the undersigned con-
sider to be the core topics of optimal control for finite-dimensional deterministic
dynamical systems. The style of writing aims at carefully guiding the students through the
mathematical developments, and emphasizes motivational examples, either of a mathematical
nature or motivated by applications of optimal control.
Chapter 1 covers the basics of the classical calculus of variations, including
second-order conditions and integral constraints. This directly motivates the intro-
duction of the minimum principle (more commonly known as the maximum principle)
in Chapter 2. Although the presentation of the book aims at minimizing generalities, the
treatment of the minimum principle as given in Chapter 2 is self-contained, and suited
for a basic course on optimal control. Chapter 3 continues with the dynamic pro-
gramming approach to optimal control, culminating in the Hamilton-Jacobi-Bellman
equation. The connection with the minimum principle is discussed, as well as the
relation between infinite horizon optimal control and Lyapunov functions. In Chapter 4,
the theory of Chapters 2 and 3 is applied, and specialized, to linear control systems with
quadratic cost criteria (LQ-optimal control). This includes a succinct but detailed
treatment of Riccati differential equations and algebraic Riccati equations. The chapter
is concluded with a section on controller design based on LQ-optimal control.
In our experience, the material of Chapters 1–4 provides a good coverage of an
introductory, but mathematically self-contained, course on optimal control for final
year BSc (or beginning MSc) students in (applied) mathematics and beginning MSc
students in engineering. We have taught the course for such an audience for many years
as an 8-week course of 4 lecture hours and 2 tutorial hours per week (5 ECTS in the
European credit system). Of course, the contents of the course can still be adapted. For
example, in some of the editions of the course we did not cover Section 1.7 on integral
constraints, but instead we paid more attention to Lyapunov stability theory as detailed
in Appendix B.
Required background for the course is linear algebra and calculus, basic knowledge
of differential equations, and (rudimentary) acquaintance with control systems. Some
mathematical background is summarized in Appendix A for easy recollection and for
bringing students to the same mathematical level. Appendix B goes further: it does not
only recall some of the basics of differential equations, but also provides a rather
detailed treatment of Lyapunov stability theory including LaSalle’s invariance principle,
as occasionally used in Chapters 3 and 4 of the book.
Chapter 5 of the book is of a different nature. It is not considered to be part of the
basic material for a course on optimal control. Instead, it provides brief outlooks to a
number of (somewhat arbitrarily chosen) topics that are related to optimal control, in
order to raise the interest of students. As such, Chapter 5 is written differently from Chapters
1–4. In particular the treatment of the covered topics is not always self-contained.
At the end of each of Chapters 1–4, as well as of Appendix B, there is a rich collection
of exercises, including a number of instructive examples of applications of optimal
control. Solutions to the odd-numbered exercises are provided.
Main contributors to the first versions of the lecture notes (developed since the
1990s) were Hans Zwart, Jan Willem Polderman (both University of Twente), and Henk
Nijmeijer (University of Twente, currently Eindhoven University of Technology). We
thank them for their initial contributions.
In 2006–2008 Arjan van der Schaft (University of Groningen) made a number of
substantial revisions and modifications to the then available lecture notes. In the period
2010–2018 Gjerrit Meinsma (University of Twente) rewrote much of the material, and
added more theory, examples, and illustrations. Finally, in 2021–2023 the book took its
final shape. We thank the students and teaching assistants for providing us with con-
stant feedback and encouragement over the years.
We have profitably used many books and papers in the writing of this book. Some
of these books are listed in the References at the end. In particular some of our
examples and exercises are based on those in Bryson and Ho (1975) and Seierstad and
Sydsaeter (1987). We thank Leonid Mirkin (Technion, Haifa) for Example 4.6.2.
Unavoidably, there will be remaining typos and errors in the book. We highly welcome
hearing about them. A list of errata will be maintained at the
website https://people.utwente.nl/g.meinsma?tab=projects.
Contents

1 Calculus of Variations
  1.1 Introduction
  1.2 Euler-Lagrange Equation
  1.3 Beltrami Identity
  1.4 Higher-Order Euler-Lagrange Equation
  1.5 Relaxed Boundary Conditions
  1.6 Second-Order Conditions for Minimality
  1.7 Integral Constraints
  1.8 Exercises
2 Minimum Principle
  2.1 Optimal Control
  2.2 Quick Summary of the Classic Lagrange Multiplier Method
  2.3 First-order Conditions for Unbounded and Smooth Controls
  2.4 Towards the Minimum Principle
  2.5 Minimum Principle
  2.6 Optimal Control with Final Constraints
  2.7 Free Final Time
  2.8 Convexity and the Minimum Principle
  2.9 Exercises
3 Dynamic Programming
  3.1 Introduction
  3.2 Principle of Optimality
  3.3 Discrete-Time Dynamic Programming
  3.4 Hamilton-Jacobi-Bellman Equation
  3.5 Connection with the Minimum Principle
  3.6 Infinite Horizon Optimal Control and Lyapunov Functions
  3.7 Exercises
Bibliography
Index
Notation and Conventions
While most of the notation used in this book is standard, there are a few conventions we
would like to emphasize.
Notation for vectors and functions of time. We frequently switch from functions x : R → R^n
to vectors x ∈ R^n and back to functions, and this can be confusing upon first reading. To
highlight the difference we typeset functions of time usually in an upright math font, e.g., x,
instead of the standard italic math font, e.g., x. This convention is used in differential equations

    d/dt x(t) = a x(t),   x(0) = x₀,

and solutions of them, e.g., x(t) = e^{at} x₀. But it is used mainly to avoid possibly
ambiguous expressions. For instance whenever we use V(x) we mean that V is a
function of x ∈ R^n and not of the whole time function x : R → R^n. We still use the italic
math font for functions of time if they only play a minor role, such as a(t) in

    d/dt x(t) = a(t) x(t) + u(t).

Derivatives with respect to time are written in any of the equivalent ways

    ġ(t)   or   g′(t)   or   d/dt g(t)   or   dg(t)/dt.
Now if, for example, F : R³ → R and x : R → R, then

    d/dt ∂/∂ẋ F(t, x(t), ẋ(t))

means the total derivative with respect to t of the partial derivative ∂/∂v F(t, x, v) evaluated
at (t, x, v) = (t, x(t), ẋ(t)). For instance, if F(t, x, v) = t v³ then

    d/dt ∂/∂ẋ F(t, x(t), ẋ(t)) = d/dt ( 3 t ẋ²(t) ) = 3 ẋ²(t) + 6 t ẋ(t) ẍ(t).
Notation for differentiation with respect to a column or a row vector. For functions
f : R^n → R we think of its gradient at some x ∈ R^n as a column vector, and we denote it as
∂f(x)/∂x, so

    ∂f(x)/∂x = ( ∂f(x)/∂x₁, ∂f(x)/∂x₂, …, ∂f(x)/∂xₙ )^T ∈ R^n.

In fact, throughout we think of R^n as the linear space of n-dimensional column vectors.
The above is a derivative with respect to a column vector x ∈ R^n, and the outcome (the
gradient) is then a column vector as well. By the same logic, if we differentiate f with
respect to a row vector x^T ∈ R^{1×n} (mind the transpose), then we mean the gradient
seen as a row vector,

    ∂f(x)/∂x^T = ( ∂f(x)/∂x₁   ∂f(x)/∂x₂   …   ∂f(x)/∂xₙ ) ∈ R^{1×n}.
The previous two conventions are also combined: if F : R × R^n × R^n → R and
x : R → R^n, then

    d/dt ∂/∂ẋ F(t, x(t), ẋ(t))

means the total derivative with respect to t of the column vector with n entries
∂/∂ẋ F(t, x(t), ẋ(t)). Then d/dt ∂/∂ẋ F(t, x(t), ẋ(t)) is a column vector with n entries as well.
Occasionally we also need second-order derivatives with respect to vectors x, such as
Hessian matrices. Their notation is discussed in Appendix A.2.
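As a quick illustrative aside, the worked computation above can be reproduced symbolically. The following minimal Python/SymPy sketch (the running cost t v³ is the example used above, with v standing in for ẋ) is not a prescription, just a check of the notation:

```python
import sympy as sp

t, v = sp.symbols('t v')          # v is a placeholder for xdot
xdot = sp.Function('xdot')        # xdot as a function of time

F = t * v**3                      # F(t, x, v) = t*v^3 (no x-dependence here)
dF_dv = sp.diff(F, v)             # partial derivative: 3*t*v**2

# substitute v = xdot(t) and take the total derivative with respect to t
total = sp.diff(dF_dv.subs(v, xdot(t)), t)
print(total)   # 3*xdot(t)**2 + 6*t*xdot(t)*Derivative(xdot(t), t)
```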
Chapter 1
Calculus of Variations
1.1 Introduction
x : [0, T ] → Rn .
[FIGURE 1.1: Four paths from A to B. Which is fastest? See Example 1.1.1.]
[FIGURE 1.2: In this case a positive y means a negative altitude. See Example 1.1.1.]
[FIGURE 1.3: ds = √(1 + ẏ²(x)) dx. Here ẏ(x) := dy(x)/dx.]
    ½ m v² − m g y = c

for some constant c. Here v is the speed of the mass. We release the mass at
zero altitude and with zero speed, so c = 0. Hence the speed v follows uniquely
from y as

    v = √(2 g y).
Thus the brachistochrone problem is to minimize the integral (1.1) over all func-
tions y : [x 0 , x 1 ] → R subject to y (x 0 ) = y 0 = 0 and y (x 1 ) = y 1 .
H.J. Sussmann and J.C. Willems. 300 years of optimal control: from the brachistochrone to the
maximum principle. IEEE Control Systems Magazine, 17:32–44, 1997.
[FIGURE 1.4: (a) a curve y(x); (b) a utility function u(c).]
an economy of some country. We distinguish its capital stock x (t ) in, say, euros,
which is a measure of the physical capital in the country at time t . We also need
the net national product y (t ) in euros per unit time, which is the value of all
that is produced at time t per unit time in the country. The derivative ẋ (t ) of
capital stock with respect to t is the increase in physical capital, and it is called
investment. Therefore, what is left for consumption (euros per unit time) at time
t is the difference between national product and investment,
c (t ) := y (t ) − ẋ (t ).
y (t ) = φ( x (t )) (1.2)
x (0) = x0 . (1.3)
Then, given an arbitrary investment function ẋ (t ), all variables in our model are
determined since
    x(t) = x₀ + ∫₀ᵗ ẋ(τ) dτ.    (1.4)
The question now is: what is a good investment function ẋ (t )? A way to answer
this question is as follows. Suppose we have a utility function u(c) that mod-
els the enjoyment of consuming c. Standard assumptions on utility functions
are that they are strictly increasing, strictly concave, and twice continuously
differentiable, so u′(c) > 0, u″(c) < 0 for all c > 0, see Fig. 1.4(b). This is just to
say that additional enjoyment of additional consumption flattens at high levels
of consumption.
An investment function ẋ(t) is now considered optimal if it maximizes the
integrated utility ∫₀ᵀ u(c(t)) e^{−αt} dt, that is, if it maximizes

    ∫₀ᵀ u(c(t)) e^{−αt} dt = ∫₀ᵀ u( φ(x(t)) − ẋ(t) ) e^{−αt} dt    (1.5)
over all investment functions ẋ (t ) or, equivalently, over all functions x (t ) sat-
isfying (1.3). The term e−αt is a so-called discount factor (and α is a discount
rate, assumed positive). This is included to express that the importance of the
future utility u( c (t )) is considered to be declining with t further in the future.
The optimization problem is of the same type as before apart from the fact that
we are maximizing instead of minimizing. Clearly, maximizing the integrated
utility (1.5) is equivalent to minimizing its negation
    ∫₀ᵀ −u(c(t)) e^{−αt} dt = ∫₀ᵀ −u( φ(x(t)) − ẋ(t) ) e^{−αt} dt.
The end time of the planning period is denoted as T , and we will assume in
addition that
x (T ) = xT (1.6)
for some given desired capital stock x T . This type of model for optimal eco-
nomic growth was initiated by F.P. Ramsey in 1928.
where β x (t ) models the storage cost per unit time and α ẋ 2 (t ) models the pro-
duction cost per unit time. The constants α, β are positive numbers. The objec-
tive of the cheesemaker is to determine a production profile x (t ) that minimizes
the above cost, subject to the conditions
x (0) = 0, x (T ) = xT , ẋ (t ) ≥ 0. (1.8)
Example 1.1.4 (Shortest path). What is the shortest path between two points
(x 0 , y 0 ) and (x 1 , y 1 ) in R2 ? Of course we know the answer but let us anyway for-
mulate this problem in more detail.
With the exception of the final example, the optimal solution—if one exists
at all—is not easy to find.
1.2 Euler-Lagrange Equation

The examples given in the preceding section are instances of what is called the
simplest problem in the calculus of variations: minimize the cost

    J(x) = ∫₀ᵀ F(t, x(t), ẋ(t)) dt    (1.11)

over all (sufficiently smooth) functions x : [0, T] → R^n that satisfy the boundary conditions

    x(0) = x₀,   x(T) = x_T.    (1.12)
The function J is called the cost (function) or cost criterion, and the inte-
grand F of this cost is called the running cost or the Lagrangian. For n = 1
the problem is visualized in Fig. 1.5: given the two points (0, x 0 ) and (T, x T )
each smooth function x that connects the two points determines a cost J ( x )
as defined in (1.11), and the problem is to find the function x that minimizes
this cost.
The calculus of variations problem can be regarded as an infinite-
dimensional version of the basic optimization problem of finding a z ∗ ∈ Rn
that minimizes a function K : Rn → R. The difference is that the function K
is replaced by an integral expression J , while vectors z ∈ Rn are replaced by
functions x : [0, T ] → Rn .
[FIGURE 1.5: Two functions x, x̃ : [0, T] → R that satisfy the boundary conditions (1.12).]
[FIGURE 1.6: the minimizer x∗ and a perturbed function x∗ + αδx with the same boundary values x₀, x_T.]
δx (0) = δx (T ) = 0. (1.14)
x (t ) = x ∗ (t ) + αδx (t ),
in which α ∈ R. Notice that this x for every α ∈ R satisfies the boundary condi-
tions x (0) = x ∗ (0) = x 0 and x (T ) = x ∗ (T ) = x T , see Fig. 1.6. Since x ∗ is a mini-
mizing solution for our problem we have that

    J(x∗) ≤ J(x∗ + αδx)   for all α ∈ R.    (1.15)
For every fixed function δx the cost J ( x ∗ + αδx ) is a function of the scalar vari-
able α,
J¯(α) := J ( x ∗ + αδx ), α ∈ R.
The minimality condition (1.15) thus implies that J¯(0) ≤ J¯(α) for all α ∈ R. Given
that x ∗ , δx and F are all assumed C 1 , it follows that J¯(α) is differentiable as a
function of α, and so the above implies that J̄′(0) = 0. This derivative is³

    J̄′(0) = d/dα ∫₀ᵀ F(t, x∗(t) + αδx(t), ẋ∗(t) + αδ̇x(t)) dt |_{α=0}
          = ∫₀ᵀ [ ∂F(t, x∗(t), ẋ∗(t))/∂x^T δx(t) + ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δ̇x(t) ] dt.    (1.16)
In the rest of the proof we assume that F and x ∗ and δx are C 2 . (The case when
they are only C 1 is slightly more involved; this is covered in Exercise 1.7.) Inte-
gration by parts of the second term in (1.16) yields⁴

    ∫₀ᵀ ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δ̇x(t) dt
      = [ ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δx(t) ]₀ᵀ − ∫₀ᵀ d/dt( ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T ) δx(t) dt.    (1.17)
³ Leibniz' integral rule says that d/dα ∫ G(α, t) dt = ∫ ∂G(α, t)/∂α dt if G(α, t) and ∂G(α, t)/∂α are continuous in t and α. Here they are continuous because F and δx are assumed C¹.
⁴ The integration by parts rule holds if ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T and δx(t) are C¹ with respect to time. This holds if F, x∗, δx are C² in all their arguments.
Plugging (1.17) into (1.16) and using that J̄′(0) = 0 we find that

    0 = [ ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δx(t) ]₀ᵀ + ∫₀ᵀ [ (∂/∂x − d/dt ∂/∂ẋ) F(t, x∗(t), ẋ∗(t)) ]^T δx(t) dt.    (1.18)

The first term on the right-hand side is actually zero because of the boundary
conditions (1.14). Hence we have

    0 = ∫₀ᵀ [ (∂/∂x − d/dt ∂/∂ẋ) F(t, x∗(t), ẋ∗(t)) ]^T δx(t) dt.    (1.19)
So far the perturbation δx in our derivation was some fixed function. However
since δx can be arbitrarily chosen, the equality (1.19) must hold for every C 2
perturbation δx that satisfies (1.14). But this implies, via the result presented
next (Lemma 1.2.3), that the term in between the square brackets in (1.19) is
zero for all t ∈ [0, T ], i.e., that (1.13) holds. ■
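As a computational aside, the Euler-Lagrange expression (∂/∂x − d/dt ∂/∂ẋ)F can also be generated symbolically. SymPy ships a helper, euler_equations, for exactly this; the minimal sketch below applies it to the illustrative running cost F(t, x, ẋ) = ẋ² + x² (an assumed example, not one from the text):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
x = sp.Function('x')

# Illustrative running cost F = xdot^2 + x^2
F = sp.Derivative(x(t), t)**2 + x(t)**2

# euler_equations returns the Euler-Lagrange equation(s) set equal to zero
print(euler_equations(F, [x(t)], [t]))
# e.g. [Eq(2*x(t) - 2*Derivative(x(t), (t, 2)), 0)]
```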
[FIGURE 1.7: a function φ that is positive on an interval [a, b] around t̄, and a perturbation δx supported on [a, b].]

Lemma 1.2.3. Suppose φ : [0, T] → R^n is continuous. Then

    ∫₀ᵀ φ^T(t) δx(t) dt = 0    (1.20)

for every C² function δx : [0, T] → R^n satisfying (1.14) iff φ(t) = 0 for all t ∈ [0, T].
Proof. We prove it for n = 1. Figure 1.7 explains it all: suppose that φ is not the
zero function, i.e., that φ(t̄ ) is nonzero for some t̄ ∈ [0, T ]. For example, φ(t̄ ) > 0.
Then, by continuity, φ(t ) is positive on some interval [a, b] around t̄ (with 0 ≤
a < b ≤ T ). In order to provide a formal proof consider the function δx defined
as
    δx(t) = ((t − a)(b − t))³  for t ∈ [a, b],   and   δx(t) = 0  elsewhere,    (1.21)
see Figure 1.7. Clearly this δx fulfills the requirements of (1.14), but it vio-
lates (1.20) because both φ and δx are positive on [a, b], and hence the integral
in (1.20) is positive as well. A similar argument works for φ(t̄) < 0. The assumption
that φ(t̄) ≠ 0 at some t̄ ∈ [0, T] hence is wrong. ■
J ( x ∗ + αδx ) = J ( x ∗ ) + o(α),
ÿ (x) = 0,
which is another way of saying that y (x) is a straight line. In light of the bound-
ary conditions y (x 0 ) = y 0 and y (x 1 ) = y 1 , it has the unique solution
    y∗(x) = y₀ + (y₁ − y₀)/(x₁ − x₀) · (x − x₀).
⁵ A little-o function o : R^m → R^k is any function with the property that lim_{y→0} o(y)/‖y‖ = 0.
    (∂/∂x − d/dt ∂/∂ẋ) [ u( φ(x(t)) − ẋ(t) ) e^{−αt} ] = 0,
This, together with the boundary conditions (1.3) and (1.6), has to be solved
for the unknown function x (t ), or—see also (1.4)—for the unknown investment
function ẋ (t ). This can be done once the utility function u(c) and the consump-
tion function φ(x) are specified.
    0 = (∂/∂x − d/dt ∂/∂ẋ)( α ẋ²(t) + β x(t) ) = β − d/dt ( 2α ẋ(t) ) = β − 2α ẍ(t).

So ẍ(t) = β/(2α), that is,

    x(t) = β/(4α) t² + ẋ₀ t + x₀.    (1.26)
The constants x 0 and ẋ 0 follow from the boundary conditions x (0) = 0 and
x (T ) = xT , i.e., x0 = 0 and ẋ0 = xT /T − βT /(4α). Of course, it still remains to be
seen whether the x (t ) defined in (1.26) is indeed minimizing (1.7). Notice that
the extra constraint, ẋ (t ) ≥ 0, from (1.8) puts a further restriction on the total
amount of x T and the final time T .
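As a quick symbolic aside, the candidate production profile (1.26) with these constants can be checked against the Euler-Lagrange equation and the boundary conditions; the following minimal SymPy sketch does exactly that:

```python
import sympy as sp

t, alpha, beta, T, xT = sp.symbols('t alpha beta T x_T', positive=True)

xdot0 = xT/T - beta*T/(4*alpha)           # from x(T) = x_T with x(0) = 0
x = beta/(4*alpha)*t**2 + xdot0*t         # candidate solution (1.26) with x0 = 0

# Euler-Lagrange equation derived above: beta - 2*alpha*xddot(t) = 0
print(sp.simplify(beta - 2*alpha*sp.diff(x, t, 2)))            # -> 0
print(sp.simplify(x.subs(t, 0)), sp.simplify(x.subs(t, T)))    # -> 0  x_T
```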
    x₁(0) = 0,   x₂(0) = 0,   x₁(π/2) = 1,   x₂(π/2) = 1.

Since the minimization is over a vector x = (x₁, x₂)^T of two components, the Euler-Lagrange equation is given by a two-dimensional system of differential equations

    ( −2x₂(t), −2x₁(t) )^T − d/dt ( 2ẋ₁(t), 2ẋ₂(t) )^T = ( 0, 0 )^T,

that is, ẍ₁(t) = −x₂(t) and ẍ₂(t) = −x₁(t). These are linear differential equations
with constant coefficients, and they can be solved with standard methods (see
Appendix A.4). The solution satisfying the boundary conditions is

    x∗₁(t) = x∗₂(t) = sin(t).
1.3 Beltrami Identity

In many applications, the running cost F(t, x, ẋ) does not depend on t and thus
has the form F(x, ẋ). Obviously the partial derivative ∂F(x, ẋ)/∂t is zero now. An
interesting consequence is that then

    F(x(t), ẋ(t)) − ẋ^T(t) ∂F(x(t), ẋ(t))/∂ẋ
is constant over time for every solution x of the Euler-Lagrange equation. To see
this, we differentiate the above expression with respect to time (and for ease of
notation we momentarily write x(t) simply as x):

    d/dt [ F(x, ẋ) − ẋ^T ∂F(x, ẋ)/∂ẋ ]
      = d/dt F(x, ẋ) − d/dt [ ẋ^T ∂F(x, ẋ)/∂ẋ ]
      = ẋ^T ∂F(x, ẋ)/∂x + ẍ^T ∂F(x, ẋ)/∂ẋ − ( ẍ^T ∂F(x, ẋ)/∂ẋ + ẋ^T d/dt ∂F(x, ẋ)/∂ẋ )
      = ẋ^T ( ∂F(x, ẋ)/∂x − d/dt ∂F(x, ẋ)/∂ẋ ).    (1.27)
This is zero for every solution x of the Euler-Lagrange equation. Hence every
stationary solution x ∗ has the property that
stationary solution x∗ has the property that

    F(x∗(t), ẋ∗(t)) − ẋ∗^T(t) ∂F(x∗(t), ẋ∗(t))/∂ẋ = C   ∀t ∈ [0, T]
for some integration constant C . This identity is known as the Beltrami iden-
tity. We illustrate the usefulness of this identity by explicitly solving the brachis-
tochrone problem. It is good to realize, though, that the Beltrami identity is
not equivalent to the Euler-Lagrange equation. Indeed, every constant function
x (t ) satisfies the Beltrami identity. The Beltrami identity and the Euler-Lagrange
equation are equivalent for scalar functions x : [0, T ] → R if ẋ (t ) is nonzero for
almost all t , as can be seen from (1.27).
[FIGURE 1.8: Top: the cycloid x(φ) = c²/2 (φ − sin(φ)), y(φ) = c²/2 (1 − cos(φ)) for φ ∈ [0, 2π]; it is the curve that a point on a rolling disk of radius c²/2 traces out. Bottom: a downwards facing cycloid (solution of the brachistochrone problem). See Example 1.3.1.]
[FIGURE 1.9: Cycloids (1.29) for various c > 0. Given a B to the right and below A = (0, 0) there is a unique cycloid that joins A and B. See Example 1.3.1.]
It does not depend on x, so Beltrami applies which says that the solution of the
brachistochrone problem makes the following function constant (as a function
of x):
    F(y(x), ẏ(x)) − ẏ(x) ∂F(y(x), ẏ(x))/∂ẏ = √( (1 + ẏ²(x)) / (2g y(x)) ) − ẏ²(x) / √( 2g y(x)(1 + ẏ²(x)) )
      = 1 / √( 2g y(x)(1 + ẏ²(x)) ).
Hence

    y(x)( 1 + ẏ²(x) ) = c²    (1.28)

for some constant c > 0, and the solutions of this equation can be parameterized as⁶

    x(φ) = c²/2 (φ − sin(φ)),   y(φ) = c²/2 (1 − cos(φ)).    (1.29)
The curve (x(φ), y(φ)) is known as the cycloid. It is the curve that a fixed point
on the boundary of a wheel with radius c²/2 traces out while rolling with-
⁶ Quick derivation: since the cotangent cos(φ/2)/sin(φ/2) for φ ∈ [0, 2π] ranges over all real numbers once (including ±∞) it follows that any dy/dx can uniquely be written as dy/dx = cos(φ/2)/sin(φ/2) with φ ∈ [0, 2π]. Then (1.28) implies that y(φ) = c²/(1 + cos²(φ/2)/sin²(φ/2)) = c² sin²(φ/2) = c²(1 − cos(φ))/2 and then dx/dφ = (dy/dφ)/(dy/dx) = [c² sin(φ/2) cos(φ/2)]/[cos(φ/2)/sin(φ/2)] = c² sin²(φ/2) = c²(1 − cos(φ))/2. Integrating this expression shows that x(φ) = c²(φ − sin(φ))/2 + d where d is some integration constant. This d equals zero because (x, y) := (0, 0) is on the curve. (See Exercise 1.4 for more details.)
out slipping on a horizontal line (think of the valve on your bike’s wheel), see
Fig. 1.8. For the cycloid, the Beltrami identity and the Euler-Lagrange equation
are equivalent because ẏ (x) is nonzero almost everywhere. Hence all sufficiently
smooth stationary solutions of the brachistochrone problem are precisely these
cycloids.
Varying c in (1.29) generates a family of cycloids, see Fig. 1.9. Given a desti-
nation point B to the right and below A = (0, 0) there is a unique cycloid that
connects A and B , and the solution of the brachistochrone problem is that
segment of the cycloid. Notice that for certain final destinations B the curve
extends below the final destination!
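As a small numerical aside, the unique cycloid (1.29) through A = (0, 0) and a given B can be found by solving a scalar equation for the parameter value at B. The endpoint B = (2, 1) in the sketch below is an assumed illustrative value (y is measured downwards, as in Example 1.1.1):

```python
import numpy as np
from scipy.optimize import brentq

x1, y1 = 2.0, 1.0   # endpoint B (illustrative values)

# On the cycloid (1.29), x/y = (phi - sin(phi))/(1 - cos(phi)); solve for phi at B.
g = lambda phi: (phi - np.sin(phi)) / (1.0 - np.cos(phi)) - x1 / y1
phi_B = brentq(g, 1e-6, 2*np.pi - 1e-6)
c2 = 2.0 * y1 / (1.0 - np.cos(phi_B))      # c^2 follows from y(phi_B) = y1

phi = np.linspace(0.0, phi_B, 200)
x = c2/2 * (phi - np.sin(phi))
y = c2/2 * (1 - np.cos(phi))
print(f"phi_B = {phi_B:.4f}, c^2 = {c2:.4f}, endpoint = ({x[-1]:.3f}, {y[-1]:.3f})")
```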
[FIGURE 1.10: the radius profile r(x), x ∈ [−1, 1], of a surface of revolution. See Example 1.3.2.]
[FIGURE 1.11: (a) The endpoint radius r_a(±1) := a cosh(1/a) of the catenoid as a function of a. Its minimal value is ρ∗ = 1.509 (attained at a∗ = 0.834); (b) the area of the catenoid as a function of endpoint radius ρ; (c) the area of the catenoid and of the Goldschmidt solution as a function of endpoint radius ρ. The two areas are the same at ρ_G = 1.895. This ρ_G corresponds to a_G = 1.564 (see part (a) of this figure). See Example 1.3.2.]
r a (±1) = a cosh(1/a)
as a function of a (notice the flipped axes in Figure 1.11(a)). The figure demon-
strates that the endpoint radius has a minimum, and the minimum is ρ ∗ =
1.509, and it is attained at a ∗ = 0.834. So if we choose an endpoint radius ρ less
than ρ ∗ then none of these hyperbolic cosines r a is the solution to our prob-
lem! Also, if ρ > ρ ∗ then apparently there are two hyperbolic cosines that meet
the endpoint condition, r a (±1) = ρ, and at most one of them is the optimal
solution. It can be shown that the area of the catenoid is
7 This hyperbolic cosine solution can be derived using separation of variables (see
Appendix A.3). However, there is a technicality in this derivation that is often overlooked, see
Exercise 1.6, but we need not worry about that now.
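As a numerical aside, the two catenoid parameters a with the same endpoint radius ρ can be obtained by solving a cosh(1/a) = ρ on either side of the minimizer a∗ ≈ 0.834. The sketch below uses ρ = 1.895, the crossover value quoted in Figure 1.11 (the brackets for the root search are illustrative choices):

```python
import numpy as np
from scipy.optimize import brentq

rho = 1.895                              # endpoint radius (value from Figure 1.11)
f = lambda a: a*np.cosh(1.0/a) - rho     # endpoint-radius equation r_a(+-1) = rho

a_small = brentq(f, 0.1, 0.834)          # branch left of the minimizer a* ~ 0.834
a_large = brentq(f, 0.834, 5.0)          # branch right of the minimizer
print(a_small, a_large)                  # the larger root should be close to 1.564
```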
[FIGURE 1.12: The Goldschmidt solution is the union of disks around the two endpoints, combined with a line that connects the centers of the two disks. See Example 1.3.2.]
[FIGURE 1.13: a mass m attached to a spring with spring constant k.]
enough, the dynamics of the mass is given by the Euler-Lagrange equation cor-
responding to
    F(q, q̇) := ½ m q̇² − ½ k q²,

that is, the difference of the kinetic energy ½ m q̇² of the mass and the potential
energy ½ k q² of the spring. Indeed, the Euler-Lagrange equation corresponding
to this F(q, q̇) is

    0 = (∂/∂q − d/dt ∂/∂q̇)( ½ m q̇²(t) − ½ k q²(t) ) = −k q(t) − d/dt ( m q̇(t) ) = −k q(t) − m q̈(t),

that is, m q̈(t) = −k q(t), which is Newton's second law. By the Beltrami identity, the quantity

    q̇(t) ∂F(q(t), q̇(t))/∂q̇ − F(q(t), q̇(t)) = m q̇²(t) − ( ½ m q̇²(t) − ½ k q²(t) )
      = ½ m q̇²(t) + ½ k q²(t)
is constant over time. This quantity is nothing else than the total energy, that is,
kinetic plus potential energy. Thus the Beltrami identity is in this case the well-
known conservation of energy of a mechanical system with conservative forces
(in this case the spring force).
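As a quick symbolic aside, energy conservation can be checked directly on a solution of the mass-spring equation; the sketch below uses the solution q(t) = A cos(√(k/m) t), an assumed illustrative form:

```python
import sympy as sp

t, m, k, A = sp.symbols('t m k A', positive=True)
w = sp.sqrt(k/m)
q = A*sp.cos(w*t)                       # a solution of m*qddot = -k*q
qdot = sp.diff(q, t)

print(sp.simplify(m*sp.diff(q, t, 2) + k*q))                     # -> 0 (the EL equation)
energy = sp.Rational(1, 2)*m*qdot**2 + sp.Rational(1, 2)*k*q**2
print(sp.simplify(energy))                                       # -> A**2*k/2, constant in t
```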
In general, in classical mechanics the difference of the kinetic and potential
energy F ( q (t ), q̇ (t )) is referred to as the Lagrangian, while the integral
    ∫₀ᵀ F(q(t), q̇(t)) dt
is referred to as the action integral. The stationary property of the action integral
is known as Hamilton’s principle; see, e.g., Lanczos (1986) for the close connec-
tion between the calculus of variations and classical mechanics.
The Euler-Lagrange equation can directly be extended to the case that the inte-
gral J ( x ) depends on higher-order derivatives of x . Let us state explicitly the
second-order case.
Consider minimization of the cost

    J(x) = ∫₀ᵀ F(t, x(t), ẋ(t), ẍ(t)) dt    (1.32)

over all functions x : [0, T] → R^n that satisfy the boundary conditions

    x(0) = x₀,   x(T) = x_T,   ẋ(0) = x₀ᵈ,   ẋ(T) = x_Tᵈ,    (1.33)

for given x₀, x₀ᵈ, x_T, x_Tᵈ ∈ R^n. Suppose F is C². A necessary condition that a C²
function x∗ minimizes (1.32) and satisfies (1.33) is that x∗ is a solution of the
differential equation

    (∂/∂x − d/dt ∂/∂ẋ + d²/dt² ∂/∂ẍ) F(t, x∗(t), ẋ∗(t), ẍ∗(t)) = 0   ∀t ∈ [0, T].    (1.34)
Proof. We prove it for the case that F and x ∗ are C 3 . (If they are only C 2
then one can use the lemma of du Bois-Reymond as explained for the stan-
dard problem in Exercise 1.7.) Define J¯(α) = J ( x ∗ + αδx ) where δx : [0, T ] → Rn
is a C 3 perturbation that satisfies the boundary conditions δx (0) = δx (T ) = 0
and δ̇x(0) = δ̇x(T) = 0. Then, as before, the derivative J̄′(0) is zero. Analogously
to (1.16) we compute J̄′(0). For ease of exposition we momentarily omit all time
arguments in x∗(t) and δx(t) and, sometimes, F:
    0 = J̄′(0) = d/dα ∫₀ᵀ F(t, x∗ + αδx, ẋ∗ + αδ̇x, ẍ∗ + αδ̈x) dt |_{α=0}
              = ∫₀ᵀ [ ∂F/∂x^T δx + ∂F/∂ẋ^T δ̇x + ∂F/∂ẍ^T δ̈x ] dt.    (1.35)
Integration by parts of the second term of the integrand yields

    ∫₀ᵀ ∂F/∂ẋ^T δ̇x dt = [ ∂F/∂ẋ^T δx ]₀ᵀ − ∫₀ᵀ d/dt( ∂F/∂ẋ^T ) δx dt = − ∫₀ᵀ d/dt( ∂F/∂ẋ^T ) δx dt.
The last equality follows from the boundary condition that δx (0) = δx (T ) = 0.
Integration by parts of the third term in (1.35) similarly gives

    ∫₀ᵀ ∂F/∂ẍ^T δ̈x dt = [ ∂F/∂ẍ^T δ̇x ]₀ᵀ − ∫₀ᵀ d/dt( ∂F/∂ẍ^T ) δ̇x dt = − ∫₀ᵀ d/dt( ∂F/∂ẍ^T ) δ̇x dt,    (1.36)
where now the second equality is the result of the boundary conditions that
δ̇x(0) = δ̇x(T) = 0. In fact, we can apply integration by parts again on the final
term of (1.36) to obtain

    ∫₀ᵀ ∂F/∂ẍ^T δ̈x dt = − ∫₀ᵀ d/dt( ∂F/∂ẍ^T ) δ̇x dt
                       = − [ d/dt( ∂F/∂ẍ^T ) δx ]₀ᵀ + ∫₀ᵀ d²/dt²( ∂F/∂ẍ^T ) δx dt
                       = ∫₀ᵀ d²/dt²( ∂F/∂ẍ^T ) δx dt.
[FIGURE 1.14: an elastic bar clamped at both ends, with vertical deflection y(x).]
Example 1.4.2 (Elastic bar). Consider an elastic bar clamped at its two ends,
see Fig. 1.14. The bar bends under the influence of gravity. The horizontal and
vertical positions we denote by x and y, respectively. The shape of the bar is
modeled with the function y (x). We assume the bar has a uniform cross section
(independent of x). If the curvature of the elastic bar is not too large then the
potential energy due to elastic forces can be considered, up to first order, to be
proportional to the square of the second derivative,

    V₁ := (k/2) ∫₀^ℓ ( d²y(x)/dx² )² dx,

where k is a constant depending on the elasticity of the bar. Furthermore, the
potential energy due to gravity is given by

    V₂ := g ∫₀^ℓ ρ(x) y(x) dx.

Here, ρ(x) is the mass density of the bar at x, and, again, we assume that the
curvature is small. The total potential energy thus is

    ∫₀^ℓ [ (k/2) ( d²y(x)/dx² )² + g ρ(x) y(x) ] dx.
The minimal potential energy solution satisfies the Euler-Lagrange equa-
tion (1.34), and this gives the fourth-order differential equation
    k d⁴y(x)/dx⁴ = −g ρ(x)   ∀x ∈ [0, ℓ].
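As a quick symbolic aside, for a bar with constant mass density ρ and clamped boundary conditions y(0) = y(ℓ) = 0, y′(0) = y′(ℓ) = 0, a simple closed-form deflection can be verified against this fourth-order equation; the candidate used below is an assumption of this sketch, not a formula from the text:

```python
import sympy as sp

x, ell, k, g, rho = sp.symbols('x ell k g rho', positive=True)

# Candidate deflection for constant density rho, clamped at x = 0 and x = ell
y = -g*rho/(24*k) * x**2 * (ell - x)**2

print(sp.simplify(k*sp.diff(y, x, 4) + g*rho))                    # -> 0, so the ODE holds
print([sp.simplify(y.subs(x, v)) for v in (0, ell)])              # -> [0, 0]
print([sp.simplify(sp.diff(y, x).subs(x, v)) for v in (0, ell)])  # -> [0, 0]
```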
1.5 Relaxed Boundary Conditions

In the problems considered so far, the initial x(0) and final x(T) were fixed. A
useful extension is obtained by removing some of these conditions. This means
that we allow more functions x to optimize over, and, consequently, we expect
that the Euler-Lagrange equation still holds for the optimal solution. To get an
idea we first look at an example.
Suppose x has three components and that the first component of x (0) and
the last component of x (T ) are free to choose:
    x(0) = ( free, fixed, fixed )^T,   x(T) = ( fixed, fixed, free )^T.    (1.37)
In the proof of Theorem 1.2.2 we found the following necessary first-order con-
dition for optimality (Eqn. (1.18)):
    [ ∂F(t, x∗(t), ẋ∗(t))/∂ẋ^T δx(t) ]₀ᵀ + ∫₀ᵀ [ (∂/∂x − d/dt ∂/∂ẋ) F(t, x∗(t), ẋ∗(t)) ]^T δx(t) dt = 0.    (1.38)
When is this equal to zero for every allowable perturbation? Since the perturbed
x ∗ (t ) + αδx (t ) for our example must obey the boundary condition (1.37) it fol-
lows that the allowable perturbations are exactly those that satisfy
    δx(0) = ( free, 0, 0 )^T,   δx(T) = ( 0, 0, free )^T.
Clearly, the first-order condition (1.39) holds for all such δx iff

    ∂F(0, x(0), ẋ(0))/∂ẋ = ( 0, free, free )^T,   ∂F(T, x(T), ẋ(T))/∂ẋ = ( free, free, 0 )^T.
This example demonstrates that to every initial or final entry of x that is free to
choose there corresponds a condition on the derivative of F with respect to that
component of ẋ . Incidentally, by allowing functions x with free entries at initial
and/or final time, it can now make sense to include an initial- and/or final cost
to the cost function:
    J(x) = ∫₀ᵀ F(t, x(t), ẋ(t)) dt + G(x(0)) + K(x(T)).    (1.40)
Here G( x (0)) denotes an initial cost, and K ( x (T )) a final cost (also known as ter-
minal cost). The addition of these two costs does not complicate matters much,
as detailed in the next proposition.
This general result is needed in the next chapter when we tackle the optimal
control problem. A common special case is the free endpoint problem, which is
when x (0) is completely fixed and x (T ) is completely free. In the terminology
of Proposition 1.5.1 this means I₀ = ∅ and I_T = {1, . . . , n}. In this case Proposi-
tion 1.5.1 simplifies as follows.
Corollary 1.5.2 (Free endpoint). Let T > 0, x 0 ∈ Rn , and suppose both F : [0, T ] ×
Rn × Rn → R and K : Rn → R are C 1 . Necessary for a C 1 function x ∗ : [0, T ] → Rn
to minimize
    J(x) = ∫₀ᵀ F(t, x(t), ẋ(t)) dt + K(x(T))
over all functions with x (0) = x 0 is that it satisfies the Euler-Lagrange equa-
tion (1.13) together with the free endpoint boundary condition
    ∂F(T, x∗(T), ẋ∗(T))/∂ẋ + ∂K(x∗(T))/∂x = 0 ∈ R^n.    (1.43)
Example 1.5.3 (Quadratic cost with fixed and free endpoint). Let α ∈ R, and
consider minimization of
    ∫₋₁¹ ( α² x²(t) + ẋ²(t) ) dt    (1.44)
over all functions x : [−1, 1] → R. First we solve the standard problem, so where
both x (0) and x (T ) are fixed. For instance, assume that
    ẍ(t) = α² x(t).
This differential equation can be solved using characteristic equations (do this
yourself, see Appendix A.4), and the general solution is

    x(t) = c e^{αt} + d e^{−αt}

with c, d two arbitrary constants. The two constants follow from the two boundary conditions (1.45):
We stick to the same cost function (1.44). In the terminology of (1.40) this means
we take the initial and final costs equal to zero, G(x) = K(x) = 0. Hence ∂K(x(T))/∂x = 0,
and the free endpoint boundary condition (1.43) thus becomes

    2 ẋ∗(1) = 0.

The solution is

    c = e^{−α} / ( e^{2α} + e^{−2α} ),   d = e^{α} / ( e^{2α} + e^{−2α} ),
(check it for yourself ). We see that also in this case the first-order conditions
together with the boundary condition have a unique solution,
The free endpoint condition is that the derivative of x is zero at the final time.
Again we see that the solution approaches zero fast if α is large.
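As a quick symbolic aside, the constants c and d above can be checked against the Euler-Lagrange equation and the free endpoint condition; the initial value x(−1) = 1 used in the sketch below is inferred from these constants and is an assumption of this sketch:

```python
import sympy as sp

t, a = sp.symbols('t alpha', positive=True)
c = sp.exp(-a) / (sp.exp(2*a) + sp.exp(-2*a))
d = sp.exp(a) / (sp.exp(2*a) + sp.exp(-2*a))
x = c*sp.exp(a*t) + d*sp.exp(-a*t)              # candidate solution on [-1, 1]

print(sp.simplify(sp.diff(x, t, 2) - a**2*x))   # -> 0   (Euler-Lagrange equation)
print(sp.simplify(x.subs(t, -1)))               # -> 1   (assumed initial value)
print(sp.simplify(sp.diff(x, t).subs(t, 1)))    # -> 0   (free endpoint condition (1.43))
```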
1.6 Second-Order Conditions for Minimality

The Euler-Lagrange equation was derived from the condition that minimizing
solutions x∗ are necessarily stationary solutions, i.e., solutions for which

    J(x∗ + αδx) = J(x∗) + o(α)
for every fixed admissible perturbation function δx and all scalars α. But not all
stationary solutions are minimizing solutions. To be minimizing the above term
“o(α)” needs to be nonnegative in a neighborhood of α = 0. In this section we
analyze this problem. We derive a necessary condition and a sufficient condition
for stationary solutions to be minimizing. These conditions are second-order
conditions and they require a second-order Taylor series expansion of F (t , x, y)
for fixed t around (x, y) ∈ R^n × R^n:

    F(t, x + δx, y + δy) = F(t, x, y) + ∂F(t, x, y)/∂x^T δx + ∂F(t, x, y)/∂y^T δy
      + ½ ( δx^T  δy^T ) [ ∂²F(t, x, y)/∂x∂x^T   ∂²F(t, x, y)/∂x∂y^T ] ( δx )
                         [ ∂²F(t, x, y)/∂y∂x^T   ∂²F(t, x, y)/∂y∂y^T ] ( δy )
      + o( ‖(δx, δy)‖² ),    (1.47)

where the matrix in the middle is the Hessian of F.
(The role of the transpose is explained in the Notation and Conventions section. More details
about this notation can be found in Appendix A.2.) We assume that F(t, x, y) is C² so the above
Taylor series is valid, and the 2n × 2n Hessian of F exists and is symmetric.
J¯(α) := J ( x ∗ + αδx ).
    J̄″(0) = ∫₀ᵀ ( δx  δ̇x ) [ ∂²F(t, x∗, ẋ∗)/∂x²    ∂²F(t, x∗, ẋ∗)/∂x∂ẋ ] ( δx )
                            [ ∂²F(t, x∗, ẋ∗)/∂x∂ẋ   ∂²F(t, x∗, ẋ∗)/∂ẋ²  ] ( δ̇x ) dt
           = ∫₀ᵀ [ ∂²F/∂x² δx² + 2 ∂²F/∂x∂ẋ δx δ̇x + ∂²F/∂ẋ² δ̇x² ] dt.    (1.49)
Therefore

    J̄″(0) = ∫₀ᵀ [ ( ∂²F/∂x² − d/dt ∂²F/∂x∂ẋ ) δx² + ∂²F/∂ẋ² δ̇x² ] dt.    (1.50)

If x∗ is optimal then J̄″(0) ≥ 0 for every allowable perturbation δx. Lemma 1.6.2
(presented next) applied to (1.50) shows that this implies that ∂²F(t, x∗(t), ẋ∗(t))/∂ẋ² is
nonnegative for all time, i.e., that (1.48) holds. ■
Lemma 1.6.2. Suppose φ, ψ : [0, T] → R are continuous functions such that

    ∫₀ᵀ ( φ(t) δx²(t) + ψ(t) δ̇x²(t) ) dt ≥ 0    (1.51)

for every C² function δx : [0, T] → R with δx(0) = δx(T) = 0. Then

    ψ(t) ≥ 0   ∀t ∈ [0, T].
Proof. Suppose, on the contrary, that ψ(t̄) < 0 for some t̄ ∈ [0, T]. Then for every
ε > 0 we can construct a possibly small interval [a, b] about t̄ in [0, T] and a C²
function δx on [0, T] that is zero outside [a, b] and that satisfies

    ∫ₐᵇ δx²(t) dt < ε   and   ∫ₐᵇ δ̇x²(t) dt > 1.

This may be clear from Figure 1.15. Such a δx satisfies all the conditions of the
lemma but renders the integral in (1.51) negative for small enough ε > 0. That is
a contradiction, and so the assumption that ψ(t̄) < 0 is wrong. ■
[FIGURE 1.15: About the construction of a δx(t) that violates (1.51). See the proof of Lemma 1.6.2.]
In words: the Legendre condition says that the matrix ∂²F(t, x∗(t), ẋ∗(t))/∂ẋ∂ẋ^T (which is an n × n
matrix if x has n components) is a symmetric positive semi-definite matrix at
every moment in time.
Example 1.6.3 (Example 1.1.3 continued). The running cost of Example 1.1.3 is F(t, x, ẋ) = α ẋ² + β x. Therefore ∂²F(t, x(t), ẋ(t))/∂ẋ² = 2α > 0 for all functions x and all t, so the Legendre condition holds.
Example 1.6.4 (Example 1.5.3 continued). The running cost of Example 1.5.3 is
F (t , x, ẋ) = α2 x 2 + ẋ 2 . Therefore ∂2 F (t , x (t ), ẋ (t ))/∂ẋ 2 = 2 ≥ 0 for all functions x
and all t . This holds in particular for x ∗ , so the Legendre condition holds.
This is derived from (1.5), but we added a minus sign because the application is
about maximization, not minimization. Now
    ∂²F(t, x, ẋ)/∂ẋ² = −u″( φ(x) − ẋ ) e^{−αt},

and this is nonnegative for every t, x, ẋ since the utility function u is assumed
to be concave, i.e., u″(c) ≤ 0 for all c > 0. So, apart from the standard economic
interpretation that utility functions are concave, this assumption is also crucial
for the optimization problem to have a solution.
x (0) = x (1) = 0,
x ∗ (t ) = A sin(2πt ), A ∈ R.
    ∂²F(t, x∗(t), ẋ∗(t))/∂ẋ² = 2/(2π)² > 0.
Also, each such x ∗ renders the integral in (1.52) equal to zero. There are how-
ever many other functions x that satisfy x (0) = x (1) = 0 but for which the inte-
gral (1.52) takes a negative value. For example x (t ) = −t 2 + t . By scaling this last
function with a constant we can make the cost as negative as we desire. Thus in
this example there is no optimal solution x ∗ .
A closer look at the proof of Theorem 1.6.1 actually provides us with an ele-
gant sufficient condition for optimality, in fact for global optimality. If the Hes-
sian of F , defined earlier as
⎡ 2 ⎤
∂ F (t , x, y) ∂2 F (t , x, y)
⎢ ∂x∂x T ∂x∂y T ⎥
⎢ ⎥
H (t , x, y) := ⎢ 2 ⎥, (1.53)
⎣ ∂ F (t , x, y) ∂2 F (t , x, y) ⎦
∂y∂x T ∂y∂y T
for each t is positive semi-definite for all x ∈ Rn and all y ∈ Rn , then at each t the
running cost F (t , x, ẋ) is convex in x, ẋ (see Appendix A.7). For convex functions
it is known that stationarity implies global optimality:
Proof. Suppose that the Hessian is positive semi-definite. Let x ∗ , x be two func-
tions that satisfy the boundary conditions, and suppose x ∗ satisfies the Euler-
Lagrange equation. Define the function δ = x − x ∗ and J¯(α) = J ( x ∗ + αδ). This
way J¯(0) = J ( x ∗ ) while J¯(1) = J ( x ). We need to prove that J¯(1) ≥ J¯(0).
8 The relation between positive semi-definite Hessians and convexity is explained in
Appendix A.7.
As before, we have that J̄′(0) is zero by the fact that x∗ satisfies the Euler-Lagrange equation.

The second derivative of J̄(α) with respect to α is (omitting time arguments)

    J̄″(α) = ∫₀ᵀ ( δ^T  δ̇^T ) H(t, x∗ + αδ, ẋ∗ + αδ̇) ( δ, δ̇ )^T dt.
This result produces a lot, but also requires a lot. Indeed the convexity
assumption fails in many cases of interest. Here are a couple examples where
the convexity assumption is satisfied.
    ∂F(x, y, ẏ)/∂ẏ = ẏ / (1 + ẏ²)^{1/2},

and

    ∂²F(x, y, ẏ)/∂ẏ² = 1 / (1 + ẏ²)^{3/2}.
Clearly, this second derivative is positive for all y, ẏ ∈ R. This implies that
the solution y ∗ found in Example 1.2.5—namely, the straight line through the
points (x 0 , y 0 ) and (x 1 , y 1 )—satisfies the Legendre condition.
The Hessian (1.53) is

    H(x, y, ẏ) = [ 0   0                  ]
                 [ 0   1/(1 + ẏ²)^{3/2} ]  ≥ 0.
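As a quick symbolic aside, these derivatives and the Hessian can be reproduced with SymPy; the sketch below is only a check of the computation:

```python
import sympy as sp

y, yd = sp.symbols('y ydot')
F = sp.sqrt(1 + yd**2)                        # running cost of the shortest-path problem

print(sp.simplify(sp.diff(F, yd)))            # ydot/sqrt(ydot**2 + 1)
print(sp.simplify(sp.diff(F, yd, 2)))         # (ydot**2 + 1)**(-3/2)

H = sp.hessian(F, (y, yd)).applyfunc(sp.simplify)
print(H)                                      # Matrix([[0, 0], [0, (ydot**2 + 1)**(-3/2)]])
```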
Example 1.6.9 (Quadratic cost; Example 1.5.3 continued). For the quadratic
cost
    J(x) := ∫₋₁¹ ( α² x²(t) + ẋ²(t) ) dt,

the Hessian (1.53) is

    H(t, x, ẋ) = [ 2α²   0 ]
                 [ 0     2 ].

This Hessian is positive definite for every α ≠ 0 and, hence, the solution x∗ of
the Euler-Lagrange equation found in Example 1.5.3 is the unique optimal solu-
tion of the problem. For α = 0, the Hessian is positive semi-definite, so Theo-
rem 1.6.7 guarantees that x ∗ is optimal, but possibly not unique. (Actually, for
α = 0 the solution x ∗ found in Example 1.5.3 is the unique differentiable optimal
solution because it achieves a zero cost, J ( x ∗ ) = 0, and for all other differentiable
x the cost is positive).
The Legendre condition (1.48) is only one of several necessary conditions for
optimality. Additional necessary conditions go under the names of Weierstrass
and Jacobi. Actually, the necessary condition of Weierstrass follows nicely from
the dynamic programming approach as explained in Chapter 3, Exercise 3.10
(p. 114).
One can pose many different types of problems in the calculus of varia-
tions by giving different boundary conditions, for instance, involving ẋ (T ), or by
imposing further constraints on the required solution. An example of the latter
we saw in (1.8) where ẋ (t ) needs to be nonnegative for all time. Also, in Exer-
cise 1.18, we explain what to do if x (T ) needs to satisfy an inequality. Another
variation is considered in the next section.
1.7 Integral Constraints

[FIGURE 1.16: Three areas enclosed by ropes of the same length. See § 1.7.]
The standard example of this type is Queen Dido’s isoperimetric problem. This
is the problem of determining an area as large as possible that is enclosed by
a rope of a given length. Intuition tells us that the optimal area is a disk (the
for a given ℓ.
How to solve such constrained minimization problems? A quick-and-dirty
argument goes as follows: from calculus it is known that the solution of a min-
imization problem of some function J ( x ) subject to the constraint C ( x ) − c 0 = 0
is a stationary solution of the augmented function J defined as
    J(x, μ) := J(x) + μ( C(x) − c₀ ) = ∫₀ᵀ [ F(t, x(t), ẋ(t)) + μ M(t, x(t), ẋ(t)) ] dt − μ c₀
9 Lagrange multipliers are usually denoted as λ. We use μ in order to avoid a confusion in the
next chapter.
Proof. This is not an easy proof. Suppose x ∗ solves the constrained minimiza-
tion problem, and fix two C² functions δx, ηx that vanish at the boundaries,

    δx(0) = 0 = ηx(0),   δx(T) = 0 = ηx(T).

Define J(x) = ∫₀ᵀ F(t, x(t), ẋ(t)) dt and C(x) = ∫₀ᵀ M(t, x(t), ẋ(t)) dt and consider
the mapping that sends two real numbers (α, β) to the two real numbers

    ( J̄(α, β), C̄(α, β) ) := ( J(x∗ + αδx + βηx), C(x∗ + αδx + βηx) ).

The mapping from (α, β) to (J̄(α, β), C̄(α, β)) is C¹. So if the Jacobian at (α, β) = (0, 0),

    D := [ ∂J̄(α, β)/∂α   ∂J̄(α, β)/∂β ]
         [ ∂C̄(α, β)/∂α   ∂C̄(α, β)/∂β ]  evaluated at (α, β) = (0, 0),    (1.56)
The theorem says that the solution x ∗ satisfies either (1.54) or (1.55). The
first of these two is called the normal case, and the second the abnormal case.
Notice that the abnormal case completely neglects the running cost F . The next
example indicates that we usually have the normal case.
Clearly, out of these two, the cost J(x∗) := ∫₀¹ x∗(t) dt is minimal for the positive
solution μ∗.

In the abnormal case, (1.55), we have that

    0 = (∂/∂x − d/dt ∂/∂ẋ) ẋ∗²(t) = −2 ẍ∗(t).

Hence x∗(t) = bt + c for some b, c. Given the boundary conditions x(0) = 0, x(1) = 1
it is immediate that this allows for only one solution: x∗(t) = t:
1.8 Exercises
(a) F(t, x, ẋ) = ẋ² − α²x².
(b) F(t, x, ẋ) = ẋ² + 2x.
(c) F(t, x, ẋ) = ẋ² + 4tẋ.
(d) F(t, x, ẋ) = ẋ² + xẋ + x².
(e) F(t, x, ẋ) = x² + 2txẋ (this one is curious).

    F(t, x(t), ẋ(t)) = d/dt G(t, x(t))
1.4 Technical problem: the lack of Lipschitz continuity in the Beltrami identity
for the brachistochrone problem, and how to circumvent it. The footnote
of Example 1.3.1 derives the cycloid equations (1.29) from

    y(x)( 1 + ẏ²(x) ) = c².    (1.61)
The derivation was quick, and this exercise shows that it was a bit dirty as
well.
(a) Let x(φ), y(φ) be the cycloid solution (1.29). Use the identity
    dy/dx = (dy/dφ)/(dx/dφ) to show that they satisfy (1.61).
(b) The curve of this cycloid solution for φ ∈ [0, 2π] is
Argue that for every Δ ≥ 0 also this new function satisfies the Beltrami
identity (1.61) for all x ∈ (0, c²π + Δ).
(c) This is not what the footnote of Example 1.3.1 says. What goes wrong
in this footnote?
(d) This new function y(x) is constant over the interval [c²π, c²π + Δ].
    Show that a constant function y(x) does not satisfy the Euler-Lagrange
    equation of the brachistochrone problem.
(e) It can be shown that y (x) solves (1.61) iff it is of this new form for
some Δ ≥ 0 (possibly Δ = ∞). Argue that the only function that sat-
isfies the Euler-Lagrange equation with y (0) = 0 is the cycloid solu-
tion (1.29).
    dr(x) / √( r²(x)/a² − 1 ) = dx.
(a) For G(x) = K (x) = 0 the first-order conditions are that (1.38) holds for
all possible perturbations. Adapt this equation for the case that G(x)
and K (x) are arbitrary C 1 functions.
(b) Prove that this equality implies that the Euler-Lagrange equation
holds.
(c) Finish the proof of Proposition 1.5.1.
1.11 Show that the minimal surface example (Example 1.3.2) satisfies the Leg-
endre second-order necessary condition of Theorem 1.6.1.
1.13 Show that the minimization problem in Example 1.2.8 satisfies the Legen-
dre condition. [Hint: The condition now involves a 2 × 2 matrix.]
1.14 The optimal solar challenge. A solar vehicle receives power from solar
radiation. This power p(x, t ) depends on position x (due to clouds) and
on time t (due to moving clouds and the sun’s angle of inclination). Driv-
ing at some speed ẋ also consumes power. Denote this power loss by f (ẋ).
This assumes that it is a function of speed alone, which is reasonable if we
do not change speed aggressively and if friction depends only on speed.
Driving at higher speed requires more energy per meter than driving at
lower speed. This means that f is convex, in fact
x (0) = 0,
f (ẋ) = ẋ 2
p(x, t ) = q(x),
i.e., that the sun’s angle does not change much over our time window
[0, T ] and that clouds are not moving. Use the Beltrami identity to
express ẋ (t ) in terms of q( x (t )) and the initial speed ẋ (0) and initial
q(0).
(e) Argue once again (but now using the explicit relation of the previous
part) that we should speed up if we drive into a cloud.
(f ) (A computer might be useful for this part.) Continue with f (ẋ) = ẋ 2
and p(x, t ) = q(x). Suppose that up to position x = 20 the sky is clear
but that from x = 20 onwards heavy clouds limit the power input:
    q(x) = 100 for x < 20,   q(x) = 4 for x > 20.
over all functions x : [0, 1]→R that satisfy the boundary conditions x (0) = 0,
x (1) = 1.
x (0) = 0, x (1) = 1
1.17 Smoothness. This exercise is from Liberzon (2012). It shows that smooth
running costs F may result in non-smooth optimal solutions x ∗ . Consider
minimization of
    J(x) = ∫₋₁¹ ( 1 − ẋ(t) )² x²(t) dt
x (−1) = 0, x (1) = 1.
(a) Show that optimal solutions x ∗ must obey the Euler-Lagrange equa-
tion, and the inequality
    ∂F( x∗(T), ẋ∗(T), T ) / ∂ẋ ≥ 0.
(b) Verify this statement for the cost ∫₀¹ ( x(t) − ẋ(t) )² dt with x(0) = 1,
    x(1) ≥ x_T, and distinguish the cases x_T ≤ e and x_T > e.
1.19 The hanging cable. Every hanging cable eventually comes to a halt in a
position of minimal energy, such as these three:
What is the shape of this minimal energy position? When hanging still it
has no kinetic energy, it only has potential energy. If the cable is very flex-
ible then the potential energy is only due to its height y. We assume that
the cable is very thin, does not stretch and that it has a constant mass per
unit length. In a constant gravitational field with gravitational accelera-
tion g the potential energy J ( y ) equals
    J(y) = ∫_{x₀}^{x₁} ρ g y(x) √( 1 + ẏ²(x) ) dx,
with ρ the mass per unit length of the cable. We want to minimize the
potential energy over all functions y : [x 0 , x 1 ] → R, subject to y (x 0 ) =
y₀, y(x₁) = y₁ and such that the length of the cable is ℓ. The length of the
cable can be expressed as

    ∫_{x₀}^{x₁} √( 1 + ẏ²(x) ) dx = ℓ.
(a) Consider first the normal case, and the associated Euler-Lagrange
equation (1.54). Analyze the Beltrami identity of this case to show
that the minimal energy solution y ∗ satisfies
    y∗(x) + μ∗/(ρg) = a √( 1 + ẏ∗²(x) )
1.21 Consider Example 1.7.2. Prove that for C < 1 there is no smooth function
that satisfies the boundary conditions and integral constraint.
with δx (0) = δx (T ) = 0.
    Σ_{t=0}^{T−1} ∂F(t, x∗(t), x∗(t+1))/∂x₁^T δx(t) + Σ_{t=0}^{T−1} ∂F(t, x∗(t), x∗(t+1))/∂x₂^T δx(t+1) = 0

    Σ_{t=1}^{T−1} [ ∂F(t, x∗(t), x∗(t+1))/∂x₁^T + ∂F(t−1, x∗(t−1), x∗(t))/∂x₂^T ] δx(t) = 0,

    ∂F(t, x∗(t), x∗(t+1))/∂x₁ + ∂F(t−1, x∗(t−1), x∗(t))/∂x₂ = 0
for all t = 1, . . . , T − 1. This system of equations can be called the dis-
crete Euler-Lagrange equation.
(c) Extend this to the minimization of (with S(x(T)) some final cost)

    Σ_{t=0}^{T−1} F(t, x(t), x(t+1)) + S(x(T))
F (t , x (t ), x (t + 1)) := F̃ (t , x (t ), x (t + 1) − x (t )).
Chapter 2
Minimum Principle
In the solar challenge problem (Exercise 1.14) we assumed that we could choose
the speed ẋ of the car at will, but in reality, the speed is limited by the dynamics
of the car. For instance, the acceleration of the car is bounded. In this chap-
ter, we take such dynamical constraints into account. We assume that the state
x : [0, T] → R^n satisfies a system of differential equations with initial state x₀,

    ẋ(t) = f(x(t), u(t)),   x(0) = x₀,    (2.1)

and that we cannot choose x directly but only can choose u, which is known
as the input of the system. Furthermore, the input is restricted to take values in
some given set U ⊆ Rm , that is,
u : [0, T ] → U. (2.2)
For instance, in a car-parking problem, the input u might be the throttle open-
ing and this takes values in between u = 0 (fully closed) and u = 1 (fully open),
so then U = [0, 1]. For a given U and given (2.1), the optimal control problem is
to determine an input u : [0, T ] → U that minimizes a given cost function of the
form
    J(u) := ∫₀ᵀ L(x(t), u(t)) dt + K(x(T)).    (2.3)
control problem is to find, for a given system (2.1) and x T and U, an input u :
[0, T ] → U that minimizes
    ∫₀ᵀ L(x(t), u(t)) dt
[FIGURE 2.1: Reachable states and candidate optimal states for the optimal control problem of Example 2.1.1.]
This cost equals ½(x_T − 1)² − ½ and, therefore, x_T = 1 achieves the smallest possible
cost over all x_T. Now we solved the problem: optimal is x_T = 1, and the
optimal state x∗(t) is zero for all t ≤ T − x_T = T − 1, and increases with ẋ∗(t) = +1
for all t > T − 1. Therefore we conclude that the optimal control u∗ is

    u∗(t) = 0 if t < T − 1,   u∗(t) = 1 if t > T − 1.    (2.4)
The derivation in this example is ad hoc. We want a theory that can deal
with optimal control problems systematically, including problems whose solu-
tion is discontinuous. To develop this theory we first assume that all functions
involved are sufficiently smooth, and that U = Rm . Combined with the classic
method of Lagrange multipliers we can then employ the theory of calculus of
variations, and this provides first-order conditions that optimal controls must
satisfy. This is derived in § 2.3. Motivated by these first-order conditions, we
then formulate and prove the truly fabulous minimum principle of Pontrya-
gin (§ 2.5). This result shocked the scientific community when it was presented
in the late fifties of the previous century. The minimum principle is very gen-
eral, and it provides necessary conditions for a control to be optimal, even if
the optimal control is discontinuous. In many applications, these conditions are
numerically tractable and allow us to construct the optimal control, assuming
one exists. But be warned: the proof of the minimum principle is involved.
2.2 Quick Summary of the Classic Lagrange Multiplier Method

Suppose we want to minimize a function J(z) over all z ∈ R^n subject to the equality constraint

    G(z) = 0    (2.5)

for some given vector-valued function G. The method of Lagrange multipliers can help to find minimizers. In short, the
idea is to associate with this constrained problem in z an unconstrained problem in (z, λ) with cost function

    J(z, λ) := J(z) + λ^T G(z),

and the components of the vector λ are known as Lagrange multipliers. Assum-
and the components of the vector λ are known as Lagrange multipliers. Assum-
ing J is sufficiently smooth, a pair (z ∗ , λ∗ ) is a stationary solution of the uncon-
strained cost J(z, λ) over all z and λ iff both gradients vanish,
    ∂J(z∗, λ∗)/∂z = 0,   ∂J(z∗, λ∗)/∂λ = 0.    (2.6)
The gradient of J(z, λ) with respect to λ is G T (z). Hence, stationary solutions
(z ∗ , λ∗ ) of J(z, λ) necessarily satisfy G(z ∗ ) = 0, and, therefore, J(z ∗ , λ∗ ) = J (z ∗ ).
In fact, under mild assumptions, the unconstrained first-order conditions (2.6)
are equivalent to the first-order conditions of the constrained minimization
problem (2.5), see Appendix A.8 for details.
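As a small computational aside, the stationarity conditions (2.6) can be solved symbolically for a toy problem; the cost and constraint in the sketch below are illustrative assumptions, not examples from the text:

```python
import sympy as sp

z1, z2, lam = sp.symbols('z1 z2 lambda')

J = z1**2 + z2**2                 # illustrative cost J(z)
G = z1 + z2 - 1                   # illustrative constraint G(z) = 0
J_aug = J + lam*G                 # augmented cost J(z, lambda) = J(z) + lambda^T G(z)

# Both gradients must vanish, cf. (2.6)
sol = sp.solve([sp.diff(J_aug, v) for v in (z1, z2, lam)], [z1, z2, lam], dict=True)
print(sol)                        # [{z1: 1/2, z2: 1/2, lambda: -1}]
```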
For the optimal control problem, we take a similar approach, however with
the complication that we are not dealing with a minimization over a finite num-
ber of variables z ∈ Rn , but over uncountably many functions u , x , and the con-
straints are the dynamical constraints ẋ (t ) = f ( x (t ), u (t )), and these need to be
satisfied for all t ∈ [0, T ].
2.3 First-order Conditions for Unbounded and Smooth Controls

In this section we consider the optimal control problem of minimizing the cost (2.3) subject to

    ẋ(t) = f(x(t), u(t)),   x(0) = x₀,    (2.7)

with unrestricted input,

    U = R^m,

and we further assume for now that all functions involved are sufficiently smooth.
The optimal control problem can be regarded as a constrained optimization
problem, with (2.7) being the dynamical constraint. This observation provides a
clue to its solution: introduce Lagrange multiplier functions p : [0, T ] → Rn cor-
responding to these dynamical constraints. Analogous to the classic Lagrange
multiplier method, we introduce an augmented running cost L : Rn × Rn × U ×
Rⁿ → R, defined as L(x, ẋ, u, p) := L(x, u) + pᵀ(f(x, u) − ẋ),
and analyze the first-order conditions for the corresponding cost. That is, we
want to know which conditions are satisfied by stationary solutions
q ∗ :=( x ∗ , p ∗ , u ∗ )
The final function is known as the (optimal control) Hamiltonian and it plays a
central role in optimal control. First, we use it to formulate the necessary first-
order conditions for the augmented problem:
0 = ∂L(q(T), q̇(T))/∂ẋ + ∂K(x(T))/∂x = −p(T) + ∂K(x(T))/∂x,
0 = ∂L(q(T), q̇(T))/∂ṗ + ∂K(x(T))/∂p = 0 + 0,
0 = ∂L(q(T), q̇(T))/∂u̇ + ∂K(x(T))/∂u = 0 + 0.
The first says that p(T) = ∂K(x(T))/∂x, and the other two are void.
Since we have an initial condition on x but not on p and u , the free initial-
point conditions (1.41) on q need to hold for the components p and u (see
Proposition 1.5.1). The initial-point conditions become 0 = ∂L(q(0), q̇(0))/∂q̇, and for the respective components p and u these conditions are again void. Note further that
∂H(x(t), p(t), u(t))/∂p = f(x(t), u(t)).
Therefore, the first Hamiltonian equation (2.11a) is nothing else than the given
system equation: ẋ (t ) = f ( x (t ), u (t )), x (0) = x 0 .
The Lagrange multiplier p is called the costate (because mathematically it lives in the space dual to that of the variations of the state x). In examples it often has a useful interpretation of its own; see, for instance, the sensitivity interpretation in Example 2.6.3 and § 3.5.
Based on the previous section, one would conjecture that smooth optimal con-
trols, for U = Rm , must satisfy the first-order conditions of the augmented prob-
lem (Lemma 2.3.1). Specifically, if u ∗ is an optimal control, and x ∗ is the result-
ing optimal state, then one would conjecture the existence of a function p ∗ that
satisfies
ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,   p∗(T) = ∂K(x∗(T))/∂x,
and such that ( x ∗ , p ∗ , u ∗ ) satisfies (2.11c). We will soon see that that is indeed
the case (under some mild smoothness assumption). In fact, it holds in a far
more general setting. To motivate this general result, it is instructive to rewrite
the Legendre condition of calculus of variations problems in terms of Hamilto-
nians:
The Legendre condition says that optimal solutions of this calculus of vari-
ations problem must satisfy
∂²F(x∗(t), ẋ∗(t))/∂ẋ∂ẋᵀ ≥ 0   ∀t ∈ [0, T].
From the equality H (x, p, u) = p T u +F (x, u), it follows immediately that the Leg-
endre condition for our problem, in terms of the Hamiltonian, is that
∂²H(x∗(t), p(t), u∗(t))/∂u∂uᵀ ≥ 0   ∀t ∈ [0, T],   (2.13)
for whatever p (t ).
is negative. Then, by continuity, for some small enough ε > 0 the function defined as
ū(t) = { û if t ∈ [t̄, t̄ + ε],  u∗(t) elsewhere }
achieves a smaller (or equal) value of the Hamiltonian for all time, and
∫_0^T [H(x∗(t), p∗(t), ū(t)) − H(x∗(t), p∗(t), u∗(t))] dt = cε + o(ε).
ū (t ) = u ∗ (t ) + δu (t ).
In the rest of the proof we fix this perturbation and we consider only very small and positive ε. Such perturbations are called "needle" perturbations.
By perturbing the input, ū = u ∗ + δu , the solution of ẋ (t ) = f ( x (t ), u ∗ (t ) +
δu (t )) for t > t̄ perturbs as well. Denote the perturbed state as x (t ) = x ∗ (t ) +
δx (t ). The perturbation δx (t ) is probably not a needle but at each t > t̄ it is of
order ε. To avoid clutter, we now drop all time arguments, that is, x(t) is simply denoted as x, etc. The derivative of δx with respect to time satisfies δ̇x = f(x∗ + δx, u∗ + δu) − f(x∗, u∗). Next consider the difference in cost,
Δ = J(u∗ + δu) − J(u∗)
  = K(x∗(T) + δx(T)) − K(x∗(T)) + ∫_0^T [L(x∗ + δx, u∗ + δu) − L(x∗, u∗)] dt
  = ∂K(x∗(T))/∂xᵀ δx(T) + ∫_0^T [L(x∗ + δx, u∗ + δu) − L(x∗, u∗)] dt + o(ε).
Next use that L(x, u) = −pᵀf(x, u) + H(x, p, u), and substitute the optimal costate p∗ for p:
Δ = p∗ᵀ(T)δx(T) + ∫_0^T −p∗ᵀ[f(x∗ + δx, u∗ + δu) − f(x∗, u∗)] dt
  + ∫_0^T [H(x∗ + δx, p∗, u∗ + δu) − H(x∗, p∗, u∗)] dt + o(ε).
Here, we also subtracted and added a term H(x∗, p∗, u∗ + δu). The reason is that now the difference of the first two Hamiltonian terms can be recognized as an approximate partial derivative with respect to x, and the difference of the final two Hamiltonian terms is what we considered earlier (it equals cε + o(ε)). So:
Δ = p∗ᵀ(T)δx(T) + ∫_0^T [ −p∗ᵀδ̇x + ∂H(x∗, p∗, u∗ + δu)/∂xᵀ δx ] dt + cε + o(ε).
Notice that the partial derivative ∂H(x∗, p∗, u∗ + δu)/∂x equals −ṗ∗ = ∂H(x∗, p∗, u∗)/∂x everywhere except for ε units of time (for t ∈ [t̄, t̄ + ε]). This, combined with the fact that δx at each moment in time is also of order ε, allows us to conclude that
Δ = p∗ᵀ(T)δx(T) + ∫_0^T [ −p∗ᵀδ̇x − ṗ∗ᵀδx ] dt + cε + o(ε)
  = p∗ᵀ(T)δx(T) − [ p∗ᵀ(t)δx(t) ]_0^T + cε + o(ε) = cε + o(ε).
Here we used that δx(0) = 0, which holds because of the initial condition x(0) = x₀. Since c < 0, we see that Δ is negative for small enough ε. But that would mean that ū, for small enough ε, achieves a smaller cost than the optimal one, which is impossible. Hence the assumption that u∗(t) does not minimize the Hamiltonian at every t where u∗(t) is continuous is wrong. ■
The theory of the minimum principle was developed during the 1950s in the
former Soviet Union by a group of mathematicians led by Lev Pontryagin, and
in his honor it is called Pontryagin's minimum principle. Actually, Pontryagin followed the classical mechanics sign convention and took the Hamiltonian to be pᵀf(x, u) minus L(x, u).
Hence, the principle is better known as Pontryagin’s maximum principle.
The principle assumes the existence of an optimal control u ∗ , and then guar-
antees that u ∗ minimizes the Hamiltonian at each moment in time. In practical
situations, this pointwise minimization is used to determine the optimal con-
trol, tacitly assuming an optimal control exists. Hence, one could say that the
principle provides necessary conditions for optimality. In Section 2.8, we discuss
under which conditions these conditions are sufficient as well; see also Chap-
ter 3 for the alternative approach offered by dynamic programming.
U = [−1, 1].
H (x, p, u) = pu + x,
p ∗ (t ) = 1 − t .
u ∗ (t ) = −1 ∀t ∈ [0, 1].
This makes perfect sense: to minimize ∫_0^1 x(t) dt, we want x(t) to go down as fast as possible, which, given the system dynamics ẋ(t) = u(t), means taking u(t) as small (negative) as possible.
ṗ∗(t) = −1,   p∗(1) = ∂K(x∗(1))/∂x = −1/2.
Hence,
p∗(t) = 1/2 − t.
The costate is positive for 0 ≤ t < 1/2 but negative for 1/2 < t ≤ 1. The optimal control minimizes the Hamiltonian p∗(t)u∗(t) + x∗(t), and, because of the sign change in p∗(t) at t = 1/2, we see that the optimal input switches sign at t = 1/2:
u∗(t) = { −1 if 0 ≤ t < 1/2,  +1 if 1/2 < t ≤ 1. }
Example 2.5.4 (Linear system with quadratic cost). Consider the system with
cost
ẋ(t) = u(t),   x(0) = x₀,   J(u) = ∫_0^1 [x²(t) + u²(t)] dt.
Now we allow every u (t ) ∈ R, that is, U = R. Notice that this is the same cost func-
tion as in Example 1.5.3 (for α = 1) because u (t ) = ẋ (t ). The associated Hamilto-
nian is
H (x, p, u) = pu + x 2 + u 2 .
u∗(t) = −½ p∗(t),   (2.17)
ẋ∗(t) = −½ p∗(t),   x∗(0) = x₀,
ṗ∗(t) = −2x∗(t),   p∗(1) = 0.
This implies that p̈ ∗ (t ) = p ∗ (t ). The general solution of this second-order differ-
ential equation is
p∗(t) = c₁eᵗ + c₂e⁻ᵗ,
and since x∗(t) = −½ ṗ∗(t), we find that
x∗(t) = −½ c₁eᵗ + ½ c₂e⁻ᵗ.
The two constants c₁, c₂ follow uniquely from the two boundary conditions x∗(0) = x₀ and p∗(1) = 0, and it gives (verify this yourself)
x∗(t) = x₀/(e + e⁻¹) · (e^{1−t} + e^{t−1}),
p∗(t) = 2x₀/(e + e⁻¹) · (e^{1−t} − e^{t−1}).
The optimal control u ∗ (t ) now follows from (2.17).
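As a numerical cross-check of Example 2.5.4 (a sketch, not part of the example itself), one can integrate the Hamiltonian equations ẋ = −p/2, ṗ = −2x and pick p(0) by shooting so that the boundary condition p(1) = 0 is met. The bracket [0, 5] for p(0), the step count, and x₀ = 1 below are arbitrary assumptions.

import numpy as np

x0 = 1.0

def rhs(x, p):
    # Hamiltonian system of Example 2.5.4: xdot = -p/2, pdot = -2x
    return -0.5 * p, -2.0 * x

def shoot(p0, T=1.0, n=1000):
    """Integrate with fixed-step RK4 and return p(T)."""
    dt = T / n
    x, p = x0, p0
    for _ in range(n):
        k1x, k1p = rhs(x, p)
        k2x, k2p = rhs(x + 0.5 * dt * k1x, p + 0.5 * dt * k1p)
        k3x, k3p = rhs(x + 0.5 * dt * k2x, p + 0.5 * dt * k2p)
        k4x, k4p = rhs(x + dt * k3x, p + dt * k3p)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        p += dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return p

# For this linear system p(1) is increasing in p(0), so bisection applies.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if shoot(mid) > 0.0:
        hi = mid
    else:
        lo = mid
print("shooting     p*(0) =", 0.5 * (lo + hi))
print("closed form  p*(0) =", 2 * x0 * (np.e - np.exp(-1)) / (np.e + np.exp(-1)))

Both numbers agree to several digits, confirming the closed-form solution above.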
ẋ (t ) = α u (t ) x (t ), x (0) = x0 ,
ṗ (t ) = (1 − u (t )) − p (t )α u (t ), p (T ) = 0.
These differential equations are, in their present form, still hard to solve. How-
ever, the Hamiltonian (2.18) is linear in u, so the minimizer u of the Hamilto-
nian (2.18) depends solely on the sign of x(1 + αp). In fact, since the production
rate x (t ) is inherently positive (because x (0) = x 0 > 0 and ẋ (t ) = α u (t ) x (t ) ≥ 0),
the Hamiltonian at each moment in time is minimal for
u∗(t) = { 0 if 1 + αp∗(t) > 0,  1 if 1 + αp∗(t) < 0. }
ṗ ∗ (t ) = 1 near t = T , and p ∗ (T ) = 0.
That is, p ∗ (t ) = t − T near t = T , see Fig. 2.2. Solving backwards in time, starting
at t = T , we see that the costate reduces linearly, until at time
t s := T − 1/α
it reaches the level p ∗ (t s ) = −1/α < 0 at which point u ∗ (t ) switches sign. Since
ṗ(t) > 0 for every input, the value of p∗(t) is less than −1/α for all t < t_s, which implies that u∗(t) = 1 for all t < t_s. For this case, the Hamiltonian dynamics simplify
to
ẋ ∗ (t ) = α x ∗ (t ), ṗ ∗ (t ) = −α p ∗ (t ) if t < t s .
The Hamiltonian was derived from the Beltrami identity (see Eqn. (2.10)).
Hence, we could expect that H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) is constant as a function of
time. For the unconstrained inputs (U = Rm ) and smooth enough solutions,
this may easily be verified directly from the first-order equations for optimal-
ity expressed in Lemma 2.3.1. Indeed, if ( x ∗ , p ∗ , u ∗ ) are a smooth triple satisfy-
ing (2.11), then a direct computation yields (and for the sake of exposition, we
momentarily drop here the arguments of H and other functions)
d/dt H(x∗(t), p∗(t), u∗(t)) = ∂H/∂xᵀ ẋ∗ + ∂H/∂pᵀ ṗ∗ + ∂H/∂uᵀ u̇∗
  = ∂H/∂xᵀ ∂H/∂p + ∂H/∂pᵀ (−∂H/∂x) + ∂H/∂uᵀ u̇∗
  = ∂H/∂uᵀ u̇∗   (2.19)
  = 0.
The final equality follows from (2.11c). Actually, the constancy of the Hamil-
tonian H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) also holds for restricted input sets U (such as U =
[0, 1], etc.). This is remarkable because in such cases the input quite often is not
even continuous.
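As a quick illustration (a sketch, with x₀ = 1 chosen arbitrarily), one can evaluate the Hamiltonian of Example 2.5.4 along the closed-form optimal solution found there and observe that it is indeed constant in time.

import numpy as np

# Check: H(x*, p*, u*) of Example 2.5.4 should not depend on t.
x0 = 1.0
t = np.linspace(0.0, 1.0, 11)
den = np.e + np.exp(-1)
x = x0 / den * (np.exp(1 - t) + np.exp(t - 1))
p = 2 * x0 / den * (np.exp(1 - t) - np.exp(t - 1))
u = -0.5 * p                      # minimizer of H(x, p, u) = p u + x^2 + u^2
H = p * u + x**2 + u**2
print(H)                          # all entries agree up to rounding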
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) = H∗
are the same. In particular, the limits in (2.21) and (2.23) are the same. It
means that the possible discontinuity at t 0 is removable. This shows that
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) equals some continuous function H∗ (t ) at all but finitely
many t .
Now let t be a time at which u ∗ is C 1 . Since x ∗ and p ∗ are C 1 at t , the equal-
ity (2.19) gives
q := dH(x∗(t), p∗(t), u∗(t))/dt = ∂H(x∗(t), p∗(t), u∗(t))/∂uᵀ u̇∗(t).   (2.24)
This q is zero by the fact that u ∗ (t ) minimizes the Hamiltonian. To see this more
clearly, realize that this q also enters the following Taylor series:
H(x∗(t), p∗(t), u∗(t + ε)) = H(x∗(t), p∗(t), u∗(t)) + qε + o(ε).
Therefore, the description of the optimal state trajectory also switches halfway: from ẋ∗(t) = u∗(t) it follows that
x∗(t) = { x₀ − t if t < 1/2,  x₀ − 1 + t if t ≥ 1/2. }
Based on this, it seems unlikely that the Hamiltonian along the optimal solution is constant, but computing it explicitly gives
H(x∗(t), p∗(t), u∗(t)) = p∗(t)u∗(t) + x∗(t)
  = { −(1/2 − t) + (x₀ − t) if t < 1/2,  (1/2 − t) + (x₀ − 1 + t) if t ≥ 1/2 }
  = x₀ − 1/2   for all t ∈ [0, 1].
So far in this chapter the final state x (T ) was not constrained. In quite a few
applications, however, there are constraints on the final state x (T ). (Note fur-
thermore that in Chapter 1, calculus of variations, we actually started with a
fully constrained x (T ).) In the car parking application, for instance, we obvi-
ously want the speed of the car to equal zero at the final time. Let r denote the
number of components of the final state that are constrained. Without loss of
generality, we assume these to be the first r components. So consider the system with initial and final conditions
ẋ(t) = f(x(t), u(t)),   x(0) = x₀,   xᵢ(T) = x̂ᵢ, i = 1, . . . , r.   (2.25)
Keep in mind that no conditions are imposed on the remaining final state com-
ponents x r +1 (T ), . . . , x n (T ). As before we take a cost of the form
J(u) = ∫_0^T L(x(t), u(t)) dt + K(x(T)).   (2.26)
In § 2.3 the final conditions on the costate, p∗(T) = ∂K(x∗(T))/∂x, were derived from the free endpoint condition (1.42), but in Proposition 1.5.1 we saw that these conditions are absent if the final state is constrained. With that in mind, it will be no surprise that fixing the first r components of the final state, xᵢ(T), i = 1, . . . , r, implies that the conditions on the corresponding first r components of the final costate are absent, i.e., only the remaining components of p∗(T) are constrained:
p∗ᵢ(T) = ∂K(x∗(T))/∂xᵢ,   i = r + 1, . . . , n.
That is indeed the case. However, there is a catch: the first-order conditions were derived using a perturbation of the solution x∗, but if we have constraints on both the initial and the final state, then it may happen that nonzero perturbations do not exist. An example is
ẋ (t ) = u 2 (t ), x (0) = 0, x (1) = 0.
Clearly, in this case, there is only one feasible control and one feasible state:
the zero function. (This system will be the starting point of Example 2.6.2.) We
have seen similar difficulties in the calculus of variations problems subject to
integral constraints, with its “normal” and “abnormal” Euler-Lagrange equation
(have a look at § 1.7, in particular, Example 1.7.2). Also now we make a dis-
tinction between a normal and an abnormal case, but the proof of the resulting
theorem is involved, and it would take too long to explain the details here. The
interested reader might want to consult the excellent book (Liberzon, 2012). We
just provide the solution. It involves the modified Hamiltonian defined as
H_λ(x, p, u) := pᵀf(x, u) + λL(x, u).
It is the Hamiltonian but with an extra parameter λ, and this parameter is either zero or one,
λ ∈ {0, 1}.
Suppose u∗ is an optimal control and x∗ the resulting optimal state. Then there is a function p∗ : [0, T] → Rⁿ and a constant λ ∈ {0, 1} such that (λ, p∗(t)) ≠ (0, 0) for all t ∈ [0, T], and
ẋ∗(t) = ∂H_λ(x∗(t), p∗(t), u∗(t))/∂p,   x∗(0) = x₀,  x∗ᵢ(T) = x̂ᵢ, i = 1, . . . , r,   (2.27a)
ṗ∗(t) = −∂H_λ(x∗(t), p∗(t), u∗(t))/∂x,   p∗ᵢ(T) = ∂K(x∗(T))/∂xᵢ, i = r + 1, . . . , n,   (2.27b)
ẋ(t) = u²(t),   x(0) = 0,   x(1) = 0,
with cost J(u) = ∫_0^1 u(t) dt. As mentioned before, the only feasible control is the zero function. So the minimal cost is 0, and x∗(t) = u∗(t) = 0 for all time. The modified Hamiltonian is H_λ(x, p, u) = pu² + λu.
If we try to solve the normal Hamiltonian equations (2.27a, 2.28) (so for λ =
1), we find that the costate is constant and that u ∗ (t ) at every t minimizes
p ∗ (t ) u 2 (t )+ u (t ). But the true optimal control is u ∗ (t ) = 0 and this does not min-
imize p ∗ (t ) u 2 (t ) + u (t ).
If we take λ = 0 (the abnormal case), then the Hamiltonian simplifies to
H0 (x, p, u) = pu 2 . This again implies that the costate is constant, p ∗ (t ) = p̂. The
input u (t ) that minimizes the Hamiltonian p̂ u 2 (t ) now is either not defined (if
p̂ < 0), or is non-unique (if p̂ = 0), or equals zero (if p̂ > 0). This last case (the
zero input) is the true optimal input.
One more abnormal case is discussed in Exercise 2.15. All other examples in
this chapter are normal.
Example 2.6.3 (Shortest path—a normal case). In the previous chapter (Exam-
ple 1.1.4 and Example 1.2.5), we solved the (trivial) shortest path problem by
We want to minimize J˜( x ). This can be seen as an optimal control problem for
the system
ẋ (t ) = u (t ), x (0) = x0 , x (T ) = xT ,
with cost
J(u) = ∫_0^T √(1 + u²(t)) dt.
We can strike off the first and the last candidates, because they clearly fail to
achieve the final condition x (T ) = x T . The second candidate says that u ∗ (t ) is
some constant. But for a constant input û := u ∗ (t ), the solution x (t ) of the dif-
ferential equation is x (t ) = ût + x 0 , which is a straight line. From the initial and
final conditions, it follows that û = (x T − x 0 )/T . Hence, as expected,
x∗(t) = x₀ + (x_T − x₀)/T · t,   u∗(t) = (x_T − x₀)/T.
The constant costate then follows from (2.29),
p∗(t) = p̂ = −u∗(t)/√(1 + u∗²(t)) = −(x_T − x₀)/√(T² + (x_T − x₀)²).
It is interesting to compare this with the optimal cost (the minimal length of the curve)
J(u∗) = √(T² + (x_T − x₀)²).
We see that p∗(0) equals dJ(u∗)/dx₀. That is, p∗(0) expresses how strongly the optimal
cost changes if x 0 changes. We return to this sensitivity property of the costate
in § 3.5.
Example 2.6.4 (Integrator system with fixed initial and final states). Consider
the system with bounded derivative,
ẋ (t ) = u (t ), u (t ) ∈ [−1, 1],
In Example 2.5.2, we analyzed the same system and cost (for T = 1), but now we
fix both the initial and final states,
x (0) = x (T ) = 0.
H1 (x, p, u) = pu + x,
ṗ ∗ (t ) = −1.
Notice that since the state is fixed at the final time, x (T ) = 0, there is no condi-
tion on the costate at the final time. So all we know, for now, about the costate
is that its derivative is −1, i.e.,
p∗(t) = c − t
for some as yet unknown constant c. Given this p∗(t) = c − t, the minimizer u∗(t) of the Hamiltonian is
u∗(t) = { −1 if t < c,  +1 if t > c. }
Since ẋ∗(t) = u∗(t) and x∗(0) = x∗(T) = 0, the time during which the state decreases must equal the time during which it increases, so the switch occurs at
c = T/2.
This completely settles the optimal control problem. In the first half, [0, T /2],
we have ẋ ∗ (t ) = −1 and, in the second half, [T /2, T ], we have ẋ ∗ (t ) = +1. The
optimal cost is J(u∗) = ∫_0^T x∗(t) dt = −T²/4.
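A two-line numerical check of this result (a sketch; the horizon T = 2 and the grid are arbitrary choices) integrates x∗(t) under the bang-bang control and compares with −T²/4.

import numpy as np

T, n = 2.0, 100001
t = np.linspace(0.0, T, n)
u = np.where(t < T / 2, -1.0, 1.0)      # bang-bang control found above
x = np.where(t < T / 2, -t, t - T)      # resulting state, x(0) = x(T) = 0
dt = t[1] - t[0]
print("cost   :", np.sum(x) * dt)       # simple Riemann sum of x(t) over [0, T]
print("-T^2/4 :", -T**2 / 4)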
So far, the final time T in the optimal control problem was fixed. Now we extend
the optimal control problem by minimizing the cost over all inputs as well as
over all final times T ≥ 0. As before we assume a cost of the form
J_T(u) := ∫_0^T L(x(t), u(t)) dt + K(x(T)).
Since we now have one extra degree of freedom, we expect that the minimum
principle still holds but with one extra condition. This turns out to be true, and
the extra condition is quite elegant:
Theorem 2.7.1 (Minimum principle with free final time). Consider the sys-
tem (2.25) with cost (2.26), and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u)
and ∂L(x, u)/∂x are continuous in x and u, and that K (x) and ∂K (x)/∂x are con-
tinuous in x.
Suppose ( u ∗ , T∗ ) is a solution of the optimal control problem with free final
time, and that u ∗ is piecewise continuous on [0, T∗ ], and that 0 ≤ T∗ < ∞. Then
all conditions of Theorem 2.6.1 hold (with T = T∗), and, in addition,
H_λ(x∗(T∗), p∗(T∗), u∗(T∗)) = 0.   (2.30)
Proof. We prove it only for the normal case (λ = 1). If the pair ( u ∗ , T∗ ) is opti-
mal, then u ∗ is also optimal for the fixed final time T = T∗ ; hence, all conditions
of Theorem 2.6.1 hold.
Since u ∗ is assumed to be piecewise continuous, the limit u T∗ := limt ↑T∗ u ∗ (t )
exists. The given u ∗ is defined on [0, T∗ ], and we now extend its definition by
letting u∗(t) = u_{T∗} for all t ≥ T∗. That way u∗ is continuous at T = T∗, and the cost J(u∗, T) becomes a differentiable function of the final time T at T = T∗. Since T∗ is optimal, this derivative must vanish,
dJ(u∗, T∗)/dT = 0.
This derivative equals
dJ(u∗, T∗)/dT = ∂K(x∗(T∗))/∂xᵀ ẋ∗(T∗) + L(x∗(T∗), u∗(T∗))
  = p∗ᵀ(T∗) f(x∗(T∗), u∗(T∗)) + L(x∗(T∗), u∗(T∗))
  = H₁(x∗(T∗), p∗(T∗), u∗(T∗)).
The remarks about λ made in the previous section also apply to this situa-
tion. Also the constancy property of the Hamiltonian (Theorem 2.5.6) remains.
This is interesting because it shows that for free final time problems the Hamiltonian
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) is actually zero for all time!
An important special case is when L(x, u) = 1 and K (x) = 0. Then the cost
function equals J_T(u) = ∫_0^T 1 dt = T, that is, the task of the control input is to
realize given boundary conditions in minimal time. This only makes sense if
we have both initial and final conditions. Such problems are known as time-
optimal control problems. A classic time-optimal control problem is the prob-
lem of Zermelo.
FIGURE 2.5: The problem of Zermelo. We assume that the speed of the boat with respect to the water is v = 1, that the flow velocity is w(x₂) = 0.5, and that (a, b) = (−1, 2). Then it is optimal to take u constant such that (cos(u), sin(u)) = (−0.8, +0.6) (shown in red), for then the sum of (cos(u), sin(u)) and (0.5, 0) (shown in blue) is (−0.3, 0.6) (shown in yellow), and this direction brings the boat to (a, b) = (−1, 2) as required. It takes T = b/0.6 = 3⅓ units of time. See Example 2.7.2.
F IGURE 2.6: The problem of Zermelo. We assume that the speed of the boat
with respect to the water is v = 1 and that the flow velocity is w(x 2 ) =
x 2 (1 − x 2 /b) with (a, b) = (−1, 2). The optimal trajectory x ∗1 (t ), x ∗2 (t ) as
shown here was computed by iterating over u 0 in (2.36). The optimal value
is u 0 = 2.5937, and the optimal (minimal) time is T = 2.7570. The optimal
angle u ∗ (t ) of the boat does not vary much with time. See Example 2.7.2.
The optimal control u∗ minimizes the Hamiltonian, so we need ∂H₁(x∗, p∗, u∗)/∂u to be equal to zero for all time. This gives
−p∗₁ sin(u∗) + p∗₂ cos(u∗) = 0.   (2.32)
(To avoid clutter, we drop the argument t in most of the equations.) Also, the free final time condition (2.30) has to hold, and this means that
p∗₁(cos(u∗) + w(x∗₂)) + p∗₂ sin(u∗) + 1 = 0   (2.33)
for all time. Equations (2.32) and (2.33) are two linear equations in p∗₁, p∗₂, so they are easy to solve:
(p∗₁, p∗₂) = −1/(cos(u∗)w(x∗₂) + 1) · (cos(u∗), sin(u∗)).   (2.34)
The costate equations ṗ∗ = −∂H₁/∂x read
ṗ∗₁ = 0,
ṗ∗₂ = −p∗₁ w′(x∗₂).   (2.35)
On the other hand, differentiating the expression (2.34) for p∗₁ with respect to time gives
ṗ∗₁ = sin(u∗)(u̇∗ + cos²(u∗)w′(x∗₂)) / (cos(u∗)w(x∗₂) + 1)².
(Verify this yourself.) This needs to be zero for all time, so either sin(u∗) is zero or u̇∗ = −cos²(u∗)w′(x∗₂). Likewise it can be shown that the costate equation for p∗₂ holds iff u̇∗ = −cos²(u∗)w′(x∗₂) or cos(u∗) + w(x∗₂) = 0. Since sin(u∗) and cos(u∗) + w(x∗₂) cannot be zero simultaneously (because we assumed |w(x₂)| < 1), we conclude that both costate equations hold iff
u̇∗ = −cos²(u∗)w′(x∗₂).
and its solution by construction makes the costate defined in (2.34) satisfy the
Hamiltonian equations and makes the Hamiltonian equal to zero for all time.
The game is now to determine the initial condition u 0 of the control for which
( x 1 (T ), x 2 (T )) equals (a, b) for some T > 0. Without further assumptions on the
flow velocity w(x 2 ), there does not seem to be an easy answer to this problem.
For the special case of a constant flow velocity, w(x 2 ) = w 0 , however, we see that
u ∗ (t ) is constant, and then ( x ∗1 (t ), x ∗2 (t )) is a straight line. A particular instance
is shown in Fig. 2.5. A more realistic scenario is when the flow velocity w(x 2 ) is
small near the banks. One such example is depicted in Fig. 2.6. The solution
shown in this figure was determined numerically (by iterating over u 0 ).
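The numbers reported for Fig. 2.6 can be approximated with a small shooting script. The sketch below assumes, as in Example 2.7.2, the boat dynamics ẋ₁ = cos(u) + w(x₂), ẋ₂ = sin(u), integrated together with u̇∗ = −cos²(u∗)w′(x∗₂), and bisects over the initial angle u₀ so that the trajectory ends at (a, b) = (−1, 2). The Euler step size, the time guard, and the bracket [2, 3] for u₀ are ad-hoc assumptions, not values taken from the book.

import numpy as np

a, b = -1.0, 2.0
w  = lambda x2: x2 * (1 - x2 / b)     # flow velocity of Fig. 2.6
dw = lambda x2: 1 - 2 * x2 / b        # its derivative w'(x2)

def endpoint(u0, dt=1e-3, tmax=10.0):
    """Euler-integrate until x2 reaches b; return x1 and the elapsed time."""
    x1, x2, u, t = 0.0, 0.0, u0, 0.0
    while x2 < b and t < tmax:
        dx1 = np.cos(u) + w(x2)
        dx2 = np.sin(u)
        du  = -np.cos(u) ** 2 * dw(x2)
        x1 += dt * dx1; x2 += dt * dx2; u += dt * du; t += dt
    return x1, t

# On [2, 3] the final x1 decreases as u0 increases, so bisect on the miss x1 - a.
lo, hi = 2.0, 3.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    x1T, T = endpoint(mid)
    if x1T > a:
        lo = mid
    else:
        hi = mid
print("u0 ~", 0.5 * (lo + hi), "  T ~", T)   # compare with the values in Fig. 2.6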
Example 2.7.3 (Minimal time car parking problem). This is an elegant and
classic application. We want to steer a car into a parking spot, and we want to
do it in minimal time. To keep things manageable, we assume that we can steer
the car in one dimension only (like a cart on a rail). The position of the car is
denoted as x 1 and its speed as x 2 . The acceleration u is bounded, specifically,
u (t ) ∈ [−1, 1] for all t . The equations thus are
ẋ 1 (t ) = x 2 (t ), x 1 (0) = x01 ,
ẋ 2 (t ) = u (t ), x 2 (0) = x02 , u (t ) ∈ [−1, 1].
The parking spot we take to be x 1 = 0, and the time we reach the parking spot
we denote by T , and, of course, at that moment our speed should become zero.
So the final conditions are
x 1 (T ) = 0, x 2 (T ) = 0.
We want to achieve this in minimal time, thus we take as cost J_T(u) = ∫_0^T 1 dt.
The normal Hamiltonian for this problem is
H1 (x, p, u) = p 1 x 2 + p 2 u + 1.
ṗ 1 (t ) = 0,
ṗ 2 (t ) = − p 1 (t ).
Since both components of the final state x (T ) are fixed, the final conditions on
both components of the costate are absent. Therefore, in principle, every con-
stant p 1 (t ) = −a is allowed and, consequently, every linear function
p 2 (t ) = at + b.
We cannot have a = b = 0 because that contradicts the fact that the Hamiltonian
p 1 (t ) x 2 (t ) + p 2 (t ) u (t ) + 1 is zero along optimal solutions (Theorem 2.7.1). As a
result, the second costate entry, p 2 (t ), is not the zero function. This, in turn,
implies that p 2 (t ) switches sign at most once. Why is this important? Well, the
optimal u ∗ (t ) minimizes the Hamiltonian, p 1 (t ) x 2 (t ) + p 2 (t ) u (t ) + 1, and since
u ∗ (t ) ∈ [−1, 1] this yields
u ∗ (t ) = − sgn( p 2 (t )).
This is well defined because p₂(t) is nontrivial. In fact, as p₂(t) switches sign at most once, also the optimal input u∗(t) switches sign at most once. If u(t) = +1, then all possible (x₁(t), x₂(t)) are shifted parabolas, shown here in red:
(The arrows indicate the direction of the trajectory as time increases.) Likewise,
if u (t ) = −1, then all possible ( x 1 (t ), x 2 (t )) are the shifted “reversed” parabolas,
shown here in blue:
Since on (t s , T ] the input does not change, and since we demand that x (T ) =
(0, 0), it must be that on (t s , T ] the state either follows this red or blue parabola:
After all, these two are the only two trajectories that end up at the desired final
state x (T ) = (0, 0). Before the moment of switching, the input u (t ) had the oppo-
site sign. For instance, if after the switch we have u (t ) = +1 (the red trajectory),
then before the switch we have u (t ) = −1, i.e., any of the blue parabolas. These
have to end up at the above red parabola at t = t s . Inspection shows that the
possible trajectories are any of these:
[Figure: the (x₁, x₂) phase plane showing the switching curve, with the regions where u∗(t) = −1 and where u∗(t) = +1.]
This solves the problem for every initial state ( x 1 (0), x 2 (0)). If before the switch
the trajectory follows a blue parabola, then, when it reaches the thick red
parabola, the input switches sign, and the trajectory continues along the thick
red parabola, ending up at (0, 0). Likewise, if it first follows a red parabola then,
when it reaches the thick blue parabola, the input switches sign, and the trajec-
tory continues along the thick blue parabola, ending up at (0, 0).
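The thick red and blue parabolas together form the curve x₁ = −½ x₂|x₂|, and the solution above can therefore be implemented as a feedback on this curve. The sketch below (an illustration, not from the text; the step size, tolerances, and initial state are arbitrary assumptions) simulates that feedback.

import numpy as np

def u_feedback(x1, x2):
    # s > 0: above/right of the switching curve, s < 0: below/left of it.
    s = x1 + 0.5 * x2 * abs(x2)
    if abs(s) < 1e-9:                 # on the curve: follow it to the origin
        return -np.sign(x2) if x2 != 0 else 0.0
    return -np.sign(s)                # u = -1 above the curve, +1 below

dt = 1e-4
x1, x2, t = 2.0, 1.0, 0.0             # arbitrary initial state
while (abs(x1) > 1e-3 or abs(x2) > 1e-3) and t < 20.0:
    u = u_feedback(x1, x2)
    x1 += dt * x2
    x2 += dt * u
    t += dt
print("arrived near the parking spot at t ~", round(t, 3))

Because of the time discretization the simulated input chatters along the switching curve instead of following it exactly, but the arrival time agrees with the minimal time up to the step size.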
The minimum principle assumes the existence of an optimal control, and then
derives some conditions for it: (2.14) and (2.15). These conditions are necessary
for optimality, but in general not sufficient (see Exercise 2.16). If, however, the
problem has certain convexity properties then the necessary conditions are suf-
ficient. That is what the following theorem is about. It requires some knowledge
of convex sets and functions as discussed in Appendix A.7.
• U is a convex set,
• K (x) is convex in x ∈ Rn .
Proof. In order not to digress too much, we allow ourselves here some “proofs
by picture”. Details can be found in Appendix A.7.
Convexity of the Hamiltonian in (x, u) means that
H(x, p∗(t), u) ≥ H(x∗(t), p∗(t), u∗(t)) + ∂H(x∗(t), p∗(t), u∗(t))/∂xᵀ (x − x∗(t)) + ∂H(x∗(t), p∗(t), u∗(t))/∂uᵀ (u − u∗(t))
for all x ∈ Rⁿ and u ∈ U (see Fig. 2.7, left, for the scalar analogue). Moreover, since u∗(t) minimizes the Hamiltonian over the convex set U, we have
∂H(x∗(t), p∗(t), u∗(t))/∂uᵀ (u − u∗(t)) ≥ 0   ∀u ∈ U
for almost all times. This property is illustrated in Fig. 2.7 (right). The above two inequalities, combined with the fact that ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x, show that
H(x, p∗(t), u) ≥ H(x∗(t), p∗(t), u∗(t)) − ṗ∗ᵀ(t)(x − x∗(t))
for all x ∈ Rn , u ∈ U, for almost all times. For simplicity, we now assume that the
terminal cost is absent, K = 0. (Exercise 2.19 considers nonzero K .) The final
inequality gives us that
J(x₀, u) − J(x₀, u∗) = ∫_0^T [L(x, u) − L(x∗, u∗)] dt
  = ∫_0^T [(H(x, p∗, u) − p∗ᵀẋ) − (H(x∗, p∗, u∗) − p∗ᵀẋ∗)] dt
  = ∫_0^T [H(x, p∗, u) − H(x∗, p∗, u∗) − p∗ᵀ(ẋ − ẋ∗)] dt
  ≥ ∫_0^T [−ṗ∗ᵀ(x − x∗) − p∗ᵀ(ẋ − ẋ∗)] dt
  = −[ p∗ᵀ(t)(x(t) − x∗(t)) ]_0^T = 0.
(In the last equality, we used that p ∗ (T ) = 0 and that x (0) = x ∗ (0) = x 0 .) There-
fore J (x 0 , u ) ≥ J (x 0 , u ∗ ). So u ∗ is optimal. ■
Many of the examples considered in the chapter satisfy the above convexity
properties, see Exercises 2.20 and 2.21 for illustrations.
FIGURE 2.7: Left: a C¹ function h : R → R is convex iff h(x) ≥ h(x̄) + ∂h(x̄)/∂x (x − x̄) for all x̄, x ∈ R. Right: suppose H : R² → R is C¹ and that U ⊆ R². If H(u∗) = min_{u∈U} H(u) for some u∗ ∈ U, and U is convex, then ∂H(u∗)/∂uᵀ (u − u∗) ≥ 0 for all u ∈ U. This is used in the proof of Theorem 2.8.1. Appendix A.7 has more details.
2.9 Exercises
ẋ (t ) = x (t ) u (t ), x (0) = x0 = 1,
with U = R and cost function J(u) = 2x(T) + ∫_0^T [x²(t) + u²(t)] dt.
ẋ (t ) = u (t ), x (0) = x0 ,
and cost function J(u) = ∫_0^T x²(t) dt. We want to minimize this cost over
all u : [0, T ] → [0, 1].
(a) Give the Hamiltonian and the differential equation for the costate.
(b) Argue from the Hamiltonian that u ∗ (t ) most likely assumes just one
or two values.
(c) Argue that if x 0 > 0, then x ∗ (t ) > 0 for all t ∈ [0, T ].
(d) Prove that p ∗ (t ) under the conditions stated in (c) has at most one
sign change. What does this mean for u ∗ (t )?
(e) Solve the optimization problem for x 0 > 0. Also give the solution for
p ∗ (t ).
(f ) Determine an optimal input for the case that x 0 < 0, and verify that it
satisfies the Hamiltonian equations (2.14), (2.15). [Hint: directly for-
mulate an optimal input u ∗ without using the minimum principle.]
This is the standard linear model for a point mass of mass m = 1, dis-
placement from the equilibrium x 1 , velocity x 2 , and which is subject to
an external force u and a spring force with spring constant k > 0.
We assume that | u (t )| ≤ 1 for all t .
2.6 Initial and final conditions. Let u , y : [0, T ] → R. Consider the second-order
differential equation
ÿ (t ) + y (t ) = u (t ), y (0) = y 0 , ẏ (0) = ẏ 0 ,
with cost
J(u) = ½ ∫_0^T u²(t) dt.
Determine the optimal control u ∗ that drives the system from the initial
state y (0) = y 0 , ẏ (0) = ẏ 0 to the final state y (T ) = ẏ (T ) = 0.
ẋ 1 (t ) = x 2 (t ), x 1 (0) = 0,
ẋ 2 (t ) = u (t ), x 2 (0) = 0, x 2 (T ) = 0.
2.8 Control of pendula via torques. Consider a mass m hanging from a ceiling on a thin massless rod of length ℓ, see Fig. 2.8. We can control the pendulum with a torque u exerted around the suspension point. The differential equation describing the pendulum without damping is
mℓ²φ̈(t) = −mgℓ sin(φ(t)) + u(t),
where φ is the angle with respect to the stable equilibrium state (the vertical hanging position). The objective is to minimize the cost
J(u) := mℓ²φ̇²(T) − 2mgℓ cos(φ(T)) + ∫_0^T [φ̇²(t) + u²(t)] dt.
2.9 Optimal capacitor charging. Consider the RC -circuit of Fig. 2.9. We want
to determine at any moment in time the voltage u (t ) of the voltage source
that charges the capacitor in T seconds from zero voltage, x (0) = 0,
to a certain desired voltage, x (T ) = x desired , with minimal dissipation
of energy through the resistor. The voltage v (t ) across the resistor is
given by Kirchhoff's voltage law as v(t) = u(t) − x(t). Hence the current i(t) through the resistor with resistance R equals i(t) = (1/R)(u(t) − x(t)),
and since the current through the capacitor satisfies i(t) = Cẋ(t), the capacitor voltage obeys
ẋ(t) = −(1/RC) x(t) + (1/RC) u(t).   (2.37)
Furthermore, the power dissipated in the resistor is given as v(t)i(t) = (1/R)v²(t), and, hence, the total energy loss is
J(u) = ∫_0^T (1/R)(u(t) − x(t))² dt.
For the rest of this exercise we take R = 1 (ohm), and C = 1 (farad), and
x desired = 1 (volt). We further assume that x (0) = 0, and that U = R.
2.10 Soft landing on the Moon. We consider the problem of optimal and safe
landing on the Moon. The situation is depicted in Fig. 2.10. We assume
the lunar ship only moves in the vertical direction. Its position relative to
the Moon’s surface is denoted by y (t ), and its mass is denoted by m (t ).
The ship can generate an upwards force by thrusting out gasses down-
wards (in the direction of the Moon). We assume it does so with a con-
stant velocity c, but that it can control the rate − ṁ (t ) at which it expels the
mass. This results in an upwards force of −c ṁ (t ). The gravitational pull on
the lunar ship is −g m (t ). (On the Moon the gravitational acceleration g is
1.624 m/s2 .) The altitude y (t ) of the ship satisfies the differential equation
m (t ) ÿ (t ) = −g m (t ) − c ṁ (t ).
u (t ) := − ṁ (t ).
The objective is to determine the control u and final time T > 0 that min-
imizes the total expelled mass, J ( u ), while achieving a safe, soft landing
on the Moon at final time T . The latter means that y (T ) = 0 and ẏ (T ) = 0.
With the state variables x 1 := y , x 2 := ẏ , x 3 := m we can rewrite the differen-
tial equations as
ẋ₁(t) = x₂(t),              x₁(0) = y₀,   x₁(T) = 0,
ẋ₂(t) = c u(t)/x₃(t) − g,   x₂(0) = ẏ₀,   x₂(T) = 0,
ẋ₃(t) = −u(t),              x₃(0) = m₀.
z(t) = c p₂(t)/x₃(t) − p₃(t) + 1,
and show that
ż(t) = −c p₁(T)/x₃(t).
for some t s < T . Thus it is optimal to thrust out gasses only during
the final stage of descent, and then to do so at maximal rate.
with cost
J(u) = −∫_0^1 ln(x(t)u(t)) dt.
Since x (0) > 0 we have that x (t ) ≥ 0 for all t . For a well-defined cost we
hence need u (t ) ∈ [0, ∞) but for the moment we allow any u (t ) ∈ R and
later verify that the optimal u ∗ (t ) is in fact positive for all t ∈ [0, 1].
2.12 Running cost depending on time. Consider the second-order system with
mixed initial and final conditions
ẋ 1 (t ) = u (t ), x 1 (0) = 0, x 1 (1) = 1,
ẋ 2 (t ) = 1, x 2 (0) = 0,
ẋ 1 (t ) = a u (t ) x 1 (t ),
ẋ 2 (t ) = a(1 − u (t )) x 1 (t ),
where a is a positive constant. Hence, the increase in production per unit
of time in each sector is assumed to be proportional to the investment
allocated to the sector. By definition we have
2.14 Consider the second-order system with mixed initial and final conditions
ẋ 1 (t ) = u (t ), x 1 (0) = 0, x 1 (1) = 2,
ẋ 2 (t ) = 1, x 2 (0) = 0,
and with cost
J(u) = ∫_0^1 [u²(t) + 4x₂(t)u(t)] dt.
The input u : [0, 1] → R is not restricted, i.e., u (t ) can take on any real
value.
2.19 Convexity. The proof of Theorem 2.8.1 assumes that the terminal cost is
absent, i.e., K(x) = 0. Now consider more general K(x). Assume K(x) and ∂K(x)/∂x are continuous, and that K(x) is a convex function.
(a) Adapt the proof of Theorem 2.8.1 so that it also works for nonzero
convex K (x). [Hint: have a look at Lemma A.7.1 (p. 198).]
(b) Theorem 2.8.1 considers the standard (free endpoint) optimal con-
trol problem of Theorem 2.5.1. Show that Theorem 2.8.1 remains
valid for the case that the final state is constrained as in Theo-
rem 2.6.1.
2.20 Convexity. Does Example 2.5.3 satisfy the convexity assumptions of Theo-
rem 2.8.1?
2.21 Convexity. Does Example 2.5.4 satisfy the convexity assumptions of Theo-
rem 2.8.1?
Chapter 3
Dynamic Programming
3.1 Introduction
The minimum principle was developed in the Soviet Union in the late fifties of
the previous century. At about the same time Richard Bellman in the USA devel-
oped an entirely different approach to optimal control, called dynamic pro-
gramming. In this chapter, we deal with dynamic programming. As in the previ-
ous chapter, we assume that the state satisfies a system of differential equations
ẋ(t) = f(x(t), u(t)),   x(0) = x₀,   (3.1a)
with inputs restricted to
u : [0, T] → U.   (3.1b)
As before, we associate with system (3.1a) a cost over a finite time horizon [0, T ]
of the form
J_[0,T](x₀, u) := ∫_0^T L(x(t), u(t)) dt + K(x(T)).   (3.1c)
The idea of dynamic programming is to consider the optimal cost over the remaining horizon [τ, T] for each initial time τ ∈ [0, T] and for each initial state x(τ) = x, and then to
establish a dynamic relation between the family of optimal costs (hence the
name dynamic programming). On the one hand, this complicates the problem
because many optimal control problems need to be considered. On the other
hand, if this dynamic relation can be solved, then it turns out to produce suffi-
cient conditions for optimality.
No, because if it would, then the new input ũ constructed from u ∗ over the ini-
tial interval [0, τ], and from û over the remaining [τ, T ], would improve on u ∗
over the entire interval:
J_[0,T](x₀, ũ) = ∫_0^τ L(x(t), ũ(t)) dt + J_[τ,T](x(τ), ũ)
  = ∫_0^τ L(x∗(t), u∗(t)) dt + J_[τ,T](x∗(τ), û)
  < ∫_0^τ L(x∗(t), u∗(t)) dt + J_[τ,T](x∗(τ), u∗) = J_[0,T](x₀, u∗),
The main idea of dynamic programming and the reason for its popularity is
explained best for systems that evolve over discrete time—as opposed to the
systems that evolve over continuous time, which we normally consider in this
book. Thus, for the time being, consider a discrete-time system
x(t + 1) = f(x(t), u(t)),   u(t) ∈ U,   t ∈ {0, 1, . . . , T − 1},   (3.3)
with x 0 given, and T a given positive integer. We want to find a control sequence
u ∗ = ( u ∗ (0), u ∗ (1), . . . , u ∗ (T − 1)), called optimal control (sequence), and resulting
state sequence ( x ∗ (0), x ∗ (1), . . . , x ∗ (T )), that minimizes a cost of the form
J_[0,T](x₀, u) = Σ_{t=0}^{T−1} L(x(t), u(t)) + K(x(T)).   (3.4)
Example 3.3.1 (Naive optimization). Suppose the state space X consists of the
7 integer elements
X = {0, 1, 2, . . . , 6}.
Align the states in a circle (see Fig. 3.2), and suppose that at each moment in
time, the state can either move one step counter-clockwise, or stay where it
is. Thus, at each moment in time, we have a choice of two. The input space
U hence has two elements. If we take
U = {0, 1}
then the transition from one state to the next is modeled by the discrete system
x (t + 1) = x (t ) + u (t ), u (t ) ∈ U, t ∈ {0, 1, . . . , T − 1}
(counting modulo 7, so 6 + 1 = 0). Each transition from one state x (t ) to the
next x (t + 1) is assumed to cost a certain amount L( x (t ), u (t )), and the final
state x (T ) costs an additional K ( x (T )). The total cost hence is (3.4). The naive
approach to determine the optimal control ( u (0), . . . , u (T −1)) and resulting opti-
mal state sequence ( x (1), . . . , x (T )) is to just explore them all and pick the best.
As we can move in two different ways at each moment in time, this naive approach requires exploring 2ᵀ sequences (x(1), . . . , x(T)). Since each sequence has length T, the evaluation of the cost of each sequence is (roughly) linear in T, and, therefore, the total number of operations required in this naive approach is of order
T × 2ᵀ.
It is not hard to see that for arbitrary systems (3.3) the total number of oper-
ations that the naive approach requires is of order
T × |U|ᵀ.
Thus the total number of operations is exponential in T .
In general, in dynamic programming we solve the minimization backwards
in time. This may at first sight seem to complicate the analysis, but it allows us
to exploit the principle of optimality. The following example explains it all.
Now that V (x, T ) is known, consider the optimal cost-to-go from t = T − 1
onwards. This optimal cost-to-go we denote by V (x, T − 1) and it is defined as
the minimal cost from t = T − 1 onwards if at time t = T − 1 we are at state
x (T − 1) = x. It satisfies
V(x, T − 1) = min_{u(T−1)∈{0,1}} [ L(x, u(T − 1)) + K(x(T)) ],
because L(x, u(T − 1)) is the cost of the transition if we apply input u(T − 1), and K(x(T)) is the final cost. Since x(T) = f(x, u(T − 1)), this final cost equals V(f(x, u(T − 1)), T), so we can also write
V(x, T − 1) = min_{u∈{0,1}} [ L(x, u) + V(f(x, u), T) ].
With V(x, T) already established for all x, this minimization requires |U| = |{0, 1}| = 2 inputs to explore at each state, and, hence, the total number of operations that this requires is of order |X| × |U|. The so determined V(x, T − 1),
together with V (x, T ), are shown here:
Along the way we also determined for each state x (T − 1) an optimal control
u ∗ (T − 1), indicated in the figure by the thick edges. Notice that none of the
states x (T −1) switch to x (T ) = 6. We can continue in this fashion and determine
backwards in time—for t = T −2, then t = T −3, etc., till t = 0—the optimal cost-
to-go from t onwards for any state x (t ) = x. At this stage, we exploit the principle
of optimality: since every tail of an optimal control is optimal, the optimal cost-
to-go V (x, t ), defined as the optimal cost from t onwards starting at x (t ) = x,
satisfies the equation
V(x, t) = min_{u∈{0,1}} [ L(x, u) + V(f(x, u), t + 1) ].
This equation expresses that the optimal cost from t onwards, starting at x (t ) =
x, is the cost of the transition, L(x, u), plus the optimal cost from t + 1 onwards.
Once V (x, t + 1) is known for all x, this easily gives us V (x, t ) for all x. For
T = 5, the result is the following table of optimal costs-to-go V(x, t) (columns t = 0, . . . , 5):

x = 5:  2   2   2   2   25  25
x = 4:  3   3   3   16  16  16
x = 3:  4   4   9   9   9   9
x = 2:  4   4   4   4   4   4
x = 1:  1   1   1   1   1   1
x = 0:  0   0   0   0   0   0
This solves the optimal control problem for every initial state x 0 . For some
initial states the optimal control sequence, u ∗ = ( u (0), u (1), u (2), u (3), u (4)), is
actually not unique. For instance, the control sequence shown here in red,
u ∗ = (1, 0, 1, 0, 0), is one of several optimal controls for x0 = 5. The optimal cost-
to-go V (x, t ) of course is unique.
In general, the optimal cost-to-go satisfies the recursion
V(x, t) = min_{u∈U} [ L(x, u) + V(f(x, u), t + 1) ],   V(x, T) = K(x),   (3.5)
which is solved backwards in time: starting at the final time, where V(x, T) = K(x) for all states, and then subsequently going backwards in time, t = T − 1, t = T − 2, . . . , until we reach t = 0.
In this way, the optimal control problem is split into T ordinary minimization
problems. To determine the final cost V (x, T ) = K (x) for all x ∈ X requires order
|X| operations. Then determining V (x, T − 1) for all x ∈ X requires |X| times the
number of inputs |U| to explore, etc., and so the total number of operations over
all t ∈ {0, 1, . . . , T − 1} is of order
T × |U| × |X|.
If the number of states is modest or if T is large, then this typically outperforms
the naive approach (which requires order T × |U|ᵀ operations). Equation (3.5) is
called Bellman’s equation of dynamic programming.
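Bellman's recursion is only a few lines of code. The sketch below implements it for the circle example; the stage and terminal costs L(x, u) = u and K(x) = x² are assumptions chosen to be consistent with the numbers in the figures above, not definitions taken verbatim from the text.

# Sketch of Bellman's recursion (3.5) for the circle example of § 3.3.
X = range(7)                 # states 0, ..., 6 arranged in a circle
U = (0, 1)                   # stay, or move one step counter-clockwise
T = 5

def f(x, u):                 # transition, counting modulo 7
    return (x + u) % 7

L = lambda x, u: u           # assumed stage cost: 1 per move, 0 for staying
K = lambda x: x ** 2         # assumed terminal cost

V = {(x, T): K(x) for x in X}                   # V(x, T) = K(x)
policy = {}
for t in range(T - 1, -1, -1):                  # backwards in time
    for x in X:
        costs = {u: L(x, u) + V[f(x, u), t + 1] for u in U}
        u_best = min(costs, key=costs.get)
        V[x, t] = costs[u_best]
        policy[x, t] = u_best                   # an optimal input at (x, t)

print([V[x, 0] for x in X])   # optimal cost-to-go at t = 0 for x = 0, ..., 6

Running it reproduces the left-most column of the table above, e.g. V(5, 0) = 2, and the stored policy gives an optimal input for every state and time.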
In continuous time the same basic idea survives, except for the results
regarding computational complexity. Note that, in the continuous time case, the
optimization is over a set of input functions on the time interval [0, T ], which
is an infinite-dimensional space. Furthermore, it is clear that, contrary to the
discrete-time case, we will not be able to split the problem into a series of finite-
dimensional minimization problems.
Example 3.4.2 (Integrator with linear cost). Consider once again the optimal
control problem of Example 2.5.2:
ẋ (t ) = u (t ), x (0) = x0 , (3.7)
with bounded inputs
U = [−1, 1],
and with cost
J_[0,1](x₀, u) = ∫_0^1 x(t) dt.
From the fact that ẋ (t ) = u (t ) ∈ [−1, 1], it is immediate that the optimal control
is u ∗ (t ) = −1 and, hence, x ∗ (t ) = x 0 − t . Therefore, the value function at τ = 0 is
V(x₀, 0) = J_[0,1](x₀, u∗) = ∫_0^1 (x₀ − t) dt = x₀ − 1/2.
Next, we determine the value function at the other time instances. It is easy to
see that u ∗ (t ) = −1 is optimal for J [τ,1] (x, u ) for every τ > 0 and every x (τ) = x.
Hence, in this case, x ∗ (t ) = x − (t − τ) and
V(x, τ) = ∫_τ^1 (x − (t − τ)) dt = [ xt − ½(t − τ)² ]_{t=τ}^{t=1} = x(1 − τ) − ½(1 − τ)².
As expected, the value function is zero at the final time τ = 1. It is not necessarily
monotonic in τ, see Fig. 3.3. Indeed, for x = 1/2, the value function is zero at
τ = 0 and at τ = 1, yet it is positive in between.
F IGURE 3.3: The value function V (x, τ) of the problem of Example 3.4.2 for
various x as a function of τ ∈ [0, 1].
with initial state x (τ) = x. The value function is defined as the infimum of this
cost over all inputs. Suppose the infimum is attained by some input, i.e., that the
infimum is a minimum. Taking the minimum over all u of the left- and right-hand sides of (3.8) shows that
V(x, τ) = min_{u:[τ,T]→U} [ ∫_τ^{τ+ε} L(x(t), u(t)) dt + J_[τ+ε,T](x(τ + ε), u) ].
By the principle of optimality, any optimal control over [τ, T] is optimal for J_[τ+ε,T](x(τ + ε), u) as well. The right-hand side of the above equality can thus be simplified to
V(x, τ) = min_{u:[τ,τ+ε]→U} [ ∫_τ^{τ+ε} L(x(t), u(t)) dt + V(x(τ + ε), τ + ε) ]
with initial state x(τ) = x. Notice that, in this last equation, we need only optimize over inputs defined on the time window [τ, τ + ε] because the optimization over the remaining time window [τ + ε, T] is incorporated in the value function V(x(τ + ε), τ + ε). For further analysis, it is beneficial to move the term V(x, τ) to the right-hand side and to scale the equation by ε,
0 = min_{u:[τ,τ+ε]→U} [ ∫_τ^{τ+ε} L(x(t), u(t)) dt + V(x(τ + ε), τ + ε) − V(x, τ) ] / ε.   (3.9)
In this form we can take the limit ε → 0. It is plausible that functions u : [τ, τ + ε] → U in the limit can be identified with constants u ∈ U, and that the difference between the two value functions in (3.9) converges for ε → 0 to the total derivative of V(x(τ), τ) with respect to τ. Thus,
0 = min_{u∈U} [ L(x(τ), u) + dV(x(τ), τ)/dτ ]   (3.10)
for all τ ∈ [0, T] and all x(τ) = x ∈ Rⁿ. Incidentally, this identity is reminiscent of the cost-to-go (B.14) as explained in Section B.5 of Appendix B. The total derivative of V(x(τ), τ) with respect to τ is
dV(x(τ), τ)/dτ = ∂V(x, τ)/∂xᵀ f(x, u) + ∂V(x, τ)/∂τ,
so that (3.10) becomes
0 = min_{u∈U} [ L(x, u) + ∂V(x, τ)/∂xᵀ f(x, u) + ∂V(x, τ)/∂τ ]
for all τ ∈ [0, T ] and all x ∈ Rn . The partial derivative of V (x, τ) with respect to
τ does not depend on u and so does not contribute to the minimization. This,
finally, brings us to the famous equation
∂V(x, τ)/∂τ + min_{u∈U} [ ∂V(x, τ)/∂xᵀ f(x, u) + L(x, u) ] = 0.   (3.11)
∂V(x, t)/∂t + min_{u∈U} [ ∂V(x, t)/∂xᵀ f(x, u) + L(x, u) ] = 0   (3.12a)
for all x ∈ Rⁿ and all t ∈ [0, T], and that satisfies the final time condition
V(x, T) = K(x)   for all x ∈ Rⁿ.   (3.12b)
1. V(x, τ) is a lower bound of the cost over [τ, T] starting at x(τ) = x, that is, V(x, τ) ≤ J_[τ,T](x, u) for every input u : [τ, T] → U.
Moreover, suppose that u∗(t) at every t minimizes
∂V(x(t), t)/∂xᵀ f(x(t), u) + L(x(t), u)
over all u ∈ U. Then u∗ is a solution of the optimal control problem and the optimal cost is J_[0,T](x₀, u∗) = V(x₀, 0).
Proof.
■
The reasoning in the proof of this theorem (especially Part 1) is very similar
to the one used by Caratheodory in his approach to the calculus of variations1 .
This approach was called the “royal road of the calculus of variations2 ”.
Parts 2 and 3 are technical but this is needed because the input found by
solving the minimization problem (3.12a) pointwise (for each x and each t ) does
not always give us an input u ( x (t ), t ) for which x (t ) is well defined for all t ∈
[0, T ], see Exercise 3.3(c) and Exercise 3.7(e). Such cases are ruled out in parts
2 and 3. In most applications this problem does not occur, and then the above
says that the so determined input is the optimal control and that the solution V(x, t) of the HJB equations is indeed the value function.
Theorem 3.4.3 provides a sufficient condition for optimality: if we can solve
the Hamilton-Jacobi-Bellman equations (3.12) and if the conditions of Theo-
rem 3.4.3 are satisfied, then it is guaranteed that u ∗ is an optimal control. Recall,
on the other hand, that the conditions formulated in the minimum principle
(Theorem 2.5.1) are necessary for optimality. So in a sense, dynamic program-
ming and the minimum principle complement each other.
Another difference between the two methods is that an optimal control u ∗
derived from the minimum principle is given as a function of state x ∗ and
costate p ∗ , which, after solving the Hamiltonian equations, gives us u ∗ (t ) as a
function of time, while in dynamic programming the optimal control is given
in state feedback form, u (x, t ). Applying the feedback u (x, t ) to the system gives,
what is called, the closed-loop system3
ẋ ∗ (t ) = f ( x ∗ (t ), u ( x ∗ (t ), t )), x ∗ (0) = x0 ,
and its solution (if it exists) determines x ∗ (t ) and the optimal control
u ∗ (t ) := u ( x ∗ (t ), t ). In applications, the state feedback form is often preferred,
because its implementation is way more robust. For example, if the evolution
of the state is affected by disturbances, then the optimal control as a function
of time, u ∗ (t ), derived from the undisturbed case can easily be very differ-
ent from the true optimal control, whereas the optimal control given in state
feedback form, u ( x (t ), t ), will automatically keep track of possible disturbances
in the system dynamics. Most of the following examples exhibit this feedback
property.
srechnung, Jahresbericht der Deutschen Mathematiker Vereinigung, 56 (1953), 31—58; see H.J.
Pesch, Caratheodory’s royal road of the Calculus of Variations: Missed exits to the Maximum Prin-
ciple of Optimal Control Theory, AIMS.
3 Controlling a system with an input u (optimal or not) that depends on x is known as closed-
loop control, and the resulting system is known as the closed-loop system. Controlling the system
with a given time function u (t ) is called open-loop control.
Consider again the scalar system ẋ(t) = u(t), x(0) = x₀, with cost
J_[0,T](x₀, u) = x²(T) + ∫_0^T r u²(t) dt
for some r > 0. We allow every input, that is, U = R. Then the HJB equa-
tions (3.12) become
∂V(x, t)/∂t + min_{u∈R} [ ∂V(x, t)/∂x · u + r u² ] = 0,   V(x, T) = x².   (3.15)
Since the term to be minimized is quadratic in u (and r > 0), the optimal u is where the derivative of ∂V(x, t)/∂x · u + r u² with respect to u is zero. This u depends on x and t,
u(x, t) = −(1/2r) ∂V(x, t)/∂x,   (3.16)
and thereby reduces the HJB equations (3.15) to
∂V(x, t)/∂t − (1/4r) (∂V(x, t)/∂x)² = 0,   V(x, T) = x².
Motivated by the boundary condition we try a V (x, t ) that is quadratic in x for
all time, so of the form V (x, t ) = x 2 P (t ). (Granted, this is a magic step because
at this point it is not clear that a quadratic form works.) This way the HJB equa-
tions simplify to
x²Ṗ(t) − (1/4r)(2xP(t))² = 0,   x²P(T) = x².
It has a common quadratic term x 2 . Canceling this quadratic term x 2 gives
Ṗ (t ) = P 2 (t )/r, P (T ) = 1.
This is an ordinary differential equation and its solution can be found with sep-
aration of variables. The solution is
P(t) = r/(r + T − t).
It is well defined throughout t ∈ [0, T ] and, therefore,
V(x, t) = x² · r/(r + T − t)   (3.17)
is a solution of the HJB equations (3.15). Now that V (x, t ) is known we can com-
pute the optimal input (3.16). It is expressed in feedback form, i.e., depending
on x (t ) (as well as on t ),
u∗(t) = u(x(t), t) = −(1/2r) ∂V(x(t), t)/∂x = −2x(t)P(t)/(2r) = −x(t)/(r + T − t).
The optimal state x∗ thus satisfies
ẋ∗(t) = u∗(t) = −x∗(t)/(r + T − t).
This is a linear differential equation which has a well-defined solution x ∗ (t )
for all t ∈ [0, T ] and all initial states, and, hence, also the above u ∗ (t ) is well
defined for all t ∈ [0, T ]. This, finally, allows us to conclude that (3.17) is the
value function, that the above u ∗ is the optimal input, and that the optimal cost
is J [0,T ] (x 0 , u ∗ ) = V (x 0 , 0) = x 02 /(1 + T /r ).
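A quick numerical sanity check (a sketch; r, T, x₀ and the Euler step size are arbitrary choices) simulates this feedback law and compares the achieved cost with V(x₀, 0) = x₀²/(1 + T/r).

import numpy as np

r, T, x0, n = 0.5, 1.0, 1.0, 100000
dt = T / n
x, cost = x0, 0.0
for k in range(n):
    t = k * dt
    u = -x / (r + T - t)        # feedback law u(x, t) found above
    cost += dt * r * u ** 2     # accumulate the running cost r u^2
    x += dt * u                 # Euler step of xdot = u
cost += x ** 2                  # terminal cost x^2(T)
print("simulated cost:", cost)
print("V(x0, 0)      :", x0 ** 2 / (1 + T / r))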
ẋ (t ) = u (t ), x (0) = x0
for some ρ > 0. For this problem, the HJB equations (3.12) are
∂V(x, t)/∂t + min_{u∈R} [ ∂V(x, t)/∂x · u + x² + ρ²u² ] = 0,   V(x, T) = 0.   (3.18)
The minimizing u is
u = −(1/2ρ²) ∂V(x, t)/∂x.
V (x, t ) = P (t )x 2 .
⁴ Outside the scope of this book, but still: let [x] denote the dimension of a quantity x. For example, [t] = time. From ẋ = u, it follows that [u] = [x][t]⁻¹. Also, the expression x² + ρ²u² implies that ρ²u² has the same dimension as x². Hence, [ρ] = [t], and then [V] = [J] = [x]²[t]. This suggests that V(x, t) = x²P(t). In fact, application of the Buckingham π-theorem (not part of this course) shows that V(x, t) must have the form V(x, t) = x²ρ G((t − T)/ρ) for some dimensionless function G : R → R.
Substitution of V(x, t) = P(t)x² into (3.18) shows that P must satisfy Ṗ(t) = P²(t)/ρ² − 1, P(T) = 0, whose solution is
P(t) = ρ tanh((T − t)/ρ).   (3.19)
So for this P(t), the function V(x, t) := P(t)x² solves the HJB equations (3.18). The candidate optimal control thus takes the form u(x(t), t) = −(1/2ρ²) ∂V(x(t), t)/∂x = −(1/ρ²)P(t)x(t), and the candidate optimal state satisfies the linear time-varying differential equation
ẋ∗(t) = u(x∗(t), t) = −(1/ρ²) P(t) x∗(t),   x∗(0) = x₀.
Since P (t ) is well defined and bounded, it is clear that the solution x ∗ (t ) is well
defined for all t ∈ [0, T ]. In fact the solution is
x∗(t) = e^{−(1/ρ²)∫_0^t P(τ) dτ} x₀.
Again we assume that the input is not restricted: U = R. The HJB equations
become
∂V(x, t)/∂t + min_{u∈R} [ ∂V(x, t)/∂x · u + x⁴ + u⁴ ] = 0,   V(x, T) = 0.
V(x, t) = x⁴P(t).   (3.20)
(We will soon see that this form works.) Substitution of this form in the HJB
equations yields
x⁴Ṗ(t) + min_{u∈R} [ 4x³P(t)u + x⁴ + u⁴ ] = 0,   x⁴P(T) = 0.
The minimizing u is u = −P^{1/3}(t)x, and substituting it gives
x⁴Ṗ(t) − 4x⁴P^{4/3}(t) + x⁴ + x⁴P^{4/3}(t) = 0,   x⁴P(T) = 0.
Canceling the common factor x⁴ yields
Ṗ(t) = 3P^{4/3}(t) − 1,   P(T) = 0.   (3.21)
Clearly, P (t ) is well defined and bounded on (−∞, T ]. This shows that the HJB
equations have a solution of the quartic form (3.20). As t → −∞ the solution
P (t ) converges to the equilibrium solution where 0 = 3P 4/3 − 1, i.e., P = 3−3/4 ≈
0.43869. For now the function V (x, t ) = x 4 P (t ) is just a candidate value function.
The resulting candidate optimal control
u∗(t) = −P^{1/3}(t) x(t)   (3.22)
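The behaviour of P(t) is easy to confirm numerically by integrating (3.21) backwards from P(T) = 0; in the sketch below the final time is taken as T = 0 and the step size is an arbitrary choice.

# Sketch: integrate Pdot = 3 P^{4/3} - 1 backwards from P(T) = 0 and check
# convergence to the equilibrium 3^{-3/4} as t decreases.
T, dt = 0.0, 1e-4
P, t = 0.0, T
while t > -10.0:
    P -= dt * (3 * P ** (4 / 3) - 1)   # explicit Euler step in reversed time
    t -= dt
print("P(t) far below T      :", P)
print("equilibrium 3**(-3/4) :", 3 ** (-3 / 4))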
In the above examples, the functions V constructed from the HJB equations all turned out to be true value functions. We need to stress that examples exist where this is not the case, see Exercise 3.3(c). The next example is one where U is bounded (and where, again, the constructed V equals the value function).
∂V(x, t)/∂t + min_{u∈[−1,1]} [ ∂V(x, t)/∂x · u + x ] = 0,   V(x, T) = −αx.
V (x, t ) = xP (t ) + Q(t )
for certain functions P(t), Q(t). We will demonstrate that this form also works for the present problem. The HJB equations for this form simplify to
x(Ṗ(t) + 1) + Q̇(t) − |P(t)| = 0,   xP(T) + Q(T) = −αx.
This has to hold for all x and all t, so the HJB equations hold iff
Ṗ(t) = −1,  P(T) = −α,   and   Q̇(t) = |P(t)|,  Q(T) = 0.   (3.23)
This settles P(t):
P(t) = T − α − t.
Thus, P (t ) is positive for t < T −α and negative for t > T −α. The u ∈ [−1, 1] that
minimizes P (t )u + x hence is
u∗(t) = { −1 if t < T − α,  +1 if t > T − α. }   (3.24)
This, in turn, specializes the differential equation for Q(t) as given in (3.23) to
Q̇(t) = |T − α − t|,   Q(T) = 0,
with solution Q(t) = ∫_t^T |T − α − s| ds. This function is continuously differentiable. Now all conditions of (3.23) are met
and, therefore, V (x, t ) := xP (t ) +Q(t ) satisfies the HJB equations. Along the way,
we also determined the candidate optimal input: (3.24). Clearly, for this input,
the solution x ∗ (t ) of the closed loop ẋ ∗ (t ) = u ∗ (t ), x ∗ (0) = x 0 is well defined for
all t ∈ [0, T ]. Hence, (3.24) is the optimal input, the above V (x, t ) is the value
function, and V (x 0 , 0) = x 0 P (0) + Q(0) is the optimal cost.
Does it agree with the minimum principle? The Hamiltonian is H (x, p, u) =
pu + x, and so the Hamiltonian equation for the costate is ṗ ∗ (t ) = −1, p ∗ (T ) =
−α. Clearly, this means that p ∗ (t ) = T −α−t . Now the input u ∗ (t ) that minimizes
the Hamiltonian
H ( x ∗ (t ), p ∗ (t ), u) = (T − α − t )u + x ∗ (t )
agrees with what we found earlier: (3.24). But, of course, the fundamental differ-
ence is that the minimum principle assumes the existence of an optimal control,
whereas satisfaction of the HJB equations proves that the control is optimal.
In Chapter 2, we claimed that the initial costate p ∗ (0) measures the sensitivity of
the optimal cost with respect to changes in the initial state (end of § 2.3, see also
Example 2.6.3). This connection can now be proved. In fact, it is a by-product of
a more general connection between the solution of the HJB equations (assum-
ing it equals the value function) and the costate of the minimum principle. First
of all, we note that the HJB equation (3.12a) can be expressed in terms of the
Hamiltonian H (x, p, u) := p T f (x, u) + L(x, u) as
∂V(x, t)/∂t + min_{u∈U} H(x, ∂V(x, t)/∂x, u) = 0,
(whence the name Hamilton-Jacobi). This suggests that the costate is closely related to ∂V(x, t)/∂x. In fact, since we know that p∗(T) = ∂K(x∗(T))/∂x = ∂V(x∗(T), T)/∂x, we conjecture that
p∗(t) = ∂V(x∗(t), t)/∂x   for all t ∈ [0, T].
Under mild assumptions that is indeed the case. To avoid technicalities, we
derive this connection only for U = Rm and value functions that are C 2 .
Proof. Let H (x, p, u) = p T f (x, u) + L(x, u). By definition, the minimizing u ∗ (x, t )
satisfies the HJB equation
∂V(x, t)/∂t + H(x, ∂V(x, t)/∂x, u∗(x, t)) = 0.
In the rest of this proof, we drop all function arguments. The partial derivative of the previous expression with respect to (row vector) xᵀ yields
∂²V/∂x∂t + ∂H/∂xᵀ + ∂H/∂pᵀ · ∂²V/∂x∂xᵀ + ∂H/∂uᵀ · ∂u∗/∂xᵀ = 0,
where the last term vanishes because ∂H/∂u = 0 at the minimizing u∗ (here we use that U = Rᵐ).
V(x, t) = x²P(t)   where   P(t) = (e^{1−t} − e^{t−1})/(e^{1−t} + e^{t−1}).
Using this and the formula for x ∗ (t ) (determined in Example 2.5.4), we find that
p∗(t) = ∂V(x∗(t), t)/∂x = 2x∗(t)P(t)
  = 2 x₀/(e + e⁻¹) (e^{1−t} + e^{t−1}) · (e^{1−t} − e^{t−1})/(e^{1−t} + e^{t−1})
  = 2 x₀/(e + e⁻¹) (e^{1−t} − e^{t−1}).
This equals the p ∗ (t ) as determined in Example 2.5.4.
the infinite horizon optimal control problem is to minimize over all inputs the
infinite horizon cost
J_[0,∞)(x₀, u) := ∫_0^∞ L(x(t), u(t)) dt.   (3.27b)
The only difference with the previous formulation is the cost function. The inte-
gral that defines the cost is now over all t > 0, and the “final” cost K ( x (∞)) has
been dropped because in applications we normally send the state to a unique
equilibrium x (∞) := limt →∞ x (t ), and thus all such controls achieve the same
final cost (i.e., the final cost would not affect the optimal control).
As before we define the value function as
The derivative of V (x) with respect to time vanishes, and thus the HJB equa-
tion (3.12a) simplifies to
min_{u∈U} [ (∂V(x)/∂x^T) f(x,u) + L(x,u) ] = 0.   (3.30)
As we will soon see, this equation typically has more than one solution V , and,
clearly, at most one of them will be the value function V . The next example sug-
gests that the “right” solution gives us a stabilizing input 5 , and a Lyapunov func-
tion for that equilibrium.
V(x) = 3^{−3/4} x⁴.
5 A stabilizing input is an input that steers the state to a given equilibrium. Better would have
been to call it “asymptotically stabilizing” input or “attracting” input, but “stabilizing” is the stan-
dard in the literature.
Proposition 3.6.3 (Optimal control with stability & Lyapunov functions). Con-
sider the optimal control problem (3.27), and assume that f (x, u) and L(x, u) are
C 1 and that f (0, 0) = 0, L(0, 0) = 0, and L(x, u) ≥ 0 for all x ∈ Rn , u ∈ U. Then
3. Suppose, in addition, that L(x, û(x)) > 0 for all x ≠ 0. Then the closed-loop
system is asymptotically stable, and for all x 0 sufficiently close to x̄ = 0 the
input u ∗ (t ) := û ( x (t )) solves the infinite horizon optimal control problem
with stability. Moreover the optimal cost then equals V (x 0 ).
Proof. This proof refers to several definitions and results from Appendix B.
1. Since L(x, u) ≥ 0 it is immediate that V (x) ≥ 0. Also V (0) = 0 because for
x 0 = 0 the control u (t ) = 0 achieves x (t ) = 0 for all time, and, hence,
L( x (t ), u (t )) = 0 for all time.
3.7 Exercises
ẋ (t ) = f ( x (t ), u (t )), x (0) = x0 .
We want to maximize the cost
∫₀^T L₀(x(t), u(t)) dt + K₀(x(T)).
Find a new cost such that the maximization problem becomes a mini-
mization problem, in the sense that an input u solves the minimization
problem iff it solves the maximization problem. Also comment on how the two associated costs are related. (Note: this exercise is trivial.)
3.2 An optimal control problem that has no solution. Not every optimal con-
trol problem is solvable. Consider the system ẋ (t ) = u (t ), x 0 = 1 with cost
J_{[0,T]}(x₀, u) = ∫₀^T x²(t) dt,
and U = R.
(a) Determine the value function (from the definition, not from the HJB
equations).
(b) Show that the value function does not satisfy the HJB equations.
3.3 The solution V need not equal the value function V . Consider again the
optimal control problem of Exercise 2.1:
ẋ (t ) = x (t ) u (t ), x (0) = x0
with cost function
J_{[0,T]}(x₀, u) = ∫₀^T ( x²(t) + u²(t) ) dt + 2x(T),
and with the input free to choose: U = R.
3.4 Direct solution. Even though dynamic programming and the HJB
equations are powerful concepts, we should always aim for simpler
approaches. Consider the system
ẋ (t ) = u (t ), x (0) = x0
with cost function J_{[0,T]}(x₀, u) = ∫₀^T x²(t) dt. The problem is to minimize
this with bounded inputs
0 ≤ u (t ) ≤ 1.
u (t ) ∈ [0, 1].
ẋ (t ) = u (t ) x (t ), x (0) > 0.
(a) Let V be a function of the form V (x, t ) = Q(t )x, and with it determine
the HJB equations.
(b) Express the candidate optimal u ∗ (t ) as a function of Q(t ) [Hint: x (t )
is always positive].
(c) Determine Q(t ) for all t ∈ [0, 3].
(d) Determine the optimal u ∗ (t ) explicitly as a function of time, and
argue that this is the true optimal control (so not just the candidate
optimal control).
(e) What is the maximal satisfaction Jˆ[0,3] (x 0 , u ∗ )?
ẋ (t ) = x (t ) + u (t ), x (0) = x0 , U=R
with cost
T
J [0,T ] (x 0 , u ) = 12 x 2 (T ) + − x 2 (t ) − x (t ) u (t ) dt . (3.34)
0
(a) Solve the HJB equations. [Hint: try the special form V (x, t ) = Q(x).]
(b) Determine an optimal input u (t ).
and
u (x, t ) = −x − x 2 ,
and
u(x, t) = 1 if t is a rational number, and u(x, t) = 0 if t is an irrational number.
Why are these three choices problematic? (For the second input
Example B.1.5 may be useful.)
3.7 Value function. Let T > 0 and consider the system with bounded input
is an optimal control for J [τ,T ] ( x (τ), u ) for every τ ∈ [0, T ] and x (τ).
(b) Use the above optimal input to determine the value function V (x, t ).
(Use the definition of value function, do not use the HJB equations.)
(c) Verify that this V (x, t ) satisfies the HJB equations (3.12).
3.8 Quartic control. Consider the problem of Example 3.4.6 on quartic con-
trol. Argue that
(∂V(x∗(t), t)/∂x) u∗(t) + x∗⁴(t) + u∗⁴(t)

equals x∗⁴(T) for all t ∈ [0, T].
ẋ (t ) = x (t ) u (t ), x (0) = x0 , U = R,
with a quadratic cost
J_{[0,T]}(x₀, u) = ∫₀^T ( x²(t) + ρ²u²(t) ) dt
(a) Show that the HJB equations (3.12) for this form reduce to an ordi-
nary differential equation with “initial” condition,
−G′(z) + 1 − ¼( G(z) + zG′(z) )² = 0,   G(0) = 0.
In what follows we assume that the solution V (x, t ) of the HJB equa-
tions (3.12) exists, and that it is the value function, and that the optimal
x ∗ , u ∗ are sufficiently smooth.
FIGURE 3.4: Graphs of G(z) and G(z) + zG′(z) of the solution of the differential equation −G′(z) + 1 − ¼( G(z) + zG′(z) )² = 0, G(0) = 0. See Exercise 3.9.
(a) Determine the HJB equations (3.12) for this calculus of variations
problem formulated as an optimal control problem.
(b) Show that
∂V(x∗(t), t)/∂x + ∂F(x∗(t), ẋ∗(t))/∂ẋ = 0.
ẋ (t ) = − x (t ) + u (t ), x (0) = 0. (3.36)
As cost we take
J_{[0,T]}(0, u) = (x(T) − 1)²/β + ∫₀^T ( u(t) − x(t) )² dt.   (3.37)
Here, as before, the term ∫₀^T ( u(t) − x(t) )² dt is the energy loss, but now we also added a final cost, (x(T) − 1)²/β. This final cost depends on a positive
tuning parameter β that we can choose. We do not insist on having x (T ) =
1, but the final cost puts a penalty on the deviation of the final voltage
x (T ) from 1 (our desired voltage). For instance, if β ≈ 0 then we expect
x (T ) ≈ 1.
(a) Assume the solution of the HJB equations (3.12) is of the form
V (x, t ) = (x − 1)2 P (t ).
P(t) = 1/(β + T − t),

and that

x∗(t) = t/(β + T),   u∗(t) = (t + 1)/(β + T),   and   J_{[0,T]}(0, u∗) = 1/(β + T),

respectively, are the optimal state, optimal control, and optimal cost.
(c) Determine limβ↓0 x ∗ (T ), and explain on the basis of the cost (3.37)
why this makes sense.
where φ is the angle between pendulum and the vertical hanging position,
u is the applied torque, m is the mass of the pendulum, ℓ is the length of the pendulum, and g is the gravitational acceleration.
The objective is to determine a torque u that stabilizes the pendulum to
the vertical hanging equilibrium φ = 2kπ, φ̇ = 0. This, by definition, means
that u is such that
3.13 Connection between value function and costate. Consider Example 3.4.8.
(a) Argue that the value function V (x, τ) defined in (3.28) does not
depend on τ.
(b) Suppose V (x) is a continuously differentiable function that solves
the HJB equation (3.30). Show that for every input for which
V ( x (∞)) = 0 we have that
J [0,∞) (x 0 , u ) ≥ V (x 0 ).
(d) Continue with the system and cost of (c). Find the input u ∗ : [0, ∞) →
R that minimizes J [0,∞) (x 0 , u ) over all inputs that steer the state to
zero (i.e., such that limt →∞ x (t ) = 0).
3.15 Infinite horizon optimal control. Determine the input u : [0, ∞) → R that stabilizes the system ẋ(t) = u(t), x(0) = x₀ (meaning lim_{t→∞} x(t) = 0) and that minimizes ∫₀^∞ ( x⁴(t) + u²(t) ) dt over all inputs that stabilize the system.
3.16 Infinite horizon optimal control. Consider the nonlinear system with infi-
nite horizon quadratic cost
ẋ(t) = x(t)u(t),   x(0) = x₀,   U = R,   J_{[0,∞)}(x₀, u) = ∫₀^∞ ( x²(t) + ρ²u²(t) ) dt.
(a) Determine all nonnegative functions V that satisfy (3.30) and such
that V (0) = 0.
(b) Determine the input u : [0, ∞) → R that stabilizes the system (meaning lim_{t→∞} x(t) = 0) and that minimizes ∫₀^∞ ( x²(t) + ρ²u²(t) ) dt over all inputs that stabilize the system.
(c) Express the optimal cost V (x 0 ) in terms of x 0 , ρ, and show that this
equals limT →∞ V (x 0 , 0) where V (x 0 , 0) is the value function of the
finite horizon case as given in Exercise 3.9(c) and Fig. 3.4.
3.19 Free final time. Consider the standard optimal control problem (3.1), but
now we optimize over all u : [0, T ] → U as well as all final times T ≥ 0. The
definition of the value function changes accordingly:
(a) Show that V (x, τ) does not depend on τ. Hence the value function is
of the form V (x).
(b) Assume that the value function V (x) is well defined and that it is C 1 .
Let u ∗ (t ) be an optimal control for a given x 0 , and let x ∗ (t ) be the
resulting optimal state, and assume that L( x ∗ (t ), u ∗ (t )) is continuous
in t . Show that
dV(x∗(t))/dt = (∂V(x∗(t))/∂x^T) f(x∗(t), u∗(t)) = −L(x∗(t), u∗(t)).
(c) Let V and u ∗ and x ∗ be as in part (b) of this exercise. Show that
H ( x ∗ (t ), p ∗ (t ), u ∗ (t )) = 0 ∀t ∈ [0, T ],
for p∗(t) := ∂V(x∗(t))/∂x. Which two theorems from Chapter 2 come to mind?
Chapter 4
Linear Quadratic Control
Optimal control theory took shape in the late 1950s, among others stimulated
by the space programs in the Soviet Union and the USA. At about the same time
there was a clear paradigm shift in control theory. Till about 1960 the trans-
fer function was the de-facto standard representation of linear time-invariant
dynamical systems with inputs and outputs. This changed in the sixties when
Kalman (and others) advocated the use of state space representations. These
developments are not unrelated since optimal control theory is based on the
representation of dynamical systems by systems of differential equations, i.e.,
state space models. Furthermore, the introduction of the Kalman-Bucy filter in
the early 1960s, again based on state representations and replacing the Wiener
filter, contributed to the rise of state space descriptions.
Kalman also introduced and solved the optimal control problem for lin-
ear systems with quadratic costs. This has become a standard tool in the con-
trol of linear systems, and it also paved the way for the highly influential state
space H∞ control theory as it emerged in the eighties and nineties (see Chap-
ter 5). In this chapter we study optimal control problems for linear systems with
quadratic costs, close to Kalman’s original problem. Specifically, we consider the
minimization of quadratic costs of the form
J_{[0,T]}(x₀, u) := ∫₀^T ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt + x^T(T)Sx(T)   (4.1)
over all inputs u : [0, T ] → Rm and states x : [0, T ] → Rn that are governed by a
linear system with given initial state,
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 . (4.2)
This problem is known as the finite horizon linear quadratic optimal control
problem, or LQ problem for short. Later in this chapter we also consider the
The Hamiltonian (2.12) for system (4.2) and cost (4.1) is H (x, p, u) = p T (Ax +
Bu) + x T Qx + u T Ru. The computations to come clean up considerably if we
express the costate as 2p, thus our new costate p is half of the standard costate.
This way the Hamiltonian becomes
Working out the Hamiltonian equations (2.14) for the state x and the halved
costate p , we obtain
ẋ(t) = Ax(t) + Bu(t),   x(0) = x₀,   (4.6)
2ṗ(t) = −A^T 2p(t) − 2Qx(t),   2p(T) = 2Sx(T).
∂H(x, 2p, u)/∂u = B^T 2p + 2Ru,
and so the input that minimizes the Hamiltonian is
u ∗ (t ) := −R −1 B T p (t ). (4.7)
Hence if we can additionally compute p (t ) then this settles the optimal control.
Substitution of (4.7) into the Hamiltonian equations (4.6) yields the system of
coupled differential equations
[ẋ∗(t); ṗ∗(t)] = [A, −BR^{-1}B^T; −Q, −A^T] [x∗(t); p∗(t)],   x∗(0) = x₀,   p∗(T) = Sx∗(T).   (4.8)
The (2n) × (2n) matrix here is called a Hamiltonian matrix and we denote it by
H,
H := [A, −BR^{-1}B^T; −Q, −A^T].   (4.9)
Remark 4.2.1. Another way to obtain (4.8), instead of halving the costate as
done above, is to halve the cost criterion (4.1). Clearly, this does not change the
optimal control u ∗ , it only scales the cost with a factor 1/2. This approach leads
to the Hamiltonian p T (Ax +Bu)+ 12 x T Qx + 12 u T Ru which is half of the expression
H (x, 2p, u) as considered above, and it also leads to (4.8).
Lemma 4.2.2 (Optimal cost). For every solution x ∗ , p ∗ of (4.8), the cost (4.1) for
input u ∗ := −R −1 B T p ∗ equals
J [0,T ] (x 0 , u ∗ ) = p ∗T (0)x 0 .
Proof. Consider first the identity (and here we momentarily skip time argu-
ments)
d/dt( p∗^T x∗ ) = p∗^T ẋ∗ + ṗ∗^T x∗ = p∗^T(Ax∗ − BR^{-1}B^T p∗) + (−Qx∗ − A^T p∗)^T x∗
 = −p∗^T BR^{-1}B^T p∗ − x∗^T Qx∗
 = −( u∗^T Ru∗ + x∗^T Qx∗ ).
This matrix is simple enough to allow for an explicit solution of its matrix expo-
nential,
e^{Ht} = (1/2) [ e^t + e^{−t}, −e^t + e^{−t}; −e^t + e^{−t}, e^t + e^{−t} ].
for an, as yet unknown, initial costate p ∗ (0). This p ∗ (0) must be chosen such
that x ∗ (T ), p ∗ (T ) match the final condition p ∗ (T ) = S x ∗ (T ) = 0 x ∗ (T ) = 0. It is
not hard to see that this requires
p∗(0) = ((e^T − e^{−T})/(e^T + e^{−T})) x₀.
This then fully determines the state and costate for all t ∈ [0, T ] as
[x∗(t); p∗(t)] = (1/2) [ e^t + e^{−t}, −e^t + e^{−t}; −e^t + e^{−t}, e^t + e^{−t} ] [ 1; (e^T − e^{−T})/(e^T + e^{−T}) ] x₀.
The initial costate p ∗ (0) is linear in x 0 , and therefore the entire state and costate
( x ∗ , p ∗ ) is linear in x 0 . Hence the optimal cost is quadratic in x 0 ,
J_{[0,T]}(x₀, u∗) = p∗(0)x₀ = ((e^T − e^{−T})/(e^T + e^{−T})) x₀².
Furthermore, the optimal input is linear in the costate,
u ∗ (t ) = −R −1 B T p ∗ (t ) = − p ∗ (t ),
and since the costate is linear in x 0 , the optimal input is also linear in x 0 .
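These closed-form expressions are easy to verify numerically. The following is a minimal Python sketch, assuming the horizon T = 1 and initial state x₀ = 1 (arbitrary choices): it builds the Hamiltonian matrix H, computes e^{Ht}, determines p∗(0) from the final condition p∗(T) = 0, and evaluates the optimal cost p∗(0)x₀.

import numpy as np
from scipy.linalg import expm

# Assumed data for illustration: horizon T = 1, initial state x0 = 1.
T, x0 = 1.0, 1.0

# Hamiltonian matrix for xdot = u with cost int_0^T x^2 + u^2 dt (A=0, B=1, Q=1, R=1, S=0).
H = np.array([[0.0, -1.0],
              [-1.0, 0.0]])

EHT = expm(H * T)
print(EHT)                       # equals 0.5*[[e^T+e^-T, -e^T+e^-T], [-e^T+e^-T, e^T+e^-T]]

# Final condition p*(T) = S x*(T) = 0 fixes the initial costate p*(0).
p0 = (np.exp(T) - np.exp(-T)) / (np.exp(T) + np.exp(-T)) * x0
xT, pT = EHT @ np.array([x0, p0])
print(pT)                        # approximately 0, as required

# Optimal cost p*(0) x0 = tanh(T) x0^2.
print(p0 * x0, np.tanh(T) * x0**2)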
0 = Sx∗(T) − p∗(T)
  = [S  −I] [x∗(T); p∗(T)]
  = [S  −I] [Σ11(T), Σ12(T); Σ21(T), Σ22(T)] [x₀; p∗(0)]
  = ( SΣ11(T) − Σ21(T) ) x₀ + ( SΣ12(T) − Σ22(T) ) p∗(0).   (4.11)
This final condition has a unique solution p ∗ (0) iff SΣ12 (T )−Σ22 (T ) is invertible,
and then
p ∗ (0) = M x0
where M is defined as
M = −( SΣ12(T) − Σ22(T) )^{-1} ( SΣ11(T) − Σ21(T) ).   (4.12)
Hence the question is: does the inverse of SΣ12 (T ) − Σ22 (T ) always exist? The
answer is yes:
Proof. First take x 0 = 0, and realize that (4.8) has at least the trivial solution
x ∗ = p ∗ = 0. Lemma 4.2.2 showed that for every possible solution x ∗ , p ∗ we have
x∗^T(T)Sx∗(T) + ∫₀^T ( x∗^T(t)Qx∗(t) + u∗^T(t)Ru∗(t) ) dt = p∗^T(0)x₀,
0
and here that is zero because we took x 0 = 0. Since all terms on the left-hand
side of the above equation are nonnegative, it must be that all these terms are
zero. In particular u ∗T (t )R u ∗ (t ) = 0. Now R > 0, so necessarily u ∗ (t ) = 0. This, in
turn, implies that ẋ ∗ (t ) = A x ∗ (t ) + B u ∗ (t ) = A x ∗ (t ). Given that x ∗ (0) = x 0 = 0 we
get x ∗ (t ) = 0 for all time and, as a result, ṗ ∗ (t ) = −Q x ∗ (t ) − A T p ∗ (t ) = −A T p ∗ (t )
and p ∗ (T ) = S x ∗ (T ) = 0. This shows that p ∗ (t ) is zero for all time as well.
Conclusion: for x 0 = 0 the solution ( x ∗ , p ∗ ) of (4.8) exists and is unique. This
implies that SΣ12 (T ) − Σ22 (T ) is nonsingular for otherwise there would have
existed multiple p ∗ (0) that satisfy the boundary condition (4.11). Invertibility
of SΣ12 (T ) − Σ22 (T ) in turn shows that the final condition (4.11) has a unique
solution, p ∗ (0) = M x 0 , for every x 0 . ■
Example 4.2.5 (Integrator, see also Example 3.4.4). Consider again the scalar
integrator system ẋ (t ) = u (t ), x (0) = x 0 and take as cost
J_{[0,T]}(x₀, u) = x²(T) + ∫₀^T Ru²(t) dt   (4.14)
0 = Sx∗(T) − p∗(T)
  = [S  −1] e^{HT} [x₀; p∗(0)]
  = [1  −1] [1, −T/R; 0, 1] [x₀; p∗(0)]
  = x₀ − (T/R + 1) p∗(0).
Special about this example is that the costate is constant, p ∗ (t ) = p ∗ (0). The
optimal control is therefore constant as well,
u∗(t) = −(1/R) p∗(t) = −p∗(0)/R = −x₀/(T + R).
For R ≫ T the optimal control u∗(t) is small, which is to be expected because for large R the input is penalized strongly in the cost (4.14). If R ≈ 0 then control
is cheap. In this case the control is not necessarily large, u ∗ (t ) ≈ −x 0 /T , but
large enough to steer the final state x ∗ (T ) to something close to zero, x ∗ (T ) =
x 0 (1 − T /(R + T )) = x 0 R/(T + R) ≈ 0.
p ∗1 (t ) = x ∗1 (3). (4.16)
ṗ ∗2 (t ) = − p ∗1 (t ) = − x ∗1 (3), p ∗2 (3) = 0,
so that
p ∗2 (t ) = (3 − t ) x ∗1 (3). (4.17)
ẋ ∗2 (t ) = − p ∗2 (t ) = (t − 3) x ∗1 (3), x ∗2 (0) = 0.
x∗2(t) = (½t² − 3t) x∗1(3).   (4.18)

ẋ∗1(t) = x∗2(t) = (½t² − 3t) x∗1(3),   x∗1(0) = 1.
Its solution is
x∗1(t) = ( (1/6)t³ − (3/2)t² ) x∗1(3) + 1.   (4.19)
i.e.,

x∗1(3) = 1/10.   (4.20)
Now we have solved the differential equation (4.15), and the solution is given
by (4.16)–(4.19), with x ∗1 (3) equal to 1/10, see (4.20). Hence, the optimal con-
trol (4.7) is
1
u ∗ (t ) = −R −1 B T p ∗ (t ) = −B T p ∗ (t ) = − 0 1 p ∗ (t ) = − p ∗2 (t ) = (t − 3),
10
and the optimal cost is
[p∗1(0); p∗2(0)]^T [x∗1(0); x∗2(0)] = [1/10; 3/10]^T [1; 0] = 1/10.
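The numbers in this example are easy to check by simulation. The sketch below is an illustration only; the problem data ẋ₁ = x₂, ẋ₂ = u, x(0) = (1, 0), with cost x₁²(3) + ∫₀³ u²(t) dt, is inferred from the costate equations above and is not stated verbatim here. Under that assumption, applying u∗(t) = (t − 3)/10 reproduces x∗1(3) = 1/10 and the optimal cost 1/10.

import numpy as np

# Assumed problem data (inferred from the costate equations above):
# x1' = x2, x2' = u, x(0) = (1, 0), cost = x1(3)^2 + int_0^3 u(t)^2 dt.
T, N = 3.0, 30000
t = np.linspace(0.0, T, N + 1)
dt = T / N
u = (t - 3.0) / 10.0                                     # candidate optimal control

# Integrate the double integrator with the trapezoidal rule.
x2 = np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) / 2.0) * dt))
x1 = 1.0 + np.concatenate(([0.0], np.cumsum((x2[1:] + x2[:-1]) / 2.0) * dt))

cost = x1[-1]**2 + np.sum((u[1:]**2 + u[:-1]**2) / 2.0) * dt
print(x1[-1], cost)                                      # both approximately 1/10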
The LQ problem can be solved with the aid of dynamic programming as well.
The equation that has to be solved in dynamic programming is a partial differ-
ential equation—the Hamilton-Jacobi-Bellman (HJB) equation—and that is, in
general, not an easy task. For LQ it can be done, however.
So consider again a linear system
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0
V (x, t ) = x T P (t )x
The minimization over u can, like in the previous section, be solved by setting
the gradient of 2x T P (t )(Ax + Bu) + x T Qx + u T Ru with respect to u equal to zero.
This gives for each x and each t as input
u = −R −1 B T P (t )x
Here we used the fact that x T P (t )Ax is the same as x T A T P (t )x. All terms in (4.21)
have a factor x T (on the left) and a factor x (on the right), and the matrix inside
the brackets is symmetric. Hence (4.21) holds for all x ∈ Rn iff the equation in
which x T and x are removed holds. Thus
Ṗ (t ) = −P (t )A − A T P (t ) + P (t )B R −1 B T P (t ) − Q, P (T ) = S . (4.22)
u ∗ (t ) = −R −1 B T P (t ) x (t )
makes the closed-loop system satisfy
ẋ (t ) = A x (t ) + B u ∗ (t ) = (A − B R −1 B T P (t )) x (t ), x (0) = x0 .
u ∗ (t ) := −R −1 B T P (t ) x (t ) (4.23)
J [0,T ] (x 0 , u ∗ ) = x 0T P (0)x 0 ,
d/dt( x^T P x ) = ẋ^T Px + x^T Ṗx + x^T Pẋ
 = (Ax + Bu)^T Px + x^T( −PA − A^T P + PBB^T P − Q )x + x^T P(Ax + Bu)
 = u^T B^T Px + x^T PBu + x^T PBB^T Px − x^T Qx
 = ( u + B^T Px )^T( u + B^T Px ) − x^T Qx − u^T u.
(Verify the final identity yourself.) From this it follows that the cost can also be
expressed as (where, again, we omit the time arguments),
J_{[0,T]}(x₀, u) = x^T(T)Sx(T) + ∫₀^T ( x^T Qx + u^T u ) dt
 = x^T(T)Sx(T) + ∫₀^T ( −d/dt(x^T Px) + (u + B^T Px)^T(u + B^T Px) ) dt
 = x^T(T)Sx(T) − [ x^T(t)P(t)x(t) ]₀^T + ∫₀^T (u + B^T Px)^T(u + B^T Px) dt
 = x₀^T P(0)x₀ + ∫₀^T (u + B^T Px)^T(u + B^T Px) dt.
Clearly, the final integral is nonnegative for every u , and it is minimal if we take
u (t ) = −B T P (t ) x (t ), and thus the optimal cost is x0T P (0)x0 . For R = I the proof is
similar, see Exercise 4.10. ■
Notice that Proposition 4.3.1 assumes symmetry of S and Q, but not that
they are positive semi-definite. Also notice that the optimal control (4.23) is of
a special form: first we have to determine the solution P (t ) of the RDE, but this
can be done irrespective of x 0 . Once P (t ) is determined the optimal control can
be implemented as a linear time-varying state feedback (4.23). Thus the “gain”
matrix F (t ) := R −1 B T P (t ) in the optimal feedback can be computed “off-line”,
i.e., based on the knowledge of the system matrices A, B and the cost criterion
matrices Q, R, S only.
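Because P(t) does not depend on x₀, the gain can indeed be precomputed. As an illustration (a minimal sketch; the second-order system and weights below are assumptions made only for this example), the RDE (4.22) can be integrated backward in time with standard software, after which the time-varying gain F(t) = R^{-1}B^T P(t) is available for feedback.

import numpy as np
from scipy.integrate import solve_ivp

# Assumed example data (illustration only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); S = np.zeros((2, 2))
T = 3.0

def rde(t, p_flat):
    # Riccati differential equation (4.22): Pdot = -PA - A'P + P B R^-1 B' P - Q.
    P = p_flat.reshape(2, 2)
    Pdot = -P @ A - A.T @ P + P @ B @ np.linalg.solve(R, B.T) @ P - Q
    return Pdot.ravel()

# Integrate backward from t = T (where P(T) = S) to t = 0.
sol = solve_ivp(rde, [T, 0.0], S.ravel(), dense_output=True, rtol=1e-8, atol=1e-10)

def gain(t):
    P = sol.sol(t).reshape(2, 2)
    return np.linalg.solve(R, B.T @ P)    # F(t) = R^-1 B' P(t)

print(gain(0.0))                          # feedback at t = 0: u*(0) = -F(0) x(0)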
Example 4.3.2 (Example 4.2.5 continued). Consider again the integrator system
ẋ (t ) = u (t ), x (0) = x0 of Example 4.2.5 with
J_{[0,T]}(x₀, u) = x²(T) + ∫₀^T Ru²(t) dt
for some R > 0. Here S = 1 and Q = 0, and thus the RDE (4.22) becomes
Ṗ (t ) = P 2 (t )/R, P (T ) = 1.
Its solution is P(t) = R/(R + T − t), so the optimal cost is

J_{[0,T]}(x₀, u∗) = P(0)x₀² = x₀²/(1 + T/R),
and the optimal input is
u∗(t) = −R^{-1}B^T P(t) x(t) = −P(t)x(t)/R = −x(t)/(R + T − t).   (4.24)
In this example the optimal control u ∗ is given in state feedback form, while
in Example 4.2.5 (where we handled the same LQ problem) the control input is
given as a function of time. The feedback form is often preferred in applications,
but for this particular problem the feedback form (4.24) blurs the fact that the
optimal state and optimal control are just linear functions of time, see Exam-
ple 4.2.5.
Proposition 4.3.1 assumes the existence of a solution P (t ) of the RDE, but does
not require S and Q to be positive semi-definite. If S ≥ 0,Q ≥ 0 (and R > 0) then
existence of P (t ) can in fact be guaranteed. So for standard LQ problems we
have a complete solution:
(e i + e j )T P (t )(e i + e j ) − (e i − e j )T P (t )(e i − e j ) = 4p i j (t ).
Example 4.4.2 (Negative Q, finite escape time). Consider the integrator system
ẋ (t ) = u (t ) with initial state x (0) = x0 and cost
J_{[0,T]}(x₀, u) = ∫₀^T ( −x²(t) + u²(t) ) dt.
Ṗ (t ) = P 2 (t ) + 1, P (T ) = 0.
P(t) = tan(t − T),   t ∈ (T − π/2, T].

t_esc = T − π/2,
see Fig. 4.1. If T < π/2 then there is no escape time in [0, T] and, hence, P(t) := tan(t − T) is then well defined on the entire horizon [0, T], and, consequently,

V(x, t) = x² tan(t − T)
u ∗ (t ) = −R −1 B T P (t ) x (t ) = − tan(t − T ) x (t )
for some small ε > 0. For this input the state x(t) is constant over [0, t_esc + ε], and
continues optimally over [t esc + , T ]. The cost for this input thus is
J_{[0,T]}(x₀, u) = ∫₀^{t_esc+ε} ( −x²(t) + u²(t) ) dt + V(x₀, t_esc + ε)
 = −(t_esc + ε)x₀² + tan(−π/2 + ε)x₀².

It diverges to −∞ as ε ↓ 0.
FIGURE 4.1: Graph of tan(t − T) for t ∈ [0, T]. Left: if 0 < T < π/2. In that case tan(t − T) is defined for all t ∈ [0, T]. Right: if T ≥ π/2. Then tan(t − T) is not defined at T − π/2 ∈ [0, T]. See Example 4.4.2.
2p∗(t) = ∂V(x∗(t), t)/∂x. Since V(x, t) = x^T P(t)x this gives

p∗(t) = P(t) x∗(t).   (4.25)
for M := −(SΣ12 (T ) − Σ22 (T ))−1 (SΣ11 (T ) − Σ21 (T )). Consider the mapping x 0 →
x ∗ (t ) given by the upper part of (4.26), i.e., x ∗ (t ) = (Σ11 (t ) + Σ12 (t )M )x0 . If this
mapping is invertible at every t ∈ [0, T ] then x 0 follows uniquely from x ∗ (t ) as
x 0 = (Σ11 (t ) + Σ12 (t )M )−1 x ∗ (t ) and, consequently, p ∗ (t ) also follows uniquely
from x ∗ (t ):
Comparing this with (4.25) suggests the following explicit formula for P (t ).
Lemma 4.4.3 (Solution of RDE’s using the Hamiltonian). Let S,Q be positive
semi-definite n ×n matrices, and R = R T > 0. Then the solution P (t ), t ∈ [0, T ], of
the RDE
Ṗ (t ) = −P (t )A − A T P (t ) + P (t )B R −1 B T P (t ) − Q, P (T ) = S,
is
P(t) = ( Σ21(t) + Σ22(t)M ) ( Σ11(t) + Σ12(t)M )^{-1}.   (4.27)
Here M := −(SΣ12 (T )−Σ22 (T ))−1 (SΣ11 (T )−Σ21 (T )), and Σi j are n ×n sub-blocks
of the matrix exponential eH t as defined in (4.10).
Proof. Recall that the solution P (t ) of the RDE exists. If Σ11 (t ) + Σ12 (t )M would
have been singular at some t = t̄ , then any nonzero x 0 in the null space of
Σ11 (t̄ ) + Σ12 (t̄ )M renders x ∗ (t̄ ) = 0 while p ∗ (t̄ ) is nonzero (because Σ(t ) := eH t
is invertible). This contradicts the fact that p ∗ (t ) = P (t ) x ∗ (t ). Hence Σ11 (t ) +
Σ12 (t )M is invertible for every t ∈ [0, T ] and, consequently, the mapping from
x ∗ (t ) to p ∗ (t ) follows uniquely from (4.26) and it equals (4.27). ■
Example 4.4.4. In Example 4.2.3 we tackled the minimization of ∫₀^T ( x²(t) + u²(t) ) dt for ẋ(t) = u(t) using Hamiltonians, and we found that

[Σ11(t), Σ12(t); Σ21(t), Σ22(t)] = (1/2) [ e^t + e^{−t}, −e^t + e^{−t}; −e^t + e^{−t}, e^t + e^{−t} ],   M = (e^T − e^{−T})/(e^T + e^{−T}).
The RDE for this problem is
Ṗ (t ) = P 2 (t ) − 1, P (T ) = 0.
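Formula (4.27) is also convenient numerically. The following is a minimal Python sketch, assuming the data of this example (A = 0, B = 1, Q = 1, R = 1, S = 0) and an arbitrary horizon T = 1: it forms the blocks Σij(t) of e^{Ht}, builds P(t) via (4.27), and compares the result with tanh(T − t), which solves Ṗ(t) = P²(t) − 1, P(T) = 0.

import numpy as np
from scipy.linalg import expm

# Integrator xdot = u with cost int_0^T x^2 + u^2 dt (A=0, B=1, Q=1, R=1, S=0); T assumed 1.
T = 1.0
H = np.array([[0.0, -1.0], [-1.0, 0.0]])

def Sigma(t):
    return expm(H * t)

S11, S12, S21, S22 = Sigma(T).ravel()
M = -(0 * S12 - S22)**-1 * (0 * S11 - S21)     # (4.12) with S = 0

for t in np.linspace(0.0, T, 5):
    s11, s12, s21, s22 = Sigma(t).ravel()
    P = (s21 + s22 * M) / (s11 + s12 * M)       # (4.27), scalar case
    print(t, P, np.tanh(T - t))                 # the last two columns agree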
Now we turn to the infinite horizon LQ problem. This is the problem of mini-
mizing
J_{[0,∞)}(x₀, u) := ∫₀^∞ ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt   (4.28)
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 .
Ṗ T (t ) = −P T (t )A − A T P T (t ) + P T (t )B R −1 B T P T (t ) − Q, P T (T ) = 0. (4.29)
ẋ (t ) = u (t ), x (0) = x0 ,
Ṗ T (t ) = P T2 (t ) − 1, P T (T ) = 0.
P_T(t) = tanh(T − t) = (e^{T−t} − e^{−(T−t)})/(e^{T−t} + e^{−(T−t)}).
0 = A T P + P A − P B R −1 B T P + Q . (4.30)
The following theorem shows that all this is indeed the case. It requires just one
extra condition (apart from the standard conditions Q ≥ 0, R > 0): for each x 0
there needs to exist at least one input that renders the cost J [0,∞) (x 0 , u ) finite.
exists. The diagonal entries of P T (t ) hence converge. For the off-diagonal entries
we use that
lim_{T→∞} (e_i + e_j)^T P_T(t)(e_i + e_j) = lim_{T→∞} ( e_i^T P_T(t)e_i + e_j^T P_T(t)e_j + 2e_i^T P_T(t)e_j )
 = p_ii + p_jj + lim_{T→∞} 2e_i^T P_T(t)e_j.
The limit on the left-hand side exists, so the limit p i j := limT →∞ e iT P T (t )e j exists
as well. Therefore all entries of P T (t ) converge as T → ∞. The limit is indepen-
dent of t because P T (t ) = P T −t (0).
Clearly, P ≥ 0 because it is the limit of P T (t ) ≥ 0.
Since P T (t ) converges to a constant matrix, also Ṗ T (t ) = −P T (t )A− A T P T (t )+
P_T(t)BR^{-1}B^T P_T(t) − Q converges to a constant matrix as T → ∞. This constant matrix must be zero because ∫_t^{t+1} Ṗ_T(τ) dτ = P_T(t + 1) − P_T(t) → 0 as T → ∞. ■
LQ with stability
The next example shows that in some cases the LQ problem with stability
has an easy solution.
Example 4.5.4 (LQ with stability). Consider the problem of Example 4.5.1:
ẋ(t) = u(t),   x(0) = x₀,   J_{[0,∞)}(x₀, u) = ∫₀^∞ ( x²(t) + u²(t) ) dt.
x 2 + u 2 = ( x + u )2 − 2 xu = ( x + u )2 − 2 x ẋ .
Interestingly, the term −2 x ẋ has an explicit antiderivative, namely − x 2 , so
x² + u² = d/dt(−x²) + (x + u)².
Integrating this over t ∈ [0, ∞) we see that the cost for stabilizing inputs equals
J_{[0,∞)}(x₀, u) = x₀² + ∫₀^∞ ( x(t) + u(t) )² dt.   (4.32)
u = −x.
Since the state feedback u ∗ := − x indeed stabilizes (because the closed-loop sys-
tem becomes ẋ = − x ) we conclude that this state feedback is the optimal con-
trol, and that the optimal (minimal) cost is
J [0,∞) (x 0 , u ∗ ) = x 02 .
In this example, and also in the general finite horizon LQ problem, we have
that the optimal cost is quadratic in the initial state, and that the optimal input
can be implemented as a state feedback. Inspired by this we expect that every
infinite horizon LQ problem has these properties. That is, we conjecture that
the optimal cost is of the form
x 0T P x 0
for some matrix P , and that the optimal input equals u ∗ (t ) := −F x (t ) for some
matrix F . This leads to the following central result.
A T P + P A − P B R −1 B T P + Q = 0 (4.33)
u ∗ (t ) := −R −1 B T P x (t )
is the solution of the LQ problem with stability, and the optimal cost is x 0T P x 0 .
(A − B R −1 B T P )T (P − P T ) + (P − P T )(A − B R −1 B T P ) = −Q + Q T = 0.
Using this identity, Corollary B.5.3 (p. 227) shows that for asymptotically sta-
ble A − B R −1 B T P we necessarily have P − P T = 0, i.e., stabilizing solutions P of
the ARE are symmetric. To show that there is at most one stabilizing solution
we proceed as follows. Suppose P 1 , P 2 are two stabilizing solutions of the ARE
(hence P 1 and P 2 are symmetric), and let x 1 , x 2 be solutions of the correspond-
ing ẋ 1 = (A − B R −1 B T P 1 ) x 1 and ẋ 2 = (A − B R −1 B T P 2 ) x 2 . Then
d/dt( x₁^T(P₁ − P₂)x₂ )
 = x₁^T[ (A − BR^{-1}B^T P₁)^T(P₁ − P₂) + (P₁ − P₂)(A − BR^{-1}B^T P₂) ]x₂
 = x₁^T[ A^T P₁ − A^T P₂ − P₁BR^{-1}B^T P₁ + P₁BR^{-1}B^T P₂ + P₁A − P₂A − P₁BR^{-1}B^T P₂ + P₂BR^{-1}B^T P₂ ]x₂
 = x₁^T( −Q + Q )x₂ = 0.
Hence x 1 (t )(P 1 −P 2 ) x 2 (t ) is constant as a function of time. By asymptotic stabil-
ity we have that limt →∞ x 1 (t ) = limt →∞ x 2 (t ) = 0. Therefore x 1 (t )(P 1 −P 2 ) x 2 (t ) is,
in fact, zero for all time. Since this holds for every initial condition x 1 (0), x 2 (0),
we conclude that P 1 = P 2 .
In the rest of the proof we assume that P is the symmetric stabilizing solu-
tion of the ARE. We expect the optimal u to be a linear state feedback u = −F x
for some F , so with that in mind define v := F x + u . (If our hunch is correct then
optimal means v = 0.) Next we write x^TQx + u^TRu and v^TRv and d/dt(x^TPx) as quadratic expressions in (x, u):
x^TQx + u^TRu = [x^T  u^T] [Q, 0; 0, R] [x; u],

v^TRv = (x^TF^T + u^T)R(Fx + u) = [x^T  u^T] [F^TRF, F^TR; RF, R] [x; u],

d/dt(x^TPx) = ẋ^TPx + x^TPẋ = (x^TA^T + u^TB^T)Px + x^TP(Ax + Bu) = [x^T  u^T] [A^TP + PA, PB; B^TP, 0] [x; u].
Therefore

x^TQx + u^TRu − v^TRv + d/dt(x^TPx) = [x^T  u^T] [A^TP + PA + Q − F^TRF, PB − F^TR; B^TP − RF, 0] [x; u].
Since P is symmetric and satisfies (4.33), the choice F := R −1 B T P makes the
above matrix on the right-hand side equal to zero. So then
x^TQx + u^TRu = −d/dt(x^TPx) + v^TRv,
and, hence, the cost (4.31) equals
J_{[0,∞)}(x₀, u) = x₀^T Px₀ + ∫₀^∞ v^T(t)Rv(t) dt,
whenever the input stabilizes the system. Given x 0 the above cost is mini-
mal for v = 0, provided it stabilizes. It does: since v := u + F x = u + R −1 B T P x
we have v = 0 iff u = −F x = −R −1 B T P x , and so the closed-loop system is
ẋ = (A − B R −1 B T P ) x , which, by assumption on P , is asymptotically stable. ■
The theorem does not say that the ARE has a stabilizing solution. It only says
that if a stabilizing solution P exists, then it is unique and symmetric, and the
LQ problem with stability is solved, with u ∗ (t ) := −R −1 B T P x (t ) being the opti-
mal control. It is not yet clear under what conditions there exists a stabilizing
solution P of the ARE (4.33). This will be addressed by considering the solution
P T (t ) of the finite horizon problem, and showing how under stabilizability and
detectability assumptions1 limT →∞ P T (t ) exists and defines such a solution:
Theorem 4.5.6 (Three ways to solve the LQ problem with stability). Consider
the LQ problem with stability as formulated in Definition 4.5.3, and consider
the associated ARE (4.33). (In particular assume Q ≥ 0, R > 0.)
If (A, B ) is stabilizable and (Q, A) detectable, then there is a unique
stabilizing solution P of the ARE, and this P is symmetric. Consequently,
u ∗ (t ) := −R −1 B T P x (t ) solves the infinite horizon LQ problem with stability,
and x 0T P x 0 is the minimal cost. Moreover this unique P can be determined in
the following three equivalent ways:
(A − B R −1 B T P )T P + P (A − B R −1 B T P ) + Q + P B R −1 B T P = 0.
Next, postmultiply this equation with the eigenvector x, and premultiply with
its complex conjugate transpose x ∗ :
x ∗ (A − B R −1 B T P )T P + P (A − B R −1 B T P ) + Q + P B R −1 B T P x = 0.
If Re(λ) ≥ 0 then (λ∗ +λ)x ∗ P x ≥ 0, implying that all the above three terms are in
fact zero: (λ∗ + λ)x ∗ P x = 0, Qx = 0, and B T P x = 0 (and, consequently, Ax = λx).
This contradicts detectability. So it cannot be that Re(λ) ≥ 0. It must be that
A − B R −1 B T P is asymptotically stable.
(Uniqueness & 3 =⇒ 1). Theorem 4.5.5 shows that there is at most one sta-
bilizing solution P of the ARE. Now P := limT →∞ P T (t ) is one solution that stabi-
lizes (because 1. =⇒ 2. =⇒ 3.). Hence the stabilizing solution of the ARE exists
and is unique, and it equals the matrices P from Item 1 and 2, which, hence, are
unique as well.
Theorem 4.5.5 then guarantees that u ∗ = −R −1 B T P x solves the LQ problem
with stability, and that x 0T P x 0 is the optimal cost. ■
Theorem 4.5.6 shows that we have several ways to determine the solution
P that solves the LQ problem with stability, namely (a) limT →∞ P T (t ), (b) the
unique symmetric positive semi-definite solution of the ARE, and (c) the unique
stabilizing solution of the ARE.
Example 4.5.7 (LQ problem with stability of the integrator system solved in
three ways). Consider again the integrator system
ẋ (t ) = u (t ), x (0) = x0
and cost
∞
J [0,∞) (x 0 , u ) = x 2 (t ) + u 2 (t ) dt .
0
1. In Example 4.5.1 we handled the finite horizon case of this problem, and
we found that P := limT →∞ P T (t ) = 1.
2. We could have gone as well for the unique symmetric, positive semi-
definite solution of the ARE. The ARE in this case is
−P 2 + 1 = 0,
3. The ARE has two solutions, P = ±1, and Theorem 4.5.6 guarantees that
precisely one of them is stabilizing. The solution P is stabilizing if A −
B R −1 B T P = −P is less than zero. Clearly this, again, gives P = 1.
While for low-order systems the 2nd option (that P is positive semi-definite)
is often the easiest way to determine P , general numerical recipes usually
exploit the 3rd option. This is explained in the final part of this section where
we examine the connection with Hamiltonian matrices.
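For this scalar example the computations can also be delegated to standard software; scipy.linalg.solve_continuous_are returns the stabilizing solution of the ARE. A minimal Python sketch with the data of this example:

import numpy as np
from scipy.linalg import solve_continuous_are

# Integrator xdot = u with cost int_0^inf x^2 + u^2 dt: A = 0, B = 1, Q = 1, R = 1.
A = np.array([[0.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
F = np.linalg.solve(R, B.T @ P)           # optimal feedback gain, u* = -F x
print(P, F, A - B @ F)                    # P = [[1.]], F = [[1.]], closed loop = [[-1.]]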
For the finite horizon LQ problem we established in Lemma 4.4.3 a tight con-
nection between solutions P (t ) of RDE’s and Hamiltonian matrices H . For the
infinite horizon case a similar connection exists. This we explore now.
A matrix P satisfies the ARE
P A + A T P − P B R −1 B T P + Q = 0
iff
−Q − A T P = P (A − B R −1 B T P ),
This is interesting because in the case that all matrices here are numbers (and the Hamiltonian matrix H hence a 2×2 matrix) it says that the vector (1, P)^T is an eigenvector of the Hamiltonian matrix, and that A − BR^{-1}B^T P is its eigenvalue. This
connection between P and eigenvectors/eigenvalues of the Hamiltonian matrix
H is the key to most numerical routines for computation of P . This central
result is formulated in the following theorem. The subsequent examples show
how the result can be used to find P concretely.
P := V₂V₁^{-1},
Proof. This proof is involved. We assume familiarity with detectability and sta-
bilizability as explained in Appendix A.6. The proof again exploits the remark-
able property that solutions of the associated Hamiltonian system (now with
initial conditions, possibly complex-valued),
[ẋ(t); ṗ(t)] = [A, −BR^{-1}B^T; −Q, −A^T] [x(t); p(t)],   [x(0); p(0)] = [x₀; p₀] ∈ C^{2n}   (4.36)

satisfy

d/dt( p*x ) = −( x*Qx + p*BR^{-1}B^T p ),   (4.37)
(see the proof of Lemma 4.2.2). Note that we consider the system of differential
equations over C2n , instead of over R2n , and here p ∗ means the complex con-
jugate transpose of p . The reason is that eigenvalues and eigenvectors may be
complex-valued. Integrating (4.37) over t ∈ [0, ∞) tells us that
∫₀^∞ ( x*(t)Qx(t) + p*(t)BR^{-1}B^T p(t) ) dt = p₀*x₀ − lim_{t→∞} p*(t)x(t),   (4.38)

provided the limit exists. In what follows we denote by (x(t), p(t)) the solution of (4.36).
1. Suppose (x₀, p₀) is an eigenvector of H with imaginary eigenvalue λ. Then (x(t), p(t)) = e^{λt}(x₀, p₀). Now p*(t)x(t) is constant, hence both sides of (4.37) are zero for all time. So both x*(t)Qx(t) and B^T p(t) are zero for all time. Inserting this into (4.36) shows that λx₀ = Ax₀ and λp₀ = −A^T p₀. Thus [A − λI; Q] x₀ = 0 and p₀*[A + λI   B] = 0. Stabilizability and detectability imply that then x₀ = 0, p₀ = 0, but (x₀, p₀) is an eigenvector, so nonzero. Contradiction, hence H has no imaginary eigenvalues.
Exercise 4.19 shows that r (λ) := det(λI − H ) equals r (−λ). So H has as
many (asymptotically) stable eigenvalues as unstable eigenvalues.
This equation is nothing else than the ARE (verify this for yourself ). And P
is a stabilizing solution because A −B R −1 B T P = Λ̂ is asymptotically stable.
Uniqueness and symmetry of P we showed earlier (Theorem 4.5.5).
■
Realize that any V ∈ R(2n)×n of rank n for which H V = V Λ does the job if Λ
is asymptotically stable. That is, even though there are many such V , we always
have that V1 is invertible and that P follows uniquely as P = V2V1−1 . As already
mentioned in the above proof, in case H has n distinct asymptotically stable
eigenvalues λ1 , . . . , λn , with eigenvectors v 1 , . . . , v n , then we can take
V = [v₁  v₂  ···  vₙ]
Its characteristic polynomial is λ² − 1, and the eigenvalues are λ_{1,2} = ±1. Its
asymptotically stable eigenvalue is λas = −1, and it is easy to verify that v is an
eigenvector corresponding to this asymptotically stable eigenvalue iff

v := [v₁; v₂] = [1; 1] c,   c ≠ 0.
This agrees with what we found in Example 4.5.7. As predicted, P does not
depend on the choice of eigenvector (the choice of c). Also, the (eigen)value of
A − B R −1 B T P = −1 as predicted equals the asymptotically stable eigenvalue of
the Hamiltonian matrix, λas = −1. The optimal control is u ∗ = −R −1 B T P x = − x .
The first two eigenvalues, λ1,2 , are asymptotically stable so we need eigenvec-
tors corresponding to these two. Not very enlightening manipulation shows that
we can take
v_{1,2} = [ −λ_{1,2}; −λ_{1,2}²; 1; λ_{1,2}³ ].
is the V we need. (Note that this matrix is complex; this is not a problem.) With V known, it is easy to compute the stabilizing solution of the ARE,

P = V₂V₁^{-1} = [1, 1; λ₁³, λ₂³] [−λ₁, −λ₂; −λ₁², −λ₂²]^{-1} = [√3, 1; 1, √3].
The optimal input is u∗ = −R^{-1}B^T P x = −p₂₁ x₁ − p₂₂ x₂ = −x₁ − √3 x₂. The LQ-optimal closed-loop system is described by

ẋ∗(t) = (A − BR^{-1}B^T P) x∗(t) = [0, 1; −1, −√3] x∗(t),

and its eigenvalues are λ_{1,2} = −½√3 ± ½ i (which, as predicted, are the asymptotically stable eigenvalues of H).
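The eigenvector recipe is easy to automate. The sketch below is an illustration only: it assumes the underlying data is the double integrator ẋ₁ = x₂, ẋ₂ = u with Q = I and R = 1 (an assumption that is consistent with the P found above). It selects the asymptotically stable eigenvalues of H, forms V, computes P = V₂V₁^{-1}, and cross-checks against scipy's ARE solver.

import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed data (consistent with P = [[sqrt(3), 1], [1, sqrt(3)]] above): double integrator, Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Hamiltonian matrix (4.9).
H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

lam, V = np.linalg.eig(H)
stable = lam.real < 0                        # pick the asymptotically stable eigenvalues
V1, V2 = V[:2, stable], V[2:, stable]
P = np.real(V2 @ np.linalg.inv(V1))          # P = V2 V1^{-1}

print(P)                                     # approx [[1.732, 1], [1, 1.732]]
print(solve_continuous_are(A, B, Q, R))      # same matrix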
In five examples we explore the use of infinite horizon LQ theory for the design
of controllers. The first two examples discuss the effect of tuning parameters on
the control and cost. The final three examples are about control of cars.
Example 4.6.1 (Tuning the controller). Consider the system with output,
ẋ (t ) = u (t ), x (0) = x0 = 1,
y (t ) = 2 x (t ),
and suppose the task is to steer the output y to zero “quickly” but without using
“excessive” inputs u . A way to resolve this problem is by considering the LQ
problem with stability with cost
∫₀^∞ ( y²(t) + ρ²u²(t) ) dt = ∫₀^∞ ( 4x²(t) + ρ²u²(t) ) dt.
ρ > 0,
We want to steer the output y to zero quickly but not too steeply, so ẏ should
be small as well, and all that using small u . This requires a cost function with
two tuning parameters,
∫₀^∞ ( σ²y²(t) + (1 − σ²)ẏ²(t) + ρ²u²(t) ) dt.
The parameter σ ∈ [0, 1] defines the trade-off between small y and small ẏ ,
and the parameter ρ > 0 defines the trade-off between small ( y , ẏ ) and small u .
Given σ and ρ the LQ solution can be determined numerically using the
eigenvalues and eigenvectors of the corresponding Hamiltonian matrix, but we
skip the details. In what follows we take as initial state x 0 = (1, 0, 0). Figure 4.3
shows the response of the optimal u ∗ and resulting y ∗ for various combinations
of σ and ρ. For σ = 1 the term ẏ is not included in the cost, so we can expect
“steep” behavior in the output. For σ ≈ 0 the output converges slowly to zero. As
for ρ, we see that smaller ρ means larger controls u ∗ and faster convergence to
zero of the output y ∗ .
Assuming we can live with inputs u of at most 2 (in magnitude) then ρ 2 = 0.2
is a reasonable choice (the red graphs in Fig. 4.3). Given that, a value of σ2 =
0.75 may be a good compromise between overshoot and settling time in the
response y ∗ . For this ρ 2 = 0.2, σ2 = 0.75, the optimal control turns out to be
u ∗ = −R −1 B T P x = −(1.9365 x 1 + 3.0656 x 2 + 2.6187 x 3 ),
and the eigenvalues of A − B R −1 B T P are −0.7468 and −0.9859 ± 1.2732i.
FIGURE 4.4: Car connected to a wall via a spring and with a force control u. See Example 4.6.3.
FIGURE 4.5: Top: a car connected to a wall via a spring (on the left). The car is controlled with an LQ-optimal force u∗(t) = −k_LQ y(t) − c_LQ ẏ(t) implemented as a spring/damper system (on the right). Bottom: responses u∗(t) and y(t) = x₁(t) for initial state x(0) = (1, 0). See Example 4.6.3.
Here, the first state component is x₁ = y and the second is x₂ = ẏ. This way the cost becomes ∫₀^∞ ( x₁²(t) + ⅓u²(t) ) dt, and the stabilizing solution of the ARE turns out to be³

P = (1/3) [2√2, 1; 1, √2],

and the eigenvalues of A − BR^{-1}B^T P are −½√2 ± √(3/2) i, while

u∗(t) = −3B^T P x(t) = −[1  √2] x(t) = −y(t) − √2 ẏ(t).
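This P is easy to reproduce numerically. The sketch below assumes the car model of Fig. 4.4 with m = k = 1, so that A = [0, 1; −1, 0] and B = [0; 1], together with the weights Q = diag(1, 0) and R = ⅓ used above; these model details are inferred from the figure and the numbers in this example rather than quoted from it.

import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed model: unit mass on a unit spring, y'' = -y + u, state x = (y, ydot).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[1.0 / 3.0]])

P = solve_continuous_are(A, B, Q, R)
F = np.linalg.solve(R, B.T @ P)             # u* = -F x = -y - sqrt(2) ydot
print(P * 3)                                # approx [[2*sqrt(2), 1], [1, sqrt(2)]]
print(F)                                    # approx [[1, 1.414]]
print(np.linalg.eigvals(A - B @ F))         # approx -0.707 +/- 1.225j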
m ÿ (t ) + c ẏ (t ) = u (t ), c > 0. (4.41)
We take the mass equal to m = 1, and we leave the friction coefficient c arbitrary
(but positive). As state we take x :=( y , ẏ ). Then (4.41) becomes
ẋ(t) = Ax(t) + Bu(t)   with   A = [0, 1; 0, −c],   B = [0; 1].
The aim is again to bring the mass to rest but without using excessive control
effort. A possible solution is to minimize the cost
J_{[0,∞)}(x₀, u) = ∫₀^∞ ( y²(t) + ρ²u²(t) ) dt.
³ This is the reason we took R = ⅓. Other values yield more complicated expressions for P.
Again the parameter ρ > 0 defines a trade-off between small y and small u . The
matrices Q and R for this cost are
Q = [1, 0; 0, 0],   R = ρ².
(It can be verified that (A, B ) is stabilizable and (Q, A) is detectable.) The ARE
becomes
[0, 0; 1, −c] P + P [0, 1; 0, −c] − P [0, 0; 0, ρ^{-2}] P + [1, 0; 0, 0] = [0, 0; 0, 0].
0 = 1 − ρ^{-2} p₁₂²,
0 = p₁₁ − c p₁₂ − ρ^{-2} p₁₂ p₂₂,
0 = 2p₁₂ − 2c p₂₂ − ρ^{-2} p₂₂².
From the first equation we find that p₁₂ = ±ρ. If p₁₂ = +ρ then the third equation gives two possible p₂₂ = ρ²( −c ± √(c² + 2/ρ) ). One is positive, the other is negative. We need the positive solution because positive semi-definiteness of P requires p₂₂ ≥ 0. Now that p₁₂ and p₂₂ are known, the second equation settles p₁₁. This turns out to give

P = ρ [ √(c² + 2/ρ), 1; 1, ρ( −c + √(c² + 2/ρ) ) ].   (4.42)
u∗(t) = −ρ^{-2} B^T P x(t)
 = −[ 1/ρ,   −c + √(c² + 2/ρ) ] x(t)
 = −(1/ρ) y(t) + ( c − √(c² + 2/ρ) ) ẏ(t).
FIGURE 4.8: Two connected cars. The purpose is to control the second car with a force u that acts on the first car. See Example 4.6.5.
For simplicity we take all masses and spring constants equal to one, m₁ = m₂ = 1, k₁ = k₂ = 1, and we take the friction coefficients small and equal: c₁ = c₂ = 0.1. Then the linear model in terms of the state x defined as x = (q₁, q₂, q̇₁, q̇₂)
FIGURE 4.9: Positions q₁(t), q₂(t) and input u(t) as functions of time t. See Example 4.6.5.
becomes
ẋ(t) = [0, 0, 1, 0; 0, 0, 0, 1; −2, 1, −0.2, 0.1; 1, −1, 0.1, −0.1] x(t) + [0; 0; 1; 0] u(t).
As the friction coefficients are small one may expect sizeable oscillations when
no control is applied. Indeed, the above A matrix has two eigenvalues close to
the imaginary axis (at −0.0119±i0.6177 and −0.1309±i1.6127), and for the initial
state x 0 = (0, 1, 0, 0) and u = 0 the positions q 1 , q 2 of the two cars oscillate for a
long time, see Fig. 4.9(top).
To control the second car with the force u we propose the solution of the
infinite horizon LQ problem with cost
∫₀^∞ ( q₂²(t) + Ru²(t) ) dt.
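As an illustration of how such a design is carried out numerically, here is a minimal Python sketch for the two-car model above; the weight R = 1 below is an arbitrary choice, since the text leaves R as a tuning parameter.

import numpy as np
from scipy.linalg import solve_continuous_are

# Two-car model from this example, state x = (q1, q2, q1dot, q2dot).
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-2.0, 1.0, -0.2, 0.1],
              [1.0, -1.0, 0.1, -0.1]])
B = np.array([[0.0], [0.0], [1.0], [0.0]])

# Cost int q2^2 + R u^2 dt; R = 1 chosen here just for illustration.
Q = np.diag([0.0, 1.0, 0.0, 0.0])
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
F = np.linalg.solve(R, B.T @ P)             # LQ state feedback u* = -F x
print(F)
print(np.linalg.eigvals(A - B @ F))         # closed-loop eigenvalues, all in the open left half-plane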
4.7 Exercises
ẋ (t ) = 3 x (t ) + 2 u (t ), x (0) = x0
with cost
J_{[0,T]}(x₀, u) = ∫₀^T ( 4x²(t) + u²(t) ) dt.
Use this to confirm the claim that for T = π/2 the Hamiltonian equa-
tions (4.8) have no solution if x 0 = 0.
(c) Does Pontryagin’s minimum principle allow us to conclude that for
T = π/2 and x 0 = 0 no optimal control u ∗ exists?
(d) A Wirtinger inequality. Show that ∫₀^{π/2} ẋ²(t) dt ≥ ∫₀^{π/2} x²(t) dt for all smooth x for which x(0) = x₀ := 0, and show that equality holds iff x(t) = A sin(t).
(a) Show that the finite horizon LQ problem satisfies the convexity
assumptions of Theorem 2.8.1. [Hint: Appendix A.7 may be useful.]
(b) Consider a solution ( x ∗ , p ∗ ) of (4.8), and define u ∗ = −B T p ∗ . Now
consider an arbitrary input u and corresponding state x, and define v = u − u∗.
i. Show that z := x − x ∗ satisfies ż (t ) = A z (t ) + B v (t ), z (0) = 0.
ii. Show that
J_{[0,T]}(x₀, u) − J_{[0,T]}(x₀, u∗) = ∫₀^T ( z^TQz + v^Tv + 2z^TQx∗ + 2u∗^Tv ) dt.
(For readability we dropped here the time argument.)
iii. Show that d/dt( p∗^T(t)z(t) ) = −z^T(t)Qx∗(t) − u∗^T(t)v(t).
iv. Show that
J_{[0,T]}(x₀, u) − J_{[0,T]}(x₀, u∗) = ∫₀^T ( z^T(t)Qz(t) + v^T(t)v(t) ) dt,
and argue that u ∗ is the optimal control.
4.4 There are RDE’s whose solution is constant. Let T > 0 and consider
ẋ (t ) = x (t ) + u (t ), x (0) = x0 := 1,
with cost
J_{[0,T]}(x₀, u) = 2x²(T) + ∫₀^T u²(t) dt.
(a) Determine the RDE.
(b) Solve the RDE. [Hint: the solution happens to be constant.]
(c) Determine the optimal state x ∗ (t ) and input u ∗ (t ) explicitly as func-
tions of time.
(d) Verify that J [0,T ] (1, u ∗ ) = P (0).
4.5 Why LQ-optimal inputs are linear in the state, and costs are quadratic in
the state. In this exercise we prove, using only elementary arguments (but
not easy arguments), that the optimal control in LQ control is linear in
the state, and that the value function is quadratic in the state. Consider
ẋ (t ) = A x (t )+B u (t ) with the standard LQ cost over the time window [t , T ],
J_{[t,T]}(x(t), u) = x^T(T)Sx(T) + ∫_t^T ( x^T(τ)Qx(τ) + u^T(τ)Ru(τ) ) dτ,
and let V (x, t ) be its value function.
(a) Exploit the quadratic nature of the cost to prove that for every λ ∈ R,
every two x, z ∈ Rn and every two inputs u , w we have
J [t ,T ] (λx, λ u ) = λ2 J [t ,T ] (x, u ),
J [t ,T ] (x + z, u + w ) + J [t ,T ] (x − z, u − w ) = 2J [t ,T ] (x, u ) + 2J [t ,T ] (z, w ).
(4.43)
V (x + z, t ) + V (x − z, t ) ≤ 2 V (x, t ) + 2 V (z, t ).
V (x + z, t ) + V (x − z, t ) ≥ 2 V (x, t ) + 2 V (z, t ).
J [t ,T ] (x + z, u x + w z ) − V (x + z, t ) = V (x − z, t ) − J [t ,T ] (x − z, u x − w z ).
4.7 Dependence of P (t ) and x ∗ (t ) on the final cost. Consider the scalar system
ẋ (t ) = x (t ) + u (t ), x (0) = x0 ,
and cost
J_{[0,T]}(x₀, u) = s x²(T) + ∫₀^T ( 3x²(t) + u²(t) ) dt.
0
Here s is some nonnegative number.
(a) Determine the associated RDE (4.22), and verify that the solution is
given by
P(t) = (3 − d e^{4(t−T)})/(1 + d e^{4(t−T)})   for   d := (3 − s)/(1 + s).
[Hint: Exercise 4.6 is useful.]
(b) Figure 4.10 depicts the graph of P (t ) for several s ≥ 0. The graphs
suggest that P (t ) is an increasing function if s > 3, and a decreasing
function if 0 ≤ s < 3. Use the RDE to formally verify this property.
(c) It appears that for s = 0 the function P (t ) is decreasing. Argue from
the value function why it is immediate that P (t ) is decreasing if s = 0.
[Hint: P (t ) decreases iff for a fixed t the function P (t ) increases as a
function of final time T .]
(d) Figure 4.11 shows graphs of the optimal state x ∗ (t ) for T = 1, T = 2
and various s. The initial condition is x 0 = 1 in all cases. The plot
on the left considers T = 1 and s = 0, 1, 2, 3, 4, 5. The plot on the right
T = 2 and the same s = 0, 1, 2, 3, 4, 5. Explain which of the graphs cor-
respond to which value of s, and also explain from the system equa-
tion ẋ (t ) = x (t )+ u (t ) and cost why it can happen that for some s the
optimal x ∗ (t ) increases for t near T .
ż (t ) = Ã z (t ) + B̃ u (t ), z (0) = z0 := E −1 x0
with cost
J̃_{[0,T]}(z₀, u) = ∫₀^T ( z^T(t)Q̃z(t) + u^T(t)Ru(t) ) dt,
where à = E −1 AE , B̃ = E −1 B and Q̃ = E T QE .
Also, what is the relationship between the value functions for both prob-
lems?
4.9 State transformation. Consider the infinite horizon LQ problem with sta-
bility for the system
ẋ(t) = [−1, 1; 1, −1] x(t) + [1, 0; 0, 1] u(t),   x(0) = x₀
and cost
J_{[0,∞)}(x₀, u) = ∫₀^∞ ( x^T(t) [7, 2; 2, 7] x(t) + u^T(t)u(t) ) dt.
(a) Show that the optimal control is u ∗ (t ) = −P x (t ), where P is the sta-
bilizing solution of the ARE.
(b) To find the solution P , perform the state transformation z = E −1 x
for a suitable matrix E . Choose E such that E −1 AE is diagonal, and
use it to determine P . [Hint: use Exercise 4.8.]
4.10 Direct proof of optimality. The proof of Proposition 4.3.1 assumes that R =
I . Develop a similar proof for the case that R is an arbitrary m ×m positive
definite matrix.
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0
with A ∈ Rn×n and B ∈ Rn×m , and suppose that the cost function contains
an exponential factor,
J_{[0,T]}(x₀, u) = ∫₀^T e^{−2αt} ( x^T(t)Qx(t) + u^T(t)Ru(t) ) dt,
for a given constant α. (For α > 0 the factor e−2αt is known as a discount
factor, rendering running costs further in the future less important.) We
also assume that Q ≥ 0 and R > 0.
z (t ) = e−αt x (t ), v (t ) = e−αt u (t ).
(The new version of the cost function and system equations should
no longer contain x and u .)
(b) With the aid of (a) determine the solution u ∗ (t ) in terms of x ∗ (t ), of
the optimal control problem of the scalar system
ẋ (t ) = x (t ) + u (t ), x (0) = x0 := 1
with cost
J_{[0,T]}(x₀, u) = ∫₀^1 e^{−2t} ( x²(t) + u²(t) ) dt.
4.12 Solving the infinite horizon LQ problem via the eigenvectors of the Hamil-
tonian. (This exercise assumes knowledge of basic linear algebra.) Con-
sider the infinite horizon LQ problem of Exercise 4.9.
(a) Verify that the Hamiltonian matrix H has eigenvalues ±3. [Hint: per-
form row operations on H + 3I .]
(b) Determine two linearly independent eigenvectors v 1 , v 2 of H that
both have eigenvalue −3, and use these to construct the stabilizing
solution P of the ARE.
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0
0 = A x̄ + B ū.
z := x − x̄, v := u − ū
ẋ (t ) = − x (t ) + u (t ), x (0) = 0
with cost
∫₀^∞ ( (x(t) − 1)² + (u(t) − 1)² ) dt,
4.15 Infinite horizon LQ. We consider the same system and cost as in Exer-
cise 4.1, but now with T = ∞:
ẋ(t) = 3x(t) + 2u(t),   x(0) = x₀,   J_{[0,∞)}(x₀, u) = ∫₀^∞ ( 4x²(t) + u²(t) ) dt.
4.16 Infinite horizon LQ problems with and without stability. Consider the sys-
tem
ẋ (t ) = x (t ) + u (t ), x (0) = x0 , u (t ) ∈ R
(d) Now take g = 0. Argue that the input that minimizes the cost over all
stabilizing inputs is not the same as the one that minimizes the cost
over all inputs (stabilizing or not).
4.17 Lyapunov equation. Let Q ≥ 0, R > 0 and suppose that B = 0. Consider The-
orem 4.5.6.
ẋ (t ) = u (t ), x (0) = x0 , u (t ) ∈ R,
4.20 Control of a car when the input is either cheap or expensive. Consider
Example 4.6.4. In this exercise we analyze what happens if ρ is either very
large or very small (positive but close to zero). Keep in mind that c > 0.
and

lim_{ρ→∞} ρ u∗(t) = −y(t) − (1/c) ẏ(t),

and

u∗(t) ≈ −(1/ρ) y(t) − √(2/ρ) ẏ(t).
and L2 is the Hilbert space of functions whose L2 -norm is finite. In this section,
we briefly review L2 -norm inequalities, and we make a connection with H∞ the-
ory. The starting point is a system that includes an output y ,
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 ,
(5.1)
y (t ) = C x (t ).
Suppose we want to minimize the indefinite cost
for some given γ > 0 over all stabilizing inputs. In terms of the state and input, this cost takes the form ∫₀^∞ ( x^T(t)(−C^TC)x(t) + γ²u^T(t)u(t) ) dt, and, therefore, the LQ Riccati equation (4.33) and stability condition (4.34) become
Consequently, like in the standard LQ problem, the stabilizing input that minimizes (5.6) is u∗ = (1/γ²)B^T P x, and the optimal (minimal) cost −‖y‖²_{L2} + γ²‖u‖²_{L2} equals −x₀^T P x₀. It also shows that P is positive semi-definite, because for u = 0 (which is stabilizing since we assumed A to be asymptotically stable) the cost is −‖y‖²_{L2} + γ²‖u‖²_{L2} = −‖y‖²_{L2} ≤ 0, so the minimal cost −x₀^T P x₀ is less than or equal to zero for every x₀. Knowing this, the identity (5.5) for zero initial state, x₀ = 0, gives us the inequality

‖y‖_{L2} ≤ γ‖u‖_{L2}.   (5.7)
This is a key observation: in asymptotically stable systems (5.1) that start at rest,
x₀ = 0, the norm ‖y‖_{L2} never exceeds γ‖u‖_{L2} if there exists a symmetric P that
satisfies (5.4). It is a central result in H∞ theory that the existence of such a P
is both necessary and sufficient for the norm inequality (5.7) to hold (in a strict
sense):
ẋ (t ) = A x (t ) + B u (t ), x (0) = 0,
y (t ) = C x (t ).
Proof. The equivalence of the first three is standard and can be found in sev-
eral books, e.g. (Zhou et al., 1996, Corollary 13.24)1 . The Riccati equation in
Condition 4 we recognize as the Riccati equation of the “transposed” system
x̂˙ (t ) = A T x̂ (t ) + C T û (t ), ŷ (t ) = B T x̂ (t ). Condition 4 is equivalent to Condition 2
because the Hamiltonian matrix H of the transposed system is similar to the
transpose of the Hamiltonian matrix H of Condition 2:
[−γ²I, 0; 0, I]^{-1} [A^T, −C^TC; (1/γ²)BB^T, −A] [−γ²I, 0; 0, I] = [A^T, (1/γ²)C^TC; −BB^T, −A],

where the middle matrix is H^T and the right-hand side is the Hamiltonian matrix H̃ of the transposed system, and, thus, H̃ and H have the same eigenvalues. ■
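The γ-iteration described in the next paragraph is straightforward to implement. Below is a minimal Python sketch, using an arbitrary example system (not taken from the text): for each trial γ it forms the Hamiltonian matrix of Theorem 5.1.1 and bisects on whether it has imaginary eigenvalues.

import numpy as np

def l2_gain(A, B, C, tol=1e-6):
    """L2-gain of xdot = Ax + Bu, y = Cx (A asymptotically stable), by bisection over gamma.

    For each trial gamma we form the Hamiltonian matrix of Theorem 5.1.1 and test
    whether it has eigenvalues on the imaginary axis."""
    def has_imag_eig(gamma):
        H = np.block([[A, (1.0 / gamma**2) * B @ B.T],
                      [-C.T @ C, -A.T]])
        return np.any(np.abs(np.linalg.eigvals(H).real) < 1e-9)

    lo, hi = tol, 1.0
    while has_imag_eig(hi):            # find an upper bound on the gain
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if has_imag_eig(mid) else (lo, mid)
    return hi

# Example (arbitrary stable system): xdot = -x + u, y = 2x, so the gain is 2.
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[2.0]])
print(l2_gain(A, B, C))                # approximately 2.0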
There are many variations of this theorem, for instance, for systems that in-
clude a direct feedthrough term, y(t) = Cx(t) + Du(t). In general, the expression

sup_{u∈L₂, u≠0} ‖y‖_{L2}/‖u‖_{L2}
is known as the L2 -gain of the system. Theorem 5.1.1 shows that the L2 -gain
equals the largest γ > 0 for which the Hamiltonian matrix has imaginary eigen-
values. Thus by iterating over γ, we can calculate the L2 -gain. Also, as γ → ∞,
the eigenvalues of the Hamiltonian matrix converge to the eigenvalues of A and
−A. Hence, the L2 -gain is finite whenever A is asymptotically stable. An inter-
esting by-product of Theorem 5.1.1 is that every L2 input is a stabilizing input if
the system is asymptotically stable:
Lemma 5.1.2 (Stability for L2 inputs). Let A ∈ Rn×n , B ∈ Rn×m and consider
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0 . If A is asymptotically stable and u ∈ L2 , then
• both x and ẋ are in L2 ,
• limt →∞ x (t ) = 0.
Proof. Take y = x , that is, C = I . For large enough γ, the Hamiltonian matrix of
Theorem 5.1.1 does not have imaginary eigenvalues, so Condition 3 of Theo-
rem 5.1.1 holds for some large enough γ. Thus, given γ large enough, (5.5) says
that
γ²‖u‖²_{L2} + x₀^T P x₀ = ‖x‖²_{L2} + x^T(∞)Px(∞) + γ²‖u − (1/γ²)B^T P x‖²_{L2}.
All terms on the right-hand side are nonnegative, hence if u ∈ L2 , then all
these terms are finite. In particular, x ∈ L2 . Consequently also ẋ = A x +
Bu ∈ L₂. Now the Cauchy-Schwarz inequality guarantees that |x_i²(b) − x_i²(a)| = |∫_a^b 2ẋ_i(t)x_i(t) dt| ≤ 2 (∫_a^b ẋ_i²(t) dt)^{1/2} (∫_a^b x_i²(t) dt)^{1/2} → 0 as a, b → ∞. So x_i²(t) converges as t → ∞, and since x ∈ L₂ the limit must be zero.
LQ optimal control theory was very successful, and it still is, but it has not been
easy to incorporate model uncertainties in this approach. In the late seventies,
this omission led Zames2 to the idea of using H∞ -optimization as an alternative
to LQ optimal control. It was the starting point of the wonder years of H∞ the-
ory. It attracted the attention of operator theoreticians, and for over a decade
there was a very fruitful cooperation between operator theory and systems and
control theory. At the end of the eighties the first truly satisfactory solutions of
what is called the “standard H∞ problem” were obtained.
The H∞ -norm of a linear time-invariant mapping from w to z is, strictly
speaking, defined as a property of its transfer function / matrix H z/w (s). How-
ever, for systems described by
ẋ (t ) = A x (t ) + B w (t ), x (0) = 0,
z (t ) = C x (t ),
z L2
H z/w H∞ = sup .
w ∈L2 , w =0 w L2
Thus, the H∞ -norm is an induced norm (on a Banach space of bounded lin-
ear operators). Hence, we have the well-known contraction theorem (aka small
gain theorem) which for linear mappings H : L2 → L2 says that I − H is invertible
on L₂ if ‖H‖_{H∞} < 1. This elegant result is very powerful and can be utilized to
design controllers for a whole family of systems, or for systems whose models
are uncertain in some specific way. The game in H∞ optimal control is to exploit
the freedom in the design to minimize the H∞ -norm of a mapping H z/w that we
select. In this context, w is often called the “disturbance” and z the “error signal”.
Even though the popularity stems mainly from its ability to deal with dynamic
uncertainties, we illustrate it only for a problem with signal uncertainties:
[Block diagrams (Fig. 5.2): a disturbance-attenuation setup with plant blocks G_{q/w}, G_{y/w} and controller K, and the standard configuration with generalized plant G, disturbance w, error z, measurement y, and control u = K(y).]

In state space form the generalized plant is described by

ẋ(t) = A x(t) + B_w w(t) + B_u u(t),
z(t) = C_z x(t) + D_{z/u} u(t),                                  (5.8)
y(t) = C_y x(t) + D_{y/w} w(t),
for certain matrices A, B w , B u ,C z ,C y , D z/u , D y/w . Given this system, the standard
H∞ problem is to minimize the H∞ -norm of the mapping from w to z over all
stabilizing causal mappings u = K ( y ), see Fig. 5.2. The mapping K is usually
called controller, and it is the part that we have to design. Over the years, many
solutions have been put forward, but it is fair to say that the best known solution
and best supported in software packages is as follows. It assumes that the state
representation (5.8) is such that
• (A, B u ) is stabilizable,
• (C y , A) is detectable,
• [A − iωI, B_u; C_z, D_{z/u}] has full column rank for all ω ∈ R,
• [A − iωI, B_w; C_y, D_{y/w}] has full row rank for all ω ∈ R,
• D z/u has full column rank, and D y/w has full row rank.
Here is the famous result. Suppose that the standing assumptions above — referred to as (5.9) — hold, and let γ > 0.
Then there exists a causal stabilizing controller K for which the H∞-norm of the mapping from w to z is less than γ iff the following three conditions hold:
1. The Riccati equation AᵀP + PA + P( (1/γ²) B_w B_wᵀ − B_u B_uᵀ )P + C_zᵀC_z = 0 has a unique solution P for which A + ( (1/γ²) B_w B_wᵀ − B_u B_uᵀ )P is asymptotically stable, and P is symmetric and positive semi-definite,
The solutions P and Q of the above two Riccati equations can be con-
structed from the asymptotically stable eigenvalues and eigenvectors of the cor-
responding Hamiltonian matrices, much like what we did in the final examples
of § 4.5. We need to stress that the additional assumptions (5.9) are for ease of
exposition only. Without them, the problem is perfectly solvable but the formulae
become unwieldy.
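The construction of the Riccati solutions from the stable eigenspace of the corresponding Hamiltonian matrix can be sketched numerically as follows. This is an illustration only (the helper and the double-integrator data are ours): it assumes the Hamiltonian matrix is diagonalizable and has no imaginary-axis eigenvalues; dedicated solvers such as scipy.linalg.solve_continuous_are are the robust choice in practice.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def riccati_from_hamiltonian(A, R, Q):
    """Stabilizing solution P of A^T P + P A + P R P + Q = 0, built from the
    stable eigenvectors of the Hamiltonian matrix H = [[A, R], [-Q, -A^T]]."""
    n = A.shape[0]
    H = np.block([[A, R], [-Q, -A.T]])
    eigval, eigvec = np.linalg.eig(H)
    stable = eigvec[:, eigval.real < 0]      # basis of the stable eigenspace
    U, V = stable[:n, :], stable[n:, :]      # "x" part and "p" part
    return np.real(V @ np.linalg.inv(U))     # P = V U^{-1}

# Cross-check on an LQR-type Riccati (R = -B B^T, i.e. gamma -> infinity):
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
P = riccati_from_hamiltonian(A, -B @ B.T, Q)
print(np.allclose(P, solve_continuous_are(A, B, Q, np.eye(1))))  # True
```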
• K. Zhou, J.C. Doyle, and K. Glover. Robust and Optimal Control. Prentice
Hall: Upper Saddle River, New Jersey, 1996.
Consider the nonlinear system

ẋ(t) = f(x(t), u(t)),  x(t) ∈ X = Rⁿ,  u(t) ∈ U = Rᵐ,
y(t) = h(x(t), u(t)),  y(t) ∈ Y = Rᵖ.                                   (5.10)

Given a supply rate s : U × Y → R, the system (5.10) is called dissipative with respect to s if there exists a nonnegative function S : X → [0, ∞) such that

S(x(τ)) ≤ S(x₀) + ∫₀^τ s(u(t), y(t)) dt                                  (5.11)

for all initial conditions x(0) = x₀, all τ ≥ 0, and all u : [0, τ] → U, where x(τ) denotes the state at time τ and y(t) the output at time t resulting from initial condition x(0) = x₀ and input function u. The nonnegative function S is called the storage function (corresponding to the supply rate s), and (5.11) is called the dissipation inequality. Clearly, if S(x) satisfies (5.11), then so does the function
S(x)−c for any constant c. Hence, if S(x) is a storage function, then so is S(x)−c
for any c such that S(x) − c is a nonnegative function. However, in many cases,
the non-uniqueness of storage functions goes much further than this. Two key
examples of supply rates s(u, y) are the L₂-gain supply rate

s(u, y) = γ²‖u‖² − ‖y‖²,  with γ > 0,

and the passivity supply rate

s(u, y) = yᵀu.

For the first supply rate, the dissipation inequality (5.11) says that ∫₀^τ ‖y(t)‖² dt ≤ γ² ∫₀^τ ‖u(t)‖² dt + S(x₀).
(This is similar to the infinite horizon case as discussed in § 5.1 for linear sys-
tems.) Thus, the L2 -norm of the output on [0, τ] is bounded by γ times the L2 -
norm of the input on [0, τ], plus a constant (only depending on the initial con-
dition); for all τ ≥ 0. The infimal value of γ for which this holds is the L2 -gain
of the system; it measures the amplification from input to output functions. For
linear time-invariant systems, the L2 -gain equals the H∞ -norm of its transfer
matrix; see Section 5.1.
When is the system (5.10) dissipative for a given supply rate s? It turns out
that the answer to this question is given by an optimal control problem! Con-
sider the extended function S a : X → [0, ∞] (“extended” because the value ∞ is
allowed) which is defined by the free final time optimal control problem
S_a(x₀) = sup_{τ, u:[0,τ]→U} ( −∫₀^τ s(u(t), y(t)) dt ) = − inf_{τ, u:[0,τ]→U} ∫₀^τ s(u(t), y(t)) dt      (5.12)
for any initial condition x 0 , where y (t ), t ∈ [0, τ], denotes the output resulting
from initial condition x (0) = x 0 and input u : [0, τ] → U. Note that by construc-
tion (take τ = 0) S a (x 0 ) ≥ 0 for all x 0 ∈ X.
Theorem 5.2.1. The system (5.10) is dissipative with respect to the supply rate
s iff S a (x 0 ) < ∞ for all x 0 , i.e., S a : X → [0, ∞). Furthermore, if this is the case,
then S a is a storage function, and any other storage function S satisfies S a (x 0 ) ≤
S(x 0 ) − infx S(x) for all x 0 ∈ X. Finally, infx S a (x) = 0.
Proof. Suppose S a (x 0 ) < ∞ for all x 0 , and thus S a : X → [0, ∞). Consider any
u : [0, τ] → U and x0 . Then in general u will be a suboptimal input for the optimal
control problem (5.12). Hence,
S_a(x₀) ≥ −∫₀^τ s(u(t), y(t)) dt + S_a(x(τ)),
which is the same as the dissipation inequality (5.11). Thus, (5.10) is dissipative
with storage function S a . Conversely, let (5.10) be dissipative, i.e., there exists
nonnegative S satisfying (5.11). Then for any u : [0, τ] → U and x 0
S(x₀) + ∫₀^τ s(u(t), y(t)) dt ≥ S(x(τ)) ≥ 0,

and thus S(x₀) ≥ −∫₀^τ s(u(t), y(t)) dt, and hence

S(x₀) ≥ sup_{τ, u:[0,τ]→U} ( −∫₀^τ s(u(t), y(t)) dt ) = S_a(x₀).
For any storage function S, the function Ŝ(x₀) := S(x₀) − inf_x S(x), x₀ ∈ X, is a storage function as well, so by the above S_a ≤ Ŝ. Since S_a ≥ 0 and inf_x Ŝ(x) = 0, also inf_x S_a(x) = 0. ■
For systems that are affine in the input,

ẋ(t) = f(x(t)) + G(x(t)) u(t),  y(t) = h(x(t)),                             (5.13)

and differentiable storage functions S, the dissipation inequality (5.11) is equivalent to the differential dissipation inequality

(∂S(x)/∂xᵀ)( f(x) + G(x)u ) ≤ s(u, y)                                       (5.14)

for all x, u and y = h(x). For the L₂-gain supply rate s(u, y) = γ²‖u‖² − ‖y‖² this reads

(∂S(x)/∂xᵀ)( f(x) + G(x)u ) − γ²‖u‖² + ‖h(x)‖² ≤ 0                          (5.15)

for all x, u and y = h(x). For any x the maximum over all u of the left-hand side of (5.15) is attained at u = (1/(2γ²)) Gᵀ(x) ∂S(x)/∂x, and by substitution into (5.15), it follows that (5.15) is satisfied for all x, u iff

(∂S(x)/∂xᵀ) f(x) + (1/(4γ²)) (∂S(x)/∂xᵀ) G(x)Gᵀ(x) (∂S(x)/∂x) + hᵀ(x)h(x) ≤ 0      (5.16)
for all x ∈ X. On the other hand, the Hamilton-Jacobi-Bellman equation (3.12a)
(p. 96) for the optimal control problem (5.12) with s(u, y) = γ²‖u‖² − ‖y‖² is readily computed as

(∂S_a(x)/∂xᵀ) f(x) + (1/(4γ²)) (∂S_a(x)/∂xᵀ) G(x)Gᵀ(x) (∂S_a(x)/∂x) + hᵀ(x)h(x) = 0.      (5.17)
Hence, we will call (5.16) the Hamilton-Jacobi inequality. It thus follows that S a
satisfies the Hamilton-Jacobi equation (5.17), while any other storage function
S satisfies the Hamilton-Jacobi inequality (5.16). In general, any infinite horizon optimal control problem of minimizing ∫₀^∞ L(x(t), u(t)) dt over all input functions to a control system ẋ(t) = f(x(t), u(t)), with L(x, u) an arbitrary running cost, leads to the following inequality involving Bellman's value function V:

(∂V(x)/∂xᵀ) f(x, u) + L(x, u) ≥ 0,  x ∈ X, u ∈ U.                           (5.18)
Thus, (5.18) could be regarded as a reversed dissipation inequality.
The optimal control problem (5.12) for the passivity supply rate s(u, y) = y T u
is much harder: by linearity in u, it is singular. On the other hand, the differen-
tial dissipation inequality (5.14) takes the simple form
(∂S(x)/∂xᵀ)( f(x) + G(x)u ) ≤ hᵀ(x)u

for all x, u. Since both sides are affine in u, this holds iff

(∂S(x)/∂xᵀ) f(x) ≤ 0,  (∂S(x)/∂xᵀ) G(x) = hᵀ(x)                             (5.19)

for all x ∈ X.
Let us finally consider (5.19) in case the system (5.13) is a linear system ẋ (t ) =
A x (t ) + B u (t ), y (t ) = C x (t ), that is, f (x) = Ax,G(x) = B, h(x) = C x. Using the
same argumentation as in Exercise 4.5, it follows that S_a is a quadratic function of the state, i.e., S_a(x) = ½ xᵀQ_a x for some matrix Q_a = Q_aᵀ ≥ 0. Restricting in any case to quadratic storage functions S(x) = ½ xᵀQx, with Q = Qᵀ ≥ 0, the differential dissipation inequality (5.19) takes the form xᵀQAx ≤ 0, Cx = BᵀQx for
all x, that is
A T Q + Q A ≤ 0, C = B T Q. (5.20)
This is the famous linear matrix inequality (LMI) of the classical Kalman-
Yakubovich-Popov lemma. Note that Q a is the minimal solution of (5.20).
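As a small numerical illustration (the system and candidate Q below are our own example, not from the text), consider the mass–spring–damper ẋ₁ = x₂, ẋ₂ = −x₁ − x₂ + u with velocity output y = x₂. The candidate Q = I, corresponding to the storage function S(x) = ½xᵀQx (the total mechanical energy), satisfies the Kalman–Yakubovich–Popov conditions (5.20):

```python
import numpy as np

# Mass-spring-damper with velocity output: x1' = x2, x2' = -x1 - x2 + u, y = x2.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0, 1.0]])

Q = np.eye(2)                                # candidate: S(x) = (1/2) x^T Q x

lyap = A.T @ Q + Q @ A                       # should be negative semi-definite
print(np.allclose(C, B.T @ Q))               # True:  C = B^T Q
print(np.linalg.eigvalsh(lyap))              # [-2, 0]: A^T Q + Q A <= 0
```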
• J.C. Willems, Dissipative dynamical systems. Part II: Linear systems with
quadratic supply rates. Arch. Rat. Mech. Anal. 1972, 45, 352–393.
• A.J. van der Schaft, L 2 -Gain and Passivity Techniques in Nonlinear Con-
trol, 3rd ed.; Springer: Cham, Switzerland, 2017.
In this section, we will explore some of the geometry behind the algebraic Riccati
equation, using the geometric theory of Hamiltonian dynamics.
Consider a state space X with elements x; first regarded as a finite-
dimensional abstract linear space. Let X∗ be its dual space, with elements
denoted by p. Denote the duality product between X and X∗ by 〈p, x〉 ∈ R,
for x ∈ X, p ∈ X∗. After choosing a basis for X, one identifies X ≅ Rⁿ for some n. Taking the dual basis for X∗, thereby also identifying X∗ ≅ Rⁿ, the duality product reduces to the vector product p x. On the product space X × X∗, one defines the symplectic form

ω( (x₁, p₁), (x₂, p₂) ) := ⟨p₁, x₂⟩ − ⟨p₂, x₁⟩,                               (5.21)

which, after choosing a basis for X and dual basis for X∗, is identified with the matrix

J := [ 0, −Iₙ; Iₙ, 0 ].                                                      (5.22)
Denoting the stacked vectors z_i := [x_i; p_i], i = 1, 2, the expression (5.21) thus equals z₁ᵀ J z₂. Now consider any differentiable function H : X × X∗ → R, called a Hamiltonian function.
The corresponding Hamiltonian vector field v_H is defined by J v_H(z) = ∂H(z)/∂z. For the quadratic Hamiltonian function

H(z) = H(x, p) = pᵀAx + ½ xᵀFx + ½ pᵀGp,   F = Fᵀ, G = Gᵀ,                   (5.23)

the Hamiltonian vector field v_H corresponds to the linear Hamiltonian differential equations

[ẋ(t); ṗ(t)] = [ A, G; −F, −Aᵀ ] [x(t); p(t)],                               (5.24)

in which the matrix H := [ A, G; −F, −Aᵀ ] is the Hamiltonian matrix.
For any two solutions z₁(t), z₂(t) of (5.24) that start in the generalized stable eigenspace N⁻ of H we have

d/dt ( z₁ᵀ(t) J z₂(t) ) = ż₁ᵀ(t) J z₂(t) + z₁ᵀ(t) J ż₂(t) = z₁ᵀ(t)( Hᵀ J + J H )z₂(t) = 0,
implying that z 1T (t )J z 2 (t ) is constant as a function of t . Because limt →∞ z 1 (t ) =
0 = limt →∞ z 2 (t ) this yields z 1T (t )J z 2 (t ) = 0 for all t , proving that N − is indeed
a Lagrangian subspace. By time-reversal, the same is shown for the generalized
unstable eigenspace N + .
The key fact is that a Lagrangian subspace L is invariant for the Hamiltonian vector field v_H if and only if H is zero on L.

Proof. Let L be such that v_H(z) ∈ L for all z ∈ L. Then for all v, z ∈ L, 0 = vᵀ J v_H(z) = vᵀ ∂H(z)/∂z, which implies that H is constant on L. Thus, since H(0) = 0, H is zero on L. Conversely, let H be zero on L, implying that 0 = vᵀ ∂H(z)/∂z = vᵀ J v_H(z) for all v, z ∈ L. By L = L^{⊥⊥} this implies v_H(z) ∈ L for all z ∈ L, and thus L is invariant for v_H. ■
U T A T V + V T AU +U T FU + V T GV = 0. (5.26)
This will be called the generalized algebraic Riccati equation. In case U is invert-
ible, the Lagrangian subspace L given by (5.25) equals
L = Im [U; V] = Im [I; V U⁻¹].

Setting P := V U⁻¹, the generalized Riccati equation (5.26) then reduces to the algebraic Riccati equation

AᵀP + PᵀA + F + PᵀGP = 0.
For a differentiable function V : X → R, the set

L := { (x, p) | p = ∂V(x)/∂x }                                              (5.29)
is a Lagrangian submanifold, i.e., a submanifold of dimension equal to dim X on
which ω is zero. Such a Lagrangian submanifold L is invariant for the Hamil-
tonian vector field v H on T ∗ X for an arbitrary differentiable Hamiltonian H iff
H is constant on L. Hence, if H is zero at some point of L given by (5.29), then L is invariant for v_H if and only if H is zero on all of L.
• R.A. Abraham, and J.E. Marsden, Foundations of Mechanics, 2nd ed.; Ben-
jamin/Cummings: Reading, MA, USA, 1978.
• S. Bittanti, A.J. Laub, and J.C. Willems (Eds.), The Riccati Equation,
Springer-Verlag, Berlin-Heidelberg, 1991. (Chapter 3 of this book is by
V. Kučera and it contains a proof of Proposition 5.3.3.)
• A.J. van der Schaft, L 2 -Gain and Passivity Techniques in Nonlinear Con-
trol, 3rd ed.; Springer: Cham, Switzerland, 2017.
The key idea of model predictive control (MPC) is the following. Let the state of the system at some discrete time
instant t ∈ Z be equal to x t . Then consider for given integer N > 0 the optimal
control problem of minimizing over all control sequences u (t ), . . . , u (t + N − 1)
the cost criterion
J_{[t, t+N−1]}(x_t, u) = Σ_{k=0}^{N−1} L( x(t+k), u(t+k) ).                   (5.31)
Here, L(x, u) is the running cost, and x (t + k) denotes the state at time t + k
resulting from initial condition x (t ) = x t and control sequence u (t ), . . . , u (t + k −
1). Note that this is the same optimal control problem as considered in § 3.3,
apart from the fact that the minimization is done over the time horizon t , t +
1, . . . , t + N − 1, instead of 0, 1, . . . , T − 1, and that there is no terminal cost.
Let u ∗ (t ), u ∗ (t + 1), . . . , u ∗ (t + N − 1) be the computed optimal control
sequence. The basic difference between model predictive control and stan-
dard optimal control now shows up as follows. Consider just the first control
value u ∗ (t ), and apply this to the system. Now consider the same optimal con-
trol problem, but shifted in time by 1. That is, minimize, for the observed initial
condition x t +1 at time t + 1, the cost criterion
J_{[t+1, t+N]}(x_{t+1}, u) = Σ_{k=0}^{N−1} L( x(t+1+k), u(t+1+k) )             (5.32)
and again apply only the first value of the new optimal sequence, and so on. The same receding-horizon idea applies to continuous-time systems: at each discrete time instant t one minimizes the cost over the interval [t, t + T] for some finite horizon length T, with T a positive integer. Solving this optimal control problem yields an optimal control function u∗ : [t, t + T] → U, which can be implemented during the restricted time interval [t, t + 1). The observed state
x t +1 at the next discrete time instant t + 1 then serves as initial condition for
a shifted optimal control problem on [t + 1, t + 1 + T ], and so on, just as in the
discrete-time case. (Obviously, all this can be extended to arbitrary time instants
t ∈ R, arbitrary horizon T ∈ (0, ∞), and an arbitrary implementation time inter-
val [t , t + δ], T ≥ δ > 0.)
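The receding-horizon loop can be sketched in a few lines. The example below is our own toy illustration (a discrete-time double integrator with quadratic running cost, solved with a generic optimizer), not the book's algorithm; practical MPC handles constraints and uses structured QP solvers.

```python
import numpy as np
from scipy.optimize import minimize

def f(x, u):                      # double integrator, sampling time 1
    return np.array([x[0] + x[1], x[1] + u])

def L(x, u):                      # quadratic running cost
    return x[0]**2 + x[1]**2 + 0.1 * u**2

def mpc_step(xt, N=10):
    """Minimize sum_{k=0}^{N-1} L(x(t+k), u(t+k)) as in (5.31) over the
    control sequence and return only its first value."""
    def cost(u_seq):
        x, J = xt, 0.0
        for u in u_seq:
            J += L(x, u)
            x = f(x, u)
        return J
    return minimize(cost, np.zeros(N)).x[0]

# Closed loop: apply the first control value, shift the horizon, repeat.
x = np.array([2.0, 0.0])
for t in range(20):
    x = f(x, mpc_step(x))
print(x)                          # approaches the origin
```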
• J.B. Rawlings and D.Q. Mayne. Model Predictive Control: Theory and
Design. Nob Hill Publishing, Madison, 2009.
Appendix A
Background Material
This appendix contains summaries of a number of topics that play a role in opti-
mal control. Each section covers one topic, and most of them can be read inde-
pendently from the other sections. The topics are standard and are covered in
one form or another in calculus courses, a course on differential equations, or a
first course on systems theory. Nonlinear differential equations and stability of
their equilibria are discussed in Appendix B.
A symmetric matrix P = Pᵀ ∈ Rⁿˣⁿ is called positive definite if

xᵀPx > 0  ∀x ∈ Rⁿ, x ≠ 0,

and positive semi-definite if

xᵀPx ≥ 0  ∀x ∈ Rⁿ.
The notation V > 0 and P > 0 means that the function/matrix is positive
definite. Interestingly, real symmetric matrices have real eigenvalues only, and
there exist simple tests for positive definiteness:
Lemma A.1.1 (Tests for positive definiteness). Let P = Pᵀ ∈ Rⁿˣⁿ. The following statements are equivalent.
1. P > 0.
2. All leading principal minors are positive: det(P_{1:k,1:k}) > 0 for all k ∈ {1, 2, . . . , n}. Here P_{1:k,1:k} is the k × k sub-matrix of P composed of the first k rows and first k columns of P.
3. For any block partitioning P = [ P₁₁, P₁₂; P₁₂ᵀ, P₂₂ ] with square diagonal blocks, both P₁₁ and its so-called Schur complement P₂₂ − P₁₂ᵀ P₁₁⁻¹ P₁₂ are positive definite.
For positive semi-definite matrices similar tests exist, except for the princi-
pal minor test which is now more involved:
Lemma A.1.2 (Tests for positive semi-definiteness). Let P = P T ∈ Rn×n . The fol-
lowing five statements are equivalent.
1. P ≥ 0.
2. All principal minors (not just the leading ones) are nonnegative. That is,
det(P I,I ) ≥ 0 for every subset I of {1, . . . , n}. Here P I,I is the square matrix
composed from all rows i ∈ I and columns j ∈ I.
Example A.1.3. P = [ 0, 0; 0, −1 ] is not positive semi-definite because the principal minor det(P_{2,2}) = −1 is not nonnegative.
P = [ 0, 0; 0, 1 ] is positive semi-definite because all three principal minors, det(0), det(1), det(P), are nonnegative.
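The two tests can be checked numerically; the small helpers below are illustrative only (they use the leading-minor test of Lemma A.1.1 and an eigenvalue test, respectively).

```python
import numpy as np

def is_positive_definite(P):
    """Test P = P^T > 0 via the leading principal minors."""
    return all(np.linalg.det(P[:k, :k]) > 0 for k in range(1, P.shape[0] + 1))

def is_positive_semidefinite(P, tol=1e-12):
    """Test P = P^T >= 0 via the (real) eigenvalues."""
    return np.all(np.linalg.eigvalsh(P) >= -tol)

print(is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]])))      # True
print(is_positive_semidefinite(np.array([[0.0, 0.0], [0.0, -1.0]]))) # False
print(is_positive_semidefinite(np.array([[0.0, 0.0], [0.0, 1.0]])))  # True
```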
With the same logic we get a row vector if we differentiate with respect to a row vector,

∂f(x)/∂xᵀ := [ ∂f(x)/∂x₁  ∂f(x)/∂x₂  · · ·  ∂f(x)/∂xₙ ] ∈ R^{1×n},
and, for vector-valued functions f : Rⁿ → R^k, the two matrices

∂f(x)/∂xᵀ := [ ∂f₁(x)/∂x₁, · · · , ∂f₁(x)/∂xₙ; ⋮ ; ∂f_k(x)/∂x₁, · · · , ∂f_k(x)/∂xₙ ] ∈ R^{k×n}

and

∂fᵀ(x)/∂x := [ ∂f₁(x)/∂x₁, · · · , ∂f_k(x)/∂x₁; ∂f₁(x)/∂x₂, · · · , ∂f_k(x)/∂x₂; ⋮ ; ∂f₁(x)/∂xₙ, · · · , ∂f_k(x)/∂xₙ ] ∈ R^{n×k}.
The first is the Jacobian, the second is its transpose. Convenient about this nota-
tion is that the n × n Hessian of a function f : Rn → R can now compactly be
denoted as
∂²f(x)/∂x∂xᵀ := ∂/∂x ( ∂f(x)/∂xᵀ ) = ∂/∂x [ ∂f(x)/∂x₁  ∂f(x)/∂x₂  · · ·  ∂f(x)/∂xₙ ]
  = [ ∂²f(x)/∂x₁², ∂²f(x)/∂x₁∂x₂, · · · , ∂²f(x)/∂x₁∂xₙ;
      ∂²f(x)/∂x₂∂x₁, ∂²f(x)/∂x₂², · · · , ∂²f(x)/∂x₂∂xₙ;
      ⋮ ;
      ∂²f(x)/∂xₙ∂x₁, ∂²f(x)/∂xₙ∂x₂, · · · , ∂²f(x)/∂xₙ² ].
Indeed, we first differentiate with respect to the row x T , and subsequently, dif-
ferentiate the outcome (a row) with respect to the column x, resulting in an n×n
matrix of second-order partial derivatives. If f (x) is twice continuously differen-
tiable then the order in which we determine the second-order derivatives does
not matter, so then
∂²f(x)/∂x∂xᵀ = ∂²f(x)/∂xᵀ∂x.
Hence the Hessian is symmetric.
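The equality of the mixed second-order partial derivatives can be observed numerically; the example function and step size below are illustrative only.

```python
import numpy as np

f = lambda x1, x2: x1**2 * x2 + np.sin(x2)       # example function R^2 -> R
h = 1e-4

def d_dx1(g, x1, x2):                            # forward difference in x1
    return (g(x1 + h, x2) - g(x1, x2)) / h

def d_dx2(g, x1, x2):                            # forward difference in x2
    return (g(x1, x2 + h) - g(x1, x2)) / h

# Mixed second-order partial derivatives, taken in the two possible orders:
d12 = d_dx2(lambda a, b: d_dx1(f, a, b), 1.0, 2.0)
d21 = d_dx1(lambda a, b: d_dx2(f, a, b), 1.0, 2.0)
print(d12, d21)    # both approximately 2*x1 = 2: the Hessian is symmetric
```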
h( x (t )) ẋ (t ) = g (t ),
and we see that the left-hand side is the derivative of H ( x (t )) with respect to t ,
and the right-hand side is the derivative of G(t ) with respect to t . So it must be
that
H ( x (t )) = G(t ) + c 0
x (t ) = H −1 (G(t ) + c0 ).
This derivation assumes that H is invertible. The value of c 0 is typically used to
match an initial condition x (t 0 ).
ẋ (t ) = − x 2 (t ), x (0) = x0
of Example B.1.5 using separation of variables. We split the solution in two
columns; the first column is the example, the second column makes a connec-
tion with the general procedure:
ẋ(t) = −x²(t)                         h(x(t)) = 1/x²(t),  g(t) = −1
ẋ(t)/x²(t) = −1                       h(x(t)) ẋ(t) = g(t)
−1/x(t) = −t + c₀                     H(x(t)) = G(t) + c₀
x(t) = 1/(t − c₀)                     x(t) = H⁻¹(G(t) + c₀)
In this example the inverse exists as long as t = c 0 . Now x 0 = x (0) = −1/c 0 so
c 0 can be expressed in terms of x 0 as c 0 = −1/x 0 and the above solution then
becomes
x(t) = 1/(t + 1/x₀) = x₀/(x₀ t + 1).                                      (A.1)
The solution x (t ) escapes at t = −1/x 0 . (For the escape time we refer to Exam-
ple B.1.5.)
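The closed-form solution (A.1) can be checked against a numerical integration; the initial condition and time grid below are illustrative (chosen so that no escape occurs on the interval).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Compare x(t) = x0/(x0*t + 1), the solution of x' = -x^2, with solve_ivp.
x0 = 2.0
sol = solve_ivp(lambda t, x: -x**2, (0.0, 3.0), [x0], dense_output=True)
t = np.linspace(0.0, 3.0, 7)
print(sol.sol(t)[0])             # numerical solution
print(x0 / (x0 * t + 1.0))       # closed-form solution (A.1), nearly identical
```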
ẋ (t ) = a x (t ), x (0) = x0 ,
and that x (t ) > 0. Then we may divide by x (t ) to obtain
ẋ (t )
= a.
x (t )
Integrating both sides and using that x (t ) > 0, we find that
ln( x (t )) = at + c 0 .
For x (t ) < 0 the same solution x 0 eat results (verify this yourself ), and if x (t ) = 0
for some time t then x (t ) = 0 for all time, which is also of the form x (t ) = x 0 eat
(since x 0 = 0). In summary, for every x 0 ∈ R the solution is x (t ) = x 0 eat .
λ3 − λ2 − 5λ − 3 = 0.
The function λ3 −λ2 −5λ−3 is known as the characteristic polynomial of the DE,
and in this case it happens to equal (λ + 1)2 (λ − 3). Thus the characteristic roots
of this equation (over the complex numbers) are λ = −1 (with multiplicity two) and λ = 3. For the forced equation (A.3) with exponential forcing e^{st},
one can find a particular solution, y_part(t), of the same exponential form, y_part(t) = A e^{st}. The constant A follows easily by equating the left- and right-hand sides of (A.3). For this example it gives

y_part(t) = (2s + 3)/(s³ − s² − 5s − 3) e^{st}.
Then the general solution is obtained by adding the general solution of the
homogeneous equation to the particular solution,
y(t) = (2s + 3)/(s³ − s² − 5s − 3) e^{st} + (c₁ + c₂ t) e^{−t} + c₃ e^{3t}.
If s equals a characteristic root (s = −1 or s = +3 in our example) then the above
particular solution is invalid due to a division by zero. Then a particular solution
exists of the form
y part (t ) = At k est
ẋ (t ) = A x (t ) + B u (t ), x (0) = x0
follows uniquely as
x(t) = e^{At} x₀ + ∫₀ᵗ e^{A(t−τ)} B u(τ) dτ,   t ∈ R.                        (A.4)

Here e^{At} is the matrix exponential, defined for any square matrix A by

e^{A} = Σ_{k=0}^{∞} (1/k!) A^k = I + A + (1/2!)A² + (1/3!)A³ + · · · .
This series is convergent for every square matrix A. Some characteristic properties of the matrix exponential are: e^{A·0} = I, (d/dt) e^{At} = A e^{At} = e^{At} A, and e^{A(t+s)} = e^{At} e^{As} for all t, s ∈ R.
For the zero signal u (t ) = 0, the above equation (A.4) says that the general
solution of
ẋ (t ) = A x (t ), x (0) = x0 ∈ Rn
is
x (t ) = e At x0 .
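As a quick numerical sanity check (illustrative matrix and initial condition, not from the text), the matrix-exponential solution agrees with a direct integration of the differential equation:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Check x(t) = e^{At} x0 against a numerical integration of x' = Ax.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t_end = 2.0
sol = solve_ivp(lambda t, x: A @ x, (0.0, t_end), x0, rtol=1e-9, atol=1e-12)
print(sol.y[:, -1])           # numerical solution at t = 2
print(expm(A * t_end) @ x0)   # matrix-exponential solution, nearly identical
```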
If A is diagonalizable, that is, A P = P Λ for some invertible matrix P of eigenvectors and diagonal matrix Λ = diag(λ₁, . . . , λₙ) of eigenvalues, then

e^{At} = P e^{Λt} P⁻¹
with eΛt as in (A.5). This shows that for diagonalizable matrices A, every entry
of e At is a linear combination of eλi t , i = 1, . . . , n. However, not every matrix is
diagonalizable. Using Jordan forms it can be shown that:
Here A ∈ Rn×n and B ∈ Rn×m . The function u : [0, ∞) → Rm is often called the
(control) input, and the interpretation is that this u is for us to choose, and that
the state x : [0, ∞) → Rn follows. A natural question is how well the state can be
controlled by choice of u :
1. (A, B) is controllable;
2. [ B  AB  · · ·  A^{n−1}B ] ∈ R^{n×(mn)} has rank n;
3. [ A − sI  B ] has rank n for every s ∈ C;
1. (A, B) is stabilizable;
2. [ A − sI  B ] has rank n for every s ∈ C with Re(s) ≥ 0;
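The rank test is easy to carry out numerically; the helper and the double-integrator data below are illustrative only.

```python
import numpy as np

def controllability_matrix(A, B):
    """[B  AB  ...  A^{n-1}B], whose rank decides controllability."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

A = np.array([[0.0, 1.0], [0.0, 0.0]])          # double integrator
B = np.array([[0.0], [1.0]])
R = controllability_matrix(A, B)
print(np.linalg.matrix_rank(R) == A.shape[0])   # True: (A, B) is controllable
```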
ẋ (t ) = A x (t ), x (0) = x0 , t > 0,
(A.7)
y (t ) = C x (t ).
state that can be measured. It is a natural question to ask how much informa-
tion the output provides about the state. For example, if we know the output,
can we reconstruct the state? For linear systems one might define observability
as follows.
1. (C, A) is observable;
2. [ C; CA; ⋮ ; CA^{n−1} ] ∈ R^{(kn)×n} has rank n;
3. [ C; A − sI ] has rank n for every s ∈ C;
1. (C, A) is detectable;
2. [ C; A − sI ] has rank n for every s ∈ C with Re(s) ≥ 0;
F IGURE A.1: Left: two line segments in R2 . Right: one line segment in R3 .
See § A.7.
F IGURE A.2: Three subsets of R2 . Sets X1 , X2 are convex. The third set, X3 ,
is not convex because one of its line segments is not contained in X3 . See
§ A.7.
To explain convex functions we first need to know what line segments and con-
vex sets are. Let X be a set. A line segment with endpoints x₀, x₁ ∈ X is the set

{ x₀ + μ(x₁ − x₀) | μ ∈ [0, 1] },                                            (A.8)

and the entries of this set are known as the convex combinations of x₀ and x₁.
For X = R the line segments are the closed intervals [x 0 , x 1 ]. Figure A.1 depicts a
couple of line segments in R2 and R3 . In order to convey the symmetry in x 0 , x 1 ,
line segments are usually denoted as

{ (1 − μ)x₀ + μx₁ | μ ∈ [0, 1] }.

This is the same as (A.8). A set X is said to be convex if it contains all its line
segments. That is, if for every x 0 , x 1 ∈ X also (1−μ)x 0 +μx 1 ∈ X for every μ ∈ [0, 1].
Figure A.2 depicts a couple of convex sets in R2 , and also one that is not convex.
Now that convex sets are defined, we can define convex functions. Such
functions are only defined on convex sets. Let X be a convex set. A function
g : X → R is said to be a convex function (on X) if for every x 0 , x 1 ∈ X the graph
of the function with endpoints (x 0 , g (x 0 )) and (x 1 , g (x 1 )) is on or below the line
segment between these two points. More concrete, it is a convex function if for
every x 0 , x 1 ∈ X we have
g( (1 − μ)x₀ + μx₁ ) ≤ (1 − μ) g(x₀) + μ g(x₁)   for all μ ∈ [0, 1].

FIGURE A.4: A C¹ function g : R → R is convex iff g(x) ≥ g(x̄) + (∂g(x̄)/∂x)(x − x̄) for all x̄, x ∈ R. See Lemma A.7.1.

Lemma A.7.1 (First-order characterization of convexity). Let X be a convex subset of Rⁿ and let g : X → R be continuously differentiable. Then g is convex iff

g(x) ≥ g(x̄) + (∂g(x̄)/∂xᵀ)(x − x̄)                                           (A.9)

for all x̄, x ∈ X.
Proof. If g is convex then g (x̄ + μ(x − x̄)) ≤ g (x̄) + μ(g (x) − g (x̄)) for all μ ∈ [0, 1].
This inequality we can rearrange as

g(x) − g(x̄) ≥ ( g(x̄ + μ(x − x̄)) − g(x̄) ) / μ,

assuming μ ∈ (0, 1]. The right-hand side of the above inequality converges to (∂g(x̄)/∂xᵀ)(x − x̄) as μ ↓ 0. So (A.9) follows.
Conversely, suppose (A.9) holds. Then it also holds for x μ :=(1 − μ)x̄ + μx for
arbitrary μ ∈ [0, 1]. That is,
g(x̄) ≥ g(x_μ) + (∂g(x_μ)/∂xᵀ)(x̄ − x_μ) = g(x_μ) − μ (∂g(x_μ)/∂xᵀ)(x − x̄),
g(x) ≥ g(x_μ) + (∂g(x_μ)/∂xᵀ)(x − x_μ) = g(x_μ) + (1 − μ) (∂g(x_μ)/∂xᵀ)(x − x̄).
Adding the first inequality times 1 − μ to the second times μ cancels the deriva-
tive and yields (1 − μ)g (x̄) + μg (x) ≥ g (x μ ) = g ((1 − μ)x̄ + μx). ■
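The inequality (A.9) is easy to observe numerically for a concrete convex function; the function, its gradient, and the random test points below are illustrative only.

```python
import numpy as np

# Check the first-order convexity inequality (A.9) for g(x) = x1^2 + exp(x2).
g = lambda x: x[0]**2 + np.exp(x[1])
grad_g = lambda x: np.array([2.0 * x[0], np.exp(x[1])])

rng = np.random.default_rng(0)
ok = all(g(x) >= g(xb) + grad_g(xb) @ (x - xb)
         for x, xb in ((rng.normal(size=2), rng.normal(size=2))
                       for _ in range(100)))
print(ok)    # True
```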
Proof. The Hessian is symmetric. By Taylor's formula we have that g(x) = g(x̄) + (∂g(x̄)/∂xᵀ)(x − x̄) + ½ (x − x̄)ᵀ G(z) (x − x̄) for some convex combination z of x̄, x. Hence, if the Hessian G(z) is positive semi-definite for every z, then g(x) ≥ g(x̄) + (∂g(x̄)/∂xᵀ)(x − x̄) for all x̄, x ∈ X, and so g is convex by Lemma A.7.1. ■
Let X ⊆ Rⁿ be a convex set and let f : X → R be continuously differentiable. If x∗ ∈ X minimizes f(x) over all x ∈ X, then

(∂f(x∗)/∂xᵀ)(x − x∗) ≥ 0   ∀x ∈ X,                                          (A.10)

see Fig. A.5. If f in addition is a convex function then x∗ minimizes f(x) over all x ∈ X iff (A.10) holds.
FIGURE A.5: Level sets {x | f(x) = c} of f and a minimizer x∗ of f over X. See (A.10).
Proof. Suppose (A.10) does not hold, i.e., (∂f(x∗)/∂xᵀ)(x − x∗) < 0 for some x ∈ X. Then f(x∗ + μ(x − x∗)) = f(x∗) + μ (∂f(x∗)/∂xᵀ)(x − x∗) + o(μ) < f(x∗) for small enough μ ∈ (0, 1]. So x∗ does not minimize f(x).
If f is convex then by Lemma A.7.1 we have that f(x) ≥ f(x∗) + (∂f(x∗)/∂xᵀ)(x − x∗). If (A.10) holds then the latter clearly implies f(x) ≥ f(x∗). Hence x∗ is the solution of the minimization problem. ■
min_{z ∈ Rⁿ} J(z),
F IGURE A.6: Let J (z) = z 12 + z 22 . Its level sets {z = (z 1 , z 2 )|J (z) = c} are cir-
cles (shown in red). Suppose the blue curve is where G(z) = 0 and the gray
region is where G(z) < 0. The z 0 in (a) is not a local minimizer of J (z)
subject to G(z) = 0. The z ∗ in (b) is a local minimizer, and it satisfies the
first-order condition that the gradients ∂J (z)/∂z and ∂G(z)/∂z are aligned
at z = z ∗ . See § A.8.
For the problem of minimizing J(z) subject to the constraint G(z) = 0, a local minimizer z∗ (at which ∂G(z∗)/∂z ≠ 0) satisfies the first-order conditions

G(z∗) = 0,                                                                   (A.12a)
∂J(z∗)/∂z + λ∗ ∂G(z∗)/∂z = 0                                                 (A.12b)

for some Lagrange multiplier λ∗ ∈ R.
and later in this appendix its stability properties are analyzed. Here x 01 , . . . , x 0n ∈
R are given initial conditions, and the f i : Rn → R are given functions. The vector
x := [ x₁; ⋮ ; xₙ ] : R → Rⁿ
There are differential equations (B.2) whose solution x does not follow uniquely
from the initial state:
Clearly the zero function, x (t ) = 0 ∀t , is one solution, but it is easy to verify that
for every c > 0 the function
Figure B.1 illustrates Lipschitz continuity for f : R → R. For the linear f (x) =
kx, with k ∈ R, the Lipschitz constant is obviously K = |k|, and the solution of
the corresponding differential equation
ẋ (t ) = k x (t ), x (0) = x0
is x (t ) = ekt x 0 . Given x 0 , this solution exists and is unique. The idea is now
that for arbitrary Lipschitz continuous f : Rn → Rn the solution of ẋ (t ) =
f(x(t)), x(0) = x₀ exists and is unique (on some neighborhood of x₀) and that
the solution increases at most exponentially fast as a function of t , with the
exponent K equal to the Lipschitz constant (on that neighborhood):
‖x(t; x₀) − x(t; z₀)‖ ≤ ‖x₀ − z₀‖ e^{Kt}   ∀t ∈ [0, T).
Proof. The proof can be found in many textbooks, e.g., (Khalil, 1996, Thm. 2.2
& Thm. 2.5). ■
If a single Lipschitz constant K ≥ 0 exists such that (B.3) holds for all x, z ∈ Rn
then f is said to satisfy a global Lipschitz condition.
It follows from the above theorem that the solution x (t ) can be uniquely
continued at every t if f is locally Lipschitz continuous. This is such a desir-
able property that one normally assumes that f is locally Lipschitz continuous.
Every continuously differentiable f is locally Lipschitz continuous, so then we
can uniquely continue the solution x (t ) at every t . However, the solution may
escape in finite time:
ẋ (t ) = − x 2 (t ), x (0) = x0 .
We conclude this section with a result about the continuity of solutions that
we need in the proof of the minimum principle (Theorem 2.5.1). Here we take
the standard Euclidean norm, ‖x‖ := √(x₁² + · · · + xₙ²).
ẋ (t ) = f ( x (t )), x (0) = x0 ,
ż (t ) = f ( z (t )) + g (t ), z (0) = z0 .
Let T > 0. If Ω is a set such that x (t ), z (t ) ∈ Ω for all t ∈ [0, T ) and if f on Ω has
Lipschitz constant K , then
‖x(t) − z(t)‖ ≤ e^{Kt} ( ‖x₀ − z₀‖ + ∫₀ᵗ ‖g(τ)‖ dτ )   ∀t ∈ [0, T).
x(t ; x0 )
for x0 0
x̃ 0 1/x̃ 0 t
x(t ; x0 )
for x0 0
F IGURE B.3: Illustration of stability for systems with two state components,
x = ( x 1 , x 2 ). See § B.2.
2. attractive if ∃δ₁ > 0 such that ‖x₀ − x̄‖ < δ₁ implies that lim_{t→∞} x(t; x₀) = x̄.
6. unstable if x̄ is not stable. This means that ∃ε > 0 such that ∀δ > 0 an x₀ and a t₁ ≥ 0 exist for which ‖x₀ − x̄‖ < δ yet ‖x(t₁; x₀) − x̄‖ ≥ ε.
Lyapunov’s second method mimics the well-known physical property that a sys-
tem that continually loses energy eventually comes to a halt. Of course, in a
mathematical context one may bypass the notion of physical energy, but it is a
helpful interpretation.
Suppose we have a function V : Rn → R that does not increase along any
solution x of the differential equation, i.e., that
V̇ ( x (t )) ≤ 0 ∀t . (B.5)
A function V : Ω → R on a neighborhood Ω of x̄ is called positive definite (with respect to x̄) if V(x̄) = 0 and V(x) > 0 for all other x ∈ Ω. It is positive semi-definite if V(x̄) = 0 and V(x) ≥ 0 for all other x. And V is negative (semi-)definite if −V is positive (semi-)definite.
Positive definite implies that V has a unique minimum on Ω, and that the
minimum is attained at x̄. The assumption that the minimum is zero, V (x̄) = 0,
is a convenient normalization. Figure B.4 shows an example of each of the four
types of “definite” functions.
1. V is continuously differentiable,
Proof. We denote the open ball with radius r and center x̄ by B(x̄, r), i.e., B(x̄, r) := {x ∈ Rⁿ | ‖x − x̄‖ < r}.
We first consider the stability property. For every ε > 0 we have to find a δ > 0 such that x₀ ∈ B(x̄, δ) implies x(t) ∈ B(x̄, ε) for all t > 0. To this end we construct a series of inclusions, see Fig. B.5.
Because Ω is a neighborhood of x̄, there exists an ε₁ > 0 such that B(x̄, ε₁) ⊂ Ω. Without loss of generality we can take it so small that ε₁ ≤ ε. Because V is continuous on Ω and because the boundary of B(x̄, ε₁) is a compact set, the function V has a minimum on the boundary of B(x̄, ε₁). We call this minimum α, and realize that α > 0. Now define
V ( x (t k j ; x 0 )) ≥ V ( x (t k j + t ; x 0 )) ≥ V ( x (t k j +m ; x 0 )),
where m is chosen such that t k j + t < t k j +m . The term in the middle equals
V ( x (t ; x (t k j ; x 0 ))). So we also have
V ( x (t k j ; x 0 )) ≥ V ( x (t ; x (t k j ; x 0 ))) ≥ V ( x (t k j +m ; x 0 )),
V (x ∞ ) ≥ V ( x (t ; x ∞ )) ≥ V (x ∞ ).
(Let us be precise here: since the differential equation is locally Lipschitz con-
tinuous we have, by Theorem B.1.3, that x (t ; x) depends continuously on x. For
that reason we are allowed to say that lim j →∞ x (t ; x (t k j ; x 0 )) = x (t ; x ∞ ).) The
above shows that V ( x (t ; x ∞ )) = V (x ∞ ) for all t > 0. In particular we see that
V(x(t; x_∞)) is constant. But that would mean that V̇(x_∞) = 0, and this violates the fact that V̇ is negative definite and x_∞ ≠ x̄. Therefore the assumption that x(t) does not converge to x̄ is wrong. The system is asymptotically stable. ■
FIGURE B.6: Graph of (1 − x²)/(1 + x²). See Example B.3.3.

Example B.3.3. The differential equation

ẋ(t) = ( 1 − x²(t) ) / ( 1 + x²(t) )
has two equilibria, x̄ = ±1, see Fig. B.6. For equilibrium x̄ = 1 we propose the
candidate Lyapunov function
V (x) = (x − 1)2 .
F IGURE B.7: Left: pendulum. Right: level sets of its mechanical energy V (x).
See Example B.3.4.
ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)).

Its total mechanical energy is

V(x) = ½ mℓ² x₂² + mgℓ( 1 − cos(x₁) ).
This energy is zero at (x 1 , x 2 ) = (2kπ, 0), k ∈ Z and it is positive elsewhere. To turn
this into a Lyapunov function for the hanging position x̄ = (0, 0) we simply take,
say,
For strong Lyapunov functions, Theorem B.3.2 states that x(t; x₀) → x̄ for initial states x₀ that are sufficiently close to the equilibrium. At first sight it seems
reasonable to expect that the “bigger” the set Ω the “bigger” the region of attrac-
tion. Alas, as demonstrated in Exercise B.4, having a strong Lyapunov function
on the entire state space Ω = Rn does not imply that x (t ; x 0 ) → x̄ for all initial
conditions x 0 ∈ Rn . The question that thus arises is: what is the region of attrac-
tion of the equilibrium x̄ in case it is asymptotically stable, and under which
conditions is this region of attraction the entire state space Rn ?
The proof of Theorem B.3.2 gives some insight into the region of attrac-
tion. In fact, it follows that the region of attraction of x̄ includes the largest ball around x̄ that is contained in Ω₁ := {x ∈ B(x̄, ε) | V(x) < α}, see Fig. B.5. We use
this observation to formulate an extra condition on V that guarantees global
asymptotic stability.
F IGURE B.8: Phase portrait of the system of Example B.3.6. The origin is
globally asymptotically stable.
ẋ 1 (t ) = − x 1 (t ) + x 22 (t ),
ẋ 2 (t ) = − x 2 (t ) x 1 (t ) − x 2 (t ).
V (x) = x 12 + x 22 .
Powerful as the theory may be, it does not really tell us how to find a Lya-
punov function, assuming one exists. Systematic design of Lyapunov functions
is hard, but it does work for linear systems, as discussed in § B.5. In physical sys-
tems the construction of Lyapunov functions is often facilitated by the knowl-
edge of existence of conserved quantities, like total energy or total momentum.
ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)) − (d/m) x₂(t),

where x₁ is the angular displacement, and x₂ is the angular velocity. The parameter d is a positive friction coefficient. The time-derivative of the mechanical energy V(x) = ½ mℓ² x₂² + mgℓ(1 − cos(x₁)) is

V̇(x) = mgℓ sin(x₁) x₂ − mℓ² x₂ (g/ℓ) sin(x₁) − dℓ² x₂² = −dℓ² x₂² ≤ 0.
Thus the mechanical energy decreases everywhere except if the angular velocity
x 2 is zero. Using Theorem B.3.2 we may draw the conclusion that the system is
stable, but not that it is asymptotically stable, because V̇ (x) is also zero at points
other than the equilibrium (it is zero at every x = (x 1 , 0)). However from physical
considerations we feel that (0, 0) is asymptotically stable.
ẋ 1 (t ) = 0,
ẋ 2 (t ) = − x 2 (t ).
F IGURE B.9: Simple system. All solutions converge to the x 1 -axis. See
Example B.4.2.
Clearly, x 1 is constant and x 2 converges exponentially fast to zero (see the vec-
tor field of Fig. B.9). Now
V (x) := x 12 + x 22
V̇ (x) = 2x 1 ẋ 1 + 2x 2 ẋ 2 = −2x 22 ≤ 0.
The set of states x where V̇ (x) = 0 is where x 2 = 0 (i.e., the x 1 -axis) and every-
where else in the plane we have V̇ (x) < 0. In that sense V̇ (x) is negative “almost
everywhere”. The origin is however not asymptotically stable because every
(x̄ 1 , 0) on the x 1 -axis is an equilibrium, so no matter how small we take δ > 0,
there is always an initial state x 0 = (δ/2, 0) less than δ away from (0, 0) for which
the solution x (t ; x 0 ) is constant, and so does not converge to (0, 0).
We set up a generalized Lyapunov theory that allows to prove that the hang-
ing position in the damped pendulum example (Example B.4.1) is asymptoti-
cally stable, and that in Example B.4.2 all solutions converge to the x 1 -axis. It
requires a bit of terminology.
The orbit of x 0 is just the set of states that x (t ; x 0 ) traces out as t varies over
all t ≥ 0.
So once the state is in an invariant set it never leaves it. Note that every
orbit is an example of an invariant set. In particular every equilibrium point is
an invariant set.
Example B.4.5. The x 1 -axis is an invariant set for the system of Example B.4.2.
In fact every element x = (x 1 , 0) of this axis is an invariant set because they all
are equilibria. The general solution is x (t ) = (x 10 , x 20 e−t ). This shows that for
instance also the x 2 -axis {(0, x 2 )|x 2 ∈ R} is an invariant set.
The union of two invariant sets is invariant. In fact, the union of an arbitrary
number (finite, infinite, countable, uncountable) of invariant sets is invariant.
Example B.4.6. Consider the system

ẋ₁(t) = x₂(t) + x₁(t)( 1 − x₁²(t) − x₂²(t) ),
ẋ₂(t) = −x₁(t) + x₂(t)( 1 − x₁²(t) − x₂²(t) ),                               (B.8)

and see how

r(t) := x₁²(t) + x₂²(t)

changes over time,

ṙ(t) = d/dt ( x₁²(t) + x₂²(t) )
     = 2 x₁(t) ẋ₁(t) + 2 x₂(t) ẋ₂(t)
     = 2 x₁(t) x₂(t) + 2 x₁²(t)( 1 − x₁²(t) − x₂²(t) ) − 2 x₂(t) x₁(t) + 2 x₂²(t)( 1 − x₁²(t) − x₂²(t) )
     = 2 ( x₁²(t) + x₂²(t) )( 1 − x₁²(t) − x₂²(t) ).

Therefore

ṙ(t) = 2 r(t)( 1 − r(t) ).                                                   (B.9)
If r (0) = 1 then r (t ) is always equal to one, so the unit circle is an invariant set.
Furthermore, Eqn. (B.9) shows that if 0 ≤ r (0) < 1, then 0 ≤ r (t ) < 1 for all time.
Hence the open unit disc is also invariant. Using similar arguments, we find that
also the complement of the unit disc is invariant. This system has many more
invariant sets.
In the previous example the state does not always converge to a single ele-
ment, but to a set (e.g., the unit circle in Example B.4.6). We use dist(x, G ) to
denote the (infimal) distance between a point x ∈ Rn and a set G ⊆ Rn , thus
dist(x, G) := inf_{g∈G} ‖x − g‖,
V ( x (t n ; x 0 )) ≥ V ( x (t n + t ; x 0 )) ≥ V (x ∞ ) ∀t ≥ 0. (B.10)
(The first inequality holds because V̇ (x) ≤ 0 and the second inequality follows
from V̇ (x) ≤ 0 combined with the fact that t n +t < t n+k for some large enough k,
V (x ∞ ) ≥ V ( x (t ; x ∞ )) ≥ V (x ∞ ).
The proof also provides an explicit description of the set K . But if we only
want to establish asymptotic stability then we can normally avoid this descrip-
tion. Its existence is enough.
ẋ 1 (t ) = x 32 (t ),
(B.11)
ẋ 2 (t ) = − x 31 (t ) − x 2 (t ).
Clearly, the origin (0, 0) is an equilibrium. For this equilibrium we propose the
Lyapunov function
V(x) = x₁⁴ + x₂⁴.

Its derivative along solutions of (B.11) is V̇(x) = 4x₁³ẋ₁ + 4x₂³ẋ₂ = 4x₁³x₂³ + 4x₂³(−x₁³ − x₂) = −4x₂⁴ ≤ 0.
This implies that the origin is stable, but not necessarily asymptotically stable.
To prove asymptotic stability we use Theorem B.4.7. This theorem says that a
bounded, closed invariant neighborhood K of the equilibrium (0, 0) exists, but
we need not worry about its form. The set of interest is G . It contains those
initial states x ∈ K whose solution x (t ; x) satisfies the system equations (B.11)
and at the same time is such that V̇ ( x (t )) = 0 for all time. For our example the
latter means
x 2 (t ) = 0 ∀t .
Substituting this into the system equations (B.11) gives
ẋ₁(t) = 0,   0 = −x₁³(t) − 0,   ∀t.
The second of these equations gives x₁(t) = 0 for all t, so
G = {(0, 0)}.
LaSalle’s invariance principle proves that for every x 0 ∈ K the solution x (t ) con-
verges to (0, 0) as t → ∞, and, hence, that (0, 0) is an asymptotically stable equi-
librium of this system.
Example B.4.9 (Example B.4.1 continued). Consider again the damped pendu-
lum from Example B.4.1,
ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)) − (d/m) x₂(t).                                     (B.12)

As before, the derivative of the mechanical energy V along solutions is

V̇(x) = −dℓ² x₂².
The equality V̇ ( x (t )) = 0 hence holds for all time iff x 2 (t ) = 0 for all time, and
the LaSalle set G therefore is
G = {x ∈ K | x 1 = kπ, k ∈ Z, x 2 = 0}.
This set contains at most two physically different solutions: the hanging downwards solution x = (0, 0) and the standing upwards solution x = (π, 0). To rule out the upwards solution it suffices to take the neighborhood Ω of x̄ = (0, 0) so small that (π, 0) ∉ Ω, for example Ω = (−π, π) × R, and to take for K a sublevel set of V contained in Ω.
Since the energy does not increase over time it is immediate that this set is
invariant. It is also closed and bounded, and it is a neighborhood of (0, 0).
ẋ 1 (t ) = 0,
ẋ 2 (t ) = − x 2 (t ),
V̇(x) = −2x₂² ≤ 0.

The equality V̇(x(t)) = 0 for all t means that x₂(t) = 0 for all t, and substituting this into the system equations gives ẋ₁(t) = 0. Hence

G = {(x₁, x₂) ∈ K | x₂ = 0}.
This is a part of the x 1 -axis. Now LaSalle’s invariance principle says that all states
that start in K converge to the x 1 -axis as t → ∞. For K we can take for instance
K = {x ∈ R2 |V (x) ≤ 1000}.
L(x) ≥ 0
per unit time, when we are at state x. As time progresses we move as dictated
by the differential equation, and so the cost L( x (t )) typically changes with time.
The cost-to-go V (x 0 ) is now defined as the total payment over the infinite future
if we start at x 0 , that is, it is the integral of L( x (t )) over positive time,
V(x₀) := ∫₀^∞ L(x(τ)) dτ   for x(0) = x₀.                                  (B.13)
The key property of the cost-to-go is that

V̇(x(t)) = −L(x(t))                                                           (B.14)

whenever V(x) as defined in (B.13) is convergent. To see this split the cost-to-
go into an integral over the first h units of time and an integral over the time
beyond h,
V(x(t)) = ∫_t^{t+h} L(x(τ)) dτ + ∫_{t+h}^∞ L(x(τ)) dτ = ∫_t^{t+h} L(x(τ)) dτ + V(x(t + h)).

Therefore

V̇(x(t)) = lim_{h→0} ( V(x(t + h)) − V(x(t)) ) / h = lim_{h→0} ( −∫_t^{t+h} L(x(τ)) dτ ) / h = −L(x(t))
if L( x (t )) is continuous. An interpretation of (B.14) is that the current cost-to-
go minus the cost-to-go from tomorrow onwards, is what we pay today. The
function L(x) is called the running cost. In physical applications L(x) is often
the dissipated power, and then V (x) is the total dissipated energy.
mℓ ẍ₁(t) = −mg sin(x₁(t)) − dℓ ẋ₁(t).

The dissipated power is

L(x₁(t), ẋ₁(t)) = dℓ ẋ₁(t) · ℓ ẋ₁(t) = dℓ² ẋ₁²(t).

Notice that L ≥ 0 because d > 0. The cost-to-go is the total dissipated energy,

V(x₁(0), ẋ₁(0)) := ∫₀^∞ dℓ² ẋ₁²(t) dt
  = ∫₀^∞ dℓ² ẋ₁(t) · ( −mg sin(x₁(t)) − mℓ ẍ₁(t) ) / (dℓ) dt
  = ∫₀^∞ ( −mgℓ sin(x₁(t)) ẋ₁(t) − mℓ² ẍ₁(t) ẋ₁(t) ) dt
  = [ mgℓ cos(x₁(t)) − ½ mℓ² ẋ₁²(t) ]₀^∞
  = mgℓ( 1 − cos(x₁(0)) ) + ½ mℓ² ẋ₁²(0).                                     (B.15)
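The identity (B.15) can be checked numerically: simulate the damped pendulum long enough for it to come (essentially) to rest and integrate the dissipated power. The parameters, time horizon, and tolerances below are illustrative only.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Damped pendulum: total dissipated energy vs. initial mechanical energy (B.15).
m, g, ell, d = 1.0, 9.81, 1.0, 0.5
def rhs(t, z):
    x1, x2 = z
    return [x2, -(g / ell) * np.sin(x1) - (d / m) * x2]

z0 = [1.0, 0.0]
sol = solve_ivp(rhs, (0.0, 200.0), z0, max_step=0.01)
dissipated = np.trapz(d * ell**2 * sol.y[1]**2, sol.t)    # integral of L
initial_energy = m * g * ell * (1 - np.cos(z0[0])) + 0.5 * m * ell**2 * z0[1]**2
print(dissipated, initial_energy)   # nearly equal: the pendulum is at rest by t = 200
```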
As mentioned earlier, the only obstacle is that the integral (B.13) has to be
well defined and continuously differentiable at x 0 . If the system dynamics is
linear of the form
ẋ (t ) = A x (t )
then these obstacles can be overcome and we end up with a very useful result.
It is a classic result in systems theory. In this result we take the running cost to
be quadratic in the state,
L(x) = x T Qx,
ẋ (t ) = A x (t ), x (0) = x0 . (B.16)
A T P + P A = −Q (B.19)
1. =⇒ 2. Trivial.
This shows that for every Q ∈ Rn×n (symmetric or not) there is a P ∈ Rn×n
for which A T P + P A = −Q. This means that the linear mapping from P ∈
Rn×n to A T P + P A ∈ Rn×n is surjective. It is a standard result from linear
algebra that a surjective linear mapping from a finite-dimensional vector
space to the same vector space is in fact invertible. Hence, for every Q ∈
Rn×n (symmetric or not) the solution P of A T P + P A = −Q exists and is
unique. Our Q is symmetric and positive definite, so the solution P as
constructed in (B.18) is symmetric and positive definite.
punov function with V̇ (x) < 0 for all x = 0. It is radially unbounded, hence
the equilibrium is globally asymptotically stable (Theorem B.3.5).
The proof of the above theorem actually shows that for asymptotically stable
matrices A ∈ Rn×n , the Lyapunov equation
A T P + P A = −Q
for every matrix Q ∈ Rn×n (not necessarily symmetric or positive definite) has
a unique solution P ∈ Rn×n . In particular, it immediately yields the following
result.
ẋ (t ) = −2 x (t )
−2p − 2p = −q = −1
has a unique solution, p = 1/4, and the solution is positive. Note that Theo-
rem B.5.2 says that we may take any q > 0 that we like. Indeed, whatever pos-
itive q we take, we have that the solution of the Lyapunov equation is unique
and positive: p = q/4 > 0.
By symmetry, the upper-right and lower-left entries are identical so the above
matrix equation is effectively three scalar equations in the three unknowns
α, β, γ:
−2α = −1,
2α − 2β = 0,
4β − 2γ = −1.
This matrix is positive definite because P₁₁ = 1/2 > 0 and det(P) > 0 (see
Appendix A.1). Therefore the differential equation with equilibrium x̄ = (0, 0) is
globally asymptotically stable.
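Lyapunov equations of this kind are also solved routinely by numerical software. The snippet below is an illustrative check (the matrix A is ours); note that scipy's solver uses the convention M X + X Mᴴ = RHS, so we pass Aᵀ and −Q.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Solve A^T P + P A = -Q for an asymptotically stable A.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
Q = np.eye(2)
P = solve_continuous_lyapunov(A.T, -Q)      # so that A^T P + P A = -Q
print(P)
print(np.linalg.eigvalsh(P))                # positive: P > 0, as the theory predicts
print(np.allclose(A.T @ P + P @ A, -Q))     # True
```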
lim_{δx→0} ‖o(δx)‖ / ‖δx‖ = 0.                                              (B.21)
We think of little-o functions as functions that are “extremely small” around the
origin.
To analyze the behavior of the state x (t ) relative to an equilibrium x̄, it
makes sense to analyze δx (t ) defined as the difference between state and equi-
librium,
δx (t ) := x (t ) − x̄.
δ̇x (t ) = Aδx (t ).
F IGURE B.11: Nonlinear f (x) (left) and its linear approximation Aδx (right).
Consider the scalar differential equation

ẋ(t) = −2 sin(x(t))                                                           (B.23)

with equilibrium x̄ = 0.
The idea of linearization is that around x̄ the function f (x) is almost indistin-
guishable from its tangent with slope
A = ∂f(x̄)/∂x = −2 cos(x̄) = −2 cos(0) = −2,
see Fig. B.11, and so the solutions x (t ) of (B.23) will probably be quite similar to
x̄ + δx(t) = δx(t) with δx(t) the solution of the linear system

δ̇x(t) = −2 δx(t),                                                             (B.24)
provided that δx (t ) is small. The above linear system (B.24) is the linearized sys-
tem of (B.23) at equilibrium x̄ = 0.
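The closeness of the nonlinear and linearized solutions near the equilibrium can be seen numerically; the example uses the scalar system above, with an illustrative (small) initial condition.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Compare the nonlinear system x' = -2 sin(x) with its linearization d' = -2 d.
x0 = 0.3
t = np.linspace(0.0, 2.0, 5)
nonlin = solve_ivp(lambda t, x: -2.0 * np.sin(x), (0.0, 2.0), [x0],
                   dense_output=True)
print(nonlin.sol(t)[0])          # nonlinear solution
print(x0 * np.exp(-2.0 * t))     # linearized solution, close for small x0
```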
Lyapunov’s first method presented next, roughly speaking says that the non-
linear system and the linearized system have the same asymptotic stability
properties. The only exception to this rule is if the eigenvalue with the largest
real part is on the imaginary axis (so its real part is zero). The proof of this result
relies on the fact that every asymptotically stable linear system has a Lyapunov
function (namely its cost-to-go) which turns out to be a Lyapunov function for
the nonlinear system as well:
2. If there is an eigenvalue of the Jacobian (B.22) with positive real part, then
x̄ is an unstable equilibrium of the nonlinear system.
A T P + P A = −I ,
and that V (δx ) = δxT P δx is a strong Lyapunov function for the linear sys-
tem δ̇x (t ) = Aδx (t ). We prove that V (x) := x T P x is also a strong Lyapunov
function for ẋ (t ) = f ( x (t )) on some neighborhood Ω of x̄ = 0. Clearly, this
V is positive definite and continuously differentiable and positive definite.
We have that

V̇(x) = ẋᵀPx + xᵀPẋ
     = f(x)ᵀPx + xᵀP f(x)
     = [Ax + o(x)]ᵀPx + xᵀP[Ax + o(x)]
     = xᵀ(AᵀP + PA)x + o(x)ᵀPx + xᵀP o(x)
     = −xᵀx + 2 o(x)ᵀPx
     = −‖x‖² + 2 o(x)ᵀPx.
The two cases of Theorem B.6.2 cover all possible eigenvalue configura-
tions, except when some eigenvalues have zero real part and none have pos-
itive real part. In fact, if there are eigenvalues on the imaginary axis then the
dynamical behavior crucially depends on the higher-order terms o(δx ), which
are neglected in the linearization. For example, the three systems
ẋ (t ) = x 2 (t ),
ẋ (t ) = − x 2 (t ),
ẋ (t ) = x 3 (t ),
all have the same linearization at x̄ = 0, but their dynamical properties are very
different. See also Exercise B.5.
ẋ 1 (t ) = x 1 (t ) + x 1 (t ) x 22 (t )
ẋ 2 (t ) = − x 2 (t ) + x 21 (t ) x 2 (t ).
The system has equilibrium x̄ := (0, 0), and the Jacobian A at this x̄ is

A = ∂f(x̄)/∂xᵀ = [ 1 + x₂², 2x₁x₂; 2x₁x₂, −1 + x₁² ]|_{x=(0,0)} = [ 1, 0; 0, −1 ].

This Jacobian has the eigenvalue 1 with positive real part, so by Theorem B.6.2 the equilibrium x̄ = (0, 0) of the nonlinear system is unstable.
B.7 Exercises
B.1 Equilibria.
ẋ (t ) = A x (t ),
with A an n × n matrix. Argue that this system either has exactly one
equilibrium, or infinitely many equilibria.
B.2 Investigate the stability of the origin for the following two systems (that is,
check all six stability types as mentioned in Definition B.2.2). Use a suit-
able Lyapunov function.
(a)
ẋ 1 (t ) = − x 31 (t ) − x 22 (t ),
ẋ 2 (t ) = x 1 (t ) x 2 (t ) − x 32 (t ).
(b)
ẋ₁(t) = x₂(t),
ẋ₂(t) = −x₁³(t).
[Hint: try V(x₁, x₂) = x₁^α + c x₂^β and then determine suitable α, β, c.]
B.3 Adaptive Control. The following problem from adaptive control illustrates
an extension of the theory of Lyapunov functions to functions that are,
strictly speaking, no longer Lyapunov functions. This problem concerns
the stabilization of a system of which the parameters are not (completely)
known. Consider the scalar system
ẋ(t) = a x(t) + u(t),   x(0) = x₀,                                          (B.25)

where a is an unknown constant, in feedback with the adaptive controller

u(t) = −k(t) x(t),   k̇(t) = x²(t).                                          (B.26)
(a) Write (B.25)–(B.26) as one system with state ( x , k ) and determine all
equilibrium points.
(b) Consider the function V (x, k) := x 2 +(k−a)2 . Prove that V̇ ( x (t ), k (t )) =
0 for all x , k . For which equilibrium point is this a Lyapunov func-
tion?
(c) Prove, using the above, that k (t ) is bounded.
(d) Prove, using (B.26), that k (t ) converges as t → ∞.
(e) Prove that limt →∞ x (t ; x 0 ) = 0.
(f ) Determine limt →∞ k (t ).
B.4 This exercise is based on an exercise in Khalil (1996) who, in turn, took it
from a book by Hahn1 , and it appears that Hahn was inspired by a paper
by Barbashin and Krasovskı̆2 . Consider the system
ẋ₁(t) = ( −x₁(t) + x₂(t)( 1 + x₁²(t) )² ) / ( 1 + x₁²(t) )²,
ẋ₂(t) = ( −x₁(t) − x₂(t) ) / ( 1 + x₁²(t) )²,

and define V : R² → R as

V(x) = x₁² / (1 + x₁²) + x₂².
whenever x 2 (t ) = 1/ x 1 (t ) > 0.
(e) Use the above to prove that the origin is not globally asymptotically
stable.
ẋ (t ) = a x 3 (t )
with a ∈ R.
(a) Prove that the linearization of this system about its equilibrium
point is independent of a.
1 W. Hahn. Stability of Motion, volume 138 of Die Grundlehren der mathematischen Wissenschaften. Springer, 1967.
2 E.A. Barbashin and N.N. Krasovskiĭ. Dokl. Akad. Nauk SSSR, 86(3): 453–456, 1952. (Russian). English title: "On the stability of motion in the large".
F IGURE B.12: A phase portrait of the system of Exercise B.4. The red
dashed lines are level sets of V (x). The boundary of the shaded region
{(x 1 , x 2 ) | x 1 , x 2 > 0, x 1 x 2 > 1} is where x 2 = 1/x 1 > 0.
ẋ₁(t) = −x₁⁵(t) − x₂(t),
ẋ₂(t) = x₁(t) − 2 x₂³(t).
(a) Determine all points of equilibrium.
(b) Determine a Lyapunov function for the equilibrium x̄ = (0, 0), and
discuss the type of stability that follows from this Lyapunov function
(stable? asymptotically stable? globally asymptotically stable?)
ẋ₁(t) = x₂(t) − x₁(t),
ẋ₂(t) = −x₁³(t),
ÿ(t) − ε( 1 − y²(t) ) ẏ(t) + y(t) = 0.
This equation occurs in the study of vacuum tubes and then ε is positive.
However, in this exercise we take ε < 0.
(a) Rewrite this equation in the standard form (B.2) with x 1 := y and
x 2 := ẏ .
(b) Use linearization to show that the origin (x₁, x₂) = (0, 0) is an asymptotically stable equilibrium (recall that ε < 0).
(c) Determine a neighborhood Ω of the origin for which V (x 1 , x 2 ) = x 12 +
x 22 is a Lyapunov function for x̄ = (0, 0).
(d) Let V (x 1 , x 2 ) and Ω be as in the previous part. Which stability prop-
erties can be concluded from LaSalle’s invariance principle?
ẋ 1 (t ) = −a x 1 (t ) + b x 1 (t ) x 2 (t ), x 1 (0) ≥ 0,
ẋ 2 (t ) = c x 2 (t ) − d x 1 (t ) x 2 (t ), x 2 (0) ≥ 0.
The first term on the right-hand side of the first equation models that
predators become extinct without food, while the second term models
that the growth of the number of predators is proportional to the number
of prey. Likewise, the term on the right-hand side of the second equation
models that without predators the population of prey increases, while its
decrease is proportional to the number of predators. For convenience we
choose a = b = c = d = 1.
(a) Show that, apart from (0, 0), the system has a second equilibrium
point.
(b) Investigate the stability of both equilibrium points using lineariza-
tion.
(c) Investigate the stability of the nonzero equilibrium point using the
function
V (x 1 , x 2 ) = x 1 + x 2 − ln(x 1 x 2 ) − 2.
ẋ₁(t) = x₂(t),
ẋ₂(t) = −(g/ℓ) sin(x₁(t)) − (d/m) x₂(t).                                      (B.27)

Here x₁ is the angular displacement, x₂ is the angular velocity, g is the gravitational acceleration, ℓ is the length of the pendulum, m is the mass of the pendulum, and d is a friction coefficient. All constants g, ℓ, d, m are positive.
(a) Prove, using Theorem B.6.2, that the origin is an asymptotically sta-
ble equilibrium point.
(b) In Example B.4.9 we verified asymptotic stability using LaSalle’s
invariance principle. Here we want to construct a strong Lyapunov
function to show asymptotic stability using Theorem B.3.2: deter-
mine a symmetric matrix P > 0 such that the function
V (x) := x T P x + g 1 − cos(x 1 )
ẋ 1 (t ) = −2 x 1 (t )( x 1 (t ) − 1)(2 x 1 (t ) − 1),
(B.28)
ẋ 2 (t ) = −2 x 2 (t ).
ẋ 1 (t ) = x 1 (t )(1 − x 22 (t )),
ẋ 2 (t ) = x 2 (t )(1 − x 21 (t )).
For each of the equilibrium points determine the linearization and the
nature of stability of the linearization.
B.13 The equations of motion of a rigid body spinning around its center of
mass are
(This implies a certain lack of symmetry of the rigid body, e.g., it is not a
unit cube. An example where 0 < I 1 < I 2 < I 3 is shown in Fig. B.13.)
(c) The origin (0, 0, 0) is just one equilibrium. Determine all equilibria
and explain what this implies about the stability properties.
(d) Determine the linearization around each of the equilibria.
(e) Use linearization to prove that steady spinning around the second
principal axis (0, ω̄2 , 0) is unstable if ω̄2 = 0.
are constant over time, and use this to prove that steady spinning
around the first and third principal axes is stable, but not asymptot-
ically stable.
Remark: A spinning body spins stably both around the principal axis
with the smallest moment of inertia and the principal axis with the
largest moment of inertia. But around the other principal axis it is
not stable. This can be demonstrated by (carefully) spinning this
book in the air. You will see that you can get it to spin nicely around
the axis with largest inertia – like a discus – and around the axis with
smallest inertia – like a spear – but you will probably fail to make it
spin around the other axis.
B.14 Consider the system ẋ (t ) = f ( x (t )) with equilibrium point x̄, and let Ω
be a neighborhood of x̄. Suppose a Lyapunov function exists such that
V̇ (x) = 0 for all x ∈ Ω. Prove that this system with equilibrium x̄ is not
asymptotically stable.
B.17 In this exercise we look at variations of the system (B.8) from Exam-
ple B.4.6. We investigate the system
ẋ₁(t) = x₂(t) + x₁(t)( γ − x₁²(t) − x₂²(t) ),
ẋ₂(t) = −x₁(t) + x₂(t)( γ − x₁²(t) − x₂²(t) ),
with γ ∈ R. Prove that the origin is an asymptotically stable equilibrium
point if γ ≤ 0, and that it is an unstable equilibrium point if γ > 0.
B.19 (Assumes knowledge of Appendix A.1.) Let the matrices A and Q be given
by
A = [ 0, 1; −2, −3 ],   Q = [ 4, 6; 6, 10 ],
A T P + P A = −I .
(b) Check (without using a computer) that this solution P is positive def-
inite.
ẋ 1 (t ) = x 1 (t ) + 2 x 2 (t ),
ẋ 2 (t ) = −α x 1 (t ) + (1 − α) x 2 (t ).
Determine all α’s for which this differential equation is asymptotically sta-
ble around x̄ = (0, 0).
F IGURE B.14: The blue phase portrait is that of the system of Exercise B.22.
This is also the phase portrait of the system ẋ (t ) = A even x (t ) of Exer-
cise B.23. In red is the phase portrait of ẋ (t ) = A odd x (t ) of Exercise B.23.
All trajectories (blue and red) converge to zero as t → ∞.
B.23 Notice that the results we derived in this chapter are formulated only for
time-invariant systems ẋ (t ) = f ( x (t )). For time-varying systems ẋ (t ) =
f ( x (t ), t ) the story is quite different, even if the system is linear of the form
ẋ (t ) = A(t ) x (t ). (B.29)
in which A(t) := A_even whenever ⌊t⌋ is even and A(t) := A_odd whenever ⌊t⌋ is odd, with

A_even = [ −1, −π/6; 3π/2, −1 ],   A_odd = [ −1, −3π/2; π/6, −1 ].
Here ⌊t⌋ denotes the floor of t (the largest integer less than or equal to t).
The system hence switches dynamics at every t ∈ Z.
(b) Verify that x (t ) = e A even t x (0) for all t ∈ [0, 1], and x (t ) = e A odd (t −1) x (1)
for all t ∈ [1, 2].
(c) Show that

x(2k + 2) = [ −(3/e)², 0; 0, −1/(3e)² ] x(2k)

for all k ∈ Z, and use it to conclude that the time-varying system (B.29) is not asymptotically stable.
for all k ∈ Z, and use it to conclude that the time-varying sys-
tem (B.29) is not asymptotically stable.
(d) Use the above to sketch in Fig. B.14 the trajectory x (t ) for t > 0 with
initial condition x (0) = 10 , and argue that this trajectory diverges as
t → ∞.
F IGURE B.15: Stable or not? Globally attractive or not? See Exercise B.7.
B.24 This exercise is based on an example from a paper by Ryan and Sontag3 .
It is about a system whose equilibrium is globally attractive yet not stable!
Consider the system ẋ (t ) = f ( x (t )) with
f(x) = [ −x₁( 1 − 1/‖x‖ ) − 2x₂( 1 − x₁/‖x‖ );  −x₂( 1 − 1/‖x‖ ) + 2x₁( 1 − x₁/‖x‖ ) ]   if ‖x‖ ≥ 1,
f(x) = [ 2(x₁ − 1)x₂;  −(x₁ − 1)² + x₂² ]   if ‖x‖ < 1.
Notice that f inside the unit disc is defined differently than outside the
unit disc. Nevertheless, f (x) is locally Lipschitz continuous, also on the
unit circle. Inside the unit circle, the orbits are arcs (parts of circles) that
converge to x̄ = (1, 0), see Fig. B.15. Outside, ‖x‖ ≥ 1, the system is easier to comprehend in polar coordinates (x₁, x₂) = (r cos(θ), r sin(θ)) with r = √(x₁² + x₂²). This gives
3 E.P. Ryan and E.D. Sontag. Well-defined steady-state response does not imply CICS. Systems & Control Letters, 2006.
ṙ (t ) = 1 − r (t ),
(B.31)
θ̇(t ) = 4 sin2 (θ(t )/2) = 2(1 − cos(θ(t ))).
B.25 Let A ∈ Rn×n and suppose that A + A T is negative definite. Is the origin a
stable equilibrium of ẋ (t ) = A x (t )?
Solutions to Odd-Numbered
Exercises
Chapter 1
1.1 (a) 0 = ( ∂/∂x − d/dt ∂/∂ẋ )( ẋ² − α²x² ) = −2α²x − d/dt( 2ẋ ) = −2α²x − 2ẍ. So ẍ + α²x = 0. Its solution (using characteristic polynomials) is x(t) = c₁ cos(αt) + c₂ sin(αt).
(b) ∂/∂x( d/dt G(t, x(t)) ) − d/dt( ∂/∂x G(t, x(t)) ) = 0.
This holds for all x. Here we used that d/dt( ∂G(t, x(t))/∂x ) equals ∂/∂x( d/dt G(t, x(t)) ).
(c) ∫₀ᵀ F(t, x(t), ẋ(t)) dt = ∫₀ᵀ d/dt( G(t, x(t)) ) dt = G(T, x_T) − G(0, x₀), so the outcome is the same for all functions x that satisfy the given boundary conditions.
1.5 The constant 4πρv 2 plays no role. So take F (y, ẏ) = y ẏ 3 . Beltrami gives
C = y ẏ³ − ẏ ∂/∂ẏ( y ẏ³ ) = y ẏ³ − 3y ẏ³ = −2y ẏ³.
Hence y ẏ 3 is constant. Now y (x) := y 1 (x/x 1 )3/4 satisfies this equation
(verify this) and, in addition, then y (0) = 0 and y (x 1 ) = y 1 as required.
(By the way, the function y (x) = y 1 (x/x 1 )3/4 is not differentiable at x = 0.)
1.7 (a) By contradiction. If f is not constant then f(a) ≠ f(b) for some a, b ∈ (0, T). Let φ be a continuous function with φ(a) = +1 and φ(b) = −1 of the form
This function satisfies ∫₀ᵀ φ(t) dt = 0, and for small enough "tents" around a, b the integral ∫₀ᵀ f(t)φ(t) dt is nonzero because f(t) is continuous and f(a) ≠ f(b). Contradiction. Hence f is constant.
(b) Momentarily denote ∂F(t, x∗(t), ẋ∗(t))/∂x simply as F_x(t), and let G_x(t) be an antiderivative of F_x(t). Now ∫₀ᵀ F_xᵀ(t) δx(t) dt equals [G_xᵀ(t) δx(t)]₀ᵀ − ∫₀ᵀ G_xᵀ(t) δ̇x(t) dt. The term [G_xᵀ(t) δx(t)]₀ᵀ is zero because δx(0) = δx(T) = 0.
1.9 (a)
0 = (∂/∂x − d/dt ∂/∂ẋ)(ẋ² − 2xẋ − ẋ) = −2ẋ − d/dt(2ẋ − 2x − 1) = −2ẋ − (2ẍ − 2ẋ) = −2ẍ.
1.13 The matrix of second derivatives with respect to (ẋ₁, ẋ₂) is
∂²F(t, [x₁; x₂], [ẋ₁; ẋ₂]) / ∂[ẋ₁; ẋ₂]∂[ẋ₁; ẋ₂]ᵀ = [[ ∂²F/∂ẋ₁², ∂²F/∂ẋ₁∂ẋ₂ ], [ ∂²F/∂ẋ₂∂ẋ₁, ∂²F/∂ẋ₂² ]] = [[2, 0], [0, 2]] > 0.
Chapter 2
So H ( x ∗ (T ), p ∗ (T ), u ∗ (T )) is zero.
(d) For u∗ = −px/2 the costate equation becomes ṗ = (½p² − 2)x with final condition p(T) = 2. Clearly the constant p∗(t) = 2 satisfies the final condition, and also the DE because then ṗ∗ = 0 and ½p∗² − 2 = 0.
(e) For p = 2 (constant) we have u ∗ = − x so ẋ = − x 2 . See Example B.1.5:
x (t ) = 1/(t + 1).
2.3 The Hamiltonian is H = p(x + u) + ½u² − 2u − 2x. If u were free to choose, then the minimizing u would be the one that achieves 0 = ∂H/∂u = p + u − 2. This gives u = 2 − p. Given that U = [0, 4] and the fact that H is a convex parabola in u, it is easy to see that the minimizing u is the element of [0, 4] that is closest to û := 2 − p. The costate equation is ṗ = 2 − p, p(1) = 0. Therefore p(t) = 2(1 − e^{1−t}). Thus û(t) := 2 − p(t) = 2e^{1−t}. We need u(t) ∈ [0, 4]. Now û(t) = 4 at t = 1 − ln(2) ≈ 0.3069, û(t) > 4 if t < 1 − ln(2), and û(t) ∈ [0, 4] if t ≥ 1 − ln(2). Therefore
u∗(t) = 4 if t ∈ [0, 1 − ln(2)),   u∗(t) = 2e^{1−t} if t ∈ [1 − ln(2), 1].
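A small numerical check of the switching time and the saturation (a sketch, assuming NumPy; not part of the original solution):

    # Solution 2.3: u_hat(t) = 2 - p(t) = 2*exp(1-t); the optimal u is its
    # projection onto U = [0, 4], which saturates at 4 for t < 1 - ln 2.
    import numpy as np

    t = np.linspace(0.0, 1.0, 1001)
    p = 2.0*(1.0 - np.exp(1.0 - t))          # solves p' = 2 - p, p(1) = 0
    u_hat = 2.0 - p
    u_star = np.clip(u_hat, 0.0, 4.0)

    t_switch = 1.0 - np.log(2.0)
    print(t_switch)                                   # ~ 0.3069
    print(np.all(u_star[t < t_switch] == 4.0))        # True: saturated before the switch
    print(np.allclose(u_star[t >= t_switch], 2*np.exp(1 - t[t >= t_switch])))  # True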
(e) Since x (0) > 0 and ẋ ≥ 0 we have x (t ) > 0 for all t > 0. Now u ∗ (t ) =
−1/( p (t ) x (t )) = 1/(2 − t ) > 0. So yes.
2.13 (a) Once x (t ) > 0 it can only increase because a > 0, u (t ) ≥ 0 so then
ẋ (t ) = a u (t ) x (t ) ≥ 0.
(b) Since J(u) = −∫₀ᵀ x₂(t) dt we have H = p₁ a u x₁ + p₂ a(1 − u)x₁ − x₂. The costate equations become
ṗ₁ = −a u p₁ − a(1 − u)p₂,   p₁(T) = 0,
ṗ₂ = 1,   p₂(T) = 0.
So p₂(t) = t − T. Write H as H = u a x₁(p₁ − p₂) + ···, thus the optimal u only depends on the sign of p₁ − p₂. Now the clever trick: at the final time we have ṗ₁(T) = 0, and since ṗ₂ = 1 it follows that p₁(t) − p₂(t) > 0 near T. Therefore u(t) = 0 near T. Now, as in Example 2.5.5, solve the equations backwards in time:
p₁(t) = −½ a(t − T)²   ∀t ∈ [t_s, T].
2.15 (We use p_{1:n} to denote the first n entries of a vector p.)
(a) f(z) = [u; M(x, u)], L(z, u) = F(x, u), K(z) = 0, with the integral constraint accounted for by the endpoint condition z_{n+1}(T) = c₀.
(b) H_λ(z, p, u) = p_{1:n}ᵀ u + p_{n+1} M(x, u) + λF(x, u). So
ṗ_{n+1} = −∂H/∂z_{n+1} = 0.
(c) First note that the first n entries of the costate satisfy
ṗ_{1:n} = −∂H/∂x = −∂F/∂x − p_{n+1} ∂M/∂x   (B.32)
(if λ = 1). The optimal input satisfies ∂H/∂u = 0. This means
p_{1:n} + p_{n+1} ∂M/∂u + ∂F/∂u = 0.   (B.33)
Now (1.54) for μ∗ := p_{n+1} (constant) is satisfied because
∂F/∂x − d/dt(∂F/∂u) + p_{n+1}(∂M/∂x − d/dt(∂M/∂u)) = ∂F/∂x + p_{n+1} ∂M/∂x + d/dt p_{1:n} = 0,
where the first equality uses (B.33) and the second uses (B.32).
(Actually it can be shown that p_{n+1} is indeed nonzero, for if p_{n+1} were zero then the minimality property of the Hamiltonian would yield p_{1:n} = 0, but Thm. 2.6.1 guarantees that a zero costate is impossible in the abnormal case, so p_{n+1} ≠ 0.)
2.19 (a) The same as the proof of Theorem 2.8.1 but with addition of the red
parts:
J(x₀, u) − J(x₀, u∗)
= ∫₀ᵀ L(x, u) − L(x∗, u∗) dt + K(x(T)) − K(x∗(T))
= ∫₀ᵀ (H(x, p∗, u) − p∗ᵀẋ) − (H(x∗, p∗, u∗) − p∗ᵀẋ∗) dt + K(x(T)) − K(x∗(T))
= ∫₀ᵀ (H(x, p∗, u) − H(x∗, p∗, u∗)) − p∗ᵀ(ẋ − ẋ∗) dt + K(x(T)) − K(x∗(T))
≥ ∫₀ᵀ −ṗ∗ᵀ(x − x∗) − p∗ᵀ(ẋ − ẋ∗) dt + K(x(T)) − K(x∗(T))
= [−p∗ᵀ(t)(x(t) − x∗(t))]₀ᵀ + K(x(T)) − K(x∗(T))
≥ −p∗ᵀ(T)(x(T) − x∗(T)) + (∂K(x∗(T))/∂x)ᵀ (x(T) − x∗(T))   (B.34)
= 0   because p∗(T) = ∂K(x∗(T))/∂x.
(Inequality (B.34) is because of convexity of K .)
(b) For every constrained state entry x i we trivially have that (B.34) is
zero (because x i (T ) = x ∗i (T ) for every such entry).
Chapter 3
3.1 K = −K 0 , L = −L 0 , J = −J 0 .
0 + min_{u∈ℝ} (2xu + x² + u²) = 0.
3.5 To turn maximization into minimization we need to swap the sign of the
cost,
J_{[0,3]}(x₀, u) := x(3) + ∫₀³ (u(t) − 1) x(t) dt.
So u(t) = 0 if Q(t) + 1 > 0, and u(t) = 1 if Q(t) + 1 < 0.
(c) As Q(3) = 1 we have Q(t ) + 1 > 0 near the final time. So then u = 0
which turns the HJB equations into Q̇(t )−1 = 0,Q(3) = 1. Thus Q(t ) =
t − 2. This is the solution on [1, 3] for then we still have Q(t ) + 1 > 0.
On [0, 1] we have u (t ) = 1 so then the HJB equations become Q̇(t ) +
Q(t ) = 0 which, given Q(1) = −1, implies that Q(t ) = − e1−t .
(d) u (t ) = 1 on [0, 1], and u (t ) = 0 on [1, 3]. Then x (t ) satisfies ẋ (t ) =
x (t ) u (t ) which is well defined for all t ∈ [0, 3]. So the candidate opti-
mal solution is truly optimal, and the candidate value function is the
true value function.
(e) The optimal (minimal) cost is V (x 0 , 0) = Q(0)x 0 = − e x 0 , so the max-
imal satisfaction is + e x 0 .
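A simulation-based sanity check of (d) and (e) — a sketch, assuming NumPy/SciPy, using the minimization form J = x(3) + ∫₀³ (u − 1)x dt from above and x₀ = 1:

    # Solution 3.5: the candidate control u = 1 on [0,1], u = 0 on [1,3]
    # attains the value -e*x0; two alternative controls do worse.
    import numpy as np
    from scipy.integrate import solve_ivp

    x0 = 1.0

    def cost(u_of_t):
        def rhs(t, z):
            x, running = z
            u = u_of_t(t)
            return [x*u, (u - 1.0)*x]
        sol = solve_ivp(rhs, [0.0, 3.0], [x0, 0.0], max_step=1e-3)
        xT, jT = sol.y[:, -1]
        return xT + jT

    print(cost(lambda t: 1.0 if t < 1.0 else 0.0))   # ~ -2.718 = -e*x0 (candidate optimum)
    print(cost(lambda t: 0.0))                       # = -2.0  (worse)
    print(cost(lambda t: 1.0))                       # ~ 20.1 = e^3*x0 (much worse)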
3.7 (a) We want the final state to be as close as possible to zero. So,
whenever x (t ) is nonzero, we steer optimally fast to zero: u (t ) =
− sgn( x (t )). Once x (t ) is zero, we take u (t ) = 0.
(b) The plot below shows a couple of state trajectories x (t ) as a function
of time:
For (x, t) in the triangle |x| ≤ T − t the final state x(T) is zero, so then V(x, t) = 0. For any (x, t) above the triangle we have x(T) = x − (T − t) = x + t − T, hence V(x, t) = (x + t − T)². Likewise for any (x, t) below the triangle we have x(T) = x + (T − t) = x − t + T, hence V(x, t) = (x − t + T)².
As a formula:
V(x, t) = 0 if |x| ≤ T − t,   V(x, t) = (x + t − T)² if x > T − t,   V(x, t) = (x − t + T)² if x < t − T.
0 + min_{u∈[−1,1]} 0 = 0.
3.9 We momentarily use V_x to mean ∂V(x, t)/∂x, and likewise for V_t.
Cancel the common factor (x − 1)2 and we find the simple ODE
Ṗ (t ) = P 2 (t ), P (T ) = 1/β.
Hence P(t) = 1/(β + T − t), and the closed-loop state equation is
ẋ(t) = (1 − x(t))/(β + T − t),   x(0) = 0,
so x(t) = t/(β + T) and the minimal cost equals V(0, 0) = P(0)(0 − 1)² = 1/(β + T).
(c) limβ↓0 x (T ) = T /T = 1. So it equals the desired voltage 1. This is to be
expected because the quadratic term ( x (T ) − 1)2 /β in the cost func-
tion blows up as β ↓ 0 unless x (T ) → 1.
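A numerical check of (c) — a sketch, assuming NumPy/SciPy, integrating the closed-loop equation from part (b) for a few values of β and an arbitrarily chosen horizon T = 2:

    # Solution 3.9(c): x(T) = T/(beta + T) -> 1 as beta -> 0.
    import numpy as np
    from scipy.integrate import solve_ivp

    T = 2.0
    for beta in [1.0, 0.1, 0.01, 0.001]:
        sol = solve_ivp(lambda t, x: (1.0 - x)/(beta + T - t),
                        [0.0, T], [0.0], max_step=1e-3)
        print(beta, sol.y[0, -1], T/(beta + T))   # simulated x(T) vs the formula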
p ∗ (t ) = et −T if x > 0.
p ∗ (t ) = 1 if x < 0.
(b) For x 0 = 0 the solution of ẋ = xu is zero for all time (not dependent
on bounded u ). Thus every u results in the same cost J .
(c) If x ∗ (t ) > 0 then ∂V ( x ∗ (t ), t )/∂x = et −T . It agrees with the above
p ∗ (t ).
If x ∗ (t ) < 0 then ∂V ( x ∗ (t ), t )/∂x = 1. It agrees with the above p ∗ (t ).
The minimizing u satisfies V′(x) + 2u = 0. Using this V′(x) = −2u, the HJB equation becomes 0 = −2u² + x⁴ + u². Hence u = ∓x² and, so, V′(x) = ±2x². Clearly, V(x) = ⅔|x|³ is the unique solution of the HJB equation that is nonnegative and such that V(0) = 0. The corresponding input is u∗(t) = −x∗(t)|x∗(t)|.
3.17 (a) It is a variation of Example 3.6.1. The infinite horizon HJB equa-
tion (3.30) becomes
min_{u∈ℝ} [ V′(x)(x + u) + u⁴ ] = 0.
ẋ (t ) = f ( x (t ), u (t )), x (t1 ) = x
then, by time-invariance, the shifted x̃ (t ) := x (t − δ), ũ (t ) := u (t − δ)
satisfy the same differential equation but with shifted-time initial
condition
Therefore
Chapter 4
4.1 So A = 3, B = 2,Q = 4, R = 1, S = 0.
(a) H = [[3, −4], [−4, −3]].
(b) Now
[ x∗(t) ; p∗(t) ] = (1/5) [[ 4e^{5t} + e^{−5t}, −2e^{5t} + 2e^{−5t} ], [ −2e^{5t} + 2e^{−5t}, e^{5t} + 4e^{−5t} ]] [ x₀ ; p∗(0) ].
The boundary condition p∗(T) = S x∗(T) = 0 then gives
p∗(0) = ((2e^{5T} − 2e^{−5T}) / (e^{5T} + 4e^{−5T})) x₀.
This determines p ∗ (0) and, therefore, determines x ∗ , p ∗ for all time.
Finally, u ∗ = −R −1 B T p ∗ = −2 p ∗ . So also u ∗ is determined for all time.
The optimal cost is
p∗(0) x₀ = ((2e^{5T} − 2e^{−5T}) / (e^{5T} + 4e^{−5T})) x₀².
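The closed-form optimal cost can be cross-checked against the Riccati differential equation (a sketch, assuming NumPy/SciPy; the horizon T = 1 is an arbitrary choice here):

    # Solution 4.1: p*(0)/x0 from the Hamiltonian boundary-value problem should
    # equal P(0), where P solves dP/dt = -(2AP - (B^2/R)P^2 + Q), P(T) = S = 0.
    import numpy as np
    from scipy.integrate import solve_ivp

    A, B, Q, R, S, T = 3.0, 2.0, 4.0, 1.0, 0.0, 1.0

    hamiltonian_value = (2*np.exp(5*T) - 2*np.exp(-5*T))/(np.exp(5*T) + 4*np.exp(-5*T))

    # Integrate the Riccati differential equation backwards via tau = T - t.
    rde = lambda tau, P: 2*A*P - (B**2/R)*P**2 + Q
    riccati_value = solve_ivp(rde, [0.0, T], [S], rtol=1e-10, atol=1e-12).y[0, -1]

    print(hamiltonian_value, riccati_value)   # both ~ 1.9995: they agree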
(b)
ẋ = A x + B ( u ∗ + v ), x (0) = x0 ,
ẋ ∗ = A x ∗ + B u ∗ , x ∗ (0) = x0 ,
=⇒ ż = A z + B v , z (0) = 0.
J = ∫₀ᵀ xᵀQx + (u∗ + v)ᵀ(u∗ + v) dt,
J∗ = ∫₀ᵀ x∗ᵀQx∗ + u∗ᵀu∗ dt,
⇒ J − J∗ = ∫₀ᵀ (x∗ + z)ᵀQ(x∗ + z) − x∗ᵀQx∗ + vᵀv + 2u∗ᵀv dt
= ∫₀ᵀ (zᵀQz + vᵀv) + 2zᵀQx∗ + 2u∗ᵀv dt.
d/dt (p∗ᵀz) = ṗ∗ᵀz + p∗ᵀż
= (−Qx∗ − Aᵀp∗)ᵀz + p∗ᵀ(Az + Bv)
= −zᵀQx∗ + p∗ᵀBv
= −zᵀQx∗ − u∗ᵀv.
So ∫₀ᵀ zᵀ(t)Qx∗(t) + u∗ᵀ(t)v(t) dt = [−p∗ᵀ(t)z(t)]₀ᵀ = 0 and, therefore,
J − J∗ = ∫₀ᵀ zᵀ(t)Qz(t) + vᵀ(t)v(t) dt ≥ 0.
The sum of these two cancels all cross terms and leaves
∫₀ᵀ 2L(x, u)ᵀQ L(x, u) + 2L(z, w)ᵀQ L(z, w) + 2uᵀRu + 2wᵀRw dt.
J_{[t,T]}(x + z, u_x + w_z) + J_{[t,T]}(x − z, u_x − w_z) = 2V(x, t) + 2V(z, t).
The previous two parts show that the above right-hand side equals
V (x + z, t ) + V (x − z, t ).
(f ) V is the minimal cost, so the left-hand side of the equality of the pre-
vious part is nonnegative, while the right-hand side is non-positive.
So they must both be zero! Hence J [t ,T ] (x + z, u x + w z ) = V (x + z, t ). It
shows that u x + w z is optimal for x +z. Scaling z with a factor λ shows
the result.
(g) Trivial: if u ∗ is linear in x (t ) then so is u ∗ (t ) at every t .
(h) Follows from (a) and the fact that V (x +λz, t ) = J [t ,T ] (x +λz, u x +λ w z ).
4.7 (a) The RDE is Ṗ = −2P + P 2 − 3, P (0) = s. We use Exercise 4.6: we have
Ṗ = (P + 1)(P − 3), so G := 1/(P + 1) satisfies Ġ = 4G − 1. Hence G(t ) =
1/4 + c e^{4t} for some constant c. Exercise 4.6 now says that
P(t) = −1 + 1/(1/4 + c e^{4t}) = (3 − 4c e^{4t}) / (1 + 4c e^{4t}).
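A symbolic check that this P(t) solves the Riccati differential equation (a sketch using SymPy):

    # Solution 4.7(a): P(t) = -1 + 1/(1/4 + c*exp(4t)) satisfies P' = -2P + P^2 - 3.
    import sympy as sp

    t, c = sp.symbols('t c')
    P = -1 + 1/(sp.Rational(1, 4) + c*sp.exp(4*t))
    print(sp.simplify(sp.diff(P, t) - (-2*P + P**2 - 3)))   # 0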
(d) The larger s is the higher the penalty on the final state, so for both
plots a “small” final value x (T ) corresponds to a “large” s. A bit
vague: if s is small and x is close to zero then the dominant term
in the cost function is u 2 . So (as long as x does not change too much
over the rest of time) it is “optimal” to take u small, but then ẋ ≈ x
so x starts to increase in magnitude.
u ∗ (t ) = e+αt v ∗ (t ) = − e+αt R −1 B T P (t ) z (t ) = −P (t ) x (t ).
4.13 (a) Note that PB for B = [0; 1] is the 2nd column of P, so PB = (1/3)[1; 2]. Substituting this and the problem data into
AᵀP + PA − (PB)R⁻¹(PB)ᵀ + Q
and working out the matrix sums entry by entry shows that every entry cancels, so the expression equals the zero matrix and P indeed satisfies the ARE.
(b) A − BR⁻¹BᵀP = [[0, 1], [−1, 0]] − [0; 1] · 3 · (1/3)(1  2) = [[0, 1], [−1, 0]] − [[0, 0], [1, 2]] = [[0, 1], [−2, −2]].
The standard stability test uses eigenvalues. The eigenvalues are the zeros λ of det(λI − (A − BR⁻¹BᵀP)) = det [[λ, −1], [2, λ + 2]] = λ(λ + 2) + 2. So λ_{1,2} = −1 ± i. They have negative real part, hence the closed loop is asymptotically stable.
(An easier test for stability is to verify that V (x) := x T P x is a strong
Lyapunov function for the closed-loop system. Indeed P > 0, and
V̇ (x) = −x T (Q + P B R −1 B T P )x and −(Q + P B R −1 B T P ) < 0.)
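A quick numerical confirmation of the eigenvalue computation in (b) — a sketch, assuming NumPy, using the closed-loop matrix computed above:

    # Solution 4.13(b): spectrum of the closed-loop matrix A - B R^{-1} B^T P.
    import numpy as np

    Acl = np.array([[0.0, 1.0], [-2.0, -2.0]])
    print(np.poly(Acl))               # [1., 2., 2.], i.e. lambda^2 + 2*lambda + 2
    print(np.linalg.eigvals(Acl))     # [-1.+1.j, -1.-1.j]: open left half-plane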
4.15 (a) S = 0, Q = 4, R = 1, A = 3, B = 2. The ARE is −4P² + 6P + 4 = 0. The solutions P of the ARE are P = (−6 ± √(6² + 4³))/(−8), that is P = −1/2 and P = 2. So we need P = 2.
(b) (By the way, Lemma 4.5.6 guarantees that A − B R −1 B T P is asymp-
totically stable because (A, B ) is stabilizable and (Q, A) detectable.)
We have A − B R −1 B T P = 3 − 4 × 2 = 3 − 8 = −5. Asymptotically stable
indeed.
(c) F = −R −1 B T P = −4.
(d) P x 02 = 2x 02 .
(e) The eigenvalues must be ±5 because A − B R −1 B T P = −5.
(f) H = [[3, −4], [−4, −3]], P = 2. Then H [1; 2] = [−5; −10] = −5 [1; 2].
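These numbers are easy to confirm with SciPy's ARE solver (a sketch, assuming NumPy/SciPy):

    # Solution 4.15: stabilizing ARE solution, closed loop, and Hamiltonian spectrum.
    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[3.0]]); B = np.array([[2.0]])
    Q = np.array([[4.0]]); R = np.array([[1.0]])

    P = solve_continuous_are(A, B, Q, R)
    print(P)                                   # [[2.]]
    print(A - B @ np.linalg.inv(R) @ B.T @ P)  # [[-5.]]

    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
    print(np.linalg.eigvals(H))                # eigenvalues +5 and -5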
4.19 (a) Swapping n rows means the determinant gains a factor (−1)n . Multi-
plying n rows with −1 means another (−1)n . Hence, in total a factor
(−1)^{2n} = 1. So the sign of the determinant does not change.
(b) Let Z(λ) be the matrix of (a). Clearly, (Z(−λ))ᵀ = Z(λ). Hence r(λ) = det(Z(λ)) = det((Z(−λ))ᵀ) = r(−λ).
(c) For every zero λ ≠ 0 also −λ is a zero. Suppose it has 2m nonzero zeros λ₁, …, λ_{2m}. Then r(λ) = c λ^{2n−2m} ∏_{i=1}^{m} (λ − λᵢ)(λ + λᵢ) = c (λ²)^{n−m} ∏_{i=1}^{m} (λ² − λᵢ²). It is a function of λ².
Appendix B
B.1 (a) V̇(x̄) = (∂V(x̄)/∂x)ᵀ f(x̄) = (∂V(x̄)/∂x)ᵀ 0 = 0.
(b) Let x̄ 1 , x̄ 2 be two equilibria. Global asymptotic stability of x̄ 1 means
that also x (t ; x̄ 2 ) would have to converge to x̄ 1 , but it does not
because, by definition of equilibrium, we have x (t ; x̄ 2 ) = x̄ 2 for all t .
(c) x̄ is an equilibrium iff A x̄ = 0. If A is nonsingular then x̄ = A −1 0 = 0.
If A is singular then any element in the null space of A is an equilib-
rium (and the null space then has infinitely many elements).
B.3 (a) [ ẋ ; k̇ ] = [ (a − k)x ; x² ],   [ x(0) ; k(0) ] = [ x₀ ; 0 ]. It is an equilibrium iff (x̄, k̄) = (0, k̄) with
k̄ free to choose.
(b) V̇ (x, k) = 2x(a − k)x + 2(k − a)x 2 = 0. This V is C 1 , is positive definite
relative to (x, k) = (0, a). So V is a Lyapunov function (on Ω = R2 ) for
equilibrium (0, a).
B.7 Let Ω = R2 .
The first one is unstable (eigenvalue +1) so also the nonlinear system
is unstable at (0, 0). The second one has imaginary eigenvalues only,
so this says nothing about the stability of the nonlinear system.
(c) The function y − ln(y) − 1 ≥ 0 for all y > 0 (proof: its derivative is zero at y = 1 and its second derivative is > 0, so the function is minimal at y = 1 and there y − ln(y) − 1 = 0 ≥ 0). Likewise V(x₁, x₂) = (x₁ − ln(x₁) − 1) + (x₂ − ln(x₂) − 1) is nonnegative and is minimal at (x₁, x₂) = (1, 1)
(where V = 0). So V is positive definite relative to (1, 1). Clearly it is
also C 1 for all x 1 , x 2 > 0. (Yes, our state space is x 1 , x 2 > 0.) Remains
to analyze V̇ (x):
So V (x) is preserved over time. Hence (1, 1) is stable, but not asymp-
totically stable.
B.11 (a) The first equation says x 1 = 0 or x 1 = 1 or x 1 = 1/2. The second equa-
tion says x 2 = 0. So three equilibria: (0, 0), (1, 0), and (1/2, 0).
(b) Jacobian (for general x) is
[[ −2(x₁ − 1)(2x₁ − 1) − 2x₁(2x₁ − 1) − 2x₁(x₁ − 1),  0 ], [ 0,  −2 ]].
The first two have stable eigenvalues only (so the nonlinear system
is asymptotically stable at (0, 0) and (1, 0)).
(c) The third has an unstable eigenvalue (1/2 > 0) so the nonlinear system at (1/2, 0) is unstable.
B.13 (a) Since the model does not include damping we expect the kinetic energy to be constant, and V(ω) := ½(I₁ω₁² + I₂ω₂² + I₃ω₃²) clearly is positive definite, and is C¹. Now
(e) From the inequality 0 < I₁ < I₂ < I₃ it follows that the Jacobian is of the form (with ω̄₂ ≠ 0):
[[ 0, 0, a ω̄₂ ], [ 0, 0, 0 ], [ b ω̄₂, 0, 0 ]]
(f) In the first part we already showed that V(ω) := ½(I₁ω₁² + I₂ω₂² + I₃ω₃²) is constant over time. Likewise we have for W(ω) defined as W(ω) := I₁²ω₁² + I₂²ω₂² + I₃²ω₃² that
Ẇ(ω) = (I₁(I₂ − I₃) + I₂(I₃ − I₁) + I₃(I₁ − I₂)) 2ω₁ω₂ω₃ = 0.
B.17 For γ > 0 it is essentially the same as Example B.4.6. Anyway, for r := x₁² + x₂² one has ṙ = 2x₁(x₂ + x₁(γ − x₁² − x₂²)) + 2x₂(−x₁ + x₂(γ − x₁² − x₂²)) = 2(x₁² + x₂²)(γ − x₁² − x₂²) = 2r(γ − r). If γ ≤ 0 then ṙ ≤ −2r², so the origin is asymptotically stable (take, e.g., Lyapunov function V(r) = r²). If γ > 0 then the linearization around r = 0 is ṙ = (2γ)r, which is unstable (with eigenvalue 2γ > 0). So also the nonlinear system is unstable in that case.
B.19 (a) P = [[1, 1], [1, 2]].
det(λI − A) = λ² + (α − 2)λ + 1 + α.
p − αq = 1,
2p + (2 − α)q − αr = 0,
2q + (1 − α)r = 1.
Solving these three linear equations gives
p = (2α² − α + 2)/(α² − α − 2),   q = (3α − 2)/(α² − α − 2),   r = (6 + α)/(α² − α − 2).
The matrix P = [[p, q], [q, r]] needs to exist, so we need α ≠ −1 and α ≠ 2. As 2α² − α + 2 > 0 for all α, we have p > 0 iff α² − α − 2 > 0. This is the case iff α > 2 or α < −1. The determinant of P is (2α³ + 2α² + 8α + 8)/(α² − α − 2)². It is positive iff α > −1. This combined with (α < −1 or α > 2) shows that P exists and is unique with P > 0 iff α > 2.
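A symbolic spot-check of this closing argument (a sketch using SymPy):

    # With the stated p, q, r, check the determinant formula and when P > 0.
    import sympy as sp

    a = sp.symbols('alpha', real=True)
    D = a**2 - a - 2
    p, q, r = (2*a**2 - a + 2)/D, (3*a - 2)/D, (6 + a)/D
    P = sp.Matrix([[p, q], [q, r]])

    print(sp.simplify(P.det() - (2*a**3 + 2*a**2 + 8*a + 8)/D**2))   # 0

    for val in [-2, 0, 1, 3, 5]:              # P > 0 only for the values alpha > 2
        Pv = P.subs(a, val)
        print(val, bool(Pv[0, 0] > 0 and Pv.det() > 0))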
Index
B(x, r), 213
C¹, 7
C², 7
H(x, p, u), 51
H_λ(x, p, u), 65
J_T(u), 69
J_[0,T](x₀, u), 87
V(x, τ), 93
V̇(x), 211
∂f(x)/∂x, 189
o, 10
H∞-norm, 172
H∞ optimization, 172
L₂, 169
L₂-gain, 171, 178
L₂-norm, 169

A
action integral, 19
algebraic Riccati equation, 138
ARE, 138
  stabilizing solution, 140
asymptotically stable
  equilibrium, 210
  matrix, 194
attractive equilibrium, 210
augmented
  cost, 49
  function, 202
  running cost, 50
available energy, 177

B
Bellman, 87
Beltrami identity, 13
brachistochrone problem, 3, 14

C
calculus of variations, 29
catenoid, 17
characteristic
  equation, 192
  polynomial, 192
  root, 192
closed-loop system, 98
closed trajectory, 239
concave function, 198
control
  closed-loop, 98
  open-loop, 98
  optimal, 47
controllability, 195
controller, 174
convex
  combination, 198
  function, 198
  minimum principle, 76
  set, 198
cost
  augmented, 50
  criterion, 6
  final, 23, 47
  function, 6
  initial, 23
  running, 6, 47
  terminal, 23, 47
cost-to-go, 224
  discrete time, 92
  optimal, 93
costate, 52, 106
cycloid, 14
D
detectability, 196
Dido's isoperimetric problem, 31
differential dissipation inequality, 177
discount
  factor, 5
  rate, 5
dissipation inequality, 175
dissipativity, 175
du Bois-Reymond, 39
dynamic programming, 87

E
endpoint
  free, 23
energy, 176
equilibrium, 210
  asymptotically stable, 210
  attractive, 210
  stable, 210
  unstable, 210
escape time, 208
Euler, 10
Euler-Lagrange equation, 10
  discrete time, 45
  higher-order, 20
Euler equation, 10

F
filter, 173
final cost, 23, 47
final time, 69
floor, 242
free endpoint, 23
free final time, 69

G
global
  asymptotic stability, 210
  attractive, 210
  Lipschitz condition, 207
Goldschmidt solution, 18

H
Hamilton's principle, 19
Hamilton-Jacobi-Bellman, 96
Hamiltonian, 51
  (ab)normal, 65
  equations, 52
  for LQ, 123
  matrix, 123
  modified, 65
Hessian, 190
HJB, 96
  infinite horizon, 107

I
infinite horizon
  LQ problem, 137
  optimal control problem, 107
initial
  cost, 23
  state, 205
input, 47, 195
  stabilizing, 109, 139
integral constraint, 31
invariant set, 219

J
Jacobian, 190, 230

L
Lagrange, 10
  lemma, 9
  multiplier, 32, 50, 202
Lagrangian, 6, 19
  submanifold, 182
  subspace, 180
LaSalle's invariance principle, 221
Legendre condition, 28
lemma of du Bois-Reymond, 39
linear quadratic optimal control, 121
linearized system, 229
line segment, 198
Lipschitz
  constant, 206
  continuity, 206
  global, 207
  local, 206
T
terminal cost, 47
theorem of alternatives, 203
time
  escape, 208
  optimal control, 70
tuning parameter, 149

U
unstable equilibrium, 210

V
value function, 93
  infinite horizon, 107
Van der Pol equation, 236

W
Weierstrass necessary condition, 115
Wirtinger inequality, 159

Z
Zermelo, 70