Numerical Methods for Eigenvalue Problems
De Gruyter
Mathematics Subject Classification: Primary 15A18, 15A22, 15A23, 15A42, 65F15, 65F25; Secondary 65N25, 15B57.
ISBN 978-3-11-025033-6
e-ISBN 978-3-11-025037-4
Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the internet at http://dnb.d-nb.de.
Printed in Germany
www.degruyter.com
Preface
Eigenvalues and eigenvectors of matrices and linear operators play an important role
when solving problems from structural mechanics and electrodynamics, e.g., by describing the resonance frequencies of systems, when investigating the long-term behaviour of stochastic processes, e.g., by describing invariant probability measures,
and as a tool for solving more general mathematical problems, e.g., by diagonalizing
ordinary differential equations or systems from control theory.
This book presents a number of the most important numerical methods for finding
eigenvalues and eigenvectors of matrices. It is based on lecture notes of a short course
for third-year students in mathematics, but it should also be accessible to students of
physics or engineering sciences.
We discuss the central ideas underlying the different algorithms and introduce the
theoretical concepts required to analyze their behaviour. Our goal is to present an
easily accessible introduction to the field, including rigorous proofs of all important
results, but not a complete overview of the vast body of research.
For an in-depth coverage of the theory of eigenvalue problems, we can recommend
the following monographs:
J. H. Wilkinson, “The Algebraic Eigenvalue Problem” [52]
B. N. Parlett, “The Symmetric Eigenvalue Problem” [33]
G. H. Golub and C. F. Van Loan, “Matrix Computations” [18]
G. W. Stewart, “Matrix Algorithms” [43, 44]
D. S. Watkins, “The matrix eigenvalue problem” [49]
We owe a great debt of gratitude to their authors, since this book would not exist
without their work.
The book is intended as the basis for a short course (one semester or trimester) for
third- or fourth-year undergraduate students. We have organized the material mostly
in short sections that should each fit one session of a course. Some chapters and sec-
tions are marked by an asterisk (*). These contain additional results that we consider
optional, e.g., rather technical proofs of general results or algorithms for special prob-
lems. With one exception, the results of these optional sections are not required for
the remainder of the book. The one exception is the optional Section 2.7 on non-
unitary transformations, which lays the groundwork for the optional Section 4.8 on
the convergence of the power iteration for general matrices.
Contents

Preface
1 Introduction
1.1 Example: Structural mechanics
1.2 Example: Stochastic processes
1.3 Example: Systems of linear differential equations
2 Existence and properties of eigenvalues and eigenvectors
2.1 Eigenvalues and eigenvectors
2.2 Characteristic polynomials
2.3 Similarity transformations
2.4 Some properties of Hilbert spaces
2.5 Invariant subspaces
2.6 Schur decomposition
2.7 Non-unitary transformations
3 Jacobi iteration
3.1 Iterated similarity transformations
3.2 Two-dimensional Schur decomposition
3.3 One step of the iteration
3.4 Error estimates
3.5 Quadratic convergence
4 Power methods
4.1 Power iteration
4.2 Rayleigh quotient
4.3 Residual-based error control
4.4 Inverse iteration
4.5 Rayleigh iteration
4.6 Convergence to invariant subspace
Introduction
where u(t, x) denotes the deflection of the string at time t and position x (cf. Figure 1.1).
The oscillations are then described by the wave equation
∂²u/∂t²(t, x) = c ∂²u/∂x²(t, x)   for all t ∈ ℝ, x ∈ (0, 1),
where c > 0 is a parameter describing the string’s properties (e.g., its thickness). We
assume that the string is fixed at both ends, i.e., that u(t, 0) = 0 = u(t, 1) holds for all t ∈ ℝ.
Since the differential equation is linear, we can separate the variables: we write u in
the form
u(t, x) = u₀(x) cos(ωt)   for all t ∈ ℝ, x ∈ [0, 1],

with a function

u₀ : [0, 1] → ℝ,   x ↦ u₀(x),
depending only on the location, but not on the time. The differential equation takes
the form
−ω² u₀(x) cos(ωt) = ∂²u/∂t²(t, x) = c ∂²u/∂x²(t, x) = c u₀''(x) cos(ωt)   for all t ∈ ℝ, x ∈ (0, 1).

Comparing coefficients and introducing λ := ω²/c and the operator L[u₀] := −u₀'', we obtain

L[u₀] = λ u₀.

This is an eigenvalue problem in the infinite-dimensional space C²(0, 1).
In order to be able to treat it by a numerical method, we have to discretize the
problem. A simple approach is the finite difference method: Taylor expansion yields
u₀(x + h) = u₀(x) + h u₀'(x) + (h²/2) u₀''(x) + (h³/6) u₀⁽³⁾(x) + (h⁴/24) u₀⁽⁴⁾(ζ₊),
u₀(x − h) = u₀(x) − h u₀'(x) + (h²/2) u₀''(x) − (h³/6) u₀⁽³⁾(x) + (h⁴/24) u₀⁽⁴⁾(ζ₋)
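The text continues with the resulting discretization. Purely as an illustration (not code from the book), the following Python sketch assembles the standard second-order finite difference matrix for L[u₀] = −u₀'' on a uniform grid with n interior points and mesh width h = 1/(n + 1), and compares its smallest eigenvalues with the exact values (kπ)²; the variable names and the use of numpy are choices made for this example only.

import numpy as np

# Sketch (assumption, not the book's code): central difference discretization of
# L[u0] = -u0'' on (0, 1) with n interior grid points and h = 1/(n + 1).
# The matrix is tridiagonal with 2/h^2 on the diagonal and -1/h^2 beside it;
# its smallest eigenvalues approximate the exact eigenvalues (k*pi)^2 of L.
n = 100
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
eigenvalues = np.sort(np.linalg.eigvalsh(A))
print(eigenvalues[:3])                      # approximately pi^2, 4*pi^2, 9*pi^2
print([(k * np.pi)**2 for k in (1, 2, 3)])  # exact eigenvalues of L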
and assume that the random user chooses each of the links with equal probability 1/ℓⱼ. In order to make this approach feasible, we have to assume ℓⱼ ≠ 0 for all j ∈ {1, …, n}, i.e., we have to assume that each page contains at least one link.
This means that the probability of switching from page j to page i is given by
s_ij := ℓ_ij / ℓ_j = { 1/ℓ_j  if ℓ_ij = 1,
                       0      otherwise }   for all i, j ∈ {1, …, n}.
The probability of visiting page i in step m C 1 is given by
p_i^(m+1) = Σ_{j=1}^{n} s_ij p_j^(m)   for all i ∈ {1, …, n}, m ∈ ℕ₀.
to determine the “importance” of a web page: if pj is large, the probability of a user
visiting the j -th web page is high, therefore it is assumed to be important. Due to
p* = lim_{m→∞} p^(m) = lim_{m→∞} p^(m+1) = lim_{m→∞} S p^(m) = S lim_{m→∞} p^(m) = S p*,
In this example, we not only reduce a problem related to stochastic processes (the
“random walk” of the web user) to an algebraic eigenvalue problem, but we also find
a simple algorithm for computing at least the eigenvector: due to (1.4), we can hope
to approximate p by computing p .m/ for a sufficiently large value of m. Due to
(1.3), this requires only m matrix-vector multiplications, and since we can assume
that each web page contains only a small number of links, these multiplications can
be performed very efficiently.
In order to ensure convergence, the PageRank algorithm replaces the matrix S by the matrix Ŝ = (ŝ_ij)^n_{i,j=1} given by
describing the event that a user switches to a different page without following a link:
when visiting page j , the user either follows one of the links with a total probability
of 1 − α or switches to another page with a total probability of α. In the latter case, uᵢ is the relative probability of switching to page i. For a suitable choice of u (e.g., a vector with strictly positive entries and u₁ + ⋯ + uₙ = 1), the Perron–Frobenius theorem [34] ensures convergence of the sequence (p^(m))_{m=0}^∞ to p*.
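As a concrete illustration of the iteration described above (not code from the book), the following Python sketch runs the damped iteration p^(m+1) = (1 − α) S p^(m) + α u on a small made-up link matrix; the names L, S, alpha and u are assumptions of this example only.

import numpy as np

# Sketch (illustrative): power iteration for the damped PageRank matrix
# S_hat = (1 - alpha) * S + alpha * u * 1^T on a made-up 4-page web.
L = np.array([[0, 1, 1, 0],               # L[i, j] = 1 if page j links to page i
              [1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
S = L / L.sum(axis=0)                      # column j scaled by 1/ell_j
alpha, n = 0.15, L.shape[0]
u = np.full(n, 1.0 / n)                    # strictly positive, entries sum to 1
p = np.full(n, 1.0 / n)                    # start with the uniform distribution
for m in range(100):
    p = (1 - alpha) * (S @ p) + alpha * u  # p^(m+1) = S_hat p^(m), since sum(p) = 1
print(p)                                   # approximates the invariant vector p*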
λ e^{λt} y₀ = y'(t) = A y(t) = A e^{λt} y₀,

and dividing by e^{λt} ≠ 0 yields

λ y₀ = A y₀.
Σ_{k=0}^{ℓ} A_k y^(k) = A_ℓ y^(ℓ) + A_{ℓ−1} y^(ℓ−1) + ⋯ + A₂ y'' + A₁ y' + A₀ y = 0,

where A₀, …, A_ℓ ∈ F^{n×n}. In this case the ansatz y(t) = e^{λt} y₀ for some nonzero vector y₀ ∈ F^n yields the identity

Σ_{k=0}^{ℓ} A_k λ^k e^{λt} y₀ = 0,
M y''(t) + C y'(t) + K y(t) = 0,

where M, C, K are n × n matrices called mass matrix, damping matrix, and stiffness matrix, respectively. The corresponding quadratic eigenvalue problem has the form

(λ² M + λ C + K) x = 0.
A simple example for the case n D 1 is the spring-mass system with damping by
friction, see Figure 1.2.
By Hooke’s law, the equation of motion for this system without friction is
m y''(t) + k y(t) = 0,
where m is the mass attached to the spring and k is the spring constant. If friction
is considered, it is usually modeled in such a way that the friction is assumed to be
proportional to the velocity y'(t), thus yielding the equation of motion

m y''(t) + c y'(t) + k y(t) = 0,
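To connect this equation of motion back to an eigenvalue problem, here is a small sketch with made-up values for m, c and k (not from the book): the roots of m λ² + c λ + k = 0 obtained from the exponential ansatz coincide with the eigenvalues of the matrix of the equivalent first-order system y' = Ay.

import numpy as np

# Sketch with made-up parameters: the ansatz y(t) = exp(lambda*t) y0 turns
# m y'' + c y' + k y = 0 into m*l^2 + c*l + k = 0, whose roots coincide with
# the eigenvalues of the first-order system matrix for (y, y').
m, c, k = 1.0, 0.4, 2.0
roots = np.roots([m, c, k])                   # zeros of the quadratic polynomial
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])              # y' = A y with y = (y, y')
print(np.sort_complex(roots))
print(np.sort_complex(np.linalg.eigvals(A)))  # same values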
Summary
This chapter investigates existence and uniqueness of eigenvalues and eigenvectors
for a given matrix. The key result is the Schur decomposition introduced in Theo-
rem 2.46, a very useful tool for the investigation of eigenvalue problems. One of its
most important consequences is the fact that a matrix can be diagonalized unitarily
if and only if it is normal. The optional Section 2.7 presents a block-diagonalization
result for general square matrices.
Learning targets
Recall the definition and some of the most important properties of eigenvalues,
eigenvectors, similarity transformations and the characteristic polynomial corre-
sponding to a matrix.
Introduce a number of basic concepts of Hilbert space theory, e.g., the Cauchy–
Schwarz inequality, self-adjoint, normal, isometric and unitary matrices.
Prove the existence of the Schur decomposition.
Use it to establish the existence of eigenvector bases for normal and self-adjoint
matrices and of invariant subspaces in the general case.
Zero coefficients in a matrix are frequently omitted in our notation, e.g., the n-di-
mensional identity matrix is usually represented by
Iₙ = diag(1, …, 1).
If the dimension is clear from the context, we denote the identity matrix by I .
The product of a matrix A ∈ F^{n×m} by a vector x ∈ F^m is given by

(Ax)ᵢ = a_{i1} x₁ + ⋯ + a_{im} x_m = Σ_{j=1}^{m} a_{ij} xⱼ   for all i ∈ {1, …, n}. (2.1)
The mapping
F^m → F^n,   x ↦ Ax
Definition 2.1 (Null space and range). Let A ∈ F^{n×m}. The space

N(A) := {x ∈ F^m : Ax = 0}

is called the null space of A, and its dimension is called the nullity of A. The space
We recall the rank-nullity theorem: let (yᵢ)^k_{i=1} be a basis of R(A). By definition, we can find (x̂ᵢ)^k_{i=1} in F^m such that

A x̂ᵢ = yᵢ   for all i ∈ {1, …, k}.

Since the family (yᵢ)^k_{i=1} is linearly independent, the same holds for (x̂ᵢ)^k_{i=1}. This means that we can expand the family to a basis (x̂ᵢ)^m_{i=1} of F^m. For each
Ax = λx (2.3)

holds. Any such vector is called an eigenvector of A for the eigenvalue λ. A pair (λ, x) consisting of an eigenvalue and a corresponding eigenvector is called an eigenpair.
The set

σ(A) := {λ ∈ F : λ is an eigenvalue of A}

is called the spectrum of A.

Exercise 2.5 (Nilpotent matrix). Let N ∈ F^{n×n} be a nilpotent matrix, i.e., let there be a k ∈ ℕ with N^k = 0. Prove σ(N) ⊆ {0}.
Prove σ(A) = ∅, i.e., show that A has no eigenvalues. Does the situation change if we let F = ℂ?
Proof. We first observe that for all x ∈ F^n and all λ ∈ F, the following statements are equivalent:

λx = Ax,
λx − Ax = 0,
(λI − A)x = 0,
x ∈ N(λI − A).

E(A, λ) := N(λI − A)

Definition 2.9 (Geometric multiplicity). Let A ∈ F^{n×n}, and let λ ∈ σ(A). The dimension of the eigenspace E(A, λ) is called the geometric multiplicity of λ and denoted by μ_g(A, λ).
The matrix A is injective if and only if its columns a1 ; : : : ; am are linearly indepen-
dent.
det.x1 ; : : : ; xi 1 ; xi ; xi C1 ; : : : ; xj 1 ; xj ; xj C1 ; : : : ; xn /
D det.x1 ; : : : ; xi 1 ; xj ; xi C1 ; : : : ; xj 1 ; xi ; xj C1 ; : : : ; xn /
for all x1 ; : : : ; xn 2 F n ; i; j 2 ¹1; : : : ; nº with i < j
and satisfies det.ı1 ; : : : ; ın / D 1.
Let x1 ; : : : ; xn 2 F n . If there are i; j 2 ¹1; : : : ; nº with i < j and xi D xj , the
fact that the determinant is alternating implies
det.x1 ; : : : ; xi 1 ; xi ; xi C1 ; : : : ; xj 1 ; xj ; xj C1 ; : : : ; xn /
D det.x1 ; : : : ; xi 1 ; xj ; xi C1 ; : : : ; xj 1 ; xi ; xj C1 ; : : : ; xn /
D det.x1 ; : : : ; xi 1 ; xi ; xi C1 ; : : : ; xj 1 ; xj ; xj C1 ; : : : ; xn /;
and therefore det.x1 ; : : : ; xn / D 0. Since the determinant is also multilinear, we can
add any multiple of any argument to any other argument without changing the value
of the determinant. In particular, it is possible to prove that the determinant is equal
to zero if and only if its arguments are linearly dependent.
This means that a matrix A ∈ F^{n×n} is invertible if and only if det(a₁, …, aₙ) ≠ 0, i.e., if the determinant of its column vectors does not vanish. We extend the definition of the determinant to square matrices by setting

det : F^{n×n} → F,   A ↦ det(a₁, …, aₙ),
and have det.A/ ¤ 0 if and only if A is invertible. Combining this property with
Proposition 2.7, we obtain a characterization of eigenvalues that does not require
eigenvectors:
p_A : F → F,   t ↦ det(tI − A),
Hint: the fundamental theorem of algebra states that every non-constant complex-
valued polynomial has at least one root.
Prove
This means that, given a monic polynomial p of order n, we can find a matrix C ∈ F^{n×n} such that p_C = p holds, i.e., the task of finding the zeros of a polynomial and the task of finding the eigenvalues of a matrix are equivalent. The matrix C is called the companion matrix of the polynomial p.
(Hint: use Laplace’s theorem and induction).
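As a hedged illustration of this equivalence, the following sketch builds one common form of the companion matrix for a cubic monic polynomial and checks that its eigenvalues are the zeros of p; the exercise may arrange the entries of C differently.

import numpy as np

# Sketch (one common convention for C): companion matrix of the monic
# polynomial p(t) = t^3 + a2*t^2 + a1*t + a0.
a0, a1, a2 = -6.0, 11.0, -6.0               # p(t) = (t - 1)(t - 2)(t - 3)
C = np.array([[0.0, 0.0, -a0],
              [1.0, 0.0, -a1],
              [0.0, 1.0, -a2]])
print(np.sort(np.linalg.eigvals(C).real))   # 1.0, 2.0, 3.0, the zeros of p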
Ax = λx.

Let now B ∈ F^{n×n} be an invertible matrix. Multiplying our equation on both sides with B⁻¹ yields

B⁻¹ A x = λ B⁻¹ x,

but since B⁻¹ appears on the right-hand side, this equation is no longer related to eigenvalue problems. We fix this problem by introducing x̂ := B⁻¹x and find

B⁻¹ A B x̂ = λ x̂.
This is again an eigenvalue problem. Instead of looking for eigenvalues and eigenvec-
tors of A, we can also look for eigenvalues and eigenvectors of the transformed matrix
A WD B 1 AB. Any eigenvalue of b
b A is also an eigenvalue of A, and any eigenvector
of b
A can be transformed to an eigenvector of A by multiplying by B.
b
A D B 1 AB:
g .A; / D g .b
A; / and a .A; / D a .b
A; / for all 2 .A/ D .b
A/.
bx D B 1 ABB 1 x D B 1 Ax D B 1 x D b
Ab x;
x is an eigenvector of b
so b A, and therefore 2 .b
A/. This implies
.A/ .b
A/; E.A; / BE.b
A; / for all 2 .A/:
.A/ D .b
A/; E.A; / D BE.b
A; /; g .A; / D g .b
A; / for all 2 .A/:
Since the determinant of the product of two matrices is the product of their deter-
minants, we have
pb
A
.t / D det.tI bA/ D det.tB 1 B B 1 AB/ D det.B 1 .tI A/B/
D det.B 1 / det.tI A/ det.B/ D det.B 1 / det.B/ det.tI A/
D det.B 1 B/pA .t / D det.I /pA .t / D pA .t / for all t 2 F;
Proposition 2.19 (Geometric and algebraic multiplicity). Let A ∈ F^{n×n}, and let λ ∈ σ(A). Then we have

μ_g(A, λ) ≤ μ_a(A, λ).
We can extend .ej /jkD1 to a basis .ej /jnD1 of F n . Then the matrix E 2 F nn
defined by using the basis vectors as columns, i.e., by
b
A WD E 1 AE:
and therefore
b 1 Ik B
A WD E AE D
C
with B 2 F k.nk/ and C 2 F .nk/.nk/ . Since A and b A are similar, we can apply
Proposition 2.18 to get
b tIk Ik B
pA .t / D pb .t / D det.tIn A/ D det
A tInk C
D det..t /Ik / det.tIkn C /
D .t /k det.tIkn t / for all t 2 F;
A = B D B⁻¹.
Proof. Assume that a basis .ej /jnD1 of eigenvectors exists, and let .j /jnD1 be the
corresponding eigenvalues. Then the matrix E 2 F nn defined by (2.7) is invertible
and satisfies
E 1 AEıj D E 1 Aej D E 1 j ej D j E 1 Eıj D j ıj for all j 2 ¹1; : : : ; nº;
so D WD E 1 AE is a diagonal matrix, and therefore A is diagonalizable.
Assume now that A is diagonalizable, and let E 2 F nn be a invertible matrix and
D 2 F nn a diagonal matrix satisfying A D EDE 1 . We define vectors .ej /jnD1 by
ej WD Eıj for all j 2 ¹1; : : : ; nº
and obtain
Aej D EDE 1 ej D EDıj D Edjj ıj D djj ej ;
i.e., ej is an eigenvector corresponding to the eigenvalue djj . Since E is invertible,
the vectors .ej /jnD1 are linearly independent and therefore a basis of F n .
The matrix exponential is useful for solving linear ordinary differential equations: prove

∂/∂t exp(tA) = A exp(tA)   for all t ∈ ℝ.

Use Exercise 2.24 to describe the space of solutions t ↦ (u(t), v(t)) explicitly.
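A quick numerical check of this identity (purely illustrative, not part of the text), using scipy's matrix exponential and made-up values for A and t:

import numpy as np
from scipy.linalg import expm

# Sketch: verify d/dt exp(tA) = A exp(tA) with a central difference quotient.
A = np.array([[0.0, 1.0],
              [-2.0, -0.3]])
t, h = 0.7, 1e-6
derivative = (expm((t + h) * A) - expm((t - h) * A)) / (2 * h)
print(np.max(np.abs(derivative - A @ expm(t * A))))   # close to zero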
One of the most important consequences of the close relationship between the inner
product and the norm is the Cauchy–Schwarz inequality. Its proof serves to illustrate
typical Hilbert space arguments:
Both sides are equal if and only if x and y are linearly dependent.
and obtain
jhx; yij2 jhx; yij2 jhx; yij2
0 kx yk2 D kxk2 2 C kyk 2
D kxk 2
; (2.10)
kyk2 kyk4 kyk2
and multiplying both sides by kyk2 yields the estimate (2.9).
If we have equality, i.e., if jhx; yij D kxk kyk holds, (2.10) implies
kx yk D 0.
Lemma 2.29 (Null space and range). Let n; m 2 N and A 2 F nm . We have
Proof. Let x 2 R.A/ and y 2 N .A /. By definition, we can find z 2 F m such that
x D Az and obtain
since y 2 N .A /.
kxk2 D hx; xi D 0
by applying (2.12) to y D x.
we have A D 0.
N
i.e., A e D e.
A real unitary matrix is usually also called orthogonal. We always use the term
“unitary” in the interest of consistency.
i.e., we have
X
j
Rıj D rij ıi 2 span¹ı1 ; : : : ; ıj º:
i D1
Eigenspaces share the same structure: if A 2 F nn is a matrix and 2 .A/ is one
of its eigenvalues, we have
In this sense, subspaces with the property (2.18) are a generalization of eigenspaces.
Exercise 2.42 (Invariant subspaces). Find at least five different invariant sub-
spaces for the matrix
0 1
2
B4 3 C
A WD B
@
C 2 C 44 :
7 8A
6
AX D Xƒ (2.20)
Proof. Assume that X is invariant with respect to A. Let j 2 ¹1; : : : ; mº. Due to
X ıj 2 R.X / D X, we have
AXıj 2 X D R.X /;
Assume now that (2.20) holds. For x 2 X D R.X /, we can find y 2 F m with
x D Xy and obtain
The equation (2.20) bears a close resemblance to the equation (2.3) defining eigen-
values and eigenvectors: the eigenvector x is replaced by the matrix X, and the eigen-
value is replaced by the matrix ƒ.
Exercise 2.44 (Invariant subspaces and eigenvalues). Show that any eigenvalue
of the matrix ƒ in (2.20) is an eigenvalue of A if X is injective.
If is an eigenvalue of A for an eigenvector x 2 F n with x 2 R.X /, is it also
an eigenvalue of ƒ?
(where we identify F with F 11 and F n with F n1 in the obvious way) is called an
elementary Householder reflection [21]. It is unitary and self-adjoint and satisfies
By the fundamental theorem of algebra, any complex polynomial has at least one
zero. Applying this result to the characteristic polynomial yields that any complex ma-
trix has at least one eigenvalue, and we can use a Householder reflection to transform
the matrix to the “almost upper triangular” form (2.21). Repeating the process allows
us to reach an upper triangular matrix using only unitary similarity transformations.
Q* A Q = R. (2.24)
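For illustration (not part of the text), the following sketch computes a complex Schur decomposition with scipy and checks the properties stated in Theorem 2.46; the random test matrix is an assumption of this example.

import numpy as np
from scipy.linalg import schur

# Sketch: a complex Schur decomposition A = Q R Q^* for a random matrix; R is
# upper triangular and carries the eigenvalues of A on its diagonal.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
R, Q = schur(A, output='complex')
print(np.allclose(Q @ R @ Q.conj().T, A))    # True: A = Q R Q^*
print(np.allclose(np.tril(R, -1), 0))        # True: R is upper triangular
print(np.sort_complex(np.diag(R)))           # the eigenvalues of A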
P q D sgn.q1 /ı1 :
Since b
A is only an n n matrix, we can apply the induction assumption and find a
b 2 C nn with
b 2 C nn and an upper triangular matrix R
unitary matrix Q
b b
Q AQb D R:
b
Now we let
!
1 BQb
Q WD P b ; R WD
Q b
R
and observe
! !
1 1 1 B 1
Q AQ D b b D
P AP
b
b b
Q Q Q A Q
! !
BQb BQb
D D D R:
b b
Q AQb b
R
Theorem 2.46 relies on the fundamental theorem of algebra that holds in C, but
not in R. Since most of the following results are corollaries, they also hold only
in the field of complex numbers.
Remark 2.47 (Order of eigenvalues). In the proof of Theorem 2.46, we can put any
eigenvalue 2 .A/ into the upper left entry of R, and by extension we can choose
any order for the eigenvalues on the diagonal of R.
X
n
tr.A/ WD ai i ;
i D1
and conclude X
tr.A/ D a .A; /;
2.A/
While the upper triangular form is already useful, e.g., for finding all eigenvalues
and the corresponding algebraic multiplicities, we are mainly interested in diagonal-
izing a matrix. The following Lemma suggests a possible approach:
Due to ! !
jr11 j2 jr11 j2
bRb D RR D R R D b ;
b R
R R
b is normal, and we can use the induction assumption to complete the
the matrix R
proof.
Corollary 2.50 (Normal matrix). Let n 2 N, and let A 2 C nn . A is normal if and
only if it is unitarily diagonalizable, i.e., if there are a unitary matrix Q 2 C nn and
a diagonal matrix D 2 C nn with
Q AQ D D: (2.25)
Proof. Assume that A is normal. Due to Theorem 2.46, we can find a unitary matrix
Q 2 C nn and an upper triangular matrix R 2 C nn with
A D QRQ :
so R is also normal, and Lemma 2.49 yields that R is, in fact, diagonal.
Assume now that we can find a unitary matrix Q 2 C nn and a diagonal matrix
D 2 C nn satisfying (2.25). Then we have
Proof. Assume that A is self-adjoint. Then it is also normal, and Corollary 2.50 gives
us a unitary matrix Q 2 C nn and a diagonal matrix D 2 C nn with A D QDQ .
Since A is self-adjoint, we obtain
D D Q AQ D Q A Q D .Q AQ/ D D ;
A D QDQ D QD Q D A
Our proofs so far depend on the fundamental theorem of algebra and therefore apply
only to complex matrices. Even if A is self-adjoint and real, Corollary 2.51 still may
yield a complex matrix Q. By slightly modifying the proof of Theorem 2.46, we can
improve the result:
Proof. Assume that we can find a unitary matrix Q 2 Rnn and a diagonal matrix
D 2 Rnn satisfying (2.27). Then we have
A D QDQ D QD Q D A ;
so A has to be self-adjoint.
Assume now that A is self-adjoint. According to Corollary 2.51, we can find an
eigenvalue 2 R of A, and due to N .I A/ ¤ ¹0º also a real eigenvector q 2 Rn
with kqk D 1. Using this vector, we can proceed by induction as in the proof of
Theorem 2.46.
X
n X
m X
m X
n
kAk2F D jaij j2 D jbj i j2 D kBk2F D kA k2F :
i D1 j D1 j D1 i D1
Let Q 2 F nn be a unitary matrix. Using the canonical unit vectors .ıj /jmD1 in F m
defined by (2.5), we obtain
X
m X
n X
m X
n X
m
kAk2F D jaij j2 D j.Aıj /i j2 D kAıj k2
j D1 i D1 j D1 i D1 j D1
X m
D kQAıj k2 D kQAk2F
j D1
We measure the “distance” to diagonal form by taking the Frobenius norm of all
entries except for the diagonal.
Q1 R1 Q1 D A D Q2 R2 Q2 :
Proof. (cf. [18, Section 7.1]) Since A, R1 and R2 are similar matrices, Proposition
2.18 yields
Y
n Y
n
.r1;i i / D pR1 ./ D pA ./ D pR2 ./ D .r2;jj / for all 2 C;
iD1 j D1
where r1;i i and r2;jj denote the i -th and j -th diagonal elements of R1 and R2 , re-
spectively. Two factorizations of pA into linear factors can only differ by the ordering
of the factors, and we obtain
X
n X
n
jr1;i i j2 D jr2;jj j2 :
i D1 j D1
.A/ WD off.R/
equals zero. In this sense, .A/ measures the departure from normality of A, and due
to Proposition 2.56, it does not depend on the particular Schur decomposition.
with R11 2 C kk , R12 2 C k` and R22 2 C `` with n D k C `. We consider the
similarity transformation
I X 1 I X
T WD ; T D (2.29)
I I
If it is injective, the rank-nullity theorem implies that it is also bijective, and therefore equation (2.30) is uniquely solvable for all C ∈ ℂ^{n×m}.
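Equation (2.30) is a Sylvester-type equation S[X] = AX − XB = C. As a hedged illustration, the sketch below solves such an equation with scipy for made-up matrices whose spectra are disjoint; scipy's solver expects the form AX + XB = C, so −B is passed.

import numpy as np
from scipy.linalg import solve_sylvester

# Sketch: solve A X - X B = C, uniquely solvable when sigma(A) and sigma(B)
# are disjoint. scipy solves A X + X B = C, hence the sign flip on B.
A = np.array([[4.0, 1.0], [0.0, 3.0]])     # spectrum {4, 3}
B = np.array([[1.0, 2.0], [0.0, -1.0]])    # spectrum {1, -1}, disjoint from A's
C = np.array([[1.0, 0.0], [2.0, 1.0]])
X = solve_sylvester(A, -B, C)              # A X + X (-B) = C, i.e. A X - X B = C
print(np.allclose(A @ X - X @ B, C))       # True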
We will prove that σ(A) ∩ σ(B) ≠ ∅ is equivalent to S being not injective.
Let first σ(A) ∩ σ(B) ≠ ∅. We pick λ ∈ σ(A) ∩ σ(B) and eigenvectors x ∈ ℂ^n \ {0} and y ∈ ℂ^m \ {0} satisfying

Ax = λx,   B*y = λ̄ y.
The latter is possible due to Proposition 2.30. Now we let X WD xy and find
i.e., the nullspace of S contains the non-zero matrix X , therefore S cannot be injec-
tive.
Let now S be not injective. Then we can find X 2 C nm n ¹0º with S ŒX D 0, i.e.,
AX D XB: (2.31)
We first consider a special case: assume that B is upper triangular. Let j 2 ¹1; : : : ; mº
be the smallest index satisfying x WD X ıj ¤ 0, i.e., the number of the first non-zero
column of X. Then we have
X
j X
j
Ax D AXıj D XBıj D X bij ıi D bij X ıi :
i D1 i D1
Ax D bjj X ıj D bjj x;
i.e., x is an eigenvector of A for the eigenvalue bjj . Since B is upper triangular, bjj is
also an eigenvalue of B, i.e., bjj 2 .A/ \ .B/.
Now we return to the general case. According to Theorem 2.46, we can find a Schur
decomposition of B, i.e., a unitary matrix Q 2 C mm and an upper triangular matrix
b such that
B
B D QBQb
holds. (2.31) yields
b ;
AX D XB D XQBQ b
AXQ D XQB;
b WD XQ, we find
due to Q Q D I , and introducing X
bDX
AX b B:
b
If we can ensure .R11 / \ .R22 / D ; in (2.28), Proposition 2.57 implies that the
transformation T introduced in (2.29) can eliminate R12 . Repeating this procedure
allows us to transform any square complex matrix to block-diagonal form:
holds for an invertible matrix B 2 C nn . For each ` 2 ¹1; : : : ; kº, we have .R` / D
¹` º and n` D a .A; ` /.
Proof. (cf. [18, Theorem 7.1.6]) By induction on the cardinality #.A/ of the spec-
trum. If #.A/ D 1, the Schur decomposition of Theorem 2.46 yields the desired
factorization with B WD Q and R1 WD R.
Assume now we have a k 2 N such that the desired factorization exists for all
A 2 C nn satisfying #.A/ D k.
Let A 2 C nn be given with #.A/ D k C 1 and .A/ D ¹1 ; : : : ; kC1 º. Due to
Theorem 2.46, we can find a unitary matrix Q 2 C nn and an upper triangular matrix
R 2 C nn satisfying
Q AQ D R:
We define
n1 WD #¹i 2 ¹1; : : : ; nº W ri i D 1 º; nO WD n n1 :
ri i D 1 for all i n1 ;
ri i ¤ 1 for all i > n1 :
b 2 Rn
We split R into R1 2 Rn1 n1 , R O nO and Y 2 Rn1 nO such that
R1 Y
RD b :
R
b D Y:
R1 X X R
We let
I
B WD QT b
B
and obtain
1 I 1 I I R1 I
B AB D b1 T Q AQT b D b1 b b
B B B R B
0 1
R1
B C
R1 B R2 C
D b1 R
bBb D B :: C:
B @ : A
RkC1
Applying Proposition 2.18 to the representation (2.32) and taking advantage of the
fact that the determinant of a block diagonal matrix is the product of the determinants
of the diagonal blocks yields
Y
k Y
k
pA ./ D pR` ./ D . ` /n` for all 2 C;
`D1 `D1
the vector x is called a generalized eigenvector of A for the eigenvalue . The smallest
integer k 2 N satisfying (2.34) is called the order of the generalized eigenvector.
more specialized Jordan normal form of the matrix A. For our purposes, however, the
block-diagonal decomposition of Theorem 2.58 is sufficient.
Chapter 3
Jacobi iteration
Summary
In this chapter, we consider the simple, but reliable, Jacobi iteration. Given a self-
adjoint matrix A 2 F nn , it performs a sequence of unitary similarity transformations
that eliminate off-diagonal entries and thus moves the matrix closer to diagonal form.
The Jacobi iteration may not be the fastest method, but it is guaranteed to converge
for any self-adjoint matrix. The optional Section 3.5 proves that the Jacobi iteration
will converge quadratically if a matrix is already close to diagonal form.
Learning targets
Compute the two-dimensional Schur decomposition in a numerically stable way.
Construct the elementary Jacobi step that eliminates a sub-diagonal entry of a
matrix (and, due to symmetry, a corresponding super-diagonal entry as well).
Use these elementary steps to construct a globally convergent iteration.
Derive a posteriori bounds for the iteration error that allow us to judge the accu-
racy of each iteration step.
yields
b m AQ
Am D Q bm for all m 2 N0 ;
Multiplying by b, we obtain
2 ad 2 a d 2 .a d /2
0 D .bt / 2 .bt / jbj D bt jbj2 :
2 2 4
Taking the square root yields
s sˇ ˇ
ad .a d /2 ad jbj ˇˇ a d ˇˇ2
bt D ˙ C jbj2 ; tD ˙ C 1:
2 4 2b b ˇ 2b ˇ
We introduce
ad
WD ;
2b
recall (cf. (2.22)) that the sign of a complex number is denoted by

sgn(z) := z/|z|  if z ≠ 0,   sgn(z) := 1  otherwise,   for all z ∈ F,
and arrive at q
1
t D˙ j j2 C 1: (3.4)
sgn.b/
Using t, we can reconstruct c and s by

1 = |c|² + |s|² = |c|² + |tc|² = (1 + |t|²)|c|²,   c = 1/√(1 + |t|²),   s = t c. (3.5)
We still have to choose the sign of the square root in (3.4). In order to obtain proper
convergence of the iterative method, we have to ensure that the next iterate B D
Q AQ is close to A. For the rotation Q given in (3.1), this means that the sine s
should be as small as possible, i.e., jt j should be as small as possible: we have to
choose the zero that is smaller in modulus. It is given by
q q
sgn.a d /
t D j j2 C 1 D sgn. / j j2 C 1;
sgn.b/
but computing it this way can lead to problems when using floating-point arithmetic. Computing the zero

t̂ := τ + sgn(τ) √(|τ|² + 1)

that is larger in modulus, on the other hand, is stable. By construction, t and t̂ are the zeros of the monic polynomial

p(z) := z² − 2τz − b̄/b   for all z ∈ F,

i.e., we have

z² − 2τz − b̄/b = p(z) = (z − t)(z − t̂) = z² − (t + t̂)z + t t̂   for all z ∈ F,
b b
is unitary and Q AQb is diagonal. Applying the transformation Q
b to the p-th and q-th
row of an n-dimensional vector is a unitary transformation in F nn corresponding to
the matrix 0 1
Ip1
B c sN C
B C
Q WD B B Iqp1 C; (3.7)
C
@ s cN A
Inq
where Ik 2 F kk again denotes the k-dimensional identity matrix. We compute
the transformation B WD Q AQ in two steps: multiplication of A by Q yields the
intermediate result
0 1
: : : a1p c a1q s : : : a1p sN C a1q cN : : :
B :: :: C
M WD AQ D @ : : A; (3.8)
: : : anp c anq s : : : anp sN C anq cN : : :
where only the p-th and q-th column differ from the ones of the original matrix A.
In the second step, we compute B D Q M . Multiplication by Q changes only the
p-th and q-th row:
0 1
:: ::
B : : C
B cm
N N
s m : : : N
cm sN mq n C
B p1 q1 pn C
B :: :: C
B D Q M D B : : C: (3.9)
B C
Bsmp1 C cmq1 : : : smpn C cmq n C
@ A
:: ::
: :
and since Q̂* Â Q̂ is diagonal by construction, we have b_qp = b_pq = 0. The resulting algorithm is summarized in Figure 3.1; A is overwritten with B = Q*AQ.
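The following Python sketch mirrors the idea of the elementary step described above for the real symmetric case; it is not the book's Figure 3.1 (in particular, it forms the full rotation matrix Q for clarity instead of updating only the p-th and q-th rows and columns).

import numpy as np

def jacobi_step(A, p, q):
    """One elementary Jacobi step (real symmetric sketch): return Q^T A Q with
    the entries (p, q) and (q, p) eliminated."""
    if A[p, q] == 0.0:
        return A.copy()
    tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
    # smaller root of t^2 + 2*tau*t - 1 = 0, computed in the stable way
    t = np.sign(tau) / (abs(tau) + np.hypot(tau, 1.0)) if tau != 0.0 else 1.0
    c = 1.0 / np.hypot(t, 1.0)
    s = t * c
    Q = np.eye(A.shape[0])
    Q[p, p] = Q[q, q] = c
    Q[p, q] = s
    Q[q, p] = -s
    return Q.T @ A @ Q

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(jacobi_step(A, 0, 1))   # diagonal matrix carrying the eigenvalues of A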
Of course, it is not clear that eliminating one entry of A will get us closer to a
diagonal matrix: it could just move entries around without actually reducing the off-
diagonal part. Fortunately, the elimination step works perfectly:
Theorem 3.2 (Jacobi step). Let n ∈ ℕ, and let A ∈ F^{n×n} be self-adjoint. Let p, q ∈ {1, …, n} with p < q. For the Jacobi matrix Q ∈ F^{n×n} defined by (3.7), we have
Proof. (cf. [18, eq. (8.4.2)]) The transformation Q can change only the p-th and q-th
row and column. In particular, this implies
bDQ
Due to B b b
AQb and Proposition 2.54, we have
since bqp D bpq D 0 and aqp D aNpq . Using Proposition 2.54 again, we obtain
off(B)² = ||B||²_F − Σ_{i=1}^{n} |b_ii|² = ||B||²_F − |b_pp|² − |b_qq|² − Σ_{i∉{p,q}} |b_ii|²
        = ||A||²_F − |a_pp|² − 2|a_pq|² − |a_qq|² − Σ_{i∉{p,q}} |a_ii|²
        = ||A||²_F − Σ_{i=1}^{n} |a_ii|² − 2|a_pq|² = off(A)² − 2|a_pq|².
i D1
As long as we choose non-zero entries apq , each Jacobi step reduces the norm of
the off-diagonal part of the matrix. If we choose the maximal entries, we can obtain a
simple linear convergence estimate:
off(A)² = Σ_{i≠j} |a_ij|² ≤ Σ_{i≠j} |a_pq|² = |a_pq|² n(n − 1),
These results give rise to the classical Jacobi iteration: in each step, we choose
the off-diagonal entry of maximal modulus and eliminate it, and we repeat until the
off-diagonal entries are sufficiently small. Using off(A)² ≤ n(n − 1)|a_pq|², we can compute an upper bound for off(A) and use it as a stopping criterion. The resulting algorithm is summarized in Figure 3.2.
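A minimal sketch of the classical Jacobi iteration along these lines (again not the book's Figure 3.2), reusing the jacobi_step function from the previous sketch and stopping once off(A) is small:

import numpy as np

def off(A):
    """Frobenius norm of the off-diagonal part of A."""
    return np.sqrt(np.sum(np.abs(A - np.diag(np.diag(A)))**2))

def classical_jacobi(A, eps=1e-10):
    """Sketch of the classical Jacobi iteration (real symmetric case):
    eliminate the off-diagonal entry of maximal modulus until off(A) <= eps."""
    A = A.copy()
    while off(A) > eps:
        B = np.abs(A - np.diag(np.diag(A)))
        p, q = np.unravel_index(np.argmax(B), B.shape)
        A = jacobi_step(A, p, q)
    return np.sort(np.diag(A))              # approximate eigenvalues

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 1.0]])
print(classical_jacobi(A))
print(np.sort(np.linalg.eigvalsh(A)))       # reference values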
An important advantage of the classical Jacobi iteration, as compared to other
eigenvalue algorithms, is its guaranteed rate of convergence. Unfortunately, this ad-
vantage comes at a high cost: finding the maximal |a_pq| directly requires checking all n(n − 1)/2 superdiagonal entries, and even for moderately large values of n, this
task dominates the computational cost. It is possible to reduce the complexity by
using more sophisticated algorithms, but in practice it is more convenient to simply
replace the search for the maximum by a strategy that cycles through all off-diagonal
entries. An example is the cyclic-by-row Jacobi iteration given in Figure 3.3 that cy-
cles through all rows and within each row through all columns. For a 5 5 matrix,
the elements are eliminated in the following pattern:
( *    1    2    3    4  )
( 1′   *    5    6    7  )
( 2′   5′   *    8    9  )
( 3′   6′   8′   *    10 )
( 4′   7′   9′   10′  *  ).

The entry m ∈ {1, …, 10} is eliminated in the m-th elementary step together with the entry m′. Diagonal elements marked with * are not eliminated.
The elementary Jacobi step given in Figure 3.1 requires 13 arithmetic operations to
compute c and s and 12n operations to update A. The complexity of the overall Jacobi
algorithm, both in the classical version given in Figure 3.2 and the cyclic version
given in Figure 3.3, is usually measured in terms of one “sweep”, i.e., a sequence of n(n − 1)/2 elementary Jacobi steps, since exactly this number is required in the cyclic algorithm to eliminate all off-diagonal elements once. One sweep of the classical
Although we use the same notation for the norm of a vector and the spectral norm
of a matrix, the meaning is always clear, since the first only applies to vectors and the
second only to matrices.
The spectral norm is compatible with the norm k k in F m : for x 2 F m n ¹0º, the
vector x=kxk has unit norm, and we find
x
kAxk D
A kxk kxk kAk kxk:
Show that this implies kA k D kAk and kA Ak D kAk2 D kAA k.
Proof. As in the proof of Proposition 2.54, we write the Frobenius norm as a sum of
norms of the columns of A and apply (3.11):
X
n X
m X
m X
m
kAk2F D jaij j2 D kAıj k2 kAk2 kıj k2 D mkAk2 :
i D1 j D1 j D1 j D1
For the first estimate, we let x 2 F m with kxk D 1 and use the Cauchy–Schwarz
inequality to obtain
X
n n ˇX
X m ˇ2
2 2 ˇ ˇ
kAxk D j.Ax/i j D ˇ aij xj ˇ
i D1 i D1 j D1
X n Xm Xm
jaij j2 jxk j2 D kAk2F kxk2 D kAk2F :
i D1 j D1 kD1
Lemma 3.8 (Minimization problem). Let A ∈ F^{n×n} be a self-adjoint matrix, and let λ ∈ σ(A) be its minimal eigenvalue. We have

λ = min{ ⟨Ax, x⟩ : x ∈ F^n, ||x|| = 1 }.

Proof. Due to Corollary 2.51, we find a unitary matrix Q ∈ F^{n×n} and a real diagonal matrix D ∈ ℝ^{n×n} with

A = QDQ*.

Since λ is the minimal eigenvalue, Proposition 2.18 yields

λ = min{ d_ii : i ∈ {1, …, n} },

and we can find j ∈ {1, …, n} with λ = d_jj.
For all x 2 F n with kxk D 1, we have
X
n
hAx; xi D hQDQ x; xi D hDQ x; Q xi D hDb xi D
x ;b di i jxO i j2
i D1
for b
x WD Q x.
Lemma 2.38 yields kb
x k D kxk, and we obtain
´ n μ
X
n 2 n
min¹hAx; xi W x 2 F ; kxk D 1º D min di i jxO i j W b
x 2 F ; kb
xk D 1 :
i D1
Due to
X
n X
n
x k2 D
D kb djj jxO i j2 di i jxO i j2 x 2 F n ; kb
for all b x k D 1;
i D1 i D1
Based on Lemma 3.8, we can investigate the sensitivity of eigenvalues with respect
to perturbations of the matrix. The result is the following simple version of the Bauer–
Fike theorem [3]:
j j kA Bk:
j j D min¹j0 j W 0 2 .A/º:
Finding an estimate for the eigenvectors is slightly more challenging: if the “gap”
between eigenvalues is small, perturbing the matrix may lead to large changes in the
eigenvectors. As an example, we consider
1 C 2 1 1 1 1 1C 1 1
A WD ; B WD D
1 1 2 1 1 1 1 1
for a small value > 0. Using Exercise 3.9 and Proposition 2.13, we can compute the
spectral norm
2
kA Bk D D max¹jj W . 2 / 2 D 0º
0
p p p
D max¹.1 2/ ; .1 C 2/ º D .1 C 2/ :
Although the norm of the difference of A and B can be very small if is, the angles
between each of the eigenvectors of both matrices always equal =2. The reason
is that the “gap” between the two eigenvalues of A equals only 2 , and so a small
perturbation of the eigenvalue can lead to a completely different eigenvector. In order
to obtain a useful estimate for the approximation of eigenvectors, we can introduce a
lower bound for this gap.
Definition 3.12 (Spectral gap). Let A ∈ F^{n×n} be a self-adjoint matrix. For all λ ∈ σ(A), we define the spectral gap of λ and A by

γ_A(λ) := inf{ |λ − μ| : μ ∈ σ(A) \ {λ} }. (3.12)
Using this quantity, we can formulate the following estimate for the eigenvectors
of a perturbed matrix:
||x − y|| ≤ 2 (||A − B|| / γ_A(λ)) ||y||.
A D QDQ :
y WD Q y and define b
We let b x 2 F n by
´
yOj if djj D ;
xOj WD for all j 2 ¹1; : : : ; nº:
0 otherwise
For x WD Qb
x , we obtain
Ax D QDQ Qb
x D QDb
x D Qb
x D x
since D is a diagonal matrix and xOj is only non-zero if djj D , and conclude x 2
E.A; /. Now we compute
k.A I /.x y/k D k.A I /x .A I /yk D k.A I /yk
D k.B I /y C .A B/y C . /yk
k.B I /yk C k.A B/yk C j j kyk
kA Bk kyk C kA Bk kyk D 2kA Bk kyk
using the compatibility property (3.11) and the fact that y is an eigenvector of B. Now
we only have to eliminate the matrix A I in this estimate:
k.A I /.x y/k2 D kQ.D I /Q .x y/k2 D k.D I /.b y /k2
x b
X
n X
n
D jdjj j2 jxOj yOj j2 D jdjj j2 jxOj yOj j2
j D1 j D1
djj ¤
X
n X
n
A ./2 jxOj yOj j2 D
A ./2 jxOj yOj j2
j D1 j D1
djj ¤
D
A ./2 kb y k2 D
A ./2 kQ.b
x b y /k2
x b
D
A ./2 kx yk2 ;
where we have used our special choice of b
x to avoid inconvenient terms in the sum
and Lemma 2.38.
The arguments used to eliminate A I in this proof are closely related to the
concept of a pseudo-inverse: due to 2 .A/, the matrix A I is not injective
and therefore not invertible. Still, we can define an inverse in the range of AI ,
and this is called a pseudo-inverse. In our case, the range of AI is the invariant
subspace spanned by all eigenspaces except E.A; /, and the vector x is chosen
to ensure that x y is an element of this space.
Using this estimate, we can estimate the accuracy of the eigenvalue approximations:
jai i j kB Ak off.A/:
The estimate given here is suboptimal: combining Proposition 3.13 with the esti-
mate provided by Theorem 4.6 for the Rayleigh quotient introduced in Section 4.2
yields that jai i j converges like off.A/2 .
Under certain additional conditions, Jacobi’s method offers the advantage that it
can be used to compute eigenvalues to a very high relative accuracy, while most
other algorithms typically reach only a high absolute accuracy [8].
1 1 japq j
jsj jt j jcj jt j p D (3.13)
j j C j j C 1
2 2j j japp aqq j
for the elementary Jacobi step that eliminates apq with 1 p < q n. This bound is
only useful if we can ensure that the diagonal elements of the matrix are sufficiently
far from each other.
Due to Corollary 3.4, we know that the diagonal elements converge to eigenval-
ues, so we can only expect well-separated diagonal elements if the eigenvalues are
well-separated. For the remainder of this section, we assume that A has only simple
eigenvalues, i.e., that

#σ(A) = n (3.14)

holds. The minimal distance of two eigenvalues is denoted by

γ := min{ |λ − μ| : λ, μ ∈ σ(A), λ ≠ μ }, (3.15)

and we can use it in combination with Proposition 3.11 to obtain the following estimate for the diagonal entries:
Lemma 3.15 (Diagonal entries). Let 2 off(A) < γ, i.e., δ := γ − 2 off(A) > 0. Then we have

|a_pp − a_qq| ≥ δ   for all p, q ∈ {1, …, n}, p ≠ q.
Proof. (cf. [51, eq. (1.6)]) Let D 2 F nn be the diagonal part of A, i.e.,
0 1
a11
B :: C
DD@ : A;
ann
We have kA DkF D off.A/, and combining Proposition 3.11 with Proposition 3.7
yields that for each 2 .A/ we can find i 2 ¹1; : : : ; nº with
i.e., D by our assumption (3.15). This means that the mapping 7! i defined
by (3.16) is injective, and (3.14) yields that it also has to be bijective.
Let now p; q 2 ¹1; : : : ; nº be fixed with p ¤ q. We have seen that there are
; 2 .A/ with ¤ such that p D i and q D i , and conclude
We consider one sweep of the algorithm, i.e., N := n(n − 1)/2 elementary Jacobi steps. In the r-th step, we choose 1 ≤ p_r < q_r ≤ n and eliminate the entry in the p_r-th row and q_r-th column of the matrix. We denote the sine of the rotation angle (cf. (3.5)) by s_r and the cosine by c_r. The original matrix is written as A^(0), and the matrix resulting from applying the r-th transformation to A^(r−1) as A^(r). Due to our construction the r-th transformation is chosen to eliminate a^(r−1)_{p_r q_r}. The key to the convergence analysis is the following bound for the Jacobi angles:
Proof. (cf. [51, eq. (2.2)] and [39, eq. (9)]) Theorem 3.2 yields

off(A^(r))² = off(A^(r−1))² − 2 |a^(r−1)_{p_r q_r}|²   for all r ∈ {1, …, N},

and a simple induction gives us

0 ≤ off(A^(N))² = off(A^(0))² − 2 Σ_{r=1}^{N} |a^(r−1)_{p_r q_r}|²,

which implies

Σ_{r=1}^{N} |a^(r−1)_{p_r q_r}|² ≤ off(A^(0))² / 2. (3.18)
Due to off.A.r1/ / off.A.0/ /, we can apply Lemma 3.15 to A.r1/ , and (3.13)
yields
X
N X
N
jap.r1/ j2 X
N
jap.r1/ j2 off.A.0/ /2
r qr r qr
jsr j2 2
:
.r1/ .r1/
japr pr aqr qr j2 ı 2ı 2
rD1 rD1 rD1
jap.`/
r qr
j D jap.`1/
r qr
cr ap.`1/
r q`
sr j jap.`1/
r qr
j jcr j C jap.`1/
r q`
j jsr j
jap.`1/
r qr
j C jsr j jap.`1/
r q`
j; (3.19c)
jap.`/
r qr
j D jap.`1/ sN C ap.`1/
r p` r r qr
cNr j jap.`1/
r p`
j jsr j C jap.`1/
r qr
j jcr j
jap.`1/
r qr
j C jsr j jap.`1/
r p`
j: (3.19d)
Lemma 3.17 (Bound for one Jacobi step). Let ` 2 ¹1; : : : ; N º. Define the matrix
E .`/ 2 F nn by
8 .`1/
ˆ
ˆ aq` j if i D p` ; j ¤ q` ;
ˆ
ˆ
ˆ
ˆ
.`1/
if i D q` ; j ¤ p` ;
<ap` j
.`/ .`1/
eij D aiq if j D p` ; i ¤ q` ; for all i; j 2 ¹1; : : : ; nº: (3.20)
ˆ
ˆ `
ˆ
ˆaip .`1/
if j D q` ; i ¤ p` ;
ˆ
:̂ `
0 otherwise
Then we have
.`/ .`1/ .`/
jaij j jaij j C js` j jeij j for all i; j 2 ¹1; : : : ; nº:
Proof. Combine the estimates (3.19) and take advantage of the fact that A D A
implies .A.`1/ / D A.`1/ , which in turn implies .E .`/ / D E .`/ .
If we use a cyclic Jacobi method, i.e., if each off-diagonal entry is eliminated once
during the sweep, we obtain the following convergence estimate:
Theorem 3.18 (Cyclic Jacobi method). Let ı WD 2 off.A.0/ / > 0. If for each
1 p < q n there is an r 2 ¹1; : : : ; N º with pr D p and qr D q, i.e., if each
off-diagonal entry is eliminated once during the sweep, we have
r
.N / N
off.A / off.A.0/ /2 ; (3.21)
2ı 2
i.e., the Jacobi method converges quadratically.
Proof. (cf. [51, Section 2]) Let 1 p < q n. By our assumption, we can find
.r1/
r 2 ¹1; : : : ; N º with pr D p and qr D q. Since the Jacobi rotation eliminates apr qr ,
.r/
we have apq D 0. Lemma 3.17 and a simple induction yield
X
N X
N
.N / .`/ .`/
japq j js` j jepq j js` j jepq j:
`DrC1 `D1
so we have found
off.A.0/ /2 X
N
N
off.A.N / /2 2
off.E .`/ /2 2 off.A.0/ /4 ;
2ı 2ı
`D1
Exercise 3.20 (Improved estimate). Take the structure of the matrices .E .`/ /N
`D1
into account to improve the result of Theorem 3.18 to
r
.N / n
off.A / off.A.0/ /2 :
ı2
Hint: Theorem 3.2 implies off(A^(ℓ)) ≤ off(A^(0)) for all ℓ ∈ {0, …, N}. How
often is each of the entries in the matrix changed during a sweep? Can you
improve the estimate even further, maybe using additional assumptions regarding
the sequence of elementary steps?
Theorem 3.21 (Classical Jacobi method). Let ı WD 2 off.A.0/ / > 0, and let
the indices .pr /N N
rD1 and .qr /rD1 be chosen as in the classical Jacobi method, i.e.,
satisfying
.r1/
jap.r1/
r qr
j
jaij j for all 1 i < j n; r 2 ¹1; : : : ; N º: (3.23)
Then we have r
n2
.N /
off.A off.A.0/ /2 ;
/ (3.24)
ı2
i.e., the classical Jacobi method also converges quadratically.
Proof. (cf. [39, Satz 2]) Let IN WD ¹.i; j / W 1 i < j nº. We prove by induction
that for each r 2 ¹1; : : : ; N º we can find Ir IN with
!2
X .r/ 2
Xr
#Ir D r; jaij j 2.n 2/ js` j jap.`1/
` q`
j : (3.25)
.i;j /2Ir `D1
We can see that for r D N , this estimate implies an upper bound for off.A.N / /2 =2.
For r D 1, we choose I1 D ¹.p1 ; q1 /º. The Jacobi step guarantees ap.1/ 1 q1
D 0, so
(3.25) holds trivially.
Assume now that r 2 ¹1; : : : ; N 1º is given such that a subset Ir IN satisfying
(3.25) exists. Let J IN be a subset of cardinality #J D r such that
X .r/ 2
jaij j
.i;j /2J
Since r < N and (3.23) holds, we can assume .pr ; qr / 62 J , since otherwise we could
replace it by a different index pair without increasing the sum. Therefore the set
IrC1 WD J [ ¹.pr ; qr /º
A look at (3.20) reveals that only 4.n 2/ entries of E .`/ , namely the p` -th and q` -
th rows and columns with the exception of their intersections, are non-zero. To take
advantage of this property, let
denote the set of all coefficients in J changed during the update from A.r/ to A.rC1/ .
Its cardinality equals #JC D 2.n 2/, since J contains only index pairs above the
.rC1/
diagonal. We use (3.23) to bound jeij j by jap.r/
rC1 qrC1
j for all .i; j / 2 JC and
apply the Cauchy–Schwarz inequality to obtain
X .rC1/ 2
X .r/ .rC1/ 2
jaij j .jaij j C jsrC1 j jeij j/
.i;j /2J .i;j /2J
X .r/ 2 .r/ .rC1/ .rC1/ 2
D .jaij j C 2jaij j jsrC1 j jeij j C jsrC1 j2 jeij j /
.i;j /2J
X .r/ 2
X
jaij j C jsrC1 j2 jap.r/
rC1 qrC1
j2
.i;j /2J .i;j /2JC
X .r/
C 2jap.r/
rC1 qrC1
j jsrC1 j jaij j
.i;j /2JC
X .r/ 2
jaij j C 2.n 2/jsrC1 j2 jap.r/
rC1 qrC1
j2
.i;j /2J
p s X
.r/
C 2jap.r/
rC1 qrC1
j jsrC1 j 2.n 2/ jaij j2
.i;j /2JC
0 12
s X p
D@ .r/ 2
jaij j C 2.n 2/jsrC1 j jap.r/
rC1 qrC1
jA :
.i;j /2J
Power methods
Summary
This chapter introduces the power iteration, a very simple method for computing
eigenvectors that paves the way for a number of very important and efficient meth-
ods like the inverse iteration, the Rayleigh iteration, and the simultaneous iteration.
We investigate the convergence of these iterations and provide simple estimates for
the accuracy of the approximations of eigenvalues and eigenvectors that can be used
to construct stopping criteria for the iteration.
Learning targets
Introduce the power iteration and the (shifted) inverse iteration.
Analyze the convergence of these methods.
Introduce the Rayleigh iteration and establish its cubic convergence.
Investigate the convergence to an invariant subspace instead of an eigenspace.
Introduce the simultaneous iteration to compute a basis of this invariant sub-
space.
Due to (4.4), the first component of x̂^(m) will grow more rapidly than any other component as m → ∞, so we can expect the transformed iteration vectors to “converge” to multiples of the eigenvector δ₁ and the original vectors to multiples of the eigenvector e₁ := Qδ₁.
Unfortunately, if λ₁ ≠ 1, there can be no convergence in the usual sense, since the length of the iteration vectors is changed by applying the matrix. Instead, we look for convergence of the angle between the iteration vectors and the eigenvector.
cos ∠(x, y) := |⟨x, y⟩| / (||x|| ||y||),
sin ∠(x, y) := √(1 − cos² ∠(x, y)),
tan ∠(x, y) := sin ∠(x, y) / cos ∠(x, y).
for all m 2 N0 ;
and taking the square root of both sides of the inequality yields the result.
The condition cos ∠(x^(0), e₁) ≠ 0 is required to ensure that the denominators in the proof are well-defined: it implies x̂₁^(0) ≠ 0, and since we also have |λ₁| > |λ₂| ≥ 0 due to (4.4), this implies

x̂₁^(m) = λ₁^m x̂₁^(0) ≠ 0,   ||x̂^(m)|| ≠ 0   for all m ∈ ℕ₀.

This requirement is also quite natural: if cos ∠(x^(0), e₁) = 0 holds, we have x̂₁^(0) = 0, so the starting vector contains no component in the direction of the desired eigenvector, and this will not change no matter how often we apply the iteration step.
x̂^(m) := T⁻¹ x^(m)   for all m ∈ ℕ₀
and obtain
i.e., the transformed vectors result from a power iteration using the matrix D and
the starting vector b
x .0/ . Applying Theorem 4.2 to the transformed vectors yields
convergence estimates.
Using machine numbers also means that rounding effects will influence the con-
vergence behaviour, cf. [52, Chapter 9].
Since the angle between vectors does not depend on their scaling, this leaves the
convergence estimate of Theorem 4.2 intact and leads to the numerically stable algo-
rithm summarized in Figure 4.1.
An important advantage of the power iteration is its flexibility: we only have to
be able to evaluate matrix-vector products, we do not require the matrix A itself. In
particular, we do not have to apply transformations to the matrix like in the case of
the Jacobi iteration. This allows us to use the power iteration in situations where A
is only given implicity: e.g., if A is the inverse of a matrix B, we can replace the
matrix-vector multiplication a D Ax D B 1 x by solving the system Ba D x, and
this is usually easier and more stable.
Investigate the convergence of the power iteration for the starting vectors

x̂^(0) = (1, 0),   x̂^(0) = (0, 1),   x̂^(0) = (1, 1).

Does the angle between x̂^(m) and an eigenspace always converge? If not, explain why this is not a counterexample for Theorem 4.2.
Ae₁ = λ₁e₁,   0 = λ₁e₁ − Ae₁,

so the norm

||λ₁ x^(m) − A x^(m)||

should provide an estimate of how close we are to the eigenvector. In most applications, the eigenvalue is not known, so we have to approximate it as well. An elegant solution to this problem is provided by the Rayleigh quotient: assume that e ≠ 0 is an eigenvector of A, but the corresponding eigenvalue λ is unknown. Our goal is to reconstruct λ using e. By our assumption, we have

Ae = λe,

and using

⟨e, e⟩ = ||e||² > 0,

we can obtain

λ = ⟨Ae, e⟩ / ⟨e, e⟩

and have found λ. This result gives rise to the following definition:
Λ_A : F^n \ {0} → F,   x ↦ ⟨Ax, x⟩ / ⟨x, x⟩,

is called the Rayleigh quotient corresponding to the matrix A.
Theorem 4.6 (Rayleigh quotient). Let A ∈ F^{n×n}, and let x ∈ F^n \ {0}. Let e ∈ F^n \ {0} be an eigenvector of A for the eigenvalue λ ∈ F. Then we have

|Λ_A(x) − λ| ≤ ||A − λI|| sin ∠(x, e) ≤ ||A − λI|| tan ∠(x, e).

If A is a normal matrix, we even have

|Λ_A(x) − λ| ≤ ||A − λI|| sin² ∠(x, e) ≤ ||A − λI|| tan² ∠(x, e).
Proof. Since e is an eigenvector, we have .AI /e D 0, and we can use the Cauchy–
Schwarz inequality (2.9) and the compatibility inequality (3.11) of the spectral norm
to find
ˇ ˇ
ˇ hAx; xi hx; xi ˇ
jƒA .x/ j D ˇˇ ˇ D jh.A I /x; xij
hx; xi hx; xi ˇ hx; xi
jh.A I /.x ˛e/; xij k.A I /.x ˛e/k kxk
D
hx; xi kxk2
kA I k kx ˛ek kxk kx ˛ek
D kA I k for all ˛ 2 F:
kxk2 kxk
i.e., our first estimate. Let now A be a normal matrix. Due to Proposition 2.36, we
now not only have .A I /e D 0, but also .A I / e D 0, and this allows us to
improve the estimate as follows:
Due to the Cauchy–Schwarz inequality (2.9), we have cos †.x; e/ 1 and therefore
sin †.x; e/ tan †.x; e/.
Using the Rayleigh quotient, we can construct a stopping criterion for the power iteration: for the m-th iteration vector x^(m), we compute an approximation λ^(m) = Λ_A(x^(m)) of the eigenvalue and check whether ||Ax^(m) − λ^(m)x^(m)|| is sufficiently small. In our algorithm, x^(m) is a unit vector, so the Rayleigh quotient takes the simple form Λ_A(x^(m)) = ⟨Ax^(m), x^(m)⟩ and we obtain the practical algorithm given in Figure 4.2.
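The following sketch implements a practical power iteration in this spirit (it is not the book's Figure 4.2): the iterate is normalized in every step, the Rayleigh quotient serves as eigenvalue approximation, and the iteration stops once the residual norm drops below eps·|λ^(m)|; the test matrix is a made-up example.

import numpy as np

def power_iteration(A, x0, eps=1e-8, max_steps=1000):
    """Sketch of a practical power iteration with normalization, Rayleigh
    quotient and residual-based stopping criterion."""
    x = x0 / np.linalg.norm(x0)
    for m in range(max_steps):
        a = A @ x
        lam = np.vdot(x, a)                 # Rayleigh quotient <Ax, x> for ||x|| = 1
        if np.linalg.norm(a - lam * x) <= eps * abs(lam):
            return lam, x
        x = a / np.linalg.norm(a)
    return lam, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, x = power_iteration(A, np.array([1.0, 0.0]))
print(lam)                                  # dominant eigenvalue (5 + sqrt(5))/2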
Definition 4.7 (Residual). Let A ∈ F^{n×n}, and let x ∈ F^n \ {0}. The vector

r := Λ_A(x) x − Ax
The practical power iteration given in Figure 4.2 stops as soon as the norm of the
residual drops below j.m/ j, and we can use backward error analysis to derive error
bounds for the eigenvalue and the corresponding eigenvector: given x .m/ and .m/ D
ƒA .x .m/ /, we construct a matrix B such that
holds, i.e., such that x .m/ is an eigenvector of B for the eigenvalue .m/ . If we can
bound kA Bk, the perturbation theory of Propositions 3.11 and 3.13 can be applied.
In order to find a convenient bound for kA Bk, we require the following lemma:
Lemma 4.8 (Rank-2 matrix). Let a; b 2 F n with ha; bi D 0, and let E WD ab Cba .
Then we have kEk D kak kbk.
i.e., the vectors y, a and b are pairwise perpendicular and y 2 N .E/. We find
The practical power iteration in Figure 4.2 stops as soon as the norm of the residual r^(m) defined by

r^(m) := λ^(m) x^(m) − A x^(m),   λ^(m) := ⟨Ax^(m), x^(m)⟩   for all m ∈ ℕ₀ (4.8)

drops below ε|λ^(m)|. In this case, we can obtain the following estimates for the accuracy of the approximations of eigenvalues and eigenvectors:
Corollary 4.10 (Accuracy). Let 2 Œ0; 1/ and kr .m/ k j.m/ j. Then we can find
2 .A/ and x 2 E.A; / such that
j .m/ j j.m/ j jj; (4.9a)
1
j.m/ j jj
kx x .m/ k 2 2 1 C : (4.9b)
A ./ 1
A ./
Proof. By Theorem 4.9, kr .m/ k j.m/ j implies that we can find an eigenvalue
2 .A/ and a vector x 2 E.A; / such that the bounds
j.m/ j
j .m/ j j.m/ j; kx x .m/ k 2
A ./
hold. In order to obtain an estimate for the relative error of the eigenvalue, we apply
the triangle inequality to the first bound to get
Proof. Let b
A WD .A I /1 and
1
O j WD for all j 2 ¹1; : : : ; nº:
j
jO 2 j
tan †.x .mC1/ ; e1 / tan †.x .m/ ; e1 /
O
j1 j
j1 j
D tan †.x .m/ ; e1 / for all m 2 N0 :
j2 j
If we can find a suitable shift parameter μ, the shifted inverse iteration converges rapidly and reliably: as a first example, we consider the matrix

A = diag(1, −1).

Its eigenvalues are 1 and −1, and due to |1| = |−1|, we can expect no convergence for the power iteration. If we use a shifted inverse iteration with, e.g., μ = 1/2, Corollary 4.11 predicts a convergence rate of

|1 − 1/2| / |−1 − 1/2| = |1/2| / |−3/2| = 1/3.

By choosing μ closer to 1, we can get even better rates.
The shift parameter can also help if the gap between eigenvalues is small: let ε > 0 and consider

A = diag(1 + ε, 1).
The matrix A now has a dominant eigenvalue 1 + ε and a second eigenvalue 1, and Theorem 4.2 predicts a convergence rate of 1/(1 + ε). Using a shifted inverse iteration with μ = 1 + 2ε/3, we get a rate of

|1 + ε − (1 + 2ε/3)| / |1 − (1 + 2ε/3)| = |ε/3| / |−2ε/3| = 1/2,

and again the rate can be improved by moving μ close to 1 + ε. In both examples, we can also compute the second eigenvalue by choosing μ appropriately.
In a practical implementation of the inverse iteration, it is frequently more efficient to handle the inverse (A − μI)⁻¹ implicitly: instead of computing a = (A − μI)⁻¹x, which would require the inverse, we solve the linear system (A − μI)a = x for the unknown vector a. Since each step of the iteration requires us to solve one of these systems for another right-hand side vector, it is usually a good idea to prepare the matrix A − μI in advance, e.g., by computing a suitable factorization. The resulting algorithm is summarized in Figure 4.3.
The stopping criterion relies on the residual given by
instead of (4.8), since this vector can be computed without additional matrix-vector
multiplications.
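A sketch of a shifted inverse iteration along these lines (not the book's Figure 4.3): the matrix A − μI is factorized once with scipy and the factorization is reused in every step; for simplicity the sketch checks the plain residual of A rather than the cheaper residual mentioned above.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, shift, x0, eps=1e-8, max_steps=100):
    """Sketch of the shifted inverse iteration: factor A - shift*I once and
    solve (A - shift*I) a = x in every step instead of forming the inverse."""
    lu = lu_factor(A - shift * np.eye(A.shape[0]))   # prepare the matrix once
    x = x0 / np.linalg.norm(x0)
    for m in range(max_steps):
        a = lu_solve(lu, x)                          # a = (A - shift*I)^{-1} x
        x = a / np.linalg.norm(a)
        lam = np.vdot(x, A @ x)                      # Rayleigh quotient
        if np.linalg.norm(A @ x - lam * x) <= eps * abs(lam):
            break
    return lam, x

A = np.array([[1.0, 0.0], [0.0, -1.0]])              # made-up example, eigenvalues 1, -1
print(inverse_iteration(A, shift=0.5, x0=np.array([1.0, 1.0]))[0])   # close to 1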
Using (4.11) again, we find an eigenvalue 0 2 .A/ n ¹º such that O 0 D 1=.0 /.
This means
ˇ ˇ
ˇ 1 1 ˇˇ j.0 / . /j j0 j
b . O
/ D j O
O
0
j D ˇ D D ;
A ˇ 0 ˇ j j j0 j j j j0 j
and (4.14) yields
jjO j j j0 j
kx x .m/ k 2 1 C D 2 1 C
1
./O 1 j j j0 j
b
A
j0 j j0 j C j j
D 2 1 C 2 1 C
1 j0 j 1 j0 j
j j
2 1 C 1C :
1
A ./
Proposition 4.13 (Convergence). Let A .0/ I be invertible, let (4.2) and (4.3) hold
with
j1 .0/ j < j2 .0/ j jn .0/ j: (4.16)
Let e1 WD Qı1 and let x .1/ be given by (4.15). If there is a constant c 2 R>0 such
that s
j1 2 j
tan †.x .0/ ; e1 / c
2kA 1 I k
since we have assumed A to be normal (or even self-adjoint), and we also find
j.0/ 2 j
j1 2 j j.0/ 1 j
j1 2 j kA 1 I k tan2 †.x .0/ ; e1 /
j1 2 j kA 1 I kc 2
j1 2 j j1 2 j=2 D j1 2 j=2:
j1 .0/ j
tan †.x .1/ ; e1 / tan †.x .0/ ; e1 /
j2 .0/ j
kA 1 I k tan2 †.x .0/ ; e1 /
tan †.x .0/ ; e1 /
j1 2 j=2
kA 1 I k
D2 tan3 †.x .0/ ; e1 /
j1 2 j
kA 1 I k j1 2 j
2 tan †.x .0/ ; e1 / D tan †.x .0/ ; e1 / c:
j1 2 j 2kA 1 I k
These are the required estimates.
As we can see, the inverse iteration converges at a rate of approximately 0.47, which coincides with the expected value of |λ₁ − μ|/|λ₂ − μ|. The Rayleigh iteration shows inferior convergence rates during the first few steps, but once cubic convergence takes hold in the fourth step, machine accuracy is attained rapidly.
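For comparison, a minimal sketch of the Rayleigh iteration (illustrative only, not the book's algorithm): the shift is replaced by the current Rayleigh quotient in every step, which yields the cubic convergence discussed above for self-adjoint matrices; the example matrix and starting vector are made up.

import numpy as np

def rayleigh_iteration(A, x0, eps=1e-12, max_steps=50):
    """Sketch of the Rayleigh iteration: an inverse iteration whose shift is
    updated with the current Rayleigh quotient in every step."""
    x = x0 / np.linalg.norm(x0)
    lam = np.vdot(x, A @ x)
    for m in range(max_steps):
        if np.linalg.norm(A @ x - lam * x) <= eps * abs(lam):
            break
        # the solve becomes ill-conditioned near convergence; this is expected
        a = np.linalg.solve(A - lam * np.eye(A.shape[0]), x)
        x = a / np.linalg.norm(a)
        lam = np.vdot(x, A @ x)
    return lam, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, x = rayleigh_iteration(A, np.array([1.0, 0.2, 0.1]))
print(lam, np.linalg.norm(A @ x - lam * x))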
|λ₁| ≥ ⋯ ≥ |λ_k| > |λ_{k+1}| ≥ ⋯ ≥ |λ_n| (4.17)

holds, i.e., we allow k eigenvalues to have the same norm. In this case, we cannot expect convergence to the eigenspace corresponding to λ₁, but we can still prove convergence to an invariant subspace.
Definition 4.15 (Angle between a vector and a subspace). Let $x \in \mathbb{F}^n\setminus\{0\}$, and let $\mathcal{Y} \subseteq \mathbb{F}^n$ be a non-trivial subspace. We define
$$\angle(x, \mathcal{Y}) := \min\{\angle(x, y) : y \in \mathcal{Y}\setminus\{0\}\}.$$
In order to compute the angle between a vector and a subspace, the minimization property of Lemma 4.5 is useful. Let now $P \in \mathbb{F}^{n\times n}$ be a matrix satisfying (4.18); then
$$\cos\angle(x, \mathcal{Y}) = \frac{\|Px\|}{\|x\|}, \qquad \sin\angle(x, \mathcal{Y}) = \frac{\|x - Px\|}{\|x\|}, \qquad \tan\angle(x, \mathcal{Y}) = \frac{\|x - Px\|}{\|Px\|}.$$
Proof. According to Definitions 4.1 and 4.15 and Lemma 4.5, we have
$$\sin\angle(x, \mathcal{Y}) = \min\{\sin\angle(x, y) : y \in \mathcal{Y}\setminus\{0\}\} = \min\Bigl\{\frac{\|x - y\|}{\|x\|} : y \in \mathcal{Y}\setminus\{0\}\Bigr\}.$$
j1 j
jk j > jkC1 j
jn j: (4.20)
Due to
P 2 D QPb Q Q Pb Q D Q P bP
b Q D Q P
b Q D P;
P D .QPb Q / D Q P
b Q D QPb Q D P;
x .m/ WD Q x .m/
b for all m 2 N0 :
b x .0/ ¤ 0 and
Due to cos †.x .0/ ; Ek / ¤ 0, we have P x .0/ ¤ 0 and therefore Pb
observe
kx .m/ P x .m/ k kQbx .m/ QP b Q Qb x .m/ k
tan †.x .m/ ; Ek / D D
kP x .m/ k kQP b Q Qb x .m/ k
b x .m/ k
x .m/ Pb
kb
D D tan †.bx .m/ ; b
E k / for all m 2 N0 (4.22)
b
kPbx k
.m/
due to (2.16). The tangent of the transformed vector and subspace can be computed
directly: Proposition 4.19 combined with the definition of the norm yields
Pn
kb b y k2
y Pb j DkC1 jbyj j2
2
tan †.b b
y; E k/ D D Pk for all b b y ¤ 0;
y 2 F n with Pb
b y k2
kPb jb
y j2
j D1 j
similar to the approach used in the proof of Proposition 2.19. The resulting iteration takes the form
$$X^{(m+1)} := A\,X^{(m)} \qquad\text{for all } m \in \mathbb{N}_0.$$
According to Theorem 4.20, the columns of the matrices X .m/ will converge to the
invariant subspace Ek , but they may stop being linearly independent, e.g., if one of
the columns lies in the nullspace of A.
Since we are interested in finding a basis, we have to avoid this problem: we impose
suitable starting conditions.
Proof. Let PX .0/ be injective. Following the approach of Theorem 4.2, we define
transformed matrices
b .m/ WD Q X .m/
X for all m 2 N0 :
b is invertible. Since
Due to (4.17), we have j1 j
jk j > jkC1 j
0, so D
.0/
PX is injective, the matrix
.0/
b
Y
DP bXb .0/ D Q PQQ X .0/ D Q PX .0/
0
may become “numerically linearly dependent”, e.g., they could all converge to
E.A; 1 / if 1 is dominant. In order to avoid numerical instability, we guarantee
linear independence by orthogonalization.
Definition 4.22 (QR factorization). Let A 2 F nm and k WD min¹n; mº. If an iso-
metric matrix Q 2 F nk and an upper triangular matrix R 2 F km satisfy
A D QR;
and conclude
1 r11 R1
A D Q1 b
Q b D QR
R
with the matrices
1 O
.nC1/.kC1/ r11 R1 O
Q WD Q1 b 2F ; R WD b 2 F .kC1/m :
Q R
In order to avoid numerical instabilities, we do not work directly with $X^{(m)}$: in a first step, we compute $Q^{(0)}$ and $R^{(0)}$ satisfying (4.24) directly. If $Q^{(m)}$ and $R^{(m)}$ have been computed, we form
$$Y^{(m+1)} := A\,Q^{(m)}$$
and compute its QR factorization, i.e., we find an isometric $Q^{(m+1)} \in \mathbb{F}^{n\times k}$ and an upper triangular $\widehat R^{(m+1)} \in \mathbb{F}^{k\times k}$ such that
$$Y^{(m+1)} = Q^{(m+1)}\widehat R^{(m+1)}.$$
This implies
$$X^{(m+1)} = A\,Q^{(m)}R^{(m)} = Y^{(m+1)}R^{(m)} = Q^{(m+1)}\widehat R^{(m+1)}R^{(m)},$$
and by choosing
$$R^{(m+1)} := \widehat R^{(m+1)}R^{(m)},$$
we can obtain the required factorization without computing $X^{(m+1)}$ explicitly. The resulting algorithm is called the simultaneous iteration (sometimes also subspace iteration or orthogonal iteration) and is summarized in Figure 4.5.
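The following Python sketch (assuming `numpy`) illustrates the simultaneous iteration just described; it is not the book's Figure 4.5, and the random starting block is an arbitrary choice.

import numpy as np

def simultaneous_iteration(A, k, steps=100, rng=None):
    """Approximate an orthonormal basis of the invariant subspace belonging to
    the k eigenvalues of largest modulus."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    Q, R = np.linalg.qr(rng.standard_normal((n, k)))   # QR factorization of X^(0)
    for _ in range(steps):
        Y = A @ Q                     # Y^(m+1) = A Q^(m)
        Q, Rhat = np.linalg.qr(Y)     # re-orthogonalize the columns
    Lam = Q.conj().T @ (A @ Q)        # k-by-k matrix; eigenvalue estimates on its diagonal
    return Q, Lam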
(recall that we have $\|x^{(m)}\| = 1$ in the practical power iteration), we compute the $k\times k$ matrix $\Lambda^{(m)} := (Q^{(m)})^* A\,Q^{(m)}$, and instead of considering the norm of the residual vector $r^{(m)} = A x^{(m)} - \mu^{(m)}x^{(m)}$, we use the norm of the residual matrix $R^{(m)} := A Q^{(m)} - Q^{(m)}\Lambda^{(m)}$.
Note that this is not the same matrix as the one used in (4.24); we only use the same letter, for obvious reasons. The resulting practical simultaneous iteration is given in Figure 4.6. The error analysis of Section 4.3 can be extended to this case:
Due to Theorem 4.20, we know that the angles between the columns of the matrices
X .m/ and the invariant subspace Ek spanned by the eigenvectors corresponding to the
first k eigenvalues will converge to zero. Proposition 4.21 ensures that the columns of
X .m/ span the same subspace as those of Q.m/ , so the convergence result holds for
the latter as well.
In view of Proposition 2.43, we can characterize “convergence to an invariant sub-
space” also in terms of equation (2.20) and obtain the following result.
Theorem 4.26 (Convergence to invariant subspace). Let (4.2) and (4.3) hold with
$$|\lambda_1| \ge \dots \ge |\lambda_k| > |\lambda_{k+1}| \ge \dots \ge |\lambda_n|.$$
Let $P \in \mathbb{F}^{n\times n}$ be the projection onto $E_k$ defined by (4.21). Let $X^{(0)} \in \mathbb{F}^{n\times k}$, and assume that $PX^{(0)}$ is injective. We can find a matrix $\Lambda \in \mathbb{F}^{k\times k}$ such that
$$\frac{\|(A X^{(m)} - X^{(m)}\Lambda)y\|}{\|P X^{(m)} y\|} \le \left(\frac{|\lambda_{k+1}|}{|\lambda_k|}\right)^m \frac{\|(A X^{(0)} - X^{(0)}\Lambda)y\|}{\|P X^{(0)} y\|} \qquad\text{for all } y \in \mathbb{F}^k\setminus\{0\} \text{ and all } m \in \mathbb{N}_0.$$
Proof. As in the previous proofs, we introduce transformed matrices
b .m/ WD Q X .m/
X for all m 2 N0
and observe
b .mC1/ D Q X .mC1/ D Q AX .m/ D Q AQX
X b .m/ D D X
b .m/ for all m 2 N0 :
b
We define X
.m/ b .m/ 2 F .nk/k by
2 F kk and X
k ?
!
b
X
.m/
b .m/ D
X k for all m 2 N0 :
b .m/
X ?
b .0/
b .0/ /1 Dk X
ƒ WD .X k k
D b .0/ yk
kDkm X b .0/ yk
jk j kX m
for all y 2 F k ; m 2 N0 ;
k k
b .0/ , we have
in the last step. Using the definitions of P and X
kX bX
b .0/ yk D kP b .0/ yk D kPX .0/ yk for all m 2 N0 ; y 2 F k ;
k
During the simultaneous iteration, we do not compute X .m/ but the isometric ma-
trices Q.m/ . For these matrices, the convergence result takes a particularly simple
form.
Corollary 4.27 (Invariant subspace). Let the conditions of Theorem 4.26 hold, let $(Q^{(m)})_{m=0}^\infty$ and $(R^{(m)})_{m=0}^\infty$ be as in (4.24), and let $(\Lambda^{(m)})_{m=0}^\infty$ be as in (4.25). Then we have
$$\|A Q^{(m)} - Q^{(m)}\Lambda^{(m)}\| \le C_0\left(\frac{|\lambda_{k+1}|}{|\lambda_k|}\right)^m \qquad\text{for all } m \in \mathbb{N}_0,$$
where the constant is given by
$$C_0 := \max\left\{\frac{\|(A X^{(0)} - X^{(0)}\Lambda)y\|}{\|P X^{(0)} y\|} : y \in \mathbb{F}^k\setminus\{0\}\right\}.$$
Now we only have to replace ƒ e .m/ by ƒ.m/ . To this end let … WD Q.m/ .Q.m/ / .
Since Q .m/ is isometric, we have …2 D Q.m/ .Q.m/ / Q.m/ .Q.m/ / D Q.m/ .Q.m/ /
D … and … D …, so … is an orthogonal projection onto the range R.Q.m/ /
of Q.m/ . Let z 2 F k . Due to Q.m/ ƒ e.m/ z 2 R.Q.m/ /, we can use the best-
approximation property (4.19a) to obtain
and using the intermediate result (4.26) in combination with the Definition 3.5 of the
spectral norm completes the proof.
and $\sigma(R_\ell) = \{\lambda_\ell\}$ for all $\ell \in \{1,\dots,k\}$. This decomposition can take the place of the diagonalization (4.2) we relied on in the analysis of the power iteration in the diagonalizable case: the matrix is block-diagonalizable.
We have to investigate the matrices $R_\ell$ more closely. Let $\ell \in \{1,\dots,k\}$. Since $R_\ell$ is upper triangular, Proposition 2.13 implies that the diagonal elements of $R_\ell$ are its eigenvalues. Due to $\sigma(R_\ell) = \{\lambda_\ell\}$, all diagonal elements have to equal $\lambda_\ell$. This suggests a decomposition of $R_\ell$ into its diagonal part and the remainder, and the remainder will be a special kind of triangular matrix.
We split the matrix $R_\ell$ into the diagonal part $\lambda_\ell I$ and a strictly upper triangular matrix $N_\ell \in \mathbb{C}^{n_\ell\times n_\ell}$:
$$R_\ell = \lambda_\ell I + N_\ell.$$
The vectors computed during the power iteration result from applying powers of the matrix to the starting vector, therefore we have to investigate the powers of $R_\ell$. Since $I$ and $N_\ell$ commute, these powers are given by
$$R_\ell^m = (\lambda_\ell I + N_\ell)^m = \sum_{j=0}^{m}\binom{m}{j}\lambda_\ell^{m-j}N_\ell^j \qquad\text{for all } m \in \mathbb{N}_0.$$
Proof. We use contraposition: let $i, j \in \{1,\dots,n\}$ be given such that $(NM)_{ij} \ne 0$. Due to
$$0 \ne (NM)_{ij} = \sum_{k=1}^{n} n_{ik}m_{kj},$$
there has to be at least one index $k \in \{1,\dots,n\}$ with $n_{ik} \ne 0$ and $m_{kj} \ne 0$. Since N is $\alpha$-upper triangular and M is $\beta$-upper triangular, this implies $k \ge i + \alpha$ and $j \ge k + \beta$ due to (4.28). Combining both estimates gives us $j \ge k + \beta \ge i + \alpha + \beta$.
We conclude that $(NM)_{ij} \ne 0$ implies $j \ge i + \alpha + \beta$, and contraposition yields that $j < i + \alpha + \beta$ implies $(NM)_{ij} = 0$, so NM has to be $(\alpha+\beta)$-upper triangular.
Since $N_\ell$ is strictly upper triangular, Lemma 4.29 and a simple induction yield that $N_\ell^{n_\ell}$ is $n_\ell$-upper triangular, and since the matrix is of size $n_\ell\times n_\ell$, this implies $N_\ell^{n_\ell} = 0$. Therefore the m-th power of $R_\ell$ takes the form
$$R_\ell^m = \sum_{j=0}^{n_\ell-1}\binom{m}{j}\lambda_\ell^{m-j}N_\ell^j \qquad\text{for all } m \ge n_\ell - 1.$$
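A quick numerical sanity check of this truncation in Python (purely illustrative; matrix size and entries are arbitrary):

import numpy as np
from math import comb

lam, n = 0.9, 4
N = np.triu(np.random.default_rng(0).standard_normal((n, n)), k=1)  # strictly upper triangular
R = lam * np.eye(n) + N
m = 10
# N is nilpotent, N^n = 0, so the binomial sum may stop at j = n - 1
direct = np.linalg.matrix_power(R, m)
truncated = sum(comb(m, j) * lam ** (m - j) * np.linalg.matrix_power(N, j) for j in range(n))
print(np.allclose(direct, truncated))  # True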
In order to obtain a generalized version of Theorem 4.2, we have to find lower and upper bounds for matrices of this type.
Lemma 4.30 (Bounds for iteration vectors). Let $\lambda \in \mathbb{F}$, let $N \in \mathbb{F}^{n\times n}$ be a strictly upper triangular matrix, and let $R = \lambda I + N$. Let $x \in \mathbb{F}^n$. There are a constant $c \in \mathbb{R}_{>0}$ and a polynomial p of degree not higher than $n-1$ such that
$$c\,|\lambda|^m\|x\| \le \|R^m x\| \le p(m)\,|\lambda|^{m-(n-1)}\|x\| \qquad\text{for all } m \ge n-1.$$
Proof. Since the result is trivial for x D 0, we consider only the case x ¤ 0. Let
m 2 Nn , P WD Rm and y WD P x D Rm x. Due to Lemma 4.29, the product
of upper triangular matrices is upper triangular, and a simple induction yields that
P WD Rm is also upper triangular.
We first prove the lower bound. Since x ¤ 0, we can find an index k such that
xk ¤ 0. The largest such index is given by
` WD max¹k 2 ¹1; : : : ; nº W xk ¤ 0º:
Since P is upper triangular and xk D 0 for all k > `, we have
X
n X
n
y` D .P x/` D p`j xj D p`j xj D p`` x` :
j D1 j D`
N is strictly upper triangular, and the same holds for N j with j > 0 by Lemma 4.29.
Therefore the only term in (4.29) contributing to the diagonal of P is the term for
j D 0, i.e., m I . We conclude that all diagonal elements of P equal m , and in
particular
y` D p`` x` D m x` :
Since x` ¤ 0, the constant
jx` j
c WD >0
kxk
is well-defined, and we obtain
kRm xk D kyk
jy` j D jjm jx` j D cjjm kxk:
Now we consider the upper bound. Applying the triangle inequality to (4.29) yields
! !
X
m m mj j X m
m
kyk D kP xk D N x jjmj kN j xk:
j D0 j j D0 j
Since N is strictly upper triangular, Lemma 4.29 yields that N n is n-upper triangular,
i.e., N n D 0, and therefore
!
X m
n1
kyk jjmj kN j xk:
j
j D0
the polynomials
Y
j
kCzj
pj W F ! F; z 7! ;
k
kD1
satisfy pj .m/ D m j for all m 2 Nn and j 2 ¹0; : : : ; n 1º.
Since each pj is the product of j linear factors, its order is bounded by j , and
X
n1
p W F ! F; z 7! pj .z/jjn1j kN kj ;
j D0
Note that the constant c of Lemma 4.30 depends on the vector x, particularly on
the ratio between its last non-zero entry and its Euclidean norm.
We can apply this Lemma to the decomposition (4.27) if we assume that one eigen-
value is dominant.
and let
$$\widehat{\mathcal{G}}_1 := \mathbb{C}^{n_1}\times\{0\} \subseteq \mathbb{C}^n, \qquad \mathcal{G}_1 := B\,\widehat{\mathcal{G}}_1.$$
$\mathcal{G}_1$ is the invariant subspace of generalized eigenvectors (cf. Definition 2.59) for the dominant eigenvalue $\lambda_1$. Let $x^{(0)} \in \mathbb{C}^n\setminus\{0\}$ satisfy $\cos\angle(B^{-1}x^{(0)}, \widehat{\mathcal{G}}_1) \ne 0$. Let $(x^{(m)})_{m=0}^\infty$ be the iteration vectors of the usual power iteration given by (4.1). Then there is a polynomial p of degree not higher than $\hat n := \max\{n_2-1,\dots,n_k-1\}$ such that
$$\sin\angle(x^{(m)}, \mathcal{G}_1) \le p(m)\left(\frac{|\lambda_2|}{|\lambda_1|}\right)^{m-\hat n} \qquad\text{for all } m \ge \hat n,$$
i.e., the angle between the iteration vectors and the subspace $\mathcal{G}_1$ converges to zero.
Proof. We follow the approach already used in the proof of Theorem 4.20: we define
x .m/ WD B 1 x .m/
b for all m 2 N0
and observe
0 1
R1
B :: C .m/
x .mC1/ D @
b : Ab
x for all m 2 N0 : (4.32)
Rk
.m/
We define b
x` 2 C n` for all m 2 N0 and ` 2 ¹1; : : : ; kº by
0 .m/ 1
b
x1
B :: C
x .m/
b D@ : A for all m 2 N0 ;
.m/
b
xk
x .0/
c1 j1 jm kb m .0/
1 k kR1 b
x1 k for all m 2 Nn1 :
Due to the definition (2.8) of the Euclidean norm and (4.33), this implies
b x .m/ k2 D kb
kPb
.m/
x 1 k2 D kR1mb
.0/
x 1 k2
c12 j1 j2m kb
x 1 k2
.0/
By assumption and Proposition 4.19, we have kPb b x .0/ k D cos †.bx .0/ ; b
G 1 / ¤ 0, so
both sides of (4.34) are strictly positive. Now we apply Lemma 4.30 to the remaining
blocks R2 ; : : : ; Rk . We find polynomials p2 ; : : : ; pk , independent of m, such that
x .0/
kR`mb `
k p` .m/j` jm.n` 1/ kb
x .0/
`
k for all m 2 Nn` 1 ; ` 2 ¹2; : : : ; kº:
X
k X
k
b /b
k.I P x .m/ k2 D
.m/
x ` k2 D
kb kR`mb
.0/
x ` k2
`D2 `D2
X
k
O .0/
p` .m/2 j` j2.mn/ x ` k2
kb
`D2
X
k
O
p` .m/2 j2 j2.mn/ x .0/
kb `
k2
`D2
X
k
O .0/
max¹p` .m/2 W ` 2 ¹2; : : : ; kººj2 j2.mn/ x ` k2
kb
`D2
for all m 2 NnO :
By taking the maximum of the absolute values of the coefficients of the polynomials
p2 ; : : : ; pk , we can construct a polynomial pmax of degree not larger than nO such that
and conclude
X
k
b /b
k.I P O
x .m/ k2 pmax .m/2 j2 j2.mn/ x .0/
kb k2
`
`D2
O
D pmax .m/2 j2 j2.mn/ b /b
k.I P x .0/ k2 for all m 2 NnO :
(4.35)
Now we have to translate the estimates back to the original iteration vectors x .m/ . We
e WD B P
let P b B 1 and observe that it is a projection onto G1 , although unfortunately
not orthogonal in general. We denote the orthogonal projection onto G1 by P 2 C nn .
The best-approximation property (4.19a) and (3.11) yield
e /x .m/ k D kB.I P
k.I P /x .m/ k k.I P b /B 1 x .m/ k
D kB.I Pb /bx .m/ k kBk k.I Pb /b
x .m/ k for all m 2 N0 ;
b /b
k.I P /x .m/ k kBkpmax .m/j2 jmnO k.I P x .0/ k for all m 2 NnO : (4.36)
kB 1 k kx .m/ k
kB 1 x .m/ k D kb b x .m/ k
x .m/ k
kPb for all m 2 N0 ;
b x .m/ k
kPb b x .0/ k
c1 j1 jm kPb
kx .m/ k
for all m 2 N0 : (4.37)
kB 1 k kB 1 k
Remark 4.33 (Linear convergence). Theorem 4.31 yields “almost” the same conver-
gence behaviour as Theorem 4.2, at least in the long run: let % WD j2 j=j1 j < 1
denote the rate of convergence for the diagonalizable case, and let %O 2 .%; 1/. Then
we have q WD %=%O < 1 and find
sin †.x .m/ ; G1 / %mn p.m/ D %O mn q mn p.m/ for all m 2 Nn :
O WD m n and p.
To simplify the expression, we introduce m O m/
O WD p.m
O C n/ and
obtain
O
sin †.x .mCn/ ; G1 / %O mO q mO p.
O m/
O O 2 N0 :
for all m
Due to q < 1, we can find
2 R>0 such that exp.
/ D 1=q, and therefore
exp.
/ D q and q mO D exp.
m/.
O Since m
O is non-negative, we have
X
nC1
O `
.
m/
nC1
O
exp.
m/
O nC1
m O 2 N0
for all m
`Š .n C 1/Š
`D0
and conclude
O O m/
p. O .n C 1/Šp.
O m/
O
sin †.x .mCn/ ; G1 / %O mO %O mO nC1 nC1 O 2 N0 :
for all m
O
exp.
m/
mO
QR iteration
Summary
The most successful algorithm for finding all eigenvalues and eigenvectors of a given matrix is the QR iteration. The basic formulation of the method can be derived from the simultaneous iteration (cf. Section 4.7). In order to obtain an efficient algorithm, several improvements are required: the matrix has to be reduced to Hessenberg form, an appropriate shift strategy has to be used, and partially converged intermediate results have to be handled. The resulting practical QR algorithm can be expected to show quadratic or even cubic convergence and to compute all eigenvalues and eigenvectors of an n x n matrix in O(n^3) operations.
Learning targets
Introduce the basic QR iteration.
Derive an algorithm for reducing a matrix to Hessenberg or tridiagonal form.
Rewrite the iteration as an implicit algorithm requiring only a small amount of
auxiliary storage.
Get to know shift strategies for improving the speed of convergence.
The partitioned matrices satisfy a corresponding equation, and due to the triangular structure of the second factor $R^{(m+1)}$, taking its first block column yields
$$Q_k^{(m+1)}R_{kk}^{(m+1)} = A\,Q_k^{(m)} \qquad\text{for all } m \in \mathbb{N}_0.$$
Obviously this equation shares the structure of (5.1): the matrices $(Q_k^{(m)})_{m=0}^\infty$ are the result of a simultaneous iteration starting with the isometric matrix $Q_k^{(0)}$.
We assume for the moment that A is self-adjoint (for $\mathbb{F} = \mathbb{R}$) or normal (for $\mathbb{F} = \mathbb{C}$), respectively, and that its eigenvalues satisfy the condition (4.20). Let $P \in \mathbb{F}^{n\times n}$ denote the orthogonal projection onto the invariant subspace spanned by the eigenvectors corresponding to the first k eigenvalues. We also assume $PQ_k^{(0)}$ to be injective.
Corollary 4.27 states
$$\|A Q_k^{(m)} - Q_k^{(m)}\Lambda_{kk}^{(m)}\| \le C_0\left(\frac{|\lambda_{k+1}|}{|\lambda_k|}\right)^m \qquad\text{for all } m \in \mathbb{N}_0. \qquad (5.3)$$
We are interested in the lower left block. Due to the estimate (5.3), we know that
.m/ .m/ .m/
kAQk Qk ƒkk k converges to zero, and since Q.m/ is unitary, we have
! !
.Qk.m/ /
.m/ .m/
.Qk / Qk .Qk.m/ / Q?
.m/
.m/ .m/
.m/ .m/ .m/ .m/
D .m/ Qk Q?
.Q? / Qk .Q? / Q? .Q? /
.m/ .m/ I
D .Q / Q DI D
I
.m/ .m/
and therefore .Q? / Qk D 0 in particular. For the lower left block of A.m/ , this
means
where we have used the definition of the norm (2.8) in the third step and Proposi-
tion 2.39 and (2.16) in the fifth step.
This estimate means that the lower left block of $A^{(m)}$ in (5.4) converges to zero; therefore $A^{(m)}$ converges to block upper triangular form. We can use this property to develop an algorithm that takes the matrices arbitrarily close to triangular form, i.e., we can approximate the Schur decomposition introduced in Theorem 2.46.
We can judge the convergence of the iteration by checking the entries of the matrices $A^{(m)}$, so we are interested in an efficient way of computing them. Fortunately, we can rewrite the simultaneous iteration in a way that allows us to compute $A^{(m+1)}$ directly from $A^{(m)}$: let $m \in \mathbb{N}_0$. By definition, we have
and obtain
$$\widehat Q^{(m+1)}R^{(m+1)} = A^{(m)}. \qquad (5.8)$$
We can reverse this construction: we compute a QR factorization $\widehat Q^{(m+1)}R^{(m+1)} = A^{(m)}$, let $Q^{(m+1)} := Q^{(m)}\widehat Q^{(m+1)}$, and observe that the defining equation (5.6) holds. We are interested in
$$A^{(m+1)} = (\widehat Q^{(m+1)})^* A^{(m)}\widehat Q^{(m+1)} = R^{(m+1)}\widehat Q^{(m+1)},$$
i.e., the matrix $A^{(m+1)}$ can be obtained by multiplying the factors of the QR decomposition of $A^{(m)}$ in reversed order. The resulting simple QR iteration is summarized in Figure 5.1. If we are not interested in the eigenvectors, we can neglect to update the matrix Q in each step.
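A minimal sketch of the basic (unshifted) QR iteration in Python, assuming `numpy`; it mirrors the idea just described but is not the book's Figure 5.1.

import numpy as np

def qr_iteration(A, steps=200):
    """Basic QR iteration: factorize, multiply in reversed order, repeat."""
    Am = A.astype(float).copy()
    Q_total = np.eye(A.shape[0])        # accumulates the similarity transformation
    for _ in range(steps):
        Qhat, R = np.linalg.qr(Am)      # A^(m) = Qhat R
        Am = R @ Qhat                   # A^(m+1) = R Qhat = Qhat^* A^(m) Qhat
        Q_total = Q_total @ Qhat        # optional: keep track of the eigenvector basis
    return Am, Q_total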
For a triangular matrix, all entries below the diagonal have to be equal to zero. For a
Hessenberg matrix, the entries immediately below the diagonal (the first subdiagonal)
are allowed to be non-zero. For a tridiagonal matrix, only the entries on the diagonal
and immediately below and immediately above the diagonal are allowed to be non-
zero.
The resulting patterns of non-zero entries (denoted by “”) for Hessenberg (left)
and tridiagonal matrices (right) look as follows:
0 1 0 1
B C B C
B C B C
B C C B C:
B B C
@ A @ A
By multiplying with this matrix, we can eliminate the q-th entry of a given vector
changing only the p-th and the q-th component.
We consider a matrix of the form
0 1
B C
B C
H DB B C
C
@ A
as an example. We apply a Givens rotation G .1/ to the first two rows in order to
eliminate the first sub-diagonal element h21 :
0 1
˝ ˝ ˝ ˝ ˝
B0 ˝ ˝ ˝ ˝C
B C
H .1/ WD G .1/ H D B
B CC;
@ A
where “˝” denotes matrix entries changed by the transformation. Next, we eliminate
the second sub-diagonal element h32 by applying a Givens rotation G .2/ to the second
and third row: 0 1
B ˝ ˝ ˝ ˝C
B C
H WD G H D B
.2/ .2/ .1/
B 0 ˝ ˝ ˝C C:
@ A
Givens rotations G .3/ and G .4/ applied to the third and fourth and fourth and fifth
rows, respectively, lead to the required upper triangular form:
0 1
B C
B C
.3/
H WD G H D B .3/ .2/ B ˝ ˝ ˝C C;
@ 0 ˝ ˝A
0 1
B C
B C
.4/ .3/
R WD G H D B B C C:
@ ˝ ˝A
0 ˝
In general, we require n 1 Givens rotations, and each rotation works on only two
rows of the matrix, so the QR factorization H D QR b of a Hessenberg matrix can
be constructed in O.n2 / operations. In the case of a tridiagonal matrix, we can avoid
computing zeros in the upper right triangle and require only O.n/ operations.
For the next step of the QR iteration, we have to compute Hb WD RQ, b i.e., we have
to apply the adjoint Givens rotations to the columns of R. In our example, we have
Since each step changes only two columns of the matrix, we require only O(n^2) operations to perform one step of the QR iteration. This is a significant improvement compared to the cubic complexity required by a general QR factorization.
We can see that one step of the QR iteration preserves the Hessenberg form: if $A^{(0)}$ is a Hessenberg matrix, the same holds for all matrices $A^{(m)}$. This property allows us to perform the entire iteration more efficiently. The resulting efficient algorithm for performing one step of the QR iteration is summarized in Figure 5.3.
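The following Python sketch (assuming `numpy`) performs one QR step on a real Hessenberg matrix with Givens rotations, first computing H = QR and then forming RQ; it is an illustration of the idea, not the algorithm of Figure 5.3.

import numpy as np

def hessenberg_qr_step(H):
    """One step H -> R Q of the QR iteration for a real Hessenberg matrix H,
    using n-1 Givens rotations (O(n^2) operations)."""
    n = H.shape[0]
    R = H.astype(float).copy()
    rotations = []
    for k in range(n - 1):
        a, b = R[k, k], R[k + 1, k]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        G = np.array([[c, s], [-s, c]])
        R[k:k + 2, k:] = G @ R[k:k + 2, k:]        # eliminate the subdiagonal entry
        rotations.append((k, G))
    for k, G in rotations:                          # multiply by the adjoint rotations
        R[:k + 2, k:k + 2] = R[:k + 2, k:k + 2] @ G.T
    return R                                        # next iteration matrix, again Hessenberg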
Proposition 5.2 (Complexity). The algorithm given in Figure 5.3 requires not more than 12n^2 operations to perform one step of the QR iteration.
Proof. For a given $k \in \{1,\dots,n-1\}$, computing $\varrho$, $c_k$ and $s_k$ requires not more than 8 operations. The application of the corresponding Givens rotation to all columns takes not more than $6(n-k)$ operations, the application to all rows $6k+2$ operations,
and the application to all rows of Q another 6n operations. The total number of operations is bounded by
$$\sum_{k=1}^{n-1}\bigl(8 + 6(n-k) + (6k+2) + 6n\bigr) = \sum_{k=1}^{n-1}(10 + 12n) = 10(n-1) + 12n(n-1) \le 12n^2.$$
The Hessenberg form has the additional advantage that it is easy to monitor conver-
gence to upper triangular form: we only have to check the n 1 sub-diagonal entries
for convergence to zero.
If A is self-adjoint, the same holds for all matrices A.m/ , since they are unitarily
similar. If A.0/ is also a Hessenberg matrix, the same holds for all A.m/ , so all of these
matrices are self-adjoint Hessenberg matrices and therefore tridiagonal. This means
that for self-adjoint matrices, we can take advantage of the tridiagonal structure to
reduce the number of operations.
Example 5.3 (QR iteration). We consider the QR iteration applied to the matrix
$$A = \begin{pmatrix}4 & & & \\ 1 & 3 & & \\ & 1 & 2 & \\ & & 1 & 1\end{pmatrix}.$$
Since it is lower triangular, we easily find $\sigma(A) = \{1, 2, 3, 4\}$. Given the results of Corollary 4.27, we expect convergence rates of 3/4, 2/3 and 1/2 for the first, second and third subdiagonal entries, respectively. The program gives the following results:

 m   |a21^(m)|   Ratio   |a32^(m)|   Ratio   |a43^(m)|   Ratio
 1   7.46e-01            6.95e-01            4.13e-01
 2   5.48e-01    0.73    4.62e-01    0.66    1.75e-01    0.42
 3   3.98e-01    0.73    2.95e-01    0.64    8.24e-02    0.47
 4   2.86e-01    0.72    1.87e-01    0.63    4.09e-02    0.50
 5   2.06e-01    0.72    1.20e-01    0.64    2.07e-02    0.51
 6   1.49e-01    0.72    7.81e-02    0.65    1.05e-02    0.51
 7   1.08e-01    0.72    5.13e-02    0.66    5.33e-03    0.51
 8   7.88e-02    0.73    3.39e-02    0.66    2.70e-03    0.51

The subdiagonal entries appear to converge at the rates predicted by our theory.
Exercise 5.4 (Self-adjoint QR iteration). Derive a version of the Hessenberg QR
iteration given in Figure 5.3 for self-adjoint matrices that requires only O.n/
operations for each iteration.
Apparently the Hessenberg form is useful, therefore we are looking for an algorithm
that transforms an arbitrary matrix into this form. We consider a matrix of the form
0 1
B C
B C
ADB B C :
C
@ A
We first look for a unitary similarity transformation that eliminates the entries a31 ,
a41 and a51 . This can be done by Givens rotations, but we choose to use Householder
reflections for the sake of efficiency: according to Lemma 2.45, we can find a unitary
self-adjoint matrix P1 such that
0 1 0 1
a21 1
Ba31 C B 0 C
P1 B C B C
@a41 A D @ 0 A
a51 0
holds for a 1 2 F, and applying this transformation to the second to fifth row of
A yields 0 1
B˝ ˝ ˝ ˝ ˝C
1 B C
ADB B0 ˝ ˝ ˝ ˝CC:
P1 @0 ˝ ˝ ˝ ˝A
0 ˝ ˝ ˝ ˝
Since we are looking for a similarity transformation, we have to apply the inverse of
P1 to the second to fifth column. Due to P1 D P1 D P11 , computing the inverse is
not really a challenge, and we obtain
0 1
˝ ˝ ˝ ˝
B ˝ ˝ ˝ ˝C
1 1 B C
b .1/
A WD A DBB ˝ ˝ ˝ ˝C C:
P1 P1 @ ˝ ˝ ˝ ˝A
˝ ˝ ˝ ˝
.1/ .1/
Now we find a Householder reflection P2 that eliminates aO 42 and aO 52 , i.e., such that
0 .1/
1 0 1
aO 32 2
B .1/ C @ A
P2 @aO 42 A D 0
.1/ 0
aO 52
holds for a 2 2 F. Applying it to the third to fifth rows and columns yields
0 1
B C
I2 B C
A DB
b .1/
B ˝ ˝ ˝ ˝C C;
P2 @ 0 ˝ ˝ ˝A
0 ˝ ˝ ˝
0 1
˝ ˝ ˝
B ˝ ˝ ˝C
I2 .1/ I2
B C
b .2/
A WD b
A DB B ˝ ˝ ˝C C:
P2 P2 @ ˝ ˝ ˝A
˝ ˝ ˝
.2/
In the last step, we find a Householder reflection P3 that eliminates aO 53 , i.e., such
that !
.2/
aO 43
P3 .2/ D 3
aO 53 0
holds for a 3 2 F. Applying this reflection to the fourth and fifth rows and columns
gives us the required Hessenberg form:
0 1
B C
I3 B C
b
A.2/ DB
B CC;
P3 @ ˝ ˝ ˝A
0 ˝ ˝
0 1
˝ ˝
B ˝ ˝C
I .2/ I3
B C
b
A WD 3
.3/ b
A DB
B ˝ ˝CC:
P3 P3 @ ˝ ˝A
˝ ˝
We can "hide" this procedure in the choice of the initial basis for the QR iteration: we choose the initial matrix as
$$Q^{(0)} = \begin{pmatrix}1 & \\ & P_1\end{pmatrix}\begin{pmatrix}I_2 & \\ & P_2\end{pmatrix}\begin{pmatrix}I_3 & \\ & P_3\end{pmatrix}$$
and obtain
$$A^{(0)} = (Q^{(0)})^* A\,Q^{(0)} = \widehat A^{(3)}.$$
The resulting algorithm is given in Figure 5.4.
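A sketch of the reduction to Hessenberg form with Householder reflections in Python (assuming `numpy`; illustrative only, not the book's Figure 5.4):

import numpy as np

def hessenberg_reduce(A):
    """Reduce a real square matrix to Hessenberg form by unitary similarity
    transformations; returns (H, Q) with Q^T A Q = H."""
    H = A.astype(float).copy()
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                      # column already in the required form
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x
        v[0] -= alpha                     # Householder vector for x -> alpha * e_1
        v /= np.linalg.norm(v)
        # apply the reflection I - 2 v v^T to the trailing rows and columns
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return H, Q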
Exercise 5.5 (Triangular form). Explain why this procedure cannot be used to
reach triangular form, i.e., to compute the Schur decomposition directly.
Proposition 5.7 (Complexity). The algorithm given in Figure 5.4 requires not more than
$$\frac{16}{3}n^3 \text{ operations}$$
to transform A into a Hessenberg matrix.
Proof. For a given $k \in \{1,\dots,n-2\}$, computing $\alpha$, $\beta$ and the Householder vector requires $(n-k)+6$ operations. Applying the Householder reflection to the rows of A requires
Figure 5.4. Transformation to Hessenberg form.
$$\sum_{k=1}^{n-2}\bigl(8n(n-k) + 4(n-k)^2 + 4(n-k) + 6\bigr) = (8n+4)\sum_{k=1}^{n-2}(n-k) + 4\sum_{k=1}^{n-2}(n-k)^2 + 6(n-2)$$
$$\le (8n+4)\Bigl(\frac{n(n-1)}{2} - 1\Bigr) + \frac{4}{6}\,n(n-1)(2n-1) + 6(n-2) \le \Bigl(4n + 2 + \frac{2}{3}(2n-1)\Bigr)n(n-1)$$
$$= \Bigl(\frac{16}{3}n + \frac{4}{3}\Bigr)(n^2-n) = \frac{16}{3}n^3 + \frac{4}{3}n^2 - \frac{16}{3}n^2 - \frac{4}{3}n \le \frac{16}{3}n^3.$$
Exercise 5.8 (Improved computation of Q). Assume that we start the transformation to Hessenberg form in the algorithm of Figure 5.4 with Q = I. Prove that in this case the number of operations can be reduced by $2n^3/3$ without changing the result. (Hint: it may be necessary to apply the Householder reflections in reversed order.)
5.3 Shifting
The rate of convergence observed in Example 5.3 is too slow for a practical method: since the computational effort for each step of the iteration is proportional to n^2, we would like the number of steps to be as small as possible. We have already seen in Example 4.14 that shifting the spectrum can significantly improve the convergence, so we are interested in introducing a shift strategy for the QR iteration.
Since the QR method works with a full basis of $\mathbb{F}^n$, we can use shift strategies without the need for computing an inverse: if we have found a shift parameter $\mu \in \mathbb{F}$ sufficiently close to an eigenvalue $\lambda_n \in \sigma(A)$ and sufficiently far from all other eigenvalues, i.e., if we have
$$|\lambda_1 - \mu| \ge |\lambda_2 - \mu| \ge \dots \ge |\lambda_{n-1} - \mu| > |\lambda_n - \mu|,$$
applying (5.5) to $k = n-1$ and the matrix $A - \mu I$ yields that the off-diagonal elements in the n-th row will converge to zero at a rate of $|\lambda_n - \mu|/|\lambda_{n-1} - \mu|$.
We can easily introduce the shift to the QR iteration: let $m \in \mathbb{N}_0$. One step of the simultaneous iteration for $A - \mu I$ is given by
$$Q^{(m+1)}R^{(m+1)} = (A - \mu I)\,Q^{(m)},$$
and we write the first half of the shifted QR step in the form
$$\widehat Q^{(m+1)}R^{(m+1)} = A^{(m)} - \mu I. \qquad (5.10)$$
In order to compute the next iteration matrix $A^{(m+1)}$, we consider
$$A^{(m+1)} = (Q^{(m+1)})^* A\,Q^{(m+1)} = (\widehat Q^{(m+1)})^*(Q^{(m)})^* A\,Q^{(m)}\widehat Q^{(m+1)} = (\widehat Q^{(m+1)})^* A^{(m)}\widehat Q^{(m+1)}$$
$$= (\widehat Q^{(m+1)})^*(A^{(m)} - \mu I)\widehat Q^{(m+1)} + \mu I = (\widehat Q^{(m+1)})^*\widehat Q^{(m+1)}R^{(m+1)}\widehat Q^{(m+1)} + \mu I = R^{(m+1)}\widehat Q^{(m+1)} + \mu I. \qquad (5.11)$$
Combining (5.10) and (5.11) yields the definition of the shifted QR iteration:
$$\widehat Q^{(m+1)}R^{(m+1)} = A^{(m)} - \mu I, \qquad A^{(m+1)} = R^{(m+1)}\widehat Q^{(m+1)} + \mu I \qquad\text{for all } m \in \mathbb{N}_0.$$
As we can see, the only difference between the shifted QR iteration and the original one consists of subtracting $\mu$ from the diagonal elements prior to computing the QR factorization and adding $\mu$ after the multiplication. Compared to the overall complexity of the QR iteration, the additional operations required for the shift are negligible.
As in the case of the Rayleigh iteration, we can choose a different shift parameter for each step of the iteration, and doing so promises quadratic or even cubic convergence of the resulting shifted QR iteration. In the following, we will write $A^{(m)}$ in order to keep the notation simple if the choice of shift parameter is clear from the context.
It is customary to choose the shift parameter to speed up convergence of the last row of the iteration matrices. According to (5.5), this means that we should ensure that the quotient $|\lambda_n - \mu|/|\lambda_{n-1} - \mu|$ is as small as possible. If the subdiagonal elements of the last row are sufficiently close to zero, the diagonal element $a_{nn}^{(m)}$ is close to the eigenvalue $\lambda_n$, so it is a natural choice for the shift parameter.
Definition 5.9 (Rayleigh shift). The Rayleigh shift strategy is given by choosing the shift parameter in the (m+1)-th step of the QR iteration as $\mu = a_{nn}^{(m)}$.
The name "Rayleigh shift" is motivated by the close connection to the Rayleigh quotient introduced in Definition 4.4: if we denote the last column of $Q^{(m)}$ by $q^{(m)}$, we have
$$a_{nn}^{(m)} = \bigl((Q^{(m)})^* A\,Q^{(m)}\bigr)_{nn} = \langle A q^{(m)}, q^{(m)}\rangle = \Lambda_A(q^{(m)})$$
due to the definition of $A^{(m)}$ and $\|q^{(m)}\| = 1$. This equation suggests that we can expect the same good convergence behaviour as for the Rayleigh iteration described in Section 4.5.
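A sketch of the shifted QR iteration with the Rayleigh shift in Python (assuming `numpy`; deflation is omitted for brevity, so this only illustrates the shift mechanics):

import numpy as np

def qr_iteration_rayleigh_shift(A, steps=100):
    """Shifted QR iteration using the last diagonal entry as shift."""
    Am = A.astype(float).copy()
    n = Am.shape[0]
    I = np.eye(n)
    for _ in range(steps):
        mu = Am[-1, -1]                    # Rayleigh shift
        Qhat, R = np.linalg.qr(Am - mu * I)
        Am = R @ Qhat + mu * I             # add the shift back after the multiplication
        if abs(Am[-1, -2]) < 1e-14 * np.linalg.norm(Am):
            break                          # last subdiagonal entry has converged
    return Am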
The characteristic polynomial of the trailing 2 x 2 block $\widehat A$ is given by
$$p_{\widehat A}(t) = \det(tI - \widehat A) = (t - \hat a_{11})(t - \hat a_{22}) - \hat a_{12}\hat a_{21} = t^2 - (\hat a_{11} + \hat a_{22})t + \hat a_{11}\hat a_{22} - \hat a_{12}\hat a_{21}$$
$$= t^2 - (\hat a_{11}+\hat a_{22})t + \Bigl(\frac{\hat a_{11}+\hat a_{22}}{2}\Bigr)^2 - \Bigl(\frac{\hat a_{11}+\hat a_{22}}{2}\Bigr)^2 + \hat a_{11}\hat a_{22} - \hat a_{12}\hat a_{21} = \Bigl(t - \frac{\hat a_{11}+\hat a_{22}}{2}\Bigr)^2 - \Bigl(\frac{\hat a_{11}-\hat a_{22}}{2}\Bigr)^2 - \hat a_{12}\hat a_{21},$$
which yields the eigenvalues
$$\sigma(\widehat A) = \Bigl\{\frac{\hat a_{11}+\hat a_{22}}{2} \pm \sqrt{\Bigl(\frac{\hat a_{11}-\hat a_{22}}{2}\Bigr)^2 + \hat a_{12}\hat a_{21}}\Bigr\}.$$
Since we are interested in convergence of the last row, we should choose the eigenvalue in $\sigma(\widehat A)$ that is closest to the n-th diagonal element.
Definition 5.10 (Francis single shift). The Francis single shift strategy is given by choosing as shift the element closest to $a_{nn}^{(m)}$ in the set
$$\sigma(\widehat A) = \Bigl\{\frac{a_{n-1,n-1}^{(m)}+a_{nn}^{(m)}}{2} \pm \sqrt{\Bigl(\frac{a_{n-1,n-1}^{(m)}-a_{nn}^{(m)}}{2}\Bigr)^2 + a_{n-1,n}^{(m)}\,a_{n,n-1}^{(m)}}\Bigr\}.$$
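A small Python helper computing this shift from the trailing 2-by-2 block (illustrative; complex arithmetic is used so that the complex-conjugate case is covered):

import numpy as np

def francis_single_shift(H):
    """Eigenvalue of the trailing 2x2 block of H that is closest to H[-1, -1]."""
    a, b = H[-2, -2], H[-2, -1]
    c, d = H[-1, -2], H[-1, -1]
    center = (a + d) / 2.0
    disc = np.sqrt(complex(((a - d) / 2.0) ** 2 + b * c))  # complex sqrt covers conjugate pairs
    return min(center + disc, center - disc, key=lambda z: abs(z - d))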
If we are dealing with a real matrix, the Francis shift strategy may yield a complex-valued shift parameter, and this would make the next iteration matrix also complex-valued. Since eigenvalues of real matrices always appear in complex-conjugate pairs, it is possible to avoid working with complex values by first performing a QR step with the Francis shift $\mu$ and following it by a QR step with its complex conjugate $\bar\mu$. Due to
$$(A - \mu I)(A - \bar\mu I) = A^2 - (\mu + \bar\mu)A + |\mu|^2 I,$$
the resulting iteration matrix is again real-valued. It is possible to arrange the com-
putation in such a way that complex numbers are avoided entirely, this results in the
Francis double shift strategy (cf. Example 5.17).
Example 5.11 (Failure to converge). As we have seen in the case of the Rayleigh
iteration (cf. Proposition 4.13), shift strategies work best if the iteration vectors or
matrices are sufficiently close to the result. The matrix
0 1
0 0 1
P WD @1 0 0A
0 1 0
taken from [32] illustrates this point: its characteristic polynomial is given by
pP .t / D t 3 1;
so its eigenvalues are the third roots of unity. Since they are equal in modulus, we can
expect no convergence unless a suitable shift parameter is chosen.
Unfortunately, both the Rayleigh and the Francis shift strategies choose the shift
parameter D 0.
5.4 Deflation
Due to (5.5), we can expect the matrices $A^{(m)}$ to converge to a matrix of the form
$$A^{(m)} \approx \begin{pmatrix}B & F\\ & C\end{pmatrix}, \qquad B \in \mathbb{F}^{k\times k},\ C \in \mathbb{F}^{(n-k)\times(n-k)}. \qquad (5.12)$$
Once $A^{(m)}$ is sufficiently close to this block-triangular form, we can stop performing iterations for the entire matrix and focus on handling the diagonal blocks B and C individually: if we can find unitary matrices $Q_B \in \mathbb{F}^{k\times k}$ and $Q_C \in \mathbb{F}^{(n-k)\times(n-k)}$ such that $Q_B^* B Q_B$ and $Q_C^* C Q_C$ are sufficiently close to upper triangular form, the same holds for
$$\begin{pmatrix}Q_B & \\ & Q_C\end{pmatrix}^* A^{(m)}\begin{pmatrix}Q_B & \\ & Q_C\end{pmatrix} \approx \begin{pmatrix}Q_B^* B Q_B & Q_B^* F Q_C\\ & Q_C^* C Q_C\end{pmatrix}.$$
Proceeding recursively, we will apply the procedure to smaller and smaller diagonal blocks until we arrive at 1 x 1 blocks and have reached an approximation of the Schur decomposition.
Applying the deflation approach to Hessenberg matrices is particularly simple: the subdiagonal blocks contain only one non-zero entry, so we can easily check for convergence. In a practical deflation approach, we first look for the largest index $\alpha \in \{1,\dots,n\}$ such that
$$a_{k+1,k}^{(m)} \approx 0 \qquad\text{for all } k \in \{1,\dots,\alpha-1\}.$$
If $\alpha = n$, the matrix $A^{(m)}$ is sufficiently close to upper triangular form and we can stop the iteration. Otherwise, we have $a_{\alpha+1,\alpha}^{(m)} \not\approx 0$ due to the maximality of $\alpha$, and we look for the largest $\beta \in \{\alpha,\dots,n\}$ such that
$$a_{k+1,k}^{(m)} \not\approx 0 \qquad\text{for all } k \in \{\alpha,\dots,\beta-1\}.$$
If $\beta < n$, we have $a_{\beta+1,\beta}^{(m)} \approx 0$ due to the maximality of $\beta$.
We split the matrix $A^{(m)}$ in the form
$$A^{(m)} = \begin{pmatrix}H_{11} & H_{12} & H_{13}\\ & H_{22} & H_{23}\\ & & H_{33}\end{pmatrix}, \qquad H_{11} \in \mathbb{F}^{(\alpha-1)\times(\alpha-1)},\ H_{22} \in \mathbb{F}^{(\beta-\alpha+1)\times(\beta-\alpha+1)},\ H_{33} \in \mathbb{F}^{(n-\beta)\times(n-\beta)}.$$
Due to our choice of ˛, H11 is sufficiently close to triangular form and the block below
H11 is sufficiently close to zero. Due to our choice of ˇ, the block below H22 is also
sufficiently close to zero. H22 is a Hessenberg matrix, and we can efficiently apply the
QR iteration to try to eliminate its subdiagonal entries. Once we have succeeded, H11
and H22 are upper triangular and we can apply the same procedure to the remaining
matrix H33 .
The resulting algorithm is given in Figure 5.5. In order to decide whether a condition of the form "$a_{\alpha+1,\alpha}^{(m)} \approx 0$" is satisfied in practice, a relative criterion like
$$|a_{\alpha+1,\alpha}^{(m)}| \le \epsilon\,\bigl(|a_{\alpha\alpha}^{(m)}| + |a_{\alpha+1,\alpha+1}^{(m)}|\bigr)$$
is frequently used, where $\epsilon > 0$ denotes a relative error tolerance. A typical choice for $\epsilon$ depends on the machine's floating point accuracy $\epsilon_{\mathrm{mach}}$. Common choices are $\epsilon = c\,\epsilon_{\mathrm{mach}}$ for a small constant c or $\epsilon = \sqrt{n}\,\epsilon_{\mathrm{mach}}$, taking into account the matrix dimension.
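One possible way to code the deflation test and the search for the active block in Python (assuming `numpy`; the tolerance strategy follows the discussion above and the zero-based index convention is an implementation choice):

import numpy as np

def active_block(H, eps=None):
    """Zero subdiagonal entries satisfying the relative deflation criterion and
    return indices (alpha, beta) of an unreduced Hessenberg block."""
    n = H.shape[0]
    eps = np.sqrt(n) * np.finfo(float).eps if eps is None else eps
    for k in range(n - 1):
        if abs(H[k + 1, k]) <= eps * (abs(H[k, k]) + abs(H[k + 1, k + 1])):
            H[k + 1, k] = 0.0               # declare the entry converged
    alpha = 0
    while alpha < n - 1 and H[alpha + 1, alpha] == 0.0:
        alpha += 1                          # leading part already triangular
    beta = alpha
    while beta < n - 1 and H[beta + 1, beta] != 0.0:
        beta += 1                           # extend the unreduced block
    return alpha, beta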
Example 5.12 (QR iteration). We apply the QR iteration with Francis and Rayleigh
shift to the matrix A given in Example 5.3. The program gives the following results:
             Rayleigh                            Francis
 m   |a21^(m)|   |a32^(m)|   |a43^(m)|   |a21^(m)|   |a32^(m)|   |a43^(m)|
 1   6.78e-01    6.23e-01    conv.       6.78e-01    6.23e-01    conv.
 2   3.86e-01    7.14e-02                3.66e-01    conv.
 3   1.73e-01    4.51e-03                conv.
 4   7.68e-02    2.32e-05
 5   3.60e-02    5.96e-10
 6   1.74e-02    conv.
 7   2.73e-04
 8   7.27e-08
 9   conv.
We can see that the Rayleigh shift strategy seems to lead to quadratic convergence as
soon as the subdiagonal entries are sufficiently close to zero. With the Francis shift
strategy, the iteration requires only three steps to converge. For practical problems,
the difference between both strategies can be expected to be far less pronounced.
corresponding to the Givens rotations used during the factorization process. In this section, we derive an alternative version of the QR iteration that avoids these inconveniences and facilitates the implementation of multiple shift strategies like the Francis double shift or even more sophisticated techniques that can lead to significant improvements of the basic algorithm [50, 4, 5].
We have seen that the QR iteration preserves the Hessenberg form of the iteration matrices $A^{(m)}$: if $A^{(m)}$ is a Hessenberg matrix, then so is the next iteration matrix $A^{(m+1)} = (\widehat Q^{(m+1)})^* A^{(m)}\widehat Q^{(m+1)}$. Unitarily similar Hessenberg matrices like $A^{(m)}$ and $A^{(m+1)}$ have special properties that we can use to simplify the algorithm: if we have two unitarily similar Hessenberg matrices with non-zero subdiagonal entries and if the first column of the corresponding similarity transformation matrix is the first canonical unit vector, then both matrices can only differ in the signs of their entries.
Theorem 5.13 (Implicit Q). Let $H, J \in \mathbb{F}^{n\times n}$ be Hessenberg matrices and let $Q \in \mathbb{F}^{n\times n}$ be unitary such that
$$J = Q^* H Q. \qquad (5.13)$$
Let $m \in \{0,\dots,n-1\}$ be such that the subdiagonal entries $j_{k+1,k}$ do not vanish for $k \in \{1,\dots,m\}$. If the first column of Q is the canonical unit vector $\delta_1$, the first m+1 columns of Q are multiples of the first m+1 canonical unit vectors.
Proof. (cf. [18, Theorem 7.4.2]) We prove by induction on $k \in \{1,\dots,n\}$ that the k-th column $q_k := Q\delta_k$ of Q is a multiple of the k-th canonical unit vector.
For k = 1, this is given. Let now $k \in \{1,\dots,m\}$ be such that $q_1,\dots,q_k$ are multiples of the corresponding unit vectors. This implies
$$\sum_{\ell=1}^{k+1} q_\ell\,j_{\ell k} = \sum_{\ell=1}^{k+1} Q\delta_\ell\,j_{\ell k} = Q J\delta_k = H Q\delta_k = H q_k = H\delta_k\,q_{kk} = \sum_{\ell=1}^{k+1}\delta_\ell\,h_{\ell k}\,q_{kk},$$
$$q_{k+1}\,j_{k+1,k} + \sum_{\ell=1}^{k}\delta_\ell\,q_{\ell\ell}\,j_{\ell k} = \sum_{\ell=1}^{k+1}\delta_\ell\,h_{\ell k}\,q_{kk},$$
so that
$$q_{k+1}\,j_{k+1,k} = \delta_{k+1}\,h_{k+1,k}\,q_{kk} + \sum_{\ell=1}^{k}\delta_\ell\,(h_{\ell k}\,q_{kk} - q_{\ell\ell}\,j_{\ell k}),$$
and $q_{k+1}$ is orthogonal to $\delta_1,\dots,\delta_k$ since Q is unitary. Combining (5.15) and (5.16) yields that $q_{k+1}$ has to be a multiple of $\delta_{k+1}$.
We aim to use this result to simplify the QR iteration: the QR iteration takes a
Hessenberg matrix A.m/ and computes a unitarily similar Hessenberg matrix A.mC1/ .
If we can find a new algorithm that also yields a unitarily similar Hessenberg matrix
e
A.mC1/ and if the first column of both similarity transformations are equal, Theo-
rem 5.13 yields that A.mC1/ and e A.mC1/ can differ only in the signs of the coeffi-
cients.
B D Q AQ; eDQ
B e AQ:
e
Then there are a unitary diagonal matrix D 2 F .mC1/.mC1/ and a unitary matrix
b 2 F .nm1/.nm1/ such that
P
e D D
BD b B b :
P P
e Q and observe
Proof. In order to be able to apply Theorem 5.13, we define P WD Q
eB
ADQ eQ
e ;
eB
B D Q AQ D Q Q eQ
e Q D P BP:
e
Since the first columns of Q and Qe are identical, the first column of P has to be the
first canonical unit vector.
Theorem 5.13 yields that the first m C 1 columns of P are multiples of the first
m C 1 canonical unit vectors, so we can find a diagonal matrix D 2 F .mC1/.mC1/
b 2 F .nm1/.nm1/ and R 2 F .mC1/.nm1/ such that
and matrices P
D R
P D b :
P
The Hessenberg QR step discussed in Section 5.2 consists of first finding Givens
rotations G1 ; : : : ; Gn1 that render A.m/ upper triangular
and then multiplying by their adjoints in reversed order to compute the next iteration
matrix
In our construction, only the first rotation G1 can change the first component of a
vector, therefore the first column of Q.mC1/ depends only on G1 . If we can construct
e2; : : : ; G
an alternate sequence of unitary transformations G e n1 that do not change the
first component, the first columns of
e e .mC1/ / A.m/ Q
A.mC1/ WD .Q e .mC1/
is also a Hessenberg matrix, Corollary 5.14 applies and guarantees that A.mC1/ and
e
A.mC1/ differ only in the signs of the coefficients, while the convergence of the re-
sulting iteration remains unchanged. The condition (5.17) can be taken care of by
using the deflation approach introduced in Section 5.4: as soon as a subdiagonal entry
vanishes, the matrix is split into submatrices with non-zero subdiagonal entries and
the submatrices are processed independently.
Our goal is now to construct the matrices G e n1 . We illustrate the proce-
e2; : : : ; G
dure using a Hessenberg matrix of the form
0 1
B C
B C
H WD A .m/
DBB C C:
@ A
.1/
We can see that H .1/ is “almost” a Hessenberg matrix, only the coefficient h31 in
the first column of the third row poses a problem. Fortunately, we can apply a Givens
e 2 to the second and third row to eliminate this entry and obtain
rotation G
0 1
B˝ ˝ ˝ ˝ ˝C
B C
G 2H D B
e .1/
B 0 ˝ ˝ ˝ ˝C ;
C
@ A
0 1
˝ ˝
B ˝ ˝ C
B C
H .2/ WD G e 2 D B
e 2 H .1/ G ˝ ˝ C
B C:
@ ˝ A
The entry in the first column is gone, unfortunately we have introduced a non-zero
.2/ e3
entry h42 in the second column of the fourth row. We can apply a Givens rotation G
to the third and fourth row to get rid of this entry and find
0 1
B C
B C
G 3H D B
e .2/
B ˝ ˝ ˝ ˝C C;
@ 0 ˝ ˝ ˝A
0 1
˝ ˝
B ˝ ˝ C
B C
H WD G 3 H G 3 D B
.3/ e .2/ e
B ˝ ˝ C C:
@ ˝ ˝ A
˝
.3/
Once more only one entry causes problems, now the entry h53 in the third column of
e 4 for the fourth and
the fifth row. We already know what to do: a Givens rotation G
fifth row eliminates the offending entry:
0 1
B C
B C
e4H D B
.3/
C
G B C;
@ ˝ ˝ ˝A
0 ˝ ˝
0 1
˝ ˝
B ˝ ˝C
B C
.4/
H WD G e4H G.3/ e4 D B
˝ ˝C
B C:
@ ˝ ˝A
˝ ˝
We can see that we have achieved our goal:
e e n1 : : : G
A.mC1/ WD H .4/ D G e 2 : : : G
e 2 G1 A.m/ G1 G e n1
is a Hessenberg matrix. This technique is called “bulge chasing”, since the matrices
H .1/ ; : : : ; H .n2/ differ from Hessenberg form only by the “bulge” marked as
consisting of one non-zero entry and this bulge is “chased” down and to the right until
it “drops out of the matrix” and we have reached Hessenberg form.
Since the rotations $\widetilde G_2,\dots,\widetilde G_{n-1}$ do not change the first row, the conditions of Corollary 5.14 are fulfilled and we have computed an iteration matrix $\widetilde A^{(m+1)}$ that can replace $A^{(m+1)}$ without changing the important properties of our algorithm (as long as deflation is used). The resulting algorithm is called the implicit QR method and offers several advantages, most importantly an elegant way of handling shifts: for the shifted QR iteration the next iteration matrix is given by the relation derived in Section 5.3, and since adding and subtracting the identity does not change the Hessenberg form, we can also apply our construction with a small change: we compute the first Givens rotation $G_1$ not for the matrix $A^{(m)}$, but for the shifted matrix $A^{(m)} - \mu I$. Then we apply it to $A^{(m)}$ and proceed as before to regain Hessenberg form. For the implicit approach, the only difference between the original and the shifted iteration lies in the choice of the first Givens rotation. Combining the shifted implicit iteration with deflation leads to the algorithm given in Figure 5.6 that performs one step of the practical implicit QR iteration.
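A Python sketch of one implicitly shifted QR step with bulge chasing (assuming `numpy`; single shift, no deflation handling, purely illustrative and not the algorithm of Figure 5.6):

import numpy as np

def implicit_qr_step(H, mu):
    """One implicitly shifted QR step on a real Hessenberg matrix H with shift mu."""
    H = H.astype(float).copy()
    n = H.shape[0]

    def rotate(i, j, a, b):
        # Givens rotation acting on rows/columns i and j, chosen to zero b
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        G = np.array([[c, s], [-s, c]])
        H[[i, j], :] = G @ H[[i, j], :]       # apply to the two rows
        H[:, [i, j]] = H[:, [i, j]] @ G.T     # and to the two columns (similarity)

    # the first rotation is computed from the shifted first column
    rotate(0, 1, H[0, 0] - mu, H[1, 0])
    # chase the resulting bulge down the subdiagonal
    for k in range(n - 2):
        rotate(k + 1, k + 2, H[k + 1, k], H[k + 2, k])
    return H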
Proposition 5.15 (Complexity). The algorithm given in Figure 5.6 requires not more than 12n^2 + 8n operations to perform a step of the implicit QR iteration. The estimate does not include the operations required for finding an appropriate shift parameter.
Proof. We require 8 operations to compute the first Givens rotation and 6.n ˛ C 1/
operations to apply it to the rows as well as 6.˛ C 1/ operations to apply it to the
columns, for a total of
8 C 6.n ˛ C 1 C ˛ C 1/ D 8 C 6.n C 2/ D 20 C 6n
6n
operations. For each k 2 ¹˛; : : : ; ˇ 2º, updating the .k C 2/-th row takes 2 op-
erations, determining the Givens rotation to eliminate the bulge takes 6 operations,
applying this rotation to the rows requires 6.n k/ operations, and applying it to the
column requires 6.k C 2/, for a total of
8 C 6.n k C k C 2/ D 8 C 6.n C 2/ D 20 C 6n
6n
operations. In the worst case, all subdiagonal entries are non-zero and we have ˛ D 1
and ˇ D n, so the entire algorithm requires not more than
X
ˇ 2
20 C 12n C .20 C 12n/ D .20 C 12n/.1 C ˇ 2 ˛ C 1/
kD˛
D .20 C 12n/.ˇ ˛/
D .20 C 12n/.n 1/ 20n C 12n2 20 12n
D 12n2 C 8n 20 < 12n2 C 8n
operations.
Although the number of operations for the implicit algorithm is slightly higher than
in the case of the original algorithm given in Figure 5.5, its practical performance is
frequently superior, e.g., in the self-adjoint case, A is a tridiagonal matrix, and the
bulge-chasing algorithm works its way from the upper left to the lower right along the
diagonal. If the tridiagonal matrix is stored appropriately, this operation can benefit
from the cache memory of modern processors.
Remark 5.16 (Structured eigenvalue problems). Frequently, the matrix under inves-
tigation exhibits additional structure, e.g., it may be Hamiltonian, Skew–Hamiltonian
or the product of two matrices. In these cases, it is possible to derive special variants
of the QR iteration that can take advantage of the structure [24, 49].
We are looking for a way to get from $A^{(m)}$ to $A^{(m+2)}$ without computing $A^{(m+1)}$. We have
$$A^{(m+2)} = (\widehat Q^{(m+2)})^* A^{(m+1)}\widehat Q^{(m+2)} = (\widehat Q^{(m+2)})^*(\widehat Q^{(m+1)})^* A^{(m)}\widehat Q^{(m+1)}\widehat Q^{(m+2)} = (\widehat Q_{\mathrm{dbl}}^{(m+2)})^* A^{(m)}\widehat Q_{\mathrm{dbl}}^{(m+2)}$$
with
$$\widehat Q_{\mathrm{dbl}}^{(m+2)} := \widehat Q^{(m+1)}\widehat Q^{(m+2)}.$$
In order to reach our goal, we have to find this matrix without computing $A^{(m+1)}$ or $Q^{(m+1)}$. We know that the QR iteration is closely related to the simultaneous iteration, so it is reasonable to consider the result of performing two steps of the simultaneous iteration at once, i.e., to investigate the product matrix
$$(A^{(m)} - \mu_{m+2}I)(A^{(m)} - \mu_{m+1}I) = \widehat Q^{(m+1)}(\widehat Q^{(m+1)})^*(A^{(m)} - \mu_{m+2}I)\widehat Q^{(m+1)}(\widehat Q^{(m+1)})^*(A^{(m)} - \mu_{m+1}I)$$
$$= \widehat Q^{(m+1)}\bigl((\widehat Q^{(m+1)})^* A^{(m)}\widehat Q^{(m+1)} - \mu_{m+2}I\bigr)(\widehat Q^{(m+1)})^*(A^{(m)} - \mu_{m+1}I)$$
$$= \widehat Q^{(m+1)}(A^{(m+1)} - \mu_{m+2}I)R^{(m+1)} = \widehat Q^{(m+1)}\widehat Q^{(m+2)}R^{(m+2)}R^{(m+1)} = \widehat Q_{\mathrm{dbl}}^{(m+2)}R_{\mathrm{dbl}}^{(m+2)}.$$
Since $R^{(m+2)}$ and $R^{(m+1)}$ are upper triangular, their product $R_{\mathrm{dbl}}^{(m+2)}$ shares this property (cf. Lemma 4.29), and the matrices $\widehat Q_{\mathrm{dbl}}^{(m+2)}$ and $R_{\mathrm{dbl}}^{(m+2)}$ define a QR decomposition
$$\widehat Q_{\mathrm{dbl}}^{(m+2)}R_{\mathrm{dbl}}^{(m+2)} = (A^{(m)} - \mu_{m+2}I)(A^{(m)} - \mu_{m+1}I) \qquad (5.18)$$
of the product matrix. As long as $\mu_{m+1}$ and $\mu_{m+2}$ are not eigenvalues, the right-hand side matrix is invertible and this equation can be used to construct a matrix that differs from $\widehat Q_{\mathrm{dbl}}^{(m+2)}$ only by the signs of its columns. If one of the shift parameters happens to be an eigenvalue, a subdiagonal entry of $A^{(m+2)}$ vanishes and we can use deflation.
We can construct $\widehat Q_{\mathrm{dbl}}^{(m+2)}$ using Givens rotations or Householder reflections: we consider again the case of 5 x 5 matrices. Since $A^{(m)} - \mu_{m+1}I$ and $A^{(m)} - \mu_{m+2}I$ are Hessenberg matrices, their product has the form
0 1
B C
B C
.0/
B WD .A .m/
mC2 I /.A .m/
mC1 / D B B C :
C
@ A
We choose a Householder reflection M1 that changes the first three rows to eliminate
.0/ .0/
the entries b21 and b31 :
0 1
˝ ˝ ˝ ˝ ˝
B0 ˝ ˝ ˝ ˝C
B C
B .1/ WD M1 B .0/ DB
B0 ˝ ˝ ˝ ˝CC:
@ A
.1/
A second Householder reflection M2 works on the rows two to four to eliminate b32
.1/
and b42 :
0 1
B ˝ ˝ ˝ ˝C
B C
.2/
B WD M2 B D B .1/ B 0 ˝ ˝ ˝C C:
@ 0 ˝ ˝ ˝A
In the same way, we can construct reflections M3 and M4 acting on the rows three to
five and four to five, respectively, that complete the transformation to upper triangular
form:
0 1
B C
B C
B .3/ WD M3 B .2/ B
DB ˝ ˝ ˝CC;
@ 0 ˝ ˝A
0 ˝ ˝
0 1
B C
B C
B .4/ WD M4 B .3/ B
DB CC:
@ ˝ ˝A
0 ˝
We let Q b
.mC2/
WD M1 : : : Mn1
and Rdbl WD B .n1/ and have found a QR factor-
dbl
ization.
In order to construct an implicit method, we can once again use Corollary 5.14:
we already know that A.mC2/ D .Q b .mC2/ / A.m/ Qb .mC2/ and A.m/ are Hessenberg
dbl dbl
matrices. If we can find a unitary matrix Q e .mC2/ such that eA.mC2/ WD .Q e .mC2/ /
dbl dbl
A.m/ Qe .mC2/ is a Hessenberg matrix and the first columns of Q e .mC2/ and Q b .mC2/
dbl dbl dbl
are identical, we can replace A.mC2/ by e A.mC2/ without harming the convergence of
the QR iteration.
We again employ bulge chasing to construct the matrix Q e .mC2/ : we aim for
dbl
e .mC2/ D M M
Q f : : : M
f , where the unitary transformations M f2 ; : : : ; M
fn1 do
dbl 1 2 n1
not change the first row and ensure that Ae.mC2/ is a Hessenberg matrix. We start with
the Hessenberg matrix
0 1
B C
B C
H WD A .m/
DBB C C
@ A
and apply the first Householder reflection M1 . Since it works on the first three rows,
we end up with a larger bulge:
0 1
˝ ˝ ˝ ˝ ˝
B˝ ˝ ˝ ˝ ˝C
B C
M1 H D B B˝ ˝ ˝ ˝ ˝C ;
C
@ A
0 1
˝ ˝ ˝
B˝ ˝ ˝ C
B C
H .1/ WD M1 HM1 D B
B ˝ ˝ C
C:
@ ˝ A
.1/
We can eliminate the offending entries h31 and h.1/
41 by applying a Householder re-
f2 to the rows two to four and find
flection M
0 1
B˝ ˝ ˝ ˝ ˝C
B C
f2 H .1/ D B 0 ˝ ˝ ˝ ˝C
M B C;
@0 ˝ ˝ ˝ ˝A
0 1
˝ ˝ ˝
B ˝ ˝ ˝ C
B C
H .2/ WD M f2 D B
f2 H .1/ M ˝ ˝ ˝ C
B C:
@ ˝ ˝ A
˝
As in the case of the simple implicit QR iteration, we have managed to chase the
bulge one step towards the right lower edge of the matrix. We apply a Householder
f3 to the rows three to five to eliminate h.2/ and h.2/ :
reflection M 42 52
0 1
B C
B C
f3 H .2/ D B
M ˝ ˝ ˝ ˝C
B C;
@ 0 ˝ ˝ ˝A
0 ˝ ˝ ˝
0 1
˝ ˝ ˝
B ˝ ˝ ˝C
B C
H .3/ WD M f3 D B
f3 H .2/ M ˝ ˝ ˝C
B C:
@ ˝ ˝ ˝A
˝ ˝
Part of the bulge has already disappeared, now we only have to apply one last
Householder reflection Mf4 to the fourth and fifth row to eliminate h.3/ and get
53
0 1
B C
B C
f4 H .3/ D B C
M B C;
@ ˝ ˝ ˝A
0 ˝ ˝
0 1
˝ ˝
B ˝ ˝C
B C
f4 D B
f4 H .3/ M
H .4/ WD M ˝ ˝C
B C:
@ ˝ ˝A
˝ ˝
We have constructed
e fn1 : : : M
A.mC2/ WD H .n1/ D M f2 : : : M
f2 M1 A.m/ M1 M fn1 ;
Example 5.17 (Francis double shift). A double shift strategy like the one described
here is particularly useful when dealing with real-valued matrices: although all matrix
entries are real, the eigenvalues may be complex. In this case, they have to appear in
conjugate pairs, so it makes sense to perform a double shift using both numbers, i.e.,
to choose mC2 D N mC1 . This approach is particularly elegant since the matrix
Remark 5.18 (Multiple-shift strategies). Obviously, we do not have to stop with double shifts; we can also apply r shifts $\mu_1,\dots,\mu_r$ simultaneously by replacing the equation (5.18) by
$$\widehat Q_{\mathrm{multi}}^{(m+r)}R_{\mathrm{multi}}^{(m+r)} = (A - \mu_r I)\cdots(A - \mu_1 I).$$
The first Householder reflection $M_1$ now has to eliminate r entries in the first column, therefore the implicit method has to chase a bulge with r rows out of the matrix. Sophisticated multiple-shift strategies [4] can be used to apply a large number of shifts simultaneously.
Bisection methods
Summary
If we are mainly interested in the eigenvalues of a matrix, we can look for roots of
its characteristic polynomial (cf. Proposition 2.13). In general, evaluating the char-
acteristic polynomials takes too long, but if the matrix is self-adjoint, Householder
transformations can be used to make it tridiagonal (cf. Figure 5.4), and the character-
istic polynomial of tridiagonal matrices can be evaluated efficiently. In this setting, a
bisection algorithm can be used to compute the eigenvalues. Similar to all bisection
methods, it is guaranteed to always converge at a rate of 1=2. In the case of character-
istic polynomials of self-adjoint tridiagonal matrices, it is even possible to refine the
bisection method to choose which eigenvalues are computed.
Learning targets
Introduce the bisection algorithm for eigenvalue problems.
Modify the algorithm to compute specific eigenvalues.
Use Gershgorin circles to obtain a reliable initial guess for the bisection algo-
rithm.
Using the algorithm given in Figure 5.4, we can turn any matrix $A \in \mathbb{F}^{n\times n}$ into a Hessenberg matrix $H \in \mathbb{F}^{n\times n}$ by using unitary similarity transformations, i.e., we have
$$Q^* A Q = H$$
with a unitary matrix $Q \in \mathbb{F}^{n\times n}$. If A is self-adjoint, i.e., if we have $A = A^*$, we find
$$H^* = (Q^* A Q)^* = Q^* A^* Q = Q^* A Q = H,$$
i.e., H is also self-adjoint and therefore tridiagonal; we denote this tridiagonal matrix by T in the following.
We are looking for eigenvalues, and due to Proposition 2.13, these coincide with the zeros of the characteristic polynomial $p_T$ given by
$$p_T(t) = \det(tI - T) = \det\begin{pmatrix} t-a_1 & -\bar b_1 & & \\ -b_1 & \ddots & \ddots & \\ & \ddots & \ddots & -\bar b_{n-1} \\ & & -b_{n-1} & t-a_n\end{pmatrix} \qquad\text{for all } t \in \mathbb{F}.$$
In order to derive an efficient algorithm for evaluating $p_T$, we consider the characteristic polynomials
$$p_m(t) := \det\begin{pmatrix} t-a_1 & -\bar b_1 & & \\ -b_1 & \ddots & \ddots & \\ & \ddots & \ddots & -\bar b_{m-1} \\ & & -b_{m-1} & t-a_m\end{pmatrix} \qquad\text{for all } m \in \{1,\dots,n\},\ t \in \mathbb{F}$$
corresponding to the m-th principal submatrices of T. We are looking for a recurrence relation that allows us to compute $p_n = p_T$.
Computing $p_1$ and $p_2$ is straightforward, therefore we only have to consider the computation of $p_m$ for $m \in \{3,\dots,n\}$. We can evaluate the characteristic polynomial $p_m$ for a given $t \in \mathbb{F}$ by Laplace expansion in the last column and row: expanding first along the last column and then along the last row of the remaining minor leaves only the characteristic polynomials of the principal submatrices of dimensions m-1 and m-2, and we arrive at the recurrence
$$p_m(t) = (t - a_m)\,p_{m-1}(t) - |b_{m-1}|^2\,p_{m-2}(t).$$
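A small Python routine (illustrative; it assumes arrays `a` of diagonal and `b` of subdiagonal entries) that evaluates this recurrence and counts the sign changes used below; exact zeros of intermediate polynomials are not treated specially in this sketch.

import numpy as np

def sturm_sign_changes(a, b, t):
    """Evaluate p_0, ..., p_n at t via the three-term recurrence and return the
    number s(t) of sign changes; n - s(t) equals the number of eigenvalues <= t."""
    p_prev, p_curr = 1.0, t - a[0]            # p_0 = 1, p_1 = t - a_1
    changes = 1 if p_prev * p_curr < 0 else 0
    for m in range(1, len(a)):
        p_next = (t - a[m]) * p_curr - abs(b[m - 1]) ** 2 * p_prev
        if p_curr * p_next < 0:
            changes += 1
        p_prev, p_curr = p_curr, p_next
    return changes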
Figure 6.1. Sign pattern for $p_T$ and its derivatives. The boxes represent the interval [a, b]; subintervals with negative values of the polynomials are marked in blue.
roots of $p_T^{(m-1)}$. We can take advantage of this property to determine how many roots of $p_T$ are located within a given interval [a, b]: since all roots of $p_T^{(m)}$ are simple, the polynomial has to change its sign at each root. Since the roots of $p_T^{(m)}$ are interleaved with the roots of $p_T^{(m+1)}$, the signs change in a pattern like the one given in Figure 6.1.
We introduce a function s giving us the number of sign changes between $p_T$ and its derivatives for any point $\lambda \in \mathbb{R}$,
$$s(\lambda) := \#\{m \in \{0,\dots,n-1\} : p_T^{(m)}(\lambda)\,p_T^{(m+1)}(\lambda) < 0\},$$
and see that it decreases by one each time $\lambda$ passes from left to right across a root of $p_T$ and remains otherwise constant: it can be used to count the number of roots.
Computing all derivatives of $p_T$ is possible, but not very elegant. Fortunately, we can replace $p_T^{(m)}$ by the characteristic polynomial $q_m := p_{n-m}$ of the (n-m)-th principal submatrix without losing the important sign-change properties. Since $p_1, p_2,\dots,p_n$ are computed anyway during our algorithm for evaluating $p_n = p_T$, checking the signs requires almost no additional effort and we obtain a very efficient algorithm.
Then we have
Proof. We are interested in the zeros of the polynomials q0 ; : : : ; qn , i.e., in the set
The function s changes it value only if the signs of the functions qm change, i.e., in
points 2 R with Z./ ¤ ;.
Let 2 R with Z./ ¤ ;. Since the set of zeros of polynomials cannot have limit
points, there is an 2 R>0 such that Z./ D ; and q00 ./ ¤ 0 for all 2 .; C .
Let m 2 Z./.
Case 1: Assume m > 0. Condition 4 yields m ¤ n and therefore m 2 ¹1; : : : ; n 1º.
Condition 3 implies that qm1 ./ and qmC1 ./ are not zero and have opposite signs,
and due to the choice of , both qm1 and qmC1 cannot change their signs in Œ; C .
If m 2 S. C /, we have qm . C /qmC1 . C / < 0 and therefore
qm1 . C /qm . C / > 0, i.e., m 1 62 S. C /.
On the other hand if m 62 S. C /, we have qm . C /qmC1 . C / > 0 and
therefore qm1 . C /qm . C / < 0, i.e., m 1 2 S. C /.
cm WD ¹m 1; mº and b
We let M S m ./ WD S./ \ ¹m 1; mº for all 2 Œ; C
and summarize our findings as
#b
S m ./ D 1 for all 2 .; C :
Since qm1 ./ is not zero, we also have b S m ./ D ¹mº and therefore #b
S m ./ D 1
for all 2 Œ; C .
Case 2: Assume now m D 0. Due to condition 2, we have q00 ./q1 ./ > 0, and due
to our choice of , q00 cannot change its sign in .; C .
By the fundamental theorem of calculus, we obtain
Z C
q0 . C /q1 ./ D q00 .t /q1 ./dt > 0;
q0 . C /q1 . C / > 0;
c0 WD ¹0º and
i.e., 0 62 S. C /. By definition, we also have 0 2 S./. We let M
S 0 ./ WD S./\¹0º and summarize our result as #S 0 ./ D 1 and #b
b b S 0 .C / D 0.
Conclusion: Due to our choice of , the set
[
R WD S./ n cm
M
m2Z./
Proof. We are looking for eigenvectors and eigenvalues of the matrix T . Given a
t 2 R, we look for a non-zero vector e.t / 2 F n such that the first n 1 components
of .tI T /e.t / vanish. If the last component also vanishes, e.t / is an eigenvector for
the eigenvalue t .
In order to ensure e.t / ¤ 0, we choose e1 .t / D 1. By considering the first row of
.tI T /e.t /, we obtain
0 D .t a1 /e1 .t / bN1 e2 .t /;
bN1 e2 .t / D .t a1 /e1 .t / D p1 .t /:
The right-hand side resembles (6.2c), and we can take advantage of this resemblance:
we define p0 WD 1 and let
pm1 .t /
em .t / WD Qm1 for all m 2 ¹1; : : : ; nº; t 2 R:
bNk
kD1
using (6.2c) in the last step. (6.3) implies that the vector e.t / meets our requirements:
the first n 1 components of .tI T /e.t / are equal to zero. Now we consider the
last component given by
pn2 .t / pn1 .t /
.t / WD bn1 en1 .t / C .t an /en .t / D bn1 Qn2 C .t an / Qn1
Nbk N
kD1 kD1 bk
jbn1 j2 pn2 .t / C .t an /pn1 .t / pn .t /
D Qn1 N D Qn1 for all t 2 R:
kD1 bk kD1bNk
Now we can verify the conditions of Definition 6.1. We begin with condition (2). Let
2 R be a root of q0 D pn , i.e.,
.
/ D 0. Taking the inner product of (6.4) by the
vector e.
/ yields
eNn .
/
0 .
/ D he.
/ C .
I T /e 0 .
/; e.
/i D ke.
/k2 C he 0 .
/; .
I T /e.
/i
D ke.
/k2 > 0;
and we conclude
pn1 .
/ pn0 .
/ pn1 .
/pn0 .
/ q1 .
/q 0 .
/
0 < eNn .
/
0 .
/ D Qn1 Qn1 N D Qn1 D Qn1 0 ;
kD1 jbk j kD1 jbk j
2 2
kD1 bk kD1 bk
so the condition (2) holds. It implies q00 .
/ ¤ 0, therefore all zeros of q0 are simple
and (1) holds as well. qn D 1 yields condition (4). This leaves only condition (3) to
consider. Let m 2 ¹1; : : : ; n 1º and let 2 R with 0 D qm ./ D pnm ./. The
recurrence relation (6.2c) yields
qm1 ./ D pnmC1 ./ D . anmC1 /pnm ./ jbnm j2 pnm1 ./
D jbnm j2 qmC1 ./;
Figure 6.2. Compute an interval containing the k-th eigenvalue by a bisection method applied to Sturm chains.
We can use Theorem 6.2 to compute the k-th eigenvalue of the matrix T: we have
$$\lim_{t\to-\infty} p_m(t) = \begin{cases}+\infty & \text{if } m \text{ is even},\\ -\infty & \text{otherwise}\end{cases} \qquad\text{for all } m \in \{1,\dots,n\},$$
since the leading coefficients of the polynomials are always equal to one, and this implies
$$\lim_{t\to-\infty} s(t) = n.$$
Theorem 6.2 states that s decreases by one each time a zero of $p_T$ is passed, so for any $t \in \mathbb{R}$, $n - s(t)$ gives us the number of zeros less than or equal to t. If we are looking for the k-th eigenvalue, we simply check whether $k \le n - s(t)$ holds: in this case we continue to search in $\mathbb{R}_{\le t}$, otherwise we consider $\mathbb{R}_{>t}$. The resulting algorithm is given in Figure 6.2.
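A bisection sketch in Python building on the `sturm_sign_changes` routine from the earlier sketch (a hypothetical helper, not the book's Figure 6.2); the initial interval [alpha, beta] is assumed to contain all eigenvalues, e.g., obtained from Gershgorin's theorem below.

def kth_eigenvalue_bisection(a, b, k, alpha, beta, tol=1e-12):
    """Approximate the k-th smallest eigenvalue of the self-adjoint tridiagonal
    matrix with diagonal a and subdiagonal b by Sturm-chain bisection."""
    while beta - alpha > tol * max(1.0, abs(alpha) + abs(beta)):
        t = (alpha + beta) / 2.0
        # n - s(t) = number of eigenvalues <= t
        if len(a) - sturm_sign_changes(a, b, t) >= k:
            beta = t        # the k-th eigenvalue lies in (alpha, t]
        else:
            alpha = t       # the k-th eigenvalue lies in (t, beta]
    return (alpha + beta) / 2.0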
The bisection method, particularly the evaluation of the function s, can react very sensitively to rounding errors, since even small perturbations of $p_m(t)$ and $p_{m-1}(t)$ may change the sign and therefore the result.
Remark 6.5 (Slicing the spectrum). The key ingredient of the method is the computation of the number of eigenvalues smaller and larger than a given value t. Instead of examining a Sturm sequence, we can also solve this problem by computing a factorization $tI - T = LDL^*$ of the shifted matrix and examining the number of negative and positive diagonal elements of the matrix D [33, page 15].
Combining this property with the definition of the norm yields the estimate
$$\|A B\|_\infty \le \|A\|_\infty \|B\|_\infty \qquad \text{for all } A, B \in \mathbb{F}^{n \times n} \tag{6.6}$$
that leads to the following simple criterion for the invertibility of perturbed matrices.
Let $i \in \{1, \dots, n\}$ be the index for which the maximum in (6.7) is attained. We define a vector $x \in \mathbb{F}^n$ by
$$x_j := \operatorname{sgn}(\bar a_{ij}) \qquad \text{for all } j \in \{1, \dots, n\}$$
and find $\|x\|_\infty = 1$ and
$$\|A x\|_\infty \ge \Big| \sum_{j=1}^n a_{ij} x_j \Big| = \Big| \sum_{j=1}^n a_{ij} \operatorname{sgn}(\bar a_{ij}) \Big| = \sum_{j=1}^n |a_{ij}|.$$
Theorem 6.10 (Gershgorin spheres). Let $A \in \mathbb{F}^{n \times n}$. We define discs (or intervals)
$$G_i := \Big\{ \lambda \in \mathbb{F} \;:\; |\lambda - a_{ii}| \le \sum_{\substack{j=1 \\ j \neq i}}^n |a_{ij}| \Big\} \qquad \text{for all } i \in \{1, \dots, n\}.$$
Then we have
$$\sigma(A) \subseteq \bigcup_{i=1}^n G_i,$$
i.e., the spectrum of $A$ is contained in discs (or intervals) centered at the diagonal elements.
Proof. (cf. [16, Satz II]) Let $\lambda \in \sigma(A)$, and let
$$D := \begin{pmatrix} a_{11} & & \\ & \ddots & \\ & & a_{nn} \end{pmatrix}$$
be the diagonal part of $A$. If $\lambda I - D$ is not invertible, there has to be an index $i \in \{1, \dots, n\}$ such that $\lambda = a_{ii}$, therefore we have $\lambda \in G_i$.
Assume now that $\lambda I - D$ is invertible. Since $\lambda$ is an eigenvalue, $\lambda I - A$ cannot be invertible, and the same holds for
$$(\lambda I - D)^{-1} (\lambda I - A) = (\lambda I - D)^{-1} \big( \lambda I - D - (A - D) \big) = I - (\lambda I - D)^{-1} (A - D).$$
Due to Propositions 6.8 and 6.9, this implies
$$1 \le \|(\lambda I - D)^{-1} (A - D)\|_\infty = \max\Big\{ \sum_{j=1}^n \frac{|a_{ij} - d_{ij}|}{|\lambda - a_{ii}|} \;:\; i \in \{1, \dots, n\} \Big\} = \max\Big\{ \sum_{\substack{j=1 \\ j \neq i}}^n \frac{|a_{ij}|}{|\lambda - a_{ii}|} \;:\; i \in \{1, \dots, n\} \Big\}.$$
We choose an index $i \in \{1, \dots, n\}$ for which the maximum is attained and conclude
$$1 \le \frac{1}{|\lambda - a_{ii}|} \sum_{\substack{j=1 \\ j \neq i}}^n |a_{ij}|, \qquad |\lambda - a_{ii}| \le \sum_{\substack{j=1 \\ j \neq i}}^n |a_{ij}|,$$
i.e., $\lambda \in G_i$.
In the context of Sturm chain bisection methods, we can use this result to compute a starting interval $[\alpha, \beta]$ containing all eigenvalues, given by
$$\alpha := \min_{i} \Big( a_{ii} - \sum_{j \neq i} |a_{ij}| \Big), \qquad \beta := \max_{i} \Big( a_{ii} + \sum_{j \neq i} |a_{ij}| \Big).$$
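The following NumPy sketch computes such a starting interval from the Gershgorin discs of Theorem 6.10; it assumes a self-adjoint matrix so that the interval is real.

```python
import numpy as np

def gershgorin_interval(A):
    """Return [alpha, beta] containing all eigenvalues of the self-adjoint matrix A,
    obtained from the Gershgorin discs of Theorem 6.10."""
    A = np.asarray(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # row sums without the diagonal
    centers = np.real(np.diag(A))                            # real for self-adjoint A
    return float((centers - radii).min()), float((centers + radii).max())
```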
Summary
Large sparse eigenvalue problems typically have an underlying matrix that has a very
large size, but also a special structure so that matrix-vector multiplications can ef-
ficiently be performed, but similarity transformations cannot, so different strategies
for the computation of eigenvalues and eigenvectors are needed. The most common
methods are Krylov subspace methods that can be used to compute some, but not
all, eigenvalues of the matrix. Among those, the Arnoldi iteration and the symmetric
Lanczos algorithm will be considered in detail in this chapter and also the conver-
gence behaviour of these methods will be investigated with the help of an important
tool: Chebyshev polynomials.
Learning targets
Introduce projection methods for large sparse eigenvalue problems.
Construct Krylov subspaces and investigate their basic properties.
Introduce the Arnoldi iteration and the symmetric Lanczos algorithm as exam-
ples of Krylov subspace methods.
Use Chebyshev polynomials to analyze the convergence behaviour of Krylov subspace methods.
Figure 7.1. Depiction of a sparse matrix which represents the 5-point discrete Laplacian on a
square 50-by-50 grid using a nested dissection ordering of the gridpoints. Nonzero entries of
the matrix are plotted in blue. “nz” is the total number of nonzero entries.
from (1.2). We observe that regardless of how large $n$ is, there are at most three entries in each row that are distinct from zero. Another, more complicated example is depicted in Figure 7.1. The matrix is of size $2304 \times 2304$, but only 11,328 of its 5,308,416 entries are nonzero.
Sparsity is a special matrix structure having two important properties that allow us
to deal with matrices of very large size n. First, it is not necessary to store the matrix
in the conventional way which would require to store all n2 entries. Instead, we only
need to store the nonzero entries and information on their positions within the matrix.
For example, instead of storing the matrix
$$A = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 4 & 5 \end{pmatrix}$$
instead, where l is the list of nonzero entries of A, and r and c are vectors containing
the corresponding row and column indices of the entries, respectively. The matrix
$A = (a_{ij})$ can be easily reconstructed from these three vectors by setting
$$a_{ij} = \begin{cases} l_k & \text{if } r_k = i \text{ and } c_k = j, \\ 0 & \text{else.} \end{cases} \tag{7.2}$$
Exercise 7.1 (Storing sparse matrices). Construct vectors l, r, c, so that the ma-
trix generated by (7.2) corresponds to the matrix A in (7.1) for n D 6.
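A minimal sketch of this coordinate storage scheme in Python, using 0-based indices instead of the 1-based indices of the text, and the 4-by-4 example matrix shown above:

```python
import numpy as np

def coo_to_dense(l, r, c, shape):
    """Reconstruct a matrix from coordinate storage as in (7.2):
    a_ij = l_k whenever (r_k, c_k) = (i, j), and a_ij = 0 otherwise."""
    A = np.zeros(shape)
    for value, i, j in zip(l, r, c):
        A[i, j] = value
    return A

# The 4x4 example matrix is stored by its five nonzero entries only.
l = [1.0, 2.0, 3.0, 4.0, 5.0]     # nonzero values
r = [0, 0, 1, 3, 3]               # row indices (0-based)
c = [0, 1, 2, 2, 3]               # column indices (0-based)
A = coo_to_dense(l, r, c, (4, 4))
```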
x ! A ! Ax (7.3)
For example, for our matrix A from (7.1) such a procedure that computes Ax from a
given vector x is presented in Figure 7.2. In this situation, we cannot apply similarity transformations anymore, because $A$ is not given explicitly. The same applies if we do store the matrix: similarity transformations may (and in general do) destroy its sparsity, so we would have to store the matrix in the conventional way, which may be impossible because of its large size. Hence the efficient QR iteration from Chapter 5 is no longer a method of choice, since it heavily manipulates the given matrix, and we have to come up with different strategies.
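For a matrix-vector multiplication in the spirit of Figure 7.2, no explicit matrix is needed at all. The following sketch treats a self-adjoint tridiagonal matrix given by its diagonal a and off-diagonal b (one possible convention; the procedure in Figure 7.2 may be organized differently) and needs only O(n) operations and storage.

```python
import numpy as np

def tridiag_matvec(a, b, x):
    """Compute y = A x for the self-adjoint tridiagonal matrix A with diagonal a
    and off-diagonal b, without ever forming the n-by-n matrix."""
    a, b, x = np.asarray(a), np.asarray(b), np.asarray(x)
    y = np.zeros(len(a), dtype=np.result_type(a, b, x))
    y += a * x
    y[:-1] += b * x[1:]               # superdiagonal contribution
    y[1:] += np.conj(b) * x[:-1]      # subdiagonal contribution
    return y
```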
Definition 7.2 (Ritz pair). Let $A \in \mathbb{F}^{n \times n}$ and let $X \subseteq \mathbb{F}^n$ be a subspace. Then a pair $(\mu, x) \in \mathbb{F} \times X$ is called a Ritz pair of $A$ with respect to $X$ if it satisfies the Ritz–Galerkin condition
$$A x - \mu x \perp X.$$
If $(\mu, x)$ is a Ritz pair, then $\mu$ is called a Ritz value and $x$ is called the associated Ritz vector.
We will turn to questions 1 and 3 in the next section. Question 2, however, has a fairly immediate answer if we are given an orthonormal basis of $X$, i.e., a basis of pairwise perpendicular vectors that all have norm one.
Theorem 7.3 (Detection of Ritz pairs). Let $A \in \mathbb{F}^{n \times n}$ and let $X \subseteq \mathbb{F}^n$ be a subspace. Furthermore let $Q \in \mathbb{F}^{n \times m}$ be an isometric matrix such that $X = \mathcal{R}(Q)$, i.e., the columns of $Q$ form an orthonormal basis of $X$. Then the following two conditions are equivalent for $x = Q y \in X$, where $y \in \mathbb{F}^m$:
1. $(\mu, Q y) \in \mathbb{F} \times X$ is a Ritz pair of $A$ with respect to $X$;
2. $(\mu, y) \in \mathbb{F} \times \mathbb{F}^m$ is an eigenpair of the $m \times m$ matrix $Q^* A Q$.
Proof. $(\mu, y)$ is an eigenpair of $Q^* A Q$ if and only if
$$Q^* A Q y = \mu y = \mu Q^* Q y,$$
which itself is equivalent to
$$Q^* (A Q y - \mu Q y) = 0.$$
The latter identity means that $A Q y - \mu Q y$ is orthogonal to $\mathcal{R}(Q) = X$, i.e., $(\mu, Q y)$ is a Ritz pair of $A$ with respect to $X$.
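In matrix terms, Theorem 7.3 translates directly into a small dense eigenvalue problem. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def ritz_pairs(A, Q):
    """Given an isometric Q whose columns span X, return the Ritz values of A with
    respect to X and the corresponding Ritz vectors, via the eigenpairs of Q* A Q
    as in Theorem 7.3."""
    H = Q.conj().T @ (A @ Q)      # the small m-by-m matrix Q* A Q
    mu, Y = np.linalg.eig(H)      # eigenpairs (mu_i, y_i)
    return mu, Q @ Y              # Ritz vectors x_i = Q y_i
```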
The name refers to the Russian mathematician Alexei Krylov who used the se-
quence .x; Ax; A2 x; : : : / to find the coefficients of the characteristic polynomial of a
given matrix A, see [25]. Obviously, we always have dim Km .A; x/ m, but the di-
mension can be considerably smaller than m. For example, if x ¤ 0 is an eigenvector
of A, then we clearly have dim Km .A; x/ D 1 for all m 2 N.
˛0 x C ˛1 Ax C C ˛` A` x D 0:
p.A/ WD ˇ` A` C C ˇ1 A C ˇ0 In ;
where A 2 F nn . The vector space of all polynomials over F will be denoted by F Œt
and by Fm Œt we denote the subspace of all polynomials of degree not exceeding m.
Then we have the following characterization of Krylov subspaces.
Proof. Let y 2 Km .A; x/ D span.x; Ax; : : : ; Am1 x/. Then there exist coefficients
ˇ0 ; : : : ; ˇm1 2 F such that
where p.t / D ˇm1 t m1 C C ˇ1 t C ˇ0 which proves “”. On the other hand, if
we start with an arbitrary polynomial p.t / D ˇm1 t m1 C C ˇ1 t C ˇ0 , then the
identity in (7.6) implies p.A/x 2 Km .A; x/.
Let A 2 F nn , x 2 F n n ¹0º, and let ` be as in Theorem 7.5. Then from (7.4)
one finds that there exist coefficients ˇ0 ; : : : ; ˇ`1 2 F such that
Thus, in order to obtain an orthonormal basis of KkC1 .A; x/, the only task is to
orthonormalize the vector Ak x against the previously constructed vectors q1 ; : : : ; qk .
If we do this for k D 1; : : : ; m (assuming dim Kk .A; x/ D k for k D 1; : : : ; m), we
will finally obtain an orthonormal basis q1 ; : : : ; qm of Km .A; x/ that satisfies
For this purpose, the so-called Gram–Schmidt process is more appropriate than
Householder orthogonalization.
For deriving the corresponding algorithm in general, let us assume that we have m
linearly independent vectors v1 ; : : : ; vm 2 F n . Our task is to construct an orthonormal
basis q1 ; : : : ; qm of V D span.v1 ; : : : ; vm / such that
span.v1 ; : : : ; vk / D span.q1 ; : : : ; qk /; k D 1; : : : ; m:
Figure 7.3. The orthogonal projection $r_{12} q_1$ of $v_2$ onto the span of $q_1$ and the resulting vector $\tilde q_2 = v_2 - r_{12} q_1$, which is orthogonal to $q_1$.
For the second step, we look for a vector $\tilde q_2 \in \operatorname{span}(v_1, v_2) = \operatorname{span}(q_1, v_2)$ that is orthogonal to $q_1$. Note that $\tilde q_2$ must have a component in the direction $v_2$ (that is, $\tilde q_2 = \alpha_1 q_1 + \alpha_2 v_2$, where $\alpha_2 \neq 0$), because otherwise $q_1$ and $\tilde q_2$ would be linearly dependent. As we have to normalize the vector $\tilde q_2$ anyhow, we may assume without loss of generality that $\alpha_2 = 1$, i.e., $\tilde q_2$ has the form
$$\tilde q_2 := v_2 - r_{12} q_1 \tag{7.9}$$
for some $r_{12} \in \mathbb{F}$. (The minus sign in (7.9) has been introduced for notational convenience; this will become clear later.) Then the requirement that $\tilde q_2$ be orthogonal to $q_1$ implies
$$0 = \langle \tilde q_2, q_1 \rangle = \langle v_2 - r_{12} q_1, q_1 \rangle = \langle v_2, q_1 \rangle - r_{12}.$$
The identity (7.11) can be interpreted geometrically in the following way: the vector $r_{12} q_1$ is the orthogonal projection of the vector $v_2$ onto the subspace generated by $q_1$. Thus, we obtain with $v_2 - r_{12} q_1$ a vector that is orthogonal to $q_1$, see Figure 7.3.
Assuming we have already computed orthonormal vectors q1 ; : : : ; qk1 satisfying
span.v1 ; : : : ; vj / D span.q1 ; : : : ; qj /; j D 1; : : : ; k 1;
$$r_{jk} = \langle v_k, q_j \rangle, \qquad j = 1, \dots, k-1,$$
using $\langle q_i, q_j \rangle = \delta_{ji}$. Here, the vector $r_{1k} q_1 + \dots + r_{k-1,k} q_{k-1}$ can be interpreted as the orthogonal projection of $v_k$ onto the subspace spanned by $q_1, \dots, q_{k-1}$. We then obtain the vector $q_k$ as follows:
$$r_{jk} := \langle v_k, q_j \rangle, \quad j = 1, \dots, k-1, \qquad \tilde q_k := v_k - r_{1k} q_1 - \dots - r_{k-1,k} q_{k-1}, \qquad r_{kk} := \|\tilde q_k\|, \qquad q_k := \frac{1}{r_{kk}} \tilde q_k.$$
are linearly dependent. Thus, in exact arithmetic a breakdown will never occur as long
as the vectors v1 ; : : : ; vm are linearly independent.
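A direct transcription of these formulas into Python might look as follows; the function name and the breakdown test are our own additions, not the algorithm figure of the text.

```python
import numpy as np

def classical_gram_schmidt(V):
    """Orthonormalize the columns of V = [v_1, ..., v_m] by the classical
    Gram-Schmidt process, returning Q with orthonormal columns and the upper
    triangular R with V = Q R and span(v_1..v_k) = span(q_1..q_k)."""
    n, m = V.shape
    Q = np.zeros((n, m), dtype=complex)
    R = np.zeros((m, m), dtype=complex)
    for k in range(m):
        R[:k, k] = Q[:, :k].conj().T @ V[:, k]     # r_jk = <v_k, q_j>
        w = V[:, k] - Q[:, :k] @ R[:k, k]          # subtract projection onto span(q_1..q_{k-1})
        R[k, k] = np.linalg.norm(w)
        if R[k, k] == 0.0:                         # breakdown: v_1, ..., v_k linearly dependent
            raise ValueError("linearly dependent input vectors")
        Q[:, k] = w / R[k, k]
    return Q, R
```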
Remark 7.10 (Orthonormal bases for nested sets of subspaces). The application of
the Gram–Schmidt process to the linearly independent vectors v1 ; : : : ; vm 2 F n pro-
duces orthonormal vectors q1 ; : : : ; qm 2 F n with the additional property
$$\tilde q_k^{(j)} = \tilde q_k^{(j-1)} - \langle \tilde q_k^{(j-1)}, q_j \rangle\, q_j, \qquad j = 1, \dots, k-1,$$
starting with $\tilde q_k^{(0)} = v_k$. In exact arithmetic this procedure is equivalent to the Gram–Schmidt procedure, as we can show by induction that
$$\tilde q_k^{(j)} = v_k - \sum_{i=1}^{j} \langle v_k, q_i \rangle\, q_i, \qquad j = 0, \dots, k-1.$$
Indeed, for $j = 0$ there is nothing to show, and using the induction hypothesis for $j-1$, we obtain that
$$\tilde q_k^{(j)} = \tilde q_k^{(j-1)} - \langle \tilde q_k^{(j-1)}, q_j \rangle\, q_j = v_k - \sum_{i=1}^{j-1} \langle v_k, q_i \rangle\, q_i - \Big\langle v_k - \sum_{i=1}^{j-1} \langle v_k, q_i \rangle\, q_i,\; q_j \Big\rangle\, q_j = v_k - \sum_{i=1}^{j} \langle v_k, q_i \rangle\, q_i,$$
because $\langle q_i, q_j \rangle = \delta_{ij}$. Thus, after $k-1$ steps we have $\tilde q_k^{(k-1)} = \tilde q_k$, where $\tilde q_k$ is as in (7.12). The corresponding algorithm, the so-called modified Gram–Schmidt process, is given in Figure 7.5.
A heuristic explanation why the modified Gram–Schmidt process performs better than the classical Gram–Schmidt process is that in the computation of $\tilde q_k^{(j)}$ the vector $\tilde q_k^{(j-1)}$ is also orthogonalized against the errors made in the computation of $\tilde q_k^{(j-1)}$.
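In code, the difference to the classical variant is only that each projection coefficient is computed from the current, already partially orthogonalized vector. A sketch, not the literal algorithm of Figure 7.5:

```python
import numpy as np

def modified_gram_schmidt(V):
    """Modified Gram-Schmidt: mathematically equivalent to the classical process,
    but the projections are removed one at a time from the running vector, which
    behaves better in finite precision arithmetic."""
    n, m = V.shape
    Q = np.array(V, dtype=complex)
    R = np.zeros((m, m), dtype=complex)
    for k in range(m):
        for j in range(k):
            R[j, k] = Q[:, j].conj() @ Q[:, k]   # coefficient from the *current* vector
            Q[:, k] -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q, R
```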
Table 7.1. Defect from orthogonality in some variants of the Gram–Schmidt process. The results were obtained in Matlab 7.11.0 with unit roundoff $u = 2^{-53} \approx 1.1 \cdot 10^{-16}$.
so the values rij have to be updated to rij C sij in the algorithm to yield the corre-
sponding QR factorization. In a similar way, reorthogonalization can be performed
in the modified Gram–Schmidt process. The corresponding algorithm is given in Fig-
ure 7.6.
One may think that reorthogonalizing the computed vector e q k multiple times
against q1 ; : : : ; qk1 would produce an even better result in finite precision arith-
metic. However, it has been observed empirically that in general it suffices to do the
reorthogonalization once. We refer the reader to [49] and [17] for details.
The Hilbert matrices Hn from Example 7.14 are highly ill-conditioned in the
sense that the columns of Hn are nearly linearly dependent. As a consequence,
the inverse of Hn contains very large entries even for moderate values of n. Thus,
Hilbert matrices represent a challenge to numerical algorithms and are often used
as test matrices in numerical experiments.
Exercise 7.15 (Hilbert matrices). Compute the inverse of the Hilbert matrices
H2 and H3 from (7.16).
$$A^{k-1} x = \alpha_1 q_1 + \dots + \alpha_k q_k,$$
which implies
$$A^k x = \alpha_1 A q_1 + \dots + \alpha_k A q_k.$$
Note that $A q_1, \dots, A q_{k-1} \in K_k(A, x)$, because $q_1, \dots, q_{k-1} \in K_{k-1}(A, x)$. Thus, with $A^k x$ also $A q_k$ must be contained in $K_{k+1}(A, x) \setminus K_k(A, x)$, from which we obtain that
$$K_{k+1}(A, x) = \operatorname{span}(q_1, \dots, q_k, A^k x) = \operatorname{span}(q_1, \dots, q_k, A q_k).$$
Since the vectors q1 ; : : : ; qm are orthonormal, we obtain that hij D hAqj ; qi i D 0 for
i D j C 2; : : : ; m. Thus, Hm is in Hessenberg form.
Remark 7.17 (Possible breakdown of the Arnoldi iteration). Note that the Arnoldi
iteration will break down if hkC1;k D 0 for some k < m. By (7.18), we have
hkC1;k ¤ 0 whenever Aqk 2 KkC1 .A; x/ n Kk .A; x/. If dim Km .A; x/ D m then
this is guaranteed for k < m. Under this hypothesis, the Arnoldi iteration will run
without breakdown for at least m 1 steps. For the mth step, however, there are two
possibilities:
1. If $h_{m+1,m} = 0$, then
$$A q_k = \sum_{i=1}^{k+1} h_{ik}\, q_i \quad \text{for } k = 1, \dots, m-1, \qquad A q_m = \sum_{i=1}^{m} h_{im}\, q_i.$$
2. If $h_{m+1,m} \neq 0$, then we can continue to perform the $m$th step of the Arnoldi iteration to produce the next iterate $q_{m+1}$. Then we have
$$A q_k = \sum_{i=1}^{k+1} h_{ik}\, q_i \quad \text{for } k = 1, \dots, m,$$
or, equivalently,
$$A\, [\,q_1, \dots, q_m\,] = [\,q_1, \dots, q_m, q_{m+1}\,] \begin{pmatrix} h_{11} & \dots & \dots & h_{1m} \\ h_{21} & h_{22} & & \vdots \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & h_{m,m-1} & h_{mm} \\ 0 & \dots & 0 & h_{m+1,m} \end{pmatrix}. \tag{7.20}$$
Note that the upper $m \times m$ block of the $(m+1) \times m$ matrix in (7.20) is just $H_m$ and that the $(m+1)$st row is $h_{m+1,m}\, \delta_m^T$, where $\delta_m$ denotes the $m$th canonical unit vector of length $m$. Thus, we can write (7.20) in the shorter form
$$A Q_m = Q_m H_m + h_{m+1,m}\, q_{m+1}\, \delta_m^T \tag{7.21}$$
which is sometimes called an Arnoldi configuration.
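The Arnoldi iteration itself only needs matrix-vector products with A. The following sketch builds the basis and the (m+1)-by-m coefficient matrix from (7.20) using modified Gram–Schmidt orthogonalization; the names and the breakdown handling are illustrative choices. The Ritz values of A with respect to K_m(A, x) are then the eigenvalues of the upper m-by-m block H_m.

```python
import numpy as np

def arnoldi(matvec, x, m):
    """Arnoldi iteration (sketch): orthonormal basis q_1, ..., q_m of K_m(A, x) and
    the (m+1)-by-m Hessenberg coefficient matrix from (7.20), using only products A q."""
    n = len(x)
    Q = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    Q[:, 0] = np.asarray(x) / np.linalg.norm(x)
    for k in range(m):
        w = np.asarray(matvec(Q[:, k]), dtype=complex)
        for i in range(k + 1):                 # orthogonalize against q_1, ..., q_{k+1}
            H[i, k] = Q[:, i].conj() @ w
            w -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0.0:                 # breakdown: K_{k+1}(A, x) is invariant
            return Q[:, :k + 1], H[:k + 2, :k + 1]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H
```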
Exercise 7.18 (Arnoldi configuration). Let $A \in \mathbb{F}^{n \times n}$, let $q_1, \dots, q_m, q_{m+1} \in \mathbb{F}^n$ be linearly independent (but not necessarily orthonormal) such that (7.20) holds with $h_{k+1,k} \neq 0$ for $k = 1, \dots, m + 1$. Show that
span.q1 ; : : : ; qk / D Kk .A; q1 /
for k D 1; : : : ; m C 1.
Exercise 7.19 (Krylov subspaces and Hessenberg matrices). Let A 2 F nn and
let Q D Œq1 ; : : : ; qn 2 F nn be an invertible matrix such that H D Q1 AQ is
in Hessenberg form. Show that
span.q1 ; : : : ; qk / D Kk .A; q1 /
Once we have successfully computed Ritz pairs, the question arises whether a given
Ritz pair .; v/ 2 F Km .A; x/ is a good approximation to an eigenpair. We answer
this question by slightly extending the notion of residual from Definition 4.7.
Definition 7.21 (Residual). Let $A \in \mathbb{F}^{n \times n}$ and let $(\lambda, v) \in \mathbb{F} \times \mathbb{F}^n \setminus \{0\}$. Then the vector
$$r := \lambda v - A v$$
is called the residual of the pair $(\lambda, v)$ with respect to $A$.
Theorem 7.22 (Backward error from the residual). Let $A \in \mathbb{F}^{n \times n}$ and $(\lambda, v) \in \mathbb{F} \times \mathbb{F}^n$ be such that $\|v\| = 1$. Then there exists a matrix $E \in \mathbb{F}^{n \times n}$ such that
$$(A + E) v = A v + (\lambda v - A v) \underbrace{v^* v}_{=1} = A v + \lambda v - A v = \lambda v.$$
Theorem 7.23 (Residual of Ritz pairs). Let $A \in \mathbb{F}^{n \times n}$ be a matrix and $Q_m \in \mathbb{F}^{n \times m}$, $H_m \in \mathbb{F}^{m \times m}$, $q_{m+1} \in \mathbb{F}^n$, and $h_{m+1,m} \in \mathbb{F}$ be as in (7.21). If $(\mu, y) \in \mathbb{F} \times (\mathbb{F}^m \setminus \{0\})$ is an eigenpair of $H_m$, then $(\mu, v)$ with $v = Q_m y$ is a Ritz pair of $A$ with respect to $K_m(A, x)$ satisfying
$$\|\mu v - A v\| = |h_{m+1,m}|\, |y_m|,$$
where $y_m$ denotes the last entry of the vector $y$.
$$\mu v - A v = \mu Q_m y - A Q_m y \overset{(7.21)}{=} \mu Q_m y - (Q_m H_m + h_{m+1,m}\, q_{m+1}\, \delta_m^T)\, y = Q_m \underbrace{(\mu y - H_m y)}_{=0} - h_{m+1,m}\, q_{m+1}\, y_m.$$
Thus, the residual of the Ritz pair $(\mu, v)$ can be immediately computed from the two scalars $h_{m+1,m}$ and $y_m$ without explicitly forming the Ritz vector $v$ and without computing the matrix-vector product $A v$.
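Continuing the hypothetical arnoldi() sketch above, the residual norms of all Ritz pairs follow from the coefficient matrix alone:

```python
import numpy as np

def ritz_residuals(H):
    """Given the (m+1)-by-m Arnoldi coefficient matrix H, return the Ritz values of
    H_m and the residual norms |h_{m+1,m}| |y_m| from Theorem 7.23 (the eigenvectors
    returned by numpy already have unit norm)."""
    m = H.shape[1]
    mu, Y = np.linalg.eig(H[:m, :])
    return mu, np.abs(H[m, m - 1]) * np.abs(Y[m - 1, :])
```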
where $\alpha_k := h_{kk} \in \mathbb{R}$, because self-adjoint matrices have real diagonal elements, and $\beta_{k-1} := h_{k,k-1} = \|\tilde q_k\| \in \mathbb{R}$ by (7.19), which implies $h_{k-1,k} = \overline{h_{k,k-1}} = \beta_{k-1}$. Thus, even in the case that we started with a complex self-adjoint matrix $A$, the matrix $H_m = Q_m^* A Q_m$ will be tridiagonal and real. With this notation and starting with an arbitrary vector $x \in \mathbb{F}^n \setminus \{0\}$, the equations (7.18) and (7.19) simplify to
If only eigenvalues are needed, then one may use the Lanczos iteration without reorthogonalization and refrain from storing the vectors $q_1, \dots, q_m$ except for those from the two previous iterations. As a consequence, the number of steps $m$ can be taken much higher than with reorthogonalization, but on the other hand a loss of orthogonality of the vectors $q_1, \dots, q_m$ will occur. This will typically lead to the occurrence of ghost eigenvalues. These are multiple “copies” of some eigenvalues of $H_m$, typically of those that approximate the smallest and largest eigenvalues of $A$ well. Thus, the Ritz values give the wrong impression that the multiplicity of the corresponding eigenvalue of $A$ is higher than it actually is. We refer the reader to the monograph [45] for an illustrative example and a heuristic explanation of this observation.
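For self-adjoint A the Arnoldi recurrence collapses to the three-term Lanczos recurrence with real coefficients. A compact sketch without reorthogonalization (and therefore subject to the loss of orthogonality and the ghost eigenvalues described above):

```python
import numpy as np

def lanczos(matvec, x, m):
    """Symmetric Lanczos iteration (sketch, no reorthogonalization): returns the real
    tridiagonal matrix T_m with diagonal alpha_k and off-diagonal beta_k whose
    eigenvalues approximate eigenvalues of the self-adjoint operator behind matvec."""
    q = np.asarray(x, dtype=complex)
    q = q / np.linalg.norm(q)
    q_prev = np.zeros_like(q)
    alpha, beta = [], []
    for k in range(m):
        w = np.asarray(matvec(q), dtype=complex)
        a = float(np.real(q.conj() @ w))          # alpha_k is real for A = A*
        w = w - a * q - (beta[-1] * q_prev if beta else 0.0)
        alpha.append(a)
        if k == m - 1:
            break
        b = np.linalg.norm(w)                     # beta_k
        if b == 0.0:                              # invariant subspace found
            break
        beta.append(b)
        q_prev, q = q, w / b
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
```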
Example 7.24. Let $A = \operatorname{diag}(3, 2.99, 2.98, \dots, 2.02, 2.01, 2, 1) \in \mathbb{R}^{102 \times 102}$, i.e., the diagonal matrix with eigenvalues $\lambda_k = 3 - (k-1)/100$ for $k = 1, \dots, 101$ and $\lambda_{102} = 1$. Applying the Arnoldi iteration (or Lanczos iteration) with the start vector $x \in \mathbb{R}^{102}$ with all entries equal to one, we obtain after 10 steps the $10 \times 10$ matrix $H_{10}$ having the eigenvalues $\mu_i$, $i = 1, \dots, 10$, displayed in the following table rounded to five significant digits:

  i     mu_i      min_j |lambda_j - mu_i|
  1     2.9840    6.2933 * 10^-4
  2     2.9262    3.7955 * 10^-3
  3     2.8192    7.5917 * 10^-4
  4     2.6786    1.4213 * 10^-3
  5     2.5180    1.9794 * 10^-3
  6     2.3539    3.8811 * 10^-3
  7     2.2038    3.7657 * 10^-3
  8     2.0851    4.8982 * 10^-3
  9     2.0130    2.9747 * 10^-3
 10     1.0000    3.8769 * 10^-12
Apparently, the eigenvalue $\lambda_{102} = 1$ of $A$ in Example 7.24 is well approximated by a Ritz value of $A$ with respect to $K_{10}(A, x)$. At first, this may be a surprise, because $K_{10}(A, x)$ is spanned by $x, Ax, \dots, A^9 x$ and so by the results of Chapter 4 one may expect that we find a good approximation to eigenvectors associated with the dominant eigenvalue $\lambda_1 = 3$ in $K_{10}(A, x)$. However, the gap between $\lambda_1$ and $\lambda_2$ is rather small, and so the convergence of the power method for $A$ is rather slow, which explains why we did not yet find a good approximation to an eigenvector associated with $\lambda_1 = 3$ after only ten iterations. On the other hand, we see that the smallest eigenvalue $\lambda_{102}$ is approximated with an absolute error as small as $3.8769 \cdot 10^{-12}$. How can this be explained?
For the remainder of this section, we will assume that $A$ is self-adjoint in order to keep the discussion simple. Thus, we may assume that $A \in \mathbb{F}^{n \times n}$ has the $n$ eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$ in descending order and a corresponding orthonormal basis of eigenvectors $e_1, \dots, e_n$ associated with $\lambda_1, \dots, \lambda_n$. The start vector $x \in \mathbb{F}^n \setminus \{0\}$ can then be represented as a linear combination of these basis vectors, i.e., there exists
Remark 7.25 (Elements of Krylov subspaces). Let $A \in \mathbb{F}^{n \times n}$ be self-adjoint and let $e_1, \dots, e_n$ be a basis of $\mathbb{F}^n$ of eigenvectors associated with the eigenvalues $\lambda_1, \dots, \lambda_n$. If $x = \sum_{i=1}^n \alpha_i e_i \in \mathbb{F}^n$, $\alpha_1, \dots, \alpha_n \in \mathbb{F}$, and if $p \in \mathbb{F}[t]$, then
$$p(A)\, x = \sum_{i=1}^n \alpha_i\, p(\lambda_i)\, e_i.$$
Assuming without loss of generality that $\|x\|_2 = 1$ (we can always scale the start vector without changing the corresponding Krylov subspace), we obtain that
$$\sum_{i \neq j} |\alpha_i|^2 = 1 - |\alpha_j|^2 = 1 - |\langle x, e_j \rangle|^2 = 1 - \cos^2 \angle(x, e_j) = \sin^2 \angle(x, e_j).$$
If there exists a polynomial $p \in \mathbb{F}_m[t]$ with $p(\lambda_j) = 1$ and $\max_{i \neq j} |p(\lambda_i)| \le \varepsilon$, then
Lemma 7.26 states that if " is small and ˛j ¤ 0, then the vector p.A/x is a good
approximation to the vector ˛j ej and hence to an eigenvector associated with the
eigenvalue j . Revisiting Example 7.24, we see that the minimal eigenvalue 102 is
well separated from the remaining eigenvalues in the interval Œ2; 3. If we choose a
polynomial
Y10
pDc .t i / 2 F10 Œt ; c 2 R
i D1
having all zeros in the interval Œ2; 3, and if we choose c such that p.102 / D 1, we
can expect due to the continuity of polynomials that
where " is small. Thus, K10 .A; x/ can be expected to contain good approximations
to eigenvectors associated with 102 . In order to obtain quantitative statements, it is
beneficial to construct p with the help of Chebyshev polynomials.
$$T_0 := 1, \qquad T_1 := t, \qquad T_{k+2} := 2 t\, T_{k+1} - T_k \quad \text{for all } k \in \mathbb{N}_0.$$
Example 7.29 (Low degree Chebyshev polynomials). The first six Chebyshev polynomials have the form
$$T_0 = 1, \quad T_1 = t, \quad T_2 = 2t^2 - 1, \quad T_3 = 4t^3 - 3t, \quad T_4 = 8t^4 - 8t^2 + 1, \quad T_5 = 16t^5 - 20t^3 + 5t.$$
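The recurrence of Definition 7.27 is also a convenient way to evaluate T_n numerically; the small check at the end illustrates the boundedness |T_n(t)| <= 1 on [-1, 1] mentioned below.

```python
import numpy as np

def chebyshev(n, t):
    """Evaluate the Chebyshev polynomial T_n at t via the three-term recurrence
    T_0 = 1, T_1 = t, T_{k+2} = 2 t T_{k+1} - T_k of Definition 7.27."""
    t = np.asarray(t, dtype=float)
    T_prev, T_curr = np.ones_like(t), t
    if n == 0:
        return T_prev
    for _ in range(n - 1):
        T_prev, T_curr = T_curr, 2.0 * t * T_curr - T_prev
    return T_curr

# |T_n| stays bounded by 1 on [-1, 1], e.g. for n = 5 on a fine grid:
grid = np.linspace(-1.0, 1.0, 10001)
assert np.max(np.abs(chebyshev(5, grid))) <= 1.0 + 1e-12
```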
The Chebyshev polynomials have other representations that are quite useful for
theoretical considerations.
Using the addition theorem $\cos(x + y) = \cos x \cos y - \sin x \sin y$ we then obtain for $k \in \mathbb{N}_0$ that
and
where we also used $\cos(-x) = \cos(x)$ and $\sin(-x) = -\sin(x)$. Adding both equations yields
which implies that the functions $f_n$ satisfy the recursive formula of Definition 7.27.
2. For a fixed $t \in \mathbb{C}$, the equation
$$t = \frac{z + z^{-1}}{2} \tag{7.23}$$
has two solutions $z_{1/2} = t \pm \sqrt{t^2 - 1}$ (for some branch of the complex square root) that satisfy $z_2 = z_1^{-1}$, because
$$z_1 z_2 = \big(t + \sqrt{t^2 - 1}\big)\big(t - \sqrt{t^2 - 1}\big) = t^2 - (t^2 - 1) = 1.$$
$$2 t\, g_n(t) - g_{n-1}(t) = 2\, \frac{z + z^{-1}}{2} \cdot \frac{z^n + z^{-n}}{2} - \frac{z^{n-1} + z^{-(n-1)}}{2} = \frac{z^{n+1} + z^{-(n+1)} + z^{n-1} + z^{-(n-1)} - z^{n-1} - z^{-(n-1)}}{2} = \frac{z^{n+1} + z^{-(n+1)}}{2} = g_{n+1}(t),$$
i.e., the functions $g_n$, $n \in \mathbb{N}$, satisfy the recursive formula of Definition 7.27, which implies $T_n(t) = g_n(t)$ for all $t \in \mathbb{C}$.
2. $\max_{t \in [-1, 1]} |T_n(t)| = 1$.
$$\int_{-1}^{1} T_n(t)\, T_m(t)\, \frac{1}{\sqrt{1 - t^2}}\, dt = 0,$$
$$(1 - t^2)\, T_n''(t) - t\, T_n'(t) + n^2\, T_n(t) = 0.$$
What makes Chebyshev polynomials so useful is the fact that they satisfy an optimality condition. Recall that a polynomial is called monic if the leading coefficient is equal to one. Note that the Chebyshev polynomials are not monic for $n > 1$, but we can easily make them monic by normalizing them as $\widetilde T_n = 2^{1-n} T_n$, because by Exercise 7.28 the leading coefficient of $T_n$ is $2^{n-1}$.
$$\max_{t \in [-1,1]} |\widehat T_n(t)| = \frac{|c|}{T_n(\kappa)} = \min_{\substack{p \in \mathbb{R}_n[t] \\ p(\kappa) = c}} \; \max_{t \in [-1,1]} |p(t)|, \tag{7.25}$$
$$\max_{t \in [-1,1]} |\widetilde T_n(t)| = 2^{1-n}.$$
Note that $\kappa > 1$ if $\lambda_1 > \lambda_2$. Using this transformation, we obtain the following optimality condition.
has the smallest maximum norm on the interval $[\lambda_n, \lambda_2]$. In particular, setting
$$\kappa := \frac{\lambda_1 - \lambda_n}{\lambda_1 - \lambda_2} \qquad \text{and} \qquad \varrho := \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1},$$
we have
$$\min_{\substack{p \in \mathbb{R}_m[t] \\ p(\lambda_1) = 1}} \; \max_{t \in [\lambda_n, \lambda_2]} |p(t)| = \max_{t \in [\lambda_n, \lambda_2]} |p_m(t)| = \frac{1}{T_m(\kappa)} = \frac{2 \varrho^m}{1 + \varrho^{2m}}. \tag{7.27}$$
Proof. By applying part 2 of Theorem 7.24, we immediately obtain the first and the
second identity in (7.27). For the third identity, we have to calculate the value of
1=Tm .
/. The trick is to use the second alternative representation
p of Tn from Theo-
rem 7.30. Let us pick the smaller solution z D
1 of the two solutions of
2
the equation
z C z 1
D :
2
(Note that both solutions are real, because of
> 1.) Then we have
2
2 1 2 .1 2 /2 1 2
1D 2 C1 1D4 C4
2 n .2 n / 2 2 n
.1 2 /2 C .1 2 /.2 n / .1 2 /.1 n /
D4 D4 :
.2 n /2 .2 n /2
This implies
q p p
2.1 2 / C .2 n / 2 1 2 1 n
2 1D
2 n
p p
.1 2 / 2 1 2 1 n C .1 n /
D
2 n
p p
. 1 2 1 n /2
D p
. 2 n /2
0s s s 12
2A
D@
1 2 1 n 1
2 n 2 n 1 2
0 s 12
1 2 @ 1 n A
D 1 :
2 n 1 2
Using
1 2 1 2 1
D D ;
2 n .1 n / .1 2 / 1
we finally obtain
q p p p
.1 /2 . 1/2 1
2 1D D p p Dp D %:
1 . C 1/. 1/ C1
1 2 2%m
D m D
Tm .
/ % C %m %2m C 1
Theorem 7.36 (Convergence rate for Krylov subspace methods I). Let $A \in \mathbb{F}^{n \times n}$ be self-adjoint with eigenvalues $\lambda_1 > \lambda_2 \ge \dots \ge \lambda_n$, where $\lambda_2 > \lambda_n$, and let $e_1, \dots, e_n$ be a corresponding orthonormal basis of eigenvectors such that $A e_i = \lambda_i e_i$ for $i = 1, \dots, n$. Furthermore let $m \in \mathbb{N}$ and
$$x = \sum_{i=1}^n \alpha_i e_i \in \mathbb{F}^n, \qquad \sum_{i=1}^n |\alpha_i|^2 = 1.$$
Then with the notation of Lemma 7.35 there exists a polynomial $p_m \in \mathbb{F}_m[t]$ with
$$\|\alpha_1 e_1 - p_m(A)\, x\|_2 \le \frac{2 \varrho^m}{1 + \varrho^{2m}}\, \sin \angle(x, e_1).$$
Note that Theorem 7.36 simply contains an existence statement: we know about
the existence of the vector pm .A/x in the Krylov subspace Km .A; x/, but we do not
know how to construct this vector. However, concerning eigenvalues, we can now
give a relation between the largest eigenvalue 1 and the corresponding largest Ritz
value with respect to Km .A; x/.
Theorem 7.37 (Convergence rate for Krylov subspace methods II). Let $A \in \mathbb{F}^{n \times n}$ be self-adjoint with eigenvalues $\lambda_1 > \lambda_2 \ge \dots \ge \lambda_n$, where $\lambda_2 > \lambda_n$, and let $e_1, \dots, e_n$ be a corresponding orthonormal basis of eigenvectors such that $A e_i = \lambda_i e_i$ for $i = 1, \dots, n$. Furthermore let $m \in \mathbb{N}$ and
$$x = \sum_{i=1}^n \alpha_i e_i \in \mathbb{F}^n, \qquad \sum_{i=1}^n |\alpha_i|^2 = 1,$$
where $\alpha_1 \neq 0$. If $\mu_1$ is the largest Ritz value of $A$ with respect to $K_m(A, x)$, then with the notation of Lemma 7.35 we have
$$\lambda_1 \ge \mu_1 \ge \lambda_1 - (\lambda_1 - \lambda_n) \left( \frac{2 \varrho^{m-1}}{1 + \varrho^{2(m-1)}} \right)^2 \tan^2 \angle(x, e_1).$$
Proof. Without loss of generality, we may assume that dim Km .A; x/ D m. Other-
wise, the result is trivial, because Km .A; x/ is an invariant subspace which must con-
tain e1 because of ˛1 ¤ 0, see Exercise 7.38. Thus, let the columns of Qm 2 F mn
form an orthonormal basis of Km .A; x/. Then the largest Ritz value 1 of A with
AQ and thus, by
respect to Km .A; x/ is the largest eigenvalue of the matrix Qm m
Exercise 3.9 we have
y Qm
AQ y
m
1 D max hQm AQm y; yi D max :
y2F y2F n¹0º yy
kykD1
Recalling that
R.Qm / D Km .A; x/ D ¹p.A/x j p 2 Fm1 Œt º
and letting pm1 denote the transformed Chebyshev polynomial from Theorem 7.36
which satisfies pm1 .1 / D 1, we obtain
x p.A/ Ap.A/x x pm1 .A/ Apm1 .A/x
1 D max
Pn
i D2 j˛i j D 1 j˛1 j2 . The result then follows from
where we have used 2
X
n
xD ˛i ei 2 F n n ¹0º;
i D1
Example 7.39 (Convergence of the largest Ritz value I). We illustrate the result of Theorem 7.37 for the diagonal $100 \times 100$ matrix $A$ with diagonal entries $a_{kk} = 1 + k/100$ for $k = 1, \dots, 99$ and $a_{100,100} = 2.2$. Thus, there is a sufficiently large gap between the largest eigenvalue $\lambda_1 = 2.2$ and the remaining eigenvalues $\lambda_2, \dots, \lambda_{100} \in [1.01, 1.99]$. Choosing the start vector
$$x = \frac{1}{\sqrt{100}} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \in \mathbb{R}^{100},$$
we obtain the error $\lambda_1 - \mu_1^{(m)}$ for the largest Ritz value $\mu_1^{(m)}$ with respect to $K_m(A, x)$ plotted in Figure 7.9.
As one can see, the convergence rate corresponds quite well to the convergence rate predicted by the bound from Theorem 7.37. In fact the error is a little bit smaller than predicted by the bound, but this effect is due to the simplifications we performed in the proof of Theorem 7.37. Numerical experiments show that the bound becomes much tighter if the gap between $\lambda_1$ and $\lambda_2$ is very large.
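The bound of Theorem 7.37, plotted as the dotted line in the figures, can be evaluated directly. A small helper (our own naming; it only evaluates the bound, it does not run the iteration):

```python
import numpy as np

def error_bound_theorem_7_37(eigenvalues, m, angle):
    """Upper bound on lambda_1 - mu_1 from Theorem 7.37:
    (lambda_1 - lambda_n) * (2 rho^(m-1) / (1 + rho^(2(m-1))))^2 * tan(angle)^2,
    where angle is the angle between the start vector x and e_1."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    kappa = (l[0] - l[-1]) / (l[0] - l[1])
    rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    factor = 2.0 * rho ** (m - 1) / (1.0 + rho ** (2 * (m - 1)))
    return (l[0] - l[-1]) * factor ** 2 * np.tan(angle) ** 2
```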
Example 7.40 (Convergence of the largest Ritz value II). For a second test, let $B$ be the diagonal matrix with diagonal entries $b_{kk} = 1 + k/100$ for $k = 1, \dots, 100$, i.e., $B = \operatorname{diag}(1.01, 1.02, \dots, 2.00)$.
Figure 7.9. The error $\lambda_1 - \mu_1^{(m)}$ of the largest Ritz value $\mu_1^{(m)}$ of the matrix $A$ in Example 7.39 with respect to $K_m(A, x)$ for $m = 1, \dots, 50$. The error is plotted as a solid line, the dotted line is the bound from Theorem 7.37.
In contrast to Example 7.39, the gap between the largest and second largest eigenvalues $\lambda_1$ and $\lambda_2$ is now much smaller. Choosing the same start vector $x$ as in Example 7.39, we obtain the error $\lambda_1 - \mu_1^{(m)}$ for the largest Ritz value $\mu_1^{(m)}$ with respect to $K_m(B, x)$ as plotted in Figure 7.10.
In this case, the bound is quite an overestimate of the actual error. Moreover, the convergence seems to be superlinear, and it seems that from step 25 onwards, the actual convergence rate becomes better than the linear convergence rate predicted by the bound from Theorem 7.37.
How can the observation in Example 7.40 be explained? Why does the convergence rate seem to accelerate for increasing values of $m$? The reason is that for computing the bound in Theorem 7.37, we used the transformed Chebyshev polynomial $p_{m-1}$ which has the minimal maximum norm on the interval $[\lambda_n, \lambda_2]$ under the constraint $p_{m-1}(\lambda_1) = 1$. However, for our particular purpose, we do not need a polynomial that has minimal maximum norm on a whole interval, but it is sufficient if the polynomial is small on the eigenvalues $\lambda_n, \dots, \lambda_2$. For example, we could use the polynomial
$$\hat p_{m-2}(t) = \frac{t - \lambda_2}{\lambda_1 - \lambda_2}\, p_{m-2}(t),$$
where $p_{m-2}$ is the transformed Chebyshev polynomial of degree $m-2$ from Theorem 7.36 with $p_{m-2}(\lambda_1) = 1$ and minimal maximum norm on the interval $[\lambda_n, \lambda_3]$.
Figure 7.10. The error $\lambda_1 - \mu_1^{(m)}$ of the largest Ritz value $\mu_1^{(m)}$ of the matrix $B$ in Example 7.40 with respect to $K_m(B, x)$ for $m = 1, \dots, 50$. The error is plotted as a solid line, the dotted line is the bound from Theorem 7.37.
Theorem 7.41 (Convergence rate for Krylov subspace methods III). Let $A \in \mathbb{F}^{n \times n}$ be self-adjoint with eigenvalues $\lambda_1 \ge \lambda_2 > \lambda_3 \ge \dots \ge \lambda_n$, where $\lambda_3 > \lambda_n$, and let $e_1, \dots, e_n$ be a corresponding orthonormal basis of eigenvectors such that $A e_i = \lambda_i e_i$ for $i = 1, \dots, n$. Furthermore let $m \in \mathbb{N}$ and
$$x = \sum_{i=1}^n \alpha_i e_i \in \mathbb{F}^n, \qquad \sum_{i=1}^n |\alpha_i|^2 = 1,$$
where $\alpha_1 \neq 0$. If
$$\hat\kappa := \frac{\lambda_1 - \lambda_n}{\lambda_1 - \lambda_3} \qquad \text{and} \qquad \hat\varrho := \frac{\sqrt{\hat\kappa} - 1}{\sqrt{\hat\kappa} + 1},$$
then
$$\lambda_1 \ge \mu_1 \ge \lambda_1 - (\lambda_1 - \lambda_n) \left( \frac{\lambda_n - \lambda_2}{\lambda_1 - \lambda_2} \right)^2 \left( \frac{2 \hat\varrho^{m-2}}{1 + \hat\varrho^{2(m-2)}} \right)^2 \frac{\sum_{i=3}^n |\alpha_i|^2}{|\alpha_1|^2},$$
where $\mu_1$ is the largest Ritz value of $A$ with respect to $K_m(A, x)$.
$$\hat p_{m-2}(t) = \frac{t - \lambda_2}{\lambda_1 - \lambda_2}\, p_{m-2}(t)$$
instead of $p_{m-1}$.
Note that if $\lambda_2 > \lambda_3$, then $\hat\varrho < \varrho$. Thus, asymptotically the bound from Theorem 7.41 will be smaller than the one from Theorem 7.37. However, this may only be relevant for sufficiently large $m$, as the new bound has a different constant and the exponent has been reduced from $m-1$ to $m-2$.
Example 7.43 (Convergence of the largest Ritz value III). Let $C \in \mathbb{R}^{100 \times 100}$ be the diagonal matrix with diagonal entries
Thus, we now have a nice gap between $\lambda_2$ and $\lambda_3$, whereas the gap between $\lambda_1$ and $\lambda_2$ is rather small. Choosing the start vector as in Example 7.39, we obtain the error $\lambda_1 - \mu_1^{(m)}$ for the largest Ritz value $\mu_1^{(m)}$ with respect to $K_m(C, x)$ as plotted in Figure 7.11.
This time, the convergence rate corresponds quite well to the convergence rate predicted by the bound from Theorem 7.41, while the bound from Theorem 7.37 is a huge overestimate due to the fact that the gap between $\lambda_1$ and $\lambda_2$ is rather small.
The observation from Example 7.43 can be generalized to explain also the effect observed in Example 7.40. If $m$ is sufficiently large, then $K_m(A, x)$ contains polynomials $\hat p_{m-k}$ that have zeros in $\lambda_2, \dots, \lambda_k$, that satisfy $\hat p_{m-k}(\lambda_1) = 1$, and that have small norm in the interval $[\lambda_n, \lambda_{k+1}]$. For such polynomials, the relevant gap in the eigenvalues is the gap between $\lambda_{k+1}$ and $\lambda_1$ rather than between $\lambda_2$ and $\lambda_1$. Thus, the convergence rate accelerates as $m$ increases.
Exercise 7.44 (Convergence of the largest Ritz value). Under the hypothesis $\lambda_k > \lambda_{k+1} > \lambda_n$, construct polynomials $\hat p_{m-k}$ as outlined above and obtain a result analogous to the one of Theorem 7.41 by improving the bound $\hat\varrho$ to
$$\tilde\varrho := \frac{\sqrt{\tilde\kappa} - 1}{\sqrt{\tilde\kappa} + 1} \qquad \text{with} \qquad \tilde\kappa := \frac{\lambda_1 - \lambda_n}{\lambda_1 - \lambda_k}.$$
0 5 10 15 20 25 30 35 40 45 50
.m/ .m/
Figure 7.11. The error 1 1 of the largest Ritz value 1 of the matrix C in Exam-
ple 7.43 with respect to Km .A; x/ for m D 1; : : : ; 50. The error is plotted as a solid line,
the dotted line is the bound from Theorem 7.37, and the dashed line is the bound from Theo-
rem 7.41.
Exercise 7.45 (Convergence of the second largest Ritz value). Let $A \in \mathbb{F}^{n \times n}$ be self-adjoint with eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$, where $\lambda_1 > \lambda_2 > \lambda_3 > \lambda_n$. Derive a bound similar to the one in Theorem 7.37 that gives an estimate for the error $\lambda_2 - \mu_2$, where $\mu_2$ is the second largest Ritz value of $A$ with respect to $K_m(A, x)$.
Hint: Construct a polynomial $\tilde p_{m-2}$ with $\tilde p_{m-2}(\lambda_1) = 0$ and $\tilde p_{m-2}(\lambda_2) = 1$ that has small norm on the interval $[\lambda_n, \lambda_3]$.
Another important point in Krylov subspace methods is the use of restarts. At first, the start vector $x$ in the Arnoldi or Lanczos iteration may be chosen randomly, but after a few iterations, we may actually find a vector $\hat x$ in the computed Krylov subspace $K_m(A, x)$ that would have been a better start vector. We could then restart our iteration with the new start vector $\hat x$. In the Arnoldi iteration, this may be a huge advantage, because the summation in (7.18) becomes more and more expensive with each step, since in each step an additional summand occurs in the formula. A commonly used and rather effective method is the so-called implicitly restarted Arnoldi iteration. For further reading we refer to [2, 49].
There are other algorithms for computing eigenvalues and eigenvectors of large sparse matrices. Two among those are the Jacobi–Davidson method and Preconditioned Inverse Iteration (PINVIT).
The Jacobi Davidson method introduced in [40] is a projection method, but
uses other subspaces instead of Krylov subspaces. The basic idea is to enlarge the
current subspace R.Qm / in such a way that a particular eigenpair .; v/ of A that
is approximated by a Ritz pair .; Qm y/ will be approximated much better by a
Ritz pair with respect to the enlarged subspace R.QmC1 /. The resulting method
is particularly suited for the computation of eigenvalues in the interior of the
spectrum and leads to quadratic convergence of the selected Ritz pairs. However,
typically only one Ritz pair converges at a time and once it has converged one
focuses on the next Ritz pair. This is different to the case of Krylov subspaces
where convergence of several Ritz pairs takes place simultaneously. For details,
we refer the reader to [40] and [41].
The principal idea of PINVIT can be described as follows. If we consider
inverse iteration and scale the next iteration vector by the Rayleigh quotient, the
resulting iteration equation can be reformulated as
Summary
Generalized eigenvalue problems are the natural extension of the standard eigenvalue
problem. Their discussion will lead to the theory of matrix pencils. Concerning the
solution of generalized eigenvalue problems, a generalization of the efficient QR iter-
ation introduced in Chapter 5, the QZ algorithm, turns out to be an efficient method.
It is based on a generalization of the Schur decomposition introduced in Chapter 2.
An even further generalization of the standard eigenvalue problem are polynomial
eigenvalue problems that can be reduced to generalized eigenvalue problems by the
concept of linearization.
Learning targets
Introduce matrix polynomials and matrix pencils.
Introduce the eigenvalue $\infty$.
Generalize the Schur decomposition to matrix pencils.
Generalize the QR algorithm to the corresponding QZ algorithm for matrix
pencils.
$$\sum_{k=0}^{\ell} A_k\, y^{(k)} = A_\ell\, y^{(\ell)} + A_{\ell-1}\, y^{(\ell-1)} + \dots + A_2\, y'' + A_1\, y' + A_0\, y = 0, \tag{8.1}$$
$$P(t) = \sum_{k=0}^{\ell} t^k A_k \tag{8.2}$$
$$P(\lambda)\, x = 0 \tag{8.3}$$
How can we solve the polynomial eigenvalue problem (8.3)? For a fixed $\lambda \in \mathbb{F}$, the equation $P(\lambda)\, x = 0$ has a nontrivial solution if the matrix $P(\lambda) \in \mathbb{F}^{n \times n}$ is singular, i.e., if $\det P(\lambda) = 0$.
$$y_1' = y_2, \quad \dots, \quad y_{\ell-1}' = y_\ell, \qquad A_\ell\, y_\ell' + \sum_{k=0}^{\ell-1} A_k\, y_{k+1} = 0,$$
or, equivalently,
$$\begin{pmatrix} I_n & & & 0 \\ & \ddots & & \\ & & I_n & \\ 0 & & & A_\ell \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_\ell \end{pmatrix}' + \begin{pmatrix} 0 & -I_n & & \\ & \ddots & \ddots & \\ & & 0 & -I_n \\ A_0 & A_1 & \dots & A_{\ell-1} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_\ell \end{pmatrix} = 0.$$
Note that the eigenvalue problem (8.5) is linear in the sense that it depends linearly on the eigenvalue parameter $\lambda$. If $A_\ell = I_n$, then this eigenvalue problem reduces to a standard eigenvalue problem with an $n\ell \times n\ell$ matrix, but if $A_\ell \neq I_n$, then it is a special case of a so-called generalized eigenvalue problem.
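To make the construction concrete, the following NumPy sketch assembles a companion-type linearization t E - A of P(t) = sum_k t^k A_k; the exact block layout and signs of the pencil in (8.5) may differ from this common variant. The resulting generalized eigenvalue problem can then be handed to a QZ-based solver such as scipy.linalg.eig(A, E).

```python
import numpy as np

def companion_pencil(coeffs):
    """Build E and A such that t E - A is a linearization of the matrix polynomial
    P(t) = coeffs[0] + t coeffs[1] + ... + t^l coeffs[l] (one of several equivalent
    companion forms)."""
    C = [np.asarray(Ck, dtype=complex) for Ck in coeffs]
    n, l = C[0].shape[0], len(C) - 1
    E = np.eye(n * l, dtype=complex)
    E[-n:, -n:] = C[l]                                  # E = diag(I, ..., I, A_l)
    A = np.zeros((n * l, n * l), dtype=complex)
    for k in range(l - 1):                              # identity blocks on the super-diagonal
        A[k * n:(k + 1) * n, (k + 1) * n:(k + 2) * n] = np.eye(n)
    for k in range(l):                                  # last block row: -A_0, ..., -A_{l-1}
        A[-n:, k * n:(k + 1) * n] = -C[k]
    return E, A
```

With z = (x, lambda x, ..., lambda^(l-1) x), the equation lambda E z = A z reproduces P(lambda) x = 0, so the finite eigenvalues of this pencil are exactly the finite eigenvalues of P.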
Definition 8.6 (Matrix pencil). Let E; A 2 F nn . Then the matrix polynomial
L.t / D tE A is called a matrix pencil. The corresponding polynomial eigenvalue
problem L./x D 0 is called a generalized eigenvalue problem.
The reason for introducing the minus sign in the term tE A is the fact that the
corresponding generalized eigenvalue problem can then be written in the convenient
form Ex D Ax.
Example 8.7 (Number of eigenvalues). Let $L_1(t) = t E_1 - A_1$ and $L_2(t) = t E_2 - A_2$, where
$$E_1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Then for all $\lambda \in \mathbb{C}$ we have
$$\det(\lambda E_1 - A_1) = \det \begin{pmatrix} -1 & \lambda \\ 0 & 0 \end{pmatrix} = 0 \qquad \text{and} \qquad \det(\lambda E_2 - A_2) = \det \begin{pmatrix} -1 & \lambda \\ 0 & -1 \end{pmatrix} = 1,$$
thus $\lambda E_2 - A_2$ is nonsingular for all $\lambda \in \mathbb{C}$ and consequently $L_2(t)$ does not have any eigenvalues according to Definition 8.2. On the other hand, $\lambda E_1 - A_1$ is singular for all $\lambda \in \mathbb{C}$, and for each $\lambda \in \mathbb{C}$ the vector
$$v_\lambda = \begin{pmatrix} \lambda \\ 1 \end{pmatrix}$$
is an eigenvector of $L_1$ associated with the eigenvalue $\lambda$.
The pencil L1 .t / in Example 8.7 is a pathologic case and we will therefore restrict
ourselves to pencils for which the determinant is not identically zero.
Definition 8.8 (Regular pencils). An nn matrix pencil L.t / over F is called regular
if det L.t / is not the zero polynomial. Otherwise, the pencil is called singular.
Show that for any $\varepsilon > 0$ and any $\lambda_1, \lambda_2 \in \mathbb{R}$, there exists a regular pencil $t\widetilde E - \widetilde A$ having the eigenvalues $\lambda_1$ and $\lambda_2$ such that
$$\max\big( \|\widetilde E - E\|_2,\, \|\widetilde A - A\|_2 \big) \le \varepsilon.$$
Thus, singular matrix pencils are extremely ill-conditioned in the sense that arbitrarily small perturbations make them regular pencils with arbitrarily prescribed eigenvalues.
A key idea for solving the standard eigenvalue problem $A x = \lambda x$, where $A \in \mathbb{F}^{n \times n}$, was the application of similarity transformations to $A$. There is a similar concept for matrix pencils. Let $E, A \in \mathbb{F}^{n \times n}$ and let $x \in \mathbb{F}^n \setminus \{0\}$ be an eigenvector of the pencil $tE - A$ associated with the eigenvalue $\lambda \in \mathbb{F}$, i.e., we have $\lambda E x = A x$. Multiplying this equation from the left by a nonsingular matrix $P \in \mathbb{F}^{n \times n}$ and introducing $\hat x := Q^{-1} x$, where $Q \in \mathbb{F}^{n \times n}$ is nonsingular, we find
$$\lambda\, P E Q\, \hat x = P A Q\, \hat x.$$
Thus, any scalar $\lambda$ is an eigenvalue of the pencil $t \widehat E - \widehat A := t P E Q - P A Q$ if and only if it is an eigenvalue of the pencil $tE - A$, and any associated eigenvector $\hat x$ of $t \widehat E - \widehat A$ can be transformed to an eigenvector of $tE - A$ associated with $\lambda$ by multiplication with $Q$.
Definition 8.10 (Equivalence of pencils). Two nn matrix pencils L1 .t / D tE1 A1
and L2 .t / D tE2 A2 over F are called equivalent if there exist nonsingular matrices
P; Q 2 F nn such that
Remark 8.11 (Equivalence and similarity). Let A 2 F nn . Then the standard eigen-
value problem Ax D x can also be interpreted as a generalized eigenvalue problem
of the form In x D Ax. Thus, any matrix A can be canonically identified with the
matrix pencil tIn A. Two matrix pencils L1 .t / D tIn A1 and L2 .t / D tIn A2
with A1 ; A2 2 F nn are equivalent if and only if there exist nonsingular matri-
ces P; Q such that
PQ D In and A2 D PA1 Q:
The first identity immediately implies P D Q1 and thus the pencils L1 .t / and L2 .t /
are equivalent if and only if A1 and A2 are similar. Thus, Definition 8.10 is the natural
extension of Definition 2.17 to matrix pencils.
e WD .E A/1 E
E and e
A WD .E A/1 A
commute.
$$(\alpha_1, \beta_1) \sim (\alpha_2, \beta_2) \quad :\Longleftrightarrow \quad \alpha_1 \beta_2 = \alpha_2 \beta_1.$$
Then for $\alpha, \beta \in \mathbb{F}$ with $\beta \neq 0$ satisfying $\alpha E x = \beta A x$ for some nonzero vector $x \in \mathbb{F}^n \setminus \{0\}$, the equivalence class
$$[(\alpha, \beta)] := \{ (\tilde\alpha, \tilde\beta) \mid (\tilde\alpha, \tilde\beta) \sim (\alpha, \beta) \}$$
can be identified with the eigenvalue $\lambda = \frac{\alpha}{\beta}$ of $tE - A$. Note that there is an additional equivalence class
$$\infty := \{ (\alpha, 0) \mid \alpha \in \mathbb{F} \setminus \{0\} \}.$$
$$\alpha E x = 0$$
for all $\alpha \neq 0$. Thus, we interpret the equivalence class $\infty$ as the so-called infinite eigenvalue of $tE - A$. Associated eigenvectors are all nonzero vectors from the null space of $E$.
Definition 8.16 (Algebraic and geometric multiplicity). Let $E, A \in \mathbb{F}^{n \times n}$ such that $L(t) = tE - A$ is regular and let $\lambda \in \mathbb{F} \cup \{\infty\}$ be an eigenvalue of $L(t)$.
1. If $\lambda \neq \infty$ then the multiplicity $\mu_a(E, A, \lambda)$ of $\lambda$ as a zero of $\det(tE - A)$ is called the algebraic multiplicity of $\lambda$.
2. If $\lambda = \infty$ and if $k \in \{0, 1, 2, \dots, n\}$ is the degree of the polynomial $\det(tE - A)$, then $\mu_a(E, A, \infty) = n - k$ is called the algebraic multiplicity of $\infty$.
3. If $\lambda \neq \infty$ then $\mu_g(E, A, \lambda) := \dim \mathcal{N}(\lambda E - A)$ is called the geometric multiplicity of $\lambda$.
4. If $\lambda = \infty$ then $\mu_g(E, A, \infty) := \dim \mathcal{N}(E)$ is called the geometric multiplicity of $\infty$.
Indeed, the subspace on the right hand side of the equivalence is contained in X if X
is invariant with respect to A, and hence its dimension does not exceed k. On the other
hand, the space also contains X, and therefore it must be of dimension exactly k. We
can now use this equivalence to generalize the concept of invariant subspaces to the
case of matrix pencils.
$$\dim \{ E x + A y \mid x, y \in X \} \le k.$$
The inequality in Definition 8.19 is due to the fact that the pencil is allowed to be
singular which may lead to a drop in the dimension.
Part 3 of Theorem 8.21 is the reason for the name deflating subspace that was
introduced in [42], because the knowledge of a deflating subspace allows to deflate
the eigenvalue problem .E A/x D 0 into two smaller subproblems with the pencils
tE11 A11 and tE22 A22 .
Proof. The proof proceeds by induction on n. The case for n D 1 is trivial. Thus, let
us assume n > 1 and that the assertion of the theorem holds for n 1. Let 2 C and
x 2 C n n ¹0º be such that
Ex D Ax:
Such a choice is always possible. Indeed, if tE A is regular, then let be an
eigenvalue and x be an associated eigenvector. If, on the other hand, tE A is
singular, then det.tE A/
0 and for an arbitrary 2 C choose a nonzero vector x
from the kernel of E A. Then span.x/ is a deflating subspace of tE A. Thus, by
Theorem 8.21, there exist unitary matrices Q; Z 2 Cnn , where the first column z1
of Z spans span.x/ such that
r11 E12 s11 A12
tQ EZ Q AZ D t ;
0 E22 0 A22
where r11 ; s11 2 C and E22 ; A22 2 C .n1/.n1/ . By the induction hypothesis, there
exist unitary matrices Q2 ; Z2 2 C .n1/.n1/ such that Q2 E22 Z2 and Q2 A22 Z2
are both upper triangular. Setting then
1 0 1 0
Q D Q1 ; Z D Z1 ;
0 Q2 0 Z2
we obtain the desired decomposition. The remainder of the proof then follows from
$$\det(tE - A) = \det(Q)\, \det(tR - S)\, \det(Z^*) = \det(Q Z^*) \prod_{k=1}^n (t\, r_{kk} - s_{kk}).$$
Exercise 8.25. Let E; A 2 C nn be Hermitian. Show that if the matrix pencil
tE A has a diagonal generalized Schur decomposition, then E and A commute,
i.e., EA D AE.
$$t Q^* E Z - Q^* A Z = t R - S \tag{8.7}$$
Since the inverse of an upper triangular matrix and the product of two upper triangular matrices are again upper triangular, the latter identity means that $Q^* (A E^{-1}) Q = S R^{-1}$ is a Schur decomposition of the matrix $A E^{-1}$. Thus, a possible idea is to first compute a unitary matrix $Q$ that transforms $A E^{-1}$ into Schur form and then to compute a unitary $Z$ so that (8.7) holds. The standard method for computing $Q$ is, of course, the QR iteration from Chapter 5. However, $E$ may be ill-conditioned (or even singular), so we do not want (or are not able) to form the inverse $E^{-1}$ and the product $A E^{-1}$. Instead, we will try to develop a method that implicitly performs the QR iteration for $A E^{-1}$, but works directly on the matrices $A$ and $E$.
Before we do so, recall that the QR iteration requires only $O(n^2)$ operations for Hessenberg matrices as opposed to $O(n^3)$ operations for arbitrary matrices. Therefore, a related reduction for matrix pencils would be useful. Observe that if the pencil $tE - A$ is such that $A$ is a Hessenberg matrix and $E$ is upper triangular and invertible, then $A E^{-1}$ is also a Hessenberg matrix.
In the following, we will show constructively that for arbitrary $E, A \in \mathbb{F}^{n \times n}$ there exist unitary matrices $Q, Z \in \mathbb{F}^{n \times n}$ such that $t Q^* E Z - Q^* A Z$ is in Hessenberg-triangular form. We will illustrate the general procedure for $n = 5$. Thus, we start with matrices of the form
we start with matrices of the form
0 1 0 1
B C B C
B C B C
EDB B C ; A D B C :
C B C
@ A @ A
In the first step, we compute the QR factorization of E (e.g., with the help of House-
holder reflections, see Section 4.7), i.e., we compute a unitary Q1 2 F nn such that
Q1 E is upper triangular. Then we have
0 1 0 1
B C B C
B C B C
.1/
E WD Q1 E D B B C ; A WD Q1 A D B
C .1/
C;
B C
@ A @ A
where blanks stand for zero entries.
In the following, we will transform A.1/ step by step to Hessenberg form and simul-
taneously keep the triangular form of E .1/ . Alternatingly we will apply elimination
steps and restoration steps. Thus, if k is odd, then after the kth step we eliminate an
element of A.k/ with the help of a Givens rotation (introduced in Section 5.2), and
in the next step, we restore the triangular form of E .kC1/ , again with the help of a
Givens rotation. We start by applying a Givens rotation G .1/ that transforms the forth
and fifth row in order to eliminate the .5; 1/-element of A.1/ :
0 1 0 1
B C B C
B C B C
.2/
E WD G E D B .1/ .1/ B C ; A WD G A D B
C .2/ .1/ .1/
C:
B C
@ ˝˝ A @ ˝ ˝ ˝ ˝ ˝A
˝ 0 ˝˝˝˝
Note that this transformation destroys the upper triangular form of E .1/ as it intro-
duces a fill-in in the .5; 4/-position of the matrix. But we can restore the triangular
form by multiplying E .2/ with a Givens rotation G .2/ which manipulates the forth
and fifth columns of E .2/ . Application of this Givens rotation to A.2/ will keep the
zero entry in the .5; 1/-position, because the first column remains unchanged.
0 1 0 1
˝˝ ˝˝
B ˝ ˝C B ˝ ˝C
B C B C
.3/ .2/ .2/
E WD E G D B B ˝ ˝C ; A WD A G D B
C .3/ .2/ .2/
˝ ˝C:
B C
@ ˝˝ A @ ˝ ˝A
0 ˝ ˝˝
Next, we perform an elimination step to eliminate the .4; 1/-element of A.3/ by ap-
plying a Givens rotation G .3/ to the rows three and four (this time producing a fill-in
in the .4; 3/-position of E .3/ ), followed by a restauration step to restore the triangular
form of E .3/ by applying a Givens rotation G .4/ to the columns three and four:
0 1 0 1
B C B C
B C B C
E WD G E D B
.4/ .3/ .3/
B ˝ ˝ ˝C .4/ .3/ .3/
C ; A WD G A D B˝ ˝ ˝ ˝ ˝C ;
B C
@ ˝ ˝A @ 0 ˝ ˝ ˝ ˝A
0 1 0 1
˝ ˝ ˝ ˝
B ˝ ˝ C B ˝ ˝ C
B C B C
E .5/ WD E .4/ G .4/ DB
B ˝ ˝ CC; A.5/ WD A.4/ G .4/ DB
B ˝ ˝ C
C:
@ 0 ˝ A @ ˝ ˝ A
˝ ˝
We continue this procedure until we have annihilated all but the first two elements
of the first column of A. In the case n D 5, we need one more pair of elimination-
restauration-steps:
0 1 0 1
B ˝ ˝ ˝ ˝C B˝ ˝ ˝ ˝ ˝C
B C B C
E WD G E D B ˝ ˝ ˝C ; A WD G A D B
.6/ .5/ .5/ B C .6/ .5/ .5/
B 0 ˝ ˝ ˝ ˝C ;
C
@ A @ A
0 1 0 1
˝˝ ˝ ˝
B ˝ ˝ C B ˝ ˝ C
B C B C
E .7/ WD E .6/ G .6/ DB
B 0 ˝ C;
C A.7/ WD A.6/ G .6/ DB
B ˝ ˝ C
C:
@ A @ ˝ ˝ A
˝ ˝
Observe that at this stage we cannot proceed with eliminating the element in the
.2; 1/-position of A.7/ , because this would produce a fill-in in the .2; 1/-position
of E .7/ , and the following restauration step would work on the first and second
columns of our matrices, thereby destroying all the strenuously obtained zeros in A.7/ .
Instead, we now continue with the second column of A.7/ , again starting from the
bottom. Note that all our transformation will preserve the zero structure of the first
column of A.7/ :
0 1 0 1
B C B C
B C B C
.8/ B C ; A D B
C .8/ C
E DB B C ;
@ ˝˝ A @ ˝ ˝ ˝ ˝A
˝ 0 ˝˝˝
0 1 0 1
˝˝ ˝ ˝
B ˝ ˝C B ˝C
˝
B C B C
E DB
.9/
B ˝ ˝CC; A .9/
DB
B ˝C
C; ˝
@ ˝ ˝A @ ˝A
˝
0 ˝ ˝ ˝
0 1 0 1
B C B C
B C B C
E .10/
DB
B ˝ ˝ ˝ C;
C A.10/
DBB ˝ ˝ ˝ ˝ C;
C
@ ˝ ˝A @ 0 ˝ ˝ ˝A
0 1 0 1
˝˝ ˝˝
B ˝ ˝ C B ˝ ˝ C
B C B C
E .11/ D B
B ˝ ˝ CC; A.11/ D B
B ˝ ˝ C :
C
@ 0 ˝ A @ ˝ ˝ A
˝˝
For the second column, we have to stop the procedure after eliminating all but the
first three element, because proceeding by eliminating the .3; 2/-entry of A.11/ would
lead to a fill-in in the .3; 1/-position of A.11 /, thereby destroying the zero structure of
the first column. We now continue this series of steps until A has reached Hessenberg
form. In the case n D 5 only one more pair of steps is necessary:
0 1 0 1
B C B C
B C B C
.12/ B C
C ; A .12/
DB C
E DB B C ;
@ ˝ ˝A @ ˝ ˝ ˝A
˝ 0 ˝˝
0 1 0 1
˝˝ ˝˝
B ˝ ˝C B ˝ ˝C
B C B C
E .13/ DB
B ˝ ˝CC; A.13/ DB C
B ˝ ˝C :
@ ˝ ˝A @ ˝ ˝A
0 ˝ ˝˝
Exercise 8.28. Calculate how many operations are necessary to compute the
Hessenberg triangular form for an n n matrix pencil tE A over F. Also
calculate the number of operations needed to explicitly compute the transforma-
tion matrices Q and Z.
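The elementary operations of this reduction are easy to express in code. The following real-valued sketch (our own helper names; the book works over F and the conventions of its figures may differ) performs one elimination/restoration pair: a row rotation annihilates an entry in the first column of A, and a column rotation removes the fill-in this creates in E without touching the zeros already produced in A.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with c*a + s*b = r and -s*a + c*b = 0 (real case, unscaled sketch)."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def eliminate_and_restore(E, A, i):
    """Rotate rows i-1, i to zero A[i, 0]; then rotate columns i-1, i to zero the
    resulting fill-in E[i, i-1]. E and A are modified in place."""
    c, s = givens(A[i - 1, 0], A[i, 0])
    G = np.array([[c, s], [-s, c]])
    A[i - 1:i + 1, :] = G @ A[i - 1:i + 1, :]
    E[i - 1:i + 1, :] = G @ E[i - 1:i + 1, :]
    c, s = givens(E[i, i], E[i, i - 1])
    Z = np.array([[c, s], [-s, c]])           # acts on columns i-1 and i from the right
    E[:, i - 1:i + 1] = E[:, i - 1:i + 1] @ Z
    A[:, i - 1:i + 1] = A[:, i - 1:i + 1] @ Z
    return E, A
```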
8.5 Deflation
Deflation was an important strategy in the practical use of the QR iteration as noted
in Section 5.4, and the same is true for generalized eigenvalue problems. Assume
that our n n matrix pencil tE A over F is regular and in Hessenberg-triangular
form. If one or more of the subdiagonal elements of $A$ are zero (or sufficiently small in magnitude), then the corresponding generalized eigenvalue problem can be divided into two generalized eigenvalue problems of smaller size. Let $k \in \{1, \dots, n-1\}$ be an index such that $a_{k+1,k} = 0$. Then the pencil $tE - A$ has the form
$$tE - A = \begin{pmatrix} t E_{11} - A_{11} & t E_{12} - A_{12} \\ 0 & t E_{22} - A_{22} \end{pmatrix},$$
where $t E_{11} - A_{11}$ and $t E_{22} - A_{22}$ are two pencils of sizes $k \times k$ and $(n-k) \times (n-k)$, respectively, that are again in Hessenberg-triangular form. Thus, we can compute the eigenvalues of $tE - A$ by computing the eigenvalues of the pencils $t E_{11} - A_{11}$ and $t E_{22} - A_{22}$. In practice, we will deflate the problem as soon as there exists an index $k$ with $a_{k+1,k} \approx 0$.
Next assume that all subdiagonal entries of $A$ are nonzero and assume that the pencil $tE - A$ has the eigenvalue $\infty$. Then $E$ is singular and necessarily must have a zero entry on the diagonal. Thus, the eigenvalue $\infty$ obviously plays an exceptional role, because it is automatically displayed after reduction to Hessenberg-triangular form. Due to this exceptional role, it may not be a complete surprise that we are also able to deflate it. The corresponding procedure is called zero-chasing, and we will explain it exemplarily on a $5 \times 5$ matrix pencil. Thus, let us assume that $e_{33} = 0$,
where
D 3 is the smallest index for which ekk ¤ 0 for all k D
C 1; : : : ; n. Thus,
our pencil has the following pattern.
0 1 0 1
B C B C
B C B C
B C ; A D B
C C
EDB B C :
@ A @ A
Our aim is to chase the ‘zero’ on the diagonal of E down to the lower right corner
of E. In the first step, we apply a Givens rotation G1 to E and A acting on the third
and forth rows that will eliminate the .4; 4/-entry of E.
0 1 0 1
B C B C
B C B C
E .1/ D G1 E D B B ˝ ˝ C ; A.1/ D G1 A D B ˝ ˝ ˝ ˝C :
C B C
@ 0 ˝A @ ˝ ˝ ˝A
This procedure will generate a fill-in in the .4; 2/-position of A thus creating a bulge in
the Hessenberg form. However, applying a Givens rotation G2 acting on the columns
two and three, we can restore the Hessenberg form of A.1/ without destroying the
upper triangular form of0E .1/ . 1 0 1
˝˝ ˝˝
B ˝ ˝ C B ˝ ˝ C
B C B C
.2/ .1/ B C ; A D A G2 D B
C .2/ .1/ C
E D E G2 D B B ˝ ˝ C :
@ A @ 0 ˝ A
Note that we chased the zero on the diagonal of E further down while the Hessenberg
form of A has been restored. (There is an additional zero entry remaining in the
.3; 3/-position of E, but this one will fill up again later, so it is out of our focus.) We
continue now to chase the zero down to the .n; n/-position.
0 1 0 1
B C B C
B C B C
.3/ .2/ B C ; A D G3 A D B
C .3/ .2/ C
E D G3 E D B B C ;
@ ˝ A @ ˝ ˝ ˝A
0 ˝˝
0 1 0 1
˝˝ ˝˝
B ˝ ˝ C B ˝ ˝ C
B C B C
E .4/ D E G4 D B
.3/
B ˝ C;
C A.4/ D A G4 D B
.3/
B ˝ ˝ C:
C
@ A @ ˝ ˝ A
0
Observe that the last step has finally filled up the .3; 3/-entry of E .3/ . However, as
E .3/ still has upper triangular form, this is acceptable. The pattern of the matrices
E .4/ and A.4/ is the typical pattern that we obtain after chasing down the zero on the
diagonal of E to the .n; n/-position. In order to deflate the problem, we can apply
a final Givens rotation G5 acting on columns that eliminates the .5; 4/-entry of A.4/ .
This will create a fill-in in the .4; 4/-position of E .4/ , but again this is acceptable as
the upper triangular form is not destroyed.
0 1 0 1
˝ ˝ ˝˝
B ˝ ˝C B ˝ ˝C
B C B C
B C
˝ ˝ C ; A D A G5 D B C
B ˝ ˝C:
.5/ .4/ .5/ .4/
E D E G5 D B
@ ˝A @ ˝ ˝A
0 ˝
We can now deflate the eigenvalue $\infty$ and continue with the $(n-1) \times (n-1)$ matrix pencil from the upper left corner of $t E^{(5)} - A^{(5)}$, which is again in Hessenberg-triangular form.
after one step of the implicit QR iteration with multi-shift strategy outlined in Sec-
tion 5.6. We will do this exemplary for the case n D 5 and the double shift strategy
as in Example 5.17.
As in Chapter 5, the main idea is the application of Corollary 5.14. In the following
let H WD AE 1 and let 1 ; 2 2 C be two shifts. In Section 5.6, we started by
investigating the matrix B D .H 1 In /.H 2 In / which has the form
0 1
B C
B C
BDB B C :
C
@ A
We then computed a Householder reflection M1 that eliminates the .2; 1/- and
.3; 1/-entries of B, but instead of applying M1 directly to B and computing the QR
factorization of B, we applied M1 directly to the matrix H and restored its Hessenberg
form with bulge chasing. Corollary 5.14 then implied that the resulting Hessenberg
matrix was essentially the matrix that we would have obtained after two consecutive
steps of the QR iteration with shifts 1 and 2 . Applying the same strategy here, we
will have to compute a Householder reflection that reflects the first column b1 of the
matrix B D .H 1 In /.H 2 In / to a multiple of the first canonical unit vector.
It is important to highlight that the first column of B can be easily computed without
explicitly forming E 1 and the product AE 1 , see Exercise 8.30.
Exercise 8.30 (First column of B). Let E; A 2 C nn such that E is nonsingular
and tE A is in Hessenberg-triangular form. Show that only the first three entries
of the first column
is essentially the matrix that we would have obtained after two steps of the QR iter-
ation with shifts 1 and 2 applied to AE 1 . As in Chapter 5, we will achieve this
by bulge chasing. We start by applying a Householder reflection Z1 that changes the
.0/ .0/
first three columns to eliminate e31 and e32 . This will create fill-ins in the .4; 1/- and
.4; 2/-positions of A.0/ .
0 1 0 1
˝˝˝ ˝˝˝
B˝ ˝ ˝ C B˝ ˝ ˝ C
B C B C
E D E Z1 D B 0 0 ˝ C ; A D A Z1 D B
.1/ .0/ B C .1/ .0/
B˝ ˝ ˝ C :
C
@ A @ ˝ A
Next, we apply a Householder reflection Z2 that changes the first two columns to
.0/
eliminate e21 . Note that this will not create any additional fill-ins in our matrices.
0 1 0 1
˝˝ ˝ ˝
B 0 ˝ C B˝ ˝ C
B C B C
E .2/ D E Z2 D B
.1/
B CC; A.2/ D A.1/ Z2 D B
B˝ ˝ C
C:
@ A @˝ ˝ A
These two step have restored the triangular form of E. We now concentrate on the
matrix A.2/ and apply a Householder reflection Q1 changing the rows two to four to
.2/ .2/
eliminate the entries a31 and a41 .
0 1 0 1
B ˝ ˝ ˝ ˝C B˝ ˝ ˝ ˝ ˝C
B C B C
E .3/ D Q1 E .2/ DB
B ˝ ˝ ˝CC; A.3/ D Q1 A.2/ DB
B0 ˝ ˝ ˝ ˝CC:
@ ˝ ˝A @0 ˝ ˝ ˝ ˝A
The fact that Q1 only changes the rows two to four, its conjugate transpose Q1 will
manipulate only the columns two to four if we multiply it from the right to M1 . Thus,
M1 and M1 Q1 will have identical first columns. Observe that as in Chapter 5, we
have managed to chase the bulge one step towards the right lower edge of the matrix.
We continue to chase the bulge further down and finally off the diagonal to restore the
Hessenberg-triangular form of our pencil.
0 1 0 1
˝˝˝ ˝˝˝
B ˝ ˝ ˝ C B ˝ ˝ ˝ C
B C B C
.4/ .3/ B ˝ ˝ ˝ C .4/ .3/ B C
E D E Z3 D B C ; A D A Z3 D B ˝ ˝ ˝ C ;
@ 0 0 ˝ A @ ˝ ˝ ˝ A
˝
0 1 0 1
˝˝ ˝ ˝
B ˝ ˝ C B ˝ C
˝
B C B C
E D E Z4 D B
.5/ .4/ C
B 0 ˝ C ; A .5/
D A.4/ Z4 D B
B ˝ C
˝C;
@ A @ ˝ A
˝
˝ ˝
0 1 0 1
B C B C
B C B C
E D Q2 E D B
.6/ .5/
B ˝ ˝ ˝ C;
C A D Q2 A D B
.6/ .5/
B ˝ ˝ ˝ ˝ C;
C
@ ˝ ˝A @ 0 ˝ ˝ ˝A
˝ 0 ˝˝˝
0 1 0 1
˝˝˝ ˝˝˝
B ˝ ˝ ˝C B ˝ ˝ ˝C
B C B C
E .7/ D E .6/ Z5 D B
B ˝ ˝ ˝CC; A.7/ D A.6/ Z5 D B
B ˝ ˝ ˝C ;
C
@ ˝ ˝ ˝A @ ˝ ˝ ˝A
0 0 ˝ ˝˝˝
0 1 0 1
˝˝ ˝˝
B ˝ ˝ C B ˝ ˝ C
B C B C
E D E Z6 D B
.8/ .7/
B ˝ ˝ CC; A D A Z6 D B
.8/ .7/
B ˝ ˝ C ;
C
@ 0 ˝ A @ ˝ ˝ A
˝˝
0 1 0 1
B C B C
B C B C
E D Q3 E D B
.9/ .8/
B CC; A D Q3 A D B
.9/ .8/
B C ;
C
@ ˝ ˝A @ ˝ ˝ ˝A
˝ 0 ˝˝
0 1 0 1
˝˝ ˝˝
B ˝ ˝C B ˝ ˝C
B C B C
E .10/
D E Z7 D B
.9/
B ˝ ˝CC; A .10/
D A Z7 D B
.9/
B ˝ ˝C :
C
@ ˝ ˝A @ ˝ ˝A
0 ˝ ˝˝
In the general n n case, this procedure yields matrices
Q D Q0 Qn2 ; Z D Z1 Z2n3
ee
such that the pencil t E A WD Q .tE A/Z is in Hessenberg-triangular form
Q and M have identical first columns. As explained above, the
and such that M1 1
matrix e
AEe 1 is essentially the Hessenberg matrix that is obtained after two steps of
the QR iteration with shifts 1 and 2 applied to the Hessenberg matrix AE 1 . We
can now repeat the steps until one or more elements on the subdiagonal of AE 1 are
[1] W. E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigen-
value problem, Quart. Appl. Math. 9 (1951), 17–29.
[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, Templates for the Solution
of Algebraic Eigenvalue Problems, SIAM, Philadelphia, PA, 2000.
[3] F. L. Bauer and C. T. Fike, Norms and exclusion theorems, Numer. Math. 2 (1960),
137–141.
[4] K. Braman, R. Byers and R. Mathias, The Multishift QR Algorithm. Part I: Maintaining
Well-Focused Shifts and Level 3 Performance, SIAM J. Matrix Anal. Appl. 23 (2002),
929–947.
[5] K. Braman, R. Byers and R. Mathias, The Multishift QR Algorithm. Part II: Aggressive
Early Deflation, SIAM J. Matrix Anal. Appl. 23 (2002), 948–973.
[6] R. Courant and D. Hilbert, Methoden der mathematischen Physik, Springer, 1924.
[7] J. J. M. Cuppen, A Divide and Conquer Method for the Symmetric Tridiagonal Eigen-
problem, Numer. Math. 36 (1981), 177–195.
[8] J. Demmel and K. Veselić, Jacobi’s method is more accurate than QR, SIAM J. Matrix
Anal. Appl. 13 (1992), 1204–1245.
[9] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
[10] I. S. Dhillon, A New O(n^2) Algorithm for the Symmetric Tridiagonal Eigen-
value/Eigenvector Problem, Ph. D. thesis, University of California, Berkeley, 1997.
[11] I. S. Dhillon and B. N. Parlett, Multiple representations to compute orthogonal eigen-
vectors of symmetric tridiagonal matrices, Lin. Alg. Appl. 387 (2004), 1–28.
[12] J. Dongarra and D. Sorensen, A fully parallel algorithm for the symmetric eigenvalue
problem, SIAM J. Sci. Statist. Comput. 8 (1987), 139–154.
[13] Z. Drmač, A Global Convergence Proof for Cyclic Jacobi Methods with Block Rota-
tions, SIAM J. Matrix Anal. Appl. 31 (2009), 1329–1350.
[14] J. G. F. Francis, The QR Transformation. A Unitary Analogue to the LR Transformation
— Part 1, The Computer Journal 4 (1961), 265–271.
[15] J. G. F. Francis, The QR Transformation — Part 2, The Computer Journal 4 (1962),
332–345.
[16] S. Gershgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR
Ser. Mat. 1 (1931), 749–754.
[17] L. Giraud, J. Langou, M. Rozložnik and J. van den Eshof, Rounding error analysis of the
classical Gram-Schmidt orthogonalization process, Numer. Math. 101 (2005), 87–100.
[18] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press,
London, 1996.
[19] P. Henrici, On the Speed of Convergence of Cyclic and Quasicyclic Jacobi Methods
for Computing Eigenvalues of Hermitian Matrices, J. Soc. Ind. Appl. Math 6 (1958),
144–162.
[20] K. Hessenberg, Behandlung linearer Eigenwertaufgaben mit Hilfe der Hamilton-
Cayleyschen Gleichung, Institut für Praktische Mathematik, TH Darmstadt, Report
no. 1, 1940, available at http://www.hessenberg.de/numverf.pdf.
[21] A. S. Householder, Unitary Triangularization of a Nonsymmetric Matrix, Journal of the
ACM 5 (1958), 339–342.
[22] C. G. J. Jacobi, Über ein leichtes Verfahren die in der Theorie der Säcularstörungen
vorkommenden Gleichungen numerisch aufzulösen, Journal für die reine und ange-
wandte Mathematik 30 (1846), 51–94.
[23] A. Knyazev and K. Neymeyr, A geometric theory for preconditioned inverse iteration.
III: A short and sharp convergence estimate for generalized eigenvalue problems, Linear
Algebra Appl. 358 (2003), 95–114.
[24] D. Kressner, Numerical Methods for General and Structured Eigenvalue Problems,
Springer, 2005.
[25] A. N. Krylov, On the numerical solution of the equation by which, in technical matters,
frequencies of small oscillations of material systems are determined, Izv. Akad. Nauk.
S.S.S.R. Otdel. Mat. Estest. 1 (1931), 491–539.
[26] V. N. Kublanovskaja, On some algorithms for the solution of the complete problem of
proper values, J. Comput. Math. and Math. Phys. 1 (1961), 555–570.
[27] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear
differential and integral operators, J. Res. Nat. Bur. Stand. 45 (1950), 255–282.
[28] C. B. Moler and G. W. Stewart, An algorithm for generalized matrix eigenvalue prob-
lems, SIAM J. Numer. Anal. 10 (1973), 241–256.
[29] K. Neymeyr, A geometric theory for preconditioned inverse iteration. I: Extrema of the
Rayleigh quotient, Linear Algebra Appl. 322 (2001), 61–85.
[30] K. Neymeyr, A geometric theory for preconditioned inverse iteration. II: Convergence
estimates, Linear Algebra Appl. 322 (2001), 87–104.
[31] L. Page, S. Brin, R. Motwani and T. Winograd, The PageRank Citation Ranking: Bring-
ing Order to the Web, Stanford InfoLab, Report, 1999.
[32] B. N. Parlett, Global Convergence of the Basic QR Algorithm on Hessenberg Matrices,
Math. Comp. 22 (1968), 803–817.
[33] B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, 1987.
[34] O. Perron, Zur Theorie der Matrices, Mathematische Annalen 64 (1907), 248–263.