Eigenvalue Problems
Eigenvalue Problems
1 Introduction 1
1.1 What makes eigenvalues interesting? . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Example 1: The vibrating string . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Problem setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 The method of separation of variables . . . . . . . . . . . . . . . . . 5
1.3 Numerical methods for solving 1-dimensional problems . . . . . . . . . . . . 6
1.3.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 The finite element method . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.3 Global functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.4 A numerical comparison . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Example 2: The heat equation . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Example 3: The wave equation . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 The 2D Laplace eigenvalue problem . . . . . . . . . . . . . . . . . . . . . . 13
1.6.1 The finite difference method . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.2 The finite element method (FEM) . . . . . . . . . . . . . . . . . . . 16
1.6.3 A numerical example . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Cavity resonances in particle accelerators . . . . . . . . . . . . . . . . . . . 21
1.8 Spectral clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.1 The graph Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8.2 Spectral clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8.3 Normalized graph Laplacians . . . . . . . . . . . . . . . . . . . . . . 27
1.9 Googles PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.10 Other sources of eigenvalue problems . . . . . . . . . . . . . . . . . . . . . . 30
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2 Basics 33
2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 Statement of the problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Similarity transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4 Schur decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 The real Schur decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.6 Normal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 Hermitian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.8 The Jordan normal form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.9 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.10 The Rayleigh quotient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.11 Cholesky factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.12 The singular value decomposition (SVD) . . . . . . . . . . . . . . . . . . . . 50
iii
iv CONTENTS
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3 Newton methods 53
3.1 Linear and nonlinear eigenvalue problems . . . . . . . . . . . . . . . . . . . 53
3.2 Zeros of the determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.1 Algorithmic differentiation . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.2 Hymans algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.3 Computing multiple zeros . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Newton methods for the constrained matrix problem . . . . . . . . . . . . . 58
3.4 Successive linear approximations . . . . . . . . . . . . . . . . . . . . . . . . 60
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4 The QR Algorithm 63
4.1 The basic QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.1.1 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2 The Hessenberg QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2.1 A numerical experiment . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2.2 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 The Householder reduction to Hessenberg form . . . . . . . . . . . . . . . . 71
4.3.1 Householder reflectors . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.3.2 Reduction to Hessenberg form . . . . . . . . . . . . . . . . . . . . . 71
4.4 Improving the convergence of the QR algorithm . . . . . . . . . . . . . . . . 73
4.4.1 A numerical example . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.4.2 QR algorithm with shifts . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4.3 A numerical example . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5 The double shift QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5.1 A numerical example . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.5.2 The complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.6 The symmetric tridiagonal QR algorithm . . . . . . . . . . . . . . . . . . . 84
4.6.1 Reduction to tridiagonal form . . . . . . . . . . . . . . . . . . . . . . 84
4.6.2 The tridiagonal QR algorithm . . . . . . . . . . . . . . . . . . . . . . 85
4.7 Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Introduction
Before we start with the subject of this notes we want to show how one actually arrives at
large eigenvalue problems in practice. In the following, we restrict ourselves to problems
from physics [7, 18, 14] and computer science.
decay factors,
frequencies,
singular values,
condition numbers.
In the sequel we give a number of examples that show why computing eigenvalues is
important. At the same time we introduce some notation.
1
2 CHAPTER 1. INTRODUCTION
1.2 Example 1: The vibrating string
1.2.1 Problem setting
Let us consider a string as displayed in Fig. 1.1. The string is fixed at both ends, at x = 0
u(x,t)
x
0 L
and x = L. The x-axis coincides with the strings equilibrium position. The displacement
of the rest position at x, 0 < x < L, and time t is denoted by u(x, t).
We will assume that the spatial derivatives of u are not very large:
u
is small.
x
Plugging this into (1.1) and omitting also the second order term (leaving just the number 1)
gives
dx u 2
dT = .
2 t
The kinetic energy of the whole string is obtained by integrating over its length,
Z L Z 2
1 L u
T = dT (x) = (x) dx
0 2 0 t
The potential energy of the string has two components
1.2. EXAMPLE 1: THE VIBRATING STRING 3
ds
dx
Here functions u(x, t) are admitted that are differentiable with respect to x and t and
satisfy the boundary conditions (BC) that correspond to the fixing,
u(x, t1 ) = u1 (x),
(1.5) 0 < x < L.
u(x, t2 ) = u2 (x),
4 CHAPTER 1. INTRODUCTION
According to the principle of Hamilton a mechanical system with kinetic energy T and
potential energy V behaves in a time interval t1 t t2 for given initial and end positions
such that
Z t2
I= L dt, L = T V,
t1
is minimized.
Let u(x, t) be such that I(u) I(w) for all w, that satisfy the initial, end, and
boundary conditions. Let w = u + v with
Zt2 ZL " 2 2 #
1 (u + v) (u + v)
I(u + v) = (x) + 2f (u + v) dx dt
2 t x
t1 0
(1.6)
Zt2 ZL
u v u v
= I(u) + (x) + 2f v dx dt + O(2 ).
t t x x
t1 0
Thus, after integration by parts, exploiting the conditions in (), the equation
Z t2 Z L
I 2u 2u
= + 2 f v dx dt = 0
t1 0 t2 x2
must hold for all admissible v. Therefore, the bracketed expression must vanish,
2u 2u
(1.7) + = 2 f.
t2 x2
2
(x) u + p(x) u + q(x)u(x, t) = 0.
(1.8) t2 x x
u(0, t) = u(1, t) = 0
which is a special case of the Euler-Lagrange equation (1.7). Here, (x) plays the role of
a mass density, p(x) of a locally varying elasticity module. We do not specify initial and
end conditions for the moment. Note that there are no external forces present in (1.8).
From physics we know that (x) > 0 and p(x) > 0 for all x. These properties are of
importance also from a mathematical view point! For simplicity, we assume that (x) = 1.
1.2. EXAMPLE 1: THE VIBRATING STRING 5
1.2.2 The method of separation of variables
For the solution u in (1.8) we make the ansatz
Here, v is a function that depends only on the time t, while w depends only on the spatial
variable x. With this ansatz (1.8) becomes
A value for which (1.12) has a non-trivial (i.e. nonzero) solution w is called an eigen-
value; w is a corresponding eigenfunction. It is known that all eigenvalues of (1.12) are
positive. By means of our ansatz (1.9) we get
h i
u(x, t) = w(x) a cos( t) + b sin( t)
as a solution of (1.8). It is known that (1.12) has infinitely many real positive eigenvalues
0 < 1 2 , (k ). (1.12) has a non-zero solution, say wk (x), only for these
k
particular values k . Therefore, the general solution of (1.8) has the form
X h p p i
(1.13) u(x, t) = wk (x) ak cos( k t) + bk sin( k t) .
k=0
The coefficients ak and bk are determined by initial and end conditions. We could, e.g.,
require that
X
u(x, 0) = ak wk (x) = u0 (x),
k=0
p
X
u
(x, 0) = k bk wk (x) = u1 (x),
t
k=0
where u0 and u1 are given functions. It is known that the wk form an orthogonal basis in
the space of square integrable functions L2 (0, 1),
Z 1
wk (x)w (x)dx = k k .
0
6 CHAPTER 1. INTRODUCTION
Therefore, it is not difficult to compute the coefficients ak and bk ,
Z 1 Z 1 p
ak = u0 (x)wk (x)dx/k , bk = u1 (x)wk (x)dx/k k .
0 0
In concluding, we see that the difficult problem to solve is the eigenvalue problem (1.12).
Knowing the eigenvalues and eigenfunctions the general solution of the time-dependent
problem (1.8) is easy to form.
Eq. (1.12) can be solved analytically only in very special situation, e.g., if all coefficients
are constants. In general a numerical method is needed to solve the Sturm-Liouville
problem (1.12).
x
0 xi1xi xi+1 L
(1.14) Aw = w.
By construction, A is symmetric and tridiagonal. One can show that it is positive definite
as well. Note that this matrix has just a few nonzeros: out of the n2 elements of A only
3n 2 are nonzero. This is an example of a sparse matrix.
Remark: Requiring continuous differentiability is too strong and does not lead to a
mathematically suitable formulation. In particular, the test functions that will be used
below are not differentiable in the classical sense. It is more appropriate to require w and
to be weakly differentiable. In terms of Sobolev spaces: w, H01 ([0, 1]). An introduction
to Sobolev spaces is, however, beyond the scope of these notes.
i 1
x
0 xi1xi xi+1 L
Figure 1.4: A basis function of the finite element space: a hat function.
where
|x xi | |x xi |
(1.17) i (x) = 1 = max{0, 1 },
h + h
8 CHAPTER 1. INTRODUCTION
is the function that is linear in each interval (xi , xi+1 ) and satisfies
1, i = k,
i (xk ) = ik :=
0, i 6= k.
An example of such a basis function, a so-called hat function, is displayed in Fig. 1.4.
We now replace w in (1.15) by the linear combination (1.16), and replace testing
against all by testing against all j . In this way (1.15) becomes
Z 1 n n
!
X X
p(x)( i i (x))j (x) + (q(x) ) i i (x)j (x) dx, for all j,
0 i=1 i=1
or,
n
X Z 1
(1.18) i p(x)i (x)j (x) + (q(x) )i (x)j (x) dx = 0, for all j.
i=1 0
These last equations are called the RayleighRitzGalerkin equations. Unknown are
the n values i and the eigenvalue . In matrix notation (1.18) becomes
(1.19) Ax = M x
with Z Z
1 1
aij = p(x)i j + q(x)i j dx and mij = i j dx
0 0
For the specific case p(x) = 1 + x and q(x) = 1 we get
Z kh " #
1 x (k 1)h 2
akk = (1 + x) 2 + dx
(k1)h h h
Z (k+1)h " #
1 (k + 1)h x 2 2 1
+ (1 + x) 2 + dx = 2(n + 1 + k) +
kh h h 3n+1
Z (k+1)h
1 (k + 1)h x x kh 3 1 1
ak,k+1 = (1 + x) 2 + dx = n k +
kh h h h 2 6n+1
and solve it with the finite difference and finite element methods as well as with the global
functions method. The results are given in Table 1.1.
Clearly the global function method is the most powerful of them all. With 80 basis
functions the eigenvalues all come right. The convergence rate is exponential.
With the finite difference and finite element methods the eigenvalues exhibit quadratic
convergence rates. If the mesh width h is reduced by a factor of q = 2, the error in the
eigenvalues is reduced by the factor q 2 = 4. There exist higher order finite elements and
higher order finite difference stencils [11, 6].
u(x, t)
u(x, t) = 0, x , t > 0,
t
(1.21) u(x, t)
= 0, x , t > 0,
n
u(x, 0) = u0 (x), x .
dv(t)
(1.25) + v(t) = 0,
dt
the solution of which has the form aexp(t). By separating variables, the problem (1.21)
is divided in two subproblems that are hopefully easier to solve. A value , for which (1.24)
has a nontrivial (i.e. a nonzero) solution is called an eigenvalue; w then is called a corre-
sponding eigenfunction.
If n is an eigenvalue of problem (1.24) with corresponding eigenfunction wn , then
en t wn (x)
is a solution of the first two equations in (1.21). It is known that equation (1.24) has
infinitely many real eigenvalues 0 1 2 , that tend to infinity, n as
n . Multiple eigenvalues are counted according to their multiplicity. An arbitrary
bounded piecewise continuous function can be represented as a linear combination of the
eigenfunctions w1 , w2 , . . . Therefore, the solution of (1.21) can be written in the form
X
(1.26) u(x, t) = cn en t wn (x),
n=1
(1.28) u(x, t) c1 .
t
Thus, in the limit (i.e., as t goes to infinity), the temperature will be constant in the whole
container. The convergence rate towards this equilibrium is determined by the smallest
positive eigenvalue 2 of (1.24):
X
X
n t
ku(x, t) c1 k = k cn e wn (x)k |en t |kcn wn (x)k
n=2 n=2
X
2 t
e kcn wn (x)k e2 t ku0 (x)k.
n=2
Here we have assumed that the value of the constant function w1 (x) is set to unity.
12 CHAPTER 1. INTRODUCTION
1.5 Example 3: The wave equation
The air pressure u(x, t) in a volume with acoustically hard walls satisfies the equations
2 u(x, t)
(1.29) u(x, t) = 0, x , t > 0,
t2
u(x, t)
(1.30) = 0, x , t > 0,
n
(1.31) u(x, 0) = u0 (x), x ,
u(x, 0)
(1.32) = u1 (x), x .
t
Sound propagates with speed u, along the (negative) gradient from high to low pres-
sure.
To solve the wave equation we proceed as with the heat equation in section 1.4: sepa-
ration of u according to (1.23) leads again to equation (1.24) but now together with
d2 v(t)
(1.33) + v(t) = 0.
dt2
We know this equation from the analysis of the vibrating string, see (1.11). From there
we know that the general solution of the wave equation has the form
X h p p i
(1.13) u(x, t) = wk (x) Ak cos( k t) + Bk sin( k t) .
k=0
where the wk , k = 1, 2, . . ., are the eigenfunctions of the eigenvalue problem (1.24). The
coefficients ak and bk are determined by (1.31) and (1.32).
If a harmonic oscillation is forced on the system, an inhomogeneous problem
2 u(x, t)
(1.34) u(x, t) = f (x, t),
t2
is obtained. The boundary and initial conditions are taken from (1.29)(1.32). This
problem can be solved by expanding u and f in the eigenfunctions wn (x),
X
u(x, t) := vn (t)wn (x),
n=1
(1.35)
X
f (x, t) := n (t)wn (x).
n=1
d2 vn
(1.36) + n vn = n (t).
dt2
If n (t) = an sin t, then the solution becomes
p p 1
(1.37) vn = An cos n t + Bn sin n t + an sin t.
n 2
1.6. THE 2D LAPLACE EIGENVALUE PROBLEM 13
Anand Bn are real constants that are determined by the initial conditions.
If gets close
to n , then the last term can be very large. In the limit, if = n , vn gets the form
p p
(1.38) vn = An cos n t + Bn sin n t + an t sin t,
in which case, vn is not bounded in time anymore. This phenomenon is called resonance.
Often resonance is not desirable; it may, e.g., mean the blow up of some structure. In
order to prevent resonances eigenvalues have to be known. Possible remedies are changing
the domain (the structure) or parameters (the materials).
Remark 1.1. Vibrating membranes satisfy the wave equation, too. In general the boundary
conditions are different from (1.30). If the membrane (of a drum) is fixed at its boundary,
the condition
(1.39) u(x, t) = 0
is imposed. These boundary conditions are called Dirichlet boundary conditions. The
boundary conditions in (1.21) and (1.30) are called Neumann boundary conditions. Com-
binations of these two can occur.
(1.41) u(x) = 0, x 1 ,
u
(1.42) (x) + (x)u(x) = 0, x 2 .
n
Here, 1 and 2 are disjoint subsets of with 1 2 = . We restrict ourselves in
the following on two-dimensional domains and write (x, y) instead of (x1 , x2 ).
In general it is not possible to solve a problem of the form (1.40)(1.42) exactly (ana-
lytically). Therefore, one has to resort to numerical approximations. Because we cannot
compute with infinitely many variables we have to construct a finite-dimensional eigenvalue
problem that represents the given problem as well as possible, i.e., that yields good approx-
imations for the desired eigenvalues and eigenvectors. Since finite-dimensional eigenvalue
problem only have a finite number of eigenvalues one cannot expect to get good approxi-
mations for all eigenvalues of (1.40)(1.42).
Two methods for the discretization of eigenvalue problems of the form (1.40)(1.42) are
the Finite Difference Method [11, 16, 9] and the Finite Element Method (FEM) [6, 15, 8].
We briefly introduce these methods in the following subsections.
By a Taylor expansion one can show that for sufficiently smooth functions u
1
u(x, y) = (4u(x, y) u(x h, y) u(x + h, y) u(x, y h) u(x, y + h))
h2
+ O(h2 ).
At the points at the upper boundary of we first take the difference equation (1.44)
The value ui,N +1 corresponds to a grid point outside of the domain! However the Neumann
boundary conditions suggest to reflect the domain at the upper boundary and to extend the
eigenfunction symmetrically beyond the boundary. This procedure leads to the equation
ui,N +1 = ui,N 1 . Plugging this into (1.47) and multiplying the new equation by the factor
1/2 gives
1 1 1
(1.48) 2ui,N ui1,N ui+1,N ui,N 1 = h2 ui,N , 0 < i < N.
2 2 2
1.6. THE 2D LAPLACE EIGENVALUE PROBLEM 15
In summary, from (1.44) and (1.48), taking into account that (1.46) we get the matrix
equation
4 1 0 1 u1,1
1 4 1 0 1
u1,2
0 1 4 0 0 1 u
1,3
1 0 0 4 1 0 1 u
2,1
1 0 1 4 1 0 1 u
2,2
1 0 1 4 0 0 1 u2,3
1 0 0 4 1 0 1 u3,1
1 0 1 4 1 0 1 u3,2
1 0 1 4 0 0 1 u3,3
1 0 0 2 21 0 u4,1
1 0 12 2 12 u4,2
1
1 0 2 2 u4,3
(1.49)
1 u1,1
1 u
1,2
1 u
1,3
1 u
2,1
1 u
2,2
1 u2,3
= h2 .
1 u3,1
1 u3,2
1 u3,3
1
u4,1
2
1
2
u4,2
1
2 u4,3
ui,1
ui,2
ui := .. RN 1 ,
.
ui,N 1
4 1
..
1 4
.
T :=
.. ..
R(N 1)(N 1) ,
. . 1
1 4
1
1
I := .. R(N 1)(N 1) .
.
1
16 CHAPTER 1. INTRODUCTION
In this way we obtain from (1.44), (1.46), (1.48) the discrete eigenvalue problem
T I u1 I u1
.
I T . . . ..
2
. ..
..
.
(1.50) = h
.. ..
. . I u 3 I uN 1
1
I 12 T u4 2I uN
A property common to matrices obtained by the finite difference method are its spar-
sity. Sparse matrices have only very few nonzero elements.
In real-world applications domains often cannot be covered easily by a rectangular
grid. In this situation and if boundary conditions are complicated the method of finite
differences can be difficult to implement. Because of this the finite element method is
often the method of choice.
Nevertheless, problems that are posed on rectangular grids can be solved very effi-
ciently. Therefore, tricks are used to deal with irregular boundaries. The solution of
the problem may be extended artificially beyond the boundary, see e.g. [1, 17, 9]. Simi-
lar techiques, so-called immersed boundary conditions are applied at (irregular) interfaces
where, e.g., equations or parameters change [11].
where V is vector space of bounded twice differentiable functions that satisfy the boundary
conditions (1.41)(1.42). By partial integration (Greens formula) this becomes
Z Z Z
(1.54) uv dx dy + u v ds = u v dx dy, v V,
2
1.6. THE 2D LAPLACE EIGENVALUE PROBLEM 17
or
where Z Z Z
a(u, v) = u v dx dy + u v ds, and (u, v) = u v dx dy.
2
to become a Hilbert space H [3, 19]. H is the space of quadratic integrable functions with
quadratic integrable first derivatives that satisfy the Dirichlet boundary conditions (1.41)
u(x, y) = 0, (x, y) 1 .
(Functions in H in general do not satisfy the so-called natural boundary conditions (1.42).)
One can show [19] that the eigenvalue problem (1.40)(1.42) is equivalent with the eigen-
value problem
(The essential point is to show that the eigenfunctions of (1.56) are elements of V .)
are chosen. These functions span a subspace S of H. Then, problem (1.56) is solved where
H is replaced by S.
(1.62) Ax = M x
where
x1 a11 a1n m11 m1n
.. .. , .. ..
(1.63) x = ... , A= . ..
. . M = . ..
. .
xn an1 ann mn1 mnn
with Z Z
aij = a(i , j ) = i j dx dy + i j ds
2
and Z
mij = (i , j ) = i j dx dy.
The finite element method (FEM) is a special case of the RayleighRitz method.
In the FEM the subspace S and in particular the basis {i } is chosen in a particularly
clever way. For simplicity we assume that the domain is a simply connected domain with
a polygonal boundary, c.f. Fig 1.5. (This means that the boundary is composed entirely
of straight line segments.) This domain is now partitioned into triangular subdomains
Finite element spaces for solving (1.40)(1.42) are typically composed of functions that
are continuous in and are polynomials on the individual subdomains Te . Such functions
1.6. THE 2D LAPLACE EIGENVALUE PROBLEM 19
are called piecewise polynomials. Notice that this construction provides a subspace of the
Hilbert space H but not of V , i.e., the functions in the finite element space are not very
smooth and the natural boundary conditions are not satisfied.
An essential issue is the selection of the basis of the finite element space S. If S1 H
is the space of continuous, piecewise linear functions (the restriction to Te is a polynomial
of degree 1) then a function in S1 is uniquely determined by its values at the vertices of the
triangles. Let these nodes, except those on the boundary portion 1 , be numbered from
1 to n, see Fig. 1.6. Let the coordinates of the i-th node be (xi , yi ). Then i (x, y) S1 is
defined by
17 20 24 27
16 29
15 19 23 26
13 28
11 14 21
10
9 22 25
7 18
6
12
4
3 8
1 5
2
1 i=j
(1.65) i ((xj , yj )) := ij =
0 i 6= j
A typical basis function i is sketched in Figure 1.7.
Another often used finite element element space is S2 H, the space of continuous,
piecewise quadratic polynomials. These functions are (or can be) uniquely determined by
their values at the vertices and edge midpoints of the triangle. The basis functions are
defined according to (1.65). There are two kinds of basis functions i now, first those
that are 1 at a vertex and second those that are 1 at an edge midpoint, cf. Fig. 1.8. One
immediately sees that for most i 6= j
(1.66) a(i , j ) = 0, (i , j ) = 0.
20 CHAPTER 1. INTRODUCTION
Figure 1.8: The piecewise quadratic basis functions corresponding to the edge midpoints [5]
Therefore the matrices A and M in (1.62)(1.63) will be sparse. The matrix M is positive
definite as
N
X N
X N
X
T
(1.67) x Mx = xi xj mij = xi xj (i , j ) = (u, u) > 0, u= xi i 6= 0,
i,j=1 i,j=1 i=1
p
because the i are linearly independent and because ||u|| = (u, u) is a norm. Similarly
it is shown that
xT Ax 0.
It is possible to have xT Ax = 0 for a nonzero vector x. This is the case if the constant
u
function u = 1 is contained in S. This happens if Neumann boundary conditions n =0
are posed on the whole boundary . Then,
X
u(x, y) = 1 = i (x, y),
i
14 14
12 12
10 10
8 8
6 6
4 4
2 2
0 0
2 2
0 5 10 15 20 25 0 5 10 15 20 25
16
14
12
10
0 5 10 15 20 25
B
curl E(x, t) = (x, t), (Faradays law)
t
D
curl H(x, t) = (x, t) + j(x, t), (MaxwellAmp`ere law)
t
div D(x, t) = (x, t), (Gausss law)
div B(x, t) = 0. (Gausss law magnetic)
where E is the electric field intensity, D is the electric flux density, H is the magnetic
field intensity, B is the magnetic flux density, j is the electric current density, and is the
electric charge density. Often the optical problem is analyzed, i.e. the situation when
the cavity is not driven (cold mode), hence j and are assumed to vanish.
Again by separating variables, i.e. assuming a time harmonic behavior of the fields,
e.g.,
E(x, t) = e(x)eit ,
and by using the constitutive relations
D = E, B = H, j = E,
22 CHAPTER 1. INTRODUCTION
0.05
0.05
0.1
With the correct finite element discretization this problem turns in a matrix eigenvalue
problem of the form
A C x M O x
= .
CT O y O O y
The solution of this matrix eigenvalue problem correspond to vibrating electric fields. A
possible shape of domain is given in Figure 1.11.
There are several possibilities to define the weights of the similarity graph associated
with a set of data points and a similarity function:
fully connected graph All points with positive similarity are connected with each other
and we simply set wij = s(xi , xj ). Usually, this will only result in reasonable clusters
if the similarity function models locality very well. One example of such a similarity
kx x k2
function is the Gaussian s(xi , xj ) = exp i22j , where kxi xj k is some
distance measure (e.g., Euclidean distance) and is some parameter controlling
how strongly locality is enforced.
-neighbors Two vertices xi , xj are connected if their pairwise distance is smaller than
for some parameter > 0. In this case, the weights are usually chosen uniformly,
e.g., wij = 1 if xi , xj are connected and wij = 0 otherwise.
Assuming that the similarity function is symmetric (s(xi , xj ) = s(xj , xi ) for all xi , xj ) all
definitions above give rise to a symmetric weight matrix W . In practice, the choice of the
most appropriate definition depends as usual on the application.
In the case of an unweighted graph, the degree di amounts to the number of vertices
adjacent to vi (counting also vi if wii = 1). The degree matrix is defined as
D = diag(d1 , d2 , . . . , dn ).
(1.70) L = D W.
By (1.69), the row sums of L are zero. In other words, Le = 0 with e the vector of all
ones. This implies that 0 is an eigenvalue of L with the associated eigenvector e. Since L
1.8. SPECTRAL CLUSTERING 25
is symmetric all its eigenvalues are real and one can show that 0 is the smallest eigenvalue;
hence L is positive semidefinite. It may easily happen that more than one eigenvalue is
zero. For example, if the set of vertices can be divided into two subsets {x1 , . . . , xk },
{xk+1 , . . . , xn }, and vertices from one subset are not connected with vertices from the
other subset, then
L1 0
L= ,
0 L2
where L1 , L2 are the Laplacians of the two disconnected components. Thus L has two
eigenvectors (0e) and (0e) with eigenvalue 0. Of course, any linear combination of these two
linearly independent eigenvectors is also an eigenvector of L.
The observation above leads to the basic idea behind spectral graph partitioning: If
the vertices of the graph decompose into k connected components V1 , . . . , Vk there are k
zero eigenvalues and the associated invariant subspace is spanned by the vectors
(1.71) V1 , V2 , . . . , Vk ,
Example 1.1 200 real numbers are generated by superimposing samples from 4 Gaussian
distributions with 4 different means:
m = 50; randn(state,0);
x = [2+randn(m,1)/4;4+randn(m,1)/4;6+randn(m,1)/4;8+randn(m,1)/4];
The following two figures show the histogram of the distribution of the entries of x and the
eigenvalues of the graph Laplacian for the fully connected similarity graph with similarity
|x x |2
function s(xi , xj ) = exp i 2 j :
4
There are more efficient algorithms for finding connected components, e.g., breadth-first and depth-first
search.
26 CHAPTER 1. INTRODUCTION
8 50
40
6
30
4 20
10
2
0
0 10
2 4 6 8 2 4 6 8 10
As expected, one eigenvalue is (almost) exactly zero. Additionally, the four smallest
eigenvalues have a clearly visible gap to the other eigenvalues. The following four figures
show the entries of the 4 eigenvectors belonging to the 4 smallest eigenvalues of L:
0.0707 0.15 0.1 0.15
0.1 0.1
0.0707 0.05
0.05 0.05
0.0707 0
0 0
0.0707 0.05
0.05 0.05
On the one hand, it is clearly visible that the eigenvectors are well approximated by linear
combinations of indicator vectors. On the other hand, none of the eigenvectors is close to
an indicator vector itself and hence no immediate conclusion on the clusters is possible.
To solve the issue that the eigenbasis (1.72) may be transformed by an arbitrary
orthogonal matrix, we transpose the basis and consider the row vectors of U :
U T = u1 , u2 , . . . , un , ui Rk .
If U contained indicator vectors then each of the short vectors ui would be a unit vector
ej for some 1 j k (possibly divided by |Vj |). In particular, the ui would separate very
well into k different clusters. The latter property does not change if the vectors ui undergo
an orthogonal transformation QT . Hence, applying a clustering algorithm to u1 , . . . , un
allows us to detect the membership of ui independent of the orthogonal transformation.
The key point is that the short vectors u1 , . . . , un are much better separated than the
original data x1 , . . . , xn . Hence, a much simpler algorithm can be used for clustering. One
of the most basic algorithms is k-means clustering. Initially, this algorithm assigns each
ui randomly5 to a cluster with 1 k and then iteratively proceeds as follows:
1. Compute cluster centers c as cluster means:
X . X
c = ui 1.
i in cluster i in cluster
3.5
3
Cluster
2.5
1.5
1
2 4 6 8
Data points
(1.73) D 1 L = I D 1 W
instead of the standard Laplacians. Equivalently, this means that we compute the eigen-
vectors belonging to the smallest eigenvalues of the generalized eigenvalue problem W x =
Dx. Alternatively, one may also compute the eigenvalues from the symmetric matrix
D 1/2 W D 1/2 but the eigenvectors need to be adjusted to compensate this transforma-
tion.
Example 1.1 (contd). The eigenvalues of the normalized Laplacian for Example 1.1
are shown below:
0.8
0.6
0.4
0.2
0.2
2 4 6 8 10
In comparison to the eigenvalues of the standard Laplacian, the four smallest eigenvalues
of the are better separated from the rest. Otherwise, the shape of the eigenvectors is
similar and the resulting clustering is identical with the one obtained with the standard
Laplacian.
28 CHAPTER 1. INTRODUCTION
Figure 1.12: Things that can go wrong with the basic model: left is a dangling node, right
a terminal strong component featuring a cyclic path. Figures are from [2]
Clearly, this is an extremely sparse matrix. The number of its nonzero elements nnz(G)
equals the number of hyperlinks in W . Let ri and cj be the row and column sums of G,
X X
ri = gij , cj = gij .
j i
Then rj is called the in-degree and cj is called the out-degree of the jth page. cj = 0
means a dead end.
In Fig. 1.13 we see the example of a tiny web with just n = 6 nodes. The nodes , ,
, , , correspond to labels 1 to 6 in the matrix notation, in this sequence.
6
Here we closely follow Section 2.11 in Molers Matlab introduction [13].
7
http://www.worldwidewebsize.com/ accessed on Feb. 18, 2016.
1.9. GOOGLES PAGERANK 29
Notice the zero 5th column of G. This column corresponds to the dead end at the dangling
node .
Let A be the matrix with elements
(
gij /cj if cj 6= 0
aij =
1/n if cj = 0 (dead end).
The entries in As column j indicate the probabilities of jumping from the jth page to the
other pages on the web. Column 3, e.g., tells that starting from node 3 (= ) nodes , ,
are chosen with equal probability 1/3. Note that we choose any page of the web with
equal probability when we land at a dead end.
30 CHAPTER 1. INTRODUCTION
To not be stuck to much in parts of the web, we follow the links only with probability .
With probability 1 we choose a random page. Therefore, we replace A by the matrix
A = A + (1 )peT ,
A cannot have an eigenvalue larger than 1 in modulus. The PerronFrobenius theorem for
matrices with nonnegative entries states that such matrices have a simple real eigenvalue
of largest modulus [4]. Therefore, the eigenvalue 1 is in fact the largest eigenvalue of A.
We are not interested in the left eigenvector e but in the right eigenvector x,
x = Ax.
The PerronFrobenius theory confirms that x can be chosen such that all its entries are
nonnegative. If x is scaled such that
n
X
xi = 1
i=1
then x is the state vector of the Markov chain and is Googles PageRank.
The computation of the PageRank amounts to determining the largest eigenvalue and
corresponding eigenvector of a matrix. It can be determined by vector iteration. The
computation gets easier the smaller the damping factor is chosen. However, small
means small weight is given to the structure of the web. In [13] a Matlab routine
pagerankpow.m is provided to compute the PageRank exploiting the sparsity structure of
G.
Bibliography
[1] A. Adelmann, P. Arbenz, and Y. Ineichen, A fast parallel Poisson solver on ir-
regular domains applied to beam dynamics simulations, J. Comput. Phys., 229 (2010),
pp. 45544566.
[3] O. Axelsson and V. Barker, Finite Element Solution of Boundary Value Prob-
lems, Academic Press, Orlando, FL, 1984.
[6] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland,
Amsterdam, 1978. (Studies in Mathematics and its Applications, 4).
[10] A. N. Langville and C. D. Meyer, Googles pagerank and beyond: the science of
search engine rankings, Princeton University Press, Princeton, N.J., 2006.
[11] R. J. LeVeque, Finite Difference Methods for Ordinary and Partial Differential
Equations, SIAM, Philadelphia, PA, 2007.
[13] C. B. Moler, Numerical Computing with Matlab, SIAM, Philadelphia, PA, 2004.
[14] Y. Saad, Numerical Methods for Large Eigenvalue Problems, SIAM, Philadelphia,
PA, 2011.
[15] H. R. Schwarz, Methode der finiten Elemente, Teubner, Stuttgart, 3rd ed., 1991.
32 CHAPTER 1. INTRODUCTION
[16] , Numerische Mathematik, Teubner, Stuttgart, 3rd ed. ed., 1993.
Basics
2.1 Notation
The fields of real and complex numbers are denoted by R and C, respectively. Elements
in R and C, scalars, are denoted by lowercase letters, a, b, c, . . ., and , , , . . .
Vectors are denoted by boldface lowercase letters, a, b, c, . . ., and , , , . . . We
denote the space of vectors of n real components by Rn and the space of vectors of n
complex components by Cn .
x1
x2
(2.1) x Rn x = . , xi R.
..
xn
We often make statements that hold for real or complex vectors or matrices. Then we
write, e.g., x Fn .
The inner product of two n-vectors in C is defined as
n
X
(2.2) (x, y) = xi yi = y x,
i=1
that is, we require linearity in the first component and anti-linearity in the second.
y = (y1 , y2 , . . . , yn ) denotes conjugate transposition of complex vectors. To simplify
notation we denote real transposition by an asterisk as well.
Two vectors x and y are called orthogonal, x y, if x y = 0.
The inner product (2.2) induces a norm in F,
n
!1/2
p X
(2.3) kxk = (x, x) = |xi |2 .
i=1
33
34 CHAPTER 2. BASICS
The matrix A Fnm ,
a
11 a
21 . . . a
m1
a
a m2
22 . . . a
12
(2.5) A = . .. ..
.. . .
a
1n a
2n . . . a
nm
is the Hermitian transpose of A. Notice, that with this notation n-vectors can be
identified with n-by-1 matrices.
The following classes of square matrices are of particular importance:
We define the norm of a matrix to be the norm induced by the vector norm (2.3),
kAxk
(2.6) kAk := max = max kAxk.
x6=0 kxk kxk=1
(2.7) Ax = x,
(2.8) (A I)x = 0
Definition 2.1 Let the pair (, x) be a solution of (2.7) or (2.8), respectively. Then
is called an eigenvalue of A,
(2.11) Ax = Bx,
(2.12) (A B)x = 0
(2.16) S 1 AS = C.
Theorem 2.6 Similar matrices have equal eigenvalues with equal multiplicities. If (, x)
is an eigenpair of A and C = S 1 AS then (, S 1 x) is an eigenpair of C.
CS 1 x = S 1 ASS 1 x = S 1 x.
Hence, A and C have equal eigenvalues and their geometric multiplicity is not changed by
the similarity transformation. From
it follows that the characteristic polynomials of A and C are equal and hence also the
algebraic eigenvalue multiplicities are equal.
Similarity transformations are used to transform matrices into similar matrices from
which eigenvalues can be easily read. Diagonal matrices are the preferred matrix structure.
However, not all matrices are diagonalizable. There is, e.g., no invertible matrix S that
diagonalizes the matrix
1 1
.
0 1
In the Jordan normal form introduced in section 2.8 the transformation is into a bidiagonal
matrix. In the Schur normal form, see section 2.4 the transformation is into an upper
tridiagonal matrix, but with an unitary S.
Definition 2.7 Two matrices A and B are called unitarily similar if S in (2.16) is
unitary. If the matrices are real the term orthogonally similar is used.
Ax = Bx T AS 1 Sx = T BS 1 Sx.
38 CHAPTER 2. BASICS
This sometimes called equivalence transformation of A, B. Thus, (A; B) = (T AS 1 ; T BS 1 ).
Let us consider a special case: let B be invertible and let B = LU be the LU-factorization
of B. Then we set S = U and T = L1 and obtain T BU 1 = L1 LU U 1 = I. Thus,
(A; B) = (L1 AU 1 ; I) = (L1 AU 1 ).
(2.17) U AU = T
Thus, the first k Schur vectors u1 , . . . , uk form an invariant subspace1 for A. From
(2.18) it is clear that the first Schur vector is an eigenvector of A. The other columns of U ,
however, are in general not eigenvectors of A. Notice, that the Schur decomposition is not
unique. In the proof we have chosen any eigenvalue . This indicates that the eigenvalues
can be arranged in any order in the diagonal of T . This also indicates that the order with
which the eigenvalues appear on T s diagonal can be manipulated.
The Schur decomposition for Hermitian matrices is particularly simple. We first note
that A being Hermitian implies that the upper triangular in the Schur decomposition
A = U U is Hermitian and thus diagonal. In fact, because
= = (U AU ) = U A U = U AU = ,
each diagonal element i of satisfies i = i . So, has to be real. In summary have the
following result.
Theorem 2.14 (Spectral theorem for Hermitian matrices) Let A be Hermitian.
Then there is a unitary matrix U and a real diagonal matrix such that
n
X
(2.23) A = U U = i ui ui .
i=1
that models the vibration of a homogeneous string of length that is fixed at both ends.
The eigenvalues and eigenvectors or eigenfunctions of (2.24) are
(2.26) Tn x = x
where
2 1
1 2 1
2
(n + 1) 1 2 1
Tn := 2 . . .. .. Rnn .
. . .
1 2 1
1 2
The matrix eigenvalue problem (2.26) can be solved explicitly [9, p.229]. Eigenvalues and
eigenvectors are given by
This equation shows that p(A) has the same eigenvectors as the original matrix A. The
eigenvalues are modified though, k becomes p(k ). Similarly, more complicated functions
of A can be computed if the function is defined on spectrum of A.
2.8. THE JORDAN NORMAL FORM 43
2.8 The Jordan normal form
Theorem 2.16 (Jordan normal form) For every A Fnn there is a nonsingular
matrix X Fnn such that
(2.29) X 1 AX = J = diag(J1 , J2 , . . . , Jp ),
where
k 1
..
k .
(2.30) Jk = Jmk (k ) =
..
Fmk mk
. 1
k
are called Jordan blocks and m1 + + mp = n. The values k need not be distinct. The
Jordan matrix J is unique up to the ordering of the blocks. The transformation matrix X
is not unique.
A matrix is diagonalizable if all Jordan blocks are 1 1, i.e., mk = 1 for all k 3 . In this
case the columns of X are eigenvectors of A.
More generally, there is one eigenvector associated with each Jordan block, e.g.,
1 1
J2 ()e1 = = e1 .
0 0
Nontrivial Jordan blocks give rise to so-called generalized eigenvectors e2 , . . . , emk since
(Jk () I)ej+1 = ej , j = 1, . . . , mk 1.
This choice of generalized eigenvectors is not unique though, as (Jk ()I)(e2 +e1 ) = e1
for any . This is one of the reasons for the non-uniqueness of the transformation matrix
X in Theorem 2.16.
From the Jordan blocks we can read geometric and algebraic multiplicity of an eigen-
value: The number of Jordan blocks associated with a particular eigenvalue give the
geometric multiplicity; the sum of its orders gives the algebraic multiplicity.
Numerically the size of the Jordan blocks cannot be determined stably as the following
example shows. Let
1 0 1
= J2 (0)
0 0 0
be the approximation for J2 (0) that some numerical algorithm has computed. This matrix
has two distinct eigenvalues and thus two eigenvectors,
1 1 1 1 1 0
= .
0 0 2 0 2 0
For small the two eigenvectors are very close. They even collaps when 0. A
numerical code cannot differ between the two cases ( = 0, 6= 0) that have a completely
different structure.
3
1 1 Jordan blocks are called trivial.
44 CHAPTER 2. BASICS
Let Y := X and let X = [X1 , X2 , . . . , Xp ] and Y = [Y1 , Y2 , . . . , Yp ] be partitioned
according to J in (2.29), meaning that Xj , Yj Fnmj . Then,
p
X p
X
(2.31) A = XJY = Xk Jk Yk = (k Xk Yk + Xk Nk Yk ),
k=1 k=1
Pk D = D Pk = k D ,
APk = Pk A = Pk APk = k Pk + Dk ,
Aj Pk = Pk Aj = Pk Aj Pk = Pk (k In + Dk )j = (k In + Dk )j Pk .
The Jordan normal form can be computed from the Schur decomposition A = U T U ,
see, e.g., [2], although it is not recommended in general to do so.
2. Let
T1 T12 T1s
T2 T2s
(2.33) T = .. ..
. .
Ts
where the s diagonal blocks Tk are related to the s distinct eigenvalues of T . The off-
diagonal blocks Tj are zeroed one after the other. Each steps requires the solution
of a Sylvester equation Tj = Tj Y Y T .
Exercise: Consider the case of two (simple or multiple) eigenvalues,
T1 T12
T = .
T2
Determine Y ? How can this be extended to the case (2.33) with s diagonal blocks?
The Jordan normal form can be nicely employed to define matrix functions, see [5].
2.9. PROJECTIONS 45
2.9 Projections
Definition 2.17 A matrix P that satisfies
(2.34) P2 = P
is called a projection.
x1
P := X(Y X)1 Y
is a projection.
Example 2.21 The spectral projectors Xk Yk introduced in (2.31) are projectors. Their
range is the span of all eigenvectors and generalized eigenvectors associated with the
eigenvalue k .
46 CHAPTER 2. BASICS
If P is a projection then I P is a projection as well. In fact, (I P )2 = I 2P + P 2 =
I 2P + P = I P . If P x = 0 then (I P )x = x. Therefore, the range of I P coincides
with the null space of P , R(I P ) = N (P ). It can be shown that R(P ) = N (P ) .
Notice that R(P )R(I P ) = N (I P )N (P ) = {0}. For, if P x = 0 then (I P )x =
x, which can only be zero if x = 0. So, any vector x can be uniquely decomposed into
(i) P 2 = P
(2.37)
(ii) P = P.
Proposition 2.23 Let P be a projection. Then the following statements are equivalent.
(i) P = P ,
(ii) R(I P ) R(P ), i.e. (P x) (I P )y = 0 for all x, y.
x P y = (P x) y = (P x) (P y + (I P )y)
= (P x) (P y)
= (P x + (I P )x)(P y) = x (P y).
This equality holds for any x and y and thus implies (i).
Problem 2.28 What is the form of the orthogonal projection onto span{q} if the inner
product is defined as hx, yi := y M x where M is a symmetric positive definite matrix?
2.10. THE RAYLEIGH QUOTIENT 47
2.10 The Rayleigh quotient
Definition 2.29 The quotient
x Ax
(x) := , x 6= 0,
x x
is called the Rayleigh quotient of A at x.
Notice, that (x) = (x), 6= 0. Hence, the properties of the Rayleigh quotient
can be investigated by just considering its values on the unit sphere. Using the spectral
decomposition A = U U , we get
n
X
x Ax = x U U x = i |ui x|2 .
i=1
Pn
Similarly, x x = 2
i=1 |ui x| . With 1 2 n , we have
n
X n
X n
X
1 |ui x|2 i |ui x|2 n |ui x|2 .
i=1 i=1 i=1
So,
1 (x) n , for all x 6= 0.
As
(uk ) = k ,
the extremal values 1 and n are actually attained for x = u1 and x = un , respectively.
Thus we have proved the following theorem.
Theorem 2.30 Let A be Hermitian. Then the Rayleigh quotient satisfies
As the Rayleigh quotient is a continuous function it attains all values in the closed interval
[1 , n ].
The next theorem generalizes the above theorem to interior eigenvalues. The following
theorems is attributed to Poincare, Fischer and Polya.
Theorem 2.31 (Minimum-maximum principle) Let A be Hermitian. Then
(2.40) k k , 1 k p.
(2.41) A wi = i wi , 1 i p,
with wi wj = ij . Then the vectors Qw1 , . . . , Qwp are normalized and mutually orthogo-
nal. Therefore, we can construct a normalized vector x0 with kx0 k = 1,
x0 ui = 0, 1 i k 1.
(Note, that kx0 k = 1 implies kak = 1.) Then, with the minimum-maximum principle we
get
k
X
k = min R(x) R(x0 ) = x0 Ax0
= a Q AQa =
|a|2i i k .
x6=0
x u1 ==x uk1 =0 i=1
Exercise: It is possible to prove the inequalities (2.40) without assuming that the
q1 , . . . , qp are orthonormal. But then one has to use the eigenvalues k of
A x = Bx, B = Q Q,
Remark 2.5. Let qi = eji , 1 i k. This means that we extract rows and columns
j1 , . . . , jk to construct A . (The indices ji are assumed to be distinct.)
Remark 2.6. Lets remove a single row/column (with equal index) from A. Then k = n 1
in Remark 2.5 and the index set j1 , . . . , jn1 contains all but one of the integers 1, . . . , n.
If we formulate a monotonicity principle based on the eigenvalues n , n1 , . . . as con-
secutive maxima of the Rayleigh quotient, then we arrive at the interlacing property
Remark 2.7. When working with symmetric matrices, one often stores only half of the
matrix, e.g. the lower triangle consisting of all elements including and below the diagonal.
The L-factor of the Cholesky factorization can overwrite this information in-place to save
memory.
50 CHAPTER 2. BASICS
Definition 2.36 The inertia of a Hermitian matrix is the triple (, , ) where , , is
the number of negative, zero, and positive eigenvalues.
where 1 2 p 0.
df (x A Ay +
y A Ay)kx + yk2 (x y + y
y)kA(x + y)k2
() =
d kx + yk4
(x A A 2 x )y = (A Ax 2 x) y = 0, for all y,
from which
A Ax = 2 x
follow. Multiplying Ax = y from the left by A we get A Ax = A y = 2 x from which
A y = x
where A = U AV .
The proof above is due to W. Gragg. It nicely shows the relation of the singular value
decomposition with the spectral decomposition of the Hermitian matrices A A and AA ,
(2.47) A = U V = A A = V 2 V . AA = U 2 U ,
Note that the proof given in [4] is shorter and maybe more elegant.
The SVD of dense matrices is computed in a way that is very similar to the dense Her-
mitian eigenvalue problem. However, in the presence of roundoff error, it is not advisable
to make use of the matrices A A and AA . Instead, let us consider the (n + m) (n + m)
Hermitian matrix
O A
(2.48) .
A O
mn and
assume that m n. Then we write U = [U1 , U2 ] where U1 F
Now, let us
1
= with 1 Rnn . Then
O
O O 1 U1 O O 1 O U1 O
O A U U2 O U O U2
= 1 O O O U2 O = 1 1 O O O V .
A O O O V O V O
1 O O O V O O O U2 O
The first and third diagonal zero blocks have order n. The middle diagonal block has
order n m. Now we employ the fact that
0 1 1 1 0 1 1 1
=
0 2 1 1 0 2 1 1
to obtain
" # 1 1
O O U V
O A 1 U1 1 U1 U2 1 2 1 2
O
(2.49) = 2
1 V
2 1 O 2 U1 2 V .
1 1
A O 12 V O
2 O O O U2 O
Thus, there are three ways how to treat the computation of the singular value decompo-
sition as an eigenvalue problem. One of the two forms in (2.47) is used implicitly in the
QR algorithm for dense matrices A, see [4],[1]. The form (2.48) is suited if A is a sparse
matrix.
52 CHAPTER 2. BASICS
Bibliography
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D.
Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov,
and D. Sorensen, LAPACK Users Guide Release 2.0, SIAM, Philadel-
phia, PA, 1994. (Software and guide are available from Netlib at URL
http://www.netlib.org/lapack/).
[4] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore, MD, 2nd ed., 1989.
[7] R. J. LeVeque, Finite Difference Methods for Ordinary and Partial Differential Equa-
tions, SIAM, Philadelphia, PA, 2007.
[8] Y. Saad, Numerical Methods for Large Eigenvalue Problems, SIAM, Philadelphia, PA,
2011.
hl, Matrizen und ihre technischen Anwendungen, Springer, Berlin, 4th ed.,
[9] R. Zurmu
1964.
Chapter 3
Newton methods
(3.1) (A I)x = 0 Ax = x.
In the linear eigenvalue problem (3.1) the eigenvalue appears linearly. However, as the
unknown is multiplied with the unknown vector x, the problem is in fact nonlinear. We
have n+1 unknowns , x1 , . . . , xn that are not uniquely defined by the n equations in (3.1).
We have noticed earlier, that the length of the eigenvector x is not determined. This can
be rectified by adding a further equation that fixes the length of x. The straightforward
condition is
(3.2) kxk2 = x x = 1,
Eq. (3.3) is linear in x and thus simpler. However, the combined equations (3.1)(3.3) are
nonlinear anyway. Furthermore, c must be chosen such that it has a strong component in
the (unknown) direction of the searched eigenvector. This requires some knowledge about
the solution.
In nonlinear eigenvalue problems we have to find values C such that
(3.4) A()x = 0
where A() is a matrix the elements of which depend on in a nonlinear way. An example
is a matrix polynomial,
d
X
(3.5) A() = k Ak , Ak Fnn .
k=0
A() = A0 A1 , A0 = A, A1 = I.
53
54 CHAPTER 3. NEWTON METHODS
Quadratic eigenvalue problems of the form
(3.6) Ax + Kx + 2 M x = 0.
Matrix polynomials can be linearized, i.e., they can be transformed in a linear eigenvalue
of bigger size. The quadratic eigenvalue problem (3.6) can be transformed in a linear
eigenvalue problem of size 2n. Setting y = x we get
A O x K M x
=
O I y I O y
or
A K x O M x
= .
O I y I O y
Notice that many other linearizations are possible [2, 6]. Notice also the relation with the
transformation of high order to first order ODEs [5, p. 478].
Instead of looking at the nonlinear system (3.1) (complemented with (3.3) or (3.2)) we
may look at the nonlinear scalar equation
and apply some zero finder. Here the question arises how to compute f () and in particular
d
f () = d det A().
where P () is the permutation matrix due to partial pivoting, L() is a lower unit trian-
gular matrix, and U () is an upper triangular matrix. From well-known properties of the
determinant function, equation (3.8) gives
Taking the particular structures of the factors in (3.8) into account, we get
n
Y
(3.9) f () = det A() = 1 uii ().
i=1
How can we compute the derivatives uii of the diagonal elements of U ()?
3.2. ZEROS OF THE DETERMINANT 55
3.2.1 Algorithmic differentiation
A clever way to compute derivatives of a function is by algorithmic differentiation, see
e.g., [1]. Here we assume that we have an algorithm available that computes the value
f () of a function f , given the input argument . By algorithmic differentiation a new
algorithm is obtained that computes besides f () the derivative f ().
The idea is easily explained by means of the Horner scheme to evaluate polynomials.
Let
n
X
f (z) = ci z i .
i=1
pn := cn ,
pi := z pi+1 + ci , i = n 1, n 2, . . . , 0,
f (z) := p0 .
Note that each of the pi can be considered as a function (polynomial) in z. We use the
above recurrence to determine the derivatives dpi ,
dpn := 0, pn := cn ,
We can proceed in a similar fashion for computing det A(). We however need to be able
to compute the derivatives aij . Then, we can derive each single assignment in the GEPP
algorithm.
If we restrict ourselves to the standard eigenvalue problem Ax = x then A() =
A I. Then, aij = ij , the Kronecker .
hij = 0, i > j + 1.
Ax = x Hy = y, x = Sy.
So, A and H have equal eigenvalues (A and H are similar) and the eigenvectors are
transformed by S. We now assume that H is unreduced, i.e., hi+1,i 6= 0 for all i.
Otherwise we can split Hx = x in smaller problems.
Let be an eigenvalue of H and
(3.11) (H I)x = 0,
i.e., x is an eigenvector of H associated with the eigenvalue . Then the last component
of x cannot be zero, xn 6= 0. The proof is by contradiction. Let xn = 0. Then (for n = 4)
h11 h12 h13 h14 x1 0
h21 h22 h23 h24 x2 0
= .
h32 h33 h34 x3 0
h43 h44 0 0
from which xn1 = 0 follows since we assumed hn,n1 6= 0. In the exact same procedure
we obtain xn2 = 0, . . . , x1 = 0. But the zero vector cannot be an eigenvector. Therefore,
xn must not be zero. Without loss of generality we can set xn = 1.
We continue to expose the procedure with a problem size n = 4. If is an eigenvalue
then there are xi , 1 i < n, such that
h11 h12 h13 h14 x1 0
h21 h22 h23 h24
x2 0
(3.12) = .
h32 h33 h34 x3 0
h43 h44 1 0
1
xi = (hi+1,i+1 ) xi+1 + hi+1,i+2 xi+2 + + hi+1,n xn , i = n 1, . . . , 1.
hi+1,i |{z}
1
The xi are functions of , in fact, xi Pni . The first equation in (3.13) gives
The last column of this equation corresponds to (3.13). Taking determinants yields
n1
!
Y
det(H I) = (1)n1 hi+1,i p() = c p().
i=1
and similarly for multiple zeros z1 , . . . , zm . Working with (3.16) is called implicit defla-
tion. Here, f is not modified. In this way errors in z are not propagated to f1
(3.23) Tk uk+1 = Tk xk .
Note that
So, uk+1 points in the desired direction; it just needs to be normalized. Premultiply-
ing (3.24) by cT and using (3.22) gives
or
1
(3.25) k+1 = k .
cT uk+1
In summary, we get the following procedure.
If the linear eigenvalue problem is solved by Algorithm 3.3 then T ()x = x. In each
iteration step a linear system has to be solved which requires the factorization of a matrix.
We now change the way we normalize x. Problem (3.17) becomes
(3.26) T () x = 0, kxk2 = 1,
60 CHAPTER 3. NEWTON METHODS
with the corresponding nonlinear system of equations
T () x 0
(3.27) f (x, ) = 1 T = .
2 (x x 1) 0
or
1
(3.31) k+1 = k .
xTk uk+1
(3.34) T (k )x = T (k )x
Bibliography
[1] P. Arbenz and W. Gander, Solving nonlinear eigenvalue problems by algorithmic
differentiation, Computing, 36 (1986), pp. 205215.
[3] B. Parlett, Laguerres method applied to the matrix eigenvalue problem, Math. Com-
put., 18 (1964), pp. 464485.
[4] A. Ruhe, Algorithms for the nonlinear eigenvalue problem, SIAM J. Numer. Anal., 10
(1973), pp. 674689.
[6] F. Tisseur and K. Meerbergen, The quadratic eigenvalue problem, SIAM Rev., 43
(2001), pp. 235286.
[7] R. C. Ward, The QR algorithm and Hymans method on vector computers, Math.
Comput., 30 (1976), pp. 132142.
62 CHAPTER 3. NEWTON METHODS
Chapter 4
The QR Algorithm
(4.1) Ak = Rk Qk = Qk Ak1 Qk ,
and hence Ak and Ak1 are unitarily similar. The matrix sequence {Ak } converges (under
certain assumptions) towards an upper triangular matrix [11]. Let us assume that the
63
64 CHAPTER 4. THE QR ALGORITHM
Algorithm 4.1 Basic QR algorithm
1: Let A Cnn . This algorithm computes an upper triangular matrix T and a unitary
matrix U such that A = U T U is the Schur decomposition of A.
2: Set A0 := A and U0 = I.
3: for k = 1, 2, . . . do
4: Ak1 =: Qk Rk ; /* QR factorization */
5: Ak := Rk Qk ;
6: Uk := Uk1 Qk ; /* Update transformation matrix */
7: end for
8: Set T := A and U := U .
eigenvalues are mutually different in magnitude and we can therefore number the eigen-
values such that |1 | > |2 | > > |n |. Then as we will show in Chapter 8 the
elements of Ak below the diagonal converge to zero like
(k)
(4.2) |aij | = O(|i /j |k ), i > j.
With the same assumption on the eigenvalues, Ak tends to an upper triangular matrix
and Uk converges to the matrix of Schur vectors.
D = diag([4 3 2 1]);
rand(seed,0);
format short e
S=rand(4); S = (S - .5)*2;
A = S*D/S % A_0 = A = S*D*S^{-1}
for i=1:20,
[Q,R] = qr(A); A = R*Q
end
Looking at the element-wise quotients of the last two matrices one recognizes the conver-
gence rates claimed in (4.2).
A(20)./A(19) = [ 1.0000 0.9752 1.0000 -1.0000]
[ 0.7495 1.0000 0.9988 -1.0008]
[ 0.5000 0.6668 1.0000 -1.0001]
[ -0.2500 -0.3334 -0.4999 1.0000]
So, again the eigenvalues are visible on the diagonal of A20 . The element-wise quotients
of A20 relative to A19 are
4.2. THE HESSENBERG QR ALGORITHM 67
A(20)./A(19) = [ 1.0000 1.0000 1.0000 -1.0000]
[ 0.4000 1.0000 0.4993 -1.0000]
[ 0.4000 0.4753 1.0000 -1.0000]
[ -0.2000 -0.5000 -0.5000 1.0000]
Notice that (4.2) does not state a rate for the element at position (3, 2).
These little numerical tests are intended to demonstrate that the convergence rates
given in (4.2) are in fact seen in a real run of the basic QR algorithm. The conclusions we
can draw are the following:
1. The convergence of the algorithm is slow. In fact it can be arbitrarily slow if eigen-
values are very close to each other.
2. The algorithm is expensive. Each iteration step requires the computation of the QR
factorization of a full n n matrix, i.e., each single iteration step has a complexity
O(n3 ). Even if we assume that the number of steps is proportional to n we would
get an O(n4 ) complexity. The latter assumption is not even assured, see point 1 of
this discussion.
In the following we want to improve on both issues. First we want to find a matrix
structure that is preserved by the QR algorithm and that lowers the cost of a single
iteration step. Then, we want to improve on the convergence properties of the algorithm.
Definition 4.1 A matrix H is a Hessenberg matrix if its elements below the lower off-
diagonal are zero,
hij = 0, i > j + 1.
Proof. We give a constructive proof, i.e., given a Hessenberg matrix H with QR factor-
ization H = QR, we show that H = RQ is again a Hessenberg matrix.
The Givens rotation or plane rotation G(i, j, ) is defined by
1 0 0 0
.. . . .. .. ..
. . . . .
0 c s 0
i
..
G(i, j, ) := ... ..
.
..
. .
..
.
(4.4) 0 s c 0 j
.. .. .. . . ..
. . . . .
0 0 0 1
i j
Thus, it is a simple matter to zero a single specific entry in a vector by using a Givens
rotation1 .
Now, let us look at a Hessenberg matrix H. We can show the principle procedure by
means of a 4 4 example.
G(1, 2, 1 ) 0
H=
0 0
0 0 0 0
G(2, 3, 2 ) 0 G(3, 4, 3 )
0 = R
0 0
0 0
0 0 0 0 0
G G G H = R H = QR.
| 3 {z2 }1
Q
H = RQ = RG1 G2 G3 ,
or, pictorially,
0 G(1, 2, 1 )
R=
0 0 0 0
0 0 0 0 0 0
G(2, 3, 2 ) G(3, 4, 1 )
0 0
=H
0 0 0 0 0
D = diag([4 3 2 1]);
rand(seed,0);
S=rand(4); S = (S - .5)*2;
A = S*D/S % A_0 = A = S*D*S^{-1}
H = hess(A); % built-in MATLAB function
for i=1:30,
[Q,R] = qr(H); H = R*Q
end
This yields the matrix sequence
H( 0) = [ -4.4529e-01 -1.8641e+00 -2.8109e+00 7.2941e+00]
[ 8.0124e+00 6.2898e+00 1.2058e+01 -1.6088e+01]
[ 0.0000e+00 4.0087e-01 1.1545e+00 -3.3722e-01]
[ 0.0000e+00 0.0000e+00 -1.5744e-01 3.0010e+00]
Again the elements in the lower off-diagonal reflect nicely the convergence rates in (4.2).
70 CHAPTER 4. THE QR ALGORITHM
4.2.2 Complexity
We give the algorithm for a single Hessenberg-QR-step in a Matlab-like way, see Algo-
rithm 4.2. By
Hk:j,m:n
If we neglect the determination of the parameters ck and sk , see (4.5), then each of
the two loops requires
n1
X n(n 1)
6i = 6 3n2 flops.
2
i=1
A flop is a floating point operation (+, , , /). We do not distinguish between them,
although they may slightly differ in their execution time on a computer. Optionally, we
also have to execute the operation Uk := Uk1 Qk of Algorithm 4.1. This is achieved by a
loop similar to the second loop in Algorithm 4.2. Since all the rows and columns of U are
n1
X
6n 6n2 flops.
i=1
Altogether, a QR step with a Hessenberg matrix, including the update of the unitary
transformation matrix, requires 12n2 floating point operations. This has to be set in
relation to a QR step with a full matrix that costs 37 n3 . Consequently, we have gained a
factor of O(n) in terms of operations by moving from dense to Hessenberg form. However,
we may still have very slow convergence if one of the quotients |k |/|k+1 | is close to 1.
4.3. THE HOUSEHOLDER REDUCTION TO HESSENBERG FORM 71
4.3 The Householder reduction to Hessenberg form
In the previous section we discovered that it is a good idea to perform the QR algorithm
with Hessenberg matrices instead of full matrices. But we have not discussed how we
transform a full matrix (by means of similarity transformations) into Hessenberg form.
We catch up on this issue in this section.
P = I 2uu , kuk = 1,
It is easy to verify that Householder reflectors are Hermitian and that P 2 = I. From this
we deduce that P is unitary. It is clear that we only have to store the Householder
vector u to be able to multiply a vector (or a matrix) with P ,
This multiplication only costs 4n flops where n is the length of the vectors.
A task that we repeatedly want to carry out with Householder reflectors is to transform
a vector x on a multiple of e1 ,
P x = x u(2u x) = e1 .
Since P is unitary, we must have = kxk, where C has absolute value one. Therefore,
x1 kxk
x kxke1 1 x2
u= = ..
kx kxke1 k kx kxke1 k .
xn
We can freely choose provided that || = 1. Let x1 = |x1 |ei . To avoid numerical
cancellation we set = ei .
In the real case, one commonly sets = sign(x1 ). If x1 = 0 we can set in any way.
The multiplication of P1 from the left inserts the desired zeros in column 1 of A. The
multiplication from the right is necessary in order to have similarity. Because of the
nonzero structure of P1 the first column of P1 A is not affected. Hence, the zeros stay
there.
The reduction continues in a similar way:
P2 / P2
P1 AP1 = 0 0
0 0 0
0 0 0
P3 / P3
0 = P3 P2 P1 A P1 P2 P3 .
0 0 | {z }
U
0 0 0
Algorithm 4.3 gives the details for the general n n case. In step 4 of this algorithm,
the Householder reflector is generated such that
ak+1,k u1
ak+2,k u2
0
(I 2uk uk ) . = with u k = .. and || = kxk
.
. 0 .
an,k 0 unk
according to the considerations of the previous subsection. The Householder vectors are
stored at the locations of the zeros. Therefore the matrix U = P1 Pn2 that effects the
similarity transformation from the full A to the Hessenberg H is computed after all House-
holder vectors have been generated, thus saving (2/3)n3 flops. The overall complexity of
the reduction is
P
n2
Application of Pk from the left: 4(n k 1)(n k) 34 n3
k=1
n2
P
Application of Pk from the right: 4(n)(n k) 2n3
k=1
4.4. IMPROVING THE CONVERGENCE OF THE QR ALGORITHM 73
Algorithm 4.3 Reduction to Hessenberg form
1: This algorithm reduces a matrix A Cnn to Hessenberg form H by a sequence of
Householder reflections. H overwrites A.
2: for k = 1 to n2 do
3: Generate the Householder reflector Pk ;
4: /* Apply Pk = Ik (Ink 2uk uk ) from the left to A */
5: Ak+1:n,k:n := Ak+1:n,k:n 2uk (uk Ak+1:n,k:n);
6: /* Apply Pk from the right, A := APk */
7: A1:n,k+1:n := A1:n,k+1:n 2(A1:n,k+1:n uk )uk ;
8: end for
9: if eigenvectors are desired form U = P1 Pn2 then
10: U := In ;
11: for k = n2 downto 1 do
12: /* Update U := Pk U */
13: Uk+1:n,k+1:n := Uk+1:n,k+1:n 2uk (uk Uk+1:n,k+1:n);
14: end for
15: end if
P
n2
Form U = P1 Pn2 : 4(n k)(n k) 43 n3
k=1
Lemma 4.4 Let H be an irreducible Hessenberg matrix, i.e., hi+1,i 6= 0 for all i =
1, . . . , n 1. Let H = QR be the QR factorization of H. Then for the diagonal elements
of R we have
|rkk | > 0, for all k < n.
Thus, if H is singular then rnn = 0.
Proof. Let us look at the k-th step of the Hessenberg QR factorization. For illustration,
let us consider the case k = 3 in a 5 5 example, where the matrix has the structure
+ + + + +
0 + + + +
0 0 + + + .
0 0
0 0 0
The plus-signs indicate elements that have been modified. In step 3, the (nonzero) element
h43 will be zeroed by a Givens rotation G(3, 4, ) that is determined such that
cos() sin() kk
h r
= kk .
sin() cos() hk+1,k 0
74 CHAPTER 4. THE QR ALGORITHM
Because the Givens rotation preserves vector lengths, we have
1: H I = QR /* QR factorization */
2: H = RQ + I
H = Q (H I)Q + I = Q HQ.
Thus, " #
RQ =
00
(4.7) k := h(k1)
n,n = en H (k1) en .
Algorithm 4.4 implements this heuristic. Notice that the shift changes in each iteration
step! Notice also that deflation is incorporated in Algorithm 4.4. As soon as the last lower
off-diagonal element is sufficiently small, it is declared zero, and the algorithm proceeds
with a smaller matrix. In Algorithm 4.4 the active portion of the matrix is m m.
76 CHAPTER 4. THE QR ALGORITHM
Lemma 4.4 guarantees that a zero is produced at position (n, n 1) in the Hessenberg
matrix H if the shift equals an eigenvalue of H. What happens, if hn,n is a good approx-
imation to an eigenvalue of H? Let us assume that we have an irreducible Hessenberg
matrix
0 ,
0 0
0 0 0 hn,n
where is a small quantity. If we perform a shifted Hessenberg QR step, we first have to
factor H hn,n I, QR = H hn,n I. After n 2 steps of this factorization the R-factor is
almost upper triangular,
+ + + + +
0 + + + +
0 0 + + + .
0 0 0
0 0 0 0
From (4.5) we see that the last Givens rotation has the nontrivial elements
cn1 = p , sn1 = p .
||2 + ||2 ||2 + ||2
Applying the Givens rotations from the right one sees that the last lower off-diagonal
element of H = RQ + hn,n I becomes
n,n1 = 2
(4.8) h .
2 + 2
So, we have quadratic convergence unless is also tiny.
A second even more often used shift strategy is the Wilkinson shift:
" (k1) (k1)
#
hn1,n1 hn1,n
(4.9) k := eigenvalue of (k1) (k1)
that is closer to h(k1)
n,n .
hn,n1 hn,n
D = diag([4 3 2 1]);
rand(seed,0);
S=rand(4); S = (S - .5)*2;
A = S*D/S;
H = hess(A)
for i=1:8,
[Q,R] = qr(H-H(4,4)*eye(4)); H = R*Q+H(4,4)*eye(4);
end
produces the output
4.5. THE DOUBLE SHIFT QR ALGORITHM 77
H( 0) = [ -4.4529e-01 -1.8641e+00 -2.8109e+00 7.2941e+00]
[ 8.0124e+00 6.2898e+00 1.2058e+01 -1.6088e+01]
[ 0.0000e+00 4.0087e-01 1.1545e+00 -3.3722e-01]
[ 0.0000e+00 0.0000e+00 -1.5744e-01 3.0010e+00]
If 1 C \ R then 2 =
1 . Let us perform two QR steps using 1 and 2 as shifts. Setting
k = 1 for convenience we get
H 0 1 I = Q 1 R1 ,
H1 = R1 Q1 + 1 I,
(4.10)
H 1 2 I = Q 2 R2 ,
H2 = R2 Q2 + 2 I.
R1 Q1 + (1 2 )I = Q2 R2 .
Multiplying this equation with Q1 from the left and with R1 from the right we get
Because 2 =
1 we have
Therefore, (Q1 Q2 )(R2 R1 ) is the QR factorization of a real matrix. We can choose (scale)
Q1 and Q2 such that Z := Q1 Q2 is real orthogonal. (Then also R2 R1 is real.) By
consequence,
H2 = (Q1 Q2 ) H0 (Q1 Q2 ) = Z T H0 Z
is real.
A procedure to compute H2 by avoiding complex arithmetic could consist of three
steps:
(k1)
1. Form the real matrix M = H02 sH0 + tI with s = 2Re() = trace(G) = hn1,n1 +
(k1) (k1) (k1) (k1) (k1)
hn,n and t = ||2 = det(G) = hn1,n1 hn,n hn1,n hn,n1 . Notice that M has
two lower off-diagonals, " #
M= .
3. Set H2 = Z T H0 Z.
This procedure is however too expensive since item 1, i.e., forming H 2 requires O(n3 )
flops.
A remedy for the situation is provided by the Implicit Q Theorem.
4.5. THE DOUBLE SHIFT QR ALGORITHM 79
Theorem 4.5 (The implicit Q theorem) Let A Rnn . Let Q = [q1 , . . . , qn ] and
V = [v1 , . . . , vn ] be orthogonal matrices that both similarly transform A to Hessenberg
form, H = QT AQ and G = V T AV . Let k denote the smallest positive integer for which
hk+1,k = 0, with k = n if H is irreducible.
If q1 = v1 then qi = vi and |hi,i1 | = |gi,i1 | for i = 2, . . . , k. If k < n, then
gk+1,k = 0.
(4.11) wi = W ei span{e1 , . . . , ei }, i k.
(Notice that orthogonal upper triangular matrices are diagonal with diagonal entries 1.)
This is proced inductively. For i = 1 we have w1 = e1 by the assumption that q1 = v1 .
For 1 < i k we assume that (4.11) is true for wi and use the equality GW = W H. The
(i1)-th column of this equation reads
i
X
Gwi1 = GW ei1 = W Hei1 = wj hj,i1 .
j=1
i1
X
wi hi,i1 = Gwi1 wj hj,i1 span{e1 , . . . ei },
j=1
hi,i1 = eTi Hei1 = eTi QT AQei1 = eTi QT V GV T Qei1 = wiT Gwi1 = gi,i1 ,
k
X
gk+1,k = eTk+1 Gek = eTk+1 GW ek = eTk+1 W Hek = eTk+1 wj hj,k = 0.
j=1
Golub and van Loan [6, p.347] write that The gist of the implicit Q theorem is that if
QT AQ = H and Z T AZ = G are both unreduced Hessenberg matrices and Q and Z have
the same first column, then G and H are essentially equal in the sense that G = DHD
with D = diag(1, . . . , 1).
We apply the Implicit Q Theorem in the following way: We want to compute the
Hessenberg matrix Hk+1 = Z T Hk1 Z where ZR is the QR factorization of M = Hk1 2
sHk1 + tI. The Implicit Q Theorem now tells us that we essentially get Hk+1 by any
orthogonal similarity transformation Hk1 Z1 Hk1 Z1 provided that Z1 HZ1 is Hessen-
berg and Z1 e1 = Ze1 .
80 CHAPTER 4. THE QR ALGORITHM
Let P0 be the Householder reflector with
Since only the first three elements of the first column M e1 of M are nonzero, P0 has the
structure
P0 = 1 .
.
. .
1
So,
+
Hk1 := P0 Hk1 P0 =
T
+ + .
We now reduce P0T Hk1 P0 similarly to Hessenberg form the same way as we did earlier, by
a sequence of Householder reflectors P1 , . . . , Pn2 . However, P0T Hk1 P0 is a Hessenberg
matrix up to the bulge at the top left. We take into account this structure when forming
the Pi = I 2pi pTi . So, the structures of P1 and of P1T P0T Hk1 P0 P1 are
1
0
P1 =
, H = P T H P1 = 0 + .
k1 1 k1
1 + +
1
1
The transformation with P1 has chased the bulge one position down the diagonal. The
consecutive reflectors push it further by one position each until it falls out of the matrix
at the end of the diagonal. Pictorially, we have
Hk1 = P2T Hk1
P2 =
0
0 +
+ +
Hk1 = P3 Hk1 P3 =
T
0
0 +
+ +
4.5. THE DOUBLE SHIFT QR ALGORITHM 81
Hk1 = P4T Hk1
P4 =
0
0 +
Hk1 T
= P5 Hk1 P5 =
0
It is easy to see that the Householder vector pi , i < n 2, has only three nonzero elements
at position i + 1, i + 2, i + 3. Of pn2 only the last two elements are nonzero. Clearly,
P0 P1 Pn2 e1 = P0 e1 = M e1 /.
Remark 4.3. Notice that in Algorithm 4.5 a double step is taken also if the eigenvalues of
hqq hqp
G=
hpq hpp
H(0) =
>> PR=qr2st(H)
1 6 -1.7735e-01 -1.2807e+00
2 6 -5.9078e-02 -1.7881e+00
3 6 -1.6115e-04 -5.2705e+00
4 6 -1.1358e-07 -2.5814e+00
5 6 1.8696e-14 1.0336e+01
6 6 -7.1182e-23 -1.6322e-01
H(6) =
7 5 1.7264e-02 -7.5016e-01
8 5 2.9578e-05 -8.0144e-01
9 5 5.0602e-11 -4.6559e+00
10 5 -1.3924e-20 -3.1230e+00
H(10) =
11 4 1.0188e+00 -9.1705e-16
H(11) =
The inner product uT x costs 5 flops, multiplying with 2 another one. The operation
x := x u, = 2uT x, cost 6 flops, altogether 12 flops.
In the k-th step of the loop there are n k of these application from the left in step 13
and k + 4 from the right in step 15. In this step there are thus about 12n + O(1) flops to be
executed. As k is running from 1 to p 3. We have about 12pn flops for this step. Since p
runs from n down to about 2 we have 6n3 flops. If we assume that two steps are required per
eigenvalue the flop count for Francis double step QR algorithm to compute all eigenvalues
of a real Hessenberg matrix is 12n3 . If also the eigenvector matrix is accumulated the two
additional statements have to be inserted into Algorithm 4.5. After step 15 we have
1: Q1:n,k+1:k+3 := Q1:n,k+1:k+3 P ;
1: Q1:n,p1:p := Q1:n,p1:pP ;
which costs another 12n3 flops.
We earlier gave the estimate of 6n3 flops for a Hessenberg QR step, see Algorithm 4.2.
If the latter has to be spent in complex arithmetic then the single shift Hessenberg QR al-
gorithm is more expensive than the double shift Hessenberg QR algorithm that is executed
in real arithmetic.
Remember that the reduction to Hessenberg form costs 10 3
3 n flops without forming the
14 3
transformation matrix and 3 n if this matrix is formed.
1: Choose a shift
2: Compute the QR factorization A I = QR
3: Update A by A = RQ + I.
Of course, this is done by means of plane rotations and by respecting the symmetric
tridiagonal structure of A.
In the more elegant implicit form of the algorithm we first compute the first Givens
rotation G0 = G(1, 2, ) of the QR factorization that zeros the (2, 1) element of A I,
c s a11
(4.12) = , c = cos(0 ), s = sin(0 ).
s c a21 0
A = Q AQ, Q = G0 G1 Gn2 .
Q e1 = G0 G1 Gn2 e1 = G0 e1 .
Both explicit and implicit QR step form the same first plane rotation G0 . By referring to
the Implicit Q Theorem 4.5 we see that explicit and implicit QR step compute essentially
the same A.
86 CHAPTER 4. THE QR ALGORITHM
The shift for the next step is determined from elements a5 , a6 , and b6 . According to (4.12)
the first plane rotation is determined from the shift and the elements a1 and b1 . The im-
plicit shift algorithm then chases the bulge down the diagonal. In this particular situation,
the procedure finishes already in row/column 4 because b4 = 0. Thus the shift which is an
approximation to an eigenvalue of the second block (rows 4 to 6) is applied to the wrong
first block (rows 1 to 3). Clearly, this shift does not improve convergence.
If the QR algorithm is applied in its explicit form, then still the first block is not treated
properly, i.e. with a (probably) wrong shift, but at least the second block is diagonalized
rapidly.
Deflation is done as indicated in Algorithm 4.6:
Deflation is particularly simple in the symetric case since it just means that a tridiagonal
eigenvalue problem decouples in two (or more) smaller tridiagonal eigenvalue problems.
Notice, however, that the eigenvectors are still n elements long.
4.7 Research
Still today the QR algorithm computes the Schur form of a matrix and is by far the
most popular approach for solving dense nonsymmetric eigenvalue problems. Multishift
and aggressive early deflation techniques have led to significantly more efficient sequential
implementations of the QR algorithm during the last decade. For a brief survey and a
discussion of the parallelization of the QR algorithm, see [7].
The three steps of the presented symmetric QR algorithm are (1) reducion of the origi-
nal matrix to tridiagonal form, (2) computation of the eigenpairs of the tridiagonal matrix,
and (3) back-transformation of the eigenvectors. In the ELPA project the first step has
been successfully replaced by a two-stage procedure: transformation full to banded, and
banded to tridiagonal. This approach improves the utilization of memory hierarchies [8, 3].
4.8 Summary
The QR algorithm is a very powerful algorithm to stably compute the eigenvalues and (if
needed) the corresponding eigenvectors or Schur vectors. All steps of the algorithm cost
O(n3 ) floating point operations, see Table 4.1. The one exception is the case where only
eigenvalues are desired of a symmetric tridiagonal matrix. The linear algebra software
package LAPACK [1] contains subroutines for all possible ways the QR algorithm may be
employed.
88 CHAPTER 4. THE QR ALGORITHM
nonsymmetric case symmetric case
without with without with
Schurvectors eigenvectors
10 3 14 3 4 3 8 3
transformation to Hessenberg/tridiagonal form 3 n 3 n 3n 3n
20 3 50 3
real double step Hessenberg/tridiagonal QR al- 3 n 3 n 24n2 6n3
gorithm (2 steps per eigenvalues assumed)
4 3
total 10n3 25n3 3n 9n3
We finish by repeating, that the QR algorithm is a method for dense matrix problems.
The reduction of a sparse matrix to tridiagonal or Hessenberg form produces fill-in, thus
destroying the sparsity structure which one almost always tries to preserve.
Bibliography
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D.
Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov,
and D. Sorensen, LAPACK Users Guide Release 2.0, SIAM, Philadel-
phia, PA, 1994. (Software and guide are available from Netlib at URL
http://www.netlib.org/lapack/).
[2] P. Arbenz and G. H. Golub, Matrix shapes invariant under the symmetric QR
algorithm, Numer. Linear Algebra Appl., 2 (1995), pp. 8793.
[4] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997.
[6] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore, MD, 2nd ed., 1989.
[9] B. N. Parlett, The QR algorithm, Computing Sci. Eng., 2 (2000), pp. 3842.
BIBLIOGRAPHY 89
[10] H. Rutishauser, Solution of eigenvalue problems with the LR-transformation, NBS
Appl. Math. Series, 49 (1958), pp. 4781.
[11] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.
90 CHAPTER 4. THE QR ALGORITHM
Chapter 5
In this chapter we deal with an algorithm that is designed for the efficient solution of the
symmetric tridiagonal eigenvalue problem
a1 b1
..
b1 a2 .
(5.1) T x = x, T = .. ..
.
. . bn1
bn1 an
We noticed from Table 4.1 that the reduction of a full symmetric matrix to a similar tridi-
agonal matrix requires about 83 n3 while the tridiagonal QR algorithm needs an estimated
6n3 floating operations (flops) to converge. Because of the importance of this subproblem
a considerable effort has been put into finding faster algorithms than the QR algorithms
to solve the tridiagonal eigenvalue problem. In the mid-1980s Dongarra and Sorensen [4]
promoted an algorithm originally proposed by Cuppen [2]. This algorithm was based on
a divide and conquer strategy. However, it took ten more years until a stable variant was
found by Gu and Eisenstat [5, 6]. Today, a stable implementation of this latter algorithm
is available in LAPACK [1].
91
92 CHAPTER 5. CUPPENS DIVIDE AND CONQUER ALGORITHM
5.2 Partitioning the tridiagonal matrix
Partitioning the irreducible tridiagonal matrix is done in the following way. We write
(5.2)
a 1 b1
..
.
b
1 a2
.. ..
. .
b m1
bm1 am bm
T =
b a b
m m+1 m+1
. .
b a .
m+1 m+2
.. ..
. . bn1
bn1 an
a1 b1
.
a2 . .
b
1
.. ..
. .
bm1
bm1 am bm bm bm
= +
am+1 bm bm+1 bm bm
.
am+2 . .
bm+1
.. ..
. . bn1
bn1 an
T1 em
= + uuT with u = and = bm ,
T2 e1
These two spectral decompositions can be computed by any algorithm, in particular also
by this divide and conquer algorithm by which the Ti would be further split. It is clear
that by this partitioning an large number of small problems can be generated that can be
potentially solved in parallel. For a parallel algorithm, however, the further phases of the
algorithm must be parallelizable as well.
Plugging (5.3) into (5.2) gives
T
Q1 T1 T Q1 1
(5.4) + uu = + vvT
QT2 T2 Q2 2
with
QT1 QT1 em last row of Q1
(5.5) v= u= = .
QT2 QT2 e1 first row of Q2
5.4. DEFLATION 93
Now we have arrived at the eigenvalue problem
(5.6) (D + vvT )x = x, D = 1 2 = diag(1 , . . . , n ).
That is, we have to compute the spectral decomposition of a matrix that is a diagonal
plus a rank-one update. Let
(5.7) D + vvT = QQT
be this spectral decomposition. Then, the spectral decomposition of the tridiagonal T is
T
Q1 T Q1
(5.8) T = QQ .
Q2 QT2
Forming the product (Q1 Q2 )Q will turn out to be the most expensive step of the
algorithm. It costs n3 + O(n2 ) floating point operations
5.4 Deflation
There are certain solutions of (5.7) that can be given immediately, by just looking carefully
at the equation.
If there are zero entries in v then we have
(5.9) vi = 0 v T e i = 0 = (D + vvT )ei = di ei .
Thus, if an entry of v vanishes we can read the eigenvalue from the diagonal of D at once
and the corresponding eigenvector is a coordinate vector.
If identical entries occur in the diagonal of D, say di = dj , with i < j, then we can find
a plane rotation G(i, j, ) (see (4.4)) such that it introduces a zero into the j-th position
of v,
..
p .
vi 2 + vj 2
i
..
GT v = G(i, j, )T v = .
0 j
..
.
Notice, that (for any ),
G(i, j, )T DG(i, j, ) = D, di = dj .
So, if there are multiple eigenvalues in D we can reduce all but one of them by introducing
zeros in v and then proceed as previously in (5.9).
When working with floating point numbers we deflate if
(5.10) |vi | < CkT k or |di dj | < CkT k, (kT k = kD + vvT k)
where C is a small constant. Deflation changes the eigenvalue problem for D + vvT into
the eigenvalue problem for
D1 + v1 v1T O p
(5.11) = GT (D + vvT )G + E, kEk < C kDk2 + ||2 kvk4 ,
O D2
where D1 has no multiple diagonal entries and v1 has no zero entries. So, we have to
compute the spectral decomposition of the matrix in (5.11) which is similar to a slight
perturbation of the original matrix. G is the product of Givens rotations.
94 CHAPTER 5. CUPPENS DIVIDE AND CONQUER ALGORITHM
5.4.1 Numerical examples
with
0.7370 0.5910 0.3280
0.5910 0.3280 0.7370
0.3280 0.7370 0.5910
Q0 =
0.9018 0.4153 0.1200
0.4042 0.7118 0.5744
0.1531 0.5665 0.8097
2 1 2 1 0
1 2 1 1 2 1 0
1 2 1 1 1 0 1 1
T =
=
+
1 2 1 0 1 1 1 1
1 2 1 1 2 1 0
1 2 1 2 0
T
2 1 0 0
1 2 1 0 0
1 1 1 1
=
+
= T0 + uuT .
1 1 1 1
1 2 1 0 0
1 2 0 0
5.5. THE EIGENVALUE PROBLEM FOR D + VVT 95
Now, Matlab gives
T
0.1981 0.7370 0.7370
1.5550 0.5910 0.5910
3.2470 0.3280 0.3280
Q0 T Q0 =
T
+
0.1981 0.7370 0.7370
1.5550 0.5910 0.5910
3.2470 0.3280 0.3280
with
0.3280 0.7370 0.5910
0.5910 0.3280 0.7370
0.7370 0.5910 0.3280
Q0 =
0.7370 0.5910 0.3280
0.5910 0.3280 0.7370
0.3280 0.7370 0.5910
In this example we have three double eigenvalues. Because the corresponding components
of v (vi and vi+1 ) are equal we define
G = G(1, 4, /4)G(2, 5, /4)G(3, 6, /4)
0.7071 0.7071
0.7071 0.7071
0.7071 0.7071
= 0.7071
.
0.7071
0.7071 0.7071
0.7071 0.7071
Then,
Notice that this procedure permutes the elements of v as well. Let (, x) be an eigenpair
of
(5.12) (D + vvT )x = x.
Then,
cannot be equal to one of the di . If = dk then the k-th element on the left of (5.13)
vanishes. But then either vk = 0 or vT x = 0. The first cannot be true for our assumption
about v. If on the other hand vT x = 0 then (D dk I)x = 0. Thus x = ek and
vT ek = vk = 0, which cannot be true. Therefore D I is nonsingular and
(I D)1 v
(5.15) x= .
k(I D)1 vk
1
0
10
2 0 1 3 3.5 7 8 10
P
n
vk2
(5.17) f () := 1 vT (I D)1 v = 1 = 0.
k=1
dk
5.5. THE EIGENVALUE PROBLEM FOR D + VVT 97
This equation is called secular equation. The secular equation has poles at the eigen-
values of D and zeros at the eigenvalues of D + vvT . Notice that
n
X vk2
f () = 2.
k=1
( d k )
Thus, the derivative of f is positive if > 0 wherever it has a finite value. If < 0
the derivative of f is negative (almost) everywhere. A typical graph of f with > 0 is
depicted in Fig. 5.1. (If is negative the image can be flipped left to right.) The secular
equation implies the interlacing property of the eigenvalues of D and of D + vvT ,
or
So, we have to compute one eigenvalue in each of the intervals (di , di+1 ), 1 i < n, and
a further eigenvalue in (dn , ) or (, d1 ). The corresponding eigenvector is then given
by (5.15). Evidently, these tasks are easy to parallelize.
Equations (5.17) and (5.15) can also been obtained from the relations
" # " 1 #
1 v T 1 0T 0T 1 vT
=
v I D v I 0 I D vvT 0 I
" #
1 vT (I D)1 1 vT (I D)1 v 0T 1 0T
= .
0 I 0 I D (I D)1 v I
These are simply block LDLT factorizations of the first matrix. The first is the well-known
one where the factorization is started with the (1, 1) block. The second is a backward fac-
torization that is started with the (2, 2) block. Because the determinants of the tridiagonal
matrices are all unity, we have
1 1
(5.20) det(I D vvT ) = (1 vT (I D)1 v) det(I D).
Denoting the eigenvalues of D + vvT again by 1 < 2 < < n this implies
n
Y n
Y
( j ) = (1 vT (I D)1 v) ( dj )
j=1 j=1
n
! n
X vk2 Y
(5.21) = 1 ( dj )
dk
k=1 j=1
n
Y n
X Y
= ( dj ) vk2 ( dj )
j=1 k=1 j6=k
Setting = dk gives
n
Y n
Y
(5.22) (dk j ) = vk2 (dk dj )
j=1 j=1
j6=i
98 CHAPTER 5. CUPPENS DIVIDE AND CONQUER ALGORITHM
or
Q
n Q
k1 Q
n
(dk j ) (j dk )(1)nk+1
(dk j )
1 j=1 1 j=1 j=k
vk2 = =
Q n
k1
Q Q
n
(dk dj ) (dk dj ) (dj dk )(1)nk
j=1 j=1 j=k+1
j6=i
(5.23)
k1
Q Q
n
(dk j ) (j dk )
1 j=1 j=k
= k1 > 0.
Q Q
n
(dk dj ) (dj d k )
j=1 j=k+1
(Similar arguments hold if < 0.) Thus, we have the solution of the following inverse
eigenvalue problem:
Given D = diag(d1 , . . . , dn ) and values 1 , . . . , n that satisfy (5.18). Find a vector
v = [v1 , . . . , vn ]T with positive components vk such that the matrix D + vvT has the
prescribed eigenvalues 1 , . . . , n .
The solution is given by (5.24). The positivity of the vk makes the solution unique.
c1
(5.28) h1 () := c1 + satisfies h1 (j ) = 1 (j ) and h1 (j ) = 1 (j ).
di
This means that the graphs of h1 and of 1 are tangent at = j . This is similar to
Newtons method. However in Newtons method a straight line is fitted to the given
function. The coefficients in (5.28) are given by
c1 = 1 (j )(di j )2 > 0,
i
X dk di
c1 = 1 (j ) 1 (j )(di j ) = vk2 0.
k=1
(dk j )2
c2
(5.29) h2 () := c2 + satisfies h2 (j ) = 2 (j ) and h2 (j ) = 2 (j )
di+1
c2 = 2 (j )(di+1 j )2 > 0,
Xn
dk di+1
c2 = 2 (j ) 2 (j )(di+1 j ) = vk2 2 0.
k=i+1
(dk )
Finally, we set
c1 c2
(5.30) h() = 1 + h1 () + h2 () = (1 + c1 + c2 ) + + .
| {z } di di+1
c3
This zerofinder is converging quadratically to the desired zero [7]. Usually 2 to 3 steps
are sufficient to get the zero to machine precision. Therefore finding a zero only requires
O(n) flops. Thus, finding all zeros costs O(n2 ) floating point operations.
This serial complexity of the algorithm very often overestimates the computational costs
of the algorithm due to significant deflation that is observed surprisingly often.
100 CHAPTER 5. CUPPENS DIVIDE AND CONQUER ALGORITHM
Algorithm 5.1 The tridiagonal divide and conquer algorithm
1: Let T Cnn be a real symmetric tridiagonal matrix. This algorithm computes
the spectral decomposition of T = QQT , where the diagonal is the matrix of
eigenvalues and Q is orthogonal.
2: if T is 1 1 then
3: return ( = T ; Q = 1)
4: else
T1 O
5: Partition T = + uuT according to (5.2)
O T2
6: Call this algorithm with T1 as input and Q1 , 1 as output.
7: Call this algorithm with T2 as input and Q2 , 2 as output.
8: Form D + vvT from 1 , 2 , Q1 , Q2 according to (5.4)(5.6).
9: Find the eigenvalues
and the eigenvectors Q of D + vvT .
Q1 O
10: Form Q = Q which are the eigenvectors of T .
O Q2
11: return (; Q)
12: end if
Q = sqrt(diag(q*q));
q = q./(e*Q); % normalized eigenvectors
10
2 1 0 1 2 3 4 5 6 7
10
10
2 1 0 1 2 3 4 5 6 7
We observe loss of orthogonality among the eigenvectors as the eigenvalues get closer
and closer. This may not be surprising as we compute the eigenvectors by formula (5.15)
(I D)1 v
x= .
k(I D)1 vk
If = 2 and = 3 which are almost equal, 2 3 then intuitively one expects almost
the same eigenvectors. We have in fact
2.2204 1016 4.3553 108 1.7955 108 1.1102 1016
4.3553 108 0 5.5511 108 1.8298 108
Q T Q I4 =
1.7955 10
.
8 5.5511 108 1.1102 1016 7.5437 109
1.1102 1016 1.8298 108 7.5437 109 0
Orthogonality is lost only with respect to the vectors corresponding to the eigenvalues
close to 2.
102 CHAPTER 5. CUPPENS DIVIDE AND CONQUER ALGORITHM
10
10
1 1.5 2 2.5 3
10
10
1.9 1.95 2 2.05 2.1
Figure 5.5: Secular equation corresponding to (5.32) for = 0.01 for 1.9 2.1
Already Dongarra and Sorensen [4] analyzed this problem. In their formulation they
normalize the vector v of D + vvT to have norm unity, kvk = 1. They stated
1 |f () f ()|
(5.34) |qT q | = .
| | [f ()f ()]1/2
Formula (5.34) indicates how problems may arise. In exact arithmetic, if and are
eigenvalues then f () = f () = 0. However, in floating point arithmetic this values may
be small but nonzero, e.g., O(). If | | is very small as well then we may have trouble!
So, a remedy for the problem was for a long time to compute the eigenvalues in doubled
precision, so that f () = O(2 ). This would counteract a potential O() of | |.
This solution was quite unsatisfactory because doubled precision is in general very slow
since it is implemented in software. It took a decade until a proper solution was found.
(i I D)1 v
qi = .
k(i I D)1 v
k
6: return (; Q)
The Matlab code that we showed did not give orthogonal eigenvectors. We show in
the following script that the formulae (5.24) really solve the problem.
dlam = zeros(n,1);
for k=1:n,
[dlam(k), dvec(:,k)] = zerodandc(d,v,k);
end
V = ones(n,1);
for k=1:n,
V(k) = prod(abs(dvec(k,:)))/prod(d(k) - d(1:k-1))/prod(d(k+1:n) - d(k));
V(k) = sqrt(V(k));
end
Q = (dvec).\(V*e);
diagq = sqrt(diag(Q*Q));
Q = Q./(e*diagq);
5.8. THE ALGORITHM OF GU AND EISENSTAT 105
for k=1:n,
if dlam(k)>0,
dlam(k) = dlam(k) + d(k);
else
dlam(k) = d(k+1) + dlam(k);
end
end
norm(Q*Q-eye(n))
norm((diag(d) + v*v)*Q - Q*diag(dlam))
A zero finder returns for each interval the quantity i di and the vector [d1
i , . . . , dn i ]T to high precision. These vector elements have been computed as (dk
di ) (i di ). The zerofinder of Li [7] has been employed here. At the end of this sec-
tion we list the zerofinder written in Matlab that was used here. The formulae (5.37)
and (5.38) have been used to solve the quadratic equation (5.36). Notice that only one of
the while loops is traversed, depending on if the zero is closer to the pole on the left or
to the right of the interval. The vk of formula (5.24) are computed next. Q contains the
eigenvectors.
Table 5.2: Loss of orthogonality among the eigenvectors computed by the straightforward
algorithm (I) and the Gu-Eisenstat approach (II)
Again we ran the code for = 10k for k = 0, 1, 2, 4, 8. The numbers in Table 5.2
confirm that the new formulae are much more accurate than the straight forward ones.
The norms of the errors obtained for the Gu-Eisenstat algorithm always are in the order
of machine precision, i.e., 1016 .
Note that there may be errors in the following code! The vector v is squared on the
3rd line, and later (assigning values to di1, psi1, psi2) squared again!
n = length(d);
di = d(i);
v = v.^2;
if i < n,
di1 = d(i+1); lambda = (di + di1)/2;
else
di1 = d(n) + norm(v)^2; lambda = di1;
106 CHAPTER 5. CUPPENS DIVIDE AND CONQUER ALGORITHM
end
eta = 1;
psi1 = sum((v(1:i).^2)./(d(1:i) - lambda));
psi2 = sum((v(i+1:n).^2)./(d(i+1:n) - lambda));
end
BIBLIOGRAPHY 107
dl = d - lambda;
return
Bibliography
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D.
Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov,
and D. Sorensen, LAPACK Users Guide - Release 2.0, SIAM, Philadel-
phia, PA, 1994. (Software and guide are available from Netlib at URL
http://www.netlib.org/lapack/).
[2] J. J. M. Cuppen, A divide and conquer method for the symmetric tridiagonal eigen-
problem, Numer. Math., 36 (1981), pp. 177195.
[3] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997.
[4] J. J. Dongarra and D. C. Sorensen, A fully parallel algorithm for the symmetric
eigenvalue problem, SIAM J. Sci. Stat. Comput., 8 (1987), pp. s139s154.
[5] M. Gu and S. C. Eisenstat, A stable and efficient algorithm for the rank-one
modification of the symmetric eigenproblem, SIAM J. Matrix Anal. Appl., 15 (1994),
pp. 12661276.
[7] R.-C. Li, Solving secular equations stably and efficiently, Technical Report UT-CS-
94-260, University of Tennessee, Knoxville, TN, Nov. 1994. LAPACK Working Note
No. 89.
6.1 LAPACK
(This section is essentially compiled from the LAPACK Users Guide [1] that is available
online from http://www.netlib.org/lapack/lug/.)
LAPACK [1] is a library of Fortran 77 subroutines for solving the most commonly
occurring problems in numerical linear algebra. It has been designed to be efficient on a
wide range of modern high-performance computers. The name LAPACK is an acronym
for Linear Algebra PACKage.
LAPACK can solve systems of linear equations, linear least squares problems, eigen-
value problems and singular value problems. LAPACK can also handle many associated
computations such as matrix factorizations or estimating condition numbers.
LAPACK contains driver routines for solving standard types of problems, compu-
tational routines to perform a distinct computational task, and auxiliary routines to
perform a certain subtask or common low-level computation. Each driver routine typically
calls a sequence of computational routines. Taken as a whole, the computational routines
can perform a wider range of tasks than are covered by the driver routines. Many of the
auxiliary routines may be of use to numerical analysts or software developers, so we have
documented the Fortran source for these routines with the same level of detail used for
the LAPACK routines and driver routines.
Dense and banded matrices are provided for, but not general sparse matrices. In all
areas, similar functionality is provided for real and complex matrices.
LAPACK is designed to give high efficiency on vector processors, high-performance
super-scalar workstations, and shared memory multiprocessors. It can also be used sat-
isfactorily on all types of scalar machines (PCs, workstations, mainframes). A distributed-
memory version of LAPACK, ScaLAPACK [2], has been developed for other types of
parallel architectures (for example, massively parallel SIMD machines, or distributed mem-
ory machines).
LAPACK has been designed to supersede LINPACK [3] and EISPACK [10, 8], princi-
pally by restructuring the software to achieve much greater efficiency, where possible, on
modern high-performance computers; also by adding extra functionality, by using some
new or improved algorithms, and by integrating the two sets of algorithms into a unified
package.
LAPACK routines are written so that as much as possible of the computation is per-
formed by calls to the Basic Linear Algebra Subprograms (BLAS) [9, 6, 5]. Highly
efficient machine-specific implementations of the BLAS are available for many modern
109
110 CHAPTER 6. LAPACK AND THE BLAS
high-performance computers. The BLAS enable LAPACK routines to achieve high per-
formance with portable code.
The BLAS are not strictly speaking part of LAPACK, but Fortran 77 code for the
BLAS is distributed with LAPACK, or can be obtained separately from netlib where
model implementations are found.
The model implementation is not expected to perform as well as a specially tuned
implementation on most high-performance computers on some machines it may give
much worse performance but it allows users to run LAPACK codes on machines that do
not offer any other implementation of the BLAS.
The complete LAPACK package or individual routines from LAPACK are freely avail-
able from the World Wide Web or by anonymous ftp. The LAPACK homepage can be
accessed via the URL http://www.netlib.org/lapack/.
6.2 BLAS
By 1976 it was clear that some standardization of basic computer operations on vectors
was needed [9]. By then it was already known that coding procedures that worked well
on one machine might work very poorly on others. In consequence of these observations,
Lawson, Hanson, Kincaid and Krogh proposed a limited set of Basic Linear Algebra
Subprograms (BLAS) to be (hopefully) optimized by hardware vendors, implemented in
assembly language if necessary, that would form the basis of comprehensive linear alge-
bra packages [9]. These so-called Level 1 BLAS consisted of vector operations and some
attendant co-routines. The first major package which used these BLAS kernels was LIN-
PACK [3]. Soon afterward, other major software libraries such as the IMSL library and
NAG rewrote portions of their existing codes and structured new routines to use these
BLAS. Early in their development, vector computers saw significant optimizations us-
ing the BLAS. Soon, however, such machines were clustered together in tight networks
and somewhat larger kernels for numerical linear algebra were developed [6, 7] to include
matrix-vector operations (Level 2 BLAS). Additionally, Fortran compilers were by then
optimizing vector operations as efficiently as hand coded Level 1 BLAS. Subsequently,
in the late 1980s, distributed memory machines were in production and shared memory
machines began to have significant numbers of processors. A further set of matrix-matrix
operations was proposed [4] and soon standardized [5] to form a Level 3. The first major
package for linear algebra which used the Level 3 BLAS was LAPACK [1] and subsequently
a scalable (to large numbers of processors) version was released as ScaLAPACK [2]. Ven-
dors focused on Level 1, Level 2, and Level 3 BLAS which provided an easy route to
optimizing LINPACK, then LAPACK. LAPACK not only integrated pre-existing solvers
and eigenvalue routines found in EISPACK [10] (which did not use the BLAS) and LIN-
PACK (which used Level 1 BLAS), but incorporated the latest dense and banded linear
algebra algorithms available. It also used the Level 3 BLAS which were optimized by much
vendor effort. Later, we will illustrate several BLAS routines. Conventions for different
BLAS are indicated by
A root operation. For example, axpy for the operation
(6.1) y := ax + y
A prefix (or combination prefix) to indicate the datatype of the operands, for example
saxpy for single precision axpy operation, or isamax for the index of the maximum
absolute element in an array of type single.
6.2. BLAS 111
a suffix if there is some qualifier, for example cdotc or cdotu for conjugated or
unconjugated complex dot product, respectively:
n1
X
cdotc(n,x,1,y,1) = xi yi
i=0
n1
X
cdotu(n,x,1,y,1) = xi y i
i=0
Tables 6.1 and 6.2 give the prefix/suffix and root combinations for the BLAS, respectively.
Prefixes:
S REAL
D DOUBLE PRECISION
C COMPLEX
Z DOUBLE COMPLEX
Suffixes:
U transpose
C Hermitian conjugate
Level 1 BLAS
Level 2 BLAS
Level 3 BLAS
Table 6.3: Number of memory references and floating point operations for vectors of length
n.
Table 6.4: Some performance numbers for typical BLAS in Mflop/s for a 2.4 GHz Pentium
4.
This leads to a very low performance rate. The Level 3 BLAS dgemm performs at a good
fraction of the peak performance of the processor (4.8Gflop/s). The performance increases
with the problem size. We see from Table 6.3 that the ratio of computation to memory
accesses increases with the problem size. This ratio is analogous to a volume to surface
area effect.
6.3 Blocking
In the previous section we have seen that it is important to use Level 3 BLAS. However, in
the algorithm we have treated so far, there were no blocks. For instance, in the reduction
to Hessenberg form we applied Householder (elementary) reflectors from left and right to
a matrix to introduce zeros in one of its columns.
The essential point here is to gather a number of reflectors to a single block transfor-
mation. Let Pi = I 2ui ui , i = 1, 2, 3, be three Householder reflectors. Their product
is
So, if e.g. three rotations are to be applied on a matrix in blocked fashon, then the three
Householder vectors u1 , u2 , u3 have to be found first. To that end the rotations are first
applied only on the first three columns of the matrix, see Fig. 6.1. Then, the blocked
rotation is applied to the rest of the matrix.
114 CHAPTER 6. LAPACK AND THE BLAS
Remark 6.1. Notice that a similar situation holds for Gaussian elimination because
1 1 1
l21 1 1 l21 1
l31 1
l32 1 = l31 l32 1 .
.. .. .. .. .. .. ..
. . . . . . .
ln1 1 ln2 1 ln1 ln2 1
An expert driver computes all or a selected subset of the eigenvalues and (optionally)
eigenvectors. If few enough eigenvalues or eigenvectors are desired, the expert driver
is faster than the simple driver.
6.4. LAPACK SOLVERS FOR THE SYMMETRIC EIGENPROBLEMS 115
A divide-and-conquer driver solves the same problem as the simple driver. It is much
faster than the simple driver for large matrices, but uses more workspace. The name
divide-and-conquer refers to the underlying algorithm.
A relatively robust representation (RRR) driver computes all or (in a later release)
a subset of the eigenvalues, and (optionally) eigenvectors. It is the fastest algorithm
of all (except for a few cases), and uses the least workspace. The name RRR refers
to the underlying algorithm.
This computation proceeds in the following stages:
1. The real symmetric or complex Hermitian matrix A is reduced to real tridiagonal
form T . If A is real symmetric this decomposition is A = QT QT with Q orthogonal
and T symmetric tridiagonal. If A is complex Hermitian, the decomposition is
A = QT QH with Q unitary and T , as before, real symmetric tridiagonal.
2. Eigenvalues and eigenvectors of the real symmetric tridiagonal matrix T are com-
puted. If all eigenvalues and eigenvectors are computed, this is equivalent to factor-
izing T as T = SS T , where S is orthogonal and is diagonal. The diagonal entries
of are the eigenvalues of T , which are also the eigenvalues of A, and the columns
of S are the eigenvectors of T ; the eigenvectors of A are the columns of Z = QS, so
that A = ZZ T (ZZ H when A is complex Hermitian).
In the real case, the decomposition A = QT QT is computed by one of the routines
sytrd, sptrd, or sbtrd, depending on how the matrix is stored. The complex analogues
of these routines are called hetrd, hptrd, and hbtrd. The routine sytrd (or hetrd)
represents the matrix Q as a product of elementary reflectors. The routine orgtr (or
in the complex case unmtr) is provided to form Q explicitly; this is needed in particular
before calling steqr to compute all the eigenvectors of A by the QR algorithm. The
routine ormtr (or in the complex case unmtr) is provided to multiply another matrix by
Q without forming Q explicitly; this can be used to transform eigenvectors of T computed
by stein, back to eigenvectors of A.
For the names of the routines for packed and banded matrices, see [1].
There are several routines for computing eigenvalues and eigenvectors of T , to cover the
cases of computing some or all of the eigenvalues, and some or all of the eigenvectors. In
addition, some routines run faster in some computing environments or for some matrices
than for others. Also, some routines are more accurate than other routines.
steqr This routine uses the implicitly shifted QR algorithm. It switches between the QR
and QL variants in order to handle graded matrices. This routine is used to compute
all the eigenvalues and eigenvectors.
sterf This routine uses a square-root free version of the QR algorithm, also switching
between QR and QL variants, and can only compute all the eigenvalues. This
routine is used to compute all the eigenvalues and no eigenvectors.
stedc This routine uses Cuppens divide and conquer algorithm to find the eigenvalues and
the eigenvectors. stedc can be many times faster than steqr for large matrices
but needs more work space (2n2 or 3n2 ). This routine is used to compute all the
eigenvalues and eigenvectors.
stegr This routine uses the relatively robust representation (RRR) algorithm to find eigen-
values and eigenvectors. This routine uses an LDLT factorization of a number of
116 CHAPTER 6. LAPACK AND THE BLAS
translates T I of T , for one shift near each cluster of eigenvalues. For each
translate the algorithm computes very accurate eigenpairs for the tiny eigenvalues.
stegr is faster than all the other routines except in a few cases, and uses the least
workspace.
stebz This routine uses bisection to compute some or all of the eigenvalues. Options
provide for computing all the eigenvalues in a real interval or all the eigenvalues
from the ith to the jth largest. It can be highly accurate, but may be adjusted to
run faster if lower accuracy is acceptable.
stein Given accurate eigenvalues, this routine uses inverse iteration to compute some or
all of the eigenvectors.
2. ABz = z
3. BAz = z
where A and B are symmetric or Hermitian and B is positive definite. For all these
problems the eigenvalues are real. The matrices Z of computed eigenvectors satisfy
Z T AZ = (problem types 1 and 3) or Z 1 AZ T = I (problem type 2), where is a
diagonal matrix with the eigenvalues on the diagonal. Z also satisfies Z T BZ = I (problem
types 1 and 2) or Z T B 1 Z = I (problem type 3).
There are three types of driver routines for generalized symmetric and Hermitian eigen-
problems. Originally LAPACK had just the simple and expert drivers described below,
and the other one was added after an improved algorithm was discovered.
a simple driver computes all the eigenvalues and (optionally) eigenvectors.
an expert driver computes all or a selected subset of the eigenvalues and (optionally)
eigenvectors. If few enough eigenvalues or eigenvectors are desired, the expert driver
is faster than the simple driver.
a divide-and-conquer driver solves the same problem as the simple driver. It is much
faster than the simple driver for large matrices, but uses more workspace. The name
divide-and-conquer refers to the underlying algorithm.
Notice that most of the lines (indicated by ) contain comments. The initial comment
lines also serve as manual pages. Notice that the code only looks at one half (upper or
lower triangle) of the symmetric input matrix. The other triangle is used to store the
Householder vectors. These are normed such that the first component is one,
In the main loop of dsytrd there is a call to a subroutine dlatrd that generates a
block reflektor. (The blocksize is NB.) Then the block reflector is applied by the routine
122 CHAPTER 6. LAPACK AND THE BLAS
dsyr2k.
Directly after the loop there is a call to the unblocked dsytrd named dsytd2 to deal
with the first/last few (<NB) rows/columns of the matrix. This excerpt concerns the
situation when the upper triangle of the matrix A is stored. In that routine the mentioned
loop looks very much the way we derived the formulae.
ELSE
*
* Reduce the lower triangle of A
*
DO 20 I = 1, N - 1
*
* Generate elementary reflector H(i) = I - tau * v * v
* to annihilate A(i+2:n,i)
*
CALL DLARFG( N-I, A( I+1, I ), A( MIN( I+2, N ), I ), 1,
$ TAUI )
E( I ) = A( I+1, I )
*
IF( TAUI.NE.ZERO ) THEN
*
* Apply H(i) from both sides to A(i+1:n,i+1:n)
*
A( I+1, I ) = ONE
*
* Compute x := tau * A * v storing y in TAU(i:n-1)
*
CALL DSYMV( UPLO, N-I, TAUI, A( I+1, I+1 ), LDA,
$ A( I+1, I ), 1, ZERO, TAU( I ), 1 )
*
* Compute w := x - 1/2 * tau * (x*v) * v
*
ALPHA = -HALF*TAUI*DDOT( N-I, TAU( I ), 1, A( I+1, I ),
$ 1 )
CALL DAXPY( N-I, ALPHA, A( I+1, I ), 1, TAU( I ), 1 )
*
* Apply the transformation as a rank-2 update:
* A := A - v * w - w * v
*
CALL DSYR2( UPLO, N-I, -ONE, A( I+1, I ), 1, TAU( I ), 1,
$ A( I+1, I+1 ), LDA )
*
A( I+1, I ) = E( I )
END IF
D( I ) = A( I, I )
TAU( I ) = TAUI
20 CONTINUE
D( N ) = A( N, N )
END IF
Bibliography
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D.
Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov,
and D. Sorensen, LAPACK Users Guide - Release 2.0, SIAM, Philadel-
phia, PA, 1994. (Software and guide are available from Netlib at URL
http://www.netlib.org/lapack/).
BIBLIOGRAPHY 123
[2] L. S. Blackford, J. Choi, A. Cleary, E. DAzevedo, J. Demmel,
I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet,
K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK Users Guide,
SIAM, Philadelphia, PA, 1997. (Software and guide are available at URL
http://www.netlib.org/scalapack/).
[5] , A set of level 3 basic linear algebra subprograms, ACM Trans. Math. Softw., 16
(1990), pp. 117.
[7] , An extended set of fortran basic linear algebra subprograms: Model implemen-
tation and test programs, ACM Transactions on Mathematical Software, 14 (1988),
pp. 1832.
[9] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, Basic linear algebra sub-
programs for Fortran usage, ACM Trans. Math. Softw., 5 (1979), pp. 308325.
125
126 CHAPTER 7. VECTOR ITERATION (POWER METHOD)
7.2 Angles between vectors
Let q1 and q2 be unit vectors, cf. Fig. 7.1. The length of the orthogonal projection of q2
q2
q1
on span{q1 } is given by
s2 = k(I q1 q1 )q2 k2
= q2 (I q1 q1 )q2
(7.7)
= q2 q2 (q2 q1 )(q1 q2 )
= 1 c2
So, there is a number, say, , 0 2 , such that c = cos and s = sin . We call this
uniquely determined number the angle between the vectors q1 and q2 :
= (q1 , q2 ).
Definition 7.1 The angle between two nonzero vectors x and y is given by
xx y
|x y|
(7.8) = (x, y) = arcsin
I kxk2 kyk
= arccos kxkkyk .
we get
k(I q1 q1 )q2 k = k(I q2 q2 )q1 k.
This immediately leads to
with eigenvalues
(7.10) |1 | > |2 | |3 | |n |.
i=2
If we omit the normalization kx(k) k = 1, which we will do for convenience, then this
becomes v
(k) u Pn (k)
kx2 k u |x |2
sin := sin((x , e1 )) = (k) = t Pi=2 i(k) .
(k) (k)
kx k n 2
i=1 |xi |
This means that for the convergence analysis we look at the iteration (7.1), while the
actual implementation follows closely Algorithm 7.1.
From (7.1) we have
! ! !
x
(k)
0 x
(k1)
0 k x
(0)
1 1
x(k) = 1
(k) = 1
(k1) = 1
(0) .
x2 0 J2 x2 0 J2 x2
128 CHAPTER 7. VECTOR ITERATION (POWER METHOD)
Defining
1 (k)
(7.11) y(k) := x
k1
we have
(k) 1 0
y = 1 y(k1) .
0 1 J2
(0) (k)
Let us assume that y1 = 1. Then y1 = 1 for all k and
2
3
1 1 |k |
(k) (k1) .. ..
y2 = J2 y2 , J2 = . . , |k | = < 1.
1 1 |1 |
n1
n
Regarding the convergence of the vector iteration, Theorem 7.3 implies that for any
> 0 there is an integer K() such that
We will apply this theorem to the case M = 1 1 J2 , the matrix norm ||| ||| will be the
ordinary 2-norm. Thus, for any > 0 there is a K() N with
1 k
1/k
(7.14)
J2
|2 | + , k > K().
1
e1 (Y x(0) ) = (Y e1 ) x(0) 6= 0.
The first column of Y is a left eigenvector associated with 1 . Therefore, we have proved
Theorem 7.5 Let the eigenvalues of A Fnn be arranged such that |1 | > |2 |
|3 | |n |. Let u1 and v1 be right and left eigenvectors of A corresponding to 1 ,
respectively. Then, the vector sequence generated by Algorithm 7.1 converges to u1 in the
sense that
k
2
(7.15) sin = sin((x , u1 )) c
(k) (k)
1
Remark 7.1. The quantity k in Algorithm 7.1 converges to |1 |. The true value 1 C
can be found by comparing single components of y(k) and x(k1) . If 1 R then only the
sign of 1 is at stake.
Remark 7.2. The convergence of the vector iteration is faster the smaller the quotient
|2 |/|1 | is.
Remark 7.3. From (7.12) we see that the norm of the powers of a matrix goes to zero if all
40
35
30
25
||A ||
k
20
15
10
0
0 20 40 60 80 100
k
is eigenvalues are smaller than one in modulus. For small powers the norm can initially
grow considerably. In Fig. 7.2 we have plotted the norms of B k with
0.9 5
(7.16) B= .
0 0.9
130 CHAPTER 7. VECTOR ITERATION (POWER METHOD)
Remark 7.4. If v1 x(0) = 0 then the vector iteration converges to an eigenvector corre-
sponding to the second largest eigenvalue. In practice, rounding errors usually prevent
this behaviour of the algorithm. After a long initial phase the x(k) turn to u1 .
Remark 7.5. In case that 1 6= 2 but |1 | = |2 | there may be no convergence at all. An
example is
1 0 (0)
A= , x = .
0 1
hist = [ang,nan,nan];
5
eigenvalue
angle(xk, u1)
4 angle/old angle
0
0 20 40 60 80 100 120 140 160 180
Figure 7.3: Plot of three important quantities: eigenvalue, angle between eigenvector
approximation and exact eigenvector, convergence rate of eigenvector
The development of three important quantities is given in Fig. 7.3. In Fig. 7.4 the case
is depicted when the initial vector is chosen orthogonal to the left eigenvector correspond-
ing to 1 = 6. Initially, the approximated eigenvalue is 5. Because the stopping criterion
does not hold, the iteration continues until eventually rounding errors take effect.
(7.18) 1 > 2 n 0.
eigenvalue
4 angle(xk, u1)
angle/old_angle
3
0
0 50 100 150 200 250 300 350 400
Figure 7.4: Plot of three important quantities: eigenvalue, angle between eigenvector
approximation and exact eigenvector, convergence rate of eigenvector. Here, the initial
vector is chosen orthogonal to the left eigenvector corresponding to the largest eigenvalue
In order to investigate the convergence of the Rayleigh quotient we work with auxiliary
vectors
!
1 1
(7.20) y(k) = (k) = (k) x(k) .
y2 |x | 1
Notice, that any reasonable approximation of the first eigenvector e1 has a nonzero first
component. For the Rayleigh quotients we have
(y(k) ) = (x(k) ).
Now,
(k) (k) (k) (k)
(k) y(k) Ay(k) (e1 + y2 ) A(e1 + y2 ) 1 + y2 Ay2
(7.21) = (k) (k) = (k)
= (k)
y y 1 + ky2 k2 1 + ky2 k2
7.5. THE SYMMETRIC CASE 133
(k) (k)
where we used that e1 y2 = 0 and e1 Ay2 = 0. Because,
(k)
tan (k) := tan((y(k) , e1 )) = ky2 k
and
1
1 + tan2 () =
1 sin2 ()
we get from (7.21) that
(k) (k) (k) (k)
(7.22) (k) = (1 + y2 Ay2 )(1 sin2 (k) ) = 1 1 sin2 (k) + y2 Ay2 cos2 (k) .
Now, since 1 > 0,
(k) (k)
0 1 (k) = 1 sin2 (k) y2 Ay2 cos2 (k)
(7.23)
(k)
1 sin2 (k) n ky2 k2 cos2 (k) = (1 n ) sin2 (k) .
In summary, we have proved
Theorem 7.6 Let A be a symmetric matrix with spectral decomposition (7.17)(7.18).
Then, the simple vector iteration of Algorithm 7.2 computes sequences (k) k=0 and
(k)
x k=0
that converge linearly towards the largest eigenvalue 1 of A and the corre-
sponding eigenvector u1 provided that the initial vector x(0) has a nonzero component in
the direction of u1 , i.e., that u1 x(0) 6= 0. The convergence rates are given by
k 2k
2 2
(k)
sin sin , (0)
|1 | (1 n ) sin2 (0) .
(k)
1 1
where (k) = (x(k) , u1 .
Thus, the speed of convergence is determined by the ratio of the two
eigenvalues
largest
in
modulus and the quality of the initial guess x(0) . Both sequences (k) and x(k) con-
verge linearly, but the decisive ratio appears squared in the bound for the approximation
error in the eigenvalue. 1 n is called the spread of the spectrum of A. Its occurance
in the bound for max (k) shows that a simple scaling of the matrix does not affect the
convergence behavior of the algorithm.
Example 7.7 Lets compute the smallest eigenvalue and corresponding eigenvector of the
one-dimensional Poisson matrix T = Tn of Example 2.7 with n = 40. Let us assume that
we know an upper bound for the largest eigenvalue n of T then the transformed matrix
I T has the same eigenvectors as T and eigenvalues n < n1 < < 1 .
So, we apply vector iteration to compute the desired quantities.
We set = 4(n + 1)2 / 2 a number that is easily obtained by applying Gerschgorins
circle theorem. We performed a Matlab experiment starting with a random vector.
>> n=40;
>> T = (4*((n+1)^2/pi^2))*eye(n) - ((n+1)^2/pi^2)*p_1d(n);
>> rand(state,0); x0=rand(n,1);
>> [x,lam,nit]=vit(T,x0,1e-4);
>> tau-lam
ans =
0.9995
>> nit
nit =
1968
134 CHAPTER 7. VECTOR ITERATION (POWER METHOD)
In as many as 1968 iteration steps we arrived at an eigenvalue approximation 0.9995.
This number is correct to all digits. The difference to the eigenvalue 1 of the continuous
eigenvalue problem u (x) = u(x) is due to the discretization error. Figure 7.5 shows
the convergence history of this calculation. The straight lines show the actual angle (k)
2
10
0
10
2
10
4
10
6
10
8
10
10
10
0 200 400 600 800 1000 1200 1400 1600 1800 2000
between x(k) and u1 (above) and the actual error (k) 1 . These quantities can of course
not be computed in general. In this example we know them, see Ex. 2.7. The dotted
lines show powers of q = ( 2 )/( 1 ) that indicate the convergence rates given by
Theorem 7.6. Here, q = 0.9956. Clearly, the convergence is as predicted.
Example 7.8 We mentioned that a good initial vector can reduce the number of iteration
steps. Remember that the smallest eigenfunction is sin x, a function that is positive on
the whole interval (0, ). Let us therefore set x(0) to be the vector of all ones.
>> x0 = ones(n,1);
>> [x,lam,nit]=vit(T,x0,1e-4);
>> nit
nit =
866
This is a surprisingly high reduction in the number of iteration steps. Figure 7.6 shows
the convergence history of this calculation. Again the doted lines indicate the convergence
rates that are to be expected. The actual convergence rates are evidently much better.
How can that be?
The eigenvectors of Tn resemble the eigenfunctions sin kx of the continuous eigen-
value problem. Therefore the coefficients corresponding to eigenvectors corresponding to
eigenfunctions antisymmetric with respect to the point /2 vanish. In particular x2 = 0.
Therefore the convergence rate is not q = ( 2 )/( 1 ) but q = ( 3 )/( 1 ). This
is verified by the numbers given in Fig. 7.7 where the assymptotic corrected convergence
rates q and q2 are indicated.
7.6. INVERSE VECTOR ITERATION 135
2
10
0
10
2
10
4
10
6
10
8
10
10
10
0 100 200 300 400 500 600 700 800 900
Figure 7.6: Simple vector iteration with I40 T40 and starting vector (1, 1, . . . , 1)T
Problem 7.9 When computing the smallest eigenvalue of Tn by the simple vector iter-
ation we can find a better shift than above if the extremal points of the spectrum are
known. Determine such that In Tn exhibits the optimal convergence rate. Hint: On
the one hand we would like the quotient ( n1 )/( n ) to be as small as possible. On
the other hand | 1 |/( n ) must not become to big. Hint: Equate the two quantities.
The iteration converges towards the eigenvector with eigenvalue closest to . A linear
system of equations has to be solved in each iteration step. Of course only one Cholesky
or LU factorization has to be computed as the shift remains constants in all iterations.
The stopping criterion is changed into
0
10
2
10
4
10
6
10
8
10
10
10
0 100 200 300 400 500 600 700 800 900
Figure 7.7: Simple vector iteration with I40 T40 and starting vector (1, 1, . . . , 1)T
Then, provided
(k) that u1 x(0) 6= 0, the inverse vector iteration of Algorithm 7.6 constructs
sequences k=0
and x(k) k=0 that converge linearly towards that eigenvalue 1 clos-
est to the shift and to the corresponding eigenvector u1 , respectively. The bounds
1 k 1 2k 2 (0)
sin
(k) sin ,
(0)
1
(k) sin .
2 2
Table 7.1: Computing the lowest eigenvalue of the one-dimensinal Poisson equation by
inverse iteration
>> x = (T - lam(20)*eye(n))\e;
>> y = (T - lam(21)*eye(n))\e;
>> x = x/norm(x); y = y/norm(y);
>> x*y
ans =
0.00140329005834
>> norm((T - lam(20)*eye(n))*x)
ans =
7.325760095786749e-15
>> norm((T - lam(21)*eye(n))*y)
ans =
7.120036319503636e-15
The computed vectors x and y are good approximations in the sense that they give small
residuals. However, the two vectors are nor mutually orthogonal at all. We try to improve
orthogonality by apply a second step of inverse iteration
>> x = (T - lam(20)*eye(n))\x;
>> y = (T - lam(21)*eye(n))\y;
>> x = x/norm(x); y = y/norm(y);
>> x*y
ans =
-1.313592004487587e-05
>> y = y - x*(x*y);
>> x*y
ans =
-2.155571068436496e-17
>> norm((T - lam(21)*eye(n))*y)
ans =
1.557058217172078e-15
>> norm((T - lam(20)*eye(n))*x)
ans =
4.117116818055497e-16
This helped. The two eigenvectors are now perpendicular on each other, and the residuals
are still fine.
we can compute eigenvectors corresponding to any (simple and well separated) eigen-
value if we choose the shift properly, and that
A I = U V , = diag(1 , . . . , n ), with 1 n 0.
The iteration performs an ordinary vector iteration for the eigenvalue problem
1
(7.28) (A B)1 Bx := x, = .
Thus, the iteration (7.27) converges to the largest eigenvector of (7.28), i.e., the eigenvector
with eigenvalue closest to the shift .
In exact arithmetic, the condition u1 x(0) = = uj1 x(0) = 0 implies that all x(k) are
orthogonal to u1 , . . . , uj1 . In general, however, one has to expect rounding errors that
introduce components in the directions of already computed eigenvectors. Therefore, it is
necessary to enforce the orthogonality conditions during the iteration.
Assuming exact arithmetic, Theorem 7.10 immediately implies that
k
sin (x(k) , xj ) c1 j
2k j
(k) j
| j | c2
j
Lemma 7.13 Let q be any nonzero vector. The number that minimizes kAq qk is
the Rayleigh quotient
q Aq
(7.29) = .
q q
Proof. Let R be the Rayleigh quotient (7.29) of q 6= 0 and let C be any number.
Then we have
kAq ( + )qk2 = q A2 q (2 + + ) q Aq + | + |2 q q
= q A2 q 2 q Aq 2Re( ) q Aq + 2 q q + 2Re( ) q q + | |2 q q
(q Aq)2
= q A2 q + | |2 q q.
q q
So,
X X i
(A k I)1 uk = (A k I)1 i xi = xi
i k
i 6= i 6=
We define the gap between the eigenvalue and the rest of As spectrum by
:= min |i |.
i 6=
The assumption implies that there must be a k0 N such that | k | < 2 for all k > k0 ,
and, therefore,
|i k | > for all i 6= .
2
Using this in (7.32) gives
1 2
k(A k I)1 uk k , k > k0 .
mini 6= |i k |
1. We assume that {uk } converges. Then the limit vector u must be an eigenvector of
A in span{x} . (In general, u
is an eigenvector corresponding to the eigenvalue
that is closest to .) Thus,
k
( (uk ))k(A k I)1 uk k | | = 1.
u/( )k
k
2. Now we assume that {uk } does not converge. Then A has two eigenvalues of equal
distance to and the cluster points of the sequence {uk } are two vectors in the plane
7.9. RAYLEIGH QUOTIENT ITERATION 143
that is spanned by two eigenvectors corresponding to these two eigenvalues ,
xp + xq , where 6= 0, 6= 0, and 2 + 2 = 1. Their Rayleigh quotients are
(xp + xq ) = 2 p + 2 q = 2 ( ) + 2 ( ) = (2 2 ).
and, therefore,
k+1
|2 2 | < 1
3 k
k
Remark 7.6. Notice that we have not proved global convergence. Regarding this issue
consult the book by Parlett [4] that contains all of this and more.
RQI converges almost always. However, it is not clear in general towards which
eigenpair the iteration converges. So, it is wise to start RQI only with good starting
vectors. An alternative is to first apply inverse vector iteration and switch to Rayleigh
quotient iteration as soon as the iterate is close enough to the solution. For references on
this technique see [6, 1].
Remark 7.7. The Rayleigh quotient iteration is expensive. In every iteration step another
system of equations has to be solved, i.e., in every iteration step a matrix has to be
factorized. Therefore, RQI is usually applied only to tridiagonal matrices.
% Initializations
k = 0; rho = 0; ynorm = 0;
and the initial vector x = [4, 3, . . . , 4]T . The numbers obtained are
144 CHAPTER 7. VECTOR ITERATION (POWER METHOD)
k rho ynorm
1 0.6666666666666666 3.1717e+00
2 0.4155307724080958 2.9314e+01
3 0.3820048793104663 2.5728e+04
4 0.3819660112501632 1.7207e+13
5 0.3819660112501051 2.6854e+16
Bibliography
[1] C. Beattie and D. Fox, Localization criteria and containment for Rayleigh Quotient
iteration, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 8093.
[2] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore, MD, 4th ed., 2012.
[4] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs,
NJ, 1980. (Republished by SIAM, Philadelphia, 1998.).
[5] H. R. Schwarz, Methode der finiten Elemente, Teubner, Stuttgart, 3rd ed., 1991.
[6] D. B. Szyld, Criteria for combining inverse and Rayleigh Quotient iteration, SIAM
J. Numer. Anal., 25 (1988), pp. 13691375.
Chapter 8
The QR factorization in step 6 of the algorithm prevents the columns of the X (k) from
converging all to an eigenvector of largest modulus.
If the convergence criterion is satisfied then
X (k) X (k1) (X (k1) X (k) ) = E, with kEk tol.
145
146 CHAPTER 8. SIMULTANEOUS VECTOR OR SUBSPACE ITERATIONS
Xei = Xe i for i = 1, . . . , q then, for all k, we would have X (k) ei = X
(k) ei for i = 1, . . . , j.
(k)
This, in particular, means that the first columns X e1 perform a simple vector iteration.
q3
q1
q2
This would give an angle of 90o as we could chose q1 in S1 and q3 in S2 . This angle
would not change as we turn S2 around q1 . It would even stay the same if the two planes
coincided.
What if we would take the minimum in (8.1)? This definition would be equally un-
satisfactory as we could chose q1 in S1 as well as in S2 to obtain an angle of 0o . In fact,
any two 2-dimensional subspaces in 3 dimensions would have an angle of 0o . Of course,
we would like to reserve the angle of 0o to coinciding subspaces.
A way out of this dilemma is to proceed as follows: Take any vector x1 S1 and
determine the angle between x1 and its orthogonal projection (I Q2 Q2 )x1 on S2 . We
now maximize the angle by varying x1 among all non-zero vectors in S1 . In the above
3-dimensional example we would obtain the angle between x2 and x3 as the angle between
8.2. ANGLES BETWEEN SUBSPACES 147
S1 and S3 . Is this a reasonable definition? In particular, is it well-defined in the sense
that it does not depend on how we number the two subspaces? Let us now assume that
S1 , S2 Fn have dimensions p and q. Formally, the above procedure gives an angle with
= k(In Q2 Q2 )Q1 k.
Notice, that W W Fpp and W W Fqq and that both matrices have equal rank.
Thus, if W has full rank and p < q then < = /2. However if p = q then W W and
W W have equal eigenvalues, and, thus, = . In this most interesting case we have
1 + Q1 Q1 q
x=q 2 + (In Q1 Q1 )
q2 =: q1 + q2
Proof. Because
kQ2 Q2 Q1 Q1 k = k(I Q2 Q2 ) (I Q1 Q1 )k
the claim immediately follows from Lemma 8.3.
This means that the eigenvalues are arranged on the diagonal of the Jordan block matrix
J in the order given in (8.7).
In fact as we can either analyze the original iteration {X (k) } in the canonical coordi-
nate system or the iteration {Y (k) } = {U X (k) } in the coordinate system generated by
the (generalized) eigenvectors we assume that A itself is a Jordan block matrix with its
diagonal elements arranged as in (8.7).
The invariant subspace of A associated with the p largest (or dominant) eigenvalues is
given by R(Ep ) where Ep = [e1 , . . . , ep ]. We are now going to show that the angle between
R(X (k) ) and R(Ep ), tends to zero as k goes to .
From Problem 8.1 we know that
(k)
From (8.7) we know that J1 is nonsingular. Let us also assume that X1 = Ep X (k) is
invertible. This means, that X (k) has components in the direction of the invariant subspace
associated with the p dominant eigenvalues. Then, with Problem 8.1,
" #
(0)
(k) k (0) J1k X1 Ip k (0) (k) k (0) (0)
1 k
(8.9) X R = A X = (0) = (k) J1 X1 , S := J2 X 2 X 1 J1 .
J2k X2 S
(k) (0)
Notice that X1 is invertible if X1 is so. (8.8) and (8.9) imply that
Likewise, we have
1
cos (k) = kEp X (k) k = p .
1 + kS (k) k2
Since (J2 ) = |p+1 | and (J11 ) = 1/|p | we obtain with (7.13) and a few algebraic
manipulations for any > 0 that
k
p+1
(8.11) tan (k)
= kS (k)
k kJ2k kkS (0) kkJ1k k
+ tan (0) ,
p
for k > K(). Making a transformation back to a general matrix A as before Theorem 7.5
we get
Theorem 8.5 Let Up , Vp Fnp , Up Up = Vp Vp = Ip , be matrices that span the right and
left invariant subspace associated with the dominant p eigenvalues 1 , . . . , p of A. Let
X (0) Fnp be such that Vp X (0) is nonsingular. Then, if |p | < |p+1 | and > 0, the
iterates X (k) of the basic subspace iteration with initial subpace X (0) converges to Up , and
k
p+1
(8.12) tan (k)
+ tan (0) , (k) = (R(Up ), R(X (k) ))
p
Theorem 8.6 Let Up := [u1 , . . . , up ] be the matrix formed by the eigenvectors correspond-
ing to the p dominant eigenvalues 1 , . . . , p of A. Let X (0) Fnp be such that Up X (0) is
nonsingular. Then, if |p | < |p+1 |, the iterates X (k) of the basic subspace iteration with
initial subpace X (0) converges to Up , and
p+1 k
(8.13) tan (k)
tan (0) , (k) = (R(Up ), R(X (k) )).
p
150 CHAPTER 8. SIMULTANEOUS VECTOR OR SUBSPACE ITERATIONS
Let us elaborate on this result. (Here we assume that A is Hermitian or real-symmetric.
Otherwise the statements are similar modulo terms as in (8.12).) Let us assume that
not only Wp := Up X is nonsingular but that each principal submatrix
w11 w1j
.. , 1 j p,
Wj := ... .
wj1 wjj
(k) (k)
of Wp is nonsingular. Then we can apply Theorem 8.6 to each set of columns [x1 , . . . , xj ],
1 j p, provided that |j | < |j+1 |. If this is the case, then
(k) j+1 k
(8.14) tan j tan (0) ,
j j
Proof. Recall that the sine of the angle between two subspaces S1 , S2 of equal dimension is
(k) (k)
the norm of the projection on S2 restricted to S1 , see (8.6). Here, S1 = R([xq , . . . , xp ])
and S2 = R([uq , . . . , up ]).
Let x S1 with kxk = 1. The orthogonal projection of x on S2 reflects the fact, that
y R([uq , . . . , up ]) implies that y R([u1 , . . . , up ]) as well as y R([u1 , . . . , uq ]) ,
Uq1 Uq1 x + (I Up Up )x.
To estimate the norm of this vector we make use of Lemmata 8.4 and (8.10),
1/2
kUq1 Uq1 x + (I Up Up )xk = kUq1 Uq1
xk2 + k(I Up Up )xk2
1/2 n o
(k) (k)
sin2 q1 + sin2 (k)
p 2 max sin q1 , sin (k)
p
n o
(k)
2 max tan q1 , tan (k)
p .
Then, inequality (8.15) is obtained by applying (8.14) that we know to hold true for both
j = q1 and j = p.
Corollary 8.8 Let X Fnp . Let |j1 | > |j | > |j+1 | and let Wj1 and Wj be
nonsingular. Then
( )
(k) j k j+1 k
(8.16) sin (xj , uj ) c max , .
j1 j
Example 8.9 Let us see how subspace iteration performs with the matrix
if we iterate with 5 vectors. The critical quotients appearing in Corollary 8.8 are
8.3. CONVERGENCE OF BASIC SUBSPACE ITERATION 151
j 1 2 3 4 5
.
|j+1 |/|j | 1/3 3/4 2/3 3/5 2/3
(k)
So, according to (8.16), the first column x1 of X (k) R405 should converge to the first
(k) (k)
eigenvector at a rate 1/3, x2 and x3 should converge at a rate 3/4 and the last two
columns should converge at the rate 2/3. The graphs in Figure 8.2 show that convergence
0
10
2
10
4
10
6
10
8
10
10
sin(1)
10 sin(2)
sin(3)
sin(4)
12
10 sin(5)
14
10
0 5 10 15 20 25
(k) (k)
since kxj k = 1. Let xj = u + u , where u is the eigenvalue corresponding to j . Then,
(k) (k)
since u = xj cos and u = xj sin for a (k) , we have
Thus,
(k+1) 2 ( k )!
(k+1) 21 j j j+1 k
|j j | (k+1)
sin2 (k) = O max , .
j + j j1 j
152 CHAPTER 8. SIMULTANEOUS VECTOR OR SUBSPACE ITERATIONS
So, the numbers presented in Table 8.1 reflect quite accurately the predicted rates. The
numbers in column 6 are a little too small, though.
The convergence criterion
(k1)
max k(I X (k) X (k) )xi k = 105
1ip
where k
(k) j
sij = sij , 1 i n p, 1 j p.
p+i
But we have
Ip
min sin (ui , x) sin ui , U ei ,
xR(X (k) ) S(k)
0
..
.
0
1
0
Ip
=
(I ui ui )U
/
(k) ei
..
S
.
s1i (i /p+1 )k
..
.
snp,i (i /n ) k
Xn k
i
(I ui ui ) ui + sjp,i uj
p+j
j=p+1
v v
unp unp
uX
t 2 2k
i i k u t
X
= sji 2k s2ji .
p+j p+1
j=1 j=1
9: end for
(k)
Remark 8.1. The columns xi of X (k) are called Ritz vectors and the eigenvalues
(k) (k)
1 p in the diagonal of are called Ritz values. According to the Rayleigh-
8.4. ACCELERATING SUBSPACE ITERATION 155
Ritz principle 2.32 we have
(k)
i i 1 i p, k > 0.
The solution of the full eigenvalue problems H (k) y = y is solved by the symmetric
QR algorithm.
The computation of the matrix H (k) is expensive as matrix-vector products have to
be executed. The following considerations simplify matters. We write X (k) in the form
Furthermore, the columns of Z (k) G(k) are the Ritz vectors in R(Ak X) of A2 ,
2
G(k) Z (k) A2 Z (k) G(k) = (k) ,
where (k) is a diagonal matrix. Using the definition of Z (k) in Algorithm 8.2 we see that
2
G(k) X (k1) X (k1) G(k) = G(k) G(k) = (k) ,
and that Y (k) := G(k) (k) is orthogonal. Substituting into (8.18) gives
2
Y (k) Z (k) Z (k) Y (k) = (k) .
So, the columns of Y (k) are the normalized eigenvectors of H (k) := Z (k) Z (k) .
Thus we obtain a second variant of the inverse subspace iteration with Rayleigh-Ritz
step.
Remark 8.2. An alternative to Algorithm 8.3 is the subroutine ritzit, that has been
programmed by Rutishauser [5] in ALGOL, see also [3, p.293].
We are now going to show that the Ritz vectors converge to the eigenvectors, as
Theorem 8.10 lets us hope. First we prove
Lemma 8.11 ([3, p.222]) Let y be a unit vector and F. Let be the eigenvalue of A
closest to and let u be the corresponding eigenvector. Let
:= min |i (A) |
i (A)6=
156 CHAPTER 8. SIMULTANEOUS VECTOR OR SUBSPACE ITERATIONS
and let = (y, u). Then
kr(y)k kAy yk
sin := ,
where r(y, ) = Ay y plays the role of a residual.
Proof. We write y = u cos + v sin with kvk = 1. Then
with
Hk = 2k
p+1 S
(0)
k 2 2 k (0)
2 (2 I)2 S .
8.4. ACCELERATING SUBSPACE ITERATION 157
As the largest eigenvalue of 1
2 is 1/p+1 , Hk is bounded,
kHk k c1 k > 0.
Thus,
k k !
1 1
1 Hk 1 t 0.
p+1 p+1
(8.21) B A2 Bt = 2 t.
= , = 2
i ,
(k) (k)
1/2
(k) 1/2
y = Ip + S S ei /
Ip + S
(k)
S ei
,
1/2
1/2
(k) (k)
(k) (k)
u = Ip + S S t i /
Ip + S S ti
.
Now we have
1/2
1/2
2 (k) (k)
(k) (k)
2
kr(y)k =
(B A B i I) Ip + S S
ei
Ip + S S ei
1 h i
(k) (k) (k) Ip 1 (k) (k)
ei
2 2
Ip + S S Ip , S UA U (k) 2 Ip + S S
S i
(k) (k)
1/2
k k
Ip + S S
2 2 I + 1 1 Hk 1 1
1 i p+1 p+1 ei
k k
1
p+1 1 H k 1
p+1 1 ei
k
k
k
1
1 ei
c1 p i k
1
p+1 1
kHk k
p+1
.
p+1 p+1
158 CHAPTER 8. SIMULTANEOUS VECTOR OR SUBSPACE ITERATIONS
Then, Lemma 8.11 implies that
1/2 1/2 c k
(k) (k) (k) (k) (k) (k) 1 i
sin (xi , yi ) = sin Ip + S S t i , Ip + S S ei .
p+1
By consequence,
k
(k) i
(xi , ui ) c2
p+1
must be true for a constant c2 independent of k.
As earlier, for the eigenvalues we can show that
2k
(k) i
|i i | c3 .
p+1
A numerical example
For the previous example that is concerned with the accustic vibration in the interior of a
car the numbers listed in Table 8.2 are obtained. The quotients 2 /
2 , that determine
i p+1
the convergence behavior of the eigenvalues are
1 /
( 6 )2 = 0.004513, 4 /
( 6 )2 = 0.2045,
2 /
( 6 )2 = 0.02357, 5 /
( 6 )2 = 0.7321.
3 /
( 6 )2 = 0.1362,
The numbers in the table confirm the improved convergence rate. The convergence rates of
the first four eigenvalues have improved considerably. The predicted rates are not clearly
visible, but they are approximated quite well. The convergence rate of the fifth eigenvalue
(k) (k)
has not improved. The convergence of the 5-dimensional subspace R([x1 , . . . , x5 ]) to
the searched space R([u1 , . . . , u5 ]) has not been accelerated. Its convergence rate is still
5 /6 according to Theorem 8.5
(k)
By means of the Rayleigh-Ritz step we have achieved that the columns xi = x(k)
converge in an optimal rate to the individual eigenvectors of A.
AI = A0 = AX0 = Y1 = X1 R1 (SV I)
A1 = X1 AX1 = X1 X1 R1 X1 = R1 X1 (QR)
AX1 = Y2 = X2 R2 (SV I)
A1 = X1 Y2 = X1 X2 R2 (QR)
A2 = R2 X1 X2 (QR)
= X2 X1 X1 X2 R2 X1 X2 = X2 AX2 (QR)
| {z }
A1
This holds for all p. We therefore can interpret the QR algorithm as a nested sub-
space iteration. There is also a relation to simultaneous inverse vector iteration! Let us
assume that A is invertible. Then we have,1
1
Notice that A = (A1 ) = (A )1 .
8.6. ADDENDUM 161
Then,
u
,
.. ..
k
Xk [e , . . . , en ] . . = A X0 [e , . . . , en ]
u
n, u
n,n
By consequence, the last n + 1 columns of Xk execute a simultaneous inverse
vector iteration. This holds for all . Therefore, the QR algorithm also performs a nested
inverse subspace iteration.
8.6 Addendum
Let A = H be an irreducible Hessenberg matrix and W1 = [w1 , . . . , wp ] be a basis of the
p-th dominant invariant subspace of H ,
H W1 = W1 S, S invertible.
Notice that the p-th dominant invariant subspace is unique if |p | > |p+1 |.
Let further X0 = [e1 , . . . , ep ]. Then we have the
Bibliography
[1] Z. Bai and G. W. Stewart, Algorithm 776. SRRIT A FORTRAN subroutine
to calculate the dominant invariant subspace of a nonsymmetric matrix, ACM Trans.
Math. Softw., 23 (1997), pp. 494513.
[2] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore, MD, 2nd ed., 1989.
[3] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs,
NJ, 1980. (Republished by SIAM, Philadelphia, 1998.).
162 CHAPTER 8. SIMULTANEOUS VECTOR OR SUBSPACE ITERATIONS
[4] B. N. Parlett and W. G. Poole, A geometric theory for the QR, LU, and power
iterations, SIAM J. Numer. Anal., 8 (1973), pp. 389412.
Krylov subspaces
9.1 Introduction
In the power method or in the inverse vector iteration we computed, up to normalization,
sequences of the form
x, Ax, A2 x, . . .
The information available at the k-th step of the iteration is the single vector x(k) =
(0)
A x/kA xk. One can pose the question if discarding all the previous information x , . . . , x(k1)
k k
is not a too big waste of information. This question is not trivial to answer. On one hand
there is a big increase of memory requirement, on the other hand exploiting all the infor-
mation computed up to a certain iteration step can give much better approximations to
the searched solution. As an example, let us consider the symmetric matrix
2 1
2 1 2 1
51 .. .. ..
T = . . . R5050 .
1 2 1
1 2
the lowest eigenvalue of which is around 1. Let us choose x = [1, . . . , 1] and compute the
first three iterates of inverse vector iteration, x, T 1 x, and T 2 x. We denote their Rayleigh
(k) (k) (k)
k (k) 1 2 3
1 10.541456 10.541456
2 1.012822 1.009851 62.238885
3 0.999822 0.999693 9.910156 147.211990
(k)
Table 9.1: Ritz values j vs. Rayleigh quotients (k) of inverse vector iterates.
(k)
quotients by (1) , (2) , and (3) , respectively. The Ritz values j , 1 j k, obtained
with the Rayleigh-Ritz procedure with Kk (x) = span(x, T 1 x, . . . , T 1k x), k = 1, 2, 3,
are given in Table 9.1. The three smallest eigenvalues of T are 0.999684, 3.994943, and
(3)
8.974416. The approximation errors are thus (3) 1 0.000 14 and 1 1 0.000 009,
which is 15 times smaller.
These results immediately show that the cost of three matrix vector multiplications
can be much better exploited than with (inverse) vector iteration. We will consider in this
163
164 CHAPTER 9. KRYLOV SUBSPACES
section a kind of space that is very often used in the iterative solution of linear systems
as well as of eigenvalue problems.
generated by the vector x Fn is called a Krylov matrix. Its columns span the Krylov
(sub)space
n o
(9.2) Km (x) = Km (x, A) := span x, Ax, A2 x, . . . , A(m1) x = R (K m (x)) Fn .
The Arnoldi and Lanczos algorithms are methods to compute an orthonormal basis
of the Krylov space. Let
h i
x, Ax, . . . , Ak1 x = Q(k) R(k)
be the QR factorization of the Krylov matrix K m (x). The Ritz values and Ritz vectors
of A in this space are obtained by means of the k k eigenvalue problem
(9.3) Q(k) AQ(k) y = (k) y.
(k) (k)
If (j , yj ) is an eigenpair of (9.3) then (j , Q(k) yj ) is a Ritz pair of A in K m (x).
The following properties of Krylov spaces are easy to verify [1, p.238]
Notice that the scaling and translation invariance hold only for the Krylov subspace, not
for the Krylov matrices.
What is the dimension of Km (x)? It is evident that for n n matrices A the columns
of the Krylov matrix K n+1 (x) are linearly dependent. (A subspace of Fn cannot have a
dimension bigger than n.) On the other hand if u is an eigenvector corresponding to the
eigenvalue then Au = u and, by consequence, K2 (u) = span{u, Au} = span{u} =
K1 (u). So, there is a smallest m, 1 m n, depending on x such that
K1 (x) 2 m
6= K (x) 6= 6= K (x) = K
m+1
(x) =
so-called
QVandermonde matrices, are nonsingular if the i are different (their determinant
equals i6=j (i j )) the Krylov matrices K j (x) are nonsingular for j m. Thus for
diagonalizable matrices A we have
where m is the number of eigenvectors needed to represent x. The subspace Km (x) is the
smallest invariant space that contains x.
Let m be the smallest index for which Km (x) = Km+1 (x). Then, for j m the mapping
X X
P j1 ci i ci Ai x Kj (x)
of A provides the Ritz values of A in Kj (x). The columns yi of Y = QX are the Ritz
vectors.
By construction the Ritz vectors are mutually orthogonal. Furthermore,
because
Q (AQxi Qxi i ) = Q AQxi xi i = A xi xi i = 0.
It is easy to represent a vector in Kj (x) that is orthogonal to yi .
(9.9) (A)x yk (k ) = 0.
k (A)x
(9.11) yk = ,
kk (A)xk
9.3. POLYNOMIAL REPRESENTATION OF KRYLOV SUBSPACES 167
as k () = 0 for all i , i 6= k. According to Lemma 9.2 k (A)x is perpendicular to all yi
with i 6= k. Further,
Lemma 9.3 ([1, p.241]) For each Pj1 and each i j m the Rayleigh quotient
((A)x) (A i I)((A)x)
((A)x; A i I) = = ((A)x; A) i
k(A)xk2
x = Ui Ui x + (I Ui Ui )x = cos g + sin h.
k(A)hk2
(s; A i I) sin2 (m i ) .
k(A)xk2
With
m
X
ksk2 = k(A)xk2 = 2 (l )(x ul )2 2 (i ) cos2
l=1
The last inequality is important! In this step the search of a maximum in a few selected
points (i+1 , . . . , m ) is replaced by a search of a maximum in a whole interval containing
these points. Notice that i is outside of this interval. Among all polynomials of a given
degree that take a given fixed value (i ) the Chebyshev polynomial have the smallest
(j)
maximum. As i is a Ritz value, we know from the monotonicity principle 2.32 that
(j)
0 i i .
(j)
Further, from the definition of i (as an eigenvalue of A in the subspace Kj (x)),
(j)
i i (s, A i I) provided that s yl , 1 l i 1.
(j) (j)
() = ( 1 ) ( i1 )(), Pji .
and
Q
i1
m l
i l
sin
(9.16) tan (ui , projektion of ui on Kj ) l=1 .
cos Tji (1 + 2)
Proof. For proving the second part of the Theorem we write
x = g cos (x, Ui1 Ui1 x) + ui cos (x, ui ) +h sin (x, Ui Ui x) .
| {z } | {z }
(j) (j)
Theorem 9.4 can easily be rewritten to give error bounds for m j , m1 j1 ,
etc.
We see from this Theorem that the eigenvalues at the beginning and at the end of
the spectrum are approximated the quickest. For the first eigenvalue the bound (9.15)
simplifies a little,
(j) tan2 1 2 1
(9.17) 0 1 1 (m 1 ) , 1 = , 1 = (x, u1 ).
Tji (1 + 21 )2 m 2
Now, we have
1 1
2 1 1j
1 2 2 1 > 1.
1
j
1 = 2(1 + 1 ) 1 = 2
1 + 2 1 1
1=2
2 j
1 1 2
1
j
| {z }
>1
Since |Tj1 ()| grows rapidly and monotonically outside [1, 1] we have
2
1 ) Tj1 (2
Tj1 (1 + 2 1),
1
and thus
2
1 1 1
(9.19) c1
1 (j)
T (2
2
1)
1 j1 1
2 /
1 j=5 j = 10 j = 15 j = 20 j = 25
(1/Tj1 (2 2 /
1 1))2
Table 9.2: Ratio 2 /
for varying j and ratios 1 .
1 /
( 2 ) 2(j1)
Bibliography
[1] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs,
NJ, 1980. (Republished by SIAM, Philadelphia, 1998.).
[2] Y. Saad, On the rates of convergence of the Lanczos and the block Lanczos methods,
SIAM J. Numer. Anal., 17 (1980), pp. 687706.
172 CHAPTER 9. KRYLOV SUBSPACES
Chapter 10
Then {q1 , . . . , qj+1 } is an orthonormal basis of Kj+1 (x), called in general the Arnoldi
basis or, if the matrix A is real symmetric or Hermitian, the Lanczos basis. The vectors
qi are called Arnoldi vectors or Lanczos vectors, respectively, see [6, 1].
The vector qj+1 can be computed in a more economical way since
Kj+1 (x, A) = R [x, Ax, . . . , Aj x] , (q1 = x/kxk) ,
j
= R [q1 , Aq1 , . . . , A q1 ] (Aq1 = q1 + q2 , 6= 0),
= R [q1 , q1 + q2 , A(q1 + q2 ), . . . , Aj1 (q1 + q2 )] ,
= R [q1 , q2 , Aq2 , . . . , Aj1 q2 ] ,
..
.
= R ([q1 , q2 , . . . , qj1 , Aqj ]) .
173
174 CHAPTER 10. ARNOLDI AND LANCZOS ALGORITHMS
If rj = 0 then the procedure stops which means that we have found an invariant subspace,
namely span{q1 , . . . , qj }. If krj k > 0 we obtain qj+1 by normalizing,
rj
(10.4) qj+1 = .
krj k
Since, qj+1 and rj are aligned, we have
(10.3)
(10.5) qj+1 rj = krj k = qj+1 Aqj .
The last equation holds since qj+1 (by construction) is orthogonal to all the previous
Arnoldi vectors. Let
hij = qi Aqj .
Then, (10.3)(10.5) can be written as
j+1
X
(10.6) Aqj = qi hij .
i=1
The Arnoldi algorithm returns if hj+1,j = 0, in which case j is the degree of the
minimal polynomial of A relative to x, cf. (9.5). This algorithm costs k matrix-vector
multiplications, n2 /2 + O(n) inner products, and the same number of axpys.
Defining Qk = [q1 , . . . , qk ], equation (10.6) can be collected for j = 1, . . . , k,
Equation (10.7) is called Arnoldi relation. The construction of the Arnoldi vectors is
expensive. Most of all, each iteration step becomes more costly as the number of vectors
against which r has to be orthogonalized increases. Therefore, algorithms based on the
Arnoldi relation like GMRES or the Arnoldi algorithm itself are restarted. This in general
means that the algorithm is repeated with a initial vector that is extracted from previous
invocation of the algorithm.
10.2. ARNOLDI ALGORITHM WITH EXPLICIT RESTARTS 175
10.2 Arnoldi algorithm with explicit restarts
Algorithm 10.1 stops if hm+1,m = 0, i.e., if it has found an invariant subspace. The vectors
{q1 , . . . , qm } then form an invariant subspace of A,
AQm = Qm Hm , Qm = [q1 , . . . , qm ].
The eigenvalues of Hm are eigenvalues of A as well and the Ritz vectors are eigenvectors
of A.
In general, we cannot afford to store the vectors q1 , . . . , qm because of limited memory
space. Furthermore, the algorithmic complexity increases linearly in the iteration number
j. The orthogonalization would cost 2nm2 floating point operations.
Often it is possible to extract good approximate eigenvectors from a Krylov space of
small dimension. We have seen, that in particular the extremal eigenvalues and corre-
sponding eigenvectors are very well approximated after a few iteration steps. So, if only a
small number of eigenpairs is desired, it is usually sufficient to get away with Krylov space
of much smaller dimension than m.
Exploiting the Arnoldi relation (10.7) we can get cheap estimates for the eigenvalue/eigen-
(k) (k) (k)
vector residuals. Let ui = Qk si be a Ritz vector with Ritz value i . Then
(k) (k) (k) (k) (k) (k) (k) (k)
Aui i ui = AQk si i Qk si = (AQk Qk Hk )si = hk+1,k qk+1 ek si .
Therefore,
(k) (k) (k)
(10.8) k(A i I)ui k2 = hk+1,k |ek si |.
(k)
The residual norm is equal to the last component of si multiplied by hk+1,k (which is
positive by construction). These residual norms are not always indicative of actual errors
(k)
in i , but they can be helpful in deriving stopping procedures.
We now consider an algorithm for computing some of the extremal eigenvalues of a
non-Hermitian matrix. The algorithm proceeds by computing one eigenvector or rather
Schur vector at the time. For each of them an individual Arnoldi procedure is employed.
Let us assume that we have already computed k1 Schur vectors u1 , . . . uk1 . To compute
uk we force the iterates in the Arnoldi process (the Arnoldi vectors) to be orthogonal to
Uk1 where Uk1 = [u1 , . . . uk1 ]. So, we work essentially with the matrix
(I Uk1 Uk1 )A
that has k 1 eigenvalues zero which we of course neglect.
The procedure is given in Algorithm 10.2. The Schur vectors u1 , . . . uk1 are kept
in the search space, while the Krylov space is formed with the next approximate Schur
vector. The search space thus is
span{u1 , . . . uk1 , uk , Auk , . . . Amk uk }.
In Algorithm 10.2 the basis vectors are denoted vj with vj = uj for j < k. The vectors
vk , . . . , vm form an orthonormal basis of span{uk , Auk , . . . Amk uk }.
The matrix Hm for k = 2 has the structure
Hm =
176 CHAPTER 10. ARNOLDI AND LANCZOS ALGORITHMS
Algorithm 10.2 Explicitly restarted Arnoldi algorithm
1: Let A Fnn . This algorithm computes the nev largest eigenvalues of A together with
the corresponding Schur vectors.
2: Set k = 1.
3: loop
4: for j = k, . . . , m do /* Execute m k steps of Arnoldi */
5: r := Aqj ;
6: for i = 1, . . . , j do
7: hij := qi r, r := r qi hij ;
8: end for
9: hj+1,j := krk;
10: qj+1 = r/hj+1,j ;
11: end for
12: Compute approximate eigenvector of A associated with k and the corresponding
residual norm estimate k according to (10.8).
13: Orthogonalize this eigenvector (Ritz vector) against all previous vj to get the ap-
proximate Schur vector uk . Set vk := uk .
14: if k is small enough then /* accept eigenvalue */
15: for i = 1, . . . , k do
16: hik := vi Avk ;
17: end for
18: Set k := k + 1.
19: if k nev then
20: return (v1 , . . . , vk , H Fkk )
21: end if
22: end if
23: end loop
where the block in the lower right corresponds to the Arnoldi process for the Krylov space
)A).
Kmk (uk , (I Uk1 Uk1
This algorithm needs at most m basis vectors. As soon as the dimension of the search
space reaches m the Arnoldi iteration is restarted with the best approximation as the
initial vector. The Schur vectors that have already converged are locked or deflated.
(10.9) Qk AQk = Qk Qk Hk = Hk .
Tk
Ak Qk = Qk + O
The Lanczos algorithm is summarized in Algorithm 10.3. In this algorithm just the
three vectors q, r, and v are employed. In the j-th iteration step (line 8) q is assigned
qj and v stores qj1 . r stores first (line 9) Aqj j1 qj1 . Later (step 11), when j
is available, it stores rj = Aqj j1 qj1 j qj . In the computation of j the fact is
exploited that qj qj1 = 0 whence
In each traversal of the j-loop a column is appended to the matrix Qj1 to become Qj . If
the Lanczos vectors are not desired this statement can be omitted. The Lanczos vectors
are required to compute the eigenvectors of A. Algorithm 10.3 returns when j = m, where
m is the degree of the minimal polynomial of A relative to x. bm = 0 implies
(10.14) AQm = Qm Tm .
178 CHAPTER 10. ARNOLDI AND LANCZOS ALGORITHMS
Algorithm 10.3 Basic Lanczos algorithm for the computation of an orthonormal
basis for of the Krylov space Km (x)
1: Let A Fnn be Hermitian. This algorithm computes the Lanczos relation (10.13),
i.e., an orthonormal basis Qm = [q1 , . . . , qm ] for Km (x) where m is the smallest index
such that Km (x) = Km+1 (x), and (the nontrivial elements of) the tridiagonal matrix
Tm .
2: q := x/kxk; Q1 = [q];
3: r := Aq;
4: 1 := q r;
5: r := r 1 q;
6: 1 := krk;
7: for j = 2, 3, . . . do
8: v = q; q := r/j1 ; Qj := [Qj1 , q];
9: r := Aq j1 v;
10: j := q r;
11: r := r j q;
12: j := krk;
13: if j = 0 then
14: return (Q Fnj ; 1 , . . . , j ; 1 , . . . , j1 )
15: end if
16: end for
Let (i , si ) be an eigenpair of Tm ,
(m) (m) (m)
(10.15) Tm si = i si .
Then,
(m) (m) (m) (m)
(10.16) AQm si = Qm Tm si = i Qm si .
The cost of a single iteration step of Algorithm 10.3 does not depend on the index of
the iteration! In a single iteration step we have to execute a matrix-vector multiplication
and 7n further floating point operations.
Remark 10.1. In certain very big applications the Lanczos vectors cannot be stored for
reasons of limited memory. In this situation, the Lanczos algorithm is executed without
building the matrix Q. When the desired eigenvalues and Ritz vectors have been deter-
mined from (10.15) the Lanczos algorithm is repeated and the desired eigenvectors are
accumulated on the fly using (10.17).
Proof. Let
n
X
A = U U = i ui ui
i=1
be the spectral decomposition of A. Then,
n
X n
X
(A I)x = (i ui ui ui ui )x = (i )(ui x)ui .
i=1 i=1
(j)
sji is the j-th, i.e., the last element of the eigenvector matrix Sj of Tj ,
(j) (j)
Tj Sj = Sj j , j = diag(1 , , j ).
(j) |sji |
(10.19) sin (yi , z) j ,
where z is the eigenvector corresponding to in (10.18) and is the gap between and
the next eigenvalue 6= of A. In an actual computation, is not known. Parlett suggests
(j) (j) (j)
to replace by the distance of i to the next k , k 6= i. Because the i converge to
eigenvalues of A this substitution will give a reasonable number, at least in the limit.
In order to use the estimate (10.18) we need to compute all eigenvalues of Tj and the
last row of Sj . It is possible and in fact straightforward to compute this row without the
rest of Sj . The algorithm, a simple modification of the tridiagonal QR algorithm, has been
introduced by Golub and Welsch [3] in connection with the computation of interpolation
points and weights in Gaussian integration.
A numerical example
This numerical example is intended to show that the implementation of the Lanczos algo-
rithm is not as simple as it seems from the previous. Let
A = diag(0, 1, 2, 3, 4, 100000)
and
x = (1, 1, 1, 1, 1, 1)T .
The diagonal matrix A has six simple eigenvalues and x has a non-vanishing component in
the direction of each eigenspace. Thus, the Lanczos algorithm should stop after m = n = 6
iteration steps with the complete Lanczos relation. Up to rounding error, we expect that
6 = 0 and that the eigenvalues of T6 are identical with those of A. Lets see what happens
if Algorithm 10.3 is applied with these input data. in the sequel we present the numbers
that we obtained with a Matlab implementation of this algorithm.
j=1
1 = 16668.33333333334, 1 = 37267.05429136513.
j=2
2 = 83333.66652666384, 2 = 3.464101610531258.
The diagonal of the eigenvalue matrix 2 is:
2 S2,: = (1.4142135626139063.162277655014521) .
Similarly as with j = 4, the first four Ritz vectors satisfy the orthogonality condition
very well. But they are not perpendicular to the last Ritz vector.
j=6
6 = 99998.06336906151 6 = 396.6622037049789
The diagonal of the eigenvalue matrix is
0.02483483859326367
1.273835519171372
2.726145019098232
diag(6 ) =
.
3.975161765440400
9.999842654044850 10+4
1.000000000000000 10+5
The eigenvalues are not the exact ones, as was to be expected. We even have two
copies of the largest eigenvalue of A in 6 ! The last row of 6 S6 is
6 S6,: = 0.20603, 0.49322, 0.49323, 0.20604, 396.66, 8.6152 1015
10.4. THE LANCZOS PROCESS AS AN ITERATIVE METHOD 183
although theory predicts that 6 = 0. The sixth entry of 6 S6 is very small, which
means that the sixth Ritz value and the corresponding Ritz vector are good approx-
imations to an eigenpair of A. In fact, eigenvalue and eigenvector are accurate to
machine precision.
5 s6,5 does not predict the fifth column of Y6 to be a good eigenvector approximation,
although the angle between the fifth and sixth column of Y6 is less than 103 . The
last two columns of Y6 are
4.7409 104 3.3578 1017
1.8964 103 5.3735 1017
2.8447 103 7.0931 1017
1.8965 103 6.7074 1017 .
4.7414 104 4.9289 1017
0.99999 1.0000
As 6 6= 0 one could continue the Lanczos process and compute ever larger tridi-
agonal matrices. If one proceeds in this way one obtains multiple copies of certain
(j)
eigenvalues [2, 2]. The corresponding values j sji will be tiny. The corresponding
Ritz vectors will be almost linearly dependent.
From this numerical example we see that the problem of the Lanczos algorithm consists
in the loss of orthogonality among Ritz vectors which is a consequence of the loss of
orthogonality among Lanczos vectors, since Yj = Qj Sj and Sj is unitary (up to roundoff).
To verify this diagnosis, we rerun the Lanczos algorithm with complete reorthogonal-
ization. This procedure amounts to the Arnoldi algorithm 10.1. It can be accomplished
by modifying line 11 in the Lanczos algorithm 10.3, see Algorithm 10.4.
Of course, the cost of the algorithm increases considerably. The j-th step of the
algorithm requires now a matrix-vector multiplication and (2j + O(1))n floating point
operations.
j=3
3 = 2.000112002240894 3 = 1.183215957295905
The diagonal of the eigenvalue matrix is
diag(3 ) = (0.5857724375677908, 3.414199561859357, 100000.0000000000) T .
184 CHAPTER 10. ARNOLDI AND LANCZOS ALGORITHMS
j=4
4 = 2.000007428719501 4 = 1.014185105707661
0.1560868732475296
1.999987898917647
diag(4 ) =
3.843904655996084
99999.99999999999
The matrix of Ritz vectors Y4 = Q4 S4 is
0.93229 0.12299 0.03786 1.1767 1015
0.34487 0.49196 0.10234 2.4391 1015
2.7058 106 0.69693 2.7059 106 4.9558 1017
0.10233 0.49195 0.34488 2.3616 1015
0.03786 0.12299 0.93228 1.2391 1015
2.7086 1017 6.6451 1017 5.1206 1017 1.00000
j=5
5 = 2.000009143040107 5 = 0.7559289460488005
0.02483568754088384
1.273840384543175
diag(5 ) =
2.726149884630423
3.975162614480485
10000.000000000000
The Ritz vectors are Y5 =
9.91 1001 4.62 1002 2.16 1002 6.19 1003 4.41 1018
1.01 10 01 8.61 1001 1.36 1001 3.31 1002 1.12 1017
7.48 1002 4.87 1001 4.87 1001 7.48 1002 5.89 1018
3.31 1002 1.36 1001 8.61 1001 1.01 1001 1.07 1017
6.19 10 03 2.16 1002 4.62 1002 9.91 1001 1.13 1017
5.98 1018 1.58 1017 3.39 1017 5.96 1017 1.000000000000000
j=6
6 = 2.000011428799386 6 = 4.1785508667493421028
7.9503070793407461013
1.000000000000402
2.000000000000210
diag(6 ) =
3.000000000000886
4.000000000001099
9.999999999999999104
The Ritz vectors are very accurate. Y6 is almost the identity matrix are 1.0. The
largest off diagonal element of Y6T Y6 is about 1016 . Finally,
6 S6,: = 4.991029 , 2.001028 , 3.001028 , 2.001028 , 5.001029 , 1.201047 .
10.5. AN ERROR ANALYSIS OF THE UNMODIFIED LANCZOS ALGORITHM 185
With a much enlarged effort we have obtained the desired result. Thus, the loss
of orthogonality among the Lanczos vectors can be prevented by the explicit reorthogo-
nalization against all previous Lanczos vectors. This amounts to applying the Arnoldi
algorithm. In the sequel we want to better understand when the loss of orthogonality
actually happens.
(10.20) AQj Qj Tj = rj ej + Fj
where the matrix Fj accounts for errors due to roundoff. Similarly, we write
(10.21) Ij Qj Qj = Cj + j + Cj ,
where j is a diagonal matrix and Cj is a strictly upper triangular matrix (with zero
diagonal). Thus, Cj + j + Cj indicates the deviation of the Lanczos vectors from orthog-
onality.
We make the following assumptions
1. The tridiagonal eigenvalue problem can be solved exactly, i.e.,
3. Furthermore,
(10.24) kqi k = 1.
So, we assume that the computations that we actually perform (like orthogonalizations or
solving the eigenvalue problem) are accurate. These assumptions imply that j = O and
(j)
ci,i+1 = 0 for i = 1, . . . , j 1.
We premultiply (10.20) by Qj and obtain
(10.25) Qj AQj Qj Qj Tj = Qj rj ej + Qj Fj
In order to eliminate A we subtract from this equation its transposed,
Qj rj ej ej rj Qj = Qj Qj Tj + Tj Qj Qj + Qj Fj Fj Qj ,
= (I Qj Qj )Tj Tj (I Qj Qj ) + Qj Fj Fj Qj ,
(10.21)
(10.26) = (Cj + Cj )Tj Tj (Cj + Cj ) + Qj Fj Fj Qj ,
= (Cj Tj Tj Cj ) + (Cj Tj Tj Cj ) Fj Qj + Qj Fj .
| {z } | {z }
upper triangular lower triangular
186 CHAPTER 10. ARNOLDI AND LANCZOS ALGORITHMS
Fj Qj Qj Fj is skew symmetric. Therefore we have
Fj Qj Qj Fj = Kj + Kj ,
where Kj is an upper triangular matrix with zero diagonal. Thus, (10.25) has the form
. 0 Cj Tj Tj Cj 0 Kj
O ..
= ..
.
+
..
.
.
Cj Tj Tj Cj 0 Kj 0
| {z
} 0
First j 1
components
of rj Qj .
As the last component of Qj rj vanishes, we can treat these triangular matrices sepa-
rately. For the upper triangular matrices we have
Qj rj ej = Cj Tj Tj Cj + Kj .
Multiplication by si and si , respectively, from left and right gives
si Qj rj ej si = si (Cj Tj Tj Cj )si + si Kj si .
| {z } |{z} |{z}
yi j qj+1 sji
Let Gj := Si Kj Si . Then we have
(j) (j)
(10.27) j sji yi qj+1 = sji yi rj = si Cj si i i si Cj si + gii = gii .
We now multiply (10.25) with si from the left and with sk from the right. As Qj si = yi ,
we have
yi Ayk yi yk k = yi rj ej sk + si Qj Fj sk .
Now, from this equation we subtract again its transposed, such that A is eliminated,
yi yk (i k ) = yi rj ej sk yk rj ej si + si Qj Fj sk sk Qj Fj si
!
(j) (j)
(10.27) gii g
= (j)
sjk kk (j)
sji
sji sjk
1 1
+ (si Qj Fj sk + sk Fj Qj si ) (sk Qj Fj si + si Fj Qj sk )
2 2
(j) (j)
(j) sjk (j) sji (j) (j)
= gii (j) gkk (j) (gik gki ).
sji sjk
Thus we have proved
Theorem 10.2 (Paige, see [7, p.266]) With the above notations we have
(j)
(j) gii
(10.28) yi qj+1 = (j)
j sji
(j) (j)
(j) (j) (j) (j) (j) sjk (j) sji (j) (j)
(10.29) (i k )yi yk = gii (j) gkk (j) (gik gki ).
sji sjk
10.6. PARTIAL REORTHOGONALIZATION 187
then the tridiagonal matrix Tj is the projection of A onto the subspace R(Vj ),
where fj accounts for the roundoff errors committed in the j-th iteration step. Let Wj =
((ik ))1i, kj . Premultiplying equation (10.30) by qk gives
Given Wj we employ equation (10.33) to compute the j + 1-th row of Wj+1 . However,
elements j+1,j and j+1,j+1 are not defined by (10.33). We can assign values to these
two matrix entries by reasoning as follows.
We set j+1,j+1 = 1 because we explicitly normalize qj+1 .
Numerical example
We perform the Lanczos algorithm with matrix
A = diag(1, 2, . . . , 50)
k,k = 1, k = 1, . . . , j
k,k1 = k , k = 2, . . . , j
(10.35) 1
j+1,k = [k j,k+1 + (k j )jk
j
k1 j,k1 j1 j1,k ] + i,k , 1 k j.
Here, we set j,0 = 0. The values k and i,k could be defined to be random variables of
the correct magnitude, i.e., O(k). Following a suggestion of Parlett [7] we used
p
k = kAk, i,k = kAk.
0
0 0
1 1 0
1 0 1 0
189
13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 12 12 12 12 12 12 11 11 11 11 11 11 10 10 10 9 9 9 8 8 8 7 7 7 6 6 5 4 4 3 1 3 2 1 0
Table 10.1: Matlab demo on the loss of orthogonality among Lanczos vectors. Unmodified Lanczos. round(log10(abs(I-Q50Q50 )/eps))
190 CHAPTER 10. ARNOLDI AND LANCZOS ALGORITHMS
Reorthogonalization takes place in the j-th Lanczos step if maxk (j+1,k ) > macheps.
qj+1 is orthogonalized against all vectors qk with j+1,k > macheps3/4 . In the following
iteration step also qj+2 is orthogonalized against these vectors. In Table 10.2 the base-
10 logarithms of the values |wi,j |/macheps obtained with this procedure are listed where
|wi,j | = |qi qj |, 1 j i 50 and macheps 2.2 1016 . In Table 10.3 the base-10
logarithms of the estimates |i,j |/macheps are given. The estimates are too high by (only)
an order of magnitude. However, the procedure succeeds in that the resulting {qk } are
semi-orthogonal.
(We suppose, for simplicity, that Aj1 Q1 has rank p. Otherwise we would have to consider
variable block sizes.)
The approach is similar to the scalar case with p = 1: Let Q1 , . . . , Qj Rnp be
pairwise orthogonal block matrices (Qi Qk = O for i 6= k) with orthonormal columns
(Qi Qi = Ip for all i j). Then, in the j-th iteration step, we obtain the matrix AQj
and orthogonalize it against matrices Qi , i j. The columns of the matrices are obtained
by means of the QR factorization or with the GramSchmidt orthonormalization process.
We obtained the following:
191
Table 10.2: Matlab demo on the loss of orthogonality among Lanczos vectors: Lanczos with partial reorthogonalization.
round(log10(abs(I-Q50Q50 )/eps))
0
2 0
0 2 0
1 0 2 0
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 7 6 7 7 7 7 8 7 7 7 7 7 7 6 6 6 6 5 5 3 3 0 2 0
Table 10.3: Matlab demo on the loss of orthogonality among Lanczos vectors: Lanczos with partial reorthogonalization.
round(log10(abs(I-W50)/eps))
10.8. EXTERNAL SELECTIVE REORTHOGONALIZATION 193
Let Q j := [Q1 , Q2 , . . . , Qj ] be the Krylov basis generated by Algorithm 10.5. Then, in
this basis, the projection of A is the block tridiagonal matrix Tj
A1 B1
..
B1 A2 .
AQ
Q j = Tj = , Ai , Bi Rpp .
j .. ..
. . Bj1
Bj1 Aj
If matrices Bi are chosen to be upper triangular, then Tj is a band matrix with bandwidth
2p + 1!
Similarly as in scalar case, in the j-th iteration step we obtain the equation
O
..
AQj Q j Tj = Qj+1 Bj E + Fj ,
j Ej = . ,
O
Ip
where Fj accounts for the effect of roundoff error. Let (i , yi ) be a Ritz pair of A in
Kjp (Q1 ). Then
yi = Q j si , Tj si = i si .
As before, we can consider the residual norm to study the accuracy of the Ritz pair (i , yi )
of A
sj(p1)+1,i
kAyi i yi k = kAQ j si k kQj+1 Bj E si k =
j si i Q
B
..
.
j
j .
sjp+1,i
We have to compute the bottom p components of the eigenvectors si in order to test for
convergence.
Similarly as in the scalar case, the mutual orthogonality of the Lanczos vectors (i.e.,
the columns of Q j ) is lost, as soon as convergence sets in. The remedies described earlier
are available: full reorthogonalization or selective orthogonalization.
[4] R. Grimes, J. G. Lewis, and H. Simon, A shifted block Lanczos algorithm for
solving sparse symmetric generalized eigenproblems, SIAM J. Matrix Anal. Appl., 15
(1994), pp. 228272.
[6] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear
differential and integral operators, J. Res. Nat. Bureau Standards, Sec. B, 45 (1950),
pp. 255282.
[7] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs,
NJ, 1980. (Republished by SIAM, Philadelphia, 1998.).
[8] H. Simon, Analysis of the symmetric Lanczos algorithm with reorthogonalization meth-
ods, Linear Algebra Appl., 61 (1984), pp. 101132.
[9] , The Lanczos algorithm with partial reorthogonalization, Math. Comp., 42 (1984),
pp. 115142.
Chapter 11
The number of iteration steps can be very high with the Arnoldi or the Lanczos algorithm.
This number is, of course, not predictable. The iteration count depends on properties of
the matrix, in particular the distribution of its eigenvalues, but also on the initial vectors.
High iteration counts entail a large memory requirement to store the Arnoldi/Lanczos
vectors and a high amount of computation because of growing cost of the reorthogonal-
ization.
The idea behind the implicitely restarted Arnoldi (IRA) and implicitely restarted Lanc-
zos (IRL) algorithms is to reduce these costs by limiting the dimension of the search space.
This means that the iteration is stopped after a number of steps (which is bigger than
the number of desired eigenvalues), reduce the dimension of the search space without
destroying the Krylov space structure, and finally resume the Arnoldi / Lanczos iteration.
The implicitely restarted Arnoldi has first been proposed by Sorensen [7, 8]. It is imple-
mented together with the implicitely restarted Lanczos algorithms in the software package
ARPACK [4]. The ARPACK routines are the basis for the sparse matrix eigensolver eigs
in Matlab.
We start with the Algorithm 11.1 that is a variant of the Arnoldi Algorithm 10.1. It
195
196 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
executes just m Arnoldi iteration steps. We will now show how the dimension of the search
space is reduced withouth losing the information regarding the eigenvectors one is looking
for.
Remark 11.1. Step 8 in Algorithm 11.1 is classical GramSchmidt orthogonalization. As
Now we have,
rj+1 = corrected rj+1
= rj+1 Qj+1 Qj+1 rj+1
| {z }
c
= z Qj+1 Qj+1 z Qj+1 Qj+1 rj+1 = z Qj+1 (h + c)
| {z } | {z }
h c
available with
rm = m qm+1 , kqm+1 k = 1.
If m = 0 then R(Qm ) is invariant under A, i.e., Ax R(Qm ) for all x R(Qm ).
This lucky situation implies that (Hm ) m (A). So, the Ritz values and vectors are
eigenvalues and eigenvectors of A.
What we can realistically hope for is m being small. Then,
AQm rm em = (A rm qm )Qm = Qm Hm .
(11.2) AQm = Qm Hm + rm em ,
11.2. IMPLICIT RESTART 197
Algorithm 11.2 k implicit QR steps applied to Hm
1: Hm+ := H .
m
2: for i := 1, . . . , k do
Hm + := V H + V , + I =V R
where Hm
3: i m i i i i (QR factorization)
4: end for
Vm+ = .
}k
We define
Q+ +
m := Qm V ,
+
Hm := (V + ) Hm V + .
Then, from (11.2) we obtain
AQm V + = Qm V + (V + ) Hm V + + rm em V + ,
or
(11.3) AQ+ + + +
m = Qm Hm + rm em V .
As V + has k nonzero off-diagonals below the main diagonal, the last row of V + has the
form
em V + = (0, . . . , 0, , . . . , ), k + p = m.
| {z } | {z }
p1 k+1
We now simply discard the last k columns of (11.3).
AQ+ + + +
m (:, 1 : p) = Qm Hm (:, 1 : p) + rm em V (:, 1 : p)
+ +
= Q+ + +
m (:, 1 : p)Hm (1 : p, 1 : p) + hp+1,p qp+1 ep + vm,p rm ep
| {z }
p+
= Q+ + + + +
m (:, 1 : p)Hm (1 : p, 1 : p) + (qp+1 hp+1,p + rm vm,p ) ep .
| {z }
r+
p
In Algorithm 11.3 we have collected what we have derived so far. We have however left
open in step 3 of the algorithm how the shifts 1 , . . . , k should be chosen. In ARPACK [4],
all eigenvalues of Hm are computed. Those k eigenvalues that are furthest away from some
target value are chosen as shifts. We have not specified how we determine convergence,
too.
One can show that a QR step with shift i transforms the vector q1 in a multiple of
(A i I)q1 . In fact, a simple modification of the Arnoldi relation (11.2) gives
AQm = Qm Hm + rm em .
k
Y
q1 (A)q1 , () = ( i ).
i=1
(11.5) (A + E)
x =
x, E = rm qm , kEk = krm k = m .
11.4. THE GENERALIZED EIGENVALUE PROBLEM 199
According to an earlier theorem we know that if (A) is simple and is the
eigenvalue of A + E closest to , then
kEk
(11.6) | | + O(kEk2 ).
y x
Here, y and x are left and right eigenvectors of E corresponding to the eigenvalue . A
similar statement holds for the eigenvectors, but the distance (gap) to the next eigenvalue
comes into play as well.
In ARPACK, a Ritz pair (, x ) is considered converged if
As || kHm k kAk, the inequality kEk tol kAk holds at convergence. According
to (11.6) well-conditioned eigenvalues are well approximated.
(11.8) Ax = M x.
(11.10) SQm = Qm Hm + rm em , Qm M Qm = Im , Qm M rm = 0.
(11.11) SQm s = Sy = Qm Hm s + rm em s = y + rm em s.
So, y+rm em s can be considered a vector that is obtained by one step of inverse iteration.
This vector is an improved approximation to the desired eigenvector, obtained at negligible
cost. This so-called eigenvector purification is particularly important if M is singular.
Let us bound the residual norm of the purified vector. With (11.11) we have
(11.12) M y = (A M )(y + rm em s)
| {z }
y
with q
k
ykM = 2 + k2 |em s|2 .
This equality holds as y M r. By consequence,
kA
y M y
k = k(A M )
y + My
( })k
| {z
(11.13) 1
Since || is large in general, we obtain good bounds for the residual of the purified eigen-
vectors.
200 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
EIGS Find a few eigenvalues and eigenvectors of a matrix using ARPACK
D = EIGS(A) returns a vector of As 6 largest magnitude eigenvalues.
A must be square and should be large and sparse.
Example:
A = delsq(numgrid(C,15)); d1 = eigs(A,5,SM);
n = size(A,1); opts.issym = 1;
d2 = eigs(@(x)dnRk(x,C,15),n,5,SM,opts);
50
100
150
200
250
300
350
400
450
To compute the eight largest eigenvalues of this matrix we issue the following Matlab
commands.
-0.0561 - 0.0536i
202 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
0.1081 + 0.0541i
0.1081 - 0.0541i
-0.1009 - 0.0666i
-0.1009 + 0.0666i
-0.0072 + 0.1207i
-0.0072 - 0.1207i
0.0000 - 1.7007i
0.0000 + 1.7007i
-0.0866
-0.1009 - 0.0666i
-0.1009 + 0.0666i
-0.0072 + 0.1207i
-0.0072 - 0.1207i
0.1081 - 0.0541i
0.1081 + 0.0541i
0.0000 - 1.7007i
0.0000 + 1.7007i
0.0614 - 0.0465i
-0.0072 - 0.1207i
-0.0072 + 0.1207i
0.1081 + 0.0541i
0.1081 - 0.0541i
-0.1009 + 0.0666i
-0.1009 - 0.0666i
0.0000 - 1.7007i
0.0000 + 1.7007i
-0.0808
-0.0072 + 0.1207i
-0.0072 - 0.1207i
-0.1009 - 0.0666i
-0.1009 + 0.0666i
0.1081 + 0.0541i
0.1081 - 0.0541i
0.0000 + 1.7007i
0.0000 - 1.7007i
0.0734 - 0.0095i
-0.0072 + 0.1207i
-0.0072 - 0.1207i
0.1081 - 0.0541i
11.5. A NUMERICAL EXAMPLE 203
2000
1500
1000
500
500
1000
1500
2000
150 100 50 0 50 100 150
0.1081 + 0.0541i
-0.1009 - 0.0666i
-0.1009 + 0.0666i
0.0000 - 1.7007i
0.0000 + 1.7007i
-0.0747
-0.0072 - 0.1207i
-0.0072 + 0.1207i
0.1081 + 0.0541i
0.1081 - 0.0541i
-0.1009 + 0.0666i
-0.1009 - 0.0666i
0.0000 + 1.7007i
0.0000 - 1.7007i
The output indicates that eigs needs seven sweeps to compute the eigenvalues to
the default accuracy of machepskAk. The Ritz values given are the approximations of the
eigenvalues we want to compute. The complete spectrum of west0479 is given in Fig. 11.2.
Notice the different scales of the axes! Fig. 11.3 is a zoom that shows all eigenvalues except
the two very large ones. Here the axes are equally scaled. From the two figures it becomes
clear that eigs has computed the eight eigenvalues (and corresponding eigenvectors) of
largest modulus.
To compute the eigenvalues smallest in modulus we issue the following command.
dsm=eigs(west0479,8,sm);
Iteration 1: a few Ritz values of the 20-by-20 matrix:
0
204 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
200
150
100
50
50
100
150
200
200 150 100 50 0 50 100 150 200
Figure 11.3: A zoom to the center of the spectrum of matrix west0479 that excludes the
largest two eigenvalues on the imaginary axis
0
0
0
0
0
0
0
0
-0.0228 - 0.0334i
0.0444
-0.0473
0.0116 + 0.0573i
0.0116 - 0.0573i
-0.0136 - 0.1752i
-0.0136 + 0.1752i
-3.4455
5.8308
-0.0228 + 0.0334i
0.0444
-0.0473
0.0116 - 0.0573i
0.0116 + 0.0573i
-0.0136 + 0.1752i
-0.0136 - 0.1752i
-3.4455
5.8308
-0.0228 + 0.0334i
0.0444
-0.0473
0.0116 - 0.0573i
0.0116 + 0.0573i
-0.0136 + 0.1752i
-0.0136 - 0.1752i
-3.4455
5.8308
>> dsm
dsm =
0.0002
-0.0003
-0.0004 - 0.0057i
-0.0004 + 0.0057i
0.0034 - 0.0168i
0.0034 + 0.0168i
-0.0211
0.0225
206 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
0.025
0.02
0.015
0.01
0.005
0.005
0.01
0.015
0.02
0.025
0.025 0.02 0.015 0.01 0.005 0 0.005 0.01 0.015 0.02 0.025
>> 1./dsm
ans =
1.0e+03 *
5.8308
-3.4455
-0.0136 + 0.1752i
-0.0136 - 0.1752i
0.0116 + 0.0573i
0.0116 - 0.0573i
-0.0473
0.0444
>> [p,e,t]=initmesh(auto);
>> [p,e,t]=refinemesh(auto,p,e,t);
>> [p,e,t]=refinemesh(auto,p,e,t);
>> p=jigglemesh(p,e,t);
>> [A,M]=assema(p,t,1,1,0);
>> whos
Name Size Bytes Class
11.6. ANOTHER NUMERICAL EXAMPLE 207
A 1095x1095 91540 double array (sparse)
M 1095x1095 91780 double array (sparse)
e 7x188 10528 double array
p 2x1095 17520 double array
t 4x2000 64000 double array
>> sigma=-.01;
>> p=10; tol=1e-6; X0=rand(size(A,1),15);
>> [V,L] = sivit(A,M,p,X0,sigma,tol);
||Res(0)|| = 0.998973
||Res(5)|| = 0.603809
||Res(10)|| = 0.0171238
||Res(15)|| = 0.00156298
||Res(20)|| = 3.69725e-05
||Res(25)|| = 7.11911e-07
>> % 25 x 15 = 375 matrix - vektor - multiplications until convergence
>>
>> format long, L
L =
0.00000000000000
0.01269007628847
0.04438457596824
0.05663501055565
0.11663116522140
0.13759210393200
0.14273438015546
0.20097619880776
0.27263682280769
0.29266080747831
ans =
1.8382e-15
Then we use Matlabs solver eigs. We set the tolerance and the shift to be the same
as with sivit. Notice that ARPACK applies a shift-and-invert spectral transformation if
a shift is given.
>> flag
flag =
ans =
3.7671e-14
ans = 8.0575e-15
Clearly the eigenvectors are mutually m-orthogonal. Notice that eigs returns the
eigenvalues sorted from large to small such that they have to be reordered before comparing
with those sivit computed.
In the next step we compute the largest eigenvalues of the matrix
(11.14) S = R(A M )1 RT ,
x = RB*(RA\(RA\(RB*x)));
>> global RA RB
>> RA = chol(A-sigma*M);
>> RB = chol(M);
>> [v,l1,flag]=eigs(afun,n,10,lm,options);
Iteration 1: a few Ritz values of the 20-by-20 matrix:
0
0
0
0
0
0
0
0
0
0
>> flag
flag =
>> l1 = diag(l1)
l1 =
210 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
100.0000
44.0721
18.3876
15.0071
7.8970
6.7754
6.5473
4.7399
3.5381
3.3040
ans =
0.0000
0.0127
0.0444
0.0566
0.1166
0.1376
0.1427
0.2010
0.2726
0.2927
ans =
4.4047e-14
(11.15) R := AV V M, V = [v1 , . . . , vk ],
k+1 eTk ,
AQ = QH + q Q q
k+1 = 0, H Hessenberg.
AV = V SHS 1 + q
k+1 ek S 1 .
(11.16) AV = V M + R = V M + vw , M Fkk .
with some v Fn and w Fk . Let S1 , S11 = S1 , be the Householder reflector that maps
w onto a multiple of ek , S1 w = ek . Then, (11.16) becomes
AV S1 = V S1 S1 M S1 + veTk .
be the spectral decomposition of the tridiagonal matrix Tk . Then, for all i, the Ritz vector
(k)
yi = Qk si Kk (A, q)
By Theorem 11.1 we see that any set [yi1 , yi2 , . . . , yij ] of Ritz vectors forms a Krylov
space. Note that the generating vector differs for each set.
212 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
Algorithm 11.4 Thick restart Lanczos
1: Let us be given k Ritz vectors yi and a residual vector rk such that Ayi = i yi + i rk ,
i = 1, . . . , k. The value k may be zero in which case r0 is the initial guess.
This algorithm computes an orthonormal basis y1 , . . . , yj , qj+1 , . . . , qm that spans a
m-dimensional Krylov space whose generating vector is not known unless k = 0.
2: qk+1 := rk /krk k.
3: z := Aqk+1 ;
4: k+1 := qk+1 z;
Pk
5: rk+1 = z k+1 qk+1 i=1 i yi
6: k+1 := krk+1 k
7: for i = k + 2, . . . , m do
8: qi := ri1 /i1 .
9: z := Aqi ;
10: i := qi z;
11: ri = z i qi i1 qi1
12: i = kri k
13: end for
We now split the indices 1, . . . , k in two sets. The first set contains the good Ritz
vectors that we want to keep and that we collect in Y1 , the second set contains the bad
ones that we want to remove. Those we put in Y2 . In this way we get
1
(11.18) A[Y1 , Y2 ] [Y1 , Y2 ] = k+1 qk+1 [s1 , s2 ].
2
Keeping the first set of Ritz vectors and purging (deflating) the rest yields
AY1 Y1 1 = k+1 qk+1 s1 .
We now can restart a Lanczos procedure by orthogonalizing Aqk+1 against Y1 =: [y1 , . . . , yj ]
and qk+1 . From the equation
(k)
Ayi yi i = qk+1 i , i = k+1 ek si
we get
qk+1 Ay = ,
whence
j
X
(11.19) rk+1 = Aqk+1 k+1 qk+1 i yi Kk+1 (A, q.)
i=1
From this point on the Lanczos algorithm proceeds with the ordinary three-term recur-
rence. We finally arrive at a relation similar to (11.17), however, with
Qm = [y1 , . . . , yj , qk+1 , . . . , qm+kj ]
and
1 1
.. ..
. .
j j
Tm = ..
1 j k+1 .
.. ..
. . m+kj1
m+kj1 m+kj
11.8. KRYLOVSCHUR ALGORITHM 213
This procedure, called thick restart, has been suggested by Wu & Simon [11], see Al-
gorithm 11.4. It allows to restart with any number of Ritz vectors. In contrast to the
implicitly restarted Lanczos procedure, here we need the spectral decomposition of Tm .
Its computation is not an essential overhead in general. The spectral decomposition ad-
mits a simple sorting of Ritz values. We could further split the first set of Ritz pairs into
converged and unconveregd ones, depending on the value m+1 |sk,i |. If this quantity is
below a given threshold we set the value to zero and lock (deflate) the corresponding Ritz
vector, i.e., accept it as an eigenvector.
The procedure is mathematically equivalent with the implicitely restarted Lanczos al-
gorithm. In fact, the generating vector of the Krylov space span{y1 , . . . , yj , qj+1 , . . . , qm }
that we do not compute is q1 = (A j+1 I) (A m I)q1 . This restarting procedure
is probably simpler than with IRL.
The problem of losing orthogonality is similar to plain Lanczos. Wu & Simon [11] in-
vestigate the various reorthogonalizing strategies known from plain Lanczos (full, selective,
partial). In their numerical experiments the simplest procedure, full reorthogonalization,
performs similarly or even faster than the more sophisticated reorthogonalization proce-
dures.
Remark 11.2. The thick restart Lanczos procedure does not need a Krylov basis of
span{y1 , . . . , yj } or, equivalently, the tridiagonalization of
1 1
.. ..
. .
.
j j
1 j k+1
However, at the next restart, the computation of the spectral decomposition will most
probably require it.
Question: How can the arrow matrix above be tridiagonalized economically?
(11.1) AQm = Qm Hm + rm em ,
be a Schur
where Hm is Hessenberg and [Qm , rm ] has full rank. Let Hm = Sm Tm Sm
decomposition of Hm with unitary Sm and triangular Tm . Then, similarly as in the
previous section we have
(11.20) AYm = Ym Tm + rm s , Y m = Qm S m , s = em Sm .
The upper trangular form of Tm eases the analysis of the individual Ritz pairs. In par-
ticular, it admits moving unwanted Ritz values to the lower-right corner of Tm . (See the
subroutine trexc in LAPACK for details.) Similarly as in (11.18) we collect the good
and bad Ritz vectors in matrices Y1 and Y2 , respectively. In this way we get
T T12
(11.21) A[Y1 , Y2 ] [Y1 , Y2 ] 11 = k+1 qk+1 [s1 , s2 ].
T22
214 CHAPTER 11. RESTARTING ARNOLDI AND LANCZOS ALGORITHMS
Keeping the first set of Ritz vectors and purging the rest yields
In the thick-restart Lanczos procedure we have found an eigenpair as soon as k+1 |sik | is
sufficiently small. The determination of a converged subspace with the general Krylov
Schur procedure is not so easy. However, if we manage to bring s1 into the form
s1 0
s1 = =
s1 s1
i.e.,
AY1 = Y1 T11
In most cases s1 consists of a single small element or of two small elements in the case of
a complex-conjugate eigenpair of a real nonsymmetric matrix [9]. These small elements
are then declared zero and the columns in Y1 are locked, i.e., they are not altered any-
more in the future computations. Orthogonality against them has to be enforced in the
continuation of the eigenvalue computation though.
(11.22) j.
(A 1 B)1 BQj = Qj Hj + rj eT = Qj+1 H
We want to derive a KrylovSchur relation for a new shift 2 6= 1 from (11.23) for the
same space R(Yj+1 ) without accessing the matrices A or B. The tricky thing is to avoid
discard all the information gathered in the basis Yj+1 that was computed with the old
shift 1 . This is indeed possible if we replace the basis Yj+1 with a new basis Wj+1 , which
spans the same subspace as Yj+1 but can be interpreted as the orthonormal basis of a
KrylovSchur relation with the new shift 2 .
We rewrite the relation (11.23) as
Ij Tj
BYj = BYj+1 = (A 1 B)Y j+1 .
0 s
BIBLIOGRAPHY 215
Introducing the shift 2 this becomes
Ij Tj Tj
(11.24) BYj+1 + (1 2 ) = (A 2 B)Yj+1 .
0 s s
To construct a KrylovSchur relation we must get rid of the last non-zero row of the matrix
in braces in (11.24). To that end we use the QR factorization
Ij Tj Rj
+ (1 2 ) = Qj+1 .
0T s 0T
Using it we obtain
Rj Rj Tj
BYj+1 Qj+1 BW j+1 = BW R
j j = (A 2 B)W Q
j+1 j+1
0T 0T s
This equation can easily been transformed into an Arnoldi or KrylovSchur relation.
All these transformations can be executed without performing any operations on the
large sparse matrices A and B.
In a practical implementation, the mentioned procedure is combined with locking,
purging, and implicit restart. First run shifted and inverted Arnoldi with the first shift
1 . When an appropriate number of eigenvalues around 1 have converged, lock these
converged eigenvalues and purge those that are altogether outside the interesting region,
leaving an Arnoldi (11.22) or KrylovSchur recursion (11.22) for the remaining vectors.
Then introduce the new shift 2 and perform the steps above to get a new basis Wj+1
that replaces Vj+1 . Start at the new shift by operating on the last vector of this new basis
r := (A 2 B)1 Bwj+1
and get the next basis vector wj+2 in the Arnoldi recurrence with the new shift 2 . Con-
tinue until we get convergence for a set of eigenvalues around 2 , and repeat the same
procedure with new shifts until either all interesting eigenvalues have converged or all the
shifts in the prescribed frequency range have been used.
Bibliography
. Bjo
[1] A rck, Numerics of GramSchmidt orthogonalization, Linear Algebra Appl.,
197/198 (1994), pp. 297316.
[5] The Matrix Market. A repository of test data for use in comparative studies of
algorithms for numerical linear algebra. Available at URL http://math.nist.gov/
MatrixMarket/.
[6] A. Ruhe, Rational Krylov subspace method, in Templates for the solution of Algebraic
Eigenvalue Problems: A Practical Guide, Z. Bai, J. Demmel, J. Dongarra, A. Ruhe,
and H. van der Vorst, eds., SIAM, Philadelphia, PA, 2000, pp. 246249.
[8] , Implicitly restarted Arnoldi/Lanczos methods for large scale eigenvalue calcu-
lations, in Parallel Numerical Algorithms, D. E. Keyes, A. Sameh, and V. Venkatakr-
ishnan, eds., Kluwer, Dordrecht, 1997, pp. 119165. (ICASE/LaRC Interdisciplinary
Series in Science and Engineering, 4).
[11] K. Wu and H. D. Simon, Thick-restart Lanczos method for large symmetric eigen-
value problems, SIAM J. Matrix Anal. Appl., 22 (2000), pp. 602616.
Chapter 12
The Lanczos and Arnoldi methods are very effective to compute extremal eigenvalues
provided these are well separated from the rest of the spectrum. Lanczos and Arnoldi
methods combined with a shift-and-invert spectral transformation are also efficient to
compute eigenvalues in the vicinity of the shift . In this case it is necessary to solve a
system of equation
(A I)x = y, or (A M )x = y,
respectively, in each iteration step. These systems have to be solved very accurately since
otherwise the Lanczos or Arnoldi relation does not hold anymore. In most cases the matrix
A I (or A M ) is LU or Cholesky factored. The JacobiDavidson (JD) algorithm is
particularly attractive if this factorization is not feasible [11].
(12.1) AVm s Vm s v1 , . . . , vm .
(12.2) Vm AVm s = Vm Vm s
217
218 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
where DA is the diagonal of the matrix A. The vector t is then made orthogonal to the
basis vectors v1 , . . . , vm . The resulting vector, after normalization, is chosen as vm+1 by
which R(Vm ) is expanded, i.e., Vm+1 = [v1 , . . . , vm , vm+1 ].
This method is successful in finding dominant eigenvalues of (strongly) diagonally dom-
inant matrices. The matrix DA j I has therefore often been viewed as a preconditioner
for the matrix A j I. A number of investigations were made with more sophisticated
preconditioners M j I, see e.g. [7, 8]. They lead to the conclusion that M j I should
not be too close to A j I which contradicts the notion of a preconditioner as being an
easily invertible (factorizable) approximation of A j I.
This is called the Jacobi orthogonal component correction (JOCC) by Sleijpen &
van der Vorst [11]. As t uj we may split equation (12.4) in the part parallel to uj and
in the part orthogonal to uj . If kuj k = 1 then the part parallel to uj is
(12.6) j + uj At = .
which is equivalent to
(12.10) uj+1 = uj + t.
This equation has the solution t = uj . Therefore, without the condition t uj there is
no progress in solving the eigenvalue problem Ax = x.
One can argue that this is the approach suggested by Davidson [2]. Davidson approx-
imated A on the left side of (12.12) by an approximation of it, typically the diagonal,
say DA , of A. As his matrices were diagonally dominant, he solved a reasonably good
approximation of (12.12). If DA in (12.3) is considered a preconditioner of A then any
matrix closer to A should lead to better performance of the algorithm. In extremis, A
should be a possible choice for the matrix on the left. But we have just seen that this
leads to a situation without progress. In fact the progess in the iteration deteriorates the
better the preconditioner approximates the system matrix. In consequence, DA in (12.3)
must not be considered a preconditioner.
Let us now investigate what happens if the correction equation is solved exactly. To
that end we write it as
(I uj uj )(A j I)t = rj , t uj ,
(A j I)t uj uj (A j I)t = rj ,
| {z }
F
or,
(A j I)t = uj rj .
Assuming that j is not an eigenvalue of A we get
t = (A j I)1 uj (A j I)1 rj .
0 = uj (A j I)1 uj uj (A j I)1 rj ,
whence
uj (A j I)1 rj
= .
uj (A j I)1 uj
220 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
By (12.10), the next approximate is then
(12.13) uj+1 = uj + t = uj + (A j I)1 uj (A j I)1 rj = (A j I)1 uj
| {z }
uj
which is a step of Rayleigh quotient iteration! This implies a fast (quadratic in general,
cubic in the Hermitian case) convergence rate of this algorithm.
In general the correction equation
(12.14) = (I uj uj )(A j I)(I uj uj )t = rj ,
At t uj ,
is solved iteratively with a Krylov space solver like GMRES or MINRES [1]. To get
a decent performance a preconditioner is needed. Sleijpen and van der Vorst suggest
preconditioners of the form
(12.15) = (I uj u )K(I uj u ),
K K A j I.
j j
12.2.1 Restarts
Evidently, in Algorithm 12.1, the dimension m of the search space can get large. To limit
memory consumption, we limit m such that m mmax . As soon as m = mmax we restart:
Vm = Vmmax is replaced by the q Ritz vectors corresponding to the Ritz values closest to
the target . Notice that the Schur decomposition of M = Mm,m = Vm AVm is computed
already in step 11 of the Algorithm. Let M = S T S be this Schur decomposition with
|t11 | |t22 | . Then we set Vq = Vm S:,1:q , VqA = VmA S:,1:q , M = T S1:q,1:q .
Notice that the restart is easy because the JacobiDavidson algorithm is not a Krylov
space method.
(12.17) AQk = Qk Tk , Qk = [
x1 , . . . , x
k ].
is a partial Schur decomposition of A [13]. We want to extend the partial Schur de-
composition by one vector employing the JacobiDavidson algorithm. Since Schur vectors
are mutually orthogonal we can apply the JacobiDavidson algorithm in the orthogonal
complement of R(Qk ), i.e., we apply the JacobiDavidson algorithm to the matrix
(12.18) (I Qk Qk )A(I Qk Qk ), Qk = [
x1 , . . . , x
k ].
(12.20) kQ
(I Q kQ
)(A j I)(I Q )t = rj , t = 0.
Q
k k k
(12.21) = (I Q
K kQ kQ
)K(I Q ), K A j I.
k k
we have to solve
= (I Q
Kz kQ
)Kz = (I Q
kQ
)y, k.
zQ
k k
Thus,
k a.
z = K 1 y K 1 Q
z = 0,
Similarly as earlier, we determine a by means of the constraint Q k
a = (Q k )1 Q
K 1 Q K 1 y.
k k
with
Tk Qk Axk+1
Tk+1 = .
0 xk+1 A
xk+1
where Q k = [Qk , q ].
vj = (I Vj1 Vj1 )t/k(I V
9: j1 Vj1 )tk; Vj := [Vj1 , vj ].
Hj1 Vj1 wj
10: wj = Avj ; Hj = ; Wj = [Wj1 , wj ].
vj Wj1 vj wj
11: Compute the Schur decomposition of
Hj =: Sj Rj Sj
(j)
with the eigenvalues rii sorted according to their distance to .
12: /* Test for convergence */
13: repeat
(j)
14: = 1 ; q = Vj s1 ; w = Wj s1 ; r = w q
15: found := krk <
16: if found then
17: Qk+1 = [Qk , q ]; k := k + 1;
18: end if
19: until not found
20: /* Restart */
21: if j = jmax then
22: Vjmin := Vj [s1 , . . . , smin ]; Tjmin := Tj (1 : jmin , 1 : jmin );
23: Hjmin := Tjmin ; Sjmin := Ijmin ; J := jmin
24: end if
25: end while
reasons, the spectral shift in the preconditioner K is always fixed. In this way it has to
be computed just once. Notice that K is changing with each correction equation.
Remark 12.2. As long as the shift is held fixed JacobiDavidson is actually performing a
shift-and-invert Arnoldi iteration.
Algorithm 12.2 gives the framework for an algorithm to compute the partial Schur
decomposition of a matrix A. Qk stores the converged Schur vectors; Vj stores the active
search space. This algorithm does not take into account some of the just mentioned
issues. In particular the shift is always taken to be the Rayleigh quotient of the most
recent approximate q.
224 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
12.3 The generalized Hermitian eigenvalue problem
We consider the problem
(12.23) Ax = M x,
with A and M nn Hermitian, and M additionally positive definite. Then the eigenvectors
can be chosen mutually M -orthogonal,
(12.24) xi M xj = ij , Axi = i M xi , 1 i, j n,
where ij denotes the Kronecker delta function. Then it makes sense in the Jacobi
Davidson (as in other algorithms) to keep the iterates M -orthogonal.
Let Vm = [v1 , . . . , vm ] be an M -orthogonal basis of the search space Vm . Then the
Galerkin condition
(12.25) AVm s M Vm s v1 , . . . , vm ,
(12.26) Vm AVm s = Vm M Vm s = s.
u
Let (, = Vm
s) be a solution of (12.26). Then the correction t to u
must be M -
orthogonal,
(12.27) t M u
=0 (I u
u M )t = t.
(12.28) (I M u
u )(I u
)(A M u M )t = (I u
u M )
r, =
r, t M u
,
where
r = A u
u M . Preconditioners for the secular equation are chosen of the form
(12.29) = (I M u
K u )K(I u
u M ),
The converged eigenvalues and eigenvectors are stored in Q and lambda. The
number of outer JD iterations performed is returned in it.
226 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
>> K=diag(diag(A));
>> options
options =
linsolv: 6
strategy: 0
>> options.tol=1e-8
options =
tol: 1.0000e-08
jmax: 20
jmin: 10
clvl: 1
optype: 1
linsolv: 5
N= 1095 ITMAX=1.095000e+02
KMAX= 5 JMIN= 10 JMAX= 20 V0DIM= 1
TAU= -1.0000e-02 JDTOL= 1.0000e-08 STRATEGY= 0
LINSOLVER= MINRES OPTYPE= SYM
LINITMAX= 100 EPS_TR= 1.000e-04 TOLDECAY= 2.00e+00
JDSYM
I LAMBDA(I) RES(I)
---------------------------------------
1 9.102733263227557e-16 1.70111e-09
2 1.269007628846320e-02 9.86670e-11
3 4.438457596823515e-02 7.01153e-09
4 5.663501055565738e-02 6.50940e-09
5 1.166311652214006e-01 3.19504e-09
>>
0
10
2
10
4
10
6
10
8
10
10
10
0 5 10 15 20 25 30 35
Figure 12.2: View of a spectrum (A) in the complex plane. The eigenvalues in the red
circle are to be computed
The success of the JacobiDavidson algorithm depends heavily on the quality of the
actual Ritz pair (j , q
). However, the RayleighRitz procedure can lead to problem if it is
applied to interior eigenvalues. The following simple numerical example shall demonstrate
the problem. Let
0 0 0 1 0
A= 0 1 0 , U = 0 0.5 .
0 0 1 0 0.5
Then,
0 0
U AU = U U = I2 .
0 0
So, any linear combination of the columns of U is a Ritz vector corresponding to the Ritz
value 0, e.g.,
0.5
0.5
U = 0.5 .
0.5
0.5
Thus, although the basis contains the correct eigenvalue associated with the eigenvalue 0,
the RayleighRitz procedure fails to find it and, instead, returns a very bad eigenvector
approximation.
This example may look contrived. So, we conduct a Matlab experiment with the
same A but with a randomly perturbed U .
12.6. HARMONIC RITZ VALUES AND VECTORS 229
>> rand(state,0)
>> U1=U+1e-4*rand(size(U)); [U1,dummy]=qr(U1,0); U1=-U1
U1 =
1.0000 -0.0001
0.0000 0.7071
0.0001 0.7071
>> B=U1*A*U1
B =
1.0e-04 *
-0.0000 -0.2656
-0.2656 0.1828
>> [X,L]=eig(B)
X =
-0.8140 -0.5808
-0.5808 0.8140
L =
1.0e-04 *
-0.1896 0
0 0.3723
>> x=U1*-X(:,1)
x =
0.8140
0.4107
0.4107
>> theta=L(1,1)
theta =
-1.8955e-05
>> norm(A*x-x*theta)
ans =
0.5808
(12.30) V (A I)1 V s = V V s.
The largest Ritz values j approximate the largest eigenvalues of (A I)1 , i.e.,
1 1
j j + ,
j j
(12.31) U (A I) U s = U (A I) (A I)U s,
(12.32) U (A I) (A I)U s = U (A I) U s.
(12.33) V V s = V U s.
(12.34) V (A I)U s = V U s.
In practice, we are interested only in the harmonic Ritz pair corresponding to the small-
est harmonic Ritz values. In the correction equation of the JacobiDavidson algorithm the
harmonic Ritz vector is used as the latest eigenvector approximation and the harmonic
Ritz values as the shift. In the symmetric case the harmonic Ritz value is replaced by the
Rayleigh quotient of the harmonic Ritz vector x, since
We continue the previous numerical example regarding the computation of the eigen-
value 0 of A = diag(0, 1, 1)
>> V=(A-theta*eye(3))*U1;
>> [v,l]=eig(V*V, V*U1)
v =
-1.000000000000000 -1.000000000000000
0.000059248824925 -0.713473633096137
l =
1.0e+17 *
0.000000000000000 0
0 -1.970695224946170
The above considerations affect the JacobiDavidson algorithm in the extraction phase.
Steps 11 and 14 in Algorithm 12.2 become
(Wj
Vj )(Wj Vj )s = (Wj
Vj )Vj s.
14: Set q
= Vj
s, w
= Wj
s. = + or = q
A q q
q/ .
To solve the eigenvalue problem (12.34) the QZ algorithm has to be employed, see
section 12.8. In the symmetric case (12.33) the symmetric QR algorithm will suffice in
general since the matrix on the left is positive definite.
(12.35) min kA
x x
k
U ,k
x xk=1
This minimization problem is solved by the right singular vector corresponding to the
smallest singular value of (A I)U or, equivalently, the eigenvector corresponding to
the smallest eigenvalue of
U (A I) (A I)U z = z.
u =
232 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
0 1.0000 0
-0.7071 0 0.7071
0.7071 0 0.7071
s =
1.0000 0
0 0
0 0
v =
0 1
-1 0
>> U*v(:,2)
ans =
1
0
0
u =
-0.0000 0.5810 0.8139
-0.7071 -0.5755 0.4108
0.7071 -0.5755 0.4108
s =
1.0001 0
0 0.0000
0 0
v =
-0.0001 1.0000
-1.0000 -0.0001
ans =
1.00009500829405
-0.00001878470226
0.00001878647014
With the refined Ritz vector approach Steps 11 and 14 in Algorithm 12.2 are replaced
by
11: q
Compute the Ritzpair (, ) of A closest to the target value.
s of AVj V
Compute the smallest singular vector j.
14: Replace q
by Vj
s.
12.8. THE GENERALIZED SCHUR DECOMPOSITION 233
12.8 The generalized Schur decomposition
The QZ algorithm computes the following generalized Schur decomposition.
(12.37) Q AZ = T A , Q BZ = T B ,
(A, B) = {tA B B
ii /tii | tii 6= 0}.
(12.38) Ax = Bx,
with arbitrary A and B. There is a variant of JacobiDavidson called JDQZ that com-
putes a partial QZ decomposition of the stencil (A, B). This section follows closely the
corresponding section in the eigenvalue templates [12]. Further details are found in [3].
With = /, the generalized eigenproblem (12.38) is equivalent to the eigenproblem
(12.39) (A B)x = 0,
where we denote a generalized eigenvalue of the matrix pair {A, B} as a pair (, ). The
notation (12.39) is preferred over (12.40), because underflow or overflow for = / in
finite precision arithmetic may occur when and/or are zero or close to zero. It also
emphazises the symmetry of the roles of A and B.
A partial generalized Schur form of dimension k for a matrix pair {A, B} is the
decomposition
where Qk and Zk are unitary n k matrices and RkA and RkB are upper triangular k k
matrices. A column qi of Qk is referred to as a generalized Schur vector, and we refer to a
234 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
pair ((i , i ), qi ), with (i , i ) = (RkA (i, i), RkB (i, i)) as a generalized Schur pair. It follows
that if ((, ), y) is a generalized eigenpair of (RkA , RkB ) then ((, ), Qk y) is a generalized
eigenpair of {A, B}.
From the relations (12.40) we see that
i Aqi i Bqi zi .
(12.41) Au Bu span(Wj ).
This decomposition can be reordered such that the first column of S R and the (1, 1)-
entries of T A and T B represent the wanted Petrov solution [3]. With s := sR R
1 := S e1 and
A B
:= T1,1 , := T1,1 , the Petrov vector is defined as
u := Vj s = Vj sR
1
for the associated generalized Petrov value (, ). In an analogous way we can define a
left Petrov vector as
p := Wj sL
1 sL L
1 := S e1
If Vj and Wj are unitary, as in Algorithm 12.3, then ksR k2 = ksL k2 = 1 implies kuk2 = 1.
With the decomposition in (12.43), we construct an approximate partial generalized
Schur form (cf. (12.40)): Vj S R approximates a Qk , and Wj S L approximates the associated
Zj .
It is not yet clear how to choose the test space Wj . The equations span(Zj ) =
span(AQj ) = span(BQj ), cf. (12.40), suggest to choose Wj such that span(Wj ) coin-
cides with span(0 AVj + 0 BVj ) for some suitably chosen 0 and 0 . With the weights
0 and 0 we can influence the convergence of the Petrov values. If we want eigenpair
approximations for eigenvalues close to a target , then the choice
p
0 = 1/ 1 + | |2 , 0 = 0
is very effective [3], especially if we want to compute eigenvalues in the interior of the
spectrum of A B. We will call the Petrov approximations for this choice the harmonic
Petrov eigenpairs. The Jacobi-Davidson correction equation for the component t u for
the pencil A B becomes
(12.44) (I pp ) (A B) (I uu ) t = r, r := Au Bu.
12.9. JDQZ: COMPUTING A PARTIAL QZ DECOMPOSITION 235
Sleijpen et al. [10] have shown that if (12.44) is solved exactly, the convergence to the
generalized eigenvalue is quadratic. Usually, this correction equation is solved only ap-
proximately, for instance, with a (preconditioned) iterative solver. The obtained vector t
is used for the expansion v of Vj and 0 Av + 0 Bv is used for the expansion of Wj . For
both spaces we work with orthonormal bases. Therefore, the new columns are orthonor-
malized with respect to the current basis by a modified Gram-Schmidt orthogonalization
process.
12.9.1 Restart
Suppose that the generalized Schur form (12.43) is ordered with respect to such that
A B A B A B
|T1,1 /T1,1 | |T2,2 /T2,2 | |Tj,j /Tj,j |,
where j is the dimension of span(Vj ). Then, for i < j, the space span(Vj sR R
1 , . . . , Vj si )
R
spanned by the first i columns of Vj S contains the i most promising Petrov vectors.
The corresponding test subspace is given by span(Wj sL , . . . , W sL
i ). Therefore, in order to
reduce the dimension of the subspaces (implicit restart) to jmin , jmin < j, the columns
vjmin +1 through vj and wjmin+1 through wj can simply be discarded and the Jacobi-
Davidson algorithm can be continued with
V = [V sR R L L
1 , . . . , V sjmin ] and W = [W s1 , . . . , W sjmin ].
12.9.2 Deflation
Like in the Jacobi-Davidson algorithm for the standard eigenvalue problem, in the Jacobi-
Davidson process for the generalized eigenvalue problem found (converged) Ritz (here
Petrov) vectors can be deflated.
The partial generalized Schur form can be obtained in a number of successive steps.
Suppose that we have already available the partial generalized Schur form AQk1 =
A
Zk1 Rk1 and BQk1 = Zk1 Rk1 B . We want to expand this partial generalized Schur
form with the new right Schur vector u and the left Schur vector p to
A
Rk1 a
A[Qk1 u] = [Zk1 p]
0
and
B
Rk1 b
A[Qk1 u] = [Zk1 p]
0
The new generalized Schur pair ((, ), u) satisfies
(A B)u,
or, since a b = Zk1
Qk1 u = 0 and
I Zk1 Zk1 (A B) I Qk1 Qk1 u = 0.
This eigenproblem can be solved again with the Jacobi-Davidson QZ process. In that
process we construct vectors vi that are orthogonal to Qk1 and vectors wi that are
orthogonal to Zk1 . This simplifies the computation of the interaction matrices M A and
M B , associated with the deflated operators
M A W (I Zk1 Zk1 ) A (I Qk1 Qk1 ) V = W AV,
M A W (I Zk1 Zk1 ) B (I Qk1 Qk1 ) V = W BV,
12.9.3 Algorithm
The Jacobi-Davidson algorithm to compute a partial QZ decomposition for a general
matrix pencil (A, B) is given in Algorithm 12.3 This algorithm attempts to compute the
generalized Schur pairs ((, ), q), for which the ratio / is closest to a specified target
value in the complex plane. The algorithm includes restart in order to limit the dimension
of the search space, and deflation with already converged left and right Schur vectors.
To apply this algorithm we need to specify a starting vector v0 , a tolerance , a target
value , and a number kmax that specifies how many eigenpairs near should be computed.
The value of jmax specifies the maximum dimension of the search subspace. If it is exceeded
then a restart takes place with a subspace of dimension jmin .
On completion the kmax generalized eigenvalues close to are delivered, and the cor-
responding reduced Schur form AQ = ZRA , BQ = ZRB , where Q and Z are n by kmax
orthogonal and RA , RB are kmax by kmax upper triangular. The generalized eigenvalues
are the on-diagonals of RA and RB . The computed form satisfies kAqj ZRA ej k2 = O(),
kBqj ZRB ej k2 = O(), where qj is the jth column of Q.
(12.45) T ()x = 0
where the n n matrix T () has elements that depend on the scalar parameter . For the
linear eigenvalue problem T () = A B. is an eigenvalue of (12.45) if T () is singular;
a nontrivial solution x of the singular linear system is a corresponding eigenvector.
For small problems, Newton iteration is applicable. Ruhe [9] suggests to proceed as
follows. Complement (12.45) by a normalization condition
(12.46) v x = 1.
Then, we solve
x T ()x 0
(12.47) P = = .
v x 1 0
12.10. JACOBI-DAVIDSON FOR NONLINEAR EIGENVALUE PROBLEMS 237
13: r = uA uB ; = Z uB ;
a = Z uA ; b r = r Z(
a b);
14: while k rk < do
15: A B
A R
a B R
b
R := ; R := ;
0T 0T
16: Q := [Q, u]; Z := [Z, p]; k := k + 1;
17: if k = kmax then
18: return (Q, Z, RA , RB )
19: end if
20: m := m 1;
21: for i = 1, . . . , m do
22: vi := V sR A A R B B R
i+1 ; vi := V si+1 ; vi := V si+1 ;
23: L R L
wi := W si+1 ; si := si := ei ;
24: end for
25: M A , M B is the lower m m block of T A , T B , resp.
26: u := u1 ; p := w1 ; uA := v1A ; uB := v1b ; = T1,1 A ; = TB ;
1,1
27: r = uA uB ; = Z uB ;
a = Z uA ; b r = r Z(
a b);
28: end while
29: if m mmax then
30: for i = 2, . . . , mmin do
31: vi := V sR A A R B B R
i ; vi := V si ; vi := V si ; wi := W si ;
L
Here, C is some normalization constant. The vector vs may depend on the iteration step.
It can be chosen in a number of ways. It could be constant, e.g., vs = ei . This amounts
to keeping one of the entries of xs constant. Another choce is
vs = T (s ) ys
pu uu
(12.50) (I )T ()(I )t = r, t u.
u p u u
pu uu
(12.50) (I )T ( m )(I )t = r = T (m )u, t u.
u p u u
1
T (m )t p = r, = u T (m )t.
u p
Using T (m )u = r we get
t = u + T (m )1 p = u + T (m )1 T (m )u.
The correction equation (12.50) in Algorithm 12.4 is typically solved to low accuracy
by a preconditioned GMRES iteration where the preconditioner has the form
(12.51) = (I pu )K(I uu ),
K K T ().
u p u u
= g,
Kt t u.
Bibliography
[1] R. Barret, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the
Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadel-
phia, PA, 1994. (Available from Netlib at URL http://www.netlib.org/templates/
index.html).
[2] E. R. Davidson, The iterative calculation of a few of the lowest eigenvalues and cor-
responding eigenvectors of large real-symmetric matrices, J. Comp. Phys., 17 (1975),
pp. 8794.
[4] R. Geus, The JacobiDavidson algorithm for solving large sparse symmetric eigen-
value problems, PhD Thesis No. 14734, ETH Z urich, 2002. (Available at URL
http://e-collection.ethbib.ethz.ch/show?type=diss&nr=14734).
[5] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore, MD, 2nd ed., 1989.
240 CHAPTER 12. THE JACOBI-DAVIDSON METHOD
[6] C. G. J. Jacobi, Uber ein leichtes Verfahren die in der Theorie der S
acularst
orungen
vorkommenden Gleichungen numerisch aufzul osen, J. reine angew. Math., 30 (1846),
pp. 5195.
[9] A. Ruhe, Algorithms for the nonlinear eigenvalue problem, SIAM J. Numer. Anal.,
10 (1973), pp. 674689.
[13] G. W. Stewart, Matrix Algorithms II: Eigensystems, SIAM, Philadelphia, PA, 2001.
13.1 Introduction
In this chapter we restrict ourselves to the symmetric/Hermitian eigenvalue problem
(13.1) Ax = M x, A = A , M = M > 0.
(13.3) xk+1 = xk + k pk .
The parameter k is determined such that the Rayleigh quotient of the new iterate xk+1
becomes minimal,
We can write the Rayleigh quotient of the linear combination xk + pk of two (linearly
independant) vectors xk and pk as
(13.5)
1 xk Axk xk Apk 1
xk Axk + 2xk Apk + 2 pk Apk pk Axk pk Apk
(xk + pk ) = 2 =
.
xk M xk + 2xk M pk + pk M pk 1 xk M xk xk M pk 1
pk M xk pk M pk
This is the Rayleigh quotient associated with the generalized 2 2 eigenvalue problem
xk Axk xk Apk xk M xk xk M pk
(13.6) = .
pk Axk pk Apk pk M xk pk M pk
241
242 CHAPTER 13. RAYLEIGH QUOTIENT AND TRACE MINIMIZATION
The smaller of the two eigenvalues of (13.6) is the searched value k+1 := (xk+1 ) in (13.4)
that minimizes the Rayleigh quotient. The corresponding eigenvector is normalized such
that its first component equals one1 . The second component of this eigenvector is = k .
Inserting the solution [1, k ] into the second line of (13.6) we obtain
So, the next residual rk+1 is orthogonal to the actual search direction pk .
There are various ways how to choose the search direction pk . A simple way is to cycle
through the coordinate vectors, a method that is called coordinate relaxation [3]. It cannot
compete with the methods we discuss next; but it has some potential for parallelization.
(13.8) Ax = b,
(13.3) xk+1 = xk + k pk
holds among any two consecutive vectors. The search direction pk is chosen to be the
negative gradient (xk ) = rk = b Axk . This is the direction in which decreases
the most. Setting xk+1 as in (13.3) we get
(xk+1 )
0= = pk (Axk b) + k pk Apk = pk rk + k pk Apk .
=k
Thus,
pk rk
(13.11) k =
pk Apk
which, for steepest descent, becomes
rk rk
(13.12) k =
rk Ark
Remark 13.1. Notice that
Notice that gk points in the same direction as the residual rk . (This is in contrast to the
linear system case!) Since in eigenvalue problems we only care about directions we can
equivalently set
With this choice of search direction we immediately have from (13.7) that
(13.16) rk rk+1 = 0.
Not surprisingly, the method of steepest descent often converges slowly, as it does for
linear systems. This happens if the spectrum is very much spread out, i.e., if the condition
number of A relative to M is big.
where the coefficient k is determined such that pk and pk1 are conjugate, i.e.,
such that
gk Apk1
(13.19) k = .
pk1 Apk1
as with the method of steepest descent. Still in the case of linear systems, using these
identities we find formulae equivalent to (13.19),
gk Apk1 (13.13) gk (gk gk1 ) (13.14) gk (gk gk1 )
k = = =
pk1 Apk1 pk1 (gk gk1 ) pk1 gk1
(13.20) gk (gk gk1 )
(13.22) = g
gk1 k1
(13.21) gk gk
(13.23) = g .
gk1 k1
The equivalent identities (13.19), (13.22), and (13.23) can be used to define k the most
economic being (13.23).
We now look at how a conjugate gradient algorithm for the eigenvalue problem can be
devised. The idea is straightforward. The algorithm differs from steepest descent by the
choice of the search directions that are kept conjugate, i.e., consecutive search directions
satisfy pk Apk1 = 0.
The crucial difference to linear systems stems from the fact, that the functional that
is to be minimized, i.e., the Rayleigh quotient, is not quadratic anymore. (In particular,
there is no finite termination property.) The gradient of (x) is
2
g = (xk ) = (Ax (x)M x).
x M x
So, in particular, the equation (13.14), does not hold:
Therefore, in the context of nonlinear systems or eigenvalue problems the formulae in (13.19),
(13.22), and (13.23) that define k are not equivalent anymore! Feng and Owen [4] exten-
sively compared the three formulae and found that in the context of eigenvalue problems
the last identity (13.23) leads to the fastest convergence. So, we opt for this equation and
define the search directions according to
p0 = g0 , k = 0,
gk M gk
(13.24)
pk = gk + g M g
pk1 , k > 0,
k1 k1
where we have given the formulae for the generalized eigenvalue problem Ax = M x. The
complete procedure is given in Algorithm 13.1
Convergence
The construction of Algorithm 13.1 guarantees that (xk+1 ) < (xk ) unless rk = 0, in
which case xk is the searched eigenvector. In general, i.e., if the initial vector x0 has a
nonvanishing component in the direction of the smallest eigenvector u1 , convergence is
toward the smallest eigenvalue 1 . This assumption must also hold for vector iteration or
the Lanczos algorithm.
13.3. THE CONJUGATE GRADIENT ALGORITHM 245
Algorithm 13.1 The Rayleigh quotient algorithm
1: Let x0 be a unit vector, kx0 kM = 1.
2: v0 := Ax0 , u0 := M x0 ,
v x
3: 0 := 0 0 ,
u0 x0
4: g0 := 2(v0 0 u0 )
5: while kgk k > tol do
6: if k = 1 then
7: pk := gk1 ;
8: else
g M gk1
9: pk := gk1 + k1 p ;
gk2 M gk2 k1
10: end if
11: Determine the smallest Ritz value k and corresponding Ritz vector xk of (A, M )
in R([xk1 , pk ])
12: vk := Axk , uk := M xk
13: k := xk vk /xk uk
14: gk := 2(vk k uk )
15: end while
Let
As seen earlier, in symmetric eigenvalue problems, the eigenvalues are much more accurate
than the eigenvectors.
Let us now suppose that the eigenvalues have already converged, i.e.,
(xk ) = k
= 1 ,
while the eigenvectors are not yet as accurate as desired. Then we can write
n
X
(13.27) rk = (A k M )xk
= (A 1 M )xk = (j 1 )M uj uj M xk ,
j=1
which entails u1 rk = 0 since the first summand on the right of (13.27) vanishes. From (13.25)
we have wk = sin k zk M u1 . Thus,
(
(A 1 M )wk = (A 1 M )xk = rk u1
(13.28)
wk M u1 = 0
A high condition number implies slow convergence. We see from (13.31) that the condition
number is high if the distance of 1 and 2 is much smaller than the spread of the spectrum
of (A; B). This happens more often than not, in particular with FE discretizations of
PDEs.
Preconditioning
In order to reduce the condition number of the eigenvalue problem we change
Ax = M x
into
(13.32) A M
x = x
,
such that
(13.33) 1M
(A ) (A 1 M ).
To further investigate this idea, let C be a nonsingular matrix, and let y = Cx. Then,
x Ax y C AC 1 y
y Ay
(13.34) (x) = = = = (y)
x M x y C M C 1 y M
y y
Thus,
A 1 M
= C (A 1 M )C 1 ,
C 1 (A 1 M
)C = (C C)1 (A 1 M ).
Note that
1
0 1 < 1.
j
13.4. LOCALLY OPTIMAL PCG (LOPCG) 247
Dividing the largest eigenvalue of A1 (A1 M ) by the smallest positive gives the condition
number
1 1 2 n 1 2
1 n
(13.35) 1 := A (A 1 M ) R(u1 )M = 1
= = 0 .
1 n 2 1 n
2
We can consider xk+1 as having been obtained by a Rayleigh quotient minimization from
xk along pk = gk +(k /k )pk1 . Notice that this direction is needed in the next iteration
step. (Otherwise it is not of a particular interest.)
248 CHAPTER 13. RAYLEIGH QUOTIENT AND TRACE MINIMIZATION
% PA 16.6.2000
u = M*x;
q = sqrt(x*u);
x = x/q; u = u/q;
v = A*x;
rho = x*v;
x = x + delta*p;
u = M*x;
q = sqrt(x*u);
x = x/q; u = u/q;
v = A*x;
gnorm = norm(g);
if nargout>2, log = [log; [k,rho,gnorm]]; end
end
% PA 2002-07-3
n = size(M,1);
u = M*x;
q = sqrt(x*u);
x = x/q; u = u/q;
v = A*x;
rho = x*v;
p = [-g p]*delta(2:end);
x = delta(1)*x + p;
u = M*x;
q = sqrt(x*u);
x = x/q; u = u/q;
v = A*x;
if nargout>2, log = [log; [k,rho,gnorm]]; end
end
Figure 13.2: Matlab code LOPCG: Locally Optimal Preconditioned Conjugate Gradient
algorithm
250 CHAPTER 13. RAYLEIGH QUOTIENT AND TRACE MINIMIZATION
13.5 The block Rayleigh quotient minimization algorithm
(BRQMIN)
The above procedures converge very slowly if the eigenvalues are clustered. Hence, these
methods should be applied only in blocked form.
Longsine and McCormick [8] suggested several variants for blocking Algorithm 13.1.
See [1] for a recent numerical investigation of this algorithm.
If dj = [dT1j , dT2j , dT3j ]T , dij Rq , is the eigenvector corresponding to the j-th eigenvalue
of (13.1) restricted to R([Xk , Hk , Pk1 ]), then the j-th column of Xk+1 is the corresponding
13.7. A NUMERICAL EXAMPLE 251
Ritz vector
with
Pk ej := Hk d2j + Pk1 d3j .
Notice that P0 is an empty matrix such that the eigenvalue problem in step (8) of the
locally-optimal block preconditioned conjugate gradient method (LOBPCG), displayed in
Algorithm 13.2, has order 2q only for k = 0.
The algorithm as proposed by Knyazev [6] was designed to compute just a few eigen-
pairs and so a memory efficient implementation was not presented. For instance, in addi-
tion to Xk , Rk , Hk , Pk , the matrices M Xk , M Hk , M Pk and AXk , AHk , APk are also stored.
The resulting storage needed is prohibitive if more than a handful of eigenpairs are needed.
A more memory efficient implementation results when we iterate with blocks of width q
in the space orthogonal to the already computed eigenvectors. The computed eigenvectors
are stored in Q and neither M Q nor AQ are stored. Hence only storage for (p + 10q)n +
O(q 2 ) numbers is needed.
Here, the columns of [Xk , Hk , Pk ] may become (almost) linearly dependent leading to
ill-conditioned matrices A e and M
f in step (9) of the LOBPCG algorithm. If this is the case
we simply restart the iteration with random Xk orthogonal to the computed eigenvector
approximations. More sophisticated restarting procedures that retain Xk but modify Hk
and/or Pk were much less stable in the sense that the search space basis again became
linearly dependent within a few iterations. Restarting with random Xk is a rare occurrence
and in our experience, has little effect on the overall performance of the algorithm.
>> [p,e,t]=initmesh(auto);
>> [p,e,t]=refinemesh(auto,p,e,t);
>> [p,e,t]=refinemesh(auto,p,e,t);
>> p=jigglemesh(p,e,t);
>> [A,M]=assema(p,t,1,1,0);
>> whos
Name Size Bytes Class
>> n=size(A,1);
252 CHAPTER 13. RAYLEIGH QUOTIENT AND TRACE MINIMIZATION
>> R=ichol(A); % Incomplete Cholesky factorization
>> x0=rand(n,1)-.5;
>> tol=1e-6;
>> [x,rho,log0] = rqmin1(A,M,x0,tol);
>> [x,rho,log1] = rqmin1(A,M,x0,tol,R);
>> [x,rho,log2] = lopcg(A,M,x0,tol);
>> [x,rho,log3] = lopcg(A,M,x0,tol,R);
>> whos log*
Name Size Bytes Class
>> L = sort(eig(full(A),full(M)));
>> format short e, [L(1) L(2) L(n)], format
ans =
ans =
0.9862
>> l0=log0(end-6:end-1,2).\log0(end-5:end,2);
>> l1=log1(end-6:end-1,2).\log1(end-5:end,2);
>> l2=log2(end-6:end-1,2).\log2(end-5:end,2);
>> l3=log3(end-6:end-1,2).\log3(end-5:end,2);
>> [l0 l1 l2 l3]
ans =
>> semilogy(log0(:,1),log0(:,3)/log0(1,3),log1(:,1),log1(:,3)/log1(1,3),...
log2(:,1),log2(:,3)/log2(1,3),log3(:,1),log3(:,3)/log3(1,3),LineWidth,2)
>> legend(rqmin,rqmin + prec,lopcg,lopcg + prec)
The convergence histories in Figure 13.3 for RQMIN and LOPCG show that precon-
ditioning helps very much in reducing the iteration count.
In Figure 13.4 the convergence histories of LOBPCG for computing ten eigenvalues is
shown. In 43 iteration steps all ten eigenvalues have converged to the desired accuracy ( =
105 ). Clearly, the iteration count has been decreased drastically. Note however, that each
13.8. TRACE MINIMIZATION 253
0
10
1
10
rqmin
rqmin + prec
2
10 lopcg
lopcg + prec
3
10
norm of residual
4
10
5
10
6
10
7
10
8
10
0 100 200 300 400 500 600 700 800 900
iteration number
iteration step requires solving ten systems of equation resulting in 430 system solves. (In
fact, if converged eigenvectors are locked, only 283 systems had to be solved.) Nevertheless,
when comparing with Fig. 13.3 one should remember that in the LOBPCG computation
ten eigenpairs have been computed. If a single eigenpair is required then a blocksize of 10
is too big, but a smaller blocksize may reduce the execution time. If a small number of
eigenvalues is desired then a blocksize equal or slightly bigger than theis number is certainly
advantageous. Not that in step (5) of Algorithm 13.2 q linear systems of equations are
solved concurrently. An efficient implementation accesses the preconditioner N only once.
The Matlab code does this naturally. A parallel implementation of LOBPCG can be
found in the software package Block Locally Optimal Preconditioned Eigenvalue Xolvers
(BLOPEX) [7].
The following theorem [11] generalizes the trace theorem 2.33 for the generalized eigenvalue
problem
(13.1) Ax = M x, A = A , M = M > 0.
Theorem 13.1 (Trace theorem for the generalized eigenvalue problem) Let A
and M be as in (13.1). Then,
0
10
1
10
2
10
3
10
4
10
5
10
6
10
0 5 10 15 20 25 30 35 40 45
where 1 , . . . , n are the eigenvalues of problem (13.1). Equality holds in (13.43) if and
only if the columns of the matrix X that achieves the minimum span the eigenspace cor-
responding to the smallest p eigenvalues.
Sameh and coworkers [11, 10] suggested an algorithm to exploit this property of
the trace, following the lines of Rayleigh quotient minimization. Let Xk Fnp with
Xk M Xk = Ip and
(k)
Xk AXk = k = diag(1 , . . . , p(k) ).
We want to construct the next iterate Xk+1 by setting
such that
(13.45) Xk+1 M Xk+1 = Ip ,
(k+1)
(13.46) Xk+1 AXk+1 = k+1 = diag(1 , . . . , p(k+1) ),
(13.47) trace(Xk+1 AXk+1 ) < trace(Xk AXk ).
(13.48) k M Xk = 0.
under the constraint (13.48). Let Zk+1 = Xk k be the solution of (13.49). Then, by
construction,
trace(Zk+1 AZk+1 ) trace(Xk AXk ).
13.8. TRACE MINIMIZATION 255
Furthermore,
(13.48)
Zk+1 M Zk+1 = Xk M Xk + k M k Xk M Xk = Ip .
MZ
From this it follows that Zk+1 has maximal rank and that all eigenvalues of Zk+1 k+1
are 1. Therefore, the spectral decomposition of Zk+1 M Zk+1 can be written in the form
Zk+1 M Zk+1 = U D 2 U , U U = Ip , D = diag(1 , . . . , p ), i 1.
This implies that the columns of Zk+1 U D 1 are M -orthogonal. Let the spectral decom-
position of D 1 U Zk+1
AZ
k+1 U D
1 be given by
D 1 U Zk+1
AZk+1 U D 1 = V k+1 V , V V = Ip .
Then,
(13.50) V D 1 U Zk+1
M Zk+1 U D 1 V = Ip ,
| {z }
Ip
(13.51) V D 1 U Zk+1
AZk+1 U D 1 V = k+1 .
Sk = U D 1 V.
Here, the vector l contains the Lagrange multipliers. A necessary condition for d to be a
solution of (13.52) is
d f = 0 A(xi d) + M Xk l = 0,
l f = 0 Xk M d = 0.
we obtain
A M Xk k I 0 AXk AXk
= = .
O Xk M A1 M Xk L Xk M A1 I O Xk M Xk
L = (Xk M A1 M Xk )1 .
k + A1 M Xk L = Xk ,
such that
Zk+1 = Xk k = A1 M Xk L = A1 M Xk (Xk M A1 M Xk )1 .
Thus, one step of the above trace minimization algorithm amounts to one step of
subspace iteration with shift = 0. This proves convergence of the algorithm to the
smallest eigenvalues of (13.1). Remember the similar equation (12.13) for the Jacobi
Davidson iteration and Remark 12.2.
Let P be the orthogonal projection onto R(M Xk ) ,
(13.55) P AP k = P AXk , Xk M k = 0,
k
are equivalent, i.e., they have the same solution k . In fact, let be the solution
L
of (13.53). Then, from Xk M k = 0 we get P k = k . Equation (13.55) is now obtained
13.8. TRACE MINIMIZATION 257
Algorithm 13.3 Trace minimization algorithm to compute p eigenpairs of Ax =
M x.
1: Choose random matrix V1 Rnq with V1T M V1 = Iq , q p.
2: for k = 1, 2, . . . until convergence do
3: Compute Wk = AVk and Hk := Vk Wk .
4: Compute spectral decomposition Hk = Uk k Uk ,
(k) (k) (k) (k)
with k = diag(1 , . . . , q ), 1 . . . q .
5: Compute Ritz vectors Xk = Vk Uk and residuals Rk = Wk Uk M Xk k
6: For i = 1, . . . , q solve approximatively
(k) (k) (k)
P (A i M )P di = P ri , di M Xk
by multiplying the first equation in (13.53) by P . On the other hand, let k be the
solution of (13.55). Since P (AP k AXk ) = 0 we must have AP k AXk = M Xk L for
some L. As Xk M k = 0 we get P k = k and thus the first equation in (13.53).
As P AP is positive semidefinite, equation (13.55) is easier to solve than equation (13.53)
which is an indefinite system of equations. (13.55) can be solved by the (preconditioned)
conjugate gradient method (PCG). The iteration has to be started by a vector z0 that
satisfies the constrains Xk M z0 . A straightforward choice is z0 = 0
Table 13.1: The basic trace minimization algorithm (Algorithm 13.3). The inner systems
are solved by the CG scheme which is terminated such that the 2-norm of the residual is
reduced by a specified factor. The number of outer iterations (#its) and the number of
multiplications with matrix A (A mults) are listed for different residual reduction factors.
to high accuracy. In Table 13.1 the number of outer iterations (#its) are given and the
number of multiplications of the matrix A with a vector for various relative stopping
criteria for the inner iteration (reduction factor) [10].
Acceleration techniques
Sameh & Tong [10] investigate a number of ways to accelerate the convergence of the trace
minimization algorithm 13.3.
1. Simple shifts. Choose a shift 1 1 until the first eigenpair is found. Then proceed
with the shift 2 2 and lock the first eigenvector. In this way PCG can be used
to solve the linear systems as before.
258 CHAPTER 13. RAYLEIGH QUOTIENT AND TRACE MINIMIZATION
2. Multiple dynamic shifts. Each linear system (13.56) is solved with an individual
shift. The shift is turned on close to convergence. Since the linear systems are
indefinite, PCG has to be adapted.
In Table 13.2 results are collected for some problems in the HarwellBoeing collec-
tion [10]. These problems are diffcult because the gap ratios for the smallest eigenvalues
are extremely small due to the huge span of the spectra. Without preconditioning, none of
these problems can be solved with a reasonable cost. In the experiments, the incomplete
Cholesky factorization (IC(0)) of A was used as the preconditioner for all the matrices of
the form A B.
Table 13.2: Numerical results for problems from the HarwellBoeing collection with four
processors (reproduced from [10])
The Davidson-type trace minimization algorithm with multiple dynamic shifts works
better than the block JacobiDavidson algorithm for three of the five problems. For the
other two, the performance for both algorithms is similar.
Bibliography
[1] P. Arbenz, U. L. Hetmaniuk, R. B. Lehoucq, and R. Tuminaro, A comparison
of eigensolvers for large-scale 3D modal analysis using AMG-preconditioned iterative
methods, Internat. J. Numer. Methods Eng., 64 (2005), pp. 204236.
[2] O. Axelsson and V. Barker, Finite Element Solution of Boundary Value Prob-
lems, Academic Press, Orlando FL, 1984.
[4] Y. T. Feng and D. R. J. Owen, Conjugate gradient methods for solving the smallest
eigenpair of large symmetric eigenvalue problems, Internat. J. Numer. Methods Eng.,
39 (1996), pp. 22092229.
BIBLIOGRAPHY 259
[5] M. R. Hestenes and E. Stiefel, Methods of conjugent gradients for solving linear
systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409436.
[10] A. Sameh and Z. Tong, The trace minimization method for the symmetric gener-
alized eigenvalue problem, J. Comput. Appl. Math., 123 (2000), pp. 155175.
[11] A. H. Sameh and J. A. Wisniewski, A trace minimization algorithm for the gen-
eralized eigenvalue problem, SIAM J. Numer. Anal., 19 (1982), pp. 12431259.