Krylov methods, an introduction
Paul Van Dooren, CESAME, Univ. Cath. Louvain, Belgium
slides on http://www.auto.ucl.ac.be/vdooren/Iterative.html
What we'll talk about ...
basic ideas of iterative methods
recursive refinement
Krylov methods and their variants
orthogonality vs bi-orthogonality
some numerical aspects
error propagation
some algebraic aspects
breakdowns
eigenvalue problems
projected eigenvalues
rational approximation
Padé approximation
Motivation
Every method performs better for some classes of problems ...
Direct methods
Jacobi/Gauss-Seidel
Krylov methods
Multigrid methods
Fast multipole methods
but their features can be combined (hybrid, preconditioning)
The advantages of Krylov methods depend on what they are compared with
Recurrences and Krylov methods
Solve $Ax = b$ via the fixed point iteration $x_k := b + (I - A)x_{k-1}$
Rewrite this as $x_k := x_{k-1} + r_{k-1}$ using the residual $r_{k-1} := b - Ax_{k-1}$
$$\Rightarrow \quad x_k = x_0 + r_0 + r_1 + \cdots + r_{k-1}$$
From $b - Ax_k = b - Ax_{k-1} - Ar_{k-1}$ we find $r_k := (I - A)r_{k-1}$
$$\Rightarrow \quad x_k = x_0 + r_0 + (I - A)r_0 + \cdots + (I - A)^{k-1}r_0 \in x_0 + \mathrm{Im}\left[r_0, Ar_0, \ldots, A^{k-1}r_0\right]$$
A Krylov subspace is a space spanned by
$$\mathcal{K}_k(A, r_0) := \mathrm{Im}\left[r_0, Ar_0, \ldots, A^{k-1}r_0\right]$$
We are looking for good linear combinations
$$x_k - x_0 = \sum_{j=0}^{k-1} c_j A^j r_0 \in \mathcal{K}_k(A, r_0)$$
There are essentially two different criteria:
$$\min \|Ax_k - b\|_2 \qquad \text{or} \qquad \text{make } Ax_k - b \perp \mathcal{K}_k(A, r_0),$$
related to orthogonal recurrence relations (GMRES and FOM).
Two additional classes of methods are related to bi-orthogonal relations (QMR and Bi-CG).
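To fix ideas before the Krylov machinery, here is a minimal numpy sketch of the plain residual recursion above (the helper name `richardson` is mine, not from the slides); the Krylov methods below replace its fixed combination of $r_0, Ar_0, \ldots$ by the best combination according to one of the two criteria.

```python
import numpy as np

def richardson(A, b, x0, num_steps):
    """Plain residual recursion: x_k = x_{k-1} + r_{k-1}, r_k = (I - A) r_{k-1}.
    Every iterate stays in x_0 + K_k(A, r_0); this converges only when the
    spectrum of I - A lies strictly inside the unit circle."""
    x = x0.copy()
    r = b - A @ x
    for _ in range(num_steps):
        x = x + r          # x_k = x_{k-1} + r_{k-1}
        r = r - A @ r      # r_k = (I - A) r_{k-1}
    return x
```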
Arnoldi process
There always exists an orthogonal matrix $U$ ($U^TU = I_n$) such that
$$U^TAU =: H = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,n} \\ h_{2,1} & h_{2,2} & \ddots & \vdots \\ & \ddots & \ddots & h_{n-1,n} \\ & & h_{n,n-1} & h_{n,n} \end{bmatrix}$$
and the first column $u_1$ of $U$ can be chosen arbitrarily.
Equate columns of $AU = UH$. First one: $Au_1 = u_1h_{1,1} + u_2h_{2,1}$
$$h_{1,1} := u_1^TAu_1, \qquad \tilde u_2 := Au_1 - u_1h_{1,1}, \qquad h_{2,1} := \|\tilde u_2\|_2$$
For the following columns:
$$Au_k = \sum_{j=1}^{k} u_jh_{j,k} + u_{k+1}h_{k+1,k}$$
$$\tilde u_{k+1} := Au_k - \sum_{j=1}^{k} u_jh_{j,k}, \qquad h_{j,k} := u_j^TAu_k, \qquad h_{k+1,k} := \|\tilde u_{k+1}\|_2, \qquad u_{k+1} := \tilde u_{k+1}/h_{k+1,k}$$
In block notation, with $U_k := U(:, 1{:}k)$, $H_k := H(1{:}k, 1{:}k)$:
$$AU_k = U_kH_k + \tilde u_{k+1}e_k^T$$
It is easy to see that
$$\mathcal{K}_k(H, e_1) = \mathrm{Im}\left[e_1, He_1, \ldots, H^{k-1}e_1\right] = \mathrm{Im}\begin{bmatrix} I_k \\ 0 \end{bmatrix}$$
since the matrix $\left[e_1, He_1, \ldots, H^{k-1}e_1\right]$ is lower trapezoidal with nonzero diagonal entries $1, h_{2,1}, h_{3,2}h_{2,1}, \ldots$ (as long as the $h_{j+1,j}$ are nonzero).
Choose $U$ such that $r_0/\|r_0\|_2 = Ue_1$ and $A = UHU^T$; then
$$\mathcal{K}_k(A, r_0) = \mathrm{Im}\left[U_k\right]$$
because
$$\left[Ue_1, (UHU^T)Ue_1, \ldots, (UH^{k-1}U^T)Ue_1\right] = U\left[e_1, He_1, \ldots, H^{k-1}e_1\right]$$
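A minimal numpy sketch of the Arnoldi recurrences above, using modified Gram-Schmidt; the function name `arnoldi` and the dense-matrix setting are illustrative choices, not part of the slides. Later sketches in these notes reuse it.

```python
import numpy as np

def arnoldi(A, r0, k):
    """Run k Arnoldi steps.  Returns U with k+1 orthonormal columns spanning
    K_{k+1}(A, r0) and the (k+1) x k Hessenberg H with A U[:, :k] = U H."""
    n = len(r0)
    U = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    U[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ U[:, j]
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i, j] = U[:, i] @ w
            w -= H[i, j] * U[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                  # breakdown: invariant subspace found
            return U[:, :j + 1], H[:j + 1, :j + 1]
        U[:, j + 1] = w / H[j + 1, j]
    return U, H
```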
Galerkin condition (FOM)
Look for $x_k - x_0 \in \mathcal{K}_k$ such that $b - Ax_k \perp \mathcal{K}_k$. Therefore
$$x_k - x_0 = U_ky, \qquad U_k^T(b - Ax_k) = 0 \quad \Rightarrow \quad U_k^TAU_ky = U_k^Tr_0 = \|r_0\|_2e_1$$
So we solve, using efficient recurrence relations,
$$\begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,k} \\ h_{2,1} & h_{2,2} & \ddots & \vdots \\ & \ddots & \ddots & h_{k-1,k} \\ & & h_{k,k-1} & h_{k,k} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
The error $\|b - Ax_k\|_2 = |h_{k+1,k}\,y_k|$ may grow ...
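A short FOM sketch on top of the `arnoldi` sketch above, assuming no breakdown in the first $k$ steps and a nonsingular $H_k$; a real implementation would update the solution of the Hessenberg system recursively instead of re-solving it.

```python
import numpy as np

def fom(A, b, x0, k):
    """Galerkin (FOM) step: solve H_k y = ||r0||_2 e_1 and set x_k = x_0 + U_k y.
    Also returns |h_{k+1,k} y_k|, which equals ||b - A x_k||_2."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    U, H = arnoldi(A, r0, k)                 # Arnoldi sketch from above
    rhs = np.zeros(k)
    rhs[0] = beta
    y = np.linalg.solve(H[:k, :k], rhs)
    xk = x0 + U[:, :k] @ y
    return xk, abs(H[k, k - 1] * y[-1])
```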
Minimize residual (GMRES)
Look for $x_k - x_0 \in \mathcal{K}_k$ to minimize $\|b - Ax_k\|_2$. Therefore
$$x_k - x_0 = U_ky, \qquad b - Ax_k \in \mathrm{Im}[U_{k+1}] \quad \Rightarrow \quad \|b - Ax_k\|_2 = \|U_{k+1}^Tr_0 - U_{k+1}^TAU_ky\|_2$$
So we solve, using efficient recurrence relations, the least-squares problem
$$\min_y \left\|\begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,k} \\ h_{2,1} & h_{2,2} & \ddots & \vdots \\ & \ddots & \ddots & h_{k-1,k} \\ & & h_{k,k-1} & h_{k,k} \\ & & & h_{k+1,k} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} - \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\right\|_2$$
One can prove that $\|b - Ax_k\|_2 \le \|b - Ax_{k-1}\|_2$ but it may stall ...
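The corresponding GMRES sketch: the same Arnoldi data, but the $(k{+}1)\times k$ Hessenberg least-squares problem above is solved instead (here with a dense solver; practical codes update a QR factorization of $H$ with Givens rotations, one rotation per step).

```python
import numpy as np

def gmres(A, b, x0, k):
    """GMRES sketch (no restarts, assumes no breakdown): minimise
    ||b - A x_k||_2 over x_0 + K_k(A, r0) via min_y ||beta e_1 - H y||_2."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    U, H = arnoldi(A, r0, k)                 # Arnoldi sketch from above
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return x0 + U[:, :k] @ y
```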
GMRES vs FOM
Denote $\rho_k := \|b - Ax_k\|_2$ of FOM and GMRES by $\rho_k^F$ and $\rho_k^G$, resp.
Then
$$\left(\rho_k^G\right)^{-2} = \left(\rho_k^F\right)^{-2} + \left(\rho_{k-1}^G\right)^{-2}, \qquad \rho_k^G \le \rho_k^F$$
The Arnoldi process breaks down when $h_{k+1,k} = 0$ since we need to divide by it.
But then the system is solved, since both FOM and GMRES yield the same answer and $\rho_k^G = \rho_k^F = |h_{k+1,k}\,y_k| = 0$
Stalling
Consider
$$\begin{bmatrix} 0 & & & 1 \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Then
$$\rho^G = (1, \ldots, 1, 0), \qquad \rho^F = (\infty, \ldots, \infty, 0)$$
GMRES is always bounded and outperforms FOM, but it can still stall
Lanczos process
For $A = A^T$ there exists an orthogonal matrix $U$ ($U^TU = I_n$) such that
$$U^TAU =: T = \begin{bmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_n \\ & & \beta_n & \alpha_n \end{bmatrix}$$
and the first column $u_1$ of $U$ can be chosen arbitrarily.
Same derivation, but the recurrences are now short:
$$Au_k = u_{k-1}\beta_k + u_k\alpha_k + u_{k+1}\beta_{k+1}$$
$$\tilde u_{k+1} := Au_k - u_{k-1}\beta_k - u_k\alpha_k, \qquad \alpha_k := u_k^TAu_k, \qquad \beta_{k+1} := \|\tilde u_{k+1}\|_2, \qquad u_{k+1} := \tilde u_{k+1}/\beta_{k+1}$$
For $A = A^T \succ 0$ the tridiagonal matrix $T$ can be factored, yielding two coupled 2-term recurrences instead (Conjugate Gradients).
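A numpy sketch of the short recurrence above, with no re-orthogonalization; the name `lanczos` and the storage of the full basis $U$ (not needed for solving, but convenient for the sketches below) are my choices.

```python
import numpy as np

def lanczos(A, r0, k):
    """Symmetric Lanczos (A = A^T): three-term recurrence producing
    orthonormal columns of U and the tridiagonal entries alpha (diagonal)
    and beta (off-diagonal), with beta[j] = T[j+1, j]."""
    n = len(r0)
    U = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    U[:, 0] = r0 / np.linalg.norm(r0)
    u_prev, b_prev = np.zeros(n), 0.0
    for j in range(k):
        w = A @ U[:, j] - b_prev * u_prev
        alpha[j] = U[:, j] @ w
        w -= alpha[j] * U[:, j]
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0.0:                      # breakdown: invariant subspace found
            return U[:, :j + 1], alpha[:j + 1], beta[:j]
        U[:, j + 1] = w / beta[j]
        u_prev, b_prev = U[:, j], beta[j]
    return U, alpha, beta
```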
Minimize residual (MINRES)
Look for $x_k - x_0 \in \mathcal{K}_k$ to minimize $\|b - Ax_k\|_2$, using efficient recurrence relations:
$$\min_y \left\|\begin{bmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \beta_k & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} - \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\right\|_2$$
One shows that $\|b - Ax_k\|_2$ decreases linearly with approximate factor $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$, where $\kappa := \|A\|\,\|A^{-1}\|$
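A MINRES sketch built on the `lanczos` sketch above (assuming no breakdown): it assembles the small $(k{+}1)\times k$ tridiagonal least-squares problem explicitly, whereas production codes update the solution with short recurrences and never store $U_k$.

```python
import numpy as np

def minres(A, b, x0, k):
    """MINRES sketch: min_y ||beta e_1 - T y||_2 with T the (k+1) x k
    tridiagonal matrix from symmetric Lanczos, then x_k = x_0 + U_k y."""
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    U, alpha, beta = lanczos(A, r0, k)                    # Lanczos sketch from above
    T = np.zeros((k + 1, k))
    T[np.arange(k), np.arange(k)] = alpha                 # diagonal
    T[np.arange(1, k + 1), np.arange(k)] = beta           # subdiagonal
    T[np.arange(k - 1), np.arange(1, k)] = beta[:k - 1]   # superdiagonal
    rhs = np.zeros(k + 1)
    rhs[0] = beta0
    y, *_ = np.linalg.lstsq(T, rhs, rcond=None)
    return x0 + U[:, :k] @ y
```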
The complexity of the different methods up to step $k$ is

                      Arnoldi      Lanczos      Conj. Grad.
    Ax + orthog.      2 k^2 n      9 k n        10 k n
    storage           k n          3 n          4 n
This clearly shows the need for other approaches for unsymmetric
A if k gets large.
Partial orthogonalization (IOM, ...)
Restarts (FOM(m), GMRES(m), ... )
Consider $A^TAx = A^Tb$ or
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$$
Unsymmetric Lanczos process
There exist invertible matrices $V, W$ such that $W^TV = I_n$ and
$$W^TAV =: T = \begin{bmatrix} \alpha_1 & \beta_2 & & \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_n \\ & & \gamma_n & \alpha_n \end{bmatrix}$$
for almost all first columns $v_1, w_1$ of $V, W$.
The derivation now involves the columns of $AV = VT$ and $A^TW = WT^T$:
$$Av_k = v_{k-1}\beta_k + v_k\alpha_k + v_{k+1}\gamma_{k+1}, \qquad A^Tw_k = w_{k-1}\gamma_k + w_k\alpha_k + w_{k+1}\beta_{k+1}$$
Without breakdowns we have, with $T_k := T(1{:}k, 1{:}k)$, $V_k := V(:, 1{:}k)$, $W_k := W(:, 1{:}k)$, in block notation:
$$W_k^TAV_k = T_k, \qquad W_k^TV_k = I_k$$
$$AV_k = V_kT_k + \gamma_{k+1}v_{k+1}e_k^T, \qquad A^TW_k = W_kT_k^T + \beta_{k+1}w_{k+1}e_k^T$$
(provided $\gamma_{k+1}\beta_{k+1} := \tilde w_{k+1}^T\tilde v_{k+1} \neq 0$)
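A numpy sketch of the two-sided recurrences above; the way $\tilde w_{k+1}^T\tilde v_{k+1}$ is split between $\gamma_{k+1}$ and $\beta_{k+1}$ is one common but arbitrary scaling choice, and no look-ahead is attempted, so a serious breakdown simply stops the sketch.

```python
import numpy as np

def two_sided_lanczos(A, v1, w1, k):
    """Unsymmetric (two-sided) Lanczos: bi-orthogonal bases V, W (W^T V = I)
    and a tridiagonal T = W^T A V with alpha on the diagonal, gamma[1:]
    below it and beta[1:] above it."""
    n = len(v1)
    V, W = np.zeros((n, k)), np.zeros((n, k))
    alpha, beta, gamma = np.zeros(k), np.zeros(k), np.zeros(k)
    s = w1 @ v1
    if s == 0:
        raise ValueError("w1^T v1 must be nonzero")
    V[:, 0], W[:, 0] = v1, w1 / s               # scale so that w_1^T v_1 = 1
    for j in range(k):
        Av, ATw = A @ V[:, j], A.T @ W[:, j]
        alpha[j] = W[:, j] @ Av
        v_t = Av - alpha[j] * V[:, j] - (beta[j] * V[:, j - 1] if j > 0 else 0)
        w_t = ATw - alpha[j] * W[:, j] - (gamma[j] * W[:, j - 1] if j > 0 else 0)
        s = w_t @ v_t
        if j == k - 1 or s == 0:                # done, or serious breakdown
            break
        gamma[j + 1] = np.sqrt(abs(s))          # arbitrary split: gamma * beta = s
        beta[j + 1] = s / gamma[j + 1]
        V[:, j + 1] = v_t / gamma[j + 1]
        W[:, j + 1] = w_t / beta[j + 1]
    return V, W, alpha, beta, gamma
```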
Coupled Krylov subspaces
Since
$$\mathcal{K}_k(T, e_1) = \mathrm{Im}\begin{bmatrix} I_k \\ 0 \end{bmatrix} \qquad \text{and} \qquad \mathcal{K}_k(T^T, e_1) = \mathrm{Im}\begin{bmatrix} I_k \\ 0 \end{bmatrix}$$
we have that
$$\mathcal{K}_k(A, v_1) = \mathrm{Im}\left[V_k\right], \qquad \mathcal{K}_k(A^T, w_1) = \mathrm{Im}\left[W_k\right]$$
because
$$\left[Ve_1, (VTW^T)Ve_1, \ldots, (VT^{k-1}W^T)Ve_1\right] = V\left[e_1, Te_1, \ldots, T^{k-1}e_1\right]$$
$$\left[We_1, (WT^TV^T)We_1, \ldots, (W(T^T)^{k-1}V^T)We_1\right] = W\left[e_1, T^Te_1, \ldots, (T^T)^{k-1}e_1\right]$$
Galerkin condition
Look for $x_k - x_0 \in \mathcal{K}_k(A, r_0)$ such that $b - Ax_k \perp \mathcal{K}_k(A^T, w_1)$.
Therefore
$$x_k - x_0 = V_ky, \qquad W_k^T(b - Ax_k) = 0 \quad \Rightarrow \quad W_k^TAV_ky = W_k^Tr_0 = \|r_0\|_2e_1$$
So we solve recursively
$$\begin{bmatrix} \alpha_1 & \beta_2 & & \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \gamma_k & \alpha_k \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
But now $\|W^T(b - Ax_k)\|_2 = |\gamma_{k+1}\,y_k|$ ...
Minimize quasi residual (QMR)
Look for $x_k - x_0 \in \mathcal{K}_k(A, r_0)$ to minimize $\|W^T(b - Ax_k)\|_2$.
Therefore
$$x_k - x_0 = V_ky, \qquad b - Ax_k \in \mathrm{Im}[V_{k+1}] \quad \Rightarrow \quad \|W^T(b - Ax_k)\|_2 = \|W_{k+1}^Tr_0 - W_{k+1}^TAV_ky\|_2$$
So we solve, using efficient recurrence relations, the least-squares problem
$$\min_y \left\|\begin{bmatrix} \alpha_1 & \beta_2 & & \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \gamma_k & \alpha_k \\ & & & \gamma_{k+1} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} - \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\right\|_2$$
One can only prove that $\|W^T(b - Ax_k)\|_2 \le \|W^T(b - Ax_{k-1})\|_2$
Variants
Avoid transposes via inner products: $(x, A^Ty) = (Ax, y)$
Factorize $T = LU$ to get coupled 2-term recurrences rather than 3-term recurrences
Re-orthogonalize to compensate for loss of bi-orthogonality
Apply look-ahead to avoid breakdowns
Restart rather than look-ahead or re-orthogonalization
Block versions for all of the above
Loss of orthogonality
The orthogonalization process $\tilde u_{k+1} := Au_k - \sum_{j=1}^{k}u_jh_{j,k}$ under finite precision yields
$$AU_k = U_kH_k + \tilde u_{k+1}e_k^T + E_k, \qquad \|E_k\| \approx \epsilon\,\|A\|,$$
but the error $F_k := U_k^TU_k - I_k$ can grow arbitrarily (even with MGS) unless one orthogonalizes once again: $\|F_k\| \approx \epsilon$
The same comment holds for Lanczos (higher relative cost !)
For unsymmetric Lanczos the bounds grow with $\|W_k\|\,\|V_k\|$
Breakdowns
Arnoldi stops when $h_{k+1,k} = 0$, implying $Ax_k = b$ (solved)
Lanczos stops when $\beta_{k+1} = 0$, implying $Ax_k = b$ (solved)
when $\tilde v_{k+1} = 0$, choose a new $v_{k+1} \perp V_k$, e.g. $v_{k+1} = w_{k+1}$
when $\tilde w_{k+1}^T\tilde v_{k+1} = 0$, there is no bi-orthogonality: serious breakdown
Breakdown is cured by going to the block version (look-ahead):
$$\det(W_{k+1}^TV_{k+1}) \neq 0$$
Recap solving Ax = b
Choose $x_0$, $r_0 := b - Ax_0$ and look for $x_k - x_0 \in \mathcal{K}_k(A, r_0)$
Four different methods:
$Ax_k - b \perp \mathcal{K}_k(A, r_0)$: FOM ($O(k^2n)$), CG ($A = A^T \succ 0$, $O(kn)$)
$\min\|Ax_k - b\|_2$: GMRES ($O(k^2n)$), MINRES ($A = A^T$, $O(kn)$)
$Ax_k - b \perp \mathcal{K}_k(A^T, w_1)$: BICG ($O(kn)$)
$\min\|W^T(Ax_k - b)\|_2$: QMR ($O(kn)$)
Notice that with full reorthogonalization all methods are $O(k^2n)$
Many variants try to cure orthogonality and erratic convergence
All methods improve with preconditioning (application-dependent)
Eigenvalue problems
Bauer-Fike Theorem applied to
$$AU_k - U_kH_k = h_{k+1,k}u_{k+1}e_k^T$$
yields (for $X^{-1}AX = \Lambda$)
$$\exists i: \quad |\lambda_j(H_k) - \lambda_i(A)| \le |h_{k+1,k}|\,\kappa(X)$$
i.e. each eigenvalue of $H_k$ approximates well some eigenvalue of $A$
For $A = A^T$, $\kappa(X) = 1$
Breakdown $h_{k+1,k} = 0$ is good since then $\lambda_j(H_k) = \lambda_i(A)$
Improved bounds exist if we know something about $\Lambda(A)$
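A small sketch of how such projected eigenvalues (Ritz values) are obtained in practice, reusing the `arnoldi` sketch from earlier; the function name is mine.

```python
import numpy as np

def arnoldi_ritz_values(A, r0, k):
    """Eigenvalues of the projected k x k Hessenberg matrix H_k; by the bound
    above, each lies within |h_{k+1,k}| * kappa(X) of an eigenvalue of A."""
    U, H = arnoldi(A, r0, k)                 # Arnoldi sketch from above
    return np.linalg.eigvals(H[:k, :k])
```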
Bauer-Fike Theorem applied to
$$AV_k - V_kT_k = \tilde v_{k+1}e_k^T, \qquad A^TW_k - W_kT_k^T = \tilde w_{k+1}e_k^T$$
yields (for $X^{-1}AX = \Lambda$)
$$\exists i: \quad |\lambda(T_k) - \lambda_i(A)| \le \|V_k\|_2\,\|\tilde v_{k+1}\|_2\,\kappa(X), \quad \|W_k\|_2\,\|\tilde w_{k+1}\|_2\,\kappa(X)$$
i.e. each eigenvalue of $T_k$ approximates well some eigenvalue of $A$
Breakdowns $\tilde v_{k+1} = 0$, $\tilde w_{k+1} = 0$ are good since then $\lambda_j(T_k) = \lambda_i(A)$
Breakdown $\tilde w_{k+1}^T\tilde v_{k+1} = 0$ does not help
Result still holds when bi-orthogonality gets lost
Eigenvalue convergence
Note that Krylov spaces are related to the power method
$$\mathcal{K}_k(A, r) := \mathrm{Im}\left[r, Ar, \ldots, A^{k-1}r\right]$$
and that under exact arithmetic
$$\mathcal{K}_k(A - cI, r) = \mathcal{K}_k(A, r)$$
For Arnoldi, $\Lambda(H_k)$ should converge to the outer spectrum of $A - cI$
The same holds for Lanczos and $\Lambda(T_k)$ (real spectrum if $A = A^T$)
In practice one often converges to outer eigenvalues
but the full story is more complex ...
Convergence is strongly influenced by a rational transformation or by the implicit shift technique
The rational transformation $\tilde A := (bI - cA)(dI - A)^{-1}$ also transforms the spectrum (inverse iteration $\tilde A := (dI - A)^{-1}$ is a special case; a sketch of it follows this list)
$$\mathcal{K}_k(\tilde A, r) := \mathrm{Im}\left[r, \tilde Ar, \ldots, \tilde A^{k-1}r\right]$$
tends to the eigenspace with the spectrum of $A$ closest to $d$
approximate inverses are needed for multiplication with $\tilde A$
loss of orthogonality indicates convergence of eigenvalues
symmetric Lanczos extensively studied
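A minimal sketch of the shift-and-invert special case mentioned above, reusing the `arnoldi` sketch from earlier; the dense inverse is purely illustrative, since large-scale codes apply $(dI - A)^{-1}$ through a (sparse) factorization.

```python
import numpy as np

def shift_invert_ritz_values(A, r0, d, k):
    """Run Arnoldi on Atil = (d I - A)^{-1} and map its Ritz values theta
    back to d - 1/theta: these target the eigenvalues of A closest to d."""
    Atil = np.linalg.inv(d * np.eye(A.shape[0]) - A)
    U, H = arnoldi(Atil, r0, k)              # Arnoldi sketch from above
    theta = np.linalg.eigvals(H[:k, :k])
    return d - 1.0 / theta
```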
Implicit shifted ARPACK
One implicit shift QR step applied to $H_k$,
$$H_k - \mu I = QR, \qquad \tilde H_k = RQ + \mu I,$$
(and a deflation) deletes the undesired eigenvalue from $\tilde H_k$ and
yields a new starting vector $\tilde r := (A - \mu I)r$ such that
$$\mathcal{K}_{k-1}(A, \tilde r) := \mathrm{Im}\left[\tilde r, A\tilde r, \ldots, A^{k-2}\tilde r\right]$$
yields $\tilde H_{k-1}$ as projected matrix
Choose shifts $\mu_i$ recursively,
$$\tilde r := \Big[\prod_i (A - \mu_iI)\Big]r = p(A)\,r,$$
such that $p(s)$ filters the undesired eigenvalues out of the complex plane
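For illustration, here is an explicit version of one shifted QR step on the projected matrix; ARPACK performs the equivalent similarity transformation implicitly (by bulge chasing) and never forms it this way. The helper name is mine.

```python
import numpy as np

def shifted_qr_step(Hk, mu):
    """One shifted QR step: H - mu*I = Q R, H_new = R Q + mu*I = Q^T H Q.
    The eigenvalues are preserved; a shift mu close to an unwanted eigenvalue
    drives it to the trailing diagonal position so it can be deflated."""
    k = Hk.shape[0]
    Q, R = np.linalg.qr(Hk - mu * np.eye(k))
    return R @ Q + mu * np.eye(k), Q
```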
[Figure: filter after a number of implicit steps]
Implicit shifts: ideas
allow one to keep $k$ small (deflations)
behave numerically better than the explicit calculation of $\tilde r$
extend to the Lanczos and unsymmetric Lanczos algorithms
extend to the block Arnoldi and block Lanczos algorithms
extend to the generalized eigenvalue problem
$$\det(\lambda B - A) = 0 \quad \Leftrightarrow \quad \det(\lambda I - B^{-1}A) = 0$$
implicit QR steps become implicit QZ steps
Rational approximation
Approximate $R(s)$ of degree $N$ by $\hat R(s)$ of degree $n \ll N$. Assume
$$R(s) := c(sI_N - A)^{-1}b, \qquad \hat R(s) := \hat c(sI_n - \hat A)^{-1}\hat b$$
To interpolate at a point $\sigma$ up to order $2k$ (Padé approximation),
$$R(s) - \hat R(s) = O(s - \sigma)^{2k},$$
one chooses
$$\hat A = W_k^TAV_k, \qquad \hat b = W_k^Tb, \qquad \hat c = cV_k,$$
where
$$\tilde A := (A - \sigma I)^{-1}, \qquad \mathrm{Im}[V_k] = \mathcal{K}_k(\tilde A, \tilde Ab), \qquad \mathrm{Im}[W_k] = \mathcal{K}_k(\tilde A^T, \tilde A^Tc^T)$$
Also valid for multipoint, multi-input, multi-output and Arnoldi
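A numpy sketch of the projection above around an expansion point $\sigma$ (the symbol is my choice in the reconstruction); the bases are built here by explicit powers and QR with an oblique projection $(W^TV)^{-1}$, rather than by the coupled Lanczos recurrences with $W_k^TV_k = I_k$, which gives the same reduced transfer function but is not how one would implement it at large scale.

```python
import numpy as np

def pade_reduced_model(A, b, c, sigma, k):
    """Two-sided moment matching: V spans K_k(Atil, Atil b) and W spans
    K_k(Atil^T, Atil^T c^T) with Atil = (A - sigma I)^{-1}; the reduced
    model (Ahat, bhat, chat) matches 2k moments of c (sI - A)^{-1} b at sigma."""
    n = A.shape[0]
    Atil = np.linalg.inv(A - sigma * np.eye(n))
    cols_v, cols_w = [], []
    v, w = Atil @ b, Atil.T @ c
    for _ in range(k):
        cols_v.append(v)
        cols_w.append(w)
        v, w = Atil @ v, Atil.T @ w
    V = np.linalg.qr(np.column_stack(cols_v))[0]
    W = np.linalg.qr(np.column_stack(cols_w))[0]
    M = W.T @ V                                  # assumed nonsingular (no breakdown)
    A_hat = np.linalg.solve(M, W.T @ A @ V)
    b_hat = np.linalg.solve(M, W.T @ b)
    c_hat = c @ V
    return A_hat, b_hat, c_hat
```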
We did NOT cover
least squares variants
implementation aspects
preconditioning
singular value problems
block versions
multiple right hand sides
restarts ...
Conclusion
Krylov subspace algorithms are used to
solve large scale systems of equations
find eigenvalues, generalized eigenvalues and singular values of
large scale matrix problems
approximate the exponential or high degree rational functions
by lower degree ones
There are numerous variants and sophisticated algorithms
It is still a very active area of research
References
[1] R. Lehoucq, D. Sorensen, C. Yang, ARPACK users guide,
SIAM, 1997.
[2] B. Parlett, The symmetric eigenvalue problem, Prentice Hall,
1980.
[3] Y. Saad, Iterative methods for sparse linear systems, SIAM,
2nd Ed., 2003.
[4] H. van der Vorst, Iterative Krylov Methods for Large Linear
Systems, Cambridge Monographs on Applied and
Computational Mathematics, 2003.
[5] P. Van Dooren, Gramian based model reduction of large-scale
dynamical systems, in Numerical Analysis, Chapman and Hall,
CRC Press, London, 2000.