
Matrix Analysis and Algorithms

Andrew Stuart Jochen Voss

4th August 2009


Introduction
The three basic problems we will address in this book are as follows. In all cases we are given
as data a matrix A ∈ C^{m×n}, with m ≥ n and, for the first two problems, the vector b ∈ C^m.
(SLE) denotes simultaneous linear equations, (LSQ) denotes least squares and (EVP) denotes
eigenvalue problem.

(SLE) (m = n): find x ∈ C^n such that  Ax = b.
(LSQ) (m ≥ n): find x ∈ C^n minimising  ‖Ax − b‖_2^2  over x ∈ C^n.
(EVP) (m = n): find (x, λ) ∈ C^n × C such that  Ax = λx,  ‖x‖_2^2 = 1.
The book contains an introduction to matrix analysis, and to the basic algorithms of numer-
ical linear algebra. Further results can be found in many text books. The book of Horn and
Johnson [HJ85] is an excellent reference for theoretical results about matrix analysis; see also
[Bha97]. The subject of linear algebra, and matrix analysis in particular, is treated in an original
and illuminating fashion in [Lax97]. For a general introduction to the subject of numerical linear
algebra we recommend the book by Trefethen and Bau [TB97]; more theoretical treatments of
the subject can be found in Demmel [Dem97], Golub and Van Loan [GL96] and in Stoer and
Bulirsch [SB02]. Higham's book [Hig02] contains a wealth of information about stability and
the effect of rounding errors in numerical algorithms; it is this source that we used for almost
all theorems we state concerning backward error analysis. The book of Saad [Saa97] covers the
subject of iterative methods for linear systems. The symmetric eigenvalue problem is analysed
in Parlett [Par80].

Acknowledgement
We are grateful to Menelaos Karavelas, Ian Mitchell and Stuart Price for assistance in the type-
setting of this material. We are grateful to a variety of students at Stanford University (CS237A)
and at Warwick University (MA398) for many helpful comments which have significantly im-
proved the notes.

Contents
1 Vector and Matrix Analysis 7
1.1 Vector Norms and Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Structured Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2 Matrix Factorisations 23
2.1 Diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 QR Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 LU Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Cholesky Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3 Stability and Conditioning 37


3.1 Conditioning of SLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Conditioning of LSQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Conditioning of EVP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 Stability of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4 Complexity of Algorithms 47
4.1 Computational Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 Matrix-Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4 Bidiagonal and Hessenberg Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5 Systems of Linear Equations 57


5.1 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Gaussian Elimination with Partial Pivoting . . . . . . . . . . . . . . . . . . . . . 61
5.3 The QR Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6 Iterative Methods 73
6.1 Linear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2 The Jacobi Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3 The Gauss-Seidel and SOR Methods . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4 Nonlinear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 The Steepest Descent Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 The Conjugate Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

7 Least Squares Problems 93


7.1 LSQ via Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.2 LSQ via QR factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.3 LSQ via SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95


8 Eigenvalue Problems 101


8.1 The Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.2 Inverse Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8.3 Rayleigh Quotient Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.4 Simultaneous Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.5 The QR Algorithm for Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.6 Divide and Conquer for Symmetric Problems . . . . . . . . . . . . . . . . . . . . 110
Chapter 1

Vector and Matrix Analysis


The purpose of this chapter is to summarise the fundamental theoretical results from linear
algebra to which we will frequently refer, and to provide some basic theoretical tools which we will
use in our analysis. We study vector and matrix norms, inner-products, the eigenvalue problem,
orthogonal projections and a variety of special matrices which arise frequently in computational
linear algebra.

1.1 Vector Norms and Inner Products

Definition 1.1. A vector norm on C^n is a mapping ‖·‖ : C^n → R satisfying

a) ‖x‖ ≥ 0 for all x ∈ C^n and ‖x‖ = 0 iff x = 0,

b) ‖λx‖ = |λ| ‖x‖ for all λ ∈ C, x ∈ C^n, and

c) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ C^n.

Remark. The definition of a norm on R^n is identical, but with C^n replaced by R^n and C
replaced by R.

Examples.

• the p-norm for 1 ≤ p < ∞:

    ‖x‖_p = ( Σ_{j=1}^n |x_j|^p )^{1/p},    x ∈ C^n;

• for p = 2 we get the Euclidean norm:

    ‖x‖_2 = ( Σ_{j=1}^n |x_j|^2 )^{1/2},    x ∈ C^n;

• for p = 1 we get

    ‖x‖_1 = Σ_{j=1}^n |x_j|,    x ∈ C^n;

• the infinity norm: ‖x‖_∞ = max_{1≤j≤n} |x_j|.
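As a quick numerical check (an illustration added here, not part of the original notes), these norms can be evaluated directly with NumPy on an arbitrary example vector; np.linalg.norm computes the same quantities.

```python
# Sketch: evaluating the p-norms defined above with NumPy (x is an arbitrary example).
import numpy as np

x = np.array([3.0 - 4.0j, 1.0, -2.0])

norm_1 = np.sum(np.abs(x))               # ||x||_1
norm_2 = np.sqrt(np.sum(np.abs(x)**2))   # ||x||_2, the Euclidean norm
norm_inf = np.max(np.abs(x))             # ||x||_inf

# The same values via the library routine:
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
```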

Theorem 1.2. All norms on C^n are equivalent: for each pair of norms ‖·‖_a and ‖·‖_b on C^n
there are constants 0 < c_1 ≤ c_2 < ∞ with

    c_1 ‖x‖_a ≤ ‖x‖_b ≤ c_2 ‖x‖_a    for all x ∈ C^n.

Proof. Using property b) from the definition of a vector norm it suffices to consider vectors
x ∈ S = { x ∈ C^n : ‖x‖_2 = 1 }. Since ‖·‖_a is non-zero on all of S we can define f : S → R
by f(x) = ‖x‖_b / ‖x‖_a. Because the function f is continuous and the set S is compact there are
x_1, x_2 ∈ S with f(x_1) ≤ f(x) ≤ f(x_2) for all x ∈ S. Setting c_1 = f(x_1) > 0 and c_2 = f(x_2)
completes the proof.

Remarks.
1. The same result holds for norms on R^n. The proof transfers to this situation without
change.

2. We remark that, if A ∈ C^{n×n} is an invertible matrix and ‖·‖ a norm on C^n, then ‖x‖_A :=
‖Ax‖ is also a norm.

Definition 1.3. An inner product on C^n is a mapping ⟨·,·⟩ : C^n × C^n → C satisfying:

a) ⟨x, x⟩ ∈ R_+ for all x ∈ C^n and ⟨x, x⟩ = 0 iff x = 0;

b) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩ for all x, y ∈ C^n;

c) ⟨x, λy⟩ = λ⟨x, y⟩ for all λ ∈ C, x, y ∈ C^n;

d) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩ for all x, y, z ∈ C^n.

Remark. Conditions c) and d) above state that ⟨·,·⟩ is linear in the second component. Using
the rules for inner products we get

    ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩    for all x, y, z ∈ C^n

and

    ⟨λx, y⟩ = λ̄ ⟨x, y⟩    for all λ ∈ C, x, y ∈ C^n.

The inner product is said to be anti-linear in the first component.

Example. The standard inner product on C^n is given by

    ⟨x, y⟩ = Σ_{j=1}^n x̄_j y_j,    x, y ∈ C^n.    (1.1)

Definition 1.4. Two vectors x, y are orthogonal with respect to an inner product ⟨·,·⟩ iff
⟨x, y⟩ = 0.

Lemma 1.5 (Cauchy-Schwarz inequality). Let ⟨·,·⟩ : C^n × C^n → C be an inner product. Then

    |⟨x, y⟩|^2 ≤ ⟨x, x⟩ ⟨y, y⟩    (1.2)

for every x, y ∈ C^n, and equality holds if and only if x and y are linearly dependent.
Proof. For every λ ∈ C we have

    0 ≤ ⟨x − λy, x − λy⟩ = ⟨x, x⟩ − λ̄⟨y, x⟩ − λ⟨x, y⟩ + |λ|^2 ⟨y, y⟩.    (1.3)

For λ = ⟨y, x⟩/⟨y, y⟩ this becomes

    0 ≤ ⟨x, x⟩ − ⟨x, y⟩⟨y, x⟩/⟨y, y⟩ − ⟨y, x⟩⟨x, y⟩/⟨y, y⟩ + ⟨y, x⟩⟨x, y⟩/⟨y, y⟩ = ⟨x, x⟩ − |⟨x, y⟩|^2/⟨y, y⟩,

and multiplying the result by ⟨y, y⟩ gives (1.2).
If equality holds in (1.2) then x − λy in (1.3) must be 0 and thus x and y are linearly dependent.
If on the other hand x and y are linearly dependent, say x = αy, then λ = α⟨y, y⟩/⟨y, y⟩ = α
and x − λy = 0, giving equality in (1.3) and thus in (1.2).

Lemma 1.6. Let ⟨·,·⟩ : C^n × C^n → C be an inner product. Then ‖·‖ : C^n → R defined by

    ‖x‖ = √⟨x, x⟩,    x ∈ C^n,

is a vector norm.

Proof.
a) Since ⟨·,·⟩ is an inner product we have ⟨x, x⟩ ≥ 0 for all x ∈ C^n, i.e. √⟨x, x⟩ is real
and non-negative. Also we get

    ‖x‖ = 0 ⟺ ⟨x, x⟩ = 0 ⟺ x = 0.

b) We have

    ‖λx‖ = √⟨λx, λx⟩ = √(|λ|^2 ⟨x, x⟩) = |λ| ‖x‖.

c) Using the Cauchy-Schwarz inequality

    |⟨x, y⟩| ≤ ‖x‖ ‖y‖,    x, y ∈ C^n,

from Lemma 1.5 we get

    ‖x + y‖^2 = ⟨x + y, x + y⟩
              = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
              ≤ ‖x‖^2 + 2|⟨x, y⟩| + ‖y‖^2
              ≤ ‖x‖^2 + 2‖x‖‖y‖ + ‖y‖^2
              = (‖x‖ + ‖y‖)^2    for all x, y ∈ C^n.

This completes the proof.

Remark. The angle between two vectors x and y is the unique value θ ∈ [0, π] with

    cos(θ) ‖x‖ ‖y‖ = ⟨x, y⟩.

When considering the Euclidean norm and inner product on R^n, this definition of angle coincides
with the usual, geometric meaning of angles. In any case, two vectors are orthogonal if and only
if they have angle π/2.

We write matrices A ∈ C^{m×n} as

    A = [ a_11  a_12  ...  a_1n ]
        [ a_21  a_22  ...  a_2n ]
        [  ...   ...  ...   ... ]
        [ a_m1  a_m2  ...  a_mn ];

we write (A)_ij = a_ij for the ij-th entry of A.


Definition 1.7. Given A ∈ C^{m×n} we define the adjoint A* ∈ C^{n×m} by (A*)_ij = ā_ji. (For
A ∈ R^{m×n} we write A^T instead of A*.)

By identifying the space C^n of vectors with the space C^{n×1} of n×1 matrices, we can take
the adjoint of a vector. Then we can write the standard inner product as

    ⟨x, y⟩ = x* y.

Thus, the standard inner product satisfies

    ⟨Ax, y⟩ = (Ax)* y = x* A* y = ⟨x, A* y⟩

for all x ∈ C^n, y ∈ C^m and all A ∈ C^{m×n}. Unless otherwise specified, we will use ⟨·,·⟩ to
denote the standard inner product (1.1) and ‖·‖_2 to denote the corresponding Euclidean norm.

The following families of special matrices will be central in what follows:

Definition 1.8.
1. Q ∈ C^{m×n} is unitary if Q*Q = I. (If Q is real then Q^TQ = I and we say Q is orthogonal.)

2. A ∈ C^{n×n} is Hermitian if A* = A. (If A is real, we say A is symmetric.)

3. A Hermitian matrix A ∈ C^{n×n} is positive definite (resp. positive semi-definite) if x*Ax =
⟨Ax, x⟩ > 0 (resp. ≥ 0) for all x ∈ C^n \ {0}. In this text, whenever we use the terminology
positive definite or positive semi-definite we are necessarily referring to Hermitian matrices.

Remarks. Unitary matrices have the following properties:

• A matrix Q is unitary if and only if the columns of Q are orthonormal with respect to the
standard inner product. In particular unitary matrices cannot have more columns than
rows.

• If Q is a square unitary matrix, Q^{-1} = Q* and thus QQ* = I.

• A square matrix Q is unitary if and only if Q* is unitary.

The standard inner product and norm are invariant under multiplication by a unitary matrix:

Theorem 1.9. Let h , i denote the standard inner product. Then for any unitary Q Cmn
and any x, y Cn we have hQx, Qyi = hx, yi and kQxk2 = kxk2 .
Proof. The rst claim follows from hQx, Qyi = hx, Q Qyi = hx, yi and using the relation kxk2 =
hx, xi gives the second claim.

Other inner products with appropriate properties can give rise to other norms; for example,
for matrices A which are Hermitian and positive definite,

    ⟨x, y⟩_A = ⟨x, Ay⟩    (1.4)

is an inner product and

    ‖x‖_A = √⟨x, x⟩_A    (1.5)

defines a norm (see Exercise 1-2).
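The following sketch (an illustration added here, with an arbitrary symmetric positive definite A) evaluates the A-inner product (1.4) and the A-norm (1.5) directly from their definitions.

```python
# Sketch: the A-inner product <x,y>_A = <x, Ay> and the A-norm ||x||_A,
# for an arbitrary example of a Hermitian positive definite A.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])          # symmetric positive definite
x = np.array([1.0, -1.0])
y = np.array([0.5, 2.0])

inner_A = x.conj() @ (A @ y)                    # <x, y>_A
norm_A = np.sqrt(np.real(x.conj() @ (A @ x)))   # ||x||_A = sqrt(<x, x>_A)
print(inner_A, norm_A)
```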

1.2 Eigenvalues and Eigenvectors

Definition 1.10. Given a matrix A ∈ C^{n×n}, a vector x ∈ C^n is an eigenvector and λ ∈ C is an
eigenvalue (also called a right eigenvalue) of A if

    Ax = λx and x ≠ 0.    (1.6)

When x is an eigenvector of A, then for every α ≠ 0 the vector αx is an eigenvector for the
same eigenvalue, since both sides of (1.6) are linear in x. Sometimes it is convenient to normalise
x by choosing ‖x‖_2 = 1. Then the eigenvalue problem is to find (x, λ) ∈ C^n × C satisfying

    Ax = λx and ‖x‖_2 = 1.

Definition 1.11. Given a matrix A ∈ C^{n×n} we define the characteristic polynomial of A as

    χ_A(z) := det(A − zI).

Theorem 1.12. A value λ ∈ C is an eigenvalue of the matrix A if and only if χ_A(λ) = 0.

Proof. λ is an eigenvalue of A if and only if there is an x ≠ 0 with (A − λI)x = 0. This is
equivalent to the condition that A − λI is singular, which in turn is equivalent to det(A − λI) = 0.

Since χ_A is a polynomial of degree n, there will be n (some possibly repeated) eigenvalues,
denoted by λ_1, ..., λ_n and determined by χ_A(λ_k) = 0.

Definition 1.13. An eigenvalue λ has algebraic multiplicity q if q is the largest integer such
that (z − λ)^q is a factor of the characteristic polynomial χ_A(z). The geometric multiplicity, r, is
the dimension of the null space of A − λI. An eigenvalue is simple if q = r = 1.

If λ is an eigenvalue of A ∈ C^{n×n} then det(A − λI) = 0, which implies that det(A* − λ̄I) = 0
and so (A* − λ̄I) has non-trivial null space. Thus there is a vector y (the eigenvector of A*
corresponding to the eigenvalue λ̄) with y*A = λy* and y ≠ 0.

Definition 1.14. A vector y ∈ C^n with

    y*A = λy* and y ≠ 0

is known as a left eigenvector of A ∈ C^{n×n} corresponding to the eigenvalue λ.


Note that, even though the corresponding eigenvalues are the same, the right and left eigen-
vectors of a matrix are usually different.

Definition 1.15. Matrices A, B ∈ C^{n×n} are similar if B = S^{-1}AS with S ∈ C^{n×n} invertible.
The matrix S is a similarity transform.

Remarks. If a matrix A ∈ C^{n×n} has n linearly independent eigenvectors x_i and we arrange
them as columns of the matrix X, then X is invertible. If we let Λ denote a diagonal matrix
with the eigenvalues of A on the diagonal, then we may write

    AX = XΛ.

By invertibility of X we have

    A = XΛX^{-1}.    (1.7)

Thus Λ is a similarity transform of A. It reveals the eigenstructure of A and is hence very
useful in many situations. However, in general a matrix does not have n linearly independent
eigenvectors and hence generalizations of this factorization are important. Two which will arise
in the next chapter are:

• Jordan Canonical Form: A = SJS^{-1} (see Theorem 2.7)

• Schur Factorization: A = QTQ* (see Theorem 2.2)

These are both similarity transformations which reveal the eigenvalues of A on the diagonals
of J and T, respectively. The Jordan Canonical Form is not stable to perturbations, but the
Schur Factorization is. Hence the Schur Factorization will form the basis of good algorithms while
the Jordan Canonical Form is useful for more theoretical purposes, such as defining the matrix
exponential e^{At}.

Theorem 1.16 (Similarity and Eigenvalues). If B is similar to A, then B and A have the same
eigenvalues with the same algebraic and geometric multiplicities.
Proof. Exercise 2-4.

Lemma 1.17. For a simple eigenvalue λ,

    dim ker (A − λI)^2 = 1.

Proof. (Sketch) We prove this in the case where A has n linearly independent eigenvectors
x_i, all of which correspond to simple eigenvalues λ_i. Then A may be factorized as in (1.7) with
Λ = diag{λ_1, λ_2, ..., λ_n}. Hence

    (A − λI)^2 = XΔX^{-1},

where Δ = diag{(λ_1 − λ)^2, (λ_2 − λ)^2, ..., (λ_n − λ)^2}.
Without loss of generality let λ = λ_1, noting that λ_j ≠ λ for j ≠ 1 by simplicity. ker(Δ) is one
dimensional, spanned by e_1. Hence ker(A − λI)^2 is one dimensional by Theorem 1.16.
The general case can be established by use of the Jordan form (see Theorem 2.7), using the
fact that the Jordan block corresponding to a simple eigenvalue is diagonal.

Theorem 1.18 (Eigenvalue Multiplicities). For any eigenvalue λ of A ∈ R^{n×n} the algebraic and
geometric multiplicities q and r respectively satisfy

    1 ≤ r ≤ q.

Proof. Let λ be the eigenvalue. Since (A − λI) is non-invertible, its null-space U has dimension
r ≥ 1. Let V̂ ∈ C^{n×r} have r columns comprising an orthonormal basis for U; then extend V̂ to
V ∈ C^{n×n} by adding orthonormal columns so that V is unitary. Now

    B = V*AV = [ λI  C ]
               [ 0   D ],

where I is the r×r identity, C is r×(n−r) and D is (n−r)×(n−r), and B and A are similar.
Then

    det(B − zI) = det(λI − zI) det(D − zI)
                = (λ − z)^r det(D − zI).

Thus λ has algebraic multiplicity at least r for B, and hence for A, by Theorem 1.16.

Definition 1.19. The spectral radius of a matrix A ∈ C^{n×n} is defined by

    ρ(A) = max{ |λ| : λ is an eigenvalue of A }.

By considering the eigenvectors of a matrix, we can define an important class of matrices.

Definition 1.20. A matrix A ∈ C^{n×n} is normal iff it has n orthogonal eigenvectors.

The importance of this concept lies in the fact that normal matrices can always be diago-
nalised: if Q ∈ C^{n×n} is a matrix whose columns form an orthonormal system of eigenvectors
and Λ ∈ C^{n×n} is the diagonal matrix with the corresponding eigenvalues on the diagonal, then
we have

    A = QΛQ*.

Using this relation, we see that every normal matrix satisfies A*A = QΛ̄ΛQ* = QΛΛ̄Q* =
AA*. In Theorem 2.3 we will see that the condition A*A = AA* is actually equivalent to A
being normal in the sense of the definition above. Sometimes this alternative condition is used
to define when a matrix is normal. As a consequence of this equivalence, every Hermitian matrix
is also normal.
Let now A be Hermitian and positive definite. Then Ax = λx implies

    λ⟨x, x⟩ = ⟨x, λx⟩ = ⟨x, Ax⟩ = ⟨Ax, x⟩ = ⟨λx, x⟩ = λ̄⟨x, x⟩.

Thus, all eigenvalues of A are real and we can arrange them in increasing order λ_min = λ_1 ≤
... ≤ λ_n = λ_max. The following lemma uses λ_min and λ_max to estimate the values of the norm
‖·‖_A from (1.5) by ‖·‖.
Lemma 1.21. Let λ_min and λ_max be the smallest and largest eigenvalues of a Hermitian, positive
definite matrix A ∈ C^{n×n}. Then

    λ_min ‖x‖^2 ≤ ‖x‖_A^2 ≤ λ_max ‖x‖^2    for all x ∈ C^n.

Proof. Let φ_1, ..., φ_n be an orthonormal system of eigenvectors of A with corresponding eigen-
values λ_min = λ_1 ≤ ... ≤ λ_n = λ_max. By writing x as x = Σ_{i=1}^n α_i φ_i we get

    ‖x‖^2 = Σ_{i=1}^n |α_i|^2

and

    ‖x‖_A^2 = Σ_{i=1}^n λ_i |α_i|^2.

This gives the upper bound

    ‖x‖_A^2 ≤ λ_max Σ_{i=1}^n |α_i|^2 = λ_max ‖x‖^2

and similarly we get the lower bound.

1.3 Dual Spaces

Let ⟨·,·⟩ denote the standard inner product.

Definition 1.22. Given a norm ‖·‖ on C^n, the pair (C^n, ‖·‖) is a Banach space B. The Banach
space B′, the dual of B, is the pair (C^n, ‖·‖_{B′}), where

    ‖x‖_{B′} = max_{‖y‖=1} |⟨x, y⟩|.

See Exercise 1-5 to deduce that the preceding definition satisfies the norm axioms.

Theorem 1.23. The spaces (C^n, ‖·‖_1) and (C^n, ‖·‖_∞) are the duals of one another.

Proof. Firstly, we must show

    ‖x‖_1 = max_{‖y‖_∞=1} |⟨x, y⟩|.

This is clearly true for x = 0 and so we consider only x ≠ 0. Now

    |⟨x, y⟩| ≤ max_i |y_i| Σ_{j=1}^n |x_j| = ‖y‖_∞ ‖x‖_1,

and therefore

    max_{‖y‖_∞=1} |⟨x, y⟩| ≤ ‖x‖_1.

We need to show that this upper bound is achieved. If y_j = x_j/|x_j| (with the convention that
this is 0 when x_j is 0) then ‖y‖_∞ = 1 (since x ≠ 0) and

    ⟨x, y⟩ = Σ_{j=1}^n |x_j|^2/|x_j| = Σ_{j=1}^n |x_j| = ‖x‖_1.

Hence max_{‖y‖_∞=1} |⟨x, y⟩| = ‖x‖_1.
Secondly, it remains to show that

    ‖x‖_∞ = max_{‖y‖_1=1} |⟨x, y⟩|.

We have

    |⟨x, y⟩| ≤ ‖y‖_1 ‖x‖_∞

and so

    max_{‖y‖_1=1} |⟨x, y⟩| ≤ ‖x‖_∞.

If x = 0 we have equality; if not then, for some k such that |x_k| = ‖x‖_∞ > 0, choose y_j =
δ_jk x_k/|x_k|. Then

    ⟨x, y⟩ = |x_k| = ‖x‖_∞ and ‖y‖_1 = 1.

Thus max_{‖y‖_1=1} |⟨x, y⟩| = ‖x‖_∞.

Theorem 1.24. If p, q ∈ (1, ∞) with 1/p + 1/q = 1 then the Banach spaces (C^n, ‖·‖_p) and
(C^n, ‖·‖_q) are the duals of one another.

Proof. See Exercise 1-6.

1.4 Matrix Norms

Since we can consider the space C^{m×n} of all m×n matrices to be a copy of the mn-dimensional
vector space C^{mn}, we can use all vector norms on C^{mn} as vector norms on the matrices C^{m×n}.
Examples of vector norms on the space of matrices include

• maximum norm: ‖A‖_max = max_{i,j} |a_ij|

• Frobenius norm: ‖A‖_F = ( Σ_{i,j=1}^{m,n} |a_ij|^2 )^{1/2}

• operator norm C^n → C^m: if A ∈ C^{m×n},

    ‖A‖_{(m,n)} = max_{‖x‖_{(n)}=1} ‖Ax‖_{(m)},

  where ‖·‖_{(m)} is a norm on C^m, and ‖·‖_{(n)} is a norm on C^n. Note that, for any operator
  norm,

    ‖A‖_{(m,n)} = max_{‖x‖_{(n)}≤1} ‖Ax‖_{(m)} = max_{‖x‖_{(n)}=1} ‖Ax‖_{(m)} = max_{x∈C^n\{0}} ‖Ax‖_{(m)} / ‖x‖_{(n)}.

Sometimes it is helpful to consider special vector norms on a space of matrices, which are
compatible with the matrix-matrix multiplication.

Definition 1.25. A matrix norm on C^{n×n} is a mapping ‖·‖ : C^{n×n} → R with

a) ‖A‖ ≥ 0 for all A ∈ C^{n×n} and ‖A‖ = 0 iff A = 0,

b) ‖λA‖ = |λ| ‖A‖ for all λ ∈ C, A ∈ C^{n×n},

c) ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ C^{n×n},

d) ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ C^{n×n}.

Remark. Conditions a), b) and c) state that ‖·‖ is a vector norm on the vector space C^{n×n}.
Condition d) only makes sense for matrices, since general vector spaces are not equipped with
a product.

Examples of matrix norms include

• p-operator norm C^n → C^n: if A ∈ C^{n×n},

    ‖A‖_p = max_{‖x‖_p=1} ‖Ax‖_p,    1 ≤ p ≤ ∞.

The vector operator norm from C^n into C^m reduces to the p-operator norm if n = m and
the p-norm is chosen in the range and image spaces.

Definition 1.26. Given a vector norm ‖·‖_v on C^n we define the induced norm ‖·‖_m on C^{n×n}
by

    ‖A‖_m = max_{x≠0} ‖Ax‖_v / ‖x‖_v

for all A ∈ C^{n×n}.

We now show that the induced norm is indeed a norm.



Theorem 1.27. The induced norm ‖·‖_m of a vector norm ‖·‖_v is a matrix norm with

    ‖I‖_m = 1

and

    ‖Ax‖_v ≤ ‖A‖_m ‖x‖_v

for all A ∈ C^{n×n} and x ∈ C^n.

Proof. a) ‖A‖_m ∈ R and ‖A‖_m ≥ 0 for all A ∈ C^{n×n} is obvious from the definition. Also from
the definition we get

    ‖A‖_m = 0 ⟺ ‖Ax‖_v/‖x‖_v = 0 for all x ≠ 0
           ⟺ ‖Ax‖_v = 0 for all x ≠ 0 ⟺ Ax = 0 for all x ≠ 0 ⟺ A = 0.

b) For λ ∈ C and A ∈ C^{n×n} we get

    ‖λA‖_m = max_{x≠0} ‖λAx‖_v/‖x‖_v = max_{x≠0} |λ| ‖Ax‖_v/‖x‖_v = |λ| ‖A‖_m.

c) For A, B ∈ C^{n×n} we get

    ‖A + B‖_m = max_{x≠0} ‖Ax + Bx‖_v/‖x‖_v
              ≤ max_{x≠0} ( ‖Ax‖_v + ‖Bx‖_v ) / ‖x‖_v
              ≤ max_{x≠0} ‖Ax‖_v/‖x‖_v + max_{x≠0} ‖Bx‖_v/‖x‖_v = ‖A‖_m + ‖B‖_m.

Before we check condition d) from the definition of a matrix norm we verify

    ‖I‖_m = max_{x≠0} ‖Ix‖_v/‖x‖_v = max_{x≠0} ‖x‖_v/‖x‖_v = 1

and

    ‖A‖_m = max_{y≠0} ‖Ay‖_v/‖y‖_v ≥ ‖Ax‖_v/‖x‖_v    for all x ∈ C^n \ {0},

which gives

    ‖Ax‖_v ≤ ‖A‖_m ‖x‖_v    for all x ∈ C^n.

d) Using this estimate we find

    ‖AB‖_m = max_{x≠0} ‖ABx‖_v/‖x‖_v ≤ max_{x≠0} ‖A‖_m ‖Bx‖_v/‖x‖_v = ‖A‖_m ‖B‖_m.

Remarks.
1. Usually one denotes the induced matrix norm with the same symbol as the corresponding
vector norm. For the remainder of this text we will follow this convention.

2. As a consequence of Theorem 1.27 we can see that not every matrix norm is an induced
norm: if ‖·‖_m is a matrix norm, then it is easy to check that ‖·‖′_m = 2‖·‖_m is a matrix
norm, too. But at most one of these two norms can equal 1 for the identity matrix, and
thus the other one cannot be an induced matrix norm.

3. Recall that ‖x‖_A := ‖Ax‖ is a vector norm on C^n whenever ‖·‖ is, provided that A is
invertible. The inequality from Theorem 1.27 gives the following upper and lower bounds
for the norm ‖·‖_A in terms of the original norm:

    (1/‖A^{-1}‖) ‖x‖ ≤ ‖x‖_A ≤ ‖A‖ ‖x‖.

Theorem 1.28. The matrix norm induced by the infinity norm is the maximum row sum:

    ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_ij|.

Proof. For x ∈ C^n we get

    ‖Ax‖_∞ = max_{1≤i≤n} |(Ax)_i| = max_{1≤i≤n} | Σ_{j=1}^n a_ij x_j | ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij| ‖x‖_∞,

which gives

    ‖Ax‖_∞ / ‖x‖_∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij|

for all x ∈ C^n and thus ‖A‖_∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij|.
For the lower bound choose k ∈ {1, 2, ..., n} such that

    max_{1≤i≤n} Σ_{j=1}^n |a_ij| = Σ_{j=1}^n |a_kj|

and define x ∈ C^n by x_j = ā_kj/|a_kj| for all j = 1, ..., n (with the convention that this is 0 when
a_kj is 0). Then we have ‖x‖_∞ = 1 and

    ‖A‖_∞ ≥ ‖Ax‖_∞ / ‖x‖_∞
          = max_{1≤i≤n} | Σ_{j=1}^n a_ij ā_kj/|a_kj| |
          ≥ | Σ_{j=1}^n a_kj ā_kj/|a_kj| |
          = Σ_{j=1}^n |a_kj|
          = max_{1≤i≤n} Σ_{j=1}^n |a_ij|.

This is the required result.

Theorem 1.29. Let A ∈ C^{n×n}. Then ‖A*‖_1 = ‖A‖_∞ and so

    ‖A‖_1 = ‖A*‖_∞ = max_{1≤j≤n} Σ_{i=1}^n |a_ij|.

This expression is known as the maximum column sum.

Proof. (Sketch)

    ‖A‖_∞ = max_{‖x‖_∞=1} ‖Ax‖_∞
          = max_{‖x‖_∞=1} max_{‖y‖_1=1} |⟨Ax, y⟩|    (dual definition)
          = max_{‖y‖_1=1} max_{‖x‖_∞=1} |⟨Ax, y⟩|    (needs careful justification)
          = max_{‖y‖_1=1} max_{‖x‖_∞=1} |⟨x, A*y⟩|   (definition of A*)
          = max_{‖y‖_1=1} ‖A*y‖_1                    (dual definition)
          = ‖A*‖_1.

Since (A*)_ij = ā_ji, the result follows.

Remark. This result is readily extended to non-square matrices A ∈ C^{m×n}.
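As a small numerical check of these two formulas (an illustration added here, with a random example matrix), the row-sum and column-sum expressions agree with NumPy's built-in matrix norms.

```python
# Sketch: checking Theorems 1.28 and 1.29 numerically on a random matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

max_row_sum = np.max(np.sum(np.abs(A), axis=1))   # ||A||_inf
max_col_sum = np.max(np.sum(np.abs(A), axis=0))   # ||A||_1

assert np.isclose(max_row_sum, np.linalg.norm(A, np.inf))
assert np.isclose(max_col_sum, np.linalg.norm(A, 1))
```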

Recall that the spectral radius of a matrix is defined as

    ρ(A) = max{ |λ| : λ is an eigenvalue of A }.

Theorem 1.30. For any matrix norm ‖·‖, any matrix A ∈ C^{n×n} and any k ∈ N we have

    ρ(A)^k ≤ ρ(A^k) ≤ ‖A^k‖ ≤ ‖A‖^k.

Proof. Let B = A^k. The first inequality is a consequence of the fact that, whenever x is an
eigenvector of A with eigenvalue λ, the vector x is also an eigenvector of B, but with eigenvalue
λ^k. By definition of the spectral radius ρ(B) we can find an eigenvector x with Bx = λx and
ρ(B) = |λ|. Let X ∈ C^{n×n} be the matrix where all n columns are equal to x. Then we have
BX = λX and thus

    ‖B‖ ‖X‖ ≥ ‖BX‖ = ‖λX‖ = |λ| ‖X‖ = ρ(B) ‖X‖.

Dividing by ‖X‖ gives ρ(B) ≤ ‖B‖. The final inequality follows from property d) in the definition
of a matrix norm.
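As an added illustration, the chain of inequalities in Theorem 1.30 can be checked numerically in the infinity norm; the matrix below is an arbitrary example.

```python
# Sketch: rho(A)^k <= ||A^k|| <= ||A||^k, checked in the infinity norm.
import numpy as np

A = np.array([[0.5, 0.9], [0.0, 0.4]])
rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius

for k in (1, 2, 5):
    Ak = np.linalg.matrix_power(A, k)
    assert rho**k <= np.linalg.norm(Ak, np.inf) + 1e-12
    assert np.linalg.norm(Ak, np.inf) <= np.linalg.norm(A, np.inf)**k + 1e-12
```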

Theorem 1.31. If A ∈ C^{n×n} is normal, then

    ρ(A)^ℓ = ‖A^ℓ‖_2 = ‖A‖_2^ℓ    for all ℓ ∈ N.

Proof. Let x_1, ..., x_n be an orthonormal basis composed of eigenvectors of A with corresponding
eigenvalues λ_1, ..., λ_n. Without loss of generality we have ρ(A) = |λ_1|.
Let x ∈ C^n. Then we can write

    x = Σ_{j=1}^n α_j x_j

and get

    ‖x‖_2^2 = Σ_{j=1}^n |α_j|^2.

Similarly we find

    Ax = Σ_{j=1}^n α_j λ_j x_j    and    ‖Ax‖_2^2 = Σ_{j=1}^n |α_j λ_j|^2.

This shows

    ‖Ax‖_2 / ‖x‖_2 = ( Σ_{j=1}^n |α_j λ_j|^2 )^{1/2} / ( Σ_{j=1}^n |α_j|^2 )^{1/2}
                   ≤ ( Σ_{j=1}^n |α_j|^2 |λ_1|^2 / Σ_{j=1}^n |α_j|^2 )^{1/2}
                   = |λ_1| = ρ(A)    for all x ∈ C^n,

and consequently ‖A‖_2 ≤ ρ(A).
Using Theorem 1.30 we get

    ρ(A)^ℓ ≤ ‖A^ℓ‖_2 ≤ ‖A‖_2^ℓ ≤ ρ(A)^ℓ

for all ℓ ∈ N. This completes the proof.

Similar methods to those used in the proof of the previous result yield the following theorem.

Theorem 1.32. For all matrices A ∈ C^{m×n}

    ‖A‖_2^2 = ρ(A*A).

Proof. See Exercise 1-9.

The matrix 2-norm has the special property that it is invariant under multiplication by a
unitary matrix. This is the analogue of Theorem 1.9 for vector norms.

Theorem 1.33. For all matrices A ∈ C^{m×n} and unitary matrices U ∈ C^{m×m}, V ∈ C^{n×n},

    ‖UA‖_2 = ‖A‖_2,    ‖AV‖_2 = ‖A‖_2.

Proof. The first result follows from the previous theorem, after noting that (UA)*(UA) =
A*U*UA = A*A. Because (AV)*(AV) = V*(A*A)V and because V*(A*A)V is a similarity
transformation of A*A, the second result also follows from the previous theorem.

Let A, B ∈ C^{m×n}. In the following it will be useful to employ the notation |A| to denote the
matrix with entries (|A|)_ij = |a_ij| and the notation |A| ≤ |B| as shorthand for |a_ij| ≤ |b_ij| for
all i, j.

Lemma 1.34. If two matrices A, B ∈ C^{m×n} satisfy |A| ≤ |B| then ‖A‖_∞ ≤ ‖B‖_∞ and ‖A‖_1 ≤
‖B‖_1. Furthermore ‖ |AB| ‖_∞ ≤ ‖ |A| |B| ‖_∞.

Proof. For the first two observations, it suffices to prove the first result, since ‖A‖_1 = ‖A*‖_∞ and
|A| ≤ |B| implies that |A*| ≤ |B*|. The first result is a direct consequence of the representation
of the ∞-norm and 1-norm from Theorems 1.28 and 1.29. To prove the last result note that

    (|AB|)_ij = | Σ_k A_ik B_kj |
              ≤ Σ_k |A_ik| |B_kj|
              = (|A| |B|)_ij.

The first result completes the proof.

Lemma 1.35. Let A, B ∈ C^{n×n}. Then

    ‖A‖_max ≤ ‖A‖_∞ ≤ n ‖A‖_max,
    ‖AB‖_max ≤ ‖A‖_∞ ‖B‖_max,
    ‖AB‖_max ≤ ‖A‖_max ‖B‖_1.

Proof. Exercise 1-8.

Definition 1.36. The outer product of two vectors a, b ∈ C^n is the matrix a ⊗ b ∈ C^{n×n} defined
by

    (a ⊗ b) c = (b*c) a = ⟨b, c⟩ a    for all c ∈ C^n.

We sometimes write a ⊗ b = ab*. The ij-th entry of the outer product is (a ⊗ b)_ij = a_i b̄_j.

Definition 1.37. Let S be a subspace of C^n. Then the orthogonal complement of S is defined
by

    S^⊥ = { x ∈ C^n : ⟨x, y⟩ = 0 for all y ∈ S }.

The orthogonal projection onto S, P, can be defined as follows: let {y_i}_{i=1}^k be an orthonormal
basis for S; then

    Px = Σ_{j=1}^k ⟨y_j, x⟩ y_j = Σ_{j=1}^k (y_j ⊗ y_j) x = ( Σ_{j=1}^k y_j y_j* ) x.

Theorem 1.38. P is a projection, that is P^2 = P. Furthermore, if P^⊥ = I − P, then P^⊥ is
the orthogonal projection onto S^⊥.

Proof. Extend {y_i}_{i=1}^k to a basis for C^n, denoted {y_i}_{i=1}^n, noting that S^⊥ = span{y_{k+1}, ..., y_n}.
Any x ∈ C^n can be written uniquely as

    x = Σ_{j=1}^n ⟨y_j, x⟩ y_j,

and so

    Px = Σ_{j=1}^k ⟨y_j, x⟩ y_j,

found by truncating to k terms. Clearly truncating again leaves the expression unchanged:

    P^2 x = Σ_{j=1}^k ⟨y_j, x⟩ y_j = Px,    x ∈ C^n.

Now (I − P)x = P^⊥ x = Σ_{j=k+1}^n ⟨y_j, x⟩ y_j, proving the second result.
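As an added illustration of Definition 1.37 and Theorem 1.38, the sketch below builds P = Σ_j y_j y_j* from an (arbitrarily generated) orthonormal basis and checks P^2 = P.

```python
# Sketch: the orthogonal projection onto a subspace spanned by orthonormal
# columns of Y, and its complement I - P (arbitrary example data).
import numpy as np

rng = np.random.default_rng(1)
# Orthonormal basis for a 2-dimensional subspace of C^4, via a reduced QR.
Y, _ = np.linalg.qr(rng.standard_normal((4, 2)))

P = Y @ Y.conj().T            # projection onto S
P_perp = np.eye(4) - P        # projection onto the orthogonal complement

x = rng.standard_normal(4)
assert np.allclose(P @ P, P)               # P^2 = P
assert np.allclose(P @ (P_perp @ x), 0.0)  # the two ranges are orthogonal
```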

1.5 Structured Matrices

Definition 1.39. A matrix A ∈ C^{n×n} is

• diagonal if a_ij = 0 for i ≠ j,
• (strictly) upper triangular if a_ij = 0 for i > j (for i ≥ j),
• (strictly) lower triangular if a_ij = 0 for i < j (for i ≤ j),
• upper Hessenberg if a_ij = 0 for i > j + 1,
• upper bidiagonal if a_ij = 0 for i > j and for i < j − 1,
• tridiagonal if a_ij = 0 for i > j + 1 and for i < j − 1.

Definition 1.40. A matrix P ∈ R^{n×n} is called a permutation matrix if every row and every
column contains n − 1 zeros and one 1.

Remarks.
1. If P is a permutation matrix, then we have

    (P^T P)_ij = Σ_{k=1}^n p_ki p_kj = δ_ij

and thus P^T P = I. This shows that permutation matrices are orthogonal.

2. If σ : {1, ..., n} → {1, ..., n} is a permutation, then the matrix P = (p_ij) with

    p_ij = 1 if j = σ(i), and p_ij = 0 else,

is a permutation matrix. Indeed every permutation matrix is of this form. In particular
the identity matrix is a permutation matrix.

3. If P is the permutation matrix corresponding to the permutation σ, then (P^{-1})_ij = 1 if and
only if j = σ^{-1}(i). Thus the permutation matrix P^{-1} corresponds to the permutation σ^{-1}.

4. We get

    (PA)_ij = Σ_{k=1}^n p_ik a_kj = a_{σ(i),j}

for all i, j ∈ {1, ..., n}. This shows that multiplying by a permutation matrix from the left
reorders the rows of A. Furthermore we have

    (AP)_ij = Σ_{k=1}^n a_ik p_kj = a_{i,σ^{-1}(j)}

and hence multiplying by a permutation matrix from the right reorders the columns of A.

5. If P is a permutation matrix, then P^T is also a permutation matrix.
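The remarks above translate directly into code; the sketch below (added here, with an arbitrary example permutation σ written 0-based) builds P from σ and checks the orthogonality and row/column reordering properties.

```python
# Sketch: the permutation matrix of sigma (Remark 2) and the reordering
# properties of Remark 4, for an arbitrary example permutation.
import numpy as np

sigma = np.array([2, 0, 1])            # sigma(0)=2, sigma(1)=0, sigma(2)=1
n = len(sigma)
P = np.zeros((n, n))
P[np.arange(n), sigma] = 1.0           # p_{i, sigma(i)} = 1

A = np.arange(9.0).reshape(3, 3)
assert np.allclose(P.T @ P, np.eye(n))              # P is orthogonal
assert np.allclose(P @ A, A[sigma, :])              # left mult. reorders rows
assert np.allclose(A @ P, A[:, np.argsort(sigma)])  # right mult. reorders columns
```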

Bibliography

Excellent treatments of matrix analysis may be found in [Bha97] and [HJ85]. More advanced
treatments of the subject include [Lax97]. Theorem 1.16 and Lemma 1.17 are proved in [Ner71].
The proof of Theorem 1.24 may be found in [Rob01]. Theorem 1.32 is proved in the solutions
for instructors.

Exercises

Exercise 1-1. Show that the following relations hold for all x ∈ C^n:

a) ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2,

b) ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞, and

c) ‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞.

Exercise 1-2. Prove that, for Hermitian positive definite A, equations (1.4) and (1.5) define an
inner product and norm, respectively.

Exercise 1-3. For matrices in R^{m×n} prove that

    ‖A‖_max ≤ ‖A‖_F ≤ √(mn) ‖A‖_max.

Exercise 1-4. Define an inner product ⟨·,·⟩ on matrices in R^{n×n} such that

    ‖A‖_F^2 = ⟨A, A⟩.

Exercise 1-5. Prove that the norm ‖·‖_{B′} appearing in Definition 1.22 is indeed a norm.

Exercise 1-6. Prove Theorem 1.24.

Exercise 1-7. Show that ‖A‖_max = max_{i,j} |a_ij| for all A ∈ C^{n×n} defines a vector norm on the
space of n×n matrices, but not a matrix norm.

Exercise 1-8. Prove Lemma 1.35.

Exercise 1-9. Show that ‖A‖_2^2 = ρ(A^T A) for every matrix A ∈ R^{n×n} (this is the real version
of Theorem 1.32).

Exercise 1-10. For A ∈ R^{n×n} recall the definition of |A|, namely (|A|)_ij = |a_ij|. Show that

    ‖ |A| ‖ = ‖A‖

holds in the Frobenius, infinity and 1-norms. Is the result true in the Euclidean norm? Justify
your assertion.

Exercise 1-11. Let ‖·‖ be an operator norm. Prove that if ‖X‖ < 1, then

• I − X is invertible,

• the series Σ_{i=0}^∞ X^i converges, and

• (I − X)^{-1} = Σ_{i=0}^∞ X^i.

Moreover, prove that in the same norm

    ‖(I − X)^{-1}‖ ≤ (1 − ‖X‖)^{-1}.

Exercise 1-12. Let K be a matrix in R^{n×n} with non-negative entries and let f, g be two vectors
in R^n with strictly positive entries which satisfy

    (Kf)_i / g_i < α,    (K^T g)_i / f_i < β    for all i ∈ {1, ..., n}.

Prove that ‖K‖_2^2 ≤ αβ.

Exercise 1-13. Let A ∈ R^{k×l}, B ∈ R^{l×m} and C ∈ R^{m×m}. Here k ≥ l. If A and C are
orthogonal, that is if C^T C = I (the identity on R^m) and A^T A = I (the identity on R^l), then
show that ‖ABC‖_2 = ‖B‖_2.

Exercise 1-14. Prove that the Frobenius norm of a matrix is unchanged by multiplication by
unitary matrices. This is an analogue of Theorem 1.33.

Exercise 1-15. Show that for every vector norm ‖·‖ on R^{n×n} there is a number λ > 0 such
that

    ‖A‖′ := λ ‖A‖,    A ∈ R^{n×n},

defines a matrix norm.
Chapter 2

Matrix Factorisations
In this chapter we present various matrix factorisations. These are of interest in their own right,
and also because they form the basis for many useful numerical algorithms.
There are two groups of results presented in this chapter. The first kind of result factorises
a matrix A ∈ C^{n×n} as

    A = S Ã S^{-1},

where, typically, Ã is of a simpler form than the original matrix A. Results of this type are
matrix diagonalisations and the Jordan canonical form. These factorisations are useful, because
properties of Ã are often easier to understand than properties of A and often questions about A
can be reduced to questions about Ã. For example, since Ax = λx implies Ã(S^{-1}x) = λ(S^{-1}x),
the matrices A and Ã have the same eigenvalues (but different eigenvectors). These factorisations
are typically used in proofs.
The second group of results, including the QR factorisation and LU factorisation, just splits
a matrix A into two simpler parts:

    A = BC,

where B and C are matrices of a simpler form than A, for example triangular matrices. These
factorisations typically form the basis of numerical algorithms, because they allow one to split one
complicated problem into two simpler ones. This strategy will be used extensively in the later
chapters of this text.

2.1 Diagonalisation

A matrix A ∈ C^{n×n} is diagonalised by finding a unitary matrix Q ∈ C^{n×n} and a diagonal matrix
D ∈ C^{n×n} such that

    A = QDQ*.

Since this implies AQ = QD we see, by considering the individual columns of this matrix
equation, that the diagonal elements of D are the eigenvalues of A and the (orthonormal) columns
of Q are the corresponding eigenvectors. This insight has several consequences: firstly, a matrix
can be diagonalised if and only if it has a complete, orthonormal system of eigenvectors. And,
secondly, there can be no direct algorithms to diagonalise a matrix, since the eigenvalues of a
matrix in general cannot be found exactly in finite time (see the discussion around Theorem 8.2).
Thus, diagonalisation will be mostly useful as a tool in our proofs and not as part of an algorithm.
The basic result in this section is the Schur triangularisation of a matrix; diagonalisation will
follow from this. The next lemma is key in proving the Schur factorisation.

Lemma 2.1. For all A ∈ C^{n×n} satisfying dim(range(A)) = k ≤ n, there is an orthonormal set
{y_1, ..., y_k} ⊆ range(A) with the property that Ay_l ∈ span{y_1, ..., y_l} for l = 1, ..., k.

Proof. If k = 1, then there is a y_1 ∈ C^n with ‖y_1‖_2 = 1 which spans range(A). Clearly
Ay_1 ∈ range(A) = span{y_1}.
For induction assume that we have the result for some k < n. Let A satisfy dim(range(A)) =
k+1. Choose y_1 to be an eigenvector of A with ‖y_1‖_2 = 1. Let P denote the orthogonal projection
onto span{y_1} and P^⊥ = I − P. Define A^⊥ = P^⊥A and note that dim(range(A^⊥)) = k. By the
inductive hypothesis, there is an orthonormal set {y_2, ..., y_{k+1}} ⊆ range(A^⊥) with the property

    A^⊥ y_l ∈ span{y_2, ..., y_l},    l = 2, ..., k + 1.

Furthermore we have that y_1 is orthogonal to span{y_2, ..., y_{k+1}}. Consider the set {y_1, ..., y_{k+1}}.
Note that Ay_1 = λy_1. Also

    Ay_l = (PA + P^⊥A) y_l = PAy_l + A^⊥ y_l.

Since PAy_l ∈ span{y_1} and A^⊥ y_l ∈ span{y_2, ..., y_l} we obtain

    Ay_l ∈ span{y_1, ..., y_l},

as required.

Theorem 2.2 (Schur Factorisation). For any A ∈ C^{n×n}, there is a unitary Q ∈ C^{n×n} and an
upper triangular T ∈ C^{n×n} such that

    A = QTQ*.

Proof. Let k = dim(range(A)), and construct orthonormal vectors {y_1, ..., y_k} as in Lemma 2.1.
Since dim range(A)^⊥ = n − k, we can find an orthonormal basis {y_{k+1}, ..., y_n} of range(A)^⊥.
Then {y_1, ..., y_n} is an orthonormal basis of C^n and

    Ay_l ∈ range(A) = span{y_1, ..., y_k} ⊆ span{y_1, ..., y_l}

for l = k + 1, ..., n. We also have Ay_l ∈ span{y_1, ..., y_l} for l = 1, ..., k and thus

    Ay_l = Σ_{j=1}^l t_jl y_j,    l = 1, ..., n.

Letting Q = (y_1 ... y_n) and defining T by (T)_ij = t_ij for i ≤ j and (T)_ij = 0 for i > j, we obtain
AQ = QT as required.

Theorem 2.3 (Normal Diagonalisation). If A ∈ C^{n×n} satisfies A*A = AA*, then there is a
unitary Q ∈ C^{n×n} and a diagonal D ∈ C^{n×n} such that A = QDQ*.

Proof. By Schur factorisation, there is T upper triangular and Q unitary such that

    A = QTQ*,

and it suffices to show that T is diagonal. We have

    A*A = QT*TQ* and QTT*Q* = AA*,

and since A is normal we deduce that T*T = TT*. Now

    (T*T)_ij = Σ_k (T*)_ik (T)_kj = Σ_k (T̄)_ki (T)_kj,

so that

    (T*T)_ii = Σ_k |t_ki|^2 = Σ_{k=1}^i |t_ki|^2.

Similarly,

    (TT*)_ii = Σ_{k=i}^n |t_ik|^2.

We now prove that T is diagonal by equating these expressions and using induction.

    i = 1:    |t_11|^2 = Σ_{k=1}^n |t_1k|^2,

and so t_1k = 0 for k = 2, ..., n.
Assume for induction in m that t_lk = 0 for l = 1, ..., m − 1 and all k ≠ l. Note that we have
proved this for m = 2. Then

    (T*T)_mm = Σ_{k=1}^m |t_km|^2 = |t_mm|^2    (by induction hypothesis),
    (TT*)_mm = Σ_{k=m}^n |t_mk|^2 = |t_mm|^2 + Σ_{k=m+1}^n |t_mk|^2,

and so t_mk = 0 for k = m + 1, ..., n. Also, t_mk = 0 for k = 1, ..., m − 1 since T is upper
triangular. Thus

    t_mk = 0 for k ≠ m    and    t_lk = 0 for l = 1, ..., m, k ≠ l,

and the induction is complete.

Remark. In the situation of the preceding theorem, the diagonal elements of D are the
eigenvalues of A and the columns of Q are corresponding eigenvectors. Since Q is unitary, the
eigenvectors are orthogonal and thus the matrix A is normal. When combined with the discussion
after Definition 1.20 this shows that a matrix A is normal if and only if A*A = AA*.

Theorem 2.4 (Hermitian Diagonalisation). If A ∈ C^{n×n} is Hermitian, then there exists a
unitary matrix Q ∈ C^{n×n} and a diagonal Λ ∈ R^{n×n} such that A = QΛQ*.

Proof. Since Hermitian matrices are normal, A can be factorised in the required form with
Λ ∈ C^{n×n} diagonal by Theorem 2.3. It remains to show that Λ is real. We have

    AQ = QΛ,

and hence, if q_1, ..., q_n are the columns of Q, we get Aq_i = λ_i q_i and ‖q_i‖ = 1. This implies

    λ_i = ⟨q_i, λ_i q_i⟩ = ⟨q_i, Aq_i⟩ = ⟨Aq_i, q_i⟩ = ⟨λ_i q_i, q_i⟩ = λ̄_i

for i = 1, ..., n as required.

To illustrate the usefulness of Hermitian diagonalisation, we consider the following applica-
tion.

Lemma 2.5. Let A ∈ C^{n×n} be Hermitian and positive definite. Then there is a Hermitian,
positive definite matrix A^{1/2} ∈ C^{n×n}, the square root of A, such that A = A^{1/2} A^{1/2}.

Proof. Since A ∈ C^{n×n} is positive definite, we have

    λ_i ‖x_i‖_2^2 = ⟨x_i, Ax_i⟩ > 0

for all eigenpairs (x_i, λ_i) and thus all eigenvalues are positive.
By Theorem 2.4 we have

    A = QΛQ*

with Λ = diag(λ_1, ..., λ_n). Since all λ_i > 0, we may define Λ^{1/2} = diag(√λ_1, ..., √λ_n) and this
is real. Now define

    A^{1/2} = QΛ^{1/2}Q*.    (2.1)

Then A^{1/2}A^{1/2} = QΛ^{1/2}Λ^{1/2}Q* = QΛQ* = A as required and, since √λ_i > 0 for all i = 1, ..., n,
the matrix A^{1/2} is Hermitian, positive definite.

Remarks.

• A real, positive number λ has two distinct square roots, √λ and −√λ. Similarly, a Hermi-
tian, positive definite matrix A ∈ C^{n×n} has 2^n distinct square roots, obtained by choosing
all possible combinations of signs in front of the square roots on the diagonal of Λ^{1/2} in (2.1).
The square root constructed in the lemma is the only positive one.

• The same principle used to construct the square root of a matrix here can be used to
construct many different functions of a Hermitian matrix: one diagonalises the matrix and
applies the function to the eigenvalues on the diagonal.
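As an added illustration, the construction (2.1) can be carried out numerically with an eigendecomposition; the matrix A below is an arbitrary symmetric positive definite example.

```python
# Sketch: the square root (2.1) of a Hermitian positive definite matrix
# by diagonalisation (arbitrary example matrix).
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric positive definite

lam, Q = np.linalg.eigh(A)                 # A = Q diag(lam) Q^*
A_half = Q @ np.diag(np.sqrt(lam)) @ Q.conj().T

assert np.allclose(A_half @ A_half, A)
assert np.allclose(A_half, A_half.conj().T)   # the square root is Hermitian
```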

2.2 Jordan Canonical Form

Definition 2.6. A Jordan block J_k(λ) ∈ C^{k×k} for λ ∈ C is the matrix satisfying J_k(λ)_ii = λ,
J_k(λ)_{i,i+1} = 1, and J_k(λ)_ij = 0 else, for i, j = 1, ..., k, i.e. a matrix of the form

    J_k(λ) = [ λ  1          ]
             [    λ  1       ]
             [       ⋱    1  ]
             [            λ  ].

A Jordan matrix is a block diagonal matrix J ∈ C^{n×n} of the form

    J = [ J_{n_1}(λ_1)                          ]
        [          J_{n_2}(λ_2)                 ]
        [                     ⋱                 ]
        [                       J_{n_k}(λ_k)    ]

where Σ_{j=1}^k n_j = n.
The following factorisation is of central theoretical importance.

Theorem 2.7 (Jordan Canonical Form). For any A ∈ C^{n×n} there is an invertible S ∈ C^{n×n}
and a Jordan matrix J ∈ C^{n×n} satisfying

    A = SJS^{-1},

where the diagonal elements λ_1, ..., λ_k of the Jordan blocks are the eigenvalues of A.

Remarks.
1. Clearly both the normal and Hermitian diagonalisation results reveal the eigenvalues of A:
they are simply the diagonal entries of D and Λ. This is also true of the Jordan and Schur
factorisations. The following lemma shows that triangular matrices reveal their eigenvalues
as diagonal entries. Since both the Jordan Canonical Form and the Schur Factorisation pro-
vide similarity transformations of A which reduce it to triangular form, and since similarity
transformations leave the eigenvalues unchanged, this establishes the desired properties.
Thus all the preceding factorisations are eigenvalue revealing factorisations.
2. An eigenvalue revealing factorisation cannot be achieved in a finite number of arithmetic
steps, in dimension n ≥ 5, since it implies factorisation of a polynomial equation of degree
n. See Chapter 8.

Lemma 2.8. Let T ∈ C^{n×n} be triangular. Then

    det(T) = Π_{i=1}^n T_ii.

Hence the eigenvalues of T are its diagonal entries.

Proof. Let T_j ∈ C^{j×j} be upper triangular:

    T_j = [ a  b*      ]    a ∈ C, b, 0 ∈ C^{j−1},
          [ 0  T_{j−1} ],   T_{j−1} ∈ C^{(j−1)×(j−1)} upper triangular.

Then det T_j = a det(T_{j−1}). By induction,

    det(T) = Π_{i=1}^n T_ii.

Eigenvalues λ of T are such that det(T − λI) = 0. Now T − λI is triangular with diagonal entries
T_ii − λ, therefore

    det(T − λI) = Π_{i=1}^n (T_ii − λ).

Hence det(T − λ_i I) = 0 if and only if λ_i = T_ii for some i = 1, ..., n.


As an example of the central theoretical importance of the Jordan normal form we now prove
a useful lemma showing that a matrix norm can be constructed which, for a given matrix A, has
norm arbitrarily close to the spectral radius.

Denition 2.9. A -Jordan block Jn () Cnn for C is the matrix satisfying Jk ()ii = ,
Jk ()i,i+1 = , and Jk ()ij = 0 else, for i, j = 1, . . . , n. A -Jordan matrix is a block diagonal

matrix J Cnn of the form

Jn1 (1 )
Jn 2 (2 )
J =

..
.
Jn k (k )
Pk
where j=1 nj = n.
Lemma 2.10. Let A ∈ C^{n×n} and ε > 0. Then there is a vector norm ‖·‖_S on C^n such that
the induced matrix norm satisfies ρ(A) ≤ ‖A‖_S ≤ ρ(A) + ε.

Proof. From Theorem 1.30 we already know ρ(A) ≤ ‖A‖ for every matrix norm ‖·‖. Thus we
only have to show the second inequality of the claim.
Let J = S^{-1}AS be the Jordan Canonical Form of A and D_ε = diag(1, ε, ε^2, ..., ε^{n−1}). Then

    (SD_ε)^{-1} A (SD_ε) = D_ε^{-1} J D_ε = J^ε.

Define a vector norm ‖·‖_S on C^n by

    ‖x‖_S = ‖(SD_ε)^{-1} x‖_∞

for all x ∈ C^n. Then the induced matrix norm satisfies

    ‖A‖_S = max_{x≠0} ‖Ax‖_S / ‖x‖_S
          = max_{x≠0} ‖(SD_ε)^{-1} Ax‖_∞ / ‖(SD_ε)^{-1} x‖_∞
          = max_{y≠0} ‖(SD_ε)^{-1} A (SD_ε) y‖_∞ / ‖y‖_∞
          = ‖(SD_ε)^{-1} A (SD_ε)‖_∞
          = ‖J^ε‖_∞.

Since we know the ‖·‖_∞ matrix norm from Theorem 1.28 and we have calculated the explicit
form of the matrix (SD_ε)^{-1} A (SD_ε) above, this is easy to evaluate. We get ‖A‖_S ≤ max_i |λ_i| + ε =
ρ(A) + ε. This completes the proof.

Remark. In general, ρ(·) is not a norm. But note that if the Jordan matrix J is diagonal then
we may take ε = 0 and we can deduce the existence of a norm in which ‖A‖_S = ρ(A). This
situation arises whenever A is diagonalisable.

2.3 Singular Value Decomposition

The singular value decomposition is based on the fact that, for any matrix A, it is possible to
find a set of real positive σ_i and vectors u_i, v_i such that

    Av_i = σ_i u_i.

The σ_i are known as singular values and, in some applications, are more useful than eigenvalues.
This is because the singular values exist even for non-square matrices, because they are always
real, and because the {u_i} and {v_i} can always be chosen orthogonal. Furthermore, the singular
value decomposition is robust to perturbations, unlike the Jordan canonical form.

Definition 2.11. Let A ∈ C^{m×n} with m, n ∈ N. A factorisation

    A = UΣV*

is called a singular value decomposition (SVD) of A, if U ∈ C^{m×m} and V ∈ C^{n×n} are unitary,
Σ ∈ R^{m×n} is diagonal, and the diagonal entries of Σ are σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0 where
p = min(m, n). The values σ_1, ..., σ_p are called singular values of A. The columns of U are
called left singular vectors of A, the columns of V are right singular vectors of A.


Theorem 2.12 (SVD). Every matrix has a singular value decomposition and the singular values
are uniquely determined.
Proof. Let A ∈ C^{m×n}. We prove existence of the SVD by induction over p = min(m, n). If p = 0
the matrices U, V, and Σ are just the appropriately shaped empty matrices (one dimension is
zero) and there is nothing to show.
Assume p > 0 and that the existence of the SVD is already known for matrices where one
dimension is smaller than min(m, n). Let σ_1 = ‖A‖_2 = max_{x≠0} ‖Ax‖_2/‖x‖_2 = max_{‖x‖_2=1} ‖Ax‖_2.
Since the map v ↦ Av is continuous and the set { x : ‖x‖_2 = 1 } ⊆ C^n is compact, the image
{ Ax : ‖x‖_2 = 1 } ⊆ C^m is also compact. Since ‖·‖_2 : C^m → R is continuous there is a v_1 ∈ C^n
with ‖v_1‖_2 = 1 and

    ‖Av_1‖_2 = max_{‖x‖_2=1} ‖Ax‖_2 = σ_1.

Defining u_1 = Av_1/σ_1 we get ‖u_1‖_2 = 1.
Extend {v_1} to an orthonormal basis {v_1, ..., v_n} of C^n and {u_1} to an orthonormal basis
{u_1, ..., u_m} of C^m. Consider the matrices

    U_1 = (u_1, ..., u_m) ∈ C^{m×m}

and

    V_1 = (v_1, ..., v_n) ∈ C^{n×n}.

Then the product U_1* A V_1 is of the form

    S = U_1* A V_1 = [ σ_1  w* ]
                     [ 0    B  ]

with w ∈ C^{n−1}, 0 ∈ C^{m−1} and B ∈ C^{(m−1)×(n−1)}.
For unitary matrices U we have ‖Ux‖_2 = ‖x‖_2 and thus

    ‖S‖_2 = max_{x≠0} ‖U_1* A V_1 x‖_2 / ‖x‖_2 = max_{x≠0} ‖A V_1 x‖_2 / ‖V_1 x‖_2 = ‖A‖_2 = σ_1.

On the other hand, applying S to the vector (σ_1; w) ∈ C^n gives

    ‖ S (σ_1; w) ‖_2 = ‖ (σ_1^2 + w*w; Bw) ‖_2 ≥ σ_1^2 + w*w = (σ_1^2 + w*w)^{1/2} ‖ (σ_1; w) ‖_2

and thus ‖S‖_2 ≥ (σ_1^2 + w*w)^{1/2}. Thus we conclude that w = 0 and thus

    A = U_1 S V_1* = U_1 [ σ_1  0 ] V_1*.
                         [ 0    B ]

By the induction hypothesis the (m−1)×(n−1) matrix B has a singular value decomposition

    B = U_2 Σ_2 V_2*.

Then

    A = U_1 [ 1  0   ] [ σ_1  0   ] [ 1  0   ]* V_1*
            [ 0  U_2 ] [ 0    Σ_2 ] [ 0  V_2 ]

is an SVD of A and existence of the SVD is proved.
Uniqueness of the largest singular value σ_1 holds, since σ_1 is uniquely determined by the
relation

    ‖A‖_2 = max_{x≠0} ‖UΣV* x‖_2 / ‖x‖_2 = max_{y≠0} ‖Σ y‖_2 / ‖y‖_2 = σ_1.

Uniqueness of σ_2, ..., σ_p follows by induction as above.

The penultimate line of the proof shows that, with the ordering of singular values as defined,
we have the following:

Corollary 2.13. For any matrix A ∈ C^{m×n} we have ‖A‖_2 = σ_1.

Remarks.
1. Inspection of the above proof reveals that for real matrices A the matrices U and V are
also real.

2. If m > n then the last m − n columns of U do not contribute to the factorisation A = UΣV*.
Hence we can also write A = ÛΣ̂V*, where Û ∈ C^{m×n} consists of the first n columns
of U and Σ̂ ∈ C^{n×n} consists of the first n rows of Σ. This factorisation is called the reduced
singular value decomposition (reduced SVD) of A.

3. Since we have A*A = VΣ*U*UΣV* = VΣ*ΣV* and thus A*A V = V Σ*Σ, we find
A*A v_j = σ_j^2 v_j for the columns v_1, ..., v_n of V. This shows that the vectors v_j are eigen-
vectors of A*A with eigenvalues σ_j^2.

4. From the proof we see that we can get the ‖·‖_2 norm of a matrix from its SVD: we have
‖A‖_2 = σ_1.
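As an added illustration, NumPy's SVD routine returns both the full and the reduced factorisation, and the properties in Remarks 2-4 and Corollary 2.13 can be checked on an arbitrary example matrix.

```python
# Sketch: full and reduced SVD, ||A||_2 = sigma_1, and eigenvalues of A^*A.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))            # m > n

U, s, Vh = np.linalg.svd(A)                # full SVD: U is 5x5, Vh = V^*
U_hat, s_hat, Vh_hat = np.linalg.svd(A, full_matrices=False)  # reduced SVD

assert np.allclose(U_hat @ np.diag(s_hat) @ Vh_hat, A)
assert np.isclose(s[0], np.linalg.norm(A, 2))            # ||A||_2 = sigma_1
# The eigenvalues of A^*A are the squared singular values:
assert np.allclose(np.sort(np.linalg.eigvalsh(A.T @ A)), np.sort(s**2))
```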

Theorem 2.14. For m ≥ n the SVD has the following properties:

1. If A ∈ R^{n×n} is Hermitian then A = QΛQ* with Λ = diag(λ_1, ..., λ_n) and Q = (q_1 ... q_n).
An SVD of A may be found in the form A = UΣV^T with U = Q, Σ = |Λ|, and
V = (v_1 ... v_n), v_i = sgn(λ_i) q_i.

2. The eigenvalues of A*A are σ_i^2 and the eigenvectors of A*A are the right singular vectors
v_i.

3. The eigenvalues of AA* are σ_i^2 and (m − n) zeros. The (right) eigenvectors of AA* cor-
responding to eigenvalues σ_i^2 are the left singular vectors u_i corresponding to the singular
values σ_i.

Proof. 1. By definition.

2. We have, from the reduced SVD,

    A = ÛΣ̂V*  ⟹  A*A = VΣ̂^2V* ∈ R^{n×n}.

Since V is orthogonal and Σ̂^2 = diag(σ_1^2, ..., σ_n^2), the result follows.

3. We have

    A = UΣV*  ⟹  AA* = U ΣΣ* U*,

where U = [Û Ũ] and Ũ ∈ R^{m×(m−n)} is any matrix such that [Û Ũ] ∈ R^{m×m} is orthogonal. The
result then follows since

    ΣΣ* = [ Σ̂^2  0 ]
          [ 0    0 ].

For the rest of this section let A ∈ C^{m×n} be a matrix with singular value decomposition
A = UΣV* and singular values σ_1 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_p. To illustrate the usefulness
of the SVD we prove several fundamental results about it.

Theorem 2.15. The rank of A is equal to r.

Proof. Since U and V are invertible we have rank(A) = rank(Σ) = r.

Theorem 2.16. We have range(A) = span{u_1, ..., u_r} and ker(A) = span{v_{r+1}, ..., v_n}.

Proof. Since Σ is diagonal and V is invertible we have

    range(ΣV*) = range(Σ) = span{e_1, ..., e_r} ⊆ C^m.

This shows

    range(A) = range(UΣV*) = span{u_1, ..., u_r} ⊆ C^m.

We also have

    ker(A) = ker(UΣV*) = ker(ΣV*).

Since V is unitary we can conclude

    ker(A) = span{v_{r+1}, ..., v_n} ⊆ C^n.

Theorem 2.17 (The SVD and Eigenvalues). Let A ∈ R^{n×n} be invertible with SVD A = UΣV^T.
If

    H = [ 0  A^T ] ∈ R^{2n×2n}
        [ A  0   ]

and

    U = (u_1 ... u_n),  V = (v_1 ... v_n),  Σ = diag(σ_1, ..., σ_n),

then H has 2n eigenvalues {±σ_i}_{i=1}^n with corresponding eigenvectors

    (1/√2) ( v_i  ),    i = 1, ..., n.
            ( ±u_i )

Proof. If Hx = λx with x = (y^T, z^T)^T then

    A^T z = λy,
    Ay = λz.

Hence

    A^T (λz) = λ^2 y    and    A^T A y = λ^2 y.

Thus λ^2 ∈ {σ_1^2, ..., σ_n^2} and so the 2n eigenvalues of H are drawn from the set

    {±σ_1, ..., ±σ_n}.

Note that

    A^T U = VΣ,    AV = UΣ

and so

    Av_i = σ_i u_i,    A^T u_i = σ_i v_i.

The eigenvalue problem for H may be written as

    Ay = λz,    A^T z = λy.

Hence, taking λ = ±σ_i, we obtain 2n solutions of the eigenvalue problem given by

    (y^T, z^T) = (1/√2) (v_i^T, ±u_i^T).

This exhibits a complete set of 2n eigenvectors for H.

2.4 QR Factorisation

The SVD factorisation, like the four preceding it, reveals eigenvalues; hence it cannot be achieved
in a finite number of steps. The next three factorisations, QR, LU and Cholesky, do not reveal
eigenvalues and, as we will show in later chapters, can be achieved in a polynomial number
of operations, with respect to dimension n. Recall the following classical algorithm for the
construction of an orthonormal basis from the columns of a matrix A.

Algorithm (Gram-Schmidt orthonormalisation).

input: A ∈ C^{m×n} with m ≥ n
output: Q ∈ C^{m×m} unitary, R ∈ C^{m×n} upper triangular with A = QR

let a_1, ..., a_n ∈ C^m be the columns of A.

1:  R = 0
2:  for j = 1, ..., n do
3:    q̃_j = a_j − Σ_{k=1}^{j−1} r_kj q_k with r_kj = ⟨q_k, a_j⟩
4:    r_jj = ‖q̃_j‖_2
5:    if r_jj > 0 then
6:      q_j = q̃_j / r_jj
7:    else
8:      let q_j be an arbitrary normalised vector orthogonal to q_1, ..., q_{j−1}
9:    end if
10: end for
11: choose q_{n+1}, ..., q_m to make q_1, ..., q_m an orthonormal basis.
12: let q_1, ..., q_m ∈ C^m be the columns of Q; let (R)_ij = r_ij for i ≤ j, (R)_ij = 0 otherwise.
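The loop above translates directly into code. The sketch below (added here for illustration) implements only the reduced factorisation and assumes the columns of A are linearly independent, so the r_jj = 0 branch is omitted; as noted in the remarks following Theorem 2.18, this classical variant is numerically unstable and is shown for illustration only.

```python
# Sketch: classical Gram-Schmidt for the reduced QR factorisation, assuming
# the columns of A are linearly independent (the r_jj = 0 branch is omitted).
import numpy as np

def gram_schmidt_qr(A):
    m, n = A.shape
    Q = np.zeros((m, n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for j in range(n):
        q = A[:, j].astype(complex)
        for k in range(j):
            R[k, j] = np.vdot(Q[:, k], A[:, j])   # r_kj = <q_k, a_j>
            q = q - R[k, j] * Q[:, k]
        R[j, j] = np.linalg.norm(q)
        Q[:, j] = q / R[j, j]
    return Q, R

A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
Q, R = gram_schmidt_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.conj().T @ Q, np.eye(2))
```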

From this algorithm we prove:

Theorem 2.18 (QR factorisation). Every matrix A ∈ C^{m×n} with m ≥ n can be written as
A = QR where Q ∈ C^{m×m} is unitary and R ∈ C^{m×n} is upper triangular.

Proof. The Gram-Schmidt algorithm calculates matrices Q and R with

    (QR)_ij = ( Σ_{k=1}^j q_k r_kj )_i = ( Σ_{k=1}^{j−1} q_k r_kj + q̃_j )_i = (a_j)_i

and thus we get A = QR.
By construction we have ‖q_j‖_2 = 1 for j = 1, ..., m. We use induction to show that the
columns q_1, ..., q_j are orthogonal for all j ∈ {1, ..., m}. For j = 1 there is nothing to show.
Now let j > 1 and assume that q_1, ..., q_{j−1} are orthogonal. We have to prove ⟨q_i, q_j⟩ = 0 for
i = 1, ..., j − 1. If r_jj = 0, this holds by definition of q_j. Otherwise we have

    ⟨q_i, q_j⟩ = (1/r_jj) ⟨q_i, q̃_j⟩
              = (1/r_jj) ( ⟨q_i, a_j⟩ − Σ_{k=1}^{j−1} r_kj ⟨q_i, q_k⟩ )
              = (1/r_jj) ( ⟨q_i, a_j⟩ − r_ij )
              = 0.

Thus induction shows that the columns of Q are orthonormal and hence that Q is unitary.

Remarks.
1. The factorisation in the theorem is called the full QR factorisation. Since all entries below the
diagonal of R are 0, the columns n+1, ..., m of Q do not contribute to the product QR.
Let Q̂ ∈ C^{m×n} consist of the first n columns of Q and R̂ ∈ C^{n×n} consist of the first n
rows of R. Then we have A = Q̂R̂. This is called the reduced QR factorisation of A.

2. For m = n we get square matrices Q, R ∈ C^{n×n}. Since

    det(A) = det(QR) = det(Q) det(R)

and |det(Q)| = 1, the matrix R is invertible if and only if A is invertible.

3. The Gram-Schmidt orthonormalisation algorithm is numerically unstable and should not
be used to calculate a QR factorisation in practice.

2.5 LU Factorisation

Definition 2.19. A triangular matrix is said to be unit if all diagonal entries are equal to 1.

Definition 2.20. The j-th principal sub-matrix of a matrix A ∈ C^{n×n} is the matrix A_j ∈ C^{j×j}
with (A_j)_kl = a_kl for 1 ≤ k, l ≤ j.

Theorem 2.21 (LU Factorisation). a) Let A ∈ C^{n×n} be a matrix such that A_j is invertible
for j = 1, ..., n. Then there is a unique factorisation A = LU where L ∈ C^{n×n} is unit lower
triangular and U ∈ C^{n×n} is non-singular upper triangular. b) If A_j is singular for some j ∈
{1, ..., n} then there is no such factorisation.

Proof. a) We use a proof by induction: if n = 1 we have a_11 ≠ 0 by assumption and can set
L = (1) ∈ C^{1×1} and U = (a_11) ∈ C^{1×1} to get A = LU. Since L is the only unit lower triangular
1×1 matrix the factorisation is unique.
Now let n > 1 and assume that any matrix in C^{(n−1)×(n−1)} can be uniquely factorised in
the required form if all its principal sub-matrices are invertible. We write A ∈ C^{n×n} as

    A = [ A_{n−1}  b    ]    (2.2)
        [ c*       a_nn ]

where A_{n−1} is the (n−1)-th principal sub-matrix of A, and b, c ∈ C^{n−1} and a_nn ∈ C are the
remaining blocks. We are looking for a factorisation of the form

    A = [ L   0 ] [ U  u ] = [ LU    Lu       ]    (2.3)
        [ ℓ*  1 ] [ 0  δ ]   [ ℓ*U   ℓ*u + δ  ]

with L ∈ C^{(n−1)×(n−1)} unit lower triangular, U ∈ C^{(n−1)×(n−1)} invertible upper triangular,
ℓ, u ∈ C^{n−1} and δ ∈ C \ {0}. We compare the blocks of (2.2) and (2.3).
By the induction hypothesis L and U with A_{n−1} = LU exist and are unique. Since the matrix
L is invertible the condition Lu = b determines a unique vector u. Since U is invertible there is
a uniquely determined ℓ with U*ℓ = c and thus ℓ*U = c*. Finally the condition ℓ*u + δ = a_nn
uniquely determines δ ∈ C. This shows that the required factorisation for A exists and is unique.
Since 0 ≠ det(A) = 1 · δ · det(U), the upper triangular factor is non-singular and δ ≠ 0.
b) Assume that A has an LU factorisation and let j ∈ {1, ..., n}. Then we can write A = LU
in block form as

    [ A_11  A_12 ] = [ L_11  0    ] [ U_11  U_12 ] = [ L_11 U_11    L_11 U_12              ]
    [ A_21  A_22 ]   [ L_21  L_22 ] [ 0     U_22 ]   [ L_21 U_11    L_21 U_12 + L_22 U_22  ]

where A_11, L_11, U_11 ∈ C^{j×j}. We get

    det(A_j) = det(A_11) = det(L_11 U_11) = det(L_11) det(U_11) = 1 · det(U_11) ≠ 0

and thus A_j is non-singular.

To illustrate the failure of LU factorisation when the principal submatrices are not invertible,
consider the following matrix:

    A = [ 0  0  1 ]
        [ 1  1  0 ]
        [ 0  2  1 ].

This matrix is clearly non-singular: det(A) = 2. However, both principal sub-matrices A_1 and A_2
are singular:

    A_1 = (0),    A_2 = [ 0  0 ]
                        [ 1  1 ],

and therefore the factorisation A = LU is not possible. In contrast,

    A′ = [ 1  1  0 ]
         [ 0  2  1 ]
         [ 0  0  1 ]

(which is a permutation of the rows of A) has non-singular principal sub-matrices

    A′_1 = (1),    A′_2 = [ 1  1 ]
                          [ 0  2 ]

and hence A′ has an LU factorisation.
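As an added illustration, a minimal sketch of the factorisation in Theorem 2.21 without pivoting, applied to the permuted matrix A′ above; it assumes every principal sub-matrix is invertible, exactly as the theorem requires.

```python
# Sketch: LU factorisation without pivoting, assuming every principal
# sub-matrix is invertible (the hypothesis of Theorem 2.21).
import numpy as np

def lu_no_pivot(A):
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]       # multiplier; requires U[j, j] != 0
            U[i, :] -= L[i, j] * U[j, :]
    return L, U

A_prime = np.array([[1.0, 1.0, 0.0],
                    [0.0, 2.0, 1.0],
                    [0.0, 0.0, 1.0]])
L, U = lu_no_pivot(A_prime)
assert np.allclose(L @ U, A_prime)
# For A itself (with A_1 = (0) singular) the first pivot U[0, 0] is zero, so
# the elimination breaks down, matching part b) of the theorem.
```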

Because a non-singular matrix may not have an LU factorisation, while that same matrix
with its rows interchanged may be factorisable, it is of interest to study the effect of permutations
on LU factorisation.

Theorem 2.22 (LU Factorisation with Permutations). If A ∈ C^{n×n} is invertible, then there
exist a permutation matrix P ∈ C^{n×n}, a unit lower triangular L ∈ C^{n×n}, and a non-singular upper
triangular U ∈ C^{n×n} such that PA = LU.

Proof. We prove the result by induction. Note that the base case n = 1 is straightforward:
P = L = 1 and U = A ≠ 0. Now assume the theorem is true for the (n−1)×(n−1) case. Let
A ∈ C^{n×n} be invertible. Choose a permutation P_1 such that

    (P_1 A)_11 =: a ≠ 0.

This is possible since A invertible implies that the first column of A has a non-zero entry and
P_1 permutes rows. Now we can factorise P_1 A as follows:

    P_1 A =: [ a  u* ] = [ 1    0 ] [ a  u* ]
             [ l  B  ]   [ l/a  I ] [ 0  Ã  ]

which is possible if Ã = B − l u*/a = B − a^{-1} l ⊗ u. Now

    0 ≠ det(A) = ±det(P_1 A) = ±a det(Ã)

and so det(Ã) ≠ 0 since a ≠ 0. Thus Ã is invertible and hence, by the induction hypothesis,
Ã = P_2 L̃ Ũ with P_2 a permutation matrix, L̃ unit lower triangular and Ũ non-singular upper
triangular. Hence

    P_1 A = [ 1    0 ] [ 1  0      ] [ a  u* ]
            [ l/a  I ] [ 0  P_2 L̃ ] [ 0  Ũ  ]

          = [ 1      0       ] [ a  u* ]
            [ l/a    P_2 L̃   ] [ 0  Ũ  ]

          = [ 1  0   ] [ 1           0  ] [ a  u* ]
            [ 0  P_2 ] [ P_2^T l/a   L̃ ] [ 0  Ũ  ]

          = P_3 L̂ Û,

and therefore

    P_3^T P_1 A = L̂ Û.

Note that L̂ is unit lower triangular and that det(Û) = a det(Ũ) ≠ 0 so that Û is non-singular
upper triangular. Since P = P_3^T P_1 is a permutation the result follows.

2.6 Cholesky Factorisation

Theorem 2.23 (Cholesky Factorisation). If A ∈ C^{n×n} is positive definite then there exists a
unique upper triangular R ∈ C^{n×n} with positive diagonal elements such that A = R*R.

    A = R* R

Proof. We use induction. The claim is clearly true if n = 1: A = β ∈ R_+, R = √β.
Assume the claim holds true for A_{n−1} ∈ C^{(n−1)×(n−1)}:

    A_{n−1} = R_{n−1}^* R_{n−1},    (R_{n−1})_{ii} > 0    for i = 1, . . . , n−1.

Then write A ∈ C^{n×n} as

    A = ( A_{n−1}   c )
        ( c*        β ).

It is straightforward to show that A_{n−1} is Hermitian and positive definite, and that β is real
and positive, because A is Hermitian and positive definite.
    We can now attempt to factor A as follows:

    A = R*R := ( R_{n−1}^*   0 ) ( R_{n−1}   r )
               ( r*          α ) ( 0         α )

             = ( A_{n−1}       R_{n−1}^* r       )
               ( r* R_{n−1}    ‖r‖_2^2 + α^2     ).

For this factorisation to work we require r ∈ C^{n−1} and α ∈ R_+ such that

    R_{n−1}^* r = c,
    α^2 = β − ‖r‖_2^2.

Since R_{n−1} is non-singular (positive diagonal elements), r and α^2 are uniquely defined.
    It remains to show that α^2 > 0, so that we can choose α ∈ R_+. Note that, since A has positive
eigenvalues, det(A) > 0 and so

    0 < det(A) = det(R*R) = det(R*) det(R) = α^2 det(R_{n−1}^*) det(R_{n−1}) = α^2 det(R_{n−1})^2.

Here we have used the fact that det(R_{n−1}^*) = det(R_{n−1}) because R_{n−1} is triangular with real
positive diagonal entries. But det(R_{n−1}) ∈ R since R_{n−1} has diagonals in R_+. Thus α^2 > 0 and we
can choose α ∈ R_+.
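
The inductive proof is constructive and translates directly into an algorithm that builds R
column by column. The following numpy sketch (an illustration under the assumption that the
input is Hermitian positive definite; the function name cholesky_bordered is ours) follows the
bordered construction used in the proof.

import numpy as np

def cholesky_bordered(A):
    # Build upper triangular R with positive diagonal and A = R* R, column by
    # column, as in the inductive proof.  Sketch only: assumes A is Hermitian
    # positive definite, no error checking.
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    R = np.zeros((n, n), dtype=complex)
    R[0, 0] = np.sqrt(A[0, 0].real)
    for k in range(1, n):
        c = A[:k, k]
        # solve R_{k}^* r = c  (a lower triangular system for r)
        r = np.linalg.solve(R[:k, :k].conj().T, c)
        beta = A[k, k].real
        R[:k, k] = r
        R[k, k] = np.sqrt(beta - np.linalg.norm(r) ** 2)
    return R

A = np.array([[4.0, 2.0], [2.0, 3.0]])
R = cholesky_bordered(A)
print(np.allclose(R.conj().T @ R, A))   # True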

Bibliography

The books [Bha97], [HJ85] and [Ner71] all present matrix factorisations in a theoretical context.
The books [Dem97], [GL96] and [TB97] all present related material, focussed on applications to
computational algorithms. Theorem 2.7 is proved in [Ner71].

Exercises

Exercise 2-1. By following the proof of the existence of a singular value decomposition, find
an SVD for the following matrices:

    A = ( 1  2 ),       B = ( 2  0   1 )
        ( 3  1 )            ( 2  0   1 ).
                            ( 0  18  0 )

Exercise 2-2. Show that a symmetric positive definite matrix A ∈ R^{n×n} can be written in the
form

    A = ( a   z^T )  =  ( γ     0 ) ( 1   0                    ) ( γ   γ^{−1} z^T )
        ( z   A_1 )     ( z/γ   I ) ( 0   A_1 − a^{−1} z z^T   ) ( 0   I          )

where γ = √a. Based on this observation find an alternative proof of Theorem 2.23, for the
Cholesky factorisation of real symmetric positive definite matrices.

Exercise 2-3. Let A ∈ R^{m×n} have singular values (σ_i | i = 1, . . . , n). Show that ‖A‖_2 = σ_max
and, if m = n and A is invertible, ‖A^{−1}‖_2^{−1} = σ_min.

Exercise 2-4. Prove Theorem 1.16 concerning properties of similar matrices.


Chapter 3

Stability and Conditioning


Rounding errors lead to computational results which are dierent from the theoretical ones.
The methods from this chapter will help us to answer the following question: how close is the
calculated result to the correct one?
It is common to view the computed solution of a problem in linear algebra as the exact
solution to a perturbed problem. This approach to the study of error is known as backward
error analysis. If the perturbation is small (when measured in terms of machine precision)
the algorithm is termed backward stable; when the perturbation is not small it is referred to as
unstable. Thus the eect of computing the solution in oating point arithmetic, using a particular
algorithm, is converted into a perturbation of the original problem. Once this perturbed problem
is known, the central issue in studying error is to estimate the dierence between the solution
of the original problem and a perturbed problem: the issue of conditioning. Conditioning is a
property of the problem at hand, whilst the size of the perturbation arising in the backward
error analysis framework is a property of the algorithm.
The three problems (SLE), (LSQ) and (EVP) can all be viewed as solving a linear or nonlinear
equation of the form

    G(y, θ) = 0,                                                       (3.1)

for some G: C^p × C^q → C^p. Here y ∈ C^p is the solution that we want to find, and θ ∈ C^q the data which
defines the problem. For (SLE) and (LSQ) the problem is linear and y = x (see below for a
proof of this for (LSQ)) and for (EVP) it is nonlinear and y = (x, λ). The parameter θ is thus
defined by the matrix A and, for (SLE) and (LSQ), by the matrix A and the vector b. Backward
error analysis will enable us to show that the computed solution ŷ solves

    G(ŷ, θ̂) = 0.

Conditioning is concerned with estimating y − ŷ in terms of θ − θ̂. Since conditioning of a given
problem will affect the error analysis for all algorithms applied to it, we start with its study, for
each of our three problems in turn. The following definition will be useful in this chapter.

Definition 3.1. Let f = f(δ): C^q → C^p. We write f = O(‖δ‖^α) for α > 0 if there is a constant
C > 0 such that ‖f(δ)‖ ≤ C‖δ‖^α uniformly as δ → 0.

3.1 Conditioning of SLE

A problem is called well conditioned if small changes in the problem only lead to small changes
in the solution and badly conditioned if small changes in the problem can lead to large changes
in the solution. In this context the following definition is central.

Definition 3.2. The condition number κ(A) of a matrix A ∈ C^{n×n} in the norm ‖·‖ is the
number

    κ(A) = ‖A‖ ‖A^{−1}‖   if A is invertible, and
    κ(A) = +∞             otherwise.


For this section fix a vector norm ‖·‖ and the induced matrix norm ‖·‖. By Theorem 1.27
these norms then satisfy

    ‖Ax‖ ≤ ‖A‖ ‖x‖   for all x ∈ C^n, A ∈ C^{n×n}

and ‖I‖ = 1.

Remark. For invertible A we always have ‖I‖ = ‖AA^{−1}‖ ≤ ‖A‖ ‖A^{−1}‖ = κ(A). For induced matrix norms this
implies κ(A) ≥ 1 for every A ∈ C^{n×n}.

Example. Let A be real, symmetric and positive definite with eigenvalues

    λ_max = λ_1 ≥ λ_2 ≥ · · · ≥ λ_n = λ_min > 0.

Then we have ‖A‖_2 = λ_max. Since the matrix A^{−1} has eigenvalues 1/λ_1, . . . , 1/λ_n we find
‖A^{−1}‖_2 = 1/λ_min and thus the condition number of A in the 2-norm is

    κ(A) = λ_max / λ_min.                                              (3.2)
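
For a symmetric positive definite matrix this formula can be checked numerically. The sketch
below (assuming numpy; the small example matrix is made up purely for illustration) compares
λ_max/λ_min with the value ‖A‖_2 ‖A^{−1}‖_2 returned by numpy's built-in condition number routine.

import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])                 # symmetric positive definite example
eig = np.linalg.eigvalsh(A)              # real eigenvalues, ascending order
kappa_from_eigs = eig[-1] / eig[0]       # lambda_max / lambda_min, formula (3.2)
kappa_numpy = np.linalg.cond(A, 2)       # ||A||_2 ||A^{-1}||_2
print(kappa_from_eigs, kappa_numpy)      # agree up to rounding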

Proposition 3.3. Let Ax = b and A(x + Δx) = b + Δb. Assume b ≠ 0. Then

    ‖Δx‖/‖x‖ ≤ κ(A) ‖Δb‖/‖b‖.

Proof. If A is not invertible the right hand side of the inequality is +∞ and the result holds.
Thus we assume that A is invertible and we have

    ‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖.                                              (3.3)

Since A^{−1}Δb = Δx we get

    ‖Δx‖/‖x‖ = ‖A^{−1}Δb‖/‖x‖ ≤ ‖A^{−1}‖ ‖Δb‖/‖x‖ ≤ ‖A‖ ‖A^{−1}‖ ‖Δb‖/‖b‖ = κ(A) ‖Δb‖/‖b‖

where the last inequality is a consequence of (3.3).

The previous proposition gave an upper bound on how much the solution of the equation
Ax = b can change if the right hand side is perturbed. The result shows that the problem is well
conditioned if the condition number κ(A) is small. Proposition 3.5 below gives a similar result
for perturbation of the matrix A instead of the vector b. For the proof we will need the following
lemma.

Lemma 3.4. Assume that A ∈ C^{n×n} satisfies ‖A‖ < 1 in some induced matrix norm. Then
I + A is invertible and

    ‖(I + A)^{−1}‖ ≤ (1 − ‖A‖)^{−1}.
Proof. With the triangle inequality we get

    ‖x‖ = ‖(I + A)x − Ax‖
        ≤ ‖(I + A)x‖ + ‖Ax‖
        ≤ ‖(I + A)x‖ + ‖A‖ ‖x‖

and thus

    ‖(I + A)x‖ ≥ (1 − ‖A‖) ‖x‖

for every x ∈ C^n. This implies (I + A)x ≠ 0 for every x ≠ 0, and thus the matrix I + A is
invertible.
    Now let b ≠ 0 and x = (I + A)^{−1}b. Then

    ‖(I + A)^{−1}b‖ / ‖b‖ = ‖x‖ / ‖(I + A)x‖ ≤ 1 / (1 − ‖A‖).

Since this is true for all b ≠ 0, we have

    ‖(I + A)^{−1}‖ = sup_{b≠0} ‖(I + A)^{−1}b‖ / ‖b‖ ≤ 1 / (1 − ‖A‖).

This completes the proof.

Proposition 3.5 (Conditioning of SLE). Let x and x + Δx solve the equations

    Ax = b    and    (A + ΔA)(x + Δx) = b.

Assume that A is invertible with condition number κ(A) in some induced matrix norm ‖·‖.
Then we have, for κ(A)‖ΔA‖ < ‖A‖,

    ‖Δx‖/‖x‖ ≤ κ(A) (‖ΔA‖/‖A‖) / (1 − κ(A) ‖ΔA‖/‖A‖).

Proof. Note that

    ‖A^{−1}ΔA‖ ≤ ‖A^{−1}‖ ‖ΔA‖ = κ(A) ‖ΔA‖/‖A‖ < 1.                     (3.4)

Hence I + A^{−1}ΔA is invertible by Lemma 3.4.
    We have

    (A + ΔA)Δx = −ΔA x

and thus

    (I + A^{−1}ΔA)Δx = −A^{−1}ΔA x.

Using Lemma 3.4 we can write

    Δx = −(I + A^{−1}ΔA)^{−1} A^{−1}ΔA x

and we get

    ‖Δx‖ ≤ ‖(I + A^{−1}ΔA)^{−1}‖ ‖A^{−1}ΔA‖ ‖x‖ ≤ (‖A^{−1}ΔA‖ / (1 − ‖A^{−1}ΔA‖)) ‖x‖.

Using (3.4) and the fact that the map t ↦ t/(1 − t) is increasing on the interval [0, 1) we
get

    ‖Δx‖/‖x‖ ≤ κ(A) (‖ΔA‖/‖A‖) / (1 − κ(A) ‖ΔA‖/‖A‖).

This is the required result.

Refer to Exercise 3-2 for a result which combines the statements of propositions 3.3 and 3.5.

3.2 Conditioning of LSQ

In this section we study the conditioning of the following problem: given A ∈ C^{m×n} with m ≥ n,
rank(A) = n and b ∈ C^m, find x ∈ C^n which minimizes ‖Ax − b‖_2.
    For expository purposes consider the case where A and b are real. Notice that then x solving
(LSQ) minimizes φ: R^n → R given by

    φ(x) := (1/2)⟨Ax, Ax⟩ − ⟨Ax, b⟩ + (1/2)‖b‖_2^2.

This is equivalent to minimizing

    (1/2)⟨x, A*Ax⟩ − ⟨x, A*b⟩ + (1/2)‖b‖_2^2.

(Note that A* = A^T in this real case.) This quadratic form is positive definite since A*A is
Hermitian and has positive eigenvalues under (3.5). The quadratic form is minimized where the
gradient is zero, namely where

    A*Ax = A*b.

Although we have derived this in the case of real A, b, the final result holds as stated in the
complex case; see Chapter 7.
Consider the reduced SVD A = U Σ V*, where U ∈ C^{m×n}, Σ ∈ R^{n×n} with Σ_ii = σ_i, and
V ∈ C^{n×n}. The assumption on rank(A) implies, by Theorem 2.15,

    σ_1 ≥ σ_2 ≥ · · · ≥ σ_n > 0.                                        (3.5)

In particular, A*A = V Σ² V* is invertible. The solution of (LSQ) is hence unique and given by the
solution of the normal equations,

    x = (A*A)^{−1} A*b.

Definition 3.6. For A ∈ C^{m×n}, the matrix A† = (A*A)^{−1}A* ∈ C^{n×m} is called the pseudo-
inverse (or Moore-Penrose inverse) of A.

With this notation the solution of (LSQ) is

    x = A†b.

Definition 3.7. For A ∈ C^{m×n} with m ≥ n, define the condition number of A in the norm ‖·‖
to be

    κ(A) = ‖A‖ ‖A†‖   if rank(A) = n, and
    κ(A) = +∞         otherwise.

Remark. Since for square, invertible matrices A we have A† = A^{−1}, the definition of κ(A) is
consistent with the previous one for square matrices. As before the condition number depends
on the chosen matrix norm.

Lemma 3.8. Let A ∈ C^{m×n} with m ≥ n have n positive singular values satisfying (3.5). Then
the condition number in the ‖·‖_2-norm is

    κ(A) = σ_1 / σ_n.

Proof. Let A = U Σ V* be a reduced SVD of A. Then A*A = V Σ² V* and thus

    A† = V Σ^{−1} U*.                                                   (3.6)

This equation is a reduced SVD for A† (modulo ordering of the singular values) and it can be
extended to a full SVD by adding m − n orthogonal columns to U and zeros to Σ^{−1}. Doing this
we find, by Corollary 2.13,

    κ(A) = ‖A‖_2 ‖A†‖_2 = σ_1 · (1/σ_n).

Notice that (3.6) implies

    Ax = AA†b = U U* b

and hence that

    ‖Ax‖_2 = ‖U U* b‖_2 = ‖U* b‖_2 ≤ ‖b‖_2

by the properties of orthogonal matrices. Thus we may define θ ∈ [0, π/2] by cos(θ) = ‖Ax‖_2/‖b‖_2.

Theorem 3.9. Assume that x solves (LSQ) for (A, b) and x + Δx solves (LSQ) for (A, b + Δb).
Define η = ‖A‖_2 ‖x‖_2 / ‖Ax‖_2 ≥ 1. Then we have

    ‖Δx‖_2/‖x‖_2 ≤ (κ(A) / (η cos(θ))) ‖Δb‖_2/‖b‖_2

where κ(A) is the condition number of A in the ‖·‖_2-norm.

Proof. We have x = A†b and x + Δx = A†(b + Δb). Linearity then gives Δx = A†Δb and we get

    ‖Δx‖_2/‖x‖_2 ≤ ‖A†‖_2 ‖Δb‖_2 / ‖x‖_2 = κ(A) ‖Δb‖_2 / (‖A‖_2 ‖x‖_2)
                 = κ(A) ‖Δb‖_2 / (η ‖Ax‖_2) = (κ(A)/(η cos(θ))) ‖Δb‖_2/‖b‖_2.

Remark. The constant κ(A)/cos(θ) becomes large if either κ(A) is large or θ ≈ π/2. In either
of these cases the problem is badly conditioned. The first case involves only the singular values
of A. The second case involves a relationship between A and b. In particular cos(θ) is small if
the range of A is nearly orthogonal to b. Then

    φ(x) ≈ (1/2)‖Ax‖_2^2 + (1/2)‖b‖_2^2

so that simply setting x = 0 gives a good approximation to the minimizer; in this case small
changes in b can induce large relative changes in x.
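
The quantities appearing in this discussion are easy to compute for concrete data. The following
sketch (assuming numpy; the random test problem is purely illustrative) solves (LSQ) via the
reduced SVD, checks the result against numpy's least squares routine, and evaluates
κ_2(A) = σ_1/σ_n and cos(θ) = ‖Ax‖_2/‖b‖_2.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))          # full column rank with probability 1
b = rng.standard_normal(8)

# least squares solution via the reduced SVD: x = V Sigma^{-1} U* b
U, s, Vh = np.linalg.svd(A, full_matrices=False)
x_svd = Vh.conj().T @ ((U.conj().T @ b) / s)

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

kappa2 = s[0] / s[-1]                                    # Lemma 3.8
cos_theta = np.linalg.norm(A @ x_svd) / np.linalg.norm(b)
print(np.allclose(x_svd, x_lstsq), kappa2, cos_theta)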

Proof of the following result is left as Exercise 3-4.

Theorem 3.10. Let η and θ be as above. Assume that x solves (LSQ) for (A, b) and x + Δx solves
(LSQ) for (A + ΔA, b). Then

    ‖Δx‖_2/‖x‖_2 ≤ ( κ(A) + κ(A)² tan(θ)/η ) ‖ΔA‖_2/‖A‖_2

where κ(A) is the condition number of A in the ‖·‖_2-norm.


More generally, when both the matrix A and the vector b are perturbed one has:

Theorem 3.11 (Conditioning of LSQ). Assume that x solves (LSQ) for (A, b) and x + Δx
solves (LSQ) for (A + ΔA, b + Δb). Let r = b − Ax and

    ε = max( ‖ΔA‖_2/‖A‖_2 , ‖Δb‖_2/‖b‖_2 ).

Then

    ‖Δx‖_2/‖x‖_2 ≤ (ε κ(A) / (1 − ε κ(A))) ( 2 + (κ(A) + 1) ‖r‖_2 / (cos(θ)‖b‖_2) )

where κ(A) is the condition number of A in the ‖·‖_2-norm.

3.3 Conditioning of EVP

In this section we will discuss the conditioning of the eigenvalue problem (EVP), i.e. we will
discuss how much the eigenvalues of a matrix can change if the matrix is changed by a small
amount. A preliminary result is given in the following theorem: the eigenvalues change con-
tinuously when the matrix is changed. A more detailed analysis is presented in the rest of the
section.

Theorem 3.12 (Continuity of Eigenvalues). Let λ(A) denote the vector of eigenvalues of the
matrix A ∈ C^{n×n}, ordered in decreasing absolute value, and repeated according to algebraic
multiplicities. Then λ: C^{n×n} → C^n is continuous.

Proof. This follows since the eigenvalues are the roots of the characteristic polynomial of A; this
has coefficients continuous in A.

Before discussing the conditioning of eigenvalue problems we make a brief diversion to state
a version of the implicit function theorem (IFT) which we will then use.

Theorem 3.13 (IFT). Let G: C^l × C^m → C^m be twice differentiable and assume that
G(x, y) = 0 for some (x, y) ∈ C^l × C^m. If D_y G(x, y) is invertible then, for all sufficiently small
Δx, there is a unique solution Δy of the equation

    G(x + Δx, y + Δy) = 0

in a small ball around the origin. Furthermore the solution Δy satisfies

    D_x G(x, y) Δx + D_y G(x, y) Δy = O(‖Δx‖²).


Definition 3.14. For A ∈ C^{n×n}, the eigenvalue condition number for an eigenvalue λ ∈ C of
the matrix A is

    κ_A(λ) = |⟨x, y⟩|^{−1}   if ⟨x, y⟩ ≠ 0, and
    κ_A(λ) = ∞               otherwise,

where x and y are the normalised right and left eigenvectors for eigenvalue λ.

Theorem 3.15 (Conditioning of EVP). Let λ be a simple eigenvalue of A corresponding to right
and left normalized eigenvectors x and y. Assume ⟨y, x⟩ ≠ 0. Then for all sufficiently small
ΔA ∈ C^{n×n} the matrix A + ΔA has an eigenvalue λ + Δλ with

    Δλ = (1/⟨y, x⟩) ⟨y, ΔA x⟩ + O(‖ΔA‖_2²).

Proof. Define G: C^{n×n} × C^n × C → C^n × C by

    G(A, x, λ) = ( Ax − λx , (1/2)‖x‖_2^2 − 1/2 ).

Thus we have G(A, x, λ) = 0 if and only if x is a normalized eigenvector of A with eigenvalue λ,
and clearly G ∈ C^∞.
    We will apply the IFT with l = n² and m = n + 1 to write (x, λ) as a function of A: provided
the invertibility condition holds, the equation

    G(A + ΔA, x + Δx, λ + Δλ) = 0

has, for sufficiently small ΔA, a solution (Δx, Δλ) satisfying

    D_A G(A, x, λ) ΔA + D_{(x,λ)} G(A, x, λ) (Δx, Δλ) = O(‖ΔA‖²).

Now computing the derivatives of G gives

    D_{(x,λ)} G(A, x, λ) = ( A − λI   −x )   ∈ C^{(n+1)×(n+1)}
                           (  x*       0 )

and, for every C ∈ C^{n×n},

    D_A G(A, x, λ) C = ( Cx )   ∈ C^{n+1}.
                       (  0 )

To show that D_{(x,λ)} G(A, x, λ) is invertible assume D_{(x,λ)} G(A, x, λ) (y, μ) = 0 for y ∈ C^n and
μ ∈ C. It suffices to show that this implies (y, μ) = 0. From

    (A − λI)y − μx = 0,
    x* y = 0

we get

    (A − λI)² y = 0   and   ⟨x, y⟩ = 0.

This implies (A − λI) has a two dimensional null-space, contradicting simplicity by Lemma 1.17,
unless y = 0. Hence it follows that y = 0, μ = 0, and so D_{(x,λ)} G is invertible.
    Finally,

    ( ΔA x )   ( A − λI   −x ) ( Δx )   ( ρ )
    (   0  ) + (  x*       0 ) ( Δλ ) = ( σ )

where (ρ, σ) satisfies

    ‖ρ‖_2 + |σ| = O(‖ΔA‖_2²)

by Theorem 3.13. Using

    ρ = ΔA x + (A − λI)Δx − Δλ x

and the fact that y*(A − λI) = 0, we find

    −Δλ y*x + y* ΔA x = y* ρ = O(‖ΔA‖_2²)

and the result follows.

Corollary 3.16. Let λ be a simple eigenvalue of A corresponding to right and left normalized
eigenvectors x and y. Then for all sufficiently small ΔA ∈ C^{n×n} the matrix A + ΔA has an
eigenvalue λ + Δλ with

    |Δλ| ≤ κ_A(λ) ‖ΔA‖_2 + O(‖ΔA‖_2²).
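
The eigenvalue condition numbers κ_A(λ) can be computed from the right and left eigenvectors.
The sketch below (assuming numpy and a diagonalisable matrix with simple eigenvalues; the
function name and the example matrix are ours) uses the fact that the rows of X^{−1} are
unnormalised left eigenvectors y_i^* with y_i^* x_i = 1, so that κ_A(λ_i) = ‖y_i‖_2 ‖x_i‖_2.

import numpy as np

def eigenvalue_condition_numbers(A):
    # kappa_A(lambda_i) = 1/|<y_i, x_i>| for normalised right/left eigenvectors
    # (sketch; assumes A is diagonalisable with simple eigenvalues)
    lam, X = np.linalg.eig(A)
    Y = np.linalg.inv(X)              # rows of Y are unnormalised left eigenvectors y_i^*
    x_norms = np.linalg.norm(X, axis=0)
    y_norms = np.linalg.norm(Y, axis=1)
    return lam, x_norms * y_norms     # since y_i^* x_i = 1 by construction

# a mildly non-normal example: the larger kappa, the more sensitive the eigenvalue
A = np.array([[1.0, 100.0],
              [0.0,   2.0]])
print(eigenvalue_condition_numbers(A))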

3.4 Stability of Algorithms

The stability of an algorithm measures how susceptible the result is to rounding errors occurring
during the computation. Consider the general framework for all our problems, namely equation
(3.1), where y denotes the solution we wish to compute and θ the input data. Then we may
view a locally unique family of solutions to the problem as a mapping from the input data to
the solution: y = g(θ). For example, for (SLE) we have y = x, θ = (A, b) and g(θ) = A^{−1}b. We
assume that the algorithm returns the computed result ŷ ≠ y, which can be represented as the
exact image ŷ = g(θ̂) of a different input value θ̂.

Definition 3.17. The quantity Δy = ŷ − y is called the forward error and Δθ = θ̂ − θ is called a
backward error. If θ̂ is not uniquely defined then we make the choice of θ̂ which results in minimal
‖Δθ‖. The relative backward error is ‖Δθ‖/‖θ‖ and the relative forward error is ‖Δy‖/‖y‖.

[Diagram: the input data θ and the perturbed data θ̂ are mapped to results; the exact map g
sends θ to y and θ̂ to ŷ = g(θ̂), while the computed map sends θ directly to ŷ.]

As an illustration of the concept, imagine that we solve (SLE) and find x̂ satisfying A x̂ = b̂.
Then Δx := x̂ − x is the forward error and Δb := b̂ − b the backward error. The following theorem
is a direct consequence of Proposition 3.3.

Theorem 3.18. For the problem (SLE) we have

    rel. forward error ≤ κ(A) · rel. backward error.
Internally computers represent real numbers using only a finite number of bits. Thus they
can only represent finitely many numbers and when dealing with general real numbers rounding
errors will occur. Let F ⊆ R be the set of representable numbers and let fl: R → F be the
rounding to the closest element of F.
    In this book we will use a simplified model for computer arithmetic which is described by the
following two assumptions. The main simplification is that we ignore the problems of numbers
which are unrepresentable because they are very large (overflows) or very close to zero (under-
flows). We simply use a model in which every bounded interval of R is approximated by a finite
set of numbers from F in such a way that the following assumption holds:

Assumption 3.19. There is a parameter ε_m > 0 (machine epsilon) such that the following
conditions hold.

A1: For all x ∈ R there is an ε ∈ (−ε_m, +ε_m) with

    fl(x) = x (1 + ε).

A2: For each operation ∗ ∈ {+, −, ×, /} and every x, y ∈ F there is an ε ∈ (−ε_m, +ε_m) with

    x ⊛ y = (x ∗ y)(1 + ε)

where ⊛ denotes the computed version of ∗.

Definition 3.20. An algorithm is called backward stable if the relative backward error satisfies

    ‖Δθ‖/‖θ‖ = O(ε_m).

Remark. Some algorithms which are backward stable by the definition are termed unstable in
practice. This occurs when the constant in the O(ε_m) term depends badly on some measure of
the problem dimension, for instance grows exponentially in n for (SLE). For this reason,
rather than just stating a backward stability result, we will often try to identify a norm ‖·‖ and a
constant C(n) such that

    ‖Δθ‖/‖θ‖ ≤ C(n) ε_m.

This, of course, establishes backward stability. But it gives more detailed information on exactly
when the backward error is small, as a function of problem size. Note, however, that statements
about the dependence of the constants in the backward error are norm-dependent. The definition
of backward stable itself is not.

The typical way to use backward stability is in a two-step procedure as follows. In the first
step, backward error analysis, one shows that the algorithm in question is backward stable, i.e.
that the influence of rounding errors can be represented as a small perturbation of the original
problem. (This is a property of the algorithm used.) In the second step one uses results like
Theorem 3.18 about the conditioning of the problem (which does not depend on the algorithm
used) to show that the forward error is also small. Together these steps show that the calculated
result is close to the exact result.
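
For (SLE) this two-step procedure can be illustrated numerically: treating the residual as a
perturbation of b gives a relative backward error, and multiplying it by κ(A) bounds the relative
forward error as in Theorem 3.18. The following sketch (assuming numpy; the random test
problem is purely illustrative) carries this out.

import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
x_exact = rng.standard_normal(n)
b = A @ x_exact

x_hat = np.linalg.solve(A, b)                       # computed solution

rel_backward = np.linalg.norm(A @ x_hat - b) / np.linalg.norm(b)   # ||delta b|| / ||b||
rel_forward = np.linalg.norm(x_hat - x_exact) / np.linalg.norm(x_exact)
bound = np.linalg.cond(A) * rel_backward            # Theorem 3.18

print(rel_backward, rel_forward, bound)             # forward error below the bound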
We give a simple example of a backward stability calculation.

Lemma 3.21. The calculated subtraction is backward stable.

Proof. The exact result of a subtraction is given by g(x_1, x_2) = x_1 − x_2; the computed result is
ĝ(x_1, x_2) = fl(x_1) ⊖ fl(x_2). Using Assumption A1 we get

    fl(x_1) = x_1 (1 + ε_1)   and   fl(x_2) = x_2 (1 + ε_2)

with |ε_1|, |ε_2| < ε_m. Using Assumption A2 we get

    fl(x_1) ⊖ fl(x_2) = (fl(x_1) − fl(x_2))(1 + ε_3)

where |ε_3| < ε_m. This gives

    fl(x_1) ⊖ fl(x_2) = (x_1 (1 + ε_1) − x_2 (1 + ε_2))(1 + ε_3)
                      = x_1 (1 + ε_1)(1 + ε_3) − x_2 (1 + ε_2)(1 + ε_3)
                      = x_1 (1 + ε_4) − x_2 (1 + ε_5)

where ε_4 = ε_1 + ε_3 + ε_1 ε_3 and ε_5 = ε_2 + ε_3 + ε_2 ε_3 satisfy |ε_4|, |ε_5| ≤ 2ε_m + O(ε_m²) = O(ε_m) for
ε_m → 0.
    Thus for the input error we can choose

    x = ( x_1 ),    x̂ = ( x_1 (1 + ε_4) ),    Δx = x̂ − x
        ( x_2 )          ( x_2 (1 + ε_5) )

and we get ‖Δx‖_2 = (ε_4² x_1² + ε_5² x_2²)^{1/2} ≤ O(ε_m) ‖x‖_2. This completes the proof.

Remarks.
1. The above proof is a case where the x̂ from the definition of the backward error is not
   uniquely determined. But since we are only interested in the x̂ which minimizes the
   backward error, we can choose any x̂ which gives the result ‖Δx‖_2 ≤ O(ε_m)‖x‖_2. The
   optimal x̂ can only be better.

2. Similar proofs show that the operations ⊕, ⊗ and ⊘ are also backward stable.

3. Proofs of backward stability have to analyse the influence of rounding errors and thus are
   typically based on our Assumptions A1 and A2 about computer arithmetic. Since they
   tend to be long and involved we omit most of these proofs in this book. We frequently
   study the statements of backward stability results, however, as they give useful practical
   insight into the efficacy of algorithms.

Bibliography

Conditioning for the basic problems of numerical linear algebra is a central part of the pre-
sentation in [Dem97]. Numerical stability is introduced, and thoroughly studied, in a range of
contexts arising in numerical linear algebra, in [Hig02]. Theorem 3.10 may be found in [Dem97]
and Theorem 3.11 in [Hig02]. The Implicit Function Theorem 3.13 is proved in [CH96].

Exercises

Exercise 3-1. Choose a matrix norm and calculate the condition number of the matrix
 
1
A=
1 1
in this norm.

Exercise 3-2. Assume that A is invertible with κ(A)‖ΔA‖/‖A‖ < 1 in some induced matrix
norm. Let x be a solution of Ax = b and let x̃ be a solution of (A + ΔA)x̃ = b + Δb. Show that

    ‖x̃ − x‖/‖x‖ ≤ (κ(A) / (1 − κ(A)‖ΔA‖/‖A‖)) ( ‖ΔA‖/‖A‖ + ‖Δb‖/‖b‖ ).

Exercise 3-3. Let A = U Σ V^T be an SVD factorisation of A ∈ R^{m×n} with m ≥ n and rank(A) = n.
Prove that there exists a matrix ΔA with ‖ΔA‖_2 = σ_min such that A + ΔA does not have
rank n.

Exercise 3-4. Prove Theorem 3.10.

Exercise 3-5. Show that the calculated arithmetic operations ⊕, ⊗ and ⊘ are backward stable.
Chapter 4

Complexity of Algorithms
In this chapter we study methods which quantify how long it takes to solve a numerical problem
on a computer. Specifically we focus on the question of how the run time, as measured by the
number of algebraic operations, depends on some measure of the problem size.

4.1 Computational Cost

The computational cost of an algorithm is the amount of resources it takes to perform this algo-
rithm on a computer. For simplicity here we just count the number of oating point operations
(additions, subtractions, multiplications, divisions) performed during one run of the algorithm,
together with a count of the number of square roots taken; on one occasion we also count the
number of comparisons between real numbers. A more detailed analysis would also take factors
like memory usage into account.

Denition 4.1. The cost of an algorithm is


C(n) = number of additions, subtractions, multiplications, divisions and square roots

where n is the size of the input data (e.g. the number of equations etc.).

The following definition provides the notation we will use to describe the asymptotic compu-
tational cost of an algorithm, that is, the behaviour of the cost C(n) for n → ∞. (In the case of
the notation O it is closely related to the notation used in Definition 3.1 from the last chapter,
except that there we consider the order of magnitude of small quantities. Here our concern with
cost leads to consideration of the order of magnitude of large quantities. However, the definitions
below can be adapted to the case x → 0 and we will sometimes use this without comment.)

Definition 4.2. For f, g: N → N or f, g: R_+ → R_+ we write

    g(x) = O(f(x))   if  lim sup_{x→∞} g(x)/f(x) < ∞,
    g(x) = Ω(f(x))   if  lim inf_{x→∞} g(x)/f(x) > 0,
    g(x) = Θ(f(x))   if  g(x) = Ω(f(x)) and g(x) = O(f(x)),
    g(x) ∼ f(x)      if  lim_{x→∞} g(x)/f(x) = 1.

Example. From the definition we can see that the property g(x) ∼ f(x) implies g(x) = Θ(f(x)).

Example. Using this notation we can write 5n² + 2n − 3 ∼ 5n² = Θ(n²), n² = O(n³) and
n² = Ω(n).


Algorithm (standard inner product algorithm).

input: x, y ∈ C^n
output: s = ⟨x, y⟩

1: s = x̄_1 y_1
2: for i = 2, . . . , n do
3:    s = s + x̄_i y_i
4: end for
5: return s

Theorem 4.3. The standard inner-product algorithm on C^n has computational cost C(n) =
Θ(n). Any algorithm for the inner product has C(n) = Ω(n).

Proof. The standard inner-product algorithm above has n multiplications and n − 1 additions,
i.e. C(n) = 2n − 1 = Θ(n). Sketch of the proof for C(n) = Ω(n): since each of the products x̄_i y_i
is independent of the others, we have to calculate all n of them.

Remark. Giving a real proof for the lower bound in the above theorem would require a detailed
model about what an algorithm actually is. One would for example need to be able to work in
a framework where guessing the result in just one operation and returning it is not a proper
algorithm. We avoid these difficulties here by only giving sketches for lower bounds.

Algorithm (standard matrix-vector multiplication).

input: A ∈ C^{m×n}, x ∈ C^n
output: y = Ax ∈ C^m

let a_1, . . . , a_m denote the rows of A.

1: for i = 1, . . . , m do
2:    let y_i = ⟨a_i, x⟩ using the standard inner product algorithm
3: end for
4: return y

Theorem 4.4. The computational complexity of the standard method for C^{n×n} matrix-vector
multiplication satisfies C(n) = Θ(n²). Any method has C(n) = Ω(n).

Proof. See Exercise 4-2.

Algorithm (standard matrix-matrix multiplication).

input: A ∈ C^{l×m} with rows a_1, . . . , a_l, B ∈ C^{m×n} with columns b_1, . . . , b_n
output: C = AB ∈ C^{l×n}

1: for i = 1, . . . , l do
2:    for j = 1, . . . , n do
3:       let c_ij = ⟨a_i, b_j⟩ using the standard inner product algorithm
4:    end for
5: end for
6: return C

Theorem 4.5. The standard method for C^{n×n} matrix-matrix multiplication satisfies C(n) =
Θ(n³). Any method has C(n) = Ω(n²).

Proof. We have to calculate n² inner products. Thus the asymptotic computational cost is
C(n) = n² Θ(n) = Θ(n³). Sketch for the lower bound: we have to calculate the n² entries of the
resulting matrix and thus C(n) ≥ n².

4.2 Matrix-Matrix Multiplication

In Theorem 4.5 there is a gap between the order Θ(n³) of the standard method for multiplying
matrices and the lower bound Ω(n²). The purpose of this section is to show that there are
actually algorithms with an asymptotic order which is better than Θ(n³).
    For A, B ∈ C^{n×n}, n even, and D = AB write

    A = ( A11  A12 ),   B = ( B11  B12 ),   D = ( D11  D12 )
        ( A21  A22 )        ( B21  B22 )        ( D21  D22 )

where A_ij, B_ij, D_ij ∈ C^{n/2×n/2}. Then we have

    D11 = A11 B11 + A12 B21
    D12 = A11 B12 + A12 B22
    D21 = A21 B11 + A22 B21
    D22 = A21 B12 + A22 B22.

The above method calculates the product of two n×n matrices using eight multiplications
of (n/2)×(n/2) matrices. Using this idea recursively leads to an algorithm for n×n matrix
multiplication which builds the answer from the multiplication of smaller matrices. The cost then
satisfies

    C(n) = 8 C(n/2) + n²

with the n² contribution coming from the 4 additions of (n/2)×(n/2) matrices. Exercise 4-4
shows that this leads to a cost of Θ(n³) and is thus no better than the standard algorithm.
    There is, however, another way to calculate the entries of the matrix D, which is algebraically
more complicated, but, crucially, only uses seven multiplications of (n/2)×(n/2) matrices. It
will transpire that this fact can be utilised to get an asymptotically faster method of multiplying
matrices. Using

    P1 = (A11 + A22)(B11 + B22)
    P2 = (A21 + A22) B11
    P3 = A11 (B12 − B22)
    P4 = A22 (B21 − B11)
    P5 = (A11 + A12) B22
    P6 = (A21 − A11)(B11 + B12)
    P7 = (A12 − A22)(B21 + B22)

we can write

    D11 = P1 + P4 − P5 + P7
    D12 = P3 + P5
    D21 = P2 + P4
    D22 = P1 + P3 − P2 + P6.

Algorithm (Strassen Multiplication).

input: A, B ∈ C^{n×n} with n = 2^k for some k ∈ N_0
output: D = AB ∈ C^{n×n}

1: if n = 1 then
2:    return AB
3: else
4:    calculate P_1, . . . , P_7 (using recursion)
5:    calculate D11, D12, D21 and D22
6:    return D
7: end if
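
A direct transcription of this recursion into numpy might look as follows (a sketch only: it
assumes n is a power of two and, for efficiency, falls back to ordinary multiplication below a
cutoff; the function name strassen is ours).

import numpy as np

def strassen(A, B, cutoff=64):
    # Strassen multiplication for n x n matrices, n a power of two.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    P1 = strassen(A11 + A22, B11 + B22, cutoff)
    P2 = strassen(A21 + A22, B11, cutoff)
    P3 = strassen(A11, B12 - B22, cutoff)
    P4 = strassen(A22, B21 - B11, cutoff)
    P5 = strassen(A11 + A12, B22, cutoff)
    P6 = strassen(A21 - A11, B11 + B12, cutoff)
    P7 = strassen(A12 - A22, B21 + B22, cutoff)
    D = np.empty((n, n), dtype=A.dtype)
    D[:m, :m] = P1 + P4 - P5 + P7
    D[:m, m:] = P3 + P5
    D[m:, :m] = P2 + P4
    D[m:, m:] = P1 + P3 - P2 + P6
    return D

A = np.random.default_rng(2).standard_normal((128, 128))
B = np.random.default_rng(3).standard_normal((128, 128))
print(np.allclose(strassen(A, B), A @ B))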

Remark. Recursive algorithms of this kind are called divide and conquer algorithms.
n n
Using the Strassen-multiplication we can calculate D with 7 multiplications of 2 2 -matrices
n n
and 18 additions of
2 2 -matrices. Thus we nd

C(n) = 7C(n/2) + 18n2 /4.

Lemma 4.6. The Strassen-multiplication has computational cost C(2k ) = 7 7k 6 4k for all
k N0 .

Proof. For k = 0 we get C(2^0) = C(1) = 1.
Assume the claim is true for k ∈ N_0. Then

    C(2^{k+1}) = 7 C(2^k) + 18 (2^k)²
               = 7 (7 · 7^k − 6 · 4^k) + 18 · 4^k
               = 7 · 7^{k+1} − (7 · 6 − 18) 4^k
               = 7 · 7^{k+1} − 6 · 4^{k+1}.

Now the claim follows by induction.

Theorem 4.7 (Strassen Multiplication). Using the Strassen algorithm it is possible to con-
struct an algorithm for matrix-matrix multiplication with asymptotic computational cost C(n) =
Θ(n^{log₂ 7}).

Proof. We first prove the theorem for n = 2^k, k ∈ N_0. Then we have

    7^k = 2^{log₂(7^k)} = 2^{k log₂ 7} = (2^k)^{log₂ 7} = n^{log₂ 7}

and

    4^k = (2^k)² = n².

Using the lemma we get

    C(n) = 7 · 7^k − 6 · 4^k = 7 n^{log₂ 7} − 6 n² = Θ(n^{log₂ 7}).

This proves the claim for n = 2^k.
    If n is not a power of 2 then we argue as follows. We can extend the matrices: choose k ∈ N_0
with 2^k ≥ n > 2^{k−1} and define Ā, B̄ ∈ C^{2^k × 2^k} by

    Ā = ( A       0_{12} ),    B̄ = ( B       0_{12} )
        ( 0_{21}  0_{22} )         ( 0_{21}  0_{22} )

where 0_{12} ∈ C^{n×(2^k−n)}, 0_{21} ∈ C^{(2^k−n)×n} and 0_{22} ∈ C^{(2^k−n)×(2^k−n)} are zero matrices of appro-
priate size. The product of Ā and B̄ may again be written in block form:

    Ā B̄ = ( AB      0_{12} )
          ( 0_{21}  0_{22} )

Thus we can find the product of the n×n matrices A and B by multiplying the 2^k × 2^k matrices
Ā and B̄ with the Strassen algorithm. Since we have n ≤ 2^k ≤ 2n, the extended matrices are at
most double the size of the original ones, and because Θ((2n)^α) = Θ(n^α) for every α > 0 the result
for n = 2^k implies C(n) = Θ(n^{log₂ 7}) as required.

Remark. By ideas similar to (but more involved than) those used by Strassen it is possible to
construct an algorithm which multiplies matrices in O(n^{2.376...}). It remains an open question
whether the exponent can be made arbitrarily close to 2.

Often the first method encountered for matrix inversion is Cramer's rule:

    A^{−1} = adj(A) / det(A).

Using the naive way to calculate det(A) takes Θ(n!) operations, and so for Cramer's rule we
would get asymptotic computational cost

    C(n) = Θ(n!) = Θ(e^{−n} n^{1/2} n^n).

Although of theoretical importance, this is a totally inefficient method to invert a matrix. The
next result indicates why.

Theorem 4.8 (Complexity of Matrix Inversion). If there is a method to multiply n×n matrices
with asymptotic computational cost O(n^α) for some α ≥ 2, then it is also possible to invert
n×n matrices with cost O(n^α).

Proof. Let A ∈ C^{n×n} be invertible. The proof consists of three steps.
    1) As in the previous theorem we can restrict ourselves to the case n = 2^k for some k ∈ N_0.
If n is not of this form we extend the matrix: choose k ∈ N_0 with 2^k ≥ n > 2^{k−1} and define the
matrix Ā ∈ C^{2^k × 2^k} by

    Ā = ( A       0_{12} )
        ( 0_{21}  I_{22} )

where 0_{12} ∈ C^{n×(2^k−n)} and 0_{21} ∈ C^{(2^k−n)×n} are zero matrices and I_{22} is the (2^k−n)×(2^k−n)
identity matrix. Then the inverse of this matrix is

    Ā^{−1} = ( A^{−1}   0_{12} )
             ( 0_{21}   I_{22} )

and we can invert A by inverting the 2^k × 2^k matrix Ā. Since Θ((2n)^α) = Θ(n^α) the asymptotic
cost is unchanged.
    2) Since A is invertible we have

    x*(A*A)x = (Ax)*(Ax) = ‖Ax‖_2² > 0

for every x ≠ 0 and thus A*A is positive definite and therefore invertible. We can write

    A^{−1} = (A*A)^{−1} A*.

This allows us to invert A with cost C(n) = D(n) + O(n^α) where D is the cost of inverting a
Hermitian, positive definite matrix and O(n^α) is the cost for matrix-matrix multiplication. So
we can restrict ourselves to the case of Hermitian, positive definite matrices.
    3) To determine the cost function D let B be Hermitian and positive definite:

    B = ( B_{11}   B_{12} )
        ( B_{12}*  B_{22} )

where the B_{jk} are (n/2)×(n/2) matrices. Let S = B_{22} − B_{12}* B_{11}^{−1} B_{12}, known as the Schur complement.
A direct calculation shows that

    B^{−1} = ( B_{11}^{−1} + B_{11}^{−1} B_{12} S^{−1} B_{12}* B_{11}^{−1}     −B_{11}^{−1} B_{12} S^{−1} )
             ( −S^{−1} B_{12}* B_{11}^{−1}                                      S^{−1}                   ).

From Exercise 4-3 we know that the matrices B_{11} and S are invertible.
    This method to calculate B^{−1} needs 2 inversions of (n/2)×(n/2) matrices (namely of B_{11} and of
S), a products of (n/2)×(n/2) matrices and b sums/subtractions of (n/2)×(n/2) matrices, where a and b are
independent of n. This shows that

    D(n) ≤ 2 D(n/2) + O(n^α) + O(n²)

where O(n^α) is the cost for the multiplications and O(n²) is the cost for the additions and
subtractions.
    From Theorem 4.5 we already know α ≥ 2, so we can simplify the above estimate to

    D(n) ≤ 2 D(n/2) + c n^α

for some constant c > 0. With an induction argument (see Exercise 4-5 below) one can conclude

    D(2^k) ≤ (c / (1 − 2^{1−α})) (2^k)^α

and thus we get

    D(2^k) = O((2^k)^α)

for k → ∞. This finishes the proof.

4.3 Fast Fourier Transform

The previous section shows that the inner-product based algorithm for matrix-matrix multipli-
cation can be substantially improved by use of a recursive divide and conquer approach. We now
attempt a similar improvement of matrix-vector multiplication, over the inner-product based al-
gorithm. However, in contrast to the matrix-matrix case, the idea we present here works only for
a restricted class of matrices, the discrete Fourier transform matrices. However other, similar,
ideas have also been developed for other discrete transforms. The technique we present here is
prototypical of a whole range of results available for matrix-vector multiplications of structured
matrices.
As an abbreviation write

    ω_n = e^{−2πi/n}

and define the vectors φ_l ∈ C^n for l = 0, . . . , n−1 by

    φ_l = (1/√n) ( (e^{2πil/n})^0, (e^{2πil/n})^1, . . . , (e^{2πil/n})^{n−1} )
        = (1/√n) ( 1, ω_n^{−l}, ω_n^{−2l}, . . . , ω_n^{−(n−1)l} ).

The following lemma shows that these vectors form a basis of C^n.

Lemma 4.9. ⟨φ_l, φ_m⟩ = δ_{lm}.

Proof.

    ⟨φ_l, φ_m⟩ = (1/n) Σ_{j=0}^{n−1} ω_n^{(l−m)j}.

If l = m, clearly ⟨φ_l, φ_m⟩ = 1. If l ≠ m the geometric sum gives

    (1/n) (ω_n^{(l−m)n} − 1) / (ω_n^{(l−m)} − 1).

But ω_n^{(l−m)n} = e^{−2πi(l−m)} = 1, implying ⟨φ_l, φ_m⟩ = 0.

Hence the {φ_l}_{l=0}^{n−1} form an orthonormal basis for C^n. Thus any vector u ∈ C^n can be
expanded as

    u = Σ_{l=0}^{n−1} a_l φ_l,

where u, φ_l ∈ C^n and the a_l ∈ C.
    The a_l can be found using orthogonality:

    ⟨φ_m, u⟩ = Σ_{l=0}^{n−1} a_l ⟨φ_m, φ_l⟩ = Σ_{l=0}^{n−1} a_l δ_{ml} = a_m.

Thus a_m = ⟨φ_m, u⟩.
Writing this out, with ω_n denoted by ω for simplicity, gives

    ( a_0     )          ( 1   1         1           ...   1            )
    ( a_1     )        1 ( 1   ω         ω²          ...   ω^{n−1}      )
    (  ⋮      )  =  ───  ( 1   ω²        ω⁴          ...   ω^{2(n−1)}   )  u
    ( a_{n−1} )     √n   ( ⋮                                ⋮           )
                         ( 1   ω^{n−1}   ω^{2(n−1)}  ...   ω^{(n−1)²}   )

where the matrix on the right (including the factor 1/√n) is denoted A_n.
    Thus to expand an arbitrary u in the basis {φ_k} we need to be able to perform matrix-vector
multiplication by A_n ∈ C^{n×n}.

Theorem 4.10 (Complexity of FFT Matrix-Vector Multiply). Let n = 2^k. Matrix-vector mul-
tiplication by A_n ∈ C^{n×n} can be performed in O(n log n) time.
Proof. We let

    F_k = √n A_n   with n = 2^k.

Note that F_{k−1} ∈ C^{n/2 × n/2} and is built from ω_{n/2}. We prove that multiplication by F_k can be
effected in O(k 2^k) time, which gives the desired result.
    Let

    x   = (x_0, x_1, . . . , x_{n−1})^T ∈ C^{2^k},
    x^e = (x_0, x_2, . . . , x_{n−2})^T ∈ C^{2^{k−1}},
    x^o = (x_1, x_3, . . . , x_{n−1})^T ∈ C^{2^{k−1}}.

Knowledge of x^o, x^e gives x. Let

    y = F_k x,    y = ((y^t)^T, (y^b)^T)^T.

Then, for j = 0, . . . , 2^k − 1,

    y_j = Σ_{l=0}^{2^k−1} ω_{2^k}^{jl} x_l
        = Σ_{l=0}^{2^{k−1}−1} ω_{2^k}^{2jl} x_{2l} + Σ_{l=0}^{2^{k−1}−1} ω_{2^k}^{j(2l+1)} x_{2l+1}.

But ω_{2^k}^{2jl} = e^{−2πi(2jl)/2^k} = e^{−2πi(jl)/2^{k−1}} = ω_{2^{k−1}}^{jl}, and so ω_{2^k}^{j(2l+1)} = ω_{2^k}^j ω_{2^k}^{2jl} = ω_{2^k}^j ω_{2^{k−1}}^{jl}. Hence,

    y_j = Σ_{l=0}^{2^{k−1}−1} ω_{2^{k−1}}^{jl} x^e_l + ω_{2^k}^j Σ_{l=0}^{2^{k−1}−1} ω_{2^{k−1}}^{jl} x^o_l.

Now ω_M^{jl} = ω_M^{(j+M)l} and ω_{2M}^{j+M} = −ω_{2M}^j, and so

    y_j           = Σ_{l=0}^{2^{k−1}−1} ω_{2^{k−1}}^{jl} x^e_l + ω_{2^k}^j Σ_{l=0}^{2^{k−1}−1} ω_{2^{k−1}}^{jl} x^o_l,   j = 0, . . . , 2^{k−1} − 1,
    y_{2^{k−1}+j} = Σ_{l=0}^{2^{k−1}−1} ω_{2^{k−1}}^{jl} x^e_l − ω_{2^k}^j Σ_{l=0}^{2^{k−1}−1} ω_{2^{k−1}}^{jl} x^o_l,   j = 0, . . . , 2^{k−1} − 1.
Then, in matrix notation,

    y^t = F_{k−1} x^e + D F_{k−1} x^o = y^e + D y^o,
    y^b = F_{k−1} x^e − D F_{k−1} x^o = y^e − D y^o,

where

    D = diag{ ω_{2^k}^0, ω_{2^k}^1, . . . , ω_{2^k}^{2^{k−1}−1} }.

Thus, if y^e = F_{k−1} x^e and y^o = F_{k−1} x^o are known, then y is found through O(2^k) multipli-
cations and additions (since D is diagonal). If L_k = C(2^k), C(n) being the cost of multiplication by
A_n, then

    L_k ≤ 2 L_{k−1} + a 2^k.

For induction, we claim L_k ≤ (ak + L_0) 2^k. For the inductive step l − 1 ↦ l assume L_{l−1} ≤
(a(l−1) + L_0) 2^{l−1}; then

    L_l ≤ (a(l−1) + L_0) 2^l + a 2^l = (al + L_0) 2^l,

completing the proof.
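
The recursion in this proof is exactly the radix-2 fast Fourier transform. The sketch below
(assuming numpy and the sign convention ω_n = e^{−2πi/n} used above, which coincides with
numpy's FFT convention; with the opposite sign a conjugation would be needed) computes
y = F_k x recursively and checks it against numpy's built-in FFT.

import numpy as np

def fft_recursive(x):
    # Recursive radix-2 computation of y = F_k x, y_j = sum_l omega^{jl} x_l
    # with omega = exp(-2*pi*i/n); n must be a power of two (sketch).
    n = x.shape[0]
    if n == 1:
        return x.astype(complex)
    y_e = fft_recursive(x[0::2])                             # even-indexed entries
    y_o = fft_recursive(x[1::2])                             # odd-indexed entries
    d = np.exp(-2j * np.pi * np.arange(n // 2) / n) * y_o    # D y^o
    return np.concatenate([y_e + d, y_e - d])

x = np.random.default_rng(4).standard_normal(64)
print(np.allclose(fft_recursive(x), np.fft.fft(x)))          # same convention as numpy's FFT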

4.4 Bidiagonal and Hessenberg Forms

We conclude with the statement of a pair of matrix factorization results. Their importance is
intricately bound up with the complexity of the algorithms to eect them. It is for this reason
that they appear in this chapter.

Lemma 4.11. For any A ∈ C^{n×n} there exist unitary U, V and upper bidiagonal B such that

    A = U B V*.

Furthermore this factorization can be achieved in O(n³) operations.



Lemma 4.12. For any A ∈ C^{n×n} there exists unitary Z and upper Hessenberg T such that

    A = Z T Z*.

Furthermore this factorization can be achieved in O(n³) operations.


If we remove the stipulation that the factorizations be achieved in O(n³) then, by using the
SVD and the Schur factorisation respectively, we may obtain the desired results; indeed B is
then diagonal and T upper triangular. However, both the SVD and the Schur factorizations
reveal eigenvalues (of A^T A and A respectively). Hence they cannot be achieved, for arbitrary n, in
a finite number of arithmetic operations (see Chapter 8). The point, then, of these two lemmas,
is that the factorizations can be achieved at O(n³) cost. The first result may be proved by means
of Householder reflections, which are introduced in Chapter 5. Similar techniques enable proof
of the second result.

Bibliography

A pedagogical introduction to the study of complexity of algorithms is the book [CLRS01].


Although aimed at a wide range of problems in computer science in general, it includes a wealth of
information concerning numerical algorithms. The topic of computing lower bounds, as touched
upon in Theorem 4.3, is well-explained in this book. Lemmas 4.11 and 4.12 are proved in [TB97].
4.4. BIDIAGONAL AND HESSENBERG FORMS 55

Exercises

Exercise 4-1. Let (i) f(n) = n²(1 + sin(n)) and (ii) f(n) = n + n². In each case, which of the
following are true:

    f(n) = O(1);    f(n) = O(n);    f(n) = O(n²);
    f(n) = Ω(1);    f(n) = Ω(n);    f(n) = Ω(n²).

Exercise 4-2. Show that the standard method for matrix-vector multiplication has asymptotic
computational cost C(n) = Θ(n²).

Exercise 4-3. Show that the matrices S and B11 in the proof of Theorem 4.8 are invertible.

Exercise 4-4. Prove that if C(n) = 8C(n/2) + Θ(n²) then C(n) = Θ(n³). This shows that a
divide and conquer approach to matrix multiplication which requires 8 matrix multiplications of
the smaller sub-matrices cannot beat the complexity of standard matrix-matrix multiplication;
in contrast, the Strassen method, using 7 multiplications, does.

Exercise 4-5. Show by induction that D(1) = 1 and

    D(n) ≤ 2 D(n/2) + c n^α

for some constant c > 0 and α ≥ 2 implies

    D(2^k) ≤ (c / (1 − 2^{1−α})) (2^k)^α

for all k ∈ N_0.

Exercise 4-6. Let A ∈ R^{n×n} where n = 2^k. Noting that the LU factorisation of a matrix A
can be written as

    A = LU = ( L11   0  ) ( U11  U12 ),
             ( L21  L22 ) (  0   U22 )

design a divide and conquer strategy which results in an LU factorisation of A in O(n^a) operations,
where O(n^a) is the cost of matrix multiplication.

Exercise 4-7. Prove Lemma 4.11.

Exercise 4-8. Prove Lemma 4.12.


56 CHAPTER 4. COMPLEXITY OF ALGORITHMS
Chapter 5

Systems of Linear Equations


In this chapter we analyse algorithms for (SLE): given A ∈ C^{n×n} and b ∈ C^n, find an x ∈ C^n
such that Ax = b.
    We present several methods to solve this problem. The common idea is to factorise the matrix
A into simpler matrices M and N as A = MN and to solve first the system My = b and then
Nx = y. The result is a vector x with Ax = MNx = My = b and thus we have solved (SLE).
In each section of the chapter we present a method, discuss its computational complexity, and
study its backward error properties.

5.1 Gaussian Elimination

Gaussian elimination is the most commonly used method to solve systems of linear equations.
The method is based on a systemisation of the elimination procedure used to solve simultaneous
linear equations by hand. The mathematical background of the algorithm is the LU factorisation
described in Theorems 2.21 and 2.22.

The Method
Under the conditions of Theorem 2.21 the matrix A can be converted into upper triangular shape
by multiplying lower triangular matrices from the left. We introduce the subsequent theoretical
developments with an example.

Example. Consider the matrix

        ( 2  1  1 )
    A = ( 4  3  3 ).
        ( 8  7  9 )

Then we can create zeros in the first column below the diagonal by subtracting multiples of the
first row from the other rows. In matrix notation this can be written as

            (  1       ) ( 2  1  1 )   ( 2  1  1 )
    L_1 A = ( −2  1    ) ( 4  3  3 ) = ( 0  1  1 ).
            ( −4     1 ) ( 8  7  9 )   ( 0  3  5 )

Repeating this for the second column gives

                ( 1        ) ( 2  1  1 )   ( 2  1  1 )
    L_2 L_1 A = (    1     ) ( 0  1  1 ) = ( 0  1  1 ).
                (   −3  1  ) ( 0  3  5 )   ( 0  0  2 )

We have found lower triangular matrices L_1 and L_2 and an upper triangular matrix U with
A = (L_2 L_1)^{−1} U. It turns out that L_2 L_1 is itself unit lower triangular, with an inverse that is
easily calculable. Hence we have found an LU factorisation of A during the process of elimination.


The ideas in this example generalise to arbitrary dimension. The following lemma helps to
calculate (L_k · · · L_2 L_1)^{−1} = L_1^{−1} L_2^{−1} · · · L_k^{−1} in general.

Lemma 5.1. a) Let L = (ℓ_ij) be unit lower triangular with non-zero entries below the diagonal
only in column k. Then L^{−1} is also unit lower triangular with non-zero entries below the diagonal
only in column k and we have (L^{−1})_{ik} = −ℓ_{ik} for all i > k.
b) Let A = (a_ij) and B = (b_ij) be unit lower triangular n×n matrices where A has non-
zero entries below the diagonal only in columns 1, . . . , k and B has non-zero entries below the
diagonal only in columns k + 1, . . . , n. Then AB is unit lower triangular with (AB)_ij = a_ij for
j ∈ {1, . . . , k} and (AB)_ij = b_ij for j ∈ {k + 1, . . . , n}.

Proof. a) Multiplying L with the suggested inverse gives the identity. b) Direct calculation.

Example. For the matrices L_1 and L_2 from the previous example we get

    (L_2 L_1)^{−1} = L_1^{−1} L_2^{−1}
                   = ( 1       ) ( 1       )   ( 1       )
                     ( 2  1    ) (    1    ) = ( 2  1    ).
                     ( 4     1 ) (    3  1 )   ( 4  3  1 )

Thus we have found the LU factorisation

    ( 2  1  1 )   ( 1       ) ( 2  1  1 )
    ( 4  3  3 ) = ( 2  1    ) (    1  1 ).
    ( 8  7  9 )   ( 4  3  1 ) (       2 )

Recall the notation for principal submatrix from Denition 2.20. The technique to convert A
into an upper triangular matrix by multiplying lower triangular matrices leads to the following
algorithm:

Algorithm LU (LU factorisation).

input: A ∈ C^{n×n} with det(A_j) ≠ 0 for j = 1, . . . , n
output: L, U ∈ C^{n×n} where A = LU is the LU factorisation of A

1: L = I, U = A
2: for k = 1, . . . , n − 1 do
3:    for j = k + 1, . . . , n do
4:       l_jk = u_jk / u_kk
5:       (u_{j,k}, . . . , u_{j,n}) = (u_{j,k}, . . . , u_{j,n}) − l_{j,k} (u_{k,k}, . . . , u_{k,n})
6:    end for
7: end for
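
A direct numpy transcription of algorithm LU might look as follows (a sketch for real input,
without the pivoting discussed later; the function name lu_factorise is ours).

import numpy as np

def lu_factorise(A):
    # LU factorisation A = L U following algorithm LU.
    # Sketch: assumes all principal sub-matrices of A are non-singular.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]
            U[j, k:] = U[j, k:] - L[j, k] * U[k, k:]
    return L, U

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
L, U = lu_factorise(A)
print(np.allclose(L @ U, A))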
Remarks. Line 5 of the algorithm subtracts a multiple of row k from row j, causing u_jk = 0
without changing columns 1, . . . , k−1. This corresponds to multiplication with a lower triangular
matrix L_k as in the example above. Thus after the loop ending in line 6 is finished, the current
value of the matrix U is L_k · · · L_1 A and it has zeros below the diagonal in columns 1, . . . , k.
    Since the principal sub-matrices A_j are non-singular and the matrices L_j are unit triangular
we get, for Ã = L_{k−1} · · · L_1 A, det Ã_k = det A_k ≠ 0. Thus, since Ã has zeros below the
diagonal in columns 1, . . . , k−1, the sub-matrix Ã_k is upper triangular and we must have u_kk ≠ 0
in line 4.
    Lemma 5.1 shows that the algorithm calculates the correct entry l_jk for the matrix L =
(L_{n−1} · · · L_1)^{−1}.

The last missing building block for the Gaussian elimination method is the following algorithm
to solve systems of linear equations when the coefficient matrix is triangular.

Algorithm BS (back substitution).

input: U ∈ C^{n×n} non-singular, upper triangular, and b ∈ C^n
output: x ∈ C^n with Ux = b

1: for j = n, . . . , 1 do
2:    x_j = (1/u_jj) ( b_j − Σ_{k=j+1}^n u_jk x_k )
3: end for
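
In numpy, back substitution might be implemented as follows (a sketch; the function name
back_substitution is ours, and no checks for zero diagonal entries are made).

import numpy as np

def back_substitution(U, b):
    # Solve U x = b for non-singular upper triangular U (sketch).
    n = U.shape[0]
    x = np.zeros_like(b, dtype=float)
    for j in range(n - 1, -1, -1):
        x[j] = (b[j] - U[j, j+1:] @ x[j+1:]) / U[j, j]
    return x

U = np.array([[2., 1., 1.],
              [0., 1., 1.],
              [0., 0., 2.]])
b = np.array([5., 2., 4.])
x = back_substitution(U, b)
print(np.allclose(U @ x, b))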
Remarks.
1. Since U is triangular we get

       (Ux)_i = Σ_{j=i}^n u_ij x_j = u_ii · (1/u_ii) ( b_i − Σ_{k=i+1}^n u_ik x_k ) + Σ_{j=i+1}^n u_ij x_j = b_i.

   Thus the algorithm is correct.

2. The corresponding algorithm to solve Lx = b where L is a lower triangular matrix is called
   forward substitution.

Combining all our preparations we get the Gaussian elimination algorithm to solve the prob-
lem (SLE):

Algorithm GE (Gaussian elimination).

input: A ∈ C^{n×n} with det(A_j) ≠ 0 for j = 1, . . . , n, and b ∈ C^n
output: x ∈ C^n with Ax = b

1: find the LU factorisation of A
2: solve Ly = b using forward substitution
3: solve Ux = y using back substitution

Remarks.
1. The result of this algorithm is an x ∈ C^n with Ax = LUx = Ly = b and thus the algorithm
   gives the correct result.

2. The vector y may be calculated at the same time as the LU factorisation of A, by adding
   b as an additional column in A.

Computational Complexity

Lemma 5.2. The LU factorisation algorithm has computational cost

    C(n) = (2/3) n³ + (1/2) n² − (7/6) n.

Proof. We have to count the number of floating point operations in the LU factorisation algo-
rithm. Line 5 requires (n − k + 1) multiplications and (n − k + 1) subtractions, i.e. 2(n − k + 1)
operations. Line 4 contributes one division. Thus the loop starting at line 3 needs (n − k)(1 +
2(n − k + 1)) operations. Considering the outer loop the total number of operations is

    C(n) = Σ_{k=1}^{n−1} (n − k)(1 + 2(n − k + 1)) = Σ_{k=1}^{n−1} (2(n − k)² + 3(n − k)) = 2 Σ_{l=1}^{n−1} l² + 3 Σ_{l=1}^{n−1} l.

The claim now follows by induction.

Remark. Using the ∼-notation the claim of the lemma becomes C(n) ∼ (2/3) n³.

Lemma 5.3. Forward substitution and back substitution have computational cost C(n) ∼ n².

Proof. Calculating x_j in the back substitution algorithm needs 2(n − j) + 1 operations. Thus
the total cost is

    C(n) = Σ_{j=1}^{n} (2(n − j) + 1) = 2 Σ_{k=0}^{n−1} k + n = n(n − 1) + n = n².

A similar argument applies to the situation of forward substitution.

Theorem 5.4 (Computational Complexity of Gaussian Elimination). The asymptotic compu-
tational cost of the GE algorithm is of order

    C(n) ∼ (2/3) n³.

Proof. This is an immediate consequence of the previous two lemmas.

Error Analysis

The following theorem is proved by using the Assumptions (A1) and (A2) concerning computer
arithmetic. The proof is somewhat involved and we omit it here. We include the statement,
however, because it is an important building block in the overall understanding of backward
error analysis for Gaussian elimination. We use the notation |·| for matrices, introduced just
prior to Lemma 1.34.

Theorem 5.5. The forward and back substitution algorithms are backward stable: the computed
solution x̂ of a triangular system of equations Tx = b satisfies (T + ΔT)x̂ = b for some triangular
matrix ΔT ∈ C^{n×n} with, for ε_m sufficiently small,

    |ΔT| ≤ (n ε_m / (1 − n ε_m)) |T|.

Note that this theorem implies that ΔT has the same triangular structure as T itself. The
result shows that, for n ε_m sufficiently small, there is a constant c independent of n such that,
by Lemma 1.34,

    ‖ΔT‖_∞ ≤ c n ε_m ‖T‖_∞.

Applying this result to back substitution we find that the computed solution x̂ of Ux = y solves
(U + ΔU)x̂ = y and that

    ‖ΔU‖_∞ ≤ c n ε_m ‖U‖_∞.

Using Proposition 3.5 about the conditioning of (SLE) we get an upper bound on the error in
the computed result of back substitution:

    ‖x̂ − x‖_∞ / ‖x‖_∞ ≤ c κ(U) n ε_m

for κ(U) n ε_m sufficiently small.
    Thus the back substitution step of Gaussian elimination is numerically stable and does
not introduce large errors, provided κ(U) n ε_m remains small. The same holds for forward
substitution. The LU factorisation algorithm, however, is not backward stable. To explain this
fact we describe two more results from backward error analysis.

Theorem 5.6. The computed LU factors in a Gaussian elimination of A ∈ R^{n×n}, L̂ and Û,
satisfy

    L̂ Û = A + ΔA

where, for ε_m sufficiently small,

    |ΔA| ≤ (n ε_m / (1 − n ε_m)) |L̂| |Û|.

The proof of this result is similar to that of the previous theorem, and we omit it too.
Combining the two previous theorems gives the following backward error analysis result for the
solution of (SLE) by Gaussian elimination.

Theorem 5.7. Let x̂ be the approximate solution of (SLE) as computed by algorithm GE. Then

    (A + ΔA) x̂ = b

where, for ε_m sufficiently small,

    |ΔA| ≤ (3 n ε_m / (1 − 3 n ε_m)) |L̂| |Û|.

Proof. See Exercise 5-10.

This would give a backward stability result if it were possible to bound ‖ |L̂| |Û| ‖ in terms of
‖A‖. Unfortunately this is not possible, as we illustrate in the following example.

Example (Effect of Rounding). Let

    A = ( ε  1 )
        ( 1  1 )

for some ε > 0. Then A has an LU factorisation A = LU with

    L = ( 1    0 ),    U = ( ε  1        ).
        ( 1/ε  1 )         ( 0  1 − 1/ε  )

Now assume ε ≪ 1. Then 1/ε is a huge number and the representation of these matrices stored
in a computer will be rounded. The matrices might be represented as

    L̂ = ( 1    0 ),    Û = ( ε  1     ),
        ( 1/ε  1 )         ( 0  −1/ε  )

which is compatible with Assumption (A1) on rounding errors. We have L̂ ≈ L and Û ≈ U.
But multiplying the two rounded matrices gives

    L̂ Û = ( ε  1 ) = A + ( 0   0 )
           ( 1  0 )       ( 0  −1 )

with the last matrix being the backward error ΔA.
    A small rounding error leads to a large backward error ΔA. The example shows that for Gaussian
elimination a backward error analysis will, in general, lead to the conclusion that the perturbed
problem is not close to the original one: the algorithm is not backward stable.
    Discussing the result in terms of the previous theorem, note that ‖A‖_∞ = 2 and that
‖ |L̂| |Û| ‖_∞ = Θ(ε^{−1}). Hence it is not possible to bound ‖ |L̂| |Û| ‖_∞ in terms of ‖A‖_∞, indepen-
dently of ε.
    Note that this problem is not related to the conditioning of the matrix A. We have

    A^{−1} = (1 − ε)^{−1} ( −1   1 )
                          (  1  −ε )

and thus κ_∞(A) = ‖A‖_∞ ‖A^{−1}‖_∞ ≈ 4 for small ε > 0, and the matrix A is well conditioned.

5.2 Gaussian Elimination with Partial Pivoting

Because of the instability illustrated in the previous example, the classical Gaussian elimination
method is not used in black box numerical software for (SLE). However, the continuation of the
example illustrates the resolution: permutations can cure this problem.

Example. [Eect of Rounding Continued] Let

   
0 1 1 1
P = , and A0 = P A = ,
1 0 1

then
   
1 0 1 1
L0 = U0 =
1 0 1
   
0 = 1
L
0 0 = 1 1 .
U
1 0 1

Thus    
0U
0 = 1 1 0 0
L = A0 + ,
1+ 0
| {z }
A0

and the backward error A0 is small.

Thus permuting is important. There are two primary methods of permutation:

GEPP  Swap rows only to maximise |a_{ll}^{(l)}| over all choices from l, . . . , n: (n − l + 1) of them.

GECP  Swap rows and columns to maximise |a_{ll}^{(l)}| over all choices from l, . . . , n: (n − l + 1)² of them.

Here GEPP denotes Gaussian Elimination with partial pivoting whilst GECP denotes Gaus-
sian Elimination with complete pivoting. GEPP is the most widely used in practice and so we
devote most of our analysis to this algorithm.

The Method

The instability of Gaussian elimination highlighted above is caused by the fact that we divide by
the tiny number u_kk = ε in step 4 of the LU factorisation algorithm. Motivated by the resolution
in the example above, we will avoid this problem in the improved version of the algorithm by
rearranging rows k, . . . , n by multiplication with a permutation matrix at the beginning of the
kth iteration in order to maximise the element u_kk. The following argument shows that the
modified algorithm still works correctly.
    We want to calculate

    U = L_{n−1} P_{n−1} · · · L_1 P_1 A.

Multiplying the permutation matrix P_k from the left exchanges rows k and i_k, where i_k is chosen
from {k, . . . , n} to maximise the element |u_{i_k k}|. We can rewrite this as

    U = L'_{n−1} · · · L'_1 P_{n−1} · · · P_1 A

where L'_{n−1} = L_{n−1} and

    L'_k = P_{n−1} · · · P_{k+1} L_k P_{k+1}^{−1} · · · P_{n−1}^{−1}

for k = 1, . . . , n − 2. Since P_{n−1} · · · P_{k+1} exchanges rows k + 1, . . . , n and P_{k+1}^{−1} · · · P_{n−1}^{−1} performs
the corresponding permutation on the columns k + 1, . . . , n, the shape of L'_k is the same as the
shape of L_k: it is unit lower triangular and the only non-vanishing entries below the diagonal
are in column k. Hence we can still use Lemma 5.1 to calculate L = (L'_{n−1} · · · L'_1)^{−1}. The above
arguments lead to the following algorithm.

Algorithm LUPP (LU factorisation with partial pivoting).

input: A ∈ C^{n×n} non-singular
output: L, U, P ∈ C^{n×n} where PA = LU with L unit lower triangular,
        U non-singular upper triangular and P a permutation matrix

1: U = A, L = I, P = I
2: for k = 1, . . . , n − 1 do
3:    choose i ∈ {k, . . . , n} which maximises |u_ik|
4:    exchange row (u_{k,k}, . . . , u_{k,n}) with (u_{i,k}, . . . , u_{i,n})
5:    exchange row (l_{k,1}, . . . , l_{k,k−1}) with (l_{i,1}, . . . , l_{i,k−1})
6:    exchange row (p_{k,1}, . . . , p_{k,n}) with (p_{i,1}, . . . , p_{i,n})
7:    for j = k + 1, . . . , n do
8:       l_jk = u_jk / u_kk
9:       (u_{j,k}, . . . , u_{j,n}) = (u_{j,k}, . . . , u_{j,n}) − l_{j,k} (u_{k,k}, . . . , u_{k,n})
10:   end for
11: end for
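
A numpy transcription of algorithm LUPP might look as follows (a sketch; the function name
lu_partial_pivoting is ours). Applied to a small version of the rounding example above it
reproduces the benign factorisation of PA.

import numpy as np

def lu_partial_pivoting(A):
    # P A = L U with partial pivoting, following algorithm LUPP.
    # Sketch: A is assumed non-singular.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    U = A.copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        i = k + np.argmax(np.abs(U[k:, k]))   # pivot row
        # swap rows k and i in U (columns k:), L (columns :k) and P
        U[[k, i], k:] = U[[i, k], k:]
        L[[k, i], :k] = L[[i, k], :k]
        P[[k, i], :] = P[[i, k], :]
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]
            U[j, k:] -= L[j, k] * U[k, k:]
    return L, U, P

A = np.array([[1e-20, 1.], [1., 1.]])
L, U, P = lu_partial_pivoting(A)
print(np.allclose(P @ A, L @ U))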

Remark. One can show that, whenever A is non-singular, the element u_kk is always non-zero
after the row exchange. Thus we no longer require the condition det(A_j) ≠ 0 to avoid a
division by zero here.

Lemma 5.8. The resulting matrices L and L_k satisfy ‖L‖_∞ ≤ n and ‖L_k‖_∞ ≤ 2.

Proof. Each matrix L_k has entries |(L_k)_{ij}| ≤ 1 for all i, j ∈ {1, . . . , n}. Consequently the same
is true of L. The result for ‖L‖_∞ follows directly. Furthermore, as each row of L_k contains at
most two non-zero entries, we have ‖L_k‖_∞ ≤ 2.
Gaussian elimination with partial pivoting now works as follows:

Algorithm GEPP (Gaussian elimination with partial pivoting).


input: A Cnn non-singular, b Cn
n
output: x C with Ax = b

1: nd P A = LU using algorithm LUPP


2: solve Ly = P b using forward substitution
3: solve U x = y using back substitution

Remarks.
1. The result of this algorithm is an x Cn with Ax = P 1 LU x = P 1 Ly = P 1 P b = b
and thus the algorithm is correct.

2. Again it is possible to compute y by appending b as an extra column in the matrix A.

Computational Complexity

The computational complexity is the same as for the LU algorithm given in Theorem 5.4. This
follows from the fact that we only added steps to exchange rows in the LU algorithm, and such
exchanges do not incur computational cost in our model. If we assign unit cost to the comparisons
between numbers, made to maximise |u_ik| in step 3 of the algorithm, then this added additional
cost is of order Θ(n²); thus it can be neglected for a Θ(n³) algorithm. For GECP the additional
cost is of order Θ(n³) and hence cannot be neglected.

Error Analysis

To derive a backward error analysis for GEPP is nontrivial. We limit ourselves to a heuristic
understanding of the analysis, based on the following two assumptions:

  the computed permutation matrix P̂ is the correct permutation P;

  the computed LU factors L̂, Û are well approximated by L, U.

Then applying Theorem 5.7 to PA we obtain a computed solution x̂ which satisfies

    P(A + ΔA) x̂ = Pb

where

    ‖ΔA‖_∞ ≤ (3 n ε_m / (1 − 3 n ε_m)) ‖L̂‖_∞ ‖Û‖_∞.
To bound the right-hand side of this expression we use the following denition:

Denition 5.9. Dene gn (A), the growth factor for matrix A Cnn , by
kU kmax
gn (A) = . (5.1)
kAkmax
From this it follows, by Lemma 1.35, that

kU k ngn (A)kAk .

We also have kLk n and hence, using the approximation L, U


L U,

3n3 gn (A)m
kAk . kAk . (5.2)
1 3nm
(Here . indicates that approximations have been made in deriving the inequality). To under-
stand the implications of this bound we study the behaviour of the growth factor in further
detail.

Lemma 5.10. The growth factor g_n(A) satisfies g_n(A) ≤ 2^{n−1} for every A ∈ C^{n×n}.

Proof. We have U = L'_{n−1} · · · L'_1 PA. For any k ∈ {1, . . . , n−1} and any matrix A ∈ C^{n×n} we
find

    ‖L'_k A‖_max = max_{i,j} |(L'_k A)_{ij}|
                 = max_{i,j} | Σ_{l=1}^n (L'_k)_{il} a_{lj} |
                 ≤ max_{i} Σ_{l=1}^n |(L'_k)_{il}| · max_{i,j} |a_{ij}|
                 = ‖L'_k‖_∞ ‖A‖_max
                 ≤ 2 ‖A‖_max

and thus by applying the matrices L'_1, . . . , L'_{n−1} in order we get, since ‖PA‖_max = ‖A‖_max,
‖U‖_max ≤ 2^{n−1} ‖A‖_max. This completes the proof.

Placing this upper bound on g_n(A) in (5.2) shows a worryingly large n-dependence in the
backward error bound for GEPP:

    ‖ΔA‖_∞ ≲ (3 n³ 2^{n−1} ε_m / (1 − 3 n ε_m)) ‖A‖_∞.

Furthermore, the following example shows that the upper bound on g_n(A) from the lemma can
be attained.

Example. Let

        (  1                    1 )
        ( −1   1                1 )
    A = ( −1  −1   1            1 )   ∈ R^{n×n}.
        (  ⋮              ⋱     ⋮ )
        ( −1  −1  · · ·  −1     1 )

Since all relevant elements have modulus 1 we do not need to use pivoting and LU factorisation
gives

        ( 1                    1        )
        (     1                2        )
    U = (         1            4        )   ∈ R^{n×n}.
        (             ⋱        ⋮        )
        (                 1    2^{n−2}  )
        (                      2^{n−1}  )

Thus for the matrix A as defined above we get

    g_n(A) = ‖U‖_max / ‖A‖_max = 2^{n−1}.

Despite this worst case analysis, GEPP is widely used in practice; we outline why this is the
case.

Remarks.
1. The implied backward error analysis for GEPP given by (5.2) is indeed very weak: although
   the algorithm is technically backward stable, for certain problems it is necessary that 2^n ε_m
   be small to ensure a small backward error.

2. The preceding analysis may also be carried out for the GECP algorithm and results in a
   bound for the growth factor which scales like n^{(1/4) log(n) + 1/2}; whilst still growing faster
   than any polynomial in n, this bound is considerably slower than 2^n. Thus GECP has, in a
   worst case analysis, far better stability properties than GEPP.

3. If comparisons between numbers are assigned unit cost, then GECP is still a Θ(n³)
   algorithm, but the constant is larger than for GEPP.

4. Practical experience, and some statistical analysis, has led to a belief that matrices resulting
   in large backward error for GEPP do not arise commonly in practice. Thus the worst case
   analysis is thought, typically, to be misleading.

5. For these reasons GEPP is commonly used in practice, because it is cheaper than GECP
   if comparisons are assigned a unit computational cost, and because it appears to produce
   reasonable backward error on most problems.

5.3 The QR Factorisation

The Householder QR factorisation is another method to solve (SLE). QR factorisation avoids the
poor dependence of the stability constant on n that arises for the LU factorisation with partial
pivoting. However the computation takes roughly twice the number of operations.

The Method
The following algorithm solves the problem (SLE) using the QR factorisation from Theorem 2.18.
In order to apply the algorithm we will need a numerically stable method to calculate the QR
factorisation. To avoid a minor technical complication, we restrict ourselves to the case of
real-valued matrices in this section.

Algorithm (solving (SLE) by QR factorisation).


input: A Rnn non-singular, b Rn
output: x Rn with Ax = b
1: nd the QR factorisation A = QR
2: calculate y = Q b
3: solve Rx = y using back substitution
66 CHAPTER 5. SYSTEMS OF LINEAR EQUATIONS

The result of this algorithm is an x Rn with Ax = QRx = Qy = QQ b = b and thus the


algorithm is correct. To calculate the QR factorisation we will use the following algorithm. We
present the full algorithm rst and then analyse it to see how it works. The algorithm uses the
sign function, which is dened as follows.
(
+1, ifx 0, and
sign(x) =
1 else.

Algorithm QR (Householder QR factorisation).


input: A Rmn with m n
output: Q Rmm orthogonal, R Rmn upper triangular with A = QR
1: Q = I, R = A
2: for k = 1, . . . , n do
3: u = (rkk , . . . , rmk ) Rmk+1
4: v = sign(u1 )kuk2 e1 + u where e1 = (1, 0, . . . , 0) Rmk+1
5: v = v/kv k2
(mk+1)(mk+1)
6: Hk =  (Imk+1 2vv  )R
I 0
7: Qk = k1
0 Hk
8: R = Qk R
9: Q = Q Qk
10: end for
Remarks.
1. This method of calculating a QR factorisation has very desirable numerical stability prop-
erties.

2. The algorithm calculates matrices Qk with Qk = Qk for k = 1, . . . , n 1 as well as


R = Qn1 Q1 A and Q = Q1 Qn1 .
3. We will see that Qk Q1 A has zeros below the diagonal in columns 1, . . . , k and thus the
nal result R = Qn1 Q1 A is upper triangular.
4. The only use we make of the matrix Q when solving (SLE) by QR factorisation is to
calculate Q b. Thus for solving (SLE) we can omit the explicit calculation of Q by replacing
line 9 of algorithm QR with the statement b = Qk b. The nal result in the variable b will
then be
Qn1 Q1 b = (Q1 Qn1 ) b = Q b.

Householder Reections
In step 8 of algorithm QR we calculate a product of the form
   
R11 R12 R11 R12
Qk = (5.3)
0 R22 0 Hk R22

where R11 R(k1)(k1) and Hk , R22 R(mk+1)(mk+1) . The purpose of this subsection is
to understand this step of the algorithm.
If Hk as calculated in step 6 of algorithm QR is applied to a vector x Rmk+1 the result is

Hk x = x 2vv x = x 2vhv, xi.

Since the vector vhv, xi is the projection of x onto v , the value x vhv, xi is the projection of
x onto the plane which is orthogonal to v and x 2vhv, xi is the reection of x at that plane.
Reecting twice at the same plane gives back the original vector and thus we nd

Hk Hk = Hk Hk = I.
5.3. THE QR FACTORISATION 67

This shows that the matrices Hk , and then also Qk , k {1, . . . , n 1}.
are orthogonal for every
The vector which denes the reection plane is either v = u kuk2 e1
v = u (kuk2 e1 ), or
depending on the sign of u1 . The corresponding reection maps the vector u to Hk u = kuk2 e1
or Hk u = kuk2 e1 respectively. In either case the image is a multiple of e1 and since u is the
rst column of the matrix block R22 the product Hk R22 has zeros below the diagonal in the
rst column. The rst column of R22 is the k th column of R and thus Qk R has zeros below
the diagonal in columns 1, . . . , k . For k = n 1 we nd that R = Qn1 Q1 A is an upper
triangular matrix as required.

Remarks.
1. The matrices Hk and sometimes also Qk are called Householder reections.
2. The choice of sign in the denition of u helps to increase the stability of the algorithm in
the cases u kuk2 e1 and u kuk2 e1 . In the complex case, sign(x) will be replaced by
a complex number C with || = 1.

Computational Complexity
We considered two variants of algorithm QR, either calculating the full matrix Q as formulated
in line 9 of the algorithm or only calculating Q b by replacing line 9 with the statement b = Qk b.
We handle the dierent cases by rst analysing the operation count for the algorithm with line 9
omitted.

Lemma 5.11. The computational cost C(m, n) for algorithm QR applied to an m n-matrix
without calculating Q or Q b is asymptotically
2
C(m, n) 2mn2 n3 for m, n with m = (n).
3
Proof. We count the number of operations for the individual steps of algorithm QR. From
equation (5.3) we can see that for calculating the product Qk R
in step 8 we only have to
calculate Hk R22 = R22 2vv R22 . Since v k2
v = v/k and v k2 = v v we can calculate this
k as

v
Hk R22 = R22 v R22 .
v v/2
Using this formula we get the following operations count:

construction of v: 2(m k + 1) + 1 operations (counting as 1).

computing v R22 : Since each of the n k + 1 components of the matrix-vector prod-



R22 needs m k + 1 multiplications and
uct v
 m k additions, the computation of v R22
requires (n k + 1) (m k + 1) + (m k) operations.

Calculating v v/2 needs 2(m k + 1) operations and dividing v by the result requires
another (m k + 1) divisions.
Calculating v R22 )
( )( from this needs (m k + 1)(n k + 1) multiplications.

Calculating R22 ( ) requires (m k + 1)(n k + 1) subtractions.

Thus the total operation count is

n
X 
C(m, n) = 5(m k + 1) + 1 + (n k + 1) 4(m k + 1) 1
k=1
m
X
= 5l + 1 + (n m + l)(4l 1)
l=mn+1
2
= 2mn2 n3 + terms with at most two factors m, n
3
2
2mn2 n3
3
68 CHAPTER 5. SYSTEMS OF LINEAR EQUATIONS

for m, n with m = (n).

If we need to calculate the full matrix Q we have to perform an (m k + 1) (m k + 1)


matrix-matrix multiplication in step 9. Assuming that we use the standard matrix multiplication
algorithm this contributes asymptotic cost (m3 ) and so the asymptotic cost of algorithm QR
will be increased by this step. But if we apply algorithm QR only to solve (SLE), we just have
to calculate Q b instead of Q. Algorithmically this is the same as appending the vector b as an
additional column to the matrix A. Thus the computational cost for this algorithm is C(m, n+1)
and since
2 2
2m(n + 1)2 (n + 1)3 2mn2 n3
3 3
for m, n with m = (n) the asymptotic cost does not change. For solving (SLE) we also
have m = n and thus we nd that the asymptotic computational cost of solving (SLE) using
Householder QR factorisation is

2 4
C(n) 2nn2 n3 = n3 for n .
3 3

This analysis shows that solving (SLE) using Householder QR factorisation requires asymp-
totically double the number of steps than algorithm GEPP does. This is the price we have to
pay for the better stability properties of the QR algorithm.

Error Analysis
The backward error analysis for Householder QR is best expressed in terms of the norms of
matrix columns. For any matrix D Rnn we write dj to denote the j th column of D. A
similar convention for upper and lower case letters is used in what follows.
The rst result concerns the form of the triangular factor in the Householder QR factorisation.

Theorem 5.12. Let R denote the computed upper triangular factor in a Householder QR fac-
torisation of A Rnn . Then there exists orthogonal Q Rnn and constant c R+ such
that

A + A = QR

where, for m suciently small,


cn2 m
kaj k2 kaj k2 , j = 1, . . . , n.
1 cn2 m

This result can be used to show the following.

Theorem 5.13. Let x solve Ax = b and let x be the solution computed through Householder
QR. Then
(A + A)
x = b + b,

where, for some constant c R+ and for m suciently small,


cn2 m
kaj k2 kaj k2 ,
1 cn2 m
cn2 m
kbk2 kbk2 .
1 cn2 m

This is a considerable improvement over the worst case analysis for GEPP. However, as we
have seen, the worst case analysis does not give a true picture of the stability of GEPP in
practice; furthermore, Householder QR costs twice as much as GEPP. Consequently GEPP is
favoured in practice.
5.3. THE QR FACTORISATION 69

Bibliography

The book [Hig02] is an excellent source of material concerning backward error analysis for nu-
merical algorithms. In particular, proofs of Theorems 5.5, 5.6, 5.12 and 5.13 may all be found
in this book. The fact that the worst case analysis of GEPP does not seem to arise often in
practice is discussed in [TB97].

Exercises

Exercise 5-1. The purpose of this question is to illustrate some of the content of the chapter
by means of some Matlab experiments. We consider the linear system

Ax = b (5.4)

where A Rnn , b Rn are given and x Rn is to be found. In Matlab we start by creating


a random matrix A and vector b. Type

>> A=randn(5)
>> b=randn(5,1)
These two commands will produce a random 55 matrix and 5-vector where each entry is
independent of all others and is distributed normally with mean 0 and variance 1. Now type

>> x=A\b
This x is the (computer approximation) to the solution of (5.4). To verify this type

>> r=A*x-b; rn=norm(r)


which will produce the norm of the residual r between Ax (with the computed x) and b. How

Matlab
big is rn? What does this tell you?
Now we try and explain how solved (5.4). Type

>> [L,U,P]=lu(A)
and describe what special properties L, U and P have. In particular, what eect does P have
when applied to an arbitrary 55 matrix? Why is P 1 = P T ? If you type

>> P*A
and then

>> L*U
you see that P A = LU (approximately  use norm to measure the residual). Thus, applying P
to (5.4), we see that x solves
LU x = P b.
Thus
x = U 1 L1 P b.
What special features do U and L have which make this a good way to nd x? (Think about
what is involved in solving an arbitrary linear system and compare it to what is involved if the
U L).
Matlab
matrix to be inverted has the form of or
The LU factorisation described above is what uses to solve (5.4) (unless A is sym-
metric with positive diagonal entries, in which case Cholesky factorisation is used). But an
alternative is to use the QR factorisation which we now outline. Type

>> [Q,R]=qr(A)
and then type
70 CHAPTER 5. SYSTEMS OF LINEAR EQUATIONS

>> rn=norm(Q*R-A)
noting that this shows that QR is a (approximate) factorisation of A. R has a property you
have seen on another matrix in this question (what is it?) and is hence easy to invert. What
about Q? Let us extract the columns from Q. Type

>> q1=Q(1:5,1); q2=Q(1:5,2); q3=Q(1:5,3); q4=Q(1:5,4); q5=Q(1:5,5)


What do these vectors q represent? Now, using the Matlab command dot(a,b), which calculates
dot products of vectors, experiment with the properties of inner-products amongst the q vectors.
What do you nd? Why is Q easy to invert?
Using this QR factorisation it is possible to nd x solving (5.4) as x = R1 Q1 b. Although
this method is not often used it has some nice mathematical properties which we will explain in
the lectures.

Exercise 5-2. In analogy to the back substitution algorithm formulate the forward substitution
algorithm to solve Ly = b where L Cnn is a lower triangular, invertible matrix and b Cn
is a vector. Show that your algorithm computes the correct result and that it has asymptotic
computational cost of order (n2 ).

Exercise 5-3. a) Find the LU factorisation of


 
1 2
A= .
3 1
What is the growth factor gn (A) in this case? b) Find a QR factorisation of the same matrix,
using both Gram-Schmidt and Householder. c) Find the condition number of A in the k k ,
k k2 and k k1 norms.

Exercise 5-4. A = ab. Find all eigenvalues and eigenvectors of A. When is A a normal
a) Let
matrix? b) Let H Rnn be a Householder reection. Show that H has a single eigenvalue at
1 and an eigenvalue +1 of multiplicity (n 1). What is the value of det(H)?

Exercise 5-5. Determine the asymptotic computational cost of algorithm QR when calculating
the full matrix Q.

Exercise 5-6. Write a Matlab algorithms to solve

Ux = y
Ly = b
where U Rnn (resp. L Rnn ) is an upper (resp. lower) triangular matrix. Construct
examples U and y where you know the exact solution x (for some large n). Compute x
using
your algorithm and study the error
kx x
k
e= ,
kxk
discussing your results.
Now generate a random upper triangular U R5050 with non-zero entries uniformly dis-
T 50
tributed in [1, 1]. Let x = (1, . . . , 1) R and compute y = U x by use of Matlab's matrix
multiply. Solve U x
= y (with the just computed y ) using the algorithm you designed above and
compute the relative error
kx x
k
e= .
kxk
Do this several hundred times and discuss the resulting statistics of e.

Exercise 5-7. Using the construction from Exercise 2-2 as the basis for an algorithm, and
making maximum use of symmetry, show that Cholesky factorisation of a symmetric positive-
denite matrix can be achieved in approximately half the ops of standard Gaussian Elimination.
5.3. THE QR FACTORISATION 71

Exercise 5-8. Let A be a symmetric positive denite matrix that is also strongly (row) di-
agonally dominant. Let gn (A) be the growth factor from (5.1). Show that when Gaussian
Elmination is applied to A we get gn (A) 1.

Exercise 5-9. Banded matrices arise frequently in discretisations of PDEs and it is thus im-
portant to understand them both from a theoretical and a practical point of view.
We consider here m m tridiagonal matrices. These have the form


d1 c1
a2 d2 c2

.. .. ..
(5.5)

. . .

am1 dm1 cm1
am dm

First of all it will be useful to be able to invert tridiagonal matrices. Consider solving

Ax = b, (5.6)

where A is dened by (5.5) and b is an m 1 vector of the form b = [b1 , b2 , . . . , bm ]T .


Given the coecients ai , di , ci of the tridiagonal matrix A and the right-hand side bi , the
solution xi of (5.6) may be found as follows.

Algorithm (TriDiagonal Solver).


input: the coecients ai , di , ci of the tridiagonal matrix A
output: x Rn
1: for k = 2, . . . , m do
2: if dk1 = 0 then
3: signal failure and stop
4: else
ak
5: rm = dk1
6: end if
7: dk = dk rm ck1
8: bk = bk rm bk1
9: end for
10: if dn = 0 then
11: signal failure and stop
12: else
bm
13: xm = dm
14: end if
15: for k = m 1, . . . , 1 do
bk ck xk+1
16: xk = dk
17: end for
Note that this algorithm does not include any pivoting strategies. When and why is this
satisfactory?

Exercise 5-10. Prove Theorem 5.7 about the backward stability of algorithm GE.
72 CHAPTER 5. SYSTEMS OF LINEAR EQUATIONS
Chapter 6

Iterative Methods
In the previous chapter we considered methods which directly solve (SLE) using (n3 ) opera-
tions. In this chapter we will introduce iterative methods. These methods construct a sequence
(xk )kN with
xk x for k ,
where x solves (SLE). These methods approximate the exact solution x by the computed value
xk for large k and thus, even in the absence of rounding errors, the resulting algorithms only
compute approximate solutions to (SLE). In practical applications this is no problem, since one
can choose k large enough such that the approximation error is of the same order of magnitude
as the deviation caused by rounding errors in the exact methods.
The primary motivation for the use of such iterative methods arises for problems where
A Cnn with n  1, so that solving the problem by the direct methods of Chapter 5, with
3
cost (n ), is prohibitively expensive. If A has some special structure, for example if matrix-
vector multiplication with A is cheap to calculate, then iterative methods can have a substantial
advantage over direct methods.
For the methods considered in this chapter, the sequence (xk ) is constructed iteratively, i.e.
starting with a value x0 the value xk , for each k, is computed from xk1 . The map xk1 7 xk
may be either linear or nonlinear; we discuss methods of both types. The two competing criteria
for the iteration are:

The calculation of xk from xk1 should be cheap, relative to the cost of solving (SLE).
The convergence of xk to x should be fast.
These two criteria are often contradictory to one another and the trade-o between them is very
much dependent upon the matrix A itself. To illustrate this trade o it is useful to have the
following two extreme iterative methods in mind:

xk = A1 b; and

xk = xk1 .
The rst is not cheap to implement  each step involves solving (SLE) by a direct method,
and our assumption is that this is prohibitively expensive  but converges fast: in one step.
The second is cheap to implement but does not converge at all. Successful iterative methods lie
somewhere between these extremes.
In many cases, the cost of computing xk from xk1 is dominated by the cost of matrix-vector
multiplication for a matrix N which is derived from A. The resulting methods will be ecient if
these multiplications can be computed eciently. The most common structure leading to cheap
matrix-vector multiplication is sparsity : where A has only a small number of non-zero entries.
Then matrix-vector multiplication by A, A or any matrix N where N inherits the sparsity of
A will be cheap; the matrix N might, for example, be formed from the o-diagonals of A.
In this chapter we will focus only on errors incurred through using a nite number of iterations
but we will not discuss the eect of rounding errors. For each section describing a computational

73
74 CHAPTER 6. ITERATIVE METHODS

approach there will be three subsections, one dening the method, the second studying error as
a function of the number of iterations k, and the third discussing computational cost. We will
consider the cost of achieving an error of size . We will show that, for some important problems,
it is possible to compute an approximate solution in time (na ) with a<3 as the dimension n
of the problem increases, thus beating the cost of the direct methods from the previous chapter.

6.1 Linear Methods

Iterative methods where xk is computed from xk1 by application of a linear map are called
linear methods. In this section we describe results about linear methods in general. Sections 6.2
and 6.3 below introduce two specic instances of linear methods, namely the Jacobi method and
the successive over relaxation scheme (SOR).

The Method
The basic idea of the methods described in this section is to write the matrix A as A = M +N
where the matrix M is easier to invert than A. Then, given xk1 , we can dene xk by

M xk = b N xk1 . (6.1)

If we assume for the moment that limk xk = x, then the limit x satises

M x = lim M xk = lim (b N xk1 ) = b N x


k k

and thus we get Ax = b. This shows that the only possible limit for this sequence is the solution
of (SLE). Thus, once we know that the generated sequence converges for a given matrix A
and a given initial value x0 , we can iteratively compute xk for a big value of k to get an
approximation for x. In the following sections we will give sucient criteria for convergence of
the sequence (xk )kN .
This leads to the following algorithm.

Algorithm LI (linear iterative methods).


input: A = M + N Cnn , b Cn , x0 Cn
n
output: xk C with Axk b

1: for k = 1, 2, 3, . . . do
2: compute yk = b N xk1
3: solve M xk = yk
4: end for
To actually implement this method one needs to choose M and N such that the method
converges (we discuss dierent choices below). One also needs a stopping criterion to decide
when to quit iterating and to return the resulting approximation xk .
Thus, the remaining problem is to choose a stopping criterion for the iteration. We consider
the error
ek = x xk ,
and the residual vector
rk = b Axk = Aek
where x is the exact solution of (SLE) and xk the approximation at step k. We would like to
choose k big enough such that kek k for a given >0 but, since ek cannot be computed
without knowledge of the exact solution x, this is not a practical stopping criterion. Instead,
we will stop the iteration once krk k r for some r > 0. The residual error rk can easily be
computed during the iteration and from the estimate kek k = kA1 rk k kA1 kkrk k kA1 kr
we can get estimates for the error if needed.
6.1. LINEAR METHODS 75

Error Analysis
Since Mx = b Nx we get the relation

ek = M 1 N ek1 .

The method converges if ek 0 for k . The following lemma characterises convergence


with the help of the spectral radius of the matrix R = M 1 N .
Lemma 6.1. Let R Cnn , e0 Cn and ek = Rk e0 for all k N. Then ek 0 for all e0 Cn
if and only if (R) < 1.
Proof. Assume rst (R) < 1. Then by Lemma 2.10 we nd an induced matrix norm k kS with
kRkS < 1 and we get
kek kS = kRk e0 kS kRkkS ke0 kS 0
for k .
On the other hand, if (R) 1, then there is an e0 Cn \ {0} with Re0 = e0 for some
C with || 1. For this vector e0 we get, in any norm k k,

kek k = kRk e0 k = ||k ke0 k

and thus ek does not converge to 0 as k .

Remarks.
1. The lemma shows that the linear iterative method dened by (6.1) converges for every
initial condition x 0 Cn if and only if (R) < 1 for R = M 1 N . The convergence is
fast if (R) is small.

2. Since k k () for every matrix norm k k, a sucient criterion for convergence of an


iterative method is kRk < 1 for any matrix norm k k. On the other hand, whenever the
method converges for every initial condition, there is an induced matrix norm kk with
kRk < 1 by Lemma 2.10.

Computational Complexity
When measuring computational cost for iterative methods it is of interest to know how many
iterations are required to reduce the error to size . As before we will consider the relative error
kxk xk/kxk = kek k/kxk. By Proposition 3.3 we have

kek k krk k
(A) . (6.2)
kxk kbk
Finding k to achieve
krk k
, (6.3)
kbk
and thus
kek k
(A), (6.4)
kxk
will lead to an estimate of the computational cost of using the iterative method.
Using the relation rk = Aek = ARk e0 we can estimate the left hand side of (6.3) to get,
from (6.2),
kek k kAkkRkk ke0 k
(A)
kxk kbk
for any vector norm k k. If kk is chosen so that kRk < 1 then choosing k to be the smallest
integer greater than
ln 1 + ln kAk + ln ke0 k ln kbk
k] = (6.5)
ln kRk1
76 CHAPTER 6. ITERATIVE METHODS

will ensure (6.3).


In order to study the dependence of the cost on the dimension n of the problem we consider
a family of matricesA = M + N Cnn and vectors b Cn and also a family of vector norms
n ]
on the spaces C . The number k depends on both and n; the n dependence can enter through
kRk , kAk, kbk and ke0 k, but the largest contribution is often through kRk1 , for families of
1

problems in which kRk = 1(n ). We incorporate this into the following assumptions, which
lead to a quantication of computational cost.

Assumption 6.2. 1. Calculation of Nx and M 1 x together costs (n ) for some > 0,


uniformly for all x Cn .

2. kRk = 1 (n ) for some > 0.

3. (A), kAk, kbk and ke0 k are bounded uniformly in n.

Theorem 6.3. Under Assumption 6.2 the computational cost to achieve (6.4) by using the linear
iterative method is bounded above by cn+ ln 1 , for some constant c independent of n and .
Proof. The rst item of the assumption ensures that each step of the iteration costs (n ). The
1 1

second item of the assumption implies that ln kRk = (n ). Hence combining the second
and third items and using (6.5) gives the desired result.

Remarks.

1. For the theorem to be practically useful, the norm in (6.3) must be readily computable.
In some cases kRk < 1 in some computable norm such as k k , k k2 or k k1 . Then the
preceding theorem applies directly.

In other cases we only know that kRkS < 1; as the construction of kkS requires knowledge
of the Jordan Canonical form, this is not a readily computable norm. However, by norm
equivalence, we know that there is a constant c2 (0, ) such that

kak kakS
c2 .
kbk kbkS

The constant c2 may depend on n, but in many important applications this occurs only in
a polynomial fashion.

Iterating until we nd a k so that


krk kS
(6.6)
kbkS c2
will ensure
krk k

kbk
and hence (6.4). Finding k to achieve (6.6) will incur cost

 
kn+ ln 1 + ln c2 (n)

by the methods given in the preceding theorem, if Assumption 6.2 holds in the norm k kS .
Thus there is an increase in cost by only a logarithmic factor in n when c2 (n) is polynomial
in n.

2. In many applications (A), kAk and kbk do depend upon n, but only polynomially. Again
this leads to an increase in computational cost over the above, but only by a factor which
is logarithmic in n.
6.2. THE JACOBI METHOD 77

6.2 The Jacobi Method

In this and the next section we will consider specic methods which are obtained by choosing
the matrices M and N in (6.1). For a matrix A Cnn dene the three n n-matrices

a11
a21 a22

L = a31 a32 , D= a33 ,

.. ..
. .
an1 an2 ... an,n1 ann

a12 a13 ... a1n

a23 ... a2n
.. .
U = . . (6.7)

. .

an1,n

Then we have A = L + D + U. The matrices L and U are, respectively, strictly lower and upper
triangular.

The Method
The iterative method obtained by choosing M = D and N = L+U in (6.1) is called Jacobi
method. This choice of M and N leads to the iteration

xk = D1 b (L + U )xk1


for all k N. Since D is diagonal, the inverse D1 is trivial to compute.

Error Analysis
In order to study convergence of the Jacobi method using Lemma 6.1 we have to consider the
matrix R = M 1 N = D1 (L + U ). The method converges if and only if (R) < 1. The
following theorems give sucient criteria for this to happen.

Theorem 6.4. a) The Jacobi method is convergent for all matrices A with
X
|aii | > |aij | (6.8)
j6=i

for i = 1, . . . , n. (This condition is called the strong row sum criterion.)


b) The Jacobi method is convergent for all matrices A with
X
|ajj | > |aij | (6.9)
i6=j

for j = 1, . . . , n. (This condition is called the strong column sum criterion.)


Proof. a) The matrix R = D1 (L + U ) has entries
(
a
aij , if i 6= j , and
rij = ii

0 else.

Using Theorem 1.28 we nd


1 X
kRk = max |aij |.
i=1,...,n |aii |
j6=i

Thus the strong row sum criterion gives kRk < 1 which implies (R) < 1 and the method
converges.
78 CHAPTER 6. ITERATIVE METHODS

b) If the strong column sum criterion (6.9) holds for A, then the strong row sum crite-
A and
rion (6.8) holds for thus the method converges for A . From Lemma 6.1 we know then
1
(D ) (L + U ) < 1. Since for every matrix R the matrices R, R and also D1 RD have
the same eigenvalues we get

D1 (L + U ) = D1 (L + U )D1 D
 

= ((L + U )D1 )
= (D )1 (L + U ) < 1


and so the method converges for A.

Remark. As well as showing convergence, the proof of the theorem estimates kRk < 1. This
estimate can be used in Theorem 6.3 to analyse computational complexity.

Denition 6.5. A matrix A Cnn is called irreducible if there is no permutation matrix P


such that
A11 A12
 
P T AP =
0 A22
where A11 Cpp and A22 Cqq are square matrices with p, q > 0 and p + q = n, A12 Cpq ,
and 0 is the q p zero-matrix.

There is an alternative description of irreducibility, which is often easier to check than the
denition given. To the matrix A we associate the oriented graph G(A) with vertices 1, . . . , n
and edges i j i, j {1, . . . , n} with aij 6= 0. Then the matrix A is irreducible if
for all
and only if the graph G(A) is connected, i.e. if you can reach any vertex j from any vertex i by
following edges.

Example. Consider the matrix



1 1
A= 1 1 .
1
The associated graph is

1 2 3

The matrix A is not irreducible (indeed P = I in the denition is enough to see this) and since
there is no path from 3 to 1, the graph G(A) is not connected.

Example. In continuation of the previous example consider the modied matrix


1 1
A = 1 2 1 .
1 1

The associated graph is

1 2 3

Now the graph G(A) is connected and the matrix is thus irreducible.
6.2. THE JACOBI METHOD 79

Remarks.
The proof of equivalence of the two characterisations of irreducibility is based on the following
observations.

1. For any permutation P the graphs G(A) and G(P T AP ) are isomorphic, only the vertices
are numbered in a dierent way.

2. The block A22 in the denition of irreducibility corresponds to a set of states from where
there is no path into the states corresponding to the block A11 .
The characterisation of irreducibility through G(A) shows that the diagonal entries of A are
irrelevant to determining irreducibility. Any matrix which has non-zero o-diagonal entries if
and only if A does so, will have the same graph. Hence R = D1 (L + U ) is irreducible if and
only if A is.

Theorem 6.6. If A is irreducible and satises


X
|aii | |aij | (6.10)
j6=i

for i = 1, . . . , n but X
|akk | > |akj | (6.11)
j6=k
for one index k, then the Jacobi method converges. (This condition is called the weak row sum
criterion.)
Proof. We have to show that (R) < 1 where R = D1 (L + U ). Dene the matrix |R| =
(|rij |)ij Rnn and the vector e = (1, 1, . . . , 1)T Rn . Then we have
Xn  X |a | 
ij
(|R|e)i = |rij | 1 = 1 = ei .
j=1
i |a ii |
j6=i

Thus |R|e e where this and some of the following inequalities between vectors are to be read
componentwise. Therefore we get

|R|l+1 e |R|l e e
for all l N.
Let t(l) = e |R|l e 0. Then the vectors t(l) l be the
are componentwise increasing. Let
(l)
number of non-vanishing components of t l is strictly increasing until it
. We will show that
reaches the value n. Assume, for contradiction, that l+1 = l = k < n. Since one row of A
satises the strict inequality (6.11) we have |R|e 6= e and thus k > 0. Then without loss of
generality (since we can reorder the rows and columns of A) we have
   
a b
t(l) = , t(l+1) =
0 0
where a, b Rk and a, b > 0. We can conclude
 
b
= t(l+1) = e |R|l+1 e
0
|R|e |R|l+1 e = |R|t(l)
  
|R11 | |R12 | a
= .
|R21 | |R22 | 0
From a > 0 we have |R21 | = 0. Hence R is not irreducible. This implies that A is not irreducible,
and we have found the required contradiction. Thus we can conclude that l+1 > l whenever
l < n. This proves t(n) > 0 componentwise.
n
Hence e > |R| e and we get, using Lemma 1.34,

(R)n (Rn ) kRn k = |Rn | |R|n = max |R|n e



i
<1
i=1,...,n

and thus (R) < 1. This completes the proof.


80 CHAPTER 6. ITERATIVE METHODS

Computational Complexity
The sequence (xk )kN from the Jacobi method is dened by the relation

xk = D1 b (L + U )xk1 .


For each iteration we have to calculate

1) the product (L + U )xk1


2) the dierence b (needs n subtractions)
1
3) the product D (needs n divisions because D is diagonal)
2
For general matrices the rst step needs (n ) operations and thus dominates the computational
cost. If the matrix A is sparse, i.e. if it contains many zeros, the matrix-vector multiplication
can be performed using fewer operations.

Example. Assume that each row of A contains at most ` > 0 non-zero entries outside the
diagonal. Then step 1) requires (n) operations and performing one step of Jacobi method has
cost (n) as n . In this case we get =1 in Theorem 6.3.

6.3 The Gauss-Seidel and SOR Methods

The Method
The method obtained by using M = L+D and N = U +(1)D in (6.1), for the matrices L, D
and U from (6.7) and some R, is called the Successive Over Relaxation (SOR) scheme. The
parameter can be chosen to accelerate convergence. The special case = 1, i.e. M = L + D
and N = U , is called the Gauss-Seidel method. For the SOR scheme we get the iteration

(L + D)xk = b U xk1 (1 )Dxk1 .

Since (L + D) is lower triangular we can use forward substitution to calculate xk in each step.

Error Analysis
The error analysis of the SOR and Gauss-Seidel method is similar to the analysis for the Jacobi
method from the previous section. The convergence properties of the SOR scheme are determined
by the matrix

R = (L + D)1 U + (1 )D .


The following summarises some sucient criteria for convergence of the Gauss-Seidel method.

Theorem 6.7. Assume either


a) A satises the strong row sum criterion or
b) A is irreducible and satises the weak row sum criterion.
Then the Gauss-Seidel ( = 1) method converges.

Computational Complexity
The speed of convergence and thus the cost of the method depends on the choice of the param-
eter . For linear systems arising from discretisation of certain elliptic PDEs, optimising can
lead to an order of magnitude improvement in eciency, when compared with Gauss-Seidel and
Jacobi methods. This arises though decreasing in Theorem 6.3 for appropriate choice of . In
the same context Gauss-Seidel improves over Jacobi, but not by an order of magnitude.
6.4. NONLINEAR METHODS 81

6.4 Nonlinear Methods

In the remaining part of this chapter we will consider nonlinear methods for solving (SLE),
i.e. methods where the map from xk to xk+1 is no longer linear. We restrict ourselves to the
case where the matrix A is Hermitian and positive denite. This section explains the general
approach. The remaining two sections of the chapter examines two specic instances of nonlinear
methods in detail.

The fundamental observation for the methods we consider here is that the vector equation
Ax = b kAx bk = 0 for any vector norm k k. Since we assume that A is
is equivalent to
Hermitian and positive denite, hx, yiA = hx, Ayi denes an inner product and kxk2A = hx, Axi
1
denes the associated vector norm. The matrix A is again Hermitian and positive denite
2 1
and kxkA1 = hx, A xi denes the associated vector norm.
Lemma 6.8. Let A Cnn be Hermitian and positive denite and b Cn . Then x Cn solves
Ax = b if and only if x is the minimiser of the function
1
g(y) = kAy bk2A1 y Cn . (6.12)
2
In this section we will consider iterative methods which solve (SLE) by constructing sequences
(xk )kN which converge to the unique minimum of the function g . Using the residual rk = bAxk
and the error ek = x xk , where x is the exact solution of Ax = b, we can write

1 1
g(xk ) = krk k2A1 = kek k2A .
2 2
Thus minimising g corresponds to minimising the length of the residual vector in the k kA1 -
norm or, equivalently, to minimising the error for the exact solution in the k kA -norm.

The Method
The two methods that we study below both take the form

xk = xk1 + k1 dk1 , (6.13)

where dk1 Cn \ {0} is called the search direction and the scalar k1 C is called the step
length. The step length is chosen so that, given xk1 and dk1 ,
k1 = argmin g(xk1 + dk1 ).
C

Since g is a convex, quadratic function, the value k1 is uniquely determined.


The minimum can be found using the fact the g satises
2
g(y + z) = g(y) + Rehz, Ay bi + kzk2A (6.14)
2
for all R and y, z Cn , where Re denotes the real part of a complex number. Taking the
derivative w.r.t. (along the real line) gives


g(y + z) =0 = Rehz, Ay bi. (6.15)

At the minimum we nd

k1 )

0= g(xk1 + k1 dk1 + d =0

= Re hdk1 , A(xk1 + k1 dk1 ) bi
= Re hdk1 , rk1 + k1 Adk1 i
for all C and thus
hdk1 , rk1 i
k1 = .
kdk1 k2A
Hence we get the following algorithm.
82 CHAPTER 6. ITERATIVE METHODS

Algorithm NI (nonlinear iterative methods).


input: A Cnn Hermitian and positive denite, b Cn , x 0 Cn
n
output: xk C with Axk b

1: for k = 1, 2, 3, . . . do
2: compute dk1
hdk1 , rk1 i
3: k1 =
kdk1 k2A
4: xk = xk1 + k1 dk1
5: end for
The steepest descent method and the conjugate gradient method, discussed below, are in-
stances of this algorithm, employing dierent choices for the search directions dk .

Error Analysis
The error analysis for nonlinear methods is done in two steps. We rst study how fast the
value kek k2A = 2g(xk ) decays. This analysis depends on the specic choice of search directions dk
and is presented for the two methods below, separately. In a second step we can use this result
to estimate the error of the approximate solution xk in the Euclidean norm:

Lemma 6.9. Assume kek kA cqk ke0 kA for all k N for some constants q, c > 0. Then
p
kek k (A)c q k ke0 k k N

where (A) = kAk2 kA1 k2 is the condition number of A in the 2-norm.


Proof. Let min and max be the minimal and maximal eigenvalue of A. Then we have (A) =
max /min by (3.2) and from Lemma 1.21 we nd

1 2 max k
kek k2 kek k2A cq k ke0 k2A cq ke0 k2 .
min min min
This completes the proof.

For future reference we state the following relation, which describes how the residual error
changes in each step:
rk = rk1 k1 Adk1 k N. (6.16)

This is a direct consequence of (6.13).

6.5 The Steepest Descent Method

The Method
The steepest descent method chooses a search direction which makes a step in the direction of
the (locally) steepest descent for g. From equation (6.14) we get

2
g(xk + dk ) = g(xk ) Rehdk , rk i + kdk k2A (6.17)
2
for all R. Thus, the direction of steepest descent corresponds to the direction dk which
maximises Rehdk , rk i while keeping the length of dk xed. Since dk is only determined up to a
scalar factor, we can choose
dk = rk k N0
in (6.13). With this choice, the step length becomes

krk k2
k =
krk k2A
and we get the following algorithm.
6.5. THE STEEPEST DESCENT METHOD 83

Algorithm SD (steepest descent).


input: A Cnn Hermitian, positive denite, b Cn , x 0 Cn , r > 0
n
output: xk C with Axk b

1: for k = 1, 2, . . . do
2: rk1 = b Axk1
3: if krk1 k r then
4: return xk1
5: end if
6: k1 = krk1 k2 /krk1 k2A
7: xk = xk1 + k1 rk1
8: end for

Error Analysis
The following result establishes the convergence of the method.

Theorem 6.10. Let A be Hermitian and positive denite. Then the rate of convergence of the
steepest descent algorithm for solving (SLE) is given by
p k
kxk xkA 1 1/(A) kx0 xkA k N,
where (A) is the condition number of A in the 2-norm.
Proof. By substituting = k1 into equation (6.17) we get

1 hdk1 , rk1 i2  hdk1 , rk1 i2 


g(xk ) = g(xk1 ) = 1 g(xk1 )
2 kdk1 k2A kdk1 k2A krk1 k2A1
for all k N. Together with the estimates from Lemma 1.21 we get

 krk1 k4   1 
g(xk ) 1 1 g(x k1 ) = 1 g(xk1 )
max krk1 k2 min krk1 k2 (A)

for all kN and thus


 1 k
g(xk ) 1 g(x0 ).
(A)
The result follows then with kek k2A = 2g(xk ).

Remarks.
1. If the matrix A is ill-conditioned then the method converges slowly. Ill-conditioning occurs
in the 2-norm when the eigenvalues of A are very dierent: max  min . In this case the
steepest descent direction is almost orthogonal to the direction in which the solution lies.

2. Since the steepest descent method uses dk = rk , the search directions satisfy rk = rk1
k1 Ark1 by (6.16). Using induction we get

xk x0 + span{r0 , Ar0 , A2 r0 , . . . , Ak1 r0 } (6.18)

for all k N.

Computational Complexity
We can achieve
kxk xkA (6.19)

by choosing k to be the smallest integer greater than

ln 2 + ln kx0 xk2A
k# =   . (6.20)
ln 1 (A)1
84 CHAPTER 6. ITERATIVE METHODS

Assumption 6.11. 1. Calculation of Ax costs O(n ) uniformly in x Cn .


2. The condition number of A in the 2-norm satises (A) = O(n ).
3. kx x0 kA is bounded uniformly in n.
Theorem 6.12. Under Assumption 6.11 the computational cost of using the steepest descent
method to achieve (6.19) is bounded above by cnmax{,1}+ ln 1 , for some constant c indepen-
dent of n and .
Proof. The rst item of the assumption ensures that each step of the iteration costs (nmax{,1} ).
Hence, combining the second and third items and using (6.20) gives the desired result.

6.6 The Conjugate Gradient Method

Theorem 6.10 shows that the steepest descent method can behave quite poorly on ill-conditioned
problems. The conjugate gradient method, and its pre-conditioned variants, go a long way to
ameliorating this diculty, at little extra cost when compared to the method of steepest descents.
Again, we assume that A is Hermitian and positive denite. The conjugate gradient method
is similar to the steepest descent method but instead of making steps in direction of the steepest
descent it restricts the increments xk xk1 to lie in the Krylov subspace
Kk (A, r0 ) = span{r0 , Ar0 , A2 r0 , . . . , Ak1 r0 }.

Since the spaces Kk (A, r0 ) are increasing, this leads to xk x0 + Kk (A, r0 ) for all k and, as we
will see, the algorithm even chooses xk to minimise the function g from (6.12) on this space.
An underlying, general approach to the design of iterative methods for systems of linear
equations (and also for the eigenvalue problem) is as follows: seek approximate solutions which
at the k th step of the iteration use only increments in Kk (A, r0 ). Such methods are natural when
matrix-vector multiplication by A is cheap to compute. Equation (6.18) shows that the steepest
descent method follows this principle. The conjugate gradient method can be implemented al-
most as easily as the method of steepest descents, but since it nds the optimal approximation to
x from Kk (A, r0 ) we expect it to perform better. Indeed, the CG method converges considerably
faster on ill-conditioned problems. Many other iterative methods also use the idea of nding
approximations from the Krylov subspace.

The Method
Denition 6.13. Given A Hermitian positive-denite we say that the set {d0 , , dk } is A-
orthogonal (or conjugate) if
hdi , dj iA = 0 i 6= j.
The following Lemma shows that choosing the search directions dk to be A-orthogonal is
advantageous for non-linear iterative methods.

Lemma 6.14. Assume that the search directions {d0 , . . . , dk1 } form an A-orthogonal set. Then
xk , given by (6.13), minimises g over x0 + span{d0 , , dk1 }.
Proof. Dene G : Cn R by

k1
X
G() = g(x0 + l dl ) Cn .
l=0

Then G is a convex quadratic function and has a unique minimum where all directional deriva-
Pk1
tives vanish. Using (6.15) with y = x0 + l dl and m
z = d we nd
l=0

k1
X
Re hdm , Ax0 + l Adl bi = 0
l=0
6.6. THE CONJUGATE GRADIENT METHOD 85

for all C and thus


hdm , r0 i
m =
kdm k2A
for m = 0, . . . , k 1.
Using (6.16) and the conjugacy of the dl we get

hdm , rk i = hdm , rk1 i k1 hdM , Adk1 i = hdm , rk1 i

for k = 1, . . . , m and thus hdm , r0 i = hdm , rm i. This shows that the minimum is, indeed, attained
for = .
The steepest descent method also chooses xk x0 + span{d0 , , dk1 }, but minimises g
only along a line instead of over all of the ane sub-space. Thus, any method with A-orthogonal
search directions will be at least as good as the steepest descent method.
In general it is expensive to construct orthogonal sets  something like a Gram-Schmidt
orthogonalisation is required. Thus to make a viable method it remains to try and construct
an iteration which iteratively generates an A-orthogonal set in a cheap fashion. The idea of the
conjugate gradient method is to construct the A-orthogonal set by using the iteration

dk = rk + k dk1

where the factor k is chosen to enforce hdk1 , dk iA = 0, giving

hdk1 , rk iA
k = .
kdk1 k2A

This clearly ensures that dk is orthogonal to dk1 ; we will see that it is possible to choose d0
such that the entire set {d0 , . . . , dk } is an A-orthogonal set.
A preliminary form of the resulting algorithm is given by the following procedure: choose
x0 , d0 Cn and let r0 = b Ax0 . Then, for k = 1, 2, . . . dene, in the given order,

hdk1 , rk1 i
k1 = , xk = xk1 + k1 dk1 ,
kdk1 k2A
hdk1 , rk iA
rk = rk1 k1 Adk1 , k = , dk = rk + k dk1 . (6.21)
kdk1 k2A

Here we have used (6.16) to compute rk inductively. This allows to compute all required quan-
tities with only one matrix-vector multiplication per step (to calculate Adk1 ).
Lemma 6.15. Let x1 , . . . , xk be given by the conjugate gradient method (6.21) with d0 = r0 .
Assume r0 , . . . , rk 6= 0 and d0 , . . . , dk1 6= 0. Then dk 6= 0 and hdk1 , rk i = hrk1 , rk i = 0.
Furthermore 2 2
krk1 k krk k
k1 = > 0, k = > 0.
kdk1 k2A krk1 k2
Proof. From the denition of rk and the denition of k1 we get

hdk1 , rk i = hdk1 , rk1 i k1 kdk1 k2A = hdk1 , rk1 i hdk1 , rk1 i = 0

and thus, by Pythagoras' theorem,

kdk k2 = krk k2 + |k |2 kdk1 k2 krk k2 > 0

and dk 6= 0. k = 1 we have hrk1 , dk1 iA = hd0 , d0 iA = kdk1 k2A . For k > 1 we can solve the
For
2
equation dening dk1 for rk1 to get hrk1 , dk1 iA = hdk1 k1 dk2 , dk1 iA = kdk1 kA .
Hence
hrk1 , rk i = hrk1 , rk1 i k1 hrk1 , Adk1 i = 0
which is the second orthogonality relation from the claim.
86 CHAPTER 6. ITERATIVE METHODS

For k=1 we have hdk1 , rk1 i = hr0 , r0 i = krk1 k2 . For k > 1 we can use the the denition
of dk1 to get hdk1 , rk1 i = hrk1 + k1 dk2 , rk2 i = krk1 k2 . Thus we get

hdk1 , rk1 i krk1 k2


k1 = = > 0.
kdk1 k2A kdk1 k2A

Finally, since k1 > 0, we can solve the denition of rk for Adk1 to get

hAdk1 , rk i hrk rk1 , rk i krk k2


k = 2 = 2 = > 0.
kdk1 kA k1 kdk1 kA krk1 k2

This completes the proof.

A consequence of Lemma 6.15 is that, assuming we have not already found the exact solution,
we will have dk 6= 0 and thus all the expressions in (6.21) make sense. We can now prove that
the conjugate gradient method does indeed produce an A-orthogonal set of search directions.

Lemma 6.16. Let x1 , . . . , xk be given by the conjugate gradient method (6.21) with d0 = r0 .
Then {d0 , , dk1 } Kk (A, r0 ) is an A-orthogonal set.
Proof. We rst prove by induction that

span{r0 , r1 , . . . , rl } = span{d0 , d1 , . . . , dl } = span{r0 , Ar0 , . . . , Al r0 } (6.22)

for l = 0, . . . , k 1. d0 = r0 , the statement holds for l = 0. Now let l > 0 and assume that
Since
the statement holds for l 1. Then, since l1 6= 0, we can solve the denition of rl for Adl1
to get Adl1 span{rl1 , rl } and thus

Al r0 = A(Al1 r0 ) span{Ad0 , . . . , Adl1 } span{r0 , . . . , rl }.

This shows span{r0 , Ar0 , . . . , Al r0 } span{r0 , r1 , . . . , rl }. From the denition of dk we get

rl = dl l dl1 span{dl1 , dl } span{d0 , d1 , . . . , dl }

and thus span{r0 , r1 , . . . , rl } span{d0 , d1 , . . . , dl }. Finally we have

dl = rl + l dl1 = l1 Adl1 + rl1 + l dl1 span{r0 , . . . , Al r0 }

and thus span{d0 , d1 , . . . , dl } span{r0 , Ar0 , . . . , Al r0 }. This completes the proof of (6.22).
Assume for induction that {d0 , . . . , dk1 } is an A-orthogonal set. This is true for k = 1. Now
let k > 1. From the denition of rj+1 we get

hdi , rj+1 i = hdi , rj i m hdi , dj iA = hdi , rj i

for j = i + 1, . . . , k 1 and thus, by induction, hdi , rj i = hdi , ri+1 i = 0 for all 0 i < j k where
the last identity comes from Lemma 6.15. Since Adi span{ri , ri+1 } span{d0 , . . . , di+1 }, we
get

hdi , rk iA = hAdi , rk i = 0
for i = 0, . . . , k 2. Using this result we can prove that dk is A-orthogonal to d0 , . . . , dk1 : for
i < k 1 we get
hdi , dk iA = hdi , rk iA + k hdi , dk1 iA = hdi , rk iA = 0
and hdk1 , dk i = 0 by the construction of k . This completes the proof.

The CG-algorithm is normally implemented in the following form which exploits the ex-
pressions for k , k as derived in Lemma 6.15. This allows the required number of arithmetic
operations to be kept small.
6.6. THE CONJUGATE GRADIENT METHOD 87

Algorithm CG (conjugate gradient method).


input: A Cnn Hermitian, positive denite, b Cn , x 0 Cn , r > 0
n 1
output: xk C with xk A b
1: r0 = b Ax0 , d0 = r0
2: for k = 1, 2, 3, . . . do
krk1 k2
3: k1 = >0
kdk1 k2A
4: xk = xk1 + k1 dk1
5: rk = rk1 k1 Adk1
6: if krk k r then
7: return xk
8: end if
krk k2
9: k = >0
krk1 k2
10: dk = rk + k dk1
11: end for

Error Analysis
We have already seen, in Lemma 6.14, that any method with A-orthogonal search directions will
have smaller errors than the steepest descent method. The following theorem shows a surprising
result: these methods, even though iterative by construction, obtain the exact solution of (SLE)
after at most n iterations.

Theorem 6.17. If {d0 , . . . , dn1 } form an A-orthogonal set in Cn then, for any starting vec-
tor x0 , the sequence (xk ) given by (6.13) reaches the exact solution x of (SLE) in at most n
steps.
Proof. The set {d0 , dn1 } forms an A-orthogonal basis for Cn and so we may write

n1
X hdk , x x0 iA
x x0 = k dk , k = .
kdk k2A
k=0

Also, by construction of the xk ,


k1
X
xk x0 = i di .
i=0

Conjugacy gives
hdk , xk x0 iA = 0
and thus

hdk , x x0 iA = hdk , x x0 iA + hdk , x0 xk iA = hdk , x xk iA = hdk , rk i

for k = 0, . . . , n 1. Comparing the denitions of k and k we nd k = k which implies

n1
X
x x0 = k dk = xn x0
k=0

so that x = xn as required.

Corollary 6.18. The conjugate gradient algorithm, for any starting vector x0 , reaches the exact
solution of (SLE) in at most n steps.
Proof. Assume that the algorithm has not converged in n 1 steps; otherwise the proof is
complete. Then the set {d0 , , dn1 } is A-orthogonal by Lemma 6.16. Hence the result follows
from Theorem 6.17.
88 CHAPTER 6. ITERATIVE METHODS

Remark. When the computation is performed on a computer, rounding errors will cause rn 6= 0
where rn is the calculated value for the residual error rn . In practice one just treats the method
as an iterative method and continues the iteration until the residual error k
rk k is small enough.
Typically the exit criterion will even be reached for k < n.

In order to understand how the conjugate gradient method performs iteratively, we describe
an error analysis based around the theory of polynomial approximation. In the following we let

(A) := { C | is an eigenvalue of A}
and we let Pk denote the set of all polynomials of degree k with p(0) = 1.
Theorem 6.19. If the conjugate gradient algorithm has not converged at step k, then
kxk xkA = inf kp(A)(x0 x)kA
pPk
 
inf max |p()| kx0 xkA .
pPk (A)

Proof. Since xk x0 + span{r0 , Ar0 , . . . , Ak1 r0 } we can nd j C with

k1
X k1
X
ek = x xk = x x0 + j Aj r0 = e0 + j Aj+1 e0 .
j=0 j=0

Dening p Pk by
k
X
p() = 1 + j1 j
j=1

we obtain ek = p(A)e0 and thus

kxk xkA = kp(A)(x0 x)kA .


The value xk minimises kxk xkA over xk Kk (A, b), which is equivalent to minimising over
all choices of j . Hence
kxk xkA = inf kp(A)(x0 x)kA .
pPk

To complete the proof let (j ) be a basis of


Pn Cn composed of eigenvectors of A with corre-
sponding eigenvalues j . If we let e0 = j=1 aj j , then
n
X
ek = p(A)e0 = aj p(j )j
j=1

and thus we get


n
X n
X
ke0 k2A = j |aj |2 , kek k2A = j |aj p(j )|2 .
j=1 j=1

Therefore
n
X
kek k2A max |p()|2 |aj |2 j = max |p()|2 ke0 k2A
(A) (A)
j=1

and the result follows.

By choosing an appropriately scaled and shifted Chebyshev polynomial and applying the
previous result, the following may be shown:

Theorem 6.20. Let be the condition number of a symmetric positive denite matrix A in the
Euclidean norm. Then the error in step k of the conjugate gradient algorithm for solving (SLE)
satises "  k  k
# 1
+1 +1
kxk xkA 2 + kx0 xkA .
1 1
Proof. See Exercise 6-6.
6.6. THE CONJUGATE GRADIENT METHOD 89

Computational Complexity
We make the same assumptions as for the steepest descent method, and show the decrease in
computational cost that can be obtained by using the conjugate gradient method.

Theorem 6.21. Under Assumption 6.11 the computational cost of using the conjugate gra-
dient method to achieve (6.19) is bounded above by cnmax{,1}+ ln 1 , for some constant c
1
2

independent of n and .
Proof.
The rst item of the assumption ensures that each step of the iteration costs (nmax{,1} ).
If 1 then the Theorem 6.20 gives

 k
2
kxk xkA . 2 1 kx0 xkA .

Combining the second and third items and using this expression gives the desired result.

Remark. In an applied context the most important form of the conjugate gradient method is
its preconditioned form. Roughly speaking the method is applied to the linear system

M Ax = M b

where the preconditioner M is chosen to try and improve the conditioning of MA over A, yet
have M easy to invert. In practical terms the algorithm diers from standard conjugate gradient
only by the addition of one multiplication with M per step.

Bibliography

Linear iterative methods for (SLE) are described in the book [Saa97]. Theorem 6.7 is proved in
[SB02]. For discussion of Theorem 6.20 see [TB97]; for related results see [NW06]. The SOR
method is analysed in detail in [LT88].
The nonlinear iterative methods that we discuss are constructed through minimisation or
optimisation techniques. As such they are strongly related to the topic of optimisation in general.
This topic is well overviewed in [NW06]; the exposition of the conjugate gradient method there
forms the basis for ours. A good discussion of preconditioned versions of the conjugate gradient
method may also be found in this book.

Exercises

Exercise 6-1. The purpose of this exercise is to introduce you to the idea of iteration. Many
problems are simply too large to solve by the methods we have looked at so far  too much
computer time or memory is required if the dimension n of the matrix is large. For linear systems

Ax = b, A Rnn , b Rn (6.23)

this will lead to the idea of generating approximating sequences {xk } which hopefully converge
to x as k . For eigenvalue problems which are large we will study methods which generate
approximations to a subset of the eigenvalues, and not to all of them. (Note that for eigenvalue
problems it is necessary to use iterative methods in all cases, provided the matrix has dimension
bigger than 4; the main dierence when the problem is large is that we solve for only a subset
of the eigenvalues).
(i) To show why iterative methods are necessary construct a matrix A with n = 1000 as follows:
type

>> A=0.0005*rand(1000);
>> b=rand(1000,1);
>> for i=1:1000; A(i,i)=1; end;
90 CHAPTER 6. ITERATIVE METHODS

This creates a matrix A with 1 on the diagonal and random entries uniformly distributed in
[0, 5 104 ] on the o-diagonals. Now try solving (6.23). Type

>> x=A\b;
Almost certainly your computer cannot handle a problem of this size. So type ctrl c to stop it
from trying. (If you are lucky enough to be working on a machine where this problem is solved
1000 by 10000 or by some matrix suciently
Matlab
easily then repeat the exercise replacing large that
fails.) Keep the (original 1000 1000) matrix A for use in (iii) where we will use a
more successful approach to solve this problem.
(ii) We look at the simplest iterative method, known as Jacobi. Any matrix A can be written
uniquely as
A=D+L+U
where D is diagonal, L is zero on and above the diagonal and U is zero on and below the diagonal.
Assuming that D is invertible, consider a sequence {xk }
k=1 satisfying

xk+1 = D1 [b Lxk U xk ].

If xk x as k then what equation does x satisfy and why?


(iii) Implement the idea in (ii) as follows. Type

>> L=tril(A,-1);
>> U=triu(A,1);
>> 0=L+U;
This creates L and U as dened in (ii) (a Matlab question: why is this so?). Type

>> j=1;
>> y=ones(1000,1);
>> n(j)=2*j;
>> for i=1:n(j); y=b-O*y; end
>> rn(j)=norm(A*y-b)
What does this tell you? Repeat the above 5 lines of code with j = 2, 3, . . . , 5 and type:

>> plot(n,rn)
What do you observe? Also type

>> plot(n,log(rn))
(iv) Under what conditions on A will the Jacobi iteration converge to the correct solution of
(6.23)?

Exercise 6-2. Prove that the Jacobi iteration that you implemented in Exercise 6-1 converges
for the matrix A given there.

Exercise 6-3. Which of the following matrices are irreducible? For which of these matrices
does the Jacobi-method converge?

4 2 1 1 3 8
2 1 , 3 25 4 , 4 2
1 4 1 2 1 1

Exercise 6-4. Show that the Jacobi method for the solution of

1 1 x1 1
1 10 2 x2 = 2
2 1 x3 3
converges and, for the iteration starting with x0 = 0, give an upper bound on the number of
steps required to get the relative error of the result below 106 .
6.6. THE CONJUGATE GRADIENT METHOD 91

Exercise 6-5. Consider the linear system Ax = b where


 
1 0
A= ,
0

where b = (1, )T . Calculate x1 from the steepest descents algorithm, starting from x0 = 0.
Which component of the solution is better approximated after one step? Why is this?

Exercise 6-6. Prove Theorem 6.20 by choosing a Chebyshev polynomial scaled and shifted so
that its min/max properties hold on the interval [min (A), max (A)]. Then prove that

kxk xkA 2 k kx0 xkA

where (0, 1) and


r
min
12
max
for  1.

Exercise 6-7. Let A Rnn be a symmetric positive-denite matrix. If A = QM QT is the


eigenvalue/eigenvector decomposition of A, then we dene a square root of A (one of many
possible square roots) to be A1/2 = QDQT , where D is a diagonal matrix whose elements are
the square roots of the corresponding elements of M.
1. If B Rnn is a matrix such that B and A1/2 commute, prove that

kBxkA kBk2 kxkA , x Rn

2. Using your proof above, prove that

inf kp(A)ykA inf max |p()|kykA


pPk pPk (A)

where Pk is the set of all polynomials p of degree less than or equal to k with p(0) = 1,
y Rn , and (A) is the set of eigenvalues of A.

Exercise 6-8. Dene


1
(x) := hx, Axi hx, bi
2
with A symmetric positive denite. Show that minimising (x) is equivalent to solving Ax = b.
Consider the algorithm
xk+1 = xk + k rk
to minimise (x). Find k which minimises (xk+1 ) over all k R given xk . This choice of
k denes the gradient or steepest descent algorithm.
For CG we have
xk+1 = xk + k dk .
Show that the choice of k made in Algorithm CG minimises (xk+1 ) over all R, given xk
and dk .
92 CHAPTER 6. ITERATIVE METHODS
Chapter 7

Least Squares Problems


In this chapter we introduce algorithms for the (LSQ) problem: given A Cmn and b Cm
n
with m n, nd x C which minimises kAx bk2 . We will describe and discuss three dierent
algorithms to solve (LSQ). The following theorem underpins all of the algorithms described.

Theorem 7.1. A vector x Cn minimises kAx bk2 for A Cmn and b Cm , if and only
if Ax b is orthogonal to range(A).
Proof. Let g(x) = 21 kAx bk22 . Then minimising kAx bk2 is equivalent to minimising the
function g.
Assume that Ax b is orthogonal to range(A) and let y Cn . Then

Ay Ax = A(y x) Ax b
and Pythagoras' theorem gives

kAy bk22 = kAy Axk22 + kAx bk22 kAx bk22


for all y Cn . Thus x minimises g.
Now assume that the vector x minimises g. y Cn we have
Then for every

1 
0= g(x + y) = hAy, Ax bi + hAx b, Ayi = RehAx b, Ayi
2
and
1 
0= g(x + iy) = ihAy, Ax bi + ihAx b, Ayi = ImhAx b, Ayi.
2
This shows hAx b, Ayi = 0 and thus Ax b Ay for all y Cn .
Corollary 7.2. A vector x Cn solves (LSQ) if and only if
A Ax = A b. (7.1)

Proof. From the theorem we know that x solves (LSQ) if and only if Ax b range(A). This
in turn is equivalent to Ax b ai for every column ai of A, i.e. to A (Ax b) = 0.
It is hence of interest to know whether A A is invertible. We assume throughout the remain-
der of this chapter that the singular values of A satisfy

1 2 n > 0
and in this case A A is indeed invertible. Under this assumption, (LSQ) is solved by

x = A b
where the pseudo-inverse A = (A A)1 A is dened in Denition 3.6. The system (7.1) is
called the normal equations for (LSQ).
Recall the denition of the 2-norm condition number for (LSQ) from Chapter 3. Note that
this is large when A is close to being rank-decient, so that n is small. In many practical
applications of (LSQ), such as arise naturally in statistics, A is indeed close to being rank-
decient. We will comment on this issue when we discuss the stability of the algorithms.

93
94 CHAPTER 7. LEAST SQUARES PROBLEMS

7.1 LSQ via Normal Equations

The Method
A natural approach to solution of the normal equations is as follows:

Algorithm (LSQ via normal equations).


input: A Cmn with m n and rank(A) = n, b Cm
n
output: x C with minimal kAx bk2

1: calculate A A and A b
2: solve (A A)x = A b

Computational Complexity
The method is usually implemented via standard algorithms for the matrix-matrix multiplica-
tion and matrix-vector multiplication, together with Cholesky factorisation for solution of the
symmetric linear system.

Theorem 7.3 (Cost of Normal Equations for LSQ). The cost of solving (LSQ) by the Normal
Equations approach is, in the usual implementation described above,
1
C(m, n) mn2 + n3
3
as m, n .
Proof. Calculating A A and A b requires asymptotically 2mn2 operations for m, n .

Since A A is symmetric we only need to calculate half of the entries, which can be done in
mn2 operations. Calculation of A b costs (mn). Solving (A A)x = A b with Cholesky-
1 3
factorisation requires 3 n operations. Combining these terms gives the result.

Of particular interest is the case m an as n for some constant a  1, which arises


frequently in practice. If a1 the cost is dominated by mn2 .

Error Analysis
One way that the error analysis proceeds is by proving that the computed x, x
, solves

(A A + A)
x = A b + c
and then using the backward error analysis for (SLE). Because (A A) = (A)2 in the 2-norm
it follows that
kxk2
= O (A)2 m .

kxk2
In practice this method can perform very badly when the matrix A is close to being rank-decient
(so that n is small by Lemma 3.8) and the 2-norm condition number of A is large.

7.2 LSQ via QR factorisation

The Method
To increase stability, at an increase in computational cost, the following QR based algorithm is
often used in practice.

Algorithm (LSQ via QR factorisation).


input: A Cmn with m n and rank(A) = n, b Cm
output: x Cn with minimal kAx bk2
1: compute the reduced QR factorisation A = Q R


2: compute Q b C n
=Q
3: solve Rx b using back substitution
7.3. LSQ VIA SVD 95

The result of the algorithm satises

Rx
A Ax = A Q
Q
= A Q b
Q
=R QQ
b
Q
=R b
= A b

and thus solves (LSQ).

Computational Complexity
The method is usually implemented by means of Householder based QR factorisation, standard
matrix-vector multiplication for b
Q and standard triangular inversion of R. As for the normal
equations approach, we assume that there is positive a, independent of n, such that m an as
n .

Theorem 7.4 (Cost of QR for LSQ). Let sn (A) > 0. The cost of solving (LSQ) by the QR
approach is, in the usual implementation,
2
2mn2 n3 + (mn + n2 ).
3

Proof. Steps 1 and 2 together have asymptotic cost C1 (m, n) 2mn2 32 n3 + (mn), and step 3
2
has asymptotic cost C2 (m, n) = (n ). Thus we get the desired result.

If a  1 then the cost is dominated by 2mn2 and is roughly twice that of the normal equations
approach.

Error Analysis
The Householder QR approach has considerably improved backward stability properties when
compared with the normal equations based solution method. In particular, the backward error
is small in the following sense.

Theorem 7.5. Let x solve the (LSQ) problem for (A, b) and let x be the computer solution via
Householder QR factorisation. Then x minimises k(A + A)x (b + b)k2 for matrix A with
columns aj and vector b satisfying, for m suciently small,
Cmnm
kaj k2 kaj k2 , j = 1, . . . , n
1 Cmnm
Cmnm
kbk2 kbk2 .
1 Cmnm

In contrast to the normal equations method, this bound on the backward error is independent
of the condition number of A. However, practitioners are often interested in the implied error in
estimating the residual r = b Ax. This can still behave poorly for the QR method, particularly
when the matrix A is ill-conditioned.

7.3 LSQ via SVD

The Method
The most stable solution of (LSQ), especially useful for problems where A is close to being
rank-decient, and n close to zero, is SVD based.
96 CHAPTER 7. LEAST SQUARES PROBLEMS

Algorithm (LSQ via SVD).


input: A Cmn with m n and rank(A) = n, b Cm
n
output: x C with minimal kAx bk2
V
1: compute the reduced SVD A = U

2: compute U b C n
=U
3: solve y b
4: return x = V y

The result x of the calculation satises

V
A Ax = A U V y = A U
U b = A b

and thus it is the solution of (LSQ).

Computing the SVD


Assume that we are given A Cmn . A common approach to computing the SVD is as follows:

Step 1. Find the reduced QR factorisation R


A=Q with Cmn
Q and Cnn .
R
Step 2. Perform a bidiagonalisation of R, via orthogonal transformations as in Lemma 4.11,
so that B = U1 RV1 where B Cnn is upper-bidiagonal: has non-zeros only on the
diagonal and on the rst upper o-diagonal.

Step 3. Find an SVD B = U2 V2 by use of Theorem 2.17.

Step 4. Note that A = (QU1 U2 )(V1 V2 ) and that hence A = U V with orthogonal U, V
given by U = QU1 U2 , V = V1 V2 . (Note that the composition of orthogonal matrices is
itself orthogonal).

It may be shown that the cost of this algorithm is 2mn2 + 11n3 + (mn + n2 ).

Computational Complexity
Using these ideas is the basis for the standard implementation of SVD based approaches to
(LSQ). As for the normal equations approach, we assume that there is a positive a, independent
of n, such that m an as n .
Theorem 7.6 (Cost of SVD for LSQ). Let sn (A) > 0. The cost of solving (LSQ) by the SVD
approach is, in the usual implementation,
2mn2 + 11n3 + (mn + n2 ).
Proof. (Sketch) SVD is 2mn2 + 11n3 + (mn + n2 ).
form c=U b : (mn)
= c : (n)
solve w

form x=Vy : (n2 ).

If a  1 the cost is similar to that of of the QR based method. But, for moderate values
of a, the additional 11n3 term means that this method can be considerably more time-consuming
than the QR method.

Error Analysis
The SVD approach is by far the most stable of the three methods for (LSQ). This is because
it reveals the singular values which are close to zero, and hence possibly reect rank-deciency
of A. Exploiting this fact, enables the method to perform well even in situations which are close
to rank-decient and, in particular, where n is very close to zero.
7.3. LSQ VIA SVD 97

Bibliography

The books [TB97] and [Dem97] both contain clear discussions of algorithms for (LSQ); the
statements about complexity of the SVD, and SVD based methods for (LSQ) may be found in
[TB97] for example. More details concerning the error analysis, and Theorem 7.5 in particular,
can be found in [Hig02].

Exercises

Exercise 7-1. The purpose of this question is to introduce you to the content of this chapter.
It is broken into three parts: (i) formulate a least squares problem; (ii) solve it blindly using
Matlab; (iii) introduce the SVD, a major tool used in solving least squares problems.
(i) Let {(u(i) , v (i) )}m
i=1 be a set of points in the (u, v) plane. We wish to nd the best straight
line t
v = u +
to this data. The least squares approach to this problem is to nd (, ) R2 minimising

m
X
G(, ) = |u(i) + v (i) |2 .
i=1

Show that G may be written as


G(, ) = kAy bk22
for y = (, )T and some choice of A Rm2 and b Rm which you should specify.
i=1 in the following way: specify some delta R
{(u , v )}100
(i) (i)
(ii) Generate some data and type

>> u=rand(100,1);
>> v=2*u+ones(100,1)+delta*rand(100,1);
Construct A and b as dened in (i) and then form

>> x=A\b
This is the least squares solution. What would you expect the solution to be if delta is very
small? Test your hypothesis by repeating the previous construction of A and b for a variety of
small delta.
(iii) As we shall see, there are several competing methods to solve a least squares problem. One
is based on the QR decomposition. A second is based on the SVD ( singular value decomposi-
tion). To make the matrices shorter type A=A(1:5,:). We will perform operations on this new
shortened version of A (corresponding to 5 data points rather than 100.) Type

>> [U,X,V]=svd(A)
and discuss the properties of U, X and V (think about columns!). Now type

>> rn=norm(U*X*V'-A)
What does this tell you? Now type

>> [W,S,Z]=svd(A,0)
and compare W, S and Z with U, X and V. Again type

>> rn=norm(W*S*Z'-A)
What does this tell you?
The two factorisations are known (respectively) as the full SVD and the reduced SVD In our
theoretical development we will mainly use the reduced SVD, and will refer to this as the SVD
for short.
98 CHAPTER 7. LEAST SQUARES PROBLEMS

Exercise 7-2. You are given m data points (u(i) , v (i) ) where u(i) Rn1 and v (i) R for
n1
i = 1, . . . , m. We would like to nd R and R to minimise

m
X
|T u(i) + v (i) |2 .
i=1

Show that this problem may be reformulated as a standard least squares problem by specifying
appropriate choices of A Rmn and b Rm .
Now let n = 2 and generate m = 50 random data points of the form

v (i)
=u (i)
+ 102 W (i)
where the u(i) are chosen at random uniformly from [0, 100] and the W (i) are chosen at random
from the standard normal distribution (with mean zero and variance one). Solve the resulting
least squares problem by the QR method and by using the SVD (make use of Matlab's qr and
svd routines). What do you expect the answer to be? Compare the eciency and accuracy of
the two dierent methods.

Exercise 7-3. Vandermonde matrices again: remember the command vander in Matlab.
a) Consider the polynomial interpolation problem. For each of the cases l = 2, 4, 6, 8, 10,
generate a square Vandermonde matrix B (l) with the data points {uj }lj=1 chosen from the set

{1, 2, . . . , l}. Estimate the condition numbers of the B (l)


with the Matlab routine cond, and plot
it as a function of l. What do you deduce from this exercise?
b) Now consider the problem of polynomial data tting. Write down a least squares problem
B = v which determines the coecients {j }nj=1 in a polynomial

n
X
v(x) = j uj
j=1

which best ts the data points {(uj , vj )}nj=1 in the sense of minimising

m
X
|v(ui ) vi |2 .
i=1

The matrix B is a rectangular Vandermonde matrix. For each of the cases l = 2, 4, 6, 8, 10,
generate the rectangular B (l) with m=l and n = l/2. Choose the data points {uj }lj=1 from the
set {1, 2, . . . , l}. Estimate the condition numbers of the B (l) with the Matlab routine cond, and
plot it as a function of l. Discuss what you observe in relation to the rst part of the question.
c) Now consider the data points

uj = j, j = 1, . . . , 9
u10 = 9.0000001
vj = 1 + uj + u4j + 102 W (j)

where the W (j) are independently chosen at random from the standard normal distribution
(mean zero and variance one). Find the tenth degree polynomial interpolant to this data set,
using QR factorisation. Then nd the best t polynomial of degree four, using QR to solve the
resulting least squares problem. Compare the results you obtain from interpolation and best t.
Use graphical output to illustrate your discussion.

Exercise 7-4. Let A Rnn be symmetric positive denite with eigenvalues

0 < 1 2 . . . n .
1. Show that
kAk2 = n and kA1 k2 = 1
1 .

What is the condition number of A in the Euclidean norm in this case?


7.3. LSQ VIA SVD 99

2. Explicitly nd A solving

 
kAk2
min : A + A is singular .
kAk2

How big is kAk2 /kAk2 for this A?


3. Let B Rmn with m > n. State and derive the normal equations for the solution of the
least squares problem
min kBx bk2
xRn

where b Rm .

4. If B has singular values {i }ni=1 then what is the Euclidean norm condition number of the
matrix which must be inverted to solve the normal equations?

Exercise 7-5. Show that the least squares solution will yield Ax = b if and only if b = AA b

where A is the Moore-Penrose inverse of A.
100 CHAPTER 7. LEAST SQUARES PROBLEMS
Chapter 8

Eigenvalue Problems
In this chapter we will study methods for solution of the eigenvalue problem (EVP): given
A Cnn , compute the eigenvalues and eigenvectors of A.
A rst insight is that nding eigenvalues and nding eigenvectors are equivalent problems.
To nd an eigenvalue from an eigenvector we proceed as follows: Given an eigenvector x Cn we
try to nd the C which minimises kx Axk2 . Since x is an eigenvector, the minimum of 0
is attained for the corresponding eigenvalue. It is easy to check, for example using Corollary 7.2,
that the optimal choice of is

hx, Axi
= (x x)1 x (Ax) = .
hx, xi
Note that solvability of the normal equations in this case is guaranteed, provided that x 6= 0. The
preceding considerations show that the following denition provides a natural way to estimate
an eigenvalue from an estimate of an eigenvector.

Denition 8.1. The Rayleigh quotient of a matrix A Cnn is dened by

hx, Axi
rA (x) =
hx, xi
for all x Cn .
Using this notation, we get that
Ax = rA (x)x
for all eigenvectors x of A.
Calculating an eigenvectorx from an eigenvalue can be achieved by xing one component
of the vector x, say 1 and then solving n 1 of the linear equations Ax = x, with
xl , to be
and xl given. If the eigenvectors corresponding to have xl = 0 then this system of linear
equations will have no solution but, since x 6= 0, there will be an l {1, . . . , n} such that the
system can be solved and the solution will be an eigenvector with eigenvalue .

Theorem 1.12 shows that, if we can nd the roots of arbitrary polynomials, we can also
nd the eigenvalues of arbitrary matrices. In fact the two problems are equivalent: for every
polynomial we can nd a matrix with this polynomial as the characteristic polynomial. To see
p(z) = (1)n (c0 + c1 z + + cn1 z n1 + z n ) be given.
this let Dene the companion matrix
C Cnn of the polynomial p by

0 c0
1
c1
.. .
.

C= . . . (8.1)

..
. cn2
1 cn1

101
102 CHAPTER 8. EIGENVALUE PROBLEMS

One can show that C (z) = det(C zI) = p(z) (see Exercise 8-8). Hence nding eigenvalues
of C is equivalent to the problem of nding the roots of the polynomial p. This establishes
equivalence between the eigenvalue problem and root nding for polynomials. Regarding this
latter problem we know the following:

Theorem 8.2 (Abel, 1824). For every n 5 there is a polynomial p of degree n with ratio-
nal coecients that has a real root which cannot be expressed by only using rational numbers,
addition, subtraction, multiplication, division and taking kth roots.
Any possible algorithm will be based on the operations mentioned in the theorem and hence
it is not possible to nd an algorithm which calculates eigenvalues exactly after a nite number
of steps. The conclusion to be drawn from this is that any eigenvalue solver must be iterative:
aiming to approximate the solution of the eigenvalue problem in a nite number of steps. We
describe a variety of iterative methods for (EVP) in the following sections.

In the remainder of this chapter we will consider the Hermitian eigenvalue problem. As we
have seen, if A Cnn is Hermitian, the matrix has an orthonormal system of eigenvectors and
the corresponding eigenvalues are real. In this chapter x1 , . . . , xn Cn will always denote an or-
thonormal system of eigenvectors of A and 1 , . . . , n R will be the corresponding eigenvalues.
We order the eigenvalues such that

|1 | |2 | |n | 0.

Since our methods will be iterative, the algorithms will only compute approximate eigenvec-
tors and eigenvalues. The following theorem shows, that the Rayleigh quotient of an approximate
eigenvector is a good approximation for the corresponding eigenvalue.

Theorem 8.3. Let A Cnn be Hermitian and x Rn \{0}.


a) x is an eigenvector of A with eigenvalue if and only if rA (x) = 0 and rA (x) = .
b) If x is an eigenvector of A, then
rA (x) rA (y) = O kx yk22


as y x.
Proof. a) The gradient of rA is

2  
rA (x) = Ax rA (x) x .
kxk2

Assume Ax = x. Then rA (x) = hx, xi/hx, xi = and

2 
rA (x) = 2 x x = 0.
kxk2

Note that rA (x) as x 0. Thus, if rA (x) = 0, then Ax rA (x)x = 0 and x 6= 0 so


that (x, rA (x)) is an eigenpair for A.
b) From part a) we know rA (x) = 0. Taylor expansion of rA around x gives

rA (y) = rA (x) + hrA (x), y xi + O ky xk22




= rA (x) + O ky xk22 .


Many of the methods for nding eigenvalues rely on repeated matrix-vector multiplication
by A. Thus they will be considerably cheaper if A is preprocessed so that it has as many zeros as
possible, but is similar to the original matrix. This can be achieved by reducing A to Hessenberg
form using Lemma 4.12. When A is Hermitian, the Hessenberg form is necessarily tridiagonal.
Hence tridiagonal matrices play an important role in what follows.
8.1. THE POWER METHOD 103

8.1 The Power Method

The Method
The key idea of the power method is that, assuming some minor technical conditions, repeated
application of A to an arbitrary starting vector will lead to a sequence of vectors which eventually
align with the eigenvector associated with the largest eigenvalue.

Algorithm PI (power iteration).


input: A Cnn Hermitian
(k)
output: z Cn , (k) R with z (k) x1 and (k) 1
1: choose z (0) Rn with kz (0) k2 = 1
2: for k = 1, 2, 3, . . . do
3: w(k) = Az (k1)
w(k)
4: z (k) =
kw(k) k2
5: end for
6: (k) = rA (z (k) )

Remarks.
1. As with all iterative methods, the method must be stopped at some point where the result
is close enough to the real one.

2. The algorithm calculates an approximate eigenvector as z (k) = Ak z (0) /kAk z (0) k2 . To avoid
(k)
overow/underow errors the vector z is normalised in every step of the iteration.

3. The method is based on the following idea: if we express z (0) in the basis x1 , . . . , x n we
get
n
X
z (0) = i xi
i=1

and
n
X n
X
Ak z (0) = i Ak xi = i ki xi .
i=1 i=1

For large k this expression is dominated by the term corresponding to the eigenvalue with
the largest modulus.

Error Analysis
Since eigenvectors are only determined up to scalar factors, we cannot expect the approxima-
tionz (k) to be close to the vector x1 we chose. Instead, we would expect z (k) to be close to x1
where is a factor which might depend on k . We get the following result.

Theorem 8.4. Let |1 | > |2 | |n | and hx1 , z (0) i = 6 0. Then there is a sequence
( (k) )kN in C with | (k) | = 1 for all k N such that the sequences (z (k) ) and ((k) ) from the
power iteration algorithm satisfy

z (k) x1 = O 2 k
(k)  
(8.2)
2 1

and
1 = O 2 2k
(k)  
(8.3)
1
as k .
104 CHAPTER 8. EIGENVALUE PROBLEMS

Proof. a) We can write z (0) as

n
X
z (0) = i xi , i = hxi , z (0) i.
i=1

Since 1 = hx1 , z (0) i =


6 0, we get

n n
X  X i  i k 
Ak z (0) = i ki xi = 1 k1 x1 + xi
i=1
1
i=2 1

and Pythagoras' Theorem gives

n
k (0) 2  X i 2 i 2k 
A z = |1 k1 |2 1 + = |1 k1 |2 (1 + k )
2
i=2 1 1

where
n (0) 2
kz k2 2 2k .
X i 2 i 2k
k =
i=2
1 1 |1 |2 1
Thus we can conclude
n
Ak z (0) 1 k1  X i  i k  1
z (k) = k (0)
= k
x 1 + xi .
kA z k2 |1 1 | 1
i=2 1
1 + k

Now dene (k) = 1 k1 /|1 k1 |. Then

Ak z (0) 2 Ak z (0) 2 X n
(k) i 2 i 2k
x = x = = k

k 1 k 1
|1 1 | 1 1

2 1 1 2
i=2

and thus
Ak z (0) Ak z (0) Ak z (0)
(k)
z (k) x1 k (0) + (k)
x

1
kA z k2 |1 k1 | 2 |1 k1 |

2 2
p
= 1 + k 1 + k

2 k
kz (0) k2 2 k
2 .
|1 | 1
This completes the proof of (8.2).
b) Using part b) of Theorem 8.3 and (8.2) we get

1 = rA (z (k) ) rA ( (k) x1 ) = O kz (k) (k) x1 k22 = O 2 2k .


(k)   
1
This completes the proof of (8.3).

Computational Complexity
For simplicity assume that our primary interest is in the accuracy of eigenvectors. We wish to
iterate until
kz (k) (k) x1 k2 . (8.4)

Inspection of the preceding proof shows that this inequality will be attained provided

 kz (0) k 
2 2 k
2 . (8.5)
1 1
Hence it suces to choose k to be the smallest integer greater than

ln 1 + ln kz (0) k2 + ln 2 ln 1
k# = .
ln(1 /2 )
8.2. INVERSE ITERATION 105

Assumption 8.5. The rst and second eigenvalues of A satisfy


1
2 = 1 + (n ).

Theorem 8.6. Under Assumption 8.5 and the assumptions of Theorem 8.4, the computational
cost of using the power method to achieve (8.4) is bounded above by c n1+ ln 1 + n3 , for


some constant c independent of n and .


Proof. Because of kz (0) k2 = 1, kz (0) k and 1 are independent of n. The cost
the quantities
3
of reducing A to Hessenberg (tridiagonal) form is (n ) (see Lemma 4.12). Each step of the
iteration then costs (n), since matrix vector-multiplication and inner-products are both (n)
1
when the matrix tridiagonal. Combining the second and third items gives a bound of (n ln )
for the number of iterations. Using (8.5) gives the desired result for the largest eigenvalue and
corresponding eigenvector of the tridiagonal matrix. Transforming back to the original matrix
A costs (n2 ) and is hence negligible relative to the cost of transforming to Hessenberg form.
The transformation is an orthogonal one and hence does not change the 2-norm of the error in
the eigenvector.

8.2 Inverse Iteration

The Method
The power iteration algorithm helps to nd the eigenvalue with the largest modulus and the cor-
responding eigenvector. The following simple idea leads to a modication of the power iteration
method to nd dierent eigenvalues of A.
Given R, the matrix (A I)1 has eigenvalues i = 1/(i ) for i = 1, . . . , n.
The power iteration method applied to this matrix will thus return an approximation for the
eigenvector corresponding to the j with the biggest modulus. This is the j where j is the
eigenvalue closest to . The resulting algorithm is called the inverse iteration method.

Algorithm II (inverse iteration).


input: A Cnn Hermitian, R
(k)
output: z Cn , (k) R with z (k) xj and (k) j
where j is the eigenvalue closest to

1: choose z (0) Rn with kz (0) k2 = 1


2: for k = 1, 2, 3, . . . do
(k)
3: solve (A I)w = z (k1)
w(k)
4: z (k) =
kw(k) k2
5: end for
6: return (k) = rA (z (k) ).

Remark. For practical usage the method is stopped at some point where the result is close
enough to the real one.

Error Analysis
The following convergence result is a direct consequence of Theorem 8.4.

Theorem 8.7 (Convergence of Inverse Iteration). Let A Cnn be Hermitian and R.


Suppose a, b {1, . . . , n} are such that |a | < |b | |j | for all j {1, . . . , n} \ {a, b}.
Then, if hz (0) , xa i =
6 0, there exists a sequence ( (k) )kN with | (k) | = 1 such that
!
a 2k

(k) (k)
kz xa k22 (k)
+ | a | = O .
b
106 CHAPTER 8. EIGENVALUE PROBLEMS

Computational Complexity
The computational cost is analogous to that of the power iteration, under appropriate assump-
tions, found by translating the theorem for the power method, where we iterate A, to the inverse
iteration situation where we iterate (A I)1 . Because, after preprocessing, A is tridiagonal
the cost of each step of inverse iteration is the same as for the power method.

8.3 Rayleigh Quotient Iteration

The Method
For inverse iteration it is necessary to estimate the eigenvalue in advance. An alternative is to
estimate the eigenvalue during the course of the algorithm:

Algorithm RQI (Rayleigh quotient iteration).


input: A Cnn symmetric
(k)
output: z Rn , (k) R with z (k) xj and (k) j for some j {1, , n}
1: choose z (0) such that kz (0) k2 = 1, (0) R
2: for k = 1, 2, . . . do
3: [A (k1) I]w(k) = z (k1)
4: z (k) = w(k) /kw(k) k2
hz (k1) , w(k) i + (k1) kw(k) k22
5: (k) =
kw(k) k22
6: end for
Notice that (k) in step 5 is simply rA (w(k) ), computed without application of A.

Error Analysis
The previous two algorithms both converged linearly: asymptotically, the error decreases by a
constant factor at each step. In contrast, when the Rayleigh quotient iteration converges to an
eigenvalue, say j , it does so cubically:
|(k+1) j | = O(|(k) j |3 ).

A similar cubic rate of convergence is also obtained for the eigenvector. Such methods can hence
be very ecient.

Computational Complexity
However, with iterations of this type, where the convergence is not linear (as with the New-
ton method for example), it is dicult to predict when it will converge, and to which eigen-
value/eigenvector pair it will converge. Much of the computational complexity of such algorithms
stems from choosing starting points which will lead to convergence to the desired pair.

8.4 Simultaneous Iteration

The Method
Now imagine that we wish to calculate all the eigenvectors. A natural idea is to take

z (0) Rnn

orthonormal, and generate


Z (k+1) = AZ (k) .
8.4. SIMULTANEOUS ITERATION 107

(k)
Then each column of Z (k) , zi , is calculated as

(k+1) (k)
zi = Azi .

Let
n
(0)
X
zi = j,i xj
j=1

and assume that


(0)
i,i = hzi , xi i =
6 0, i = 1, . . . , n (8.6)

Now
n
(k)
X
zi = j,i kj xj .
j=1

If
|1 | > |2 | > > |n1 | > |n | (8.7)

then

(k)
span{z1 } = span{y1 }, ky1 k2 = 1, y 1 = x1 + e1
(k) (k)
span{z1 , z2 } = span{y1 , y2 }, ky2 k2 = 1, y 2 = x2 + e2
.
.
.
(k) (k)
span{z1 , z2 , . . . , zn(k) } = span{y1 , y2 , . . . , yn }, kyn k2 = 1, yn = xn + en ,

where
min{j,n1} !
l+1 k

X
ej = O = O()
l
l=1

for
l+1
:= max .
1ln1 l
However, in practice this is a poor algorithm, not only because of overow / underow if
(k) (k) (k)
|1 | =
6 1, but also because the approximate basis {z1 , z2 , . . . , zn } for {x1 , . . . , xn } is highly
ill-conditioned, here meaning that the vectors nearly all point in the same direction, x1 .
To overcome these problems the algorithm used in practice forms an orthonormal basis from
the approximate basis for {x1 , . . . , xn }, by means of QR.

Algorithm (simultaneous iteration).


input: A Rnn symmetric
(k) (k)
output: Z , (k) Rnn with Z (k) (x1 , x2 , . . . , xn ) and ii i for i = 1, . . . , n
1: choose an orthogonal matrix Z (0) Rnn
2: for k = 1, 2, 3, . . . do
3: W (k) = AZ (k1)
(k)
4: calculate the QR factorisation W = Z (k) R(k)
(k) (k1) T (k)
5: = (Z ) W
6: end for
Here Z (k) R(k) is the QR factorisation of W (k) . In the following we write Z (k) and W (k) in
terms of columns:

(k) (k)
Z (k) = (z1 , . . . , zn(k) ), W (k) = (w1 , . . . , wn(k) ).
(k1)
We show below that the diagonal entries of (k) are rA (zi ).
108 CHAPTER 8. EIGENVALUE PROBLEMS

Error Analysis
Theorem 8.8 (Convergence of Simultaneous Iteration). Let A Rnn be symmetric, satisfying
(8.7) so that < 1. Under assumption (8.6) there are, for j = 1, . . . , n, sequences {sj }kZ ,
(k)
+

sj {1, +1}, such that


(k)

(k) (k) (k)


kzj sj xj k22 + |jj j | = O(||2k ).
Proof. By the discussion preceding the algorithm the result follows for the approximate eigen-
vectors. Since

(k) (k1) (k) (k1) (k1)


jj = hzj , wj i = hzj , Azj i
(k1)
= rA (zj ),
the convergence to the eigenvalues follows by Theorem 8.3.

Computational Complexity
In practice the simultaneous iteration is usually implemented in the form known as the QR
algorithm, which we discuss in the next section. Hence we discuss complexity in that context.

8.5 The QR Algorithm for Eigenvalues

The Method
In practice the eigenvalues of A can be found directly by a method related to simultaneous
iteration, and which we now derive. The resulting algorithm appears almost magical upon rst
sight, but when related to simultaneous iteration, its mechanism is clear. To establish this
relation let Z (k1) , Z (k) , R(k) , W (k) and (k) be as in the simultaneous iteration algorithm and
dene
Q(k) := (Z (k1) )T Z (k) .
This is an orthogonal matrix by construction.
Recall that
(k) = (Z (k1) )T AZ (k1) .
Thus

Q(k) R(k) = (Z (k1) )T Z (k) (Z (k) )T W (k)


= (Z (k1) )T AZ (k1)
= (k) .
Furthermore

R(k) Q(k) = (Z (k) )T W (k) (Z (k1) )T Z (k)


= (Z (k) )T AZ (k)
= (k+1) .
Thus reversing the order of Q and R results in an improved estimate of the eigenvalues. This
suggests the following algorithm for locating eigenvalues of A.

QR for eigenvalues input: A Rnn symmetric


nn
output: R with diagonal entries approximating the eigenvalues

1: Set (0) = A
2: for k = 1, 2, . . . do
3: calculate the QR factorisation (k1) = Q(k1) R(k1)
(k) (k1) (k1)
4: =R Q
5: end for
8.5. THE QR ALGORITHM FOR EIGENVALUES 109

Error Analysis
Theorem 8.9 (Convergence of QR). Under the assumptions of Theorem 8.8, the QR eigenvalue
algorithm satises (possibly after reordering of eigenvalues)
(k)
kii i k2 = O ||2k ,

i = 1, , n.

Proof. The algorithm produces the same (k) as simultaneous iteration. So the diagonal be-
haviour follows from Theorem 8.8.

For the o-diagonals, note that by orthogonality of the xl


(k) (k) (k)
jl = hzj , Azl i = hxj , Axl i + O(||k )
= hxj , l xl i + O(||k )
= l hxj , xl i + O(||k )
= O(||k ).

Thus the matrix becomes diagonal as k .

Computational Complexity
ln
Making the error in the QR method of size gives the number of iterations k = O( ). Under
ln
the assumptions of Theorem 8.8, the QR algorithm as stated would cost
 
ln 3
O n
ln
work to obtain estimates of all eigenvalues to within O(). However, this can be improved to
 
ln 2
O n + O(n3 ),
ln
with second term independent of , as follows.
Recall from Denition 1.8 that a matrix A Cnn is upper Hessenberg if

aij = 0 i > j + 1.

Lemma 8.10. If (0) is upper Hessenberg, the QR Algorithm generates (k) which are upper
Hessenberg.
Proof. We have
(k) = R(k1) Q(k1)
= R(k1) (k1) (R(k1) )1 .

But R(k1) and its inverse are upper triangular and pre- or post-multiplication by upper trian-
gular matrices preserves the upper Hessenberg form.

Theorem 8.11 (Cost of QR). Under the assumptions of Theorem 8.8 the QR algorithm can be
implemented to give eigenvalue approximations of size with cost
ln 2
C(n, ) k1 n + k2 n3 ,
ln
where k1 , k2 are independent of n, and .
Proof. (Sketch) First nd T as in Lemma 4.12, at cost (n3 ). Then apply the QR algorithm to
T. Since each QR factorisation is on an upper Hessenberg matrix, the cost reduces to (n2 ).
The number of iterations required to reduce the error to size is bounded by O(ln / ln ) and
the result follows.
110 CHAPTER 8. EIGENVALUE PROBLEMS

8.6 Divide and Conquer for Symmetric Problems

We conclude the chapter with a somewhat dierent method for eigenvalue calculation, speci-
cally constructed for symmetric tridiagonal systems. It uses a divide and conquer approach, as
introduced for the Strassen algorithm.

The Method
If A Rnn is symmetric then the reduction to upper Hessenberg form of Lemma 4.12 yields
3
tridiagonal T in (n ) operations. Thus we consider nding the eigenvalues of a symmetric
tridiagonal T . By expressing T in terms of two smaller tridiagonal matrices of half the size,
we create the basis for a divide and conquer approach for the symmetric eigenvalue problem.
Consider a tridiagonal T in the form:

a1 b1

.. ..
b1 . .

.. ..
. . bn/21

bn/21 an/2 bn/2
T = .

bn/2 an/2+1 bn/2+1

.. ..
bn/2+1 . .

.. ..
. . bn1
bn1 an
The basis of the divide and conquer approach is to split T into two separate tridiagonal
matrices, each of whose eigenvalues and eigenvectors determine those of T. This can be done
exactly if b is zero, and the following splitting of T tries to build on this fact.
Let n = 2k , b = bn/2 and write
 
T1 0
T = + b(v v), (8.8)
0 T2

where v = (eTn/2 |eT1 )T , with e1 , en/2 Rn/2 , and T1 , T2 tridiagonal. In fact,

(T1 )ij = Tij provided (i, j) 6= (n/2, n/2)


(T1 )n/2 n/2 = Tn/2 n/2 b
(T2 )ij = Ti+n/2,j+n/2 provided (i, j) 6= (1, 1)
(T2 )11 = T1+n/2,1+n/2 b.

How do we nd the eigenvalues of T from those of T1 and T2 ? Since T1 , T2 are real symmetric,

Ti = Qi i QTi , i = 1, 2.

Let
 
1 0
D= = diag(d1 , . . . , dn )
0 2
 T   T

Q1 0 last column of Q1
u= v = .
0 QT2 rst column of Q2
T

Denition 8.12. Let f: RR be dened by

f () := 1 + bhu, (D I)1 ui.

The equation f () = 0 is called the secular equation.


The following theorem shows how the eigenvalues of T can be calculated from those of T1
and T2 .
8.6. DIVIDE AND CONQUER FOR SYMMETRIC PROBLEMS 111

Theorem 8.13. Assume that the {di } are distinct and that ui 6= 0 for all i. Then the eigenvalues
of T are the real roots of the secular equation.
Proof.
Q1 1 QT1
 
0
T = + bv v
0 Q2 2 QT2
     T 
Q1 0 1 0 Q1 0
= + bu u .
0 Q2 0 2 0 QT2
Thus T has the same eigenvalues as D + bu u by similarity.
Now,
det(D + bu u I) = det((D I)(I + b(D I)1 u u)).
If
det(D + bu u I) = 0
and is not an eigenvalue of D then

det(I + b(D I)1 u u) = 0.

Now det(I + x y) = 1 + hy, xi (see Exercise 8-4) and so

1 + bhu, (D I)1 ui = f () = 0.

Recall that we assume the {di } are distinct and that ui 6= 0. Let d0 = and dn+1 = .
Then

if b<0 then f () has n roots i (di , di+1 ), i = 0, . . . , n 1;


if b>0 then f () has n roots i (di , di+1 ), i = 1, . . . , n.

This follows from the fact that

n
X u2i
f () = 1 + b .
i=1
(di )

Hence all the eigenvalues are accounted for and it follows that the assumption that cannot be
an eigenvalue of D holds.

Theorem 8.13 is the basis of a divide and conquer algorithm for the eigenvalue problem.

Error Analysis
The primary source of error, in real arithmetic, arises from solution of the secular equation.
However, because the roots which we seek are real, and because they lie in known intervals,
root-nding methods with great accuracy and eciency can be applied (a simple example being
bisection). In practice the errors arising from this step of the algorithm are negligible compared
with those arising from nite precision arithmetic.

Computational Complexity
Let Lk = C(2k ), C(n) is the cost of nding eigenvalues and eigenvectors of symmetric
where
tridiagonal matrices in Rnn . If we use a recursive algorithm, then the cost Lk needs to be
expressed in terms of the cost Lk1 of nding the eigenvalues and eigenvectors of symmetric
n/2n/2
tridiagonal matrices in R .
nn
Finding the eigenvalues and eigenvectors in R has cost Lk where

Lk = 2Lk1 + mk ,

where mk is the cost of constructing, and solving, the secular equation for T Rnn . This cost
comprises:
112 CHAPTER 8. EIGENVALUE PROBLEMS

nding eigenvectors from these eigenvalues: this costs (n) for each eigenvalue since the
matrix is tridiagonal, giving a total of (n2 );
constructing u: (n) to nd last column of QT1 and rst column of QT2 ;
Solving the secular equation: (n), assuming unit cost for each root using, for example,
bisection.

To state a theorem about the cost of this iteration we need the following denition.

Denition 8.14. The decomposition (8.8) of a tridiagonal matrix T is non-degenerate if the


{di } are distinct and ui 6= 0 for all i.
Theorem 8.15 (Cost of Eigenvalue Divide and Conquer). Let T be symmetric, tridiagonal. If
all decompositions are non-degenerate at each level of recursion and if root nding for the secular
equation is assigned unit cost, then the algorithm generates all eigenvalues and eigenvectors of
T at cost O(n2 ).
Proof. We have mk = (n2 ) = (4k ). Thus, for some a > 0,
Lk 2Lk1 + a4k .
Induction based on the assumption
Lk b4k
gives the desired result, provided b min{1, 2a}, since nding the eigenvalues of a 11 matrix
has unit cost.

Bibliography

The book [Par80] contains a very complete overview of the symmetric eigenvalue problem, the
subject that dominates this chapter. The book [TB97] has a good introduction to the subject.

Exercises

Exercise 8-1.
Matlab
The purpose of this question is to introduce you to ideas associated with eigen-
value problems and their solution in . The basic problem is, given A Rnn , to nd
n
xC and C such that
Ax = x (8.9)

and, of course, x should be non-zero.


(i) What can you say about the eigenvalues if A is symmetric (just state, no proof ). What
can you say about the eigenvectors?
(ii) Write down the denition of the characteristic polynomial satised by the eigenvalues .
(iii) What is the polynomial in the case >> A=[3,2; 1,2] Hence nd the eigenvalues and eigen-
vectors.
(iv) If

>> A=[3,2,1; 2,2,2; 1,2,1]


write the polynomial for in the form

p1 3 + p2 2 + p3 + p4 = 0
and nd p = (p1 , p2 , p3 , p4 ). In principle you can solve this cubic exactly so there is no need to
resort to numerical solution. (In general we obtain

n
X
ni pi+1 = 0.
i=0

Matlab
What does Galois tell us about our need for applying numerical solutions to this problem?)
(v) One way to nd the eigenvalues of A is to use the command roots. Type
8.6. DIVIDE AND CONQUER FOR SYMMETRIC PROBLEMS 113

>> roots(p)

to nd the roots of our cubic, and hence the eigenvalues of A. We can do this more generally.
Type

>> A=randn(20);
>> A=0.5*[A+A'];
>> p=poly(A)'

This creates a 20 20 matrix with normally distributed random entries, and then makes a
symmetric matrix from A. The vector p contains the co-ecients in the characteristic polynomial
of A so that, if you type sort(roots(p)) you will get the eigenvalues of A (sort just orders the
roots). Do this.
A Matlab
Matlab does not do this
(vi) Thus one way to nd eigenvalues of in is to form the characteristic polynomial
and nds its roots. But as more ecient methods are available. Type

>> sort(eig(A))

and you should get the same as you did in (v). But the method employed was very dierent.
Here we show you the two key ideas used in eig. Set

>> A=[-1,2,3,0,4;2,1,6,2,2;3,6,3,3,0;0,2,3,8,2;4,2,0,2,-1]

and now type

>> [Q,H]=hess(A)

What properties do Q and H have? Type

>> rn=norm(Q*H*Q'-A)

What can you say about the relationship between the eigenvalues of H and those of A?
Now type

>> [W,D]=schur(H)

What properties do W and D have? Type

>> rn=norm(W*D*W'-H)

What can you say about the relationship between the eigenvalues of D H?
and those of
Finally, what can you say about the relationship between the eigenvalues of A and the
diagonal elements of D? Test your hypothesis by typing eig(A) and comparing with D .

Exercise 8-2. Give a proof by induction which shows that the matrix A from (8.1) really has
determinant (1)n p(z) where p(z) = a0 + a1 z + + an1 z n1 + z n .

Exercise 8-3. Let B be a constant matrix and let T (t) solve the matrix dierential equation

dT
= BT T B, T (0) = T0 .
dt
Assuming that the solution of this equation exists for all t R, show that T (t) has the same
eigenvalues as T0 , for all t R. If B is skew-symmetric, show that kT (t)k2 = kT0 k2 for all t R.

Exercise 8-4. Prove that det(I + xy T ) = 1 + y T x.

Exercise 8-5. Construct an example showing that the Jordan canonical form of a matrix can
be discontinuous with respect to small perturbations.
114 CHAPTER 8. EIGENVALUE PROBLEMS

Exercise 8-6. Newton's Method

1. Let G : Rm Rm be dierentiable. Newton's method for the solution of the equation


G(y) = 0 is to generate iterates yk according to

dG(yk )yk = G(yk ), yk+1 = yk + yk

where dG(y) is the Jacobian of G evaluated at y . Find the explicit formula for yk+1 in
terms of yk when m = 1. What is the order of convergence of the method?
2. The eigenvalue problem for AR can be regarded as a nonlinear system of equations for
y = y(x, ) satisfying

Ax x = 0
1 T
(x x 1) = 0
2
Apply Newton's method to this system and show that it gives the iteration

1 2
2 (kxk k2 + 1)
k+1 = k + T
xk (A k I)1 xk
xk+1 = (k+1 k )(A k I)1 xk .

If x0 = 0 (02 6= 1), A = (kk2 = 1) and 0 6= , show that xk = k and nd the


iteration governing the k , i.e. nd h : R R such that

k+1 = h(k ).

Show that this iteration is Newton's method applied to a scalar equation and identify that
equation. Using this fact, prove that the full method on Rm is second order convergent for
this specic choice of initial data.

Exercise 8-7. Prove the following theorem concerning eigenvalues of tridiagonal matrices.

Theorem 8.16. Consider a matrix A of the form (5.5). Let the diagonal entries be real, constant
with equal o-diagonal entries: ai = a, ci = a and di = d for all i = 1, . . . , m where a and d are
real. Then the eigenvalues k of the matrix are given by
 
k
k = d + 2a cos ,
m+1

for k = 1, . . . , m. The corresponding eigenvector x = [x1 , . . . , xm ]T has j th component given by


xj = sin(kj/(m + 1)) (8.10)

Exercise 8-8. Consider the companion matrix C from (8.1). Prove that the eigenvalues k
of C are the roots of the polynomial
m
X
j j = 0
j=0

where m = 1. The corresponding eigenvector is

x = [x1 , . . . , xm ]T

where xj = j1
k .
Bibliography
[Bha97] R. Bhatia. Matrix Analysis. Springer, 1997.

[CH96] S.-N. Chow and J.K. Hale. Methods of Bifurcation Theory. Springer, 1996.

[CLRS01] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algorithms.
McGraw Hill, 2001.

[Dem97] J.W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997.

[GL96] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins, third edition,
1996.

[Hig02] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2002.

[HJ85] R.A. Horn and JC.R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[Lax97] P. Lax. Linear Algebra. Wiley, 1997.

[LT88] Randall J. Leveque and Lloyd N. Trefethen. Fourier analysis of the SOR iteration.
IMA Journal of Numerical Analysis, 8(3):273279, 1988.
[Ner71] E.D. Nering. Linear Algebra and Matrix Theory. John Wiley, 1971.

[NW06] J. Nocedal and S.J. Wright. Numerical Optimization. Springer, 2006.

[Par80] B. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, 1980.

[Rob01] J.C. Robinson. Innite Dimensional Dynamical Systems. CUP, 2001.

[Saa97] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS, 1997.

[SB02] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer, third edition,
2002.

[TB97] L.N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997.

115

You might also like