Week 7
7.1 Introduction
The simplest of matrices are the diagonal ones. Thus a linear map will also be easy to
handle if its associated matrix is a diagonal matrix. On the other hand, we have seen that the
associated matrix depends, to some extent, upon the choice of basis. This naturally leads
us to the problem of investigating the existence and construction of a suitable basis with
respect to which the matrix associated to a given linear transformation is diagonal.
Definition 7.1 An n × n matrix A is called diagonalizable if there exists an invertible n × n
matrix M such that M −1 AM is a diagonal matrix. A linear map f : V −→ V is called
diagonalizable if the matrix associated to f with respect to some basis is diagonal.
Remark 7.1
(i) Clearly, f is diagonalizable iff the matrix associated to f with respect to some basis (any
basis) is diagonalizable.
(ii) Let {v1 , . . . , vn } be a basis. The matrix Mf of a linear transformation f w.r.t. this basis
is diagonal iff f (vi ) = λi vi , 1 ≤ i ≤ n for some scalars λi . Naturally a subquestion here is:
does there exist such a basis for a given linear transformation?
Remark 7.2
(i) It is easy to see that the eigenvalues and eigenvectors of a linear transformation are the
same as those of the associated matrix.
(ii) Even if a linear map is not diagonalizable, the existence of eigenvectors and eigenvalues
itself throws some light on the nature of the linear map. Thus the study of eigenvalues becomes
extremely important. They arise naturally in the study of differential equations. Here we shall
use them to address the problem of diagonalization and then see some geometric applications
of diagonalization itself.
Definition 7.3 For any square matrix A, the polynomial χA (λ) = det (A−λI) in λ is called
the characteristic polynomial of A.
(2) Let
        A =  [  3   0   0
               −2   4   2
               −2   1   5 ] .
Then χA (λ) = det (A − λI) = (3 − λ)²(6 − λ). Hence the eigenvalues of A are 3 and 6. The
eigenvalue λ = 3 is a double root of the characteristic polynomial of A. We say that λ = 3
has algebraic multiplicity 2. Let us find the eigenspaces E(3) and E(6).
λ = 3 :  A − 3I =  [  0   0   0
                     −2   1   2
                     −2   1   2 ] .
Hence rank (A − 3I) = 1. Thus nullity (A − 3I) = 2. By solving the system (A − 3I)v = 0,
we find that
        EA (3) = {(x, 2x − 2z, z)t : x, z ∈ C} = L{(1, 2, 0)t , (0, −2, 1)t }.
The dimension of EA (λ) is called the geometric multiplicity of λ. Hence the geometric
multiplicity of λ = 3 is 2.
λ = 6 :  A − 6I =  [ −3   0   0
                     −2  −2   2
                     −2   1  −1 ] .
Hence rank (A − 6I) = 2. Thus dim EA (6) = 1. (It can be shown that {(0, 1, 1)} is a basis
of EA (6).) Thus both the algebraic and geometric multiplicities of the eigenvalue 6 are equal
to 1.
(3) A =  [ 1   1
           0   1 ] . Then det (A − λI) = (1 − λ)². Thus λ = 1 has algebraic multiplicity 2.
" #
0 1
A−I = . Hence nullity (A − I) = 1 and EA (1) = L{e1 }. In this case the
0 0
geometric multiplicity is less than the algebraic multiplicity of the eigenvalue 1.
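These multiplicity computations are easy to confirm numerically. The following sketch (it
assumes Python with numpy, which is not part of these notes; the variable names are ours)
recomputes the eigenvalues of the matrix A of example (2) above and obtains the geometric
multiplicity of each eigenvalue as n − rank (A − λI):

    import numpy as np

    A = np.array([[3., 0., 0.],
                  [-2., 4., 2.],
                  [-2., 1., 5.]])

    # roots of the characteristic polynomial: 3, 3, 6 (up to ordering)
    print(np.round(np.linalg.eigvals(A), 6))

    # geometric multiplicity of lam = dim null(A - lam*I) = n - rank(A - lam*I)
    for lam in (3.0, 6.0):
        print(lam, 3 - np.linalg.matrix_rank(A - lam * np.eye(3)))   # 3.0 -> 2, 6.0 -> 1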
Remark 7.3
(i) Observe that χA (λ) = χM −1 AM (λ). Thus the characteristic polynomial is an invariant
of similarity. Hence the characteristic polynomial of any linear map f : V −→ V is also
defined (where V is finite dimensional) by choosing some basis for V, and then taking the
characteristic polynomial of the associated matrix M(f ) of f. This definition does not depend
upon the choice of the basis.
(ii) If we expand det (A − λI) we see that there is a term (a11 − λ)(a22 − λ) · · · (ann − λ).
This is the only term which contributes to λn and λn−1 . It follows that the degree of the
characteristic polynomial is exactly equal to n, the size of the matrix; moreover, the coefficient
of the top degree term is equal to (−1)n . Thus in general, it has n complex roots, some of
which may be repeated, some of them real, and so on. All these patterns are going to influence
the geometry of the linear map.
(iii) If A is a real matrix then of course χA (λ) is a real polynomial. That, however, does
not allow us to conclude that it has real roots. So while discussing eigenvalues we should
consider even a real matrix as a complex matrix and keep in mind the associated linear
map Cn −→ Cn . The problem of existence of real eigenvalues and real eigenvectors will be
discussed soon.
(iv) Next, the above observation also shows that the coefficient of λn−1 is equal to
(−1)n−1 (a11 + a22 + · · · + ann ) = (−1)n−1 tr (A).
Lemma 7.1 Suppose A is a real matrix with a real eigenvalue λ. Then there exists a real
column vector v ≠ 0 such that Av = λv.
Proof: Start with Aw = λw where w is a non zero column vector with complex entries.
Write w = v + ıv′ where both v, v′ are real vectors. We then have
        Av + ıAv′ = Aw = λw = λv + ıλv′ .
Compare the real and imaginary parts. Since w ≠ 0, at least one of the two v, v′ must be a
non zero vector and we are done. ♠
        (−1)n λn + (−1)n−1 λn−1 (a11 + . . . + ann ) + . . .                              (48)
Comparing (49) and (51), we get that the constant term of det (A − λI) is equal to λ1 λ2 . . . λn =
det A and tr (A) = a11 + a22 + . . . + ann = λ1 + λ2 + . . . + λn . ♠
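The relations tr (A) = λ1 + · · · + λn and det A = λ1 λ2 · · · λn are easy to test on the example
above; a minimal numerical check (again assuming Python with numpy) is:

    import numpy as np

    A = np.array([[3., 0., 0.], [-2., 4., 2.], [-2., 1., 5.]])
    ev = np.linalg.eigvals(A)
    print(np.isclose(ev.prod(), np.linalg.det(A)))   # product of eigenvalues = det A
    print(np.isclose(ev.sum(), np.trace(A)))         # sum of eigenvalues = tr A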
λ1 (c1 v1 + c2 v2 + . . . + ck vk ) − (λ1 c1 v1 + λ2 c2 v2 + . . . + λk ck vk )
= (λ1 − λ2 )c2 v2 + (λ1 − λ3 )c3 v3 + . . . + (λ1 − λk )ck vk = 0
C −1 AC = D(λ1 , . . . , λn ) = D
7.3 Relation Between Algebraic and Geometric Multiplicities
Recall that the algebraic multiplicity of an eigenvalue µ of A is its multiplicity as a root of
the characteristic polynomial χA (λ), and that the geometric multiplicity of µ is dim EA (µ).
Proposition 7.5 Both the algebraic and the geometric multiplicities are invariants
of similarity.
Proof: We have already seen that for any invertible matrix C, χA (λ) = χC −1 AC (λ). Thus
the invariance of algebraic multiplicity is clear. On the other hand check that EC −1 AC (λ) =
C −1 (EA (λ)). Therefore, dim (EC −1 AC (λ)) = dim C −1 (EA (λ)) = dim (EA (λ)), the last equality
being the consequence of invertibility of C.
♠
We have observed in a few examples that the geometric multiplicity of an eigenvalue is
at most its algebraic multiplicity. This is true in general.
Proposition 7.6 Let A be an n×n matrix. Then the geometric multiplicity of an eigenvalue
µ of A is less than or equal to the algebraic multiplicity of µ.
Proof: Put aA (µ) = k. Then (λ − µ)k divides det (A − λI) but (λ − µ)k+1 does not.
Let gA (µ) = g, be the geometric multiplicity of µ. Then EA (µ) has a basis consisting
of g eigenvectors v1 , v2 , . . . , vg . We can extend this basis of EA (µ) to a basis of Cn , say
{v1 , v2 , . . . , vg , . . . , vn }. Let B be the matrix whose jth column is B j = vj . Then B is an
invertible matrix and
        B −1 AB =  [ µIg   X
                      0    Y ]
where X is a g × (n − g) matrix and Y is an (n − g) × (n − g) matrix. Therefore,
        det (A − λI) = det (B −1 AB − λI) = (µ − λ)g det (Y − λIn−g ).
Hence (λ − µ)g divides det (A − λI). Thus g ≤ k. ♠
Remark 7.4 We will now be able to say something about the diagonalizability of a given
matrix A. Assuming that there exists B such that B −1 AB = D(λ1 , . . . , λn ), as seen in the
previous proposition, it follows that AB = BD, i.e., AB i = λi B i where B i denotes the
ith column vector of B. Thus we need not hunt for B anywhere but look for eigenvectors of
A. Of course the B i are linearly independent, since B is invertible. Now the problem reduces to
the question whether we have n linearly independent eigenvectors of A so that they can be
chosen for the columns of B. The previous proposition took care of one such case, viz., when
the eigenvalues are distinct. In general, however, the eigenvalues need not be distinct. Observe that the
geometric multiplicity and algebraic multiplicity of an eigenvalue coincide for a diagonal
matrix. Since these concepts are similarity invariants, it is necessary that the same is true
for any matrix which is diagonalizable. This turns out to be sufficient also. The following
theorem gives the correct condition for diagonalization.
Theorem 7.1 An n × n matrix A is diagonalizable if and only if, for each eigenvalue of A,
the geometric multiplicity is equal to the algebraic multiplicity.
Proof: We have already seen the necessity of the condition. To prove the converse, suppose
that the two multiplicities coincide for each eigenvalue. Suppose that λ1 , λ2 , . . . , λk are all
the eigenvalues of A with algebraic multiplicities n1 , n2 , . . . , nk . Let
Lectures 19,20,21
For u, v ∈ Cn the inner product is given by
        ⟨u, v⟩ = u∗ v = ū1 v1 + ū2 v2 + · · · + ūn vn ,
where u∗ denotes the conjugate transpose of u. The length of u is given by
        ‖u‖ = √(|u1 |2 + · · · + |un |2 ).
A square matrix A is called Hermitian if A∗ = A, and skew Hermitian if A∗ = −A.
Remark 7.5
(i) If A is real then A = A∗ means A = At . Hence real symmetric matrices are Hermitian.
Likewise a real skew Hermitian matrix is skew symmetric.
(ii) A is Hermitian iff ıA is skew Hermitian.
Proposition 7.7 Let A be an n × n Hermitian matrix. Then:
(1) u∗ Au is real for every u ∈ Cn .
(2) Every eigenvalue of A is real.
(3) Eigenvectors corresponding to distinct eigenvalues of A are orthogonal.
Proof: (1) Since u∗ Au is a complex number, to prove it is real, we prove that (u∗ Au)∗ =
u∗ Au. But (u∗ Au)∗ = u∗ A∗ (u∗ )∗ = u∗ A∗ u = u∗ Au. Hence u∗ Au is real for all u ∈ Cn .
(2) Suppose λ is an eigenvalue of A and u is an eigenvector for λ. Then
        u∗ Au = u∗ (λu) = λ u∗ u = λ ‖u‖2 .
Since u∗ Au is real and ‖u‖2 is a nonzero real number, it follows that λ is real.
(3) Let λ and µ be two distinct eigenvalues of A and u and v be corresponding eigenvec-
tors. Then Au = λu and Av = µv. Hence
        µ u∗ v = u∗ (Av) = (A∗ u)∗ v = (Au)∗ v = (λu)∗ v = λ u∗ v,
since λ is real. As λ ≠ µ, this forces u∗ v = 0, i.e., u and v are orthogonal. ♠
Corollary 7.1 Let A be an n × n skew Hermitian matrix. Then:
(1) u∗ Au is either zero or purely imaginary for every u ∈ Cn .
(2) Every eigenvalue of A is either zero or purely imaginary.
(3) Eigenvectors corresponding to distinct eigenvalues of A are orthogonal.
Proof: All this follows straightaway from the corresponding statement about Hermitian
matrices, once we note that A is skew Hermitian implies ıA is Hermitian and the fact that a
complex number c is real iff ıc is either zero or purely imaginary. ♠
A square matrix U is called unitary if U ∗ U = UU ∗ = I; a real unitary matrix is called
orthogonal. Thus a real matrix A is orthogonal iff At = A−1 . Also observe that A is unitary
iff At is unitary iff Ā is unitary iff A∗ is unitary.
Proposition 7.8 Let U be a square matrix. Then the following conditions are equivalent.
(i) U is unitary.
(ii) The rows of U form an orthonormal set of vectors.
(iii) The columns of U form an orthonormal set of vectors.
(iv) U preserves the inner product, i.e., for all vectors x, y ∈ Cn , we have ⟨Ux, Uy⟩ = ⟨x, y⟩.
Proof: Let u1 , u2 , . . . , un denote the column vectors of U, so that U = [u1 u2 . . . un ]. Hence
        U ∗ U  =  [ u∗1
                    u∗2
                    ...
                    u∗n ]  [u1 u2 . . . un ]
               =  [ u∗1 u1   u∗1 u2   · · ·   u∗1 un
                    u∗2 u1   u∗2 u2   · · ·   u∗2 un
                      ...      ...    · · ·     ...
                    u∗n u1   u∗n u2   · · ·   u∗n un ] .
Thus U ∗ U = I iff u∗i uj = 0 for i ≠ j and u∗i ui = 1 for i = 1, 2, . . . , n, iff the column vectors
of U form an orthonormal set. This proves (i) ⇐⇒ (iii). Since U ∗ U = I implies UU ∗ = I,
the proof of (i) ⇐⇒ (ii) follows.
To prove (i) ⇐⇒ (iv), let U be unitary. Then U ∗ U = I and hence ⟨Ux, Uy⟩ = ⟨x, U ∗ Uy⟩ =
⟨x, y⟩. Conversely, if U preserves the inner product, take x = ei and y = ej to get
        ⟨Uei , Uej ⟩ = ⟨ei , ej ⟩ = δij ,
where δij are the Kronecker symbols (δij = 1 if i = j; = 0 otherwise). This means the (i, j)th
entry of U ∗ U is δij . Hence U ∗ U = In . ♠
Remark 7.6 Observe that the above proposition is also valid for an orthogonal matrix: simply
apply it to a real matrix, with the inner product of Rn .
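Conditions (i) and (iv) of the proposition are also easy to test numerically for a concrete
unitary matrix. The sketch below (assuming Python with numpy; the matrix U and the vectors
are ours) builds U with orthonormal columns and checks U ∗ U = I and preservation of the
inner product:

    import numpy as np

    t = 1 / np.sqrt(2)
    U = np.array([[t, t],
                  [1j * t, -1j * t]])                 # the columns form an orthonormal set in C^2
    print(np.allclose(U.conj().T @ U, np.eye(2)))     # U*U = I, i.e. U is unitary

    x = np.array([1. + 2j, -1j])
    y = np.array([0.5, 3. - 1j])
    lhs = (U @ x).conj() @ (U @ y)                    # <Ux, Uy>
    rhs = x.conj() @ y                                # <x, y>
    print(np.isclose(lhs, rhs))                       # the inner product is preserved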
7.5 Spectral Theorem and Applications
Similarity does not necessarily preserve the distance. In terms of matrices, this may be
noticed in the fact that an arbitrary conjugate C −1 AC of a Hermitian matrix may not be
Hermitian. Thus the diagonalization problem for special matrices such as Hermitian matrices
needs a special treatment viz., we need to restrict C to those matrices which preserve the
inner product. In this section we shall first establish an important result that says that a
Hermitian matrix can be diagonalized using a unitary transformation. We shall then see
some applications of this to geometry.
Closely related to the problem of diagonalization is the problem of triangularization. We
shall use this concept as a stepping stone toward the solution of the diagonalization problem.
Definition 7.8 Two n × n matrices A and B are said to be congruent if there exists a
unitary matrix C such that C ∗ AC = B.
Definition 7.9 We say A is triangularizable if there exists an invertible matrix C such that
C −1 AC is upper triangular.
Remark 7.7 Obviously all diagonalizable matrices are triangularizable. The following result
says that triangularizability poses no problem at all:
Proposition 7.9 Over the complex numbers every square matrix is congruent to an upper
triangular matrix.
Proof: Let A be an n × n matrix with complex entries. We have to find a unitary matrix C
such that C ∗ AC is upper triangular. We shall prove this by induction on n. For n = 1 there is
nothing to prove. Assume the result for n − 1. Let µ be an eigenvalue of A and let v1 ∈ Cn
be such that Av1 = µv1 . (Here we need to work with complex numbers; real numbers won’t
do. Why?) We can choose v1 to be of norm 1. We can then complete it to an orthonormal
basis {v1 , . . . , vn }. (Here we use Gram-Schmidt.) We then take C1 to be the matrix whose
ith column is vi . Then as seen earlier, C1 is a unitary matrix. Put A1 = C1−1 AC1 . Then
A1 e1 = C1−1 AC1 e1 = C1−1 Av1 = µC1−1 (v1 ) = µe1 . This shows that the first column of A1
has all entries zero except the first one which is equal to µ. Let B be the matrix obtained
from A1 by deleting the first row and the first column, so that A1 is a block matrix of
the form
        A1 =  [ µ       . . .
                0n−1      B   ]
where 0n−1 is the column of zeros of size n − 1.
By induction there exists an (n − 1) × (n − 1) unitary matrix M such that M −1 BM is an
upper triangular matrix. Put
        M1 =  [ 1       0tn−1
                0n−1      M   ] .
Then M1 is unitary and hence C = C1 M1 is also unitary. Clearly C −1 AC = M1−1 C1−1 AC1 M1 =
M1−1 A1 M1 , which is of the form
        [ 1      0tn−1     [ µ      . . .     [ 1      0tn−1           [ µ         . . .
          0n−1   M −1  ]     0n−1     B  ]      0n−1     M   ]    =      0n−1    M −1 BM ] ,
an upper triangular matrix. This completes the induction. ♠
Remark 7.8 Assume now that A is a real matrix with all its eigenvalues real. Then from
Lemma 7.1, it follows that we can choose the eigenvector v1 to be a real vector and then
complete it to an orthonormal basis of Rn . Thus the matrix C1 corresponding to this basis will
have real entries. By induction M will have real entries and hence the product C = C1 M1 will
also have real entries. Thus we have proved:
Proposition 7.10 For a real square matrix A with all its eigenvalues real, there exists an
orthogonal matrix C such that C t AC is upper triangular.
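Triangularization by a unitary matrix is exactly what the Schur decomposition provides. As
an illustration only (this assumes Python with numpy and scipy, which are not part of these
notes), one can compute such a C for a concrete matrix and verify the conclusion of
Proposition 7.9:

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[1., 2.], [3., 4.]])
    T, C = schur(A, output='complex')               # A = C T C*, with C unitary, T upper triangular
    print(np.allclose(C.conj().T @ A @ C, T))       # C* A C is upper triangular
    print(np.allclose(C.conj().T @ C, np.eye(2)))   # C is unitary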
A square matrix A is called normal if AA∗ = A∗ A.
Remark 7.9
(i) Normality is a congruence invariant. This means that if C is unitary and A is normal then
C −1 AC is also normal. This is easy to verify.
(ii) Any diagonal matrix is normal. Therefore it follows that normality is necessary for
diagonalization by a unitary matrix. Amazingly, it turns out to be sufficient as well. That is
the reason to define this concept.
(iii) "Observe# that product
"
of
#
two normal matrices may not be normal. For example take
0 1 1 0
A= ; B= .
1 0 0 2
(iv) Certainly Hermitian matrices are normal. Of course there are normal matrices which
are not Hermitian. For example take
        A =  [ ı    0
               0   −ı ] .
Lemma 7.3 For a normal matrix A we have ‖Ax‖2 = ‖A∗ x‖2 for all x ∈ Cn .
Proof: ‖Ax‖2 = ⟨Ax, Ax⟩ = (Ax)∗ (Ax) = x∗ A∗ Ax = x∗ AA∗ x = (A∗ x)∗ (A∗ x) = ⟨A∗ x, A∗ x⟩ =
‖A∗ x‖2 . ♠
Corollary 7.2 Let A be a normal matrix and let v be an eigenvector of A with eigenvalue µ.
Then v is an eigenvector of A∗ with eigenvalue µ̄.
Proof: Observe that if A is normal then A − µI is also normal. Now (A − µI)(v) = 0 iff
‖(A − µI)(v)‖ = 0 iff ‖(A − µI)∗ (v)‖ = 0 iff (A∗ − µ̄I)(v) = 0. ♠
Proposition 7.11 An upper triangular normal matrix is a diagonal matrix.
Proof: Let A be an upper triangular normal matrix. Inductively we shall show that aij = 0
for j > i. We have Ae1 = a11 e1 . Hence ‖Ae1 ‖2 = |a11 |2 . On the other hand, by Lemma 7.3 this
is equal to ‖A∗ e1 ‖2 = |a11 |2 + |a12 |2 + · · · + |a1n |2 . Hence a12 = a13 = · · · = a1n = 0. Inductively
suppose we have shown aij = 0 for j > i for all 1 ≤ i ≤ k − 1. Then it follows that Aek = akk ek .
Exactly as in the first case, this implies that ‖A∗ ek ‖2 = |ak,k |2 + |ak,k+1 |2 + · · · + |ak,n |2 equals
‖Aek ‖2 = |ak,k |2 . Hence ak,k+1 = · · · = ak,n = 0. ♠
Why all this fuss? Well for one thing, we now have a big theorem. All we have to do is
to combine propositions 7.9 and 7.11.
Theorem 7.2 Spectral Theorem Given any normal matrix A, there exists a unitary ma-
trix C such that C ∗ AC is a diagonal matrix.
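For a Hermitian matrix the theorem can be illustrated directly with numpy's eigh routine,
which returns real eigenvalues and an orthonormal set of eigenvectors (a sketch assuming
Python with numpy; the particular matrix is ours):

    import numpy as np

    A = np.array([[2., 1. - 1j],
                  [1. + 1j, 3.]])                       # A* = A, hence A is normal
    w, C = np.linalg.eigh(A)                            # real eigenvalues w, unitary C
    print(np.allclose(C.conj().T @ A @ C, np.diag(w)))  # C* A C is diagonal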
Corollary 7.3 Every Hermitian matrix A is congruent to a diagonal matrix. A real sym-
metric matrix is real-congruent to a diagonal matrix.
Proof: For the first statement we simply observe that a Hermitian matrix is normal and
apply the above theorem. For the second statement, we first recall that for a real symmetric
matrix, all eigenvalues are real. Hence Proposition 7.10 is applicable and so we can
choose C to be an orthogonal matrix. Along with Proposition 7.11, this gives the result. ♠
Quadratic forms and their diagonalization
Let A = (aij ) be an n × n real matrix and X = (x1 , x2 , . . . , xn )t . The function
        Q(X) = Σ aij xi xj      (sum over 1 ≤ i, j ≤ n)
is called the quadratic form associated with A. If A = diag (λ1 , λ2 , . . . , λn ) then Q(X) =
λ1 x1² + λ2 x2² + · · · + λn xn² is called a diagonal form.
Proposition 7.12 Q(X) = [x1 , x2 , . . . , xn ] A (x1 , x2 , . . . , xn )t = X t AX where X = (x1 , x2 , . . . , xn )t .
Proof:
        [x1 , x2 , . . . , xn ] A X  =  [x1 , x2 , . . . , xn ]  [ Σj a1j xj
                                                                     ...
                                                                   Σj anj xj ]
                                     =  (Σj a1j xj ) x1 + · · · + (Σj anj xj ) xn
                                     =  Σi Σj aij xi xj
                                     =  Q(X).                                            ♠
Proposition 7.13 For any n × n matrix A and the column vector X = (x1 , x2 , . . . , xn )t ,
        X t AX = X t BX    where    B = (1/2)(A + At ).
Hence every quadratic form is associated with a symmetric matrix.
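A quick numerical illustration of Proposition 7.13 (assuming Python with numpy; the matrix
and the vector are ours):

    import numpy as np

    A = np.array([[2., 4.], [0., 5.]])          # a non-symmetric matrix
    B = (A + A.T) / 2                           # its symmetric part
    X = np.array([1.7, -0.3])
    print(np.isclose(X @ A @ X, X @ B @ X))     # the two quadratic forms agree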
Theorem 7.3 Let X t AX be a quadratic form associated with a real symmetric matrix A.
Let U be an orthogonal matrix such that U t AU = diag (λ1 , λ2 , . . . , λn ). Then
        X t AX = Y t (U t AU)Y = λ1 y1² + λ2 y2² + · · · + λn yn² ,
where
        X = (x1 , x2 , . . . , xn )t = U (y1 , y2 , . . . , yn )t = UY.
Example 7.5 Let us determine the orthogonal matrix U which reduces the quadratic form
Q(X) = 2x² + 4xy + 5y² to a diagonal form. We write
        Q(X) = [x, y]  [ 2   2     [ x
                         2   5 ]     y ]   =  X t AX.
" # " # " #
2 1 x u
Hence U = √1 . The change of variables equations are = U . The
5 −1 2 y v
diagonal form is: " #
1 0
[u, v] [u, v]T = u2 + 6v 2 .
0 6
Check that U t AU = diag (1, 6).
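This check can also be carried out numerically (assuming Python with numpy):

    import numpy as np

    A = np.array([[2., 2.], [2., 5.]])
    U = (1 / np.sqrt(5)) * np.array([[2., 1.], [-1., 2.]])
    print(np.allclose(U.T @ A @ U, np.diag([1., 6.])))   # U^t A U = diag(1, 6)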
Consider now the general second degree equation in two variables
        ax² + bxy + cy² + dx + ey + f = 0.                                               (53)
It can be proved that this equation represents one of the following: (i) the empty set
(ii) a single point (iii) one or two straight lines (iv) an ellipse (v) a hyperbola and (vi) a
parabola. The second degree part of (53), namely ax² + bxy + cy², is a quadratic form.
This determines the type of the conic. We can write the equation (53) in matrix form:
" # " # " #
a b/2 x x
[x, y] + [d, e] +f = 0 (54)
b/2 c y y
" #
a b/2
Write A = . Let U = [v1 , v2 ] be an orthogonal matrix whose column vectors v1
b/2 c
and v2 are eigenvectors of A with eigenvalues λ1 and λ2 . Apply the change of variables
        X =  [ x     =  U  [ u
               y ]           v ]
to diagonalize the quadratic form Q(x, y) to the diagonal form λ1 u² + λ2 v² . The orthonormal
basis {v1 , v2 } determines a new set of coordinate axes with respect to which the locus of the
equation [x, y]A[x, y]t + B[x, y]t + f = 0 with B = [d, e] is the same as the locus of the equation
        λ1 u² + λ2 v² + B U [u, v]t + f = 0.                                             (55)
If the conic determined by (55) is not degenerate, i.e., not an empty set, a point, nor line(s),
then the signs of λ1 and λ2 determine whether it is a parabola, a hyperbola or an ellipse. The
equation (53) will represent (1) an ellipse if λ1 λ2 > 0, (2) a hyperbola if λ1 λ2 < 0, (3) a
parabola if λ1 λ2 = 0.
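This classification is easy to automate: compute the eigenvalues of the symmetric matrix of
the quadratic part and inspect the sign of their product. A small sketch (assuming Python
with numpy; the function name is ours), applied to the two examples below:

    import numpy as np

    def conic_type(a, b, c):
        # quadratic part a x^2 + b xy + c y^2, with symmetric matrix [[a, b/2], [b/2, c]]
        l1, l2 = np.linalg.eigvalsh(np.array([[a, b / 2], [b / 2, c]]))
        if l1 * l2 > 0:
            return "ellipse"
        if l1 * l2 < 0:
            return "hyperbola"
        return "parabola"

    print(conic_type(2, 4, 5))      # ellipse   (first example below)
    print(conic_type(2, -4, -1))    # hyperbola (second example below)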
Example 7.6
(1) 2x² + 4xy + 5y² + 4x + 13y − 1/4 = 0.
We have earlier diagonalized the quadratic form 2x² + 4xy + 5y² . The associated symmetric
matrix, the eigenvectors and eigenvalues are displayed in the equation of diagonalization:
        U t AU  =  (1/√5) [ 2   −1     [ 2   2     (1/√5) [ 2    1        =  [ 1   0
                            1    2 ]     2   5 ]          −1    2 ]            0   6 ] .
Set t = 1/√5 for convenience. Then the change of coordinates equations are:
        [ x      =  [ 2t    t      [ u
          y ]        −t    2t ]      v ] ,
i.e., x = t(2u + v) and y = t(−u + 2v). Substitute these into the original equation to get
        u² + 6v² − √5 u + 6√5 v − 1/4 = 0.
Complete the square to write this as
        (u − √5/2)² + 6(v + √5/2)² = 9.
This is an equation of an ellipse with center (√5/2, −√5/2) in the uv-plane. The u-axis and
v-axis are determined by the eigenvectors v1 and v2 as indicated in the following figure:
(Figure: the u- and v-axes, in the directions of v1 and v2 , drawn together with the x- and y-axes.)
" #
2 2 2 −2
(2) 2x − 4xy − y − 4x + 10y − 13 = 0. Here, the matrix A = gives the quadratic
−2 −1
part of the equation. We write the equation in matrix form as
" #" # " #
2 −2 x x
[x, y] + [−4, 10] − 13 = 0.
−2 −1 y y
Let t = 1/√5. The eigenvalues of A are λ1 = 3, λ2 = −2. An orthonormal set of eigen-
vectors is v1 = t(2, −1)t and v2 = t(1, 2)t . Put
        U = t  [ 2   1        and    [ x      =  U  [ u
                −1   2 ]              y ]            v ] .
Then the equation becomes
        3u² − 2v² + [−4, 10] U [u, v]t − 13 = 0,
or
        3u² − 2v² − 18tu + 16tv − 13 = 0.
Complete the square in u and v to get
        3(u − 3t)² − 2(v − 4t)² = 12,
or
        (u − 3t)²/4 − (v − 4t)²/6 = 1.
This represents a hyperbola with center (3t, 4t) in the uv-plane. The eigenvectors v1 and v2
determine the directions of positive u and v axes.
(Figure: the hyperbola and the u- and v-axes drawn together with the x- and y-axes.)
The equation in the uv-plane is u² + v = 0. This is an equation of a parabola with its vertex
at the origin.
Quadric Surfaces
Let A be a 3 × 3 real symmetric matrix. The locus of the equation
        [x, y, z] A [x, y, z]t + [b1 , b2 , b3 ] [x, y, z]t + c = 0                      (56)
in three variables is called a quadric surface. We can carry out an analysis similar to the
one for quadratic forms in two variables to bring (56) into standard form. Degenerate cases
may arise. But the primary cases are :
Equation                              Surface                    Signs of eigenvalues of A
x²/a² + y²/b² + z²/c² = 1             ellipsoid                  all three positive
x²/a² + y²/b² − z/c = 0               elliptic paraboloid        two positive, one zero
x²/a² + y²/b² − z²/c² = 0             elliptic cone              two positive, one negative
x²/a² + y²/b² − z²/c² = 1             1-sheeted hyperboloid      two positive, one negative
x²/a² − y²/b² − z²/c² = 1             2-sheeted hyperboloid      one positive, two negative
x²/a² − y²/b² − z/c = 0               hyperbolic paraboloid      one positive, one negative, one zero
Example 7.7
(1) 7x2 + 7y 2 − 2z 2 + 20yz − 20zx − 2xy = 36.
The matrix form is:
        [x, y, z]  [  7   −1   −10     [ x
                     −1    7    10       y      =  36.
                    −10   10    −2 ]     z ]
Let A be the 3 × 3 matrix appearing in this equation. The eigenvalues of A are λ1 = 6, λ2 =
−12 and λ3 = 18. An orthonormal set of eigenvectors is given by the column vectors of the
orthogonal matrix
        U =  [ 1/√2    1/√6    1/√3
               1/√2   −1/√6   −1/√3
                 0     2/√6   −1/√3 ]       and      U t AU =  [ 6     0    0
                                                                 0   −12    0
                                                                 0     0   18 ] .
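The eigen-data of this example can be confirmed numerically (assuming Python with numpy):

    import numpy as np

    A = np.array([[ 7., -1., -10.],
                  [-1.,  7.,  10.],
                  [-10., 10.,  -2.]])
    U = np.column_stack([np.array([1., 1., 0.]) / np.sqrt(2),
                         np.array([1., -1., 2.]) / np.sqrt(6),
                         np.array([1., -1., -1.]) / np.sqrt(3)])
    print(np.allclose(U.T @ A @ U, np.diag([6., -12., 18.])))   # True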
The eigenvalues of A are 3, 3, −3. Since A is diagonalizable the eigenspace E(3) is 2-dimensional.
We find a basis of E(3) first. The eigenvectors for λ = 3 are solutions of
        [ −2    2   −2     [ x        [ 0
            2   −2    2       y    =     0
           −2    2   −2 ]     z ]        0 ] .
E(3) = {(y − z, y, z) | y, z ∈ R}
= L({u1 = (0, 1, 1)t , u2 = (−1, 1, 2)t }).