Basics of Algebra and Analysis For Computer Science
Jean Gallier
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail: jean@cis.upenn.edu
© Jean Gallier
February 9, 2011
Contents
1 Introduction
2 Linear Algebra
2.1 Groups, Rings, and Fields
2.2 Vector Spaces
2.3 Linear Independence, Subspaces
2.4 Bases of a Vector Space
2.5 Linear Maps
2.6 Quotient Spaces
2.7 Matrices
2.8 Direct Sums
2.9 The Dual Space E∗ and Linear Forms
2.10 Hyperplanes and Linear Forms
2.11 Transpose of a Linear Map and of a Matrix
3 Determinants
3.1 Permutations, Signature of a Permutation
3.2 Alternating Multilinear Maps
3.3 Definition of a Determinant
3.4 Inverse Matrices and Determinants
3.5 Systems of Linear Equations and Determinant
3.6 Determinant of a Linear Map
3.7 Gaussian Elimination, LU and Cholesky Factorization
3.8 Further Readings
6 Topology
6.1 Metric Spaces and Normed Vector Spaces
6.2 Topological Spaces
6.3 Continuous Functions, Limits
6.4 Continuous Linear and Multilinear Maps
6.5 Normed Affine Spaces
6.6 Further Readings
Bibliography
Chapter 1
Introduction
Chapter 2
Linear Algebra
Example 2.1
2. The set Q of rational numbers is a group under addition, with identity element 0. The
set Q∗ = Q − {0} is also a group under multiplication, with identity element 1.
3. Similarly, the sets R of real numbers and C of complex numbers are groups under
addition (with identity element 0), and R∗ = R − {0} and C∗ = C − {0} are groups
under multiplication (with identity element 1).
4. The sets Rn and Cn of n-tuples of real or complex numbers are groups under componentwise addition:
(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ),
with identity element (0, . . . , 0). All these groups are abelian.
5. Given any nonempty set S, the set of bijections f : S → S, also called permutations
of S, is a group under function composition (i.e., the multiplication of f and g is the
composition g ◦ f ), with identity element the identity function idS . This group is not
abelian as soon as S has more than two elements.
6. The set of n × n matrices with real (or complex) coefficients is a group under addition
of matrices, with identity element the null matrix. It is denoted by Mn (R) (or Mn (C)).
7. The set R[X] of polynomials in one variable with real coefficients is a group under
addition of polynomials.
8. The set of n × n invertible matrices with real (or complex) coefficients is a group under
matrix multiplication, with identity element the identity matrix In . This group is
called the general linear group and is usually denoted by GL(n, R) (or GL(n, C)).
9. The set of n × n invertible matrices with real (or complex) coefficients and determinant
+1 is a group under matrix multiplication, with identity element the identity matrix
In . This group is called the special linear group and is usually denoted by SL(n, R)
(or SL(n, C)).
10. The set of n × n invertible matrices R with real coefficients such that RR^T = In and of determinant +1 is a group called the special orthogonal group and is usually denoted by SO(n) (where R^T is the transpose of the matrix R, i.e., the rows of R^T are the columns of R). It corresponds to the rotations in Rn .
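For readers who like to experiment, the defining conditions RR^T = In and det(R) = +1 can be checked numerically; the following is a small Python sketch using NumPy, where the 2 × 2 rotation by an arbitrary angle is taken as an example.

    import numpy as np

    theta = 0.3                                   # an arbitrary angle
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    # R is a rotation of R^2: R R^T = I_2 and det(R) = +1
    print(np.allclose(R @ R.T, np.eye(2)))        # True
    print(np.isclose(np.linalg.det(R), 1.0))      # True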
11. Given an open interval ]a, b[, the set C(]a, b[) of continuous functions f : ]a, b[ → R is a group under the operation f + g defined such that
(f + g)(x) = f (x) + g(x)
for all x ∈ ]a, b[.
The groups Z, Q, R, C, R[X], and Mn (R) are more than abelian groups: they are also commutative rings. Furthermore, Q, R, and C are fields. We now introduce rings and fields.
Definition 2.2 A ring is a set A together with two operations +: A × A → A (called addition) and ∗: A × A → A (called multiplication), and two elements 0 and 1, having the following properties for all a, b, c ∈ A:
a + (b + c) = (a + b) + c (associativity of +) (2.1)
a + b = b + a (commutativity of +) (2.2)
a + 0 = 0 + a = a (zero) (2.3)
a + (−a) = (−a) + a = 0 (additive inverse) (2.4)
a ∗ (b ∗ c) = (a ∗ b) ∗ c (associativity of ∗) (2.5)
a ∗ 1 = 1 ∗ a = a (identity for ∗) (2.6)
(a + b) ∗ c = (a ∗ c) + (b ∗ c) (distributivity) (2.7)
a ∗ (b + c) = (a ∗ b) + (a ∗ c) (distributivity) (2.8)
The ring A is commutative if a ∗ b = b ∗ a for all a, b ∈ A. These axioms imply that
a ∗ 0 = 0 ∗ a = 0 (2.9)
for all a ∈ A.
Note that (2.9) implies that if 1 = 0, then a = 0 for all a ∈ A, and thus, A = {0}. The ring A = {0} is called the trivial ring. A ring for which 1 ≠ 0 is called nontrivial. The multiplication a ∗ b of two elements a, b ∈ A is often denoted by ab.
Example 2.2
1. The additive groups Z, Q, R, C, are commutative rings.
2. The group R[X] of polynomials in one variable with real coefficients is a ring under
multiplication of polynomials. It is a commutative ring.
3. The group of n × n matrices Mn (R) is a ring under matrix multiplication. However, it
is not a commutative ring.
4. The group C(]a, b[) of continuous functions f : ]a, b[→ R is a ring under the operation
f · g defined such that
(f · g)(x) = f (x)g(x)
for all x ∈]a, b[.
When ab = 0 for a ≠ 0 and b ≠ 0, we say that a is a zero divisor (and so is b). A ring A is an integral domain (or an entire ring) if 0 ≠ 1, A is commutative, and ab = 0 implies that a = 0 or b = 0, for all a, b ∈ A. In other words, an integral domain is a nontrivial
commutative ring with no zero divisors.
Example 2.3
1. The rings Z, Q, R, C, are integral domains.
2. The ring R[X] of polynomials in one variable with real coefficients is an integral domain.
3. For any positive integer, p ∈ N, define a relation on Z, denoted m ≡ n (mod p), as
follows:
m ≡ n (mod p) iff m − n = kp for some k ∈ Z.
The reader will easily check that this is an equivalence relation, and, moreover, it is
compatible with respect to addition and multiplication, which means that if m1 ≡
n1 (mod p) and m2 ≡ n2 (mod p), then m1 + m2 ≡ n1 + n2 (mod p) and m1 m2 ≡
n1 n2 (mod p). Consequently, we can define an addition operation and a multiplication
operation of the set of equivalence classes (mod p):
[m] + [n] = [m + n]
and
[m] · [n] = [mn].
Again, the reader will easily check that the ring axioms are satisfied, with [0] as zero
and [1] as multiplicative unit. The resulting ring is denoted by Z/pZ.1 Observe that
if p is composite, then this ring has zero-divisors. For example, if p = 4, then we have
2 · 2 ≡ 0 (mod 4).
1 The notation Zp is sometimes used instead of Z/pZ, but it clashes with the notation for the p-adic integers, so we prefer not to use it.
However, the reader should prove that Z/pZ is an integral domain if p is prime (in
fact, it is a field).
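The arithmetic of Z/pZ is easy to experiment with, representing the class [m] by the remainder of m modulo p; the following Python sketch illustrates the zero divisor in Z/4Z mentioned above and a multiplicative inverse in Z/5Z (the brute-force search for the inverse is only for illustration).

    p = 4
    # zero divisors in Z/4Z: [2] * [2] = [0] although [2] != [0]
    print((2 * 2) % p)                            # 0

    q = 5
    # when q is prime, every nonzero class is invertible, so Z/qZ is a field
    inv3 = next(x for x in range(1, q) if (3 * x) % q == 1)
    print(inv3)                                   # 2, since 3 * 2 = 6 = 1 (mod 5)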
4. The ring of n × n matrices Mn (R) is not an integral domain. It has zero divisors (see
Example 2.12).
Definition 2.3 Given two rings A and B, a homomorphism between A and B is a function
h: A → B satisfying the following conditions for all x, y ∈ A:
h(x + y) = h(x) + h(y),
h(xy) = h(x)h(y),
h(1) = 1.
Example 2.4
1. If A is a ring, for any integer n ∈ Z, for any a ∈ A, we define n · a by
n · a = a + · · · + a (n summands)
if n ≥ 0 (with 0 · a = 0) and
n · a = −(−n) · a
if n < 0. Then, the map h: Z → A given by
h(n) = n · 1A
is a ring homomorphism.
2. Given any λ ∈ R, the evaluation map ηλ : R[X] → R defined such that
ηλ (f (X)) = f (λ)
for every polynomial f (X) ∈ R[X] is a ring homomorphism.
Definition 2.4 A set K is a field if it is a ring and the following properties hold:
(F1) 0 ≠ 1;
(F2) K∗ = K − {0} is a group w.r.t. ∗ (i.e., every a ≠ 0 has an inverse w.r.t. ∗);
(F3) ∗ is commutative.
If ∗ is not commutative but (F1) and (F2) hold, we say that we have a skew field (or
noncommutative field ).
Note that we are assuming that the operation ∗ of a field is commutative. This convention
is not universally adopted, but since ∗ will be commutative for most fields we will encounter,
we may as well include this condition in the definition.
Example 2.5
2. The set of (formal) fractions f (X)/g(X) of polynomials f (X), g(X) ∈ R[X], where
g(X) is not the null polynomial, is a field.
3. The ring C∗ (]a, b[) of continuous functions f : ]a, b[ → R such that f (x) ≠ 0 for all x ∈ ]a, b[ is a field.
Definition 2.5 Given a field K, a vector space over K (or K-vector space) is a set E (of vectors) together with two operations +: E × E → E (called vector addition),2 and ·: K × E → E (called scalar multiplication) satisfying the following conditions for all α, β ∈ K and all u, v ∈ E:
(V0) E is an abelian group w.r.t. +, with identity element 0;
(V1) α · (u + v) = (α · u) + (α · v);
(V2) (α + β) · u = (α · u) + (β · u);
(V3) (α ∗ β) · u = α · (β · u);
(V4) 1 · u = u.
Given α ∈ K and v ∈ E, the element α · v is also denoted by αv. The field K is often
called the field of scalars.
Unless specified otherwise or unless we are dealing with several different fields, in the rest
of this chapter, we assume that all K-vector spaces are defined with respect to a fixed field
K. Thus, we will refer to a K-vector space simply as a vector space. In most cases, the field
K will be the field R of reals.
From (V0), a vector space always contains the null vector 0, and thus is nonempty.
From (V1), we get α · 0 = 0, and α · (−v) = −(α · v). From (V2), we get 0 · v = 0, and
(−α) · v = −(α · v). The field K itself can be viewed as a vector space over itself, addition
of vectors being addition in the field, and multiplication by a scalar being multiplication in
the field.
Example 2.6
2. The groups Rn and Cn are vector spaces over R, and Cn is a vector space over C.
3. The ring R[X] of polynomials is a vector space over R, and C[X] is a vector space over
R and C. The ring of n × n matrices Mn (R) is a vector space over R.
4. The ring C ∗ (]a, b[) of continuous functions f : ]a, b[→ R is a vector space over R.
Let E be a vector space. We would like to define the important notions of linear com-
bination and linear independence. These notions can be defined for sets of vectors in E,
but it will turn out to be more convenient to define them for families (vi )i∈I , where I is any
arbitrary index set.
v = λ1 e1 + · · · + λn en ,
1, X, X 2 , . . . , X n , . . .
One might wonder if it is possible for a vector space to have bases of different sizes, or even
to have a finite basis as well as an infinite basis. We will see later on that this is not possible;
all bases of a vector space have the same number of elements (cardinality), which is called
the dimension of the space. However, we have the following problem: If a vector space has
an infinite basis, {e1 , e2 , . . .}, how do we define linear combinations? Do we allow linear
combinations
λ1 e1 + λ2 e2 + · · ·
with infinitely many nonzero coefficients?
If we allow linear combinations with infinitely many nonzero coefficients, then we have
to make sense of these sums and this can only be done reasonably if we define such a sum
as the limit of the sequence of vectors, s1 , s2 , . . . , sn , . . ., with s1 = λ1 e1 and
sn+1 = sn + λn+1 en+1 .
But then, how do we define such limits? Well, we have to define some topology on our space,
by means of a norm, a metric or some other mechanism. This can indeed be done and this
is what Banach spaces and Hilbert spaces are all about but this seems to require a lot of
machinery.
A way to avoid limits is to restrict our attention to linear combinations involving only
finitely many vectors. We may have an infinite supply of vectors but we only form linear
combinations involving finitely many nonzero coefficients. Technically, this can be done by
introducing families of finite support. This gives us the ability to manipulate families of
scalars indexed by some fixed infinite set and yet to treat these families as if they were
finite. With these motivations in mind, let us review the notion of an indexed family.
Given a set A, a family (ai )i∈I of elements of A is simply a function a: I → A.
Remark: When considering a family (ai )i∈I , there is no reason to assume that I is ordered.
The crucial point is that every element of the family is uniquely indexed by an element of
I. Thus, unless specified otherwise, we do not assume that the elements of an index set are
ordered.
If A is an abelian group (usually, when A is a ring or a vector space) with identity 0, we
say that a family (ai )i∈I has finite support if ai = 0 for all i ∈ I − J, where J is a finite
subset of I (the support of the family).
We can deal with an arbitrary set X by viewing it as the family (Xx )x∈X corresponding
to the identity function id: X → X. We agree that when I = ∅, (ai )i∈I = ∅. A family (ai )i∈I
is finite if I is finite.
Given two disjoint sets I and J, the union of two families (ui )i∈I and (vj )j∈J , denoted as
(ui )i∈I ∪ (vj )j∈J , is the family (wk )k∈(I∪J) defined such that wk = uk if k ∈ I, and wk = vk
if k ∈ J. Given a family (ui )i∈I and any element v, we denote by (ui )i∈I ∪k (v) the family
(wi )i∈I∪{k} defined such that, wi = ui if i ∈ I, and wk = v, where k is any index such that
k∈ / I. Given a family (ui )i∈I , a subfamily of (ui )i∈I is a family (uj )j∈J where J is any subset
of I.
In this chapter, unless specified otherwise, it is assumed that all families of scalars have
finite support.
When I = ∅, we stipulate that v = 0. We say that a family (ui )i∈I is linearly independent if
for every family (λi )i∈I of scalars in K,
∑_{i∈I} λi ui = 0 implies that λi = 0 for all i ∈ I.
Equivalently, a family (ui )i∈I is linearly dependent if there is some family (λi )i∈I of scalars in K such that
∑_{i∈I} λi ui = 0 and λj ≠ 0 for some j ∈ I.
A family (ui )i∈I is linearly dependent iff some uj in the family can be expressed as a
linear combination of the other vectors in the family. Indeed, there is some family (λi )i∈I of
scalars in K such that
∑_{i∈I} λi ui = 0 and λj ≠ 0 for some j ∈ I,
and thus uj = ∑_{i∈(I−{j})} (−λj^{−1} λi )ui , showing that uj is a linear combination of the other vectors in the family.
Example 2.7
1. Any two distinct scalars λ, µ ≠ 0 in K are linearly dependent.
2. In R3 , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are linearly independent.
3. In R4 , the vectors (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), and (0, 0, 0, 1) are linearly indepen-
dent.
4. In R2 , the vectors u = (1, 1), v = (0, 1) and w = (2, 3) are linearly dependent, since
w = 2u + v.
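For vectors of Rn given by their coordinates, linear independence can be tested numerically: the family is linearly independent iff the matrix having these vectors as columns has rank equal to the number of vectors. A small NumPy sketch for two of the families above:

    import numpy as np

    # (1,0,0), (0,1,0), (0,0,1) as columns: rank 3, hence independent
    A = np.column_stack([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    print(np.linalg.matrix_rank(A) == 3)          # True

    # u = (1,1), v = (0,1), w = (2,3): rank 2 < 3, hence dependent (w = 2u + v)
    B = np.column_stack([[1, 1], [0, 1], [2, 3]])
    print(np.linalg.matrix_rank(B))               # 2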
Note that a family (ui )i∈I is linearly independent iff (uj )j∈J is linearly independent for every finite subset J of I (even when I = ∅). Indeed, when ∑_{i∈I} λi ui = 0, the family (λi )i∈I of scalars in K has finite support, and thus ∑_{i∈I} λi ui = 0 really means that ∑_{j∈J} λj uj = 0 for a finite subset J of I. When I is finite, we often assume that it is the set I = {1, 2, . . . , n}. In this case, we denote the family (ui )i∈I as (u1 , . . . , un ).
The notion of a subspace of a vector space is defined as follows.
Definition 2.7 Given a vector space E, a subset F of E is a linear subspace (or subspace)
of E if F is nonempty and λu + µv ∈ F for all u, v ∈ F , and all λ, µ ∈ K.
It is easy to see that a subspace F of E is indeed a vector space, since the restriction
of +: E × E → E to F × F is indeed a function +: F × F → F , and the restriction of
·: K × E → E to K × F is indeed a function ·: K × F → F . It is also easy to see that a
subspace F of E is closed under arbitrary linear combinations of vectors from F , and that
any intersection of subspaces is a subspace. Letting λ = µ = 0, we see that every subspace
contains the vector 0. The subspace {0} will be denoted by (0), or even 0 (with a mild abuse
of notation).
Example 2.8
1. In R2 , the set of vectors u = (x, y) such that
x+y =0
is a subspace.
2. In R3 , the set of vectors u = (x, y, z) such that
x+y+z =0
is a subspace.
3. For any n ≥ 0, the set of polynomials f (X) ∈ R[X] of degree at most n is a subspace
of R[X].
4. The set of upper triangular n × n matrices is a subspace of the space of n × n matrices.
Definition 2.8 Given a vector space E and a subspace V of E, a family (vi )i∈I of vectors
vi ∈ V spans V or generates V if for every v ∈ V , there is some family (λi )i∈I of scalars in
K such that
v = ∑_{i∈I} λi vi .
We also say that the elements of (vi )i∈I are generators of V and that V is spanned by (vi )i∈I ,
or generated by (vi )i∈I . If a subspace V of E is generated by a finite family (vi )i∈I , we say
that V is finitely generated . A family (ui )i∈I that spans V and is linearly independent is
called a basis of V .
Example 2.9
1. In R3 , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form a basis.
It is a standard result of linear algebra that every vector space E has a basis, and that
for any two bases (ui )i∈I and (vj )j∈J , I and J have the same cardinality. In particular, if E
has a finite basis of n elements, every basis of E has n elements, and the integer n is called
the dimension of the vector space E. We begin with a crucial lemma.
Lemma 2.1 Given a linearly independent family (ui )i∈I of elements of a vector space E, if
v ∈ E is not a linear combination of (ui )i∈I , then the family (ui )i∈I ∪k (v) obtained by adding
v to the family (ui )i∈I is linearly independent (where k ∉ I).
Proof . Assume that µv + ∑_{i∈I} λi ui = 0, for any family (λi )i∈I of scalars in K. If µ ≠ 0, then µ has an inverse (because K is a field), and thus we have v = −∑_{i∈I} (µ^{−1} λi )ui , showing that v is a linear combination of (ui )i∈I and contradicting the hypothesis. Thus, µ = 0. But then, we have ∑_{i∈I} λi ui = 0, and since the family (ui )i∈I is linearly independent, we have λi = 0 for all i ∈ I.
The next theorem holds in general, but the proof is more sophisticated for vector spaces
that do not have a finite set of generators. Thus, in this chapter, we only prove the theorem
for finitely generated vector spaces.
Theorem 2.2 Given any finite family S = (ui )i∈I generating a vector space E and any
linearly independent subfamily L = (uj )j∈J of S (where J ⊆ I), there is a basis B of E such
that L ⊆ B ⊆ S.
Proof . Consider the set of linearly independent families B such that L ⊆ B ⊆ S. Since this
set is nonempty and finite, it has some maximal element, say B = (uh )h∈H . We claim that
B generates E. Indeed, if B does not generate E, then there is some up ∈ S that is not a linear combination of vectors in B (since S generates E), with p ∉ H. Then, by Lemma 2.1, the family B′ = (uh )h∈H∪{p} is linearly independent, and since L ⊆ B ⊂ B′ ⊆ S, this contradicts the maximality of B. Thus, B is a basis of E.
Remark: Theorem 2.2 also holds for vector spaces that are not finitely generated. In this
case, the problem is to guarantee the existence of a maximal linearly independent family B
such that L ⊆ B ⊆ S. The existence of such a maximal family can be shown using Zorn’s
lemma, see Appendix 8 and the references given there.
The following proposition giving useful properties characterizing a basis is an immediate
consequence of Theorem 2.2.
Proposition 2.3 Given a vector space E, for any family B = (vi )i∈I of vectors of E, the
following properties are equivalent:
(1) B is a basis of E.
(2) B is a maximal linearly independent family of E.
(3) B is a minimal generating family of E.
The following replacement proposition shows the relationship between finite linearly in-
dependent families and finite families of generators of a vector space.
Proposition 2.4 Given a vector space E, let (ui )i∈I be any finite linearly independent family
in E, where |I| = m, and let (vj )j∈J be any finite family such that every ui is a linear
combination of (vj )j∈J , where |J| = n. Then, there exists a set L and an injection ρ: L → J
such that L ∩ I = ∅, |L| = n − m, and the families (ui )i∈I ∪ (vρ(l) )l∈L and (vj )j∈J generate
the same subspace of E. In particular, m ≤ n.
Proof . We proceed by induction on |I| = m. When m = 0, the family (ui )i∈I is empty, and
the proposition holds trivially with L = J (ρ is the identity). Assume |I| = m + 1. Consider
the linearly independent family (ui )i∈(I−{p}) , where p is any member of I. By the induction
hypothesis, there exists a set L and an injection ρ: L → J such that L ∩ (I − {p}) = ∅,
|L| = n − m, and the families (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L and (vj )j∈J generate the same subspace
of E. If p ∈ L, we can replace L by (L − {p}) ∪ {p′ } where p′ does not belong to I ∪ L, and replace ρ by the injection ρ′ which agrees with ρ on L − {p} and such that ρ′ (p′ ) = ρ(p).
Thus, we can always assume that L ∩ I = ∅. Since up is a linear combination of (vj )j∈J
and the families (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L and (vj )j∈J generate the same subspace of E, up is
a linear combination of (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L . Let
up = ∑_{i∈(I−{p})} λi ui + ∑_{l∈L} λl vρ(l) . (1)
If λl = 0 for all l ∈ L, we would have
up − ∑_{i∈(I−{p})} λi ui = 0,
contradicting the fact that (ui )i∈I is linearly independent. Thus, λl ≠ 0 for some l ∈ L, say l = q. Since λq ≠ 0, we have
vρ(q) = ∑_{i∈(I−{p})} (−λq^{−1} λi )ui + λq^{−1} up + ∑_{l∈(L−{q})} (−λq^{−1} λl )vρ(l) . (2)
We claim that the families (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L and (ui )i∈I ∪ (vρ(l) )l∈(L−{q}) generate the same subspace of E. Indeed, the second family is obtained from the first by replacing vρ(q) by up ,
and vice-versa, and up is a linear combination of (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L , by (1), and vρ(q) is a
linear combination of (ui )i∈I ∪(vρ(l) )l∈(L−{q}) , by (2). Thus, the families (ui )i∈I ∪(vρ(l) )l∈(L−{q})
and (vj )j∈J generate the same subspace of E, and the proposition holds for L − {q} and the
restriction of the injection ρ: L → J to L − {q}, since L ∩ I = ∅ and |L| = n − m imply that
(L − {q}) ∩ I = ∅ and |L − {q}| = n − (m + 1).
Actually, one can prove that Proposition 2.4 implies Theorem 2.2 when the vector space
is finitely generated. Putting Theorem 2.2 and Proposition 2.4 together, we obtain the
following fundamental theorem.
Theorem 2.5 Let E be a finitely generated vector space. Any family (ui )i∈I generating E
contains a subfamily (uj )j∈J which is a basis of E. Furthermore, for every two bases (ui )i∈I
and (vj )j∈J of E, we have |I| = |J| = n.
Proof . The first part follows immediately by applying Theorem 2.2 with L = ∅ and S =
(ui )i∈I . Assume that (ui )i∈I and (vj )j∈J are bases of E. Since (ui )i∈I is linearly independent
and (vj )j∈J spans E, Proposition 2.4 implies that |I| ≤ |J|. A symmetric argument yields
|J| ≤ |I|.
Remark: Theorem 2.5 also holds for vector spaces that are not finitely generated. This can
be shown as follows. Let (ui )i∈I be a basis of E, let (vj )j∈J be a generating family of E, and
assume that I is infinite. For every j ∈ J, let Lj ⊆ I be the finite set
Lj = {i ∈ I | vj = ∑_{i∈I} vi ui , vi ≠ 0}.
Let L = ∪_{j∈J} Lj . Since (ui )i∈I is a basis of E, we must have I = L, since otherwise (ui )i∈L would be another basis of E, and by Lemma 2.1, this would contradict the fact that (ui )i∈I is linearly independent. Furthermore, J must be infinite, since otherwise, because the Lj are finite, I would be finite. But then, since I = ∪_{j∈J} Lj with J infinite and the Lj finite, by a
standard result of set theory, |I| ≤ |J|. If (vj )j∈J is also a basis, by a symmetric argument,
we obtain |J| ≤ |I|, and thus, |I| = |J| for any two bases (ui )i∈I and (vj )j∈J of E.
When |I| is infinite, we say that E is of infinite dimension, the dimension |I| being a
cardinal number which depends only on the vector space E. The dimension of a vector space
E is denoted by dim(E). Clearly, if the field K itself is viewed as a vector space, then every
family (a) where a ∈ K and a ≠ 0 is a basis. Thus dim(K) = 1.
Let (ui )i∈I be a basis of a vector space E. For any vector v ∈ E, since the family (ui )i∈I
generates E, there is a family (λi )i∈I of scalars in K, such that
v = ∑_{i∈I} λi ui .
A very important fact is that the family (λi )i∈I is unique. Indeed, if we also have v = ∑_{i∈I} µi ui , then ∑_{i∈I} (λi − µi )ui = 0,
and since (ui )i∈I is linearly independent, we must have λi −µi = 0 for all i ∈ I, that is, λi = µi
for all i ∈ I. The converse is shown by contradiction. If (ui )i∈I was linearly dependent, there
would be a family (µi )i∈I of scalars not all null such that
∑_{i∈I} µi ui = 0.
But then, we would also have
v = ∑_{i∈I} (λi + µi )ui ,
with λj ≠ λj + µj since µj ≠ 0 for some j ∈ I, contradicting the assumption that (λi )i∈I is the unique family such that v = ∑_{i∈I} λi ui .
If (ui )i∈I is a basis of a vector space E, for any vector v ∈ E, if (vi )i∈I is the unique
family of scalars in K such that
v = ∑_{i∈I} vi ui ,
each vi is called the component (or coordinate) of index i of v with respect to the basis (ui )i∈I .
Given a field K and any (nonempty) set I, we can form a vector space K (I) which, in
some sense, is the standard vector space of dimension |I|.
Definition 2.9 Given a field K and any (nonempty) set I, let K (I) be the subset of the
cartesian product K I consisting of all families (λi )i∈I with finite support of scalars in K.3
We define addition and multiplication by a scalar as follows:
(λi )i∈I + (µi )i∈I = (λi + µi )i∈I ,
and
λ · (µi )i∈I = (λµi )i∈I .
It is immediately verified that addition and multiplication by a scalar are well defined.
Thus, K (I) is a vector space. Furthermore, because families with finite support are consid-
ered, the family (ei )i∈I of vectors ei , defined such that (ei )j = 0 if j ≠ i and (ei )i = 1, is
clearly a basis of the vector space K (I) . When I = {1, . . . , n}, we denote K (I) by K n . The
function ι: I → K (I) , such that ι(i) = ei for every i ∈ I, is clearly an injection.
When I is a finite set, K (I) = K I , but this is false when I is infinite. In fact, dim(K (I) ) =
|I|, but dim(K I ) is strictly greater when I is infinite.
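A family of finite support over an infinite index set can be represented concretely by storing only the indices carrying a nonzero scalar; the following Python sketch (using dictionaries, one possible choice of representation) shows that addition and scalar multiplication preserve finite support.

    # families (λ_i) of finite support over I = the natural numbers:
    u = {0: 2.0, 5: -1.0}                 # 2*e_0 - e_5
    v = {5: 1.0, 7: 3.0}                  # e_5 + 3*e_7

    def add(u, v):
        """Componentwise addition; the support of u + v is again finite."""
        w = dict(u)
        for i, x in v.items():
            w[i] = w.get(i, 0.0) + x
        return {i: x for i, x in w.items() if x != 0.0}

    def scale(lam, u):
        """Scalar multiplication λ · u."""
        return {i: lam * x for i, x in u.items()} if lam != 0.0 else {}

    print(add(u, v))                      # {0: 2.0, 7: 3.0}  (the e_5 terms cancel)
    print(scale(3.0, u))                  # {0: 6.0, 5: -3.0}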
Many interesting mathematical structures are vector spaces. A very important example
is the set of linear maps between two vector spaces to be defined in the next section. Here
is an example that will prepare us for the vector space of linear maps.
Example 2.10 Let X be any nonempty set and let E be a vector space. The set of all
functions f : X → E can be made into a vector space as follows: Given any two functions
f : X → E and g: X → E, let (f + g): X → E be defined such that
(f + g)(x) = f (x) + g(x)
for all x ∈ X, and for every λ ∈ K, let λf : X → E be defined such that
(λf )(x) = λf (x)
for all x ∈ X. The axioms of a vector space are easily verified. Now, let E = K, and let I
be the set of all nonempty subsets of X. For every S ∈ I, let fS : X → E be the function
such that fS (x) = 1 iff x ∈ S, and fS (x) = 0 iff x ∉ S. We leave as an exercise to show that
(fS )S∈I is linearly independent.
3 Where K I denotes the set of all functions from I to K.
Definition 2.10 Given two vector spaces E and F , a linear map between E and F is a
function f : E → F satisfying the following two conditions:
f (x + y) = f (x) + f (y) for all x, y ∈ E;
f (λx) = λf (x) for all λ ∈ K, x ∈ E.
Setting x = y = 0 in the first identity, we get f (0) = 0. The basic property of linear
maps is that they transform linear combinations into linear combinations. Given a family
(ui )i∈I of vectors in E, given any family (λi )i∈I of scalars in K, we have
f (∑_{i∈I} λi ui ) = ∑_{i∈I} λi f (ui ).
The above identity is shown by induction on the size of the support of the family (λi ui )i∈I ,
using the properties of Definition 2.10.
Example 2.11
1. The map f : R2 → R2 defined such that
x′ = x − y
y′ = x + y
is a linear map.
2. The map that sends a polynomial f (X) ∈ R[X] to its derivative f ′ (X) is a linear map from R[X] to R[X].
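The map of item 1 is given, over the canonical basis of R2 , by the matrix with rows (1, −1) and (1, 1); the following NumPy sketch (with arbitrary test vectors) checks the two linearity conditions numerically.

    import numpy as np

    A = np.array([[1, -1],
                  [1,  1]])               # matrix of (x, y) |-> (x - y, x + y)

    u = np.array([2.0, 3.0])
    v = np.array([-1.0, 4.0])
    lam = 5.0

    print(np.allclose(A @ (u + v), A @ u + A @ v))     # f(u + v) = f(u) + f(v)
    print(np.allclose(A @ (lam * u), lam * (A @ u)))   # f(λu) = λ f(u)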
Given a linear map f : E → F , we define its image (or range) Im f = f (E), as the set
Im f = {y ∈ F | (∃x ∈ E)(y = f (x))},
and its kernel (or nullspace) Ker f = f −1 (0), as the set
Ker f = {x ∈ E | f (x) = 0}.
Proposition 2.7 Given a linear map f : E → F , the set Im f is a subspace of F and the set
Ker f is a subspace of E. The linear map f : E → F is injective iff Ker f = 0 (where 0 is the
trivial subspace {0}).
Proof . Given any x, y ∈ Im f , there are some u, v ∈ E such that x = f (u) and y = f (v),
and for all λ, µ ∈ K, we have
λx + µy = λf (u) + µf (v) = f (λu + µv),
so that λx + µy ∈ Im f , showing that Im f is a subspace of F . Given any x, y ∈ Ker f , we have f (x) = 0 and f (y) = 0, and for all λ, µ ∈ K, we have
f (λx + µy) = λf (x) + µf (y) = 0,
that is, λx + µy ∈ Ker f , showing that Ker f is a subspace of E. Note that f (x) = f (y) iff
f (x − y) = 0. Thus, f is injective iff Ker f = 0.
A fundamental property of bases in a vector space is that they allow the definition of
linear maps as unique homomorphic extensions, as shown in the following proposition.
Proposition 2.8 Given any two vector spaces E and F , given any basis (ui )i∈I of E, given
any other family of vectors (vi )i∈I in F , there is a unique linear map f : E → F such that
f (ui ) = vi for all i ∈ I. Furthermore, f is injective iff (vi )i∈I is linearly independent, and f
is surjective iff (vi )i∈I generates F .
Proof . If such a linear map f : E → F exists, since (ui )i∈I is a basis of E, every vector x ∈ E
can be written uniquely as a linear combination
x = ∑_{i∈I} xi ui ,
and by linearity we must have f (x) = ∑_{i∈I} xi f (ui ) = ∑_{i∈I} xi vi , which shows that f is unique if it exists, and defines it when it does. Now, assume that f is injective, and let (λi )i∈I be a family of scalars such that ∑_{i∈I} λi vi = 0. Then f (∑_{i∈I} λi ui ) = ∑_{i∈I} λi vi = 0, and since f is injective and f (0) = 0, we get ∑_{i∈I} λi ui = 0,
and since (ui )i∈I is a basis, we have λi = 0 for all i ∈ I, and (vi )i∈I is linearly independent.
Conversely, assume that (vi )i∈I is linearly independent. If
f (∑_{i∈I} λi ui ) = 0,
then
∑_{i∈I} λi vi = ∑_{i∈I} λi f (ui ) = f (∑_{i∈I} λi ui ) = 0,
and λi = 0 for all i ∈ I, since (vi )i∈I is linearly independent. Since (ui )i∈I is a basis of E,
we just showed that Ker f = 0, and f is injective. The part where f is surjective is left as a
simple exercise.
By the second part of Proposition 2.8, an injective linear map f : E → F sends a basis
(ui )i∈I to a linearly independent family (f (ui ))i∈I of F , which is also a basis when f is
bijective. Also, when E and F have the same finite dimension n, (ui )i∈I is a basis of E, and
f : E → F is injective, then (f (ui ))i∈I is a basis of F (by Proposition 2.3).
We can now show that the vector space K (I) of Definition 2.9 has a universal property
that amounts to saying that K (I) is the vector space freely generated by I. Recall that
ι: I → K (I) , such that ι(i) = ei for every i ∈ I, is an injection from I to K (I) .
Proposition 2.9 Given any set I, for any vector space F , and for any function f : I → F , there is a unique linear map f̄ : K (I) → F , such that
f = f̄ ◦ ι.
Proof . If such a linear map f̄ : K (I) → F exists, since f = f̄ ◦ ι, we must have f (i) = f̄ (ι(i)) = f̄ (ei ) for every i ∈ I. However, the family (ei )i∈I is a basis of K (I) , and (f (i))i∈I is a family of vectors in F , and by Proposition 2.8, there is a unique linear map f̄ : K (I) → F such that f̄ (ei ) = f (i) for every i ∈ I, which proves the existence and uniqueness of a linear map f̄ such that f = f̄ ◦ ι.
The following simple proposition is also useful.
Proposition 2.10 Given any two vector spaces E and F , with F nontrivial, given any
family (ui )i∈I of vectors in E, the following properties hold:
(1) The family (ui )i∈I generates E iff for every family of vectors (vi )i∈I in F , there is at
most one linear map f : E → F such that f (ui ) = vi for all i ∈ I.
(2) The family (ui )i∈I is linearly independent iff for every family of vectors (vi )i∈I in F ,
there is some linear map f : E → F such that f (ui ) = vi for all i ∈ I.
Proof . (1) If there is any linear map f : E → F such that f (ui ) = vi for all i ∈ I, since
(ui )i∈I generates E, every vector x ∈ E can be written as some linear combination
x = ∑_{i∈I} xi ui ,
and by linearity, we must have f (x) = ∑_{i∈I} xi f (ui ) = ∑_{i∈I} xi vi .
This shows that f is unique if it exists. Conversely, assume that (ui )i∈I does not generate E.
Since F is nontrivial, there is some vector y ∈ F such that y ≠ 0. Since (ui )i∈I does
not generate E, there is some vector w ∈ E that is not in the subspace generated by (ui )i∈I .
By Theorem 2.2, there is a linearly independent subfamily (ui )i∈I0 of (ui )i∈I generating the
same subspace. Since by hypothesis, w ∈ E is not in the subspace generated by (ui )i∈I0 , by
Lemma 2.1 and by Theorem 2.2 again, there is a basis (ej )j∈I0∪J of E, such that ei = ui ,
for all i ∈ I0 , and w = ej0 , for some j0 ∈ J. Letting (vi )i∈I be the family in F such that
vi = 0 for all i ∈ I, defining f : E → F to be the constant linear map with value 0, we have
a linear map such that f (ui ) = 0 for all i ∈ I. By Proposition 2.8, there is a unique linear
map g: E → F such that g(w) = y, and g(ej ) = 0, for all j ∈ (I0 ∪ J) − {j0 }. By definition
of the basis (ej )j∈I0 ∪J of E, we have g(ui ) = 0 for all i ∈ I, and since f ≠ g, this contradicts
the fact that there is at most one such map.
(2) If the family (ui )i∈I is linearly independent, then by Theorem 2.2, (ui )i∈I can be
extended to a basis of E, and the conclusion follows by Proposition 2.8. Conversely, assume
that (ui )i∈I is linearly dependent. Then, there is some family (λi )i∈I of scalars (not all zero)
such that
∑_{i∈I} λi ui = 0.
By the assumption, for any nonzero vector, y ∈ F , for every i ∈ I, there is some linear map
fi : E → F , such that fi (ui ) = y, and fi (uj ) = 0, for j ∈ I − {i}. Then, we would get
0 = fi (∑_{i∈I} λi ui ) = ∑_{i∈I} λi fi (ui ) = λi y,
and since y ≠ 0, this implies λi = 0, for every i ∈ I. Thus, (ui )i∈I is linearly independent.
Although in this book, we will not have many occasions to use quotient spaces, they are
fundamental in algebra. The next section may be omitted until needed.
Let E be a vector space, and let M be any subspace of E. The subspace M induces a relation ≡M on E, defined as follows: for all u, v ∈ E, u ≡M v iff u − v ∈ M.
Proposition 2.11 Given any vector space E and any subspace M of E, the relation ≡M is an equivalence relation with the following two congruential properties:
1. If u1 ≡M v1 and u2 ≡M v2 , then u1 + u2 ≡M v1 + v2 , and
2. if u ≡M v, then λu ≡M λv.
Proof . If u1 − v1 = w1 and u2 − v2 = w2 , with w1 , w2 ∈ M, then
(u1 + u2 ) − (v1 + v2 ) = w1 + w2 ,
with w1 + w2 ∈ M, and if u − v = w with w ∈ M, then λu − λv = λw, with λw ∈ M.
Definition 2.11 Given any vector space E and any subspace M of E, we define the following
operations of addition and multiplication by a scalar on the set E/M of equivalence classes
of the equivalence relation ≡M as follows: for any two equivalence classes [u], [v] ∈ E/M, we
have
[u] + [v] = [u + v],
λ · [u] = [λu].
By Proposition 2.11, the above operations do not depend on the specific choice of represen-
tatives in the equivalence classes [u], [v] ∈ E/M. It is also immediate to verify that E/M is
a vector space. The function π: E → E/M, defined such that π(u) = [u] for every u ∈ E, is a surjective linear map called the natural projection of E onto E/M. The vector space E/M
is called the quotient space of E by the subspace M.
2.7 Matrices
Proposition 2.8 shows that given two vector spaces E and F and a basis (uj )j∈J of E, every
linear map f : E → F is uniquely determined by the family (f (uj ))j∈J of the images under
f of the vectors in the basis (uj )j∈J . Thus, in particular, taking F = K (J) , we get an
isomorphism between any vector space E of dimension |J| and K (J) . If J = {1, . . . , n}, a
vector space E of dimension n is isomorphic to the vector space K n . If we also have a basis
(vi )i∈I of F , then every vector f (uj ) can be written in a unique way as
f (uj ) = ∑_{i∈I} ai j vi ,
where j ∈ J, for a family (ai j )i∈I . Thus, with respect to the two bases (uj )j∈J of E and
(vi )i∈I of F , the linear map f is completely determined by a possibly infinite “I × J-matrix”
M(f ) = (ai j )i∈I, j∈J .
Remark: Note that we intentionally assigned the index set J to the basis (uj )j∈J of E,
and the index I to the basis (vi )i∈I of F , so that the rows of the matrix M(f ) associated
with f : E → F are indexed by I, and the columns of the matrix M(f ) are indexed by J.
Obviously, this causes a mildly unpleasant reversal. If we had considered the bases (ui )i∈I of
E and (vj )j∈J of F , we would obtain a J × I-matrix M(f ) = (aj i )j∈J, i∈I . No matter what
we do, there will be a reversal! We decided to stick to the bases (uj )j∈J of E and (vi )i∈I of
F , so that we get an I × J-matrix M(f ), knowing that we may occasionally suffer from this
decision!
When I and J are finite, and say, when |I| = m and |J| = n, the linear map f is
determined by the matrix M(f ) whose entries in the j-th column are the components of the
vector f (uj ) over the basis (v1 , . . . , vm ), that is, the matrix
a1 1  a1 2  . . .  a1 n
a2 1  a2 2  . . .  a2 n
 ..    ..   . . .   ..
am 1  am 2  . . .  am n
whose entry on row i and column j is ai j (1 ≤ i ≤ m, 1 ≤ j ≤ n).
The ring L(E; E) is an example of a noncommutative ring. It is easily seen that the set
of bijective linear maps f : E → E is a group under composition. It is called the general
linear group (of E), and it is denoted by GL(E), or when E = K n , by GL(n, K), or even
by GL(n).
We will now show that when E and F have finite dimension, linear maps can be very
conveniently represented by matrices, and that composition of linear maps corresponds to
matrix multiplication. We will follow rather closely an elegant presentation method due to
Emil Artin.
Let E and F be two vector spaces, and assume that E has a finite basis (u1 , . . . , un ) and
that F has a finite basis (v1 , . . . , vm ). Recall that we have shown that every vector x ∈ E
can be written in a unique way as
x = x1 u1 + · · · + xn un ,
and similarly, every vector y ∈ F can be written in a unique way as
y = y1 v1 + · · · + ym vm .
Let f : E → F be a linear map between E and F . Then, for every x = x1 u1 + · · · + xn un in E, by linearity, we have
f (x) = x1 f (u1 ) + · · · + xn f (un ).
Let
f (uj ) = a1 j v1 + · · · + am j vm ,
for every j, 1 ≤ j ≤ n. Then, substituting the right-hand side of each f (uj ) into the
expression for f (x), we get
f (x) = x1 (∑_{i=1}^{m} ai 1 vi ) + · · · + xn (∑_{i=1}^{m} ai n vi ),
which, by regrouping terms to obtain a linear combination of the vi , yields
f (x) = (∑_{j=1}^{n} a1 j xj )v1 + · · · + (∑_{j=1}^{n} am j xj )vm .
Thus, letting f (x) = y = y1 v1 + · · · + ym vm , we have
yi = ∑_{j=1}^{n} ai j xj (1)
for all i, 1 ≤ i ≤ m. Let us now consider how the composition of linear maps is expressed in
terms of bases.
Let E, F , and G, be three vectors spaces with respective bases (u1 , . . . , up ) for E,
(v1 , . . . , vn ) for F , and (w1 , . . . , wm ) for G. Let g: E → F and f : F → G be linear maps.
As explained earlier, g: E → F is determined by the images of the basis vectors uj , and
f : F → G is determined by the images of the basis vectors vk . We would like to understand
how f ◦ g: E → G is determined by the images of the basis vectors uj .
Given any x ∈ E, let
x = x1 u1 + · · · + xp up ,
and write
y = y1 v1 + · · · + yn vn ,
z = z1 w1 + · · · + zm wm .
If (bk j ) is the matrix of g w.r.t. the bases (u1 , . . . , up ) and (v1 , . . . , vn ), and (ai k ) is the matrix of f w.r.t. the bases (v1 , . . . , vn ) and (w1 , . . . , wm ), then, as above,
yk = ∑_{j=1}^{p} bk j xj (2)
for all k, 1 ≤ k ≤ n, when y = g(x), and
zi = ∑_{k=1}^{n} ai k yk (3)
for all i, 1 ≤ i ≤ m. Then, if y = g(x) and z = f (y), we have z = f (g(x)), and in view of
(2) and (3), we have
zi = ∑_{k=1}^{n} ai k (∑_{j=1}^{p} bk j xj )
   = ∑_{k=1}^{n} ∑_{j=1}^{p} ai k bk j xj
   = ∑_{j=1}^{p} ∑_{k=1}^{n} ai k bk j xj
   = ∑_{j=1}^{p} (∑_{k=1}^{n} ai k bk j )xj .
Thus, defining ci j such that
ci j = ∑_{k=1}^{n} ai k bk j , (4)
for 1 ≤ i ≤ m and 1 ≤ j ≤ p, we have zi = ∑_{j=1}^{p} ci j xj , so the matrix of f ◦ g is (ci j ).
Definition 2.12 Given a field K, an m × n-matrix is a family (ai j )1≤i≤m, 1≤j≤n of scalars
in K, represented as an array
a1 1  a1 2  . . .  a1 n
a2 1  a2 2  . . .  a2 n
 ..    ..   . . .   ..
am 1  am 2  . . .  am n
In the special case where m = 1, we have a row vector , represented as
(a1 1 , . . . , a1 n ),
and in the special case where n = 1, we have a column vector , represented as the column of entries a1 1 , . . . , am 1 .
In these last two cases, we usually omit the constant index 1 (first index in case of a row,
second index in case of a column). The set of all m × n-matrices is denoted by Mm,n (K)
or Mm,n . An n × n-matrix is called a square matrix of dimension n. The set of all square matrices of dimension n is denoted by Mn (K), or Mn .
Remark: As defined, a matrix A = (ai j )1≤i≤m, 1≤j≤n is a family, that is, a function from
{1, 2, . . . , m} × {1, 2, . . . , n} to K. As such, there is no reason to assume an ordering on
the indices. Thus, the matrix A can be represented in many different ways as an array, by
adopting different orders for the rows or the columns. However, it is customary (and usually
convenient) to assume the natural ordering on the sets {1, 2, . . . , m} and {1, 2, . . . , n}, and
to represent A as an array according to this ordering of the rows and columns.
We also define some operations on matrices as follows.
Definition 2.13 Given two m × n matrices A = (ai j ) and B = (bi j ), we define their sum
A + B as the matrix C = (ci j ) such that ci j = ai j + bi j . Given a scalar λ ∈ K, we define the
matrix λA as the matrix C = (ci j ) such that ci j = λai j . Given an m × n matrix A = (ai k ) and an n × p matrix B = (bk j ), we define their product AB as the m × p matrix C = (ci j ) such that
ci j = ∑_{k=1}^{n} ai k bk j ,
for 1 ≤ i ≤ m and 1 ≤ j ≤ p.
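The formula for ci j translates directly into a triple loop; the following plain-Python sketch (matrices stored as lists of rows, purely for illustration) implements the product exactly as defined.

    def mat_mul(A, B):
        """Product of an m x n matrix A and an n x p matrix B, both lists of rows."""
        m, n, p = len(A), len(B), len(B[0])
        C = [[0 for _ in range(p)] for _ in range(m)]
        for i in range(m):
            for j in range(p):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
        return C

    A = [[1, 2], [3, 4]]
    B = [[0, 1], [1, 0]]
    print(mat_mul(A, B))                  # [[2, 1], [4, 3]]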
The square matrix In of dimension n containing 1 on the diagonal and 0 everywhere else
is called the identity matrix . It is denoted as
1  0  . . .  0
0  1  . . .  0
..  ..  . . .  ..
0  0  . . .  1
Given an m × n matrix A = (ai j ), its transpose A^T is the n × m matrix whose entry on row j and column i is ai j (1 ≤ i ≤ m, 1 ≤ j ≤ n). Note that the transpose A^T of a matrix A has the property that the j-th row of A^T is the j-th column of A. In other words, transposition exchanges the rows and the columns
of a matrix. For every square matrix A of dimension n, it is immediately verified that
AIn = In A = A. If a matrix B such that AB = BA = In exists, then it is unique, and it is
called the inverse of A. The matrix B is also denoted by A−1 .
Consider the m × n-matrices Ei,j = (eh k ), defined such that ei j = 1, and eh k = 0, if h ≠ i or k ≠ j. It is clear that every matrix A = (ai j ) ∈ Mm,n (K) can be written in a unique way as
A = ∑_{i=1}^{m} ∑_{j=1}^{n} ai j Ei,j .
Thus, the family (Ei,j )1≤i≤m,1≤j≤n is a basis of the vector space Mm,n (K), which has dimen-
sion mn.
Remark: Definition 2.12 and Definition 2.13 also make perfect sense when K is a (com-
mutative) ring rather than a field. In this more general setting, the framework of vector
spaces is too narrow, but we can consider structures over a commutative ring A satisfying
all the axioms of Definition 2.5. Such structures are called modules. The theory of modules
is (much) more complicated than that of vector spaces. For example, modules do not always
have a basis, and other properties holding for vector spaces usually fail for modules. When
a module has a basis, it is called a free module. For example, when A is a commutative
ring, the structure An is a module such that the vectors ei , with (ei )i = 1 and (ei )j = 0 for j ≠ i, form a basis of An . Many properties of vector spaces still hold for An . Thus, An is a
free module. As another example, when A is a commutative ring, Mm,n (A) is a free module
with basis (Ei,j )1≤i≤m,1≤j≤n . Polynomials over a commutative ring also form a free module
of infinite dimension.
Square matrices provide a natural example of a noncommutative ring with zero divisors.
Example 2.12 For example, letting A, B be the 2 × 2-matrices
A = [ 1  0 ]        B = [ 0  0 ]
    [ 0  0 ],           [ 1  0 ],
then
AB = [ 1  0 ] [ 0  0 ] = [ 0  0 ]
     [ 0  0 ] [ 1  0 ]   [ 0  0 ],
and
BA = [ 0  0 ] [ 1  0 ] = [ 0  0 ]
     [ 1  0 ] [ 0  0 ]   [ 1  0 ].
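These products are easy to verify numerically; a small NumPy sketch reproducing the computation:

    import numpy as np

    A = np.array([[1, 0], [0, 0]])
    B = np.array([[0, 0], [1, 0]])

    print(A @ B)    # the zero matrix: A and B are zero divisors in M_2(R)
    print(B @ A)    # [[0, 0], [1, 0]] up to formatting, so moreover AB != BA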
The matrix M(f ) associated with the linear map f : E → F is called the matrix of f with
respect to the bases (u1 , . . . , un ) and (v1 , . . . , vm ). When E = F and the basis (v1 , . . . , vm )
is identical to the basis (u1 , . . . , un ) of E, the matrix M(f ) associated with f : E → E (as
above) is called the matrix of f with respect to the basis (u1 , . . . , un ).
Remark: As in the remark after Definition 2.12, there is no reason to assume that the
vectors in the bases (u1 , . . . , un ) and (v1 , . . . , vm ) are ordered in any particular way. However,
it is often convenient to assume the natural ordering. When this is so, authors sometimes
refer to the matrix M(f ) as the matrix of f with respect to the ordered bases (u1 , . . . , un )
and (v1 , . . . , vm ).
Then, given a linear map f : E → F represented by the matrix M(f ) = (ai j ) w.r.t. the
bases (u1 , . . . , un ) and (v1 , . . . , vm ), by equations (1) and the definition of matrix multipli-
cation, the equation y = f (x) corresponds to the matrix equation M(y) = M(f )M(x), that is,
[ y1 ]   [ a1 1  . . .  a1 n ] [ x1 ]
[ .. ] = [  ..   . . .   ..  ] [ .. ]
[ ym ]   [ am 1  . . .  am n ] [ xn ]
The function that associates to a linear map f : E → F the matrix M(f ) w.r.t. the
bases (u1 , . . . , un ) and (v1 , . . . , vm ) has the property that matrix multiplication corresponds
to composition of linear maps. The following proposition states the main properties of the
mapping f ↦ M(f ) between L(E; F ) and Mm,n . In short, it is an isomorphism of vector
spaces.
Proposition 2.12 Given three vector spaces E, F , G, with respective bases (u1, . . . , up ),
(v1 , . . . , vn ), and (w1 , . . . , wm ), the mapping M: L(E; F ) → Mn,p that associates the matrix
M(g) to a linear map g: E → F satisfies the following properties for all x ∈ E, all g, h: E →
F , and all f : F → G:
M(g(x)) = M(g)M(x)
M(g + h) = M(g) + M(h)
M(λg) = λM(g)
M(f ◦ g) = M(f )M(g).
Thus, M: L(E; F ) → Mn,p is an isomorphism of vector spaces, and when p = n and the
basis (v1 , . . . , vn ) is identical to the basis (u1 , . . . , up ), M: L(E; E) → Mn is an isomorphism
of rings.
Proof . That M(g(x)) = M(g)M(x) was shown just before stating the proposition, using
identity (1). The identities M(g + h) = M(g) + M(h) and M(λg) = λM(g) are straightfor-
ward, and M(f ◦g) = M(f )M(g) follows from (4) and the definition of matrix multiplication.
The mapping M: L(E; F ) → Mn,p is clearly injective, and since every matrix defines a lin-
ear map, it is also surjective, and thus bijective. In view of the above identities, it is an
isomorphism (and similarly for M: L(E; E) → Mn ).
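The identity M(f ◦ g) = M(f )M(g) can also be checked numerically on matrices chosen at random to represent f and g with respect to fixed bases; the dimensions in the following NumPy sketch are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    Mf = rng.standard_normal((3, 4))      # matrix of f : F -> G (dim F = 4, dim G = 3)
    Mg = rng.standard_normal((4, 2))      # matrix of g : E -> F (dim E = 2)
    x = rng.standard_normal(2)            # coordinates of a vector of E

    # applying g then f to x agrees with applying the product matrix M(f)M(g)
    print(np.allclose(Mf @ (Mg @ x), (Mf @ Mg) @ x))   # True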
In view of Proposition 2.12, it seems preferable to represent vectors from a vector space
of finite dimension as column vectors rather than row vectors. Thus, from now on, we will
denote vectors of Rn (or more generally, of K n ) as column vectors.
It is important to observe that the isomorphism M: L(E; F ) → Mn,p given by Proposition
2.12 depends on the choice of the bases (u1 , . . . , up ) and (v1 , . . . , vn ), and similarly for the
isomorphism M: L(E; E) → Mn , which depends on the choice of the basis (u1, . . . , un ). Thus,
it would be useful to know how a change of basis affects the representation of a linear map
f : E → F as a matrix. The following simple proposition is needed.
Proposition 2.13 Let E be a vector space, and let (u1 , . . . , un ) be a basis of E. For every family (v1 , . . . , vn ), let P = (ai j ) be the matrix defined such that vj = ∑_{i=1}^{n} ai j ui . The matrix
P is invertible iff (v1 , . . . , vn ) is a basis of E.
Proof . Note that we have P = M(f ), the matrix associated with the unique linear map
f : E → E such that f (ui ) = vi . By Proposition 2.8, f is bijective iff (v1 , . . . , vn ) is a basis of
E. Furthermore, it is obvious that the identity matrix In is the matrix associated with the
identity id: E → E w.r.t. any basis. If f is an isomorphism, then f ◦ f −1 = f −1 ◦ f = id, and
by Proposition 2.12, we get M(f )M(f −1 ) = M(f −1 )M(f ) = In , showing that P is invertible
and that M(f −1 ) = P −1 .
Proposition 2.13 suggests the following definition.
Definition 2.15 Given a vector space E of dimension n, for any two bases (u1 , . . . , un ) and
(v1 , . . . , vn ) of E, let P = (ai j ) be the invertible matrix defined such that
vj = ∑_{i=1}^{n} ai j ui ,
which is also the matrix of the identity id: E → E with respect to the bases (v1 , . . . , vn ) and
(u1 , . . . , un ), in that order (indeed, we express each id(vj ) = vj over the basis (u1 , . . . , un )).
The matrix P is called the change of basis matrix from (u1 , . . . , un ) to (v1 , . . . , vn ).
showing that the old coordinates (xi ) of x (over (u1 , . . . , un )) are expressed in terms of the
new coordinates (x&i ) of x (over (v1 , . . . , vn )). Since the matrix P expresses the new basis
(v1 , . . . , vn ) in terms of the old basis (u1 , . . . , un ), we observe that the coordinates (xi ) of a
vector x vary in the opposite direction of the change of basis. For this reason, vectors are
sometimes said to be contravariant. However, this expression does not make sense! Indeed,
a vector is an intrinsic quantity that does not depend on a specific basis. What makes sense
is that the coordinates of a vector vary in a contravariant fashion.
The effect of a change of bases on the representation of a linear map is described in the
following proposition.
Proposition 2.14 Let E and F be vector spaces, let (u1 , . . . , un ) and (u′1 , . . . , u′n ) be two bases of E, and let (v1 , . . . , vm ) and (v′1 , . . . , v′m ) be two bases of F . Let P be the change of basis matrix from (u1 , . . . , un ) to (u′1 , . . . , u′n ), and let Q be the change of basis matrix from (v1 , . . . , vm ) to (v′1 , . . . , v′m ). For any linear map f : E → F , let M(f ) be the matrix associated to f w.r.t. the bases (u1 , . . . , un ) and (v1 , . . . , vm ), and let M′ (f ) be the matrix associated to f w.r.t. the bases (u′1 , . . . , u′n ) and (v′1 , . . . , v′m ). We have
M′ (f ) = Q^{−1} M(f )P.
Proof . Since f : E → F can be written as f = idF ◦ f ◦ idE , since P is the matrix of idE w.r.t. the bases (u′1 , . . . , u′n ) and (u1 , . . . , un ), and Q^{−1} is the matrix of idF w.r.t. the bases (v1 , . . . , vm ) and (v′1 , . . . , v′m ), by Proposition 2.12, we have M′ (f ) = Q^{−1} M(f )P .
As a corollary, we get the following result.
Corollary 2.15 Let E be a vector space, and let (u1 , . . . , un ) and (u′1 , . . . , u′n ) be two bases of E. Let P be the change of basis matrix from (u1 , . . . , un ) to (u′1 , . . . , u′n ). For any linear map f : E → E, let M(f ) be the matrix associated to f w.r.t. the basis (u1 , . . . , un ), and let M′ (f ) be the matrix associated to f w.r.t. the basis (u′1 , . . . , u′n ). We have
M′ (f ) = P^{−1} M(f )P.
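The formula of Corollary 2.15 can be verified numerically: take an invertible P whose j-th column expresses u′j over (u1 , . . . , un ), and compare the action of M′ (f ) on new coordinates with that of M(f ) on old coordinates. A NumPy sketch with arbitrarily chosen matrices:

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [0.0, 3.0]])            # matrix of f over the basis (u1, u2)
    P = np.array([[1.0, 1.0],
                  [0.0, 1.0]])            # change of basis matrix to (u1', u2')
    M_prime = np.linalg.inv(P) @ M @ P    # matrix of f over the new basis

    x_new = np.array([1.0, 2.0])          # coordinates of a vector over the new basis
    x_old = P @ x_new                     # the same vector over the old basis

    # f(x) computed in either coordinate system agrees after conversion
    print(np.allclose(P @ (M_prime @ x_new), M @ x_old))   # True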
2.8 Direct Sums
There is a subtle point in defining direct sums: if we attempt to define the direct sum E ⊕ F of two vector spaces using the cartesian product E × F , we don't quite get
the right notion because elements of E × F are ordered pairs, but we want E ⊕ F = F ⊕ E.
Thus, we want to think of the elements of E ⊕ F as unordered pairs of elements. It is
possible to do so by considering the direct sum of a family (Ei )i∈{1,2} , and more generally of
a family (Ei )i∈I . For simplicity, we begin by considering the case where I = {1, 2}.
Definition 2.16 Given a family (Ei )i∈{1,2} of two vector spaces, we define the (external)
direct sum E1 ⊕ E2 of the family (Ei )i∈{1,2} as the set
E1 ⊕ E2 = {{⟨1, u⟩, ⟨2, v⟩} | u ∈ E1 , v ∈ E2 },
with addition
{⟨1, u1 ⟩, ⟨2, v1 ⟩} + {⟨1, u2 ⟩, ⟨2, v2 ⟩} = {⟨1, u1 + u2 ⟩, ⟨2, v1 + v2 ⟩},
and multiplication by a scalar
λ{⟨1, u⟩, ⟨2, v⟩} = {⟨1, λu⟩, ⟨2, λv⟩}.
We define the injections in1 : E1 → E1 ⊕ E2 and in2 : E2 → E1 ⊕ E2 as the linear maps defined such that
in1 (u) = {⟨1, u⟩, ⟨2, 0⟩},
and
in2 (v) = {⟨1, 0⟩, ⟨2, v⟩}.
Note that
in1 (u) + in2 (v) = {⟨1, u⟩, ⟨2, v⟩}.
Thus, every member {⟨1, u⟩, ⟨2, v⟩} of E1 ⊕ E2 can be viewed as an unordered pair consisting of the two vectors u and v, tagged with the index 1 and 2, respectively.
Remark: In fact, E1 ⊕ E2 is just the product ∏_{i∈{1,2}} Ei of the family (Ei )i∈{1,2} .
This is not to be confused with the cartesian product E1 × E2 . The vector space E1 × E2 is the set of all ordered pairs ⟨u, v⟩, where u ∈ E1 , and v ∈ E2 , with addition and multiplication by a scalar defined such that
⟨u1 , v1 ⟩ + ⟨u2 , v2 ⟩ = ⟨u1 + u2 , v1 + v2 ⟩,
λ⟨u, v⟩ = ⟨λu, λv⟩.
Of course, we can define E1 × · · · × En for any number of vector spaces, and when
E1 = . . . = En , we denote this product by E n .
The following property holds.
Proposition 2.16 Given any two vector spaces, E1 and E2 , the set E1 ⊕ E2 is a vector
space. For every pair of linear maps, f : E1 → G and g: E2 → G, there is a unique linear
map, f + g: E1 ⊕ E2 → G, such that (f + g) ◦ in1 = f and (f + g) ◦ in2 = g, as in the following
diagram:
[commutative diagram: in1 : E1 → E1 ⊕ E2 and in2 : E2 → E1 ⊕ E2 , with f : E1 → G, g: E2 → G, and f + g: E1 ⊕ E2 → G making both triangles commute]
Proof . Define
(f + g)({⟨1, u⟩, ⟨2, v⟩}) = f (u) + g(v),
for every u ∈ E1 and v ∈ E2 . It is immediately verified that f + g is the unique linear map
with the required properties.
We already noted that E1 ⊕ E2 is in bijection with E1 × E2 . If we define the projections
π1 : E1 ⊕ E2 → E1 and π2 : E1 ⊕ E2 → E2 , such that
π1 ({⟨1, u⟩, ⟨2, v⟩}) = u,
and
π2 ({⟨1, u⟩, ⟨2, v⟩}) = v,
we have the following proposition.
Proposition 2.17 Given any two vector spaces, E1 and E2 , for every pair of linear maps,
f : D → E1 and g: D → E2 , there is a unique linear map, f × g: D → E1 ⊕ E2 , such that
π1 ◦ (f × g) = f and π2 ◦ (f × g) = g, as in the following diagram:
[commutative diagram: f : D → E1 , g: D → E2 , and f × g: D → E1 ⊕ E2 , with π1 ◦ (f × g) = f and π2 ◦ (f × g) = g]
Proof . Define
(f × g)(w) = {⟨1, f (w)⟩, ⟨2, g(w)⟩},
for every w ∈ D. It is immediately verified that f × g is the unique linear map with the
required properties.
Remark: It is a peculiarity of linear algebra that direct sums and products of finite families
are isomorphic. However, this is no longer true for products and sums of infinite families.
When U, V are subspaces of a vector space E, letting i1 : U → E and i2 : V → E be the
inclusion maps, if U ⊕ V is isomorphic to E under the map i1 + i2 given by Proposition
2.16, we say that E is a direct (internal) sum of U and V , and we write E = U ⊕ V (with
a slight abuse of notation, since E and U ⊕ V are only isomorphic). It is also convenient to
define the sum U + V of U and V .
Definition 2.17 Given a vector space E, let U, V be any subspaces of E. We define the
sum U + V of U and V as the set
U + V = {u + v | u ∈ U, v ∈ V }.
We say that E is the (internal) direct sum of U and V , denoted by E = U ⊕ V ,4 if for every
x ∈ E, there exist unique vectors u ∈ U and v ∈ V , such that x = u + v.
Proposition 2.18 Let E be a vector space. The following properties are equivalent:
1. E = U ⊕ V .
2. E = U + V and U ∩ V = 0.
Proof . First, assume that E is the direct sum of U and V . If x ∈ U ∩ V and x ≠ 0, since x
can be written both as x+0 and 0+x, we have a contradiction. Thus U ∩V = 0. Conversely,
assume that x = u + v and x = u& + v & , where u, u& ∈ U and v, v & ∈ V . Then,
v & − v = u − u& ,
where v & − v ∈ V and u − u& ∈ U. Since U ∩ V = 0, we must have u& = u and v & = v, and
thus E = U ⊕ V .
Proposition 2.19 Let E be a vector space. If E = U ⊕ V , then dim(E) = dim(U ) + dim(V ).
Proof . Let (ui )i∈I be a basis of U and let (vj )j∈J be a basis of V , where I and J are disjoint.
Clearly, (ui )i∈I ∪ (vj )j∈J is a basis of E.
We now give the definition of a direct sum for any arbitrary nonempty index set I. First,
let us recall the notion of the product of a family (Ei )i∈I . Given a family of sets (Ei )i∈I , its product ∏_{i∈I} Ei is the set of all functions f : I → ∪_{i∈I} Ei such that f (i) ∈ Ei , for every i ∈ I. It is one of the many versions of the axiom of choice that, if Ei ≠ ∅ for every i ∈ I, then ∏_{i∈I} Ei ≠ ∅. A member f ∈ ∏_{i∈I} Ei is often denoted as (fi )i∈I . For every i ∈ I, we have the projection πi : ∏_{i∈I} Ei → Ei , defined such that πi ((fi )i∈I ) = fi . We now define direct sums.
Definition 2.18 Let I be any nonempty set, and let (Ei )i∈I be a family of vector spaces. The (external) direct sum ⊕_{i∈I} Ei of the family (Ei )i∈I is defined as follows:
⊕_{i∈I} Ei consists of all f ∈ ∏_{i∈I} Ei which have finite support, and addition and multiplication by a scalar are defined as follows:
(f + g)(i) = f (i) + g(i),
(λf )(i) = λf (i),
for every i ∈ I.
Proposition 2.20 Let I be any nonempty set, let (Ei )i∈I be a family of vector spaces, and let G be any vector space. The direct sum ⊕_{i∈I} Ei is a vector space, and for every family (hi )i∈I of linear maps hi : Ei → G, there is a unique linear map
∑_{i∈I} hi : ⊕_{i∈I} Ei → G,
such that (∑_{i∈I} hi ) ◦ ini = hi , for every i ∈ I.
Remark: When Ei = E, for all i ∈ I, we denote ⊕_{i∈I} Ei by E (I) . In particular, when Ei = K, for all i ∈ I, we find the vector space K (I) of Definition 2.9.
We also have the following basic proposition about injective or surjective linear maps.
Proposition 2.21 Let E and F be vector spaces, and let f : E → F be a linear map. If
f : E → F is injective, then, there is a surjective linear map r: F → E called a retraction,
such that r ◦ f = idE . If f : E → F is surjective, then, there is an injective linear map
s: F → E called a section, such that f ◦ s = idF .
Proof . Let (ui )i∈I be a basis of E. Since f : E → F is an injective linear map, by Proposition
2.8, (f (ui ))i∈I is linearly independent in F . By Theorem 2.2, there is a basis (vj )j∈J of F ,
where I ⊆ J, and where vi = f (ui ), for all i ∈ I. By Proposition 2.8, a linear map r: F → E
can be defined such that r(vi ) = ui , for all i ∈ I, and r(vj ) = w for all j ∈ (J − I), where w
is any given vector in E, say w = 0. Since r(f (ui )) = ui for all i ∈ I, by Proposition 2.8, we
have r ◦ f = idE .
Now, assume that f : E → F is surjective. Let (vj )j∈J be a basis of F . Since f : E → F
is surjective, for every vj ∈ F , there is some uj ∈ E such that f (uj ) = vj . Since (vj )j∈J is a
basis of F , by Proposition 2.8, there is a unique linear map s: F → E such that s(vj ) = uj .
Also, since f (s(vj )) = vj , by Proposition 2.8 (again), we must have f ◦ s = idF .
The converse of Proposition 2.21 is obvious. We now have the following fundamental
Proposition.
Proposition 2.22 Let E, F , and G be three vector spaces, let f : E → F be an injective linear map, and let g: F → G be a surjective linear map such that Im f = Ker g. Then, F is isomorphic to E ⊕ G.
The property of a short exact sequence given by Proposition 2.22 is often described by saying that the sequence 0 −→ E −→ F −→ G −→ 0 (with maps f : E → F and g: F → G) is a (short) split exact sequence.
As a corollary of Proposition 2.22, we have the following result.
5 The existence of a section s: G → F of g follows from Proposition 2.21.
6 The existence of a retraction r: F → E of f follows from Proposition 2.21.
Corollary 2.23 Let E and F be vector spaces, and let f : E → F be a linear map. Then, E
is isomorphic to Ker f ⊕ Im f , and thus,
dim(E) = dim(Ker f ) + dim(Im f ).
Proof . Consider
Ker f −→ E −→ Im f,
where the first map i: Ker f −→ E is the inclusion map, and the second map f ′ : E −→ Im f is the surjection associated with f : E −→ F . Then, we apply Proposition 2.22 to any section s: Im f −→ E of f ′ to get an isomorphism between E and Ker f ⊕ Im f , and Proposition 2.19, to get dim(E) = dim(Ker f ) + dim(Im f ).
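For a linear map given by a matrix, the relation dim(E) = dim(Ker f ) + dim(Im f ) can be observed numerically: dim(Im f ) is the rank of the matrix and dim(Ker f ) is the number of columns minus the rank. A NumPy sketch on an arbitrary matrix with a nontrivial kernel:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])       # a 2 x 3 matrix of rank 1

    n = A.shape[1]                        # dim(E), the number of columns
    rank = np.linalg.matrix_rank(A)       # dim(Im f)
    nullity = n - rank                    # dim(Ker f)

    print(rank, nullity, rank + nullity == n)    # 1 2 True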
The following Proposition will also be useful.
⟨v, w⟩ ∈ R iff w − v ∈ U.
w − v = u
and
w′ − v′ = u′ ,
where u, u′ ∈ U, then, we have
Given a vector space E and any subspace U of E, Proposition 2.24 shows that the
dimension of any subspace V such that E = U ⊕ V depends only on U. We call dim(V ) the
codimension of U, and we denote it by codim(U). A subspace U of codimension 1 is called
a hyperplane.
The notion of rank of a linear map or of a matrix is an important one, both theoretically
and practically, since it is the key to the solvability of linear equations.
Definition 2.19 Given two vector spaces E and F and a linear map f : E → F , the rank
rk(f ) of f is the dimension dim(Im f ) of the image subspace Im f of F .
Definition 2.20 Given an m × n-matrix A = (ai j ) over the field K, the rank rk(A) of the
matrix A is the maximum number of linearly independent columns of A (viewed as vectors
in K m ).
In view of Proposition 2.3, the rank of a matrix A is the dimension of the subspace of
K m generated by the columns of A. Let E and F be two vector spaces, and let (u1 , . . . , un )
be a basis of E, and (v1 , . . . , vm ) a basis of F . Let f : E → F be a linear map, and let M(f )
be its matrix w.r.t. the bases (u1 , . . . , un ) and (v1 , . . . , vm ). Since the rank rk(f ) of f is the
dimension of Im f , which is generated by (f (u1 ), . . . , f (un )), the rank of f is the maximum
number of linearly independent vectors in (f (u1 ), . . . , f (un )), which is equal to the number
of linearly independent columns of M(f ), since F and K m are isomorphic. Thus, we have
rk(f ) = rk(M(f )), for every matrix representing f .
We will see later, using duality, that the rank of a matrix A is also equal to the maximal
number of linearly independent rows of A.
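As a quick sanity check of this claim (a toy example added here, assuming numpy; the matrix is made up), the numerically computed rank of a small matrix agrees with the rank of its transpose:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],     # twice the first row
                  [0.0, 1.0, 1.0]])
    # numpy computes the rank numerically, via singular values
    print(np.linalg.matrix_rank(A))      # 2
    print(np.linalg.matrix_rank(A.T))    # 2, the same maximal number of independent rows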
If U is a hyperplane, then E = U ⊕ V for some subspace V of dimension 1. However, a
subspace V of dimension 1 is generated by any nonzero vector v ∈ V , and thus we denote
The above identities mean that ⟨−, −⟩ is a bilinear map. In view of the above identities,
given any fixed vector v ∈ E, the map ṽ: E ∗ → K defined such that
    ṽ(u∗ ) = ⟨u∗ , v⟩
for every u∗ ∈ E ∗ is a linear map from E ∗ to K, that is, ṽ is a linear form in E ∗∗ . Again
from the above identities, the map cE : E → E ∗∗ , defined such that
cE (v) = ṽ
for every v ∈ E, is a linear map. We shall see that it is injective, and that it is an isomorphism
when E has finite dimension.
Definition 2.22 Given a vector space E and its dual E ∗ , we say that a vector v ∈ E and a
linear form u∗ ∈ E ∗ are orthogonal if ⟨u∗ , v⟩ = 0. Given a subspace V of E and a subspace U
of E ∗ , we say that V and U are orthogonal if ⟨u∗ , v⟩ = 0 for every u∗ ∈ U and every v ∈ V .
Given a subset V of E (resp. a subset U of E ∗ ), the orthogonal V 0 of V is the subspace V 0
of E ∗ defined such that
    V 0 = {u∗ ∈ E ∗ | ⟨u∗ , v⟩ = 0 for every v ∈ V }
(resp. the orthogonal U 0 of U is the subspace U 0 of E of all vectors v such that ⟨u∗ , v⟩ = 0
for every u∗ ∈ U).
Given a vector space E and any basis (ui )i∈I for E, we can associate to each ui a linear
form u∗i ∈ E ∗ , and the u∗i have some remarkable properties.
Definition 2.23 Given a vector space E and any basis (ui )i∈I for E, by Proposition 2.8,
for every i ∈ I, there is a unique linear form u∗i such that
    u∗i (uj ) = 1 if i = j, and u∗i (uj ) = 0 if i ≠ j,
for every j ∈ I. The linear form u∗i is called the coordinate form of index i w.r.t. the basis
(ui )i∈I .
Given an index set I, authors often define the so called “Kronecker symbol” δi j , such
that
    δi j = 1 if i = j, and δi j = 0 if i ≠ j,
for all i, j ∈ I. Then, u∗i (uj ) = δi j .
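In K n , if the basis vectors u1 , . . . , un are taken as the columns of an invertible matrix P , then the coordinate forms u∗i are simply the rows of P −1 ; the following numpy sketch (an illustration added here, with a made-up matrix P ) checks the relations u∗i (uj ) = δi j :

    import numpy as np

    # columns of P are the basis vectors u_1, u_2, u_3 of R^3
    P = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])
    P_inv = np.linalg.inv(P)     # row i of P^{-1} is the coordinate form u_i^*

    # u_i^*(u_j) = delta_ij: apply every coordinate form to every basis vector
    assert np.allclose(P_inv @ P, np.eye(3))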
Given a vector space E and a subspace U of E, by Theorem 2.2, every basis (ui )i∈I of U
can be extended to a basis (uj )j∈I∪J of E, where I ∩ J = ∅. We have the following important
theorem.
(c) For every subspace V of finite codimension m of E, for every subspace W of E such
that E = V ⊕ W (where W is of finite dimension m), for every basis (ui )i∈I of E such
that (u1 , . . . , um ) is a basis of W , the family (u∗1 , . . . , u∗m) is a basis of the orthogonal
V 0 of V in E ∗ . Furthermore, we have V 00 = V , and dim(V ) + dim(V 0 ) = dim(E).
for a family (λi )i∈I (of scalars in K). Since (λi )i∈I has finite support, there is a finite subset
J of I such that λi = 0 for all i ∈ I − J, and we have
    ∑j∈J λj u∗j = 0.
Applying the linear form ∑j∈J λj u∗j to each uj (j ∈ J), by Definition 2.23, since u∗i (uj ) = 1
if i = j and 0 otherwise, we get λj = 0 for all j ∈ J, that is, λi = 0 for all i ∈ I (by definition
of J as the support). Thus, (u∗i )i∈I is linearly independent.
(b) Let (ui )i∈I∪J be a basis of E such that (ui )i∈I is a basis of V (where I ∩ J = ∅).
Clearly, we have V ⊆ V 00 . If V ≠ V 00 , then uj0 ∈ V 00 for some j0 ∈ J (and thus, j0 ∉ I).
Since uj0 ∈ V 00 , uj0 is orthogonal to every linear form in V 0 . In particular, let g ∗ be the
linear form defined such that
    g ∗ (ui ) = 1 if i = j0 , and g ∗ (ui ) = 0 if i ≠ j0 ,
where i ∈ I ∪ J. Then, it is clear that g ∗(ui ) = 0 for all i ∈ I, and thus that g ∗ ∈ V 0 .
However, g ∗ (uj0 ) = 1, contradicting the fact that uj0 is orthogonal to every linear form in
V 0 . Thus, V = V 00 .
(c) Let J = I − {1, . . . , m}. Every linear form f ∗ ∈ V 0 is orthogonal to every uj , for
j ∈ J, and thus, f ∗ (uj ) = 0, for all j ∈ J. For such a linear form f ∗ ∈ V 0 , let
This shows that (u∗1 , . . . , u∗m) generates V 0 , and since it is also a linearly independent family,
(u∗1 , . . . , u∗m) is a basis of V 0 . It is then obvious that dim(V ) + dim(V 0 ) = dim(E), and by
part (b), we have V 00 = V .
(d) Let (u∗1 , . . . , u∗m ) be a basis of U. Note that the map h: E → K m defined such that
h(v) = (u∗1 (v), . . . , u∗m (v))
for every v ∈ E, is a linear map, and that its kernel Ker h is precisely U 0 . Then, by
Proposition 2.23,
dim(E) = dim(Ker h) + dim(Im h) ≤ dim(U 0 ) + m,
since dim(Im h) ≤ m. Thus, U 0 is a subspace of E of finite codimension at most m, and by
(c), we have dim(U 0 )+dim(U 00 ) = dim(E). However, it is clear that U ⊆ U 00 , which implies
dim(U) ≤ dim(U 00 ), and so dim(U 0 ) + dim(U) ≤ dim(E). Thus, U 0 is of finite codimension
at least m, which implies codim(U 0 ) = m. But then, codim(U 0 ) = m = dim(U), and since
dim(U 0 ) + dim(U 00 ) = dim(E), we have dim(U 00 ) = m, and we must have U = U 00 .
Part (a) of Theorem 2.26 shows that
dim(E) ≤ dim(E ∗ ).
When E is of finite dimension n and (u1 , . . . , un ) is a basis of E, by part (c), the family
(u∗1 , . . . , u∗n ) is a basis of the dual space E ∗ , called the dual basis of (u1 , . . . , un ). By parts (c)
and (d) of Theorem 2.26, the maps V ↦ V 0 and U ↦ U 0 , where V is a subspace of finite
codimension of E and U is a subspace of finite dimension of E ∗ , are inverse bijections. These
maps set up a duality between subspaces of finite codimension of E, and subspaces of finite
dimension of E ∗ .
! One should be careful that this bijection does not extend to subspaces of E ∗ of infinite
dimension.
!! When E is of infinite dimension, for every basis (ui )i∈I of E, the family (u∗i )i∈I of coor-
dinate forms is never a basis of E ∗ . It is linearly independent, but it is “too small” to
generate E ∗ . For example, if E = R(N) , where N = {0, 1, 2, . . .}, the map f : E → R that
sums the nonzero coordinates of a vector in E is a linear form, but it is easy to see that it
cannot be expressed as a linear combination of coordinate forms. As a consequence, when
E is of infinite dimension, E and E ∗ are not isomorphic.
When E is of finite dimension n and (u1 , . . . , un ) is a basis of E, we noted that the family
(u∗1 , . . . , u∗n ) is a basis of the dual space E ∗ (called the dual basis of (u1 , . . . , un )). Let us see
how the coordinates of a linear form ϕ∗ vary under a change of basis.
Let (u1 , . . . , un ) and (v1 , . . . , vn ) be two bases of E, and let P = (ai j ) be the change of
basis matrix from (u1 , . . . , un ) to (v1 , . . . , vn ), so that
    vj = a1 j u1 + · · · + an j un ,
and let P −1 = (bi j ) be the inverse of P . Then we have
    vj∗ = bj 1 u∗1 + · · · + bj n u∗n ,
and similarly
    u∗i = ai 1 v1∗ + · · · + ai n vn∗ .
Since
    ϕ∗ = ϕ1 u∗1 + · · · + ϕn u∗n = ϕ′1 v1∗ + · · · + ϕ′n vn∗ ,
we get
    ϕ′j = a1 j ϕ1 + · · · + an j ϕn .
We note that this time, the coordinates (ϕi ) of the linear form ϕ∗ change in the same direction
as the change of basis. For this reason, we say that the coordinates of linear forms are
covariant. By abuse of language, it is often said that linear forms are covariant, which
explains why the term covector is also used for a linear form.
Remark: In many texts using tensors, vectors are often indexed with lower indices. If so, it
is more convenient to write the coordinates of a vector x over the basis (u1 , . . . , un ) as (x^i ),
using an upper index, so that
    x = ∑i x^i ui ,
and, writing the entries of the change of basis matrix as a^i_j (that is, a^i_j = ai j ),
    x^i = ∑j a^i_j x′^j .
Dually, linear forms are indexed with upper indices. Then, it is more convenient to write the
coordinates of a covector ϕ∗ over the dual basis (u∗1 , . . . , u∗n ) as (ϕi ), using a lower index,
so that
    ϕ∗ = ∑i ϕi u∗i
and
    ϕ′j = ∑i a^i_j ϕi ,
all sums ranging over 1, . . . , n.
With these conventions, the index of summation appears once in upper position and once in
lower position, and the summation sign can be safely omitted, a trick due to Einstein. For
example, we can write
    ϕ′j = a^i_j ϕi
as an abbreviation for
    ϕ′j = ∑i a^i_j ϕi .
We will now pin down the relationship between a vector space E and its bidual E ∗∗ .
    cE (v) = ṽ,
    cE (v)(u∗i ) = ⟨u∗i , v⟩ = vi ,
If E is of finite dimension n, by Theorem 2.26, for every basis (u1 , . . . , un ), the family
(u∗1 , . . . , u∗n ) is a basis of the dual space E ∗ , and thus the family (u∗∗1 , . . . , u∗∗n ) is a basis of
the bidual E ∗∗ . This shows that dim(E) = dim(E ∗∗ ) = n, and since by part (a), we know
that cE : E → E ∗∗ is injective, in fact, cE : E → E ∗∗ is bijective (because an injective map
carries a linearly independent family to a linearly independent family, and in a vector space
of dimension n, a linearly independent family of n vectors is a basis, see Proposition 2.3).
!! When a vector space E has infinite dimension, E and its bidual E ∗∗ are never isomorphic.
Definition 2.24 Given two vector spaces E and F over K, a pairing between E and F is
a bilinear map ⟨−, −⟩: E × F → K. Such a pairing is non-singular if for every u ∈ E, if
⟨u, v⟩ = 0 for all v ∈ F , then u = 0, and for every v ∈ F , if ⟨u, v⟩ = 0 for all u ∈ E, then
v = 0.
For example, the map ⟨−, −⟩: E ∗ × E → K defined earlier is a non-singular pairing (use
the proof of (a) in Lemma 2.27).
Given a pairing ⟨−, −⟩: E × F → K, we can define two maps ϕ: E → F ∗ and ψ: F → E ∗
as follows: For every u ∈ E, we define the linear form ϕ(u) in F ∗ such that
    ϕ(u)(v) = ⟨u, v⟩
for every v ∈ F , and we define the linear form ψ(v) in E ∗ such that
    ψ(v)(u) = ⟨u, v⟩
for every u ∈ E.
Proposition 2.28 Given two vector spaces E and F over K, for every non-singular pairing
⟨−, −⟩: E × F → K between E and F , the maps ϕ: E → F ∗ and ψ: F → E ∗ are linear and
injective. Furthermore, if E and F have finite dimension, then this dimension is the same
and ϕ: E → F ∗ and ψ: F → E ∗ are bijections.
dim(F ) = n. The same argument applies to E, and thus n = dim(F ) ≤ dim(E) = m. But
then, dim(E) = dim(F ), and ϕ: E → F ∗ and ψ: F → E ∗ are bijections.
When E has finite dimension, the non-singular pairing ⟨−, −⟩: E ∗ × E → K yields another
proof of the existence of a natural isomorphism between E and E ∗∗ . Interesting non-singular
pairings arise in exterior algebra. We now show the relationship between hyperplanes and
linear forms.
Proof . (a) If f ∗ ∈ E ∗ is nonnull, there is some vector v0 ∈ E such that f ∗ (v0 ) ≠ 0. Let
H = Ker f ∗ . For every v ∈ E, we have
    f ∗ ( v − (f ∗ (v)/f ∗ (v0 )) v0 ) = f ∗ (v) − (f ∗ (v)/f ∗ (v0 )) f ∗ (v0 ) = f ∗ (v) − f ∗ (v) = 0.
Thus,
    v − (f ∗ (v)/f ∗ (v0 )) v0 = h ∈ H,
and
    v = h + (f ∗ (v)/f ∗ (v0 )) v0 ,
that is, E = H + Kv0 . Also, since f ∗ (v0 ) ≠ 0, we have v0 ∉ H, that is, H ∩ Kv0 = 0. Thus,
E = H ⊕ Kv0 , and H is a hyperplane.
(b) If H is a hyperplane, E = H ⊕ Kv0 for some v0 ∈ / H. Then, every v ∈ E can be
written in a unique way as v = h + λv0 . Thus, there is a well-defined function f ∗ : E → K,
such that, f ∗ (v) = λ, for every v = h + λv0 . We leave as a simple exercise the verification
that f ∗ is a linear form. Since f ∗ (v0 ) = 1, the linear form f ∗ is nonnull. Also, by definition,
it is clear that λ = 0 iff v ∈ H, that is, Ker f ∗ = H.
(c) Let H be a hyperplane in E, and let f ∗ ∈ E ∗ be any (nonnull) linear form such that
H = Ker f ∗ . Clearly, if g ∗ = λf ∗ for some λ ≠ 0, then H = Ker g ∗ . Conversely, assume that
H = Ker g ∗ for some nonnull linear form g ∗ . From (a), we have E = H ⊕ Kv0 , for some v0
such that f ∗ (v0 ) ≠ 0 and g ∗ (v0 ) ≠ 0. Then, observe that
    g ∗ − (g ∗ (v0 )/f ∗ (v0 )) f ∗
is a linear form that vanishes on H, since both f ∗ and g ∗ vanish on H, but also vanishes on
Kv0 . Thus, g ∗ = λf ∗ , with
    λ = g ∗ (v0 )/f ∗ (v0 ).
If E is a vector space of finite dimension n and (u1 , . . . , un ) is a basis of E, for any linear
form f ∗ ∈ E ∗ , for every x = x1 u1 + · · · + xn un ∈ E, we have
f ∗ (x) = λ1 x1 + · · · + λn xn ,
where λi = f ∗ (ui ) ∈ K, for every i, 1 ≤ i ≤ n. Thus, with respect to the basis (u1 , . . . , un ),
f ∗ (x) is a linear combination of the coordinates of x, as expected.
We leave as an exercise the fact that every subspace V ≠ E of a vector space E is the
intersection of all hyperplanes that contain V . We now consider the notion of transpose of
a linear map and of a matrix.
The following proposition shows the relationship between orthogonality and transposi-
tion.
    f (U)0 = (f ⊤ )−1 (U 0 ).
Proof . We have
    ⟨v ∗ , f (u)⟩ = ⟨f ⊤ (v ∗ ), u⟩,
for all u ∈ E and all v ∗ ∈ F ∗ , and thus, we have ⟨v ∗ , f (u)⟩ = 0 for every u ∈ U, i.e.,
v ∗ ∈ f (U)0 , iff ⟨f ⊤ (v ∗ ), u⟩ = 0 for every u ∈ U, i.e., v ∗ ∈ (f ⊤ )−1 (U 0 ), proving that
f (U)0 = (f ⊤ )−1 (U 0 ).
    s⊤ ◦ p⊤ = idI ∗ ,
and
    j ⊤ ◦ r ⊤ = idI ∗ .
Thus, j ⊤ : F ∗ → I ∗ is surjective, and p⊤ : I ∗ → E ∗ is injective. Since f = j ◦ p, we also have
    f ⊤ = (j ◦ p)⊤ = p⊤ ◦ j ⊤ ,
and since j ⊤ : F ∗ → I ∗ is surjective, and p⊤ : I ∗ → E ∗ is injective, we have an isomorphism
between (Im f )∗ and f ⊤ (F ∗ ).
(b) We already noted that part (a) of Theorem 2.26 shows that dim(E) ≤ dim(E ∗ ),
for every vector space E. Thus, dim(Im f ) ≤ dim((Im f )∗ ), which, by (a), shows that
rk(f ) ≤ rk(f ⊤ ). When dim(Im f ) is finite, we already observed that as a corollary of
Theorem 2.26, dim(Im f ) = dim((Im f )∗ ), and thus, we have rk(f ) = rk(f ⊤ ).
The following proposition shows the relationship between the matrix representing a linear
map f : E → F and the matrix representing its transpose f ⊤ : F ∗ → E ∗ .
Proposition 2.32 Let E and F be two vector spaces, and let (u1 , . . . , un ) be a basis for E,
and (v1 , . . . , vm ) be a basis for F . Given any linear map f : E → F , if M(f ) is the m × n-
matrix representing f w.r.t. the bases (u1 , . . . , un ) and (v1 , . . . , vm ), then the n × m-matrix
M(f ⊤ ) representing f ⊤ : F ∗ → E ∗ w.r.t. the dual bases (v1∗ , . . . , vm∗ ) and (u∗1 , . . . , u∗n ) is the
transpose M(f )⊤ of M(f ).
Proof . Recall that the entry ai j in row i and column j of M(f ) is the i-th coordinate of f (uj )
over the basis (v1 , . . . , vm ). By definition of vi∗ , we have ⟨vi∗ , f (uj )⟩ = ai j . The entry a⊤j i in
row j and column i of M(f ⊤ ) is the j-th coordinate of f ⊤ (vi∗ ) over the basis (u∗1 , . . . , u∗n ),
which is just f ⊤ (vi∗ )(uj ) = ⟨f ⊤ (vi∗ ), uj ⟩. Since
    ⟨vi∗ , f (uj )⟩ = ⟨f ⊤ (vi∗ ), uj ⟩,
we have ai j = a⊤j i , proving that M(f ⊤ ) = M(f )⊤ .
We can now give a very short proof of the fact that the rank of a matrix is equal to the
rank of its transpose.
Chapter 3
Determinants
Proof . Consider the relation Rπ defined on In as follows: iRπ j iff there is some k ≥ 1 such
that j = π k (i). We claim that Rπ is an equivalence relation. Transitivity is obvious. We
claim that for every i ∈ In , there is some least r (1 ≤ r ≤ n) such that π r (i) = i. Indeed,
consider the following sequence of n + 1 elements:
    i, π(i), π 2 (i), . . . , π n (i).
Since In only has n distinct elements, there are some h, k with 0 ≤ h < k ≤ n such that
π h (i) = π k (i),
that τ (n) = k and τ (k) = n, it is clear that τ ◦ π leaves n invariant, and by the induction
hypothesis, we have τ ◦ π = τm ◦ . . . ◦ τ1 for some transpositions, and thus
π = τ ◦ τm ◦ . . . ◦ τ1 ,
Remark: When π = idn is the identity permutation, we can agree that the composition of
0 transpositions is the identity. The second part of Proposition 3.1 shows that the transpo-
sitions generate the set of all permutations.
In writing a permutation π as a composition π = σ1 ◦ . . . ◦ σs of cyclic permutations, it
is clear that the order of the σi does not matter, since their domains are disjoint. Given
a permutation written as a product of transpositions, we now show that the parity of the
number of transpositions is an invariant.
Proposition 3.2 For every n ≥ 2, for every permutation π: In → In , for every transposition
τ , we have
    ε(τ ◦ π) = −ε(π).
Consequently, for every product of transpositions such that π = τm ◦ . . . ◦ τ1 , we have
    ε(π) = (−1)m .
Proof . Assume that τ (i) = j and τ (j) = i, where i < j. There are two cases, depending
whether i and j are in the same equivalence class Jl of Rπ , or if they are in distinct equivalence
classes. If i and j are in the same class Jl , then if
    Jl = {i1 , . . . , ip , . . . , iq , . . . , ik },
where ip = i and iq = j, since
    τ (π(ip−1 )) = τ (ip ) = τ (i) = j = iq
and
    τ (π(iq−1 )) = τ (iq ) = τ (j) = i = ip ,
it is clear that Jl splits into two subsets, one of which is {ip , . . . , iq−1 }, and thus, the number
of classes associated with τ ◦ π is r + 1, and ε(τ ◦ π) = (−1)n−r−1 = −(−1)n−r = −ε(π). If i
and j are in distinct equivalence classes Jl and Jm , say
    {i1 , . . . , ip , . . . , ih }
and
    {j1 , . . . , jq , . . . , jk },
where ip = i and jq = j, since
    τ (π(π −1 (ip ))) = τ (ip ) = τ (i) = j = jq
and
    τ (π(π −1 (jq ))) = τ (jq ) = τ (j) = i = ip ,
we see that the classes Jl and Jm merge into a single class, and thus, the number of classes
associated with τ ◦ π is r − 1, and ε(τ ◦ π) = (−1)n−r+1 = −(−1)n−r = −ε(π).
Now, let π = τm ◦ . . . ◦ τ1 be any product of transpositions. By the first part of the
proposition, we have
    ε(π) = (−1)m−1 ε(τ1 ) = (−1)m ,
since ε(τ1 ) = −1 for a transposition.
Remark: When π = idn is the identity permutation, since we agreed that the composition of
0 transpositions is the identity, it is still correct that (−1)0 = ε(id) = +1. From the
proposition, it is immediate that ε(π ′ ◦ π) = ε(π ′ )ε(π). In particular, since π −1 ◦ π = idn , we
get ε(π −1 ) = ε(π).
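Since ε(π) = (−1)n−r , where r is the number of equivalence classes (orbits) of Rπ , the signature can be computed by counting cycles; the following Python sketch (added for illustration, with π encoded as a list p such that p[i] = π(i) on the indices 0, . . . , n − 1) does exactly that:

    def signature(p):
        """Signature of the permutation i -> p[i] of {0, ..., n-1}, via cycle counting."""
        n = len(p)
        seen = [False] * n
        cycles = 0
        for i in range(n):
            if not seen[i]:
                cycles += 1
                j = i
                while not seen[j]:   # walk the orbit of i
                    seen[j] = True
                    j = p[j]
        return (-1) ** (n - cycles)

    print(signature([1, 0, 2]))   # a transposition: -1
    print(signature([1, 2, 0]))   # a 3-cycle, i.e. two transpositions: +1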
We can now proceed with the definition of determinants.
Remark: Most of the definitions and results presented in this section also hold when K is
a commutative ring, and when we consider modules over K (free modules, when bases are
needed).
Let E1 , . . . , En , and F , be vector spaces over a field K, where n ≥ 1.
(1)
f (. . . , xi , xi+1 , . . .) = −f (. . . , xi+1 , xi , . . .)
(2)
f (. . . , xi , . . . , xj , . . .) = 0,
where xi = xj , and 1 ≤ i < j ≤ n.
(3)
f (. . . , xi , . . . , xj , . . .) = −f (. . . , xj , . . . , xi , . . .),
where 1 ≤ i < j ≤ n.
(4)
f (. . . , xi , . . .) = f (. . . , xi + λxj , . . .),
for any λ ∈ K, and where i ≠ j.
f (. . . , xi , . . . , xj , . . .) = ε f (. . . , xi , xj , . . .),
L(A)1 (u1 ) = a1 1 u1 + · · · + a1 n un ,
...
L(A)n (un ) = an 1 u1 + · · · + an n un ,
for all u1 , . . . , un ∈ E. It is immediately verified that L(A) is linear. Then, given two n × n
matrices A = (ai j ) and B = (bi j ), by repeating the calculations establishing the product of
matrices (just before Definition 2.12), we can show that
It is then convenient to use the matrix notation to describe the effect of the linear map L(A),
as
    ( L(A)1 (u1 ), L(A)2 (u2 ), . . . , L(A)n (un ) )⊤ = A ( u1 , u2 , . . . , un )⊤ ,
where
        [ a1 1  a1 2  . . .  a1 n ]
    A = [ a2 1  a2 2  . . .  a2 n ]
        [  ..    ..   . . .   ..  ]
        [ an 1  an 2  . . .  an n ]
Now, let f : E × · · · × E → F be an n-linear alternating map, and let (u1 , . . . , un ) and
(v1 , . . . , vn ) be families of vectors of E such that
    v1 = a1 1 u1 + · · · + an 1 un ,
    ...
    vn = a1 n u1 + · · · + an n un .
Equivalently, with A = (ai j ) as above, assume that we have
    ( v1 , . . . , vn )⊤ = A⊤ ( u1 , . . . , un )⊤ .
Then,
    f (v1 , . . . , vn ) = ∑π aπ(1) 1 · · · aπ(n) n f (uπ(1) , . . . , uπ(n) ),
where the sum ranges over all possible functions π: {1, . . . , n} → {1, . . . , n}. However, because
f is alternating, only the terms for which π is a permutation are nonzero. By Proposition 3.1,
every permutation π is a product of transpositions, and by Proposition 3.2, the parity ε(π)
of the number of transpositions only depends on π. Then, applying Proposition 3.3 (3) to
each transposition in π, we get
    f (v1 , . . . , vn ) = ( ∑π ε(π) aπ(1) 1 · · · aπ(n) n ) f (u1 , . . . , un ),
where this time the sum ranges over all permutations π of {1, . . . , n}. The expression
    ∑π ε(π) aπ(1) 1 · · · aπ(n) n
is in fact the value of the determinant of A (which, as we shall see shortly, is also equal to the
determinant of A⊤ ). However, working directly with the above definition is quite awkward,
and we will proceed via a slightly indirect route.
D: Mn (K) → K,
which, when viewed as a map on (K n )n , i.e., a map of the n columns of a matrix, is n-linear
alternating and such that D(In ) = 1 for the identity matrix In . Equivalently, we can consider
a vector space E of dimension n, some fixed basis (e1 , . . . , en ), and define
D: E n → K
First, we will show that such maps D exist, using an inductive definition that also gives
a recursive method for computing determinants. Actually, we will define a family (Dn )n≥1
of (finite) sets of maps D: Mn (K) → K. Second, we will show that determinants are in fact
uniquely defined, that is, we will show that each Dn consists of a single map. This will show
the equivalence of the direct definition det(A) of Lemma 3.4 with the inductive definition
D(A). Finally, we will prove some basic properties of determinants, using the uniqueness
theorem. Given a matrix A ∈ Mn (K), we denote its n columns by A1 , . . . , An .
Definition 3.5 For every n ≥ 1, we define a finite set Dn of maps D: Mn (K) → K induc-
tively as follows:
When n = 1, D1 consists of the single map D such that, D(A) = a, where A = (a), with
a ∈ K.
Assume that Dn−1 has been defined, where n ≥ 2. We define the set Dn as follows. For
every matrix A ∈ Mn (K), let Ai j be the (n − 1) × (n − 1)-matrix obtained from A = (ai j )
by deleting row i and column j. Then, Dn consists of all the maps D such that, for some i,
1 ≤ i ≤ n,
D(A) = (−1)i+1 ai 1 D(Ai 1 ) + · · · + (−1)i+n ai n D(Ai n ),
where for every j, 1 ≤ j ≤ n, D(Ai j ) is the result of applying any D in Dn−1 to Ai j .
! We confess that the use of the same letter D for the member of Dn being defined, and
for members of Dn−1 , may be slightly confusing. We considered using subscripts to
distinguish, but this seems to complicate things unnecessarily. One should not worry too
much anyway, since it will turn out that each Dn contains just one map.
Each (−1)i+j D(Ai j ) is called the cofactor of ai j , and the inductive expression for D(A)
is called a Laplace expansion of D according to the i-th row. Given a matrix A ∈ Mn (K),
each D(A) is called a determinant of A.
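A direct (and, for large n, very inefficient) transcription of this inductive definition in Python, expanding along the first row (Definition 3.5 with i = 1); the function name det and the list-of-lists representation are just illustrative choices added here:

    def det(A):
        """Determinant by Laplace expansion along the first row."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # the minor A_{1 j}: delete the first row and column j (0-indexed: row 0, column j)
            minor = [row[:j] + row[j+1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    print(det([[1, 2], [3, 4]]))                    # -2
    print(det([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))   # 25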
We will prove shortly that D(A) is uniquely defined (at the moment, it is not clear
that Dn consists of a single map). Assuming this fact, given an n × n-matrix A = (ai j ),
        [ a1 1  a1 2  . . .  a1 n ]
    A = [ a2 1  a2 2  . . .  a2 n ]
        [  ..    ..   . . .   ..  ]
        [ an 1  an 2  . . .  an n ]
its determinant is denoted by D(A) or det(A).
1. When n = 2, if
    A = [ a  b ]
        [ c  d ]
expanding according to the first row, we have
    D(A) = ad − bc.
2. When n = 3, if
    A = [ a1 1  a1 2  a1 3 ]
        [ a2 1  a2 2  a2 3 ]
        [ a3 1  a3 2  a3 3 ]
expanding according to the first row, we have
    D(A) = a1 1 D(A1 1 ) − a1 2 D(A1 2 ) + a1 3 D(A1 3 ),
that is,
    D(A) = a1 1 (a2 2 a3 3 − a3 2 a2 3 ) − a1 2 (a2 1 a3 3 − a3 1 a2 3 ) + a1 3 (a2 1 a3 2 − a3 1 a2 2 ),
which gives the explicit formula
    D(A) = a1 1 a2 2 a3 3 + a2 1 a3 2 a1 3 + a3 1 a1 2 a2 3 − a1 1 a3 2 a2 3 − a2 1 a1 2 a3 3 − a3 1 a2 2 a1 3 .
Theorem 3.6 For every n ≥ 1, for every D ∈ Dn , for every matrix A ∈ Mn (K), we have
    D(A) = ∑π ε(π) aπ(1) 1 · · · aπ(n) n ,
where the sum ranges over all permutations π on {1, . . . , n}. As a consequence, Dn consists
of a single map for every n ≥ 1, and this map is given by the above explicit formula.
Proof . Consider the standard basis (e1 , . . . , en ) of K n , where (ei )i = 1 and (ei )j = 0, for
j ≠ i. Then, each column Aj of A corresponds to a vector vj whose coordinates over the
basis (e1 , . . . , en ) are the components of Aj , that is, we can write
    v1 = a1 1 e1 + · · · + an 1 en ,
    ...
    vn = a1 n e1 + · · · + an n en .
Since by Lemma 3.5, each D is a multilinear alternating map, by applying Lemma 3.4, we
get
    D(A) = D(v1 , . . . , vn ) = ( ∑π ε(π) aπ(1) 1 · · · aπ(n) n ) D(e1 , . . . , en ),
where the sum ranges over all permutations π on {1, . . . , n}. But D(e1 , . . . , en ) = D(In ),
and by Lemma 3.5, we have D(In ) = 1. Thus,
    D(A) = ∑π ε(π) aπ(1) 1 · · · aπ(n) n ,
where the sum ranges over all permutations π on {1, . . . , n}. Since a permutation is invertible,
every product
    aπ(1) 1 · · · aπ(n) n
can be rewritten as
    a1 π−1 (1) · · · an π−1 (n) ,
and since ε(π −1 ) = ε(π) and the sum is taken over all permutations on {1, . . . , n}, we have
    ∑π ε(π) aπ(1) 1 · · · aπ(n) n = ∑σ ε(σ) a1 σ(1) · · · an σ(n) ,
where π and σ range over all permutations. But it is immediately verified that
    D(A⊤ ) = ∑σ ε(σ) a1 σ(1) · · · an σ(n) .
A useful consequence of Corollary 3.7 is that the determinant of a matrix is also a multi-
linear alternating map of its rows. This fact, combined with the fact that the determinant of
a matrix is a multilinear alternating map of its columns is often useful for finding short-cuts
in computing determinants. We illustrate this point on the following example which shows
up in polynomial interpolation.
We claim that
    V (x1 , . . . , xn ) = ∏1≤i<j≤n (xj − xi ),
Now, expanding this determinant according to the first column and using multilinearity,
we can factor (xi − x1 ) from the column of index i − 1 of the matrix obtained by deleting
the first row and the first column, and thus
V (x1 , . . . , xn ) = (x2 − x1 )(x3 − x1 ) · · · (xn − x1 )V (x2 , . . . , xn ),
which establishes the induction step.
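A quick numerical sanity check of this formula (a toy example added here, assuming numpy; the sample points are arbitrary): the Vandermonde matrix with rows (1, xi , xi 2 , xi 3 ) has determinant equal to the product of all differences xj − xi with i < j.

    import numpy as np
    from itertools import combinations

    x = np.array([1.0, 2.0, 4.0, 7.0])
    V = np.vander(x, increasing=True)          # row i is (1, x_i, x_i^2, x_i^3)
    lhs = np.linalg.det(V)
    rhs = np.prod([x[j] - x[i] for i, j in combinations(range(len(x)), 2)])
    print(lhs, rhs)                            # both equal 540.0 (up to roundoff)
    assert np.isclose(lhs, rhs)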
Proof . The only difference with Lemma 3.4 is that here, we are using A⊤ instead of A. Thus,
by Lemma 3.4 and Corollary 3.7, we get the desired result.
As a consequence, we get the very useful property that the determinant of a product of
matrices is the product of the determinants of these matrices.
Proposition 3.9 For any two n × n-matrices A and B, we have D(AB) = D(A)D(B).
Proof . We use Proposition 3.8 as follows: let (e1 , . . . , en ) be the standard basis of K n , and
let
    ( w1 , w2 , . . . , wn )⊤ = AB ( e1 , e2 , . . . , en )⊤ .
Then, we get
    D(w1 , . . . , wn ) = D(AB)D(e1 , . . . , en ) = D(AB),
since D(e1 , . . . , en ) = 1. Now, letting
    ( v1 , v2 , . . . , vn )⊤ = B ( e1 , e2 , . . . , en )⊤ ,
we get
    D(v1 , . . . , vn ) = D(B),
and since
    ( w1 , w2 , . . . , wn )⊤ = A ( v1 , v2 , . . . , vn )⊤ ,
we get
    D(w1 , . . . , wn ) = D(A)D(v1 , . . . , vn ) = D(A)D(B).
It should be noted that all the results of this section, up to now, also hold when K is a
commutative ring, and not necessarily a field. We can now characterize when an n×n-matrix
A is invertible in terms of its determinant D(A).
Definition 3.6 Let K be a commutative ring. Given a matrix A ∈ Mn (K), let Ã = (bi j )
be the matrix defined such that
    bi j = (−1)i+j D(Aj i ).
Proposition 3.10 Let K be a commutative ring. For every matrix A ∈ Mn (K), we have
    A Ã = Ã A = det(A)In .
As a consequence, A is invertible iff det(A) is invertible, and if so, A−1 = (det(A))−1 Ã.
Proof . If Ã = (bi j ) and A Ã = (ci j ), we know that the entry ci j in row i and column j of
A Ã is
    ci j = ai 1 b1 j + · · · + ai k bk j + · · · + ai n bn j ,
which is equal to
    ai 1 (−1)j+1 D(Aj 1 ) + · · · + ai n (−1)j+n D(Aj n ).
If j = i, then we recognize the expression of the expansion of det(A) according to the i-th
row:
ci i = det(A) = ai 1 (−1)i+1 D(Ai 1 ) + · · · + ai n (−1)i+n D(Ai n ).
If j ≠ i, we can form the matrix A′ by replacing the j-th row of A by the i-th row of A.
Now, the matrix Aj k obtained by deleting row j and column k from A is equal to the matrix
A′j k obtained by deleting row j and column k from A′ , since A and A′ only differ by the j-th
row. Thus,
    D(Aj k ) = D(A′j k ),
and we have
    ci j = ai 1 (−1)j+1 D(A′j 1 ) + · · · + ai n (−1)j+n D(A′j n ).
However, this is the expansion of D(A′ ) according to the j-th row, since the j-th row of A′
is equal to the i-th row of A, and since A′ has two identical rows i and j, because D is an
alternating map of the rows (see an earlier remark), we have D(A′ ) = 0. Thus, we have
shown that ci i = det(A), and ci j = 0, when j ≠ i, and so
    A Ã = det(A)In .
It is also obvious from the definition of Ã that
    Ã⊤ = (A⊤ )~ ,
where (A⊤ )~ denotes the matrix associated with A⊤ as in Definition 3.6. Applying the first
part of the argument to A⊤ , we have
    A⊤ (A⊤ )~ = det(A⊤ )In ,
and since det(A⊤ ) = det(A), Ã⊤ = (A⊤ )~ , and (Ã A)⊤ = A⊤ Ã⊤ , we get
    det(A)In = A⊤ (A⊤ )~ = A⊤ Ã⊤ = (Ã A)⊤ ,
that is,
    (Ã A)⊤ = det(A)In ,
which yields
    Ã A = det(A)In ,
since In⊤ = In . This proves that
    A Ã = Ã A = det(A)In .
As a consequence, if det(A) is invertible, we have A−1 = (det(A))−1 Ã. Conversely, if A is
invertible, from AA−1 = In , by Proposition 3.9, we have det(A) det(A−1 ) = 1, and det(A) is
invertible.
When K is a field, an element a ∈ K is invertible iff a ≠ 0. In this case, the second part
of the proposition can be stated as A is invertible iff det(A) ≠ 0. Note in passing that this
method of computing the inverse of a matrix is usually not practical.
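As a small illustration of Proposition 3.10 (assuming numpy; the helper name adjugate and the test matrix are made up), one can build the matrix Ã with entries bi j = (−1)i+j D(Aj i ) and check that A Ã = det(A)In ; as just noted, this is not a practical way to invert a matrix.

    import numpy as np

    def adjugate(A):
        """Matrix A~ with entries b_ij = (-1)^(i+j) det(A_ji) (indices 0-based here)."""
        n = A.shape[0]
        B = np.empty_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                # A_ji: delete row j and column i
                minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
                B[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return B

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    A_adj = adjugate(A)
    assert np.allclose(A @ A_adj, np.linalg.det(A) * np.eye(3))
    assert np.allclose(np.linalg.inv(A), A_adj / np.linalg.det(A))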
We now consider some applications of determinants to linear independence and to solving
systems of linear equations. Although these results hold for matrices over an integral domain,
their proofs require more sophisticated methods (it is necessary to use the fraction field of
the integral domain, K). Therefore, we assume again that K is a field.
Let A be an n × n-matrix, X a column vector of variables, and B another column vector,
and let A1 , . . . , An denote the columns of A. Observe that the system of equations AX = B,
    [ a1 1  a1 2  . . .  a1 n ] [ x1 ]   [ b1 ]
    [ a2 1  a2 2  . . .  a2 n ] [ x2 ]   [ b2 ]
    [  ..    ..   . . .   ..  ] [ .. ] = [ .. ]
    [ an 1  an 2  . . .  an n ] [ xn ]   [ bn ]
is equivalent to
x1 A1 + · · · + xj Aj + · · · + xn An = B,
since the equation corresponding to the i-th row is in both cases
ai 1 x1 + · · · + ai j xj + · · · + ai n xn = bi .
Proof . First, assume that the columns A1 , . . . , An of A are linearly dependent. Then, there
are x1 , . . . , xn ∈ K, not all zero, such that
    x1 A1 + · · · + xj Aj + · · · + xn An = 0,
where, say, xj ≠ 0. Then, replacing the j-th column of A by this linear combination, we get
    D(A1 , . . . , x1 A1 + · · · + xj Aj + · · · + xn An , . . . , An ) = D(A1 , . . . , 0, . . . , An ) = 0,
where 0 occurs in the j-th position. Expanding by multilinearity, all terms containing two
identical columns Ak for k ≠ j vanish, and we get
    xj D(A1 , . . . , An ) = 0,
and since xj ≠ 0, we conclude that D(A1 , . . . , An ) = 0.
Conversely, assume that the columns A1 , . . . , An of A are linearly independent. Then, they
form a basis of K n , and we can express the standard basis (e1 , . . . , en ) of K n in terms of
A1 , . . . , An . Thus, we have
    [ e1 ]   [ b1 1  b1 2  . . .  b1 n ] [ A1 ]
    [ e2 ]   [ b2 1  b2 2  . . .  b2 n ] [ A2 ]
    [ .. ] = [  ..    ..   . . .   ..  ] [ .. ]
    [ en ]   [ bn 1  bn 2  . . .  bn n ] [ An ]
for some matrix B = (bi j ), and by Proposition 3.8, we get
    D(e1 , . . . , en ) = D(B)D(A1 , . . . , An ),
and since D(e1 , . . . , en ) = 1, this implies that D(A1 , . . . , An ) ≠ 0 (and D(B) ≠ 0). For the
second assertion, recall that the rank of a matrix is equal to the maximum number of linearly
independent columns, and the conclusion is clear.
Proposition 3.12 Given an n × n-matrix A over a field K, the following properties hold:
(1) For every column vector B, there is a unique column vector X such that AX = B iff
the only solution to AX = 0 is the trivial vector X = 0, iff D(A) ≠ 0.
(3) The system of linear equations AX = 0 has a nonzero solution iff D(A) = 0.
Proof . Assume that AX = B has a single solution X0 , and assume that AY = 0 with Y ≠ 0.
Then,
    A(X0 + Y ) = AX0 + AY = AX0 + 0 = B,
and X0 + Y ≠ X0 is another solution of AX = B, contradicting the hypothesis that AX = B
has a single solution X0 . Thus, AX = 0 only has the trivial solution. Now, assume that
AX = 0 only has the trivial solution. This means that the columns A1 , . . . , An of A are
linearly independent, and by Proposition 3.11, we have D(A) ≠ 0. Finally, if D(A) ≠ 0,
by Proposition 3.10, this means that A is invertible, and then, for every B, AX = B is
equivalent to X = A−1 B, which shows that AX = B has a single solution.
(1) One should avoid computing the inverse, A−1 , of A explicitly. This is because this
would amount to solving the n linear systems, Au(j) = ej , for j = 1, . . . , n, where
ej = (0, . . . , 1, . . . , 0) is the jth canonical basis vector of Rn (with a 1 in the jth slot).
By doing so, we would replace the resolution of a single system by the resolution of n
systems, and we would still have to multiply A−1 by b.
(2) One does not solve (large) linear systems by computing determinants (using Cramer’s
formulae). This is because this method requires a number of additions (resp. multipli-
cations) proportional to (n + 1)! (resp. (n + 2)!).
The key idea on which most direct methods (as opposed to iterative methods, that look
for an approximation of the solution) are based is that if A is an upper-triangular matrix,
which means that aij = 0 for 1 ≤ j < i ≤ n (resp. lower-triangular, which means that
aij = 0 for 1 ≤ i < j ≤ n), then computing the solution, x, is trivial. Indeed, say A is an
upper-triangular matrix
        [ a1 1   a1 2   · · ·   a1 n−1    a1 n   ]
        [  0     a2 2   · · ·   a2 n−1    a2 n   ]
    A = [  ..     ..    . . .     ..       ..    ]
        [  0      0     · · ·  an−1 n−1  an−1 n  ]
        [  0      0     · · ·     0       an n   ]
If the diagonal entries of A are all nonzero (which is the case exactly when A is invertible),
the system Ax = b can be solved from the bottom up by back-substitution:
    xn = bn /an n ,
    xn−1 = (bn−1 − an−1 n xn )/an−1 n−1 ,
    ..
    x1 = (b1 − a1 2 x2 − · · · − a1 n xn )/a1 1 .
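A direct Python transcription of this back-substitution (added for illustration, assuming numpy, an invertible upper-triangular A, and a right-hand side b; the test data is the triangular system obtained in the worked example that follows):

    import numpy as np

    def back_substitution(A, b):
        """Solve A x = b for upper-triangular A with nonzero diagonal, last row first."""
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

    A = np.array([[2.0, 1.0, 1.0],
                  [0.0, -8.0, -2.0],
                  [0.0, 0.0, 1.0]])
    b = np.array([5.0, -12.0, 2.0])
    print(back_substitution(A, b))   # [1. 1. 2.], i.e. x = 1, y = 1, z = 2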
Thus, what we need is a method for transforming a matrix to an equivalent one in upper-
triangular form. This can be done by elimination. Let us illustrate this method on the
following example:
2x + y + z = 5
4x − 6y = −2
−2x + 7y + 2z = 9.
We can eliminate the variable x from the second and the third equation as follows: Subtract
twice the first equation from the second and add the first equation to the third. We get the
new system
2x + y + z = 5
− 8y − 2z = −12
8y + 3z = 14.
This time, we can eliminate the variable y from the third equation by adding the second
equation to the third:
2x + y + z = 5
− 8y − 2z = −12
z = 2.
This last system is upper-triangular. Using back-substitution, we find the solution: z = 2,
y = 1, x = 1.
Observe that we have performed only row operations. The general method is to iteratively
eliminate variables using simple row operations (namely, adding or subtracting a multiple of
a row to another row of the matrix) while simultaneously applying these operations to the
vector b, to obtain a system, MAx = Mb, where MA is upper-triangular. Such a method is
called Gaussian elimination. However, one extra twist is needed for the method to work in
all cases: It may be necessary to permute rows, as illustrated by the following example:
x + y + z =1
x + y + 3z =1
2x + 5y + 8z = 1.
In order to eliminate x from the second and third row, we subtract the first row from the
second and we subtract twice the first row from the third:
x + y + z =1
2z =0
3y + 6z = −1.
Now, the trouble is that y does not occur in the second row; so, we can’t eliminate y from
the third row by adding or subtracting a multiple of the second row to it. The remedy is
simple: Permute the second and the third row! We get the system:
x + y + z =1
3y + 6z = −1
2z = 0,
which is already in triangular form. Another example where some permutations are needed
is:
z = 1
−2x + 7y + 2z = 1
4x − 6y = −1.
First, we permute the first and the second row, obtaining
−2x + 7y + 2z = 1
z = 1
4x − 6y = −1,
and then, we add twice the first row to the third, obtaining:
−2x + 7y + 2z = 1
z = 1
8y + 4z = 1.
Again, we permute the second and the third row, getting
−2x + 7y + 2z = 1
8y + 4z = 1
z = 1,
an upper-triangular system. Of course, in this example, z is already solved and we could
have eliminated it first, but for the general method, we need to proceed in a systematic
fashion.
We now describe the method of Gaussian Elimination applied to a linear system, Ax = b,
where A is assumed to be invertible. We use the variable k to keep track of the stages of
elimination. Initially, k = 1.
(1) The first step is to pick some nonzero entry, ai 1 , in the first column of A. Such an entry
must exist, since A is invertible (otherwise, we would have det(A) = 0). The actual
choice of such an element has some impact on the numerical stability of the method,
but this will be examined later. For the time being, we assume that some arbitrary
choice is made. This chosen element is called the pivot of the elimination step and is
denoted π1 (so, in this first step, π1 = ai 1 ).
(2) Next, we permute the row (i) corresponding to the pivot with the first row. Such a
step is called pivoting. So, after this permutation, the first element of the first row is
nonzero.
(3) We now eliminate the variable x1 from all rows except the first by adding suitable
multiples of the first row to these rows. More precisely, we add −ai 1 /π1 times the first
row to the ith row, for i = 2, . . . , n. At the end of this step, all entries in the first
column are zero except the first.
(4) Increment k by 1. If k = n, stop. Otherwise, k < n, and then iteratively repeat steps
(1), (2), (3) on the (n − k + 1) × (n − k + 1) subsystem obtained by deleting the first
k − 1 rows and k − 1 columns from the current system.
If we let A1 = A and Ak = (ak i j ) be the matrix obtained after k − 1 elimination steps
(2 ≤ k ≤ n), then the kth elimination step is applied to Ak . Now, we will prove later that
det(Ak ) = ± det(A). Since A is invertible, some entry ak i k with k ≤ i ≤ n is nonzero; so,
one of these entries can be chosen as pivot, and we permute the kth row with the ith row,
obtaining the matrix αk = (αk j l ). The new pivot is πk = αk k k , and we zero the entries
i = k + 1, . . . , n in column k by adding −αk i k /πk times row k to row i. At the end of this
step, we have Ak+1 . Observe that the first k − 1 rows of Ak are identical to the first k − 1
rows of Ak+1 .
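The following compact Python sketch of this procedure (added for illustration, assuming numpy; it picks as pivot an entry of largest absolute value in the current column, one admissible choice among those discussed later) reduces an invertible A and the right-hand side b to an equivalent upper-triangular system:

    import numpy as np

    def gaussian_elimination(A, b):
        """Reduce A x = b to an upper-triangular system by row operations."""
        A = A.astype(float)
        b = b.astype(float)
        n = len(b)
        for k in range(n - 1):
            # steps (1)-(2): choose a nonzero pivot in column k and swap its row up
            i = k + np.argmax(np.abs(A[k:, k]))
            A[[k, i]] = A[[i, k]]
            b[[k, i]] = b[[i, k]]
            # step (3): zero the entries of column k below the pivot
            for j in range(k + 1, n):
                m = A[j, k] / A[k, k]
                A[j, k:] -= m * A[k, k:]
                b[j] -= m * b[k]
        return A, b

    A = np.array([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]])
    b = np.array([5.0, -2.0, 9.0])
    U, c = gaussian_elimination(A, b)   # back-substitution on U, c then gives x = y = 1, z = 2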
It is easy to figure out what kind of matrices perform the elementary row operations used
during Gaussian elimination. The permutation of the kth row with the ith row is achieved
by multiplying A on the left by the transposition matrix P (i, k), which is the identity matrix
with rows i and k swapped; its entries are given by
    P (i, k)j l = 1 if j = l, where j ≠ i and l ≠ k,
    P (i, k)j l = 0 if j = l = i or j = l = k,
    P (i, k)j l = 1 if j = i and l = k, or j = k and l = i,
and P (i, k)j l = 0 otherwise.
Observe that det(P (i, k)) = −1. Therefore, during the permutation step (2), if row k and
row i need to be permuted, the matrix A is multiplied on the left by the matrix Pk such that
Pk = P (i, k), else we set Pk = I.
Adding β times row j to row i is achieved by multiplying A on the left by the elementary
matrix Ei,j;β = I + βei,j , where
    (ei,j )k l = 1 if k = i and l = j, and (ei,j )k l = 0 if k ≠ i or l ≠ j,
i.e., ei,j has a single nonzero entry, a 1 in row i and column j, so that Ei,j;β is the identity
matrix with an extra entry β in row i and column j.
Observe that the inverse of Ei,j;β = I + βei,j is Ei,j;−β = I − βei,j and that det(Ei,j;β ) = 1.
Therefore, during step 3 (the elimination step), the matrix A is multiplied on the left by a
product, Ek , of matrices of the form Ei,k;βi,k . Consequently, we see that
    Ak+1 = Ek Pk Ak .
The fact that det(P (i, k)) = −1 and that det(Ei,j;β ) = 1 implies immediately the fact
claimed above: We always have det(Ak ) = ± det(A). Furthermore, since
    Ak+1 = Ek Pk Ak
and A1 = A, we conclude that
    An = En−1 Pn−1 · · · E2 P2 E1 P1 A
is upper-triangular. Also note that if we let M = En−1 Pn−1 · · · E2 P2 E1 P1 , then det(M) = ±1,
and
    det(A) = ± det(An ).
Proof . We already proved the theorem when A is invertible, as well as the last assertion.
Now, A is singular iff some pivot is zero, say at stage k of the elimination. If so, we must
have ak i k = 0, for i = k, . . . , n; but in this case, Ak+1 = Ak and we may pick Pk = Ek = I.
It remains to discuss the choice of the pivot, and also conditions that guarantee that no
permutations are needed during the Gaussian elimination process.
We begin by stating a necessary and sufficient condition for an invertible matrix to have
an LU-factorization (i.e., Gaussian elimination does not require pivoting). We say that an
invertible matrix, A, has an LU-factorization if it can be written as A = LU, where U is
upper-triangular invertible and L is lower-triangular, with Li i = 1 for i = 1, . . . , n. A lower-
triangular matrix with diagonal entries equal to 1 is called a unit lower-triangular matrix.
Given an n × n matrix, A = (ai j ), for any k, with 1 ≤ k ≤ n, let A[1..k, 1..k] denote the
submatrix of A whose entries are ai j , where 1 ≤ i, j ≤ k.
A[1..k, 1..k] = L1 U1 ,
and since U is invertible, U1 is also invertible (the determinant of U is the product of the
diagonal entries in U, which is the product of the diagonal entries in U1 and U4 ). As L1 is
invertible (since its diagonal entries are equal to 1), we see that A[1..k, 1..k] is invertible for
k = 1, . . . , n.
Conversely, assume that A[1..k, 1..k] is invertible, for k = 1, . . . , n. We just need to show
that Gaussian elimination does not need pivoting. We prove by induction on k that the kth
step does not need pivoting. This holds for k = 1, since A[1..1, 1..1] = (a1 1 ), so, a1 1 ≠ 0.
Assume that no pivoting was necessary for the first k steps (1 ≤ k ≤ n − 1). In this case,
we have
Ek−1 · · · E2 E1 A = Ak ,
where L = Ek−1 · · · E2 E1 is a unit lower-triangular matrix and Ak [1..k, 1..k] is upper-
triangular, so that LA = Ak can be written as
    [ L1  0  ] [ A[1..k, 1..k]  A2 ]   [ U1  B2 ]
    [ P   L4 ] [ A3             A4 ] = [ 0   B4 ]
and, comparing the top-left blocks, we get
    L1 A[1..k, 1..k] = U1 ,
where L1 is invertible and U1 is also invertible since its diagonal elements are the first k
pivots, by hypothesis. Therefore, A[1..k, 1..k] is also invertible.
Proof . We proved in Proposition 3.15 that in this case Gaussian elimination requires no
pivoting. Then, since every elementary matrix Ei,k;β is lower-triangular (since we always
arrange that the pivot, πk , occurs above the rows that it operates on), since the inverse of
Ei,k;β is Ei,k;−β , and the Ek ’s are products of Ei,k;βi,k ’s, from
    En−1 · · · E2 E1 A = U,
we get
    A = LU,
where L = E1 −1 E2 −1 · · · En−1 −1 is a unit lower-triangular matrix and U is upper-triangular.
The reader should verify that the example below is indeed an LU-factorization.
    [ 2  1  1  0 ]   [ 1  0  0  0 ] [ 2  1  1  0 ]
    [ 4  3  3  1 ]   [ 2  1  0  0 ] [ 0  1  1  1 ]
    [ 8  7  9  5 ] = [ 4  3  1  0 ] [ 0  0  2  2 ]
    [ 6  7  9  8 ]   [ 3  4  1  1 ] [ 0  0  0  2 ]
One of the main reasons why the existence of an LU-factorization for a matrix, A, is
interesting is that if we need to solve several linear systems, Ax = b, corresponding to the
same matrix, A, we can do this cheaply by solving the two triangular systems
Lw = b, and Ux = w.
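A sketch of this two-stage solve (added for illustration, assuming numpy and that an LU-factorization is already available; the function name solve_lu is made up, and the factors L and U are taken from the 4 × 4 example above): forward substitution on L, then back-substitution on U.

    import numpy as np

    def solve_lu(L, U, b):
        """Solve A x = b given A = L U: first L w = b, then U x = w."""
        n = len(b)
        w = np.zeros(n)
        for i in range(n):                     # forward substitution; L has unit diagonal
            w[i] = b[i] - L[i, :i] @ w[:i]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):         # back-substitution
            x[i] = (w[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    L = np.array([[1.0, 0, 0, 0], [2, 1, 0, 0], [4, 3, 1, 0], [3, 4, 1, 1]])
    U = np.array([[2.0, 1, 1, 0], [0, 1, 1, 1], [0, 0, 2, 2], [0, 0, 0, 2]])
    b = np.array([1.0, 2, 3, 4])
    x = solve_lu(L, U, b)
    assert np.allclose(L @ U @ x, b)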
As we will see a bit later, symmetric positive definite matrices satisfy the condition of
Proposition 3.15. Therefore, linear systems involving symmetric positive definite matrices
can be solved by Gaussian elimination without pivoting. Actually, it is possible to do better:
This is the Cholesky factorization.
The following easy proposition shows that, in principle, A can be premultiplied by some
permutation matrix, P , so that P A can be converted to upper-triangular form without using
any pivoting.
Proposition 3.17 Let A be an invertible n × n-matrix. Then, there is some permutation
matrix, P , so that P A[1..k, 1..k] is invertible for k = 1, . . . , n.
Proof . The case n = 1 is trivial, and so is the case n = 2 (we swap the rows if necessary). If
n ≥ 3, we proceed by induction. Since A is invertible, its columns are linearly independent;
so, in particular, its first n − 1 columns are also linearly independent. Delete the last column
of A. Since the remaining n−1 columns are linearly independent, there are also n−1 linearly
independent rows in the corresponding n × (n − 1) matrix. Thus, there is a permutation
of these n rows so that the (n − 1) × (n − 1) matrix consisting of the first n − 1 rows is
invertible. But, then, there is a corresponding permutation matrix, P1 , so that the first n − 1
rows and columns of P1 A form an invertible matrix, A& . Applying the induction hypothesis
to the (n − 1) × (n − 1) matrix, A& , we see that there some permutation matrix, P2 , so that
P2 P1 A[1..k, 1..k] is invertible, for k = 1, . . . , n − 1. Since A is invertible in the first place and
P1 and P2 are invertible, P1 P2 A is also invertible, and we are done.
Remark: One can also prove Proposition 3.17 using a clever reordering of the Gaussian
elimination steps. Indeed, we know that if A is invertible, then there are permutation
matrices, Pi , and products of elementary matrices, Ei , so that
An = En−1 Pn−1 · · · E2 P2 E1 P1 A,
where U = An is upper-triangular. For example, when n = 4, we have E3 P3 E2 P2 E1 P1 A = U.
We can define new matrices E1′ , E2′ , E3′ which are still products of elementary matrices and
we have
    E3′ E2′ E1′ P3 P2 P1 A = U.
Indeed, if we let E3′ = E3 , E2′ = P3 E2 P3−1 , and E1′ = P3 P2 E1 P2−1 P3−1 , we easily verify that
each Ek′ is a product of elementary matrices and that
    E3′ E2′ E1′ P3 P2 P1 = E3 P3 E2 P2 E1 P1 .
In general, we let
    Ek′ = Pn−1 · · · Pk+1 Ek Pk+1−1 · · · Pn−1−1 ,
and we have
    En−1′ · · · E1′ Pn−1 · · · P1 A = U.
Theorem 3.18 For every invertible n × n-matrix, A, there is some permutation matrix,
P , some upper-triangular matrix, U, and some unit lower-triangular matrix, L, so that
P A = LU (recall, Li i = 1 for i = 1, . . . , n). Furthermore, if P = I, then L and U are unique
and they are produced as a result of Gaussian elimination without pivoting. Furthermore, if
P = I, then L is simply obtained from the Ek−1 ’s.
Proof . The only part that has not been proved is the uniqueness part and how L arises
from the Ek−1 ’s. Assume that A is invertible and that A = L1 U1 = L2 U2 , with L1 , L2 unit
lower-triangular and U1 , U2 upper-triangular. Then, we have
    L2−1 L1 = U2 U1−1 .
10−4 x + y = 1
x + y = 2.
After eliminating x from the second equation (using 10−4 as the pivot), we get the system
    10−4 x + y = 1
    (1 − 104 )y = 2 − 104 ,
whose exact solution is
    x = 1/(1 − 10−4 ),    y = (1 − 2 × 10−4 )/(1 − 10−4 ).
However, if roundoff takes place on the fourth digit, then 1 − 104 = −9999 and 2 − 104 =
−9998 will be rounded off both to −9990, and then, the solution is x = 0 and y = 1, very
far from the exact solution where x ≈ 1 and y ≈ 1. The problem is that we picked a very
small pivot. If instead we permute the equations, the pivot is 1, and after elimination, we
get the system
x + y = 2
    (1 − 10−4 )y = 1 − 2 × 10−4 .
This time, 1 − 10−4 = 0.9999 and 1 − 2 × 10−4 = 0.9998 are rounded off to 0.999 and the
solution is x = 1, y = 1, much closer to the exact solution.
To remedy this problem, one may use the strategy of partial pivoting. This consists of
choosing during step k (1 ≤ k ≤ n − 1) one of the entries ak i k such that
    |ak i k | = max k≤p≤n |ak p k |.
By maximizing the value of the pivot, we avoid dividing by undesirably small pivots.
It has been known for a long time (before 1900, say by Hadamard) that if a matrix, A,
is strictly column diagonally dominant (resp. strictly row diagonally dominant), then it is
invertible. (This is a good exercise, try it!) It can also be shown that if A is strictly col-
umn diagonally dominant, then Gaussian elimination with partial pivoting does not actually
require pivoting.
Another strategy, called complete pivoting, consists in choosing some entry ak i j , where
k ≤ i, j ≤ n, such that
    |ak i j | = max k≤p,q≤n |ak p q |.
However, in this method, if the chosen pivot is not in column k, it is also necessary to
permute columns. This is achieved by multiplying on the right by a permutation matrix.
However, complete pivoting tends to be too expensive in practice, and partial pivoting is the
method of choice.
A special case where the LU-factorization is particularly efficient is the case of tridiagonal
matrices, which we now consider.
Consider the tridiagonal matrix
        [ b1  c1                      ]
        [ a2  b2  c2                  ]
    A = [     a3  b3  c3              ]
        [         . .   . .   . .     ]
        [         an−1  bn−1  cn−1    ]
        [               an    bn      ]
and define the sequence
    δ0 = 1,   δ1 = b1 ,   δk = bk δk−1 − ak ck−1 δk−2 ,   2 ≤ k ≤ n.
Proposition 3.19 If A is the tridiagonal matrix above, then δk = det(A[1..k, 1..k]), for
k = 1, . . . , n.
Proof . By expanding det(A[1..k, 1..k]) with respect to its last row, the proposition follows
by induction on k.
Theorem 3.20 If A is the tridiagonal matrix above and δk ≠ 0 for k = 1, . . . , n, then A has
the following LU-factorization:
    A = L U,
where L is the unit lower bidiagonal matrix
        [ 1                                 ]
        [ a2 δ0 /δ1   1                     ]
    L = [      a3 δ1 /δ2   1                ]
        [             . .      . .          ]
        [             an δn−2 /δn−1    1    ]
and U is the upper bidiagonal matrix
        [ δ1 /δ0   c1                           ]
        [         δ2 /δ1   c2                   ]
    U = [                 δ3 /δ2   c3           ]
        [                      . .       . .    ]
        [                          δn /δn−1     ]
Indeed, one verifies that
    (LU)k k+1 = ck ,   1 ≤ k ≤ n − 1,
    (LU)k k−1 = ak ,   2 ≤ k ≤ n,
    (LU)k l = 0,   |k − l| ≥ 2,
    (LU)1 1 = δ1 /δ0 = b1 ,
    (LU)k k = (ak ck−1 δk−2 + δk )/δk−1 = bk ,   2 ≤ k ≤ n,
so that LU = A.
It follows that the solution of a tridiagonal system Ax = d is obtained by computing
    w1 = d1 /b1 ,   wk = (dk − ak wk−1 )/(bk − ak zk−1 ),   k = 2, . . . , n,
and then
    xn = wn ,   xk = wk − zk xk+1 ,   k = n − 1, n − 2, . . . , 1.
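A Python sketch of this solve (added for illustration, assuming numpy). The auxiliary quantities zk are not defined in the excerpt above; the sketch takes z1 = c1 /b1 and zk = ck /(bk − ak zk−1 ), which is one standard reconstruction consistent with the LU-factorization of Theorem 3.20, so it should be read as an assumption rather than the text's own definition.

    import numpy as np

    def solve_tridiagonal(a, b, c, d):
        """Solve the tridiagonal system with subdiagonal a, diagonal b, superdiagonal c,
        right-hand side d, following the w/z/x formulas above (a[0] and c[-1] unused)."""
        n = len(b)
        z = np.zeros(n)
        w = np.zeros(n)
        z[0] = c[0] / b[0]
        w[0] = d[0] / b[0]
        for k in range(1, n):
            denom = b[k] - a[k] * z[k - 1]
            if k < n - 1:
                z[k] = c[k] / denom
            w[k] = (d[k] - a[k] * w[k - 1]) / denom
        x = np.zeros(n)
        x[-1] = w[-1]
        for k in range(n - 2, -1, -1):
            x[k] = w[k] - z[k] * x[k + 1]
        return x

    # toy check against a dense solve
    a = np.array([0.0, 1.0, 1.0, 1.0]); b = np.array([4.0, 4.0, 4.0, 4.0])
    c = np.array([1.0, 1.0, 1.0, 0.0]); d = np.array([5.0, 6.0, 6.0, 5.0])
    A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    assert np.allclose(solve_tridiagonal(a, b, c, d), np.linalg.solve(A, d))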
Remark: It can be verified that this requires 3(n − 1) additions, 3(n − 1) multiplications,
and 2n divisions, a total of 8n − 6 operations, which is much less than the O(2n³/3) required
by Gaussian elimination in general.
We now consider the special case of symmetric positive definite matrices. Recall that an
n × n symmetric matrix, A, is positive definite iff
    x⊤ Ax > 0 for all x ∈ Rn with x ≠ 0.
Equivalently, A is symmetric positive definite iff all its eigenvalues are strictly positive. The
following facts about a symmetric positive definite matrix, A, are easily established (some
left as exercise):
(1) The matrix A is invertible. (Indeed, if Ax = 0, then x⊤ Ax = 0, which implies x = 0.)
(2) We have ai i > 0 for i = 1, . . . , n. (Just observe that for x = ei , the ith canonical basis
vector of Rn , we have ei⊤ Aei = ai i > 0.)
(3) For every n × n invertible matrix, Z, the matrix Z ⊤ AZ is symmetric positive definite
iff A is symmetric positive definite.
Next, we prove that a symmetric positive definite matrix has a special LU-factorization
of the form A = BB⊤ , where B is a lower-triangular matrix whose diagonal elements are
strictly positive. This is the Cholesky factorization.
First, we note that a symmetric positive definite matrix satisfies the condition of Propo-
sition 3.15.
Proposition 3.21 If A is a symmetric positive definite matrix, then A[1..k, 1..k] is invertible
for k = 1, . . . , n.
Since A is symmetric positive definite, a1 1 > 0, and we can compute α = √a1 1 . The trick is
that we can factor A uniquely as
    A = [ a1 1   W⊤ ] = [ α     0 ] [ 1   0                 ] [ α   W⊤ /α ]
        [ W      B  ]   [ W/α   I ] [ 0   B − W W⊤ /a1 1    ] [ 0   I     ]
where, by fact (3) above, the middle matrix
    A1 = [ 1   0                 ]
         [ 0   B − W W⊤ /a1 1    ]
is symmetric positive definite. However, this implies that B − W W⊤ /a1 1 is also symmetric
positive definite (consider x⊤ A1 x for every x ∈ Rn with x ≠ 0 and x1 = 0). Thus, we can
apply the induction hypothesis to B − W W⊤ /a1 1 , and we find a unique lower-triangular
matrix, L, with positive diagonal entries, so that
    B − W W⊤ /a1 1 = LL⊤ .
    x⊤ Ax = x⊤ BB⊤ x = (B⊤ x)⊤ B⊤ x,
Remark: It can be shown that this method requires n³/6 + O(n²) additions, n³/6 + O(n²)
multiplications, n²/2 + O(n) divisions, and O(n) square root extractions. Thus, the Cholesky
method requires half of the number of operations required by Gaussian elimination (since
Gaussian elimination requires n³/3 + O(n²) additions, n³/3 + O(n²) multiplications, and
n²/2 + O(n) divisions). It also requires half of the space (only B is needed, as opposed to
both L and U). Furthermore, it can be shown that Cholesky’s method is numerically stable.
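A short Python sketch of the factorization just described (added for illustration, assuming numpy; the function name and the test matrix are made up), computing a lower-triangular B with positive diagonal such that A = BB⊤ , column by column:

    import numpy as np

    def cholesky(A):
        """Cholesky factor B (lower-triangular, positive diagonal) of a symmetric
        positive definite matrix A, so that A = B B^T; no pivoting is needed."""
        n = A.shape[0]
        B = np.zeros_like(A, dtype=float)
        for j in range(n):
            # diagonal entry: the alpha = sqrt(...) step of the argument above
            B[j, j] = np.sqrt(A[j, j] - B[j, :j] @ B[j, :j])
            # entries below the diagonal: the analogue of W/alpha
            B[j+1:, j] = (A[j+1:, j] - B[j+1:, :j] @ B[j, :j]) / B[j, j]
        return B

    A = np.array([[4.0, 2.0, 2.0],
                  [2.0, 5.0, 3.0],
                  [2.0, 3.0, 6.0]])
    B = cholesky(A)
    assert np.allclose(B @ B.T, A)
    assert np.allclose(B, np.tril(B))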
For more on the stability analysis and efficient implementation methods of Gaussian
elimination, LU-factoring and Cholesky factoring, see Demmel [14], Trefethen and Bau [54],
Ciarlet [12], Golub and Van Loan [23], Strang [50, 51], and Kincaid and Cheney [28].