
Basics of Algebra and Analysis

For Computer Science

Jean Gallier
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail: jean@cis.upenn.edu

© Jean Gallier

February 9, 2011
Contents

1 Introduction 5

2 Linear Algebra 7
2.1 Groups, Rings, and Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Linear Independence, Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Bases of a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.8 Direct Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.9 The Dual Space E ∗ and Linear Forms . . . . . . . . . . . . . . . . . . . . . . 44
2.10 Hyperplanes and Linear Forms . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.11 Transpose of a Linear Map and of a Matrix . . . . . . . . . . . . . . . . . . . 52

3 Determinants 55
3.1 Permutations, Signature of a Permutation . . . . . . . . . . . . . . . . . . . 55
3.2 Alternating Multilinear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Definition of a Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Inverse Matrices and Determinants . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 Systems of Linear Equations and Determinant . . . . . . . . . . . . . . . . . 71
3.6 Determinant of a Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.7 Gaussian Elimination, LU and Cholesky Factorization . . . . . . . . . . . . . 73
3.8 Further Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4 Basics of Affine Geometry 89


4.1 Affine Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2 Examples of Affine Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3 Chasles’s Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.4 Affine Combinations, Barycenters . . . . . . . . . . . . . . . . . . . . . . . . 99
4.5 Affine Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.6 Affine Independence and Affine Frames . . . . . . . . . . . . . . . . . . . . . 108
4.7 Affine Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113


4.8 Affine Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119


4.9 Affine Geometry: A Glimpse . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.10 Affine Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5 Polynomials, PID’s and UFD’s 141


5.1 Multisets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.2 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3 Euclidean Division of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . 148
5.4 Ideals, PID’s, and Greatest Common Divisors . . . . . . . . . . . . . . . . . 150
5.5 Factorization and Irreducible Factors in K[X] . . . . . . . . . . . . . . . . . 157
5.6 Roots of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.7 Unique Factorization Domains (Factorial Rings) . . . . . . . . . . . . . . . . 168
5.8 Noetherian Rings and Hilbert’s Basis Theorem . . . . . . . . . . . . . . . . . 182
5.9 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.10 Polynomial Interpolation (Lagrange, Newton, Hermite) . . . . . . . . . . . . . 190
5.11 Further Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

6 Topology 197
6.1 Metric Spaces and Normed Vector Spaces . . . . . . . . . . . . . . . . . . . . 197
6.2 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.3 Continuous Functions, Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.4 Continuous Linear and Multilinear Maps . . . . . . . . . . . . . . . . . . . . 211
6.5 Normed Affine Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.6 Further Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

7 Differential Calculus 219


7.1 Directional Derivatives, Total Derivatives . . . . . . . . . . . . . . . . . . . . 219
7.2 Jacobian Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
7.3 The Implicit and The Inverse Function Theorems . . . . . . . . . . . . . . . 232
7.4 Tangent Spaces and Differentials . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.5 Second-Order and Higher-Order Derivatives . . . . . . . . . . . . . . . . . . 237
7.6 Taylor’s formula, Faà di Bruno’s formula . . . . . . . . . . . . . . . . . . . . 242
7.7 Vector Fields, Covariant Derivatives, Lie Brackets . . . . . . . . . . . . . . . 245
7.8 Further Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

8 Appendix: Zorn’s Lemma; Some Applications 249


8.1 Statement of Zorn’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
8.2 Proof of the Existence of a Basis in a Vector Space . . . . . . . . . . . . . . 250
8.3 Existence of Maximal Proper Ideals . . . . . . . . . . . . . . . . . . . . . . . 251

Bibliography 252
Chapter 1

Introduction

Chapter 2

Linear Algebra

2.1 Groups, Rings, and Fields


In this chapter, the basic algebraic structures (groups, rings, fields, vector spaces) are re-
viewed, with a major emphasis on vector spaces. Basic notions of linear algebra such as
vector spaces, subspaces, linear combinations, linear independence, bases, quotient spaces,
linear maps, matrices, change of basis, direct sums, linear forms, dual spaces, hyperplanes,
and the transpose of a linear map are reviewed.

The set R of real numbers has two operations +: R × R → R (addition) and ∗: R × R → R (multiplication) satisfying properties that make R into an abelian group under +, and R − {0} = R∗ into an abelian group under ∗. Recall the definition of a group.
Definition 2.1 A group is a set G equipped with an operation ·: G × G → G having the
following properties: · is associative, has an identity element e ∈ G, and every element in G
is invertible (w.r.t. ·). More explicitly, this means that the following equations hold for all
a, b, c ∈ G:
(G1) a · (b · c) = (a · b) · c. (associativity);
(G2) a · e = e · a = a. (identity);
(G3) For every a ∈ G, there is some a−1 ∈ G such that a · a−1 = a−1 · a = e (inverse).
A group G is abelian (or commutative) if
a·b= b·a
for all a, b ∈ G.

A set M together with an operation ·: M × M → M and an element e satisfying only conditions (G1) and (G2) is called a monoid. For example, the set N = {0, 1, . . . , n, . . .} of natural numbers is a (commutative) monoid. However, it is not a group. Some examples of groups are given below.


Example 2.1

1. The set Z = {. . . , −n, . . . , −1, 0, 1, . . . , n, . . .} of integers is a group under addition, with identity element 0. However, Z∗ = Z − {0} is not a group under multiplication.

2. The set Q of rational numbers is a group under addition, with identity element 0. The
set Q∗ = Q − {0} is also a group under multiplication, with identity element 1.

3. Similarly, the sets R of real numbers and C of complex numbers are groups under
addition (with identity element 0), and R∗ = R − {0} and C∗ = C − {0} are groups
under multiplication (with identity element 1).

4. The sets Rn and Cn of n-tuples of real or complex numbers are groups under componentwise addition:

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),

with identity element (0, . . . , 0). All these groups are abelian.

5. Given any nonempty set S, the set of bijections f : S → S, also called permutations
of S, is a group under function composition (i.e., the multiplication of f and g is the
composition g ◦ f ), with identity element the identity function idS . This group is not
abelian as soon as S has more than two elements.

6. The set of n × n matrices with real (or complex) coefficients is a group under addition
of matrices, with identity element the null matrix. It is denoted by Mn (R) (or Mn (C)).

7. The set R[X] of polynomials in one variable with real coefficients is a group under
addition of polynomials.

8. The set of n × n invertible matrices with real (or complex) coefficients is a group under
matrix multiplication, with identity element the identity matrix In . This group is
called the general linear group and is usually denoted by GL(n, R) (or GL(n, C)).

9. The set of n × n invertible matrices with real (or complex) coefficients and determinant
+1 is a group under matrix multiplication, with identity element the identity matrix
In . This group is called the special linear group and is usually denoted by SL(n, R)
(or SL(n, C)).

10. The set of n × n invertible matrices R with real coefficients such that R R⊤ = In and of determinant +1 is a group called the special orthogonal group, usually denoted by SO(n) (where R⊤ is the transpose of the matrix R, i.e., the rows of R⊤ are the columns of R). It corresponds to the rotations in Rn; see the sketch after this example.

11. Given an open interval ]a, b[, the set C(]a, b[) of continuous functions f : ]a, b[→ R is a
group under the operation f + g defined such that

(f + g)(x) = f (x) + g(x)

for all x ∈]a, b[.
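As a quick numerical illustration of item (10), the following sketch (assuming Python with NumPy; the specific angle is our own choice, not part of the text) checks that a plane rotation R satisfies R R⊤ = I2 and det(R) = +1.

    import numpy as np

    # A sample 2 x 2 rotation matrix R(theta); any angle theta would do.
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    # Membership test for SO(2): R R^T = I_2 and det(R) = +1.
    print(np.allclose(R @ R.T, np.eye(2)))    # True
    print(np.isclose(np.linalg.det(R), 1.0))  # True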

The groups Z, Q, R, C, R[X], and Mn (R) are more than abelian groups: they are also
commutative rings. Furthermore, Q, R, and C are fields. We now introduce rings and fields.

Definition 2.2 A ring is a set A equipped with two operations +: A × A → A (called addition) and ∗: A × A → A (called multiplication) having the following properties:
(R1) A is an abelian group w.r.t. +;
(R2) ∗ is associative and has an identity element 1 ∈ A;
(R3) ∗ is distributive w.r.t. +.
The identity element for addition is denoted 0, and the additive inverse of a ∈ A is
denoted by −a. More explicitly, the axioms of a ring are the following equations which hold
for all a, b, c ∈ A:

a + (b + c) = (a + b) + c (associativity of +) (2.1)
a + b = b + a (commutativity of +) (2.2)
a + 0 = 0 + a = a (zero) (2.3)
a + (−a) = (−a) + a = 0 (additive inverse) (2.4)
a ∗ (b ∗ c) = (a ∗ b) ∗ c (associativity of ∗) (2.5)
a ∗ 1 = 1 ∗ a = a (identity for ∗) (2.6)
(a + b) ∗ c = (a ∗ c) + (b ∗ c) (distributivity) (2.7)
a ∗ (b + c) = (a ∗ b) + (a ∗ c) (distributivity) (2.8)

The ring A is commutative if

a ∗ b = b ∗ a

for all a, b ∈ A.

From (2.7) and (2.8), we easily obtain

a ∗ 0 = 0 ∗ a = 0 (2.9)
a ∗ (−b) = (−a) ∗ b = −(a ∗ b). (2.10)

Note that (2.9) implies that if 1 = 0, then a = 0 for all a ∈ A, and thus, A = {0}. The ring A = {0} is called the trivial ring. A ring for which 1 ≠ 0 is called nontrivial. The multiplication a ∗ b of two elements a, b ∈ A is often denoted by ab.

Example 2.2
1. The additive groups Z, Q, R, C, are commutative rings.
2. The group R[X] of polynomials in one variable with real coefficients is a ring under
multiplication of polynomials. It is a commutative ring.
3. The group of n × n matrices Mn (R) is a ring under matrix multiplication. However, it
is not a commutative ring.
4. The group C(]a, b[) of continuous functions f : ]a, b[→ R is a ring under the operation
f · g defined such that
(f · g)(x) = f (x)g(x)
for all x ∈]a, b[.

When ab = 0 for a ≠ 0 and b ≠ 0, we say that a is a zero divisor (and so is b). A ring A is an integral domain (or an entire ring) if 0 ≠ 1, A is commutative, and ab = 0 implies that a = 0 or b = 0, for all a, b ∈ A. In other words, an integral domain is a nontrivial commutative ring with no zero divisors.
Example 2.3
1. The rings Z, Q, R, C, are integral domains.
2. The ring R[X] of polynomials in one variable with real coefficients is an integral domain.
3. For any positive integer, p ∈ N, define a relation on Z, denoted m ≡ n (mod p), as
follows:
m ≡ n (mod p) iff m − n = kp for some k ∈ Z.
The reader will easily check that this is an equivalence relation, and, moreover, it is
compatible with respect to addition and multiplication, which means that if m1 ≡
n1 (mod p) and m2 ≡ n2 (mod p), then m1 + m2 ≡ n1 + n2 (mod p) and m1 m2 ≡
n1 n2 (mod p). Consequently, we can define an addition operation and a multiplication
operation of the set of equivalence classes (mod p):
[m] + [n] = [m + n]
and
[m] · [n] = [mn].
Again, the reader will easily check that the ring axioms are satisfied, with [0] as zero
and [1] as multiplicative unit. The resulting ring is denoted by Z/pZ. (The notation Zp is sometimes used instead of Z/pZ, but it clashes with the notation for the p-adic integers, so we prefer not to use it.) Observe that if p is composite, then this ring has zero divisors. For example, if p = 4, then we have

2 · 2 ≡ 0 (mod 4).

However, the reader should prove that Z/pZ is an integral domain if p is prime (in fact, it is a field); see the sketch following this example.

4. The ring of n × n matrices Mn (R) is not an integral domain. It has zero divisors (see
Example 2.12).
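The arithmetic of Z/pZ from item (3) is easy to experiment with. The sketch below (plain Python; the helper names are ours, and we simply use the % operator to reduce mod p) exhibits the zero divisor [2] in Z/4Z and checks that every nonzero class of Z/5Z has a multiplicative inverse, as expected when p is prime.

    def add_mod(m, n, p):
        # [m] + [n] = [m + n] in Z/pZ
        return (m + n) % p

    def mul_mod(m, n, p):
        # [m] . [n] = [mn] in Z/pZ
        return (m * n) % p

    # p = 4 is composite: [2] is a zero divisor, since 2 . 2 = 0 (mod 4).
    print(mul_mod(2, 2, 4))   # 0

    # p = 5 is prime: every nonzero class has a multiplicative inverse.
    for a in range(1, 5):
        print(a, [b for b in range(1, 5) if mul_mod(a, b, 5) == 1])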

A homomorphism between rings is a mapping preserving addition and multiplication (and 0 and 1).

Definition 2.3 Given two rings A and B, a homomorphism between A and B is a function
h: A → B satisfying the following conditions for all x, y ∈ A:

h(x + y) = h(x) + h(y)


h(xy) = h(x)h(y)
h(0) = 0
h(1) = 1.

Actually, because B is a group under addition, h(0) = 0 follows from

h(x + y) = h(x) + h(y).

Example 2.4
1. If A is a ring, for any integer n ∈ Z, for any a ∈ A, we define n · a by

n · a = a + · · · + a   (n terms)

if n ≥ 0 (with 0 · a = 0) and
n · a = −(−n) · a
if n < 0. Then, the map h: Z → A given by

h(n) = n · 1A

is a ring homomorphism (where 1A is the multiplicative identity of A).

2. Given any real λ ∈ R, the evaluation map ηλ : R[X] → R defined by

ηλ (f (X)) = f (λ)

for every polynomial f (X) ∈ R[X] is a ring homomorphism.
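To see the evaluation map of item (2) at work, here is a small sketch (our own encoding: a polynomial is a list of coefficients, lowest degree first, and the function names are hypothetical) checking that evaluation at λ preserves sums and products of polynomials.

    def poly_add(f, g):
        # coefficientwise sum of two polynomials
        n = max(len(f), len(g))
        f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
        return [a + b for a, b in zip(f, g)]

    def poly_mul(f, g):
        # product of polynomials: convolution of their coefficients
        h = [0] * (len(f) + len(g) - 1)
        for i, a in enumerate(f):
            for j, b in enumerate(g):
                h[i + j] += a * b
        return h

    def evaluate(f, lam):
        # eta_lambda(f) = f(lam), computed by Horner's rule
        r = 0
        for c in reversed(f):
            r = r * lam + c
        return r

    f, g, lam = [1, 2, 3], [0, -1, 0, 4], 2   # 1 + 2X + 3X^2 and -X + 4X^3
    print(evaluate(poly_add(f, g), lam) == evaluate(f, lam) + evaluate(g, lam))  # True
    print(evaluate(poly_mul(f, g), lam) == evaluate(f, lam) * evaluate(g, lam))  # True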

A field is a commutative ring K for which K − {0} is a group under multiplication.

Definition 2.4 A set K is a field if it is a ring and the following properties hold:

(F1) 0 ≠ 1;

(F2) K∗ = K − {0} is a group w.r.t. ∗ (i.e., every a ≠ 0 has an inverse w.r.t. ∗);

(F3) ∗ is commutative.

If ∗ is not commutative but (F1) and (F2) hold, we say that we have a skew field (or
noncommutative field ).

Note that we are assuming that the operation ∗ of a field is commutative. This convention
is not universally adopted, but since ∗ will be commutative for most fields we will encounter,
we may as well include this condition in the definition.

Example 2.5

1. The rings Q, R, and C are fields.

2. The set of (formal) fractions f (X)/g(X) of polynomials f (X), g(X) ∈ R[X], where
g(X) is not the null polynomial, is a field.

3. The ring C∗(]a, b[) of continuous functions f : ]a, b[ → R such that f(x) ≠ 0 for all x ∈ ]a, b[ is a field.

4. The ring Z/pZ is a field whenever p is prime.

2.2 Vector Spaces


For every n ≥ 1, let Rn be the set of n-tuples x = (x1 , . . . , xn ). Addition can be extended to
Rn as follows:
(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ).
We can also define an operation ·: R × Rn → Rn as follows:
λ · (x1 , . . . , xn ) = (λx1 , . . . , λxn ).
The resulting algebraic structure has some interesting properties, those of a vector space.
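A minimal sketch of these two operations, with n-tuples represented as plain Python tuples (the function names are ours):

    def vec_add(x, y):
        # (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn)
        return tuple(xi + yi for xi, yi in zip(x, y))

    def scal_mul(lam, x):
        # lam . (x1, ..., xn) = (lam x1, ..., lam xn)
        return tuple(lam * xi for xi in x)

    x, y = (1.0, 2.0, 3.0), (0.5, -1.0, 4.0)
    print(vec_add(x, y))     # (1.5, 1.0, 7.0)
    print(scal_mul(2.0, x))  # (2.0, 4.0, 6.0)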

Definition 2.5 Given a field K, a vector space over K (or K-vector space) is a set E (of vectors) together with two operations +: E × E → E (called vector addition), and ·: K × E → E (called scalar multiplication) satisfying the following conditions for all α, β ∈ K and all u, v ∈ E:

(V0) E is an abelian group w.r.t. +, with identity element 0;



(V1) α · (u + v) = (α · u) + (α · v);

(V2) (α + β) · u = (α · u) + (β · u);

(V3) (α ∗ β) · u = α · (β · u);

(V4) 1 · u = u.

(The symbol + is overloaded, since it denotes both addition in the field K and addition of vectors in E. However, if we write elements of E as vectors, i.e., of the form u, it will always be clear from the type of the arguments which + is intended.)

Given α ∈ K and v ∈ E, the element α · v is also denoted by αv. The field K is often
called the field of scalars.
Unless specified otherwise or unless we are dealing with several different fields, in the rest
of this chapter, we assume that all K-vector spaces are defined with respect to a fixed field
K. Thus, we will refer to a K-vector space simply as a vector space. In most cases, the field
K will be the field R of reals.
From (V0), a vector space always contains the null vector 0, and thus is nonempty.
From (V1), we get α · 0 = 0, and α · (−v) = −(α · v). From (V2), we get 0 · v = 0, and
(−α) · v = −(α · v). The field K itself can be viewed as a vector space over itself, addition
of vectors being addition in the field, and multiplication by a scalar being multiplication in
the field.

Example 2.6

1. The fields R and C are vector spaces over R.

2. The groups Rn and Cn are vector spaces over R, and Cn is a vector space over C.

3. The ring R[X] of polynomials is a vector space over R, and C[X] is a vector space over
R and C. The ring of n × n matrices Mn (R) is a vector space over R.

4. The ring C(]a, b[) of continuous functions f : ]a, b[ → R is a vector space over R.

Let E be a vector space. We would like to define the important notions of linear com-
bination and linear independence. These notions can be defined for sets of vectors in E,
but it will turn out to be more convenient to define them for families (vi )i∈I , where I is any
arbitrary index set.

2.3 Linear Independence, Subspaces


One of the most useful properties of vector spaces is that they possess bases. What this
means is that in every vector space E, there is some set of vectors {e1 , . . . , en } such that
every vector v ∈ E can be written as a linear combination,

v = λ1 e1 + · · · + λn en ,

of the ei, for some scalars λ1, . . . , λn ∈ K. Furthermore, the n-tuple (λ1, . . . , λn) as above is unique.
This description is fine when E has a finite basis, {e1 , . . . , en }, but this is not always the
case! For example, the vector space of real polynomials, R[X], does not have a finite basis
but instead it has an infinite basis, namely

1, X, X 2 , . . . , X n , . . .

One might wonder if it is possible for a vector space to have bases of different sizes, or even
to have a finite basis as well as an infinite basis. We will see later on that this is not possible;
all bases of a vector space have the same number of elements (cardinality), which is called
the dimension of the space. However, we have the following problem: If a vector space has
an infinite basis, {e1 , e2 , . . . , }, how do we define linear combinations? Do we allow linear
combinations
λ1 e1 + λ2 e2 + · · ·
with infinitely many nonzero coefficients?
If we allow linear combinations with infinitely many nonzero coefficients, then we have
to make sense of these sums and this can only be done reasonably if we define such a sum
as the limit of the sequence of vectors, s1 , s2 , . . . , sn , . . ., with s1 = λ1 e1 and

sn+1 = sn + λn+1 en+1 .

But then, how do we define such limits? Well, we have to define some topology on our space,
by means of a norm, a metric or some other mechanism. This can indeed be done and this
is what Banach spaces and Hilbert spaces are all about but this seems to require a lot of
machinery.
A way to avoid limits is to restrict our attention to linear combinations involving only
finitely many vectors. We may have an infinite supply of vectors but we only form linear
combinations involving finitely many nonzero coefficients. Technically, this can be done by
introducing families of finite support. This gives us the ability to manipulate families of
scalars indexed by some fixed infinite set and yet to treat these families as if they were
finite. With these motivations in mind, let us review the notion of an indexed family.
Given a set A, a family (ai )i∈I of elements of A is simply a function a: I → A.

Remark: When considering a family (ai )i∈I , there is no reason to assume that I is ordered.
The crucial point is that every element of the family is uniquely indexed by an element of
I. Thus, unless specified otherwise, we do not assume that the elements of an index set are
ordered.
If A is an abelian group (usually, when A is a ring or a vector space) with identity 0, we
say that a family (ai )i∈I has finite support if ai = 0 for all i ∈ I − J, where J is a finite
subset of I (the support of the family).

We can deal with an arbitrary set X by viewing it as the family (Xx )x∈X corresponding
to the identity function id: X → X. We agree that when I = ∅, (ai )i∈I = ∅. A family (ai )i∈I
is finite if I is finite.
Given two disjoint sets I and J, the union of two families (ui )i∈I and (vj )j∈J , denoted as
(ui )i∈I ∪ (vj )j∈J , is the family (wk )k∈(I∪J) defined such that wk = uk if k ∈ I, and wk = vk
if k ∈ J. Given a family (ui )i∈I and any element v, we denote by (ui )i∈I ∪k (v) the family
(wi)i∈I∪{k} defined such that wi = ui if i ∈ I, and wk = v, where k is any index such that
k ∉ I. Given a family (ui)i∈I, a subfamily of (ui)i∈I is a family (uj)j∈J where J is any subset
of I.
In this chapter, unless specified otherwise, it is assumed that all families of scalars have
finite support.

Definition 2.6 Let E be a vector space. A vector v ∈ E is a linear combination of a family (ui)i∈I of elements of E if there is a family (λi)i∈I of scalars in K such that

v = ∑_{i∈I} λi ui.

When I = ∅, we stipulate that v = 0. We say that a family (ui)i∈I is linearly independent if for every family (λi)i∈I of scalars in K,

∑_{i∈I} λi ui = 0 implies that λi = 0 for all i ∈ I.

Equivalently, a family (ui)i∈I is linearly dependent if there is some family (λi)i∈I of scalars in K such that

∑_{i∈I} λi ui = 0 and λj ≠ 0 for some j ∈ I.

We agree that when I = ∅, the family ∅ is linearly independent.

A family (ui)i∈I is linearly dependent iff some uj in the family can be expressed as a linear combination of the other vectors in the family. Indeed, there is some family (λi)i∈I of scalars in K such that

∑_{i∈I} λi ui = 0 and λj ≠ 0 for some j ∈ I,

which implies that

uj = −λj^{-1} ∑_{i∈(I−{j})} λi ui.

When I is nonempty, if the family (ui)i∈I is linearly independent, note that ui ≠ 0 for all i ∈ I: otherwise, taking λi = 1 and λj = 0 for all j ≠ i, we would have ∑_{i∈I} λi ui = 0 with some λi ≠ 0.

Example 2.7
1. Any two distinct scalars λ, µ ≠ 0 in K are linearly dependent.
2. In R3 , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are linearly independent.
3. In R4 , the vectors (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), and (0, 0, 0, 1) are linearly indepen-
dent.
4. In R2 , the vectors u = (1, 1), v = (0, 1) and w = (2, 3) are linearly dependent, since
w = 2u + v.
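These examples can be checked mechanically. One convenient test, assuming NumPy is available (this numerical shortcut is our illustration, not part of the text), is to stack the vectors as the rows of a matrix: the family is linearly independent exactly when the rank equals the number of vectors.

    import numpy as np

    u, v, w = np.array([1.0, 1.0]), np.array([0.0, 1.0]), np.array([2.0, 3.0])

    # Item 4: three vectors of R^2, so the rank can be at most 2.
    print(np.linalg.matrix_rank(np.vstack([u, v, w])))  # 2, hence linearly dependent
    print(np.allclose(w, 2 * u + v))                    # True: the relation w = 2u + v

    # Item 2: the three standard basis vectors of R^3 are independent.
    print(np.linalg.matrix_rank(np.eye(3)))             # 3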

Note that a family (ui)i∈I is linearly independent iff (uj)j∈J is linearly independent for every finite subset J of I (even when I = ∅). Indeed, when ∑_{i∈I} λi ui = 0, the family (λi)i∈I of scalars in K has finite support, and thus ∑_{i∈I} λi ui = 0 really means that ∑_{j∈J} λj uj = 0 for a finite subset J of I. When I is finite, we often assume that it is the set I = {1, 2, . . . , n}. In this case, we denote the family (ui)i∈I as (u1, . . . , un).
The notion of a subspace of a vector space is defined as follows.
Definition 2.7 Given a vector space E, a subset F of E is a linear subspace (or subspace)
of E if F is nonempty and λu + µv ∈ F for all u, v ∈ F , and all λ, µ ∈ K.

It is easy to see that a subspace F of E is indeed a vector space, since the restriction
of +: E × E → E to F × F is indeed a function +: F × F → F , and the restriction of
·: K × E → E to K × F is indeed a function ·: K × F → F . It is also easy to see that a
subspace F of E is closed under arbitrary linear combinations of vectors from F , and that
any intersection of subspaces is a subspace. Letting λ = µ = 0, we see that every subspace
contains the vector 0. The subspace {0} will be denoted by (0), or even 0 (with a mild abuse
of notation).
Example 2.8
1. In R2 , the set of vectors u = (x, y) such that
x+y =0
is a subspace.
2. In R3 , the set of vectors u = (x, y, z) such that
x+y+z =0
is a subspace.
3. For any n ≥ 0, the set of polynomials f (X) ∈ R[X] of degree at most n is a subspace
of R[X].
4. The set of upper triangular n × n matrices is a subspace of the space of n × n matrices.

2.4 Bases of a Vector Space


Given a vector space E, given a family (vi )i∈I , the subset V of E consisting of the null vector 0
and of all linear combinations of (vi )i∈I is easily seen to be a subspace of E. Subspaces having
such a “generating family” play an important role, and motivate the following definition.

Definition 2.8 Given a vector space E and a subspace V of E, a family (vi )i∈I of vectors
vi ∈ V spans V or generates V if for every v ∈ V , there is some family (λi )i∈I of scalars in
K such that

v = ∑_{i∈I} λi vi.

We also say that the elements of (vi )i∈I are generators of V and that V is spanned by (vi )i∈I ,
or generated by (vi )i∈I . If a subspace V of E is generated by a finite family (vi )i∈I , we say
that V is finitely generated . A family (ui )i∈I that spans V and is linearly independent is
called a basis of V .

Example 2.9

1. In R3 , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form a basis.

2. In the subspace of polynomials in R[X] of degree at most n, the polynomials 1, X, X^2, . . . , X^n form a basis.

3. The polynomials \binom{n}{k} (1 − X)^k X^{n−k} for k = 0, . . . , n, also form a basis of that space.

It is a standard result of linear algebra that every vector space E has a basis, and that
for any two bases (ui )i∈I and (vj )j∈J , I and J have the same cardinality. In particular, if E
has a finite basis of n elements, every basis of E has n elements, and the integer n is called
the dimension of the vector space E. We begin with a crucial lemma.

Lemma 2.1 Given a linearly independent family (ui)i∈I of elements of a vector space E, if v ∈ E is not a linear combination of (ui)i∈I, then the family (ui)i∈I ∪k (v) obtained by adding v to the family (ui)i∈I is linearly independent (where k ∉ I).

Proof. Assume that µv + ∑_{i∈I} λi ui = 0, for any family (λi)i∈I of scalars in K. If µ ≠ 0, then µ has an inverse (because K is a field), and thus we have v = − ∑_{i∈I} (µ^{-1} λi) ui, showing that v is a linear combination of (ui)i∈I and contradicting the hypothesis. Thus, µ = 0. But then, we have ∑_{i∈I} λi ui = 0, and since the family (ui)i∈I is linearly independent, we have λi = 0 for all i ∈ I.
The next theorem holds in general, but the proof is more sophisticated for vector spaces
that do not have a finite set of generators. Thus, in this chapter, we only prove the theorem
for finitely generated vector spaces.

Theorem 2.2 Given any finite family S = (ui )i∈I generating a vector space E and any
linearly independent subfamily L = (uj )j∈J of S (where J ⊆ I), there is a basis B of E such
that L ⊆ B ⊆ S.

Proof. Consider the set of linearly independent families B such that L ⊆ B ⊆ S. Since this set is nonempty and finite, it has some maximal element, say B = (uh)h∈H. We claim that B generates E. Indeed, if B does not generate E, then there is some up ∈ S that is not a linear combination of vectors in B (since S generates E), with p ∉ H. Then, by Lemma 2.1, the family B′ = (uh)h∈H∪{p} is linearly independent, and since L ⊆ B ⊂ B′ ⊆ S, this contradicts the maximality of B. Thus, B is a basis of E such that L ⊆ B ⊆ S.

Remark: Theorem 2.2 also holds for vector spaces that are not finitely generated. In this
case, the problem is to guarantee the existence of a maximal linearly independent family B
such that L ⊆ B ⊆ S. The existence of such a maximal family can be shown using Zorn’s
lemma, see Appendix 8 and the references given there.
The following proposition giving useful properties characterizing a basis is an immediate
consequence of Theorem 2.2.

Proposition 2.3 Given a vector space E, for any family B = (vi )i∈I of vectors of E, the
following properties are equivalent:
(1) B is a basis of E.

(2) B is a maximal linearly independent family of E.

(3) B is a minimal generating family of E.

The following replacement proposition shows the relationship between finite linearly in-
dependent families and finite families of generators of a vector space.

Proposition 2.4 Given a vector space E, let (ui )i∈I be any finite linearly independent family
in E, where |I| = m, and let (vj )j∈J be any finite family such that every ui is a linear
combination of (vj )j∈J , where |J| = n. Then, there exists a set L and an injection ρ: L → J
such that L ∩ I = ∅, |L| = n − m, and the families (ui )i∈I ∪ (vρ(l) )l∈L and (vj )j∈J generate
the same subspace of E. In particular, m ≤ n.

Proof . We proceed by induction on |I| = m. When m = 0, the family (ui )i∈I is empty, and
the proposition holds trivially with L = J (ρ is the identity). Assume |I| = m + 1. Consider
the linearly independent family (ui )i∈(I−{p}) , where p is any member of I. By the induction
hypothesis, there exists a set L and an injection ρ: L → J such that L ∩ (I − {p}) = ∅,
|L| = n − m, and the families (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L and (vj )j∈J generate the same subspace
of E. If p ∈ L, we can replace L by (L − {p}) ∪ {p′} where p′ does not belong to I ∪ L, and
replace ρ by the injection ρ′ which agrees with ρ on L − {p} and such that ρ′(p′) = ρ(p).
Thus, we can always assume that L ∩ I = ∅. Since up is a linear combination of (vj )j∈J

and the families (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L and (vj )j∈J generate the same subspace of E, up is
a linear combination of (ui)i∈(I−{p}) ∪ (vρ(l))l∈L. Let

up = ∑_{i∈(I−{p})} λi ui + ∑_{l∈L} λl vρ(l).    (1)

If λl = 0 for all l ∈ L, we have

∑_{i∈(I−{p})} λi ui − up = 0,

contradicting the fact that (ui)i∈I is linearly independent. Thus, λl ≠ 0 for some l ∈ L, say l = q. Since λq ≠ 0, we have

vρ(q) = ∑_{i∈(I−{p})} (−λq^{-1} λi) ui + λq^{-1} up + ∑_{l∈(L−{q})} (−λq^{-1} λl) vρ(l).    (2)

We claim that the families (ui)i∈(I−{p}) ∪ (vρ(l))l∈L and (ui)i∈I ∪ (vρ(l))l∈(L−{q}) generate the
same subspace of E. Indeed, the second family is obtained from the first by replacing vρ(q) by up,
and vice-versa, and up is a linear combination of (ui )i∈(I−{p}) ∪ (vρ(l) )l∈L , by (1), and vρ(q) is a
linear combination of (ui )i∈I ∪(vρ(l) )l∈(L−{q}) , by (2). Thus, the families (ui )i∈I ∪(vρ(l) )l∈(L−{q})
and (vj )j∈J generate the same subspace of E, and the proposition holds for L − {q} and the
restriction of the injection ρ: L → J to L − {q}, since L ∩ I = ∅ and |L| = n − m imply that
(L − {q}) ∩ I = ∅ and |L − {q}| = n − (m + 1).
Actually, one can prove that Proposition 2.4 implies Theorem 2.2 when the vector space
is finitely generated. Putting Theorem 2.2 and Proposition 2.4 together, we obtain the
following fundamental theorem.

Theorem 2.5 Let E be a finitely generated vector space. Any family (ui )i∈I generating E
contains a subfamily (uj )j∈J which is a basis of E. Furthermore, for every two bases (ui )i∈I
and (vj )j∈J of E, we have |I| = |J| = n.

Proof . The first part follows immediately by applying Theorem 2.2 with L = ∅ and S =
(ui )i∈I . Assume that (ui )i∈I and (vj )j∈J are bases of E. Since (ui )i∈I is linearly independent
and (vj )j∈J spans E, proposition 2.4 implies that |I| ≤ |J|. A symmetric argument yields
|J| ≤ |I|.

Remark: Theorem 2.5 also holds for vector spaces that are not finitely generated. This can
be shown as follows. Let (ui )i∈I be a basis of E, let (vj )j∈J be a generating family of E, and
assume that I is infinite. For every j ∈ J, let Lj ⊆ I be the finite set

Lj = {i ∈ I | vj = ∑_{i∈I} vi ui, vi ≠ 0}.

Let L = ⋃_{j∈J} Lj. Since (ui)i∈I is a basis of E, we must have I = L, since otherwise (ui)i∈L would be another basis of E, and by Lemma 2.1, this would contradict the fact that (ui)i∈I is linearly independent. Furthermore, J must be infinite, since otherwise, because the Lj are finite, I would be finite. But then, since I = ⋃_{j∈J} Lj with J infinite and the Lj finite, by a standard result of set theory, |I| ≤ |J|. If (vj)j∈J is also a basis, by a symmetric argument, we obtain |J| ≤ |I|, and thus, |I| = |J| for any two bases (ui)i∈I and (vj)j∈J of E.

When |I| is infinite, we say that E is of infinite dimension, the dimension |I| being a
cardinal number which depends only on the vector space E. The dimension of a vector space
E is denoted by dim(E). Clearly, if the field K itself is viewed as a vector space, then every
family (a) where a ∈ K and a ≠ 0 is a basis. Thus dim(K) = 1.
Let (ui )i∈I be a basis of a vector space E. For any vector v ∈ E, since the family (ui )i∈I
generates E, there is a family (λi )i∈I of scalars in K, such that
v = ∑_{i∈I} λi ui.

A very important fact is that the family (λi )i∈I is unique.

Proposition 2.6 Given a vector space E, let (ui)i∈I be a family of vectors in E. Let v ∈ E, and assume that v = ∑_{i∈I} λi ui. Then, the family (λi)i∈I of scalars such that v = ∑_{i∈I} λi ui is unique iff (ui)i∈I is linearly independent.

Proof. First, assume that (ui)i∈I is linearly independent. If (µi)i∈I is another family of scalars in K such that v = ∑_{i∈I} µi ui, then we have

∑_{i∈I} (λi − µi) ui = 0,

and since (ui)i∈I is linearly independent, we must have λi − µi = 0 for all i ∈ I, that is, λi = µi for all i ∈ I. The converse is shown by contradiction. If (ui)i∈I was linearly dependent, there would be a family (µi)i∈I of scalars not all null such that

∑_{i∈I} µi ui = 0

and µj ≠ 0 for some j ∈ I. But then,

v = ∑_{i∈I} λi ui + 0 = ∑_{i∈I} λi ui + ∑_{i∈I} µi ui = ∑_{i∈I} (λi + µi) ui,

with λj ≠ λj + µj since µj ≠ 0, contradicting the assumption that (λi)i∈I is the unique family such that v = ∑_{i∈I} λi ui.

If (ui)i∈I is a basis of a vector space E, for any vector v ∈ E, if (vi)i∈I is the unique family of scalars in K such that

v = ∑_{i∈I} vi ui,

each vi is called the component (or coordinate) of index i of v with respect to the basis (ui)i∈I.
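In the finite-dimensional case the components of v over a basis (u1, . . . , un) can be computed by solving the linear system whose columns are the basis vectors. A sketch assuming NumPy (the particular basis and vector are our own choices):

    import numpy as np

    # A (non-standard) basis of R^3, stored as the columns of U.
    U = np.column_stack([[1.0, 0.0, 0.0],
                         [1.0, 1.0, 0.0],
                         [1.0, 1.0, 1.0]])
    v = np.array([2.0, 3.0, 5.0])

    # Solve U c = v: c holds the components (coordinates) of v over (u1, u2, u3).
    c = np.linalg.solve(U, v)
    print(c)                      # [-1. -2.  5.]
    print(np.allclose(U @ c, v))  # True: v = -u1 - 2 u2 + 5 u3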
Given a field K and any (nonempty) set I, we can form a vector space K (I) which, in
some sense, is the standard vector space of dimension |I|.

Definition 2.9 Given a field K and any (nonempty) set I, let K^(I) be the subset of the cartesian product K^I consisting of all families (λi)i∈I with finite support of scalars in K (where K^I denotes the set of all functions from I to K).
We define addition and multiplication by a scalar as follows:
(λi )i∈I + (µi )i∈I = (λi + µi )i∈I ,
and
λ · (µi )i∈I = (λµi )i∈I .

It is immediately verified that addition and multiplication by a scalar are well defined. Thus, K^(I) is a vector space. Furthermore, because families with finite support are considered, the family (ei)i∈I of vectors ei, defined such that (ei)j = 0 if j ≠ i and (ei)i = 1, is clearly a basis of the vector space K^(I). When I = {1, . . . , n}, we denote K^(I) by K^n. The function ι: I → K^(I), such that ι(i) = ei for every i ∈ I, is clearly an injection.

Beware that when I is a finite set, K^(I) = K^I, but this is false when I is infinite. In fact, dim(K^(I)) = |I|, but dim(K^I) is strictly greater when I is infinite.
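One concrete way to picture K^(I) for an arbitrary index set I is a dictionary that stores only the finitely many nonzero entries of a family; this representation and the helper names below are ours, a sketch rather than anything in the text.

    def add(f, g):
        # (lam_i) + (mu_i) = (lam_i + mu_i); an absent key stands for the scalar 0
        h = dict(f)
        for i, mu in g.items():
            h[i] = h.get(i, 0) + mu
        return {i: c for i, c in h.items() if c != 0}   # keep the support finite

    def scale(lam, f):
        # lam . (mu_i) = (lam mu_i)
        return {i: lam * c for i, c in f.items() if lam * c != 0}

    def e(i):
        # the basis vector e_i of K^(I): 1 at index i, 0 everywhere else
        return {i: 1}

    # The index set I need not be {1, ..., n}; here the indices are strings.
    x = add(scale(3, e("a")), scale(-2, e("b")))
    print(x)                         # {'a': 3, 'b': -2}
    print(add(x, scale(2, e("b"))))  # {'a': 3}: the 'b' entries cancel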

Many interesting mathematical structures are vector spaces. A very important example
is the set of linear maps between two vector spaces to be defined in the next section. Here
is an example that will prepare us for the vector space of linear maps.

Example 2.10 Let X be any nonempty set and let E be a vector space. The set of all
functions f : X → E can be made into a vector space as follows: Given any two functions
f : X → E and g: X → E, let (f + g): X → E be defined such that
(f + g)(x) = f (x) + g(x)
for all x ∈ X, and for every λ ∈ K, let λf : X → E be defined such that
(λf )(x) = λf (x)
for all x ∈ X. The axioms of a vector space are easily verified. Now, let E = K, and let I
be the set of all nonempty subsets of X. For every S ∈ I, let fS : X → E be the function
such that fS(x) = 1 iff x ∈ S, and fS(x) = 0 iff x ∉ S. We leave as an exercise to show that
(fS )S∈I is linearly independent.

2.5 Linear Maps


A function between two vector spaces that preserves the vector space structure is called
a homomorphism of vector spaces, or linear map. Linear maps formalize the concept of
linearity of a function. In the rest of this section, we assume that all vector spaces are over
a given field K (say R).

Definition 2.10 Given two vector spaces E and F , a linear map between E and F is a
function f : E → F satisfying the following two conditions:

f (x + y) = f (x) + f (y) for all x, y ∈ E;


f (λx) = λf (x) for all λ ∈ K, x ∈ E.

Setting x = y = 0 in the first identity, we get f (0) = 0. The basic property of linear
maps is that they transform linear combinations into linear combinations. Given a family
(ui )i∈I of vectors in E, given any family (λi )i∈I of scalars in K, we have
f(∑_{i∈I} λi ui) = ∑_{i∈I} λi f(ui).

The above identity is shown by induction on the size of the support of the family (λi ui )i∈I ,
using the properties of Definition 2.10.

Example 2.11
1. The map f : R2 → R2 defined such that

x′ = x − y
y′ = x + y

is a linear map.

2. The map D: R[X] → R[X] defined such that

D(f(X)) = f′(X),

where f′(X) is the derivative of the polynomial f(X), is a linear map.
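Both maps are easy to experiment with. The sketch below (our own encoding: vectors of R2 as pairs, polynomials as lists of coefficients with lowest degree first) spot-checks the two conditions of Definition 2.10 on sample inputs.

    def f(v):
        # the map (x, y) |-> (x - y, x + y) of item 1
        x, y = v
        return (x - y, x + y)

    def D(p):
        # item 2: the derivative of sum c_k X^k is sum k c_k X^(k-1)
        return [k * c for k, c in enumerate(p)][1:]

    u, v, lam = (1.0, 2.0), (-3.0, 0.5), 4.0
    print(f((u[0] + v[0], u[1] + v[1])) == tuple(a + b for a, b in zip(f(u), f(v))))  # True
    print(f((lam * u[0], lam * u[1])) == tuple(lam * a for a in f(u)))                # True

    # D is additive as well: D(p + q) = D(p) + D(q) (coefficientwise sum, equal lengths)
    p, q = [1, 0, 3, 5], [2, -1, 0, 7]
    print(D([a + b for a, b in zip(p, q)]) == [a + b for a, b in zip(D(p), D(q))])    # True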

Given a linear map f : E → F , we define its image (or range) Im f = f (E), as the set

Im f = {y ∈ F | f (x) = y, for some x ∈ E},

and its Kernel (or nullspace) Ker f = f −1 (0), as the set

Ker f = {x ∈ E | f (x) = 0}.



Proposition 2.7 Given a linear map f : E → F , the set Im f is a subspace of F and the set
Ker f is a subspace of E. The linear map f : E → F is injective iff Ker f = 0 (where 0 is the
trivial subspace {0}).

Proof . Given any x, y ∈ Im f , there are some u, v ∈ E such that x = f (u) and y = f (v),
and for all λ, µ ∈ K, we have

f (λu + µv) = λf (u) + µf (v) = λx + µy,

and thus, λx + µy ∈ Im f, showing that Im f is a subspace of F. Given any x, y ∈ Ker f, we have f(x) = 0 and f(y) = 0, and thus,

f (λx + µy) = λf (x) + µf (y) = 0,

that is, λx + µy ∈ Ker f , showing that Ker f is a subspace of E. Note that f (x) = f (y) iff
f (x − y) = 0. Thus, f is injective iff Ker f = 0.
A fundamental property of bases in a vector space is that they allow the definition of
linear maps as unique homomorphic extensions, as shown in the following proposition.

Proposition 2.8 Given any two vector spaces E and F , given any basis (ui )i∈I of E, given
any other family of vectors (vi )i∈I in F , there is a unique linear map f : E → F such that
f (ui ) = vi for all i ∈ I. Furthermore, f is injective iff (vi )i∈I is linearly independent, and f
is surjective iff (vi )i∈I generates F .

Proof. If such a linear map f : E → F exists, since (ui)i∈I is a basis of E, every vector x ∈ E can be written uniquely as a linear combination

x = ∑_{i∈I} xi ui,

and by linearity, we must have

f(x) = ∑_{i∈I} xi f(ui) = ∑_{i∈I} xi vi.

Define the function f : E → F by letting

f(x) = ∑_{i∈I} xi vi

for every x = ∑_{i∈I} xi ui. It is easy to verify that f is indeed linear; it is unique by the previous reasoning, and obviously, f(ui) = vi.
Now, assume that f is injective. Let (λi)i∈I be any family of scalars, and assume that

∑_{i∈I} λi vi = 0.

Since vi = f(ui) for every i ∈ I, we have

f(∑_{i∈I} λi ui) = ∑_{i∈I} λi f(ui) = ∑_{i∈I} λi vi = 0.

Since f is injective iff Ker f = 0, we have

∑_{i∈I} λi ui = 0,

and since (ui)i∈I is a basis, we have λi = 0 for all i ∈ I, and (vi)i∈I is linearly independent.
Conversely, assume that (vi)i∈I is linearly independent. If

f(∑_{i∈I} λi ui) = 0,

then

∑_{i∈I} λi vi = ∑_{i∈I} λi f(ui) = f(∑_{i∈I} λi ui) = 0,

and λi = 0 for all i ∈ I, since (vi )i∈I is linearly independent. Since (ui )i∈I is a basis of E,
we just showed that Ker f = 0, and f is injective. The part where f is surjective is left as a
simple exercise.
By the second part of Proposition 2.8, an injective linear map f : E → F sends a basis
(ui )i∈I to a linearly independent family (f (ui ))i∈I of F , which is also a basis when f is
bijective. Also, when E and F have the same finite dimension n, (ui )i∈I is a basis of E, and
f : E → F is injective, then (f (ui ))i∈I is a basis of F (by Proposition 2.3).
We can now show that the vector space K (I) of Definition 2.9 has a universal property
that amounts to saying that K (I) is the vector space freely generated by I. Recall that
ι: I → K (I) , such that ι(i) = ei for every i ∈ I, is an injection from I to K (I) .

Proposition 2.9 Given any set I, for any vector space F, and for any function f : I → F, there is a unique linear map f̄ : K^(I) → F, such that

f = f̄ ◦ ι,

that is, f factors through the injection ι : I → K^(I).

Proof. If such a linear map f̄ : K^(I) → F exists, since f = f̄ ◦ ι, we must have

f(i) = f̄(ι(i)) = f̄(ei),

for every i ∈ I. However, the family (ei)i∈I is a basis of K^(I), and (f(i))i∈I is a family of vectors in F, and by Proposition 2.8, there is a unique linear map f̄ : K^(I) → F such that f̄(ei) = f(i) for every i ∈ I, which proves the existence and uniqueness of a linear map f̄ such that f = f̄ ◦ ι.
The following simple proposition is also useful.

Proposition 2.10 Given any two vector spaces E and F , with F nontrivial, given any
family (ui )i∈I of vectors in E, the following properties hold:

(1) The family (ui )i∈I generates E iff for every family of vectors (vi )i∈I in F , there is at
most one linear map f : E → F such that f (ui ) = vi for all i ∈ I.

(2) The family (ui )i∈I is linearly independent iff for every family of vectors (vi )i∈I in F ,
there is some linear map f : E → F such that f (ui ) = vi for all i ∈ I.

Proof . (1) If there is any linear map f : E → F such that f (ui ) = vi for all i ∈ I, since
(ui )i∈I generates E, every vector x ∈ E can be written as some linear combination
x = ∑_{i∈I} xi ui,

and by linearity, we must have

f(x) = ∑_{i∈I} xi f(ui) = ∑_{i∈I} xi vi.

This shows that f is unique if it exists. Conversely, assume that (ui)i∈I does not generate E.
Since F is nontrivial, there is some vector y ∈ F such that y ≠ 0. Since (ui)i∈I does
not generate E, there is some vector w ∈ E that is not in the subspace generated by (ui )i∈I .
By Theorem 2.2, there is a linearly independent subfamily (ui )i∈I0 of (ui )i∈I generating the
same subspace. Since by hypothesis, w ∈ E is not in the subspace generated by (ui )i∈I0 , by
Lemma 2.1 and by Theorem 2.2 again, there is a basis (ej )j∈I0∪J of E, such that ei = ui ,
for all i ∈ I0 , and w = ej0 , for some j0 ∈ J. Letting (vi )i∈I be the family in F such that
vi = 0 for all i ∈ I, defining f : E → F to be the constant linear map with value 0, we have
a linear map such that f (ui ) = 0 for all i ∈ I. By Proposition 2.8, there is a unique linear
map g: E → F such that g(w) = y, and g(ej ) = 0, for all j ∈ (I0 ∪ J) − {j0 }. By definition
of the basis (ej)j∈I0∪J of E, we have g(ui) = 0 for all i ∈ I, and since f ≠ g, this contradicts
the fact that there is at most one such map.
(2) If the family (ui )i∈I is linearly independent, then by Theorem 2.2, (ui )i∈I can be
extended to a basis of E, and the conclusion follows by Proposition 2.8. Conversely, assume
that (ui )i∈I is linearly dependent. Then, there is some family (λi )i∈I of scalars (not all zero)
such that

∑_{i∈I} λi ui = 0.

By the assumption, for any nonzero vector, y ∈ F , for every i ∈ I, there is some linear map
fi : E → F , such that fi (ui ) = y, and fi (uj ) = 0, for j ∈ I − {i}. Then, we would get
0 = fi(∑_{j∈I} λj uj) = ∑_{j∈I} λj fi(uj) = λi y,

and since y ≠ 0, this implies λi = 0 for every i ∈ I, contradicting the fact that the λi are not all zero. Thus, (ui)i∈I must be linearly independent.
Although in this book, we will not have many occasions to use quotient spaces, they are
fundamental in algebra. The next section may be omitted until needed.

2.6 Quotient Spaces


Let E be a vector space, and let M be any subspace of E. The subspace M induces a relation
≡M on E, defined as follows: For all u, v ∈ E,
u ≡M v iff u − v ∈ M.
We have the following simple proposition.

Proposition 2.11 Given any vector space E and any subspace M of E, the relation ≡M is
an equivalence relation with the following two congruential properties:
1. If u1 ≡M v1 and u2 ≡M v2 , then u1 + u2 ≡M v1 + v2 , and
2. if u ≡M v, then λu ≡M λv.

Proof. It is obvious that ≡M is an equivalence relation. Note that u1 ≡M v1 and u2 ≡M v2 are equivalent to u1 − v1 = w1 and u2 − v2 = w2, with w1, w2 ∈ M, and thus,

(u1 + u2) − (v1 + v2) = w1 + w2,

and w1 + w2 ∈ M, since M is a subspace of E. Thus, we have u1 + u2 ≡M v1 + v2. If u − v = w, with w ∈ M, then

λu − λv = λw,

and λw ∈ M, since M is a subspace of E, and thus λu ≡M λv.
Proposition 2.11 shows that we can define addition and multiplication by a scalar on the
set E/M of equivalence classes of the equivalence relation ≡M .

Definition 2.11 Given any vector space E and any subspace M of E, we define the following
operations of addition and multiplication by a scalar on the set E/M of equivalence classes
of the equivalence relation ≡M as follows: for any two equivalence classes [u], [v] ∈ E/M, we
have

[u] + [v] = [u + v],


λ[u] = [λu].

By Proposition 2.11, the above operations do not depend on the specific choice of represen-
tatives in the equivalence classes [u], [v] ∈ E/M. It is also immediate to verify that E/M is
a vector space. The function π : E → E/M, defined such that π(u) = [u] for every u ∈ E, is
a surjective linear map called the natural projection of E onto E/M. The vector space E/M
is called the quotient space of E by the subspace M.

Given any linear map f : E → F, we know that Ker f is a subspace of E, and it is immediately verified that Im f is isomorphic to the quotient space E/Ker f.

2.7 Matrices
Proposition 2.8 shows that given two vector spaces E and F and a basis (uj )j∈J of E, every
linear map f : E → F is uniquely determined by the family (f (uj ))j∈J of the images under
f of the vectors in the basis (uj )j∈J . Thus, in particular, taking F = K (J) , we get an
isomorphism between any vector space E of dimension |J| and K (J) . If J = {1, . . . , n}, a
vector space E of dimension n is isomorphic to the vector space K n . If we also have a basis
(vi )i∈I of F , then every vector f (uj ) can be written in a unique way as
f(uj) = ∑_{i∈I} aij vi,

where j ∈ J, for a family (ai j )i∈I . Thus, with respect to the two bases (uj )j∈J of E and
(vi )i∈I of F , the linear map f is completely determined by a possibly infinite “I × J-matrix”
M(f ) = (ai j )i∈I, j∈J .

Remark: Note that we intentionally assigned the index set J to the basis (uj )j∈J of E,
and the index I to the basis (vi )i∈I of F , so that the rows of the matrix M(f ) associated
with f : E → F are indexed by I, and the columns of the matrix M(f ) are indexed by J.
Obviously, this causes a mildly unpleasant reversal. If we had considered the bases (ui )i∈I of
E and (vj )j∈J of F , we would obtain a J × I-matrix M(f ) = (aj i )j∈J, i∈I . No matter what
we do, there will be a reversal! We decided to stick to the bases (uj )j∈J of E and (vi )i∈I of
F , so that we get an I × J-matrix M(f ), knowing that we may occasionally suffer from this
decision!
When I and J are finite, and say, when |I| = m and |J| = n, the linear map f is
determined by the matrix M(f ) whose entries in the j-th column are the components of the
vector f (uj ) over the basis (v1 , . . . , vm ), that is, the matrix
 
M(f) = [ a11 a12 . . . a1n ]
       [ a21 a22 . . . a2n ]
       [ ...           ... ]
       [ am1 am2 . . . amn ]
whose entry on row i and column j is ai j (1 ≤ i ≤ m, 1 ≤ j ≤ n).

Given vector spaces E, F, and G, and linear maps f : E → F and g: F → G, it is easily verified that the composition g ◦ f : E → G of f and g is a linear map. A linear map f : E → F
is an isomorphism iff there is a linear map g: F → E, such that g ◦ f = idE , and f ◦ g = idF .
It is immediately verified that such a map g is unique. The map g is called the inverse of f
and it is also denoted by f −1 .
One can verify that if f : E → F is a bijective linear map, then its inverse f −1 : F → E
is also a linear map, and thus f is an isomorphism. We leave as an easy exercise to show
that if E and F are two vector spaces, (ui )i∈I is a basis of E, and f : E → F is a linear map
which is an isomorphism, then the family (f (ui ))i∈I is a basis of F .
The set of all linear maps between two vector spaces E and F is denoted as L(E; F ). The
set L(E; F ) is a vector space under the operations defined at the end of Section 2.1, namely
(f + g)(x) = f (x) + g(x)
for all x ∈ E, and
(λf )(x) = λf (x)
for all x ∈ E. The point worth checking carefully is that λf is indeed a linear map, which
uses the commutativity of ∗ in the field K. Indeed, we have
(λf )(µx) = λf (µx) = λµf (x) = µλf (x) = µ(λf )(x).
When E and F have finite dimensions, the vector space L(E; F ) also has finite dimension, as
we shall see shortly. When E = F , a linear map f : E → E is also called an endomorphism.
It is also important to note that composition confers to L(E; E) a ring structure. Indeed,
composition is an operation ◦: L(E; E) × L(E; E) → L(E; E), which is associative and has
an identity idE , and the distributivity properties hold:
(g1 + g2 ) ◦ f = g1 ◦ f + g2 ◦ f ;
g ◦ (f1 + f2 ) = g ◦ f1 + g ◦ f2 .

The ring L(E; E) is an example of a noncommutative ring. It is easily seen that the set
of bijective linear maps f : E → E is a group under composition. It is called the general
linear group (of E), and it is denoted by GL(E), or when E = K n , by GL(n, K), or even
by GL(n).
We will now show that when E and F have finite dimension, linear maps can be very
conveniently represented by matrices, and that composition of linear maps corresponds to
matrix multiplication. We will follow rather closely an elegant presentation method due to
Emil Artin.
Let E and F be two vector spaces, and assume that E has a finite basis (u1 , . . . , un ) and
that F has a finite basis (v1 , . . . , vm ). Recall that we have shown that every vector x ∈ E
can be written in a unique way as
x = x1 u1 + · · · + xn un ,

and similarly every vector y ∈ F can be written in a unique way as

y = y1 v1 + · · · + ym vm .

Let f : E → F be a linear map between E and F. Then, for every x = x1 u1 + · · · + xn un in E, by linearity, we have

f(x) = x1 f(u1) + · · · + xn f(un).

Let

f(uj) = a1j v1 + · · · + amj vm,

or more concisely,

f(uj) = ∑_{i=1}^{m} aij vi,

for every j, 1 ≤ j ≤ n. Then, substituting the right-hand side of each f(uj) into the expression for f(x), we get

f(x) = x1 (∑_{i=1}^{m} ai1 vi) + · · · + xn (∑_{i=1}^{m} ain vi),

which, by regrouping terms to obtain a linear combination of the vi, yields

f(x) = (∑_{j=1}^{n} a1j xj) v1 + · · · + (∑_{j=1}^{n} amj xj) vm.

Thus, letting f(x) = y = y1 v1 + · · · + ym vm, we have

yi = ∑_{j=1}^{n} aij xj    (1)

for all i, 1 ≤ i ≤ m. Let us now consider how the composition of linear maps is expressed in
terms of bases.
Let E, F , and G, be three vectors spaces with respective bases (u1 , . . . , up ) for E,
(v1 , . . . , vn ) for F , and (w1 , . . . , wm ) for G. Let g: E → F and f : F → G be linear maps.
As explained earlier, g: E → F is determined by the images of the basis vectors uj , and
f : F → G is determined by the images of the basis vectors vk . We would like to understand
how f ◦ g: E → G is determined by the images of the basis vectors uj .

Remark: Note that we are considering linear maps g: E → F and f : F → G, instead of f : E → F and g: F → G, which yields the composition f ◦ g: E → G instead of g ◦ f : E → G.
Our perhaps unusual choice is motivated by the fact that if f is represented by a matrix
M(f ) = (ai k ) and g is represented by a matrix M(g) = (bk j ), then f ◦ g: E → G is
represented by the product AB of the matrices A and B. If we had adopted the other choice
where f : E → F and g: F → G, then g ◦ f : E → G would be represented by the product BA. Personally, we find it easier to remember the formula for the entry in row i and column j of the product of two matrices when this product is written as AB, rather than BA.
Obviously, this is a matter of taste! We will have to live with our perhaps unorthodox choice.
Thus, let

f(vk) = ∑_{i=1}^{m} aik wi,

for every k, 1 ≤ k ≤ n, and let

g(uj) = ∑_{k=1}^{n} bkj vk,

for every j, 1 ≤ j ≤ p. By previous considerations, for every

x = x1 u1 + · · · + xp up ,

letting g(x) = y = y1 v1 + · · · + yn vn , we have


yk = ∑_{j=1}^{p} bkj xj    (2)

for all k, 1 ≤ k ≤ n, and for every

y = y1 v1 + · · · + yn vn,

letting f(y) = z = z1 w1 + · · · + zm wm, we have

zi = ∑_{k=1}^{n} aik yk    (3)

for all i, 1 ≤ i ≤ m. Then, if y = g(x) and z = f(y), we have z = f(g(x)), and in view of (2) and (3), we have

zi = ∑_{k=1}^{n} aik (∑_{j=1}^{p} bkj xj)
   = ∑_{k=1}^{n} ∑_{j=1}^{p} aik bkj xj
   = ∑_{j=1}^{p} ∑_{k=1}^{n} aik bkj xj
   = ∑_{j=1}^{p} (∑_{k=1}^{n} aik bkj) xj.

Thus, defining cij such that

cij = ∑_{k=1}^{n} aik bkj,

for 1 ≤ i ≤ m, and 1 ≤ j ≤ p, we have

zi = ∑_{j=1}^{p} cij xj    (4)

Identity (4) suggests defining a multiplication operation on matrices, and we proceed to do so. We have the following definitions.

Definition 2.12 Given a field K, an m × n-matrix is a family (aij)1≤i≤m, 1≤j≤n of scalars in K, represented as an array

[ a11 a12 . . . a1n ]
[ a21 a22 . . . a2n ]
[ ...           ... ]
[ am1 am2 . . . amn ]

In the special case where m = 1, we have a row vector, represented as

(a11, . . . , a1n)

and in the special case where n = 1, we have a column vector, represented as

[ a11 ]
[ ... ]
[ am1 ]

In these last two cases, we usually omit the constant index 1 (first index in case of a row, second index in case of a column). The set of all m × n-matrices is denoted by Mm,n(K) or Mm,n. An n × n-matrix is called a square matrix of dimension n. The set of all square matrices of dimension n is denoted by Mn(K), or Mn.

Remark: As defined, a matrix A = (ai j )1≤i≤m, 1≤j≤n is a family, that is, a function from
{1, 2, . . . , m} × {1, 2, . . . , n} to K. As such, there is no reason to assume an ordering on
the indices. Thus, the matrix A can be represented in many different ways as an array, by
adopting different orders for the rows or the columns. However, it is customary (and usually
convenient) to assume the natural ordering on the sets {1, 2, . . . , m} and {1, 2, . . . , n}, and
to represent A as an array according to this ordering of the rows and columns.
We also define some operations on matrices as follows.

Definition 2.13 Given two m × n matrices A = (aij) and B = (bij), we define their sum A + B as the matrix C = (cij) such that cij = aij + bij. Given a scalar λ ∈ K, we define the matrix λA as the matrix C = (cij) such that cij = λaij. Given an m × n matrix A = (aik) and an n × p matrix B = (bkj), we define their product AB as the m × p matrix C = (cij) such that

cij = ∑_{k=1}^{n} aik bkj,

for 1 ≤ i ≤ m, and 1 ≤ j ≤ p. Given an m × n matrix A = (aij), its transpose A⊤ = (a⊤ji) is the n × m-matrix such that a⊤ji = aij, for all i, 1 ≤ i ≤ m, and all j, 1 ≤ j ≤ n. The transpose of a matrix A is sometimes denoted by A^t, or even by ^tA.

In the product AB = C shown below


    
a1 1 a1 2 . . . a1 n b1 1 b1 2 ... b1 p c1 1 c1 2 ... c1 p
 a2 1 a2 2 . . . a2 n   b2 1 b2 2 ... b2 p   c2 1 c2 2 ... c2 p 
 . .. .. ..   . .. .. ..  = . .. .. .. 
 .. . . .   .. . . .   .. . . . 
am 1 am 2 . . . am n bn 1 bn 2 . . . bn p cm 1 cm 2 . . . cm p
note that the entry of index i and j of the matrix AB is obtained by multiplying the matrices
A and B can be identified with the product of the row matrix corresponding to the i-th row
of A with the column matrix corresponding to the j-column of B:
 
b1 j %n
(ai 1 , . . . , ai n )  ...  = ( ai k bk j )
bn j k=1
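For readers who want to experiment, here is a small Python sketch (ours, not part of the text) that computes the product C = AB directly from the formula c_{i j} = \sum_{k=1}^{n} a_{i k} b_{k j}; the function name mat_mul and the list-of-rows representation of matrices are choices made only for this illustration.

def mat_mul(A, B):
    # A is an m x n matrix and B an n x p matrix, both given as lists of rows.
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    # c_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]           # a 2 x 3 matrix
B = [[1, 0],
     [0, 1],
     [1, 1]]              # a 3 x 2 matrix
print(mat_mul(A, B))      # [[4, 5], [10, 11]]

Each entry of the result is exactly the row-times-column product described above.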

The square matrix I_n of dimension n containing 1 on the diagonal and 0 everywhere else is called the identity matrix. It is denoted as

I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}

Note that the transpose A^⊤ of a matrix A has the property that the j-th row of A^⊤ is the j-th column of A. In other words, transposition exchanges the rows and the columns
of a matrix. For every square matrix A of dimension n, it is immediately verified that
AIn = In A = A. If a matrix B such that AB = BA = In exists, then it is unique, and it is
called the inverse of A. The matrix B is also denoted by A−1 .
Consider the m × n-matrices E_{i,j} = (e_{h k}), defined such that e_{i j} = 1, and e_{h k} = 0, if h ≠ i or k ≠ j. It is clear that every matrix A = (a_{i j}) ∈ M_{m,n}(K) can be written in a unique way as

A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i j} E_{i,j}.

Thus, the family (Ei,j )1≤i≤m,1≤j≤n is a basis of the vector space Mm,n (K), which has dimen-
sion mn.

Remark: Definition 2.12 and Definition 2.13 also make perfect sense when K is a (com-
mutative) ring rather than a field. In this more general setting, the framework of vector
spaces is too narrow, but we can consider structures over a commutative ring A satisfying
all the axioms of Definition 2.5. Such structures are called modules. The theory of modules
is (much) more complicated than that of vector spaces. For example, modules do not always
have a basis, and other properties holding for vector spaces usually fail for modules. When
a module has a basis, it is called a free module. For example, when A is a commutative
ring, the structure A^n is a module such that the vectors e_i, with (e_i)_i = 1 and (e_i)_j = 0 for j ≠ i, form a basis of A^n. Many properties of vector spaces still hold for A^n. Thus, A^n is a
free module. As another example, when A is a commutative ring, Mm,n (A) is a free module
with basis (Ei,j )1≤i≤m,1≤j≤n . Polynomials over a commutative ring also form a free module
of infinite dimension.
Square matrices provide a natural example of a noncommutative ring with zero divisors.
Example 2.12 For example, letting A, B be the 2 × 2-matrices

A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},

then

AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},

and

BA = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.

We now formalize the representation of linear maps by matrices.


Definition 2.14 Let E and F be two vector spaces, and let (u_1, . . . , u_n) be a basis for E, and (v_1, . . . , v_m) be a basis for F. Each vector x ∈ E expressed in the basis (u_1, . . . , u_n) as x = x_1 u_1 + · · · + x_n u_n is represented by the column matrix

M(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}

and similarly for each vector y ∈ F expressed in the basis (v_1, . . . , v_m).
Every linear map f: E → F is represented by the matrix M(f) = (a_{i j}), where a_{i j} is the i-th component of the vector f(u_j) over the basis (v_1, . . . , v_m), i.e., where

f(u_j) = \sum_{i=1}^{m} a_{i j} v_i,

for every j, 1 ≤ j ≤ n. Explicitly, we have

M(f) = \begin{pmatrix} a_{1 1} & a_{1 2} & \cdots & a_{1 n} \\ a_{2 1} & a_{2 2} & \cdots & a_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m 1} & a_{m 2} & \cdots & a_{m n} \end{pmatrix}

The matrix M(f ) associated with the linear map f : E → F is called the matrix of f with
respect to the bases (u1 , . . . , un ) and (v1 , . . . , vm ). When E = F and the basis (v1 , . . . , vm )
is identical to the basis (u1 , . . . , un ) of E, the matrix M(f ) associated with f : E → E (as
above) is called the matrix of f with respect to the basis (u_1, . . . , u_n).

Remark: As in the remark after Definition 2.12, there is no reason to assume that the
vectors in the bases (u1 , . . . , un ) and (v1 , . . . , vm ) are ordered in any particular way. However,
it is often convenient to assume the natural ordering. When this is so, authors sometimes
refer to the matrix M(f ) as the matrix of f with respect to the ordered bases (u1 , . . . , un )
and (v1 , . . . , vm ).
Then, given a linear map f: E → F represented by the matrix M(f) = (a_{i j}) w.r.t. the bases (u_1, . . . , u_n) and (v_1, . . . , v_m), by equations (1) and the definition of matrix multiplication, the equation y = f(x) corresponds to the matrix equation M(y) = M(f) M(x), that is,

\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} a_{1 1} & \cdots & a_{1 n} \\ \vdots & \ddots & \vdots \\ a_{m 1} & \cdots & a_{m n} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
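The following NumPy sketch (ours, with a hypothetical map f: R^3 → R^2 chosen only for illustration, and the canonical bases) builds M(f) column by column from the images f(u_j) of the basis vectors, and checks that applying f amounts to multiplying by M(f).

import numpy as np

# A hypothetical linear map f: R^3 -> R^2, used only as an example.
def f(x):
    x1, x2, x3 = x
    return np.array([2*x1 + x3, x2 - x3])

# Column j of M(f) holds the coordinates of f(e_j) over the canonical basis of R^2.
basis_E = np.eye(3)
M_f = np.column_stack([f(e) for e in basis_E])

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(f(x), M_f @ x))    # True: M(y) = M(f) M(x)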
The function that associates to a linear map f : E → F the matrix M(f ) w.r.t. the
bases (u1 , . . . , un ) and (v1 , . . . , vm ) has the property that matrix multiplication corresponds
to composition of linear maps. The following proposition states the main properties of the
mapping f ↦ M(f) between L(E; F) and M_{m,n}. In short, it is an isomorphism of vector
spaces.
Proposition 2.12 Given three vector spaces E, F , G, with respective bases (u1, . . . , up ),
(v1 , . . . , vn ), and (w1 , . . . , wm ), the mapping M: L(E; F ) → Mn,p that associates the matrix
M(g) to a linear map g: E → F satisfies the following properties for all x ∈ E, all g, h: E →
F , and all f : F → G:
M(g(x)) = M(g)M(x)
M(g + h) = M(g) + M(h)
M(λg) = λM(g)
M(f ◦ g) = M(f )M(g).

Thus, M: L(E; F ) → Mn,p is an isomorphism of vector spaces, and when p = n and the
basis (v1 , . . . , vn ) is identical to the basis (u1 , . . . , up ), M: L(E; E) → Mn is an isomorphism
of rings.

Proof . That M(g(x)) = M(g)M(x) was shown just before stating the proposition, using
identity (1). The identities M(g + h) = M(g) + M(h) and M(λg) = λM(g) are straightfor-
ward, and M(f ◦g) = M(f )M(g) follows from (4) and the definition of matrix multiplication.
The mapping M: L(E; F ) → Mn,p is clearly injective, and since every matrix defines a lin-
ear map, it is also surjective, and thus bijective. In view of the above identities, it is an
isomorphism (and similarly for M: L(E; E) → Mn ).
In view of Proposition 2.12, it seems preferable to represent vectors from a vector space
of finite dimension as column vectors rather than row vectors. Thus, from now on, we will
denote vectors of R^n (or more generally, of K^n) as column vectors.
It is important to observe that the isomorphism M: L(E; F ) → Mn,p given by Proposition
2.12 depends on the choice of the bases (u1 , . . . , up ) and (v1 , . . . , vn ), and similarly for the
isomorphism M: L(E; E) → Mn , which depends on the choice of the basis (u1, . . . , un ). Thus,
it would be useful to know how a change of basis affects the representation of a linear map
f : E → F as a matrix. The following simple proposition is needed.

Proposition 2.13 Let E be a vector space, and let (u_1, . . . , u_n) be a basis of E. For every family (v_1, . . . , v_n), let P = (a_{i j}) be the matrix defined such that v_j = \sum_{i=1}^{n} a_{i j} u_i. The matrix P is invertible iff (v_1, . . . , v_n) is a basis of E.

Proof . Note that we have P = M(f ), the matrix associated with the unique linear map
f : E → E such that f (ui ) = vi . By Proposition 2.8, f is bijective iff (v1 , . . . , vn ) is a basis of
E. Furthermore, it is obvious that the identity matrix In is the matrix associated with the
identity id: E → E w.r.t. any basis. If f is an isomorphism, then f ◦ f −1 = f −1 ◦ f = id, and
by Proposition 2.12, we get M(f )M(f −1 ) = M(f −1 )M(f ) = In , showing that P is invertible
and that M(f −1 ) = P −1 .
Proposition 2.13 suggests the following definition.

Definition 2.15 Given a vector space E of dimension n, for any two bases (u_1, . . . , u_n) and (v_1, . . . , v_n) of E, let P = (a_{i j}) be the invertible matrix defined such that

v_j = \sum_{i=1}^{n} a_{i j} u_i,

which is also the matrix of the identity id: E → E with respect to the bases (v_1, . . . , v_n) and (u_1, . . . , u_n), in that order (indeed, we express each id(v_j) = v_j over the basis (u_1, . . . , u_n)). The matrix P is called the change of basis matrix from (u_1, . . . , u_n) to (v_1, . . . , v_n).

Clearly, the change of basis matrix from (v_1, . . . , v_n) to (u_1, . . . , u_n) is P^{-1}. Since P = (a_{i j}) is the matrix of the identity id: E → E with respect to the bases (v_1, . . . , v_n) and (u_1, . . . , u_n), given any vector x ∈ E, if x = x_1 u_1 + · · · + x_n u_n over the basis (u_1, . . . , u_n) and x = x'_1 v_1 + · · · + x'_n v_n over the basis (v_1, . . . , v_n), from Proposition 2.12, we have

\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{1 1} & \cdots & a_{1 n} \\ \vdots & \ddots & \vdots \\ a_{n 1} & \cdots & a_{n n} \end{pmatrix} \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}

showing that the old coordinates (x_i) of x (over (u_1, . . . , u_n)) are expressed in terms of the new coordinates (x'_i) of x (over (v_1, . . . , v_n)). Since the matrix P expresses the new basis (v_1, . . . , v_n) in terms of the old basis (u_1, . . . , u_n), we observe that the coordinates (x_i) of a vector x vary in the opposite direction of the change of basis. For this reason, vectors are sometimes said to be contravariant. However, this expression does not make sense! Indeed, a vector is an intrinsic quantity that does not depend on a specific basis. What makes sense is that the coordinates of a vector vary in a contravariant fashion.
The effect of a change of bases on the representation of a linear map is described in the
following proposition.

Proposition 2.14 Let E and F be vector spaces, let (u_1, . . . , u_n) and (u'_1, . . . , u'_n) be two bases of E, and let (v_1, . . . , v_m) and (v'_1, . . . , v'_m) be two bases of F. Let P be the change of basis matrix from (u_1, . . . , u_n) to (u'_1, . . . , u'_n), and let Q be the change of basis matrix from (v_1, . . . , v_m) to (v'_1, . . . , v'_m). For any linear map f: E → F, let M(f) be the matrix associated to f w.r.t. the bases (u_1, . . . , u_n) and (v_1, . . . , v_m), and let M'(f) be the matrix associated to f w.r.t. the bases (u'_1, . . . , u'_n) and (v'_1, . . . , v'_m). We have

M'(f) = Q^{-1} M(f) P.

Proof. Since f: E → F can be written as f = id_F ◦ f ◦ id_E, since P is the matrix of id_E w.r.t. the bases (u'_1, . . . , u'_n) and (u_1, . . . , u_n), and Q^{-1} is the matrix of id_F w.r.t. the bases (v_1, . . . , v_m) and (v'_1, . . . , v'_m), by Proposition 2.12, we have M'(f) = Q^{-1} M(f) P.

As a corollary, we get the following result.

Corollary 2.15 Let E be a vector space, and let (u_1, . . . , u_n) and (u'_1, . . . , u'_n) be two bases of E. Let P be the change of basis matrix from (u_1, . . . , u_n) to (u'_1, . . . , u'_n). For any linear map f: E → E, let M(f) be the matrix associated to f w.r.t. the basis (u_1, . . . , u_n), and let M'(f) be the matrix associated to f w.r.t. the basis (u'_1, . . . , u'_n). We have

M'(f) = P^{-1} M(f) P.
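The identity of Corollary 2.15 is easy to check numerically; the sketch below (ours, using NumPy and a randomly chosen example) takes the columns of P to express the new basis vectors over the old basis, and verifies that M(f) and M'(f) = P^{-1} M(f) P represent the same map.

import numpy as np

rng = np.random.default_rng(0)
n = 3
M_f = rng.standard_normal((n, n))       # matrix of f in the old basis (u_1, ..., u_n)
P = rng.standard_normal((n, n))         # change of basis matrix: column j = coordinates of u'_j
assert abs(np.linalg.det(P)) > 1e-9     # P must be invertible, i.e., the u'_j form a basis

M_f_new = np.linalg.inv(P) @ M_f @ P    # matrix of f in the new basis (u'_1, ..., u'_n)

# Coordinates transform contravariantly: old coordinates = P @ new coordinates,
# and the two representations of f agree.
x_new = rng.standard_normal(n)
x_old = P @ x_new
print(np.allclose(P @ (M_f_new @ x_new), M_f @ x_old))    # True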

2.8 Direct Sums


Before considering linear forms and hyperplanes, we define the notion of direct sum and prove some simple propositions. There is a subtle point, which is that if we attempt to define the

direct sum E ⊕ F of two vector spaces using the cartesian product E × F , we don’t quite get
the right notion because elements of E × F are ordered pairs, but we want E ⊕ F = F ⊕ E.
Thus, we want to think of the elements of E ⊕ F as unordered pairs of elements. It is
possible to do so by considering the direct sum of a family (Ei )i∈{1,2} , and more generally of
a family (Ei )i∈I . For simplicity, we begin by considering the case where I = {1, 2}.

Definition 2.16 Given a family (E_i)_{i∈{1,2}} of two vector spaces, we define the (external) direct sum E_1 ⊕ E_2 of the family (E_i)_{i∈{1,2}} as the set

E_1 ⊕ E_2 = {{⟨1, u⟩, ⟨2, v⟩} | u ∈ E_1, v ∈ E_2},

with addition

{⟨1, u_1⟩, ⟨2, v_1⟩} + {⟨1, u_2⟩, ⟨2, v_2⟩} = {⟨1, u_1 + u_2⟩, ⟨2, v_1 + v_2⟩},

and scalar multiplication

λ{⟨1, u⟩, ⟨2, v⟩} = {⟨1, λu⟩, ⟨2, λv⟩}.

We define the injections in_1: E_1 → E_1 ⊕ E_2 and in_2: E_2 → E_1 ⊕ E_2 as the linear maps defined such that,

in_1(u) = {⟨1, u⟩, ⟨2, 0⟩},

and

in_2(v) = {⟨1, 0⟩, ⟨2, v⟩}.

Note that

E_2 ⊕ E_1 = {{⟨2, v⟩, ⟨1, u⟩} | v ∈ E_2, u ∈ E_1} = E_1 ⊕ E_2.

Thus, every member {⟨1, u⟩, ⟨2, v⟩} of E_1 ⊕ E_2 can be viewed as an unordered pair consisting of the two vectors u and v, tagged with the index 1 and 2, respectively.

Remark: In fact, E_1 ⊕ E_2 is just the product ∏_{i∈{1,2}} E_i of the family (E_i)_{i∈{1,2}}.

!  This is not to be confused with the cartesian product E_1 × E_2. The vector space E_1 × E_2 is the set of all ordered pairs ⟨u, v⟩, where u ∈ E_1, and v ∈ E_2, with addition and multiplication by a scalar defined such that

⟨u_1, v_1⟩ + ⟨u_2, v_2⟩ = ⟨u_1 + u_2, v_1 + v_2⟩,

λ⟨u, v⟩ = ⟨λu, λv⟩.

There is a bijection between ∏_{i∈{1,2}} E_i and E_1 × E_2, but as we just saw, elements of ∏_{i∈{1,2}} E_i are certain sets.

Of course, we can define E1 × · · · × En for any number of vector spaces, and when
E1 = . . . = En , we denote this product by E n .
The following property holds.

Proposition 2.16 Given any two vector spaces, E_1 and E_2, the set E_1 ⊕ E_2 is a vector space. For every pair of linear maps, f: E_1 → G and g: E_2 → G, there is a unique linear map, f + g: E_1 ⊕ E_2 → G, such that (f + g) ◦ in_1 = f and (f + g) ◦ in_2 = g, as in the following diagram:

[commutative diagram: E_1 and E_2 map into E_1 ⊕ E_2 via in_1 and in_2, and f + g: E_1 ⊕ E_2 → G makes both triangles to G commute]

Proof. Define

(f + g)({⟨1, u⟩, ⟨2, v⟩}) = f(u) + g(v),

for every u ∈ E_1 and v ∈ E_2. It is immediately verified that f + g is the unique linear map with the required properties.

We already noted that E_1 ⊕ E_2 is in bijection with E_1 × E_2. If we define the projections π_1: E_1 ⊕ E_2 → E_1 and π_2: E_1 ⊕ E_2 → E_2, such that

π_1({⟨1, u⟩, ⟨2, v⟩}) = u,

and

π_2({⟨1, u⟩, ⟨2, v⟩}) = v,

we have the following proposition.

Proposition 2.17 Given any two vector spaces, E_1 and E_2, for every pair of linear maps, f: D → E_1 and g: D → E_2, there is a unique linear map, f × g: D → E_1 ⊕ E_2, such that π_1 ◦ (f × g) = f and π_2 ◦ (f × g) = g, as in the following diagram:

[commutative diagram: f × g: D → E_1 ⊕ E_2, with the projections π_1 and π_2 onto E_1 and E_2 making both triangles commute]

Proof. Define

(f × g)(w) = {⟨1, f(w)⟩, ⟨2, g(w)⟩},

for every w ∈ D. It is immediately verified that f × g is the unique linear map with the required properties.

Remark: It is a peculiarity of linear algebra that direct sums and products of finite families
are isomorphic. However, this is no longer true for products and sums of infinite families.
When U, V are subspaces of a vector space E, letting i1 : U → E and i2 : V → E be the
inclusion maps, if U ⊕ V is isomorphic to E under the map i_1 + i_2 given by Proposition
2.16, we say that E is a direct (internal) sum of U and V , and we write E = U ⊕ V (with
a slight abuse of notation, since E and U ⊕ V are only isomorphic). It is also convenient to
define the sum U + V of U and V .

Definition 2.17 Given a vector space E, let U, V be any subspaces of E. We define the
sum U + V of U and V as the set

U + V = {w ∈ E | w = u + v, for some u ∈ U and some v ∈ V }.

We say that E is the (internal) direct sum of U and V , denoted by E = U ⊕ V ,4 if for every
x ∈ E, there exist unique vectors u ∈ U and v ∈ V , such that x = u + v.

It is immediately verified that U + V is the least subspace of E containing U and V .


Note that by definition, U + V = V + U, and U ⊕ V = V ⊕ U. The following two simple
propositions hold.

Proposition 2.18 Let E be a vector space. The following properties are equivalent:
1. E = U ⊕ V .

2. E = U + V and U ∩ V = 0.

Proof. First, assume that E is the direct sum of U and V. If x ∈ U ∩ V and x ≠ 0, since x can be written both as x + 0 and 0 + x, we have a contradiction. Thus U ∩ V = 0. Conversely, assume that x = u + v and x = u' + v', where u, u' ∈ U and v, v' ∈ V. Then,

v' − v = u − u',

where v' − v ∈ V and u − u' ∈ U. Since U ∩ V = 0, we must have u' = u and v' = v, and thus E = U ⊕ V.

Proposition 2.19 Let E be a vector space, and assume that E = U ⊕ V . Then,

dim(E) = dim(U) + dim(V ).


4 Again, with a slight abuse of notation!

Proof . Let (ui )i∈I be a basis of U and let (vj )j∈J be a basis of V , where I and J are disjoint.
Clearly, (ui )i∈I ∪ (vj )j∈J is a basis of E.
We now give the definition of a direct sum for any arbitrary nonempty index set I. First, let us recall the notion of the product of a family (E_i)_{i∈I}. Given a family of sets (E_i)_{i∈I}, its product ∏_{i∈I} E_i is the set of all functions f: I → ∪_{i∈I} E_i such that f(i) ∈ E_i, for every i ∈ I. It is one of the many versions of the axiom of choice that, if E_i ≠ ∅ for every i ∈ I, then ∏_{i∈I} E_i ≠ ∅. A member f ∈ ∏_{i∈I} E_i is often denoted as (f_i)_{i∈I}. For every i ∈ I, we have the projection π_i: ∏_{i∈I} E_i → E_i, defined such that π_i((f_i)_{i∈I}) = f_i. We now define direct sums.

Definition 2.18 Let I be any nonempty set, and let (E_i)_{i∈I} be a family of vector spaces. The (external) direct sum ⊕_{i∈I} E_i of the family (E_i)_{i∈I} is defined as follows: ⊕_{i∈I} E_i consists of all f ∈ ∏_{i∈I} E_i which have finite support, and addition and multiplication by a scalar are defined as follows:

(f_i)_{i∈I} + (g_i)_{i∈I} = (f_i + g_i)_{i∈I},

λ(f_i)_{i∈I} = (λf_i)_{i∈I}.

We also have injection maps in_i: E_i → ⊕_{i∈I} E_i, defined such that in_i(x) = (f_i)_{i∈I}, where f_i = x, and f_j = 0, for all j ∈ (I − {i}).

The following proposition is an obvious generalization of Proposition 2.16.

Proposition 2.20 Let I be any nonempty set, let (E_i)_{i∈I} be a family of vector spaces, and let G be any vector space. The direct sum ⊕_{i∈I} E_i is a vector space, and for every family (h_i)_{i∈I} of linear maps h_i: E_i → G, there is a unique linear map

\sum_{i∈I} h_i : ⊕_{i∈I} E_i → G,

such that (\sum_{i∈I} h_i) ◦ in_i = h_i, for every i ∈ I.

Remark: When E_i = E, for all i ∈ I, we denote ⊕_{i∈I} E_i by E^{(I)}. In particular, when E_i = K, for all i ∈ I, we find the vector space K^{(I)} of Definition 2.9.
We also have the following basic proposition about injective or surjective linear maps.

Proposition 2.21 Let E and F be vector spaces, and let f : E → F be a linear map. If
f : E → F is injective, then, there is a surjective linear map r: F → E called a retraction,
such that r ◦ f = idE . If f : E → F is surjective, then, there is an injective linear map
s: F → E called a section, such that f ◦ s = idF .

Proof . Let (ui )i∈I be a basis of E. Since f : E → F is an injective linear map, by Proposition
2.8, (f (ui ))i∈I is linearly independent in F . By Theorem 2.2, there is a basis (vj )j∈J of F ,
where I ⊆ J, and where vi = f (ui ), for all i ∈ I. By Proposition 2.8, a linear map r: F → E
can be defined such that r(vi ) = ui , for all i ∈ I, and r(vj ) = w for all j ∈ (J − I), where w
is any given vector in E, say w = 0. Since r(f (ui )) = ui for all i ∈ I, by Proposition 2.8, we
have r ◦ f = idE .
Now, assume that f : E → F is surjective. Let (vj )j∈J be a basis of F . Since f : E → F
is surjective, for every vj ∈ F , there is some uj ∈ E such that f (uj ) = vj . Since (vj )j∈J is a
basis of F , by Proposition 2.8, there is a unique linear map s: F → E such that s(vj ) = uj .
Also, since f (s(vj )) = vj , by Proposition 2.8 (again), we must have f ◦ s = idF .
The converse of Proposition 2.21 is obvious. We now have the following fundamental
Proposition.

Proposition 2.22 Let E, F and G, be three vector spaces, f: E → F an injective linear map, g: F → G a surjective linear map, and assume that Im f = Ker g. Then, the following properties hold.

(a) For any section s: G → F of g, Ker g ⊕ Im s is isomorphic to F, and the linear map f + s: E ⊕ G → F is an isomorphism.5

(b) For any retraction r: F → E of f, Im f ⊕ Ker r is isomorphic to F.6

Proof . (a) Since s: G → F is a section of g, we have g ◦ s = idG , and for every u ∈ F ,

g(u − s(g(u))) = g(u) − g(s(g(u))) = g(u) − g(u) = 0.

Thus, u − s(g(u)) ∈ Ker g, and we have F = Ker g + Im s. On the other hand, if u ∈


Ker g ∩ Im s, then u = s(v) for some v ∈ G because u ∈ Im s, g(u) = 0 because u ∈ Ker g,
and so,
g(u) = g(s(v)) = v = 0,
because g ◦ s = idG , which shows that u = s(v) = 0. Thus, F = Ker g ⊕ Im s, and since by
assumption, Im f = Ker g, we have F = Im f ⊕ Im s. But then, since f and s are injective,
f + s: E ⊕ G → F is an isomorphism. The proof of (b) is very similar.
Given a sequence of linear maps E \xrightarrow{f} F \xrightarrow{g} G, when Im f = Ker g, we say that the sequence E \xrightarrow{f} F \xrightarrow{g} G is exact at F. If in addition to being exact at F, f is injective and g is surjective, we say that we have a short exact sequence, and this is denoted as

0 −→ E \xrightarrow{f} F \xrightarrow{g} G −→ 0.

The property of a short exact sequence given by Proposition 2.22 is often described by saying that 0 −→ E \xrightarrow{f} F \xrightarrow{g} G −→ 0 is a (short) split exact sequence.
As a corollary of Proposition 2.22, we have the following result.
5 The existence of a section s: G → F of g follows from Proposition 2.21.
6 The existence of a retraction r: F → E of f follows from Proposition 2.21.

Corollary 2.23 Let E and F be vector spaces, and let f : E → F be a linear map. Then, E
is isomorphic to Ker f ⊕ Im f , and thus,

dim(E) = dim(Ker f ) + dim(Im f ).

Proof. Consider

Ker f \xrightarrow{i} E \xrightarrow{f'} Im f,

where Ker f \xrightarrow{i} E is the inclusion map, and E \xrightarrow{f'} Im f is the surjection associated with E \xrightarrow{f} F. Then, we apply Proposition 2.22 to any section Im f \xrightarrow{s} E of f' to get an isomorphism between E and Ker f ⊕ Im f, and Proposition 2.19, to get dim(E) = dim(Ker f) + dim(Im f).
The following Proposition will also be useful.

Proposition 2.24 Let E be a vector space. If E = U ⊕ V and E = U ⊕ W , then there is


an isomorphism f : V → W between V and W .

Proof. Let R be the relation between V and W, defined such that

⟨v, w⟩ ∈ R iff w − v ∈ U.

We claim that R is a functional relation that defines a linear isomorphism f: V → W between V and W, where f(v) = w iff ⟨v, w⟩ ∈ R (R is the graph of f). If w − v ∈ U and w' − v ∈ U, then w' − w ∈ U, and since U ⊕ W is a direct sum, U ∩ W = 0, and thus w' − w = 0, that is w' = w. Thus, R is functional. Similarly, if w − v ∈ U and w − v' ∈ U, then v' − v ∈ U, and since U ⊕ V is a direct sum, U ∩ V = 0, and v' = v. Thus, f is injective. Since E = U ⊕ V, for every w ∈ W, there exists a unique pair ⟨u, v⟩ ∈ U × V, such that w = u + v. Then, w − v ∈ U, and f is surjective. We also need to verify that f is linear. If

w − v = u

and

w' − v' = u',

where u, u' ∈ U, then, we have

(w + w') − (v + v') = (u + u'),

where u + u' ∈ U. Similarly, if

w − v = u

where u ∈ U, then we have

λw − λv = λu,

where λu ∈ U. Thus, f is linear.

Given a vector space E and any subspace U of E, Proposition 2.24 shows that the
dimension of any subspace V such that E = U ⊕ V depends only on U. We call dim(V ) the
codimension of U, and we denote it by codim(U). A subspace U of codimension 1 is called
a hyperplane.
The notion of rank of a linear map or of a matrix is an important one, both theoretically and practically, since it is the key to the solvability of linear equations.

Definition 2.19 Given two vector spaces E and F and a linear map f : E → F , the rank
rk(f ) of f is the dimension dim(Im f ) of the image subspace Im f of F .

We have the following simple proposition.

Proposition 2.25 Given a linear map f : E → F , the following properties hold:

(i) rk(f ) = codim(Ker f ).

(ii) rk(f ) + dim(Ker f ) = dim(E).

(iii) rk(f ) ≤ min(dim(E), dim(F )).

Proof. Since by Corollary 2.23, dim(E) = dim(Ker f) + dim(Im f), and by definition,


rk(f ) = dim(Im f ), we have rk(f ) = codim(Ker f ). Since rk(f ) = dim(Im f ), (ii) follows
from dim(E) = dim(Ker f ) + dim(Im f ). As for (iii), since Im f is a subspace of F , we have
rk(f ) ≤ dim(F ), and since rk(f ) + dim(Ker f ) = dim(E), we have rk(f ) ≤ dim(E).
The rank of a matrix is defined as follows.

Definition 2.20 Given an m × n-matrix A = (a_{i j}) over the field K, the rank rk(A) of the
matrix A is the maximum number of linearly independent columns of A (viewed as vectors
in K m ).

In view of Proposition 2.3, the rank of a matrix A is the dimension of the subspace of
K m generated by the columns of A. Let E and F be two vector spaces, and let (u1 , . . . , un )
be a basis of E, and (v1 , . . . , vm ) a basis of F . Let f : E → F be a linear map, and let M(f )
be its matrix w.r.t. the bases (u1 , . . . , un ) and (v1 , . . . , vm ). Since the rank rk(f ) of f is the
dimension of Im f , which is generated by (f (u1 ), . . . , f (un )), the rank of f is the maximum
number of linearly independent vectors in (f (u1 ), . . . , f (un )), which is equal to the number
of linearly independent columns of M(f ), since F and K m are isomorphic. Thus, we have
rk(f ) = rk(M(f )), for every matrix representing f .
We will see later, using duality, that the rank of a matrix A is also equal to the maximal
number of linearly independent rows of A.
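As a computational aside (our sketch; NumPy's matrix_rank estimates the rank numerically), one can check both the rank of a small matrix and the rank-nullity identity rk(f) + dim(Ker f) = dim(E) of Proposition 2.25 (ii).

import numpy as np

# A 3 x 4 matrix whose third row is the sum of the first two, so its rank is 2.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])

r = np.linalg.matrix_rank(A)
print(r)          # 2: the maximum number of linearly independent columns

# Rank-nullity for the map f: K^4 -> K^3 represented by A:
n = A.shape[1]
print(n - r)      # 2 = dim(Ker f)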
If U is a hyperplane, then E = U ⊕ V for some subspace V of dimension 1. However, a subspace V of dimension 1 is generated by any nonzero vector v ∈ V, and thus we denote V by Kv, and we write E = U ⊕ Kv. Clearly, v ∉ U. Conversely, let x ∈ E be a vector such that x ∉ U (and thus, x ≠ 0). We claim that E = U ⊕ Kx. Indeed, since U is a hyperplane, we have E = U ⊕ Kv for some v ∉ U (with v ≠ 0). Then, x ∈ E can be written in a unique way as x = u + λv, where u ∈ U, and since x ∉ U, we must have λ ≠ 0, and thus, v = −λ^{-1} u + λ^{-1} x. Since E = U ⊕ Kv, this shows that E = U + Kx. Since x ∉ U, we have U ∩ Kx = 0, and thus E = U ⊕ Kx. This argument shows that a hyperplane is a maximal proper subspace H of E.
In the next section, we shall see that hyperplanes are precisely the Kernels of nonnull
linear maps f : E → K, called linear forms.

2.9 The Dual Space E ∗ and Linear Forms


We already observed that the field K itself is a vector space (over itself). The vector space
L(E; K) of linear maps f : E → K, the linear forms, plays a particular role. We take a quick
look at the connection between E and L(E; K), its dual space.
Definition 2.21 Given a vector space E, the vector space L(E; K) of linear maps f : E → K
is called the dual space (or dual) of E. The space L(E; K) is also denoted by E ∗ , and the
linear maps in E ∗ are called the linear forms, or covectors. The dual space E ∗∗ of the space
E ∗ is called the bidual of E.

As a matter of notation, linear forms f: E → K will also be denoted by starred symbols, such as u∗, x∗, etc.
Given a linear form u∗ ∈ E∗ and a vector v ∈ E, the result u∗(v) of applying u∗ to v is also denoted by ⟨u∗, v⟩. This defines a binary operation ⟨−, −⟩: E∗ × E → K satisfying the following properties:

⟨u∗_1 + u∗_2, v⟩ = ⟨u∗_1, v⟩ + ⟨u∗_2, v⟩
⟨u∗, v_1 + v_2⟩ = ⟨u∗, v_1⟩ + ⟨u∗, v_2⟩
⟨λu∗, v⟩ = λ⟨u∗, v⟩
⟨u∗, λv⟩ = λ⟨u∗, v⟩.

The above identities mean that ⟨−, −⟩ is a bilinear map. In view of the above identities, given any fixed vector v ∈ E, the map ṽ: E∗ → K defined such that

ṽ(u∗) = ⟨u∗, v⟩

for every u∗ ∈ E∗ is a linear map from E∗ to K, that is, ṽ is a linear form in E∗∗. Again from the above identities, the map c_E: E → E∗∗, defined such that

c_E(v) = ṽ
for every v ∈ E, is a linear map. We shall see that it is injective, and that it is an isomorphism
when E has finite dimension.

Definition 2.22 Given a vector space E and its dual E∗, we say that a vector v ∈ E and a linear form u∗ ∈ E∗ are orthogonal if ⟨u∗, v⟩ = 0. Given a subspace V of E and a subspace U of E∗, we say that V and U are orthogonal if ⟨u∗, v⟩ = 0 for every u∗ ∈ U and every v ∈ V. Given a subset V of E (resp. a subset U of E∗), the orthogonal V^0 of V is the subspace V^0 of E∗ defined such that

V^0 = {u∗ ∈ E∗ | ⟨u∗, v⟩ = 0, for every v ∈ V}

(resp. the orthogonal U^0 of U is the subspace U^0 of E defined such that

U^0 = {v ∈ E | ⟨u∗, v⟩ = 0, for every u∗ ∈ U}).

Observe that E^0 = 0, and {0}^0 = E∗.

!  It is also easy to see that if M ⊆ N ⊆ E, then N^0 ⊆ M^0 ⊆ E∗. From this, it follows that V ⊆ V^{00} for every subspace V of E, and that U ⊆ U^{00} for every subspace U of E∗. However, even though V = V^{00} is always true, when E is of infinite dimension, it is not always true that U = U^{00}.

Given a vector space E and any basis (ui )i∈I for E, we can associate to each ui a linear
form u∗i ∈ E ∗ , and the u∗i have some remarkable properties.

Definition 2.23 Given a vector space E and any basis (u_i)_{i∈I} for E, by Proposition 2.8, for every i ∈ I, there is a unique linear form u∗_i such that

u∗_i(u_j) = 1 if i = j, and u∗_i(u_j) = 0 if i ≠ j,

for every j ∈ I. The linear form u∗_i is called the coordinate form of index i w.r.t. the basis (u_i)_{i∈I}.

Given an index set I, authors often define the so called "Kronecker symbol" δ_{i j}, such that δ_{i j} = 1 if i = j and δ_{i j} = 0 if i ≠ j, for all i, j ∈ I. Then, u∗_i(u_j) = δ_{i j}.
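Concretely, in K^n, if the basis vectors u_1, . . . , u_n are taken as the columns of an invertible matrix B, then the coordinate forms u∗_1, . . . , u∗_n are represented by the rows of B^{-1}, since (B^{-1}B)_{i j} = δ_{i j}. The following NumPy sketch (ours, for illustration only) checks this on a small example.

import numpy as np

# Basis (u_1, u_2, u_3) of R^3, given as the columns of B.
B = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
assert abs(np.linalg.det(B)) > 1e-9

dual = np.linalg.inv(B)    # row i of B^{-1} represents the coordinate form u*_i

# u*_i(u_j) = delta_ij: applying row i to column j yields the identity matrix.
print(np.allclose(dual @ B, np.eye(3)))    # True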
Given a vector space E and a subspace U of E, by Theorem 2.2, every basis (ui )i∈I of U
can be extended to a basis (uj )j∈I∪J of E, where I ∩ J = ∅. We have the following important
theorem.

Theorem 2.26 Let E be a vector space. The following properties hold:


(a) For every basis (ui )i∈I of E, the family (u∗i )i∈I of coordinate forms is linearly indepen-
dent.

(b) For every subspace V of E, we have V^{00} = V.

(c) For every subspace V of finite codimension m of E, for every subspace W of E such that E = V ⊕ W (where W is of finite dimension m), for every basis (u_i)_{i∈I} of E such that (u_1, . . . , u_m) is a basis of W, the family (u∗_1, . . . , u∗_m) is a basis of the orthogonal V^0 of V in E∗. Furthermore, we have V^{00} = V, and

dim(V) + dim(V^0) = dim(E).

(d) For every subspace U of finite dimension m of E∗, the orthogonal U^0 of U in E is of finite codimension m, U^{00} = U, and

dim(U) + dim(U^0) = dim(E).

Proof. (a) Assume that

\sum_{i∈I} λ_i u∗_i = 0,

for a family (λ_i)_{i∈I} (of scalars in K). Since (λ_i)_{i∈I} has finite support, there is a finite subset J of I such that λ_i = 0 for all i ∈ I − J, and we have

\sum_{j∈J} λ_j u∗_j = 0.

Applying the linear form \sum_{j∈J} λ_j u∗_j to each u_j (j ∈ J), by Definition 2.23, since u∗_i(u_j) = 1 if i = j and 0 otherwise, we get λ_j = 0 for all j ∈ J, that is λ_i = 0 for all i ∈ I (by definition of J as the support). Thus, (u∗_i)_{i∈I} is linearly independent.

(b) Let (u_i)_{i∈I∪J} be a basis of E such that (u_i)_{i∈I} is a basis of V (where I ∩ J = ∅). Clearly, we have V ⊆ V^{00}. If V ≠ V^{00}, then u_{j_0} ∈ V^{00} for some j_0 ∈ J (and thus, j_0 ∉ I). Since u_{j_0} ∈ V^{00}, u_{j_0} is orthogonal to every linear form in V^0. In particular, let g∗ be the linear form defined such that g∗(u_i) = 1 if i = j_0 and g∗(u_i) = 0 if i ≠ j_0, where i ∈ I ∪ J. Then, it is clear that g∗(u_i) = 0 for all i ∈ I, and thus that g∗ ∈ V^0. However, g∗(u_{j_0}) = 1, contradicting the fact that u_{j_0} is orthogonal to every linear form in V^0. Thus, V = V^{00}.

(c) Let J = I − {1, . . . , m}. Every linear form f∗ ∈ V^0 is orthogonal to every u_j, for j ∈ J, and thus, f∗(u_j) = 0, for all j ∈ J. For such a linear form f∗ ∈ V^0, let

g∗ = f∗(u_1) u∗_1 + · · · + f∗(u_m) u∗_m.

We have g∗(u_i) = f∗(u_i), for every i, 1 ≤ i ≤ m. Furthermore, by definition, g∗ vanishes on all u_j, where j ∈ J. Thus, f∗ and g∗ agree on the basis (u_i)_{i∈I} of E, and so, g∗ = f∗. This shows that (u∗_1, . . . , u∗_m) generates V^0, and since it is also a linearly independent family, (u∗_1, . . . , u∗_m) is a basis of V^0. It is then obvious that dim(V) + dim(V^0) = dim(E), and by part (b), we have V^{00} = V.

(d) Let (u∗_1, . . . , u∗_m) be a basis of U. Note that the map h: E → K^m defined such that

h(v) = (u∗_1(v), . . . , u∗_m(v))

for every v ∈ E, is a linear map, and that its kernel Ker h is precisely U^0. Then, by Corollary 2.23,

dim(E) = dim(Ker h) + dim(Im h) ≤ dim(U^0) + m,

since dim(Im h) ≤ m. Thus, U^0 is a subspace of E of finite codimension at most m, and by (c), we have dim(U^0) + dim(U^{00}) = dim(E). However, it is clear that U ⊆ U^{00}, which implies dim(U) ≤ dim(U^{00}), and so dim(U^0) + dim(U) ≤ dim(E). Thus, U^0 is of finite codimension at least m, which implies codim(U^0) = m. But then, codim(U^0) = m = dim(U), and since dim(U^0) + dim(U^{00}) = dim(E), we have dim(U^{00}) = m, and we must have U = U^{00}.
Part (a) of Theorem 2.26 shows that

dim(E) ≤ dim(E∗).

When E is of finite dimension n and (u_1, . . . , u_n) is a basis of E, by part (c), the family (u∗_1, . . . , u∗_n) is a basis of the dual space E∗, called the dual basis of (u_1, . . . , u_n). By parts (c) and (d) of Theorem 2.26, the maps V ↦ V^0 and U ↦ U^0, where V is a subspace of finite codimension of E and U is a subspace of finite dimension of E∗, are inverse bijections. These maps set up a duality between subspaces of finite codimension of E, and subspaces of finite dimension of E∗.

!  One should be careful that this bijection does not extend to subspaces of E∗ of infinite dimension.

!!  When E is of infinite dimension, for every basis (u_i)_{i∈I} of E, the family (u∗_i)_{i∈I} of coordinate forms is never a basis of E∗. It is linearly independent, but it is "too small" to generate E∗. For example, if E = R^{(N)}, where N = {0, 1, 2, . . .}, the map f: E → R that sums the nonzero coordinates of a vector in E is a linear form, but it is easy to see that it cannot be expressed as a linear combination of coordinate forms. As a consequence, when E is of infinite dimension, E and E∗ are not isomorphic.
When E is of finite dimension n and (u1 , . . . , un ) is a basis of E, we noted that the family
(u∗1 , . . . , u∗n ) is a basis of the dual space E ∗ (called the dual basis of (u1 , . . . , un )). Let us see
how the coordinates of a linear form ϕ∗ vary under a change of basis.
Let (u_1, . . . , u_n) and (v_1, . . . , v_n) be two bases of E, and let P = (a_{i j}) be the change of basis matrix from (u_1, . . . , u_n) to (v_1, . . . , v_n), so that

v_j = \sum_{i=1}^{n} a_{i j} u_i,

and let P^{-1} = (b_{i j}) be the inverse of P, so that

u_i = \sum_{j=1}^{n} b_{j i} v_j.

Since u∗_i(u_j) = δ_{i j} and v∗_i(v_j) = δ_{i j}, we get

v∗_j(u_i) = v∗_j\Big( \sum_{k=1}^{n} b_{k i} v_k \Big) = b_{j i},

and thus

v∗_j = \sum_{i=1}^{n} b_{j i} u∗_i,

and similarly

u∗_i = \sum_{j=1}^{n} a_{i j} v∗_j.

Since

ϕ∗ = \sum_{i=1}^{n} ϕ_i u∗_i = \sum_{i=1}^{n} ϕ'_i v∗_i,

we get

ϕ'_j = \sum_{i=1}^{n} a_{i j} ϕ_i.

Comparing with the change of basis

v_j = \sum_{i=1}^{n} a_{i j} u_i,

we note that this time, the coordinates (ϕi ) of the linear form ϕ∗ change in the same direction
as the change of basis. For this reason, we say that the coordinates of linear forms are
covariant. By abuse of language, it is often said that linear forms are covariant, which
explains why the term covector is also used for a linear form.
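Numerically (a sketch of ours in NumPy, with a random example), if P is the change of basis matrix, the coordinates of a vector transform with P^{-1} (contravariantly) while the coordinates of a linear form transform with P^⊤ (covariantly), and the value of the form on the vector is basis-independent.

import numpy as np

rng = np.random.default_rng(1)
n = 3
P = rng.standard_normal((n, n))        # change of basis matrix from (u_i) to (v_j)
assert abs(np.linalg.det(P)) > 1e-9

phi_old = rng.standard_normal(n)       # coordinates of a linear form over (u*_1, ..., u*_n)
x_old = rng.standard_normal(n)         # coordinates of a vector over (u_1, ..., u_n)

phi_new = P.T @ phi_old                # covariant:     phi'_j = sum_i a_ij phi_i
x_new = np.linalg.inv(P) @ x_old       # contravariant: x_old  = P x_new

print(np.allclose(phi_old @ x_old, phi_new @ x_new))    # True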

Remark: In many texts using tensors, vectors are often indexed with lower indices. If so, it is more convenient to write the coordinates of a vector x over the basis (u_1, . . . , u_n) as (x^i), using an upper index, so that

x = \sum_{i=1}^{n} x^i u_i,

and in a change of basis, we have

v_j = \sum_{i=1}^{n} a^i_j u_i

and

x^i = \sum_{j=1}^{n} a^i_j x'^j.

Dually, linear forms are indexed with upper indices. Then, it is more convenient to write the coordinates of a covector ϕ∗ over the dual basis (u^{∗1}, . . . , u^{∗n}) as (ϕ_i), using a lower index, so that

ϕ∗ = \sum_{i=1}^{n} ϕ_i u^{∗i}

and in a change of basis, we have

u^{∗i} = \sum_{j=1}^{n} a^i_j v^{∗j}

and

ϕ'_j = \sum_{i=1}^{n} a^i_j ϕ_i.

With these conventions, the index of summation appears once in upper position and once in lower position, and the summation sign can be safely omitted, a trick due to Einstein. For example, we can write

ϕ'_j = a^i_j ϕ_i

as an abbreviation for

ϕ'_j = \sum_{i=1}^{n} a^i_j ϕ_i.

We will now pin down the relationship between a vector space E and its bidual E ∗∗ .

Proposition 2.27 Let E be a vector space. The following properties hold:

(a) The linear map c_E: E → E∗∗ defined such that

c_E(v) = ṽ,

that is, c_E(v)(u∗) = ⟨u∗, v⟩ for every u∗ ∈ E∗, is injective.

(b) When E is of finite dimension n, the linear map c_E: E → E∗∗ is an isomorphism (called the canonical isomorphism).

Proof. (a) Let (u_i)_{i∈I} be a basis of E, and let v = \sum_{i∈I} v_i u_i. If c_E(v) = 0, then in particular, c_E(v)(u∗_i) = 0 for all u∗_i, and since

c_E(v)(u∗_i) = ⟨u∗_i, v⟩ = v_i,

we have v_i = 0 for all i ∈ I, that is, v = 0, showing that c_E: E → E∗∗ is injective.

If E is of finite dimension n, by Theorem 2.26, for every basis (u_1, . . . , u_n), the family (u∗_1, . . . , u∗_n) is a basis of the dual space E∗, and thus the family (u∗∗_1, . . . , u∗∗_n) is a basis of the bidual E∗∗. This shows that dim(E) = dim(E∗∗) = n, and since by part (a), we know that c_E: E → E∗∗ is injective, in fact, c_E: E → E∗∗ is bijective (because an injective map carries a linearly independent family to a linearly independent family, and in a vector space of dimension n, a linearly independent family of n vectors is a basis, see Proposition 2.3).

!!  When a vector space E has infinite dimension, E and its bidual E∗∗ are never isomorphic.

When E is of finite dimension and (u_1, . . . , u_n) is a basis of E, in view of the canonical isomorphism c_E: E → E∗∗, the basis (u∗∗_1, . . . , u∗∗_n) of the bidual is identified with (u_1, . . . , u_n).

Proposition 2.27 can be reformulated very fruitfully in terms of pairings.

Definition 2.24 Given two vector spaces E and F over K, a pairing between E and F is a bilinear map ⟨−, −⟩: E × F → K. Such a pairing is non-singular if for every u ∈ E, if ⟨u, v⟩ = 0 for all v ∈ F, then u = 0, and for every v ∈ F, if ⟨u, v⟩ = 0 for all u ∈ E, then v = 0.

For example, the map ⟨−, −⟩: E∗ × E → K defined earlier is a non-singular pairing (use the proof of (a) in Proposition 2.27).
Given a pairing ⟨−, −⟩: E × F → K, we can define two maps ϕ: E → F∗ and ψ: F → E∗ as follows: For every u ∈ E, we define the linear form ϕ(u) in F∗ such that

ϕ(u)(v) = ⟨u, v⟩

for every v ∈ F, and we define the linear form ψ(v) in E∗ such that

ψ(v)(u) = ⟨u, v⟩

for every u ∈ E. We have the following useful proposition.

Proposition 2.28 Given two vector spaces E and F over K, for every non-singular pairing ⟨−, −⟩: E × F → K between E and F, the maps ϕ: E → F∗ and ψ: F → E∗ are linear and injective. Furthermore, if E and F have finite dimension, then this dimension is the same and ϕ: E → F∗ and ψ: F → E∗ are bijections.

Proof. The maps ϕ: E → F∗ and ψ: F → E∗ are linear because a pairing is bilinear. If ϕ(u) = 0 (the null form), then

ϕ(u)(v) = ⟨u, v⟩ = 0

for every v ∈ F, and since ⟨−, −⟩ is non-singular, u = 0. Thus, ϕ: E → F∗ is injective.
Similarly, ψ: F → E ∗ is injective. When F has finite dimension n, we have seen that F
and F ∗ have the same dimension. Since ϕ: E → F ∗ is injective, we have m = dim(E) ≤

dim(F ) = n. The same argument applies to E, and thus n = dim(F ) ≤ dim(E) = m. But
then, dim(E) = dim(F ), and ϕ: E → F ∗ and ψ: F → E ∗ are bijections.
When E has finite dimension, the non-singular pairing ⟨−, −⟩: E∗ × E → K yields another proof of the existence of a natural isomorphism between E and E∗∗. Interesting non-singular
pairings arise in exterior algebra. We now show the relationship between hyperplanes and
linear forms.

2.10 Hyperplanes and Linear Forms


Actually, Proposition 2.29 below follows from parts (c) and (d) of Theorem 2.26, but we feel
that it is also interesting to give a more direct proof.

Proposition 2.29 Let E be a vector space. The following properties hold:


(a) Given any nonnull linear form f ∗ ∈ E ∗ , its kernel H = Ker f ∗ is a hyperplane.
(b) For any hyperplane H in E, there is a (nonnull) linear form f ∗ ∈ E ∗ such that H =
Ker f ∗ .
(c) Given any hyperplane H in E and any (nonnull) linear form f ∗ ∈ E ∗ such that H =
Ker f∗, for every linear form g∗ ∈ E∗, H = Ker g∗ iff g∗ = λf∗ for some λ ≠ 0 in K.

Proof. (a) If f∗ ∈ E∗ is nonnull, there is some vector v_0 ∈ E such that f∗(v_0) ≠ 0. Let H = Ker f∗. For every v ∈ E, we have

f∗\Big( v − \frac{f∗(v)}{f∗(v_0)} v_0 \Big) = f∗(v) − \frac{f∗(v)}{f∗(v_0)} f∗(v_0) = f∗(v) − f∗(v) = 0.

Thus,

v − \frac{f∗(v)}{f∗(v_0)} v_0 = h ∈ H,

and

v = h + \frac{f∗(v)}{f∗(v_0)} v_0,

that is, E = H + Kv_0. Also, since f∗(v_0) ≠ 0, we have v_0 ∉ H, that is, H ∩ Kv_0 = 0. Thus, E = H ⊕ Kv_0, and H is a hyperplane.

(b) If H is a hyperplane, E = H ⊕ Kv_0 for some v_0 ∉ H. Then, every v ∈ E can be written in a unique way as v = h + λv_0. Thus, there is a well-defined function f∗: E → K, such that, f∗(v) = λ, for every v = h + λv_0. We leave as a simple exercise the verification that f∗ is a linear form. Since f∗(v_0) = 1, the linear form f∗ is nonnull. Also, by definition, it is clear that λ = 0 iff v ∈ H, that is, Ker f∗ = H.

(c) Let H be a hyperplane in E, and let f∗ ∈ E∗ be any (nonnull) linear form such that H = Ker f∗. Clearly, if g∗ = λf∗ for some λ ≠ 0, then H = Ker g∗. Conversely, assume that H = Ker g∗ for some nonnull linear form g∗. From (a), we have E = H ⊕ Kv_0, for some v_0 such that f∗(v_0) ≠ 0 and g∗(v_0) ≠ 0. Then, observe that

g∗ − \frac{g∗(v_0)}{f∗(v_0)} f∗

is a linear form that vanishes on H, since both f∗ and g∗ vanish on H, but also vanishes on Kv_0. Thus, g∗ = λf∗, with

λ = \frac{g∗(v_0)}{f∗(v_0)}.

If E is a vector space of finite dimension n and (u1 , . . . , un ) is a basis of E, for any linear
form f ∗ ∈ E ∗ , for every x = x1 u1 + · · · + xn un ∈ E, we have
f ∗ (x) = λ1 x1 + · · · + λn xn ,
where λi = f ∗ (ui ) ∈ K, for every i, 1 ≤ i ≤ n. Thus, with respect to the basis (u1 , . . . , un ),
f ∗ (x) is a linear combination of the coordinates of x, as expected.
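As a small concrete illustration (ours, with an arbitrarily chosen form), in R^3 the linear form f∗(x) = x_1 + 2x_2 − x_3 has as kernel the hyperplane H = {x | x_1 + 2x_2 − x_3 = 0}; together with any vector v_0 such that f∗(v_0) ≠ 0, this hyperplane spans R^3, as in the proof of Proposition 2.29.

import numpy as np

# The linear form f*(x) = x1 + 2*x2 - x3, represented by the row vector lam.
lam = np.array([1., 2., -1.])

# Two independent vectors in its kernel H, found by inspection.
h1 = np.array([2., -1., 0.])
h2 = np.array([1., 0., 1.])
print(lam @ h1, lam @ h2)        # 0.0 0.0

# A vector v0 outside H; then (h1, h2, v0) is a basis of R^3, so E = H + K v0.
v0 = np.array([1., 0., 0.])
print(lam @ v0)                  # 1.0
print(np.linalg.matrix_rank(np.column_stack([h1, h2, v0])))    # 3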
We leave as an exercise the fact that every subspace V (= E of a vector space E, is the
intersection of all hyperplanes that contain V . We now consider the notion of transpose of
a linear map and of a matrix.

2.11 Transpose of a Linear Map and of a Matrix


Given a linear map f: E → F, it is possible to define a map f^⊤: F∗ → E∗ which has some interesting properties.

Definition 2.25 Given a linear map f: E → F, the transpose f^⊤: F∗ → E∗ of f is the linear map defined such that

f^⊤(v∗) = v∗ ◦ f,

for every v∗ ∈ F∗. Equivalently, the linear map f^⊤: F∗ → E∗ is defined such that

⟨v∗, f(u)⟩ = ⟨f^⊤(v∗), u⟩,

for all u ∈ E and all v∗ ∈ F∗.

It is easy to verify that the following properties hold:

(f^⊤)^⊤ = f,
(f + g)^⊤ = f^⊤ + g^⊤,
(g ◦ f)^⊤ = f^⊤ ◦ g^⊤,
id_E^⊤ = id_{E∗}.

!  Note the reversal of composition on the right-hand side of (g ◦ f)^⊤ = f^⊤ ◦ g^⊤.

The following proposition shows the relationship between orthogonality and transposi-
tion.

Proposition 2.30 Given a linear map f: E → F, for any subspace U of E, we have

f(U)^0 = (f^⊤)^{-1}(U^0).

As a consequence, Ker f^⊤ = (Im f)^0 and Ker f = (Im f^⊤)^0.

Proof. We have

⟨v∗, f(u)⟩ = ⟨f^⊤(v∗), u⟩,

for all u ∈ E and all v∗ ∈ F∗, and thus, we have ⟨v∗, f(u)⟩ = 0 for every u ∈ U, i.e., v∗ ∈ f(U)^0, iff ⟨f^⊤(v∗), u⟩ = 0 for every u ∈ U, i.e., v∗ ∈ (f^⊤)^{-1}(U^0), proving that

f(U)^0 = (f^⊤)^{-1}(U^0).

Since we already observed that E^0 = 0, letting U = E in the above identity, we obtain that

Ker f^⊤ = (Im f)^0.

The identity Ker f = (Im f^⊤)^0 holds because (f^⊤)^⊤ = f.
The following theorem shows the relationship between the rank of f and the rank of f^⊤.

Theorem 2.31 Given a linear map f: E → F, the following properties hold.

(a) The dual (Im f)∗ of Im f is isomorphic to Im f^⊤ = f^⊤(F∗).

(b) rk(f) ≤ rk(f^⊤). If rk(f) is finite, we have rk(f) = rk(f^⊤).

Proof. (a) Consider the linear maps

E \xrightarrow{p} Im f \xrightarrow{j} F,

where E \xrightarrow{p} Im f is the surjective map induced by E \xrightarrow{f} F, and Im f \xrightarrow{j} F is the injective inclusion map of Im f into F. By definition, f = j ◦ p. To simplify the notation, let I = Im f. Since E \xrightarrow{p} I is surjective, let I \xrightarrow{s} E be a section of p, and since I \xrightarrow{j} F is injective, let F \xrightarrow{r} I be a retraction of j. Then, we have p ◦ s = id_I and r ◦ j = id_I, and since (p ◦ s)^⊤ = s^⊤ ◦ p^⊤, (r ◦ j)^⊤ = j^⊤ ◦ r^⊤, and id_I^⊤ = id_{I∗}, we have

s^⊤ ◦ p^⊤ = id_{I∗},

and

j^⊤ ◦ r^⊤ = id_{I∗}.

Thus, F∗ \xrightarrow{j^⊤} I∗ is surjective, and I∗ \xrightarrow{p^⊤} E∗ is injective. Since f = j ◦ p, we also have

f^⊤ = (j ◦ p)^⊤ = p^⊤ ◦ j^⊤,

and since F∗ \xrightarrow{j^⊤} I∗ is surjective, and I∗ \xrightarrow{p^⊤} E∗ is injective, we have an isomorphism between (Im f)∗ and f^⊤(F∗).

(b) We already noted that part (a) of Theorem 2.26 shows that dim(E) ≤ dim(E∗), for every vector space E. Thus, dim(Im f) ≤ dim((Im f)∗), which, by (a), shows that rk(f) ≤ rk(f^⊤). When dim(Im f) is finite, we already observed that as a corollary of Theorem 2.26, dim(Im f) = dim((Im f)∗), and thus, we have rk(f) = rk(f^⊤).

The following proposition shows the relationship between the matrix representing a linear map f: E → F and the matrix representing its transpose f^⊤: F∗ → E∗.

Proposition 2.32 Let E and F be two vector spaces, and let (u_1, . . . , u_n) be a basis for E, and (v_1, . . . , v_m) be a basis for F. Given any linear map f: E → F, if M(f) is the m × n-matrix representing f w.r.t. the bases (u_1, . . . , u_n) and (v_1, . . . , v_m), the n × m-matrix M(f^⊤) representing f^⊤: F∗ → E∗ w.r.t. the dual bases (v∗_1, . . . , v∗_m) and (u∗_1, . . . , u∗_n) is the transpose M(f)^⊤ of M(f).

Proof. Recall that the entry a_{i j} in row i and column j of M(f) is the i-th coordinate of f(u_j) over the basis (v_1, . . . , v_m). By definition of v∗_i, we have ⟨v∗_i, f(u_j)⟩ = a_{i j}. The entry a^⊤_{j i} in row j and column i of M(f^⊤) is the j-th coordinate of f^⊤(v∗_i) over the basis (u∗_1, . . . , u∗_n), which is just f^⊤(v∗_i)(u_j) = ⟨f^⊤(v∗_i), u_j⟩. Since

⟨v∗_i, f(u_j)⟩ = ⟨f^⊤(v∗_i), u_j⟩,

we have a_{i j} = a^⊤_{j i}, proving that M(f^⊤) = M(f)^⊤.

We now can give a very short proof of the fact that the rank of a matrix is equal to the rank of its transpose.

Proposition 2.33 Given an m × n matrix A over a field K, we have rk(A) = rk(A^⊤).

Proof. The matrix A corresponds to a linear map f: K^n → K^m, and by Theorem 2.31, rk(f) = rk(f^⊤). By Proposition 2.32, the linear map f^⊤ corresponds to A^⊤. Since rk(A) = rk(f), and rk(A^⊤) = rk(f^⊤), we conclude that rk(A) = rk(A^⊤).
Thus, given an m×n-matrix A, the maximum number of linearly independent columns is
equal to the maximum number of linearly independent rows. There are other ways of proving
this fact that do not involve the dual space, but instead some elementary transformations
on rows and columns.
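A quick numerical sanity check of Proposition 2.33 (our sketch): building a matrix of known rank and comparing it with the rank of its transpose, i.e., column rank against row rank.

import numpy as np

rng = np.random.default_rng(2)
# A random 4 x 6 matrix of rank at most 2: the product of a 4 x 2 and a 2 x 6 matrix.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))

print(np.linalg.matrix_rank(A))      # 2 (with probability 1)
print(np.linalg.matrix_rank(A.T))    # 2: rk(A) = rk(A^T)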
Chapter 3

Determinants

3.1 Permutations, Signature of a Permutation


This chapter contains a review of determinants and their use in linear algebra. We begin
with permutations and the signature of a permutation. Next, we define multilinear maps
and alternating multilinear maps. Determinants are introduced as alternating multilinear
maps taking the value 1 on the unit matrix (following Emil Artin). It is then shown how to
compute a determinant using the Laplace expansion formula, and the connection with the
usual definition is made. It is shown how determinants can be used to invert matrices and
to solve (at least in theory!) systems of linear equations (the Cramer formulae). Finally, the
determinant of a linear map is defined.
Determinants can be defined in several ways. For example, determinants can be defined
in a fancy way in terms of the exterior algebra (or alternating algebra) of a vector space.
We will follow a more algorithmic approach due to Emil Artin. No matter which approach
is followed, we need a few preliminaries about permutations on a finite set. We need to
show that every permutation on n elements is a product of transpositions, and that the
parity of the number of transpositions involved is an invariant of the permutation. Let
I_n = {1, 2, . . . , n}, where n ∈ N, and n > 0.

Definition 3.1 A permutation on n elements is a bijection π: In → In . When n = 1, the


only function from {1} to {1} is the constant map 1 ↦ 1. Thus, we will assume that n ≥ 2.
A transposition is a permutation τ : In → In such that, for some i < j (with 1 ≤ i < j ≤ n),
τ (i) = j, τ (j) = i, and τ (k) = k, for all k ∈ In − {i, j}. In other words, a transposition
exchanges two distinct elements i, j ∈ In . A cyclic permutation of order k (or k-cycle) is a
permutation σ: In → In such that, for some i1 , i2 , . . . , ik , with 1 ≤ i1 < i2 < . . . < ik ≤ n,
and k ≥ 2,
σ(i1 ) = i2 , . . . , σ(ik−1 ) = ik , σ(ik ) = i1 ,
and σ(j) = j, for j ∈ In − {i1 , . . . , ik }. The set {i1 , . . . , ik } is called the domain of the cyclic
permutation, and the cyclic permutation is sometimes denoted by (i1 , i2 , . . . , ik ).


If τ is a transposition, clearly, τ ◦ τ = id. Also, a cyclic permutation of order 2 is a


transposition, and for a cyclic permutation σ of order k, we have σ k = id. The following
proposition shows the importance of cyclic permutations and transpositions. We will also use
the terminology product of permutations (or transpositions), as a synonym for composition
of permutations.

Proposition 3.1 For every n ≥ 2, for every permutation π: In → In , there is a partition of


In into r subsets, with 1 ≤ r ≤ n, where each set J in this partition is either a singleton {i},
or it is of the form
J = {i, π(i), π 2 (i), . . . , π ri −1 (i)},
where ri is the smallest integer, such that, π ri (i) = i and 2 ≤ ri ≤ n. If π is not the identity,
then it can be written in a unique way as a composition π = σ1 ◦. . .◦σs of cyclic permutations
(where 1 ≤ s ≤ r). Every permutation π: In → In can be written as a nonempty composition
of transpositions.

Proof . Consider the relation Rπ defined on In as follows: iRπ j iff there is some k ≥ 1 such
that j = π k (i). We claim that Rπ is an equivalence relation. Transitivity is obvious. We
claim that for every i ∈ In , there is some least r (1 ≤ r ≤ n) such that π r (i) = i. Indeed,
consider the following sequence of n + 1 elements:

⟨i, π(i), π^2(i), . . . , π^n(i)⟩.

Since In only has n distinct elements, there are some h, k with 0 ≤ h < k ≤ n such that

π h (i) = π k (i),

and since π is a bijection, this implies π k−h (i) = i, where 0 ≤ k − h ≤ n. Thus, Rπ is


reflexive. It is symmetric, since if j = π k (i), letting r be the least r ≥ 1 such that π r (i) = i,
then
i = π kr (i) = π k(r−1) (π k (i)) = π k(r−1) (j).
Now, for every i ∈ In , the equivalence class of i is a subset of In , either the singleton {i} or
a set of the form
J = {i, π(i), π 2 (i), . . . , π ri −1 (i)},
where ri is the smallest integer such that π ri (i) = i and 2 ≤ ri ≤ n, and in the second case,
the restriction of π to J induces a cyclic permutation σi , and π = σ1 ◦ . . . ◦ σs , where s is the
number of equivalence classes having at least two elements.
For the second part of the proposition, we proceed by induction on n. If n = 2, there are
exactly two permutations on {1, 2}, the transposition τ exchanging 1 and 2, and the identity.
However, id2 = τ 2 . Now, let n ≥ 3. If π(n) = n, since by the induction hypothesis, the
restriction of π to {1, . . . , n − 1} can be written as a product of transpositions, π itself can be
written as a product of transpositions. If π(n) = k ≠ n, letting τ be the transposition such

that τ (n) = k and τ (k) = n, it is clear that τ ◦ π leaves n invariant, and by the induction
hypothesis, we have τ ◦ π = τm ◦ . . . ◦ τ1 for some transpositions, and thus

π = τ ◦ τm ◦ . . . ◦ τ1 ,

a product of transpositions (since τ ◦ τ = idn ).

Remark: When π = idn is the identity permutation, we can agree that the composition of
0 transpositions is the identity. The second part of Proposition 3.1 shows that the transpo-
sitions generate the set of all permutations.
In writing a permutation π as a composition π = σ1 ◦ . . . ◦ σs of cyclic permutations, it
is clear that the order of the σi does not matter, since their domains are disjoint. Given
a permutation written as a product of transpositions, we now show that the parity of the
number of transpositions is an invariant.

Definition 3.2 For every n ≥ 2, since every permutation π: I_n → I_n defines a partition of r subsets over which π acts either as the identity or as a cyclic permutation, let ε(π), called the signature of π, be defined by ε(π) = (−1)^{n−r}, where r is the number of sets in the partition.

If τ is a transposition exchanging i and j, it is clear that the partition associated with τ consists of n − 1 equivalence classes, the set {i, j}, and the n − 2 singleton sets {k}, for k ∈ I_n − {i, j}, and thus, ε(τ) = (−1)^{n−(n−1)} = (−1)^1 = −1.

Proposition 3.2 For every n ≥ 2, for every permutation π: I_n → I_n, for every transposition τ, we have

ε(τ ◦ π) = −ε(π).

Consequently, for every product of transpositions such that π = τ_m ◦ . . . ◦ τ_1, we have

ε(π) = (−1)^m,

which shows that the parity of the number of transpositions is an invariant.

Proof . Assume that τ (i) = j and τ (j) = i, where i < j. There are two cases, depending
whether i and j are in the same equivalence class Jl of Rπ , or if they are in distinct equivalence
classes. If i and j are in the same class Jl , then if

Jl = {i1 , . . . , ip , . . . iq , . . . ik },

where ip = i and iq = j, since

τ (π(π −1 (ip ))) = τ (ip ) = τ (i) = j = iq

and
τ (π(iq−1 )) = τ (iq ) = τ (j) = i = ip ,

it is clear that J_l splits into two subsets, one of which is {i_p, . . . , i_{q−1}}, and thus, the number of classes associated with τ ◦ π is r + 1, and ε(τ ◦ π) = (−1)^{n−r−1} = −(−1)^{n−r} = −ε(π). If i
and j are in distinct equivalence classes Jl and Jm , say

{i1 , . . . , ip , . . . ih }

and
{j1 , . . . , jq , . . . jk },
where ip = i and jq = j, since

τ (π(π −1 (ip ))) = τ (ip ) = τ (i) = j = jq

and
τ (π(π −1 (jq ))) = τ (jq ) = τ (j) = i = ip ,
we see that the classes J_l and J_m merge into a single class, and thus, the number of classes associated with τ ◦ π is r − 1, and ε(τ ◦ π) = (−1)^{n−r+1} = −(−1)^{n−r} = −ε(π).
Now, let π = τ_m ◦ . . . ◦ τ_1 be any product of transpositions. By the first part of the proposition, we have

ε(π) = (−1)^{m−1} ε(τ_1) = (−1)^{m−1}(−1) = (−1)^m,

since ε(τ_1) = −1 for a transposition.

Remark: When π = id_n is the identity permutation, since we agreed that the composition of 0 transpositions is the identity, it is still correct that (−1)^0 = ε(id) = +1. From the proposition, it is immediate that ε(π' ◦ π) = ε(π')ε(π). In particular, since π^{-1} ◦ π = id_n, we get ε(π^{-1}) = ε(π).
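The cycle decomposition of Proposition 3.1 and the signature ε(π) = (−1)^{n−r} of Definition 3.2 are easy to compute; the Python sketch below (our own helpers, using 0-based indexing instead of the text's I_n = {1, . . . , n}) also checks that composing with a transposition flips the signature, as in Proposition 3.2.

def cycles(perm):
    # perm is a list: perm[i] is the image of i (0-based indexing).
    n, seen, result = len(perm), [False] * len(perm), []
    for i in range(n):
        if not seen[i]:
            c, j = [], i
            while not seen[j]:
                seen[j] = True
                c.append(j)
                j = perm[j]
            result.append(c)          # singleton cycles correspond to fixed points
    return result

def signature(perm):
    r = len(cycles(perm))             # number of classes of the partition
    return (-1) ** (len(perm) - r)    # epsilon(pi) = (-1)^(n - r)

pi = [2, 0, 1, 4, 3]                  # the 3-cycle 0 -> 2 -> 1 -> 0 together with the transposition (3 4)
print(cycles(pi))                     # [[0, 2, 1], [3, 4]]
print(signature(pi))                  # (-1)^(5 - 2) = -1

tau = [1, 0, 2, 3, 4]                 # transposition exchanging 0 and 1
tau_pi = [tau[pi[i]] for i in range(5)]
print(signature(tau_pi) == -signature(pi))    # True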
We can now proceed with the definition of determinants.

3.2 Alternating Multilinear Maps


First, we define multilinear maps, symmetric multilinear maps, and alternating multilinear maps.

Remark: Most of the definitions and results presented in this section also hold when K is
a commutative ring, and when we consider modules over K (free modules, when bases are
needed).
Let E1 , . . . , En , and F , be vector spaces over a field K, where n ≥ 1.

Definition 3.3 A function f : E1 ×. . .×En → F is a multilinear map (or an n-linear map) if


it is linear in each argument, holding the others fixed. More explicitly, for every i, 1 ≤ i ≤ n,
for all x1 ∈ E1 . . ., xi−1 ∈ Ei−1 , xi+1 ∈ Ei+1 , . . ., xn ∈ En , for all x, y ∈ Ei , for all λ ∈ K,

f (x1 , . . . , xi−1 , x + y, xi+1 , . . . , xn ) =


f (x1 , . . . , xi−1 , x, xi+1 , . . . , xn ) + f (x1 , . . . , xi−1 , y, xi+1, . . . , xn ),
f (x1 , . . . , xi−1 , λx, xi+1 , . . . , xn ) = λf (x1 , . . . , xi−1 , x, xi+1 , . . . , xn ).

When F = K, we call f an n-linear form (or multilinear form). If n ≥ 2 and E1 =


E2 = . . . = En , an n-linear map f : E × . . . × E → F is called symmetric, if f (x1 , . . . , xn ) =
f (xπ(1) , . . . , xπ(n) ), for every permutation π on {1, . . . , n}. An n-linear map f : E×. . .×E → F
is called alternating, if f (x1 , . . . , xn ) = 0 whenever xi = xi+1 , for some i, 1 ≤ i ≤ n − 1 (in
other words, when two adjacent arguments are equal). It does not harm to agree that when
n = 1, a linear map is considered to be both symmetric and alternating, and we will do so.

When n = 2, a 2-linear map f : E1 × E2 → F is called a bilinear map. We have already


seen several examples of bilinear maps. Multiplication ·: K × K → K is a bilinear map,
treating K as a vector space over itself. More generally, multiplication ·: A × A → A in a
ring A is a bilinear map, viewing A as a module over itself.
The operation 3−, −4: E ∗ × E → K applying a linear form to a vector is a bilinear map.
Symmetric bilinear maps (and multilinear maps) play an important role in geometry
(inner products, quadratic forms), and in differential calculus (partial derivatives).
A bilinear map is symmetric if f (u, v) = f (v, u), for all u, v ∈ E.
Alternating multilinear maps satisfy the following simple but crucial properties.

Proposition 3.3 Let f : E × . . . × E → F be an n-linear alternating map, with n ≥ 2. The


following properties hold:

(1)
f (. . . , xi , xi+1 , . . .) = −f (. . . , xi+1 , xi , . . .)

(2)
f (. . . , xi , . . . , xj , . . .) = 0,
where xi = xj , and 1 ≤ i < j ≤ n.

(3)
f (. . . , xi , . . . , xj , . . .) = −f (. . . , xj , . . . , xi , . . .),
where 1 ≤ i < j ≤ n.

(4)
f (. . . , xi , . . .) = f (. . . , xi + λxj , . . .),
for any λ ∈ K, and where i ≠ j.

Proof . (1) By multilinearity applied twice, we have

f (. . . , xi + xi+1 , xi + xi+1 , . . .) = f (. . . , xi , xi , . . .) + f (. . . , xi , xi+1 , . . .)


+f (. . . , xi+1 , xi , . . .) + f (. . . , xi+1 , xi+1 , . . .),

and since f is alternating, this yields

0 = f (. . . , xi , xi+1 , . . .) + f (. . . , xi+1 , xi , . . .),

that is, f (. . . , xi , xi+1 , . . .) = −f (. . . , xi+1 , xi , . . .).


(2) If xi = xj and i and j are not adjacent, we can interchange xi and xi+1 , and then xi
and xi+2 , etc, until xi and xj become adjacent. By (1),

f (. . . , xi , . . . , xj , . . .) = ε f (. . . , xi , xj , . . .),

where ε = +1 or −1, but f (. . . , xi , xj , . . .) = 0, since xi = xj , and (2) holds.


(3) follows from (2) as in (1). (4) is an immediate consequence of (2).
Proposition 3.3 will now be used to show a fundamental property of alternating multilinear
maps. First, we need to extend the matrix notation a little bit. Let E be a vector space
over K. Given an n × n matrix A = (ai j ) over K, we can define a map L(A): E n → E n as
follows:

L(A)1 (u1 ) = a1 1 u1 + · · · + a1 n un ,
...
L(A)n (un ) = an 1 u1 + · · · + an n un ,

for all u1 , . . . , un ∈ E. It is immediately verified that L(A) is linear. Then, given two n × n
matrices A = (ai j ) and B = (bi j ), by repeating the calculations establishing the product of
matrices (just before Definition 2.12), we can show that

L(AB) = L(A) ◦ L(B).

It is then convenient to use the matrix notation to describe the effect of the linear map L(A),
as

\begin{pmatrix} L(A)_1 (u_1) \\ L(A)_2 (u_2) \\ \vdots \\ L(A)_n (u_n) \end{pmatrix} =
\begin{pmatrix}
a_{1 1} & a_{1 2} & \dots & a_{1 n} \\
a_{2 1} & a_{2 2} & \dots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \dots & a_{n n}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.

Lemma 3.4 Let f : E × . . . × E → F be an n-linear alternating map. Let (u1 , . . . , un ) and


(v1 , . . . , vn ) be two families of n vectors, such that,

v1 = a1 1 u1 + · · · + an 1 un ,
...
vn = a1 n u1 + · · · + an n un .

Equivalently, letting

A = \begin{pmatrix}
a_{1 1} & a_{1 2} & \dots & a_{1 n} \\
a_{2 1} & a_{2 2} & \dots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \dots & a_{n n}
\end{pmatrix}

assume that we have

\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = A^⊤ \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.

Then,

f (v_1 , . . . , v_n ) = ( ∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n} ) f (u_1 , . . . , u_n ),

where the sum ranges over all permutations π on {1, . . . , n}.

Proof . Expanding f (v1 , . . . , vn ) by multilinearity, we get a sum of terms of the form

aπ(1) 1 · · · aπ(n) n f (uπ(1) , . . . , uπ(n) ),

for all possible functions π: {1, . . . , n} → {1, . . . , n}. However, because f is alternating, only
the terms for which π is a permutation are nonzero. By Proposition 3.1, every permutation
π is a product of transpositions, and by Proposition 3.2, the parity ε(π) of the number of
transpositions only depends on π. Then, applying Proposition 3.3 (3) to each transposition
in π, we get

a_{π(1) 1} · · · a_{π(n) n} f (u_{π(1)} , . . . , u_{π(n)} ) = ε(π) a_{π(1) 1} · · · a_{π(n) n} f (u_1 , . . . , u_n ).

Thus, we get the expression of the lemma.


The quantity

det(A) = ∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n}

is in fact the value of the determinant of A (which, as we shall see shortly, is also equal to the
determinant of A^⊤ ). However, working directly with the above definition is quite awkward,
and we will proceed via a slightly indirect route.
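
Indeed, the sum has n! terms. For concreteness, the following Python sketch (illustrative only) evaluates the formula verbatim, using the standard fact that ε(π) = (−1)^m, where m is the number of inversions of π:

from itertools import permutations
from math import prod

def det_leibniz(A):
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        # signature of p via its number of inversions
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        total += (-1) ** inversions * prod(A[p[j]][j] for j in range(n))
    return total

# sanity check against the familiar 2 x 2 formula ad - bc
assert det_leibniz([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3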

3.3 Definition of a Determinant


Recall that the set of all square n × n-matrices with coefficients in a field K is denoted by
Mn (K).

Definition 3.4 A determinant is defined as any map

D: Mn (K) → K,

which, when viewed as a map on (K n )n , i.e., a map of the n columns of a matrix, is n-linear
alternating and such that D(In ) = 1 for the identity matrix In . Equivalently, we can consider
a vector space E of dimension n, some fixed basis (e1 , . . . , en ), and define

D: E n → K

as an n-linear alternating map such that D(e1 , . . . , en ) = 1.

First, we will show that such maps D exist, using an inductive definition that also gives
a recursive method for computing determinants. Actually, we will define a family (Dn )n≥1
of (finite) sets of maps D: Mn (K) → K. Second, we will show that determinants are in fact
uniquely defined, that is, we will show that each Dn consists of a single map. This will show
the equivalence of the direct definition det(A) of Lemma 3.4 with the inductive definition
D(A). Finally, we will prove some basic properties of determinants, using the uniqueness
theorem. Given a matrix A ∈ Mn (K), we denote its n columns by A1 , . . . , An .

Definition 3.5 For every n ≥ 1, we define a finite set Dn of maps D: Mn (K) → K induc-
tively as follows:
When n = 1, D1 consists of the single map D such that, D(A) = a, where A = (a), with
a ∈ K.
Assume that Dn−1 has been defined, where n ≥ 2. We define the set Dn as follows. For
every matrix A ∈ Mn (K), let Ai j be the (n − 1) × (n − 1)-matrix obtained from A = (ai j )
by deleting row i and column j. Then, Dn consists of all the maps D such that, for some i,
1 ≤ i ≤ n,
D(A) = (−1)^{i+1} ai 1 D(Ai 1 ) + · · · + (−1)^{i+n} ai n D(Ai n ),
where for every j, 1 ≤ j ≤ n, D(Ai j ) is the result of applying any D in Dn−1 to Ai j .
! We confess that the use of the same letter D for the member of Dn being defined, and
for members of Dn−1 , may be slightly confusing. We considered using subscripts to
distinguish, but this seems to complicate things unnecessarily. One should not worry too
much anyway, since it will turn out that each Dn contains just one map.

Each (−1)^{i+j} D(Ai j ) is called the cofactor of ai j , and the inductive expression for D(A)
is called a Laplace expansion of D according to the i-th row . Given a matrix A ∈ Mn (K),
each D(A) is called a determinant of A.

We will prove shortly that D(A) is uniquely defined (at the moment, it is not clear
that Dn consists of a single map). Assuming this fact, given an n × n-matrix A = (ai j ),
 
A = \begin{pmatrix}
a_{1 1} & a_{1 2} & \dots & a_{1 n} \\
a_{2 1} & a_{2 2} & \dots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \dots & a_{n n}
\end{pmatrix}

its determinant is denoted by D(A) or det(A), or more explicitly by

det(A) = \begin{vmatrix}
a_{1 1} & a_{1 2} & \dots & a_{1 n} \\
a_{2 1} & a_{2 2} & \dots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \dots & a_{n n}
\end{vmatrix}.

First, let us consider some examples.


Example 3.1
1. When n = 2, if

   A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},

   expanding according to any row, we have

   D(A) = ad − bc.

2. When n = 3, if

   A = \begin{pmatrix} a_{1 1} & a_{1 2} & a_{1 3} \\ a_{2 1} & a_{2 2} & a_{2 3} \\ a_{3 1} & a_{3 2} & a_{3 3} \end{pmatrix},

   expanding according to the first row, we have

   D(A) = a_{1 1} \begin{vmatrix} a_{2 2} & a_{2 3} \\ a_{3 2} & a_{3 3} \end{vmatrix}
        − a_{1 2} \begin{vmatrix} a_{2 1} & a_{2 3} \\ a_{3 1} & a_{3 3} \end{vmatrix}
        + a_{1 3} \begin{vmatrix} a_{2 1} & a_{2 2} \\ a_{3 1} & a_{3 2} \end{vmatrix},

   that is,

   D(A) = a_{1 1} (a_{2 2} a_{3 3} − a_{3 2} a_{2 3}) − a_{1 2} (a_{2 1} a_{3 3} − a_{3 1} a_{2 3}) + a_{1 3} (a_{2 1} a_{3 2} − a_{3 1} a_{2 2}),

   which gives the explicit formula

   D(A) = a_{1 1} a_{2 2} a_{3 3} + a_{2 1} a_{3 2} a_{1 3} + a_{3 1} a_{1 2} a_{2 3} − a_{1 1} a_{3 2} a_{2 3} − a_{2 1} a_{1 2} a_{3 3} − a_{3 1} a_{2 2} a_{1 3} .
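
The inductive definition translates directly into a recursive procedure. The Python sketch below (illustrative only; it always expands along the first row, i = 1) reproduces the 2 × 2 and 3 × 3 formulas above:

def det_laplace(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor A_{1 j}: delete the first row and column j
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(minor)
    return total

assert det_laplace([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3
assert det_laplace([[2, 0, 1], [1, 3, 2], [0, 1, 1]]) == 3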

We now show that each D ∈ Dn is a determinant (map).

Lemma 3.5 For every n ≥ 1, for every D ∈ Dn as defined in Definition 3.5, D is an


alternating multilinear map such that D(In ) = 1.

Proof . By induction on n, it is obvious that D(In ) = 1. Let us now prove that D is


multilinear. Let us show that D is linear in each column. Consider any column k. Since
D(A) = (−1)^{i+1} ai 1 D(Ai 1 ) + · · · + (−1)^{i+j} ai j D(Ai j ) + · · · + (−1)^{i+n} ai n D(Ai n ),
if j ≠ k, then by induction, D(Ai j ) is linear in column k, and ai j does not belong to column
k, so (−1)^{i+j} ai j D(Ai j ) is linear in column k. If j = k, then D(Ai j ) does not depend on
column k = j, since Ai j is obtained from A by deleting row i and column j = k, and ai j
belongs to column j = k. Thus, (−1)^{i+j} ai j D(Ai j ) is linear in column k. Consequently, in
all cases, (−1)^{i+j} ai j D(Ai j ) is linear in column k, and thus, D(A) is linear in column k.
Let us now prove that D is alternating. Assume that two adjacent columns of A are equal,
say A^k = A^{k+1} . First, let j ≠ k and j ≠ k + 1. Then, the matrix Ai j has two identical
adjacent columns, and by the induction hypothesis, D(Ai j ) = 0. The remaining terms of
D(A) are
(−1)^{i+k} ai k D(Ai k ) + (−1)^{i+k+1} ai k+1 D(Ai k+1 ).
However, the two matrices Ai k and Ai k+1 are equal, since we are assuming that columns k
and k + 1 of A are identical, and since Ai k is obtained from A by deleting row i and column
k, and Ai k+1 is obtained from A by deleting row i and column k + 1. Similarly, ai k = ai k+1 ,
since columns k and k + 1 of A are equal. But then,
(−1)^{i+k} ai k D(Ai k ) + (−1)^{i+k+1} ai k+1 D(Ai k+1 ) = (−1)^{i+k} ai k D(Ai k ) − (−1)^{i+k} ai k D(Ai k ) = 0.
This shows that D is alternating, and completes the proof.
Lemma 3.5 shows the existence of determinants. We now prove their uniqueness.

Theorem 3.6 For every n ≥ 1, for every D ∈ Dn , for every matrix A ∈ Mn (K), we have
D(A) = ∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n} ,

where the sum ranges over all permutations π on {1, . . . , n}. As a consequence, Dn consists
of a single map for every n ≥ 1, and this map is given by the above explicit formula.

Proof . Consider the standard basis (e1 , . . . , en ) of K n , where (ei )i = 1 and (ei )j = 0, for
j ≠ i. Then, each column Aj of A corresponds to a vector vj whose coordinates over the
basis (e1 , . . . , en ) are the components of Aj , that is, we can write
v1 = a1 1 e1 + · · · + an 1 en ,
...
vn = a1 n e1 + · · · + an n en .

Since by Lemma 3.5, each D is a multilinear alternating map, by applying Lemma 3.4, we
get

D(A) = D(v_1 , . . . , v_n ) = ( ∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n} ) D(e_1 , . . . , e_n ),

where the sum ranges over all permutations π on {1, . . . , n}. But D(e1 , . . . , en ) = D(In ),
and by Lemma 3.5, we have D(In ) = 1. Thus,
D(A) = ∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n} ,

where the sum ranges over all permutations π on {1, . . . , n}.


We can now prove some properties of determinants.

Corollary 3.7 For every matrix A ∈ Mn (K), we have D(A) = D(A^⊤ ).

Proof . By Theorem 3.6, we have


D(A) = ∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n} ,

where the sum ranges over all permutations π on {1, . . . , n}. Since a permutation is invertible,
every product
aπ(1) 1 · · · aπ(n) n
can be rewritten as
a_{1 π^{−1}(1)} · · · a_{n π^{−1}(n)} ,
and since ε(π^{−1} ) = ε(π) and the sum is taken over all permutations on {1, . . . , n}, we have

∑_π ε(π) a_{π(1) 1} · · · a_{π(n) n} = ∑_σ ε(σ) a_{1 σ(1)} · · · a_{n σ(n)} ,

where π and σ range over all permutations. But it is immediately verified that
D(A^⊤ ) = ∑_σ ε(σ) a_{1 σ(1)} · · · a_{n σ(n)} .

A useful consequence of Corollary 3.7 is that the determinant of a matrix is also a multi-
linear alternating map of its rows. This fact, combined with the fact that the determinant of
a matrix is a multilinear alternating map of its columns is often useful for finding short-cuts
in computing determinants. We illustrate this point on the following example which shows
up in polynomial interpolation.

Example 3.2 Consider the so-called Vandermonde determinant


V (x_1 , . . . , x_n ) = \begin{vmatrix}
1 & 1 & \dots & 1 \\
x_1 & x_2 & \dots & x_n \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{n−1} & x_2^{n−1} & \dots & x_n^{n−1}
\end{vmatrix}.

We claim that

V (x_1 , . . . , x_n ) = ∏_{1 ≤ i < j ≤ n} (x_j − x_i ),

with V (x1 , . . . , xn ) = 1, when n = 1. We prove it by induction on n ≥ 1. The case n = 1 is


obvious. Assume n ≥ 2. We proceed as follows: multiply row n − 1 by x1 and subtract it
from row n (the last row), then multiply row n − 2 by x1 and subtract it from row n − 1,
etc, multiply row i − 1 by x1 and subtract it from row i, until we reach row 1. We obtain
the following determinant:
6 6
61 1 ... 1 6
6 6
60 x2 − x1 ... xn − x1 6
6 6
V (x1 , . . . , xn ) = 66 0 x2 (x2 − x1 ) . . . xn (xn − x1 ) 66
6 ... ..
.
..
.
..
. 6
6 6
6 0 xn−2 (x − x ) . . . xn−2 (x − x ) 6
2 2 1 n n 1

Now, expanding this determinant according to the first column and using multilinearity,
we can factor (xi − x1 ) from the column of index i − 1 of the matrix obtained by deleting
the first row and the first column, and thus
V (x1 , . . . , xn ) = (x2 − x1 )(x3 − x1 ) · · · (xn − x1 )V (x2 , . . . , xn ),
which establishes the induction step.
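
A quick numerical spot-check of the product formula, as a Python sketch using NumPy (illustrative only; np.vander builds the matrix of powers):

import numpy as np

xs = np.array([1.0, 2.0, 4.0, 7.0])
n = len(xs)
V = np.vander(xs, increasing=True).T          # rows 1, x, x^2, ..., x^(n-1)
det_V = np.linalg.det(V)
prod_formula = np.prod([xs[j] - xs[i] for i in range(n) for j in range(i + 1, n)])
assert abs(det_V - prod_formula) < 1e-8 * abs(prod_formula)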

Lemma 3.4 can be reformulated nicely as follows.


Proposition 3.8 Let f : E × . . . × E → F be an n-linear alternating map. Let (u1 , . . . , un )
and (v1 , . . . , vn ) be two families of n vectors, such that
v1 = a1 1 u1 + · · · + a1 n un ,
...
vn = an 1 u1 + · · · + an n un .
Equivalently, letting

A = \begin{pmatrix}
a_{1 1} & a_{1 2} & \dots & a_{1 n} \\
a_{2 1} & a_{2 2} & \dots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \dots & a_{n n}
\end{pmatrix}

assume that we have


   
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = A \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.
Then,
f (v1 , . . . , vn ) = D(A)f (u1 , . . . , un ).

Proof . The only difference with Lemma 3.4 is that here, we are using A^⊤ instead of A. Thus,
by Lemma 3.4 and Corollary 3.7, we get the desired result.
As a consequence, we get the very useful property that the determinant of a product of
matrices is the product of the determinants of these matrices.

Proposition 3.9 For any two n × n-matrices A and B, we have D(AB) = D(A)D(B).

Proof . We use Proposition 3.8 as follows: let (e1 , . . . , en ) be the standard basis of K n , and
let

\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = AB \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.

Then, we get
D(w1 , . . . , wn ) = D(AB)D(e1 , . . . , en ) = D(AB),
since D(e1 , . . . , en ) = 1. Now, letting
   
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = B \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix},

we get
D(v1 , . . . , vn ) = D(B),
and since

\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = A \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},
we get
D(w1 , . . . , wn ) = D(A)D(v1 , . . . , vn ) = D(A)D(B).

It should be noted that all the results of this section, up to now, also hold when K is a
commutative ring, and not necessarily a field. We can now characterize when an n×n-matrix
A is invertible in terms of its determinant D(A).

3.4 Inverse Matrices and Determinants


In the next two sections, K is a commutative ring and when needed, a field.

Definition 3.6 Let K be a commutative ring. Given a matrix A ∈ Mn (K), let Ã = (bi j )
be the matrix defined such that

bi j = (−1)^{i+j} D(Aj i ),

the cofactor of aj i . The matrix Aj i is called a minor of the matrix A.

! Note the reversal of the indices in

bi j = (−1)^{i+j} D(Aj i ).

Thus, Ã is the transpose of the matrix of cofactors of elements of A.

We have the following proposition.

Proposition 3.10 Let K be a commutative ring. For every matrix A ∈ Mn (K), we have

A Ã = Ã A = det(A)In .

As a consequence, A is invertible iff det(A) is invertible, and if so, A^{−1} = (det(A))^{−1} Ã.

Proof . If Ã = (bi j ) and AÃ = (ci j ), we know that the entry ci j in row i and column j of
AÃ is
ci j = ai 1 b1 j + · · · + ai k bk j + · · · + ai n bn j ,
which is equal to
ai 1 (−1)^{j+1} D(Aj 1 ) + · · · + ai n (−1)^{j+n} D(Aj n ).
If j = i, then we recognize the expression of the expansion of det(A) according to the i-th
row:
ci i = det(A) = ai 1 (−1)^{i+1} D(Ai 1 ) + · · · + ai n (−1)^{i+n} D(Ai n ).
If j ≠ i, we can form the matrix A′ by replacing the j-th row of A by the i-th row of A.
Now, the matrix Aj k obtained by deleting row j and column k from A is equal to the matrix

A′j k obtained by deleting row j and column k from A′ , since A and A′ only differ by the j-th
row. Thus,
D(Aj k ) = D(A′j k ),
and we have
ci j = ai 1 (−1)^{j+1} D(A′j 1 ) + · · · + ai n (−1)^{j+n} D(A′j n ).
However, this is the expansion of D(A′ ) according to the j-th row, since the j-th row of A′
is equal to the i-th row of A, and since A′ has two identical rows i and j, because D is an
alternating map of the rows (see an earlier remark), we have D(A′ ) = 0. Thus, we have
shown that ci i = det(A), and ci j = 0, when j ≠ i, and so

A Ã = det(A)In .

It is also obvious from the definition of Ã that

(A^⊤)^∼ = (Ã)^⊤ .

Then, applying the first part of the argument to A^⊤ , we have

A^⊤ (A^⊤)^∼ = det(A^⊤ )In ,

and since det(A^⊤ ) = det(A), (A^⊤)^∼ = (Ã)^⊤ , and (ÃA)^⊤ = A^⊤ (Ã)^⊤ , we get

det(A)In = A^⊤ (A^⊤)^∼ = A^⊤ (Ã)^⊤ = (ÃA)^⊤ ,

that is,
(ÃA)^⊤ = det(A)In ,
which yields
ÃA = det(A)In ,
since In^⊤ = In . This proves that

A Ã = Ã A = det(A)In .

As a consequence, if det(A) is invertible, we have A^{−1} = (det(A))^{−1} Ã. Conversely, if A is
invertible, from AA^{−1} = In , by Proposition 3.9, we have det(A) det(A^{−1} ) = 1, and det(A) is
invertible.
When K is a field, an element a ∈ K is invertible iff a ≠ 0. In this case, the second part
of the proposition can be stated as A is invertible iff det(A) ≠ 0. Note in passing that this
method of computing the inverse of a matrix is usually not practical.
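
Impractical as it is for large matrices, the formula A^{−1} = (det(A))^{−1} Ã is easy to implement for small ones. The following Python sketch (illustrative only; it recomputes determinants by Laplace expansion and uses exact rational arithmetic) builds the entries of Ã with the index reversal of Definition 3.6:

from fractions import Fraction

def det(A):
    # Laplace expansion along the first row; fine for small n only
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j+1:] for r in A[1:]])
               for j in range(len(A)))

def minor(A, i, j):
    # delete row i and column j
    return [r[:j] + r[j+1:] for k, r in enumerate(A) if k != i]

def inverse(A):
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0: A is not invertible")
    # entry (i, j) of the adjugate is the cofactor of a_{j i}
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i)), d) for j in range(n)]
            for i in range(n)]

assert inverse([[2, 1], [5, 3]]) == [[3, -1], [-5, 2]]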
We now consider some applications of determinants to linear independence and to solving
systems of linear equations. Although these results hold for matrices over an integral domain,

their proofs require more sophisticated methods (it is necessary to use the fraction field of
the integral domain, K). Therefore, we assume again that K is a field.
Let A be an n×n-matrix, X a column vector of variables, and B another column vector,
and let A1 , . . . , An denote the columns of A. Observe that the system of equations AX = B,
    
\begin{pmatrix}
a_{1 1} & a_{1 2} & \dots & a_{1 n} \\
a_{2 1} & a_{2 2} & \dots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \dots & a_{n n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}

is equivalent to
x1 A1 + · · · + xj Aj + · · · + xn An = B,
since the equation corresponding to the i-th row is in both cases

ai 1 x1 + · · · + ai j xj + · · · + ai n xn = bi .

First, we characterize linear independence of the column vectors of a matrix A in terms


of its determinant.

Proposition 3.11 Given an n×n-matrix A over a field K, the columns A1 , . . . , An of A are


linearly dependent iff D(A) = D(A1 , . . . , An ) = 0. Equivalently, A has rank n iff D(A) ≠ 0.

Proof . First, assume that the columns A1 , . . . , An of A are linearly dependent. Then, there
are x1 , . . . , xn ∈ K, such that

x1 A1 + · · · + xj Aj + · · · + xn An = 0,

where xj ≠ 0 for some j. If we compute

D(A1 , . . . , x1 A1 + · · · + xj Aj + · · · + xn An , . . . , An ) = D(A1 , . . . , 0, . . . , An ) = 0,

where 0 occurs in the j-th position, by multilinearity, all terms containing two identical
columns Ak for k (= j vanish, and we get

xj D(A1 , . . . , An ) = 0

for every j, 1 ≤ j ≤ n. Since xj ≠ 0 for some j, and K is a field, we must have


D(A1 , . . . , An ) = 0.
Conversely, we show that if the columns A1 , . . . , An of A are linearly independent, then
D(A1 , . . . , An ) ≠ 0. If the columns A1 , . . . , An of A are linearly independent, then they

form a basis of K n , and we can express the standard basis (e1 , . . . , en ) of K n in terms of
A1 , . . . , An . Thus, we have
    1 
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
=
\begin{pmatrix}
b_{1 1} & b_{1 2} & \dots & b_{1 n} \\
b_{2 1} & b_{2 2} & \dots & b_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n 1} & b_{n 2} & \dots & b_{n n}
\end{pmatrix}
\begin{pmatrix} A^1 \\ A^2 \\ \vdots \\ A^n \end{pmatrix},

for some matrix B = (bi j ), and by Proposition 3.8, we get

D(e1 , . . . , en ) = D(B)D(A1 , . . . , An ),

and since D(e1 , . . . , en ) = 1, this implies that D(A1 , . . . , An ) ≠ 0 (and D(B) ≠ 0). For the
second assertion, recall that the rank of a matrix is equal to the maximum number of linearly
independent columns, and the conclusion is clear.

3.5 Systems of Linear Equations and Determinant


We now characterize when a system of linear equations of the form AX = B has a unique
solution.

Proposition 3.12 Given an n × n-matrix A over a field K, the following properties hold:
(1) For every column vector B, there is a unique column vector X such that AX = B iff
the only solution to AX = 0 is the trivial vector X = 0, iff D(A) ≠ 0.

(2) If D(A) ≠ 0, the unique solution of AX = B is given by the expressions

x_j = D(A1 , . . . , Aj−1 , B, Aj+1 , . . . , An ) / D(A1 , . . . , Aj−1 , Aj , Aj+1 , . . . , An ),

known as Cramer’s rules.

(3) The system of linear equations AX = 0 has a nonzero solution iff D(A) = 0.

Proof . Assume that AX = B has a single solution X0 , and assume that AY = 0 with Y ≠ 0.
Then,
A(X0 + Y ) = AX0 + AY = AX0 + 0 = B,
and X0 + Y ≠ X0 is another solution of AX = B, contradicting the hypothesis that AX = B
has a single solution X0 . Thus, AX = 0 only has the trivial solution. Now, assume that
AX = 0 only has the trivial solution. This means that the columns A1 , . . . , An of A are
linearly independent, and by Proposition 3.11, we have D(A) ≠ 0. Finally, if D(A) ≠ 0,
by Proposition 3.10, this means that A is invertible, and then, for every B, AX = B is
equivalent to X = A−1 B, which shows that AX = B has a single solution.

(2) Assume that AX = B. If we compute


D(A1 , . . . , x1 A1 + · · · + xj Aj + · · · + xn An , . . . , An ) = D(A1 , . . . , B, . . . , An ),
where B occurs in the j-th position, by multilinearity, all terms containing two identical
columns Ak for k ≠ j vanish, and we get
xj D(A1 , . . . , An ) = D(A1 , . . . , Aj−1, B, Aj+1 , . . . , An ),
for every j, 1 ≤ j ≤ n. Since we assumed that D(A) = D(A1 , . . . , An ) ≠ 0, we get the
desired expression.
(3) Note that AX = 0 has a nonzero solution iff A1 , . . . , An are linearly dependent (as
observed in the proof of Proposition 3.11), which, by Proposition 3.11, is equivalent to
D(A) = 0.
As pleasing as Cramer’s rules are, it is usually impractical to solve systems of linear
equations using the above expressions.
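
Still, Cramer's rules are easy to state as a procedure. The Python sketch below (illustrative only; the determinant is computed by Laplace expansion, so this is usable only for tiny systems) replaces column j of A by B and divides the two determinants:

from fractions import Fraction

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j+1:] for r in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("D(A) = 0: no unique solution")
    # x_j = D(A^1, ..., B, ..., A^n) / D(A), with B in the j-th column
    return [Fraction(det([row[:j] + [b[i]] + row[j+1:] for i, row in enumerate(A)]), d)
            for j in range(n)]

# x + y = 3, x - y = 1 has the unique solution x = 2, y = 1
assert cramer([[1, 1], [1, -1]], [3, 1]) == [2, 1]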

3.6 Determinant of a Linear Map


We close this chapter with the notion of determinant of a linear map f : E → E.
Given a vector space E of finite dimension n, given a basis (u1 , . . . , un ) of E, for every
linear map f : E → E, if M(f ) is the matrix of f w.r.t. the basis (u1 , . . . , un ), we can
define det(f ) = det(M(f )). If (v1 , . . . , vn ) is any other basis of E, and if P is the change
of basis matrix, by Corollary 2.15, the matrix of f with respect to the basis (v1 , . . . , vn ) is
P^{−1} M(f )P . Now, by Proposition 3.9, we have

det(P^{−1} M(f )P ) = det(P^{−1} ) det(M(f )) det(P ) = det(P^{−1} ) det(P ) det(M(f )) = det(M(f )).
Thus, det(f ) is indeed independent of the basis of E.
Definition 3.7 Given a vector space E of finite dimension, for any linear map f : E → E,
we define the determinant det(f ) of f as the determinant det(M(f )) of the matrix of f in
any basis (since, from the discussion just before this definition, this determinant does not
depend on the basis).

Then, we have the following proposition.


Proposition 3.13 Given any vector space E of finite dimension n, a linear map f : E → E
is invertible iff det(f ) ≠ 0.
Proof . The linear map f : E → E is invertible iff its matrix M(f ) in any basis is invertible
(by Proposition 2.12), iff det(M(f )) ≠ 0, by Proposition 3.10.
Given a vector space of finite dimension n, it is easily seen that the set of bijective
linear maps f : E → E such that det(f ) = 1 is a group under composition. This group is a
subgroup of the general linear group GL(E). It is called the special linear group (of E), and
it is denoted by SL(E), or when E = K n , by SL(n, K), or even by SL(n).

3.7 Gaussian Elimination, LU -Factorization,


and Cholesky Factorization
Let A be an n × n matrix, let b ∈ Rn be an n-dimensional vector and assume that A is
invertible. Our goal is to solve the system Ax = b. Since A is assumed to be invertible,
we know that this system has a unique solution, x = A−1 b. Experience shows that two
counter-intuitive facts are revealed:

(1) One should avoid computing the inverse, A−1 , of A explicitly. This is because this
would amount to solving the n linear systems, Au(j) = ej , for j = 1, . . . , n, where
ej = (0, . . . , 1, . . . , 0) is the jth canonical basis vector of R^n (with a 1 in the jth slot).
By doing so, we would replace the resolution of a single system by the resolution of n
systems, and we would still have to multiply A−1 by b.

(2) One does not solve (large) linear systems by computing determinants (using Cramer’s
formulae). This is because this method requires a number of additions (resp. multipli-
cations) proportional to (n + 1)! (resp. (n + 2)!).

The key idea on which most direct methods (as opposed to iterative methods, that look
for an approximation of the solution) are based is that if A is an upper-triangular matrix,
which means that aij = 0 for 1 ≤ j < i ≤ n (resp. lower-triangular, which means that
aij = 0 for 1 ≤ i < j ≤ n), then computing the solution, x, is trivial. Indeed, say A is an
upper-triangular matrix
 
a1 1 a1 2 · · · a1 n−2 a1 n−1 a1 n
 0 a2 2 · · · a2 n−2 a2 n−1 a2 n 
 .. .. .. .. 
 0 0 . . . . 
A=
 .. .. .. .

 . . . 
 0 0 ··· 0 an−1 n−1 an−1 n 
0 0 ··· 0 0 an n

Then, det(A) = a1 1 a2 2 · · · an n ≠ 0, and we can solve the system Ax = b from bottom-up by


back-substitution, i.e., first we compute xn from the last equation, next plug this value of xn
into the next to the last equation and compute xn−1 from it, etc. This yields

x_n = a_{n n}^{−1} b_n
x_{n−1} = a_{n−1 n−1}^{−1} (b_{n−1} − a_{n−1 n} x_n )
...
x_1 = a_{1 1}^{−1} (b_1 − a_{1 2} x_2 − · · · − a_{1 n} x_n ).

If A was lower-triangular, we would solve the system from top-down by forward-substitution.
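
A back-substitution sketch in Python (illustrative only; it assumes the diagonal entries are nonzero, which is guaranteed here since det(A) ≠ 0):

def back_substitution(U, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

# the upper-triangular system obtained in the example that follows:
# 2x + y + z = 5, -8y - 2z = -12, z = 2
assert back_substitution([[2, 1, 1], [0, -8, -2], [0, 0, 1]], [5, -12, 2]) == [1.0, 1.0, 2.0]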



Thus, what we need is a method for transforming a matrix to an equivalent one in upper-
triangular form. This can be done by elimination. Let us illustrate this method on the
following example:
2x + y + z = 5
4x − 6y = −2
−2x + 7y + 2z = 9.
We can eliminate the variable x from the second and the third equation as follows: Subtract
twice the first equation from the second and add the first equation to the third. We get the
new system
2x + y + z = 5
− 8y − 2z = −12
8y + 3z = 14.
This time, we can eliminate the variable y from the third equation by adding the second
equation to the third:
2x + y + z = 5
− 8y − 2z = −12
z = 2.
This last system is upper-triangular. Using back-substitution, we find the solution: z = 2,
y = 1, x = 1.
Observe that we have performed only row operations. The general method is to iteratively
eliminate variables using simple row operations (namely, adding or subtracting a multiple of
a row to another row of the matrix) while simultaneously applying these operations to the
vector b, to obtain a system, MAx = Mb, where MA is upper-triangular. Such a method is
called Gaussian elimination. However, one extra twist is needed for the method to work in
all cases: It may be necessary to permute rows, as illustrated by the following example:
x + y + z =1
x + y + 3z =1
2x + 5y + 8z = 1.
In order to eliminate x from the second and third row, we subtract the first row from the
second and we subtract twice the first row from the third:
x + y + z =1
2z =0
3y + 6z = −1.
Now, the trouble is that y does not occur in the second row; so, we can’t eliminate y from
the third row by adding or subtracting a multiple of the second row to it. The remedy is
simple: Permute the second and the third row! We get the system:
x + y + z =1
3y + 6z = −1
2z = 0,

which is already in triangular form. Another example where some permutations are needed
is:
z = 1
−2x + 7y + 2z = 1
4x − 6y = −1.
First, we permute the first and the second row, obtaining
−2x + 7y + 2z = 1
z = 1
4x − 6y = −1,
and then, we add twice the first row to the third, obtaining:
−2x + 7y + 2z = 1
z = 1
8y + 4z = 1.
Again, we permute the second and the third row, getting
−2x + 7y + 2z = 1
8y + 4z = 1
z = 1,
an upper-triangular system. Of course, in this example, z is already solved and we could
have eliminated it first, but for the general method, we need to proceed in a systematic
fashion.
We now describe the method of Gaussian Elimination applied to a linear system, Ax = b,
where A is assumed to be invertible. We use the variable k to keep track of the stages of
elimination. Initially, k = 1.

(1) The first step is to pick some nonzero entry, ai 1 , in the first column of A. Such an entry
must exist, since A is invertible (otherwise, we would have det(A) = 0). The actual
choice of such an element has some impact on the numerical stability of the method,
but this will be examined later. For the time being, we assume that some arbitrary
choice is made. This chosen element is called the pivot of the elimination step and is
denoted π1 (so, in this first step, π1 = ai 1 ).

(2) Next, we permute the row (i) corresponding to the pivot with the first row. Such a
step is called pivoting. So, after this permutation, the first element of the first row is
nonzero.

(3) We now eliminate the variable x1 from all rows except the first by adding suitable
multiples of the first row to these rows. More precisely we add −ai 1 /π1 times the first
row to the ith row, for i = 2, . . . , n. At the end of this step, all entries in the first
column are zero except the first.

(4) Increment k by 1. If k = n, stop. Otherwise, k < n, and then iteratively repeat steps
(1), (2), (3) on the (n − k + 1) × (n − k + 1) subsystem obtained by deleting the first
k − 1 rows and k − 1 columns from the current system.

If we let A_1 = A and A_k = (a_{i j}^k ) be the matrix obtained after k − 1 elimination steps
(2 ≤ k ≤ n), then the kth elimination step is as follows: The matrix, A_k , is of the form

A_k = \begin{pmatrix}
a_{1 1}^k & a_{1 2}^k & \cdots & \cdots & \cdots & a_{1 n}^k \\
 & a_{2 2}^k & \cdots & \cdots & \cdots & a_{2 n}^k \\
 & & \ddots & & & \vdots \\
 & & & a_{k k}^k & \cdots & a_{k n}^k \\
 & & & \vdots & & \vdots \\
 & & & a_{n k}^k & \cdots & a_{n n}^k
\end{pmatrix}.

Now, we will prove later that det(A_k ) = ± det(A). Since A is invertible, some entry a_{i k}^k with
k ≤ i ≤ n is nonzero; so, one of these entries can be chosen as pivot, and we permute the
kth row with the ith row, obtaining the matrix α_k = (α_{j l}^k ). The new pivot is π_k = α_{k k}^k , and
we zero the entries i = k + 1, . . . , n in column k by adding −α_{i k}^k /π_k times row k to row i.
At the end of this step, we have Ak+1 . Observe that the first k − 1 rows of Ak are identical
to the first k − 1 rows of Ak+1 .
It is easy to figure out what kind of matrices perform the elementary row operations used
during Gaussian elimination. The permutation of the kth row with the ith row is achieved
by multiplying A on the left by the transposition matrix, P (i, k), with

P (i, k)_{j l} = 1 if j = l, where j ≠ i and l ≠ k;
               0 if j = l = i or j = l = k;
               1 if j = i and l = k, or j = k and l = i,
i.e.,

P (i, k) = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & & 1 & & \\
 & & & \ddots & & & \\
 & & 1 & & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}.
Observe that det(P (i, k)) = −1. Therefore, during the permutation step (2), if row k and
row i need to be permuted, the matrix A is multiplied on the left by the matrix Pk such that
Pk = P (i, k), else we set Pk = I.

Adding β times row j to row i is achieved by multiplying A on the left by the elementary
matrix, E_{i,j;β} = I + β e_{i,j} , where

(e_{i,j})_{k l} = 1 if k = i and l = j;
                0 if k ≠ i or l ≠ j,

i.e.,

E_{i,j;β} = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 1 & & & & \\
 & & & \ddots & & & \\
 & & β & & 1 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}.
Observe that the inverse of Ei,j;β = I + βei j is Ei,j;−β = I − βei,j and that det(Ei,j;β ) = 1.
Therefore, during step 3 (the elimination step), the matrix A is multiplied on the left by a
product, Ek , of matrices of the form Ei,k;βi,k . Consequently, we see that

Ak+1 = Ek Pk Ak .

The fact that det(P (i, k)) = −1 and that det(Ei,j;β ) = 1 implies immediately the fact
claimed above: We always have det(Ak ) = ± det(A). Furthermore, since

Ak+1 = Ek Pk Ak

and since Gaussian elimination stops for k = n, the matrix

An = En−1 Pn−1 · · · E2 P2 E1 P1 A

is upper-triangular. Also note that if we let M = En−1 Pn−1 · · · E2 P2 E1 P1 , then det(M) = ±1,
and
det(A) = ± det(An ).

We can summarize all this in the following theorem:

Theorem 3.14 (Gaussian Elimination) Let A be an n × n matrix (invertible or not). Then


there is some invertible matrix, M, so that U = MA is upper-triangular. The pivots are all
nonzero iff A is invertible.

Proof . We already proved the theorem when A is invertible, as well as the last assertion.
Now, A is singular iff some pivot is zero, say at stage k of the elimination. If so, we must
have a_{i k}^k = 0, for i = k, . . . , n; but in this case, Ak+1 = Ak and we may pick Pk = Ek = I.

Remark: Obviously, the matrix M can be computed as


M = En−1 Pn−1 · · · E2 P2 E1 P1 ,
but this expression is of no use. Indeed, what we need is M −1 ; when no permutations are
needed, it turns out that M −1 can be obtained immediately from the matrices Ek ’s, in fact,
from their inverses, and no multiplications are necessary.

Remark: Instead of looking for an invertible matrix, M, so that MA is upper-triangular,


we can look for an invertible matrix, M, so that MA is a diagonal matrix. Only a simple
change to Gaussian elimination is needed. At every stage, k, after the pivot has been found
and pivoting been performed, if necessary, in addition to adding suitable multiples of the
kth row to the rows below row k in order to zero the entries in column k for i = k + 1, . . . , n,
also add suitable multiples of the kth row to the rows above row k in order to zero the
entries in column k for i = 1, . . . , k − 1. Such steps are also achieved by multiplying on
the left by elementary matrices Ei,k;βi,k , except that i < k, so that these matrices are not
lower-diagonal matrices. Nevertheless, at the end of the process, we find that An = MA,
is a diagonal matrix. This method is called the Gauss-Jordan factorization. Because it
is more expensive than Gaussian elimination, this method is not used much in practice.
However, Gauss-Jordan factorization can be used to compute the inverse of a matrix, A.
Indeed, we find the jth column of A−1 by solving the system Ax(j) = ej (where ej is the jth
canonical basis vector of Rn ). By applying Gauss-Jordan, we are led to a system of the form
Dj x(j) = Mj ej , where Dj is a diagonal matrix, and we can immediately compute x(j) .

It remains to discuss the choice of the pivot, and also conditions that guarantee that no
permutations are needed during the Gaussian elimination process.
We begin by stating a necessary and sufficient condition for an invertible matrix to have
an LU-factorization (i.e., Gaussian elimination does not require pivoting). We say that an
invertible matrix, A, has an LU-factorization if it can be written as A = LU, where U is
upper-triangular invertible and L is lower-triangular, with Li i = 1 for i = 1, . . . , n. A lower-
triangular matrix with diagonal entries equal to 1 is called a unit lower-triangular matrix.
Given an n × n matrix, A = (ai j ), for any k, with 1 ≤ k ≤ n, let A[1..k, 1..k] denote the
submatrix of A whose entries are ai j , where 1 ≤ i, j ≤ k.

Proposition 3.15 Let A be an invertible n × n-matrix. Then, A, has an LU-factorization,


A = LU, iff every matrix A[1..k, 1..k] is invertible for k = 1, . . . , n.

Proof . First, assume that A = LU is an LU-factorization of A. We can write

A = \begin{pmatrix} A[1..k, 1..k] & A_2 \\ A_3 & A_4 \end{pmatrix}
  = \begin{pmatrix} L_1 & 0 \\ P & L_4 \end{pmatrix}
    \begin{pmatrix} U_1 & Q \\ 0 & U_4 \end{pmatrix}
  = \begin{pmatrix} L_1 U_1 & L_1 Q \\ P U_1 & P Q + L_4 U_4 \end{pmatrix},

where L1 , L4 are unit lower-triangular and U1 , U4 are upper-triangular. Thus,

A[1..k, 1..k] = L1 U1 ,

and since U is invertible, U1 is also invertible (the determinant of U is the product of the
diagonal entries in U, which is the product of the diagonal entries in U1 and U4 ). As L1 is
invertible (since its diagonal entries are equal to 1), we see that A[1..k, 1..k] is invertible for
k = 1, . . . , n.
Conversely, assume that A[1..k, 1..k] is invertible, for k = 1, . . . , n. We just need to show
that Gaussian elimination does not need pivoting. We prove by induction on k that the kth
step does not need pivoting. This holds for k = 1, since A[1..1, 1..1] = (a1 1 ), so, a1 1 ≠ 0.
Assume that no pivoting was necessary for the first k steps (1 ≤ k ≤ n − 1). In this case,
we have
Ek−1 · · · E2 E1 A = Ak ,
where L = Ek−1 · · · E2 E1 is a unit lower-triangular matrix and Ak [1..k, 1..k] is upper-
triangular, so that LA = Ak can be written as
\begin{pmatrix} L_1 & 0 \\ P & L_4 \end{pmatrix}
\begin{pmatrix} A[1..k, 1..k] & A_2 \\ A_3 & A_4 \end{pmatrix}
= \begin{pmatrix} U_1 & B_2 \\ 0 & B_4 \end{pmatrix},

where L1 is unit lower-triangular and U1 is upper-triangular. But then,

L1 A[1..k, 1..k] = U1 ,

where L1 is invertible and U1 is also invertible since its diagonal elements are the first k
pivots, by hypothesis. Therefore, A[1..k, 1..k] is also invertible.

Corollary 3.16 (LU-Factorization) Let A be an invertible n × n-matrix. If every matrix


A[1..k, 1..k] is invertible for k = 1, . . . , n, then Gaussian elimination requires no pivoting
and yields an LU-factorization, A = LU.

Proof . We proved in Proposition 3.15 that in this case Gaussian elimination requires no
pivoting. Then, since every elementary matrix Ei,k;β is lower-triangular (since we always
arrange that the pivot, πk , occurs above the rows that it operates on), since E_{i,k;β}^{−1} = E_{i,k;−β}
and the E_k 's are products of E_{i,k;β_{i,k}} 's, from

En−1 · · · E2 E1 A = U,

where U is an upper-triangular matrix, we get

A = LU,

where L = E_1^{−1} E_2^{−1} · · · E_{n−1}^{−1} is a lower-triangular matrix. Furthermore, as the diagonal
entries of each E_{i,k;β} are 1, the diagonal entries of each E_k are also 1.

The reader should verify that the example below is indeed an LU-factorization.
    
\begin{pmatrix} 2 & 1 & 1 & 0 \\ 4 & 3 & 3 & 1 \\ 8 & 7 & 9 & 5 \\ 6 & 7 & 9 & 8 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 4 & 3 & 1 & 0 \\ 3 & 4 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 2 \end{pmatrix}.

One of the main reasons why the existence of an LU-factorization for a matrix, A, is
interesting is that if we need to solve several linear systems, Ax = b, corresponding to the
same matrix, A, we can do this cheaply by solving the two triangular systems
Lw = b, and Ux = w.
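
In code, the factorization and the two triangular solves might look as follows (a Python sketch, illustrative only, assuming that no pivoting is needed, i.e., that the condition of Proposition 3.15 holds):

def lu_factor(A):
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]      # multiplier; row i becomes row i - m * row k
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def lu_solve(L, U, b):
    n = len(b)
    w = [0.0] * n
    for i in range(n):                 # forward-substitution: Lw = b
        w[i] = b[i] - sum(L[i][j] * w[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back-substitution: Ux = w
        x[i] = (w[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# the 3 x 3 example solved earlier: 2x + y + z = 5, 4x - 6y = -2, -2x + 7y + 2z = 9
L, U = lu_factor([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]])
assert lu_solve(L, U, [5.0, -2.0, 9.0]) == [1.0, 1.0, 2.0]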
As we will see a bit later, symmetric positive definite matrices satisfy the condition of
Proposition 3.15. Therefore, linear systems involving symmetric positive definite matrices
can be solved by Gaussian elimination without pivoting. Actually, it is possible to do better:
This is the Cholesky factorization.
The following easy proposition shows that, in principle, A can be premultiplied by some
permutation matrix, P , so that P A can be converted to upper-triangular form without using
any pivoting.
Proposition 3.17 Let A be an invertible n × n-matrix. Then, there is some permutation
matrix, P , so that P A[1..k, 1..k] is invertible for k = 1, . . . , n.
Proof . The case n = 1 is trivial, and so is the case n = 2 (we swap the rows if necessary). If
n ≥ 3, we proceed by induction. Since A is invertible, its columns are linearly independent;
so, in particular, its first n − 1 columns are also linearly independent. Delete the last column
of A. Since the remaining n−1 columns are linearly independent, there are also n−1 linearly
independent rows in the corresponding n × (n − 1) matrix. Thus, there is a permutation
of these n rows so that the (n − 1) × (n − 1) matrix consisting of the first n − 1 rows is
invertible. But, then, there is a corresponding permutation matrix, P1 , so that the first n − 1
rows and columns of P1 A form an invertible matrix, A′ . Applying the induction hypothesis
to the (n − 1) × (n − 1) matrix, A′ , we see that there is some permutation matrix, P2 , so that
P2 P1 A[1..k, 1..k] is invertible, for k = 1, . . . , n − 1. Since A is invertible in the first place and
P1 and P2 are invertible, P2 P1 A is also invertible, and we are done.

Remark: One can also prove Proposition 3.17 using a clever reordering of the Gaussian
elimination steps. Indeed, we know that if A is invertible, then there are permutation
matrices, Pi , and products of elementary matrices, Ei , so that
An = En−1 Pn−1 · · · E2 P2 E1 P1 A,
where U = An is upper-triangular. For example, when n = 4, we have E3 P3 E2 P2 E1 P1 A = U.
We can define new matrices E′_1 , E′_2 , E′_3 which are still products of elementary matrices and
we have
E′_3 E′_2 E′_1 P_3 P_2 P_1 A = U.

Indeed, if we let E′_3 = E_3 , E′_2 = P_3 E_2 P_3^{−1} , and E′_1 = P_3 P_2 E_1 P_2^{−1} P_3^{−1} , we easily verify that
each E′_k is a product of elementary matrices and that

E′_3 E′_2 E′_1 P_3 P_2 P_1 = E_3 (P_3 E_2 P_3^{−1} )(P_3 P_2 E_1 P_2^{−1} P_3^{−1} ) P_3 P_2 P_1 = E_3 P_3 E_2 P_2 E_1 P_1 .

In general, we let

E′_k = P_{n−1} · · · P_{k+1} E_k P_{k+1}^{−1} · · · P_{n−1}^{−1} ,

and we have

E′_{n−1} · · · E′_1 P_{n−1} · · · P_1 A = U.

Theorem 3.18 For every invertible n × n-matrix, A, there is some permutation matrix,
P , some upper-triangular matrix, U, and some unit lower-triangular matrix, L, so that
P A = LU (recall, Li i = 1 for i = 1, . . . , n). Furthermore, if P = I, then L and U are unique
and they are produced as a result of Gaussian elimination without pivoting. Furthermore, if
P = I, then L is simply obtained from the E_k^{−1} 's.

Proof . The only part that has not been proved is the uniqueness part and how L arises
from the E_k^{−1} 's. Assume that A is invertible and that A = L_1 U_1 = L_2 U_2 , with L_1 , L_2 unit
lower-triangular and U_1 , U_2 upper-triangular. Then, we have

L_2^{−1} L_1 = U_2 U_1^{−1} .

However, it is obvious that L_2^{−1} is lower-triangular and that U_1^{−1} is upper-triangular, and
so, L_2^{−1} L_1 is lower-triangular and U_2 U_1^{−1} is upper-triangular. Since the diagonal entries of
L_1 and L_2 are 1, the above equality is only possible if U_2 U_1^{−1} = I, that is, U_1 = U_2 , and
so, L_1 = L_2 . Finally, since L = E_1^{−1} E_2^{−1} · · · E_{n−1}^{−1} and each E_k^{−1} is a product of elementary
matrices of the form E_{i,k;−β_{i,k}} with k ≤ i ≤ n, we see that L is simply the lower-triangular
matrix whose kth column is the kth column of E_k^{−1} , with 1 ≤ k ≤ n − 1 (with L_{i i} = 1 for
i = 1, . . . , n).

Remark: It can be shown that Gaussian elimination + back-substitution requires n^3/3 +
O(n^2) additions, n^3/3 + O(n^2) multiplications and n^2/2 + O(n) divisions.
Let us now briefly comment on the choice of a pivot. Although theoretically, any pivot
can be chosen, the possibility of roundoff errors implies that it is not a good idea to pick
very small pivots. The following example illustrates this point. Consider the linear system

10^{−4} x + y = 1
x + y = 2.

Since 10^{−4} is nonzero, it can be taken as pivot, and we get

10^{−4} x + y = 1
(1 − 10^4 )y = 2 − 10^4 .

Thus, the exact solution is

x = 1/(1 − 10^{−4}),    y = (1 − 2 × 10^{−4})/(1 − 10^{−4}).

However, if roundoff takes place on the fourth digit, then 1 − 10^4 = −9999 and 2 − 10^4 =
−9998 will be rounded off both to −9990, and then, the solution is x = 0 and y = 1, very
far from the exact solution where x ≈ 1 and y ≈ 1. The problem is that we picked a very
small pivot. If instead we permute the equations, the pivot is 1, and after elimination, we
get the system

x + y = 2
(1 − 10^{−4} )y = 1 − 2 × 10^{−4} .

This time, 1 − 10^{−4} = 0.9999 and 1 − 2 × 10^{−4} = 0.9998 are rounded off to 0.999 and the
solution is x = 1, y = 1, much closer to the exact solution.
To remedy this problem, one may use the strategy of partial pivoting. This consists of
choosing during step k (1 ≤ k ≤ n − 1) one of the entries a_{i k}^k such that

|a_{i k}^k | = max_{k ≤ p ≤ n} |a_{p k}^k |.

By maximizing the value of the pivot, we avoid dividing by undesirably small pivots.
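
In code, partial pivoting amounts to one extra search per stage. A Python sketch (illustrative only; names are not from the text):

def gauss_partial_pivoting(A, b):
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        # pick as pivot the entry of largest absolute value in column k, rows k..n-1
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back-substitution
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

# the small-pivot example above: the exact solution has x ≈ 1 and y ≈ 1
x, y = gauss_partial_pivoting([[1e-4, 1.0], [1.0, 1.0]], [1.0, 2.0])
assert abs(x - 1.0) < 1e-3 and abs(y - 1.0) < 1e-3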

Remark: A matrix, A, is called strictly column diagonally dominant iff

|a_{j j}| > ∑_{i=1, i≠j}^{n} |a_{i j}|,    for j = 1, . . . , n

(resp. strictly row diagonally dominant iff

|a_{i i}| > ∑_{j=1, j≠i}^{n} |a_{i j}|,    for i = 1, . . . , n.)

It has been known for a long time (before 1900, say by Hadamard) that if a matrix, A,
is strictly column diagonally dominant (resp. strictly row diagonally dominant), then it is
invertible. (This is a good exercise, try it!) It can also be shown that if A is strictly col-
umn diagonally dominant, then Gaussian elimination with partial pivoting does not actually
require pivoting.
Another strategy, called complete pivoting, consists in choosing some entry a_{i j}^k , where
k ≤ i, j ≤ n, such that

|a_{i j}^k | = max_{k ≤ p, q ≤ n} |a_{p q}^k |.

However, in this method, if the chosen pivot is not in column k, it is also necessary to
permute columns. This is achieved by multiplying on the right by a permutation matrix.

However, complete pivoting tends to be too expensive in practice, and partial pivoting is the
method of choice.
A special case where the LU-factorization is particularly efficient is the case of tridiagonal
matrices, which we now consider.
Consider the tridiagonal matrix

A = \begin{pmatrix}
b_1 & c_1 & & & & \\
a_2 & b_2 & c_2 & & & \\
 & a_3 & b_3 & c_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & a_{n−1} & b_{n−1} & c_{n−1} \\
 & & & & a_n & b_n
\end{pmatrix}.

Define the sequence

δ0 = 1, δ1 = b1 , δk = bk δk−1 − ak ck−1 δk−2 , 2 ≤ k ≤ n.

Proposition 3.19 If A is the tridiagonal matrix above, then δk = det(A[1..k, 1..k]), for
k = 1, . . . , n.

Proof . By expanding det(A[1..k, 1..k]) with respect to its last row, the proposition follows
by induction on k.

Theorem 3.20 If A is the tridiagonal matrix above and δ_k ≠ 0 for k = 1, . . . , n, then A has
the following LU-factorization:

A = \begin{pmatrix}
1 & & & & \\
a_2 δ_0/δ_1 & 1 & & & \\
 & a_3 δ_1/δ_2 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & a_n δ_{n−2}/δ_{n−1} & 1
\end{pmatrix}
\begin{pmatrix}
δ_1/δ_0 & c_1 & & & \\
 & δ_2/δ_1 & c_2 & & \\
 & & \ddots & \ddots & \\
 & & & δ_{n−1}/δ_{n−2} & c_{n−1} \\
 & & & & δ_n/δ_{n−1}
\end{pmatrix}.

Proof . Since δ_k = det(A[1..k, 1..k]) ≠ 0 for k = 1, . . . , n, by Theorem 3.18 (and Proposition


3.15), we know that A has a unique LU-factorization. Therefore, it suffices to check that

the proposed factorization works. We easily check that

(LU)_{k k+1} = c_k ,          1 ≤ k ≤ n − 1
(LU)_{k k−1} = a_k ,          2 ≤ k ≤ n
(LU)_{k l} = 0,               |k − l| ≥ 2
(LU)_{1 1} = δ_1/δ_0 = b_1
(LU)_{k k} = (a_k c_{k−1} δ_{k−2} + δ_k)/δ_{k−1} = b_k ,    2 ≤ k ≤ n,

since δk = bk δk−1 − ak ck−1 δk−2 .


It follows that there is a simple method to solve a linear system, Ax = d, where A is
tridiagonal (and δ_k ≠ 0 for k = 1, . . . , n). For this, it is convenient to “squeeze” the diagonal
matrix, ∆, defined such that ∆_{k k} = δ_k /δ_{k−1} , into the factorization so that A = (L∆)(∆^{−1} U),
and if we let

z_1 = c_1 /b_1 ,    z_k = c_k δ_{k−1} /δ_k ,    2 ≤ k ≤ n,
A = (L∆)(∆^{−1} U) is written as

A = \begin{pmatrix}
c_1/z_1 & & & & & \\
a_2 & c_2/z_2 & & & & \\
 & a_3 & c_3/z_3 & & & \\
 & & \ddots & \ddots & & \\
 & & & a_{n−1} & c_{n−1}/z_{n−1} & \\
 & & & & a_n & c_n/z_n
\end{pmatrix}
\begin{pmatrix}
1 & z_1 & & & & \\
 & 1 & z_2 & & & \\
 & & 1 & z_3 & & \\
 & & & \ddots & \ddots & \\
 & & & & 1 & z_{n−1} \\
 & & & & & 1
\end{pmatrix}.

As a consequence, the system Ax = d can be solved by constructing three sequences: First,


the sequence
z_1 = c_1 /b_1 ,    z_k = c_k /(b_k − a_k z_{k−1} ),    k = 2, . . . , n,
corresponding to the recurrence δk = bk δk−1 − ak ck−1 δk−2 and obtained by dividing both
sides of this equation by δk−1 , next

w_1 = d_1 /b_1 ,    w_k = (d_k − a_k w_{k−1} )/(b_k − a_k z_{k−1} ),    k = 2, . . . , n,

corresponding to solving the system L∆w = d, and finally

xn = wn , xk = wk − zk xk+1 , k = n − 1, n − 2, . . . , 1,

corresponding to solving the system ∆^{−1} Ux = w.
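
These three sequences translate into the following Python sketch, often called the Thomas algorithm (illustrative only; it assumes δ_k ≠ 0 so that no zero denominator occurs; a is the subdiagonal with a[0] unused, b the diagonal, c the superdiagonal with c[n-1] unused, d the right-hand side):

def solve_tridiagonal(a, b, c, d):
    n = len(b)
    z = [0.0] * n
    w = [0.0] * n
    z[0] = c[0] / b[0]
    w[0] = d[0] / b[0]
    for k in range(1, n):
        denom = b[k] - a[k] * z[k - 1]
        z[k] = c[k] / denom if k < n - 1 else 0.0
        w[k] = (d[k] - a[k] * w[k - 1]) / denom
    x = [0.0] * n
    x[n - 1] = w[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = w[k] - z[k] * x[k + 1]
    return x

# b = (2, 2, 2), a = c = (1, ...), right-hand side (3, 4, 3): the solution is (1, 1, 1)
x = solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
assert all(abs(xi - 1.0) < 1e-12 for xi in x)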

Remark: It can be verified that this requires 3(n − 1) additions, 3(n − 1) multiplications,
and 2n divisions, a total of 8n − 6 operations, which is much less than the O(2n^3/3) required
by Gaussian elimination in general.
We now consider the special case of symmetric positive definite matrices. Recall that an
n × n symmetric matrix, A, is positive definite iff

x^⊤ Ax > 0 for all x ∈ R^n with x ≠ 0.

Equivalently, A is symmetric positive definite iff all its eigenvalues are strictly positive. The
following facts about a symmetric positive definite matrix, A, are easily established (some
left as an exercise):
(1) The matrix A is invertible. (Indeed, if Ax = 0, then x^⊤ Ax = 0, which implies x = 0.)

(2) We have ai i > 0 for i = 1, . . . , n. (Just observe that for x = ei , the ith canonical basis
vector of R^n , we have e_i^⊤ A e_i = ai i > 0.)

(3) For every n × n invertible matrix, Z, the matrix Z^⊤ AZ is symmetric positive definite
iff A is symmetric positive definite.

Next, we prove that a symmetric positive definite matrix has a special LU-factorization
of the form A = BB^⊤ , where B is a lower-triangular matrix whose diagonal elements are
strictly positive. This is the Cholesky factorization.
First, we note that a symmetric positive definite matrix satisfies the condition of Propo-
sition 3.15.

Proposition 3.21 If A is a symmetric positive definite matrix, then A[1..k, 1..k] is invertible
for k = 1, . . . , n.

Proof . If w ∈ Rk , with 1 ≤ k ≤ n, we let x ∈ Rn be the vector with xi = wi for i = 1, . . . , k


and xi = 0 for i = k + 1, . . . , n. Now, since A is symmetric positive definite, we have
x^⊤ Ax > 0 for all x ∈ R^n with x ≠ 0. This holds in particular for all vectors x obtained from
nonzero vectors w ∈ Rk as defined earlier, which proves that each A[1..k, 1..k] is symmetric
positive definite. Thus, A[1..k, 1..k] is also invertible.
Let A be a symmetric positive definite matrix and write
A = \begin{pmatrix} a_{1 1} & W^⊤ \\ W & B \end{pmatrix}.


Since A is symmetric positive definite, a1 1 > 0, and we can compute α = √a_{1 1} . The trick is
that we can factor A uniquely as

A = \begin{pmatrix} a_{1 1} & W^⊤ \\ W & B \end{pmatrix}
  = \begin{pmatrix} α & 0 \\ W/α & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & B − W W^⊤/a_{1 1} \end{pmatrix}
    \begin{pmatrix} α & W^⊤/α \\ 0 & I \end{pmatrix},

i.e., as A = B_1 A_1 B_1^⊤ , where B_1 is lower-triangular with positive diagonal entries. Thus, B_1
is invertible, and by fact (3) above, A_1 is also symmetric positive definite.

Theorem 3.22 (Cholesky Factorization) Let A be a symmetric positive definite matrix.


Then, there is some lower-triangular matrix, B, so that A = BB^⊤ . Furthermore, B can
be chosen so that its diagonal elements are strictly positive, in which case, B is unique.

Proof . We proceed by induction on k. For k = 1, we must have a1 1 > 0, and if we let
α = √a_{1 1} and B = (α), the theorem holds trivially. If k ≥ 2, as we explained above, again
we must have a1 1 > 0, and we can write

A = \begin{pmatrix} a_{1 1} & W^⊤ \\ W & B \end{pmatrix}
  = \begin{pmatrix} α & 0 \\ W/α & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & B − W W^⊤/a_{1 1} \end{pmatrix}
    \begin{pmatrix} α & W^⊤/α \\ 0 & I \end{pmatrix}
  = B_1 A_1 B_1^⊤ ,

where α = √a_{1 1} , the matrix B_1 is invertible and

A_1 = \begin{pmatrix} 1 & 0 \\ 0 & B − W W^⊤/a_{1 1} \end{pmatrix}

is symmetric positive definite. However, this implies that B − W W^⊤/a_{1 1} is also symmetric
positive definite (consider x^⊤ A_1 x for every x ∈ R^n with x ≠ 0 and x_1 = 0). Thus, we can
apply the induction hypothesis to B − W W^⊤/a_{1 1} , and we find a unique lower-triangular
matrix, L, with positive diagonal entries, so that

B − W W^⊤/a_{1 1} = LL^⊤ .
But then, we get

A = \begin{pmatrix} α & 0 \\ W/α & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & B − W W^⊤/a_{1 1} \end{pmatrix}
    \begin{pmatrix} α & W^⊤/α \\ 0 & I \end{pmatrix}
  = \begin{pmatrix} α & 0 \\ W/α & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & LL^⊤ \end{pmatrix}
    \begin{pmatrix} α & W^⊤/α \\ 0 & I \end{pmatrix}
  = \begin{pmatrix} α & 0 \\ W/α & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & L \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & L^⊤ \end{pmatrix}
    \begin{pmatrix} α & W^⊤/α \\ 0 & I \end{pmatrix}
  = \begin{pmatrix} α & 0 \\ W/α & L \end{pmatrix}
    \begin{pmatrix} α & W^⊤/α \\ 0 & L^⊤ \end{pmatrix}.

Therefore, if we let

B = \begin{pmatrix} α & 0 \\ W/α & L \end{pmatrix},

we have a unique lower-triangular matrix with positive diagonal entries and A = BB^⊤ .

Remark: If A = BB^⊤ , where B is any invertible matrix, then A is symmetric positive
definite. Obviously, BB^⊤ is symmetric, and since B is invertible and

x^⊤ Ax = x^⊤ BB^⊤ x = (B^⊤ x)^⊤ B^⊤ x,

it is clear that x^⊤ Ax > 0 if x ≠ 0.


The proof of Theorem 3.22 immediately yields an algorithm to compute B from A. For
j = 1, . . . , n,
b_{j j} = ( a_{j j} − ∑_{k=1}^{j−1} b_{j k}^2 )^{1/2} ,

and for i = j + 1, . . . , n,

b_{i j} = ( a_{i j} − ∑_{k=1}^{j−1} b_{i k} b_{j k} ) / b_{j j} .
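
These formulas give the following Python sketch (illustrative only; it assumes A is symmetric positive definite, so that every argument of the square root is positive):

from math import sqrt

def cholesky(A):
    n = len(A)
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(B[j][k] ** 2 for k in range(j))
        B[j][j] = sqrt(s)
        for i in range(j + 1, n):
            B[i][j] = (A[i][j] - sum(B[i][k] * B[j][k] for k in range(j))) / B[j][j]
    return B

B = cholesky([[4.0, 2.0], [2.0, 5.0]])
assert B == [[2.0, 0.0], [1.0, 2.0]]   # and indeed B B^T = [[4, 2], [2, 5]]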

The Cholesky factorization can be used to solve linear systems, Ax = b, where A is
symmetric positive definite: Solve the two systems Bw = b and B^⊤ x = w.

Remark: It can be shown that this method requires n^3/6 + O(n^2) additions, n^3/6 + O(n^2)
multiplications, n^2/2 + O(n) divisions, and O(n) square root extractions. Thus, the Cholesky
method requires half of the number of operations required by Gaussian elimination (since
Gaussian elimination requires n^3/3 + O(n^2) additions, n^3/3 + O(n^2) multiplications, and
n^2/2 + O(n) divisions). It also requires half of the space (only B is needed, as opposed to
both L and U). Furthermore, it can be shown that Cholesky’s method is numerically stable.
For more on the stability analysis and efficient implementation methods of Gaussian
elimination, LU-factoring and Cholesky factoring, see Demmel [14], Trefethen and Bau [54],
Ciarlet [12], Golub and Van Loan [23], Strang [50, 51], and Kincaid and Cheney [28].

3.8 Further Readings


Thorough expositions of the material covered in Chapters 2 and 3 can be found in Strang
[51, 50], Lang [31], Artin [1], Mac Lane and Birkhoff [34], Bourbaki [7, 8], Van Der Waerden
[55], Bertin [6], and Horn and Johnson [27]. These notions of linear algebra are nicely put
to use in classical geometry, see Berger [3, 4], Tisseron [53] and Dieudonné [15].
Another rather complete reference is the text by Walter Noll [40]. But beware, the
notation and terminology is a bit strange!
