Panoramic View of Linear Algebra

This document summarizes key concepts from linear algebra including vector spaces, fields, and groups. It defines groups using both traditional notation and rigorous function notation. It then uses the concept of a group to more easily define the 11 axioms of a field. Finally, it defines vector spaces over a field by combining the axioms for an abelian group with properties describing scalar multiplication, derived from properties of real number vectors. The overall goal is to build understanding of these concepts in a logical manner using existing mathematical structures.

These lecture notes are brought to you by the

Math, Physics, Engineering YouTube channel.


(Click the blue text to visit the channel.)
Please consider subscribing and sharing.
Ring the bell so you don't miss future uploads!
A video explanation of this document is available at the following link:
Lecture on YouTube (simply click the blue text)
For questions, remarks, or PayPal support: math.physics3141@gmail.com
To support on Patreon, follow the link: Patreon

1 Introduction
These lecture notes will help you prepare for an exam in a standard first linear
algebra course. If you are preparing for an exam and have general knowledge of
the material, or want to refresh forgotten topics, these notes are perfect for you.
This short survey will help you organize the material in a hierarchical, logical
order and obtain a deep understanding of it. In this survey we skip the routine
material regarding matrix multiplication, Gauss-Jordan elimination, solutions
to systems of equations, and so on. We begin with the abstract notion of vector
spaces.

2 Vector Spaces
A vector space is defined over a field, an algebraic structure with 11 axioms that
students find hard to remember and often don't even bother to. I would like
to show you a method to remember and understand the axioms. It relies on
an important concept in mathematics called a group; this concept often falls
outside the scope of an introductory course, but knowing it has great benefits.
Definition 2.1. (Group)
A group is a nonempty set G endowed with a binary operation which we will
denote by · such that the following conditions are satisfied:
1) For all a, b ∈ G, a · b ∈ G.
2) For all a, b, c ∈ G, a · (b · c) = (a · b) · c.
3) There exists a special element e ∈ G such that for all a ∈ G, a · e = e · a = a.
4) For every element a ∈ G there exists an element a⁻¹ ∈ G such that
a · a⁻¹ = a⁻¹ · a = e.

It is a standard convention to omit writing the product operation (often referred
to as multiplication, or the group's product) · explicitly, and we often write ab
instead of a · b. Also note that it is not specified in the axioms that ab = ba.
That is, in general a group does not have to be commutative. You saw, for
example, that for the product of square matrices A and B it is not true in
general that AB = BA. If the matrices are not square then it can happen that
AB is defined while the product BA is not. In non-commutative groups the
binary operation is denoted by the multiplication sign ·, and the identity
element of the group is often denoted by 1. Groups that possess the
commutativity property have a special name: they are called abelian groups,
after the Norwegian mathematician Niels Henrik Abel.
Definition 2.2. A group G is said to be abelian if for every a, b ∈ G holds
ab = ba.
In abelian groups the binary operation is often denoted by +, and the identity
element is denoted by 0. A more rigorous definition of a group would be:

Definition 2.3. A group is a nonempty set G that contains a special element
e ∈ G (called the identity element of the group) and that is endowed with two
functions:
µ : G × G → G and ι : G → G
such that the following properties hold:
For all a, b, c ∈ G,
µ(a, µ(b, c)) = µ(µ(a, b), c).
For all a ∈ G,
µ(a, e) = µ(e, a) = a.
For all a ∈ G,
µ(a, ι(a)) = µ(ι(a), a) = e.

In the definition above the function µ is actually the binary operation. We can
then say that we write µ(a, b) as ab and ι(a) as a⁻¹, to get the previous
definition. If you think about it, the first function that we ever learn about, in
the first grade, is a function of two variables. That's right: the addition
operation is a function of two variables; it takes two numbers and assigns to
them a third number, which is their sum. When we want to explicitly specify a
group, we need to specify the set G, the binary operation, and the
identity/neutral element. This definition may seem a bit abstract and hard to
grasp, but learning and understanding it is worth your time: groups are
ubiquitous throughout mathematics, physics, chemistry, and computer science.
Here are a few examples:
1) (Z, +, 0) is an abelian group.
2) (Q, +, 0) is an abelian group.
3) (Q \ {0}, ·, 1) is an abelian group.
4) Let S¹ = { z ∈ C : |z| = 1 }; then (S¹, ·, 1) with the ordinary multiplication
of complex numbers is an abelian group.
5) Let X be a set and let S(X) = { f : X → X : f is a bijection }. Then
(S(X), ◦, Id_X) is a group, called the symmetric group of X, where ◦ denotes
the function composition operation and Id_X is the identity function of X,
which is the identity element of S(X) with respect to function composition.
This group is not commutative if X contains more than two elements.
6) Let GL(n, R) be the set of all invertible n × n matrices; then GL(n, R) with
matrix multiplication and the identity matrix is a group that is not abelian.
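
As a small illustration of example (5), the following Python sketch (my own code, not part of the notes) enumerates the bijections of X = {0, 1, 2} and spot-checks the four group axioms for (S(X), ◦, Id_X), as well as the failure of commutativity.

    # Sketch: verify the group axioms for S(X), the bijections of X = {0, 1, 2},
    # under composition. A permutation is stored as a tuple p with p[i] = image of i.
    from itertools import permutations

    X = (0, 1, 2)
    G = list(permutations(X))                  # all bijections of X
    identity = X                               # Id_X as a tuple

    def compose(p, q):
        # (p o q)(i) = p(q(i))
        return tuple(p[q[i]] for i in X)

    def inverse(p):
        inv = [0] * len(X)
        for i in X:
            inv[p[i]] = i
        return tuple(inv)

    # closure and associativity
    assert all(compose(p, q) in G for p in G for q in G)
    assert all(compose(p, compose(q, r)) == compose(compose(p, q), r)
               for p in G for q in G for r in G)
    # identity and inverses
    assert all(compose(p, identity) == p == compose(identity, p) for p in G)
    assert all(compose(p, inverse(p)) == identity for p in G)
    # not abelian: some pair fails to commute
    assert any(compose(p, q) != compose(q, p) for p in G for q in G)
    print("S(X) with |X| = 3 satisfies the group axioms and is not abelian")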

Let us now review the definition of a field:


Let F be a nonempty set endowed with two binary operations, which will be
denoted by "+" and "·" and are called addition and multiplication respectively,
and two special distinct elements 0, 1 ∈ F such that:
1) (F, 0, +) is an abelian group. (5 axioms)
2) (F \ {0}, ·, 1) is an abelian group. (5 axioms)
Note that by knowing the notion of a group we have now effortlessly formulated
10 out of the 11 field axioms, and we can easily reproduce them one by one if
required. We need one more axiom to formulate the relation between the
addition and the multiplication operations. It is the most natural law of
distributivity: for all a, b, c ∈ F,

a · (b + c) = ab + ac.

Just like that we have easily stated all the 11 axioms of a field. Thus a shorter
definition of a field is:
Definition 2.4. Let F be a nonempty set endowed with two binary operations,
which will be denoted by "+" and "·" and are called addition and multiplication
respectively, and two special distinct elements 0, 1 ∈ F such that:
1) (F, 0, +) is an abelian group. (5 axioms)
2) (F \ {0}, ·, 1) is an abelian group. (5 axioms)
3) for all a, b, c ∈ F: a · (b + c) = ab + ac.

Note how effortlessly we stated all the 11 axioms of a field in just 3 lines!
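
As a quick sanity check of Definition 2.4, here is a small Python sketch (my own illustration, not from the notes) that spot-checks the two abelian group structures and the distributive law on randomly sampled rational numbers, using exact Fraction arithmetic.

    # Sketch: spot-check the field axioms for (Q, +, ·) on random samples.
    # Exact rational arithmetic avoids floating point issues.
    from fractions import Fraction
    from random import randint

    def rand_q():
        return Fraction(randint(-20, 20), randint(1, 20))

    for _ in range(1000):
        a, b, c = rand_q(), rand_q(), rand_q()
        # (Q, +, 0) is an abelian group
        assert (a + b) + c == a + (b + c) and a + b == b + a
        assert a + 0 == a and a + (-a) == 0
        # (Q \ {0}, ·, 1) is an abelian group
        if a != 0 and b != 0 and c != 0:
            assert (a * b) * c == a * (b * c) and a * b == b * a
            assert a * 1 == a and a * (1 / a) == 1
        # the distributive law linking the two operations
        assert a * (b + c) == a * b + a * c
    print("all sampled field axioms hold")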

Vector Spaces
The first jump in the level of abstraction in a standard linear algebra course
appears with the introduction of the notion of vector spaces. From the concrete
and simple examples of matrices, matrix multiplication, and Gauss-Jordan
elimination, the transition to an abstract entity such as a vector space defined
over an arbitrary field can be quite hard. However, this abstraction has
significant advantages. When the notion of a vector space was defined,
mathematicians had in mind the concrete space R³, with which we are familiar
and of which we have a good intuitive understanding. We would therefore like
to extract and understand the essential properties of this space that make it so
important in describing physics. The advantage of this abstraction is that
whenever we prove a theorem about an abstract vector space, we also prove a
theorem about infinitely many other concrete spaces.
Let us first review the essential properties of R³. First of all we can add vectors,
and this addition satisfies lots of nice properties; in our new terminology we
can say that (R³, +, 0) is an abelian group. These turn out to be the first 5
axioms of an abstract vector space. We can also multiply a vector by scalars,
that is, by elements of R. The real numbers R are a field, and this product of
vectors by scalars satisfies the following nice properties:
1) For every α ∈ R and v ∈ R³, αv ∈ R³.
2) For every α, β ∈ R and v ∈ R³, α(βv) = (αβ)v.
Note that the product (αβ) is taken in the field and then applied to the vector v,
whereas on the left hand side we have mixed products of scalars with a vector!
3) For all α ∈ R and v, w ∈ R³, α(v + w) = αv + αw.
4) For all v ∈ R³, 1 · v = v.
5) For all α, β ∈ R and v ∈ R³, (α + β)v = αv + βv.
The five properties mentioned above define all the possible interactions between
the scalars in R and the vectors in R³. A scalar can be thought of as acting on
vectors by scaling them. This is an interaction between objects of different
types. Taking the first five conditions, which state that (R³, +, 0) is an abelian
group, together with the five properties just mentioned that describe the
interaction between scalars and vectors, we obtain the ten axioms of a vector
space. Replacing R by an arbitrary field F and replacing R³ by an abelian group
V, we obtain the definition of a vector space over a field F. In order for the
definition to be more rigorous, scalar multiplication should be defined as a
function µ : F × V → V that satisfies the required conditions; then for α ∈ F
and v ∈ V we agree to write µ(α, v) as αv.

Definition 2.5. Let (V, 0, +) be an abelian group, and let F be a field. V is
said to be a vector space over F if there exists a product function µ : F × V → V,
denoted by µ(α, v) = αv for all α ∈ F and v ∈ V, such that the following hold:
1) For all v ∈ V, 1 · v = v, where 1 ∈ F is the field's multiplicative identity.
2) For every α, β ∈ F and all v ∈ V, α(βv) = (αβ)v.
3) For every α, β ∈ F and all v ∈ V, (α + β)v = αv + βv.
4) For all α ∈ F and all v, w ∈ V, α(v + w) = αv + αw.

We know that for every v ∈ R³ and 0 ∈ R we have 0 · v = 0. Does this hold in
any abstract vector space? This property doesn't appear in the list of axioms of
a vector space; to see whether it holds we have to either prove or disprove it
based on the axioms of a vector space.
Proposition 2.5.1. Let V be a vector space over a field F. Then the following
hold:
1) 0v = 0.
2) −v = (−1) · v.
3) For all α ∈ F and 0 ∈ V, α0 = 0.
Note that in 0v = 0 the 0 on the left hand side is the zero scalar 0 ∈ F, while
the zero on the right hand side is the zero vector. In the second identity, which
may seem obvious, −v is the additive inverse of the element v ∈ V in the abelian
group (V, 0, +), while (−1)v is the product of the vector v with the additive
inverse of the unit element of the field F. The last property states that every
scalar in the field yields the zero vector when it is multiplied by the zero
vector.

Proof. Consider

0v = (0 + 0)v = 0v + 0v,

where the second equality follows from property (3). Since 0v ∈ V it has an
additive inverse in the abelian group, which is denoted by −0v. Adding it to
both sides we obtain

0 = 0v + (−0v) = 0v + 0v + (−0v) = 0v,

which implies 0v = 0 for all v ∈ V.

To prove the second identity note that

0 = 0v = (1 + (−1))v = 1v + (−1)v = v + (−1)v,

where we used property (1) to conclude 1v = v. Adding the additive inverse of
v, that is −v, to both sides we conclude that

−v = (−1)v.

Finally note that

α0 = α(0 + 0) = α0 + α0

by property (4); adding −α0 to both sides we obtain α0 = 0.

It can be quite tedious to verify for each space that it indeed satisfies all the
axioms of a vector space. However, the situation can be simplified if we are
considering a set that is a subset of a known vector space. We therefore introduce
the concept of a vector subspace.
Definition 2.6. (Vector subspace)
Let V be a vector space over a field F and let W ⊂ V with 0_V ∈ W.
(The zero vector of V must belong to W; in particular W is not empty!)
We say that W is a subspace of V if W becomes a vector space when the scalar
multiplication and the addition that are defined on V are restricted to W.
Note that the zero vector of V will automatically be the zero vector of W. From
now on we will write 0 to denote the zero vector of V, which is also the zero of W.
The concept of a subspace is important not only because, when one tries to
understand a complicated structure, it helps to understand its substructures,
but also because it is much easier to check that a certain set is a subspace of
a known vector space than to verify all the axioms directly. In this way proving
that a given set is a vector space becomes much easier, as the following theorem
shows.

Theorem 2.6.1. Let V be a vector space over a field F and W ⊂ V.
If the following hold:
1) For all w1, w2 ∈ W we have w1 + w2 ∈ W,
2) 0 ∈ W,
3) For all α ∈ F and w ∈ W we have αw ∈ W,
then W is a subspace of V and in particular W is a vector space in its own
right.
Proof. The idea here is that most of the axioms regarding addition and
multiplication by a scalar that hold on all of V will continue to be valid when
restricted to W. The closure with respect to addition holds by assumption (1).
Since the associativity and the commutativity of the addition hold on all of V,
they will still hold when the addition is restricted to W. We also assume 0 ∈ W,
which implies that W is not an empty set and contains the additive identity.
Now let w ∈ W. By assumption (3), for all α ∈ F and w ∈ W we have αw ∈ W;
in particular, choosing α = −1 ∈ F we have (−1)w ∈ W. By proposition 2.5.1,
(−1)w = −w, so −w ∈ W, and thus for each element of W its additive inverse is
also in W. So far we have verified the first five axioms, which state that (W, 0, +)
is an abelian group. It is clear that properties (1)-(4) in definition 2.5, which
hold on all of V, will continue to hold when restricted to W. The fact that for all
w ∈ W and α ∈ F we have αw ∈ W holds by assumption (3).
Thus we have verified all the vector space axioms for W, so W is a vector space
in its own right, and thus W is a subspace of V.
It is sometimes convenient to unify assumptions (1) and (3) into one single
assumption, as the following corollary shows.
Corollary 2.6.2. Let V be a vector space and W ⊂ V.
If the following hold:
1) 0 ∈ W,
2) For all α, β ∈ F and w1, w2 ∈ W we have
αw1 + βw2 ∈ W,
then W is a subspace of V and in particular W is a vector space in its own
right.
Proof. From the previous theorem it follows that we only need to prove that:
1) For all w1, w2 ∈ W, w1 + w2 ∈ W.
2) For all α ∈ F and w ∈ W, αw ∈ W.
Since by assumption for all α, β ∈ F and w1, w2 ∈ W we have αw1 + βw2 ∈ W,
we can choose in particular α = β = 1 to conclude that (1) holds. Choosing
w2 = 0 ∈ W we conclude that for all α ∈ F and w1 ∈ W, αw1 ∈ W, so (2) holds.
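
To make the corollary concrete, here is a small numerical sketch (my own example; the plane W = {(x, y, z) ∈ R³ : x + y + z = 0} is chosen only for illustration) that spot-checks the single closure condition αw1 + βw2 ∈ W on random samples.

    # Sketch: spot-check corollary 2.6.2 for W = {(x, y, z) in R^3 : x + y + z = 0}.
    import numpy as np

    rng = np.random.default_rng(0)

    def in_W(v, tol=1e-9):
        # membership test for the plane x + y + z = 0
        return abs(v.sum()) < tol

    def random_W_vector():
        # (x, y, -x - y) always lies in W
        x, y = rng.standard_normal(2)
        return np.array([x, y, -x - y])

    assert in_W(np.zeros(3))                      # 0 is in W
    for _ in range(1000):
        w1, w2 = random_W_vector(), random_W_vector()
        alpha, beta = rng.standard_normal(2)
        assert in_W(alpha * w1 + beta * w2)       # closure under linear combinations
    print("W passes the subspace criterion on all samples")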
As an example of how to apply this corollary consider the following proposition.
Proposition 2.6.3. (Subspaces Intersection)
Let V be a vector space and let U, W be subspaces of V. Then U ∩ W is a
subspace of V.

Proof. Clearly 0 ∈ U ∩ W because 0 ∈ U and 0 ∈ W. Now let u, w ∈ U ∩ W;
then u, w ∈ U and u, w ∈ W. Since U, W are subspaces of V, for all α, β ∈ F
we have αu + βw ∈ U and αu + βw ∈ W, thus
αu + βw ∈ U ∩ W,
which proves that U ∩ W is a subspace of V.
The next important concept in the linear algebra course is the span of a set of
vectors. The motivation comes from the following problem: given a vector space
V and vectors v1, ..., vk ∈ V, what is the smallest subspace of V that contains the
vectors v1, ..., vk? If W is a subspace of V that contains v1, ..., vk, then clearly
for all α1, ..., αk ∈ F we have α1 v1 + ... + αk vk ∈ W. Such a weighted sum of
vectors is called a linear combination of v1, ..., vk. This motivates the following
definition.
Definition 2.7. Let V be a vector space over F and let v1, ..., vk ∈ V. The space
spanned by v1, ..., vk is defined by

span {v1, ..., vk} = { α1 v1 + ... + αk vk : α1, ..., αk ∈ F }.

Proposition 2.7.1. Let V be a vector space over F and let v1, ..., vk ∈ V. Then
span {v1, ..., vk} is the smallest subspace of V that contains the vectors v1, ..., vk.
Proof. Let W = span {v1, ..., vk}. We will show that W is a subspace of V and
that any other subspace of V that contains v1, ..., vk contains W.
Since k ≥ 1, W is not empty, and choosing all the scalars to be zero we conclude
that 0 ∈ W. Let w1, w2 ∈ W; then there exist α1, ..., αk, β1, ..., βk ∈ F such that

w1 = ∑_{i=1}^{k} αi vi and w2 = ∑_{i=1}^{k} βi vi.

Then for all α, β ∈ F we have

αw1 + βw2 = ∑_{i=1}^{k} (ααi + ββi) vi ∈ W.

Thus W is a subspace of V. If U is any subspace of V that contains v1, ..., vk,
then U must also contain every linear combination of those vectors, thus
W ⊂ U.
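
Over R, membership of a vector v in span{v1, ..., vk} can be tested numerically by comparing matrix ranks: v lies in the span exactly when appending it as a column does not increase the rank. A minimal sketch (my own code; the vectors are an arbitrary example):

    # Sketch: test whether v lies in span{v1, ..., vk} over R by comparing ranks.
    import numpy as np

    def in_span(vectors, v, tol=1e-10):
        A = np.column_stack(vectors)              # columns are v1, ..., vk
        Av = np.column_stack(vectors + [v])       # same columns plus v
        return np.linalg.matrix_rank(Av, tol) == np.linalg.matrix_rank(A, tol)

    v1 = np.array([1.0, 0.0, 1.0])
    v2 = np.array([0.0, 1.0, 1.0])
    print(in_span([v1, v2], v1 + 3 * v2))                  # True: a linear combination
    print(in_span([v1, v2], np.array([0.0, 0.0, 1.0])))    # False: not in that plane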
It is interesting to know when some finite set of vectors is sufficient to represent
the entire space, in the sense that every vector in the space can be written as a
linear combination of those vectors. The classical example is the space Rⁿ and
the unit vectors in the directions of the coordinates, ei = (0, ..., 1, ..., 0), with the
1 in the i-th slot. Then every vector (x1, ..., xn) ∈ Rⁿ can be written as
(x1, ..., xn) = x1 e1 + ... + xn en.
This motivates the definition of a spanning set.

Definition 2.8. Let V be a vector space. The set {v1, ..., vk} ⊂ V is said to be a
spanning set if
V = span {v1, ..., vk}.
The situation in which an infinite set can be represented in some way by a finite
set is quite important. We have the following definition to describe this situation
for vector spaces.
Definition 2.9. Let V be a vector space over the field F. We say that V is
finitely generated if there exists a finite set of vectors v1, ..., vn such that

V = span {v1, ..., vn}.

If we consider again the example of Rⁿ, we can choose {e1, ..., en} as a spanning
set for it. We notice another feature of this set: it is minimal, in the following
sense: if we remove any vector from this set we will no longer have a spanning
set. Whenever we have a spanning set we can add to it as many vectors from V
as we wish and we will still have a spanning set. So a spanning set can in general
have redundant vectors that we can remove and still obtain a spanning set. For
example, if there is a vector that is a linear combination of the other vectors we
can discard it and still have a spanning set. This is a key observation. To
formalize what we mean by a set without "redundancy" we have the following
definition.
Definition 2.10. (Linear Dependence)
Let V be a vector space and let v1, ..., vk ∈ V. The vectors v1, ..., vk are said to
be linearly dependent if there exist α1, ..., αk ∈ F, not all of which are zero,
such that

∑_{i=1}^{k} αi vi = 0.

For example, any set that contains the zero vector is linearly dependent, because
we can write
1 · 0 + 0v1 + ... + 0vk = 0,
so we have found scalars, not all of which are zero, such that the linear
combination above is the zero vector. A set that is linearly dependent always
contains redundancy: a vector that can be removed from the linearly dependent
set such that the remaining vectors will span the same subspace. This is
formalized in the following lemma.

Proposition 2.10.1. Let V be a vector space and let {v1, ..., vk} be any spanning
set. Then for every v ∈ V the set {v, v1, ..., vk} is linearly dependent.
Proof. Since V = span {v1, ..., vk} and v ∈ V, we can find scalars α1, ..., αk ∈ F
such that

v = ∑_{i=1}^{k} αi vi.

We can therefore write

v − ∑_{i=1}^{k} αi vi = 0.

The coefficient of v is nonzero, so not all scalars are zero and thus {v, v1, ..., vk}
is linearly dependent.

Lemma 2.10.2. Let V be a vector space and let v1, ..., vk ∈ V, k > 1, be nonzero
vectors. If v1, ..., vk are linearly dependent, then there exists a vector among
them that is a linear combination of its predecessors.
Proof. Since the vectors are linearly dependent, there are scalars, not all of
which are zero, such that

∑_{i=1}^{k} αi vi = 0.

Let j be the maximal index such that αj ≠ 0, that is,

j := max { i ∈ {1, ..., k} : αi ≠ 0 }.

Clearly j > 1, because if j = 1 this means that α2 = ... = αk = 0 and α1 v1 = 0,
but since v1 ≠ 0 this implies α1 = 0, contradicting the assumption that not all
scalars are zero. Now

∑_{i=1}^{k} αi vi = α1 v1 + ... + αj vj = 0.

Since αj ≠ 0 and αj ∈ F, there exists a multiplicative inverse αj⁻¹ ∈ F, which
means

vj = −αj⁻¹ α1 v1 − ... − αj⁻¹ α_{j−1} v_{j−1}.

So vj is a linear combination of its predecessors.
Definition 2.11. (Linear Independence)
Let V be a vector space and let v1, ..., vk ∈ V. The vectors v1, ..., vk are said to
be linearly independent if

∑_{i=1}^{k} αi vi = 0

implies α1 = α2 = ... = αk = 0.
We can see that linear independence is equivalent to the requirement that the
zero vector can be written as a linear combination of the vectors v1, ..., vk in only
one way, namely as the trivial linear combination where all coefficients are zero.
In fact, every vector in their span has a unique representation, as the next
proposition shows.

Proposition 2.11.1. Let V be a vector space and let v1, ..., vk ∈ V be linearly
independent vectors. Then for every v ∈ span {v1, ..., vk} there exist unique
scalars α1, ..., αk ∈ F such that

v = ∑_{i=1}^{k} αi vi.

Proof. Since v ∈ span {v1, ..., vk}, there exist α1, ..., αk ∈ F such that

v = ∑_{i=1}^{k} αi vi.

To show uniqueness, suppose that there exist β1, ..., βk ∈ F such that

v = ∑_{i=1}^{k} βi vi.

Subtracting the two expressions we obtain

0 = ∑_{i=1}^{k} (αi − βi) vi.

Since the set is linearly independent we must have

αi − βi = 0

for every i.
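
Over R this uniqueness can be seen concretely: if linearly independent vectors v1, ..., vk are the columns of a matrix A, then the coefficients of any v in their span are the unique solution of Aα = v. A short sketch (my own example):

    # Sketch: linearly independent columns give unique coordinates (R^3 example).
    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0],
                  [2.0, 0.0]])                 # columns v1, v2, linearly independent
    assert np.linalg.matrix_rank(A) == A.shape[1]

    alpha_true = np.array([2.0, -3.0])
    v = A @ alpha_true                          # a vector in span{v1, v2}

    # least squares recovers the unique coefficient vector
    alpha, *_ = np.linalg.lstsq(A, v, rcond=None)
    assert np.allclose(alpha, alpha_true)
    print("recovered coordinates:", alpha)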
We have now defined two concepts that are in a sense conflicting. On the one
hand, a spanning set tends to be big, because by adding vectors to it the set
will remain spanning. Intuitively we could say that a spanning set wants to be
as big as it can; removing elements from it may ruin it. On the other hand, if
we have a linearly independent set, by removing elements from it it will continue
to be linearly independent. The following theorem shows in what sense linearly
independent sets tend to be small while spanning sets tend to be big.
Theorem 2.11.2. Let V be a vector space. Let v1, ..., vm be any spanning set,
that is V = span {v1, ..., vm}. Let {u1, ..., uk} be any linearly independent set;
then k ≤ m. That is, any spanning set has at least as many elements as any
linearly independent set.
Proof. We may assume that all the vectors v1, ..., vm are nonzero, otherwise we
can discard all the zero vectors and the remaining set of nonzero vectors will
also be a spanning set. Suppose that k > m; we will show that we arrive at a
contradiction. Consider the set {u1, v1, ..., vm} of nonzero vectors. By proposition
2.10.1 this set is linearly dependent, and therefore by lemma 2.10.2 there exists
a vector in it that is a linear combination of its predecessors. This vector is not
u1 because it has no predecessors, thus there exists an index 1 ≤ j1 ≤ m such
that vj1 is a linear combination of its predecessors; we can discard it and still
obtain a spanning set
{u1, v1, ..., v̂j1, ..., vm},
where the hat over vj1 indicates that this vector is omitted from the list. Suppose
inductively that after l steps we arrived at a set
{u1, ..., ul, v1, ..., v̂j1, ..., v̂jl, ..., vm}
in which we discarded l < m vectors from the list v1, ..., vm and obtained a
spanning set by adding the vectors u1, ..., ul. We will show that we can continue
the process. Since by the inductive hypothesis the set
{u1, ..., ul, v1, ..., v̂j1, ..., v̂jl, ..., vm}
is a spanning set, we can add to it the vector ul+1 to obtain a linearly dependent
set
{u1, ..., ul, ul+1, v1, ..., v̂j1, ..., v̂jl, ..., vm}
(for clarity of notation, the omitted vectors could be any vectors from the list
v1, ..., vm). Since the set is linearly dependent, by lemma 2.10.2 there exists a
vector that is a linear combination of its predecessors. This vector is not any
of the vectors u1, ..., ul+1 because this set is linearly independent. Thus there
exists 1 ≤ jl+1 ≤ m such that vjl+1 is a linear combination of its predecessors,
so we can discard it and obtain the spanning set

{u1, ..., ul, ul+1, v1, ..., v̂j1, ..., v̂jl, ..., v̂jl+1, ..., vm},

which completes the inductive step. Since we assumed k > m, after m steps we
will obtain the spanning set {u1, ..., um} where all the vectors v1, ..., vm have been
discarded. Since k > m we can add the vector um+1 to the spanning set and
conclude that the set {u1, ..., um, um+1} is linearly dependent, contradicting the
assumption that the set {u1, ..., uk}, where k ≥ m + 1, is linearly independent.
This contradiction implies that k ≤ m, as desired.

The previous theorem shows that the largest possible linearly independent set
has no more elements than the smallest possible spanning set. So spanning
and linear independence are somewhat conflicting constraints, and the question
arises: can both conditions hold at the same time? What happens when those
opposing forces reach a perfect balance and equilibrium? The answer is that
something wonderful happens at this point. We arrive at one of the most
important concepts of the course, the basis of a vector space.
Definition 2.12. (Vector Space Basis)
Let V be a vector space. A basis for V is a linearly independent spanning set.
We have defined the notion of a basis, but we still don't know if we can always
find a basis for a given vector space. The following important theorem shows
that for finitely generated vector spaces a basis always exists.

Theorem 2.12.1. (Existence of a Basis)
Let V be a finitely generated vector space. Then there exists a basis for V.
Proof. Since V is finitely generated, there exist finitely many nonzero vectors
v1, ..., vN ∈ V such that V = span {v1, ..., vN}. If v1, ..., vN are linearly
independent, they form the desired basis and we are done; otherwise by lemma
2.10.2 there exists 1 < j ≤ N such that vj is a linear combination of its
predecessors, and we can discard it and still obtain a spanning set. If the
obtained set is linearly independent we are done, otherwise we continue the
process, which will terminate after at most N − 1 steps, because a set consisting
of a single nonzero vector is linearly independent. Once the process has
terminated we have arrived at a set of nonzero vectors in which no vector is a
linear combination of its predecessors, and such a set is linearly independent by
the contrapositive of lemma 2.10.2. Since at each step we obtain a spanning set,
once the process has terminated we have arrived at a spanning and linearly
independent set, which is a basis.
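
The proof is constructive, and over R it can be imitated numerically: walk through a finite spanning list and keep a vector only if it is not a linear combination of the vectors kept so far, which can be checked by a rank test. A sketch of this idea (my own code, not from the notes):

    # Sketch: extract a basis from a finite spanning list, mirroring theorem 2.12.1.
    import numpy as np

    def extract_basis(vectors, tol=1e-10):
        basis = []
        for v in vectors:
            candidate = basis + [v]
            # keep v only if it enlarges the span (the rank goes up)
            if np.linalg.matrix_rank(np.column_stack(candidate), tol) == len(candidate):
                basis.append(v)
        return basis

    spanning = [np.array([1.0, 0.0, 0.0]),
                np.array([2.0, 0.0, 0.0]),      # redundant: multiple of the first
                np.array([0.0, 1.0, 0.0]),
                np.array([1.0, 1.0, 0.0]),      # redundant: sum of two kept vectors
                np.array([0.0, 0.0, 5.0])]
    basis = extract_basis(spanning)
    print(len(basis), "basis vectors kept out of", len(spanning))   # 3 out of 5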
Definition 2.13. (Dimension)
Let V be a vector space over the field F. The dimension of V is the number of
elements in a basis of V.
For example, according to this definition the dimension of Rⁿ over R is n,
because the set of vectors {e1, ..., en} is a spanning and linearly independent set.
In addition, if we consider the vector space as a set, then the field over which we
regard V as a vector space can affect the dimension. For example, the dimension
of C as a vector space over C is 1, but if we consider C as a vector space over
R then its dimension is 2 and a basis for this space is {1, i}.
The question that arises is whether the notion of dimension is well defined, as
for the same space there can be many possible choices of basis. What if two
bases have a different number of elements? Then the definition above would be
meaningless and the notion of dimension would not be well defined. As the
next proposition shows, this never happens.
Proposition 2.13.1. (Dimension is Well Defined)
Let V be a finitely generated vector space over F, then any two bases of V have
exactly the same number of elements.

Proof. Since V is finitely generated there exists at least one basis for V. Suppose
that B1 = {u1, ..., um} and B2 = {v1, ..., vn} are two bases for V. Since B1 is a
linearly independent set and B2 is a spanning set, we conclude by theorem 2.11.2
that m ≤ n; reversing the roles of B1 and B2, by exactly the same argument we
conclude that n ≤ m, thus n = m.

We would like to emphasize again that a spanning set and a linearly independent
set are in a sense each other's opposites. We have seen that by adding a vector to
a spanning set we can only improve the situation, as it will remain spanning;
we can add as many vectors as we please. Removing a vector from a spanning
set may result in a set that is no longer spanning and thus can ruin the
property. On the other hand, removing a vector from a linearly independent
set will result in a linearly independent set (we can consider the empty set as
linearly independent). Adding a vector to a linearly independent set can result
in a linearly dependent set. However, as the next lemma shows, if the linearly
independent set is not a spanning set there is a way to add a vector to it and
still obtain a linearly independent set.
Lemma 2.13.2. Let V be a vector space over the field F and let {u1, ..., uk} be a
linearly independent set in V that does not span V. Then for every
v ∈ V \ span {u1, ..., uk} the set {u1, ..., uk, v} is a linearly independent set.
Proof. Suppose that for α1, ..., αk, β ∈ F we have

α1 u1 + ... + αk uk + βv = 0.

If β ≠ 0 then β has a multiplicative inverse in F and we have

v = ∑_{i=1}^{k} (−β⁻¹ αi) ui,

which implies that v ∈ span {u1, ..., uk}, contradicting the assumption. Thus
β = 0 and
α1 u1 + ... + αk uk = 0.
Since u1, ..., uk are linearly independent this implies α1 = ... = αk = 0; together
with β = 0 this implies that {u1, ..., uk, v} is a linearly independent set.

Corollary 2.13.3. Let V be a vector space over the field F and let n = dim(V ),
then any linearly independent set with n elements is a basis for V.
Proof. Since V is finitely generated there exists a basis for V, say v1, ..., vn, so
n = dim(V). Let u1, ..., un be linearly independent vectors. Suppose that
u1, ..., un is not a basis for V; this means that those vectors do not span V. Thus
there exists un+1 ∈ V \ span {u1, ..., un}. By the previous lemma this implies that
{u1, ..., un, un+1} is linearly independent. Since {v1, ..., vn} is a basis, it is a
spanning set with n elements, and since {u1, ..., un, un+1} is linearly independent,
by theorem 2.11.2 we must have n + 1 ≤ n, which is a contradiction. This implies
V \ span {u1, ..., un} = ∅, that is V = span {u1, ..., un}, and thus u1, ..., un is a
basis for V.
Theorem 2.13.4. (Completing To A Basis)
Let V be a finitely generated vector space over F, then every linearly independent
set can be completed to a basis.
Proof. Since V is finitely generated, by theorem 2.12.1 there exists a finite
basis for V, so V is finite dimensional. Let n = dim(V). Let u1, ..., uk ∈ V be
any collection of linearly independent vectors; clearly k ≤ n. If k = n then by
corollary 2.13.3 u1, ..., uk is already a basis. If k < n then choosing any uk+1 ∈
V \ span {u1, ..., uk}, the set {u1, ..., uk, uk+1} will be linearly independent. If
n = k + 1, by corollary 2.13.3 we are done. Since dim(V) = n, no set with fewer
than n vectors can span V, so we can continue the process for n − k steps, each
time obtaining a linearly independent set. After n − k steps the process will
terminate and we will obtain a linearly independent set {u1, ..., uk, uk+1, ..., un}
of size n, which will be a basis.

Sums of Spaces
Let V be a finite dimensional vector space and let U, W be subspaces of V. The
union of U, W will not be a subspace of V unless U ⊂ W or W ⊂ U. The
reason for this is that otherwise this set will not be closed under addition.
Indeed, if one space is contained in the other, their union is the bigger subspace,
which is a subspace in its own right. Since U ∪ W is a subset of V, it would
suffice to show that it is closed under addition, because it clearly contains the
zero vector and is closed under multiplication by a scalar. Indeed, if v ∈ U ∪ W
then either v ∈ U or v ∈ W; since U, W are subspaces, αv ∈ U or αv ∈ W, thus
αv ∈ U ∪ W. Now we will show that if U ⊄ W and U ∪ W is closed under
addition, then W ⊂ U. In this case U \ W ≠ ∅, so choose any u ∈ U \ W. Now
choose any w ∈ W, and note that u + w ∉ W, because if there were w′ ∈ W
such that u + w = w′, then since W is a vector space this would imply that
u = w′ − w ∈ W, so u ∈ W, contradicting the assumption that u ∈ U \ W. Thus
for every w ∈ W, u + w ∉ W. Thus if U ∪ W is closed under addition, then for
every w ∈ W we must have u + w ∈ U ∪ W, which means that u + w ∈ U. Thus
for every w ∈ W there exists a vector u′ ∈ U such that u + w = u′, or
w = u′ − u ∈ U, so w ∈ U for every w ∈ W, which means that W ⊂ U. The
situation when the union of two subspaces is not a subspace is illustrated in the
example below. Consider the vector space R² and its two subspaces, the x-axis
U = R × {0} = { (x, 0) : x ∈ R } and the y-axis W = {0} × R = { (0, y) : y ∈ R }.
The union

U ∪ W = { (x, 0), (0, y) : x, y ∈ R }

is not a subspace of R² because (1, 0), (0, 1) ∈ U ∪ W but
(1, 0) + (0, 1) = (1, 1) ∉ U ∪ W. As we proved above, this will always happen
unless W ⊂ U or U ⊂ W. The question that arises is: what is the smallest
subspace of V that contains U ∪ W? Since any subspace containing U ∪ W is
closed under addition, a necessary condition is that it must contain all the sums
of vectors from U with vectors from W. We will show that this condition is also
sufficient.
Let us therefore define the following subset of V:

U + W := { u + w : u ∈ U, w ∈ W }.

This makes sense because both U, W are subspaces of V. Such a definition of a
sum of two sets, whenever addition is defined, is common in mathematics.
Theorem 2.13.5. (Subspaces Sum)
Let V be a vector space over the field F and let U, W be subspaces of V. Then
U + W is the minimal subspace of V that contains both U and W.

Proof. We only need to show that U + W is a subspace of V; minimality is clear,
because any subspace of V that contains U and W is closed under addition and
in particular contains all possible sums of a vector from U and a vector from W,
and thus would contain U + W. Since U + W contains the zero vector and is
closed under multiplication by a scalar, we only need to show that for all
v1, v2 ∈ U + W we have v1 + v2 ∈ U + W. Indeed, by the definition of U + W
there exist u1, u2 ∈ U and w1, w2 ∈ W such that v1 = u1 + w1, v2 = u2 + w2,
and thus, since U and W are subspaces of V, we have

v1 + v2 = (u1 + u2) + (w1 + w2),

where u1 + u2 ∈ U and w1 + w2 ∈ W, which proves that v1 + v2 ∈ U + W.


The question that arises is: what is the dimension of the space U + W,
and how do we compute it? The following theorem provides us with an answer.
Theorem 2.13.6. Let V be a finite dimensional vector space over the field F
and let U, W be subspaces of V , then

dim(U + W ) = dim(U ) + dim(W ) − dim(U ∩ W ).

Proof. To prove this statement we will construct a basis for U + W. Since V
is finite dimensional, so are all its subspaces. Since both U, W are subspaces
of V, so is U ∩ W by proposition 2.6.3. Let us denote n = dim(U),
m = dim(W), r = dim(U ∩ W). Let v1, ..., vr be a basis for U ∩ W; then we can
complete it to a basis of U, so that v1, ..., vr, ur+1, ..., un is a basis for U.
Similarly we can complete v1, ..., vr to a basis of W, so that v1, ..., vr, wr+1, ..., wm
is a basis for W. We will show that v1, ..., vr, ur+1, ..., un, wr+1, ..., wm is a basis
for U + W. For that we need to show that this set is spanning and is linearly
independent.
Spanning: we need to show that every v ∈ U + W can be written as a linear
combination of those vectors. Since v ∈ U + W, there exist u ∈ U and w ∈ W
such that v = u + w. Since we have a basis for U and a basis for W, we can write

u = α1 v1 + ... + αr vr + βr+1 ur+1 + ... + βn un,

w = γ1 v1 + ... + γr vr + δr+1 wr+1 + ... + δm wm,

which means that

v = u + w = ∑_{i=1}^{r} (αi + γi) vi + βr+1 ur+1 + ... + βn un + δr+1 wr+1 + ... + δm wm.

This shows that the set is spanning. Now we need to show linear independence.
Indeed, suppose that

α1 v1 + ... + αr vr + βr+1 ur+1 + ... + βn un + γr+1 wr+1 + ... + γm wm = 0.

Then

α1 v1 + ... + αr vr + βr+1 ur+1 + ... + βn un = −γr+1 wr+1 − ... − γm wm,

where the left hand side is a vector in U and the right hand side is a vector in W.
Since the left hand side vector equals the right hand side vector, we conclude
that

−γr+1 wr+1 − ... − γm wm ∈ U ∩ W.

This vector can therefore be written as a linear combination of the basis for
U ∩ W as

δ1 v1 + ... + δr vr = −γr+1 wr+1 − ... − γm wm,

which is equivalent to

δ1 v1 + ... + δr vr + γr+1 wr+1 + ... + γm wm = 0,

but since {v1, ..., vr, wr+1, ..., wm} is a basis for W it is in particular linearly
independent, and thus δ1 = ... = δr = γr+1 = ... = γm = 0. This implies that

α1 v1 + ... + αr vr + βr+1 ur+1 + ... + βn un = 0,

and since v1, ..., vr, ur+1, ..., un is a basis for U it is in particular linearly
independent, which implies α1 = ... = αr = βr+1 = ... = βn = 0. This proves that
the set v1, ..., vr, ur+1, ..., un, wr+1, ..., wm (the n vectors of the basis of U together
with the m − r vectors wr+1, ..., wm) spans U + W and is linearly independent,
which means that it is a basis for U + W containing n + m − r elements.
Recalling the definitions of r, n, m we obtain

dim(U + W) = dim(U) + dim(W) − dim(U ∩ W).
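
Over R every dimension in this formula can be computed from matrix ranks. The sketch below (my own illustration) builds two subspaces of R⁶ from the columns of matrices A and B, computes dim(U + W) as the rank of [A | B], obtains a spanning set of U ∩ W from the null space of [A | −B], and checks the formula.

    # Sketch: check dim(U + W) = dim(U) + dim(W) - dim(U ∩ W) for subspaces of R^6.
    import numpy as np

    rng = np.random.default_rng(1)

    def null_space(M, tol=1e-10):
        # basis of ker(M) via the SVD (rows of Vt beyond the numerical rank)
        _, s, Vt = np.linalg.svd(M)
        rank = int((s > tol).sum())
        return Vt[rank:].T

    A = rng.standard_normal((6, 3))   # columns of A span U
    B = rng.standard_normal((6, 4))   # columns of B span W
    B[:, 0] = A[:, 0] + A[:, 1]       # force a nontrivial intersection

    dim_U = np.linalg.matrix_rank(A)
    dim_W = np.linalg.matrix_rank(B)
    dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))          # dim(U + W)

    # vectors of U ∩ W: A x = B y, i.e. (x, y) in ker([A | -B]); the intersection
    # is spanned by the corresponding vectors A x
    N = null_space(np.hstack([A, -B]))
    intersection = A @ N[:A.shape[1], :]
    dim_cap = np.linalg.matrix_rank(intersection) if intersection.size else 0

    assert dim_sum == dim_U + dim_W - dim_cap
    print(dim_U, dim_W, dim_sum, dim_cap)      # e.g. 3 4 6 1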

This marvelous formula leads us to consider a particularly interesting case of a
sum of subspaces. Wouldn't it be nice if for two subspaces U, W of V we had
dim(U + W) = dim(U) + dim(W)? Observing the formula we just proved, we
see that this happens exactly when dim(U ∩ W) = 0; since any two subspaces
contain the zero vector, this means that U ∩ W = {0}. This space is indeed zero
dimensional, and this is the smallest possible intersection between any two
subspaces of the same space. This special case is very important and deserves
its own name.
Definition 2.14. Let V be a vector space over a field F and U, W its subspaces.
We say that V is the direct sum of U and W, and write V = U ⊕ W, if:
1) V = U + W,
2) U ∩ W = {0}.
As the next proposition shows, the case of a direct sum removes a certain
redundancy; moreover, as is clearly seen, if V = U ⊕ W then
dim(V) = dim(U) + dim(W).
Proposition 2.14.1. Let V be a vector space and let U, W be subspaces of V
such that V = U ⊕ W. Then for every vector v ∈ V there exist a unique u ∈ U
and a unique w ∈ W such that v = u + w.
Proof. Since V = U ⊕ W, in particular V = U + W, so for every v ∈ V we can
find u ∈ U and w ∈ W such that v = u + w. It remains to show uniqueness.
Suppose that v = u + w = u′ + w′. This implies

u − u′ = w′ − w,

where u − u′ ∈ U and w′ − w ∈ W, so u − u′ ∈ U ∩ W and w′ − w ∈ U ∩ W.
Since V is the direct sum of U and W we have U ∩ W = {0}, and thus u = u′
and w = w′.

3 Linear Transformations
This is the most important part of the course. As you may know from
experience, studying functions from R to R, for example, and their properties is
often more important than studying the properties of R itself. The question is:
what types of maps are the most natural to study between vector spaces? On
each vector space there are additional layers of structure on top of the set itself:
there is scalar multiplication and addition of vectors. So it is natural to
consider mappings between vector spaces that preserve this structure.

Definition 3.1. Let V, W be vector spaces over the field F. A linear
transformation T from V to W is a function T : V → W that satisfies, for all
v1, v2, v ∈ V and all α ∈ F:

T(v1 + v2) = T(v1) + T(v2),

T(αv) = αT(v).
Note that the addition on the left hand side is the addition in V, while on the
right hand side it is the addition in W; the same holds for the scalar
multiplication. The following discussion illustrates the naturality of the
definition of a linear transformation.

That is, if T takes v1 to w1 and v2 to w2, then in order to preserve the additive
structure of the vector spaces it must take v1 + v2 to w1 + w2.
Similarly, fixing some scalar α ∈ F and denoting by µα the multiplication by
the scalar α, µα(v) = αv (on V or on W), the property T(αv) = αT(v) says that
T ∘ µα = µα ∘ T. This is expressed by a commutative diagram, an important
concept that is ubiquitous throughout mathematics.
With every linear transformation T : V → W there are an important subspace of
V and a subspace of W associated with it: the kernel of T, defined by

Ker(T) = { v ∈ V : T(v) = 0 } ⊂ V,

and the image of T, defined by

Im(T) = { T(v) : v ∈ V } ⊂ W.
Proposition 3.1.1. Let V ,W be vector spaces over the field F and let T : V →
W be a linear transformation. Then Ker(T ) is a subspace of V and Im(T ) is
a subspace of W .

Proof. First note that T(0) = T(0 + 0) = T(0) + T(0); adding −T(0) to both
sides we obtain T(0) = 0, so 0 ∈ Ker(T). Now for all u, v ∈ Ker(T) and all
α, β ∈ F we have

T(αu + βv) = αT(u) + βT(v) = 0,

thus αu + βv ∈ Ker(T), which shows that Ker(T) is a subspace of V.

Since for any linear transformation T we have T(0) = 0, we get 0 ∈ Im(T). Now
suppose that w1, w2 ∈ Im(T); then there exist v1, v2 ∈ V such that T(vi) = wi,
i = 1, 2. Therefore for every α, β ∈ F we have

αw1 + βw2 = αT(v1) + βT(v2) = T(αv1 + βv2),

which proves that αw1 + βw2 ∈ Im(T) and that Im(T) is a subspace of W.
By definition we see that T : V → W is surjective if and only if Im(T) = W.
What can we say about the kernel?
Proposition 3.1.2. Let V ,W be vector spaces over the field F and let

T :V →W

be a linear transformation. Then T is injective (one to one) if and only if


Ker(T ) = {0} .

Proof. Suppose that T is one to one. Since T(0) = 0, for every v ∈ V
such that T(v) = 0 we must have v = 0, so Ker(T) = {0}. Conversely, suppose
Ker(T) = {0}, and suppose that T(v1) = T(v2). This is equivalent to
T(v1 − v2) = 0, which means v1 − v2 ∈ Ker(T). Since Ker(T) = {0} we have
v1 − v2 = 0, or v1 = v2.
Both of the properties mentioned above are special cases of the following
important theorem.
Theorem 3.1.3. Let V ,W be vector spaces over the field F and let T : V → W
be a linear transformation. Then

dim(Im(T )) + dim(Ker(T )) = dim(V ).

Proof. Let us choose a basis for Ker(T), denote it by v1, ..., vk, and complete it
to a basis of V by vk+1, ..., vn, so that v1, ..., vn is a basis for V. We will show
that the n − k vectors wk+1 = T(vk+1), ..., wn = T(vn) form a basis for Im(T).
Then the result follows at once, because dim(V) = n, dim(Ker(T)) = k, and
dim(Im(T)) = n − k, which means that dim(Im(T)) + dim(Ker(T)) = dim(V).
Note that for every w ∈ Im(T) there exists v ∈ V such that w = T(v). We can
write the vector v as a linear combination of the basis, so

w = T(v) = T(∑_{i=1}^{n} αi vi) = T(∑_{i=1}^{k} αi vi) + T(∑_{i=k+1}^{n} αi vi) = ∑_{i=k+1}^{n} αi wi,

since ∑_{i=1}^{k} αi vi ∈ Ker(T). This shows that wk+1, ..., wn span Im(T).

It remains to show that this set is linearly independent. Suppose that

∑_{i=k+1}^{n} αi wi = 0.

This means that

T(∑_{i=k+1}^{n} αi vi) = 0,

which implies that

∑_{i=k+1}^{n} αi vi ∈ Ker(T).

We can therefore write it as a linear combination of the basis elements of
Ker(T), say

∑_{i=1}^{k} βi vi = ∑_{i=k+1}^{n} αi vi,

or

β1 v1 + ... + βk vk − αk+1 vk+1 − ... − αn vn = 0.

Since v1, ..., vn is a basis for V it is a linearly independent set, which means

β1 = ... = βk = αk+1 = ... = αn = 0.

In particular this implies that wk+1, ..., wn are linearly independent and therefore
form a basis for Im(T).
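
For a matrix A ∈ R^{m×n}, viewed as the linear transformation x ↦ Ax from Rⁿ to Rᵐ, the theorem reads rank(A) + dim(Ker A) = n. A quick numerical sketch (my own example):

    # Sketch: rank-nullity for T(x) = A x with A a 4 x 6 real matrix of rank 3.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))   # rank 3 by construction

    dim_image = np.linalg.matrix_rank(A, tol=1e-8)       # dim(Im(T))
    _, s, _ = np.linalg.svd(A)
    dim_kernel = A.shape[1] - int((s > 1e-8).sum())      # dim(Ker(T)) from the SVD

    assert dim_image + dim_kernel == A.shape[1]          # = dim(V) = 6
    print(dim_image, dim_kernel)                         # 3 3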

Corollary 3.1.4. Let T : V → W be a linear transformation , and suppose that


dim(W ) = dim(V ). Then T is one to one if and only if T is onto.
Proof. Since dim(Im(T)) + dim(Ker(T)) = dim(V), if T is one to one then
Ker(T) = {0}, so dim(Ker(T)) = 0, which implies

dim(Im(T )) = dim(V ) = dim(W ).

Since Im(T ) ⊂ W and the spaces have the same dimension we conclude that
Im(T ) = W and thus T is onto. Similarly if T is onto , then Im(T ) = W
so dim(Im(T )) = dim(W ) = dim(V ) which implies dim(Ker(T )) = 0. This
means that Ker(T ) = {0} and therefore T is one to one.

The next theorem shows that the values of a linear transformation on a basis
uniquely define the transformation on the entire space.
Theorem 3.1.5. (Existence and Uniqueness of Linear Transformations) Let V
and W be vector spaces over the field F. Let {v1, ..., vn} be any basis for V, and
let w1, ..., wn ∈ W be any n vectors in W. Then there exists a unique linear
transformation T : V → W that satisfies T(vi) = wi for every i = 1, 2, ..., n.

Proof. Let v ∈ V and write v = ∑_{i=1}^{n} αi vi (uniquely, by proposition 2.11.1),
and define T : V → W by

T(v) = ∑_{i=1}^{n} αi wi.

Note that T is linear: indeed, let v1, v2 ∈ V and α, β ∈ F; then we can write
uniquely

v1 = ∑_{i=1}^{n} αi vi,

v2 = ∑_{i=1}^{n} βi vi.

Then

T(αv1 + βv2) = ∑_{i=1}^{n} (ααi + ββi) wi = αT(v1) + βT(v2).

Moreover T(vi) = wi for every i, since the coordinates of vi in the basis are zero
except for a 1 in the i-th slot. Finally, if T̃ is another linear transformation that
satisfies T̃(vi) = wi, then writing each v uniquely as v = ∑_{i=1}^{n} αi vi, by
linearity it follows that

T̃(v) = ∑_{i=1}^{n} αi T̃(vi) = ∑_{i=1}^{n} αi wi = T(v).
One of the main goals of algebra is to study and classify algebraic structures. We
would like to find a definition for the case when two vector spaces are essentially
the same, similar to the situation when we say exactly the same thing but in a
different language. If we continue with this analogy, for vector spaces V, W we
need a mapping T : V → W that is one to one and onto; if in addition this map
is also linear then it also preserves the structure. That is, if T : V → W is a
linear bijective map, then instead of calling a vector v we can call it T(v),
instead of calling the vector αv we can call it αT(v), and we can call the sum
v1 + v2 by T(v1) + T(v2). With this identification the vector spaces V and W
are essentially the same. Calling the vector v by the name T(v), we see that V
and W are the same up to renaming each vector v in V to the name T(v), which
is a vector in W. Thus when we have a bijective linear map T : V → W we have
two vector spaces that are completely identical up to a renaming of their
vectors. This has a special name: we say that the spaces V and W are
isomorphic. In Greek, iso means same and morph means shape.
Definition 3.2. Let V, W be vector spaces over the field F. We say that V and
W are isomorphic if there exists a bijective linear transformation T : V → W;
such a T is called an isomorphism.
Let V be a finite dimensional vector space over the field F. Once we fix a basis
for V, say B = {v1, ..., vn}, then for every v ∈ V, by proposition 2.11.1, there
exist unique scalars α1, ..., αn ∈ F such that

v = ∑_{i=1}^{n} αi vi.

If we know the basis B, those unique scalars encode all the information regarding
v, and allow us to recover v easily by forming the required linear combination.
Let us define

[v]_B = (α1, ..., αn)^T ∈ Fⁿ.

This vector is called the coordinate vector of v ∈ V with respect to the basis B.
Thus for every basis B of V there exists a bijective map ϕ_B : V → Fⁿ defined by

ϕ_B(v) = [v]_B.

The map ϕ_B : V → Fⁿ is linear, and since to each vector there corresponds
a unique coordinate vector in the basis B, this map is actually an isomorphism.
This is a very important observation, because however complicated and abstract
the n dimensional vector space V may be, it is isomorphic to the simple concrete
vector space Fⁿ. That is, those vector spaces are essentially the same. In fact we
have just proved the following classification theorem for all finite dimensional
vector spaces.
Theorem 3.2.1. Let V be an n dimensional vector space over the field F. Then
V is isomorphic to Fⁿ.
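
For V = Rⁿ with a basis B stored as the columns of an invertible matrix, the coordinate vector [v]_B is obtained by solving a linear system, and ϕ_B is then an explicit isomorphism. A small sketch (my own example):

    # Sketch: coordinate vectors with respect to a basis B of R^3.
    import numpy as np

    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])              # columns are the basis vectors
    assert np.linalg.matrix_rank(B) == 3         # B really is a basis

    def phi_B(v):
        # [v]_B solves B @ coords = v
        return np.linalg.solve(B, v)

    v = np.array([2.0, 3.0, 1.0])
    coords = phi_B(v)
    assert np.allclose(B @ coords, v)            # recover v from its coordinates
    print(coords)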

Keep in mind the following picture: the vector space V sits on top, the simpler
vector space Fⁿ is at the bottom, and there is a bijective map identifying the
two. Such a picture arises often throughout mathematics. This is a key
observation, because for all "practical purposes" we can think of any vector
v ∈ V in any vector space (complicated and abstract as it may be) as a tuple
of n scalars, which are its components with respect to the chosen basis. What a
striking resemblance to the well familiar space Rⁿ!

Transition From Basis to Basis


Suppose that V is a vector space and B = {v1, ..., vn} and B′ = {u1, ..., un} are
two bases for V. To each v ∈ V we can assign its coordinate vector in the basis
B and its coordinate vector in the basis B′. The question is: what is the relation
between [v]_B and [v]_{B′}? Let

[v]_B = (α1, ..., αn)^T and [v]_{B′} = (β1, ..., βn)^T

be the coordinate vectors of v in the bases B and B′ respectively. To find the
relation, write each vi ∈ B as a linear combination of the elements of the basis
B′. That is, for each i there exist coefficients aij such that

vi = ∑_{j=1}^{n} aij uj.

Now each vector v ∈ V can be written as

v = ∑_{i=1}^{n} αi vi = ∑_{j=1}^{n} βj uj.
If we substitute the expression for each vi in terms of the uj's, we get

v = ∑_{i=1}^{n} αi vi = ∑_{i=1}^{n} αi (∑_{j=1}^{n} aij uj) = ∑_{j=1}^{n} (∑_{i=1}^{n} aij αi) uj.

The last equality arises from rearranging the order of summation. In the last
expression we again have v expressed as a linear combination of the uj's, and
since B′ is a basis this representation is unique, so we must have, for each j:

βj = ∑_{i=1}^{n} aij αi.

Note that we sum over the row index. Therefore, if we define the matrix

P^B_{B′} = (aij)^T,

that is, the transpose of the n × n matrix whose (i, j) entry is aij, then

[v]_{B′} = P^B_{B′} [v]_B.

The matrix P^B_{B′} is called the transition matrix from basis B to basis B′.
This means that for each vector v, the matrix P^B_{B′} takes the coordinates of
v in the basis B to its coordinates in the basis B′. Note that this notation for
the transition matrix is convenient, because when we observe the formula
[v]_{B′} = P^B_{B′} [v]_B it is as though the B at the top of P and the B at the
bottom of [v]_B cancel each other. This notational trick resembles the familiar
practice of cancelling, and it makes it easier to avoid mistakes and to see when
certain expressions make sense. We have the following commutative diagram,
in which the space Fⁿ is represented as the coordinate space: since the diagram
is commutative and ϕ_B, ϕ_{B′} are isomorphisms, we see that for every x ∈ Fⁿ
we have

P^B_{B′} x = (ϕ_{B′} ∘ ϕ_B⁻¹)(x).
Such a picture is important and common in the context of topological manifolds
and algebraic manifolds; even through elementary mathematics we can
experience the flavor of advanced topics. Clearly the matrix P^B_{B′} must be
invertible, because by the same procedure we can also construct the matrix
P^{B′}_B. Using the defining property of these matrices, for each v ∈ V let us
compute

(P^{B′}_B P^B_{B′})[v]_B = P^{B′}_B (P^B_{B′} [v]_B) = P^{B′}_B [v]_{B′} = [v]_B = I[v]_B.

Since for every x ∈ Fⁿ there exists v ∈ V such that x = [v]_B, we conclude that
for all x ∈ Fⁿ

(P^{B′}_B P^B_{B′}) x = Ix,

and it follows that

P^{B′}_B P^B_{B′} = I

and similarly

P^B_{B′} P^{B′}_B = I,

thus P^B_{B′} is an invertible matrix and

(P^B_{B′})⁻¹ = P^{B′}_B.

This shows another advantage of the notation: the notation for the inverse
matrix is obtained by just flipping the positions of B and B′. Moreover, every
invertible matrix P can be thought of as the transition matrix from one basis to
another. For example, if we choose the standard basis B = {e1, ..., en} of Fⁿ and
P is any invertible matrix with entries in F, then P is the transition matrix
P^{B′}_B from the basis B′ = {u1, ..., un}, where ui = ∑_{j=1}^{n} Pji ej, to the
standard basis B. It can easily be shown that if P is invertible then B′ is indeed
a basis for Fⁿ. The importance of this observation will become apparent soon.
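
For V = Rⁿ this bookkeeping is easy to carry out numerically: storing the basis vectors of B and B′ as the columns of matrices M_B and M_{B′}, we have v = M_B [v]_B = M_{B′} [v]_{B′}, so the matrix sending [v]_B to [v]_{B′} is M_{B′}⁻¹ M_B. The sketch below (my own example, using this column-matrix convention rather than the row computation in the text) checks the transition formula and the inverse relation.

    # Sketch: transition matrix between two bases of R^2, columns = basis vectors.
    import numpy as np

    M_B  = np.array([[1.0, 1.0],
                     [0.0, 1.0]])                 # basis B  = {(1,0), (1,1)}
    M_Bp = np.array([[1.0, 1.0],
                     [1.0, -1.0]])                # basis B' = {(1,1), (1,-1)}

    # [v]_{B'} = P [v]_B  with  P = M_{B'}^{-1} M_B  (transition matrix from B to B')
    P = np.linalg.solve(M_Bp, M_B)

    rng = np.random.default_rng(3)
    for _ in range(100):
        v = rng.standard_normal(2)
        v_B  = np.linalg.solve(M_B, v)            # coordinates of v in B
        v_Bp = np.linalg.solve(M_Bp, v)           # coordinates of v in B'
        assert np.allclose(P @ v_B, v_Bp)

    # the inverse transition matrix goes the other way
    assert np.allclose(np.linalg.inv(P), np.linalg.solve(M_B, M_Bp))
    print("transition matrix:\n", P)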

Representing Matrix of a Linear Transformation


Let V, W be vector spaces over the field F and let T : V → W be a linear
transformation. By the existence and uniqueness theorem for linear
transformations we know that if we fix a basis B_V = {v1, ..., vn} for V, then
T is uniquely determined by its values T(vi) for i = 1, ..., n. The elements
T(vi) lie in the vector space W. Suppose that we also fix a basis for W, say
B_W = {w1, ..., wm}. Then each T(vi) can be expressed as a linear combination
of the elements of B_W. This means that there exist unique elements aij ∈ F
such that for all i we have

T(vi) = ∑_{j=1}^{m} aij wj.

Now suppose that v ∈ V is an arbitrary vector, and its coordinate vector in
B_V is given by [v]_{B_V} = (α1, ..., αn)^T. Then

T(v) = T(∑_{i=1}^{n} αi vi) = ∑_{i=1}^{n} αi T(vi) = ∑_{i=1}^{n} αi (∑_{j=1}^{m} aij wj) = ∑_{j=1}^{m} (∑_{i=1}^{n} aij αi) wj.

Now let [T(v)]_{B_W} = (β1, ..., βm)^T. Since every vector can be written uniquely
as a linear combination of the basis elements, we conclude that for every j we
have

βj = ∑_{i=1}^{n} aij αi.

If we define the m × n matrix

[T]^{B_V}_{B_W} = (aij)^T,

that is, the transpose of the n × m matrix whose (i, j) entry is aij, we see that

[T(v)]_{B_W} = [T]^{B_V}_{B_W} [v]_{B_V}.
The matrix [T]^{B_V}_{B_W} is called the representing matrix of T with respect
to the bases B_V and B_W. This connection between a linear transformation
and multiplication by a matrix is the most important concept that is
taught in the first undergraduate linear algebra course!!! What we have
proven shows that however complicated the vector spaces V, W over F may be,
once we fix bases B_V and B_W for V and W, the space V is isomorphic to Fⁿ,
W is isomorphic to Fᵐ, and we can think of any linear T : V → W as
multiplication by the m × n matrix [T]^{B_V}_{B_W}, as the corresponding
commutative diagram illustrates.

The diagram shows that for all practical purposes, once we fix bases B_V and
B_W for V and W, we can think of each vector in the spaces as its coordinate
vector, and of each linear transformation as its representing matrix in those
bases. We can perform all the computations we need in the "simple" spaces Fⁿ
and Fᵐ, and then, if we need to, we can always recover the vectors in the
original vector spaces. The important lesson to be learned from the discussion
above is that every multiplication of a vector by a matrix is a linear
transformation, and every linear transformation between two finite dimensional
vector spaces can be uniquely represented as multiplication of a vector by a
matrix once bases have been fixed for the spaces.
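
As a concrete sketch (my own example): for T : R³ → R² given in the standard bases by a matrix A, and bases B_V, B_W stored as matrix columns, the i-th column of the representing matrix is [T(vi)]_{B_W}, and the resulting matrix satisfies [T(v)]_{B_W} = [T]^{B_V}_{B_W} [v]_{B_V}.

    # Sketch: representing matrix of T: R^3 -> R^2 in non-standard bases.
    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])              # T in the standard bases

    BV = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 1.0]])             # columns: basis of V = R^3
    BW = np.array([[1.0, 1.0],
                   [0.0, 2.0]])                  # columns: basis of W = R^2

    # column i of the representing matrix is [T(v_i)]_{B_W}
    T_rep = np.linalg.solve(BW, A @ BV)          # a 2 x 3 matrix

    rng = np.random.default_rng(4)
    for _ in range(100):
        v = rng.standard_normal(3)
        v_BV  = np.linalg.solve(BV, v)           # [v]_{B_V}
        Tv_BW = np.linalg.solve(BW, A @ v)       # [T(v)]_{B_W}
        assert np.allclose(T_rep @ v_BV, Tv_BW)
    print(T_rep)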

Linear Operators
Now consider the case where V is a vector space over the field F and T : V → V
is a linear transformation. This important special case has a name: T is said to
be a linear operator. In this case we need to fix only one basis for the space
V, say B, and we will be able to find the unique matrix representing T in this
basis. Instead of writing [T]^B_B we will write [T]_B or T_B for simplicity.
Three related questions arise.
1) How does T_B depend on the choice of the basis?
2) If we choose another basis B′, how are T_B and T_{B′} related? (They must
be related, because they represent the same linear transformation!)
3) Is there a preferable choice of basis for which T_B has the "simplest possible
form"?
We will answer all these questions by examining the relation between T_B and
T_{B′}. For every basis B we have

[T(v)]_B = T_B [v]_B.

Multiplying both sides of the equation by P^B_{B′} we obtain

[T(v)]_{B′} = P^B_{B′} T_B [v]_B.

Using the equalities [T(v)]_{B′} = [T]_{B′} [v]_{B′} and [v]_B = P^{B′}_B [v]_{B′} we obtain

[T]_{B′} [v]_{B′} = [T(v)]_{B′} = P^B_{B′} T_B [v]_B = (P^B_{B′} T_B P^{B′}_B) [v]_{B′}.

We therefore conclude that for every v ∈ V we have

[T]_{B′} [v]_{B′} = (P^B_{B′} T_B P^{B′}_B) [v]_{B′},

which implies

[T]_{B′} = P^B_{B′} T_B P^{B′}_B = (P^{B′}_B)⁻¹ T_B P^{B′}_B.

We have thus obtained the answer to the first and the second questions and
derived the exact relation. Since every invertible matrix can be thought of as a
transition matrix from one basis to another, we arrive at the following
IMPORTANT conclusion. Two square matrices A1, A2 of the same size
represent the same operator in two different bases if and only if there exists an
invertible matrix P such that

A2 = P⁻¹ A1 P.

This motivates the following definition.
Definition 3.3. Let A, B ∈ F^{n×n}. A and B are said to be similar if there
exists an invertible matrix P such that

B = P⁻¹ A P.

The observation that similar matrices represent the same operator suggests that
they have some common properties; those properties are properties of the
operator that they represent, which are invariant and independent of the choice
of the basis. This is indeed the case.
Proposition 3.3.1. Suppose that A, B are similar square matrices. Then:
1) r(A) = r(B),
2) tr(A) = tr(B),
3) det(A) = det(B),
where r denotes the rank.

Proof. 1) Idea of the proof: A and B represent the same operator T, so

dim(Im(T )) = r(A) = r(B).

2) Since tr(XY) = tr(YX) for any two matrices X, Y, and matrix multiplication is
associative, we have

tr(B) = tr((P −1 A)P ) = tr(P (P −1 A)) = tr((P −1 P )A) = tr(A).

3) Since for any two square matrices X, Y we have

det(XY) = det(X) det(Y) = det(Y) det(X) = det(YX),

we conclude that

det(B) = det((P −1 A)P ) = det((P P −1 )A) = det(A).
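A quick numerical check of the three invariants (the matrices A and P below are arbitrary; any invertible P gives a matrix similar to A):

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [3.0, 0.0, 2.0]])
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])   # invertible: upper triangular with ones on the diagonal

B = np.linalg.inv(P) @ A @ P      # B is similar to A

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # equal ranks
print(np.trace(A), np.trace(B))                             # equal traces (up to rounding)
print(np.linalg.det(A), np.linalg.det(B))                   # equal determinants (up to rounding)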

Finally we begin to answer the last question: how do we choose a basis B such
that T_B is as simple as possible? Suppose we have chosen some basis B; then
the matrix representing T in a basis B' will be T_{B'} = P^{-1} T_B P, where P is the
transition matrix from B' to B. So the question reduces to the following one:
given a matrix A, find an invertible matrix P such that P^{-1}AP is as "simple as
possible". The question is, what do we mean by as simple as possible? Clearly
the zero matrix is the simplest matrix. Can we choose P such that for a given
A we will have P^{-1}AP = 0? No, unless A = 0 to begin with. That is, it
will work only if A represents the operator that is identically zero. What is
the next simplest thing to try? What about a scalar matrix of the form λI? If
P^{-1}AP = λI then, again, since scalar matrices commute with every matrix,
we are led to the conclusion that in this case A = λI, which is again possible
if and only if A represents the operator which is simply multiplication by a
scalar. The simplest form that we can hope to achieve is therefore a diagonal
matrix

D = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}.
We therefore arrive at the diagonalization problem.
Given a matrix A find an invertible matrix P and a diagonal matrix D such
that
P −1 AP = D.
The first question that we should ask ourselves is whether it is always possible. Let us first
examine the necessary conditions for this. We need to find an invertible matrix
P and a diagonal matrix D such that

AP = P D.

Now let us write P = (v_1 | ... | v_n), where v_1, ..., v_n are the column vectors of P. Since
P is invertible, the vectors v_1, ..., v_n are linearly independent and therefore form
a basis for F^n. We therefore have the equation

A(v_1 | ... | v_n) = (v_1 | ... | v_n) diag(λ_1, ..., λ_n).

Now from the properties of matrix multiplication we conclude that the i-th column
of A(v_1 | ... | v_n) is Av_i, while the i-th column of (v_1 | ... | v_n) diag(λ_1, ..., λ_n) is λ_i v_i.
A necessary condition that must be satisfied by the i-th column vector v_i of the
matrix P is therefore

Av_i = λ_i v_i.

We see that A acts on the columns of P in a particularly simple way: A just
scales those vectors by some constant. This leads to the following definition.
Definition 3.4. Let A ∈ F^{n×n} be a square matrix. A nonzero vector v ∈ F^n is said to
be an eigenvector of A if there exists a constant λ ∈ F such that

Av = λv.

This λ is called an eigenvalue of the matrix A.


Using the terminology just introduced we see that we have actually proved the
following theorem which is basically a redefinition of diagonalization.
Theorem 3.4.1. A matrix A ∈ Fn×n is diagonalizable if and only if there exists
a basis for F^n that consists of eigenvectors of A.
The theorem above is just a reformulation of the definition, but it doesn’t say
how to find eigenvectors or eigenvalues of a matrix A, and it doesn't say whether they
exist at all! Consider the equation Av = λv, where v (which is supposed to serve as a basis
element) must be a nonzero vector. This equation is equivalent to
the equation
(λI − A)v = 0.

Thus λ is an eigenvalue of A if and only if the homogeneous equation above has a nonzero,
and hence nontrivial, solution, which can happen if and only if the matrix λI − A is singular,
which is true if and only if

det(λI − A) = 0.

If we were to compute this expression we would obtain a polynomial in λ. This


leads us to the following definition.
Definition 3.5. Let A be a square matrix, the polynomial

∆A (t) = det(tI − A),

is called the characteristic polynomial of A.
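For concreteness, here is a small symbolic sketch using sympy (the matrix is an arbitrary example): the characteristic polynomial is computed as det(tI − A), and its roots, with multiplicities, are exactly the eigenvalues.

import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 5]])

# Characteristic polynomial Delta_A(t) = det(t*I - A), computed in two ways.
direct = sp.expand(sp.det(t * sp.eye(3) - A))
via_charpoly = A.charpoly(t).as_expr()
print(direct)                                    # t**3 - 9*t**2 + 24*t - 20, i.e. (t-2)**2 * (t-5)
print(sp.expand(direct - via_charpoly) == 0)     # True: the two computations agree

# The eigenvalues are exactly the roots of the characteristic polynomial.
print(sp.roots(direct, t))                       # {2: 2, 5: 1}
print(A.eigenvals())                             # {2: 2, 5: 1}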


Using the definition above we arrive at the conclusion that λ is an eigenvalue of
the matrix A if and only if it is a root of ∆_A(t), the characteristic polynomial
of A. That is, finding the eigenvalues of A is equivalent to finding the roots of a
polynomial. We also have the following proposition.
Proposition 3.5.1. Let A be a square matrix, then to each eigenvalue of A
there corresponds at least one eigenvector of A.
Proof. Let λ be an eigenvalue of A. Then λ is a root of the characteristic poly-
nomial of A, that is, ∆_A(λ) = 0, which means that det(λI − A) = 0, which means
that the matrix λI − A is singular, which means that there exists v ≠ 0 that
solves the homogeneous equation (λI − A)v = 0. That is, Av = λv and thus v is an
eigenvector of A.
Here is another property common to similar matrices: similar matrices have the
same characteristic polynomial, and in particular the same eigenvalues. Thus the
eigenvalues of a linear operator are an intrinsic property of the operator,
independent of the basis we choose to represent it.

Proposition 3.5.2. Let A, B be similar matrices, then

∆A (t) = ∆B (t).

Proof. Since A and B are similar there exists an invertible matrix P such that
P −1 AP = B. We therefore compute

∆B (t) = det(tI − B) = det(tP −1 P − P −1 AP ) =

= det(P −1 (tI − A)P ) = det(tI − A) = ∆A (t).

The problem that we may encounter is that the characteristic polynomial of a ma-
trix with entries in F may have no root in F. For example, the characteristic
polynomial of the matrix

A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}

is

∆_A(t) = det(tI − A) = \det\begin{pmatrix} t & 1 \\ -1 & t \end{pmatrix} = t^2 + 1,

which has no roots in the field R. So the simple answer is that the matrix A
is not diagonalizable over R.
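This is easy to see numerically as well (a small sketch; numpy computes eigenvalues over C): the eigenvalues of this rotation matrix are ±i, so there is no real eigenvalue and hence no real eigenvector.

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])             # rotation by 90 degrees

eigvals = np.linalg.eigvals(A)
print(eigvals)                           # approximately [0.+1.j, 0.-1.j]
print(np.all(np.abs(eigvals.imag) > 0))  # True: no real eigenvalues, so A is not diagonalizable over R

Note that over C the two eigenvalues ±i are distinct, so, as we will see below, the same matrix is diagonalizable as a complex matrix.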
This may be a disappointing answer; however, if we really don't want to encounter
this problem when we try to diagonalize a matrix, the way out is to assume that we are
working over a field in which every nonconstant polynomial has a root. In such a field the
division algorithm for polynomials implies that a polynomial of degree n has
exactly n roots when they are counted with their multiplicity (that is, the roots
are not necessarily all distinct!).
Definition 3.6. A field F is said to be algebraically closed if every nonconstant polynomial
with coefficients in F has a root in F.
If we wish, we may consider every matrix with entries in R as a matrix over C. In
that case the fundamental theorem of algebra guarantees that we will not encounter
the problem where the characteristic polynomial doesn't have a root.
Theorem 3.6.1. (Fundamental Theorem of Algebra)
The field of the complex numbers C is algebraically closed.
The proof of this theorem is beyond the scope of a standard undergraduate-level
first linear algebra course. As a remark, we mention (without proof) the analogous fact
for the case where the field F is not R or C: we will not even formulate it as a
theorem and just say that every field F can be extended to an algebraically closed
field.
From now on we will assume that all the matrices we consider are over C,
which is algebraically closed. Thus for each n × n matrix A there exist exactly n
eigenvalues when counted with multiplicity; that is, not all eigenvalues are nec-
essarily distinct! As we will see, this is the main obstacle to the diagonalizability
of a matrix. To see this, consider the following observation: let A be a matrix
and v an eigenvector of A that corresponds to the eigenvalue λ. Then for every
k ∈ N we have

A^k v = λ^k v.

This means that for every polynomial p we can define the matrix p(A) (substitute A
for the variable and multiply the constant term by I), and it is easily seen that:
Proposition 3.6.2. For every polynomial p and every eigenvector v of A that
corresponds to an eigenvalue λ we have

p(A)v = p(λ)v.
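A brief numerical sketch of this proposition (the matrix, the eigenpair and the polynomial below are arbitrary choices):

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

lam = 3.0
v = np.array([1.0, 1.0])                 # A @ v = [3, 3] = 3*v, so (lam, v) is an eigenpair
assert np.allclose(A @ v, lam * v)

# p(x) = x^3 - 4x + 7, applied to the matrix A (the constant term becomes 7*I).
pA = A @ A @ A - 4 * A + 7 * np.eye(2)
p_lam = lam**3 - 4 * lam + 7

print(np.allclose(pA @ v, p_lam * v))    # expected: True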

With this observation we can now produce a BEAUTIFUL proof of the following
theorem.
Theorem 3.6.3. Let F be a field and let A ∈ Fn×n be a matrix. Then eigenvec-
tors of A that correspond to distinct eigenvalues of A are linearly independent.
Proof. Let v_1, ..., v_k be eigenvectors of A that correspond to distinct eigenvalues
λ_1, ..., λ_k, with λ_i ≠ λ_j if i ≠ j. Define the Lagrange polynomials

p_i(x) = \prod_{j=1,\ j\neq i}^{k} \frac{x - \lambda_j}{\lambda_i - \lambda_j}.

Since all the eigenvalues are distinct, these polynomials are well defined. Moreover,
for each l ≠ i we have

p_i(λ_l) = 0

and

p_i(λ_i) = 1.

Now suppose that

\sum_{l=1}^{k} \alpha_l v_l = 0;

then applying the matrix p_i(A) to both sides we obtain

0 = p_i(A)\left(\sum_{l=1}^{k} \alpha_l v_l\right) = \sum_{l=1}^{k} \alpha_l\, p_i(A) v_l = \sum_{l=1}^{k} \alpha_l\, p_i(\lambda_l) v_l = \sum_{l=1}^{k} \alpha_l\, \delta_{il} v_l = \alpha_i v_i.

Since v_i ≠ 0, this gives α_i = 0. This holds for every i, which shows that the vectors v_1, ..., v_k
are linearly independent.
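To see the mechanism of this proof in action, here is a small sympy sketch (the matrix, with the three distinct eigenvalues 1, 2, 4, is an arbitrary example): applying p_1(A) to a linear combination of eigenvectors kills every term except the first one.

import sympy as sp

A = sp.Matrix([[1, 1, 0],
               [0, 2, 1],
               [0, 0, 4]])
I3 = sp.eye(3)
lams = [1, 2, 4]                                        # the eigenvalues (diagonal entries)

# Eigenvectors v_1, v_2, v_3 for the eigenvalues 1, 2, 4 respectively.
vecs = [(A - l * I3).nullspace()[0] for l in lams]

# Lagrange polynomial p_1 for the eigenvalue 1, evaluated at the matrix A:
# p_1(x) = (x - 2)(x - 4) / ((1 - 2)(1 - 4)).
p1_of_A = (A - 2 * I3) * (A - 4 * I3) / ((1 - 2) * (1 - 4))

# Apply p_1(A) to a combination of the eigenvectors: only the v_1 term survives.
combo = 5 * vecs[0] - 2 * vecs[1] + 7 * vecs[2]
print(p1_of_A * combo == 5 * vecs[0])                   # expected: True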
Corollary 3.6.4. Let F be a field and let A ∈ Fn×n be a matrix that has n
distinct eigenvalues, then A is diagonalizable.
Proof. For each eigenvalue λ_i of A there corresponds at least one eigenvector
of A. Since eigenvectors that correspond to distinct eigenvalues are linearly
independent, this means that we have n linearly independent vectors. Since
dim(F^n) = n, these eigenvectors of A form a basis for F^n and thus A is diagonalizable.
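A short sympy sketch of this corollary (the matrix is an arbitrary example with the three distinct eigenvalues 1, 2, 3): sympy's diagonalize returns a pair (P, D) with P^{-1} A P = D.

import sympy as sp

A = sp.Matrix([[0, 1, 0],
               [0, 0, 1],
               [6, -11, 6]])                 # eigenvalues 1, 2, 3 (companion matrix of (t-1)(t-2)(t-3))

print(A.eigenvals())                         # {1: 1, 2: 1, 3: 1}: three distinct eigenvalues

P, D = A.diagonalize()                       # possible because the eigenvalues are distinct
print(D)                                     # a diagonal matrix with entries 1, 2, 3
print(P.inv() * A * P == D)                  # expected: True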

This corollary shows that over an algebraically closed field the only possible obstacle to
the diagonalizability of A is that not all of its eigenvalues are distinct. Let
∆A (t) be the characteristic polynomial of A and let λ1 , ..., λm be its distinct
eigenvalues. Then we can decompose ∆A (t) as

∆A (t) = (t − λ1 )k1 · ... · (t − λm )km

where
k1 + ... + km = n.

The integer k_i is said to be the algebraic multiplicity of the root λ_i.
Suppose that A ∈ F^{n×n} has only k < n distinct eigenvalues. For each eigenvalue λ_i
we have at least one nonzero solution (up to multiplication by a nonzero scalar) to
(λ_i I − A)v = 0. The problem is that it may not be enough. Let λ_i be an eigenvalue of A
and consider the space V_{λ_i} = Ker(λ_i I − A). This space is called the eigenspace
of λ_i. The geometric multiplicity of λ_i is defined by

µ_G(λ_i) = dim Ker(λ_i I − A) = dim V_{λ_i}.

If for each λ_i with algebraic multiplicity k_i we could find k_i linearly independent
solutions to (λ_i I − A)v = 0, we could conclude that A is diagonalizable. However,
this is not always the case, and there are matrices that cannot be diagonalized.
For example, consider the matrix

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.

Its characteristic polynomial is

∆_A(t) = det(tI − A) = \det\begin{pmatrix} t-1 & -1 \\ 0 & t-1 \end{pmatrix} = (t − 1)^2.

We see that λ = 1 is an eigenvalue of A with algebraic multiplicity 2, but its
geometric multiplicity is

\dim \mathrm{Ker}\begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix} = 1.

This means that the matrix A is not diagonalizable even over C, because we cannot
find enough eigenvectors of A to form a basis for C^2. This situation is typical,
as the following theorem shows.

Theorem 3.6.5. Let A be a matrix with eigenvalue λ. Then the geometric
multiplicity of λ is less than or equal to the algebraic multiplicity of λ.

Proof. Let {v_1, ..., v_k} be a basis for the eigenspace Ker(λI − A), and complete it to
a basis B = {v_1, ..., v_k, ..., v_n} of F^n. Then the matrix representing A (viewed as an
operator) in this basis has the block form

[A]_B = \begin{pmatrix} \lambda I_k & B_{k\times(n-k)} \\ 0_{(n-k)\times k} & C_{(n-k)\times(n-k)} \end{pmatrix}.

Since [A]_B is similar to A, they have the same characteristic polynomial, and expanding
along the block structure gives ∆_A(t) = (t − λ)^k det(tI_{n−k} − C). Hence the algebraic
multiplicity of λ is at least k; it may be larger, because λ may also be an eigenvalue
of C_{(n−k)×(n−k)}.
Corollary 3.6.6. Let F be an algebraically closed field and let A ∈ Fn×n , then A
is diagonalizable if and only if for every eigenvalue of A its algebraic multiplicity
is equal to its geometric multiplicity.
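A short sympy sketch contrasting the two multiplicities (using the 2 × 2 example from the text and, for comparison, an arbitrary diagonalizable matrix with a repeated eigenvalue):

import sympy as sp

def multiplicities(A):
    # {eigenvalue: (algebraic multiplicity, geometric multiplicity)}
    return {lam: (alg, len((lam * sp.eye(A.rows) - A).nullspace()))
            for lam, alg in A.eigenvals().items()}

J = sp.Matrix([[1, 1],
               [0, 1]])                      # the non-diagonalizable example above
B = sp.Matrix([[1, 0, 0],
               [0, 1, 0],
               [0, 0, 5]])                   # diagonalizable, with the eigenvalue 1 repeated

print(multiplicities(J), J.is_diagonalizable())   # {1: (2, 1)} False
print(multiplicities(B), B.is_diagonalizable())   # {1: (2, 2), 5: (1, 1)} True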

Equivalently, we have the following corollary:
Corollary 3.6.7. Let F be an algebraically closed field and let A ∈ Fn×n . Let
λ1 , ..., λk be the distinct eigenvalues of A. Then A is diagonalizable if and only
if the space Fn decomposes as the direct sum

Fn = Vλ1 ⊕ ... ⊕ Vλk .
