Panoramic View of Linear Algebra
1 Introduction
These lecture notes will help you prepare for an exam of a standard first linear
algebra course. If you are preparing for an exam and have a general knowledge
of the material, or want to refresh forgotten topics, these lecture notes are
perfect for you. This short survey will help you organize the material in a
hierarchical, logical order and obtain a deep understanding of it. In
this survey we skip the routine material regarding matrix multiplication, Gauss-
Jordan elimination, solutions to systems of equations, and so on. We begin with
the abstract notion of vector spaces.
2 Vector Spaces
A vector space is defined over a field, an algebraic structure with 11 axioms that
students find hard to remember and often don't even bother to. I would like
to show you a method to remember and understand these axioms. It is related
to an important concept in mathematics called a group; this concept
often falls outside the scope of an introductory course, but knowing it has
great benefits.
Definition 2.1. (Group)
A group is a nonempty set G endowed with a binary operation which we will
denote by · such that the following conditions are satisfied:
1) For all a, b ∈ G holds a · b ∈ G.
2) For all a, b, c ∈ G holds a · (b · c) = (a · b) · c.
3) There exists a special element e ∈ G such that for all a ∈ G holds a · e = e · a = a.
4) For every element a ∈ G there exists an element a⁻¹ ∈ G such that a · a⁻¹ = a⁻¹ · a = e.
Note that the definition does not require the operation to be commutative; for example,
it is in general not true that AB = BA for square matrices A, B. If the matrices are not square then
it could be that while AB is defined the product BA is not defined. In non-
commutative groups the binary operation is denoted by the multiplicative sign
·, and the identity element of the group is often denoted by 1. Groups that
possess the commutativity property have a special name: they are called abelian
groups, after the Norwegian mathematician Niels Henrik Abel.
Definition 2.2. A group G is said to be abelian if for every a, b ∈ G holds
ab = ba.
In abelian groups the binary operation is often denoted by +, and the identity
element is denoted by 0. A more rigorous definition of a group G would describe it
as a set G together with a function µ : G × G → G (the binary operation), a function
ι : G → G (the inversion), and a distinguished element e ∈ G satisfying the axioms above.
In this formulation the function µ is exactly the binary operation; we then agree to
write µ(a, b) as ab and ι(a) as a⁻¹, and we recover the previous definition.
If you think about it, the first function that we ever learn about, in the
first grade, is a function of two variables. That's right: the addition operation is
a function of two variables, it takes two numbers and assigns to them a third number,
their sum. When we want to explicitly specify a group, we need
to specify the set G, the binary operation, and the identity/neutral element.
This definition may seem a bit abstract and hard to grasp, but learning and un-
derstanding it is worth your time; groups are ubiquitous throughout mathematics,
physics, chemistry, and computer science.
Here are a few examples:
1) (Z, +, 0) is an abelian group.
2) (Q, +, 0) is an abelian group.
3) (Q \ {0}, ·, 1) is an abelian group.
4) Let S¹ = { z ∈ C : |z| = 1 }. Then (S¹, ·, 1) with the ordinary multiplication
of complex numbers is an abelian group.
5) Let X be a set and let S(X) = { f : X → X | f is a bijection }. Then (S(X), ◦, id),
with composition of functions as the operation, is a group that is in general not abelian.
6) Let GL(n, R) be the set of all invertible n × n matrices with real entries. Then
GL(n, R) with matrix multiplication is a group that is not abelian (for n ≥ 2).
We can now return to the definition of a field. A field has two operations, addition and
multiplication; under addition it is an abelian group, the nonzero elements form an abelian
group under multiplication, and the two operations are tied together by the distributive law:
for all a, b, c,
a · (b + c) = ab + ac.
Just like that we have easily stated all the 11 axioms of a field. Thus a shorter
definition of a field is:
Definition 2.4. Let F be a nonempty set endowed with two binary op-
erations, which will be denoted by ”+” and ”·” and which are called addition
and multiplication respectively, and two special distinct elements 0, 1 ∈ F such
that:
1) (F, +, 0) is an abelian group. (5 axioms)
2) (F \ {0}, ·, 1) is an abelian group. (5 axioms)
3) For all a, b, c ∈ F holds a · (b + c) = ab + ac.
Note how effortlessly we stated all the 11 axioms of a field in just 3 lines!
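For readers who like to experiment, here is a small Python sketch (an illustration of ours, not part of the course material) that brute-force checks the three parts of this definition for the finite field of integers modulo 5; the helper name is_abelian_group and the choice p = 5 are our own.

```python
from itertools import product

p = 5                        # any prime p makes Z/pZ a field
F = list(range(p))
add = lambda a, b: (a + b) % p
mul = lambda a, b: (a * b) % p

def is_abelian_group(elems, op, identity):
    """Brute-force check of closure, associativity, identity, inverses, commutativity."""
    closure = all(op(a, b) in elems for a, b in product(elems, elems))
    assoc   = all(op(a, op(b, c)) == op(op(a, b), c) for a, b, c in product(elems, repeat=3))
    ident   = all(op(a, identity) == a == op(identity, a) for a in elems)
    inverse = all(any(op(a, b) == identity for b in elems) for a in elems)
    commut  = all(op(a, b) == op(b, a) for a, b in product(elems, elems))
    return closure and assoc and ident and inverse and commut

print(is_abelian_group(F, add, 0))                          # (F, +, 0) is an abelian group
print(is_abelian_group([a for a in F if a != 0], mul, 1))   # (F \ {0}, ., 1) is an abelian group
print(all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))    # distributivity
          for a, b, c in product(F, repeat=3)))
```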
Vector Spaces
The first jump in the level of abstraction in a standard linear algebra course ap-
pears with the introduction of the notion of a vector space. After the concrete and simple
examples of matrices, matrix multiplication, and Gauss-Jordan elimination, the
transition to an abstract entity such as a vector space that is defined over an
arbitrary field can be quite hard. However this abstraction has significant ad-
vantages. When the notion of a vector space was defined, mathematicians had
in mind the concrete space R3, with which we are familiar and of which we have a good
intuitive understanding. We would therefore like to extract and understand the
essential properties of this space that make it so important in describing physics.
The advantage of this abstraction is that whenever we prove a theorem about
an abstract vector space, we prove it at once for infinitely many other
concrete spaces.
Let us first review the essential properties of R3. First of all we can add vectors,
and this addition satisfies lots of nice properties; in our new terminology we
can say that (R3, +, 0) is an abelian group. These turn out to be the first 5
axioms of an abstract vector space. We can also multiply a vector by scalars,
that is, by elements of R. The real numbers R are a field, and this product of vectors
by scalars satisfies the following nice properties:
1) For every α ∈ R and v ∈ R3 holds αv ∈ R3.
2) For every α, β ∈ R and v ∈ R3 holds α(βv) = (αβ)v.
Note that the product (αβ) is taken in the field and then applied to the vector v,
whereas on the left hand side we have only mixed products of scalars with a vector!
3) For all α ∈ R and v, w ∈ R3 holds α(v + w) = αv + αw.
4) For all v ∈ R3 holds 1 · v = v.
5) For all α, β ∈ R and v ∈ R3 holds (α + β)v = αv + βv.
The five properties mentioned above describe all the possible interactions between
the scalars in R and the vectors in R3. The scalars can be thought of as acting on
vectors by scaling them; this is an interaction between objects of different
types. Taking the first five conditions, which state that (R3, +, 0) is an abelian
group, together with the five properties just mentioned that describe the inter-
action between scalars and vectors, we obtain the ten axioms of a vector space.
Replacing R by an arbitrary field F and replacing R3 by an abelian group V we
obtain the definition of a vector space over a field F. To make the definition
more rigorous, the scalar multiplication should be defined as a function
µ : F × V → V that satisfies the required conditions; then for α ∈ F and v ∈ V
we agree to write µ(α, v) as αv.
We know that for every v ∈ R3 and the scalar 0 ∈ R holds 0 · v = 0. Does this hold in any
abstract vector space? This property doesn't appear in the list of axioms of a
vector space, so to see whether it holds or not we have to either prove or disprove
it based on the axioms of a vector space.
Proposition 2.5.1. Let V be a vector space over a field F. Then the following
hold:
1) For all v ∈ V holds 0v = 0.
2) For all v ∈ V holds −v = (−1) · v.
3) For all α ∈ F holds α0 = 0, where 0 ∈ V is the zero vector.
Note that in 0v = 0 the zero on the left hand side is the zero scalar 0 ∈ F, while
the zero on the right side is the zero vector. In the second identity, which seems
obvious, −v is actually the additive inverse of the element v ∈ V in the abelian
group (V, +, 0), while (−1)v is the product of the additive inverse of the unit
element of the field F with the vector v. The last property states that
every scalar in the field yields the zero vector when it is multiplied by the zero
vector.
Proof. Consider
0v = (0 + 0)v = 0v + 0v,
where the second equality follows from the axiom (α + β)v = αv + βv. Since 0v ∈ V it has an additive inverse in
the abelian group, denoted by −0v. Adding it to both sides we obtain
0 = 0v + (−0v) = (0v + 0v) + (−0v) = 0v.
For the second identity note that v + (−1)v = 1 · v + (−1)v = (1 + (−1))v = 0v = 0,
so (−1)v is the additive inverse of v, that is,
−v = (−1)v.
The proof of the third identity is similar, starting from α0 = α(0 + 0) = α0 + α0.
It can be quite tedious to verify for each space that it indeed satisfies all the
axioms of a vector space. However the situation is much simpler if the set under
consideration is a subset of a known vector space. We therefore introduce
the concept of a vector subspace.
Definition 2.6. (Vector subspace)
Let V be a vector space over a field F and let W ⊂ V with 0V ∈ W .
(The zero vector of V must belong to W , in particular W is not empty!)
We say that W is a subspace of V if W becomes a vector space when the scalar
multiplication operations and the addition that are defined on V are restricted
to W .
Note that the zero vector of V will automatically be the zero vector of W . From
now on we will write 0 to denote the zero vector of V that is also the zero of W .
The concept of a subspace is important not only because, when one tries to
understand a complicated structure, it helps to understand its substructures,
but also because it is much easier to check that a certain set is a subspace of
a known vector space. In this way proving that a given set is a vector space becomes much
easier, as the following theorem shows.
Theorem 2.6.1. Let V be a vector space and W ⊂ V.
If the following hold:
1) For all w1, w2 ∈ W we have w1 + w2 ∈ W,
2) 0 ∈ W,
3) For all α ∈ F and w ∈ W we have αw ∈ W,
then W is a subspace of V and in particular W is a vector space in its own
right.
Proof. The idea here is that most of the axioms regarding addition and multipli-
cation by a scalar that hold on all of V will continue to hold when restricted
to W. The closure with respect to addition holds by assumption (1). Since
the associativity and the commutativity of the addition hold on all of V, they
still hold when the addition is restricted to W. We also assume 0 ∈ W, which
implies that W is not an empty set and contains the additive identity. Now
let w ∈ W. By assumption (3), for all α ∈ F and w ∈ W we have αw ∈ W; in
particular, choosing α = −1 ∈ F we have (−1)w ∈ W. By proposition
2.5.1, (−1)w = −w, so −w ∈ W; thus for each element of W its additive inverse is
also in W. So far we have verified the first five axioms, which state that (W, +, 0)
is an abelian group. It is clear that properties (1)-(4) in definition 2.5, which hold
on all of V, continue to hold when restricted to W. The fact that for all
w ∈ W and α ∈ F we have αw ∈ W holds by assumption (3).
Thus we have verified all the vector space axioms for W, so W is a vector space
in its own right, and thus W is a subspace of V.
It is sometimes convenient to unify assumption (1) and (3) into a one single
assumption as the following corollary shows.
Corollary 2.6.2. Let V be a vector space and W ⊂ V.
If the following hold:
1) 0 ∈ W,
2) For all α, β ∈ F and w1, w2 ∈ W we have
αw1 + βw2 ∈ W,
then W is a subspace of V and in particular W is a vector space in its own
right.
Proof. From the previous theorem it follows that we only need to prove that:
1) For all w1, w2 ∈ W holds w1 + w2 ∈ W.
2) For all α ∈ F and w ∈ W holds αw ∈ W.
Since by assumption we have that for all α, β ∈ F and w1 , w2 ∈ W holds
αw1 + βw2 ∈ W we can choose in particular α = β = 1 to conclude that
(1) holds. Choosing w2 = 0 ∈ W we conclude that for all w1 ∈ W holds
αw1 ∈ W.
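As a computational illustration of corollary 2.6.2 (the particular sets, helper names and tolerance below are our own choices, not part of the course), the following NumPy sketch checks on random samples that the plane x + y + z = 0 in R3 contains the zero vector and is closed under linear combinations, while the shifted plane x + y + z = 1 already fails at the zero vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def member_W(v):   # W = {(x, y, z) : x + y + z = 0}, a plane through the origin
    return np.isclose(v.sum(), 0.0)

def member_A(v):   # A = {(x, y, z) : x + y + z = 1}, a shifted plane
    return np.isclose(v.sum(), 1.0)

def sample(c):     # a random point of the plane x + y + z = c
    x, y = rng.normal(size=2)
    return np.array([x, y, c - x - y])

def closed_under_combinations(member, c, trials=1000):
    """Randomized version of corollary 2.6.2: zero vector plus closure under alpha*w1 + beta*w2."""
    if not member(np.zeros(3)):
        return False
    for _ in range(trials):
        w1, w2 = sample(c), sample(c)
        alpha, beta = rng.normal(size=2)
        if not member(alpha * w1 + beta * w2):
            return False
    return True

print(closed_under_combinations(member_W, 0.0))  # True: W passes the subspace test
print(closed_under_combinations(member_A, 1.0))  # False: A fails already at the zero vector
```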
As an example of how to apply this corollary consider the following proposition.
Proposition 2.6.3. (Subspaces Intersection)
Let V be a vector space and let U, W be subspaces of V. Then U ∩ W is a subspace
of V.
Proof. Clearly 0 ∈ U ∩ W because 0 ∈ U and 0 ∈ W. Now let u, w ∈ U ∩ W ,
then u, w ∈ U and u, w ∈ W , since U, W are subspaces of V for all α, β ∈ F
holds αu + βw ∈ U and αu + βw ∈ W thus
αu + βw ∈ U ∩ W
which proves that U ∩ W is a subspace of V.
The next important concept in the linear algebra course is the span of a set of
vectors. The motivation comes from the following problem: given a vector space
V and vectors v1 , .., vk ∈ V what is the smallest subspace of V that contains the
vectors v1 , ..., vk ? If W is a subspace of V that contains v1 , .., vk then clearly
for all α1 , ..., αk ∈ F we have α1 v1 + ... + αk vk ∈ W. Such a weighted sum of
vectors is called a linear combination of v1 , ..., vk . This motivates the following
definition.
Definition 2.7. Let V be a vector space over F and let v1, ..., vk ∈ V. The space
spanned by v1, ..., vk is defined by
span {v1, ..., vk} = { α1v1 + ... + αkvk | α1, ..., αk ∈ F } .
Proposition 2.7.1. Let V be a vector space over F and let v1 , ..vk ∈ V. Then
span {v1 , .., vk } is the smallest subspace of V that contains the vectors v1 , ..., vk .
Proof. Let W = span {v1, ..., vk}; we will show that W is a subspace of V and
that any other subspace of V that contains v1, ..., vk contains W.
Since k ≥ 1, W is not empty, and choosing all the scalars to be zero we conclude
that 0 ∈ W. Let w1, w2 ∈ W; then there exist α1, ..., αk, β1, ..., βk ∈ F such that
w1 = α1v1 + ... + αkvk and w2 = β1v1 + ... + βkvk.
For any α, β ∈ F we then have
αw1 + βw2 = (αα1 + ββ1)v1 + ... + (ααk + ββk)vk ∈ W,
so by corollary 2.6.2 W is a subspace of V. Moreover each vi belongs to W (choose
αi = 1 and all other scalars zero). Finally, if U is any subspace of V that contains
v1, ..., vk, then U contains every linear combination of these vectors, so W ⊂ U.
Thus W is the smallest subspace of V containing v1, ..., vk.
Definition 2.8. Let V be a vector space. A set {v1, ..., vk} ⊂ V is said to be a
spanning set if
V = span {v1 , ..., vk } .
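Computationally, membership in a span can be tested by a rank comparison: v ∈ span {v1, ..., vk} exactly when appending v as an extra column does not increase the rank of the matrix whose columns are v1, ..., vk. Here is a short NumPy sketch of this criterion (the function name and the sample vectors are our own illustration).

```python
import numpy as np

def in_span(vectors, v, tol=1e-10):
    """True if v lies in span{vectors}: adding v as a column must not raise the rank."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(np.column_stack([A, v]), tol) == np.linalg.matrix_rank(A, tol)

v1, v2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(in_span([v1, v2], np.array([2.0, 3.0, 5.0])))   # True:  equals 2*v1 + 3*v2
print(in_span([v1, v2], np.array([0.0, 0.0, 1.0])))   # False: not a combination of v1, v2
```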
The situation in which an infinite set can be represented in some way by a finite set
is quite important. We have the following definition to describe this situation
for vector spaces.
Definition 2.9. Let V be a vector space over the field F, we say that V is
finitely generated if there exists a finite set of vectors v1, ..., vn such that V = span {v1, ..., vn}.
If we consider again the example of Rn we can choose {e1, ..., en} as a spanning
set for it. We notice another feature of this set: it is minimal in the following
sense: if we remove any vector from it we will no longer have a spanning
set. Whenever we have a spanning set we can add to it as many vectors from V
as we wish and we will still have a spanning set. So a spanning set can in general
contain redundant vectors that we can remove and still obtain a spanning set. For
example, if one of the vectors is a linear combination of the other vectors we can
discard it and still have a spanning set. This is a key observation. To formalize what
we mean by a set without ”redundancy” we have the following definition.
Definition 2.10. (Linear Dependence)
Let V be a vector space and let v1, ..., vk ∈ V. The vectors v1, ..., vk are said to
be linearly dependent if there exist α1, ..., αk ∈ F, not all of which are zero,
such that
α1v1 + ... + αkvk = 0.
For example, any list of vectors that contains the zero vector is linearly dependent, because
we can write
1 · 0 + 0v1 + ... + 0vk = 0,
so we have found scalars, not all of which are zero, such that the linear combina-
tion above is the zero vector. A set that is linearly dependent always contains
redundancy: a vector that can be removed from the set such
that the remaining vectors span the same subspace. This is formalized in
lemma 2.10.2 below.
Proposition 2.10.1. Let V be a vector space and let {v1 , ..., vk } be any spanning
set. Then for every v ∈ V the set {v, v1, ..., vk} is linearly dependent.
Proof. Since V = span {v1, ..., vk} and v ∈ V we can find scalars α1, ..., αk ∈ F
such that
v = α1v1 + ... + αkvk.
We can therefore write
v − α1v1 − ... − αkvk = 0.
The coefficient of v is nonzero (it equals 1), so not all the scalars are zero and thus
{v, v1, ..., vk} is linearly dependent.
Lemma 2.10.2. Let V be a vector space and let v1, ..., vk ∈ V, k > 1, be nonzero
linearly dependent vectors. Then there exists a vector vj, with j > 1, that is a
linear combination of its predecessors v1, ..., vj−1.
Proof. Since the vectors are linearly dependent, there are scalars α1, ..., αk, not all of
which are zero, such that
α1v1 + ... + αkvk = 0.
Let j be the largest index such that αj ≠ 0. If j were equal to 1 we would have α1v1 = 0
with α1 ≠ 0 and v1 ≠ 0, which is impossible; hence j > 1. Dividing by αj we obtain
vj = −(α1/αj)v1 − ... − (αj−1/αj)vj−1,
so vj is a linear combination of its predecessors.
The opposite notion is the following.
Definition 2.11. (Linear Independence)
Let V be a vector space. The vectors v1, ..., vk ∈ V are said to be linearly independent
if they are not linearly dependent, that is, if
α1v1 + ... + αkvk = 0
implies α1 = α2 = ... = αk = 0.
We can see that linear independence is equivalent to the requirement that the
zero vector can be written as a linear combination of the vectors in only one way,
namely the trivial linear combination in which all the coefficients are zero. Such a
uniqueness statement is in fact true for every vector in their span, as the next proposition shows.
Proposition 2.11.1. Let V be a vector space and let v1 , ..., vk ∈ V , be linearly
independent vectors. Then for every v ∈ span {v1 , ..., vk } there exists unique
scalars α1 , ..., αk ∈ F such that
v = α1v1 + ... + αkvk.
Proof. Since v ∈ span {v1, ..., vk} there exist α1, ..., αk ∈ F such that
v = α1v1 + ... + αkvk.
Suppose that also v = β1v1 + ... + βkvk for some β1, ..., βk ∈ F. Subtracting the two
representations we obtain
(α1 − β1)v1 + ... + (αk − βk)vk = 0,
and since v1, ..., vk are linearly independent this implies
αi − βi = 0
for every i.
We have now defined two concepts that are in a sense conflicting. On the one
hand, a spanning set tends to be big: by adding vectors to it the set
remains spanning, while removing elements from it may ruin the property. Intuitively
we could say that a spanning set wants to be as big as it can. On the other hand, if
we have a linearly independent set, removing elements from it keeps it linearly
independent, while adding elements may ruin it. The following theorem shows in what
sense linearly independent sets tend to be small while spanning sets tend to be big.
Theorem 2.11.2. Let V be a vector space. Let v1 , .., vm be any spanning set,
that is V = span {v1 , ..., vm } . Let {u1 , ..., uk } be any linearly independent set,
then k ≤ m. That is any spanning set has at least as many elements as any
linearly independent set.
Proof. We may assume that all the vectors v1 , .., vm are nonzero , otherwise we
can discard all the zero vectors and the remaining set of nonzero vectors will
also be a spanning set. Suppose that k > m; we will show that this leads to a
contradiction. Consider the set {u1, v1, ..., vm} of nonzero vectors; by proposition
2.10.1 this set is linearly dependent and therefore by lemma 2.10.2 there exists
a vector that is a linear combination of its predecessors. This vector is not u1,
because u1 has no predecessors; thus there exists an index 1 ≤ j1 ≤ m such that
vj1 is a linear combination of its predecessors. We can discard it and still obtain
a spanning set
{u1, v1, ..., v̂j1, ..., vm},
where the hat over vj1 indicates that this vector is omitted from the list. Suppose
inductively that after l steps we arrived at a set
{u1, ..., ul, v1, ..., v̂j1, ..., v̂jl, ..., vm}
in which we discarded l < m vectors from the list v1, ..., vm and obtained a
spanning set by adding the vectors u1, ..., ul. We will show that we can continue
the process. Since by the inductive hypothesis the set
{u1, ..., ul, v1, ..., v̂j1, ..., v̂jl, ..., vm}
is a spanning set, we can add to it the vector ul+1 to obtain a linearly dependent
set
{u1, ..., ul, ul+1, v1, ..., v̂j1, ..., v̂jl, ..., vm}
(for clarity of notation, the omitted vectors could be any vectors from the list
v1, ..., vm). Since the set is linearly dependent, by lemma 2.10.2 there exists a
vector that is a linear combination of its predecessors. This vector is not any
of the vectors u1, ..., ul+1, because this set is linearly independent. Thus there
exists 1 ≤ jl+1 ≤ m such that vjl+1 is a linear combination of its predecessors,
so we can discard it and obtain the spanning set
{u1, ..., ul, ul+1, v1, ..., v̂j1, ..., v̂jl, ..., v̂jl+1, ..., vm},
which completes the inductive step. Since we assumed k > m, after m steps we
will obtain the spanning set {u1, ..., um} where all the vectors v1, ..., vm have been
discarded. Since k > m we can add the vector um+1 to this spanning set and
conclude by proposition 2.10.1 that the set {u1, ..., um, um+1} is linearly dependent,
contradicting the assumption that the set {u1, ..., uk}, where k ≥ m + 1, is linearly
independent. This contradiction implies that k ≤ m, as desired.
The previous theorem shows that the largest possible linearly independent set
has no more elements than the smallest possible spanning set. Spanning
and linear independence are therefore somewhat conflicting constraints, and the question
arises whether both conditions can hold at the same time. What happens when
those opposing forces reach perfect balance and equilibrium? The answer is
that something wonderful happens at this point: we arrive at one of the most
important concepts in the course, the basis of a vector space.
Definition 2.12. (Vector Space Basis)
Let V be a vector space, a basis for V is a spanning linearly independent set.
We have defined the notion of a basis, but we still don't know whether we can always
find a basis for a given vector space. The following important theorem shows
that for finitely generated vector spaces a basis always exists.
Theorem 2.12.1. (Existence of Basis)
Let V be a finitely generated vector space, then there exists a basis for V .
Proof. Since V is finitely generated there exist finitely many nonzero vectors
v1, ..., vN ∈ V such that V = span {v1, ..., vN}. If v1, ..., vN are linearly inde-
pendent they form the desired basis and we are done; otherwise by lemma 2.10.2
there exists 1 < j ≤ N such that vj is a linear combination of its predecessors,
and we can discard it and still obtain a spanning set. If the obtained set is linearly
independent we are done; otherwise we continue the process, which terminates
after at most N − 1 steps because a set consisting of a single nonzero vector
is linearly independent. Once the process has terminated we have arrived at a
set of nonzero vectors in which no vector is a linear combination of its predecessors,
and by (the contrapositive of) lemma 2.10.2 such a set is linearly independent. Since
at each step we keep a spanning set, once the process has terminated we have arrived
at a spanning and linearly independent set, which is a basis.
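The proof above is constructive, and the same greedy procedure is easy to run numerically: walk through the vectors in order and keep a vector only if it is not a linear combination of the vectors already kept, i.e. only if it increases the rank. A NumPy sketch of this idea (our own illustration; the sample list is arbitrary):

```python
import numpy as np

def extract_basis(vectors, tol=1e-10):
    """Keep each vector only if it enlarges the span of the previously kept ones."""
    basis, rank = [], 0
    for v in vectors:
        candidate = np.column_stack(basis + [v]) if basis else np.atleast_2d(v).T
        new_rank = np.linalg.matrix_rank(candidate, tol)
        if new_rank > rank:          # v is not a linear combination of its predecessors
            basis.append(v)
            rank = new_rank
    return basis

spanning = [np.array([1.0, 0.0, 0.0]),
            np.array([2.0, 0.0, 0.0]),   # redundant: a multiple of the first vector
            np.array([0.0, 1.0, 0.0]),
            np.array([1.0, 1.0, 0.0])]   # redundant: the sum of the first and third
print(len(extract_basis(spanning)))      # 2, the dimension of the span of the list
```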
Definition 2.13. (Dimension)
Let V be a vector space, over the field F. The dimension of V is the number of
elements in a basis of V.
For example, according to this definition the dimension of Rn over R is n, because
the set of vectors {e1, ..., en} is a spanning and linearly independent set. In
addition, if we consider the vector space only as a set, then the field over which we
regard V as a vector space can affect the dimension. For example the dimension
of C as a vector space over C is 1, but if we consider C as a vector space over
R then its dimension is 2, and a basis for this space is {1, i}.
The question that arises is whether the notion of dimension is well defined, since for the
same space there are many possible choices of basis. If two bases could
have different numbers of elements, the definition
above would be meaningless and the notion of dimension would not be well defined. As the
next proposition shows, this never happens.
Proposition 2.13.1. (Dimension is Well Defined)
Let V be a finitely generated vector space over F, then any two bases of V have
exactly the same number of elements.
Proof. Since V is finitely generated there exists at least one basis for V. Sup-
pose that B1 = {u1, ..., um} and B2 = {v1, ..., vn} are two bases for V. Since
B1 is a linearly independent set and B2 is a spanning set, we conclude by theorem
2.11.2 that m ≤ n; reversing the roles of B1 and B2, by exactly the same
argument we conclude that n ≤ m, thus n = m.
We would like to emphasize again that a spanning set and a linearly independent
set are in a sense each other's opposites. We have seen that adding a vector to
a spanning set can only improve the situation, as the set remains spanning;
we can add as many vectors as we please. Removing a vector from a spanning
set, on the other hand, may result in a set that is no longer spanning, and thus can ruin the
property. In contrast, removing a vector from a linearly independent
set results in a linearly independent set (we may consider the empty set as
linearly independent), while adding a vector to a linearly independent set can result
in a linearly dependent set. However, as the next lemma shows, if the linearly
independent set is not a spanning set there is a way to add a vector to it and
still obtain a linearly independent set.
Lemma 2.13.2. Let V be a vector space over the field F and let {u1, ..., uk} be a
linearly independent set in V that does not span V. Then for every
v ∈ V \ span {u1, ..., uk} the set {u1, ..., uk, v} is a linearly independent set.
Proof. Suppose that for α1, ..., αk, β ∈ F we have
α1u1 + ... + αkuk + βv = 0.
If β ≠ 0 we could write
v = −(α1/β)u1 − ... − (αk/β)uk,
which implies that v ∈ span {u1, ..., uk}, contradicting the assumption. Thus
β = 0 and
α1u1 + ... + αkuk = 0.
Since u1, ..., uk are linearly independent this implies α1 = ... = αk = 0; together
with β = 0 this implies that {u1, ..., uk, v} is a linearly independent set.
Corollary 2.13.3. Let V be a vector space over the field F and let n = dim(V ),
then any linearly independent set with n elements is a basis for V.
Proof. Since V is finitely generated there exists a basis for V, say v1, ..., vn, so
n = dim(V). Let u1, ..., un be linearly independent vectors. Suppose that
u1, ..., un is not a basis for V; this means that these vectors do not span V. Thus
there exists un+1 ∈ V \ span {u1, ..., un}. By the previous lemma this implies that
{u1, ..., un, un+1} is linearly independent. Since {v1, ..., vn} is a basis, it is a
spanning set with n elements, and since {u1, ..., un, un+1} is linearly independent,
by theorem 2.11.2 we must have n + 1 ≤ n, which is a contradiction. This implies
that V \ span {u1, ..., un} = ∅, that is V = span {u1, ..., un}, and thus u1, ..., un is a basis for
V.
Theorem 2.13.4. (Completing To A Basis)
Let V be a finitely generated vector space over F, then every linearly independent
set can be completed to a basis.
Proof. Since V is finitely generated, by theorem 2.12.1 there exists a finite
basis for V, so V is finite dimensional. Let n = dim(V). Let u1, ..., uk ∈ V be
any collection of linearly independent vectors; by theorem 2.11.2, k ≤ n. If k = n then by
corollary 2.13.3 u1, ..., uk is already a basis. If k < n then the vectors u1, ..., uk do not
span V, since no set with fewer than n vectors can span V; so choosing any uk+1 ∈
V \ span {u1, ..., uk}, the set {u1, ..., uk, uk+1} is linearly independent by lemma 2.13.2. If
n = k + 1 we are done by corollary 2.13.3. Otherwise we continue the process; after
n − k steps it terminates and we obtain a linearly independent set {u1, ..., uk, uk+1, ..., un}
of size n, which by corollary 2.13.3 is a basis.
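The completion process can also be imitated numerically: starting from a linearly independent list in Rn, keep adjoining candidate vectors (here taken from the standard basis, which certainly spans) whenever they lie outside the current span, until n vectors are reached. A NumPy sketch under these assumptions:

```python
import numpy as np

def complete_to_basis(independent, tol=1e-10):
    """Extend a linearly independent list in R^n to a basis by adjoining standard basis vectors."""
    n = len(independent[0])
    basis = list(independent)
    for e in np.eye(n):                                         # candidates e1, ..., en
        if len(basis) == n:
            break
        extended = np.column_stack(basis + [e])
        if np.linalg.matrix_rank(extended, tol) > len(basis):   # e lies outside the current span
            basis.append(e)
    return basis

u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0])
basis = complete_to_basis([u1, u2])
print(len(basis), np.linalg.matrix_rank(np.column_stack(basis)))   # 3 3 -> a basis of R^3
```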
Sums of Spaces
Let V be a finite dimensional vector space and let U, W be subspaces of V. The
union of U and W will not be a subspace of V unless U ⊂ W or W ⊂ U; the
reason is that otherwise this set is not closed under addition. Indeed, if
one space is contained in the other, their union is the bigger subspace, which is
a subspace in its own right. Conversely, since U ∪ W is a subset of V, it would suffice to show
that it is closed under addition, because it clearly contains the zero vector and is
closed under multiplication by a scalar: if v ∈ U ∪ W then either v ∈ U or
v ∈ W, and since U, W are subspaces, αv ∈ U or αv ∈ W, thus αv ∈ U ∪ W. Now
we will show that if U ⊄ W and U ∪ W is closed under addition then W ⊂ U. In
this case U \ W ≠ ∅, so choose any u ∈ U \ W. Now choose any w ∈ W, and note
that u + w ∉ W: if there were w′ ∈ W such that u + w = w′, then, since W is
a vector space, this would imply that u = w′ − w ∈ W, so u ∈ W, contradicting
the assumption that u ∈ U \ W. Thus for every w ∈ W, u + w ∉ W. Hence if
U ∪ W is closed under addition then for every w ∈ W we must have u + w ∈ U ∪ W,
which means that u + w ∈ U. Thus for every w ∈ W there exists a vector
u′ ∈ U such that u + w = u′, so w = u′ − u ∈ U; therefore w ∈ U for every w ∈ W,
which means that W ⊂ U. This situation, in which the union of two subspaces is
not a subspace, is illustrated in the following example. Consider the vector space
R2 and its two subspaces, the x-axis U = R × {0} = { (x, 0) | x ∈ R } and
the y-axis W = {0} × R = { (0, y) | y ∈ R }. The union
U ∪ W = { (x, y) ∈ R2 | x = 0 or y = 0 }
is not a subspace of R2 because (1, 0), (0, 1) ∈ U ∪ W but
(1, 0) + (0, 1) = (1, 1) ∉ U ∪ W. As we showed above, this always happens unless
W ⊂ U or U ⊂ W. The question that arises is: what is the smallest subspace of
V that contains U ∪ W? Since such a subspace is closed under scalar multiplication and
addition and contains the zero vector, a necessary condition is that it
must contain all the sums of vectors from U with vectors from W. We will show
that this condition is also sufficient.
Let us therefore define the following subset of V,
U + W := { u + w | u ∈ U, w ∈ W }.
Clearly 0 = 0 + 0 ∈ U + W. If v1 = u1 + w1 and v2 = u2 + w2 are two elements of U + W,
with u1, u2 ∈ U and w1, w2 ∈ W, then
v1 + v2 = (u1 + u2) + (w1 + w2) ∈ U + W,
since u1 + u2 ∈ U and w1 + w2 ∈ W; similarly αv1 = αu1 + αw1 ∈ U + W for every α ∈ F.
Thus U + W is a subspace of V, and since every subspace containing U ∪ W must contain
all such sums, it is the smallest subspace of V that contains U ∪ W.
The main result about sums of subspaces is the dimension formula:
dim(U + W) = dim(U) + dim(W) − dim(U ∩ W).
To prove it, let {v1, ..., vr} be a basis for U ∩ W. Complete it to a basis
{v1, ..., vr, ur+1, ..., un} of U and to a basis {v1, ..., vr, wr+1, ..., wm} of W, so that
dim(U ∩ W) = r, dim(U) = n and dim(W) = m. We claim that the set
{v1, ..., vr, ur+1, ..., un, wr+1, ..., wm}
is a basis for U + W, that is, it is both spanning and linearly independent.
Spanning: we need to show that every v ∈ U + W can be written as a linear
combination of those vectors. Since v ∈ U + W there exist u ∈ U and w ∈ W
such that v = u + w. Since we have a basis for U and a basis for W we can write
u = α1v1 + ... + αrvr + βr+1ur+1 + ... + βnun and w = γ1v1 + ... + γrvr + γr+1wr+1 + ... + γmwm,
so v = u + w is a linear combination of the vectors v1, ..., vr, ur+1, ..., un, wr+1, ..., wm.
This shows that the set is spanning. Now we need to show linear independence.
Indeed, suppose that
α1v1 + ... + αrvr + βr+1ur+1 + ... + βnun + γr+1wr+1 + ... + γmwm = 0,
that is,
α1v1 + ... + αrvr + βr+1ur+1 + ... + βnun = −γr+1wr+1 − ... − γmwm.
Since the left hand side vector equals the right hand side vector, and the left
hand side is in U while the right hand side is in W, we conclude that the vector
−γr+1wr+1 − ... − γmwm belongs to U ∩ W.
This vector can therefore be written as a linear combination of the basis for
U ∩ W as
δ1v1 + ... + δrvr = −γr+1wr+1 − ... − γmwm,
which is equivalent to
δ1v1 + ... + δrvr + γr+1wr+1 + ... + γmwm = 0,
but since {v1, ..., vr, wr+1, ..., wm} is a basis for W it is in particular linearly
independent, and thus δ1 = ... = δr = γr+1 = ... = γm = 0. This implies that
α1v1 + ... + αrvr + βr+1ur+1 + ... + βnun = 0,
and since v1, ..., vr, ur+1, ..., un is a basis for U it is in particular linearly inde-
pendent, which implies α1 = ... = αr = βr+1 = ... = βn = 0. This proves that
the set v1, ..., vr, ur+1, ..., un, wr+1, ..., wm (the n vectors of the basis of U together
with the m − r additional vectors of the basis of W) spans U + W and is linearly
independent, which means that it is a basis for U + W containing n + m − r elements.
Recalling the definitions of r, n, m we obtain that
dim(U + W ) = dim(U ) + dim(W ) − dim(U ∩ W ).
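The dimension formula is easy to test numerically, since every dimension involved can be computed as a matrix rank. In the sketch below (our own example) U and W are built so that, by construction, dim(U) = 3, dim(W) = 2 and dim(U ∩ W) = 1, and the rank of the stacked columns gives dim(U + W).

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 5))                  # a random (almost surely invertible) change of coordinates of R^5
e = np.eye(5)

U = Q @ np.column_stack([e[0], e[1], e[2]])  # U = span of 3 vectors
W = Q @ np.column_stack([e[2], e[3]])        # W = span of 2 vectors, sharing one direction with U

dim_U = np.linalg.matrix_rank(U)                           # 3
dim_W = np.linalg.matrix_rank(W)                           # 2
dim_sum = np.linalg.matrix_rank(np.column_stack([U, W]))   # dim(U + W) = 4
print(dim_sum == dim_U + dim_W - 1)          # True, since dim(U ∩ W) = 1 by construction
```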
3 Linear Transformations
This is the most important part of the course. As you may know from experi-
ence, studying functions and their properties, for example functions from R to R, is
more important than studying the properties of R itself. The question is which
types of maps between vector spaces are the most natural to study. Each
vector space carries additional layers of structure on top of the underlying set:
there is addition of vectors and multiplication by scalars, so it is natural to
consider mappings between vector spaces that preserve this structure.
Definition 3.1. Let V, W be vector spaces over the field F. A linear transformation T from V
to W is a function T : V → W that satisfies, for all v1, v2, v ∈ V and all α ∈ F:
T(v1 + v2) = T(v1) + T(v2) and T(αv) = αT(v).
Note that the addition on the left hand side is the addition in V, while on the
right hand side it is the addition in W; the same is true for the scalar multiplication.
With every linear transformation T : V → W we associate two important sets, the kernel and the image:
Ker(T ) = { v ∈ V | T (v) = 0 } ⊂ V,
Im(T ) = { T (v) | v ∈ V } ⊂ W.
Proposition 3.1.1. Let V ,W be vector spaces over the field F and let T : V →
W be a linear transformation. Then Ker(T ) is a subspace of V and Im(T ) is
a subspace of W .
Proof. First we show that 0 ∈ Ker(T): indeed T(0) = T(0 + 0) = T(0) + T(0), and adding
−T(0) to both sides we obtain T(0) = 0, so 0 ∈ Ker(T). Now for all u, v ∈ Ker(T)
and all α, β ∈ F we have T(αu + βv) = αT(u) + βT(v) = 0, so αu + βv ∈ Ker(T), and by
corollary 2.6.2 Ker(T) is a subspace of V. Similarly 0 = T(0) ∈ Im(T), and if
w1, w2 ∈ Im(T) then w1 = T(v1) and w2 = T(v2) for some v1, v2 ∈ V, so for all α, β ∈ F
we have αw1 + βw2 = αT(v1) + βT(v2) = T(αv1 + βv2),
which proves that αw1 + βw2 ∈ Im(T ) and that Im(T ) is a subspace of W .
By definition we see that T : V → W is surjective if and only if Im(T ) = W.
What can we say about the kernel?
Proposition 3.1.2. Let V, W be vector spaces over the field F and let
T : V → W
be a linear transformation. Then T is one to one if and only if Ker(T) = {0}.
Proof. Suppose that T is one to one. Since T(0) = 0, for every v ∈ V
such that T(v) = 0 we must have v = 0, so Ker(T) = {0}. Conversely, suppose
Ker(T ) = {0} . Suppose that T (v1 ) = T (v2 ), this is equivalent to T (v1 −v2 ) = 0
which means v1 − v2 ∈ Ker(T ). Since Ker(T ) = {0} we have v1 − v2 = 0 or
v1 = v2 .
Both of the properties mentioned above are special cases of the following
important theorem.
Theorem 3.1.3. Let V, W be vector spaces over the field F, with V finite dimensional,
and let T : V → W be a linear transformation. Then
dim(V) = dim(Ker(T)) + dim(Im(T)).
Proof. Let us choose a basis for Ker(T ) denote it by v1 , ..., vk , and complete it
to a basis of V by vk+1 , ..., vn so that v1 , ..., vn is a basis for V . We will show
that the n − k vectors wk+1 = T (vk+1 ),...,wn = T (vn ) are a basis for Im(T ).
Then the result follows at once because dim(V ) = n, dim(Ker(T )) = k, and
dim(Im(T )) = n−k, which means that dim(Im(T ))+dim(Ker(T )) = dim(V ).
Note that for every w ∈ Im(T) there exists v ∈ V such that w = T(v). We can
write the vector v as a linear combination of the basis, v = α1v1 + ... + αnvn, so
T(v) = T(α1v1 + ... + αkvk) + T(αk+1vk+1 + ... + αnvn) = αk+1wk+1 + ... + αnwn,
where the first summand vanishes because α1v1 + ... + αkvk ∈ Ker(T). This shows that
wk+1, ..., wn span Im(T). For linear independence, suppose that
αk+1wk+1 + ... + αnwn = 0.
Then T(αk+1vk+1 + ... + αnvn) = 0, so αk+1vk+1 + ... + αnvn ∈ Ker(T) and can therefore
be written as a linear combination α1v1 + ... + αkvk of the basis of Ker(T),
or
α1 v1 + ... + αk vk − αk+1 vk+1 − ... − αn vn = 0,
since v1 , ..., vn is a basis for V it is a linearly independent set which means
α1 = ... = αk = ... = αn = 0.
In particular this implies that wk+1 , .., wn is linearly independent and is therefore
a basis for Im(T ).
An important special case is when dim(V) = dim(W) (in particular when W = V).
In this case T is one to one if and only if T is onto. Indeed, if T is one to one then
Ker(T) = {0}, so by the theorem dim(Im(T)) = dim(V) = dim(W).
Since Im(T) ⊂ W and the spaces have the same dimension we conclude that
Im(T) = W and thus T is onto. Similarly, if T is onto, then Im(T) = W,
so dim(Im(T)) = dim(W) = dim(V), which implies dim(Ker(T)) = 0. This
means that Ker(T) = {0} and therefore T is one to one.
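For a concrete m × n matrix A, viewed as the linear transformation x ↦ Ax from Rn to Rm, the theorem says that the dimension of the kernel (the null space) plus the dimension of the image (the column space) equals n. A quick NumPy check on a random matrix (the sizes and the tolerance are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 7
A = rng.normal(size=(m, n))            # a random matrix, viewed as T(x) = A x from R^7 to R^4

rank = np.linalg.matrix_rank(A)        # dim Im(T): the dimension of the column space

# A basis of Ker(T) from the SVD A = U diag(s) V^T: the last rows of V^T
_, s, Vt = np.linalg.svd(A)
r = np.count_nonzero(s > 1e-10)
null_basis = Vt[r:]                    # (n - r) vectors spanning the null space

print(np.allclose(A @ null_basis.T, 0))   # True: these vectors really lie in the kernel
print(rank + len(null_basis) == n)        # True: dim Im(T) + dim Ker(T) = dim(R^7)
```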
The next theorem shows that the values of a linear transformation on a basis
uniquely define the transformation on the entire space.
Theorem 3.1.5. (Existence and Uniqueness of Linear Transformations) Let V
and W be vector spaces over the field F. Let {v1, ..., vn} be any basis for V, and
let w1 , ..., wn ∈ W be any n vectors in W . Then there exists a unique linear
transformation T : V → W that satisfies T (vi ) = wi , for every i = 1, 2, ..., n.
Proof. Let v ∈ V and write v = α1v1 + ... + αnvn (by proposition 2.11.1 these scalars
are unique, so the following is well defined). Define T : V → W by
T(v) = α1w1 + ... + αnwn.
To see that T is linear, let u = α1v1 + ... + αnvn and u′ = β1v1 + ... + βnvn be two
vectors of V and α, β ∈ F. Then αu + βu′ = (αα1 + ββ1)v1 + ... + (ααn + ββn)vn, so
T(αu + βu′) = (αα1 + ββ1)w1 + ... + (ααn + ββn)wn = αT(u) + βT(u′).
Moreover T(vi) = wi for every i (take αi = 1 and all other coefficients zero). Finally,
if T̃ is another linear transformation that satisfies T̃(vi) = wi, then writing each v
uniquely as v = α1v1 + ... + αnvn, by linearity it follows that
T̃(v) = α1T̃(v1) + ... + αnT̃(vn) = α1w1 + ... + αnwn = T(v).
One of the main goals of algebra is to study and classify algebraic structures. We
would like to find a definition for the situation in which two vector spaces are essentially
the same, similar to the situation in which we say exactly the same thing
but in a different language. If we carry this analogy over to vector spaces
V, W, we need a mapping T : V → W that is one-to-one and onto; if in
addition this map is also linear then it also preserves the structure. That is, if
T : V → W is a linear bijective map then instead of calling a vector v we can
call it T(v), instead of calling the vector αv we can call it αT(v), and we can
call the sum v1 + v2 by T(v1) + T(v2). With this identification the vector spaces
V and W are essentially the same. Calling the vector v by the name T(v) we
see that V and W are the same up to renaming each vector v in V by the
name T(v), which is a vector in W. Thus when we have a bijective
linear map T : V → W we have two vector spaces that are completely identical
up to renaming of their vectors. This has a special name: we say that the spaces
V and W are isomorphic. In Greek iso means same and morph means shape.
Definition 3.2. Let V ,W be vector spaces over the field F, we say that V and
W are isomorphic if there exists a bijective linear transformation T : V → W ,
we say that T is an isomorphism.
Let V be a finite dimensional vector space over the field F. Once we fix a ba-
sis for V, say B = {v1, ..., vn}, by proposition 2.11.1 for every v ∈ V there exist unique
scalars α1, ..., αn ∈ F such that
v = α1v1 + ... + αnvn.
If we know the basis B, those unique scalars encode all the information about
v, and they allow us to recover v easily by forming the required linear
combination. Let us define
[v]_B = (α1, ..., αn)^T ∈ Fn.
This vector is called the coordinate vector of v ∈ V with respect to the basis B.
For every basis B of V there exists therefore a bijective map ϕ_B : V → Fn
defined by
ϕ_B(v) = [v]_B.
The map ϕ_B : V → Fn is linear, and since to each vector there corresponds
a unique coordinate vector in the basis B, this map is actually an isomorphism.
This is a very important observation: however complicated and abstract
the n dimensional vector space V may be, it is isomorphic to the simple concrete
vector space Fn. That is, those vector spaces are essentially the same. In fact we have
just proved the following classification theorem for all finite dimensional vector
spaces.
Theorem 3.2.1. Let V be an n dimensional vector space over the field F, then
V is isomorphic to Fn .
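Concretely, if the vectors of a basis B of Rn are arranged as the columns of an invertible matrix, then the coordinate vector [v]_B is obtained by solving a linear system, and the map v ↦ [v]_B is linear. A NumPy sketch with a basis of R3 of our own choosing:

```python
import numpy as np

B = np.array([[1.0, 1.0, 0.0],   # the basis vectors of B are the columns of this matrix
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

def coords(v):
    """phi_B(v) = [v]_B: the unique scalars alpha with B @ alpha = v."""
    return np.linalg.solve(B, v)

v, w = np.array([2.0, 3.0, 4.0]), np.array([1.0, -1.0, 0.0])
print(np.allclose(B @ coords(v), v))                                      # the coordinates reproduce v
print(np.allclose(coords(2 * v + 5 * w), 2 * coords(v) + 5 * coords(w)))  # phi_B is linear
```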
Remember the following picture: the vector space V sits on top,
the simpler vector space Fn is at the bottom, and there is a bijective map iden-
tifying the two. Such a picture arises often throughout mathematics. This is a
key observation because for all ”practical purposes” we can think of any vector
v ∈ V in any vector space (complicated and abstract as it may be) as a tuple
of n scalars, namely its components with respect to the chosen basis. What a
striking resemblance to the case of the well familiar space Rn !!!
Suppose now that B = {v1, ..., vn} and B′ = {u1, ..., un} are two bases of V. Given a
vector v ∈ V, write its coordinates in the two bases as [v]_B = (α1, ..., αn)^T and
[v]_{B′} = (β1, ..., βn)^T, and express each vector of B in terms of the basis B′ as
vi = ai1u1 + ... + ainun.
If we substitute the expression for each vi in terms of the uj's we get
v = Σ_{i=1}^{n} αi vi = Σ_{i=1}^{n} αi ( Σ_{j=1}^{n} aij uj ) = Σ_{j=1}^{n} ( Σ_{i=1}^{n} aij αi ) uj.
The last equality arises from rearranging the order of summation. In the last
expression we have v written as a linear combination of the uj's, and since
B′ is a basis this representation is unique, so we must have for each j:
βj = Σ_{i=1}^{n} aij αi.
Note that we sum over the row index i. Therefore, if we define the matrix
P_B^{B′} = (aij)^T, that is, (P_B^{B′})_{ji} = aij,
the transpose of the n × n matrix whose i-th row holds the coefficients of vi in the basis B′,
then the formula above reads, for every v ∈ V,
[v]_{B′} = P_B^{B′} [v]_B.
The matrix P_B^{B′} is called the transition matrix from the basis B to the basis B′: it
takes the coordinates of a vector v in the basis B to its coordinates in the basis B′.
Note that this notation for the transition matrix is convenient, because in the formula
[v]_{B′} = P_B^{B′} [v]_B it is as though the lower index B of the matrix and the index B of
the coordinate vector next to it cancel each other. Such a notational trick resembles the
familiar cancellation of fractions, and it makes it easier to avoid mistakes and to see when
certain expressions make sense. We have the following commutative diagram:
The space Fn is represented in the diagram as the coordinate space. Since the
diagram is commutative and ϕ_B, ϕ_{B′} are isomorphisms, we see that for every
x ∈ Fn we have
P_B^{B′} x = (ϕ_{B′} ◦ ϕ_B^{−1})(x).
Such a picture is important and common in the context of topological mani-
folds and algebraic manifolds. Even through elementary mathematics we can
experience the flavor of more advanced topics. Clearly the matrix P_B^{B′} must be
invertible, because by the same procedure we can also build the matrix P_{B′}^{B}.
Using the defining property of those matrices, for each v ∈ V let us compute
(P_{B′}^{B} P_B^{B′}) [v]_B = P_{B′}^{B} (P_B^{B′} [v]_B) = P_{B′}^{B} [v]_{B′} = [v]_B = I [v]_B.
Since for every x ∈ Fn there exists v ∈ V such that x = [v]_B, we conclude that for all
x ∈ Fn holds
(P_{B′}^{B} P_B^{B′}) x = I x,
and it follows that
P_{B′}^{B} P_B^{B′} = I,
and similarly
P_B^{B′} P_{B′}^{B} = I;
thus P_{B′}^{B} is an invertible matrix and
(P_{B′}^{B})^{−1} = P_B^{B′}.
This shows another advantage of the notation: the inverse of a transition matrix
is obtained by simply flipping the roles of B and B′. Moreover, every invertible matrix P can
be thought of as a transition matrix from one basis to another. For
example, if we choose the standard basis B = {e1, ..., en} of Fn and P is any
invertible matrix with entries in F, then P is the transition matrix P_{B′}^{B} from the
basis B′ = {u1, ..., un}, where ui = Σ_{j=1}^{n} Pji ej, to the standard basis B
(so that P_B^{B′} = P^{−1}). It can be easily
shown that if P is invertible then B′ is indeed a basis of Fn. The importance
of this observation will become apparent soon.
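Numerically, if the vectors of B and B′ are stored as the columns of matrices M_B and M_{B′} (so that M_B [v]_B = v), then the matrix taking B-coordinates to B′-coordinates is P_B^{B′} = M_{B′}^{-1} M_B. A small NumPy sketch with two bases of R2 of our own choosing:

```python
import numpy as np

M_B  = np.array([[1.0, 1.0],   # columns: the basis B = {(1, 0), (1, 1)}
                 [0.0, 1.0]])
M_Bp = np.array([[2.0, 0.0],   # columns: the basis B' = {(2, 1), (0, 1)}
                 [1.0, 1.0]])

P_B_to_Bp = np.linalg.solve(M_Bp, M_B)   # P_B^{B'} = M_{B'}^{-1} M_B
P_Bp_to_B = np.linalg.solve(M_B, M_Bp)   # P_{B'}^{B}

v = np.array([3.0, 5.0])
v_B  = np.linalg.solve(M_B, v)           # [v]_B
v_Bp = np.linalg.solve(M_Bp, v)          # [v]_{B'}

print(np.allclose(P_B_to_Bp @ v_B, v_Bp))              # [v]_{B'} = P_B^{B'} [v]_B
print(np.allclose(P_B_to_Bp @ P_Bp_to_B, np.eye(2)))   # the two transition matrices are inverse to each other
```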
Representing matrices are constructed in exactly the same spirit. Let T : V → W be a
linear transformation, let B_V = {v1, ..., vn} be a basis for V and B_W = {u1, ..., um}
a basis for W, and write the image of each basis vector of V in terms of the basis of W as
T(vi) = ai1u1 + ... + aimum. For v ∈ V let [v]_{B_V} = (α1, ..., αn)^T and
[T(v)]_{B_W} = (β1, ..., βm)^T. Since every vector can be written uniquely
as a linear combination of the basis elements, the same computation as for transition
matrices shows that for every j we have
βj = Σ_{i=1}^{n} aij αi,
so if we define the m × n matrix [T]_{B_V}^{B_W} = (aij)^T, we see that
[T(v)]_{B_W} = [T]_{B_V}^{B_W} [v]_{B_V}.
The matrix [T]_{B_V}^{B_W} is called the representing matrix of T with respect to the
bases B_V and B_W. This connection between a linear transformation and
a multiplication by a matrix is the most important concept that is
taught in the first undergraduate linear algebra course!!!
What we have proven shows that however complicated the vector spaces V, W over F may be,
once we fix bases B_V and B_W for V and W, the space V is isomorphic to Fn and W
is isomorphic to Fm, and we can think of any linear T : V → W as multiplication
by the m × n matrix [T]_{B_V}^{B_W}, as the following commutative diagram illustrates:
This diagram shows that for all practical purposes, once we fix bases B_V and
B_W for V and W, we can think of each vector in the spaces as its coordinate
vector, and of each linear transformation as its representing matrix in
those bases. We can perform all the computations we need in the ”simple” spaces Fn
and Fm, and then, if we need to, we can always recover the vectors in the original
vector spaces. The important lesson to be learned from the discussion above
is that every multiplication of a vector by a matrix is a linear transformation,
and every linear transformation between any two vector spaces can be uniquely
represented as a multiplication of a vector by a matrix, once bases are
fixed for the spaces.
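As an example in a space that is not Fn to begin with, take V = polynomials of degree ≤ 2 with basis B_V = {1, x, x²}, W = polynomials of degree ≤ 1 with basis B_W = {1, x}, and T the differentiation map p ↦ p′. From T(1) = 0, T(x) = 1, T(x²) = 2x the representing matrix can be written down column by column; the NumPy sketch below (our own illustration) verifies the identity [T(p)]_{B_W} = [T]_{B_V}^{B_W} [p]_{B_V} on a sample polynomial.

```python
import numpy as np

# Coordinates of T(1) = 0, T(x) = 1, T(x^2) = 2x in the basis B_W = {1, x} form the columns:
T_mat = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 2.0]])       # the 2 x 3 representing matrix [T]_{B_V}^{B_W}

def deriv_coords(p):
    """Differentiate p = a + b x + c x^2 given as coordinates (a, b, c) in B_V = {1, x, x^2}."""
    a, b, c = p
    return np.array([b, 2 * c])           # p' = b + 2c x, as coordinates in B_W = {1, x}

p = np.array([5.0, -3.0, 7.0])            # the polynomial 5 - 3x + 7x^2
print(np.allclose(T_mat @ p, deriv_coords(p)))   # True: [T(p)]_{B_W} = [T]_{B_V}^{B_W} [p]_{B_V}
```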
Linear Operators
Now consider the case where V is a vector space over the field F and T : V → V
is a linear transformation. This important special case has a name: T is said to
be a linear operator. In this case we need to fix only one basis for the space
V, say B, and we will be able to find the unique matrix representing T in this
basis. Instead of writing [T]_B^B we will write [T]_B or T_B for simplicity. Three
related questions arise.
1) How does T_B depend on the choice of the basis?
2) If we choose another basis B′, how are T_B and T_{B′} related? (They must be
related, because they represent the same linear transformation!)
3) Is there a preferable choice of basis, so that T_B has the ”simplest possible
form”?
We will answer all the questions by examining the relation between T_B and T_{B′}.
For every basis B we have
[T(v)]_B = T_B [v]_B.
Multiplying both sides of the equation by P_B^{B′} we obtain
[T(v)]_{B′} = P_B^{B′} T_B [v]_B.
Using the equalities [T(v)]_{B′} = [T]_{B′} [v]_{B′} and [v]_B = P_{B′}^{B} [v]_{B′} we obtain
[T]_{B′} [v]_{B′} = [T(v)]_{B′} = P_B^{B′} T_B [v]_B = (P_B^{B′} T_B P_{B′}^{B}) [v]_{B′},
which implies
[T]_{B′} = P_B^{B′} T_B P_{B′}^{B} = (P_{B′}^{B})^{−1} T_B P_{B′}^{B}.
We have thus answered the first and the second questions and
derived the exact relation. Since every invertible matrix can be thought of as a
transition matrix from one basis to another, we arrive at the following IMPORTANT conclu-
sion: two square matrices A1, A2 of the same dimension represent the same
operator in two different bases if and only if there exists an invertible matrix P
such that
A2 = P^{−1} A1 P.
This motivates the following definition.
Definition 3.3. Let A, B ∈ Fn×n. Then A and B are said to be similar if there
exists an invertible matrix P such that
B = P −1 AP.
The observation that similar matrices represent the same operator suggests that
they have some common properties; those properties are really properties
of the operator that they represent, invariant and independent of
the choice of the basis. This is indeed the case.
Proposition 3.3.1. Suppose that A, B are similar square matrices then :
1) r(A) = r(B)
2) tr(A) = tr(B)
3) det(A) = det(B).
Proof. Write B = P^{−1}AP.
1) Multiplying a matrix by an invertible matrix (on either side) does not change its rank,
so r(B) = r(P^{−1}AP) = r(A).
2) Since for every two matrices tr(AB) = tr(BA), and matrix multiplication is
associative, we have
tr(B) = tr(P^{−1}(AP)) = tr((AP)P^{−1}) = tr(A).
3) Since the determinant is multiplicative,
det(B) = det(P^{−1}) det(A) det(P) = det(A).
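A quick numerical sanity check of these three invariants, with a random matrix A and a random (almost surely invertible) P of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
P = rng.normal(size=(4, 4))        # almost surely invertible
B = np.linalg.inv(P) @ A @ P       # a matrix similar to A

print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # same rank
print(np.isclose(np.trace(A), np.trace(B)))                   # same trace
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))         # same determinant
```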
Finally we begin to answer the last question: how do we choose a basis B such
that T_B is as simple as possible? Suppose we have chosen some basis B; then
the matrix representing T in a basis B′ will be T_{B′} = P^{−1} T_B P, where P is the
transition matrix from B′ to B. So the question reduces to the following one:
given a matrix A, find an invertible matrix P such that P^{−1}AP is as ”simple as
possible”. The question is what we mean by as simple as possible. Clearly
the zero matrix is the simplest matrix. Can we choose P such that for a given
A we will have P^{−1}AP = 0? No, unless A = 0 to begin with; that is, it
will work only if A represents the operator that is identically zero. What is
the next simplest thing to try? What about a scalar matrix of the form λI? If
P^{−1}AP = λI then, again since scalar matrices commute with every matrix,
we are led to the conclusion that in this case A = λI, which is again possible
if and only if A represents the operator which is simply a multiplication by a
scalar. The simplest form that we can hope to achieve is therefore a diagonal
matrix
D = diag(λ1, ..., λn),
that is, a matrix with the scalars λ1, ..., λn on the diagonal and zeros elsewhere.
We therefore arrive at the diagonalization problem:
given a matrix A, find an invertible matrix P and a diagonal matrix D such
that
P^{−1}AP = D.
The first question that we should ask ourselves is whether this is always possible. Let us first
examine the necessary conditions. We need to find an invertible matrix
P and a diagonal matrix D such that
AP = PD.
Now let us write P = (v1 | ... | vn), where v1, ..., vn are the column vectors of P. Since
P is invertible, the vectors v1, ..., vn are linearly independent and therefore form
a basis for Fn. Comparing the i-th column on both sides of AP = PD, we obtain for
every i the equation
Avi = λi vi.
A nonzero vector v satisfying Av = λv for some scalar λ ∈ F is called an eigenvector of A,
and λ is called an eigenvalue of A; so A is diagonalizable precisely when there is a basis of
Fn consisting of eigenvectors of A. The equation Av = λv can be rewritten as the homogeneous
equation
(λI − A)v = 0.
Thus λ is an eigenvalue of A if and only if the homogeneous equation above has a nonzero,
and hence nontrivial, solution, which can happen if and only if the matrix λI − A is singular,
which is true if and only if
det(λI − A) = 0.
The polynomial ∆A(t) = det(tI − A) is called the characteristic polynomial of A, and its
roots are exactly the eigenvalues of A. Since similar matrices represent the same operator,
one expects them to have the same characteristic polynomial, and indeed, if A and B are
similar then
∆A(t) = ∆B(t).
Proof. Since A and B are similar there exists an invertible matrix P such that
P^{−1}AP = B. We therefore compute
∆B(t) = det(tI − B) = det(P^{−1}(tI − A)P) = det(P^{−1}) det(tI − A) det(P) = ∆A(t).
The problem that we may encounter is that the characteristic polynomial of a ma-
trix with coefficients in F may have no root in F. For example the characteristic
polynomial of the matrix
A = ( 0  −1
      1   0 )
is
∆A(t) = det(tI − A) = det( t   1
                           −1  t ) = t² + 1,
which has no roots in the field R. So the simple answer is that the matrix A
is not diagonalizable over R.
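Numerically one sees the same phenomenon: numpy.linalg.eig works over the complex numbers and returns the two roots ±i of t² + 1 for this rotation matrix, so A is diagonalizable over C even though it has no real eigenvalues. (The code below is our illustration.)

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                 # rotation by 90 degrees

eigvals, eigvecs = np.linalg.eig(A)         # computed over C
print(eigvals)                              # approximately [0.+1.j, 0.-1.j]

D = np.diag(eigvals)
P = eigvecs                                 # columns are eigenvectors of A
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True: A is diagonalizable over C
```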
This may be a disappointing answer. However, if we really don't want to encounter
this problem when we try to diagonalize a matrix, the remedy is to assume that we are
working over a field in which every non-constant polynomial has a root. In such a field the
division algorithm for polynomials implies that a polynomial of degree n has
exactly n roots when they are counted with their multiplicities (that is, the roots
are not necessarily all distinct!).
Definition 3.6. A field F is said to be algebraically closed if every non-constant polynomial
with coefficients in F has a root in F.
If we wish we may consider every matrix with entries in R as a matrix over C. In
that case the fundamental theorem of algebra guarantees that we will not encounter
the problem of a characteristic polynomial without a root.
Theorem 3.6.1. (Fundamental Theorem of Algebra)
The field of the complex numbers C is algebraically closed.
The proof of this theorem is beyond the scope of a standard undergraduate
first linear algebra course. As a remark, for the case where the field F is neither R
nor C we will not even formulate the relevant result as a theorem and just say that every
field F can be extended to an algebraically closed field.
From now on we will assume that all the matrices that we consider are over C,
which is algebraically closed; thus each n × n matrix A has exactly n
eigenvalues when counted with multiplicity, that is, not all eigenvalues are nec-
essarily distinct! As we will see, this is the main obstacle to the diagonalizability
of a matrix. To prove this, consider the following observation: let A be a matrix
and v an eigenvector of A that corresponds to the eigenvalue λ. Then for every
k ∈ N we have
A^k v = λ^k v.
Consequently, for every polynomial p we may form the matrix p(A) (substituting A
for the variable and cI for the constant term c), and it is easily seen that:
Proposition 3.6.2. For every polynomial p and every eigenvector v of A that
corresponds to an eigenvalue λ we have
p(A)v = p(λ)v.
With this observation we can now produce a BEAUTIFUL proof of the following
theorem.
Theorem 3.6.3. Let F be a field and let A ∈ Fn×n be a matrix. Then eigenvec-
tors of A that correspond to distinct eigenvalues of A are linearly independent.
Proof. Let v1, ..., vk be eigenvectors of A that correspond to distinct eigenvalues
λ1, ..., λk, with λi ≠ λj if i ≠ j. Define the Lagrange polynomials
pi(x) = ∏_{j=1, j≠i}^{k} (x − λj)/(λi − λj).
Since all the eigenvalues are distinct these polynomials are well defined. Moreover
for each l ≠ i we have
pi(λl) = 0
and
pi(λi) = 1.
Now suppose that
α1v1 + ... + αkvk = 0.
Applying the matrix pi(A) to both sides and using proposition 3.6.2 we obtain
0 = pi(A)(α1v1 + ... + αkvk) = α1pi(λ1)v1 + ... + αkpi(λk)vk = αi vi,
and since vi ≠ 0 this forces αi = 0. This means that for every i we have αi = 0, which shows
that the vectors v1, ..., vk are linearly independent.
Corollary 3.6.4. Let F be a field and let A ∈ Fn×n be a matrix with n distinct
eigenvalues. Then A is diagonalizable.
Proof. For each eigenvalue λi of A there corresponds at least one eigenvector
of A. Since eigenvectors that correspond to distinct eigenvalues are linearly
independent, this gives n linearly independent vectors. Since
dim(Fn) = n, these eigenvectors of A form a basis for Fn and thus A is diagonalizable.
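In NumPy this is exactly what numpy.linalg.eig produces: the columns of the returned matrix are eigenvectors, and when the eigenvalues are distinct these columns form the invertible matrix P of a diagonalization. A sketch with a 2 × 2 matrix of our own choosing:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                       # eigenvalues 2 and 3, distinct

eigvals, P = np.linalg.eig(A)                    # columns of P are eigenvectors of A
D = np.diag(eigvals)

print(eigvals)                                   # the eigenvalues 2 and 3 (in some order)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))  # True: P^{-1} A P is diagonal
```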
This corollary shows that over an algebraically closed field, the only possible obstacle to
the diagonalizability of A is that not all its eigenvalues are distinct (although repeated
eigenvalues do not by themselves rule out diagonalizability; the identity matrix is an extreme
example). Let ∆A(t) be the characteristic polynomial of A and let λ1, ..., λm be its distinct
eigenvalues. Then we can decompose ∆A(t) as
∆A(t) = (t − λ1)^{k1} (t − λ2)^{k2} · · · (t − λm)^{km},
where
k1 + ... + km = n.
The integer ki is said to be the algebraic multiplicity of the root λi.
Suppose that A ∈ Fn×n has only k < n distinct eigenvalues. For each eigenvalue
we have at least one (up to multiplication by nonzero scalar) solution to (λi I −
A)v = 0. The problem is that it may not be enough. Let λi be an eigenvalue of A
and consider the space Vλi = Ker(λi I − A). This space is called the eigenspace
of λi. The geometric multiplicity of λi is defined by
gi = dim(Vλi) = dim Ker(λiI − A).
Consider for example the matrix
A = ( 1  1
      0  1 ).
Its characteristic polynomial is (t − 1)², so its only eigenvalue is λ = 1 with algebraic
multiplicity 2, but Ker(I − A) is spanned by the single vector (1, 0)^T, so the geometric
multiplicity of λ = 1 is only 1. This means that the matrix A is not diagonalizable, even
over C, because we can't find enough eigenvectors of A to form a basis for C2. This situation
is typical; the general theorem (which we state here without proof) says that the geometric
multiplicity of an eigenvalue never exceeds its algebraic multiplicity, and that A is
diagonalizable if and only if for every eigenvalue of A the geometric multiplicity equals the
algebraic multiplicity. Equivalently we have the following:
Corollary 3.6.7. Let F be an algebraically closed field and let A ∈ Fn×n . Let
λ1 , ..., λk be the distinct eigenvalues of A. Then A is diagonalizable if and only
if the space Fn decomposes as the direct sum
Fn = Vλ1 ⊕ Vλ2 ⊕ · · · ⊕ Vλk.
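To close, here is a small NumPy sketch contrasting the two situations (the matrices are our own examples): for the matrix J below the eigenvalue 1 has algebraic multiplicity 2 but geometric multiplicity 1, computed as 2 − rank(1·I − J), so there is no basis of eigenvectors; the identity matrix also has a repeated eigenvalue but is diagonalizable, because its geometric multiplicity is 2 as well.

```python
import numpy as np

def geometric_multiplicity(A, lam, tol=1e-10):
    """dim Ker(lam*I - A) = n - rank(lam*I - A)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol)

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # algebraic multiplicity of the eigenvalue 1 is 2
I2 = np.eye(2)                   # algebraic multiplicity of the eigenvalue 1 is also 2

print(geometric_multiplicity(J, 1.0))    # 1  -> J is not diagonalizable, even over C
print(geometric_multiplicity(I2, 1.0))   # 2  -> the identity is (trivially) diagonalizable
```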