Introduction To Linear Algebra
Rita Fioresi
University of Bologna
Marta Morigi
University of Bologna
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Authorized translation from Italian language edition published by CEA – Casa Editrice Ambrosiana, A Division of
Zanichelli editore S.p.A.
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright
holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowl-
edged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe.
Typeset in LM Roman
by KnowledgeWorks Global Ltd.
Contents
Preface
Bibliography
Index
Preface
This textbook arises from the need to present the essential notions of linear algebra to
Physics, Engineering and Computer Science students. We strove to keep the abstraction
and rigor of this beautiful subject, and yet to give as much as possible the intuition
behind all of the mathematical concepts we introduce. Though we provide the full
proofs of all of our statements, we introduce each topic with many examples and
intuitive explanations to guide the students to a mature understanding.
This is not meant to be a comprehensive treatment of linear algebra but an
essential guide to its foundation and heart for those who want to understand the
basic concepts and the abstract mathematics behind the powerful tools it provides.
A short tour of our presentation goes as follows.
Chapters 1, 13 and 14 are independent of each other and of the rest of the
book. Chapter 1 and/or Chapter 13 can be effectively used as a motivational
introduction to linear algebra and vector spaces. Chapter 14 contains some further topics,
like the principle of induction and Euclid's algorithm, which are essential for
computer science students, but it can easily be omitted, being independent
of the rest of the book.
Chapters 2, 3, 4 and 5 introduce the basic notions concerning vector spaces and
linear maps, while Chapters 6, 7, 8 and 9 further develop the theory to reach the
question of eigenvalues and eigenvectors. A minimal course in linear algebra can end
after Chapter 6, or even better after Chapter 9. In the remaining Chapters 10, 11
and 12 we study scalar products, the Spectral Theorem and quadratic forms, very
important for physical and engineering applications.
(Chapter dependency diagram: Ch. 2, 4, 5 lead into Ch. 6, 7, 8, 9.)
CHAPTER 1
Introduction to Linear Systems
In this chapter, we discuss how to solve linear systems with real coefficients using a
method known as the Gaussian algorithm. Later on, we will also use this method to answer
other questions; at the same time, we will interpret linear systems as special cases of
a much deeper theory.
A linear equation in the unknowns x1, x2, . . . , xn is an equation of the form
$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b. \tag{1.1}$$
A linear system of m equations in the n unknowns x1, . . . , xn is a set of m linear equations:
$$\begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2 \\ \quad \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m \end{cases} \tag{1.2}$$
The numbers a11, . . . , a1n, . . . , am1, . . . , amn are called the system coefficients, while
b1, . . . , bm are called the known terms. If bi = 0 for every i = 1, . . . , m, the system
is said to be homogeneous. A solution of the linear system (1.2) is an n-tuple
(s1, s2, . . . , sn) of numbers that satisfies all the system equations. For example, (1, 2)
is a solution of the linear system
$$\begin{cases} x_1 + x_2 = 3 \\ x_1 - x_2 = -1 \end{cases}$$
In this book, we will deal exclusively with linear systems with real coefficients,
that is, systems of the form (1.2) in which all the coefficients aij of the unknowns
and all known terms bi are real numbers. The solutions that we will find, therefore,
will always be ordered n-tuples of real numbers.
Given a linear system, we aim at answering the following questions:
1. Does the system admit solutions, i.e. is it compatible?
2. If so, how many solutions does it admit and what are they?
In certain cases, it is particularly easy to answer these questions. Let us see some
examples.
$$\begin{cases} x_1 + x_2 = 3 \\ x_1 + x_2 = 1 \end{cases}$$
It is immediate to observe that the sum of two real numbers cannot be simultaneously
equal to 3 and 1. Thus, the system does not admit solutions. In other words, when
the conditions assigned by the two equations of the system are incompatible, then
the system does not have solutions.
Consider now the system
$$\begin{cases} x_1 + x_2 = 3 \\ x_2 = -1 \end{cases}$$
Substituting in the first equation the value of x2 obtained from the second one, we
get: x1 = 3 − x2 = 3 + 1 = 4. The system is therefore compatible and admits a unique
solution: (4, −1). In this example, two variables are assigned (the unknowns x1 and
x2 ), and two conditions are given (the two equations of the system). These conditions
are compatible, that is they are not contradictory, and are “independent” meaning
that they cannot be obtained one from the other. In summary:
Two real variables along with two compatible conditions give one and only one
solution.
Finally, consider the system
$$\begin{cases} x_1 + x_2 = 3 \\ 2x_1 + 2x_2 = 6. \end{cases}$$
Unlike what happened in the previous example, here the conditions given by the two
equations are not “independent”, in the sense that the second equation is obtained
by multiplying the first by 2. The two equations give the same relation between
the variables x1 and x2 . Then, solving the linear system means simply solving the
equation x1 + x2 = 3. This equation certainly has solutions: for example, we saw in
the previous example that (4, −1) is a solution, but also (1, 2) or (0, 3) are solutions.
Exactly how many solutions are there? And how can we find them? In this case,
we have two variables and one condition on them. This means that one variable is free
to vary in the set of real numbers, which are infinitely many. The equation allows
us to express one variable, say x2, as a function of the other variable x1. The solutions
are all expressible in the form (x1, 3 − x1). By this we mean that the variable x1
can take any real value, and that for the equation x1 + x2 = 3 to
be satisfied we must have x2 = 3 − x1. A more explicit, but obviously equivalent, way
to describe the solutions is {(t, 3 − t) ∣ t ∈ R}. Of course, we could decide to vary
the variable x2 and express x1 as a function of x2 . In that case, we would give the
solutions in the form (3 − x2 , x2 ), or equivalently we say that the set of solutions is:
{(3 − s, s)∣s ∈ R} . In summary:
Two real variables along with one condition give infinitely many solutions.
Definition 1.1.5 Two linear systems are called equivalent if they have the same
solutions.
For example, the system considered above,
$$\begin{cases} x_1 + x_2 = 3 \\ 2x_1 + 2x_2 = 6, \end{cases}$$
is equivalent to the system consisting of the single equation x1 + x2 = 3.
1.2 MATRICES
Given two natural numbers m, n, an m × n matrix with real coefficients is a table of
mn real numbers arranged in m rows and n columns. For example:
$$\begin{pmatrix} 5 & -6 & 0 \\ 4 & 3 & -1 \end{pmatrix}$$
is a 2 × 3 matrix.
$$\begin{pmatrix} 1 & 0 \\ 2/3 & 3 \end{pmatrix}$$
is a 2 × 2 matrix. The number located on the i-th row and the j-th column of a matrix is called its (i, j) entry. For example, in the matrix
$$A = \begin{pmatrix} 5 & -6 & 0 \\ 4 & 3 & -1 \end{pmatrix}$$
the (1, 3) entry is 0, while the (2, 2) entry is 3. Of course, two m × n matrices A and
B are equal if their entries coincide, that is, if the (i, j) entry of A coincides with the
(i, j) entry of B, for every i = 1, . . . , m and for every j = 1, . . . , n.
Given a generic m × n matrix, we can write it synthetically as A = (a_{ij}), where
i = 1, . . . , m is the row index and j = 1, . . . , n is the column index.
We now define the product rows by columns of an m × s matrix A = (a_{ij}) and an
s × n matrix B = (b_{ij}) (note that the number of columns of A must equal the number
of rows of B). The (i, j) entry of the product is obtained by multiplying the i-th row
of A by the j-th column of B:
$$c_{ij} = \begin{pmatrix} a_{i1} & a_{i2} & \dots & a_{is} \end{pmatrix} \begin{pmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{sj} \end{pmatrix} = a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{is} b_{sj}.$$
For example, if
$$A = \begin{pmatrix} 1 & 0 & 3 & -1 \\ 0 & -2 & 2 & 1 \\ 1 & 0 & -1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ -3 & 5 \\ 1 & 0 \\ 2 & -1 \end{pmatrix},$$
then
c12 = 1 ⋅ 1 + 0 ⋅ 5 + 3 ⋅ 0 + (−1) ⋅ (−1) = 2
c31 = 1 ⋅ 0 + 0 ⋅ (−3) + (−1) ⋅ 1 + 0 ⋅ 2 = −1.
At this point we define the product of A and B as
$$C = AB = (c_{ij})_{i=1,\dots,m;\ j=1,\dots,n}.$$
$$C = AB = \begin{pmatrix} 1 & 0 & 3 & -1 \\ 0 & -2 & 2 & 1 \\ 1 & 0 & -1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -3 & 5 \\ 1 & 0 \\ 2 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 10 & -11 \\ -1 & 1 \end{pmatrix}.$$
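To make the row-by-column rule concrete, here is a minimal sketch in Python (the function name and the plain list-of-lists representation are our own choices, not the book's notation); applied to the matrices A and B above, it reproduces the product C just computed.

```python
def mat_mul(A, B):
    """Row-by-column product: the (i, j) entry is a_i1*b_1j + ... + a_is*b_sj."""
    m, s, n = len(A), len(B), len(B[0])
    assert all(len(row) == s for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][h] * B[h][j] for h in range(s)) for j in range(n)]
            for i in range(m)]

A = [[1, 0, 3, -1],
     [0, -2, 2, 1],
     [1, 0, -1, 0]]
B = [[0, 1],
     [-3, 5],
     [1, 0],
     [2, -1]]

print(mat_mul(A, B))   # [[1, 2], [10, -11], [-1, 1]]
```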
The product rows by columns enjoys the following properties:
1. Associative, that is, (AB)C = A(BC), where A, B, C are matrices such that
the products that appear in the formula are defined.
2. Distributive, that is, A(B + C) = AB + AC, provided that the sum and product
operations that appear in the formula are defined.
Proof. The proof is a calculation and amounts to applying the definition. We show
only the associativity of the product. Consider A ∈ Mm,s (R), B ∈ Ms,r (R), C ∈
Mr,n (R). We observe that:
$$(AB)_{iu} = \sum_{h=1}^{s} a_{ih} b_{hu}, \qquad (BC)_{hj} = \sum_{u=1}^{r} b_{hu} c_{uj};$$
then
$$((AB)C)_{ij} = \sum_{u=1}^{r} (AB)_{iu} c_{uj} = \sum_{u=1}^{r} \Big( \sum_{h=1}^{s} a_{ih} b_{hu} \Big) c_{uj} = \sum_{u=1}^{r} \sum_{h=1}^{s} a_{ih} b_{hu} c_{uj} = \sum_{h=1}^{s} \sum_{u=1}^{r} a_{ih} b_{hu} c_{uj} = \sum_{h=1}^{s} a_{ih} \Big( \sum_{u=1}^{r} b_{hu} c_{uj} \Big) = \sum_{h=1}^{s} a_{ih} (BC)_{hj} = (A(BC))_{ij}.$$
Note that the product operation between matrices is not commutative. Even if
the product AB between two matrices A and B is defined, the product BA may not
be defined. For example if
$$A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$$
we have that
$$AB = \begin{pmatrix} 1 & -1 \\ 2 & -1 \\ -1 & 1 \end{pmatrix},$$
while BA is not defined. Similarly if
$$A = \begin{pmatrix} 1 & 2 \\ 0 & -3 \end{pmatrix}, \qquad B = \begin{pmatrix} -1 & 1 \\ 0 & 2 \end{pmatrix}$$
we have that
$$AB = \begin{pmatrix} -1 & 5 \\ 0 & -6 \end{pmatrix}, \qquad BA = \begin{pmatrix} -1 & -5 \\ 0 & -6 \end{pmatrix}.$$
The product rows by columns allows us to write the linear system (1.2) in a very compact way, namely as
$$A\,x = b,$$
where A = (a_{ij}) is the m × n matrix which has as entries the coefficients of the
unknowns,
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
is the column of the n unknowns, and
$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$$
is the column of the m known terms. The matrix A = (a_{ij}) is called the incomplete matrix
associated with the system, while the matrix (A∣b), obtained from A by adding the
column b as its last column, is called the complete matrix associated with the system.
Consider, for example, the linear system
$$\begin{cases} \sqrt{2}\, x_1 + 2 x_2 - x_3 = 2 \\ x_1 - x_3 = 1 \end{cases}$$
in the unknowns x1, x2, x3.
Then the incomplete matrix and the complete matrix associated with the system
are, respectively:
$$A = \begin{pmatrix} \sqrt{2} & 2 & -1 \\ 1 & 0 & -1 \end{pmatrix} \quad \text{and} \quad (A\mid b) = \begin{pmatrix} \sqrt{2} & 2 & -1 & 2 \\ 1 & 0 & -1 & 1 \end{pmatrix}.$$
Using matrices is simply a more convenient way to write and deal with linear
systems. Each row of the complete matrix associated with a linear system is equivalent
to an equation in which the unknowns are implied.
Definition 1.3.2 A matrix is said to be in row echelon form or staircase form if the
following conditions are met:
(a) rows consisting of zeros, if any, are found at the bottom of the matrix;
(b) the first nonzero element of each (nonzero) row is located to the right of the
first nonzero element of the previous row.
For example, the matrix
$$A = \begin{pmatrix} 1 & -1 & -1 & 2 & -4 \\ 0 & 0 & -1 & 3 & 5 \\ 0 & 0 & 0 & 1/3 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
is a row echelon matrix because it satisfies conditions (a) and (b) of Definition 1.3.2.
On the contrary, the matrix
$$B = \begin{pmatrix} 2 & -1 & -1 & 2 & -4 \\ 0 & 1 & -1 & 3 & 5 \\ 0 & 2 & 0 & 1 & 1/5 \end{pmatrix}$$
is not in such a form because the first nonzero element of the third row is not located
to the right of the first nonzero element of the second row (but below it).
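Conditions (a) and (b) of Definition 1.3.2 are easy to check mechanically. A minimal sketch in Python (the function name and tolerance are our own choices); the first test matrix is the row echelon matrix A above, the second is a small matrix whose third row violates condition (b).

```python
def is_row_echelon(M, eps=1e-12):
    """Check conditions (a) and (b) of Definition 1.3.2."""
    def leading(row):
        # column index of the first nonzero element, or None for a zero row
        return next((j for j, x in enumerate(row) if abs(x) > eps), None)
    seen_zero_row, prev = False, -1
    for row in M:
        lead = leading(row)
        if lead is None:
            seen_zero_row = True            # (a) zero rows must stay at the bottom
        elif seen_zero_row or lead <= prev:
            return False                    # (b) pivots must move strictly to the right
        else:
            prev = lead
    return True

print(is_row_echelon([[1, -1, -1, 2, -4],
                      [0, 0, -1, 3, 5],
                      [0, 0, 0, 1/3, 1],
                      [0, 0, 0, 0, 0]]))                      # True
print(is_row_echelon([[2, -1, -1], [0, 1, -1], [0, 2, 0]]))   # False
```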
Definition 1.3.4 Let A be a row echelon matrix (by rows). We call pivot of A the
first nonzero element of each nonzero row of A. We call row rank of A, denoted by
rr(A), the number of nonzero rows, or equivalently, the number of its pivots.
For example, the pivots of the row echelon matrix
$$A = \begin{pmatrix} 1 & -1 & -1 & 2 & -4 \\ 0 & 0 & -1 & 3 & 5 \\ 0 & 0 & 0 & 1/3 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
considered above are 1, −1 and 1/3, so rr(A) = 3. Note that, since each nonzero row of a row echelon m × n matrix A contains exactly one pivot and the pivots lie in distinct columns, we always have
rr(A) ≤ m, (1.3)
rr(A) ≤ n. (1.4)
Definition 1.3.7 The linear system Ax = b is called row echelon if the matrix A is
row echelon.
We will explain how to quickly solve a linear system whose matrix is in row
echelon form.
Example 1.3.8 Consider the linear system in the unknowns x1, x2, x3, x4 whose complete matrix is
$$(A\mid b) = \begin{pmatrix} 4 & 2 & 3 & 4 & 1 \\ 0 & 1 & -2 & 0 & 2 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix},$$
which is in row echelon form and has row rank 4. Obviously also the incomplete
matrix A is in row echelon form, and we note that it also has row rank 4. The fact
that the matrix A is in row echelon form means that if you choose any of the system
equations, there exists one unknown which appears in that equation but not in the
following ones. The linear system can therefore be easily solved starting from the
bottom and proceeding with successive replacements, that is, from the last equation
and going back to the first. From the fourth equation we get x4 = 1; replacing x4 = 1
in the third equation we get x3 = x4 = 1. Replacing x3 = 1 in the second equation we
get x2 = 2 + 2 = 4. Finally, replacing x2 = 4 and x3 = x4 = 1 in the first equation, we
obtain x1 = (1/4)(1 − 2x2 − 3x3 − 4x4) = (1/4)(1 − 8 − 3 − 4) = −7/2. The system therefore has
only one solution: (−7/2, 4, 1, 1).
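The back substitution just performed is easy to mechanize. A minimal sketch in Python (the function name and the list-of-lists representation of (A∣b) are our own choices), applied to the complete matrix of this example:

```python
def back_substitution(Ab):
    """Solve a square row echelon system from its complete matrix (A|b),
    assuming one pivot per row (unique solution), working from the bottom up."""
    n = len(Ab)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(Ab[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Ab[i][n] - s) / Ab[i][i]
    return x

Ab = [[4, 2, 3, 4, 1],
      [0, 1, -2, 0, 2],
      [0, 0, 1, -1, 0],
      [0, 0, 0, 1, 1]]
print(back_substitution(Ab))   # [-3.5, 4.0, 1.0, 1.0], i.e. (-7/2, 4, 1, 1)
```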
Example 1.3.9 Consider the linear system in the unknowns x1 , x2 , x3 , x4 obtained
from that of the previous example by deleting the last equation:
$$\begin{cases} 4x_1 + 2x_2 + 3x_3 + 4x_4 = 1 \\ x_2 - 2x_3 = 2 \\ x_3 - x_4 = 0. \end{cases}$$
The complete matrix associated with the system is:
$$(A\mid b) = \begin{pmatrix} 4 & 2 & 3 & 4 & 1 \\ 0 & 1 & -2 & 0 & 2 \\ 0 & 0 & 1 & -1 & 0 \end{pmatrix}.$$
It is in row echelon form and has row rank 3. The incomplete matrix A is in row echelon
form and also has row rank 3. Of course, the solution (−7/2, 4, 1, 1) found in the
previous example continues to be a solution of the system, so that the system is
certainly compatible. However, how many solutions does the system have? Also
in this case, we can proceed from the bottom with subsequent replacements because,
as before, for each equation there exists one unknown which appears in that equation
but not in the following ones. From the last equation, we get x3 = x4. Replacing
x3 = x4 in the second equation, we get x2 = 2 + 2x3 = 2 + 2x4. Replacing x2 and x3
in the first equation we obtain x1 = (1/4)(1 − 2x2 − 3x3 − 4x4) = (1/4)(1 − 4 − 4x4 − 3x4 − 4x4) = (1/4)(−3 − 11x4). The system therefore has infinitely many solutions of the form
((1/4)(−3 − 11x4), 2 + 2x4, x4, x4), where the variable x4 is allowed to vary in the set of
real numbers.
What we have illustrated in Examples 1.3.8, 1.3.9 is a general fact, and we have
the following proposition.
Proposition 1.3.10 Let Ax = b be a linear system of m equations in n unknowns, whose complete matrix (A∣b) is in row echelon form. Then:
(a) the system admits solutions if and only if rr(A) = rr(A∣b);
(b) if rr(A) = rr(A∣b) = n, the system admits exactly one solution;
(c) if rr(A) = rr(A∣b) = k < n, the system admits infinitely many solutions, which
depend on n − k free variables.
Proof. First we observe that by deleting the column b from the matrix (A∣b) we still
have a matrix in row echelon form, thus also the incomplete matrix A associated
with the system is a matrix in such a form. Also, by deleting the column b from
the matrix (A∣b), the number of pivots can decrease by at most one. More precisely,
this happens if and only if the matrix A has at least one zero row, say the i-th,
whose known term bi is different from 0. Going back to the corresponding equation, we
see that it reads 0 = bi with bi ≠ 0, which evidently cannot be
satisfied. So if rr(A) ≠ rr(A∣b), the system does not admit solutions. Now suppose
that rr(A) = rr(A∣b) = n. This means that the number of pivots, i.e. the number of
“steps” , coincides with the number of unknowns, so the system consists of exactly n
equations, the unknown x1 appears only in the first equation, x2 appears only in the
first two equations, x3 appears only in the first three and so on. In particular the last
equation of the system contains only the unknown xn and establishes its value. By
substituting this value in the next-to-last equation, we obtain the value of the variable
xn−1 and so on, proceeding by subsequent replacements from below as in Example
1.3.8, we get the solution of the system, which is unique. If, instead, rr(A) = rr(A∣b) =
k < n it is possible, by proceeding from the bottom with subsequent replacements,
to express the k variables corresponding to the pivots of nonzero rows as a function
of the other n − k variables, which remain free to vary in the set of real numbers. In
this way, we get infinitely many solutions.
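Proposition 1.3.10 translates directly into a small decision procedure. A minimal sketch in Python (function names and tolerance are ours; the complete matrix is assumed to be already in row echelon form):

```python
def row_rank(M, eps=1e-12):
    """Number of nonzero rows of a matrix in row echelon form."""
    return sum(1 for row in M if any(abs(x) > eps for x in row))

def classify(Ab, n):
    """Apply Proposition 1.3.10 to an echelon complete matrix (A|b) in n unknowns."""
    rk_A = row_rank([row[:n] for row in Ab])
    rk_Ab = row_rank(Ab)
    if rk_A != rk_Ab:
        return "no solutions"
    if rk_A == n:
        return "exactly one solution"
    return "infinitely many solutions, %d free variable(s)" % (n - rk_A)

# Complete matrix of Example 1.3.9: rank 3, four unknowns.
Ab = [[4, 2, 3, 4, 1],
      [0, 1, -2, 0, 2],
      [0, 0, 1, -1, 0]]
print(classify(Ab, 4))   # infinitely many solutions, 1 free variable(s)
```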
Consider, for example, the linear system in the unknowns x1, x2, x3, x4:
$$\begin{cases} x_1 - x_2 + x_3 - x_4 = 1 \\ x_3 + \frac{1}{2} x_4 = 0 \end{cases}$$
whose complete matrix is
$$(A\mid b) = \begin{pmatrix} 1 & -1 & 1 & -1 & 1 \\ 0 & 0 & 1 & 1/2 & 0 \end{pmatrix}.$$
We note that rr(A) = rr(A∣b) = 2 so, by Proposition 1.3.10 (a), the system has
solutions. Since the number of variables is 4 > 2 , by Proposition 1.3.10 (c), the
system admits infinitely many solutions. Basically, we have four variables and two
conditions on them, thus two variables remain free to vary in the set of real numbers,
and we can express two variables as a function of the other two. Proceeding with
subsequent substitutions from the bottom we have:
$$x_3 = -\frac{1}{2} x_4$$
$$x_1 = x_2 - x_3 + x_4 + 1 = x_2 + \frac{1}{2} x_4 + x_4 + 1 = x_2 + \frac{3}{2} x_4 + 1.$$
The infinitely many solutions are of the form (x2 + (3/2)x4 + 1, x2, −(1/2)x4, x4), with
x2, x4 ∈ R.
We observe that we could choose to express the variable x4 as depending on the
variable x3 and, for example, the variable x2 as depending on the variables x1 and
x3 (x2 = x1 + 3x3 − 1). In other words, the choice of the free variables is not forced.
However, we can always choose as free those variables corresponding to the columns
of the matrix A containing no pivots and express the unknowns corresponding to the
columns that contain the pivots as a function of the others. For example, in this case
the pivots, both equal to 1, are located on the first and third column of the matrix
A and, as a first choice, we have left the variables x2 and x4 to be free and we have
expressed x1 and x3 as a function of x2 and x4 .
For example, the linear systems
$$\begin{cases} x_1 - x_2 = 1 \\ x_1 + x_2 = 2 \end{cases} \qquad\qquad \begin{cases} x_1 - x_2 = 1 \\ 2x_2 = 1 \end{cases}$$
are equivalent. In fact, we can easily see that in both cases the solution is (3/2, 1/2). We
note that the first equation is the same in the two systems and that the second system
can be obtained by substituting the second equation with the difference between the
second equation itself and the first equation:
2nd equation → 2nd equation − 1st equation.
How can we switch from one system to another one equivalent to it? For example,
by performing the following operations:
(a) exchange of the i-th equation with the j-th equation;
(b) multiplication of the i-th equation by a real number α ≠ 0;
(c) substitution of the i-th equation with the sum of the i-th equation and the j-th
equation multiplied by a real number α. In summary:
i-th equation → i-th equation + α (j-th equation).
It is straightforward to verify that operations (a) and (b) do not alter the system
solutions. As for operation (c), it is enough to observe that it involves only the i-th
and the j-th equations of the system, and that this pair of equations imposes the same
conditions on the unknowns before and after the operation. In terms of the complete
matrix associated with the system:
• Operation (a) is equivalent to exchanging the i-th row with the j-th row.
• Operation (b) is equivalent to multiplying the i-th row by the real number α ≠ 0.
• Operation (c) is equivalent to replacing the i-th row of the complete matrix as-
sociated to the system, with the sum of the i-th row and the j-th row multiplied
by a real number α.
Let us see a little better what we mean. Let (a_{i1} . . . a_{in} b_i) and (a_{j1} . . . a_{jn} b_j)
be, respectively, the i-th and the j-th row of the matrix (A∣b). Adding to the i-th row
the j-th row multiplied by a number α means taking the sum
$$(a_{i1} + \alpha a_{j1} \quad a_{i2} + \alpha a_{j2} \quad \dots \quad a_{in} + \alpha a_{jn} \quad b_i + \alpha b_j).$$
Because of the importance of these operations, we give them a name. The elementary operations on the rows of a matrix are:
(a) exchanging the i-th row with the j-th row;
(b) multiplying the i-th row by a real number α ≠ 0;
(c) replacing the i-th row with the sum of the i-th row and the j-th row multiplied
by a real number α.
Observation 1.4.3 We observe that the elementary operation (c) does not require
that the number α is not zero. In fact, if α = 0 the operation (c) amounts to leaving
the i-th row unchanged.
Given any matrix A = (aij ) we can turn it into a row echelon matrix by elementary
operations on the rows of A. This process is known as Gaussian reduction, and the
algorithm that is used is called Gaussian algorithm and operates as follows:
1. If a11 = 0, exchange the first row of A with a row whose first element is
nonzero. We denote this nonzero element by a. If the first element of each row
of A is zero, consider the matrix obtained by deleting the first column
and start again.
2. Check all the rows except the first, one after the other. If the first element of a
row is zero, leave the row unchanged. If the first element of a row, say the i-th
(i > 1), is equal to b ≠ 0, replace the i-th row with the sum of the i-th row and
the first row multiplied by −b/a.
3. At this point all the elements of the first column, except possibly the first, are
zero. Consider the matrix that is obtained by deleting the first row and the first
column of the matrix and start again from step one.
Example 1.4.4 Consider
$$A = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 1 & 2 & 0 & 1 \\ 2 & -1 & 1 & 2 \end{pmatrix}.$$
Let us use the Gaussian algorithm to reduce A in row echelon form.
Since the entry in place (1, 1) is zero, we exchange the first with the second row,
obtaining the matrix:
$$\begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 2 & -1 & 1 & 2 \end{pmatrix}.$$
The first entry of the second row is zero, hence we leave this row unchanged. The
first element of the third row is 2, hence we substitute the third row with the sum of
the third row and the first one multiplied by −2. We thus obtain:
$$\begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 2 & -1 & 1 & 2 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & -5 & 1 & 0 \end{pmatrix}.$$
Every entry of the first column except for the first one is zero. We then consider the
matrix obtained by deleting the first row and first column:
$$\begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & -5 & 1 & 0 \end{pmatrix}$$
(the entries of the first row and of the first column no longer take part in the algorithm).
We apply again the Gaussian algorithm. The first entry of the first row is nonzero,
hence we leave the first row as it is. We substitute the second row with the sum of
the second row with the first multiplied by 5. We obtain:
$$\begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & -5 & 1 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -4 & 0 \end{pmatrix}.$$
We have thus obtained the row echelon matrix
$$B = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -4 & 0 \end{pmatrix}.$$
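The three steps of the Gaussian algorithm can be turned into a short program. A minimal sketch in Python (function name, tolerance and the copy-then-modify style are our own choices); applied to the matrix A of Example 1.4.4 it reproduces the row echelon form B obtained above.

```python
def gauss_reduce(M, eps=1e-12):
    """Reduce M (a list of rows) to row echelon form: swap to get a nonzero
    pivot, clear the entries below it, then repeat on the submatrix obtained
    by deleting the first row and column."""
    A = [row[:] for row in M]          # work on a copy
    rows, cols = len(A), len(A[0])
    r = c = 0                          # top-left corner of the current submatrix
    while r < rows and c < cols:
        # Step 1: look for a row (from r downwards) with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if abs(A[i][c]) > eps), None)
        if pivot is None:              # whole column is zero: delete the column
            c += 1
            continue
        A[r], A[pivot] = A[pivot], A[r]
        # Step 2: add -b/a times the pivot row to every lower row.
        a = A[r][c]
        for i in range(r + 1, rows):
            b = A[i][c]
            if abs(b) > eps:
                A[i] = [x + (-b / a) * y for x, y in zip(A[i], A[r])]
        # Step 3: start again on the submatrix (delete first row and column).
        r += 1
        c += 1
    return A

A = [[0, 1, -1, 0],
     [1, 2, 0, 1],
     [2, -1, 1, 2]]
for row in gauss_reduce(A):
    print(row)
# [1, 2, 0, 1]
# [0, 1, -1, 0]
# [0.0, 0.0, -4.0, 0.0]
```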
At this point, we are able to solve any linear system Ax = b. The complete matrix
associated with the system is (A∣b). Using the Gaussian algorithm we can reduce
(A∣b) to a row echelon matrix (A′∣b′). The linear system A′x = b′ is row echelon and
equivalent to the given one, so by Proposition 1.3.10 it admits either no solutions,
exactly one solution, or infinitely many solutions.
This means that there is no linear system with real coefficients with a finite number of
solutions greater than 1: when a linear system with real coefficients has 2 solutions,
then it has infinitely many.
Observation 1.4.5 In the Gaussian algorithm the operations are not “forced”. In
Example 1.4.4, for example, instead of exchanging the first with the second row, we
could exchange the first with the third row. In this way, completing the algorithm,
we would have obtained a different row echelon form of the matrix. From the point of
view of linear systems, this simply means that we get different row echelon systems,
but all equivalent to the initial system (and therefore equivalent to each other).
Example 1.4.6 We want to solve the following linear system of four equations in
five unknowns u, v, w, x, y:
$$\begin{cases} u + 2v + 3w + x + y = 4 \\ u + 2v + 3w + 2x + 3y = -2 \\ u + v + w + x + y = -2 \\ -3u - 5v - 7w - 4x - 5y = 0. \end{cases}$$
The complete matrix associated with the system is
$$(A\mid b) = \begin{pmatrix} 1 & 2 & 3 & 1 & 1 & 4 \\ 1 & 2 & 3 & 2 & 3 & -2 \\ 1 & 1 & 1 & 1 & 1 & -2 \\ -3 & -5 & -7 & -4 & -5 & 0 \end{pmatrix}.$$
We first reduce the matrix (A∣b) to a row echelon form using the Gaussian algorithm;
then we solve the linear system associated with the reduced matrix.
In this first example, we describe the steps of the Gaussian algorithm and at
the same time we describe the operations on the equations. The advantage of the
Gaussian algorithm is that we can forget the equations and the unknowns, focusing
only on matrices, so our present description is merely explanatory.
The entry (1, 1) is not zero, so we leave the first row unchanged. Then we perform
the following elementary row operations on (A∣b):
- 2nd row → 2nd row − 1st row;
- 3rd row → 3rd row − 1st row;
- 4th row → 4th row + 3 (1st row).
$$\begin{pmatrix} 1 & 2 & 3 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 2 & -6 \\ 0 & -1 & -2 & 0 & 0 & -6 \\ 0 & 1 & 2 & -1 & -2 & 12 \end{pmatrix} \iff \begin{cases} u + 2v + 3w + x + y = 4 \\ x + 2y = -6 \\ -v - 2w = -6 \\ v + 2w - x - 2y = 12 \end{cases}$$
Now we exchange the second row with the fourth row:
$$\begin{pmatrix} 1 & 2 & 3 & 1 & 1 & 4 \\ 0 & 1 & 2 & -1 & -2 & 12 \\ 0 & -1 & -2 & 0 & 0 & -6 \\ 0 & 0 & 0 & 1 & 2 & -6 \end{pmatrix} \iff \begin{cases} u + 2v + 3w + x + y = 4 \\ v + 2w - x - 2y = 12 \\ -v - 2w = -6 \\ x + 2y = -6 \end{cases}$$
Now we replace the third row with the sum of the third row and the second one:
$$\begin{pmatrix} 1 & 2 & 3 & 1 & 1 & 4 \\ 0 & 1 & 2 & -1 & -2 & 12 \\ 0 & 0 & 0 & -1 & -2 & 6 \\ 0 & 0 & 0 & 1 & 2 & -6 \end{pmatrix} \iff \begin{cases} u + 2v + 3w + x + y = 4 \\ v + 2w - x - 2y = 12 \\ -x - 2y = 6 \\ x + 2y = -6 \end{cases}$$
Finally, we substitute the fourth row with the sum of the fourth row and the third
one:
$$\begin{pmatrix} 1 & 2 & 3 & 1 & 1 & 4 \\ 0 & 1 & 2 & -1 & -2 & 12 \\ 0 & 0 & 0 & -1 & -2 & 6 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \iff \begin{cases} u + 2v + 3w + x + y = 4 \\ v + 2w - x - 2y = 12 \\ -x - 2y = 6 \\ 0 = 0 \end{cases}$$
The initial system is equivalent to the row echelon system that we have obtained, in
which the last equation has become an identity. The rank of the incomplete matrix
and the rank of the complete matrix of the row echelon matrix obtained coincide and
are equal to 3. The number of unknowns of the system is 5, then the system admits
infinitely many solutions that depend on 5 − 3 = 2 free variables. We solve the system
from the bottom using subsequent substitutions. Using the third equation we can
express the variable x as a function of y:
x = −2y − 6.
In the second equation, we replace x with its expression in terms of y and we obtain
v in terms of w:
v = 12 − 2w + x + 2y = 12 − 2w − 2y − 6 + 2y = −2w + 6.
Finally, in the first equation, we substitute x with its expression depending on y and v
with its expression depending on w, and we obtain u as a function of w and y:
u = 4 − 2v − 3w − x − y = 4 − 2(−2w + 6) − 3w + 2y + 6 − y = w + y − 2.
So the system has infinitely many solutions of the type (w + y − 2, −2w + 6, w, −2y −
6, y), that depend on two free variables, w, y ∈ R.
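As a quick sanity check (a sketch in Python; the helper name is our own), one can substitute the parametric solution back into the four original equations for a few values of the free variables w and y and verify that all residuals vanish:

```python
def residuals(w, y):
    """Plug (u, v, w, x, y) = (w + y - 2, -2w + 6, w, -2y - 6, y) into the system."""
    u, v, x = w + y - 2, -2 * w + 6, -2 * y - 6
    return [u + 2 * v + 3 * w + x + y - 4,
            u + 2 * v + 3 * w + 2 * x + 3 * y + 2,
            u + v + w + x + y + 2,
            -3 * u - 5 * v - 7 * w - 4 * x - 5 * y]

print(residuals(0, 0))      # [0, 0, 0, 0]
print(residuals(2.5, -7))   # [0.0, 0.0, 0.0, 0.0]
```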
1.5.1 Solve the following linear system in the unknowns x, y, z, t:
$$\begin{cases} x - 2y = 5 \\ -x + 2y - 3z = -2 \\ -2y + 3z - 4t = -11 \\ -3z + 4t = 15 \end{cases}$$
Solution. The complete matrix associated with the system is:
$$(A\mid b) = \begin{pmatrix} 1 & -2 & 0 & 0 & 5 \\ -1 & 2 & -3 & 0 & -2 \\ 0 & -2 & 3 & -4 & -11 \\ 0 & 0 & -3 & 4 & 15 \end{pmatrix}.$$
We reduce the matrix (A∣b) to row echelon form using the Gaussian algorithm:
$$\begin{pmatrix} 1 & -2 & 0 & 0 & 5 \\ 0 & 0 & -3 & 0 & 3 \\ 0 & -2 & 3 & -4 & -11 \\ 0 & 0 & -3 & 4 & 15 \end{pmatrix} \to \begin{pmatrix} 1 & -2 & 0 & 0 & 5 \\ 0 & -2 & 3 & -4 & -11 \\ 0 & 0 & -3 & 0 & 3 \\ 0 & 0 & -3 & 4 & 15 \end{pmatrix} \to \begin{pmatrix} 1 & -2 & 0 & 0 & 5 \\ 0 & -2 & 3 & -4 & -11 \\ 0 & 0 & -3 & 0 & 3 \\ 0 & 0 & 0 & 4 & 12 \end{pmatrix} = (A'\mid b').$$
The matrix in row echelon form is the complete matrix associated to the linear system:
$$\begin{cases} x - 2y = 5 \\ -2y + 3z - 4t = -11 \\ -3z = 3 \\ 4t = 12 \end{cases}$$
Note that rr(A′) = rr(A′∣b′) = 4. The system, therefore, admits a unique solution
that we can calculate by proceeding with subsequent substitutions from the bottom.
From the fourth equation we have
t = 3;
and from the third equation we have
z = −1;
replacing these values of t and z in the second equation we get
y = −2;
finally, by replacing the values of t, z, y in the first equation we get
x = 1.
So the system has only one solution: (1, −2, −1, 3).
1.5.2 Determine the solutions of the following linear system in the unknowns x, y,
z, t, depending on the real parameter α:
$$\begin{cases} x + y + z + t = 0 \\ x - z - t = -1 \\ x + 2y + (2\alpha + 1)z + 3t = 2\alpha - 1 \\ 3x + 4y + (3\alpha + 2)z + (\alpha + 5)t = 3\alpha - 1. \end{cases}$$
Solution. In this exercise, we are dealing with a linear system in which a real
parameter α appears. This means that as α varies in R we get infinitely many different
linear systems that we will solve by treating them as much as possible as one. The
procedure is always the same, we behave as if the parameter were a fixed real number.
First of all, then, let us write the complete matrix associated with the system:
$$(A\mid b) = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 0 & -1 & -1 & -1 \\ 1 & 2 & 2\alpha+1 & 3 & 2\alpha-1 \\ 3 & 4 & 3\alpha+2 & \alpha+5 & 3\alpha-1 \end{pmatrix}$$
and reduce it to row echelon form using the Gaussian algorithm:
$$\to \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & -2 & -1 \\ 0 & 1 & 2\alpha & 2 & 2\alpha-1 \\ 0 & 1 & 3\alpha-1 & \alpha+2 & 3\alpha-1 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & -2 & -1 \\ 0 & 0 & 2\alpha-2 & 0 & 2\alpha-2 \\ 0 & 0 & 3\alpha-3 & \alpha & 3\alpha-2 \end{pmatrix} \to$$
$$\to \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & -2 & -1 \\ 0 & 0 & \alpha-1 & 0 & \alpha-1 \\ 0 & 0 & 3\alpha-3 & \alpha & 3\alpha-2 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & -2 & -1 \\ 0 & 0 & \alpha-1 & 0 & \alpha-1 \\ 0 & 0 & 0 & \alpha & 1 \end{pmatrix} = (A'\mid b').$$
We now have to determine what happens when the parameter α varies in the set of
real numbers. We must therefore answer the following questions:
1. For which values of α is the system compatible?
2. For the values of α for which the system is compatible, how many solutions do we
have, and can we determine them explicitly?
As we know, the answer is given by Proposition 1.3.10 (a): we must compare the rank
of A′ with the rank of (A′∣b′), and these ranks depend on the value of α.
More precisely, rr(A′) = rr(A′∣b′) = 4 for α ≠ 0, 1. In this case, the system has a
unique solution, which we obtain by back substitution from (A′∣b′): t = 1/α, z = 1,
y = −1 − 2/α, x = 1/α. For α = 0 we have
$$(A'\mid b') = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & -2 & -1 \\ 0 & 0 & -1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix},$$
so rr(A′) = 3 while rr(A′∣b′) = 4 (the last row reads 0 = 1), and the system has no solutions.
For α = 1 we have
$$(A'\mid b') = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & -2 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$
therefore rr(A′) = 3 = rr(A′∣b′), so the system has infinitely many solutions depending
on a free variable. As usual, we can determine such solutions by proceeding with
subsequent substitutions: (x3, −1 − 2x3, x3, 1), x3 ∈ R.
1.5.3 Consider the following linear system in the unknowns x1, x2, x3, depending on the real parameter α:
$$\Sigma_\alpha : \begin{cases} \alpha x_1 + (\alpha + 3)x_2 + 2\alpha x_3 = \alpha + 2 \\ \alpha x_1 + (2\alpha + 2)x_2 + 3\alpha x_3 = 2\alpha + 2 \\ 2\alpha x_1 + (\alpha + 7)x_2 + 4\alpha x_3 = 2\alpha + 4 \end{cases}$$
1. Determine the solutions of the linear system Σα as the parameter α ∈ R varies.
2. Determine, as α ∈ R varies, the solutions of the linear system in the unknowns x1, x2, x3, x4 obtained from Σα by adding to each equation the unknown x4 with coefficient zero.
Solution.
1. Consider the complete matrix (A∣b) associated with the linear system Σα :
$$\begin{pmatrix} \alpha & \alpha+3 & 2\alpha & \alpha+2 \\ \alpha & 2\alpha+2 & 3\alpha & 2\alpha+2 \\ 2\alpha & \alpha+7 & 4\alpha & 2\alpha+4 \end{pmatrix}$$
and reduce it to row echelon form:
$$\begin{pmatrix} \alpha & \alpha+3 & 2\alpha & \alpha+2 \\ \alpha & 2\alpha+2 & 3\alpha & 2\alpha+2 \\ 2\alpha & \alpha+7 & 4\alpha & 2\alpha+4 \end{pmatrix} \to \begin{pmatrix} \alpha & \alpha+3 & 2\alpha & \alpha+2 \\ 0 & \alpha-1 & \alpha & \alpha \\ 0 & -\alpha+1 & 0 & 0 \end{pmatrix} \to \begin{pmatrix} \alpha & \alpha+3 & 2\alpha & \alpha+2 \\ 0 & \alpha-1 & \alpha & \alpha \\ 0 & 0 & \alpha & \alpha \end{pmatrix} = (A'\mid b').$$
If α ≠ 0 and α ≠ 1, then rr(A′) = rr(A′∣b′) = 3, equal to the number of unknowns,
so the system has a unique solution, which we obtain by back substitution: x3 = 1,
x2 = 0, x1 = (2 − α)/α. If α = 0 we have
$$(A'\mid b') = \begin{pmatrix} 0 & 3 & 0 & 2 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
which is not in row echelon form, but this can be fixed by replacing the second
row with the sum of the second row and the first row multiplied by 1/3:
$$\begin{pmatrix} 0 & 3 & 0 & 2 \\ 0 & 0 & 0 & 2/3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
The given system is therefore equivalent to the linear system
$$\begin{cases} 3x_2 = 2 \\ 0 = 2/3 \end{cases}$$
which obviously has no solutions.
Finally, if α = 1 we have:
$$(A'\mid b') = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$
which is not in row echelon form; replacing the third row with the difference between
the third row and the second row, we obtain
$$(A''\mid b'') = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
We have rr(A′′) = rr(A′′∣b′′) = 2, and the system is equivalent to
$$\begin{cases} x_1 + 4x_2 + 2x_3 = 3 \\ x_3 = 1. \end{cases}$$
This system has infinitely many solutions depending on a parameter and the
set of solutions is: {(1 − 4x2 , x2 , 1) ∣ x2 ∈ R}.
2. Adding the unknown x4 means to add to the complete matrix (A∣b) associ-
ated with the system a column of zeros corresponding to the coefficients of x4 .
Therefore, by reducing (A∣b) to row echelon form, we get the matrix:
$$(A'\mid b') = \begin{pmatrix} \alpha & \alpha+3 & 2\alpha & 0 & \alpha+2 \\ 0 & \alpha-1 & \alpha & 0 & \alpha \\ 0 & 0 & \alpha & 0 & \alpha \end{pmatrix}.$$
Therefore, reasoning as above, but taking into account that in this case the
number of variables is 4, we obtain that:
for α ∈ R ∖ {0, 1} the system has infinitely many solutions, and they are of the
form ((2 − α)/α, 0, 1, x4) with x4 ∈ R;
for α = 0 the system has no solutions;
for α = 1 the system has infinitely many solutions, and they are of the form:
(1 − 4x2 , x2 , 1, x4 ), with x2 , x4 ∈ R.
1.5.4 Determine if there are values of the real parameter k such that the linear
system
$$\Sigma : \begin{cases} 2x_1 + x_2 - x_3 = 0 \\ 4x_1 - x_2 = 0 \\ x_1 + \frac{1}{2} x_2 - x_3 = -\frac{3}{2} \end{cases}$$
is equivalent to the linear system
$$\Pi_k : \begin{cases} x_1 + x_2 - \frac{1}{2} x_3 = 1 \\ 2x_1 - x_2 + x_3 = 2 \\ kx_1 - 4x_2 + 3x_3 = k \end{cases}$$
Solution. Two systems are equivalent if they have the same solutions. First we solve
the linear system Σ. The complete matrix associated with the system is:
$$(A\mid b) = \begin{pmatrix} 2 & 1 & -1 & 0 \\ 4 & -1 & 0 & 0 \\ 1 & 1/2 & -1 & -3/2 \end{pmatrix}.$$
Using the Gaussian algorithm we can reduce (A∣b) to row echelon form, obtaining
the matrix
$$(A'\mid b') = \begin{pmatrix} 2 & 1 & -1 & 0 \\ 0 & -3 & 2 & 0 \\ 0 & 0 & -1/2 & -3/2 \end{pmatrix}.$$
Hence rr(A′) = rr(A′∣b′) = 3 and Σ admits a unique solution, which we find by back
substitution: x3 = 3, x2 = 2, x1 = 1/2, i.e. the only solution of Σ is (1/2, 2, 3).
If Σ and Πk were equivalent, (1/2, 2, 3) would have to be a solution of Πk as well;
substituting it into the third equation of Πk gives k/2 − 8 + 9 = k, that is, k = 2.
It remains to check whether Π2 is equivalent to Σ. The complete matrix associated with Π2 is
$$\begin{pmatrix} 1 & 1 & -1/2 & 1 \\ 2 & -1 & 1 & 2 \\ 2 & -4 & 3 & 2 \end{pmatrix}.$$
By reducing this matrix to row echelon form, we obtain the matrix:
$$(A''\mid b'') = \begin{pmatrix} 1 & 1 & -1/2 & 1 \\ 0 & -3 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Hence rr(A′′) = rr(A′′∣b′′) = 2 < 3, so Π2 admits infinitely many solutions, while Σ admits
exactly one. We can therefore conclude that there are no values of k such that the
systems Σ and Πk are equivalent to each other.
1.6.1 Solve the following linear systems in the unknowns x, y, z:
1.
$$\begin{cases} x + y + z = 1 \\ 2x + 2y + z = 1 \\ 3y + z = 1 \end{cases}$$
2.
$$\begin{cases} x - y + 4z = 10 \\ 3x + y + 5z = 15 \\ x + 3y - 3z = 6 \end{cases}$$
1.6.2 Solve the following linear systems in the unknowns x, y, z, w:
1.
$$\begin{cases} x - y + 2z - 3w = 0 \\ 2x + y - w = 3 \\ 2y + z + w = -3 \\ 2x + z = 0 \end{cases}$$
2.
$$\begin{cases} x + y - z + w = 0 \\ 2x - z - w = 0 \\ x - y - 2w = 0 \\ 3x + y - 2z = 0 \end{cases}$$
3.
$$\begin{cases} x + z = 7 \\ x + y = 2 \\ 4x + 12y + z = 1 \\ 5x + 6y + 2z = -1 \end{cases}$$
1.6.3 Consider the following linear system in the unknowns x, y, z, depending on the
real parameter k:
$$\begin{cases} x + 2y + kz = 0 \\ x + y = -1 \\ x + ky = -2. \end{cases}$$
Determine for which value of k the system admits solutions and, when possible,
determine such solutions.
1.6.4 Determine for which values of a ∈ R the following linear system, in the unknowns x, y, z, t, admits solutions and, when possible, determine them:
$$\begin{cases} 2x + y - z = 1 \\ -2x + 3z + t = 1 \\ 2x + 3y + (a^2 + 2a + 3)z + (a^2 - 2)t = a + 6 \\ y + 2(a^2 + 2a + 1)z + (3a^2 - 2a - 7)t = 3a + 4. \end{cases}$$
1.6.5 Given the linear system in the unknowns x, y, z:
$$\Sigma_{a,b} : \begin{cases} x + (2 + a)y = b \\ (2 + 2a)x + 3y - (b + 1)z = 1 + b \\ bx + by - (b + 4)z = b^2 + 3b. \end{cases}$$
2. Determine for which among the values of a, b found in the previous point the
system Σa,b is solvable and determine its solutions.
1.6.6 Determine for which values of the real parameter k the following system in
the unknowns x1 , x2 , x3 , x4 is compatible. Determine the system’s solutions when
possible.
$$\begin{cases} x_1 + 3x_2 + kx_3 + 2x_4 = k \\ x_1 + 6x_2 + kx_3 + 3x_4 = 2k + 1 \\ -x_1 - 3x_2 + (k - 2)x_4 = 1 - k \\ kx_3 + (2 - k)x_4 = 1 \end{cases}$$
CHAPTER 2
Vector Spaces
In this chapter, we want to introduce the main character of linear algebra: the vector
space. It is a generalization of concepts that we already know very well. The Cartesian
plane, the set of functions studied in calculus, the set of m × n matrices introduced in
the previous chapter, the set of polynomials, the set of real numbers are all examples
of sets that have a natural vector space structure. The vector space will also be the
right environment in which to read and interpret the results obtained in the previous
chapter. Before giving its precise definition, we see some concrete examples.
• admits neutral element, i.e. there exists a number, 1, such that 1a = a1 = a for
every a ∈ R;
One of the most important properties of real numbers, which distinguishes them
from other sets of numbers, is their continuity. Geometrically this means that we
think of real numbers as distributed along a straight line. More precisely, given a
line, a fixed point on it (origin) and a unit of measure, there is a correspondence
between the points on the line and the set of real numbers. In other words, every real
number uniquely identifies one and only one point on the line.
2.2 THE VECTOR SPACE R^N AND THE VECTOR SPACE OF MATRICES
We denote by the symbol R^2 the set of ordered pairs of real numbers:
R^2 = {(x, y) ∣ x, y ∈ R}.
The fact that the pairs are ordered means, for example, that the element (1, 2) is
different from the element (2, 1).
Once we fix a Cartesian coordinate system in a plane, there is a correspondence
between R^2 and the set of points in the plane. Attaching a Cartesian reference to the
plane means fixing two oriented perpendicular lines r and s and a unit of measure.
The point of intersection between the two straight lines is called the origin of the
reference system. Each point of the plane is then uniquely identified by a pair of real
numbers, called coordinates of the point, which indicate the distance of the point
from the line s and its distance from the line r, respectively. The student who is not
familiar with the Cartesian plane can think of the boardgame Battleship.
(Figure: the point of coordinates (2, 1) in the Cartesian plane.)
It is natural to try to extend the operations that we perform with numbers to the
pairs of real numbers. We then define the following:
• Sum:
+ : R^2 × R^2 → R^2, ((x, y), (x′, y′)) ↦ (x, y) + (x′, y′) = (x + x′, y + y′).
• Product by scalars:
· : R × R^2 → R^2, (λ, (x, y)) ↦ λ(x, y) = (λx, λy).
Note that in the product λ(x, y), we have omitted the symbol for the multiplica-
tion, just like we usually do when we multiply two real numbers.
We now try to interpret geometrically the operations defined in the case of R^2. To
this aim, we think of each element (a, b) of R^2 as the endpoint of a vector applied at
the origin, that is, as an oriented segment going out of the origin with the arrow
pointing to the point of coordinates (a, b). In this case, the way to add two elements
of R^2 coincides with the well-known parallelogram rule used to add up forces
in physics. This rule states that the sum of two vectors u⃗ and v⃗ applied at a point is
a vector applied at the same point with the direction and length of the diagonal of
the parallelogram having u⃗ and v⃗ as sides.
(Figure: the vectors u⃗ and v⃗, their sum u⃗ + v⃗ obtained with the parallelogram rule, and the multiple −(1/2)u⃗.)
Some students will remember from physics that there are other operations that
can be performed with vectors (the dot product, cross product, etc.), but now we are
not interested in them and we will not take them into account.
Almost immediately we can verify that the sum of elements of R^2 satisfies the
following properties:
1. commutative: (x, y) + (x′, y′) = (x′, y′) + (x, y) for every (x, y), (x′, y′) ∈ R^2;
2. associative: ((x, y) + (x′, y′)) + (x″, y″) = (x, y) + ((x′, y′) + (x″, y″)) for every (x, y), (x′, y′), (x″, y″) ∈ R^2;
3. existence of a neutral element: (0, 0) + (x, y) = (x, y) + (0, 0) = (x, y) for every (x, y) ∈ R^2;
4. existence of opposite: for every (x, y) ∈ R^2 there exists an element (a, b), called
opposite of (x, y), such that (a, b) + (x, y) = (x, y) + (a, b) = (0, 0). Obviously
we have (a, b) = (−x, −y);
5. 1(x, y) = (x, y) for every (x, y) ∈ R^2;
6. (λ + µ)(x, y) = λ(x, y) + µ(x, y), for every (x, y) ∈ R^2 and for all λ, µ ∈ R;
7. λ((x, y) + (x′, y′)) = λ(x, y) + λ(x′, y′), for every (x, y), (x′, y′) ∈ R^2 and for every λ ∈ R;
8. (λµ)(x, y) = λ(µ(x, y)), for every (x, y) ∈ R^2 and for all λ, µ ∈ R.
Of course, we can generalize what has been done for R^2 to the set of ordered
n-tuples of real numbers, for every n ∈ N:
R^n = {(x1, . . . , xn) ∣ x1, . . . , xn ∈ R}.
The sum and the product by real numbers are defined componentwise:
(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),
λ(x1, . . . , xn) = (λx1, . . . , λxn).
With a little patience, we can check that properties 1 through 8 listed above hold for
the sum and the product by real numbers in R^n.
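A quick computational illustration (a minimal sketch in Python; tuples stand for elements of R^n and the function names are ours): the componentwise operations are immediate to define, and spot-checking some of the properties above on concrete vectors is easy. Of course this is not a proof, which rests on the properties of real numbers.

```python
def vec_add(u, v):
    """Componentwise sum in R^n."""
    return tuple(x + y for x, y in zip(u, v))

def scal_mul(lam, u):
    """Product of a vector of R^n by the scalar lam."""
    return tuple(lam * x for x in u)

u, v = (1.0, -2.0, 0.5), (3.0, 4.0, -1.0)
lam, mu = 2.0, -3.0

print(vec_add(u, v) == vec_add(v, u))                       # commutativity of the sum
print(scal_mul(lam + mu, u) ==
      vec_add(scal_mul(lam, u), scal_mul(mu, u)))           # (lam + mu)u = lam u + mu u
print(scal_mul(lam, vec_add(u, v)) ==
      vec_add(scal_mul(lam, u), scal_mul(lam, v)))          # lam(u + v) = lam u + lam v
```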
Let us examine another example. Consider the set of 2 × 2 matrices with real
coefficients:
$$M_2(R) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \ \middle|\ a, b, c, d \in R \right\}$$
introduced in Chapter 1. We define in M2(R) the following operations of sum + and
product · by a real number:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} = \begin{pmatrix} a + a' & b + b' \\ c + c' & d + d' \end{pmatrix}, \qquad \lambda \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \lambda a & \lambda b \\ \lambda c & \lambda d \end{pmatrix}.$$
Also in this case, with patience, it is possible to verify properties 1 through 8. Students
are strongly encouraged to do so. For example, we prove the commutativity of +:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} = \begin{pmatrix} a+a' & b+b' \\ c+c' & d+d' \end{pmatrix} = \begin{pmatrix} a'+a & b'+b \\ c'+c & d'+d \end{pmatrix} = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} + \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Once again it all depends on the properties of the real number operations. This is
precisely the strategy to verify the properties 1 through 8 in M2 (R) and more in
general in any vector space.
It is clear that, similarly to what was done for 2 × 2 matrices, it is possible to
define a sum and a product by real numbers also in the set of m × n matrices, and
with some patience one can show that such operations satisfy all the properties listed
above. So we give the definition of the operations of sum and product by real numbers
in Mm,n (R):
Sum:
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} + \begin{pmatrix} a'_{11} & a'_{12} & \dots & a'_{1n} \\ a'_{21} & a'_{22} & \dots & a'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{m1} & a'_{m2} & \dots & a'_{mn} \end{pmatrix} = \begin{pmatrix} a_{11}+a'_{11} & a_{12}+a'_{12} & \dots & a_{1n}+a'_{1n} \\ a_{21}+a'_{21} & a_{22}+a'_{22} & \dots & a_{2n}+a'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+a'_{m1} & a_{m2}+a'_{m2} & \dots & a_{mn}+a'_{mn} \end{pmatrix}.$$
Product by scalars: λ(a_{ij}) = (λ a_{ij}), that is, every entry of the matrix is multiplied by λ.
Note that the neutral element of the sum in Mm,n (R) is the zero matrix
$$\begin{pmatrix} 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{pmatrix}.$$
The sets described so far with the sum and product operations are all examples
of vector spaces.
Definition 2.3.1 A real vector space is a set V equipped with two operations called,
respectively, sum and multiplication by scalars:
+ : V × V ⟶ V,    · : R × V ⟶ V,
satisfying the following properties:
1. the sum is commutative: u + v = v + u for every u, v ∈ V;
2. the sum is associative: (u + v) + w = u + (v + w) for every u, v, w ∈ V;
3. there exists a neutral element for the sum, i.e. there is 0 ∈ V such that 0 + u =
u + 0 = u for each u in V;
4. each element of V has an opposite, that is, for every u ∈ V there exists a vector
a such that a + u = u + a = 0;
5. 1u = u for every u ∈ V;
6. (λ + µ)u = λu + µu for every u ∈ V and for all λ, µ ∈ R;
7. λ(u + v) = λu + λv for every u, v ∈ V and for every λ ∈ R;
8. (λµ)u = λ(µu) for every u ∈ V and for all λ, µ ∈ R.
The elements of a vector space are called vectors, while the real numbers are called
scalars. The neutral element of the sum in V is called zero vector. To distinguish
vectors from numbers we will indicate the vectors in bold.
In the previous section, we have seen that R^n and Mm,n(R) are real vector spaces.
Consider now the set of vectors of the space applied at a point, with the sum given by
the parallelogram rule and the product by a scalar α ∈ R defined as follows: α ⋅ v⃗ is the
vector lying on the same line as v⃗, whose length is the length of v⃗ multiplied by the
factor ∣α∣ (where ∣α∣ is the absolute value of α) and whose direction is the same as that
of v⃗ or opposite to it, depending on whether the sign of α is positive or negative.
In this way, the set of vectors of the space applied at a point turns out to be a
real vector space.
Example 2.3.3 The functions. Let F(R) be the set of functions f : R ⟶ R with
two operations: the sum, defined by (f + g)(x) = f(x) + g(x), and the product by a
scalar λ ∈ R, defined by (λf)(x) = λ f(x), for every x ∈ R. With these operations,
F(R) is a real vector space.
We now show some useful properties that are valid for any vector space:
Proposition 2.3.5 Let V be a vector space. Then we have the following properties:
i) The zero vector is unique and will be denoted by 0V .
ii) If u is a vector of V , its opposite is unique and it will be denoted by −u.
iii) λ0V = 0V , for each scalar λ ∈ R.
iv) 0u = 0V for each u ∈ V (note the different sense of zero in the first and second
member!).
v) If λu = 0V , then it is λ = 0 or u = 0V .
vi) (−λ)u = λ(−u) = −λu.
Proof. Notice that, while the properties 1 through 8 in Definition 2.3.1 are given, the
statements i) through vi), though they may appear obvious to the student, must be
proven, and they are a direct consequence of Definition 2.3.1.
i) If 0′ is another vector that fulfills property 3 of Definition 2.3.1, we have that
0′ + 0V = 0V (here we take u = 0V). Furthermore, using the fact that 0V satisfies
property 3 and taking u = 0′, we also have 0′ + 0V = 0′. It follows that 0V = 0′ + 0V = 0′.
ii) If a and a′ are both opposites of u, by property 4 we have in particular that
a + u = 0V and u + a′ = 0V. Then a = a + 0V = a + (u + a′) = (a + u) + a′ = 0V + a′ = a′,
so the opposite of u is unique.
Definition 2.3.6 The trivial vector space, denoted with {0V }, is a vector space
consisting only of the zero vector.
0V + 0V = 0V
Observation 2.3.8 Let us think about the definition of R-vector space. First of
all we observe that, by definition, a real vector space can never be empty. In fact, it
must contain at least the zero vector that is the neutral element of the sum. It could
happen that a vector space contains only the zero vector; In this case, it is called
trivial.
Now suppose that V is a nontrivial vector space, i.e. that it contains at least one
vector v ≠ 0V . How many elements does V contain? As we can multiply by real
numbers, which are infinitely many, V will contain all the infinitely many multiples
of v, that is, all the vectors of the form λv, for every λ ∈ R.
Since λ varies in the set of real numbers we get infinitely many different elements
of V . To rigorously prove this statement, we have to show that if λ and µ are distinct
real numbers, that is, λ ≠ µ, and v ∈ V is a nonzero vector, then λv ≠ µv. In fact,
if not, we would have:
λv = µv ↔ (λ − µ)v = 0V
with λ − µ ≠ 0 and v ≠ 0V , which would contradict property (v) of Proposition 2.3.5.
2.4 SUBSPACES
How can we recognize and describe a vector space? How can we single out a subset of a
vector space with the same characteristics? To answer these questions it is necessary
to introduce the definition of subspace.
Definition 2.4.1 A subset S of a vector space V is called a subspace of V if the following conditions hold:
1) S is not empty;
2) S is closed with respect to the sum: for every s1, s2 ∈ S, we have s1 + s2 ∈ S;
3) S is closed with respect to the product by scalars: for every s ∈ S and every λ ∈ R, we have λs ∈ S.
Example 2.4.2 Consider the subset X = {(x, 0) ∣ x ∈ R} of R^2. The set X is:
1) not empty: it contains infinitely many pairs of real numbers (x, 0);
2) closed under the sum: given two elements (x1 , 0), (x2 , 0) in X, the sum of
(x1 , 0) + (x2 , 0) = (x1 + x2 , 0) still belongs to X;
3) closed with respect to the product by scalars: given any real number α and any
element (x, 0) ∈ X, the product α(x, 0) = (αx, 0) belongs to X.
Geometrically, after setting a Cartesian coordinate system in R^2, we can identify
the set X with the x-axis. Then adding two vectors lying on the x-axis, or
multiplying one of them by a scalar, we still get a vector that lies on the x-axis.
More generally, if a is a real number, set Wa = {(x, y) ∈ R^2 ∣ y = ax}. We observe
first that (0, 0) ∈ Wa. Furthermore, given two elements (x1, ax1) and (x2, ax2) in Wa,
their sum
(x1, ax1) + (x2, ax2) = (x1 + x2, a(x1 + x2))
belongs to Wa, i.e. Wa is closed with respect to the sum. Likewise, given λ ∈ R and
(x1, ax1) ∈ Wa, we have:
λ(x1, ax1) = (λx1, λax1) = (λx1, a(λx1)),
which again belongs to Wa. Hence Wa, the line through the origin of equation y = ax,
is a subspace of R^2. On the contrary, if a line does not pass through the origin, say
S = {(x, y) ∈ R^2 ∣ y = ax + q} with q ≠ 0, we can immediately
say that S is not a subspace of R^2 as it does not contain 0_{R^2} = (0, 0). Therefore, not
all the straight lines of the plane give subspaces of R^2, but only those through (0, 0).
Example 2.4.4 The set
$$X = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(R) \ \middle|\ b = 1 \right\} \subseteq M_2(R)$$
is not a subspace of M2(R), because the zero matrix $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ does not belong to X.
Observation 2.4.5 If S is a subspace of a vector space V, the condition 0V ∈ S is
necessary, but not sufficient, for S to be a subspace of V . We give a counterexample.
Example 2.4.6 Let S = {(x, y, z) ∈ R^3 ∣ xy = z}. Despite the fact that the set S contains
the zero vector (0, 0, 0) of R^3, S is not a subspace of R^3 because it is not closed with
respect to the sum. Indeed the vectors v = (1, 1, 1) and w = (−1, −1, 1) belong to S,
since they satisfy the equation xy = z, but their sum v + w = (1, 1, 1) + (−1, −1, 1) =
(0, 0, 2) does not belong to S since 0 ⋅ 0 ≠ 2.
Observation 2.4.8 The examples given so far highlight two different types of rea-
soning. To prove that a subset S of a vector space V is a subspace of V , we must prove
it satisfies properties 1), 2) and 3) of Definition 2.4.1. These properties must apply
always, that is, for each pair of vectors in S (property 2) and for all real numbers
(property 3).
On the contrary, to prove that a subset S of a vector space V is not a subspace of
V, it is enough to show that one of properties 1), 2) and 3) of Definition 2.4.1 fails,
i.e. that S = ∅, or that there exists a pair of vectors of S whose sum is not in S, or that
there is a vector of S and a scalar whose product is not in S.
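The two kinds of reasoning just described can be complemented by a quick numerical experiment: testing closure on a handful of sample vectors can only disprove, never prove, that a subset is a subspace, but it is a handy way to hunt for a counterexample. A minimal sketch in Python (the membership predicates and the sample points are our own choices), applied to the sets of Examples 2.4.2 and 2.4.6:

```python
import random

def closure_counterexample(members, belongs, trials=200, seed=0):
    """Try random sums and scalar multiples of the given member vectors;
    return a counterexample to closure if one is found, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v = rng.choice(members), rng.choice(members)
        s = tuple(x + y for x, y in zip(u, v))
        if not belongs(s):
            return ("sum", u, v, s)
        lam = rng.uniform(-5, 5)
        w = tuple(lam * x for x in u)
        if not belongs(w):
            return ("scalar", lam, u, w)
    return None

# X = {(x, 0)} of Example 2.4.2: no counterexample is found.
X_members = [(1.0, 0.0), (-2.5, 0.0), (7.0, 0.0)]
print(closure_counterexample(X_members, lambda p: p[1] == 0.0))        # None

# S = {(x, y, z) | xy = z} of Example 2.4.6: closure under the sum fails.
S_members = [(1.0, 1.0, 1.0), (-1.0, -1.0, 1.0), (0.0, 0.0, 0.0)]
print(closure_counterexample(S_members, lambda p: p[0] * p[1] == p[2]))
# e.g. ('sum', (1.0, 1.0, 1.0), (-1.0, -1.0, 1.0), (0.0, 0.0, 2.0))
```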
Example 2.4.9 In the vector space R[x] of polynomials with real coefficients in a
variable x, consider the subset R2[x] consisting of polynomials of degree less than or
equal to 2:
R2[x] = {p(x) = a + bx + cx^2 ∣ a, b, c ∈ R}.
Then R2[x] is a subspace of R[x]. In fact, adding two polynomials of degree less
than or equal to 2, we obtain a polynomial of degree less than or equal to 2:
(a + bx + cx^2) + (a′ + b′x + c′x^2) = (a + a′) + (b + b′)x + (c + c′)x^2,
and multiplying a polynomial of degree less than or equal to 2 by a real number λ we
get λ(a + bx + cx^2) = λa + λbx + λcx^2, which again has degree less than or equal to 2.
In other words, R2[x] (which is certainly not empty) is closed with respect to the
sum and the product by scalars, therefore it is a subspace of R[x].
Example 2.4.10 In the vector space R[x] of polynomials with real coefficients in a
variable x, we consider the subset S = {p(x) = a + bx + cx^2 ∈ R[x] ∣ ac = 0}. The
subset S contains, for example, the polynomial identically zero and the monomial x,
therefore it is different from the empty set. However, it is not closed with respect
to the sum defined in R[x]. In fact, S contains the polynomials p(x) = 1 + x and
q(x) = x + x^2 but does not contain their sum: p(x) + q(x) = 1 + 2x + x^2.
Since a subspace of a vector space V is primarily a subset of V , it is natural to ask
what happens when carrying out the operation of set theoretic union and intersection
of two (or more) subspaces of V .
Example 2.4.11 Consider the set W = X ∪ Y with
X = {(x, y) ∈ R^2 ∣ y = 0} and Y = {(x, y) ∈ R^2 ∣ x = 0}.
In Example 2.4.2 we showed that X is a subspace of R^2. In a similar way, one can
show that Y is a subspace of R^2. However, their union W is not a subspace of R^2,
because it is not closed with respect to the sum: in fact, the vector (1, 0) belongs to
W because it is an element of X, and the vector (0, 1) belongs to W because it is an
element of Y. However their sum (1, 0) + (0, 1) = (1, 1) belongs neither to X nor to
Y.
The geometric reasoning is also simple: we can think of X and Y , respectively, as
the x-axis and the y-axis in a Cartesian reference in the plane, and W as the union
of the two axes. It is clear from the parallelogram rule that the sum of a vector that
lies on the x-axis and a vector that lies on the y-axis will be outside of the two lines,
hence W is not a subspace.
We observe that the set W can be described as follows:
W = {(x, y) ∈ R^2 ∣ xy = 0}.
In fact, since x and y are real numbers, their product is zero if and only if at least one
of the two factors is zero.
The example above shows that, in general, the union of two subspaces of a vector
space V is not a subspace. More precisely, we have the following proposition.
Proposition 2.4.12 Let W1 , W2 be two subspaces of a vector space V . Then W1 ∪W2
is a subspace if and only if W1 ⊆ W2 or W2 ⊆ W1 .
Proof. “⇐” If W1 ⊆ W2 (resp. if W2 ⊆ W1 ) then W1 ∪W2 = W2 (resp. W1 ∪W2 = W1 ),
which is a subspace by hypothesis.
“⇒” To prove this implication, we show that, if W1 ⊈ W2 and W2 ⊈ W1, then
W1 ∪ W2 is not a subspace of V. As W1 ⊈ W2, there exists a vector v1 ∈ W1 ∖ W2;
similarly, as W2 ⊈ W1, there exists a vector v2 ∈ W2 ∖ W1. If W1 ∪ W2 were a
subspace, then v = v1 + v2 should be an element of W1 ∪ W2 as it is the sum of an
element of W1 and one of W2 . If v were in W1 , then also v2 = v − v1 would belong to
W1 , but we had chosen v2 ∈ W2 \W1 . Similarly, if v were in W2 , then also v1 = v−v2
would belong to W2 , but we had chosen v1 ∈ W1 \ W2 . So v ∉ W1 ∪ W2 .
With the intersection of two subspaces we have fewer problems.
Proposition 2.4.13 The intersection S1 ∩ S2 of two subspaces S1 and S2 of a vector
space V is a subspace of V .
Proof. We have to show that S1 ∩ S2 is a subspace of V : we observe first that this
intersection is not empty since 0V belongs both to S1 and S2 , so it belongs to S1 ∩ S2 .
Now we show that S1 ∩S2 is closed with respect to the sum of V : so let v1 , v2 ∈ S1 ∩S2 .
This means, in particular, that v1 , v2 ∈ S1 , which is a subspace of V , so v1 + v2 ∈ S1 .
Similarly since S2 is a subspace, we have that v1 + v2 ∈ S2 . Then v1 + v2 ∈ S1 ∩ S2 .
Similarly, we show that S1 ∩ S2 is closed with respect to the product by
scalars. Let v ∈ S1 ∩ S2 and λ ∈ R. In particular, v ∈ S1 , which is a subspace, so
λv ∈ S1 ; similarly, v ∈ S2 , which is a subspace, so λv ∈ S2 . Thus λv belongs both
to S1 and to S2 , so that it belongs to their intersection.
Example 2.4.14 Consider the subspaces
$$S = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(R) \ \middle|\ b = -c \right\} \quad \text{and} \quad T = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(R) \ \middle|\ a + b + c + d = 0 \right\}$$
of M2(R). What is S ∩ T? The subspace S ∩ T consists of the elements of M2(R) belonging
to both S and T, that is:
$$S \cap T = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(R) \ \middle|\ b = -c,\ a + b + c + d = 0 \right\}.$$
Hence:
$$S \cap T = \left\{ \begin{pmatrix} a & -c \\ c & -a \end{pmatrix} \in M_2(R) \right\}.$$
It is easy to verify that this subset of M2 (R) is closed with respect to the sum and
the product by scalars, as guaranteed by Proposition 2.4.13.
2.5.1 Show that the subset X = {(r, s, r − s) ∣ r, s ∈ R} is a subspace of R^3.
Solution. First of all, we observe that X is not the empty set because (0, 0, 0) ∈ X
(just take r = s = 0).
Let us now consider two generic elements of X: (r1 , s1 , r1 −s1 ) and (r2 , s2 , r2 −s2 ).
Their sum is: (r1 , s1 , r1 − s1 ) + (r2 , s2 , r2 − s2 ) = (r1 + r2 , s1 + s2 , r1 − s1 + r2 − s2 ) =
(r1 + r2 , s1 + s2 , r1 + r2 − (s1 + s2 )), and it still belongs to X as it is of the type
(r, s, r − s), with r = r1 + r2 and s = s1 + s2 .
Consider (r1 , s1 , r1 −s1 ) ∈ X and λ ∈ R. Then λ(r1 , s1 , r1 −s1 ) = (λr1 , λs1 , λ(r1 −
s1 )) = (λr1 , λs1 , λr1 − λs1 ) still belongs to X as it is of the type (r, s, r − s), with
r = λr1 and s = λs1. So X is a subspace of R^3.
2.5.2 Establish whether W = {(x, y, z) ∈ R^3 ∣ 2x + z^2 = 0} is a subspace of R^3.
Solution. The set W is not empty, since (0, 0, 0) ∈ W. However, W is not closed with
respect to the sum: if (x1, y1, z1) and (x2, y2, z2) belong to W, then
2(x1 + x2) + (z1 + z2)^2 = (2x1 + z1^2) + (2x2 + z2^2) + 2z1z2 = 0 + 0 + 2z1z2 = 2z1z2,
and it is not true that 2z1z2 is always equal to zero.
For example, the elements (−2, 1, 2) and (−8, 3, 4) belong to W but (−2, 1, 2) +
(−8, 3, 4) = (−10, 4, 6) ∉ W, because 2 ⋅ (−10) + 6^2 ≠ 0. (Note that these elements of W
were not chosen randomly, but so as to satisfy the requirement 2z1z2 ≠ 0.)
So W is not a subspace of R^3.
2.5.3 Determine a non-empty subset of R^3 closed with respect to the sum but not
with respect to the product by scalars.
Solution. The set X = {(x, y, z)∣x, y, z ∈ R, x ≥ 0} has this property. In fact, X is not
empty because, for example, (0, 0, 0) ∈ X. Let us check if X is closed with respect
to the sum. Let (x1 , y1 , z1 ),(x2 , y2 , z2 ) ∈ X, with x1 , x2 ≥ 0. Then (x1 , y1 , z1 ) +
(x2 , y2 , z2 ) = (x1 + x2 , y1 + y2 , z1 + z2 ) ∈ X because x1 + x2 ≥ 0 (the sum of two
non-negative real numbers is a non-negative real number). Now let (x1 , y1 , z1 ) ∈ X
and λ ∈ R. We have that λ(x1 , y1 , z1 ) = (λx1 , λy1 , λz1 ) belongs to X if and only
if λx1 ≥ 0. But if we choose λ negative and x1 > 0, for example λ = −1 and
(x1, y1, z1) = (3, −2, 1), this condition is not satisfied. So X is not closed with
respect to the product by scalars.
2.5.4 Determine for which values of k ∈ R the set
$$X_k = \left\{ \begin{pmatrix} r & s \\ r + k & k^2 - k \end{pmatrix} \ \middle|\ r, s \in R \right\} \subseteq M_2(R)$$
is a subspace of M2(R).
Solution. We know that in order for Xk to be a subspace of M2(R), the null matrix
must belong to Xk, that is to say that $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ must be of the form $\begin{pmatrix} r & s \\ r+k & k^2-k \end{pmatrix}$ for some
r, s ∈ R. This happens if
$$\begin{cases} r = 0 \\ s = 0 \\ k = 0 \\ k^2 - k = 0, \end{cases}$$
that is, k = 0.
Let us now see whether $X_0 = \left\{ \begin{pmatrix} r & s \\ r & 0 \end{pmatrix} \ \middle|\ r, s \in R \right\}$ is a subspace of M2(R). Certainly X0
is not empty, since it contains the null matrix.
Let $\begin{pmatrix} r_1 & s_1 \\ r_1 & 0 \end{pmatrix}, \begin{pmatrix} r_2 & s_2 \\ r_2 & 0 \end{pmatrix} \in X_0$. We have:
$$\begin{pmatrix} r_1 & s_1 \\ r_1 & 0 \end{pmatrix} + \begin{pmatrix} r_2 & s_2 \\ r_2 & 0 \end{pmatrix} = \begin{pmatrix} r_1 + r_2 & s_1 + s_2 \\ r_1 + r_2 & 0 \end{pmatrix}.$$
The matrix obtained therefore belongs to X0. Similarly, if λ ∈ R, also $\lambda \begin{pmatrix} r_1 & s_1 \\ r_1 & 0 \end{pmatrix} = \begin{pmatrix} \lambda r_1 & \lambda s_1 \\ \lambda r_1 & 0 \end{pmatrix}$
belongs to X0, so X0 is closed with respect to the sum and with respect
to the product by scalars, so it is a vector subspace of M2(R).
In conclusion, Xk is a subspace of M2 (R) for k = 0.
iii) Wn = {p(x) ∈ R[x] ∣ deg(p(x)) = n}, n ∈ N. (Here deg(p(x)) indicates the
degree of the polynomial p(x).)
iv) D = $\left\{ \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \in M_2(R) \right\}$.
v) T = $\left\{ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \in M_2(R) \right\}$.
vi) A = {(a_{ij})_{i,j=1,...,3} ∈ M3(R) ∣ a11 + a22 + a33 = 0}.
x) X = $\left\{ \begin{pmatrix} 0 & r \\ 2r & s \end{pmatrix} \ \middle|\ r, s \in R \right\} \subseteq M_2(R)$.
xi) X = $\left\{ \begin{pmatrix} r & 2r \\ r^2 & r \end{pmatrix} \ \middle|\ r \in R \right\} \subseteq M_2(R)$.
2.6.2 Show that the set of solutions of the homogeneous linear system
$$\begin{cases} x_1 + 3x_2 - x_3 = 0 \\ 2x_1 + 4x_2 - 4x_3 - x_4 = 0 \\ x_2 + x_3 + 2x_4 = 0 \end{cases}$$
in the unknowns x1, x2, x3, x4 is a subspace of R^4.
Let SR be the set of sequences (an) of real numbers, with the operations of sum and
product by a real number defined term by term: (an) + (bn) = (an + bn) and k(an) = (kan),
for every (an), (bn) ∈ SR and k ∈ R. Show that with these operations SR is a vector
space over R.
2.6.9 Let C(R; R) be the set of continuous functions from R to R. Consider the
operation of sum of functions and the operation of product of any function by a real
number defined as in Example 2.3.3. Show that with these operations C(R; R) is a
vector space over R.
CHAPTER 3
Linear Combination and Linear Independence
In the previous chapter, we have seen the definition of vector space and subspace.
We now want to describe these objects in a more efficient way. We introduce for
this purpose the concept of linear combination of a set of vectors and the concept
of linearly independent vectors. These are two fundamental definitions within the
theory of vector spaces, whose understanding is necessary to get to the key concepts
of basis and linear transformation, which we will treat later.
(Figure: the vectors (1, a), (2, 2a) and (−3/2, −(3/2)a), all lying on the same line through the origin.)
We say that the vector (1, a) generates the subspace W represented by the line
y = ax. The word “generate” is not accidental since, in fact, all vectors of the subspace
W are multiples of (1, a). We also note that the choice of the vector (1, a) as a
generator of W is arbitrary; we could as well have chosen any of its multiples, like
(2, 2a) or (−3/2, −(3/2)a).
Graphically it is clear that if we know a point of a straight line (in the plane, but
also in three-dimensional space) different from the origin, then we can immediately
draw the line passing through it and the origin. We will see later that the fact of
knowing the generators of a vector space allows us to determine it uniquely.
Now let us see another example. In R^2, we consider the two vectors (1, 0) and
(0, 1). We ask ourselves: what is the smallest subspace W of R^2 that contains both of
these vectors? From the previous reasoning, we know that this subspace must contain
the two subspaces W1 and W2 generated by (1, 0) and (0, 1):
W1 = {λ(1, 0) = (λ, 0) ∣ λ ∈ R},  W2 = {µ(0, 1) = (0, µ) ∣ µ ∈ R}.
We also know that the sum of two vectors of W still belongs to W (by the definition
of subspace). For instance (1, 0) + (0, 1) = (1, 1) ∈ W , but also (1, 2) + (3, 4) =
(4, 6) ∈ W. The student is invited to draw sums of vectors in R^2, considering the points
of the plane associated with them and using the parallelogram rule. In this way, we
can convince ourselves that actually W = R^2. But the graphic construction is not
sufficient to prove this fact, as it is not possible to draw all the vectors of the plane, so
let us look at an algebraic proof. We take the generic vector (λ, 0) in W1 and the
generic vector (0, µ) in W2, and we take their sum: (λ, 0) + (0, µ) = (λ, µ). It is clear
that all vectors (x, y) in R^2 can be written in this way, choosing λ = x and µ = y. So
we found that the smallest subspace of R^2 containing the vectors (1, 0) and (0, 1) is
all of R^2.
Now we formalize the concept of generation of subspace, which we have described
with the previous examples.
For example, (1, 1) is a linear combination of (1, 0) and (0, 1) with scalars λ1 = 1
and λ2 = 1, but also a linear combination of (2, 1) and (1, 0) with scalars λ1 = 1 and
λ2 = −1.
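For readers who want to double-check such computations numerically, finding the scalars of a linear combination amounts to solving a small linear system. The following is only a sketch and assumes Python with NumPy, which is not used elsewhere in this book:

import numpy as np

# Columns are the vectors (2, 1) and (1, 0); we solve for the scalars
# lambda_1, lambda_2 such that lambda_1*(2,1) + lambda_2*(1,0) = (1,1).
M = np.array([[2.0, 1.0],
              [1.0, 0.0]])
target = np.array([1.0, 1.0])
print(np.linalg.solve(M, target))  # expected: [ 1. -1.]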
We now come to the concept of vector space generated by some vectors, the main
concept of this chapter along with that of linear independence.
Definition 3.1.2 Let V be a vector space and let {v1 , . . . , vn } be a set of vectors
of V . The subspace generated (or spanned) by the vectors v1 , . . . , vn is the set of all
their linear combinations, in symbols
⟨v1 , . . . , vn ⟩ = {λ1 v1 + ⋯ + λn vn ∣ λ1 , . . . , λn ∈ R}.
We have seen that, for example, the subspace generated by a nonzero vector in
R² corresponds to a straight line, while the subspace generated by the two vectors
(1, 0) and (0, 1) of R² is all of R².
Definition 3.1.4 Let V be a vector space and let {v1 , . . . , vn } be a set of vectors of
V . We say that v1 , . . . , vn generate V , or {v1 , . . . , vn } is a set of generators of V if
V = ⟨v1 , . . . , vn ⟩.
In the example above, we saw that the vectors (1, 0) and (0, 1) generate the vector
space R², as each vector (a, b) of R² can be written as a linear combination of (1, 0)
and (0, 1), namely (a, b) = a(1, 0) + b(0, 1).
Proposition 3.1.5 Let V be a vector space and let {v1 , . . . , vn } be a set of vectors of
V . Then we have that ⟨v1 , . . . , vn ⟩ is a subspace of V . Moreover, if Z is a subspace of
V containing v1 , . . . vn , then ⟨v1 , . . . , vn ⟩ ⊆ Z, therefore ⟨v1 , . . . , vn ⟩ is the smallest
subspace of V containing {v1 , . . . , vn }.
Proof. First of all, we note that 0 ∈ ⟨v1 , . . . , vn ⟩, as 0 = 0v1 + ⋅ ⋅ ⋅ + 0vn . Now let
v, w ∈ ⟨v1 , . . . , vn ⟩. Then by definition there exist scalars α1 , . . . , αn and β1 , . . . , βn
such that:
v = α1 v1 + ⋯ + αn vn , w = β1 v1 + ⋯ + βn vn
thus
v + w = (α1 + β1 )v1 + ⋯ + (αn + βn )vn ∈ ⟨v1 , . . . , vn ⟩ .
Moreover if k ∈ R, then kv = kα1 v1 + ⋯ + kαn vn ∈ ⟨v1 , . . . , vn ⟩, so ⟨v1 , . . . , vn ⟩
is a subspace of V . Finally, if Z is a subspace of V containing v1 , . . . , vn , then Z
contains every linear combination of v1 , . . . , vn , because it is closed with respect to
the sum and to the product by scalars; hence ⟨v1 , . . . , vn ⟩ ⊆ Z.
Let us look at an example that is linked to what we have seen in Chapter 1 about
the solution of linear systems depending on a parameter.
Example 3.1.6 We want to determine the subspace generated by the vectors (1, 1),
(2, k), depending on the parameter k.
We see at once that if k = 2, then the two points lie on the same line through the
origin, thus the smallest subspace that contains both of them will be precisely this
line, whose equation is y = x.
If k ≠ 2, the two points lie on two distinct lines through the origin, then the
smallest subspace that contains both of them must contain such lines, and also
the sum of any two points on these lines. Therefore, with a reasoning similar to
the one made at the beginning of this chapter, we have that the smallest subspace
that contains both points whose coordinates are (1, 1), (2, k) is the whole plane, i.e.
the vectors (1, 1), (2, k) generate R².
Let us now see an algebraic proof of this fact. Let (a, b) be a generic vector of
R². We want to determine when (a, b) belongs to ⟨(1, 1), (2, k)⟩, that is, when there
exist scalars λ1 , λ2 such that λ1 (1, 1) + λ2 (2, k) = (a, b). This amounts to the linear
system:
⎧ λ1 + 2λ2 = a
⎩ λ1 + kλ2 = b
We leave as an exercise to verify that this system in the unknowns λ1 , λ2 always has
a solution if k ≠ 2. If instead k = 2 the complete matrix associated to the system is:
⎛ 1 2 │ a ⎞
⎝ 1 2 │ b ⎠ ,
If we solve the linear system, depending on the parameter k, with the Gaussian
algorithm, an easy calculation shows that this system always has a solution (for every
fixed a and b) provided we have k ≠ 2. But when k = 2 the system has a solution only
if a = b, so ⟨(1, 1), (2, 2)⟩ is just the line y = x, as we already observed.
Proposition 3.1.8 Let V be a vector space and let v1 , . . . , vn , w be vectors of V . If
w is a linear combination of v1 , . . . , vn , then
⟨v1 , . . . , vn ⟩ = ⟨v1 , . . . , vn , w⟩.
For example, since (1, 1) = (1, 0) + (0, 1), we have
⟨(1, 0), (0, 1), (1, 1)⟩ = ⟨(1, 0), (0, 1)⟩ = R².
Conversely if
⟨v1 , . . . , vn ⟩ = ⟨v1 , . . . , vn , w⟩
then w is a linear combination of v1 , . . . , vn .
Proof. In order to show the first part of the result, it is enough to observe that w ∈
⟨v1 , . . . , vn ⟩ by assumption, so it follows from Proposition 3.1.5 that Z = ⟨v1 , . . . , vn ⟩
is a subspace containing {v1 , . . . , vn , w}, thus ⟨v1 , . . . , vn , w⟩ ⊆ ⟨v1 , . . . , vn ⟩, again
by Proposition 3.1.5. The inclusion ⟨v1 , . . . , vn ⟩ ⊆ ⟨v1 , . . . , vn , w⟩ is obvious,
To show the converse, it is enough to note that since ⟨v1 , . . . , vn ⟩ =
⟨v1 , . . . , vn , w⟩ we have that w ∈ ⟨v1 , . . . , vn ⟩, i.e. w is a linear combination of
v1 , . . . , vn .
Definition 3.2.1 Let V be a vector space. The vectors v1 , . . . , vn ∈ V are said to be
linearly independent¹ if the only scalars λ1 , . . . , λn ∈ R such that
λ1 v1 + ⋅ ⋅ ⋅ + λn vn = 0
are λ1 = ⋯ = λn = 0. Otherwise, the vectors are said to be linearly dependent.
Let us review the previous examples. The set of vectors {(1, 0), (0, 1)} in R² is a
set of vectors which are linearly independent, as their only linear combination that
gives the zero vector is obtained with all zero scalars:
α(1, 0) + β(0, 1) = (α, β) = (0, 0) if and only if α = β = 0.
On the other hand, the vectors of the set {(1, 0), (0, 1), (1, 1)} are linearly depen-
dent, because there is a linear combination of the given vectors with scalars, not all
zero, which is equal to the zero vector: for instance 1 ⋅ (1, 0) + 1 ⋅ (0, 1) + (−1) ⋅ (1, 1) = (0, 0).
Example 3.2.2 Consider the following set of vectors in R2 [x]: {x + 1, x² − 1, 2, x − 1}.
Is this a set of linearly independent vectors? If we knew something more about linear
algebra the answer would be immediate; for the moment we have to perform the
computation directly. We set a generic linear combination of the given vectors equal
to the zero polynomial:
α1 (x + 1) + α2 (x² − 1) + α3 ⋅ 2 + α4 (x − 1) = 0.
From which:
α2 x² + (α1 + α4 )x + (α1 − α2 + 2α3 − α4 ) = 0.
¹ The words “linearly independent” can be used for the vectors v1 , . . . , vn , but also for the set
of vectors {v1 , . . . , vn } indifferently, i.e. the two terminologies have the same meaning.
A polynomial is zero if and only if all its coefficients are zero, so we obtain the linear
system:
⎧ α2 = 0
⎨ α1 + α4 = 0
⎩ α1 − α2 + 2α3 − α4 = 0 .
We leave as an exercise for the student to verify that this system admits infinitely
many solutions. For example, it has the solution: α1 = 1, α2 = 0, α3 = −1, α4 = −1.
So we can explicitly write a linear combination of the given vectors, which is equal
to the zero vector, while the scalars are not all zero:
1 ⋅ (x + 1) − 1 ⋅ 2 − 1 ⋅ (x − 1) = 0.
In particular, one of the given vectors is a linear combination of the others:
(x + 1) = 2 + (x − 1).
Of course, in a set of linearly dependent vectors, it is not true that each of the given
vectors can be expressed as a function of the others; for example, we see that there
is no way to express x² − 1 as a linear combination of the others.
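The same dependence check can be carried out numerically by passing to coefficient vectors, as the text does by hand. The following sketch assumes Python with NumPy (not a tool used in the book):

import numpy as np

# Coordinates with respect to {x^2, x, 1} of x+1, x^2-1, 2, x-1, one per row.
vectors = np.array([
    [0.0, 1.0, 1.0],   # x + 1
    [1.0, 0.0, -1.0],  # x^2 - 1
    [0.0, 0.0, 2.0],   # 2
    [0.0, 1.0, -1.0],  # x - 1
])
# Four vectors but rank 3: they are linearly dependent.
print(np.linalg.matrix_rank(vectors), "out of", len(vectors))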
The important thing to note is that, in a set of linearly dependent vectors, if we
eliminate one vector that is a linear combination of the others, the subspace they
generate does not change (see Proposition 3.1.8), and the vectors of the new set thus
obtained may have become linearly independent. Be careful, however, that this is not
always the case, for example, in the set {2x, 3x, 4x} even if we eliminate a vector,
the remaining vectors are linearly dependent, as the student may verify. Somehow
the linear independence tells us that we have reached the smallest number of vectors
to describe the subspace. This concept will be explored very carefully in the next
chapter on bases.
Observation 3.2.3 We note that, if a set of vectors contains the zero vector, then
it is always a set of linearly dependent vectors. In fact, if we consider the set {v1 =
0, v2 , . . . , vn } we have that:
1 ⋅ v1 + 0 ⋅ v2 + ⋯ + 0 ⋅ vn = 1 ⋅ 0 = 0.
So we obtained a linear combination equal to the zero vector, while the first scalar is
not zero.
Proposition 3.2.4 Let V be a vector space. The vectors v1 , . . . , vn ∈ V are linearly
dependent if and only if at least one of them is a linear combination of the others.
Proof. Suppose first that v1 , . . . , vn are linearly dependent, so that there exist scalars
α1 , . . . , αn , not all zero, with
α1 v1 + ⋯ + αn vn = 0 .
Since at least one of the scalars is nonzero, we have that αk ≠ 0 for some k. Then:
vk = −(α1 /αk ) v1 − ⋯ − (αk−1 /αk ) vk−1 − (αk+1 /αk ) vk+1 − ⋯ − (αn /αk ) vn ,
so vk is a linear combination of the other vectors.
Conversely, suppose that one of the vectors, say vk , is a linear combination of the
others: vk = α1 v1 + ⋯ + αk−1 vk−1 + αk+1 vk+1 + ⋯ + αn vn . Then
α1 v1 + ⋯ + αk−1 vk−1 + (−1)vk + αk+1 vk+1 + ⋯ + αn vn = 0,
and at least one of the coefficients is not zero, that of vk . So the vectors v1 , . . . , vn
are linearly dependent.
Observation 3.2.6 In order to prove that the vectors of a set are linearly dependent,
it is enough to find a vector that is a linear combination of the others. For example
if we see that a vector is a multiple of another, then we know that the vectors of the
set are linearly dependent. The vectors of the following sets are linearly dependent,
and we can verify it without any calculation (but the student should do it if he does
not see why and wants to convince himself!).
• In R3 [x]: {0, x, 1 − x, x³}.
• In M2 (R):
{ ⎛ 1 0 ⎞  ⎛ 0 0 ⎞  ⎛ 0  0 ⎞  ⎛ 3 0 ⎞ }
  ⎝ 0 1 ⎠, ⎝ 0 3 ⎠, ⎝ 0 √2 ⎠, ⎝ 0 3 ⎠   .
The next proposition shows that removing some vectors from a set of linearly
independent vectors, we get a set of vectors that are still linearly independent.
Example 3.2.9 Suppose we want to determine for which values of k the following
vectors in M2 (R):
⎛ 1  0 ⎞   ⎛ 6   0 ⎞   ⎛ k 0 ⎞
⎝ 1 −3 ⎠ , ⎝ k −18 ⎠ , ⎝ 1 5 ⎠ ,
are linearly independent. We proceed as suggested by the definition: we write a linear
combination of the given vectors, and we see if there are nonzero scalars that allow
us to get the zero vector. For the values of k for which this happens, we will have
that the given vectors are linearly dependent.
So we write a generic linear combination of the given vectors and set it equal to
the zero vector:
λ1 ⎛ 1  0 ⎞ + λ2 ⎛ 6   0 ⎞ + λ3 ⎛ k 0 ⎞ = ⎛ 0 0 ⎞        (3.1)
   ⎝ 1 −3 ⎠      ⎝ k −18 ⎠      ⎝ 1 5 ⎠   ⎝ 0 0 ⎠
from which:
⎛ λ1 + 6λ2 + kλ3                 0 ⎞ = ⎛ 0 0 ⎞ .
⎝ λ1 + kλ2 + λ3   −3λ1 − 18λ2 + 5λ3 ⎠   ⎝ 0 0 ⎠
Then equality (3.1) is satisfied if and only if λ1 , λ2 , λ3 are solutions of the homoge-
neous linear system:
⎧ λ1 + 6λ2 + kλ3 = 0
⎨ λ1 + kλ2 + λ3 = 0
⎩ −3λ1 − 18λ2 + 5λ3 = 0.
The complete matrix associated with the system is:
         ⎛  1    6   k │ 0 ⎞
(A∣b) =  ⎜  1    k   1 │ 0 ⎟ ,
         ⎝ −3  −18   5 │ 0 ⎠
and, if we reduce it to row echelon form with the Gaussian algorithm, it becomes:
           ⎛ 1    6      k    │ 0 ⎞
(A′∣b′) =  ⎜ 0  k − 6  1 − k  │ 0 ⎟ .
           ⎝ 0    0    5 + 3k │ 0 ⎠
Note that the system always admits the zero solution λ1 = λ2 = λ3 = 0. The question
is whether there are also nonzero solutions or not. The row echelon form of the ma-
trix is particularly appropriate to understand whether or not we have only the zero
solution. We can immediately observe that if the initial vectors were 5, of course at
least one of the 5 initial unknowns (the scalars that give the zero linear combina-
tion) would be indeterminate, in other words the vectors would definitely be linearly
dependent, because we could arbitrarily assign a nonzero value to that unknown. In
the next chapter, using the concept of basis, we will formalize this reasoning which,
however, even now should be clear intuitively.
Returning to the example in question, if k ≠ 6 and k ≠ −5/3, we have that rr(A′) =
rr(A′∣b′) = 3 is equal to the number of unknowns, thus the system admits a unique
solution, the zero one, and the given vectors are linearly independent. If instead k = 6,
reducing further to row echelon form, we get:
            ⎛ 1 6  6 │ 0 ⎞
(A′′∣b′′) = ⎜ 0 0 −5 │ 0 ⎟ ,
            ⎝ 0 0  0 │ 0 ⎠
thus rr(A′′) = rr(A′′∣b′′) = 2, and the system admits infinitely many solutions de-
pending on one parameter. In particular, there are nonzero solutions, and the given
vectors are linearly dependent.
Finally, if k = −5/3 we get:
           ⎛ 1     6    −5/3 │ 0 ⎞
(A′∣b′) =  ⎜ 0  −23/3    8/3 │ 0 ⎟ .
           ⎝ 0     0      0  │ 0 ⎠
Thus rr(A′) = rr(A′∣b′) = 2, and as before the given vectors are linearly dependent.
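The computation with the parameter k can also be checked symbolically. The sketch below assumes Python with SymPy (our assumption, not a tool used in the book): since the second column of the coordinate matrix is zero, it is enough to test when a 3×3 minor of the remaining columns vanishes.

import sympy as sp

k = sp.symbols('k')
# Coordinates of the three matrices of Example 3.2.9 with respect to the
# canonical basis of M2(R), written as rows.
M = sp.Matrix([[1, 0, 1, -3],
               [6, 0, k, -18],
               [k, 0, 1, 5]])
minor = M.extract([0, 1, 2], [0, 2, 3]).det()
print(sp.factor(minor))  # (k - 6)*(3*k + 5): zero exactly for k = 6 and k = -5/3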
We conclude this chapter with some exercises that clarify the techniques for ver-
ifying linear dependence or independence and the concept of generators.
3.3.1 Determine for which values of the parameter k the polynomials x² + 2x + k,
5x² + 2kx + k², kx² + x + 3 generate R2 [x].
Solution. The given polynomials generate R2 [x] if and only if every polynomial
ax² + bx + c of R2 [x] can be written as a linear combination of them, i.e. if there
exist scalars λ1 , λ2 , λ3 such that
ax² + bx + c = λ1 (x² + 2x + k) + λ2 (5x² + 2kx + k²) + λ3 (kx² + x + 3),
for every choice of a, b, c ∈ R. Equating the coefficients, we obtain a linear system in
the unknowns λ1 , λ2 , λ3 , whose complete matrix is:
         ⎛ 1   5   k │ a ⎞
(A∣b) =  ⎜ 2  2k   1 │ b ⎟ ,
         ⎝ k  k²   3 │ c ⎠
which, reduced to row echelon form with the Gaussian algorithm, becomes:
           ⎛ 1      5        k     │ a          ⎞
(A′∣b′) =  ⎜ 0   2k − 10   1 − 2k  │ b − 2a     ⎟ .
           ⎝ 0      0     3 − k/2  │ c − (k/2)b ⎠
If k ≠ 5 and k ≠ 6, then rr(A′) = rr(A′∣b′) = 3 and the system always admits a
solution, independently from the values of a, b, c, therefore each vector of the type
ax² + bx + c belongs to ⟨x² + 2x + k, 5x² + 2kx + k², kx² + x + 3⟩, and the given vectors
generate R2 [x].
If k = 5, further reducing the matrix to row echelon form, we obtain:
            ⎛ 1 5 5 │ a             ⎞
(A′′∣b′′) = ⎜ 0 0 1 │ 2c − 5b       ⎟ ,
            ⎝ 0 0 0 │ −a − 22b + 9c ⎠
and if −a − 22b + 9c ≠ 0 the system does not admit a solution, because 3 = rr(A′′∣b′′) ≠
rr(A′′) = 2; thus for k = 5 the given vectors do not generate R2 [x].
If k = 6 we obtain
           ⎛ 1 5   6 │ a      ⎞
(A′∣b′) =  ⎜ 0 2 −11 │ b − 2a ⎟ ,
           ⎝ 0 0   0 │ c − 3b ⎠
and if c ≠ 3b the system does not admit a solution, thus the given vectors do not
generate R2 [x]. For example 3x² + 5x − 8 ∉ ⟨x² + 2x + k, 5x² + 2kx + k², kx² + x + 3⟩.
3.3.2 Let W be the vector subspace of R⁴ given by the set of solutions of the homo-
geneous linear system
⎧ x1 + x2 − x4 = 0
⎩ 2x1 + x2 − x3 + 3x4 = 0 .
Determine a set of generators of W .
Solution. The complete matrix associated with the system is
(A∣b) = ⎛ 1 1  0 −1 │ 0 ⎞ ,
        ⎝ 2 1 −1  3 │ 0 ⎠
which reduced to row echelon form with the Gaussian algorithm becomes:
(A′∣b′) = ⎛ 1  1  0 −1 │ 0 ⎞ .
          ⎝ 0 −1 −1  5 │ 0 ⎠
The unknowns x3 , x4 are free; from the second equation x2 = −x3 + 5x4 and from the
first x1 = −x2 + x4 = x3 − 4x4 , so every solution has the form x3 (1, −1, 1, 0) +
x4 (−4, 5, 0, 1), and W = ⟨(1, −1, 1, 0), (−4, 5, 0, 1)⟩.
3.3.3 Determine for which values of the parameter k the vectors v1 = (2, 2k, k², 2k + 2)
and v2 = (−1, −k, 2k + 2, −k − 1) of R⁴ are linearly dependent.
Solution. We need to see if there are two nonzero scalars λ1 , λ2 , such that λ1 v1 +
λ2 v2 = 0. It must happen that:
λ1 (2, 2k, k², 2k + 2) + λ2 (−1, −k, 2k + 2, −k − 1) = (0, 0, 0, 0),
i.e.
(2λ1 − λ2 , 2kλ1 − kλ2 , k²λ1 + (2k + 2)λ2 , (2k + 2)λ1 − (k + 1)λ2 ) = (0, 0, 0, 0),
whose complete associated matrix is:
         ⎛   2       −1    │ 0 ⎞
(A∣b) =  ⎜  2k       −k    │ 0 ⎟ ,
         ⎜  k²     2k + 2  │ 0 ⎟
         ⎝ 2k + 2  −k − 1  │ 0 ⎠
which reduced to row echelon form with the Gaussian algorithm becomes:
           ⎛ 2      −1    │ 0 ⎞
(A′∣b′) =  ⎜ 0   (k + 2)² │ 0 ⎟ .
           ⎜ 0       0    │ 0 ⎟
           ⎝ 0       0    │ 0 ⎠
If k ≠ −2, then rr(A′) = rr(A′∣b′) = 2, so the system has only one solution, which is
the zero one, and the vectors v1 , v2 are linearly independent. If instead k = −2,
then rr(A′) = rr(A′∣b′) = 1, the system has infinitely many solutions that depend on
one parameter, so v1 , v2 are linearly dependent; indeed in this case v1 = −2v2 .
Given the matrices
{ ⎛ 1  0 ⎞  ⎛ 0  3 ⎞  ⎛  1  3 ⎞ }
  ⎝ 2 −1 ⎠, ⎝ 1 −3 ⎠, ⎝ −2 −1 ⎠   ⊆ M2 (R),
determine if they are linearly independent and determine the subspace generated by
them.
3.4.6 Determine for which values of k the polynomial x² + 2k belongs to ⟨x² + kx, x² −
(k + 1)x − k⟩.
A = ⎛ 1 0 −2 ⎞ ,    B = ⎛ −1 −1 1 ⎞ .
    ⎝ 4 1  0 ⎠          ⎝  0 −2 1 ⎠
a) Determine the values of k for which the three vectors v1 , v2 , v3 are linearly
independent.
b) Determine the values of k for which w ∈ ⟨v1 , v2 , v3 ⟩.
3.4.10 a) Determine the solutions of the following linear system as the parameter k
varies:
⎧ x − z = 1
⎨ kx − ky + 2z = 0
⎩ 2x + 3ky − 11z = −1.
b) Determine for which values of the parameter k the polynomial x² − 1 belongs to
the subspace generated by the polynomials x² + kx + 2, kx² − 3k, x² − 2x + 11.
3.4.12 Find the values of h and k for which the vectors of the set {x + h, kx −
2
generate R2 [x]?
b) Choose a value of k for which v1 , v2 , v3 are linearly dependent and write one of
them as a linear combination of the others.
3.4.14 a) Establish for which values of the parameter k the matrices
⎛ 1 2 ⎞    ⎛ k 0 ⎞    ⎛ −1  k − 2 ⎞
⎝ 0 0 ⎠ ,  ⎝ 4 0 ⎠ ,  ⎝  k    0   ⎠
are linearly independent.
b) Establish for which values of the parameter k such vectors generate the sub-
space W = { ⎛ r s ⎞ ∣ r, s, t ∈ R } of M2 (R).
            ⎝ t 0 ⎠
3.4.15 Determine for which values of the parameter k we have that:
⎛ 3 −2 ⎞ ∈ ⟨ ⎛ 2 0 ⎞ , ⎛ 1  k ⎞ , ⎛ k  6 ⎞ ⟩ .
⎝ 2  2 ⎠     ⎝ 2 0 ⎠   ⎝ 0 −k ⎠   ⎝ 1 −6 ⎠
i) S = {(x, y, z) ∈ R³ ∣ x + 2y = 0}.

CHAPTER 4
Basis and Dimension
The concepts of basis and dimension, which are closely related, are central in the
theory of vector spaces.
Let us start with some examples, mainly, but not only, in R² and R³. Thanks to
these examples, we will develop geometric intuition, which will be valuable in order
to understand what happens in vector spaces that cannot be visualized. Then, we will
discuss the theory and state the Completion Theorem. This is the most important
result in this chapter; starting from the concept of basis it allows us to reach the
definition of dimension. At the end, we will revisit the Gaussian algorithm, described
in Chapter 1, and we will see how it can be effectively used to answer the main
questions regarding a basis or the dimension of a vector space.
Example 4.1.1 In the previous chapter, we have seen several examples of sets of
generators of the vector space R². We will mention a few:
R² = ⟨(1, 0), (0, 1)⟩ = ⟨(1, 0), (0, 1), (1, 1)⟩.
If we add a vector to a set that generates R², this set continues to generate R² by
Proposition 3.1.8. The question that we ask ourselves is: how can we find a minimal
set, i.e. a set as small as possible, of generators for the space R²?
Proposition 3.1.8 comes again to help us: if we remove from the set a vector which
is a linear combination of the others, the new set obtained generates the same vector
space. In the example we are considering, we can remove the vector (1, 1) as it is a
linear combination of (1, 0) and (0, 1): (1, 1) = (1, 0) + (0, 1). If now, however, we try
to further decrease the number of generators in the set, the vector space generated by
them changes. Indeed ⟨(1, 0)⟩ is just the x-axis, while ⟨(0, 1)⟩ is just the y-axis. So, if
we remove one of the two vectors (1, 0) or (0, 1), from the given set, the vector space
generated by the set changes, in other words, there are no “redundant generators”. The
important difference between the two sets: {(1, 0), (0, 1)} and {(1, 0), (0, 1), (1, 1)}
cannot go unnoticed. The first set consists of linearly independent vectors, while the
vectors of the second set are linearly dependent. So we have seen in this example
that, starting from a set of generators, we can delete one by one the generators that
are linear combination of other vectors in the set, until we obtain a set of linearly
independent vectors, that is a set in which no vector is a linear combination of the
other vectors (Proposition 3.2.4 of Chapter 3). At this point, we cannot remove any
vector from the set, without changing the vector space generated by the vectors in
the set.
The next proposition formalizes the conclusions of the previous example and gives
us an algorithm to obtain a minimal set of generators; as we will see, this set is called
basis.
Proposition 4.1.2 Let V = ⟨v1 , . . . , vn ⟩ ≠ {0}. Then there exists a subset of
{v1 , . . . , vn }, consisting of linearly independent vectors, which generates V .
Proof. We proceed algorithmically by steps.
Step one. We have that V = ⟨v1 , . . . , vn ⟩ by assumption. If v1 , . . . , vn are linearly
independent, then we have proved the statement. Otherwise, one of the vectors,
suppose vn , is a linear combination of the others, by Proposition 3.2.4. By Proposition
3.1.8, we have:
V = ⟨v1 , . . . , vn ⟩ = ⟨v1 , . . . , vn−1 ⟩.
Step two. In step one, we have eliminated the vector vn from the set of generators
of V , thus V = ⟨v1 , . . . , vn−1 ⟩. If v1 , . . . , vn−1 are linearly independent, then we have
finished our proof. Otherwise, we go back to step one, that is, one of the vectors,
suppose vn−1 , is a linear combination of the others. By Proposition 3.1.8 in Chapter
3 we have:
V = ⟨v1 , . . . , vn ⟩ = ⟨v1 , . . . , vn−1 ⟩ = ⟨v1 , . . . , vn−2 ⟩.
It is clear that, after a finite number of steps, n − 1 at most, we get a set in which
no vector is a linear combination of the others, therefore by Proposition 3.2.4, it is a
set of linearly independent vectors.
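The procedure of Proposition 4.1.2 can be phrased as a small algorithm: scan the generators and keep a vector only if it is not a linear combination of those already kept. A possible sketch (in Python with NumPy, which is our assumption; the helper name is ours as well) is the following:

import numpy as np

def independent_generators(vectors):
    # Discard every vector that is a linear combination of the ones already kept.
    kept = []
    for v in vectors:
        candidate = np.array(kept + [v], dtype=float)
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(v)
    return kept

# Example: the generators (1,0), (0,1), (1,1) of R^2 reduce to (1,0), (0,1).
print(independent_generators([[1, 0], [0, 1], [1, 1]]))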
We are ready for the definition of basis.
Definition 4.1.3 Let V be a vector space. The set {v1 , . . . , vn } is called a basis if:
1. The vectors v1 , . . . , vn are linearly independent.
2. The vectors v1 , . . . , vn generate V .
We say also that V is finitely generated, if there exists a finite set of generators
of V , i.e. V = ⟨v1 , . . . , vn ⟩.
If V admits a basis {v1 , . . . , vn }, then it is finitely generated. We will soon see
that the converse is also true.
Henceforth, we will say that a set X is maximal (minimal) with respect to a
certain property if X enjoys that property, but as soon as we add (remove) an
element to (from) X, then X does not enjoy the property anymore.
Example 4.1.5 Consider the following vectors of R3 [x]: x³, x², 2, 5, x + 2, 3x, −7x, 2x².
We want to find a basis for the subspace W they generate. The procedure we will follow is
not the standard one, but only an example of the procedure described in Proposition
4.1.2. First, we see immediately that 5 is a linear combination of 2, as it is a multiple
of it; similarly, −7x is a multiple of 3x and 2x² is a multiple of x². Eliminating these
vectors, by Proposition 3.1.8 we have:
W = ⟨x³, x², 2, 5, x + 2, 3x, −7x, 2x²⟩ = ⟨x³, x², 2, x + 2, 3x⟩.
Moreover x + 2 = (1/3)(3x) + 2, so we can also eliminate x + 2, and
W = ⟨x³, x², 2, 3x⟩.
To verify that these vectors are linearly independent and thus form a basis of W , we
have to show that the equation:
a x³ + b x² + 2c + 3d x = 0
holds only for a = b = c = d = 0; this is clear, since a polynomial is zero if and only if
all of its coefficients are zero.
Observation 4.1.8 Proposition 4.1.6 ensures that each vector space, different from
the zero space and generated by a finite number of vectors, has at least one basis.
However, this basis is not unique. For example it is easy to verify that, if k ≠ 0, then
the set Bk = {(k, 1), (0, 1)} is a basis of R², so R² has infinitely many bases. We
invite the student to convince himself that every vector space that admits a basis,
actually admits infinitely many.
• The canonical basis is given by C = {E1,1 , . . . , Em,n }, where Ei,j is the matrix
that has 1 in position (i, j) and 0 in the other positions. For example the
canonical basis of M2,3 is
      ⎛ 1 0 0 ⎞   ⎛ 0 1 0 ⎞   ⎛ 0 0 1 ⎞
C = { ⎝ 0 0 0 ⎠ , ⎝ 0 0 0 ⎠ , ⎝ 0 0 0 ⎠ ,
      ⎛ 0 0 0 ⎞   ⎛ 0 0 0 ⎞   ⎛ 0 0 0 ⎞
      ⎝ 1 0 0 ⎠ , ⎝ 0 1 0 ⎠ , ⎝ 0 0 1 ⎠ } .
The knowledge of the canonical bases tells us immediately the dimension of the
vector spaces considered above:
dim(Rⁿ) = n,    dim(Rn [x]) = n + 1,    dim(Mm,n (R)) = mn.
If one keeps in mind this fact, many exercises become simple.
The next result compares the dimension of a vector space with that of its subspaces.
Let V be a finitely generated vector space and let W be a subspace of V . Then:
a) dim(W ) ≤ dim(V );
b) if dim(W ) = dim(V ), then W = V .
Proof. a) Recall that the dimension of a vector space is the number of elements
of a basis, which is also a maximal set of linearly independent vectors. Since W
is contained in V , we cannot choose in W a larger number of linearly independent
vectors than the number of linearly independent vectors in V , therefore the dimension
of W cannot be larger than the dimension of V .
b) Since the vectors of a basis of W are linearly independent, by the Completion
Theorem, we can add to them dim(V ) − dim(W ) vectors to obtain a basis of V . If
dim(W ) = dim(V ), it means that a basis of W is already a basis of V , and then in
particular generates V , i.e. W = V .
principle, we have to verify that the two vectors are linearly independent and that
they generate R². However, while linear independence is obvious, because one vector
is not a multiple of the other, for generation we should do the calculations. Now we
see that the calculations are not necessary. In fact, we have two linearly independent
vectors in R², so the subspace generated by them has dimension two. So, by the
previous theorem, it must be equal to R².
Similarly, in Example 4.1.5 we have shown that the vectors x³, x², 2, 3x are
linearly independent. Then, since there are four of them, we can immediately
conclude, without making calculations, that they are a basis of R3 [x], therefore
⟨x³, x², 2, 3x⟩ = R3 [x].
In fact, a much stronger property is true. In general, given a set of vectors, the
property of being linearly independent or the property of being generators of a certain
vector space are not related with each other. But, if we are in a vector space of
dimension n and we consider a set with exactly n vectors, then the two properties
are equivalent.
Proposition 4.2.6 Let V be a vector space of dimension n and let v1 , . . . , vn be n
vectors of V . The following statements are equivalent:
a) {v1 , . . . , vn } is a basis of V ;
b) v1 , . . . , vn are linearly independent;
c) v1 , . . . , vn generate V .
We now see that a basis is a very “efficient” way to represent vectors in a vector
space. Let us see an example.
Example 4.2.7 In R² we know that all the vectors are linear combinations of the
two vectors of the canonical basis e1 = (1, 0), e2 = (0, 1). We can also verify that
each vector in R² is not only a linear combination of e1 and e2 , but it is so in a unique
way. In fact, if we take the vector (2, 3), we can write (2, 3) = 2(1, 0) + 3(0, 1) and
the numbers 2 and 3 are the only scalars which give us (2, 3) as a linear combination
of (1, 0), (0, 1). But the situation is different if we take the three vectors (1, 0), (0, 1)
and (1, 1). Indeed, we already know that the vector (1, 1) is somehow “redundant”,
that is, we know that: ⟨(1, 0), (0, 1), (1, 1)⟩ = ⟨(1, 0), (0, 1)⟩, and this is because (1, 1)
is a linear combination of (1, 0) and (0, 1). This is reflected by the fact that a vector
in R² is no longer a linear combination in a unique way of these three vectors. Indeed,
for instance, (2, 3) = 2(1, 0) + 3(0, 1) + 0 ⋅ (1, 1) = 1 ⋅ (1, 0) + 2 ⋅ (0, 1) + 1 ⋅ (1, 1).
Theorem 4.2.8 Let B = {v1 , . . . , vn } be an ordered basis for the vector space V
(that is, we fixed an order in the set of vectors numbering them) and let v ∈ V . Then
there exists a unique n−tuple of scalars (α1 , . . . , αn ), such that
v = α1 v1 + ⋯ + αn vn .
Proof. Since B is a basis, v can certainly be written as a linear combination of
v1 , . . . , vn , so such scalars exist:
v = α1 v1 + ⋯ + αn vn .
Suppose now that also
v = β1 v1 + ⋯ + βn vn .
Subtracting the two expressions we obtain (α1 − β1 )v1 + ⋯ + (αn − βn )vn = 0, and
since v1 , . . . , vn are linearly independent, αi = βi for every i, which proves uniqueness.
Definition 4.2.9 The scalars (α1 , . . . , αn ) are called the components of v ∈ V in the
basis B or also the coordinates of v with respect to the basis B and will be denoted
by (v)B = (α1 , . . . , αn ).
Example 4.2.10 As an example, we prove by exercise that B = {(1, −1), (2, 0)} is
a basis of R², and we determine the coordinates of v = (−3, 1) with respect to this
basis.
Clearly the vectors in B are linearly independent as (2, 0) is not a multiple of
(1, −1). At this point, as R² has dimension 2, we already know that B is a basis. But
let us find the coordinates anyway: we look for scalars α1 , α2 such that α1 (1, −1) +
α2 (2, 0) = (−3, 1), that is α1 + 2α2 = −3 and −α1 = 1. Hence α1 = −1, α2 = −1,
and (v)B = (−1, −1).
We observe that, if we have a matrix A ∈ Mm,n (R), we can consider its rows
as vectors of Rⁿ; such vectors will be called row vectors of A. For example, if
A = ⎛ 0  1 −3 ⎞ , its row vectors are R1 = (0, 1, −3) and R2 = (2, −1, 1).
    ⎝ 2 −1  1 ⎠
Proposition 4.3.1 Given a matrix A ∈ Mm,n (R), the elementary row operations do
not change the subspace of Rⁿ generated by the row vectors of A.
Proof. Recall that the elementary row operations are:
(a) exchanging two rows;
(b) multiplying a row by a nonzero real number;
(c) replacing the i-th row with the sum of the i-th row and the j-th row multiplied by
any real number α.
It is immediate to verify that the statement is true for the operations of type
(a) and (b). For operations of type (c), it is sufficient to show that if Ri and Rj
are two row vectors of A and α ∈ R, we have that ⟨Ri , Rj + αRi ⟩ = ⟨Ri , Rj ⟩. We
obviously have that Ri , Rj + αRi ∈ ⟨Ri , Rj ⟩, so ⟨Ri , Rj ⟩ is a subspace of Rⁿ containing
{Ri , Rj + αRi }, and therefore ⟨Ri , Rj + αRi ⟩ ⊆ ⟨Ri , Rj ⟩ by Proposition 3.1.5. Since
Rj = (Rj + αRi ) − αRi , the same argument gives the opposite inclusion, so the two
subspaces coincide.
Observation 4.3.2 The elementary row operations do not change the subspace
of Rⁿ generated by the row vectors of A, but they do change the subspace of Rᵐ
generated by the column vectors of A. We invite the reader to verify this fact with
generated by the column vectors of A. We invite the reader to verify this fact with
an example in order to convince himself.
Proposition 4.3.3 If a matrix A is row echelon, its nonzero row vectors are linearly
independent.
Proof. Let R1 , . . . , Rk be the nonzero rows of A, and let a1j1 , . . . , akjk be the cor-
responding pivots. Now let λ1 R1 + ⋅ ⋅ ⋅ + λk Rk = 0, and we want to prove that
λ1 = λ2 = ⋅ ⋅ ⋅ = λk = 0. In the vector λ1 R1 + ⋅ ⋅ ⋅ + λk Rk , the element in the position
j1 is λ1 a1j1 , the element in the position j2 is λ1 a1j2 + λ2 a2j2 , and so on, until we reach
the element in position jk , which is λ1 a1jk + λ2 a2jk + ⋅ ⋅ ⋅ + λk akjk . So, from the fact
that λ1 R1 + ⋅ ⋅ ⋅ + λk Rk = 0, it follows that:
⎧ λ1 a1j1 = 0
⎪ λ1 a1j2 + λ2 a2j2 = 0
⎨ ⋮
⎩ λ1 a1jk + λ2 a2jk + ⋅ ⋅ ⋅ + λk akjk = 0.
From the first equation, since the pivot a1j1 ≠ 0, it follows that λ1 = 0; substituting
into the second equation, since a2j2 ≠ 0, we get λ2 = 0, and proceeding in the same
way we obtain λ3 = ⋯ = λk = 0, as we wanted.
Let us now see an example of how these propositions can be applied in the exer-
cises.
Example 4.3.4 Given the following vectors of R³:
v1 = (2, 3, −1),    v2 = (0, −1, 3),    v3 = (2, 2, 2),
we want to establish if they are linearly independent and if they are a basis of R³.
Moreover, we want to find a basis of the subspace W generated by them and to
calculate the dimension of W .
The matrix:
⎛2 3 −1⎞
⎜0 −1 3 ⎟
A=⎜ ⎟
⎝2 2 2⎠
can be reduced to the following row echelon form:
⎛1 0 4 ⎞
⎜0 1 −3⎟
⎜ ⎟.
⎝0 0 0 ⎠
W = ⟨(2, 3, −1), (0, −1, 3), (2, 2, 2)⟩ = ⟨(1, 0, 4), (0, −1, 3)⟩.
Therefore, we have that the subspace W is generated by the two vectors u1 = (1, 0, 4)
and u2 = (0, −1, 3), which are linearly independent by Proposition 4.3.3. Therefore,
{u1 , u2 } is a basis for W , which consequently has dimension 2. Since by Theorem
4.1.4, the number of vectors in a basis is the maximum number of linearly independent
vectors in the vector space, we have that v1 , v2 , v3 are linearly dependent, therefore
they cannot be a basis of R³.
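The whole computation of Example 4.3.4 can be reproduced with a computer algebra system; the sketch below assumes Python with SymPy (not part of the text):

import sympy as sp

A = sp.Matrix([[2, 3, -1],
               [0, -1, 3],
               [2, 2, 2]])     # rows: v1, v2, v3
R, pivots = A.rref()           # reduced row echelon form
print(R)                       # Matrix([[1, 0, 4], [0, 1, -3], [0, 0, 0]])
print(A.rank())                # 2: dim W = 2, so v1, v2, v3 are dependent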
Let us now see in general how to proceed to find a basis of the subspace W gener-
ated by vectors v1 , . . . , vk ∈ Rⁿ and how to decide if they are linearly independent,
that is, we formalize what we learned from the previous example.
just add the vector (0, 0, 1), or any vector of the type (0, 0, h) with h ≠ 0.
Given a vector space V of dimension n with a fixed ordered basis B = {v1 , . . . , vn },
consider the map c ∶ V ⟶ Rⁿ which
associates to every vector its coordinates with respect to the basis B. If we write
v ∈ V as a linear combination of the elements of B, v = α1 v1 + ⋯ + αn vn , we have
c(v) = (v)B = (α1 , . . . , αn ). By Theorem 4.2.8, c is a bijection. If v has coordinates
(α1 , . . . , αn ) and λ ∈ R is a scalar, then the coordinates of λv are (λα1 , . . . , λαn ),
and if w is another vector, whose coordinates are (β1 , . . . , βn ), then the coordinates
of v + w are c(v + w) = (α1 + β1 , . . . , αn + βn ). The student is invited to verify these
assertions, that we will recall later on, when we talk about isomorphisms of vector
spaces. For now, we are just content to note that, thanks to these properties, questions
about linear independence and generation in any finitely generated vector space can
be answered by working with the coordinate vectors. For example, given the following
polynomials of R2 [x]:
p1 = 2x² + 3x − 1,    p2 = −x + 3,    p3 = 2x² + 2x + 2,
we want to establish if they are linearly independent and determine a basis of the
subspace W they generate. Consider the canonical basis C = {x², x, 1} of R2 [x]. With
respect to this basis the coordinates of the given polynomials are (2, 3, −1), (0, −1, 3)
and (2, 2, 2), that is, exactly the rows of the matrix A of Example 4.3.4.
We can then proceed as in Example 4.3.4, making exactly the same calculations, and
we obtain a basis of W given by the polynomials whose coordinates are u1 = (1, 0, 4)
and u2 = (0, −1, 3). Returning to polynomials, a basis of W is {x² + 4, −x + 3}.
4.4.1 Let W be the subspace of R⁴ generated by the vectors v1 = (1 − k, 1, 1, −k),
v2 = (2, 2 − k, 2, 0), v3 = (1, 1, 1 − k, k). Determine, as the parameter k varies, the
dimension and a basis of W , and for one value of k for which W ≠ R⁴ complete the
basis found to a basis of R⁴.
Solution. We write the matrix A having the given vectors as rows,
     ⎛ 1 − k    1      1    −k ⎞
A =  ⎜   2    2 − k    2     0 ⎟ ,
     ⎝   1      1    1 − k   k ⎠
and we reduce it to row echelon form with the Gaussian algorithm:
      ⎛ 1   1    1 − k        k      ⎞
A′ =  ⎜ 0  −k      2k       −2k      ⎟ .
      ⎝ 0   0   4k − k²   −4k + k²   ⎠
If k ≠ 0 and 4k − k² ≠ 0, that is, if k ≠ 0 and k ≠ 4, the matrix A′ has 3 nonzero
rows, which are linearly independent, therefore W has dimension 3 and the three
nonzero rows of A′ form a basis of W .
If k = 0, the matrix A′ has only one nonzero row, so W = ⟨(1, 1, 1, 0)⟩ has
dimension 1.
If k = 4 we get:
⎛1 1 −3 4 ⎞
A =⎜ ⎜0 −4 8 −8⎟
′
⎟,
⎝0 0 0 0⎠
so A has 2 nonzero rows and W = ⟨(1, 1, −3, 4), (0, −4, 8, −8)⟩ has dimension 2.
′
We now choose k = 4. To complete {(1, 1, −3, 4), (0, −4, 8, −8)} to a basis of R⁴
we have to add 2 row vectors having the pivots in the “missing steps”, i.e. in the
third and fourth place. For example, we can add (0, 0, −1, 2), (0, 0, 0, 5).
So {(1, 1, −3, 4), (0, −4, 8, −8), (0, 0, −1, 2), (0, 0, 0, 5)} is a basis of R⁴ obtained
by completing a basis of W .
4.4.2 Let v1 = (1, 2, 1, 0), v2 = (4, 8, k, 5), v3 = (−1, −2, 3 − k, −k). Determine for
which values of k the vectors v1 , v2 , v3 are linearly independent. Set k = 1 and de-
termine, if possible, a vector w ∈ R⁴ , such that w ∉ ⟨v1 , v2 , v3 ⟩.
Solution. We write the matrix A that has the given vectors as rows, and we apply
the Gaussian algorithm to reduce it to row echelon form. We have
     ⎛  1  2    1     0 ⎞
A =  ⎜  4  8    k     5 ⎟ ,
     ⎝ −1 −2  3 − k  −k ⎠
which reduced to row echelon form becomes:
      ⎛ 1  2     0       0   ⎞
A′ =  ⎜ 0  0   k − 4     5   ⎟ .
      ⎝ 0  0     0     5 − k ⎠
If k ≠ 4 and k ≠ 5, the matrix A′ has three nonzero rows,
which are linearly independent, so W = ⟨(1, 2, 1, 0), (0, 0, k − 4, 5), (0, 0, 0, k − 5)⟩
has dimension 3. Since v1 , v2 , v3 generate W , by Proposition 4.2.6 they are linearly
independent.
If k = 4 we get:
⎛1 2 1 0⎞
A =⎜
⎜0 0 0 5⎟
′
⎟.
⎝0 0 0 1⎠
Now W = ⟨(1, 2, 1, 0), (0, 0, 0, 5), (0, 0, 0, 1)⟩ = ⟨(1, 2, 1, 0), (0, 0, 0, 1)⟩ has dimension
2, and then v1 , v2 , v3 are linearly dependent.
If k = 5 we get:
⎛1 2 1 0⎞
A =⎜ ⎜0 0 1 5⎟
′
⎟,
⎝0 0 0 0⎠
which is a row echelon form matrix with two nonzero rows, so W has dimension 2
and v1 , v2 , v3 are linearly dependent.
′
Now let k = 1. We replace this value in A to get a basis of W :
⎛1 2 1 0⎞
⎜0 0 −3 5⎟
A =⎜
′
⎟.
⎝0 0 0 4⎠
We have that W has dimension 3, and if we choose a row vector that has
the second nonzero pivot, for example w = (0, −2, 3, −1), it follows from
Proposition 4.3.3 that the vectors (1, 2, 1, 0), (0, −2, 3, −1), (0, 0, −3, 5), (0, 0, 0, 4)
are linearly independent, so by Proposition 3.2.4 we have that (0, −2, 3, −1) ∉
⟨(1, 2, 1, 0), (0, 0, −3, 5), (0, 0, 0, 4)⟩, that is w ∉ W .
4.4.3 Let W be the subspace of R⁵ generated by the set
{(1, 3, −1, 1, 2), (2, 6, −2, 4, 4)}.
Determine a basis B of W and complete it to a basis of R⁵.
Solution. We write the matrix A having the given vectors as rows,
A = ⎛ 1 3 −1 1 2 ⎞ ,
    ⎝ 2 6 −2 4 4 ⎠
which reduced to row echelon form with the Gaussian algorithm becomes:
A′ = ⎛ 1 3 −1 1 2 ⎞ .
     ⎝ 0 0  0 2 0 ⎠
The two nonzero rows of A′ are linearly independent, so B = {(1, 3, −1, 1, 2), (0, 0, 0, 2, 0)}
is a basis of W .
At this point, to complete {(1, 3, −1, 1, 2), (0, 0, 0, 2, 0)} to a basis of R⁵ , we need
to add 3 row vectors having the pivots in the “missing steps”, i.e. the second, third
and fifth places. For example, we can add (0, 1, −1, 0, 1), (0, 0, 2, 1, −3), (0, 0, 0, 0, 1).
As
As
W = ⟨(1, 3, −1, 1, 2), (2, 6, −2, 4, 4)⟩ = ⟨(1, 3, −1, 1, 2), (0, 0, 0, 2, 0)⟩,
it is easy to see that
⟨(1, 3, −1, 1, 2), (2, 6, −2, 4, 4), (0, 1, −1, 0, 1), (0, 0, 2, 1, −3), (0, 0, 0, 0, 1)⟩
= ⟨(1, 3, −1, 1, 2), (0, 0, 0, 2, 0), (0, 1, −1, 0, 1), (0, 0, 2, 1, −3), (0, 0, 0, 0, 1)⟩
= R⁵ .
So if B̃ is the set
{(1, 3, −1, 1, 2), (2, 6, −2, 4, 4), (0, 1, −1, 0, 1), (0, 0, 2, 1, −3), (0, 0, 0, 0, 1)}
the vectors of B̃ generate R⁵ , therefore, by Proposition 4.2.6, B̃ is a basis of R⁵ , and
it was obtained by completing the basis B of W .
4.4.4 Consider the following vector subspaces of M2,3 (R):
U = { ⎛ a 0 b ⎞ ∣ a, b, c, d ∈ R } ,
      ⎝ c a d ⎠
W = { ⎛ r s t ⎞ ∣ r + s + t + u + x + y = 0 } ,
      ⎝ u x y ⎠
and determine a basis of U ∩ W .
Solution. We have that:
U ∩ W = { ⎛ a 0 b ⎞ ∣ a, b, c, d ∈ R, a + b + c + a + d = 0 } ,
          ⎝ c a d ⎠
that is
U ∩ W = { ⎛ a 0 b ⎞ ∣ a, b, c, d ∈ R, d = −2a − b − c } = { ⎛ a 0     b       ⎞ ∣ a, b, c ∈ R }
          ⎝ c a d ⎠                                         ⎝ c a −2a − b − c ⎠
       = { a ⎛ 1 0  0 ⎞ + b ⎛ 0 0  1 ⎞ + c ⎛ 0 0  0 ⎞ ∣ a, b, c ∈ R }
             ⎝ 0 1 −2 ⎠     ⎝ 0 0 −1 ⎠     ⎝ 1 0 −1 ⎠
       = ⟨ ⎛ 1 0  0 ⎞ , ⎛ 0 0  1 ⎞ , ⎛ 0 0  0 ⎞ ⟩ .
           ⎝ 0 1 −2 ⎠   ⎝ 0 0 −1 ⎠   ⎝ 1 0 −1 ⎠
So the vectors v1 = ⎛ 1 0  0 ⎞ , v2 = ⎛ 0 0  1 ⎞ , v3 = ⎛ 0 0  0 ⎞ gener-
                    ⎝ 0 1 −2 ⎠        ⎝ 0 0 −1 ⎠        ⎝ 1 0 −1 ⎠
ate U ∩ W . To show that they are linearly independent, let us consider their co-
ordinates with respect to the canonical basis, that is: (v1 )C = (1, 0, 0, 0, 1, −2),
(v2 )C = (0, 0, 1, 0, 0, −1), (v3 )C = (0, 0, 0, 1, 0, −1). We observe that the matrix A
whose rows are (v1 )C , (v2 )C , (v3 )C is in row echelon form, so by Proposition 4.3.3
we have that (v1 )C , (v2 )C , (v3 )C are linearly independent. Then also v1 , v2 , v3 are
linearly independent, so they are a basis of U ∩ W .
4.5.3 Find a basis of ⟨(1, 0, 3), (2, 3, 0), (1, 1, 1)⟩ and complete it to a basis of R³.
4.5.4 Determine which of the following sets of vectors generate R³ and which of them
are a basis of R³.
i) S1 = {(0, 0, −1), (2, 0, 0), (0, −3, 0)}.
ii) S2 = {(1, 1, −1), (2, 0, 1), (1, −1, 1), (0, 6, −3)}.
iii) S3 = {(1, 3, 1), (−2, 1, −9), (0, 5, −5)}.
4.5.5 Determine if the vectors v1 = (1, 1), v2 = (−1, 1), v3 = (2, 1) are linearly
independent. Do they generate R²?
4.5.6 Determine for which values of k the vector w = (−1, k, 1) belongs to
⟨(1, −1, 0), (k, −k, 1), (−1, k², 1)⟩.
4.5.7 Determine for which values of k the vectors v1 = (1, 2k, 0), v2 = (1, 0, −3),
v3 = (−1, 0, k + 2) are linearly independent. Set k = 1 and establish if (1, −2, −6) ∈
⟨v1 , v2 , v3 ⟩.
4.5.8 Let v1 = (1, 1, 1, 0), v2 = (2, 0, 2, −1), v3 = (−1, 1, −1, 1). Determine a basis of
⟨v1 , v2 , v3 ⟩ and then complete it to a basis of R . Also determine for which values of
4
3
subspace of R and determine a basis for it.
4.5.10 Determine for which values of k the vectors v1 = (0, 1, 0), v2 = (1, k, 4), v3 =
(k, 2, 3) generate R . Set k = 0 and determine if the vector (4, −1, 6) belongs to
3
⟨v1 , v2 , v3 ⟩.
2 2
4.5.11 i) Determine for which values of k the vectors v1 = x + 2x − 1, v2 = x +
kx + 1 − k, v3 = 5x + k are linearly dependent.
ii) Choose one value of k found in point i) and write one of the 3 vectors as a linear
combination of the others. Then find a basis of ⟨v1 , v2 , v3 ⟩.
2
4.5.12 i) Determine for which values of k the vectors v1 = 6x − 6x − k, v2 =
2
−kx + kx + 6 are linearly independent.
ii) Set k = 0. Determine the dimension of ⟨v1 , v2 ⟩, and if possible, find a vector w
such that w does not belong to ⟨v1 , v2 ⟩ and {v1 , v2 , w} does not generate R2 [x].
4.5.13 Determine for which values of k the vectors v1 = (1, 2, 0), v2 = (−k, 3, 0),
v3 = (2, 0, 1) are a basis B of R . Put k = 0 and determine the coordinates of the
3
4.5.14 Determine, if possible, 4 nonzero vectors of R2 [x] that do not generate R2 [x].
Determine, if possible, two distinct subspaces of R2 [x] of dimension 2 that both
2
contain the vector x + x.
a b
4.5.15 Let X = {( ) ∈ M2 (R)∣A, b ∈ R}. Prove that X is a subspace of M2 (R)
−b a
and determine its dimension.
4.5.16 Show that B = { ⎛ −1 ⎞ , ⎛  1 ⎞ } is a basis of R² and determine the coordinates
                       ⎝  2 ⎠   ⎝ −1 ⎠
of the vectors ⎛  3 ⎞ and ⎛ 0 ⎞ with respect to this basis.
               ⎝ −1 ⎠     ⎝ 1 ⎠
ii) Choose a value of k and determine if the vector (2, k, k) belongs to ⟨v1 , v2 , v3 ⟩.
2 2
4.5.18 i) Determine for which values of k the vectors v1 = x + 2x + 2, v2 = −x +
2kx + k − 1, v3 = kx + (2k + 4)x + 3k are linearly independent.
2
ii) Set k = 0 and determine, if possible, a vector w that does not belong to ⟨v1 , v2 , v3 ⟩.
4.5.22 Prove that a nonzero vector space of finite dimension has infinitely many
bases.
We write here, for the interested reader, the proof of the Completion Theorem. We
start with a technical lemma.
Lemma 4.6.1 Let {w1 , . . . , wn } be a basis of the vector space V and let v ∈ V with
v = λ1 w1 + ⋅ ⋅ ⋅ + λk wk + ⋅ ⋅ ⋅ + λn wn .
If λk ≠ 0, then the set B′ = {w1 , . . . , wk−1 , v, wk+1 , . . . , wn }, obtained by replacing
wk with v, is also a basis of V .
Proof. We show that the vectors of B′ are linearly independent. Suppose that
β1 w1 + ⋅ ⋅ ⋅ + βk v + ⋅ ⋅ ⋅ + βn wn = 0.
Substituting the expression of v and collecting terms, we obtain a linear combination
of w1 , . . . , wn equal to zero, in which the coefficient of wk is βk λk and the coefficient
of wi , for i ≠ k, is βi + βk λi .
Since the wi are linearly independent, we have that all coefficients must be zero. In
particular, it must happen that βk λk = 0 and βi + βk λi = 0 for every i ≠ k. From the
first equality, being λk ≠ 0, it follows that βk = 0, and by substituting in the others
we get βi = 0 for every i ≠ k. So all βi are zero, which shows that the vectors of B′
are linearly independent.
We can now prove the Completion Theorem (Theorem 4.2.1): if {w1 , . . . , wn } is a
basis of V and v1 , . . . , vm ∈ V are linearly independent vectors, then m ≤ n and
{v1 , . . . , vm } can be completed to a basis of V by adding to it n − m suitably chosen
vectors wk .
Proof. Since {w1 , . . . , wn } is a basis of V , we can write
v1 = α1 w1 + ⋅ ⋅ ⋅ + αn wn ,
where not all the coefficients are zero. Possibly rearranging the wk , we can assume
α1 ≠ 0, and then by Lemma 4.6.1 we have that {v1 , w2 , . . . , wn } is a basis of V .
Now we consider v2 and the basis {v1 , w2 , . . . ,wn }. We can write v2 in the form:
v2 = β1 v1 + β2 w2 + ⋅ ⋅ ⋅ + βn wn ,
where at least one of the βj with j ≥ 2 is not zero, otherwise v2 would be a multiple of
v1 contradicting the hypothesis that the vectors v1 , . . . , vm are linearly independent.
It is not restrictive to assume β2 ≠ 0, and therefore by Lemma 4.6.1 we have that
{v1 , v2 , w3 , . . . ,wn } is basis.
We can then continue in the same way. At the i-th step we can assume that
{v1 , . . . , vi−1 , wi , . . . , wn } is a basis. We can write vi in the form:
vi = λ1 v1 + ⋅ ⋅ ⋅ + λi−1 vi−1 + λi wi + ⋅ ⋅ ⋅ + λn wn ,
where at least one of the λj with j ≥ i is not zero, otherwise we would have that
vi ∈ ⟨v1 , . . . , vi−1 ⟩, contradicting the hypothesis that the vectors v1 , . . . , vm are
linearly independent. It is not restrictive; suppose that it is λi ≠ 0, and then by
Lemma 4.6.1 we have that {v1 , . . . , vi , wi+1 ,. . . , wn } is a basis for V .
If m ≤ n, possibly rearranging the vectors wk appropriately, after m steps we
obtain that {v1 , . . . , vm , wm+1 , . . . , wn } is a basis for V , as we wanted.
If m > n, after n steps we get that {v1 , . . . , vn } is a basis of V , from which
it follows that vn+1 ∈ ⟨v1 , . . . , vn ⟩, but this contradicts the hypothesis that vectors
v1 , . . . , vm are linearly independent. So it must be m ≤ n, and this ends the proof.
CHAPTER 5
Linear Transformations
Linear transformations are functions between vector spaces that preserve their struc-
ture, i.e. they are compatible with the operations of sum of vectors and multiplication
of a vector by a scalar. As we will see, linear maps are represented very efficiently
using matrices. The purpose of this chapter is to introduce the concept of linear
transformation and understand how it is possible to uniquely associate a matrix to
n m
each linear transformation between R and R , once we fix the canonical bases in
both spaces. Then we will study the kernel, the image of a linear transformation and
the Rank Nullity Theorem, which is one of the most important results in the theory
of vector spaces of finite dimension.
On the other hand, there are other functions f ∶ R ⟶ R, that behave well
with respect to the vector space structure, i.e. they verify the equalities f (u + v) =
f (u) + f (v) and f (λv) = λf (v). Consider for example, the function f (x) = 3x. We
see immediately that f (x1 + x2 ) = 3(x1 + x2 ) = 3x1 + 3x2 = f (x1 ) + f (x2 ) and also
that f (λx1 ) = 3λx1 = λ(3x1 ) = λf (x1 ).
As we will see, those functions are linear transformations between vector spaces,
and they preserve the structure, i.e. the sum of vectors has as image, via the function,
the sum of the images of the vectors, and the image of the product of a vector by a
scalar is the product of the scalar and the image of the vector.
Before the formal definition of linear map, we give the definition of function and
image.
Definition 5.1.1 We define a function f between two sets A and B as a law which
associates to each element of A one and only one element of B and denote this law
as f ∶ A ⟶ B. The set A is called domain of the function, while the set B is
called codomain of the function. We define image of an element a ∈ A, the element
f (a) ∈ B. The set of images of all the elements of A is called image of f and is
denoted by Im(f ) or sometimes with f (A).
Not all laws that associate elements of a set to elements of another set are func-
tions. For example, we can define a law that goes from the set A of all human beings,
to the set B of all human beings (A and B can be the same set), that associates to
every person a brother. This is not a function because someone may have more than
one brother.
Another example: We consider the law that goes from the set of natural numbers,
to the set of natural numbers, that associates to every number one of its divisors.
Also this law is not a function.
Let us define linear transformations.
2. Let D ∶ R[x] ⟶ R[x] be defined by: D(p(x)) = p′(x), i.e. D is the function
that associates to each polynomial its derivative.
Given a matrix
     ⎛ a11  a12  ⋯  a1n ⎞
A =  ⎜ a21  a22  ⋯  a2n ⎟ ∈ Mm,n (R),
     ⎜  ⋮    ⋮   ⋱   ⋮  ⎟
     ⎝ am1  am2  ⋯  amn ⎠
we can associate the function LA ∶ Rⁿ → Rᵐ so defined:
LA ∶ Rⁿ → Rᵐ
⎛ x1 ⎞     ⎛ a11 x1 + a12 x2 + . . . + a1n xn ⎞
⎜ x2 ⎟  ↦  ⎜ a21 x1 + a22 x2 + . . . + a2n xn ⎟
⎜ ⋮  ⎟     ⎜                ⋮                ⎟
⎝ xn ⎠     ⎝ am1 x1 + am2 x2 + . . . + amn xn ⎠
                        ⎛ a11  a12  ⋯  a1n ⎞ ⎛ x1 ⎞       ⎛ x1 ⎞
                     =  ⎜ a21  a22  ⋯  a2n ⎟ ⎜ x2 ⎟  = A  ⎜ x2 ⎟
                        ⎜  ⋮    ⋮   ⋱   ⋮  ⎟ ⎜ ⋮  ⎟       ⎜ ⋮  ⎟
                        ⎝ am1  am2  ⋯  amn ⎠ ⎝ xn ⎠       ⎝ xn ⎠
where the product of A by the vector (x1 , . . . , xn ) is the product rows by columns
defined in Chapter 1.
In other words, for every column vector (x1 , . . . , xn ) of Rⁿ:
LA ⎛ x1 ⎞ = A ⎛ x1 ⎞ .
   ⎜ ⋮  ⎟     ⎜ ⋮  ⎟
   ⎝ xn ⎠     ⎝ xn ⎠
For example, given the matrix
A = ⎛  2 1 0 ⎞ ,
    ⎝ −1 1 3 ⎠
it follows that the linear transformation LA ∶ R³ → R² is defined by:
LA ⎛ x1 ⎞ = ⎛  2 1 0 ⎞ ⎛ x1 ⎞ = ⎛    2x1 + x2     ⎞
   ⎜ x2 ⎟   ⎝ −1 1 3 ⎠ ⎜ x2 ⎟   ⎝ −x1 + x2 + 3x3 ⎠
   ⎝ x3 ⎠              ⎝ x3 ⎠
          = x1 ⎛  2 ⎞ + x2 ⎛ 1 ⎞ + x3 ⎛ 0 ⎞ ;
               ⎝ −1 ⎠      ⎝ 1 ⎠      ⎝ 3 ⎠
in other words:
the images of the canonical basis vectors are the columns of the matrix A.
This fact will be crucial for the exercises, when we have to determine the image of a
linear transformation.
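This remark is easy to verify numerically; the following sketch (Python with NumPy, an assumption of ours) applies the matrix of the example to the canonical basis vectors of R³:

import numpy as np

A = np.array([[2, 1, 0],
              [-1, 1, 3]])
for i in range(3):
    e = np.zeros(3)
    e[i] = 1.0
    print(A @ e)   # the i-th column of A: [2 -1], [1 1], [0 3]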
We now wish to know how many possibilities we have for a linear transformation
F ∶ R ⟶ R such that F (1) = a ∈ R. We observe that, by property 1 of Definition
5.1.2 we have that F (x) = xF (1) = ax. So the only linear transformations from R to
R correspond to the straight lines passing through the origin (hence the name linear
transformation). This example is particularly instructive because it showed us that,
in order to fully understand a linear transformation from R to R, it is sufficient to
know only one value; we chose F (1), but the student can convince himself that the
value of F in any other point (as long as nonzero) would have determined F . This
Theorem 5.1.7 Let V and W be two vector spaces. If {v1 , . . . , vn } is a basis of V and
w1 , . . . , wn are arbitrary vectors of W , then there is a unique linear transformation
L ∶ V → W , such that L(v1 ) = w1 , . . . , L(vn ) = wn .
Proof. Let v ∈ V . Since {v1 , . . . , vn } is a basis, by Theorem 4.2.8 there is a unique
n-tuple of scalars (α1 , . . . , αn ) such that v = α1 v1 + ⋯ + αn vn . We define:
L(v) = α1 w1 + ⋅ ⋅ ⋅ + αn wn .
Let us verify that L is linear. If u = β1 v1 + ⋯ + βn vn is another vector of V , then
v + u = (α1 + β1 )v1 + ⋯ + (αn + βn )vn , so
L(v + u) = (α1 + β1 )w1 + ⋯ + (αn + βn )wn
         = α1 w1 + ⋯ + αn wn + β1 w1 + ⋯ + βn wn = L(v) + L(u).
Similarly, for λ ∈ R we have λv = λα1 v1 + ⋯ + λαn vn , so
L(λv) = λα1 w1 + ⋯ + λαn wn =
      = λ(α1 w1 + ⋯ + αn wn ) = λL(v).
So L is a linear transformation.
Now let us prove uniqueness. Suppose that G is a linear transformation G ∶ V →
W , such that G(v1 ) = w1 , . . . , G(vn ) = wn and L is the linear transformation defined
above. Then:
G(v) = G(α1 v1 + ⋯ + αn vn ) = α1 G(v1 ) + ⋅ ⋅ ⋅ + αn G(vn ) =
= α1 w1 + ⋅ ⋅ ⋅ + αn wn = L(v).
So G = L, as we wanted.
Corollary 5.1.8 Let V and W be two vector spaces. If two linear transformations,
T, S ∶ V → W coincide on a basis of V , then they coincide on the whole V .
R³ , i.e. e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). We want to determine F (x, y);
carrying out the computation one finds F (x, y) = (2y, x − y, x + y).
We want to express F (x1 , . . . , xn ), that is, we want to write the image of any vector
(x1 , . . . , xn ) ∈ Rⁿ.
We proceed exactly as in the example, the reasoning is the same, only more
complicated to write.
F (x1 , . . . , xn ) = F (x1 e1 + x2 e2 + ⋅ ⋅ ⋅ + xn en )
= x1 F (e1 ) + x2 F (e2 ) + ⋅ ⋅ ⋅ + xn F (en )
= x1 (a11 e1 + a21 e2 + ⋅ ⋅ ⋅ + am1 em ) + x2 (a12 e1 + a22 e2 + ⋅ ⋅ ⋅ + am2 em )
  + ⋅ ⋅ ⋅ + xn (a1n e1 + a2n e2 + ⋅ ⋅ ⋅ + amn em )
= (a11 x1 + a12 x2 + . . . + a1n xn )e1 +
  + (a21 x1 + a22 x2 + . . . + a2n xn )e2 + . . .
  + (am1 x1 + am2 x2 + . . . + amn xn )em
  ⎛ a11 x1 + a12 x2 + . . . + a1n xn ⎞
= ⎜ a21 x1 + a22 x2 + . . . + a2n xn ⎟ .
  ⎜                ⋮                ⎟
  ⎝ am1 x1 + am2 x2 + . . . + amn xn ⎠
Let us take a step further, noting that F (x1 , . . . , xn ) can also be written in a
more compact form, using the notation of multiplication of a matrix by a vector:
F ⎛ x1 ⎞ = A ⎛ x1 ⎞ ,    where    A = ⎛ a11 . . . a1n ⎞ .
  ⎜ ⋮  ⎟     ⎜ ⋮  ⎟                   ⎜  ⋮         ⋮  ⎟
  ⎝ xn ⎠     ⎝ xn ⎠                   ⎝ am1 . . . amn ⎠
In summary, a linear transformation F ∶ Rⁿ ⟶ Rᵐ can be described equivalently:
1. by the images F (e1 ), . . . , F (en ) of the canonical basis vectors;
2. by the formula
                      ⎛ a11 x1 + a12 x2 + . . . + a1n xn ⎞
F (x1 , . . . , xn ) = ⎜ a21 x1 + a22 x2 + . . . + a2n xn ⎟ ;
                      ⎜                ⋮                ⎟
                      ⎝ am1 x1 + am2 x2 + . . . + amn xn ⎠
3. in compact form, by F (x) = A x, where
x = ⎛ x1 ⎞    and    A = ⎛ a11 . . . a1n ⎞ .
    ⎜ ⋮  ⎟               ⎜  ⋮         ⋮  ⎟
    ⎝ xn ⎠               ⎝ am1 . . . amn ⎠
In other words, to each linear transformation F ∶ Rⁿ ⟶ Rᵐ (once we fix the canonical
bases) we can associate the matrix
A = ⎛ a11 . . . a1n ⎞
    ⎜  ⋮         ⋮  ⎟
    ⎝ am1 . . . amn ⎠
and vice versa.
Observation 5.2.4 We observe that, if the linear transformation F ∶ Rⁿ → Rᵐ is
associated with the matrix A, where we fix the canonical bases in the domain and
codomain, indicating as usual with {e1 , . . . , en } the canonical basis of Rⁿ , we have
that the i-th column of the matrix A (which we denote with Aⁱ ) is F (ei ).
Example 5.3.1 Consider the functions f, g ∶ R ⟶ R given by f (x) = x² − 1 and
g(x) = x + 2. We compute f ◦ g ∶ R → R and g ◦ f ∶ R → R:
f ◦ g ∶ x ↦ g(x) = x + 2 ↦ f (x + 2) = (x + 2)² − 1 = x² + 4x + 3,
g ◦ f ∶ y ↦ f (y) = y² − 1 ↦ g(y² − 1) = (y² − 1) + 2 = y² + 1.
In this case, f ◦ g ≠ g ◦ f .
Let us now see an important example in linear algebra.
Example 5.3.2 Consider the two linear transformations LA ∶ R³ ⟶ R² , LB ∶
R² ⟶ R² associated with the matrices:
A = ⎛ −1 1 2 ⎞ ,    B = ⎛ 2 1 ⎞ ,
    ⎝  3 1 0 ⎠          ⎝ 1 3 ⎠
with respect to the canonical bases in R² and R³. We see immediately that LB ◦ LA
is defined, while LA ◦ LB is not defined. This is because LA must have as argument
a vector in R³ , while for every v ∈ R² we have that LB (v) ∈ R² , so LA (LB (v)) does
not make sense.
On the other hand, LB ◦ LA is the linear transformation associated with the product
matrix BA: for every (x1 , . . . , xn ) in the domain,
(LB ◦ LA ) ⎛ x1 ⎞ = LB (LA (x)) = B(A x) = (BA) ⎛ x1 ⎞ = LBA ⎛ x1 ⎞ .
           ⎜ ⋮  ⎟                               ⎜ ⋮  ⎟       ⎜ ⋮  ⎟
           ⎝ xn ⎠                               ⎝ xn ⎠       ⎝ xn ⎠
In this check, we used the associativity of the multiplication rows by columns between
matrices.
which are the polynomials p(x) whose image is zero, i.e. such that D(p(x)) = 0. From
calculus, we know they are all the constant polynomials. So Ker (D) = {c ∣c ∈ R}.
Let us look at the image of D. We ask which polynomials are derivatives of other
polynomials. From calculus we know they are all the polynomials (we are in fact
integrating), so Im(D) = R[x].
2. Consider now the linear transformation L ∶ R³ ⟶ R² defined by: L(e1 ) = 2e1 − e2 ,
L(e2 ) = e1 , L(e3 ) = e1 + 2e2 , that is, L(x, y, z) = (2x + y + z, −x + 2z). Its kernel is
Ker (L) = {(x, y, z) ∣ 2x + y + z = 0, −x + 2z = 0}.
As for the image, we have
Im(L) = { w ∈ R² ∣ w = ⎛ 2x ⎞ + ⎛ y ⎞ + ⎛ z  ⎞ with x, y, z ∈ R }
                       ⎝ −x ⎠   ⎝ 0 ⎠   ⎝ 2z ⎠
      = { w ∈ R² ∣ w = x ⎛  2 ⎞ + y ⎛ 1 ⎞ + z ⎛ 1 ⎞ with x, y, z ∈ R }
                         ⎝ −1 ⎠     ⎝ 0 ⎠     ⎝ 2 ⎠
      = ⟨ ⎛  2 ⎞ , ⎛ 1 ⎞ , ⎛ 1 ⎞ ⟩ .
          ⎝ −1 ⎠   ⎝ 0 ⎠   ⎝ 2 ⎠
The fact that linear maps preserve both operations of a vector space makes both
the kernel and the image of a given linear transformation vector subspaces.
Proposition 5.4.3 Let L ∶ V ⟶ W be a linear transformation.
1. The kernel of L is a subspace of the domain V .
2. The image of L is a vector subspace of the codomain W .
Proof. (1) We note first that Ker (L) is not the empty set, because 0V ∈ Ker (L) by
Proposition 5.1.4. We have then to verify that Ker (L) is closed with respect to the
sum of vectors and the multiplication of a vector by a scalar. Let us start with the sum.
Let u, v ∈ Ker (L). Then L(u) = L(v) = 0W , so L(u+v) = L(u)+L(v) = 0W +0W =
0W , thus u + v ∈ Ker (L). Now we verify that Ker (L) is closed with respect to the product
by a scalar. If α ∈ R and u ∈ Ker (L) one has L(αu) = αL(u) = α0W = 0W , so
αu ∈ Ker (L).
(2) Let us now see the same two properties for Im(L). We have that 0W ∈ Im(L)
by Proposition 5.1.4. Let now w1 , w2 ∈ Im(L). So there exist v1 , v2 ∈ V , such that
L(v1 ) = w1 and L(v2 ) = w2 . Therefore, w1 + w2 = L(v1 ) + L(v2 ) = L(v1 + v2 ) ∈
Im(L) and αw1 = αL(v1 ) = L(αv1 ) ∈ Im(L) for every α ∈ R.
Definition Let
f ∶ A ⟶ B
be a function between two sets.
1. We say that f is injective if whenever f (x) = f (y) then x = y, i.e. two distinct
elements x and y can never have the same image.
2. We say that f is surjective if every element of B is the image of some element
of A, i.e. Im(f ) = B.
Let now L ∶ V ⟶ W be a linear transformation. Then:
1. L is injective if and only if Ker (L) = 0V , that is, its kernel is the zero subspace
of the domain V .
2. L is surjective if and only if Im(L) = W , i.e. the image of L coincides with the
codomain.
Proof. (1) We show that if L is injective then Ker (L) = 0V . If u ∈ Ker (L) then L(u) =
0W = L(0V ) and since L is injective, u = 0V .
Vice versa, let Ker (L) = 0V and suppose that L(u) = L(v) for some u, v ∈ V .
Then L(u − v) = L(u) − L(v) = 0W . So u − v ∈ Ker (L) = 0V , and we have that
u − v = 0V , then u = v, therefore L is injective.
(2)This is precisely the definition of surjectivity.
The next proposition tells us that the injective linear transformations preserve
linear independence.
Proposition Let L ∶ V ⟶ W be an injective linear transformation. If v1 , . . . , vn ∈ V
are linearly independent, then L(v1 ), . . . , L(vn ) are linearly independent in W .
We now come to the central result of this chapter.
Theorem (Rank Nullity Theorem) Let L ∶ V ⟶ W be a linear transformation, with
V finitely generated. Then
dim(V ) = dim(Ker (L)) + dim(Im(L)).        (5.1)
Proof. Let {u1 , . . . , ur } be a basis for the subspace Ker L. By Theorem 4.2.1, we can
complete it to a basis B of V . Let
B = {u1 , . . . , ur , wr+1 , . . . , wn } .
If we prove that B1 = {L(wr+1 ), . . . , L(wn )} is a basis for Im(L) then the theorem
is proved, as dim(Ker (L)) = r, dim(V ) = n and dim(Im(L)) = n − r (the dimension
of Im(L) is the number of vectors in a basis and B1 contains n − r vectors).
Certainly B1 is a system of generators for Im(L), by Proposition 5.4.4. Now we
show that the vectors in B1 are linearly independent. Let
αr+1 L(wr+1 ) + ⋯ + αn L(wn ) = 0.
By linearity L(αr+1 wr+1 + ⋯ + αn wn ) = 0, so αr+1 wr+1 + ⋯ + αn wn ∈ Ker (L), and
therefore it can be written as α1 u1 + ⋯ + αr ur for suitable scalars α1 , . . . , αr . Thus
αr+1 wr+1 + ⋯ + αn wn − (α1 u1 + ⋯ + αr ur ) = 0,
and being B a basis for V this implies that α1 = . . . = αn = 0, concluding the proof
of the theorem.
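The Rank Nullity Theorem is easy to observe numerically: for a matrix A representing L, the rank gives dim Im(L) and the number of free columns gives dim Ker(L). A small illustration (Python with NumPy; the matrix is an example of ours, not taken from the text):

import numpy as np

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])     # third row = first + second
rank = np.linalg.matrix_rank(A)          # dim Im(L)
print(rank, A.shape[1] - rank, A.shape[1])  # 2 + 2 = 4 = dim of the domain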
Formula 5.1 places restrictions on the type and the existence of linear maps be-
tween two given vector spaces.
Example 5.6.2 Consider the linear map L ∶ R2 [x] ⟶ R³ defined by: L(x²) =
(1, 0, 0), L(x) = (0, 1, 0), L(1) = (0, 0, 1). This linear transformation is invertible.
To show this, we can determine the kernel and see that it is the zero subspace and de-
termine the image and see that it is all of R³. We leave this as an exercise. Alternatively,
we can define the linear transformation T ∶ R³ ⟶ R2 [x], such that T (e1 ) = x² ,
T (e2 ) = x, T (e3 ) = 1 and verify that it is the inverse of L (the student may want to
do these verifications as an exercise). Therefore R2 [x] and R³ are isomorphic. Some-
how, it is as if they were the same space, as we created a one to one correspondence
that associates to a vector in R2 [x], one and only one vector in R³, and vice versa.
This correspondence also preserves the operations of sum of vectors and multiplica-
tion of a vector by a scalar. In fact, we had already noticed that, once we fix a basis in
R2 [x], each vector is written using three coordinates, just like a vector in R³. If we
fix the canonical basis {x², x, 1}, the linear map that associates to each polynomial
its vector of coordinates is precisely an isomorphism between R2 [x] and R³.
The next theorem is particularly important, since it tells us that not only R2 [x],
N
but any vector space of finite dimension is isomorphic to R for a certain N (which
of course depends on the vector space we consider). So the calculation methods we
N
have described to solve various problems in the vector space R s can be applied to
any vector space V using, instead of the N -tuples of real numbers, the coordinates
of the vectors of the vector space V with respect to a fixed basis.
Theorem 5.6.3 Two vector spaces V and W are isomorphic if and only if they have
the same dimension.
Since we know the dimensions of the vector spaces Mm,n (R) and Rd [x], we im-
mediately get the following corollary.
Example 5.7.1 Consider the linear map F ∶ R⁴ ⟶ R² defined by: F (e1 ) = −e2 ,
F (e2 ) = 3e1 − 4e2 , F (e3 ) = −e1 , F (e4 ) = 3e1 + e2 . We want to determine a basis
for the kernel of F . We write the matrix A associated with F with respect to the
canonical bases:
A = ⎛  0  3 −1 3 ⎞ .
    ⎝ −1 −4  0 1 ⎠
Therefore F (x1 , x2 , x3 , x4 ) = (3x2 − x3 + 3x4 , −x1 − 4x2 + x4 ), and Ker F is the set
of solutions of the homogeneous linear system:
⎧ 3x2 − x3 + 3x4 = 0
⎩ −x1 − 4x2 + x4 = 0,
Solving the system (x3 and x4 are the free variables) we get
Ker F = {(−(4/3)x3 + 5x4 , (1/3)x3 − x4 , x3 , x4 ) ∣ x3 , x4 ∈ R}
      = {(−(4/3)x3 , (1/3)x3 , x3 , 0) + (5x4 , −x4 , 0, x4 ) ∣ x3 , x4 ∈ R}
      = {x3 (−4/3, 1/3, 1, 0) + x4 (5, −1, 0, 1) ∣ x3 , x4 ∈ R}
      = ⟨(−4/3, 1/3, 1, 0) , (5, −1, 0, 1)⟩ .
We observe that the vectors (−4/3, 1/3, 1, 0) , (5, −1, 0, 1) not only generate Ker F , but
they are also linearly independent, because they are not one a multiple of the other,
so they are a basis of Ker F .
Another way to understand the above equalities is the following: Ker F is the set
of linear combinations of the vectors (−4/3, 1/3, 1, 0) , (5, −1, 0, 1), obtained by placing,
respectively, first x3 = 1, x4 = 0, then x3 = 0, x4 = 1.
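SymPy can return a basis of the kernel directly; the following sketch (Python with SymPy, our assumption) recovers exactly the two vectors found above:

import sympy as sp

A = sp.Matrix([[0, 3, -1, 3],
               [-1, -4, 0, 1]])
print(A.nullspace())
# [Matrix([[-4/3], [1/3], [1], [0]]), Matrix([[5], [-1], [0], [1]])]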
Definition 5.7.3 We call row rank of a matrix M ∈ Mm,n (R) the maximum number
of linearly independent rows of M , which is the dimension of the subspace of Rⁿ
generated by the rows of M . The row rank of M is denoted by rr(M ).
If A ∈ Mm,n (R) is a matrix in echelon form the definition of row rank given
above coincides with Definition 1.3.4. In fact, by Proposition 4.3.3 the nonzero rows
of a matrix A in row echelon form are linearly independent, so the dimension of the
subspace generated by the rows of A coincides with the number of nonzero rows of
A.
4.3.1, we have that rr(A) = rr(A′) = r, and in turn r is equal to the number of
non-zero rows of A′, that is the number of pivots of A′. Then, as we saw in Chapter
1, we can assign an arbitrary value to each of the n − r variables, and write the r
variables corresponding to the pivots in terms of these values. Let xi1 , . . . , xin−r be
the free variables and let wj be the solution of the system obtained by putting xij = 1
and the other free variables equal to zero. Proceeding exactly as in Example 5.7.1,
we obtain that the vectors w1 , . . . , wn−r generate W . We now show that they are also
linearly independent, from which it follows that they are a basis of W , and so W has
dimension n − r. Suppose that w = λ1 w1 + ⋅ ⋅ ⋅ + λn−r wn−r = 0 with λ1 , . . . , λn−r ∈ R.
We observe the element of place ij of wh is 1 if h = j, otherwise it is zero, thus the
element of place ij of w is exactly λj . The hypothesis that w = 0 means that all
elements of w are zero, in particular λ1 = ⋅ ⋅ ⋅ = λn−r = 0, and this shows that the
vectors w1 , . . . , wn−r are linearly independent.
We now want to proceed and calculate a basis for the image of a linear transfor-
mation.
Suppose we have a linear map F ∶ Rⁿ ⟶ Rᵐ , and we want to determine a basis
for the image. We endow Rⁿ and Rᵐ with the canonical bases; then, by Proposition
5.2.2, we have that F (x) = Ax for a suitable matrix A ∈ Mm,n (R). By Proposition
5.4.4 we have:
Im(F ) = ⟨F (e1 ), . . . , F (en )⟩ = ⟨A1 , . . . , An ⟩,
where A1 , . . . , An are the columns of A. At this point, we simply apply the Gaussian
algorithm to the vectors that form the columns of A. Recall that, to perform the
Gaussian algorithm, we must write the vectors as rows. Let us see an example.
Example 5.7.5 Let F ∶ R³ ⟶ R⁴ be defined by F (x, y, z) = (x, 2x, x + y + z, y).
The matrix associated with F with respect to the canonical bases is:
A = ⎛ 1 0 0 ⎞ .
    ⎜ 2 0 0 ⎟
    ⎜ 1 1 1 ⎟
    ⎝ 0 1 0 ⎠
The image of F is generated by the columns of A, i.e. ImF = ⟨(1, 2, 1, 0),
(0, 0, 1, 1), (0, 0, 1, 0)⟩. So we apply the Gaussian algorithm to the matrix
Aᵀ = ⎛ 1 2 1 0 ⎞ ,
     ⎜ 0 0 1 1 ⎟
     ⎝ 0 0 1 0 ⎠
where Aᵀ denotes the transpose of the matrix A, i.e. it is the matrix which has the
columns of the matrix A as rows. Reducing Aᵀ to row echelon form, we get
⎛ 1 2 1  0 ⎞
⎜ 0 0 1  1 ⎟ .
⎝ 0 0 0 −1 ⎠
We then see that a basis for the image of F is:
{(1, 2, 1, 0), (0, 0, 1, 1), (0, 0, 0, −1)}.
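The same computation can be delegated to a computer algebra system; the sketch below (Python with SymPy, an assumption of ours) row-reduces Aᵀ. Note that the reduced echelon form gives a different, but equally valid, basis of the image:

import sympy as sp

A = sp.Matrix([[1, 0, 0],
               [2, 0, 0],
               [1, 1, 1],
               [0, 1, 0]])
R, _ = A.T.rref()
basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]
print(basis)   # three nonzero rows, so dim Im(F) = 3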
5.8.1 Let Fk ∶ R⁴ → R³ be the linear transformation defined by: Fk (e1 ) = e1 + 2e2 +
ke3 , Fk (e2 ) = ke2 + ke3 , Fk (e3 ) = ke1 + ke2 + 6e3 , Fk (e4 ) = ke1 + (6 − k)e3 .
a) Determine for which values of k we have that Fk is injective and for which
values of k we have that Fk is surjective.
b) Having chosen a value of k for which Fk is not surjective, determine a vector
v ∈ R³ such that v ∉ ImFk .
Solution. By Proposition 5.5.2 Fk is never injective. We now study surjectivity. The
matrix associated with Fk with respect to the canonical bases in the domain and in
the codomain is:
⎛1 0 k k ⎞
A=⎜ ⎜ 2 k k 0 ⎟ ⎟
⎝k k 6 6 − k ⎠
(see Observation 5.2.4). The image of Fk is the subspace generated by the columns
T
of A. We write these columns as rows (i.e. we consider the matrix A , the transpose
of A), and we perform the Gaussian algorithm. We obtain:
A^T = ⎛1 2 k ⎞
      ⎜0 k k ⎟
      ⎜k k 6 ⎟
      ⎝k 0 6 − k⎠
→ ⎛1 2 k ⎞
  ⎜0 k k ⎟
  ⎜0 0 k^2 − k − 6⎟
  ⎝0 0 0 ⎠
If k ≠ 0, k ≠ −2 and k ≠ 3, this matrix has three nonzero rows, so ImFk has dimension
3 and Fk is surjective.
If k = 0, after the exchange of the second with the third row we get:
⎛1 2 0 ⎞
⎜0 0 −6⎟
⎜0 0 0 ⎟
⎝0 0 0 ⎠
5.8.2 Determine, if possible, a linear transformation G ∶ R^4 → R^3 such that ImG = ⟨(1, 1, 0), (0, 3, −1), (3, 0, 1)⟩ and Ker G = ⟨(−1, 0, 1, −3)⟩.
Solution. Let us see, first of all, if the requests made are compatible with the
Rank Nullity Theorem. We first want to find a basis of ⟨(1, 1, 0), (0, 3, −1), (3, 0, 1)⟩.
To do this, we reduce the matrix to row echelon form:
⎛1 1 0 ⎞
⎜0 3 −1⎟
⎜ ⎟
⎝3 0 1 ⎠
and we get
⎛1 1 0 ⎞
⎜
⎜0 3 −1⎟
⎟.
⎝0 0 0 ⎠
So we would have that ImG has dimension 2 and Ker G has dimension 1, but dim R^4 = 4 ≠ 1 + 2 = dim(Ker G) + dim(ImG), consequently a linear transformation with the
required properties cannot exist.
5.8.3 Determine, if possible, a linear transformation F ∶ R^3 → R^3 such that Ker F = ⟨e1 − 2e3⟩ and ImF = ⟨2e1 + e2 − e3, e1 − e2 − e3⟩. Is this transformation unique?
Solution. We observe that the requests are compatible with the Rank Nullity The-
orem. In fact, ⟨e1 − 2e3 ⟩ has a dimension of 1, ⟨2e1 + e2 − e3 , e1 − e2 − e3 ⟩
has dimension 2 (because the two vectors are not one multiple of the other) and
dim R^3 = 3 = 1 + 2 = dim(Ker F) + dim(ImF).
Let us now try to determine the matrix A associated with such F with respect
to the canonical basis (in the domain and in the codomain). It must happen that
F (e1 − 2e3 ) = 0, i.e. F (e1 ) − 2F (e3 ) = 0 (because F is linear), so F (e1 ) = 2F (e3 ).
Since F (e1 ) is represented by the first column of A, and F (e3 ) by third column, we
have that the first column of A must be twice the third. Furthermore, the subspace
generated by the columns of A, that is ImF must be equal to ⟨(2, 1, −1), (1, −1, −1)⟩.
For example, the matrix:
⎛2 2 1⎞
A = ⎜−2 1 −1⎟
⎜ ⎟
⎝−2 −1 −1⎠
meets all of the requirements. In fact by construction e1 − 2e3 ∈ Ker F and ImF =
⟨2e1 + e2 − e3 , e1 − e2 − e3 ⟩; also from the Rank Nullity Theorem, we get that
dim(Ker F) = dim R^3 − dim(ImF) = 3 − 2 = 1, thus Ker F = ⟨e1 − 2e3⟩.
This F is not unique; in fact, it can be verified that for example also the matrix
⎛6 2 3⎞
A=⎜
⎜−6 1 −3⎟
⎟
⎝−6 −1 −3⎠
meets all the given requirements.
5.8.4 Let Gk ∶ R^2 → R^3 be the linear transformation defined by:
Gk(x, y) = (kx + 5y, 2x + (k + 3)y, (2k − 2)x + (7 − k)y). Determine for which values of k we have that Gk is injective.
Solution. To determine if Gk is injective we need to understand if the kernel of Gk
contains only the zero vector or not. By Corollary 5.2.3 the matrix associated with
Gk with respect to the canonical bases in the domain and in the codomain is:
⎛ k 5 ⎞
A=⎜
⎜ 2 k + 3⎟⎟.
⎝2k − 2 7 − k ⎠
To find the kernel of Gk we need to solve the homogeneous linear system associated
with A. Reducing the matrix (A∣0) to row echelon form, we get:
(A′∣0) = ⎛1 (k + 3)/2 0⎞
         ⎜0 −k^2 − 3k + 10 0⎟
         ⎝0 0 0⎠
If k ≠ −5 and k ≠ 2, then rr(A′∣0) = rr(A′) = 2. Since the unknowns are 2, the system has only one solution, the null one, therefore Ker Gk = {(0, 0)} and Gk is injective. If k = −5 or k = 2, we easily see that rr(A′∣0) = rr(A′) = 1, then the system admits infinitely many solutions that depend on one parameter, hence Gk is not injective.
Alternatively, by the Rank Nullity Theorem, Gk is injective if and only if the image of Gk has dimension 2. We then calculate a basis for the image of Gk.
We must consider the matrix
(k 2 2k − 2)
(5 k + 3 7 − k)
and reduce it to row echelon form. If k ≠ −5 and k ≠ 2, there are two nonzero rows, so the image of Gk has dimension
2 and Gk is injective. If k = −5 or k = 2 there is only one nonzero row, so the image
of Gk has dimension 1 and Gk is not injective. We have thus found once again the
result obtained with the previous method.
5.9.3 Given linear transformations F ∶ R^3 → R^2 and G ∶ R^2 → R^3 defined by: F(x, y, z) = (x − y, 2x + y + z) and G(x, y) = (3y, −x, 4x + 2y), determine, if possible, F ◦ G and G ◦ F.
5.9.4 Given the linear transformations F ∶ R^2 → R^2 and G ∶ R^2 → R defined by: F(e1) = −e1 − e2, F(e2) = e1 + e2, G(e1) = 2, G(e2) = −1; determine, if possible, F ◦ G and G ◦ F.
5.9.5 Consider the linear transformation F ∶ R^2 ⟶ R^2 defined by F(e1) = 3e1 − 3e2, F(e2) = 2e1 − 2e2. Compute a basis of the kernel and a basis for the image of F.
5.9.6 Establish which of the following linear transformations are isomorphisms:
i) F ∶ R^3 ⟶ R^3 defined by F(x, y, z) = (x + 2z, y + z, z);
iii) F ∶ R^3 → R^3 defined by F(e1) = 2e1 + e2, F(e2) = 3e1 − e3, F(e3) = e1 − e2 − e3.
5.9.7 Find a basis for the kernel and one for the image of each of the following linear transformations. Establish if they are injective, surjective and/or bijective.
i) F ∶ R^3 ⟶ R^3 defined by F(x, y, z) = (x − z, x + 2y − z, x − 4y − z).
iv) F ∶ R^3 ⟶ R^3 associated with A, with respect to the canonical basis, where
A = ⎛−1 1 −1⎞
    ⎜ 0 0 1 ⎟
    ⎝ 1 0 1 ⎠
x1 − x3 − x4 x1 + 2x3
F (x1 , x2 , x3 , x4 ) = ( ),
x1 + x4 0
F (e2 ) = 2e1 + 2e2 + 2e3 , F (e3 ) = e1 + e2 + e3 ; find a basis for Ker F and establish
if the vector e1 − e2 + e3 belongs to ImF .
5.9.10 Determine, if possible, a surjective linear transformation T ∶ R^3 → R^2 and an injective linear transformation F ∶ R^4 → R^3.
5.9.11 Are there injective transformations T ∶ R^3 → R^4? If yes, determine one; if not, give reasons for the answer.
5.9.12 Let Tk ∶ R^2 ⟶ R^2 be the linear transformation associated with the matrix A with respect to the canonical basis, where
A = (1 4)
    (k 0)
⎛1 2 0⎞
A=⎜
⎜1 0 1⎟
⎟.
⎝2 k 3⎠
2z, y, 2x + 3y + 4z, 3x − y + 6z). Find a basis for Ker (T ) and a basis for Im(T ) and
their dimensions. Is T injective?
5.9.15 Determine, if possible, a linear transformation F ∶ R^3 → R^2 such that Ker F = ⟨e1⟩ and ImF = ⟨e1 − e2⟩.
5.9.16 Determine, if possible, a linear transformation F ∶ R^2 → R^3 such that Ker F = ⟨e2⟩ and ImF = ⟨e1 − e2 + 2e3⟩.
5.9.17 Determine, if possible, a linear transformation F ∶ R^2 → R^4 such that ImF has dimension 1.
5.9.18 Determine, if possible, a linear transformation F ∶ R^3 → R^2 such that (1, 1) ∉ ImF.
5.9.19 Let F ∶ R^3 → R^2 be the linear transformation defined by: F(e1) = 2e1 + ke2, F(e2) = ke1 + 2e2, F(e3) = −2e1 − ke2. Determine for which values of k we have that F is not surjective. For k = −1, determine a vector v1 that belongs to Ker F and a vector v2 which does not belong to Ker F.
4e2 + ke3 , Fk (e2 ) = −3e1 + ke2 + 3e3 , determine for which values of k we have that
Fk is injective and for which values of k we have that Fk is surjective.
CHAPTER 6
Linear Systems
In this chapter, we want to revisit the theory of linear systems and interpret the
results already discussed in Chapter 1 in terms of linear transformations, using the
knowledge we have gained in Chapter 5. We will use the notation and terminology
introduced in Chapter 1.
6.1 PREIMAGE
The inverse image or preimage of a vector w ∈ W under a linear map f ∶ V ⟶ W consists of all the vectors in the vector space V whose image is w. It is a basic concept in mathematics; for us, it will be a useful tool for expressing the solutions of a linear system.
We already know an example of inverse image, namely the kernel of a linear transformation F. In fact, Ker(F) is the inverse image of the zero vector of W, i.e. it consists of all vectors whose image under F is 0_W.
Let us look at the definition.
Notice that the notation F^{−1}(w), introduced in the previous definition, does not have anything to do with the invertibility of the function. When we speak of the inverse image of a vector under a function F, we are not saying that F is an invertible function: the notation F^{−1}(w) simply indicates a subset of the domain.
Example 6.1.2 Let F ∶ R^2 → R^3 be the linear transformation defined by:
F(x, y) = (x + y, x + y, x).
What is the inverse image of the vector (1, 1, 3) under F? By definition we have:
F^{−1}(1, 1, 3) = {(x, y) ∈ R^2 ∣ x + y = 1, x + y = 1, x = 3} = {(3, −2)}.
This means that F(3, −2) = (1, 1, 3), and there are no other elements of R^2 whose image is (1, 1, 3).
Similarly, the inverse image of the vector (1, 0, 0) is:
F^{−1}(1, 0, 0) = {(x, y) ∈ R^2 ∣ x + y = 1, x + y = 0, x = 0} = ∅.
The vector (1, 0, 0) is not the image of any vector of R^2, i.e. (1, 0, 0) ∉ ImF.
These examples show that calculating the preimage of a vector under a linear
transformation is equivalent to solving a linear system. We will now deepen our
understanding on this point.
Proposition 6.1.4 Let F ∶ V ⟶ W be a linear transformation, let w ∈ ImF and let v be an element of V such that F(v) = w. Then:
F^{−1}(w) = {v + z ∣ z ∈ Ker F}. (6.1)
Proof. Let v′ be an element of F^{−1}(w). Then
F(v′) = F(v) = w,
so
F(v′ − v) = 0_W,
i.e.:
v′ − v ∈ Ker F.
So any element v′ of F^{−1}(w) is written as v′ = v + (v′ − v) = v + z, with z ∈ Ker F. Conversely, if z ∈ Ker F, then
F(v + z) = F(v) + F(z) = w + 0_W = w.
This gives the other inclusion and we have shown the result.
Definition 6.2.1 We call column rank of a matrix A ∈ Mm,n(R), the maximum number of linearly independent columns of A, i.e. the dimension of the subspace of R^m generated by the columns of A.
The following observation is already known and yet, given its importance in the context that we are studying, we want to re-examine it.
Observation 6.2.2 If we write A as the matrix associated with the linear transformation LA ∶ R^n ⟶ R^m with respect to the canonical bases, then the column rank of A is the dimension of the image of LA. Indeed, the image is generated by the columns of the matrix A.
Although in general the row vectors and column vectors of a matrix A ∈ Mm,n (R)
are elements of different vector spaces, the row and column rank of A always coincide.
This number is simply called rank of A, denoted by rk(A).
Proposition 6.2.3 If A ∈ Mm,n (R), then the row rank of A is equal to the column
rank of A.
Proof. Let LA ∶ R^n ⟶ R^m be the linear transformation associated with the matrix A with respect to the canonical bases. The kernel of LA is the set of solutions of the homogeneous linear system associated with the matrix A and, by Proposition 5.7.4, has dimension n − rr(A), where rr(A) is the row rank of A. By the Rank Nullity Theorem 5.5.1, we also know that the dimension of Ker LA is equal to n − dim(ImLA). It follows that rr(A) = dim(ImLA), i.e. rr(A) is equal to the column rank of A.
Example 6.2.4 Let
A = (1 2 0 −1)
    (1 2 0 −1)
The row rank of A is 1. Also the column rank is 1, since the column vectors (1, 1), (2, 2), (0, 0), (−1, −1) of R^2 are all multiples of (1, 1). So rk(A) = 1.
Observation 6.2.5 If A ∈ Mm,n(R) is a matrix in row echelon form, the definition of rank coincides with Definition 1.3.4. In fact, by Proposition 4.3.3 the nonzero rows of a matrix A in row echelon form are linearly independent, hence the dimension of the subspace generated by the rows of A coincides with the number of nonzero rows of A. Computing the rank of a matrix in row echelon form is therefore immediate, while calculating the rank of a generic matrix requires more time.
Proposition 4.3.1 provides an effective method for calculating the rank of a matrix
as the elementary operations on the rows of a matrix preserve the rank, since the
subspace generated by the rows remains unchanged. Hence, to compute the rank of
a matrix, A, we reduce A in row echelon form with the Gaussian algorithm and then
we compute the rank of the reduced matrix, which simply amounts to counting the
number of nonzero rows.
⎛ 0 1 3 ⎞
⎜ 1 −1 5 ⎟
A=⎜ ⎟.
⎝ −1 1 0 ⎠
We have:
⎛ 1 −1 5 ⎞
rk(A) = rk ⎜
⎜ 0 1 3 ⎟⎟ = 3.
⎝ 0 0 5 ⎠
If the linear system Ax = b admits solutions, then its set of solutions is
S = {v + z ∣ z ∈ Ker A},
where v is a particular solution of the system and Ker A is the set of solutions of the associated homogeneous linear system Ax = 0.
Proof. This proposition is basically a rewriting of Proposition 6.1.4 with F = LA. In fact, if we view the matrix A as the matrix associated to the linear transformation LA ∶ R^n → R^m with respect to the canonical bases, then determining the solutions of the linear system Ax = b is the same as determining the vectors x ∈ R^n such that LA(x) = b. In other words, it is the same as determining the preimage LA^{−1}(b) of the vector b ∈ R^m. Since by hypothesis the system admits solutions, b belongs to the image of LA, i.e. LA(v) = Av = b for a suitable vector v of R^n. The set S of solutions of the system is then the preimage LA^{−1}(b), which by Proposition 6.1.4 is {v + z ∣ z ∈ Ker LA}.
The theorem that follows is the most important result in the theory of linear systems.
Theorem 6.2.9 (Rouché-Capelli Theorem). A linear system Ax = b of m equa-
tions in n unknowns admits solutions if and only if rk(A) = rk(A∣b). If this condition
is satisfied, then the system has:
1. exactly one solution if and only if rk(A) = rk(A∣b) = n;
2. infinitely many solutions if and only if rk(A) = rk(A∣b) < n. In this case, the
solutions of the system depend on n − rk(A) parameters.
Proof. (1). We view the matrix A as the matrix associated with the linear transformation LA ∶ R^n → R^m with respect to the canonical bases. The solutions of the linear system Ax = b correspond to the vectors x ∈ R^n such that LA(x) = b. Hence, we must determine when the preimage LA^{−1}(b) is not empty. This preimage is not empty if and only if b ∈ ImLA. In other words, the system has solutions if and only if b ∈ ImLA. As ImLA is generated by the column vectors of the matrix A, b ∈ ImLA if and only if the subspace generated by the column vectors of A coincides with the subspace generated by the column vectors of A and the column vector b, that is, if and only if rk(A) = rk(A∣b).
2. dim(Ker A) > 0, thus Ker A contains infinitely many elements, being a real vector subspace of R^n.
2. We use the Gaussian algorithm to reduce (A∣b) to a row echelon matrix in the form (A′∣b′);
3. The starting linear system Ax = b is equivalent to the row echelon linear system A′x = b′;
In this case, using subsequent substitutions, we obtain all the solutions of the
system.
The Rouché-Capelli theorem gathers all the information on linear systems that we have already seen in Chapter 1. In particular it states that:
A linear system with real coefficients which admits solutions has either one solution
or infinitely many.
This is exactly the situation that we described in Chapter 1, reinterpreted in terms of
linear transformations. In essence, a compatible linear system is a set of compatible
conditions that are assigned on n real variables. These conditions then lower the
number of degrees of freedom of the system: if the system rank is k, the set of
the solutions no longer depends on n free variables, but on n − k. What counts is
the rank of the system and not the number of equations, because the rank of the
system quantifies independent conditions and eliminates those conditions that can
be deduced from the others and so are redundant.
Here are some examples to illustrate the results shown above.
Example 6.2.11 Consider the linear transformation LA ∶ R^3 → R^3 whose matrix with respect to the canonical basis (both in the domain and in the codomain) is:
A = ⎛1 0 2⎞
    ⎜2 1 1⎟
    ⎝3 1 3⎠
We want to compute the inverse image of the vector b = (3, 0, 3), i.e. to find the vectors (x, y, z) such that
A ⎛x⎞ = ⎛3⎞
  ⎜y⎟   ⎜0⎟
  ⎝z⎠   ⎝3⎠
that is:
⎛ x + 2z      ⎞   ⎛3⎞
⎜ 2x + y + z  ⎟ = ⎜0⎟
⎝ 3x + y + 3z ⎠   ⎝3⎠
In other words, computing the inverse image of b by LA means solving the linear
system
⎧
⎪ x + 2z = 3
⎪
⎪
⎨
⎪ 2x + y + z = 0
⎪
⎪
⎩ 3x + y + 3z = 3.
To calculate the rank of matrix (A∣b) and compare it with the rank of A, we reduce
the matrix (A∣b) and, simultaneously, the matrix A in row echelon form using the
Gaussian algorithm and then we calculate the rank of the reduced matrices. We have:
(A∣b) = ⎛1 0 2 3⎞
        ⎜2 1 1 0⎟
        ⎝3 1 3 3⎠
→ ⎛1 0 2 3 ⎞
  ⎜0 1 −3 −6⎟
  ⎝0 1 −3 −6⎠
→ ⎛1 0 2 3 ⎞
  ⎜0 1 −3 −6⎟
  ⎝0 0 0 0 ⎠
So rk(A) = rk(A∣b) = 2. This means that the vector b belongs to the image of LA
and dim(Ker LA ) = 3 − 2 = 1. The inverse image of b is given by the elements: v + z,
where v is a particular solution of the system Ax = b and z ∈ Ker A, the kernel of
the matrix A, i.e. the set of solutions of the homogeneous linear system Ax = 0. To
compute v, we observe that the starting system is equivalent to the system:
x + 2z = 3
{
y − 3z = −6.
Setting z = 0, we obtain the particular solution v = (3, −6, 0). To find Ker A we solve the associated homogeneous system:
x + 2z = 0
y − 3z = 0;
we can solve from the bottom with subsequent substitutions: y = 3z, x = −2z. The inverse image of b is thus S = {(3, −6, 0) + (−2z, 3z, z) ∣ z ∈ R}.
⎛ 1 −1 3 0 1 2 ⎞
⎜
(A∣b) = ⎜ 2 1 8 −4 2 3 ⎟
⎟,
⎝ 1 2 5 −3 4 1 ⎠
(A′∣b′) = ⎛1 −1 3 0 1 2 ⎞
          ⎜0 3 2 −4 0 −1⎟
          ⎝0 0 0 1 3 0 ⎠
We have rk(A′) = rk(A′∣b′) = 3 so the system is solvable, and the solutions depend on 5 − 3 = 2 parameters.
The pivots are on the first, second and fourth column of A′, so we can obtain the unknowns x1, x2 and x4 in terms of x3 and x5.
The system associated with the row echelon form matrix (A′∣b′) is:
⎧
⎪ x1 − x2 + 3x3 + x5 = 2
⎪
⎪
⎨
⎪ 3x2 + 2x3 − 4x4 = −1
⎪
⎪
⎩ x4 + 3x5 = 0.
Solving by back substitution, we obtain the set of solutions:
{(5/3, −1/3, 0, 0, 0) + z ∣ z ∈ ⟨(−11/3, −2/3, 1, 0, 0), (−5, −4, 0, −3, 1)⟩}.
6.3.2 Determine the solutions of the following linear system in the unknown x1 , x2 ,
x3 :
⎧
⎪ x1 − x2 + x3 = 2
⎪
⎪
⎨
⎪ 2x1 − x2 + 3x3 = −1
⎪
⎪
⎩ x1 + 2x3 = 1.
Solution. The complete matrix associated with the system is:
⎛ 1 −1 1 2 ⎞
(A∣b) = ⎜
⎜ 2 −1 3 −1 ⎟
⎟,
⎝ 1 0 2 1 ⎠
Reducing it to row echelon form we obtain:
(A′∣b′) = ⎛1 −1 1 2 ⎞
          ⎜0 1 1 −5⎟
          ⎝0 0 0 4 ⎠
Thus, we have rk(A′) = 2 ≠ rk(A′∣b′) = 3. Therefore, the system does not admit solutions by the Rouché-Capelli Theorem. We note that the linear system associated with the matrix (A′∣b′) is
⎧
⎪ x1 − x2 + x3 = 2
⎪
⎪
⎨
⎪ x2 + x3 = −5
⎪
⎪
⎩ 0 = 4,
which is clearly not compatible.
6.3.3 Let S = {(1, 2, 1) + z ∣ z ∈ ⟨(1, 1, 1)⟩}. Determine, if possible:
(a) a linear system having S as set of solutions;
(b) a linear system of 3 equations having S as set of solutions;
(c) a linear system of rank 1 having S as set of solutions.
Solution. Since S is a subset of R^3, each linear system having S as set of solutions is a linear system in 3 unknowns. We indicate these unknowns with x, y, z. The set of solutions of a linear system of the form Ax = b is S = {v + z ∣ z ∈ Ker A}, where v is a particular solution of the system. So if Ax = b is a linear system having S as set of solutions, (1, 2, 1) is a solution of the system and Ker A = ⟨(1, 1, 1)⟩. In particular,
dim(ker A) = 1 = 3 − rk(A). So the system we seek has rank 2 and therefore must
necessarily consist of at least 2 equations. This immediately allows us to answer the
question (c): there is no linear system of rank 1 having S as a set of solutions.
To determine a linear system having S as a set of solutions and then answer question (a), we could write a generic linear system consisting of two equations and impose the required conditions on its coefficients. This method, which certainly works, is however not the most effective. We then choose a smarter approach.
What we want to do is to describe by equations the set of elements (x, y, z) ∈ S, that is the set of elements (x, y, z) of R^3 such that:
(x, y, z) = (1, 2, 1) + z, with z ∈ ⟨(1, 1, 1)⟩,
or, equivalently,
(x, y, z) − (1, 2, 1) = z, with z ∈ ⟨(1, 1, 1)⟩.
Note that the vector (x, y, z) − (1, 2, 1) = (x − 1, y − 2, z − 1) belongs to the subspace
⟨(1, 1, 1)⟩ if and only if it is a multiple of (1, 1, 1), i.e. if and only if
1 1 1
rk ( ) = 1.
x−1 y−2 z−1
1 1 1 1 1 1
rk ( ) = rk ( ).
x−1 y−2 z−1 0 y−1−x z−x
The rank of this matrix is equal to 1 if and only if the second row of the matrix is
null, that is if and only if
−x + y − 1 = 0
{
−x + z = 0.
We have therefore found a linear system having S as set of solutions. Naturally, every
system equivalent to the one found has S as a set of solutions. In particular, to answer
the question (b) we will have to determine a linear system of 3 equations equivalent
to the one just written. Just add an equation that is a linear combination of the two
equations found. For example:
⎧
⎪ −x + y − 1 = 0
⎪
⎪
⎨
⎪ −x + z = 0
⎪
⎪
⎩ −2x + y + z − 1 = 0.
Given the linear transformation Tk ∶ R^4 → R^3 defined by:
Tk(x1, x2, x3, x4) = (x1 − 5x2 + kx3 − kx4, x1 + kx2 + kx3 + 5x4, 2x1 − 10x2 + (k + 1)x3 − 3kx4),
determine for which values of k the vector wk = (1, k, −1) belongs to Im(Tk ). Set
k = 0 and determine the preimage of w0 under T0 .
6.4.3 Given the linear transformation T ∶ R^3 → R^3 associated with the matrix
⎛3k 3 k + 2⎞
A=⎜
⎜ 1 k k ⎟⎟,
⎝1 2 2 ⎠
6.4.4 Let S = {(1, 2, 0, 3) + z ∣ z ∈ ⟨(1, −1, 2, 1), (1, 5, −2, 5)⟩}. Determine if S is a vector subspace of R^4 and determine, if possible, a homogeneous linear system having S as set of solutions.
6.4.5 Construct, if possible, a linear transformation F ∶ R^3 → R^3 such that F^{−1}(1, 0, 0) = {(1, 0, 0) + v ∣ v ∈ ⟨(1, 1, 1), (0, 1, −1)⟩}. Establish whether such a transformation is unique.
1. a linear equation having S = (2, 1, 0, 1) + ⟨(2, 1, 2, 2), (1, −1, 2, 1)⟩ as a set of
solutions;
CHAPTER 7
Determinant and Inverse
In this chapter, we introduce two basic concepts: the determinant and the inverse of a square matrix. The importance of these two concepts will be summarized by Theorem 7.6.1, which contains essentially all we learnt about the linear maps from R^n to R^n.
⎛1 0 0⎞
I=⎜
⎜0 1 0⎟
⎟.
⎝0 0 1⎠
A = (2 0)
    (0 3),
then
det(A) = 2 det (1 0) = 2 ⋅ 3 det (1 0) = 2 ⋅ 3 det(I) = 6.
               (0 3)             (0 1)
We will see later how to exploit these properties in a suitable manner to obtain
the determinant of any matrix.
For the moment, we have defined the determinant as a function that has some
properties, however this does not guarantee that such a function exists or, if it exists,
that it is unique. The next proposition, which we do not prove, establishes such facts.
(a) If B is obtained from A by exchanging two rows, then:
det(A) = − det(B).
(b) If B is obtained from A by adding to a row any linear combination of the other
rows, then:
det(A) = det(B).
(c) If A is an upper (or lower) triangular matrix, that is, the coefficients below
(respectively above) the main diagonal are all equal to zero, then the determinant
of A is the product of the elements that are located on its main diagonal.
Proof. (a) Consider the matrix (R1 + R2, R1 + R2, R3, . . . , Rn), which has two equal rows; by property (3) of Definition 7.1.2 its determinant is zero, while by property (1):
0 = det(R1 + R2, R1 + R2, R3, . . . , Rn) =
= det(R1 + R2, R1, R3, . . . , Rn) + det(R1 + R2, R2, R3, . . . , Rn) =
= det(R1, R1, R3, . . . , Rn) + det(R1, R2, R3, . . . , Rn) + det(R2, R1, R3, . . . , Rn) + det(R2, R2, R3, . . . , Rn) =
= det(R1, R2, R3, . . . , Rn) + det(R2, R1, R3, . . . , Rn),
where at the last step we used again property (3). Then
det(R1 , R2 , R3 , . . . , Rn ) + det(R2 , R1 , R3 , . . . , Rn ) = 0,
from which (a) follows, relatively to the first two rows. It is clear that this can be
repeated, in an identical way, for two generic rows.
(b) Let A = (R1, R2, . . . , Rn) and B = (R1 + λ2 R2 + ⋅ ⋅ ⋅ + λn Rn, R2, . . . , Rn). Then, by properties (1) and (2) of Definition 7.1.2, we have:
det(B) = det(R1, R2, . . . , Rn) + λ2 det(R2, R2, . . . , Rn) + ⋅ ⋅ ⋅ + λn det(Rn, R2, . . . , Rn) = det(R1, R2, . . . , Rn) = det(A),
since every summand other than the first is the determinant of a matrix with two equal rows, hence zero.
(c) Consider first a diagonal matrix with diagonal coefficients d1, . . . , dn. By property (2) of Definition 7.1.2, applied to each row, and since det(I) = 1, we have:
det ⎛d1 0 . . . 0⎞
    ⎜0 d2 . . . 0⎟
    ⎜⋮        ⋮⎟
    ⎝0 . . . 0 dn⎠ = d1 d2 ⋯ dn.
A lower triangular matrix with nonzero diagonal coefficients can be reduced to this diagonal form by elementary operations of type (b), which do not change the determinant.
In the case when one or more coefficients on the diagonal are equal to zero, we cannot
obtain a diagonal matrix, however, it is easy to see, by applying the Gaussian algo-
rithm, that we obtain a matrix in which a row consists of all zeros and consequently
the determinant is zero. We leave to the reader the details of this case. The reasoning
for an upper triangular matrix is similar.
Given a square matrix A, using elementary row operations, we can always reduce
A to a triangular form and then calculate the determinant using the properties seen
above. One must pay attention to the fact that the elementary row operations may
change the determinant: if we exchange two rows, we must remember that the deter-
minant changes sign; if we multiply a row by a scalar, the determinant is multiplied
by the same scalar; while finally, if we add to a row a linear combination of the others,
the determinant does not change.
We see an explicit example, although the method that we describe is not the most
efficient for the calculation of the determinant in general.
A = ⎛0 2 4 6⎞
    ⎜1 1 2 1⎟
    ⎜1 1 2 0⎟
    ⎝1 1 1 2⎠
We want to bring this matrix into triangular form, using the Gaussian algorithm, but
we must take into account all the exchanges and all the multiplications by a scalar
we make.
We exchange the first row with the second, in this case by Proposition 7.1.4 (a)
the determinant changes sign and the matrix becomes:
⎛1 1 2 1⎞
⎜0 2 4 6⎟
⎜1 1 2 0⎟
⎝1 1 1 2⎠
Now we perform the following elementary operations, so as to set to zero all the coefficients in the first column:
3rd row → 3rd row − 1st row,
4th row → 4th row − 1st row.
In this way, the determinant does not change, by Proposition 7.1.4 (b), and we obtain:
⎛1 1 2 1 ⎞
⎜0 2 4 6 ⎟
⎜0 0 0 −1⎟
⎝0 0 −1 1 ⎠
Then, we exchange the third with the fourth row; by Proposition 7.1.4 (a) the determinant changes sign and the matrix becomes:
⎛1 1 2 1 ⎞
⎜0 2 4 6 ⎟
⎜0 0 −1 1 ⎟
⎝0 0 0 −1⎠
Now we can use property (c) of the previous proposition, which tells us that the
determinant of a triangular matrix is the product of the coefficients on the diagonal.
det ⎛1 1 2 1 ⎞
    ⎜0 2 4 6 ⎟ = 1 ⋅ 2 ⋅ (−1) ⋅ (−1) = 2.
    ⎜0 0 −1 1 ⎟
    ⎝0 0 0 −1⎠
To get the correct result, we must then multiply the determinant just obtained by (−1) as many times as the row swaps we performed (in this case two), so:
det(A) = 2 ⋅ (−1) ⋅ (−1) = 2.
In this way, by Proposition 7.1.4 (b), the determinant does not change and we obtain the triangular matrix:
(a11 a12)
(0 a22 − (a21/a11) a12).
Now we simply take the product of the diagonal coefficients and we have that:
det(A) = a11 (a22 − (a21/a11) a12) = a11 a22 − a12 a21.
Case 2. Suppose a11 = 0. We exchange the first and the second row; the determinant changes sign, and we get:
(a21 a22)
(0 a12).
since a11 = 0.
In both cases, then, the following formula holds:
det (a11 a12) = a11 a22 − a12 a21.
    (a21 a22)
For example:
det (1 2) = 1 ⋅ 4 − 2 ⋅ 3 = −2.
    (3 4)
Proceeding in a similar manner as in the 2×2 case (obviously the Gaussian algorithm
requires a greater number of steps), it is possible to show that the determinant is:
det(A) = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 − a12 a21 a33 − a11 a23 a32 .
Let us see a mnemonic aid to remember the formula. We rewrite the first two columns
of A next to A to the right
Definition 7.3.1 Let A ∈ Mn×n(R) be a square matrix of order n (i.e. with n rows and n columns). We denote by Ai j the square submatrix of A obtained by deleting the i-th row and the j-th column of A. Ai j is called a minor of A of order n − 1.
Example 7.3.2 If
A = ⎛−1 0 4 7 ⎞
    ⎜ 3 −1 2 1 ⎟
    ⎜ 3 6 −1 0 ⎟
    ⎝−2 −1 0 −1⎠
we have
A1 1 = ⎛−1 2 1 ⎞
       ⎜ 6 −1 0 ⎟
       ⎝−1 0 −1⎠ ,
A2 3 = ⎛−1 0 7 ⎞
       ⎜ 3 6 0 ⎟
       ⎝−2 −1 −1⎠ ,
A4 2 = ⎛−1 4 7⎞
       ⎜ 3 2 1⎟
       ⎝ 3 −1 0⎠ .
• If A has order 1, i.e. A = (a1 1) has one row and one column, we set
det(A) = a1 1.
• Suppose now that we know how to compute the determinant of matrices of order n − 1. Let
Γi j = (−1)^{i+j} det(Ai j),
then
det(A) = a1 1 Γ1 1 + a1 2 Γ1 2 + . . . + a1 n Γ1 n = ∑_{k=1}^{n} a1 k Γ1 k.
This is the method for the calculation of the determinant by expanding along the
first row. Let us see how it works in practice, for 2 × 2 and 3 × 3 matrices: we find
the results seen before.
In fact, we see at once that if
A = (a b)
    (c d)
is a 2 × 2 matrix, we have
det(A) = aΓ1 1 + bΓ1 2 = ad − bc.
For example:
det (2 3) = 2 ⋅ 4 − 3 ⋅ 1 = 8 − 3 = 5.
    (1 4)
Let us see now the case of 3 × 3 matrices. Let
A = ⎛a1 1 a1 2 a1 3⎞
    ⎜a2 1 a2 2 a2 3⎟
    ⎝a3 1 a3 2 a3 3⎠ .
We have, by definition:
Γ1 1 = (−1)^{1+1} det A1 1 = det (a2 2 a2 3)
                                 (a3 2 a3 3) ,
Γ1 2 = (−1)^{1+2} det A1 2 = − det (a2 1 a2 3)
                                   (a3 1 a3 3) ,
Γ1 3 = (−1)^{1+3} det A1 3 = det (a2 1 a2 2)
                                 (a3 1 a3 2) .
Hence
det(A) = a1 1 Γ1 1 + a1 2 Γ1 2 + a1 3 Γ1 3 =
= a1 1 (a2 2 a3 3 − a2 3 a3 2) − a1 2 (a2 1 a3 3 − a2 3 a3 1) + a1 3 (a2 1 a3 2 − a2 2 a3 1) =
= a1 1 a2 2 a3 3 + a1 2 a2 3 a3 1 + a1 3 a2 1 a3 2 − a1 1 a2 3 a3 2 − a1 2 a2 1 a3 3 − a1 3 a2 2 a3 1,
as we saw earlier.
It is possible to expand the determinant according to the r-th row:
det(A) = ar 1 Γr 1 + ar 2 Γr 2 + . . . + ar n Γr n = ∑_{k=1}^{n} ar k Γr k.
For example, consider the matrix
A = ⎛1 3 0 0 ⎞
    ⎜2 0 0 −1⎟
    ⎜0 −3 5 0 ⎟
    ⎝1 0 −1 0 ⎠ .
Expanding det(A) according to the third row, we have det(A) = −3 Γ3 2 + 5 Γ3 3. Now
Γ3 2 = (−1)^{3+2} det ⎛1 0 0 ⎞ = −(−1) = 1,
                      ⎜2 0 −1⎟
                      ⎝1 −1 0 ⎠
Γ3 3 = (−1)^{3+3} det ⎛1 3 0 ⎞ = +(−3) = −3,
                      ⎜2 0 −1⎟
                      ⎝1 0 0 ⎠
then det(A) = −3 ⋅ 1 + 5(−3) = −18.
Let us now instead expand det(A) according to the second column: det(A) = 3 Γ1 2 − 3 Γ3 2. Now
Γ1 2 = (−1)^{1+2} det ⎛2 0 −1⎞ = −(+5) = −5,
                      ⎜0 5 0 ⎟
                      ⎝1 −1 0 ⎠
and, since Γ3 2 = 1 as computed above, det(A) = 3 ⋅ (−5) − 3 ⋅ 1 = −18, as before.
Given a square matrix A, we now ask whether there exists a square matrix B such that AB = BA = I, where I is the identity matrix, which plays here the same role as the unit in real numbers. This matrix B is called the inverse of A. By Binet theorem, which we have just seen, it is clear that if det(A) = 0, then it is not possible to find a matrix B with this property, because we would have 1 = det(I) = det(AB) = det(A) det(B) = 0, which is impossible. We will see shortly that the condition det(A) ≠ 0 is also sufficient for the existence of the inverse.
We begin our discussion with the definition of the inverse of a square matrix and
then move on to the various methods of calculation.
We will compute the inverse of a matrix. A first direct method is given by the
proof of the following theorem, which characterizes invertible matrices.
Theorem 7.4.2 The matrix A is invertible if and only if its determinant is nonzero.
Proof. We prove first that if A is invertible then its determinant is different from zero. By definition we have that AA^{−1} = I, then, by Binet theorem, det(A) det(A^{−1}) = det(I) = 1, hence det(A) ≠ 0.
Conversely, if det(A) ≠ 0, the inverse of A is given by:
(A^{−1})_{i j} = (1/det(A)) (−1)^{i+j} det(Aj i), (7.1)
where (A^{−1})_{i j} indicates the i, j entry of the matrix A^{−1}. We omit the proof of the fact that the matrix defined by this formula is indeed the inverse of A; it is given in the Appendix (Section 7.9).
Observation 7.4.3 Assume we have two square matrices A, B such that AB = I. Then, by Binet Theorem 7.3.5, 1 = det(I) = det(AB) = det(A) det(B), so det(A) and det(B) are both nonzero. By Theorem 7.4.2, both A and B are invertible, thus by multiplying (on the left) both sides of the equality AB = I by A^{−1} we obtain that B = A^{−1} is the inverse of A.
Thanks to this observation we get that, given two square matrices A and B, then AB = I if and only if BA = I and the inverse of a matrix is unique.
Now let us see the explicit formula for the inverse of a 2 × 2 matrix.
Let A = (a b)
        (c d).
If the determinant det(A) = ad − bc of A is not zero, the inverse of A can be calculated with formula (7.1) of the previous proposition. We have that:
A^{−1} = ( d/(ad − bc)  −b/(ad − bc) )
         ( −c/(ad − bc)  a/(ad − bc) ).
A = (1 3)
    (1 4).
To compute the inverse, we must apply the Gaussian algorithm to the matrix:
(1 3 1 0)
(1 4 0 1).
We carry out the following elementary operation: 2nd row → 2nd row − 1st row, and we get:
(1 3 1 0)
(0 1 −1 1).
a
2 row →
a a
2 row − 1 row ⎛ 1 k 0 1 0 0 ⎞
a a a ⎜
→ ⎜ 0 k − 2 0 −1 1 0 ⎟⎟
3 row → 3 row − 2k ⋅ 1 row ⎝ 0 −2k 2 k −2k 0 1 ⎠
⎛ 1 k 0 1 0 0 ⎞
⋅ 2 row → ⎜ 0 ⎟
a 1 a 1 1
2 row → k−2 ⎜ 0 1 0 − k−2 k−2 ⎟
⎝ 0 −2k k −2k
2
0 1 ⎠
⎛ 1 k 0 1 0 0 ⎞
a a ⎜
3 row → 3 row + 2k ⋅ 2 row → ⎜
2 1 a 1 ⎟
⎟
⎜ 0 1 0 − k−2 k−2
0 ⎟
⎝ 0 0 k 1 ⎠
2
4k 2k
k−2 k−2
2k−2 k
a a a
1 row → 1 row − k ⋅ 2 row ⎛ 1 0 0 k−2 − k−2 0 ⎞
a 1 a →⎜
⎜ 1
⎜ 0 1 0 − k−2
1
0 ⎟
⎟
⎟
3 row → ⋅ 3 row k−2
k ⎝ 0 0 1 4 2k 1 ⎠
k−2 k−2 k
The inverse is therefore:
2k−2 k
⎛ k−2 − k−2 0 ⎞
⎜
⎜ 1 1
0 ⎟
⎟
⎜ − k−2 k−2 ⎟.
⎝ 4 2k 1 ⎠
k−2 k−2 k
In the next section, we will relate the concept of determinant and inverse of a
matrix with the properties of the linear transformation associated with it, once we
have fixed the canonical bases in the domain and the codomain.
7.6 THE LINEAR MAPS FROM R^N TO R^N
Now that we have introduced the concept of determinant and inverse of a matrix, we can give an important result that allows us to characterize invertible linear transformations from R^n to R^n.
Theorem 7.6.1 Let F ∶ R^n ⟶ R^n be a linear map, and let A be the matrix associated to F with respect to the canonical basis (in the domain and codomain). The following statements are equivalent.
1. F is an isomorphism.
2. F is injective.
3. F is surjective.
4. dim(Im(F)) = n.
5. rk(A) = n.
6. The columns of A are linearly independent.
7. The rows of A are linearly independent.
8. The homogeneous linear system Ax = 0 has only the zero solution.
9. The linear system Ax = b has a unique solution for every b ∈ R^n.
10. A is invertible.
11. det(A) ≠ 0.
Proof. By Proposition 5.5.2, we immediately have the equivalence between (1), (2),
(3). We now show that the statements (3) through (9) are equivalent, showing that
each of them implies the next and then that (9) implies (2). We will show then,
finally, that (1), (10), (11) are equivalent.
n
(3) implies (4), because if F is surjective, then ImF = R has dimension n.
(4) implies (5), because rk(A) = dim(Im(F )), by Observation 6.2.2.
(5) implies (6), by the definition of rank of a matrix (which is in particular the column rank).
(6) implies (7), because the row rank of a matrix is equal to the column rank (Propo-
sition 6.2.3).
(7) implies (8), because if the rows of A are linearly independent, when we apply the
Gaussian algorithm for solving the system A x = 0, we find a row echelon matrix
with exactly n pivots, so there is a unique solution.
We now show that (8) implies (9). If the system Ax = 0 has a unique solution, reducing the matrix A in row echelon form, we get a matrix A′ with exactly n pivots. Then, reducing the matrix (A∣b) in row echelon form, we get a matrix of the type (A′∣b′), which also has exactly n pivots (those of A′). Then the system Ax = b admits a unique solution.
We show that (9) implies (2). By Proposition 5.4.8, it is enough to show that Ker(F) = 0. But Ker(F) consists of the solutions of the homogeneous linear system A ⋅ x = 0, and by (9), applied with b = 0, this system has a unique solution, which must be the zero solution; thus F is injective.
We have shown that the conditions (2) through (9) are equivalent.
We show that (1) implies (10). Let G be the inverse of F , then F ◦ G = G ◦ F = idRn .
Let B be the matrix associated with G with respect to the canonical basis. Then
AB = BA = I, so B is the inverse of A.
(10) implies (1), because, if B is the inverse of A and LB ∶ R^n ⟶ R^n is the linear map associated with it, then LB is the inverse of F.
(10) is equivalent to (11) by Theorem 7.4.2.
Solution. The matrix associated with F with respect to the canonical bases of the
domain and codomain is:
⎛2 0 1⎞
A=⎜ ⎜−1 1 −1⎟
⎟.
⎝0 k 1⎠
If we calculate the determinant of A with any of the methods that we have seen, for example expanding according to the first row, we get:
det(A) = (−1)^{1+1} 2(1 + k) + 0 + (−1)^{1+3} (−k) = k + 2.
By Theorem 7.6.1 we know that F is an isomorphism if and only if the determinant of A is nonzero, and therefore F is an isomorphism if and only if k ≠ −2. We can therefore choose any value of k other than −2 to calculate F^{−1}. We choose k = 0, since this will simplify the calculations. The matrix associated with the inverse of F in the canonical bases for the domain and codomain is the inverse of the matrix A. We compute this inverse using formula (7.1):
A^{−1} = ⎛ 2 0 1 ⎞^{−1}   ⎛1/2 0 −1/2⎞
         ⎜−1 1 −1⎟      = ⎜1/2 1 1/2 ⎟
         ⎝ 0 0 1 ⎠        ⎝ 0  0  1  ⎠ .
Although not necessary for this exercise, it is always a good idea to make sure that A^{−1} is actually the inverse of A. To that purpose, it is necessary to perform the rows by columns product of A and A^{−1} and verify that the result is the identity matrix:
A A^{−1} = ⎛ 2 0 1 ⎞ ⎛1/2 0 −1/2⎞
           ⎜−1 1 −1⎟ ⎜1/2 1 1/2 ⎟ = I.
           ⎝ 0 0 1 ⎠ ⎝ 0  0  1  ⎠
Therefore the inverse of F is
F^{−1}(x, y, z) = ((1/2)x − (1/2)z, (1/2)x + y + (1/2)z, z).
⎛0 1 0⎞
A=⎜
⎜1 0 1⎟
⎟
⎝2 a 3⎠
is invertible.
Choose one of the values for which it is invertible and compute the inverse.
7.8.5 Let e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) be the canonical basis of the real vector space R^3 and let a ∈ R. Let T ∶ R^3 ⟶ R^3 be the linear transformation such that T(e1) = e1 − ae2, T(e2) = e2 + e3, T(e3) = ae3.
a) Find the values of a for which the map is invertible.
b) Choosing one of the values of a for which T is invertible, compute the inverse.
−2 4
A=( ).
−1 2
⎛1 −1 −1⎞
⎜2 −1 0 ⎟
A=⎜ ⎟
⎝0 0 1⎠
Determine T ◦ LA .
7.9 APPENDIX
In this appendix, we want to give an alternative, but equivalent, definition of the determinant of a square n × n matrix. Instead of defining it indirectly through its properties, like we have done in the text, we will see a direct definition, through the concept of permutation, which is extremely important, even if it does not appear so in our choice of exposition of the theory.
This appendix is not necessary to continue reading, but it represents a deepening
of the concepts presented in this chapter. By the nature and depth of the topics of
this appendix it will be impossible to give a complete treatment, and we refer the
interested reader to the fundamental text of S. Lang, Introduction to linear algebra
[5], for further details.
Definition 7.9.1 Let {1, . . . , n} be the set of the first n natural numbers. A permu-
tation is a bijective function σ ∶ {1, . . . , n} ⟶ {1, . . . , n}.
A permutation σ is usually represented with the notation:
σ = ( 1 . . . n )
    (σ(1) . . . σ(n)).
For example:
σ1 = (1 2 3 4)
     (1 4 3 2).
This permutation exchanges (or we also say permutes) the elements 2 and 4 and leaves 1 and 3 unchanged. It is also denoted for simplicity by σ1 = (2, 4) and, as we have seen, we call it a transposition.
σ2 = (1 2 3 4)
     (2 3 1 4).
for every s1 , s2 ∈ Sn .
Now that we have introduced the concept of permutation, we can give an alternative definition of determinant.
Definition 7.9.6 Let A = (ai j) be an n × n matrix. We define:
det(A) = ∑_{σ∈Sn} (−1)^{p(σ)} a1,σ(1) ⋯ an,σ(n),
where with ∑_{σ∈Sn} we denote the fact that we are doing the sum of the elements a1,σ(1) . . . an,σ(n) as σ varies among all permutations of Sn.
Let A be a 2 × 2 matrix:
A = (a11 a12)
    (a21 a22).
According to the new definition, its determinant is:
det(A) = a11 a22 − a12 a21 ,
as S2 , the set of permutations of two elements, consists only of the identity and the
transposition (1, 2). We can immediately note that this expression coincides with the
formula for the determinants of 2 × 2 matrices obtained in Section 7.2.
Let A be a 3 × 3 matrix:
A = ⎛a11 a12 a13⎞
    ⎜a21 a22 a23⎟
    ⎝a31 a32 a33⎠ .
The set of permutations of three elements is:
S3 = {id, (1, 2), (2, 3), (1, 3), (3, 2, 1), (2, 3, 1)},
with respective parities: p(1, 2) = p(2, 3) = p(1, 3) = −1, p(id) = p(3, 2, 1) =
p(2, 3, 1) = 1. The determinant of A is therefore given by:
det(A) = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 − a12 a21 a33 − a11 a23 a32 .
We introduce a notation that will be useful later.
Let A = (ai j), i, j = 1, . . . , n, be a generic matrix. The i-th row of A is given by:
Ai = ai 1 e1 + ai 2 e2 + ⋯ + ai n en,
where e1, . . . , en denote the vectors of the canonical basis, and we write A = (A1, . . . , An) as the sequence of its rows.
To clarify the new notation that we introduced, if
A = (a11 a12)
    (a21 a22),
we write:
A = (a11 e1 + a12 e2, a21 e1 + a22 e2).
We now want to show that in the case of a 2 × 2 matrix the properties that
define the determinant (see Definition 7.1.2) determine it in a unique way. Let A =
(a11 e1 + a12 e2 , a21 e1 + a22 e2 ). Thanks to property (1) of Definition 7.1.2 we have:
det(A) = det(a11 e1 + a12 e2, a21 e1 + a22 e2) =
= det(a11 e1, a21 e1) + det(a11 e1, a22 e2) + det(a12 e2, a21 e1) + det(a12 e2, a22 e2) =
= a11 a21 det(e1, e1) + a11 a22 det(e1, e2) + a12 a21 det(e2, e1) + a12 a22 det(e2, e2),
where in the last step we used property (2) of Definition 7.1.2.
By Proposition 7.1.4, we have that: det(e1 , e1 ) = det(e2 , e2 ) = 0 (as they are matrices
with two equal rows) and det(e1 , e2 ) = − det(e2 , e1 ). Finally by property (3) of
Definition 7.1.2, we have that det(e1 , e2 ) = 1. Therefore
det(A) = (a11 a22 − a12 a21 ) det(e1 , e2 ) = a11 a22 − a12 a21 .
This shows that the function defined in 7.1.2, must necessarily be expressed by the
formula in 7.9.6, and therefore this function is unique.
The procedure we have described for 2×2 matrices can be replicated identically in
the case of n × n matrices, allowing us to get the equivalence of the two definitions of
determinant. Let us look at this in more detail in the proof of the following theorem,
which is the most significant result of this appendix.
Theorem 7.9.7 The function defined in 7.1.2 exists and is unique, and it is ex-
pressed by the formula of the Definition 7.9.6. The two Definitions 7.9.6 and 7.1.2 of
determinant are therefore equivalent.
Proof. Let
A = (∑_{k1=1}^{n} a1 k1 ek1, ∑_{k2=1}^{n} a2 k2 ek2, . . . , ∑_{kn=1}^{n} an kn ekn).
By properties (1) and (2) of Definition 7.1.2 we have:
det(A) = ∑_{k1, . . . , kn} a1 k1 a2 k2 ⋯ an kn det(ek1, ek2, . . . , ekn).
Note that the matrix (ek1 , ek2 , . . . , ekn ) (written as a sequence of rows, according
to our convention) has, in the i -th row, 1 in the ki position and zero elsewhere. It
is therefore clear that if there are some values repeated among k1 , . . . , kn the matrix
(ek1 , ek2 , . . . , ekn ) has two equal rows and therefore by property (3) of Definition 7.1.2
its determinant is zero. So det(ek1 , ek2 , . . . , ekn ) ≠ 0 only if k1 , . . . , kn are all distinct,
that is, the function s defined by s(i) = ki is a permutation of {1, . . . , n}. At this
point we have
det(ek1, ek2, . . . , ekn) = { 1 if p(s) = 1,
                             { −1 if p(s) = −1,
because we can reorder the rows of (ek1, ek2, . . . , ekn) to get the identity matrix (which has determinant 1), and to do this we make a number of exchanges corresponding to the parity of s.
This proves the equivalence between the two definitions, but also how the number
defined by the four properties in 7.1.2 exists and is unique, that is how these properties
determine it uniquely.
The determinant can equivalently be computed by summing over permutations of the row indexes:
det(A) = ∑_{τ∈Sn} (−1)^{p(τ)} aτ(1),1 ⋯ aτ(n),n
(note that, with respect to Definition 7.9.6, the permutations are carried out on the row and not on the column indexes). In particular we have that
det(A) = det(A^T).
Proof. By the previous theorem we have:
det(A) = ∑_{σ∈Sn} (−1)^{p(σ)} a1,σ(1) . . . an,σ(n) =
= ∑_{σ∈Sn} (−1)^{p(σ)} a_{σ^{−1}(1),1} . . . a_{σ^{−1}(n),n} =
= ∑_{τ∈Sn} (−1)^{p(τ)} aτ(1),1 . . . aτ(n),n,
because, if σ varies among all the permutations in Sn, also τ = σ^{−1} varies among all permutations in Sn and moreover p(τ) = p(σ).
Thanks to this new definition and to the previous theorem, we can prove what
we have just stated in the text concerning the determinant calculation procedures.
We now want to get a proof of Theorem 7.3.3, which provides us with a valid
tool for calculating the determinant. Before going to the proof of the formula that
appears in Theorem 7.3.3, also known as the formula for Laplace expansion of the
determinant, we need a technical lemma.
If
B = ⎛ 1 0 0 . . . 0 ⎞
    ⎜ b21 b22 b23 . . . b2n ⎟
    ⎜ ⋮ ⋮ ⋮ . . . ⋮ ⎟
    ⎝ bn1 bn2 bn3 . . . bnn ⎠
then
det(B) = det ⎛ b22 b23 . . . b2n ⎞
             ⎜ ⋮ ⋮ . . . ⋮ ⎟
             ⎝ bn2 bn3 . . . bnn ⎠ .
Proof. We have det(B) = ∑_{σ∈Sn} (−1)^{p(σ)} b1,σ(1) . . . bn,σ(n). Since b1,j = 0 for every j ≠ 1, only the terms with σ(1) = 1 survive in the sum.
Now the set of permutations σ of {1, . . . , n} that fix 1 can be seen as the set of all permutations τ of the set {2, . . . , n}. Considering that b1,1 = 1 we have:
det(B) = ∑_{τ∈Sn−1} (−1)^{p(τ)} b2,τ(2) . . . bn,τ(n),
and this sum is exactly the determinant of the (n − 1) × (n − 1) matrix obtained from B by deleting the first row and the first column.
where Γk l denotes the determinant of the matrix obtained from A by deleting the k-th row and the l-th column, multiplied by (−1)^{k+l}.
Proof. We look at the proof of the first of these properties, the second is quite similar. We also suppose i = 1. The general case is only more complicated to write, but does not offer any additional conceptual difficulty. We write A as A = (∑_{j=1}^{n} a1 j ej, A2, . . . , An). By properties (1) and (2) of Definition 7.1.2, we have that:
det(A) = ∑_{j=1}^{n} a1 j det(ej, A2, . . . , An).
Mj = (ej, A2, . . . , An) =
⎛ 0 0 . . . 0 1 0 . . . 0 ⎞
⎜ a21 a22 . . . a2(j−1) a2j a2(j+1) . . . a2n ⎟
⎜ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋱ ⋮ ⎟
⎝ an1 an2 . . . an(j−1) anj an(j+1) . . . ann ⎠ .
Bringing the j-th column to the first position with j − 1 exchanges of adjacent columns, each of which changes the sign of the determinant, we obtain:
det(Mj) = (−1)^{j−1} det ⎛ 1 0 . . . 0 0 0 . . . 0 ⎞
                         ⎜ a2j a21 a22 . . . a2(j−1) a2(j+1) . . . a2n ⎟
                         ⎜ ⋮ ⋮ ⋮ . . . ⋮ ⎟
                         ⎝ anj an1 an2 . . . an(j−1) an(j+1) . . . ann ⎠ .
By the previous lemma, this last determinant equals the determinant of the matrix obtained by deleting the first row and the first column, and this is precisely the matrix obtained from A by deleting the first row and the j-th column, that is A1 j. Then we get:
det(A) = ∑_{j=1}^{n} a1 j (−1)^{1+j} det(A1 j) = ∑_{j=1}^{n} a1 j Γ1 j,
which is the expansion of det(A) along the first row.
Let us now prove Binet Theorem 7.3.5; as the two definitions of determinant are
equivalent we can use the one that best fits what we want to do.
n
Proof. We know that AB is the matrix whose coefficient of place i, j is ∑k=1 aik bkj ,
then apply Definition 7.9.6:
n n
det(AB) = ∑σ∈Sn (−1) (∑k1 =1 a1k1 bk1 ,σ(1) ) . . . (∑kn =1 ankn bkn ,σ(n) ) =
p(σ)
This is the determinant of the matrix having as rows the rows k1 , . . . , kn of the matrix
B, i.e.:
p(σ)
∑ (−1) bk1 ,σ(1) . . . bkn ,σ(n) = det(∑ bk1 ,i1 ei1 , . . . , ∑ bk1 ,in ein ),
σ∈Sn i1 in
using the notation introduced previously. By property (3) of Definition 7.1.2, we have that if k1, . . . , kn are not all distinct, that determinant is zero. So from now on in the expression of det(AB) we sum up only the terms where k1, . . . , kn are distinct. In this case, we note that the matrix (∑_{i1} bk1,i1 ei1, . . . , ∑_{in} bkn,in ein) is obtained starting from B with a number of row exchanges corresponding to the parity of the permutation τ defined by τ(i) = ki. Therefore:
det(∑_{i1} bk1,i1 ei1, . . . , ∑_{in} bkn,in ein) = (−1)^{p(τ)} det(B).
Hence:
det(AB) = ∑_{τ∈Sn} (−1)^{p(τ)} a1,τ(1) . . . an,τ(n) det(B) = det(A) det(B).
If A is a square matrix we denote with A1, . . . , An the rows of A and with Γ̃i the column vector (Γi 1, . . . , Γi n)^T, where the numbers Γi j are defined as in Laplace theorem, and they are called the algebraic complements of the matrix A. The first formula of Laplace theorem can then be written as:
Ai Γ̃i = det(A), for each i = 1, . . . , n, (7.2)
where the product is the usual product rows by columns.
It makes sense to ask if the product Ai Γ̃j has any meaning even when i ≠ j. The answer is given by the following proposition.
On the other hand, the matrix A′ has two equal rows (the i-th and the j-th), then by property (3) of Definition 7.1.2 the determinant of A′ is equal to zero. This proves precisely formula (7.3), that is, we have:
ai,1 Γj,1 + ⋯ + ai,n Γj,n = 0 for every i, j = 1, . . . , n, i ≠ j.
Let us now prove formula (7.1) for the computation of the inverse matrix of a
square matrix A with non zero determinant. Let det(Ai j ) be the determinant of the
matrix obtained from A by removing the i-th row and the j-th column and consider
the matrix B whose elements are defined by:
(B)i j = (1/det(A)) (−1)^{i+j} det(Aj i). (7.4)
Note that the j-th column of the matrix B is (1/det(A)) Γ̃j. We now compute the product, rows by columns, AB. The element of place i, j is Ai Bj = (1/det(A)) Ai Γ̃j, and by formulas (7.2) and (7.3) this is (1/det(A)) det(A) = 1 if i = j and 0 if i ≠ j.
So AB = I. Similarly, starting from Laplace theorem and expanding the deter-
minant according to the columns, we will have that BA = I, hence B is the inverse
of A.
Finally, we prove the correctness of the method described in Section 7.5 to com-
pute the inverse of an invertible matrix using the Gaussian algorithm.
Let A = (ai j), i, j = 1, . . . , n, be an invertible square matrix and let B = (bi j), i, j = 1, . . . , n, be its inverse. We have AB = I, where I is the identity matrix of order n. Let A1, . . . , An be the row vectors of A and B̃1, . . . , B̃n the column vectors of B. The coefficients b11, . . . , bn1 of B̃1 satisfy the relations: A1 B̃1 = (AB)11 = 1, A2 B̃1 = (AB)21 = 0, . . . , An B̃1 = (AB)n1 = 0 (where the products are rows by columns), i.e. they are a solution of the linear system associated with the matrix:
⎛ a11 a12 . . . a1n 1 ⎞
⎜ a21 a22 . . . a2n 0 ⎟
⎜ ⋮ ⋮ ⋮ ⋮ ⎟
⎝ an1 an2 . . . ann 0 ⎠ . (7.5)
Since A is invertible, by Theorem 7.6.1, the system admits a unique solution, and
then solving this system we determine uniquely the elements of the column B ̃1 . As
described in Section 7.5, through elementary operations on the rows, it is possible to
obtain the identity matrix on the left, that is
⎛ 1 0 . . . 0 c11 ⎞
⎜ 0 1 . . . 0 c21 ⎟
⎜ ⋮ ⋮ ⋮ ⋮ ⎟
⎝ 0 0 . . . 1 cn1 ⎠ . (7.6)
Since the two systems associated with the matrices in (7.5) and (7.6) have the same
solutions, it must be bj1 = cj1 for each j = 1, . . . , n, i.e. in (7.6) the column on the
̃1 .
left is precisely B
We proceed in the same way for the generic column B̃i, whose coefficients are the solutions of the linear system associated with the matrix:
(A ∣ ei), (7.7)
where ei denotes the i-th column of the identity matrix of order n.
To solve this system, we can perform on the rows of A exactly the same elementary
̃1 , and thus we obtain:
operations that we did in the case of B
⎛ 1 0 . . . 0 c1i ⎞
⎜ 0 1 . . . 0 c2i ⎟
⎜ ⋮ ⋮ ⋮ ⋮ ⎟
⎝ 0 0 . . . 1 cni ⎠ (7.8)
and again, since the two systems associated with the matrices in (7.7) and (7.8) have
the same solutions, it must be bji = cji for each j = 1, . . . , n, i.e. in (7.8) the column
̃i . Since we have to solve n linear systems that all have the same
on the left is just B
matrix of coefficients, we can solve them at the same time by considering the matrix
A∣I.
For all we said, after performing the elementary operations on the rows needed to reduce A to the identity matrix, we obtain a matrix of the type (I ∣ B): on the right-hand side we read precisely the inverse of A.
• The determinant of a matrix A is zero if and only if the matrix A has one row (or a column) which is a linear combination of the others.
• If the matrix A′ is obtained from the matrix A by exchanging two rows (or two columns), the determinant of A′ is the opposite of the determinant of A.
• If the matrix A′ is obtained from the matrix A by multiplying a row (or a column) by a scalar λ, the determinant of A′ is the product of λ by the determinant of A.
• If the matrix A′ is obtained from the matrix A by adding to a row (or a column) a linear combination of the others, the determinant of A′ is equal to the determinant of A.
• det(AB) = det(BA) = det(A) det(B) for each pair of square matrices A and B of the same order (but caution, in general AB ≠ BA!).
CHAPTER 8
Change of Basis
In this chapter, we want to address one of the most technical topics of this theory, i.e.
the change of basis within a vector space. We will also understand how to change the
matrix associated with a linear transformation, if we change the bases in the domain
and codomain.
In Chapter 5 we saw that there is a one-to-one correspondence between linear transformations and matrices:
F ∶ R^n ⟶ R^m ↦ (F(e1), . . . , F(en)).
The matrix (F(e1), . . . , F(en)), associated with the linear transformation F ∶ R^n ⟶ R^m in such one to one correspondence, has as columns the images of the vectors of the canonical basis of R^n. Let us see an example.
Consider the linear transformation F ∶ R^2 ⟶ R^2, F(e1) = e1 − e2, F(e2) = 3e2. This transformation is associated, with respect to the canonical basis in the domain and codomain, to the matrix:
A = ( 1 0)
    (−1 3).
We know that the choice of the canonical basis to represent the vectors in R^n is arbitrary, while being extremely convenient. For example, we have seen that a vector expressed with respect to two different ordered bases has, in general, different coordinates. Up to now, using a basis other than the canonical one to represent vectors seemed unnecessary. However, as we shall see in the next chapter,
resent vectors seemed unnecessary. However, as we shall see in the next chapter,
it provides us the key to understanding the concepts of eigenvalues and eigenvec-
tors, which are of fundamental importance not only in linear algebra but also in its
applications.
We now want to generalize the correspondence between matrices and linear transformations described above. Let us start with some observations (see Chapter 5).
Let V and W be two vector spaces of finite dimension and let B = {v1, . . . , vn} and B′ = {w1, . . . , wm} be two ordered bases of V and W, respectively. If F ∶ V → W is a linear transformation, we can write F(vi) = a1 i w1 + a2 i w2 + ⋯ + am i wm for suitable scalars aj i, and for v = x1 v1 + ⋯ + xn vn we have:
F(v) = x1 F(v1) + ⋯ + xn F(vn) =
= (a1 1 x1 + a1 2 x2 + . . . + a1 n xn)w1 +
+ (a2 1 x1 + a2 2 x2 + . . . + a2 n xn)w2 + . . .
+ (am 1 x1 + am 2 x2 + . . . + am n xn)wm.
So, if the coordinates of v with respect to the basis B are (v)B = (x1, . . . , xn), we have that the coordinates of F(v) with respect to the basis B′ are:
(F(v))B′ = ⎛ a1 1 x1 + a1 2 x2 + . . . + a1 n xn ⎞
           ⎜ a2 1 x1 + a2 2 x2 + . . . + a2 n xn ⎟
           ⎜ ⋮ ⎟
           ⎝ am 1 x1 + am 2 x2 + . . . + am n xn ⎠ = A ⋅ (v)B,
where A is the matrix defined in (8.1), which has as columns the coordinates of the vectors F(v1), . . . , F(vn) with respect to the basis B′ = {w1, . . . , wm}, and A ⋅ (v)B denotes the product rows by columns of the matrix A and the vector (v)B, which represents the coordinates of v with respect to the basis B.
We are therefore able to associate an m × n matrix A to F once we fix arbitrary ordered bases B and B′ in the domain and codomain. If V = R^n, W = R^m and we fix as B and B′ the canonical bases of the domain and the codomain, we have that the matrix A is precisely the matrix associated to F defined in the previous chapters and recalled at the beginning of this chapter. This matrix has as columns F(e1), . . . , F(en),
namely the coordinates, with respect to the canonical basis, of the images of the vectors of the canonical basis (in R^n, if we do not specify otherwise, we always consider the coordinates of vectors with respect to the canonical basis).
We now want to formalize what we observed in the following definition.
Let us make an important distinction. While until now we interchangeably used
rows or columns to indicate the coordinates, in this chapter we require more accuracy,
since the coordinates of a vector with respect to a given basis will form a column
vector, which must then be multiplied (rows by columns) by a matrix. Henceforth,
we shall denote the coordinates of a vector with respect to a given basis via a column
vector.
(F(v))B′ = ⎛ y1 ⎞ = AB,B′ ⎛ x1 ⎞
           ⎜ ⋮ ⎟          ⎜ ⋮ ⎟
           ⎝ ym ⎠          ⎝ xn ⎠ .
From previous observations, we have that the i-th column of AB,B′ is given by the coordinates of F(vi) with respect to the basis B′.
1. What are the coordinates of the vectors v1 and v2 with respect to the basis B?
2. How can we write the matrix associated with F with respect to the canonical
basis in the domain and the basis B in the codomain?
The answer to the first question is obvious. The vectors v1 and v2 have, respectively, coordinates (1, 0)^T and (0, 1)^T (T denotes the transpose, namely the fact that the coordinates represent a column vector). Indeed, v1 = 1 ⋅ v1 + 0 ⋅ v2, v2 = 0 ⋅ v1 + 1 ⋅ v2.
The answer to the second question is also quite simple. Indeed, we have already
seen that changing the basis that we choose to represent vectors within a vector
space, does not change vectors, but only how we write them, i.e. their coordinates.
In the above example, the vectors v1 and v2 are the same, what changes passing from the canonical basis to the basis B are just their coordinates, which change from (v1)C = (1, −1)^T, (v2)C = (0, 3)^T to (v1)B = (1, 0)^T, (v2)B = (0, 1)^T.
The same reasoning is valid for linear transformations, where the concept of co-
ordinates is replaced by the concept of matrix associated with the transformation,
with respect to two given ordered bases in the domain and codomain. Let us see a
concrete example.
Consider the linear transformation F ∶ R^2 ⟶ R^2, such that F(e1) = v1, F(e2) = v2. Let us now represent F using the coordinates with respect to the canonical basis C = {e1, e2} in the domain and the coordinates with respect to the basis B = {v1, v2} in the codomain. We have that (F(e1))B = (1, 0)^T, (F(e2))B = (0, 1)^T, where we use
an index to remind us that we use coordinates with respect to a certain basis, which
is not necessarily the canonical one. So we have that the matrix associated to F with
respect to the bases C of the domain and B of the codomain is
1 0
AC,B = ( ).
0 1
In fact, taking the product rows by columns, we see that:
1 1
AC,B ( )= ( )= (F (e1 ))B
0 0
0 0
AC,B ( ) = ( ) = (F (e2 ))B .
1 1
Therefore, the matrix associated to F with respect to the bases C in the domain
and B in the codomain is just AC,B , that is, the identity matrix.
We can easily generalize what we have just said.
Proposition 8.1.2 Let F ∶ V ⟶ V be a linear transformation such that F(v1) = w1, . . . , F(vn) = wn, where B = {v1, . . . , vn}, B′ = {w1, . . . , wn} are two ordered bases of V. Then the matrix associated to F, with respect to the basis B in the domain and the basis B′ in the codomain, is the identity matrix.
The proof is an easy exercise and follows the previous reasoning.
The choice to avoid treating the change of basis in this generality is dictated only by the hope of increasing clarity, and does not involve conceptual issues.
We now ask a question in some way related to the previous ones.
What is the matrix associated to the identity id ∶ R^n ⟶ R^n with respect to the bases B = (v1 . . . vn), B′ = (w1 . . . wn), respectively, in the domain and in the codomain?
Certainly we know that the identity matrix is associated to the identity map if we
fix the canonical bases in the domain and codomain, however we already know from
the previous example that changing the basis can radically change the appearance of
the matrix associated to the same linear transformation.
We now look at a simple example to help our understanding.
2 2
Example 8.2.1 Consider the identity map id ∶ R ⟶ R and fix the basis B =
{v1 , v2 } in the domain and the canonical basis C in the codomain, with v1 = 2e1 , v2 =
e1 + e2 .
The identity map always behaves in the same way even if we change the way we
represent it: id still sends a vector to itself. Let us see what happens:
We write the coordinates of the vectors with respect to the canonical basis:
(id(v1))C = (2, 0)^T, (id(v2))C = (1, 1)^T.
Therefore, the matrix associated to the identity with respect to the basis B in the
domain and the canonical basis C in the codomain is:
IB,C = (2 1)
       (0 1).
Now we wonder what happens if we want to represent the identity using the canonical
basis C in the domain and the basis B in the codomain. The identity always associates
to each vector itself; the problem is to understand what are the right coordinates.
So:
1/2 −1/2
(id(e1 ))B = ( ), (id(e2 ))B = ( ).
0 1
Therefore, the matrix associated to the identity with respect to the canonical basis
C of the domain and the basis B of the codomain is:
1/2 −1/2
IC,B = ( ).
0 1
In this very simple example, it was possible to calculate easily the coordinates of e1
146 Introduction to Linear Algebra
and e2 with respect to the basis B = {v1 , v2 }; in general this is not always so easy.
−1
However, in this example, note that IC,B = IB,C . Therefore, the coordinates of the
vectors e1 and e2 with respect to the basis B can be read from the columns of the
−1
matrix IB,C . Remember that the matrix IB,C can be easily calculated and it has as
columns the coordinates of the vectors v1 , v2 with respect to the canonical basis.
we have:
• the matrix associated to id with respect to the basis B in the domain and the
canonical basis C in codomain is:
where (vi )C are the coordinates of the vector vi with respect to the canonical
basis;
• the matrix associated to id with respect to the canonical basis C in the domain
−1
and B in the codomain is IB,C .
Proof. The first point is clear; substantially it is what we saw in Example 8.2.1. Indeed
(id(v1 ))C = (v1 )C are precisely the coordinates of the vector v1 in the canonical basis.
The same is true for (id(v2 ))C , . . . , (id(vn ))C .
Change of Basis 147
To show the second point, we show that IC,B is the inverse of IB,C , that is,
IB,C IC,B = IC,B IB,C = I, where I is the identity matrix.
Consider the composition of the identity with itself, with respect to different
bases, as indicated in the following diagram:
n id n id n
R ⟶ R ⟶ R .
B C B
The composite function id◦id = id is still the identity, and if we consider the matrices
associated with it, by Observation 8.2.3 we get: IC,B IB,C = IB . Now, by Proposition
8.2.2 we have that IB = I, so that IC,B IB,C = I. Similarly, considering the diagram:
n id n id n
R ⟶ R ⟶ R ,
C B C
−1
we get that IB,C = IC,B . This concludes the proof.
This theorem also answers another question, which was asked previously, that is,
how we can write the coordinates of a vector v with respect to a given basis B.
So far we have responded with a very explicit calculation in each case, but now
we can state a corollary that contains the answer in general.
n n
Corollary 8.2.5 Let C be a basis of R , and let v be a vector of R . Then the
coordinates of v with respect to the basis B are given by:
Proof. We have that v = id(v), so we choose the canonical basis C in the domain
and the basis B in the codomain. Now by Theorem 8.2.4, we know that, with respect
−1
to these bases, the identity map is represented by the matrix IC,B = IB,C .
−e2 . We want find the coordinates of the vector v = −e1 + 3e2 with respect to
−1 0
the basis B. We know that (v)C = (−1, 3) and IB,C = ( ). With an easy
2 −1
−1 0
calculation, we obtain that IB,C = ( ), then the coordinates are:
−1
−2 −1
−1 0 −1 1
(v)B = IB,C ⋅ (v)C = ( )( ) = ( ).
−1
−2 −1 3 −1
Indeed:
v = 1 ⋅ v1 − 1 ⋅ v2 = −e1 + 2e2 + e2 = −e1 + 3e2 .
148 Introduction to Linear Algebra
id ↑ ↓ id
n F m ′
B R ⟶ R B
This is what is called a commutative diagram, because the path we choose in the
diagram does not influence the result:
F = id ◦ F ◦ id.
The equality we wrote appears as a tautology and not particularly interesting, how-
ever, when we associate with each transformation its matrix with respect to the fixed
bases in the domain and codomain, this same equality will provide a complete answer
to the question we set at the beginning of this section, and that is perhaps the most
technical point of our linear algebra notes.
Therefore, we associate to each linear transformation the corresponding matrix
on the same diagram using coordinates, that is, using matrices to represent linear
transformations. As we know from the previous section, we get:
n F m
R ⟶ R
IB,C (v)B (v)C ↦ AC,C ′ (v)C (v)C ′
↑ id ↑ ↓ id ↓ .
F
(v)B IC ′ ,B′ (v)C ′
n m
R ⟶ R
We want to use this diagram to determine AB,B′ , that is, the matrix associated
′
to F with respect to the bases B in the domain and B in the codomain.
Thanks to Observation 8.2.3, we have that the equality:
F = id ◦ F ◦ id
corresponds to:
−1
AB,B′ = IC ′ ,B′ AC,C ′ IB,C = IB′ ,C ′ AC,C ′ IB,C ,
where, for the last equality, we have used Theorem 8.2.4.
Thus, we have proved the following theorem.
n m
Theorem 8.3.1 Let F ∶ R ⟶ R be a linear transformation and let AC,C ′ be the
′
matrix associated with F with respect to the canonical bases C and C in the domain
and codomain, respectively. Then the matrix associated to F , with respect to the bases
′
B in the domain and B in the codomain, is given by:
−1
AB,B′ = IB′ ,C ′ AC,C ′ IB,C ,
where:
• AC,C ′ has, as columns, the coordinates of the vectors which are the images of
the vectors of the canonical basis, i.e. F (e1 ), . . . , F (en ), expressed in terms of
m
the canonical basis of R ;
′
• IB′ ,C ′ has, as columns, the coordinates of the vectors of the basis B expressed
m
with respect to the canonical basis of R ;
• IB,C has, as columns, the coordinates of the vectors of the basis B, expressed in
n
terms of the canonical basis of R .
2 3
Example 8.3.2 Consider the linear transformation F ∶ R ⟶ R associated to the
matrix AC,C ′ with respect to the canonical bases, where
⎛ 1 2⎞
AC,C ′ =⎜
⎜ 0 1⎟
⎟.
⎝−1 0⎠
We want to find the matrix AB,B′ associated with F with respect to the bases: B =
{e1 + e2 , −e1 − 2e2 } in the domain and B = {2e3 , e1 + e3 , e1 + e2 } in the codomain.
′
1 −1 ⎛0 1 1⎞
IB,C =( ), IB ,C
′ ′ =⎜
⎜0 0 1⎟
⎟.
1 −2 ⎝2 1 0⎠
150 Introduction to Linear Algebra
−1
We calculate IB′ ,C ′ with any method:
⎛−3/2 2 ⎞
=⎜
⎜ 2 −3⎟
⎟.
⎝ 1 −2⎠
0 1 0 0
column is F (e12 ) = F ( ) = 0 + 0 = 0, the third is F (e21 ) = F ( ) = 0 and
0 0 1 0
0 0
the fourth is F (e22 ) = F ( ) = 1, thus A = (1, 0, 0, 1).
0 1
the matrix AC,B associated with F with respect to the canonical basis in the domain
and the basis B in the codomain.
Solution. The matrix associated with F with respect to the canonical bases of the
domain and codomain is:
2 1 1
AC,C ′ = ( ).
−1 0 1
We want to change basis in the codomain, therefore we consider the following com-
position of functions:
3 F 2 id 2
R ⟶ R ⟶ R .
′
C C B
The composition is id ◦ F = F , and the matrix associated to it is AC,B = IC ′ ,B AC,C ′ . In
−1
addition, IC ′ ,B = IB,C ′ , where IB,C is the matrix that has as columns the coordinates
2 1
of the vectors of B with respect to the canonical basis C , thus: IB,C ′ = ( ).
′
−1 −1
1 1
We have that IB,C ′ = ( ), therefore
−1
−1 −2
1 1 2 1 1 1 1 2
AC,B = IC ′ ,B AC,C ′ = ( )( )=( ).
−1 −2 −1 0 1 0 −1 −3
8.4.3 Let B = {(2, −1), (1, 1)} be a basis of R and let G ∶ R ⟶ R be the linear
2 2 3
transformation defined by: G(2, −1) = (1, −1, 0), G(1, 1) = (2, 1, −2). Determine the
matrix AC,C ′ associated with G with respect to the canonical bases of the domain and
of the codomain.
Solution. The matrix associated with G with respect to the basis B of the domain
′
and the canonical basis C of the codomain is:
⎛1 2⎞
AB,C ′ = ⎜
⎜−1 1 ⎟
⎟.
⎝ 0 −2⎠
Since we have changed the basis in the domain, we consider the following composition
of functions:
2 id 2 F 3
R ⟶ R ⟶ R .
′
B C C
The composition is G ◦ id = G, and the matrix associated with it is AB,C ′ = AC,C ′ IB,C ,
−1 −1
therefore, multiplying to the right-both members by IB,C we get: AC,C ′ = AB,C ′ IB,C .
152 Introduction to Linear Algebra
1
2 1 − 13
We have IB,C = ( ) and IB,C ′ = ( 31 ), so
−1
2
−1 1 3 3
⎛1 2⎞ 1
− 13 ⎛ 1 1 ⎞
AC,C ′ = ⎜
⎜−1 1 ⎟
⎟ ( 31 2 )=⎜
⎜ 0 1 ⎟⎟.
⎝ 0 −2⎠ 3 3 ⎝− 2 4⎠
−3
3
8.5.2 Consider the linear transformation F ∶ R ⟶ R defined by: F (e1 ) = −e2 +e3 ,
2 3
b) Determine the coordinates of the vector v = (1, −1, 1) with respect to the basis
′ 3
B of R .
a) Determine the matrix associated to F with respect to the canonical basis and the
matrix AB associated to F with respect to the basis B = {−e1 + e2 , −2e1 + 3e2 , −e3 }.
b) Say if F is an isomorphism and give a motivation for your answer.
c) If the answer in point (b) is affirmative, compute the inverse of F .
3 2
8.5.5 a) Given the linear transformation T ∶ R → R defined by:
T (x, y, z) = (3kx+y −2kz, 3x+ky −2z), determine for which values of k we have that
T is surjective, motivating the procedure followed. Set k = 0 and determine ker T .
b) Let B = {4e1 −2e2 , −e1 +e2 } be another R basis. Set k = 0. Determine the matrix
2
3
AC,B associated with T with respect to the canonical basis C of R in the domain
and at the basis B in the codomain.
3 3
8.5.6 a) Let F ∶ R → R be the linear transformation defined by:
F (x, y, z) = (x − 4y − 2z, −x + ky + kz, kx − 4ky + z).
Determine for which values of k we have that F is surjective.
Change of Basis 153
3 3
b) Set k = 0 and determine, if possible, a linear transformation G ∶ R → R such
that G ◦ F is the identity.
c) Let B = {e1 + e2 , −e1 + e3 , 2e2 } be another basis of R . Set k = 0. Determine the
3
matrix AC,B associated with F with respect to the basis B in the domain and the
3
canonical basis C of R in the codomain.
8.5.7 Consider the linear transformation D ∶ R3 [x] ⟶ R3 [x] that associates its
derivative to each polynomial. Determine the matrix associated with D with respect
to the basis {x , x , x, 1} of R3 [x].
3 2
Eigenvalues and
Eigenvectors
In this chapter, we want to address one of the most important questions of linear
algebra, namely the problem of diagonalizing a linear transformation together with
the concepts of eigenvalue and eigenvector.
9.1 DIAGONALIZABILITY
The idea behind the problem of diagonalizability is very simple: given a linear trans-
n n
formation F ∶ R ⟶ R , we ask if there is a basis, both for the domain and the
codomain, such that the matrix associated to F , with respect to this basis, has the
simplest possible form, namely the diagonal one. Let us see an example.
1 0
AB = ( ).
0 −1
We can see this right away without calculations, however, to convince ourselves, we
can just use Definition 8.1.1 or the formula for changing the basis in Chapter 8.
In this case, it is very simple to see what happens geometrically. The transforma-
tion φ is the reflection of the plane with respect to the line x = y. Indeed, φ(e1 ) = e2 ,
155
156 Introduction to Linear Algebra
φ(e2 ) = e1 . We see, geometrically, that the vector v1 , lying on the straight line y = x,
is fixed by the transformation, while the vector v2 , which is perpendicular to the line
y = x, is sent to −v2 . Based on these observations, we can conclude without any
calculation that, with respect to the basis B = {v1 , v2 }, the matrix associated with
φ is in the specified diagonal form.
6y
φ(v2 ) = −v2
v1 = φ(v1 )
@
I
@
@
@
@
@ -
@
@ x
@
@
@
R
@
v2
n n
Definition 9.1.2 A linear transformation T ∶ R ⟶ R is said to be diagonalizable,
n
if there is a basis B for R , such that the matrix AB associated with T with respect
to B (in domain and codomain) is a diagonal matrix.
In the above example, the transformation φ is diagonalizable and B = {e1 +
e2 , e1 − e2 } is the basis with respect to which matrix associated with φ is diagonal.
Just as we gave the definition of diagonalizable linear map, we can also give
the definition of diagonalizable matrix: this is essentially a matrix associated with a
diagonalizable linear transformation. We now see the precise definition.
Definition 9.1.3 A square matrix A is called diagonalizable, if there is a invertible
−1
matrix P , such that P AP is diagonal.
n n
Proposition 9.1.4 Let T ∶ R ⟶ R be a linear transformation associated with the
matrix A with respect to the canonical basis (in domain and codomain). Then T is
diagonalizable if and only if A is diagonalizable. Also, if T and A are diagonalizable,
then the coordinates of the vectors forming the basis B are the columns of the matrix
−1
P such that P AP is diagonal.
Proof. The statement is straightforward, if we remember how to make basis changes
from the previous chapter. Suppose that T is diagonalizable. Then there is a basis B
with respect to which the matrix AB associated with T is diagonal. If P = IB,C , we
have that the formula in Theorem 8.3.1 becomes:
−1
AB = P AP,
Eigenvalues and Eigenvectors 157
λ1 0
AB = ( ).
0 λ2
But it is easy to see that a rotation is not fixing any direction, hence v1 and v2 cannot
exist, that is, A is not diagonalizable.
Note that the argument would be different, if we allowed the scalars to take
complex values. In fact, in this case there would be two vectors namely v1 = (1, −i)
and v2 = (i, −1), such that ψ(v1 ) = −iv1 , ψ(v2 ) = iv2 .
This example suggests a third question:
3) If we allow scalars to take complex values, then, can we always diagonalize a
given matrix (or linear transformation)?
158 Introduction to Linear Algebra
The answer is no, but we can always bring a matrix to a form which is almost
diagonal; this is called the Jordan form, and we will not discuss it, because it would
take us too far.
...
T (vn ) = λn vn = 0v1 + 0v2 + ⋅ ⋅ ⋅ + λn vn ,
so the matrix associated with T respect to the basis B is:
⎛λ1 . . . 0⎞
AB = ⎜
⎜⋮ ⋮⎟⎟.
⎝ 0 ... λn ⎠
So the transformation T is diagonalizable by definition.
Conversely, if T is diagonalizable, it means that there is a basis B with respect to
which the matrix associated with T is diagonal, i.e.
⎛λ1 . . . 0⎞
AB = ⎜
⎜⋮ ⋮⎟⎟.
⎝ 0 ... λn ⎠
But then, this matrix has precisely the eigenvalues on the diagonal because
⎛λ1 . . . 0 ⎞ ⎛0⎞ ⎛ 0 ⎞
.
(T (vn ))B = A(vn )B = ⎜
⎜⋮ ⋮⎟⎟⎜⎜⋮⎟
⎟=⎜
⎜⋮⎟ ⎟
⎝ 0 ... λn ⎠ ⎝1⎠ ⎝λn ⎠
We now want to give a concrete method for calculating eigenvalues and eigenvec-
tors of a given matrix or linear transformation.
The fact that pA (x) is actually a polynomial in x, for example, follows from the
recursive calculation of the determinant expanded according to a row (or a column)
of A − xI.
In the following, if A is a matrix, for the sake of brevity we will denote by Ker A the
kernel of the linear transformation LA associated with A with respect to the canonical
basis (in the domain and codomain). Equivalently, Ker A is the set of solutions of the
homogeneous linear system A x = 0.
Proof. If λ is an eigenvalue of A, then there exists v ≠ 0 such that Av = λv, that is,
Av − λv = 0, i.e. (A − λI)v = 0 and so v ∈ Ker (A − λI). Thus by Theorem 7.6.1
the determinant of the matrix A − λI is zero.
Conversely, if det(A − λI) = 0, by Theorem 7.6.1 there is a nonzero vector v ∈
Ker (A − λI). Then (A − λI)v = 0, so Av = λv and v is an eigenvector of eigenvalue
λ.
Definition 9.2.6 Let A and B be two n × n matrices. A and B are said similar, if
there exists an invertible matrix n × n P such that:
−1
B=P AP.
Observation 9.2.7 • If A and B are similar then A and B represent the same
linear transformation with respect to different bases. This immediately follows
from our discussion on basis change.
pB (x) = det(P
−1 −1 −1
AP − xI) = det(P AP − P P xI) =
(In this proof, we have also used the fact that if P is a matrix and x is a scalar, then
P (xI) = (xI)P ).
Eigenvalues and Eigenvectors 161
have the same characteristic polynomial pA (x) = (3 − x) = pB (x), but using the
2
techniques we will learn at the end of this chapter it is easy to deduce that A and B
are not similar, becauseB is not diagonalizable.
Observation 9.2.10 From the proof of Theorem 9.2.5, we have that a vector v ≠ 0
is an eigenvector of a linear transformation T with associated eigenvalue λ if and only
if it belongs to Ker (A − λI), where A is the matrix associated with T with respect
to the canonical basis.
Definition 9.2.11 If λ is an eigenvalue of T ∶ R ⟶ R then Vλ = {v ∈ R ∣ T (v) =
n n n
The associated equation has solutions: 1 and 2, so there are two eigenvalues for
A, λ = 1 and λ = 2.
x x
A( ) = ( ),
y y
5x − 4y = x
{
3x − 2y = y .
162 Introduction to Linear Algebra
5−1 −4
( ),
3 −2 − 1
3 −4
V2 = Ker (A − 2I) = Ker ( ) = ⟨(4/3, 1)⟩.
3 −4
• The vectors (1, 1), (4/3, 1) are linearly independent, so we have that A is diag-
onalizable, and it is similar to the diagonal matrix:
1 0
D=( ) = P AP,
−1
0 2
1 4/3
where P = ( ).
1 1
P is the matrix of the change of basis, which allows us to pass from the canonical
basis to the basis formed by the eigenvectors of A. By the theory on the basis change
(see Chapter 8), the columns of P consist of the coordinates of the eigenvectors with
respect to the canonical basis.
β1 v1 + β2 v2 = 0. (9.1)
Applying T to both members and taking into account that T (v1 ) = λ1 v1 , T (v2 ) =
λ2 v2 , we obtain:
β1 λ1 v1 + β2 λ2 v2 = 0. (9.2)
Eigenvalues and Eigenvectors 163
Now we subtract from the second equality the first equality multiplied by λ2 , and we
get:
β1 (λ1 − λ2 )v1 = 0.
As v1 ≠ 0, we have that β1 (λ1 − λ2 ) = 0, and so β1 = 0, being λ1 ≠ λ2 . By replacing
β1 = 0 in (9.1), we get that β2 v2 = 0. But v2 is an eigenvector, thus v2 ≠ 0, and it
follows that β2 = 0, as we wanted.
After k − 1 steps, we have shown that the vectors v1 , . . . , vk−1 are linearly inde-
pendent.
- Step k. We show that v1 , . . . , vk−1 , vk are linearly independent. Let
β1 , . . . , βk−1 , βk ∈ R, such that:
Applying T to both sides and taking into account that T (vi ) = λi vi we obtain:
Now we subtract from equality (9.4) equality (9.3) multiplied by λk and we get:
βk vk = 0,
therefore also βk = 0, being vk ≠ 0. So the βi are all zero, and v1 , . . . , vk are linearly
independent, as we wanted.
After n steps we get what wanted, namely that v1 , . . . , vn are linearly indepen-
dent.
⎛ −1 2 0 ⎞
A=⎜
⎜ 1 1 0 ⎟
⎟
⎝ −1 1 4 ⎠
⎛ −1 − x 2 0 ⎞
pA (x) = det ⎜ ⎟ 2
⎜ 1 1 − x 0 ⎟ = (x − 3)(4 − x) .
⎝ −1 1 4−x ⎠
√ √
x1 = 3, x2 = − 3, x3 = 4.
These roots are the eigenvalues of A. Since A has three distinct eigenvalues, by
3
the previous theorem there is a basis of R consisting of eigenvectors of A. So
we can immediately answer one of the questions: the matrix A is diagonalizable.
⎛ x ⎞ ⎛ x ⎞
A⎜ ⎟ = 4⎜
⎜ y ⎟ ⎜ y ⎟
⎟,
⎝ z ⎠ ⎝ z ⎠
⎧
⎪ −x + 2y = 4x
⎪
⎪
⎪
⎨ x + y = 4y
⎪
⎪
⎪
⎪
⎩−x + y + 4z = 4z.
We note that the matrix associated with this linear system is:
⎛ −5 2 0 ⎞
⎜
⎜ 1 −3 0 ⎟
⎟,
⎝ −1 1 0 ⎠
⎛ −1 1 0 ⎞
⎜
⎜ 0 −2 0 ⎟
⎟.
⎝ 0 0 0 ⎠
√
• We calculate the eigenspace V√3 corresponding eigenvalue 3, which consists
of the vectors (x, y, z) ∈ R such that:
3
⎛ x ⎞ √ ⎛ x ⎞
A⎜ ⎟ = 3⎜
⎜ y ⎟ ⎟.
⎜ y ⎟
⎝ z ⎠ ⎝ z ⎠
⎛ x ⎞ √ ⎛ x ⎞
A⎜ ⎟ = − 3⎜
⎜ y ⎟ ⎜ y ⎟
⎟,
⎝ z ⎠ ⎝ z ⎠
that is, the √solutions of the homogeneous linear system associated with the
matrix A + 3I: √
⎛ −1 + 3 2√ 0 ⎞
A=⎜ ⎜ 1 1+ 3 0√ ⎟ ⎟.
⎝ −1 1 4+ 3 ⎠
⎛ 4 √0 0 ⎞
⎜
D=⎜ 0 3 0√ ⎟
⎟
⎝ 0 0 − 3 ⎠
166 Introduction to Linear Algebra
−1
is similar to A, and we have that D = P AP , where
√ √
⎛ 0 −1 − 3√3 3√3 − 1 ⎞
P =⎜⎜ 0 −5 − 2 3 2 3 − 5 ⎟ ⎟.
⎝ 1 1 1 ⎠
We now return to the general theory. We know that the eigenvalues of a matrix A
are the roots of its characteristic polynomial. The fact that pA (λ) = 0 is equivalent
to saying that x − λ divides pA (x), i.e. we can write pA (x) = (x − λ)f (x), where
f (x) is a polynomial in x.
We now want to be more precise.
pA (x).
If λ is an eigenvalue of A, the dimension of Ker (A − λI) is called the geometric
multiplicity of λ.
⎛λ 0 ⋯ 0 b1 s+1 ⋯ b1 n ⎞
⎜
⎜ 0 λ ⋯ 0 b2 s+1 ⋯ b2 n ⎟ ⎟
⎜
⎜ ⎟
⎜
⎜ ⋮ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎟ ⎟
⎟
⎜
⎜ ⎟
⎜
AB = ⎜
⎜ 0 0 ⋯ λ bs s+1 ⋯ bs n ⎟ ⎟
⎟
⎟.
⎜
⎜ ⋯ bs+1 n ⎟
⎟
⎜
⎜ 0 0 ⋯ 0 bs+1 s+1 ⎟
⎟
⎜
⎜ ⎟
⎜⋮ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎟ ⎟
⎝0 0 ⋯ 0 bn s+1 ⋯ bn n ⎠
We observe now that, by Theorem 9.2.8, we have pA (x) = pAB (x), because the two
matrices A and AB are similar. Compute det(A − xI) = det(AB − xI) developing
according to the first column, and then again according to the first column of the
Eigenvalues and Eigenvectors 167
only minor of order n − 1 with nonzero determinant appearing in the formula, and
so on. We get:
det(AB − xI) =
⎛λ − x 0 ⋯ 0 b1 s+1 ⋯ b1 n ⎞
⎜
⎜ 0 λ − x ⋯ 0 b2 s+1 ⋯ b2 n ⎟⎟
⎜
⎜ ⎟
⎜
⎜ ⋮ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎟ ⎟
⎟
⎜ ⎟
det ⎜
⎜
⎜
⎜ 0 0 ⋯ λ−x bs s+1 ⋯ bs n ⎟⎟
⎟
⎟ =
⎜
⎜ ⎟
⎟
⎜
⎜ 0 0 ⋯ 0 bs+1 s+1 − x ⋯ bs+1 n ⎟⎟
⎜
⎜ ⎟
⎜ ⋮ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎟ ⎟
⎝ 0 0 ⋯ 0 bn s+1 ⋯ bn n − x⎠
⎛λ − x ⋯ 0 b2 s+1 ⋯ b2 n ⎞
⎜
⎜ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎟ ⎟
⎜
⎜ ⎟
⎜ bs n ⎟⎟
(λ − x) det ⎜ ⎟
⎜ 0 ⋯ λ−x bs s+1 ⋯ ⎟=
⎜
⎜ 0 ⋯ bs+1 n ⎟⎟
⎜
⎜ ⋯ 0 bs+1 s+1 − x ⎟
⎟
⎜
⎜ ⎟
⎜ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎟ ⎟
⎝ 0 ⋯ 0 bn s+1 ⋯ bn n − x⎠
⎛λ − x bs s+1 ⋯ bs n ⎞
⎜ 0 ⋯ bs+1 n ⎟
= (λ − x) det ⎜ ⎟
s−1
⎜ bs+1 s+1 − x ⎟=
⎜
⎜ ⋮ ⎟ ⎟
⎜ ⋮ ⋮ ⋱ ⎟
⎝ 0 bn s+1 ⋯ bn n − x⎠
The following proposition allows us to have a clear strategy for figuring out if a
certain n × n matrix is diagonalizable.
Proof. Let T be the linear application associated with A with respect to the canonical
basis. For each eigenvalue λi , we consider a basis Bi = {vi 1 , . . . , vi ni } of the eigenspace
Vλi . We then show that the union B = {v1 1 , . . . , v1 n1 , . . . , vr 1 , . . . , vr nr } of such
m
bases is a basis for R . Once shown this, by Proposition 9.2.3, we have that T is
diagonalizable, because B is a basis of eigenvectors.
Since n1 + ⋅ ⋅ ⋅ + nr = n the set B contains n vectors, so, by Proposition 4.2.6, to
n
prove that B is a basis of R , it is enough to prove that the vectors of B are linearly
independent. Suppose
λ1 1 v1 1 + ⋅ ⋅ ⋅ + λ1 n1 v1 n1 + ⋅ ⋅ ⋅ + λr 1 vr 1 + ⋅ ⋅ ⋅ + λr nr vr nr = 0. (9.5)
1 ⋅ w1 + 1 ⋅ w2 + ⋅ ⋅ ⋅ + 1 ⋅ wr = 0. (9.6)
If some of the wi were not zero, by equality (9.6) such wi would be linearly dependent,
but this contradicts the fact that eigenvectors with distinct eigenvalues are linearly
independent (Theorem 9.2.14). So wi = 0 for each i = 1, . . . , r, i.e. λi 1 vi 1 + ⋅ ⋅ ⋅ +
λi n1 vi n1 = 0, for each i = 1, . . . , r. Let us now exploit the fact that the vectors
vi 1 , . . . , vi ni are linearly independent, because they form a basis of Vλi . We get:
λi 1 = ⋅ ⋅ ⋅ = λi ni = 0. Thus, in the linear combination in the first member of equality
(9.5), the coefficients must be all zero, and this shows that the vectors of B are linearly
independent, so B is a basis of eigenvectors.
To see the reverse implication, let us assume that A is diagonalizable and let
B = {v1 , . . . , vn } be a basis of R consisting of eigenvectors of T . The matrix D
n
is the algebraic multiplicity of the eigenvalue λi , but it also is the number of distinct
eigenvectors of eigenvalue λi in B, thus mi ≤ ni = dim(Vλi ) for all i = 1, . . . , r.
Moreover, m1 + ⋅ ⋅ ⋅ + mr = n. As ni ≤ mi by Proposition 9.2.19, it follows that
ni = mi for all i = 1, . . . , r. Therefore n1 + ⋅ ⋅ ⋅ + nr = n, as we wanted to prove.
• If the eigenvalues are not distinct, for each eigenvalue λ we calculate its geo-
metric multiplicity. If the sum of all geometric multiplicities is n, this will allow
us to find n linearly independent eigenvectors and therefore a basis of V ; con-
sequently A is diagonalizable. If the sum of all geometric multiplicities is less
than n, then A is not diagonalizable.
Eigenvalues and Eigenvectors 169
Solution. We first write the matrix associated with L with respect to the canonical
basis:
1 −1
A=( ).
1 3
Now we compute the characteristic polynomial of this matrix:
1 − x −1 2 2
pA (x) = det ( ) = x − 4x + 4 = (x − 2) .
1 3−x
The characteristic polynomial has the unique root x = 2, with algebraic multiplicity
2. So L has only one eigenvalue.
We get that V2 = ⟨(−1, 1)⟩ has dimension 1. So it is not possible to find a basis of
2
R consisting of eigenvectors and so the matrix A is not diagonalizable.
Since A is not diagonalizable, no diagonal matrix D can be found that is similar to
−1 0
A. In particular, if we take for example B = ( ), certainly B is not similar to
0 5
A.
9.3.2 Let L ∶ R ⟶ R be the linear transformation defined by L(e1 ) = 8e1 + 3e2 ,
3 3
Solution. The A matrix associated with L with respect to the canonical basis is:
⎛ 8 −18 9 ⎞
A=⎜
⎜ 3 −7 3 ⎟
⎟.
⎝ 0 0 −1 ⎠
⎛ 8−x −18 9 ⎞
⎜
pA (x) = det ⎜ 3 −7 − x 3 ⎟ 2
⎟ = −(1 + x)(x − x − 2) .
⎝ 0 0 −1 − x ⎠
We now find the roots of the characteristic polynomial, i.e. the solutions of (1 +
x)(x − x − 2) = 0, that is (1 + x) (x − 2) = 0:
2 2
x1 = −1, x2 = 2.
Such roots are the eigenvalues of A. We notice that the eigenvalue −1 has algebraic
multiplicity 2 and the eigenvalue 2 has algebraic multiplicity 1.
By Observation 9.2.20, we know that the eigenspace V2 has dimension 1, while
by Proposition 9.2.19 we can only say that the dimension of V−1 is at most 2. If V−1
has dimension 2, then, by Proposition 9.2.21, we have that A is diagonalizable; if
instead V−1 has a dimension smaller than 2, then it is not possible to find a basis of
3
R consisting of eigenvectors and therefore A is not diagonalizable.
So we compute the eigenspace corresponding to the eigenvalue x = −1; it consists
of the solutions of the homogeneous linear system associated with the matrix A + I:
⎛ 8 + 1 −18 9 ⎞
⎜
⎜ 3 −7 + 1 3 ⎟
⎟.
⎝ 0 0 −1 + 1 ⎠
⎛ 1 −2 1 ⎞
⎜
⎜ 0 0 0 ⎟⎟,
⎝ 0 0 0 ⎠
⎛ 8 − 2 −18 9 ⎞
⎜
⎜ 3 −7 − 2 3 ⎟
⎟.
⎝ 0 0 −1 − 2 ⎠
Eigenvalues and Eigenvectors 171
⎛ 1 −3 1 ⎞
⎜ 0 0 3 ⎟
⎜ ⎟,
⎝ 0 0 0 ⎠
⎛ −1 0 0 ⎞
D=⎜
⎜ 0 −1 0 ⎟
⎟.
⎝ 0 0 2 ⎠
−1
A matrix P1 such that P1 AP1 = D is, for example, the basis change matrix IB1 ,C :
⎛ 2 −1 3 ⎞
⎜ 1 0 1 ⎟
P1 = ⎜ ⎟.
⎝ 0 1 0 ⎠
−
If we want another matrix P2 such that P2 1AP2 = D, we must choose another
3
ordered basis of R constisting of eigenvectors, making sure that the first two columns
consist always of the coordinates of eigenvectors with eigenvalue −1, while in third
columns we have the coordinates of an eigenvector with eigenvalue 2, for example B1 =
{(−3, 0, 3), (2, 1, 0), (−3, −1, 0}. In this case, the basis change matrix P2 = IB2 ,C is:
⎛ −3 2 −3 ⎞
⎜ 0 1 −1 ⎟
P2 = ⎜ ⎟.
⎝ 3 0 0 ⎠
⎛ −8 − x 18 2 ⎞
pA (x) = det ⎜ ⎟ 2
⎜ −3 7 − x 1 ⎟ = (1 − x)(x + x − 2) .
⎝ 0 0 1−x ⎠
We find the roots of the characteristic polynomial, i.e. the solutions of (1 − x)(x +
2
x − 2) = 0, that is (1 − x) (2 + x) = 0:
2
x1 = 1, x1 = −2
172 Introduction to Linear Algebra
and such roots are the eigenvalues of A. We note that the eigenvalue 1 has algebraic
multiplicity 2 and the eigenvalue −2 has algebraic multiplicity 1.
As in the previous exercise, we know that the V−2 eigenspace is 1, while to deter-
mine the dimension of V1 we need to explicitly compute this eigenspace.
V1 consists of the solutions of the homogeneous linear system associated with the
A − I matrix:
⎛ −8 − 1 18 2 ⎞
⎜
⎜ −3 7−1 1 ⎟ ⎟.
⎝ 0 0 1−1 ⎠
Reducing the matrix with the Gaussian algorithm we obtain
⎛ 1 −2 −1/9 ⎞
⎜ ⎟,
⎜ 0 −12 1/3 ⎟
⎝ 0 0 0 ⎠
⎛1 1 1 + k ⎞
A=⎜
⎜2 2 2 ⎟ ⎟
⎝0 0 −k ⎠
⎛1 − x 1 1+k ⎞
⎜
pA (x) = det ⎜ 2 2−x 2 ⎟
2
⎟ = (x − 3x)(−k − x).
⎝ 0 0 −k − x⎠
x1 = 0, x2 = 3, x3 = −k,
and such roots are the eigenvalues of A. If k ≠ 0 and k ≠ −3 we get that A has 3
distinct eigenvalues, so it is diagonalizable.
If k = 0 we get that
⎛1 1 1⎞
A=⎜ ⎜2 2 2⎟
⎟,
⎝0 0 0⎠
and as we have just seen, we know that A has eigenvalues 0 and −3 with algebraic
multiplicity 2 and 1, respectively. By Observation 9.2.20, we know that the eigenspace
V−3 has dimension 1, while to determine the dimension of V0 we need to explicitly
compute this eigenspace.
Eigenvalues and Eigenvectors 173
V0 consists of the solutions of the homogeneous linear system associated with the
matrix A − 0I = A, i.e. V0 = ker A. Reducing the matrix with the Gaussian algorithm
we obtain
⎛1 1 1⎞
⎜0 0 0⎟
⎜ ⎟,
⎝0 0 0⎠
so the system solutions depend on 3 − 1 = 2 parameters. So V0 has dimension 2 and
A is diagonalizable.
If k = −3 we get that
⎛1 1 −2⎞
A=⎜ ⎜2 2 2 ⎟ ⎟,
⎝0 0 3 ⎠
and as we have just seen, we know that A has eigenvalues 0 and 3 with algebraic mul-
tiplicity 1 and 2, respectively. As before, to determine if A is diagonalizable we have
to determine the dimension of the eigenspace relative to the eigenvalue of algebraic
multiplicity 2, that is V3 .
We must solve the homogeneous linear system associated with the matrix:
⎛−2 1 −2⎞
⎜ 2 −1 2 ⎟
A − 3I = ⎜ ⎟.
⎝0 0 0⎠
⎛2 −1 2⎞
⎜0 0 0⎟
⎜ ⎟,
⎝0 0 0⎠
9.4.1 Find eigenvalues and eigenvectors of the following matrices or linear transfor-
mations:
i) the matrix:
⎛ 2 1 0 ⎞
⎜
⎜ 0 1 −1 ⎟
⎟
⎝ 0 2 4 ⎠
2 2
ii) the linear transformation L ∶ R → R defined by:
3 3
iii) the linear transformation L ∶ R → R defined by:
L(x, y, z) = (x + y, x + z, y + z)
2 2
iv) the linear transformation L ∶ R → R defined by:
L(x, y) = (x − 3y, −2x + 6y)
2 2
v) the linear transformation L ∶ R → R defined by:
L(e1 ) = e1 − e2 , L(e2 ) = 2e1
9.4.2 a) Given the matrix:
⎛7 0 0⎞
A = ⎜0 7 −1⎟
⎜ ⎟
⎝0 14 −2⎠
compute its eigenvalues and eigenvectors.
′
Is A diagonalizable? If so, determine a diagonal matrix A similar to A.
b) Is it possible to find a matrix B such that AB = I (where I is the identity matrix)?
Clearly motivate the answer.
2 2
9.4.3 Determine a linear transformation T ∶ R ⟶ R that has e1 − e2 as an
eigenvector of eigenvalue 2.
9.4.4 Consider the matrix:
⎛3 2 −1⎞
⎜
A=⎜0 2 0⎟⎟.
⎝−1 −2 3 ⎠
AB associated with T with respect to the basis B (in domain and codomain).
1) e1 − e3 , T (e2 ) = ke2 + (k + 1)e3 , T (e3 ) = ke3 and let A be the matrix associated
with T with respect to the canonical basis.
a) Determine for which values of k we have that T is diagonalizable.
b) Determine for which values of k we have that 2e1 − 2e3 is an eigenvector of T .
b) For the values of k found in point a) determine, if possible, two distinct diagonal
−1
matrices D1 and D2 similar to A. Also determine a matrix P such that P AP = D1 .
(2x + 2y + z, 2x − y − 2z, kz), and let A be the matrix associated with F with respect
to the canonical basis.
a) Determine for which values of k we have that F is diagonalizable.
b) Choose any value of k for which F is diagonalizable and determine all the diagonal
matrices D which are similar to A.
Scalar Products
In the definition of vector space, we have the two operations of sum of vectors and
multiplication of a vector by a scalar (see Chapter 2). In this chapter, we want to
introduce a new operation: the scalar product of two vectors. The result of this
operation is a scalar, that is a real number. In addition to its vast importance in the
applications to physics, we will see how the scalar product is essential in linear algebra
for the solution of the problem of diagonalization of symmetric matrices, which we
will discuss later.
In other words, for any fixed vector u ∈ V the functions g(u, ⋅) ∶ V ⟶ R and
g(⋅, u) ∶ V ⟶ R are linear applications, hence the term bilinear.
g is called symmetric if g(u, v) = g(v, u) for every u, v ∈ V . A symmetric bilinear
form is called a scalar product on V and will be denoted with < , >.
We shall return to the definition of scalar product in Section 10.4, where we examine
it in more detail.
177
178 Introduction to Linear Algebra
In a completely similar way, we can also verify the property (2). It is therefore a
bilinear form. Note, however, that g(e1 , e2 ) = 2 while g(e2 , e1 ) = 0, so g is not a
scalar product.
g((x1 , . . . , xn ), (y1 , . . . , yn )) = x1 y1 + ⋅ ⋅ ⋅ + xn yn ,
We leave to the reader to verify the properties (1) and (2) of Definition 10.1.1. So,
we have a bilinear form. Since g(u, v) = g(v, u), g is a scalar product. This scalar
n
product on R is called Euclidean product or standard product. We shall denote this
n
product between two vectors u, v ∈ R as < u, v >e or also as u ⋅ v.
g(vi , vj ) = cij .
u = α1 v1 + ⋅ ⋅ ⋅ + αn vn , w = β1 v1 + ⋅ ⋅ ⋅ + βn vn . (10.1)
n
where we used the symbol ∑i,j=1 to indicate the sum for all possible i, j = 1, . . . n. In
full:
g(u, w) = α1 β1 c11 + α1 β2 c12 + ⋅ ⋅ ⋅ +
We must now verify that it is a bilinear application, that is, that it satisfies the
properties of Definition 10.1.1. We check the first of the conditions in (1) leaving the
others by exercise.
Consider the three vectors in V :
′ ′ ′
u = α1 v1 + ⋅ ⋅ ⋅ + αn vn , u = α1 v1 + ⋅ ⋅ ⋅ + αn vn , w = β1 v1 + ⋅ ⋅ ⋅ + βn vn .
i,j=1
n n
′
= ∑ αi βj cij + ∑ αi βj cij =
i,j=1 i,j=1
′
= g(u, w) + g(u , w).
g̃(u, w) = g̃(α1 v1 + ⋅ ⋅ ⋅ + αn vn , β1 v1 + ⋅ ⋅ ⋅ + βn vn ) =
Since g̃(vi , vj ) = g(vi , vj ) by the very definition of g̃ (see (10.2)), we get g̃(u, w) =
g(u, w)
of matrices (rows by columns) is the product of the transposed matrices, with the
factors order reversed.
In formulas, if A ∈ Mm,r (R), B ∈ Mr,n (R), then:
T T
(AB) = B A . (10.3)
Let us now continue our discussion on the one-to-one correspondence between bilinear
forms and matrices, once fixed a basis of the given vector space.
⎛ g(v1 , v1 ) . . . g(v1 , vn ) ⎞
C=⎜
⎜ ⋮ ⋮ ⎟
⎟.
⎝g(vn , v1 ) . . . g(vn , vn )⎠
is associated with the matrix C ∈ Mn (R) where (u)B denotes the coordinate
column of the vector u relative to the basis B.
Scalar products, i.e. symmetric bilinear forms, correspond to the symmetric matrices
in Mn (R).
Proof. The first point of this correspondence is a direct consequence of the previous
proposition: to each bilinear application we can associate n scalars g(vi , vj ).
2
Now let us see the second point, that is how to associate a bilinear application directly
to a matrix.
We define
⎛ c11 . . . c1n ⎞ ⎛ y1 ⎞
g(u, v) = (u)B C (v)B = (x1 . . . xn ) ⎜ ⋮ ⎟⎟⎜⎜⋮⎟
T
⎜ ⋮ ⎟,
⎝cn1 . . . cnn yn ⎠
⎠ ⎝
where
⎛ x1 ⎞ ⎛ y1 ⎞
(u)B = ⎜
⎜⋮⎟ ⎟, (v)B = ⎜
⎜⋮⎟ ⎟
⎝xn ⎠ ⎝yn ⎠
are the coordinates of u and v with respect to the basis B. It is immediate to verify
that
g(vi , vj ) = cij ,
Scalar Products 181
′
= g(u, v) + g(u , v).
We note now that g is symmetric if and only if the corresponding matrix C = (cij )
T
is symmetric, i.e. C = C .
If g is symmetric then cij = g(vi , vj ) = g(vj , vi ) = cji for every i, j = 1, . . . , n , so C
is symmetric.
T
Conversely, suppose that C is symmetric, that is, C = C . We observe that for every
T
u, v ∈ V we have that g(u, v) = g(u, v), since g(u, v) ∈ R.
For each u, v ∈ V by (10.3), we have therefore that:
T T T
g(u, v) = (u)B C (v)B = ((u)B C (v)B )
T T T
= (v)B C (u)B = (v)B C (u)B = g(v, u).
This shows that g is symmetric.
3 1
C=( ),
1 2
2
is associated, with respect to the canonical basis of R , to the scalar product <
(x1 , x2 ), (y1 , y2 ) >= 3x1 y1 + x1 y2 + x2 y1 2 + x2 y2 . Indeed:
3 1 y1
(x1 x2 ) ( ) ( ) = 3x1 y1 + x1 y2 + x2 y1 + 2x2 y2 .
1 2 y2
similar: the matrix associated to the given bilinear form can take very different forms
and yet the bilinear form does not change.
Let IB,C be the basis change matrix; it is the matrix associated with the identity
map, where we have fixed an ordered basis B in the domain and the canonical basis
n
C in the codomain. We then have that, for each vector u ∈ R :
where (u)B denotes the column of coordinates of the vector u with respect to the
ordered basis B.
Assume that C is the matrix associated with a given bilinear form, with respect to
n ′
the canonical basis of R . We want to determine the matrix C associated with the
same bilinear form with respect to the basis B. Let us replace (u)C , (v)C using (10.3):
T T T T
< u, v >= (u)C C(v)C = (IBC (u)B ) C(IBC (v)B ) = (u)B IBC CIBC (v)B .
Example 10.3.1 Consider the scalar product < , > associated to the matrix
2 1
C=( )
1 2
with respect to the the canonical basis. In particular, notice that scalar products are
bilinear forms.
Suppose we choose B = {v1 = e1 , v2 = e1 + e2 } as a basis, therefore:
1 1
IBC = ( ).
0 1
′
The matrix C associated with the same scalar product with respect to the basis B
is:
1 0 2 1 1 1 2 3
C =( )( )( )=( ).
′
1 1 1 2 0 1 3 3
Scalar Products 183
Consider the scalar product of v1 and v2 , i.e. < e1 , e1 +e2 >. Using first the canonical
basis and then the basis B, we verify that the result is the same:
2 1 1
< e1 , e1 + e2 >= (1 0) ( )( ) = 3
1 2 1
2 3 0
< e1 , e1 + e2 >= (1 0) ( ) ( ) = 3.
3 3 1
Observation 10.3.2 It is useful to compare the basis change formula for a linear
application and that of the basis change for a bilinear form.
• Two matrices A and B represent the same linear application (with respect to
different bases) if and only if they are similar, that is, there is an invertible
−1
matrix P such that B = P AP .
• Two matrices A and B represent the same bilinear form (with respect to dif-
T
ferent bases) if and only if an invertible P matrix exists such that B = P AP .
T −1
It is clear by looking at these two formulas that matrices with the property P = P ,
i.e. such that their transpose coincides with their inverse, are of particular importance.
We will do a more detailed study of these matrices and their properties in later
sections.
Example 10.4.2 We can immediately verify that the standard scalar product or
n
Euclidean scalar product in R , defined in the Example 10.1.3 given by:
Definition 10.4.3 Let V be a vector space with a scalar product. We say that
u, v ∈ V are perpendicular (orthogonal) to each other if < u, v >= 0. We will also
use the u ⊥ v notation to indicate two vectors perpendicular to each other.
So we can reformulate the notions defined above in the following way:
A scalar product is non-degenerate if and only if there is no nonzero vector perpendic-
ular to all the others. Furthermore, a positive definite scalar product is automatically
not degenerate: in fact, being positive definite implies that there is no vector orthog-
onal to itself, while being degenerate requires that such a vector exists.
Observation 10.4.4 1. The notion of orthogonality depends on the scalar product
2
chosen. For example, in R we consider the two scalar products:
We see that the vectors e1 and e2 are perpendicular with respect to the Euclidean
product <, >e , but not with respect to the other scalar product, in fact:
< (a, b), (1, 0) >m = a = 0 < (a, b), (0, 1) >m = −b = 0.
Proposition 10.5.1 Let W ⊆ V be a vector subspace of V and < , > a scalar product
on V . Then the set:
is a vector subspace of V .
Proof. Let us check the three properties of Definition 10.4.1. We immediately see
⊥
that 0V ∈ W . Indeed, by property (2) of Definition 10.4.1:
Definition 10.5.2 Given a vector subspace W of V and a scalar product < , > on
V , the orthogonal subspace to W is:
⊥
In fact, if u ∈ W certainly < u, wi >= 0, but the converse is also true. In fact, let
u ∈ V , such that < u, wi >= 0. If w = λ1 w1 + ⋅ ⋅ ⋅ + λn wn ∈ W , then
< u, w > =< u, λ1 w1 + ⋅ ⋅ ⋅ + λn wn >=
Thanks to Observation 10.5.3, we are able to determine the dimension of the subspace
n
orthogonal to W ⊆ R with respect to the Euclidean scalar product.
n ⊥
Proposition 10.5.5 Let W be a vector subspace of R and let W be the subspace
orthogonal to W with respect to the Euclidean scalar product. Then:
dim(W ) + dim(W ) = n.
⊥
Proof. Let B = {v1 , . . . , vm } be an ordered basis of W and let A = (aij ) ∈ Mm,n (R)
be the matrix having the coordinates of the vectors v1 , . . . , vm as rows. By Observa-
tion 10.5.3, we have that (x1 , . . . , xn ) ∈ W if and only if:
⊥
We know that the dimension of the image of LA is the rank of A, which is the
dimension of the subspace generated by its columns or equivalently by its rows. Since
the rows of A are given by the coordinates of the vectors of a basis of W , this
dimension is just dim(W ). Therefore:
dim(W ) = n − dim(W ).
⊥
Scalar Products 187
Definition 10.6.1 Let V be a vector space with a scalar product < , >, which is
positive definite and B = {u1 , . . . , un } a basis of V . We say that B is an orthogonal
basis if ui ⊥ uj for every i, j = 1, . . . , n and that B is an orthonormal basis if:
1 for i = j
< ui , uj >= { ,
0 for i ≠ j
that is, if it is an orthogonal basis and each vector of the basis has norm equal to 1.
where the function δij is called Kronecker delta and is defined by:
1 for i = j
∆ij = { .
0 for i ≠ j
Definition 10.6.2 Let V be a vector space with a positive definite scalar product,
and let u, v. The orthogonal projection of the vector v on the vector u is given by
the vector:
< v, u >
proju (v) = < u, u > u.
v
BB
BB
u2 u1 = u
BB
BM
B 1
B B
B 1B
B
B proju v
B
B
B
188 Introduction to Linear Algebra
From the figure, where we chose the Euclidean product, we can see and verify with
an easy calculation in the general setting, that if {u, v} is a basis of R , {u1 = u, u2 =
2
v − proju1 (v)} is an orthogonal basis of R . This process can be iterated and allows
2
⋮ ⋮
1
uk = vk − proju1 (vk ) ⋅ ⋅ ⋅ − projuk−1 (vk ), fk = uk ,
∥uk ∥
√
where ∥ui ∥ = < ui , ui >.
With the procedure described above, called Gram-Schmidt algorithm, we immediately
obtained a set of mutually orthogonal vectors.
Proof. The fact that the vectors f1 , . . . , fn are mutually orthogonal and their norm is
equal to 1 is easy to verify. Since the dimension of V is n, in order to show f1 , . . . , fn
is a basis, it is enough to verify linear independence. If
λ1 f1 + ⋅ ⋅ ⋅ + λn fn = 0,
then:
< fi , λ1 f1 + ⋅ ⋅ ⋅ + λn fn >= λi = 0 for all i = 1, . . . , n.
So f1 , . . . , fn are linearly independent and form an orthonormal basis of V .
= α1 β1 + ⋅ ⋅ ⋅ + αn βn ,
where we used the bilinearity properties of the scalar product and the fact that
< vi , vj >= δij . Hence, if we choose an orthonormal basis B, the matrix associated
to the given positive definite scalar product is the identity, just as it happens for
n
the standard scalar product in R and the canonical basis, and the scalar product of
two vectors v and w coincides with the Euclidian product of their coordinates with
respect to the basis B.
1 −2
C=( ).
−2 0
′
It is a scalar product since the matrix is symmetric. The matrix C associated with
the same scalar product with respect to the basis B is:
T
1 2 1 −2 1 2 −3 0
C =( ) ( )( )=( ).
′
1 −1 −2 0 1 −1 0 12
3
10.7.2 Consider the vector subspace W of R generated by the vectors w1 = e1 +
⊥
e2 − 3e3 and w2 = −e1 + 2e2 − 3e3 . Determine a basis for W , computed with respect
3
to the Euclidean scalar product in R . Also determine an orthonormal basis for W
⊥
and an orthonormal basis for W .
consists of the vectors (x, y, z) ∈ R , such that:
⊥ 3
Solution. W
x + y − 3z = 0
{
−x + 2y − 3z = 0.
190 Introduction to Linear Algebra
We immediately get that W = {(z, 2z, z)∣z ∈ R}, therefore a basis for W is given
⊥ ⊥
√ √ √
u2 = w2 − <w2 ,u1 >
u ,
<u1 ,u1 > 1
f2 = 1
u
∥u2 ∥ 2
= (−7/ 66, 2 2/33, −1/ 66).
An orthonormal basis for W is obtained by taking a generator, for example (1, 2, 1),
⊥
Solution. We observe that we can write the equations that define W in the following
way:
(1, 1, 1, −1) ⋅ (x, y, z, t) = 0, (1, 2, −1, 1) ⋅ (x, y, z, t) = 0.
Therefore the vectors (1, 1, 1, −1) and (1, 2, −1, 1) belong to W as they are perpen-
⊥
3x2 y2 .
a) Write the matrix associated with it with respect to the canonical basis.
b) Write the matrix associated with it with respect to the basis B = {v1 = e1 +
e2 , v2 = −2e2 }.
4
10.8.2 Let W be the vector subspace of R defined from the equation x+y+2z−t = 0
4
and consider the Euclidian scalar product of R .
⊥
a) Determine a basis for W .
4
10.8.3 Let W be the vector subspace of R generated by the vectors e1 + e4 , e2 −
2e3 + e4 .
⊥
a) Determine a basis for W .
b) Determine an orthogonal basis of W .
4
10.8.4 Let W be the vector subspace of R defined by the following equations:
⎧
⎪ x+y+z =0
⎪
⎪
⎪
⎨ −x + y + z + w = 0
⎪
⎪
⎪
⎪ 1
⎩y + z + 2 w = 0.
4
Calculate an orthonormal basis for W relative to the Euclidean scalar product of R .
10.8.5 Let W be the vector subspace of R generated by the vectors (1, 2, −1),
3
d) Determine the matrix associated with the scalar product of point (c) with respect
to the ordered basis
B = {(1, 1, 1, −1), (0, 1, 0, 1), (0, 0, 1, 0), (1, 0, 1, 2)}.
192 Introduction to Linear Algebra
Spectral Theorem
The Spectral Theorem represents one of the most important results of elementary
linear algebra. In Chapter 9, we examined the problem of calculating eigenvalues and
eigenvectors and the question of diagonalizability for a square matrix A of order n
with real entries. We have seen that it is not always possible to find a diagonal matrix
similar to the given matrix, because sometimes we do not have a basis of eigenvectors
n
of A for the space R . However, if the matrix A is symmetric, i.e. it coincides with its
transpose, the Spectral Theorem guarantees that it is diagonalizable. Furthermore,
not only is A similar to a diagonal matrix, but, if we denote with < , > the scalar
product associated with it, there is a basis consisting of eigenvectors of A such that
the matrix associated with < , > with respect to this basis is diagonal.
In order to prove all these results, it is necessary to introduce the concepts of orthog-
onal linear transformation and symmetric linear transformation.
In other words, the linear transformation U preserves the scalar product given in V .
Proposition 11.1.2 Let V be a real vector space of dimension n with a positive
definite scalar product <, >. Let U ∶ V ⟶ V be a linear map and let u, v ∈ V . The
following statements are equivalent.
1. U is orthogonal, i.e. < U (u), U (v) >=< u, v > for each u, v ∈ V .
2. U preserves the norm, i.e.: < U (u), U (u) >=< u, u > for each u ∈ V .
3. If B = {v1 , . . . , vn } is an orthonormal basis of V (with respect to the scalar
product <, >) then {U (v1 ), . . . , U (vn )} is also an orthonormal basis of V .
193
194 Introduction to Linear Algebra
Thus:
< U (u), U (u) > −2 < U (u), U (v) > + < U (v), U (v) >=
2 2
= α1 < v1 , v1 > +α1 α2 < v1 , v2 > + ⋅ ⋅ ⋅ + αn < vn , vn >
using the bilinearity of the scalar product and the linearity of U . We recall that by
hypothesis also {U (v1 ), . . . U (vn )} is an orthonormal basis, i.e. < U (vi ), U (vj ) >=
δij , we get:
2 2
< U (u), U (u) >= α1 + ⋅ ⋅ ⋅ + αn
So, by (11.1), we have < U (u), U (u) >=< u, u >.
This concludes the proof.
n
Let us consider the vector space R with the standard scalar product < , >e :
⎛ β1 ⎞
⎜⋮⎟
< u, v >e = (α1 . . . αn ) ⎜ ⎟ = α1 β1 + ⋅ ⋅ ⋅ + αn βn (11.2)
⎝βn ⎠
where we denote with Au the product rows by columns of the matrix A by the column
vector u.
linear transformation associated with it, if we fix the canonical bases in domain and
codomain; if u ∈ R , LA (u) = Au.
n
Proof. The equivalence between (a) and (b) is immediate by Proposition 11.1.2.
Before proceeding with the proof, we recall that the standard scalar product of two
column vectors u = (α1 . . . αn ) , v = (β1 . . . βn ) , can be expressed as the product
T T
rows by columns:
⎛ β1 ⎞
u v = (α1 . . . αn ) ⎜⎜⋮⎟
T
⎟
⎝βn ⎠
So we have that:
T T T
< Au, Av >e = (Au) (Av) = u (A A)v (11.3)
196 Introduction to Linear Algebra
T T T T
< Au, Av >e = (Au) (Av) = u (A A)v = u v =< u, v >e ,
as we wanted to prove.
(c) ⟺ (d). If v1 , . . . , vn are the column vectors of A the equation A A = I is
T
equivalent to:
vi ⋅ vj = 0, i ≠ j, vi ⋅ vi = 1.
that is, it is equivalent to the fact that these columns are orthonormal vectors.
T
For the row vectors we argue similarly. The equation A A = I is equivalent to the
fact that the rows of A are orthonormal vectors. We conclude by remembering that
n
A is an invertible matrix if and only if its rows (columns) form a basis of R .
T
Remark 11.2.4 We observe that in the previous proposition, condition (c), AA =
T
I = A A, is equivalent to only one of the two equalities. In fact, in general, given
X, Y ∈ Mn (R), if XY = I then, by Binet theorem, det(XY ) = det(X) det(Y ) = 1.
Hence both matrices are invertible and they are one the inverse of the other (see
Observation 7.4.3 of Chapter 7).
Similarly in part (d), the two conditions “ the columns of A form a basis” and “ the
rows of A form a basis” are equivalent. This is a consequence of the general fact that
the row rank of a matrix is equal to its column rank.
Because of the previous remark and the above discussion we have the following propo-
sition.
Spectral Theorem 197
Proposition 11.2.5 Let V be a real vector space of finite dimension with a positive
definite scalar product. Let U ∶ V ⟶ V be a linear map and let A be the matrix
associated with it, with respect to an orthonormal B basis. Then U is orthogonal if
and only if the matrix A is orthogonal.
Let us see some examples of orthogonal matrices. We leave to the reader the easy
−1 T
verification that A = A .
2
Example 11.2.7 1. Rotations in R :
cos(t) sin(t)
A=( )
−sin(t) cos(t)
As the reader can easily verify, this matrix is associated with a rotation of the plane
by an angle t, centered on the origin of the Cartesian axes.
2
2. Reflection with respect to the line x = y in R :
0 1
A=( )
1 0
As the reader can easily verify, this matrix it is associated with a reflection, i.e. the
points that belong to the line x = y are fixed, and every other point of the plane is
sent to the his symmetric with respect to that line.
3
3. Rotations in R . Let us consider a linear transformation that rotates any vector
applied in the origin around a given axis passing through the origin, of a certain fixed
angle t. Such a linear transformation preserves the Euclidean norm and therefore
is an orthogonal transformation (with respect to the Euclidean scalar product). A
3
famous theorem by Euler states that every orthogonal transformation of R whose
matrix has determinant equal to 1 is of this type.
198 Introduction to Linear Algebra
Definition 11.3.1 Let V be a real vector space with a positive definite scalar product
< , >. We say that the linear transformation T ∶ V ⟶ V is symmetric if
0⎞ ⎛ ⎞
0
⎛1 0 . . . ⎜ ⋮⎟
⎜0 1 . . . 0⎟ ⎜
⎜ ⎟
⎟
< T (ei ), ej >e = (Aei ) ej = (a1i . . .
T
ani ) ⎜
⎜
⎜
⎜
⎟
⎟
⎟
⎟
⎜
⎜
⎜
⎜1 ⎟
⎟
⎟
⎟ = aji
⎜⋮ ⋮⎟ ⎜
⎜ ⎟
⎟
⎝0 ⎜⋮⎟
... 1⎠ ⎝ ⎠
0
⎛1 0 . . . 0⎞ ⎛ a1j ⎞
⎜0 1 . . . 0⎟
⎟⎜ a2j ⎟
< ei , T (ej ) >e = = (0 . . . 1 . . . 0) ⎜
⎜ ⎟ ⎜
⎜ ⎟
⎟
T
ei Aej ⎜
⎜ ⋮⎟
⎟ ⎜ ⎟ = aij
⎜⋮ ⎟⎜ ⋮ ⎟
⎜ ⎟
⎝0 ... 1⎠ ⎝anj ⎠
and therefore by the condition (11.4), we have aij = aji , that is, the matrix A is
symmetric.
The same happens for the general case: every symmetric linear map is associated to
a symmetric matrix, if we fix an orthonormal basis in domain and codomain.
Spectral Theorem 199
Proposition 11.3.2 Let V be a real vector space, with a positive definite scalar
product < , > and let B = {v1 , . . . , vn } be an orthonormal basis. Let T ∶ V ⟶ V be
a linear map and let A = (aij ) be the matrix associated with T with respect to the
basis B. Then T is symmetric if and only if the matrix A is symmetric.
Proof. We first prove that the matrix associated to the symmetric linear map T with
respect to the basis B is symmetric. We have:
from which:
< T (vj ), vi > = a1j < v1 , vi > + ⋅ ⋅ ⋅ + aij < vi , vi > + . . .
proof.
Remark 11.3.3 Note that in the preliminary observations to the previous proposi-
tion we have proved that if A is a n × n matrix with real entries then:
T
< Au, v >e =< u, A v >e
n
where < , >e is the Euclidean scalar product and u, v ∈ R are column vectors.
We observe that once proven that a matrix A with real entries admits an eigenvalue
n
λ ∈ R, we immediately have that the eigenspace Vλ = ker(A − λI) ⊆ R contains
a nonzero vector and therefore there exists also a real eigenvector relative to the
eigenvalue λ.
Let us now establish another result that will be fundamental in the proof of the
spectral theorem.
0 = λ < u, w >e =< λu, w >e =< Au, w >e =< u, Aw >e ,
Let us summarize with a corollary what we proved for symmetric matrices in terms
of symmetric linear transformations.
Corollary 11.4.4 Let V be a real vector space of finite dimension with a positive
definite scalar product, and let T ∶ V ⟶ V be a symmetric linear map. Then:
Proof. Let B be an orthonormal basis for the positive definite scalar product. Such
a basis exists thanks to the Gram-Schmidt algorithm. By Proposition 11.3.2, the
matrix A associated to T with respect to the basis B is symmetric. Therefore, the
statements of the corollary immediately follow from Lemmas 11.4.1, 11.4.2 and from
Corollary 11.4.3.
Spectral Theorem 201
We can finally state the spectral theorem for real symmetric matrices and symmetric
linear maps at the same time.
Theorem 11.4.5 Let V be a real vector space of dimension n with a positive definite
scalar product. Let T ∶ V ⟶ V be a symmetric linear transformation and let A ∈
Mn (R) be the symmetric matrix associated with T with respect to an orthonormal
basis B.
Then:
• T is diagonalizable, and there exists an orthonormal basis N of eigenvectors of
T.
• A is diagonalizable by an orthogonal matrix, that is, there exists an orthogonal
−1
matrix P , such that D = P AP is diagonal.
Before starting with the proof, we observe that the two statements of the theorem are
completely equivalent. The orthonormal basis N is a basis of mutually perpendicular
eigenvectors of T that have norm 1. The existence of this basis of eigenvectors is
−1
equivalent to the existence of an orthogonal matrix P , such that P AP is diagonal.
This matrix has as columns the coordinates of the eigenvectors with respect to the
basis B.
Proof. Let λ1 be a real eigenvalue of T , and let u1 ∈ V be an eigenvector of norm 1 of
eigenvalue λ1 . We know that such λ1 and u1 exist by Lemma 11.4.1. Let W1 = ⟨u1 ⟩ .
⊥
Then, we have dim(W1 ) = n − 1. Let us now consider the linear map T1 = T ∣W1 ,
that is, let us look at the restriction of T1 to the subspace W1 , then T1 ∶ W1 ⟶ V .
For Lemma 11.4.2, since u is also perpendicular to T1 (w) for each w ∈ W1 , we have
Im(T1 ) ⊆ W1 = ⟨u1 ⟩ , therefore we can write T1 ∶ W1 ⟶ W1 . Let us now repeat all
⊥
The matrix Q, which has as columns the eigenvectors of the basis {v1 , v2 , v3 }, diag-
onalizes the matrix A, however it is not orthogonal:
⎛7 0 0 ⎞ ⎛1 −1/2 −1 ⎞
D=⎜
⎜0 7 0 ⎟ Q=⎜ −1/2⎟
−1
⎟ = Q AQ, ⎜0 1 ⎟.
⎝0 0 −2⎠ ⎝1 0 1 ⎠
T (e2 ) = 2e1 +e2 . Check that this is a symmetric linear transformation and determine
an orthonormal basis B with respect to which T is associated with a diagonal matrix.
Solution. The matrix associated with T with respect to the canonical basis (in domain
and codomain) is:
1 2
A=( ).
2 1
Since A is a symmetric matrix, T is a symmetric linear transformation.
The eigenvalues of A and T are: −1, 3. We compute the eigenspaces:
(see Definition 10.6.2). Verify that this is a symmetric linear transformation and
determine an orthonormal basis B with respect to which proju is associated to a
diagonal matrix D. Then write D explicitly.
Spectral Theorem 203
Solution. We determine the A matrix associated with proju with respect to the canon-
ical basis (in domain and codomain). We have:
proju (e1 ) = <e 1 ,u>e
<u,u>
(e1 − 2e3 ) = 15 (e1 − 2e3 ) = 15 e1 − 52 e3
e
proju (e2 ) = <e2 ,u>e
<u,u>e
(e1 − 2e3 ) = 0(e1 − 2e3 ) = 0
proju (e3 ) = <e3 ,u>e
<u,u>
(e1 − 2e3 ) = − 25 (e1 − 2e3 ) = − 52 e1 + 45 e3 .
e
Therefore:
1
⎛ 5 0 − 25 ⎞
A=⎜
⎜ 0 0 0 ⎟ ⎟.
⎝− 2 0 54 ⎠
5
Since A is a symmetric matrix, proju is a symmetric linear transformation.
We want to determine an orthonormal basis B with respect to which proju is as-
sociated with a diagonal matrix. We can proceed as in the previous exercise or
observe that u is an eigenvector of proju relative to the eigenvalue 1, in fact:
proju (u) = <u,u>
<u,u>e
u = u. Moreover, if v is any nonzero vector orthogonal to u,
e
i.e. < v, u >e = 0, we have that proju (v) = 0u, so v is an eigenvector of proju relative
to the eigenvalue 0.
Let us now consider an orthonormal basis B obtained by applying the Gram-Schmidt
algorithm to basis {u, e1 , e2 }.
Using the notation of Theorem 10.6.3, we have:
1 1 2
u1 = u, f1 = u = √ e1 − √ e3
∥u∥ 5 5
4 2 1 2 1
u2 = e1 − proju (e1 ) = e1 + e3 , f2 = u2 = √ e1 + √ e3
5 5 ∥u2 ∥ 5 5
1
u3 = e2 − proju (e2 ) − proju2 (e2 ) = e2 , f3 = u3 = e2 .
∥u3 ∥
The vector f1 is a multiple of u so it is an eigenvector of proju of eigenvalue 1, the
vectors f2 , f3 are perpendicular to u by construction, so they are eigenvectors of proju
of eigenvalue 0. A basis with the required properties is: B = {f1 , f2 , f3 } and the matrix
D is:
⎛1 0 0⎞
D=⎜ ⎜0 0 0⎟ ⎟.
⎝0 0 0⎠
⎛3 1 1⎞ 2 −1 ⎛3 2 0⎞ ⎛2 0 0⎞
⎜1 3 1⎟
⎜ ⎟, ( ), ⎜2 6 0⎟
⎜ ⎟, ⎜
⎜0 1 −2⎟
⎟.
⎝1 1 3⎠ −1 1 ⎝0 0 3⎠ ⎝0 −2 4 ⎠
2 2
11.6.5 Determine the linear transformation proju ∶ R → R associating to each
vector its projection on the vector u = e1 −e2 . Write explicitly the matrix T associated
to it with respect to the canonical basis. Say if it is a symmetric and/or orthogonal
linear transformation. Determine (if possible) a diagonal matrix similar to T .
2 2
11.6.6 Let proju ∶ R → R be the linear transformation associating to any vector
its projection on the vector u = 3e1 − 4e2 . Let A be the matrix associated with it
with respect to the canonical basis. Determine if there is an orthogonal matrix P ,
0 0
such that P AP = ( ).
−1
0 1
The difference between a hermitian product and a scalar product is that for a her-
mitian product we require the linearity of the function < u, ⋅ >∶ V ⟶ C, but
the antilinearity of the function < ⋅, u >∶ V ⟶ C, i.e. for u ∈ V fixed we have
< u, µv >= µ < u, v >.
We are particularly interested in the following example.
n
Example 11.7.2 In the complex vector space C we define:
We leave to the reader to verify this is an hermitian product. This product is called
n
the standard hermitian product in C . It is immediate to verify that it is positive
definite, indeed:
2 2
< (x1 , . . . , xn ), (x1 , . . . , xn ) >h = x1 x1 + ⋅ ⋅ ⋅ + xn xn = ∣x1 ∣ + ⋅ ⋅ ⋅ + ∣xn ∣ ≥ 0,
having 1 in the i-th position and 0 elsewhere. We observe that, as we did for scalar
n
products, to each hermitian product < , > in C we can associate a C matrix, with
cij =< ei , ej >, such that
′
⎛ x1 ⎞
< (x1 , . . . , xn ), (x1 , . . . , xn ) >= (x1 , . . . , xn )C ⎜
⎜ ⋮, ⎟
′ ′
⎟.
⎝xn ′ ⎠
In the case of the standard hermitian product, the matrix associated with it is the
identity matrix I since < ei , ej >h = δij. It is not difficult to prove, in complete analogy with the case of scalar products, that a matrix C is associated with a hermitian product if and only if C = \overline{C}^T, that is, it coincides with its complex conjugate transpose.
In other words, the hermitian product of vectors in Rⁿ coincides with the usual Euclidean product in Rⁿ.
If A is a matrix with real entries, we have that:
\[
\langle Au, v \rangle_h = \langle u, A^T v \rangle_h \quad \text{for each } u, v \in C^n. \qquad (11.7)
\]
In fact, (11.7) is true for u = ei, v = ej (where the ei are the vectors of the canonical basis); therefore, by the linearity and antilinearity of the hermitian product, it is not difficult to verify that it is true also for generic vectors u, v ∈ Cⁿ.
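As an illustration (our addition, not from the text), relation (11.7) is easy to check numerically for a random real matrix and random complex vectors; the sketch below assumes NumPy and uses names of our choosing.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))                       # real matrix
    u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    def herm(x, y):
        # standard hermitian product <x, y>_h = conj(x)^T y
        return np.vdot(x, y)

    print(np.isclose(herm(A @ u, v), herm(u, A.T @ v)))   # expected: True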
Proof. A is a matrix with real entries, however, as real numbers are contained in
the complex field we also have that A ∈ Mn (C). By the Fundamental Theorem of
algebra (see Appendix A), the characteristic polynomial of A, det(A − λI), is equal
to zero for at least one complex number, λ0 ∈ C. We want to prove that λ0 is real,
that is λ0 = λ̄0. Let u ∈ Cⁿ be an eigenvector of eigenvalue λ0, and let < , >_h be the standard hermitian product in Cⁿ, i.e.:
\[
\langle u, v \rangle_h = \overline{u}^T v, \quad \text{for all } u, v \in C^n.
\]
We have:
\[
\overline{\lambda_0}\,\langle u, u \rangle_h = \langle \lambda_0 u, u \rangle_h = \langle Au, u \rangle_h = \overline{u}^T\,\overline{A}^T u = \overline{u}^T A\,u = \langle u, Au \rangle_h,
\]
since A^T = A and, because A is a matrix with real entries, Ā = A. Therefore,
\[
\overline{\lambda_0}\,\langle u, u \rangle_h = \langle u, Au \rangle_h = \langle u, \lambda_0 u \rangle_h = \lambda_0\,\langle u, u \rangle_h.
\]
Hence:
\[
(\lambda_0 - \overline{\lambda_0})\,\langle u, u \rangle_h = 0.
\]
From the fact that u ≠ 0, since it is an eigenvector, it follows that < u, u >_h ≠ 0, and then λ0 = λ̄0.
In the rest of this appendix, we want to revisit the results we have stated in this chapter for a real vector space with a positive definite scalar product, in the case of a complex vector space with a positive definite hermitian product. As we will see, all the main theorems, including the Spectral Theorem, have statements and proofs similar to those already seen, which we therefore leave as an exercise.
Reading this part is not necessary for understanding the real case; we include it for completeness.
In analogy with the real case, we can give the following definitions. In the complex
case, the notions unitary and hermitian linear maps or matrices replace the corre-
sponding notions of real symmetric and orthogonal ones, respectively.
Definition 11.7.5 Let V be a complex vector space with a positive definite hermitian
product < , >h . We say that a linear transformation T ∶ V ⟶ V is unitary if
\[
\langle T(u), T(v) \rangle_h = \langle u, v \rangle_h \quad \text{for all } u, v \in V,
\]
and that it is hermitian if
\[
\langle T(u), v \rangle_h = \langle u, T(v) \rangle_h \quad \text{for all } u, v \in V.
\]
We say that a complex coefficient matrix A is unitary if A⁻¹ = A*, where A* = \overline{A}^T; instead we say it is hermitian if A = A*, that is, A coincides with its transpose complex conjugate. We note that these two operations, that is, transposition of A and conjugation of each entry of A, can be interchanged; that is, the result is independent of which one we choose to do first.
It is easy to verify that the hermitian condition on A corresponds to the fact that with respect to the hermitian scalar product in Cⁿ we have:
\[
\langle Au, v \rangle_h = \langle u, Av \rangle_h \quad \text{for all } u, v \in C^n.
\]
Also note that if A is a real symmetric matrix, then it is also a hermitian matrix. In fact, it satisfies the condition A = A*, as the complex conjugate of a real number is the number itself.
We can state the analogue of Proposition 11.7.6, whose proof is the same as in the
real case.
Given a complex vector space V with an hermitian scalar product < , >h , we say that
u, v ∈ V are perpendicular (orthogonal) if < u, v >h = 0.
If V is finite dimensional and the hermitian scalar product is positive definite, with
the same calculations as in Section 10.6 it can be proved that there exists a basis B
of V consisting of vectors of norm 1 such that any two of them are orthogonal, where
the norm of a vector u is defined as ∥u∥ = √(< u, u >_h).
Proposition 11.7.6 Let V be a complex vector space of finite dimension, with a pos-
itive definite hermitian product < , >h . Let T ∶ V ⟶ V be a linear transformation,
and let B be an orthonormal basis. Then:
1. T is hermitian if and only if its associated matrix with respect to the basis B is
an hermitian matrix.
2. T is unitary if and only if its associated matrix with respect to the basis B is a
unitary matrix.
Similarly to the real case, we can also state and prove the following results.
Finally, we can state the Spectral Theorem for hermitian linear maps and equivalently
for hermitian matrices. The proof is the same as the one for the real case.
Applications of Spectral Theorem and Quadratic Forms
In this chapter, we want to study some consequences of the Spectral Theorem for
scalar products and quadratic forms associated with them.
where IB,A is the basis change matrix between the bases B and A.
Now suppose we have a vector space V with a positive definite scalar product <, >V .
As we saw in Observation 10.6.4, if we fix an orthonormal basis, <, >V is associated
with the identity matrix. Therefore, if we write the vectors of V using the coordinates
n
with respect to the chosen orthonormal basis, we can identify V with R and < , >V
with the standard scalar product.
Now, consider an arbitrary scalar product < , > in V (not necessarily positive definite or non-degenerate). We will see shortly that, thanks to the Spectral Theorem, it is possible to choose a basis N, orthonormal with respect to < , >_V, such that the matrix associated with the scalar product < , > with respect to the basis N is diagonal. Hence, N will be an orthogonal basis (not necessarily orthonormal) also with respect to < , >. This will allow us to determine immediately some fundamental properties of the scalar product < , >. For example, we can determine if the product is non-degenerate or positive definite, simply by looking at the signs of the elements on the diagonal (which are, in fact, the eigenvalues) of the matrix associated to < , > with respect to N.
We start with an equivalent statement of the Spectral Theorem.
Theorem 12.1.1 Let V be a vector space of dimension n with a positive definite
scalar product < , >V . Let < , > be another scalar product in V . Then, there exists an
orthonormal basis N for < , >V , which is also orthogonal for < , >.
Proof. By the Gram-Schmidt Theorem 10.6.3, there exists a basis A, which is or-
thonormal for < , >V . Let C be the matrix associated with < , > with respect to the
basis A and let T ∶ V ⟶ V be the linear application associated with C with re-
spect to the basis A in both the domain and codomain. By the Spectral Theorem,
there exists a basis N , orthonormal for the positive definite scalar product < , >V ,
consisting of eigenvectors of T . If P = IN ,A is the matrix of basis change between N
and A, we have, again by the Spectral Theorem, that P is orthogonal, i.e. P⁻¹ = P^T. Therefore, since N is a basis of eigenvectors, we can write:
\[
D = P^{-1} C P = P^T C P, \qquad (12.2)
\]
Remark 12.1.2 Formula (12.2) tells us a surprising fact: given a vector space V
with a positive definite scalar product and a fixed orthonormal basis A, there exists
an orthonormal basis N , such that we can write the basis change formula for a
symmetric linear application T as the basis change formula for a scalar product < , >
in the same way!
Hence, we can use the theory of diagonalization of linear applications, which we studied in Chapter 9, to solve the diagonalizability problem for scalar products. We need to keep in mind two things:
1. Unlike what happens for linear applications, scalar products are always diagonalizable. This happens because, once an ordered basis is fixed, a scalar product is associated to a symmetric matrix, and the Spectral Theorem guarantees us the diagonalizability of such matrices via an orthogonal matrix P.
dealt with. We want to emphasize that orthogonal basis changes are particularly
useful in applications, especially in physics.
From the previous theorem, we immediately have a corollary, which is very important
for applications.
Corollary 12.1.3 Let < , > be a scalar product in Rⁿ. Then, there exists a basis N, orthonormal for the Euclidean scalar product, such that the matrix associated with < , > is diagonal.
Let us now see how the above results can be applied to determine if an arbitrary
scalar product is positive definite and non-degenerate.
Proposition 12.1.4 Let V be a vector space of dimension n with a scalar product
< , > associated with a diagonal matrix D with respect to a given basis N . Then:
1. < , > is non-degenerate if and only if all the elements on the diagonal of D are
non zero;
2. < , > is positive definite if and only if all elements on the diagonal of D are
positive.
Proof. Given two vectors u, v ∈ V, let (u)_N = (α1, …, αn)^T, (v)_N = (β1, …, βn)^T.
This implies that at least one of the λi is negative or null, which gives a contradiction.
The matrix associated with it with respect to the canonical basis is:
\[
C = \begin{pmatrix} -4 & 2 \\ 2 & -4 \end{pmatrix}.
\]
The Spectral Theorem guarantees us that such a matrix can be diagonalized through
an orthogonal change of basis. With an easy calculation, we see that the eigenvalues
of C are λ1 = −2, λ2 = −6, and its eigenspaces are: V₋₂ = ⟨(1, 1)⟩, V₋₆ = ⟨(1, −1)⟩.
By the Spectral Theorem we immediately have that
\[
\begin{pmatrix} -2 & 0 \\ 0 & -6 \end{pmatrix} = P^{-1} C P = P^T C P =
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} -4 & 2 \\ 2 & -4 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}.
\]
By the previous proposition, with respect to the ordered orthonormal basis N = ((1/√2, 1/√2), (1/√2, −1/√2)), the scalar product < , > is associated to the diagonal matrix:
\[
D = \begin{pmatrix} -2 & 0 \\ 0 & -6 \end{pmatrix}.
\]
We can then conclude that < , > is non-degenerate but not positive definite.
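As a numerical aside (ours, not the book's, assuming NumPy), the eigenvalues and the orthogonal diagonalization used in this example are immediate to confirm:

    import numpy as np

    C = np.array([[-4.0,  2.0],
                  [ 2.0, -4.0]])

    # Orthonormal eigenvectors as columns of P.
    P = np.array([[1, 1],
                  [1, -1]]) / np.sqrt(2)

    print(np.linalg.eigvalsh(C))        # expected: [-6. -2.]
    print(np.round(P.T @ C @ P, 10))    # expected: diag(-2, -6)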
Thanks to the previous example, we can make an easy observation, which is very
important for exercises.
Observation 12.1.6 Let < , > be a scalar product in a vector space V of dimension
n, and let C be the associated matrix, with respect to any basis of V . Then:
1. < , > is non-degenerate if and only if C has no null eigenvalues;
2. < , > is positive definite if and only if C has no negative or null eigenvalues.
In fact, by Corollary 12.1.3 there exists a basis N , such that the matrix associated
to < , > is diagonal. Looking at the proof of Theorem 12.1.1, we see that this matrix
has the eigenvalues of C on its diagonal. Hence, our claims descend from Proposition
12.1.4.
\[
C = \begin{pmatrix} -2 & 2 & 0 \\ 2 & 0 & -1 \\ 0 & -1 & 1 \end{pmatrix}.
\]
Observation 12.2.4 In Rⁿ consider the following function q : Rⁿ ⟶ R:
\[
q(x_1, \dots, x_n) = a_{11}x_1^2 + a_{12}x_1x_2 + a_{13}x_1x_3 + \dots + a_{1n}x_1x_n + a_{22}x_2^2 + a_{23}x_2x_3 + \dots + a_{2n}x_2x_n + \dots + a_{nn}x_n^2.
\]
We construct the symmetric matrix C as follows:
\[
C = \begin{pmatrix}
a_{11} & \frac{a_{12}}{2} & \frac{a_{13}}{2} & \dots & \frac{a_{1n}}{2} \\
\frac{a_{12}}{2} & a_{22} & \frac{a_{23}}{2} & \dots & \frac{a_{2n}}{2} \\
\dots \\
\frac{a_{1n}}{2} & \frac{a_{2n}}{2} & \frac{a_{3n}}{2} & \dots & a_{nn}
\end{pmatrix}. \qquad (12.6)
\]
Then we can write:
\[
q(x_1, \dots, x_n) = (x_1, \dots, x_n)\; C \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]
So the function q represents a quadratic form in Rⁿ and we can immediately check, using (12.6), that every quadratic form in Rⁿ is of this form.
Let us see an example.
Example 12.2.5 Consider q : R³ ⟶ R given by: q(x, y, z) = x² + 2xy + 3zy − 2z². We have that q is a quadratic form, and the matrix associated with it is given by:
\[
q(x, y, z) = (x\;\; y\;\; z) \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 3/2 \\ 0 & 3/2 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
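A small Python check of Example 12.2.5 (our addition, assuming NumPy): the matrix built from the coefficients reproduces q at randomly chosen points.

    import numpy as np

    C = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.5],
                  [0.0, 1.5, -2.0]])

    def q(x, y, z):
        return x**2 + 2*x*y + 3*z*y - 2*z**2

    rng = np.random.default_rng(1)
    for _ in range(5):
        v = rng.standard_normal(3)
        assert np.isclose(q(*v), v @ C @ v)   # x^T C x reproduces the quadratic form
    print("ok")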
Definition 12.2.7 Let q be a quadratic form on Rⁿ. We define the signature of q
as the pair (r, s), where r and s are the number of positive and negative eigenvalues
respectively, of the matrix C associated with q with respect to the canonical basis,
each counted with multiplicity.
We note that, in this definition, instead of the canonical basis, we may choose any
orthonormal basis to determine C. In fact, we know that all matrices associated with
q have the same eigenvalues.
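For instance (our sketch, not from the text, assuming NumPy), the signature can be read off numerically from the eigenvalues of the associated symmetric matrix:

    import numpy as np

    def signature(C, tol=1e-10):
        """Return (r, s): numbers of positive and negative eigenvalues of the
        symmetric matrix C, counted with multiplicity."""
        eigenvalues = np.linalg.eigvalsh(C)
        r = int(np.sum(eigenvalues > tol))
        s = int(np.sum(eigenvalues < -tol))
        return r, s

    # Matrix of Example 12.2.5.
    C = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.5],
                  [0.0, 1.5, -2.0]])
    print(signature(C))    # expected: (2, 1)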
q(x, y) = c, (12.7)
• λ1 , λ2 > 0 ellipse;
Thanks to the Principal Axis Theorem 12.2.6, we can treat the general case as one
of these two geometric figures. We will therefore say that a quadratic form is in
canonical form, if it takes the expression (12.8). Let us see an example.
Example 12.3.1 We want to draw the curve in the plane whose points have coordinates satisfying the equation: 5x² − 4xy + 5y² = 48.
The matrix associated with the quadratic form q(x, y) = 5x² − 4xy + 5y² is:
\[
A = \begin{pmatrix} 5 & -2 \\ -2 & 5 \end{pmatrix}.
\]
The eigenvalues of A are 3 and 7, and the corresponding eigenvectors of unit length are u1 = (1/√2, 1/√2), u2 = (−1/√2, 1/√2). Using the coordinates with respect to the basis B = {u1, u2}, we have q(x′, y′) = 3(x′)² + 7(y′)². So q(x, y) = 48 is an ellipse.
[Figure: the ellipse drawn with respect to the rotated axes x′ and y′.]
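A short NumPy verification of this example (our addition): the eigen-decomposition of A gives the coefficients of the canonical form.

    import numpy as np

    A = np.array([[ 5.0, -2.0],
                  [-2.0,  5.0]])

    eigenvalues, eigenvectors = np.linalg.eigh(A)
    print(eigenvalues)      # expected: [3. 7.]
    print(eigenvectors)     # columns: unit eigenvectors u1, u2 (up to sign)

    # In the rotated coordinates the curve 5x^2 - 4xy + 5y^2 = 48 becomes
    # 3(x')^2 + 7(y')^2 = 48, an ellipse.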
\[
A = \begin{pmatrix} 1 & -4 \\ -4 & -5 \end{pmatrix}.
\]
The eigenvalues of A are −7 and 3, and the corresponding eigenvectors of unit length are u1 = (1/√5, 2/√5), u2 = (−2/√5, 1/√5). Using coordinates with respect to the basis B = {u1, u2}, we have q(x′, y′) = −7(x′)² + 3(y′)². So q(x, y) = 16 is a hyperbola.
Say if it is non-degenerate, positive definite and give its signature. Write the scalar
product associated with it with respect to the canonical basis.
Also, determine a basis with respect to which this scalar product is associated with
a diagonal matrix.
Solution. First we write the matrix associated with the given quadratic form, in the
canonical basis:
\[
C = \begin{pmatrix} 1 & 0 & 1 \\ 0 & -1 & -1 \\ 1 & -1 & 0 \end{pmatrix}.
\]
Let us compute the eigenvalues:
\[
\lambda_1 = -\sqrt{3}, \qquad \lambda_2 = \sqrt{3}, \qquad \lambda_3 = 0.
\]
The quadratic form is degenerate as one of the eigenvalues of the associated matrix
is equal to zero. It is not positive definite as it is degenerate. The signature is (1, 1).
The scalar product associated with q with respect to the canonical basis is given by:
\[
\langle (x, y, z), (x', y', z') \rangle = (x, y, z) \begin{pmatrix} 1 & 0 & 1 \\ 0 & -1 & -1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.
\]
We can see immediately, without the need for further calculations, that it is a hyper-
bola, whose canonical form is given by:
\[
q(x', y') = (1 + \sqrt{5})(x')^2 + (1 - \sqrt{5})(y')^2.
\]
[Figure: the hyperbola drawn with respect to the original axes x, y and the rotated axes x′, y′.]
a) Write the matrix associated with it with respect to the canonical basis.
b) Write the quadratic form in canonical form q(x1, x2) = ax1² + bx2² for appropriate a and b.
c) Draw the curve described by the equation q(x1 , x2 ) = 48 in the cartesian plane.
12.5.2 Given the quadratic form q(x1, x2, x3) = 3x1² + 2x2² + x3² + 4x1x2 + 4x2x3.
a) Write the matrix A associated with it with respect to the canonical basis.
b) Say if q is positive definite.
c) Find (if possible) a matrix P such that D = P⁻¹AP is diagonal.
d) Write the quadratic form q1 associated with D and establish the relation between
q1 and q.
12.5.3 Given the matrix:
\[
C = \begin{pmatrix} 4 & 2 & 3 \\ 2 & 0 & 2 \\ 3 & 2 & 4 \end{pmatrix}.
\]
a) Write the scalar product and the quadratic form associated with it with respect to the canonical basis of R³.
b) Determine whether the scalar product in (a) is positive definite and/or non-
degenerate.
c) Compute the signature of the quadratic form in (a).
d) Determine a basis in which the scalar product given in (a) is associated with a
diagonal matrix.
[Help: 8 and −1 are eigenvalues of C.]
12.5.4 Consider the quadratic form: q(x, y) = x² + 5xy.
1) Write the scalar product associated with it and determine if it is non-degenerate
or positive definite. Compute the signature of q.
2) Given the curve in the plane consisting of the points with coordinates satisfying the equation x² + 5xy = 1, say what curve it is.
12.5.5 Draw the curve x1² − 8x1x2 − 5x2² = 16.
Lines and Planes

13.1 POINTS AND VECTORS IN R³

Consider R³, the set of ordered triples of real numbers:
\[
R^3 = \{(x, y, z) \mid x, y, z \in R\}.
\]
We can represent the elements of R³ as points, where:
• We have a point O, called the origin, corresponding to the element (0, 0, 0).
• We choose three lines through O perpendicular to each other, and we call them
coordinate axes, denoted respectively as the x, y and z axis. We usually think of the x and y axes as horizontal and of the z axis as vertical (see Fig. 13.1).
• We set an orientation of the axes according to the right-hand rule (see also Fig.
13.1). We first choose a direction for the x-axis and one for the y-axis and we
call these positive directions. For the z axis the positive direction is identified
as follows: if we wrap the fingers of the right hand (pointer, middle, ring finger,
little finger) around the z-axis from the x-axis in the positive direction, to the
y-axis in the positive direction, the thumb points in the direction that we call
positive for the z axis.
[Fig. 13.1: the coordinate axes x, y and z, oriented according to the right-hand rule.]
We can then associate with a point P the three ordered real numbers (a, b, c) that we call the coordinates of P (see Fig. 13.2). From now on, we will identify R³ with the points of space through this representation.
Fig. 13.2
We leave it to the reader as an easy exercise to prove that d(P, Q) actually represents the length of the segment which connects P with Q.
We now define the notion of vector, which is extremely important for our discussion.
In this appendix we will use round brackets for the points of R³ and square brackets for vectors, in order to mark the difference between these two notions.
Given a point P = (a, b, c) ∈ R³, we define the position vector of P as:
\[
v = \overrightarrow{OP} = [a, b, c].
\]
We represent the position vector as an arrow from the origin with its tip in P .
Given the points P = (a, b, c), Q = (a′, b′, c′) ∈ R³, we define the vector \overrightarrow{PQ} as:
\[
w = \overrightarrow{PQ} = [\,a' - a,\; b' - b,\; c' - c\,].
\]
We can represent the vector w as an arrow starting from the origin and parallel to
the segment P Q.
[Fig. 13.3: the vector w = \overrightarrow{PQ}, drawn as an arrow from the origin parallel to the segment PQ.]
Given two vectors u = [u1 , u2 , u3 ], v = [v1 , v2 , v3 ] we can define their sum as follows:
u + v = [u1 + v1 , u2 + v2 , u3 + v3 ].
Similarly we can define multiplication by a scalar, i.e. by a real number λ:
λu = [λu1 , λu2 , λu3 ].
The two operations of sum between vectors and multiplication of a vector by a scalar
equip the set of vectors with the structure of vector space, that is, the 8 properties
of Definition 2.3.1 of Chapter 2 apply.
We leave the easy verification to the reader.
The dot product has the following properties, which we leave as an easy exercise for
the reader. For each vector u, v, w and for each scalar λ we have:
• Commutativity:
u⋅v=v⋅u
• Distributivity:
u ⋅ (v + w) = u ⋅ v + u ⋅ w
Note that √(u · u) = √(u1² + u2² + u3²) represents the distance of the point P = (u1, u2, u3) from the origin and therefore represents the length of the segment OP. We define as length of the vector u the number √(u · u), which we also denote as ∥u∥.
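As a concrete illustration (our addition, not part of the text), the dot product and the length of a vector are immediate to compute in Python; the helper names below are ours.

    import math

    def dot(u, v):
        # dot product of two 3-component vectors
        return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]

    def length(u):
        # length (norm) of u, i.e. sqrt(u . u)
        return math.sqrt(dot(u, u))

    u = [1.0, 2.0, 2.0]
    v = [2.0, -2.0, 1.0]
    print(dot(u, v))     # 0.0: the vectors are perpendicular
    print(length(u))     # 3.0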
Let us now remind the reader of an elementary result of Euclidean geometry, the
cosine theorem, which allows us to find the length of an edge of a triangle with
vertices A, B, C, knowing the lengths of the other two edges and the angle θ between
them (see Fig. 13.4). This theorem states that:
\[
BC^2 = AC^2 + AB^2 - 2 \cdot AC \cdot AB \cos\theta,
\]
[Fig. 13.4: a triangle with vertices A, B and C; the sides AB and AC form the angle θ at the vertex A, and BC is the side opposite to θ.]
Theorem 13.2.1 Let u and v be two nonzero vectors and let θ be the angle having
as sides the half-lines determined by the two vectors. Then
\[
u \cdot v = \|u\|\,\|v\| \cos\theta.
\]
By the cosine theorem, if we consider the triangle with a vertex in the origin and
sides of length ∥u∥, ∥v∥ and ∥u − v∥, we immediately have that:
\[
\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\,\|u\|\,\|v\| \cos(\theta). \qquad (13.3)
\]
From the previous theorem, we can easily obtain the following corollary, which es-
tablishes when two vectors are perpendicular, that is, when the angle between them
is π/2. This result will be very useful for exercises.
Corollary 13.2.2 Two vectors u and v are perpendicular to each other if and only
if u ⋅ v = 0.
Let us now turn to another extremely important product for our discussion: the cross
product, also called vector product. Let u = [u1 , u2 , u3 ], v = [v1 , v2 , v3 ] be two vectors.
We define their vector product as:
\[
u \times v = [\,u_2 v_3 - u_3 v_2,\; u_3 v_1 - u_1 v_3,\; u_1 v_2 - u_2 v_1\,].
\]
The vector product has the following properties, which we leave as an exercise. For each vector u, v, w and for each scalar λ:
• u × u = 0
• Anticommutativity:
u × v = −(v × u)
• Distributivity:
u × (v + w) = (u × v) + (u × w)
• Compatibility with multiplication by a scalar:
(λu) × v = u × (λv) = λ(u × v)
We conclude the section with a result of great importance for the exercises. It is an
immediate consequence of Corollary 13.2.2.
Proposition 13.2.3 The cross product between two vectors u and v is perpendicular
to both u and v.
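A quick numerical illustration of Proposition 13.2.3 (our addition, using NumPy): the cross product of two vectors is orthogonal to both.

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-1.0, 0.0, 2.0])

    w = np.cross(u, v)   # [u2*v3 - u3*v2, u3*v1 - u1*v3, u1*v2 - u2*v1]
    print(w)             # expected: [ 4. -5.  2.]
    print(np.dot(w, u), np.dot(w, v))   # both 0: w is perpendicular to u and v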
13.3 LINES IN R³

In the space R³, we want to describe the points lying on a straight line r using equations. We require our line to pass through a point P0 = (x0, y0, z0), and we ask that its direction is determined by the vector v = [v1, v2, v3]. In short, with the help of a drawing, we can immediately write the equation for P = (x, y, z), the generic point of the line r:
\[
\overrightarrow{OP} = \overrightarrow{OP_0} + t\,[v_1, v_2, v_3]. \qquad (13.4)
\]
Therefore:
[x, y, z] = [x0 , y0 , z0 ] + t[v1 , v2 , v3 ]. (13.5)
We call equation (13.5) a vector parametrization or vector equation of the line r.
[Figure: the line r through the point P0 with direction vector v; the generic point P of r is obtained from P0 by adding a multiple of v.]
In general, the line r is the set of points with coordinates (x, y, z) ∈ R³ expressed as:
\[
\begin{cases} x = x_0 + t v_1 \\ y = y_0 + t v_2 \\ z = z_0 + t v_3 \end{cases} \qquad (13.6)
\]
as the parameter t ∈ R changes. The equations in (13.6) are called parametric equa-
tions of the line r. The vector v is called the direction vector of r. Note that the vector v identifies the direction of r; however, we can use any nonzero multiple of v to define the same line.
Let us see a concrete example.
Example 13.3.1 We want to write parametric equations of the line r through the point P0 = (1, 0, −1) with direction given by the vector v = [2, 1, −1]. We also want to compare r with the line r′ of vector equation [x, y, z] = [2, 0, −3] + t′[−4, −2, 2]. Substituting in formula (13.6), we immediately have the parametric equations of r:
\[
\begin{cases} x = 1 + 2t \\ y = t \\ z = -1 - t. \end{cases} \qquad (13.7)
\]
The two lines r and r′ have the same direction, since the direction vector of r, v = [2, 1, −1], is a multiple of the direction vector of r′, v′ = [−4, −2, 2]. Therefore, the two lines are parallel or they coincide. To establish their mutual position it is sufficient to check if the point P0 belongs to the line r′, that is, if there exists a value of the parameter t′ such that [1, 0, −1] = [2, 0, −3] + t′[−4, −2, 2]. We leave it to the reader to verify that such a value does not exist. Therefore, the two given lines are parallel and they do not coincide.
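A possible way to automate the checks in this example (our sketch, assuming NumPy; the helper names are ours): test whether the direction vectors are proportional and whether P0 satisfies the parametric equations of r′.

    import numpy as np

    def parallel(v, w, tol=1e-12):
        # two direction vectors define parallel lines iff their cross product vanishes
        return np.allclose(np.cross(v, w), 0, atol=tol)

    def on_line(point, p0, v, tol=1e-12):
        # point lies on the line p0 + t v iff (point - p0) is a multiple of v
        return parallel(np.asarray(point) - np.asarray(p0), v, tol)

    P0, v = np.array([1.0, 0.0, -1.0]), np.array([2.0, 1.0, -1.0])    # line r
    Q0, w = np.array([2.0, 0.0, -3.0]), np.array([-4.0, -2.0, 2.0])   # line r'

    print(parallel(v, w))       # True: same direction
    print(on_line(P0, Q0, w))   # False: P0 is not on r', so the lines are parallel and distinct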
Example 13.3.2 We want to write parametric equations of the line r̂ through the
points P1 = (3, 1, −2) and P2 = (5, 2, −3). A direction of r̂ is obtained by taking the
difference between the coordinates of P2 and those of P1 : v = [2, 1, −1]. So r̂ is given
by:
\[
\begin{cases} x = 3 + 2\hat{t} \\ y = 1 + \hat{t} \\ z = -2 - \hat{t}, \end{cases} \qquad (13.8)
\]
where we have chosen P0 = P1 in formula (13.6) (we could just as well have chosen P2 instead).
We now ask whether the line r̂ is parallel to or coincides with the line r of the previous example, since they have the same direction. A quick calculation shows that P1 = (3, 1, −2) belongs to r (take t = 1 in (13.7)), and therefore the two lines coincide.
These examples show that a parametric form of a given line is not unique: we can in fact change the point P0 = (x0, y0, z0) used in the representation (13.6), choosing it arbitrarily among the infinitely many points of the line, or we can multiply the parameter t by an arbitrary nonzero constant: in both cases the line does not change, even though its parametric equations can take a different form.
Let us now see an equivalent way of describing the points of a line in R³ without using a parameter. For any set of parametric equations of a line r, it is always possible to obtain the parameter t from one of the equations and, replacing it in the other two, to obtain a linear system of two equations in the unknowns x, y, z. Such equations are called Cartesian equations of the line r. It is obvious that if the coordinates of a point satisfy parametric equations of the line r, then they also satisfy the Cartesian equations; the converse will be clear at the end of the next section. In fact, we will see that the two linear equations of the system we obtain represent two planes, both containing the line r. The solutions of the linear system are the points that lie in the intersection of the two planes, that is, the points of a line, and this line must indeed be r. Let us see an example.
Example 13.3.3 We want to write the line r of the previous example in Cartesian
form (see 13.3.1). In this case, it is very simple; since t = y, just substitute y instead
of t directly in the other equations:
\[
\begin{cases} x = 1 + 2y \\ z = -1 - y \end{cases} \qquad (13.9)
\]
So x − 2y − 1 = 0 and y + z + 1 = 0 are the Cartesian equations of the line r. We can
then think of the points of the line as the set of solutions of the linear system (13.9).
In the next section, we will see how the previous example can be reinterpreted to see a line as the intersection of two planes in R³.
13.4 PLANES IN R³
We now want to determine the equation that describes the points of a plane perpen-
dicular to a given line with direction n = [a, b, c] and passing through a fixed point
P0 = (x0 , y0 , z0 ). By Corollary 13.2.2, we have that two vectors are perpendicular if
and only if their dot product is zero. Therefore, the set of points of the plane con-
taining P0 = (x0 , y0 , z0 ), perpendicular to the vector n, is obtained by imposing that
the generic point P = (x, y, z) on the plane satisfies the equation:
\[
n \cdot \overrightarrow{P_0 P} = 0.
\]
We write this equation as:
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0. (13.10)
Equation (13.10) is called a Cartesian equation of the given plane, and the vector n is called a normal vector to the plane.
[Figure: the plane π through the point P0 with normal vector n.]
Example 13.4.1 We want to determine the plane through the point P0 = (1, −1, 2)
and with normal vector n = [2, −3, −1]. Substituting in equation (13.10) we imme-
diately have:
2(x − 1) − 3(y + 1) − (z − 2) = 0.
Thus the plane consists of all the points (x, y, z) that satisfy the equation 2x − 3y − z = 3.
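The computation in Example 13.4.1 is easy to script (our addition; the function name is ours): expand a(x − x0) + b(y − y0) + c(z − z0) = 0 into the form ax + by + cz = d.

    def plane_through(point, normal):
        """Return (a, b, c, d) such that the plane through `point` with normal
        vector `normal` has Cartesian equation ax + by + cz = d."""
        a, b, c = normal
        x0, y0, z0 = point
        d = a * x0 + b * y0 + c * z0    # from a(x-x0) + b(y-y0) + c(z-z0) = 0
        return a, b, c, d

    print(plane_through(point=(1, -1, 2), normal=(2, -3, -1)))
    # expected: (2, -3, -1, 3), i.e. 2x - 3y - z = 3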
It is easy to verify that, on the other hand, any linear equation of the type:
ax + by + cz = d
represents a plane with normal vector n = [a, b, c].
\[
\begin{cases} x = 11 - 2t + 5s \\ y = t \\ z = s. \end{cases}
\]
Let us look at another more complicated example.
\[
\begin{cases} x = 1 + t + 3s \\ y = 0 + 2t + s \\ z = -1 + 2t + 3s. \end{cases} \qquad (13.12)
\]
To determine the Cartesian form, we could obtain t and s from two equations and replace them into the third. However, from formula (13.10) we know that, to determine the plane, it is enough to know the coordinates of a point in the plane and a vector normal to it. To determine a vector normal to the plane, we can take the vector product of the two direction vectors. Such a product is in fact always perpendicular to both vectors. Let us see the calculation:
\[
n = \overrightarrow{PQ} \times \overrightarrow{PR} = [4, 3, -5].
\]
The plane therefore has equation 4(x − 1) + 3y − 5(z + 1) = 0, i.e. 4x + 3y − 5z = 9.
x − 2y + 4z = 5.
The line we need to find has the vector normal to the plane as its direction vector, and therefore we can immediately write the equations of the line r in parametric form:
\[
\begin{cases} x = 1 + t \\ y = -2 - 2t \\ z = 3 + 4t, \end{cases} \qquad (13.13)
\]
which correspond to the equations in Cartesian form:
\[
\begin{cases} 4x - z = 1 \\ 2x + y = 0. \end{cases} \qquad (13.14)
\]
13.5.2 Determine the distance of the point P = (2, −1, 3) from the plane passing
through Q = (1, −1, −8) with normal vector n = (2, −2, −1).
Solution. We immediately write the Cartesian equation of the plane:
2x − 2y − z = 12
and then a set of parametric equations of the line passing through P and with direc-
tion vector n is:
\[
\begin{cases} x = 2 + 2t \\ y = -1 - 2t \\ z = 3 - t. \end{cases}
\]
We then calculate the point R of intersection between this line and the given plane by substituting the generic point obtained from the parametric equations of the line in the equation of the plane:
\[
2(2 + 2t) - 2(-1 - 2t) - (3 - t) = 12.
\]
We obtain t = 1, thus R = (4, −3, 2). Using the distance formula (13.1), we get that the distance between P and R is √(4 + 4 + 1) = 3.
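The whole computation of Exercise 13.5.2 can be verified numerically (our addition, NumPy-based):

    import numpy as np

    P = np.array([2.0, -1.0, 3.0])
    Q = np.array([1.0, -1.0, -8.0])
    n = np.array([2.0, -2.0, -1.0])

    d = np.dot(n, Q)                        # plane: 2x - 2y - z = d
    t = (d - np.dot(n, P)) / np.dot(n, n)   # parameter of the foot of the perpendicular
    R = P + t * n                           # intersection of the line through P with the plane
    print(R)                                # expected: [ 4. -3.  2.]
    print(np.linalg.norm(P - R))            # expected: 3.0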
13.5.3 Consider the line r through the two points P = (1, 0, −2) and Q = (−1, 1, −1), and the line r′ through the point R = (3, 1, −5) with direction vector v′ = [0, −1, 1]. Say if r is perpendicular to r′ and compute the distance between Q and r′.
Solution. We compute a direction vector: v = [−2, 1, 1] and therefore the line r has
parametric equations:
\[
\begin{cases} x = 1 - 2t \\ y = 0 + t \\ z = -2 + t. \end{cases}
\]
We immediately see that v · v′ = 0; however, two lines are perpendicular if they intersect and if they have perpendicular direction vectors. So we have to check if they intersect. We write parametric equations of r′:
\[
\begin{cases} x = 3 \\ y = 1 - t' \\ z = -5 + t'. \end{cases}
\]
At this point, to check whether or not they intersect we solve the system:
\[
\begin{cases} 3 = 1 - 2t \\ 1 - t' = 0 + t \\ -5 + t' = -2 + t. \end{cases}
\]
We see that this system admits a solution for t = −1 and t′ = 2, that is, the point S = (3, −1, −3) belongs to both lines. Therefore, r and r′ are perpendicular.
We now want to calculate the distance between Q and r′. As the lines are perpendicular and intersect at the point S, this distance is given by the distance between the points Q and S, which we can immediately calculate using formula (13.1):
\[
\sqrt{16 + 4 + 4} = 2\sqrt{6}.
\]
x + y − z = 0, y + 2z = 6. (13.15)
Solution. Before proceeding, let us note a very important fact: the expression of a line in Cartesian form, that is, as a system of two equations in the unknowns x, y, z, corresponds to the intersection of two planes, each identified by its own Cartesian equation. The direction of the line is uniquely identified, as it is perpendicular to both normal directions of the two planes. So, to find a vector perpendicular to both the normal vectors of the planes, we calculate the vector product of the two normal vectors n1 = [1, 1, −1] and n2 = [0, 1, 2]:
\[
n_1 \times n_2 = [3, -2, 1].
\]
Let us now choose an arbitrary point on the line, that is, a point that satisfies both equations (13.15). We can for example set z = 0 and obtain the values of x and y from the equations: y = 6 and x = −6. Hence, parametric equations of the line intersection of the two planes are:
\[
\begin{cases} x = -6 + 3t \\ y = 6 - 2t \\ z = t. \end{cases}
\]
r1 ∶ x = 1 + t, y = t, z = 2 − 5t
r2 ∶ x + 1 = y − 2 = 1 − z,
r3 ∶ x = 1 + t, y = 4 + t, z =1−t
\[
\begin{cases} x = 1 + s + 2t \\ y = 3s \\ z = 2 + t. \end{cases}
\]
Determine the line r perpendicular to π and passing through the point P = (2, 1, 0).
b) Determine the plane π′ containing Q = (1, 0, −1) and r, both in parametric and in Cartesian form.
\[
r: \begin{cases} x + y - 1 = 0 \\ z - 1 = 0 \end{cases}
\qquad\qquad
s: \begin{cases} x + y - 2z - 2 = 0 \\ z + 1 = 0; \end{cases}
\]
b) Determine the Cartesian equation of the plane π containing r and the point
P = (1, 0, 1).
c) Find parametric equations for the line s passing through P and orthogonal to
π.
13.6.7 In R³, given the plane π with Cartesian equation π : x + y − 2z + 4 = 0 and the line r with Cartesian equations
\[
\begin{cases} x - 2z + 12 = 0 \\ y - 4 = 0 \end{cases}
\]
π ∶x−y+z+1=0
c) Find Cartesian equations for the line s passing through Q and parallel to the
vector v = [1, 0, 1].
13.6.9 Consider two generic vectors u, v lying in the xy plane. Prove that the norm ∥u × v∥ is the area of the parallelogram with sides u and v.
Introduction to Modular Arithmetic
In this chapter, we want to study the arithmetic of the integers. We will start with
the principle of induction, a result of fundamental importance that has several ap-
plications in various areas of mathematics. We will then continue with the division
algorithm and Euclid's algorithm, arriving at congruences, the most important topic of
elementary discrete mathematics.
We will denote this statement with P (n). First of all, we check its validity for n = 0:
we need to show that the sum of natural numbers between 0 and 0 is 0. In fact,
0 = 0(0 + 1)/2 = 0. Let us now verify that P (1) is true: the sum of the integers
between 0 and 1 is 0 + 1 = 1 = 1(1 + 1)/2. Similarly, P (2): 0 + 1 + 2 = 2(2 + 1)/2 = 3.
It is clear that, with a little patience, we could go on like this by verifying formula
(14.1) when n is very large, but we want to prove that it is true for all natural
numbers n. The induction principle helps us by allowing us to prove the validity of a
statement for all natural numbers.
First of all, we state the axiom of good ordering. We will see later that the principle
of induction and the axiom of good ordering are equivalent to each other. However in
the theory we will illustrate we shall take one as an axiom and then prove the other.
Axiom of good ordering. Each non-empty subset of the set of natural numbers
contains an element that is smaller than all the others.
Example 14.1.2 Let us see how to use the principle of induction to prove the
formula (14.1), which is the following statement P (n):
the sum of the first n natural numbers is n(n + 1)/2.
We have already seen that this statement is true for n = 0, that is, hypothesis 1) of
the principle of induction is true. Now assume that P (k) is true, that is:
\[
0 + 1 + 2 + 3 + \dots + k = \frac{k(k+1)}{2}.
\]
We have to show that P(k) ⟹ P(k + 1) (hypothesis 2 of the induction principle), i.e. that:
\[
0 + 1 + 2 + 3 + \dots + k + (k + 1) = \frac{(k+1)(k+2)}{2}.
\]
Since P(k) is true, we have:
\[
0 + 1 + \dots + k + (k + 1) = \frac{k(k+1)}{2} + k + 1 = \frac{k^2 + k + 2k + 2}{2} = \frac{(k+1)(k+2)}{2},
\]
so P(k + 1) is true, and by the principle of induction formula (14.1) holds for every n ∈ N.
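A tiny computational check of formula (14.1) for the first few values of n (our addition):

    for n in range(0, 20):
        assert sum(range(n + 1)) == n * (n + 1) // 2   # formula (14.1)
    print("formula verified for n = 0, ..., 19")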
1. P (0) is true;
2. if P (j) is true for every j < k then P (k) is true;
then P (n) is true for every n ∈ N.
Proof. It is an immediate consequence of the principle of induction.
This apparently convoluted proof is, however, instructive because it proves the equiv-
alence of three statements, which look quite different from each other.
Proof. Let us first prove the existence of q and r with the principle of complete induction. P(0) is true because n = 0 = 0 · b + 0, with q = r = 0. Assume that P(j) is true for every j such that 0 ≤ j < k; we want to show P(k) (that is, we want to verify hypothesis 2 of the principle of complete induction). If k < b then:
\[
k = 0 \cdot b + k, \quad \text{with } 0 \le k < b,
\]
so P(k) is true for k < b. Now assume that k ≥ b. Since 0 ≤ k − b < k, by applying the inductive hypothesis to k − b we have:
\[
k - b = q_1 b + r_1, \quad \text{with } 0 \le r_1 < b,
\]
hence k = (q1 + 1)b + r1, with 0 ≤ r1 < b, and P(k) is true.
Observation 14.2.2 With a similar proof, which uses the axiom of good ordering,
we obtain that the division algorithm holds for any two integers a and b, not necessarily
belonging to natural numbers. More precisely we have that:
Theorem 14.2.3 Given two integers n and b with b ≠ 0, there exists a unique pair of integers q and r, respectively called quotient and remainder, such that
\[
n = qb + r, \quad \text{with } 0 \le r < |b|.
\]
We now want to introduce the concept of divisibility and of greatest common divisor.
Definition 14.2.4 Let a and b be two integers. We say that b divides a if there is
an integer c such that a = bc and we write b∣a. We say that d is the greatest common
divisor between two numbers a and b if it divides them both and it is the largest
integer with this property. We will denote the greatest common divisor between a
and b with gcd(a, b).
Observation 14.2.5 Note that if a, b and c are integers and a divides both b and
c, then a also divides b + c and b − c. We leave to the reader the easy verification of
this property that we will use several times later.
We now want to find an efficient algorithm to determine the greatest common divisor
between two integers.
Theorem 14.2.6 (Euclid’s algorithm) Let a and b be two positive integers such
that b ≤ a and b does not divide a. Then we have:
\[
\begin{aligned}
a &= q_0 b + r_0, & \text{where } 0 &\le r_0 < b \\
b &= q_1 r_0 + r_1, & \text{where } 0 &\le r_1 < r_0 \\
r_0 &= q_2 r_1 + r_2, & \text{where } 0 &\le r_2 < r_1 \\
&\;\;\vdots \\
r_{t-1} &= q_{t+1} r_t,
\end{aligned}
\]
and the last nonzero remainder rt is the greatest common divisor between a and b.
Proof. By Theorem 14.2.1 we can write: a = q0 b + r0 . Now we want to show that
gcd(a, b) = gcd(b, r0 ). In fact if c∣a and c∣b then c∣r0 , since r0 = a − q0 b. Similarly,
if c∣b and c∣r0 then c∣a = q0 b + r0 . So the set of integers that divide both a and b
coincides with the set of integers that divide both b and r0. Therefore the greatest common divisor of the two pairs (a, b) and (b, r0) is the same. Once this is established, the result follows immediately from the chain of equalities:
\[
\gcd(a, b) = \gcd(b, r_0) = \gcd(r_0, r_1) = \dots = \gcd(r_{t-1}, r_t) = r_t.
\]
Let us see concretely how to use this algorithm to determine the greatest common
divisor of two given numbers.
Example 14.2.7 We want to compute gcd(603, 270). We use Euclid’s algorithm
(Theorem 14.2.6):
603 = 2 ⋅ 270 + 63
270 = 4 ⋅ 63 + 18
63 = 3 ⋅ 18 + 9
18 = 2 ⋅ 9.
So gcd(603, 270) = 9.
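Euclid's algorithm translates directly into a few lines of Python (our sketch, not from the text); running it on 603 and 270 reproduces the result of Example 14.2.7.

    def gcd(a, b):
        # Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b).
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(603, 270))   # expected: 9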
The following theorem is a consequence of Euclid’s algorithm and will be the funda-
mental tool for the resolution of congruences, which we will study in Section 14.4.
Theorem 14.2.8 (Bézout Identity). Let a, b be positive integers and let d =
gcd(a, b). Then, there are two integers u, v (not unique) such that:
d = ua + vb.
Proof. The proof of this result uses Euclid’s algorithm 14.2.6. We show that at each
step there exist ui , vi ∈ Z such that ri = ui a + vi b.
For r0 the result is true, with u0 = 1 and v0 = −q0 , indeed
r0 = a − q0 b.
Then we have that:
r1 = b − r0 q1 = b − (u0 a + v0 b)q1 = −u0 q1 a + (1 − v0 q1)b
and the result is also true for r1, just take u1 = −u0 q1 and v1 = 1 − v0 q1. In general,
after the step i − 1 we know ui−2 , ui−1 , vi−2 , vi−1 ∈ Z such that:
ri−2 = ui−2 a + vi−2 b, ri−1 = ui−1 a + vi−1 b.
So we have that:
ri = ri−2 − ri−1 qi = ui−2 a + vi−2 b − (ui−1 a + vi−1 b)qi
= (ui−2 − ui−1 qi )a + (vi−2 − vi−1 qi )b,
so the result is true for ri , with ui = ui−2 − ui−1 qi and vi = vi−2 − vi−1 qi . Since
gcd(a, b) is the last nonzero remainder rt , after the step t we know ut and vt such
that rt = gcd(a, b) = ut a + vt b and u = ut , v = vt are the integers we were looking
for.
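The proof above is constructive, and it is essentially how the extended Euclidean algorithm is usually implemented; here is a possible Python version (our addition), returning d = gcd(a, b) together with u, v such that d = ua + vb.

    def extended_gcd(a, b):
        """Return (d, u, v) with d = gcd(a, b) and d = u*a + v*b."""
        old_r, r = a, b
        old_u, u = 1, 0
        old_v, v = 0, 1
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_u, u = u, old_u - q * u
            old_v, v = v, old_v - q * v
        return old_r, old_u, old_v

    d, u, v = extended_gcd(603, 270)
    print(d, u * 603 + v * 270)   # both equal 9, as Bezout's identity requires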
Observation 14.2.9 In the previous theorem, the existence of two numbers u, v such
that d = ua+vb does not guarantee that d = gcd(a, b). For example 10 = 15⋅14−25⋅8
but gcd(14, 8) = 2.
We conclude this section with a result that we do not prove and that we will not use
later, but which plays a fundamental role for the theory of the integer numbers and
whose generalizations are extremely important in number theory (see [2]).
Definition 14.2.11 Let us say that a positive integer p is prime if its only divisors
are ±p and ±1.
Theorem 14.2.12 (Fundamental Theorem of Arithmetic). Each integer
greater than 1 is the product of primes in a unique way up to reordering:
n = p1 p2 . . . pr ,
2. ac ≡n bd.
Let us now define the congruence classes, i.e. the sets that contain all integer numbers
which are congruent to each other modulo a certain integer n.
[a]n = {b ∈ Z ∣ b ≡n a} = {a + kn ∣ k ∈ Z}.
Note that:
[4]4 = [0]4 = [−4]4 = . . .
Proposition 14.3.5 Let [a]n and [b]n be two congruence classes modulo n. Then
[a]n = [b]n or [a]n and [b]n are disjoint, that is they do not have common elements.
Proof. Assume there is a common element c between [a]n and [b]n, that is c ≡n a and c ≡n b. So, by the transitive property of congruences, a ≡n b, and then, again by the transitive property, [a]n = [b]n.
2. [0]n , [1]n , . . . , [n − 1]n are all the distinct congruence classes modulo n.
We have come to the most important definition of this chapter: the set Zn .
Definition 14.3.7 The set of integers modulo n, denoted with Zn, is the set of congruence classes modulo n:
\[
Z_n = \{[0]_n, [1]_n, \dots, [n-1]_n\}.
\]
Definition 14.3.8 We define the following sum and product operations on the set
Zn :
[a]n + [b]n = [a + b]n , [a]n [b]n = [ab]n .
Observation 14.3.9 The operations just defined do not depend on the numbers a and b chosen to represent the congruence classes which we add or multiply, but only on their congruence class. In this case, it is said that the operations are well defined. For example, in Z4 we have: [1]4 = [5]4 and [2]4 = [6]4. By definition, [1]4 + [2]4 = [3]4 = [11]4 = [5]4 + [6]4.
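A minimal Python sketch of the operations on Zn (our addition), illustrating that they are well defined: different representatives of the same classes give the same result.

    def add(a, b, n):
        # sum of the classes [a]_n and [b]_n, represented by an integer in {0, ..., n-1}
        return (a + b) % n

    def mul(a, b, n):
        # product of the classes [a]_n and [b]_n
        return (a * b) % n

    # In Z_4: [1] = [5] and [2] = [6], and indeed [1] + [2] = [5] + [6] = [3].
    print(add(1, 2, 4), add(5, 6, 4))   # expected: 3 3
    print(mul(1, 2, 4), mul(5, 6, 4))   # expected: 2 2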
Example 14.3.10 We compute the tables of addition and multiplication for Z3 and Z4, inviting the student to practice by building the analogous tables for Z5 and Z6:
We note some very important facts: in Z3 each element other than [0]3 admits an inverse, that is, for every [a]3 ≠ [0]3 there is an element [b]3 such that [a]3[b]3 = [1]3. This inverse is denoted with [a]3⁻¹. So we have: [1]3⁻¹ = [1]3, [2]3⁻¹ = [2]3. This property does not hold in the case of Z4. In fact, the multiplication table shows that there is no inverse of the class [2]4. As we will see in detail in the next section, this difference is linked to the fact that 3 is a prime number while 4 is not.
14.4 CONGRUENCES
In this section, we aim at solving linear equations in which the unknown belongs to
the set Zn introduced in the previous section.
Let us start by examining the structure of Zp , with p a prime number.
Proposition 14.4.1 The following statements are equivalent:
(1) p is a prime number.
(2) The equation [a]p x = [1]p , with [a]p ≠ [0]p , has a solution in Zp , that is,
every element [a]p ≠ [0]p in Zp admits an inverse.
(3) If [a]p [b]p = [0]p in Zp then [a]p = [0]p or [b]p = [0]p .
Proof. (1) ⟹ (2): since [a]p ≠ [0]p , p does not divide a, so gcd(a, p) = 1. Then
by Theorem 14.2.8 there exist u, v ∈ Z such that 1 = au + pv. Taking the congruence classes modulo p, we have: [1]p = [a]p[u]p + [p]p[v]p = [a]p[u]p, therefore x = [u]p ∈ Zp is a solution of [a]p x = [1]p.
(2) ⟹ (3): we have [a]p [b]p = [0]p with [a]p ≠ [0]p . By hypothesis there is an
inverse of [a]p, that is, there is an element [u]p such that [u]p[a]p = [1]p. Multiplying both sides of the equality [a]p[b]p = [0]p by [u]p we get: [u]p[a]p[b]p = [u]p[0]p = [0]p, i.e. [b]p = [0]p.
(3) ⟹ (1): we assume that p = ab and we show that necessarily a and b are
equal to ±1 or ±p, that is the only divisors of p are, up to changing the sign, p itself
and 1. Considering the absolute values, we observe that ∣a∣∣b∣ = ∣ab∣ = ∣p∣ = p so
∣a∣ ≤ p and ∣b∣ ≤ p. The equality p = ab, translated in Zp , becomes the equality
[p]p = [a]p [b]p i.e. [a]p [b]p = [0]p . By hypothesis we know that either [a]p = [0]p
or [b]p = [0]p . If [a]p = [0]p then a = ±p and b = ±1. If [b]p = [0]p then b = ±p and
a = ±1.
Corollary 14.4.2 If p is a prime number the equation [a]p x = [b]p , with [a]p ≠
[0]p , has a single solution in Zp .
Proof. By property (2) of the previous proposition we know that [a]p is invertible. Multiplying the given equation by [a]p⁻¹ we get x = [a]p⁻¹[b]p, which shows at the same time the existence and the uniqueness of the solution.
Proof. Since gcd(a, n) = 1, by Theorem 14.2.8 there exist u, v ∈ Z such that au+nv =
1, so [a]n is invertible in Zn , with inverse [u]n . Arguing as in the proof of the previous
corollary we get the result.
(2) if d∣b then equation [a]n x = [b]n has exactly d distinct solutions in Zn .
Proof. Assume that [c]n is a solution of the given equation, then: [a]n [c]n = [ac]n =
[b]n i.e. ac ≡n b or, equivalently, n∣ac − b. Consequently d∣b since d∣n and d∣a.
Assume now that d divides b and let a = a′d, n = n′d, b = b′d. Observe that gcd(a′, n′) = 1, otherwise d would not be the greatest common divisor between a and n. Then by Proposition 14.4.3, the equation [a′]n′ x = [b′]n′ has a unique solution in Zn′; let it be [c]n′. Thus we have: [a′]n′ [c]n′ = [a′c]n′ = [b′]n′, i.e. n′ ∣ a′c − b′. Now let [e]n be any solution of the equation [a]n x = [b]n; then n ∣ ae − b, hence n′ ∣ a′e − b′ and therefore n′ ∣ a′c − a′e. It follows that [a′]n′ [e]n′ = [a′]n′ [c]n′ and then [e]n′ is a solution of the equation [a′]n′ x = [b′]n′. By Proposition 14.4.3 we have [e]n′ = [c]n′, i.e. e = c + kn′, with k ∈ Z.
Then it is easy to verify that [e]n ∈ {[c]n, [c + n′]n, [c + 2n′]n, …, [c + (d − 1)n′]n} = X and that the elements of X are all distinct and are all solutions of the equation [a]n x = [b]n. This shows what we wanted.
Example 14.4.6 We want to determine all solutions in Z74 of the equation [33]74 x =
[5]74 .
We use Euclid’s algorithm:
74 = 2 ⋅ 33 + 8
33 = 4 ⋅ 8 + 1
8 =8⋅1
As gcd(33, 74) = 1, we know that the solution exists and is unique, and the calculations just made allow us to compute the inverse of [33]74. We have in fact that:
\[
1 = 33 - 4 \cdot 8 = 33 - 4\,(74 - 2 \cdot 33) = 9 \cdot 33 - 4 \cdot 74,
\]
so [33]74⁻¹ = [9]74 and the solution is x = [9]74 [5]74 = [45]74.
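This is easy to double-check in Python (our addition): since Python 3.8, pow(a, -1, n) returns the inverse of a modulo n when gcd(a, n) = 1.

    a, b, n = 33, 5, 74
    a_inv = pow(a, -1, n)      # modular inverse, available since Python 3.8
    x = (a_inv * b) % n
    print(a_inv, x)            # expected: 9 45
    print((a * x) % n == b)    # True: 33 * 45 = 5 (mod 74)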
Now let us see how Theorem 14.4.5 also allows us to solve linear congruences.
[a]n x = [b]n, found for example with the method used in point (1), the integers e such that [e]n is a solution of the equation [a]n x = [b]n are all those of the type e = c + kn′, with k ∈ Z. Thus the solutions of the linear congruence ax ≡n b are precisely those of the type e = c + kn′, with k ∈ Z.
have that 63 = 3 ⋅ 21, 375 = 3 ⋅ 125, 24 = 3 ⋅ 8, thus we solve the equation [21]125 x =
[8]125 . We want to find the inverse of [21]125 , and to do so we first use Euclid’s
algorithm to compute gcd(125, 21):
125 = 5 ⋅ 21 + 20
21 = 1 ⋅ 20 + 1
20 = 20 ⋅ 1.
Proceeding backwards:
\[
1 = 21 - 1 \cdot 20 = 21 - (125 - 5 \cdot 21) = 6 \cdot 21 - 1 \cdot 125,
\]
so [21]125⁻¹ = [6]125 and the solution is x = [6]125 [8]125 = [48]125.
14.6.2 Prove the Fundamental Theorem of Arithmetic 14.2.12 using the principle of
complete induction.
14.6.3 Prove (by induction) that if n is a non-negative integer then 2ⁿ > n.
14.6.4 Say if there are two classes [a]37 , [b]37 in Z37 , both nonzero such that
[a]37 [b]37 = [0]37. If they exist compute them, if they do not exist explain why. Answer
the same question replacing Z37 with Z36 .
14.6.5 Consider the two congruence classes [0]6 and [3]12. Say if they are the same, if they are different from each other, or if one is contained in the other.
We can describe a set by listing its elements, for example:
X = {3, 4, 5, 6, 7},
or we can assign a property of its elements; for example, in the previous case, X is
the set of natural numbers greater than 2 and smaller than 8, and can be referred to
as:
X = {x∣ x is a natural number and 2 < x < 8}.
To denote that an element belongs to a set, we use the symbol ∈, whose negation is ∉. For example, in the previous case we have 5 ∈ X, 9 ∉ X.
Some sets often used in the text are:
N = {0, 1, 2, . . . , n, n + 1, . . . } set of natural numbers,
Z = {0, ±1, ±2, . . . , ±n, . . . } set of integer numbers,
R set of real numbers.
Two sets X and Y are the same if they have the same elements, and in this case we
write X = Y .
Definition 14.7.1 If X and Y are sets, let us say X is a subset of Y if each element
of X is also an element of Y , and we write X ⊆ Y .
Then there is a special set, the empty set, that is the set with no elements and is
denoted with ∅. Note that the empty set it is a subset of any set X. We have for
example:
\[
\{x \mid x \in R,\; x^2 = -1\} = \emptyset,
\]
because no real number squared gives −1 as a result. Care must be taken;
for example X = {0} is the subset of the real numbers that contains the single element
zero, however, it is not the empty set because it contains an element.
Let us now recall the two fundamental operations that can be carried out between
sets.
• The union of two sets X and Y is the set of all the elements that belong to X
or to Y and is denoted with X ∪ Y .
• The intersection of two sets X and Y is the set of all the elements that belong
to both X and Y and is denoted with X ∩ Y .
APPENDIX A
Complex Numbers
In this appendix, we introduce the set of complex numbers, necessary for a deeper
understanding of the question of finding solutions of algebraic equations. All linear
algebra results we describe in this book concerning real vector spaces are also true if we replace real numbers with complex numbers, without any modification to the theory.
Since this topic involves an additional difficulty, we prefer to present our treatment
of linear algebra limiting ourselves to the case of real scalars and leaving the complex
number case in this appendix.
The set of real numbers is a subset of C, because we can write any real number
a ∈ R in the form a = a + 0i. We call complex numbers of the type bi = 0 + bi purely
imaginary. If z = a + bi is a complex number, the real numbers a and b are called the
real part and imaginary part of z, respectively.
We can represent complex numbers in the Cartesian plane as follows: we associate to a + bi the pair of real numbers (a, b). In this plane, the x-axis represents the real numbers and is called the real axis, while the y-axis represents the purely imaginary complex numbers and we call it the imaginary axis.
[Figure: the complex numbers −1 + 2i, 1 + i and 1/2 − 2i represented as points in the plane, with real axis Re and imaginary axis Im.]
Given a complex number α = a + bi, we define its complex conjugate (or conjugate) $\overline{\alpha}$ as $\overline{\alpha} = a - bi$. We also define the modulus of a complex number α as
\[
|\alpha| = \sqrt{\alpha\overline{\alpha}} = \sqrt{a^2 + b^2}.
\]
Conjugation satisfies the following properties:
• $\overline{\alpha} = \alpha$ if and only if α ∈ R,
• $\overline{\alpha + \beta} = \overline{\alpha} + \overline{\beta}$ for each α, β ∈ C,
• $\overline{\alpha\beta} = \overline{\alpha}\,\overline{\beta}$ for each α, β ∈ C.
One of the most important properties of complex numbers is that the inverse α⁻¹ of any nonzero complex number α = a + bi can be computed explicitly:
\[
\alpha^{-1} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i.
\]
This allows us to immediately compute the quotient of two complex numbers. Instead
of remembering the formula, we invite the student to understand the procedure de-
scribed in the following example.
Example A.1.2 Let us consider the quotient of complex numbers (3 − 2i)/(1 − i). We want to express this quotient as a + bi for appropriate a and b. We proceed by multiplying the numerator and denominator by the complex conjugate of the denominator. The student will recognize the analogy with the procedure used to rationalize the denominator of a fraction:
\[
\frac{3 - 2i}{1 - i} = \frac{3 - 2i}{1 - i} \cdot \frac{1 + i}{1 + i} = \frac{(3 - 2i)(1 + i)}{|1 - i|^2} = \frac{3 - 2i + 3i - 2i^2}{2} = \frac{5}{2} + \frac{1}{2}\, i.
\]
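Python's built-in complex type gives a quick way to confirm this computation (our addition):

    z = (3 - 2j) / (1 - 1j)
    print(z)                   # expected: (2.5+0.5j), i.e. 5/2 + (1/2)i

    # The same result via the conjugate trick used in the example:
    w = (3 - 2j) * (1 + 1j) / ((1 - 1j) * (1 + 1j))
    print(abs(w - z) < 1e-12)  # True: same value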
We conclude this section with a list of the properties of operations in complex num-
bers, the verification of which is left to the reader as an easy exercise.
• product associativity:
(αβ)γ = α(βγ) for each α, β, γ ∈ C;
\[
\alpha_1 \alpha_2 = \rho_1(\cos\theta_1 + i\sin\theta_1)\,\rho_2(\cos\theta_2 + i\sin\theta_2)
= \rho_1\rho_2\big((\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + (\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2)\,i\big)
= \rho_1\rho_2\big(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\big). \qquad (A.1)
\]
The trigonometric form of a complex number allows us to compute its n-th roots fairly
quickly, through De Moivre’s formula. Thanks to formula (A.1) we can compute the
powers of a complex number:
\[
\begin{aligned}
\alpha &= \rho(\cos\theta + i\sin\theta) \\
\alpha^2 &= \rho^2(\cos 2\theta + \sin 2\theta\, i) \\
\alpha^3 &= \rho^3(\cos 3\theta + \sin 3\theta\, i) \\
&\;\;\vdots \\
\alpha^n &= \rho^n(\cos n\theta + \sin n\theta\, i).
\end{aligned} \qquad (A.2)
\]
Example A.2.1 We want to determine all the cube roots of 1 + i. According to the formula (A.3) they are given by:
\[
\sqrt[6]{2}\,\{\cos[(\pi/4 + 2k\pi)/3] + \sin[(\pi/4 + 2k\pi)/3]\, i\}, \qquad k = 0, 1, 2,
\]
that is:
\[
\begin{aligned}
\alpha_1 &= \sqrt[6]{2}\,\{\cos(\pi/12) + \sin(\pi/12)\, i\} \\
\alpha_2 &= \sqrt[6]{2}\,\{\cos[(\pi/4 + 2\pi)/3] + \sin[(\pi/4 + 2\pi)/3]\, i\} = \sqrt[6]{2}\,\{\cos\tfrac{3\pi}{4} + \sin\tfrac{3\pi}{4}\, i\} \\
\alpha_3 &= \sqrt[6]{2}\,\{\cos[(\pi/4 + 4\pi)/3] + \sin[(\pi/4 + 4\pi)/3]\, i\} = \sqrt[6]{2}\,\{\cos\tfrac{17\pi}{12} + \sin\tfrac{17\pi}{12}\, i\}.
\end{aligned}
\]
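One way to check these values numerically (our addition) is with Python's cmath module:

    import cmath

    z = 1 + 1j
    rho, theta = cmath.polar(z)      # rho = sqrt(2), theta = pi/4

    roots = [
        rho ** (1 / 3) * cmath.exp(1j * (theta + 2 * cmath.pi * k) / 3)
        for k in range(3)
    ]
    for r in roots:
        print(r, r ** 3)             # each r**3 is (numerically) equal to 1+1j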
[Figure: the three cube roots α1, α2, α3 of 1 + i in the complex plane.]
We conclude this section by stating a very important result: the Fundamental The-
orem of Algebra, whose proof is particularly difficult. Since it is beyond the scope
of this book, we refer the reader to one of several specific texts (see for instance S.
Lang, Algebra [3]).
Theorem A.2.2 Any polynomial of degree n with complex coefficients
\[
p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0, \qquad a_n, \dots, a_0 \in C,
\]
factors into a product of linear factors:
\[
p(x) = a_n (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n),
\]
with α1, …, αn ∈ C.
This specifically implies that a polynomial equation of degree n with coefficients in
C always has n complex solutions, although not necessarily distinct.
Let us see an example.
Example A.2.3 We want to find all the solutions of the equation x⁴ − 16 = 0. We can immediately factor the polynomial as:
\[
x^4 - 16 = (x^2 - 4)(x^2 + 4) = (x - 2)(x + 2)(x + 2i)(x - 2i).
\]
Therefore the zeros of the polynomial, corresponding to the solutions of the given
equation, are: ±2, ±2i. We can obtain the same result also by applying formula
(A.3):
2 = 2[cos(0) + sin(0) i],
Solutions of some suggested exercises
3.4.15 k ≠ 3.
3.4.16 a) {(−2, 1, 0), (0, 0, 1)}.
⟨v1 , v2 , v3 ⟩.
4.5.11 k = −5, k = 2.
4.5.12 k ≠ ±6.
4.5.13 k ≠ − 23 .
4.5.17 (a) k ≠ 0, k ≠ 1/10.
10.8.4 Basis for W: {(0, −1, 1, 0), (1/2, −1/2, 0, 1)}. Orthonormal basis for W: {(0, −1/√2, 1/√2, 0), (2/√22, −1/√22, −1/√22, 4/√22)}. Orthonormal basis for W⊥: {(0, 1/√3, −1/√3, 1/√3)}. c) No.
10.8.7 a) Basis for W⊥: {(0, 1, −1, 1)}. b) Orthonormal basis for W: {(1/√3, −1/√3, 0, 1/√3), (√(2/3), 1/√6, 0, −1/√6), (0, 1/√6, √(2/3), 1/√6)}.
5x − 9y + 4z = 1.
13.6.6 a) Parametric equations for r: x = t, y = 1+t, z = −1/3−(2/3)t. b) Cartesian
equation for π: x − 3y − 3z + 2 = 0. c) Parametric equations for s: x = 1 + t, y = −3t,
z = 1 − 3t.
13.6.7 a) Parametric equations for r: x = −12 + 2t, y = 4, z = t. b) They are parallel. c) The distance is (2/3)√6.
[4] S. Lang. Undergraduate Algebra. Springer Science & Business Media, 2005.
[5] S. Lang. Introduction to Linear Algebra. Springer Science & Business Media,
2012.
Index
R², 26
Zn, 243
basis, 58
basis change for scalar products, 182
Bezout identity, 239
bilinear application, 177
bilinear form, 177
good ordering axiom, 235
Gram-Schmidt algorithm, 188
greatest common divisor, 238
hermitian product, 204
  non-degenerate, 205
  positive definite, 205
homogeneous equation, 1