CompleteLectureNotes Filled
CompleteLectureNotes Filled
ILLINOIS
Department of Mathematics
Course description
8 Linear algebra is used in most sciences and engineering areas,
because it allows modeling many natural phenomena, and ef-
ficiently computing with such models.
8 Mathematics is the language of science and engineering, but
coding is believing it.
8 But what is data science, if not linear algebra persevering?
This is a first course in linear algebra. This covers basic definitions
and algorithms of the subject needed in the higher level (engineer-
ing, science and economics) courses and more sophisticated math-
ematical techniques such as the Singular Value Decomposition.
In this course you learn the mathematical theory and how to imple-
ment it in Python. You will discover many of the striking modern
applications of linear algebra, such as Google’s PageRank algorithm,
image and audio compression schemes such as JPEG and MP3, au-
tomatic face recognition and other data science and machine learn-
ing algorithms.
Three disclaimers
o This is not a course that only teaches you how to compute stuff. Computer will always be
faster. Modern applications of linear algebra require a sophisticated understanding of
theory and methods of linear algebra, and learning these is the purpose of this course.
Some of it might look like abstract linear algebra. However, through the applications we
cover in the labs, you realizes that this indeed is applied linear algebra.
o If you already know some linear algebra, this course might look easy at the beginning.
Don’t be fooled into thinking it will stay like that. Even the material familiar to you will
be covered in more depth here. Furthermore, the exams will require a deeper
understanding of the concepts you already know something about. So it is a good idea to
take this course seriously from the beginning.
o MATH 257 is a new course. A significant part of the material for this new course is - not
surprisingly - new. So if you find a typo or an error in any part of this course, please let us
know by sending an email to the instructors. We appreciate your help, and are also happy
to hear any further comments or suggestions. Thank you!
Module videos Module videos are available at
i https://mediaspace.illinois.edu/playlist/0 w39larp3/
Further resources
Books
Philip N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer
Science, first edition, Newtonian Press
Feryal Alayont, Steven Schlicker, Linear Algebra and Applications:An Inquiry-Based
Approach, scholarworks.gvsu.edu/books/21/
David Cherney, Tom Denton, Rohit Thomas, Andrew Waldron, Linear Algebra,
www.math.ucdavis.edu/∼ linear/
Gilbert Strang, Linear Algebra and its Applications, fourth edition, Cengage.
Videos
i Essence of Linear Algebra by 3Blue1Brown, on Youtube (highly recommended!)
i MIT lectures by Gilbert Strang, MIT Open Courseware
i Coding the Matrix videos by Philip Klein, on Youtube
Table of Contents.
Module 1: Introduction to Linear Systems Module 17: Column spaces and Nullspaces
Module 2: Matrices and Linear Systems Module 18: Abstract vector spaces
Module 3: Echelon forms of matrices Module 19: Linear independence
Module 4: Gaussian elimination Module 20: Basis and Dimension
Module 5: Linear combinations Module 21: The four fundamental subspaces
Module 6: Matrix vector multiplication Module 22: Graphs and adjacency matrices
Module 7: Matrix multiplication Module 23: Orthogonal complements
Module 8: Properties of matrix multiplication Module 24: Coordinates
Module 9: Elementary matrices Module 25: Orthonormal basis
Module 10: Inverse of a matrix Module 26: Linear Transformations
Module 11: Computing an inverse Module 27: Coordinate matrix
Module 12: LU decomposition Module 28: Determinants
Module 13: Solving using LU decomposition Module 29: Cofactor expansion
Module 14: Spring-mass systems Module 30: Eigenvectors and eigenvalues
Module 15: Inner products and orthogonality Module 31: Computing eigenvalues
Module 16: Subspaces of Rn Module 32: Properties of eigenvectors
Table of Contents (ctd).
Module 48: Review of complex numbers
Module 33: Markov matrices
Module 49: Complex linear algebra
Module 34: Diagonalization
Module 35: Powers of Matrices
Module 36: Matrix exponential
Module 37: Linear Differential Equations
Module 38: Projections onto lines
Module 39: Projections onto subspaces
Module 40: Least squares solutions
Module 41: Linear Regression
Module 42: Gram-Schmidt Method
Module 43: Spectral theorem
Module 44: Singular Value Decomposition
Module 45: Low rank approximations
Module 46: The Pseudo-inverse
Module 47: PCA
LINEAR ALGEBRA
Introduction to Linear Systems
ILLINOIS
Department of Mathematics
Definition. A linear equation is a equation of the form
a1 x1 + . . . + an xn = b
where a1 , ..., an , b are numbers and x1 , ..., xn are variables.
Example. Which of the following equations are linear equations (or can be rearranged to
become linear equations)?
Solution.
Definition. A linear system is a collection of one or more linear equations involving the
same set of variables, say, x1 , x2 , ..., xn .
A solution of a linear system is a list (s1 , s2 , ..., sn ) of numbers that makes each equation in
the system true when the values s1 , s2 , ..., sn are substituted for x1 , x2 , ..., xn , respectively.
Example. Two equations in two variables:
x1 + x2 = 1 (I)
−x1 + x2 = 0. (II)
Multiply (III) by 2: x2
−2x1 − 2x2 = −6
(V)/(VI)
x1
(V) and (VI) has the same solutions. Any
value of x1 works, just set x2 := 3 − x1 .
Equation has infinitely many solutions.
Theorem 1. A linear system has either
one unique solution or no solution or infinitely many solutions.
Definition. The solution set of a linear system is the set of all solutions of the linear system.
Two linear systems are equivalent if they have the same solution set.
The general strategy is to replace one system with an equivalent system that is easier to solve.
Example. Consider
x1 − 3x2 = 1 (VII)
−x1 + 5x2 = 3 (VIII)
Transform this linear system into another easier equivalent system.
Solution.
Add (VII) to (VIII):
x1 − 3x2 = 1 (VII)
2x2 = 4 (IX)
Thus x2 = 2 and x1 = 7.
LINEAR ALGEBRA
Matrices and Linear Systems
ILLINOIS
Department of Mathematics
Definition. An m × n matrix is a rectangular array of numbers with m rows and n columns.
Example. Let’s give a few examples.
Solution.
1 2
√ 1 −2 3 4
1 2 1 2 3
, , 3 4, 1 + 5 , 5 −6 7 8
3 4 2 1 4
5 6 −9 −10 11 12
Remark. Indeed, every row operation is reversible. We already saw how to reverse the
replacement operator. The scaling operatior R2 → cR2 is reversed by the scaling operator
R2 → c1 R2 . Row interchange R1 ↔ R2 is reversible by performing it twice.
Definition. Two matrices are row equivalent, if one matrix can be transformed into the
other matrix by a sequence of elementary row operations.
Theorem 2. If the augmented matrices of two linear systems are row equivalent, then the
two systems have the same solution set.
LINEAR ALGEBRA
Echelon forms of matrices
ILLINOIS
Department of Mathematics
Definition. A matrix is in echelon form (or row echelon form) if
1. All nonzero rows (rows with at least one nonzero element) are above any rows of all zeros.
2. the leading entry (the first nonzero number from the left) of a nonzero row is always
strictly to the right of the leading entry of the row above it.
Example. Are the following matrices in echelon form? Circle the leading entries.
3 1 2 0 5 0 2 0 1 4
0 2 0 1 4 3 1 2 0 5
a)
0
b)
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
2 −2 3 0 1 √3
c) 0 5 0 d) 0 0 2
0 0 52 0 0 0
Definition. A matrix is in row reduced echelon form (or: reduced echelon form, or:
RREF) if it is in echelon form and
3. The leading entry in each nonzero row is 1.
4. Each leading entry is the only nonzero entry in its column.
Example.
Are the following matrices in reduced
echelon form?
0 1 3 0 0 2 5 0 0 6
1 0 1 12 0 0
0 0
0 −2 1 0 −2 3 2 −24
a)
0 0 0 0 1 −3 4 0 0 5 b) 0 1 −2 2 0 −7
0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 4
0 0 0 0 0 0 0 0 1 1
Theorem 3. Each matrix is row-equivalent to one and only one matrix in reduced echelon
form.
Definition. We say a matrix B is the reduced echelon form (or: the RREF) of a matrix A
if A and B are row-equivalent and B is in reduced echelon form.
Question. Is each matrix also row-equivalent to one and only one matrix in echelon from?
Solution.
1 0 1 2
No. For example, and are row-equivalent and both in echelon form.
0 1 0 1
1 4 5 −9 −7 1 4 5 −9 −7
; −1 −2 −1 3 1 ; 0 2 4 −6 −6
R1 ↔R3 R2 →R2 +R1
0 −3 −6 4 9 0 −3 −6 4 9
1 4 5 −9 −7
; 0 2 4 −6 −6
R3 →R3 +1.5R2
0 0 0 −5 0
The first, third and fifth column of the coefficient matrix are pivot columns.
ILLINOIS
Department of Mathematics
Goal. Solve linear systems for the pivot variables in terms of the free variables (if any) in the
equation.
Algorithm. (Gaussian Elimination) Given a linear system,
(1) Write down the augmented matrix.
(2) Find the reduced echelon form of the matrix.
(3) Write down the equations corresponding to the reduced echelon form.
(4) Express pivot variables in terms of free variables.
Example. Find the general solution of
3x1 −7x2 +8x3 −5x4 +8x5 = 9
C.F. Gauß (1777–1855)
3x1 −9x2 +12x3 −9x4 +6x5 = 15
Solution.
3 −7 8 −5 8 9 1 0 −2 3 5 −4
Augmented matrix:
3 −9 12 −9 6 15 RREF 0 1 −2 2 1 −3
Solution.
1 0 −2 3 5 −4
RREF of the augmented matrix:
0 1 −2 2 1 −3
Write down the equations corresponding to the reduced echelon form:
ILLINOIS
Department of Mathematics
a11 a12 ··· a1n b11 b12 ··· b1n
a21 a22 ··· a2n b21 b22 ··· b2n
Definition. Consider m × n-matrices A =
.. .. .. .. ,
and B =
.. .. .. .. .
. . . . . . . .
am1 am2 · · · amn bm1 bm2 · · · bmn
a) The sum of A + B is b)
The product cA fora scalar c is
ca11 ca12 · · · ca1n
a11 + b11 a12 + b12 · · · a1n + b1n
a21 + b21 a22 + b22 · · · a2n + b2n ca21 ca22 · · · ca2n
.. .. ..
.. .. .. .. ..
. . . . . . . .
am1 + bm1 am2 + bm2 · · · amn + bmn cam1 cam2 · · · camn
Example. Calculate
1 0 2 3 3 3 2 1 0 10 5 0
+ = 5 =
5 2 3 1 8 3 3 1 −1 15 5 −5
Warning. Addition is only defined if A has the same number of columns as B and the same
number of rows as B.
Definition. A column vector is an m × 1-matrix. A row vector is a 1 × n-matrix.
Example. Give a few ex- 0
1
amples of column and row , 1 2 , −2, 1 2 3 4 5 6 7 8 9
2
vectors. 3
Remark. The transpose of a column vector is a row vector and vice versa.
Definition. The linear combination of m × n-matrices A1 , A2 , . . . , Ap with coefficients
c1 , c2 , . . . , cp is defined as
c1 A1 + c2 A2 + · · · + cp Ap .
Example. Consider m × n-matrices A1 and A2 . Give examples of linear combinations of these
two matrices.
Solution.
1
3A1 + 2A2 , A1 − 2A2 , 3 A1 = 31 A1 + 0A2 , 0 = 0A1 + 0A2
Definition. The span (A1 , . . . , Ap ) is defined as the set of all linear combinations of
A1 , . . . , Ap . Stated another way:
span (A1 , . . . , Ap ) := {c1 A1 + c2 A2 + · · · + cp Ap : c1 , . . . , cp scalars}.
Definition. We denote the set of all column vectors of length m by Rm .
1 4 −1
Example. Let a1 = 0 , a2 = 2 and b = 8 . Is b a linear combination of a1 , a2 ?
3 14 −5
Solution.
Find x1 , x2 such that
1 4 −1
1
4 −1 ; 0 2 8
R3 →R3 −R2
x1 0 + x2 2 = 8
0 0 −10
3 14 −5 System is inconsistent. Thus b is not a linear
combination of a1 , a2 .
1x1 + 4x2 = −1
0x1 + 2x2 = 8 b
a1
3x1 + 14x2 = −5. span(a1 , a2 )
1 4 −1 1 4 −1
a2
0 2 8 ; 0 2 8
R3 →R3 −3R1
3 14 −5 0 2 −2
Key observation from the previous example: Solving linear systems is the same as finding
linear combinations!
Theorem 5. A vector equation
x1 a1 + x2 a2 + · · · + xn an = b
has the same solution set as the linear system whose augmented matrix is
a1 a2 · · · an | b
ILLINOIS
Department of Mathematics
Definition. Let x be a vector in Rn and A = a1 . . . an an m × n-matrix. We define the
product Ax by
Ax = x1 a1 + x2 a2 + . . . + xn an .
Remark.
○ Ax is a linear combination of the columns of A using the entries in x as coefficients.
○ Ax is only defined if the number of entries of x is equal to the number of columns of A.
1 2
2 0 2
Example. Consider A = ,B= 0 1 ,x=
. Determine Ax and Bx.
1 1 3
3 5
Solution.
2 0 2 2 0 4
Ax = =2 +3 =
1 1 3 1 1 5
1 2 1 2 8
2
Bx = 0 1 = 2 0 + 3 1 = 3
3
3 5 3 5 21
Example. Consider the following vector equation
1 3 0
x1 + x2 = .
2 4 2
Find a 2 × 2 matrix A such that (x1 , x2 ) is a solution to the above equation if and only if
x 0
A 1 = ?
x2 2
Solution.
1 3 1 3 x1 1 3
Take A = ; Ax = = x1 + x2 .
2 4 2 4 x2 2 4
x1 0 1 3 0
;A = ⇐⇒ x1 + x2 =
x2 2 2 4 2
Theorem 6. Let A = a1 . . . an be an m × n-matrix and b in Rm . Then the following
are equivalent
○ (x1 , x2 , . . . , xn ) is a solution of the vector equation x1 a1 + x2 a2 + · · · + xn an = b
x1
..
○ . is a solution to the matrix equation Ax = b.
xn
○ (x1 , x2 , . . . , xn ) is a solution of the system with augmented matrix A b
Notation. We will write Ax = b for the system of equations with augmented matrix A b .
x Ax
A
0 1
Example. Consider the matrix A = . What does this machine do?
1 0
Solution.
x2
x
Let x = 1 be our input.
x2
0 1 x1 0 1 x Av
Ax = = x1 +x2 = 2 .
1 0 x2 1 0 x1
v x1
So the machine A switches the entries of
Aww
the vector x.
Geometrically speaking, this machine re-
flects across the x1 = x2 -line.
1 0
Example. Consider the matrix B = . What does this machine do?
0 0
Solution.
x2
x
Let x = 1 be our input.
x2
1 0 x1 1 0 x
Bx = = x1 +x2 = 1 . v
0 0 x2 0 0 0 Bw
w Bv x1
So the machine B replaces the second entry
of the vector x by 0.
Geometrically speaking, this machine
projects a vector onto the x1 -axis.
Composition of machines. Let A be an m × n matrix and B be an k × l matrix.Now we can
compose the two machines:
x Ax B(Ax)
A B
x2 x2
v v
ABv Av
Bv x1 BAv x1
LINEAR ALGEBRA
Matrix multiplication
ILLINOIS
Department of Mathematics
Definition. Let A be an m × n-matrix and let B = b1 . . . bp be an n × p-matrix. We
define
AB := [Ab1 Ab2 · · · Abp ]
4 −2
2 −3
Example. Compute AB where A = 3 −5 and B =
.
6 −7
0 1
Solution.
4 −2 4 −2 −4
2
Ab1 = 3 −5 = 2 3 + 6 −5 = −24
6
0 1 0 1 6
4 −2 4 −2 2 −4 2
−3
= −3 3 − 7 −5 = 26 ; AB = Ab1 Ab2 = −24 26
Ab2 = 3 −5
−7
0 1 0 1 −7 6 −7
Remark. Ab1 is a linear combination of the columns of A and Ab2 is a linear combination
of the columns of A. Each column of AB is a linear combination of the columns of A using
coefficients from the corresponding columns of B.
Definition. Let A be an m × n-matrix and let B be an n × p-matrix. We define
If this is the case, then AB has as many rows as A and as many columns as B.
x Bx A(Bx)
B A
x (AB)x
AB
Example. Consider
2 0 1 2 x
A= , B= , x= 1
1 1 0 1 x2
Compute (AB)x and A(B(x)). Are these the same?
Solution.
1 2 x1 1 2 x1 + 2x2
Bx = = x1 + x2 =
0 1 x2 0 1 x2
2 0 x1 + 2x2 2 0 2x1 + 4x2
A(Bx) = = (x1 + 2x2 ) + (x2 ) =
1 1 x2 1 1 x1 + 3x2
x1 2 0 2 0
(AB)x = Ab1 Ab2 = x1 Ab1 + x2 Ab2 = x1 (1 +0 ) + x2 (2 +1 )
x2 1 1 1 1
2 4 2x1 + 4x2
= x1 + x2 =
1 3 x1 + 3x2
ILLINOIS
Department of Mathematics
Definition. The identity matrix In of size n is defined as
1 0 ... 0
0 1 . . . 0
In = . .. . . .. .
.. . . .
0 0 ... 1
Theorem 8. Let A be an m × n matrix and let B and C be matrices for which the indicated
sums and products are defined.
(a) A (BC ) = (AB)C (associative law of multiplication)
(b) A (B + C ) = AB + AC , (B + C ) A = BA + CA (distributive laws)
(c) r (AB) = (rA)B = A(rB) for every scalar r ,
(d) A (rB + sC ) = rAB + sAC for every scalars r , s (linearity of matrix multiplication)
(e) Im A = A = AIn (identity for matrix multiplication)
Warning. Properties above are analogous to properties of real numbers. But NOT ALL
properties of real number also hold for matrices.
1 1 1 0
Example. Let A = ,B= . Determine AB and BA. Are these matrices the
0 1 1 1
same?
Solution.
1 1 1 0
2 1
AB = =
0 1 1 1
1 1
1 0 1 1 1 1
BA = =
1 1 0 1 1 2
No. These matrices are not the same: AB 6= BA. Matrix multiplication is not commutative.
1 2
1 2 0
Example. Let A = ,B= 0 1 . Compute (AB)T , AT B T and B T AT .
3 0 1
Solution. −2 4
1 2
1 2 0 1 4
AB = 0 1 =
3 0 1 1 10
−2 4
T 1 1
(AB) =
4 10
1 3 7 3 10
1 0 −2
AT B T = 2 0 = 2 0 −4
2 1 4
0 1 2 1 4
1 3
T T 1 0 −2 1 1
B A = 2 0 =
2 1 4 4 10
0 1
Theorem 9. The transpose of a product is the product of transposes in opposite order:
(AB)T = B T AT
Definition. We write Ak for A · · · A, k-times; that is Ak is obtained by multiplying A k-times
with itself.
Question. For which matrices A does Ak make sense? If A is m × n what can m and n be?
Solution.
Remark. In general, calculating high powers of large matrix in this way is very hard. We will
later learn a clever way to do this efficiently.
LINEAR ALGEBRA
Elementary matrices
ILLINOIS
Department of Mathematics
Definition. An elementary matrix is one that is obtained by performing a single elementary
row operation on an identity matrix.
A permutation matrix is one that is obtained by performing row exchanges on an identity
matrix.
1 0 0 1 0 0 1 0 0
Example. Consider E1 = 0 3 0 , E2 = 0 0 1 , E3 = 0 1 0 . Are these
0 0 1 0 1 0 2 0 1
matrices elementary matrices?
Solution.
1 0 0 1 0 0
0 1 0 ; 0 3 0
R2 →3R2
0 0 1 0 0 1
1 0 0 1 0 0
0 1 0 ; 0 0 1
R2 ↔R3
0 0 1 0 1 0
1 0 0 1 0 0
0 1 0 ; 0 1 0
R3 →R3 +2R1
0 0 1 2 0 1
Question. Let A be a 3 × 3-matrix. What happens to A if you multiply it by one of E1 , E2
and E3 ?
Solution.
1 0 0 a11 a12 a13 a11 a12 a13
E1 A = 0 3 0 a21 a22 a23 = 3a21 3a22 3a23
0 0 1 a31 a32 a33 a31 a32 a33
1 0 0 a11 a12 a13 a11 a12 a13
E2 A = 0 0 1 a21 a22 a23 = a31 a32 a33
0 1 0 a31 a32 a33 a21 a22 a23
1 0 0 a11 a12 a13 a11 a12 a13
E3 A = 0 1 0 a21 a22 a23 = a21 a22 a23
2 0 1 a31 a32 a33 a31 + 2a11 a32 + 2a12 a33 + 2a13
Find elementary matrices E1−1 and E2−1 such that A = E1−1 E2−1 B.
Solution.
0 1 1 2 1 2
A= 1 2
0 1 0 1 = B
R ↔R R →R +2R1
2 4 1 2 2 4 3 3 0 0
0 1 0 1 0 0
Set E1−1 = 1 0 0 and E2−1 = 0 1 0.
0 0 1 2 0 1
LINEAR ALGEBRA
Inverse of a matrix
ILLINOIS
Department of Mathematics
The inverse of a real number a is denoted by a−1 . For example, 7−1 = 1/7 and
7 · 7−1 = 7−1 · 7 = 1.
Remark. Not all real numbers have inverse. 0−1 is not well defined, since there is no real
number b such that 0 · b = 1.
Remember that the identity matrix In is the n × n-matrix
1 0 ··· 0
0 1 · · · 0
In = . . .
. . . ..
. . . .
0 0 ··· 1
CA = AC = In
BA = AB = In (I)
CA = AC = In (II)
(B −1 A−1 )(AB) = B −1 IB = B −1 B = I X
(AB)(B −1 A−1 ) = AIA−1 = AA−1 = I X
(c)
It is unclear whether this means AB −1 or B −1 A, and these two matrices are different.
A(A−1 b) = (AA−1 )b = In b = b A
w = In w = A−1 Aw = A−1 b. x1
A must have n pivots, because otherwise Ax = b would not have a solution of each b.
LINEAR ALGEBRA
Computing the inverse
ILLINOIS
Department of Mathematics
Question. When is the 1 × 1 matrix a invertible?
Solution.
1
When a 6= 0. Its inverse is [ a ].
a b
Theorem 15. Let A = . If ad − bc 6= 0, then A is invertible and
c d
−1 1 d −b
A = .
ad − bc −c a
A = A0 ; A1 ; · · · ; Am = In
Em Em−1 . . . E1 A = In .
Thus
A−1 = Em Em−1 . . . E1 = Em Em−1 . . . E1 In .
Theorem 17. Suppose A is invertible. The every sequence of elementary row operations
that reduces A to In will also transform In to A−1 .
Algorithm.
○ Place A and I side-by-side to form an augmented matrix [ A | I ].
This is an n × 2n matrix (Big Augmented Matrix), instead of n × (n + 1).
○ Perform row operations on this matrix (which will produce identical operations on A and
I ).
○ By Theorem:
[ A | I ] will row reduce to I | A−1
or A is not invertible.
2 0 0
Example. Find the inverse of A = −3 0 1, if it exists.
0 1 0
Solution.
1
2 0 0 1 0 0 1 0 0 2 0 0
A I = −3 0 1 0 1 0 ∼ · · · ∼ 0 1 0 0 0 1
3
0 1 0 0 0 1 0 0 1 2 1 0
Example. Let’s do the previous example step by step.
Solution.
2 0 0 1 0 0
[ A | I ] = −3 0 1 0 1 0
0 1 0 0 0 1
1 0 0 12 0 0
; −3 0 1 0 1 0
R1 → 12 R1
0 1 0 0 0 1
1 0 0 12 0 0
; 0 0 1 3 1 0
2
R2 →R2 +3R1
0 1 0 0 0 1
1 0 0 12 0 0
; 0 1 0 0 0 1
R2 ↔R3
0 0 1 32 1 0
LINEAR ALGEBRA
LU decomposition
ILLINOIS
Department of Mathematics
Definition. An n × n matrix A is called
? ? ? ? ?
0 ? ? ? ?
0
upper triangular if it is of the form 0 ? ? ? ,
. .
0 . . ..
0 0
0 0 0 0 ∗
? 0 0 0 0
? ? 0 0 0
?
lower triangular if it is of the form ? ? 0 0 .
. . ..
? ? ? . .
? ? ? ? ?
Example. Give a few examples of upper and lower triangular matrices!
Solution.
Upper triangular
matrices:
Lower triangular matrices:
2 3 1 2 0 0
1 0 0 0 5 0
, 0 3 4 , , 0 3 0
0 1 0 1 2 1
0 0 2 1 −2 2
0 2
Neither upper nor lower triangular:
2 1
Theorem 18. The product of two lower (upper) triangular matrices is lower (upper)
triangular.
Example. Consider the row operation Ri → Ri + cRj where j < i. What can you say about
the elementary matrix E corresponding this row operation? What about E −1 ?
Solution.
Such a matrix is lower-triangular! Take R3 → R3 + cR2 , then
1 0 0
E = 0 1 0 .
0 c 1
1 0 0
Recall that E −1 = 0 1 0. So E −1 is also lower-triangular.
0 −c 1
Remark. The inverse of a lower (upper) triangular matrix (if it exists) is again lower (upper)
triangular.
Definition. A matrix A has an LU decomposition if there is a lower triangular matrix L and
a upper triangular matrix U such that A = LU.
Theorem 19. Let A be an n × n-matrix. If A can be brought to echelon form just using row
operations of the form Ri → Ri + cRj where j < i, then A has an LU-decomposition.
Proof.
A ; A1 ; A2 · · · ; Am = U
using row operations of the form Ri → Ri + cRj where j < i. Then there are lower-triangular
matrices E1 , . . . , Em such that
Em Em−1 . . . E1 A = U.
Multiply the equation by E1−1 . . . Em−1
−1
Em−1 , we get
A = E1−1 . . . Em−1
−1
Em−1 U.
| {z }
=:L
2 1 1 2 1 1 2 1 1 2 1 1
4 −6 0 ; 0 −8 −2 ; 0 −8 −2 ; 0 −8 −2
R2 →R2 −2R1 R3 →R3 +R1 R3 →R3 +R2
−2 7 2 −2 7 2 0 8 3 0 0 1
2 1 1 1 0 0 1 0 0 1 0 0 2 1 1
U = 0 −8 −2 = 0 1 0 0 1 0 −2 1 0 4 −6 0 = E3 E2 E1 A
0 0 1 0 1 1 1 0 1 0 0 1 −2 7 2
1 0 0 2 1 1 2 1 1
E1−1 E2−1 E3−1 U = 2 1 0 0 −8 −2 = 4 −6 0 = A
−1 −1 1 0 0 1 −2 7 2
| {z }
=:L
1 0 0
Remark. In previous example L = `21 1 0 where `ij is the factor between pivot and
`31 `32 1
the entry you want to make zero in the elimination process. This works in general if you do the
row operations in the right order.
Step 1:
2 1 1 2 1 1
A = 4 −6 0 ; = 0 −8 −2
−2 7 2 R2 →R2 −2 R1 ,R3 →R3 + 1 R1 0 8 3
Then set `21 = 2 and `31 = −1.
Step 2:
2 1 1 2 1 1
0 −8 −2 ; = 0 −8 −2 .
0 8 3 R3 →R3 + 1 ·R2 0 0 1
2 1 1 1 0 0 1 0 0
Set `32 = −1,‘ U := 0 −8 −2 , and L := `21 1 0 = 2 1 0 .
0 0 1 `31 `32 1 −1 −1 1
1 2 2
Example. Determine the LU-decomposition of 4 4 4.
4 4 8
Solution.
1 2 2 1 2 2 1 2 2
4 4 4 ; 0 −4 −4 ; 0 −4 −4 .
4 4 8 R2 →R2 −4 R1 ,R3 →R3 −4 R1 0 −4 0 R3 →R3−1 R2 0 0 4
Thus
1 2 2 1 0 0
U := 0 −4 −4 , L := 4 1 0 .
0 0 4 4 1 1
Check that A = LU.
Remark. It is important that you do the row operations in the right order!
LINEAR ALGEBRA
Solving linear systems using LU decomposition
ILLINOIS
Department of Mathematics
Theorem 20. Let A be an n × n-matrix such that A = LU, where L is a lower triangular
matrix and U is a upper triangular matrix, and let b ∈ Rn . In order to find a solution of the
linear system
Ax = b,
it is enough to find a solution of the linear system
Ux = c,
where c satisfies Lc = b.
Proof.
If Lc = b and Ux = c, then
Ax = (LU)x = L(Ux) = Lc = b
On the other hand, suppose Ax = b. We take c to be the vector Ux. Then Lc is equal to
L(Ux) = Ax = b, so in total we have both Ux = c and Lc = b.
2 1 1 x1 5
Example. Find a solution to the linear system 4
−6 0 x2 = −2.
−2 7 2 x3 9
Solution.
2 1 1 1 0 0 2 1 1
Recall the LU decomposition 4 −6 0 = 2 1 0 0 −8 −2 .
−2 7 2 −1 −1 1 0 0 1
| {z }| {z }
L U
1 0 0 c1 5
2 1 1 x1 5
2 1 0 c2 = −2 0 −8 −2 x2 = −12 .
−1 −1 1 c3 9
| {z } | {z } 0 0 1 x3 2
L b | {z } | {z }
U c
Forward substitution:
Backwards substitution:
Row 1 → c1 = 5.
Row 3 → x3 = 2.
Row 2 → 10 + c2 = −2; c2 = −12.
Row 2 → −8x2 − 4 = −12; x2 = 1.
Row 3 → −5 + 12 + c3 = 9; c3 = 2.
Row 1 → 2x1 + 1 + 2 = 5; x1 = 1.
Question. Why do we care about LU decomposition if we already have Gaussian elimination?
Solution.
Imagine you solve many linear systems Ax = b where A always stays the same, but b varies.
You have to do the LU-factorization of A only once, and not for every b.
Finding solutions of the two triangular systems above takes 2n2 − n arithmetic operations.
You need (4n3 + 9n2 − 7n)/6 arithmetic operations for Gaussian elimination.
Theorem 21. Let A be n × n matrix. Then there is a permutation matrix P such that PA
has an LU-decomposition.
Proof.
If A can be brought to echelon form with the help of row exchanges, we can do those exchange
first.
So there is a permutation matrix P such that PA can be brought to echelon form without row
exchanges.
0 0 1
Example. Let A = 1 1 0. Find a permutation matrix P such that PA has a LU
2 1 0
decomposition.
Solution.
0 0 1 2 1 0 0 0 1
A = 1 1 0 ; 1 1 0 Set P = 0 1 0 .
R1 ↔R3
2 1 0 0 0 1 1 0 0
2 1 0 2 1 0
PA = 1 1
0 ; 0 .5 0
R2 →R2 −.5R1
0 0 1 0 0 1
| {z }
=:U
Thus
2 1 0
1 0 0
2 1 0
PA = 1 1 0 = .5 1 0 0 .5 0
0 0 1 0 0 1 0 0 1
| {z }
=:L
LINEAR ALGEBRA
Spring-mass systems
ILLINOIS
Department of Mathematics
Example. Consider the following spring-mass system, consisting of five masses and six
springs fixed between two walls.
m1 m2 m3 m4 m5
Goal.
○ Add (steady) applied forces f1 , f2 , f3 , f4 , f5 a (steady) applied force on mass i.
○ Compute the displacements u1 , u2 , u3 , u4 , u5 of the fives masses.
u1 u2 u3 u4 u5
m1 m2 m3 m4 m5
f1 f2 f3 f4 f5
Equilibrium. In the equilibrium, the forces at each mass add up to 0.
Hooke’s law. The force F needed to extend or compress a spring by some distance u is
proportional to that distance; that is F = −ku, where k is a constant factor characteristic of
the spring, called its stiffness. Let ki be the stiffness of the i-th spring.
Springs:
○ u1 > 0 ; spring 1 is extended.
○ u2 − u1 > 0 ; spring 2 is extended.
u1 u2 Forces
u3 at m1 : u4 u5
f1 f2 ○ applied forces f1 (if positive, pushes m1 to the right).
m1 m2 ○m spring 1: −k1 u1 (since
m4 u1 > 0, pulls m m15 to the left).
3
○ spring 2: k2 (u2 − u1 ) (since u2 − u1 > 0, pulls m1 to
f3
the right). f4 f5
−k1 u1 k2 (u2 − u1 )
Equilibrium:
m1 m2 m3 f3 ○ spring m
2: −k2 (u2 − u1 ) (since (u2 − u1 ) > 0, pulls
m5
4 left).
m2 to the
○ spring 3: k3 (u3f4− u2 ) (since u3 − fu52 > 0, pulls m2 to
−k2 (u2 − u1 ) k3 (u3 − u2 ) the right).
Equilibrium at m2 :
f2 − k2 (u2 − u1 ) + k3 (u3 − u2 ) = 0
; −k2 u1 + (k2 + k3 )u2 − k3 u3 = f2
Springs:
○ u5 − u4 > 0 ; spring 5 is extended.
○ u5 > 0 ; spring 6 is compressed.
Forces at m5 :
u4 u5
f4 f5 ○ applied forces f5 (if positive, pushes m5 to the right).
○ spring 5: −k5 (u5 − u4 ) (since (u5 − u4 ) > 0, pulls
m4 m5 m5 to the left).
○ spring 6: −k6 u5 (since u5 > 0, pushes m5 to the
−k5 (u5 − u4 ) −k6 u5 left).
Equilibrium at m5 :
f5 − k5 (u5 − u4 ) − k6 u5 = 0
; (k5 + k6 )u5 − k5 u4 = f5
Equilibrium equations.
(k1 + k2 )u1 − k2 u2 = f1 k1 + k2 −k2 0 0 0 f1
−k2 u1 + (k2 + k3 )u2 − k3 u3 = f2
−k2 k2 + k3 −k3 0 0 f2
0 −k3 k3 + k4 −k4 0 f3
−k3 u2 + (k3 + k4 )u3 − k4 u4 = f3
0 0 −k4 k4 + k5 −k5 f4
−k4 u3 + (k4 + k5 )u4 − k5 u5 = f4 0 0 0 −k5 k5 + k6 f5
−k5 u4 + (k5 + k6 )u5 = f5 .
Remark.
○ The purpose of this example is not to find the precise solutions to these equations, but
rather to show you that linear equations appear naturally in engineering.
○ Finite element method: object is broken up into many small parts with connections
between the different parts ; gigantic spring-mass system where the forces from spring
correspond to the interaction between the different parts.
○ In practice: millions of equations, not just five!
○ Solve not for a single force vector f, but for many different vectors. Thus the coefficient
matrix stays the same ; LU-decomposition can make a difference in how quickly such
systems can be solved.
LINEAR ALGEBRA
Inner Product and Orthogonality
ILLINOIS
Department of Mathematics
Definition. The inner product of v, w ∈ Rn is
v · w = vT w.
v1 w1
.. ..
Example. If v = . and w = . , then v · w is ...
vn wn
Solution.
v1 w1 + v2 w2 + · · · + vn wn
Question. Why is v · w = w · v?
Solution.
Note that (v · w)T = v · w, because it is a 1 × 1-matrices. Thus
v · v = v1 v1 + · · · + vn vn = v12 + · · · + vn2 ≥ 0.
|{z} |{z}
≥0 ≥0
Theorem 22. Let u, v and w be vectors in Rn , and let c be any scalar. Then
(a) u · v = v · u
(b) (u + v) · w = u · w + v · w
(c) (cu) · v =c (u · v) = u · (cv)
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.
Definition. Let v, w ∈ Rn . x2
The norm (or length) of v is
√ q v−w
kvk = v · v = v12 + · · · + vn2 . w v
v
2 u 2 2 2 0 2
u
q √ √
k−1k = t−1 · −1 = 22 + (−1)2 + 12 = 6, dist(0 , 1) = k−1k = 6.
u
1 1 1 1 0 1
Definition. Let v, w ∈ Rn . We say v and w are orthogonal if v · w = 0.
Theorem 23. Let v, w ∈ Rn be non-zero. Then v and w are orthogonal if and only if they
are perpendicular (ie. if they form a right angle).
Proof.
v−w v⊥w
⇐⇒ kvk2 + kwk2 = kv − wk2
⇐⇒ v · v + w · w = (v − w) · (v − w)
w ⇐⇒ v · v + w · w = v · v − 2v · w + w · w
v ⇐⇒ v · w = 0
1 1 1 1
Example. Are , orthogonal? Are , orthogonal?
1 −1 1 −2
Solution.
1 1 1 1
· = 1 · 1 + 1 · (−1) = 0, · = 1 · 1 + 1 · (−2) = −1
1 −1 1 −2
Definition. A set of vectors form an orthonormal set if this set is an orthogonal set and all
vectors in the set are unit vectors.
1 1
Example. Let v1 = , v2 = . Do v1 and v2 form an orthonormal set?
1 −1
Solution.
ILLINOIS
Department of Mathematics
Definition. A non-empty subset H of Rn is a subspace of Rn if it satisfies the following two
conditions:
○ If u, v ∈ H, then the sum u + v ∈ H. (H is closed under vector addition).
○ If u ∈ H and c is a scalar, then cu ∈ H. (H is closed under scalar multiplication.)
Theorem 24. Let v1 , v2 , . . . , vm ∈ Rn . Then Span (v1 , v2 , . . . , vm ) is a subspace of Rn .
Proof.
Let
u = c1 v1 · · · + cm vm , and w = d1 v1 · · · + dm vm .
Then
and
cu = c(c1 v1 · · · + cm vm ) = cc1 v1 · · · + ccm vm .
x
Example. Is H = : x ∈ R a subspace of R2 ?
x
Solution.
x2 a b
H Let , be in H.
a b
a b a+b
○ + = is in H.
x1 a b a+b
a ca
○ c = is in H.
a ca
0
Example. Let Z = . Is Z a subspace of R2 ?
0
Solution.
Yes.
0 0 0+0 0 c0 0
+ = ∈ Z, c = = ∈ Z.
0 0 0+0 0 c0 0
x
Example. Let H = : x ∈ R . Is H a subspace of R2 ?
x +1
Solution.
x2 H
H is not a subspace of R2 , because H is not closed under addi-
tion:
0 −1 0 −1
x1 , ∈ H, but + ∈
/ H.
1 0 1 0
x
Example. Is U = ∈ R2 : x 2 + y 2 < 1 a subspace of R2 ?
y
Solution.
x2 U is not a subspace, because U is not closed under scalar multi-
plication:
U
x1 −0.5 −0.5 −1
∈ U, but 2 = ∈
/ U.
0.5 0.5 1
x
Example. Consider V = ∈ R2 : xy ≥ 0 .
y
Solution.
x2 V is not a subspace, because it is not closed under addition:
V
0 −1 0 −1 −1
, ∈ V , but + = ∈
/ V.
x1 1 0 1 0 1
ILLINOIS
Department of Mathematics
Definition. The column space, written as Col(A), of an m × n matrix A is the set of all
linear combinations of the columns of A. If A = a1 a2 · · · an , then
Col(A) = span (a1 , a2 , . . . , an ).
1 0
Example. Describe the column space of A = .
0 0
Solution.
1 0 1
Col(A) = span( , ) = span( ). This is the x1 axis in R2 .
0 0 0
A (u + v) = Au + Av = 0 + 0 = 0.
A (v − w) = Av − Aw = b − b = 0.
ILLINOIS
Department of Mathematics
○ The most important property of column vectors in Rn is that you can take linear
combinations of them.
○ There are many mathematical objects X , Y , . . . for which a linear combination cX + dY
make sense, and have the usual properties of linear combination in Rn .
○ We are going to define a vector space in general as a collection of objects for which linear
combinations make sense.The objects of such a set are called vectors.
x2 f (x) = x + 1
3 (f + g )(x) = sin x + x
2
1 1 1
2· + 2·
x1
−2−1 1 2 g (x) = sin x − 1
−1
2g (x) = 2 sin x − 2
=
Definition. A vector space is a non-empty set V of objects, called vectors, for which linear
combinations make sense. More precisely: on V there are defined two operations, called
addition and multiplication by scalars (real numbers),satisfying the following axioms for all
u, v , w ∈ V and for all scalars c, d ∈ R:
○ u + v is in V . (V is “closed under addition”.)
○ u + v = v + u.
○ (u + v) + w = u + (v + w).
○ There is a vector (called the zero vector) 0V in V such that u + 0V = u.
○ For each u in V , there is a vector −u in V satisfying u + (−u) = 0V .
○ cu is in V . (V is “closed under scalar multiplication”.)
○ c(u + v) = cu + cv.
○ (c + d)u = cu + du.
○ (cd)u = c(du).
○ 1u = u.
In particular, we may talk about linear combinations and span within a vector space, e.g.
3u + 2v or span(u, v).
Example. Explain how the set of functions R → R is a vector space.
Solution.
Let f , g be two function from R to R. We define f + g by
y
g
(f + g )(x) = f (x) + g (x).
Pn = {a0 + a1 t + a2 t 2 + · · · + an t n : a0 , . . . , an ∈ R}.
Explain why it is a vector space. Can you think of it as a subspace of the vector space of all
functions R → R?
Solution.
ILLINOIS
Department of Mathematics
Definition. Vectors v1 , . . . , vp are said to be linearly independent if the equation
x1 v1 + x2 v2 + · · · + xp vp = 0
Theorem 29. Vectors v1 , . . . , vp are linear dependent if and only if there is i ∈ {1, . . . , p}
such that vi ∈ span(v1 , . . . , vi−1 , vi+1 , . . . , vp ).
Question. A single non-zero vector v1 is always linearly independent. Why?
Solution.
Question. Two vectors v1 , v2 are linearly independent if and only if neither of the vectors is a
multiple of the other. Why?
Solution.
For example, let A = a1 a2 a3 and suppose that a1 , a3 are the pivot columns.
; The matrix a1 a3 has 2 pivots.
ILLINOIS
Department of Mathematics
Definition. Let V be a vector space. A sequence of vectors (v1 , . . . , vp ) in V is a basis of V
if
○ V = span (v1 , . . . , vp ) , and
○ (v1 , . . . , vp ) are linearly independent.
1 0 1 1
Example. Check that both , and , are bases of R2 .
0 1 1 −1
Solution.
Linear independence: x2
1 0 1 1 1 1
; 2 pivots , ; ; 2 pivots .
0 1 1 −1 0 −2
Spanning set: x1
a 1 0 a+b 1 a−b 1
=a +b = + .
b 0 1 2 1 2 −1
Theorem 31. Every two bases in a vector space V contain the same numbers of vectors.
Definition. The number of vectors in a basis of V is the dimension of V .
Example. What is the dimension of Rn ?
Solution.
Let
1
0
0
0 1 0
e1 = 0 , e2 = 0 , . . . , en = 0
.. .. ..
. . . x3
0 0 1
e3
Note that (e1 , e2 , . . . , en ) is a basis of Rn : e2 x2
v1 x1 e1
..
e1 . . . en = In ; n pivots,
. = v1 e1 + · · · + vn en .
vn
Thus dim Rn = n.
Theorem 32. Suppose that V has dimension d. Then
○ A sequence of d vectors in V are a basis if they span V .
○ A sequence of d vectors in V are a basis if they are linearly independent.
Proof.
ILLINOIS
Department of Mathematics
Algorithm. To find a basis for Nul(A):
○ Find the parametric form of the solutions to Ax = 0.
○ Express solutions x as a linear combination of vectors with the free variables as coefficients.
○ Use these vectors as a basis of Nul(A).
3 6 6 3 9
Example. Find a basis for Nul(A) where A = .
6 12 15 0 3
Solution.
3 6 6 3 9 0 1 2 0 5 13 0 x = −2x2 − 5x4 − 13x5
; 1
6 12 15 0 3 0 RREF 0 0 1 −2 −5 0 x3 = 2x4 + 5x5
Theorem 35. Let A be an m × n matrix with rank r . The pivot columns of A form a basis
of Col(A). In particular, dim Col(A) = r .
Proof.
Let U be RREF of A.
; Then the pivot columns of U are linearly independent.
; By Remark, pivot columns of A are linearly independent.
1 2 0 4
2 4 −1 3
Example. Find a basis for Col(A) where A =
3
.
6 2 22
4 8 0 16
Solution.
1 2 0 4 1 2 0 4
2 4 −1 3 0
; U= 0 1 5
A=
3
6 2 22 RREF 0 0 0 0
4 8 0 16 0 0 0 0
This are the pivot columns, the first and third.
1 0
2 −1
Hence, a basis for Col(A) is 3 2 . An aside: Note that for U we have column
,
4 0
u2 = 2u1 and u4 = 4u1 + 5u3 .
The same is true for the columns of A!
1 3 1 3
Example. Let A = . The RREF of A is U = . Is Col(A) = Col(U)? Is
2 6 0 0
Col(AT ) = Col(U T )?
Solution.
The second columns of A and U are redundant: The second rows of A and U are redundant:
1 1 T 1 1
Col(A) = span( ) 6= span( ) = Col(U) Col(A ) = span( ) = span( ) = Col(U T )
2 0 3 3
Row operations do not preserve the column Row operations preserve the row space.
space.
4 8 0 16
Solution.
1 2 0 4 1 2 0 4 1 0
2 4 −1 3 0 0 1 5 2 0
A= ; U= ; , basis of Col(AT ).
3 6 2 22 RREF 0 0 0 0 0 1
4 8 0 16 0 0 0 0 4 5
Theorem 38. Let A be an m × n matrix with rank r . Then
○ dim Col(A) = dim Col(AT ) = r .
○ dim Nul(A) = n − r .
Question. What is dim Nul(AT )? What is dim Col(A) + dim Nul(A)?
Solution.
ILLINOIS
Department of Mathematics
Definition. A graph is a set of nodes (or: vertices) that are connected through edges.
Example. A graph with 4 nodes and 5 edges:
1 2
0 1 1 0
1 1 1 0
1 1 0 1
0 0 1 0
3 4
Definition. Let G be a graph with n nodes. The adjacency matrix of G is the n × n-matrix
A = (aij ) such that
(
1 if there is an edge between node i and node j
aij =
0 otherwise .
Definition. A walk of length k on a graph of is a sequence of k + 1 vertices and k edges
between two nodes (including the start and end) that may repeat.A path is walk in which all
vertices are distinct.
Example. Count the number of walks of length 2 from node 2 to node 3 and the number of
walks of length 3 from node 3 back to node 3:
1 2
3 4
Definition. A graph is connected if for every pair of nodes i and j there is a walk from node
i to node j. A graph is disconnected if it is not connected.
Theorem 39. Let G be a graph and let A be its adjacency matrix. Then the entry in the
i-th row and j-th column of A` is the number of walks of length ` from node j to node i on G.
Proof.
We just consider the case ` = 2. Then the entry in the i-th row and j-th column of A2 is:
aj1
(A2 )ij = ai1 . . . ain ... = ai1 a1j + ai2 a2j + · · · + ain anj
ajn
ai2 a2j = 1 ⇐⇒ there is an edge between node j and node 2 and an edge between node 2 and i.
Since each walk from node j to node i has go through some node,the sum
A graph with one connected component: A graph with two connected components:
1 1 2
1 2
3 1 2
2 4
5 3 4
3 4
Theorem 40. Let G be a directed graph and let A be its edge-node incidence matrix. Then
dim Nul(A) is equal to the number of connected components of G.
Example. We explain the idea in the case of the following graph G. Let A be its edge-node
incidence matrix, a 5 × 4-matrix. Note that Nul(A) ⊆ R4 .
1 x1
1 2 x2
3 x3 as assigning to potential to each node. If x ∈ Nul(A):
Think of x =
2 4
5 x
4 −1 1
0 0
−x1 + x2
3 4
0 −1 0 x1 c
1 0 −x1 + x3
0 x
2 c
x3 = −x2 + x3 ; x = c
−1 1 0 0
= Ax = 0
0 1 −1 0
−1 0 0 −1 0 1 −x2 + x4
1 0 0 x4 c
0 0 −1 1 −x3 + x4
0 1 −1 0
0 −1 0 1 Each edge forces the potential of the nodes it connects to be equal. Thus
0 0 −1 1 if two nodes are connected by a path, they have the same assigned poten-
tial.
1 2
Example. Find a basis of the null space of the edge-node
incidence matrix of the following graph: 1 2
3 4
Solution.
x1
x2 4
x3 ∈ R .
Let A be the edge-node incidence matrix. It is a 2 × 4-matrix. Let x =
If Ax = 0, then x4
ILLINOIS
Department of Mathematics
1 2
Example. Let A = 2 4. Find bases of Nul(A) and Col(AT ).
3 6
Solution.
x2
1 2
A ; 0 0 Nul(A)
RREF Col(AT )
0 0
x1
−2 T 1
Nul(A) = span , Col(A ) = span
1 2
; v · u = (AT w)T u = wT Au = wT 0 = 0.
(Av)T = vT R1 R2 . . . Rm = vT R1 vT R2 . . . vT Rm = 0 0 . . . 0 .
So Av = 0. Thus v ∈ Nul(A).
Remark. It follows that
○ Nul(A)⊥ = Col(AT ).
○ Nul(AT ) = Col(A)⊥ .
Theorem 43. Let V be a subspace of Rn . Then dim V + dim V ⊥ = n.
Proof.
Write V = Col(A) for some n × m matrix A.
dim V + dim V ⊥ = dim Col(A) + dim Nul(AT ) = dim Col(AT ) + dim Nul(AT ) = n
1 0
Example. Find a basis of the orthogonal complement of span 0 , 1.
Solution. 0 1
Strategy: Find A such that span is Col(A). Then find a basis of
Nul(AT ). x3
1 0
1 0 0 b
A = 0 1 ; AT = a2
Transpose 0 1 1 x2
0 1
a1
x1 0 0 0
x2 = −x3 = x3 −1 ; −1 basis of Nul(AT ). Col(A) x1
x3 x3 1 1
LINEAR ALGEBRA
Coordinates
ILLINOIS
Department of Mathematics
Theorem 44. Let (v1 , . . . , vp ) be a basis of V . Then every vector w in V can be expressed
uniquely as
w = c1 v1 + · · · + cp vp .
Proof.
Let c1 , . . . , cp and d1 , . . . , dp such that w = c1 v1 + · · · + cp vp = d1 v1 + · · · + dp vp .
vB,n
Question. Let B = (b1 , . . . , bn ) be a basis of Rn . How do you compute IB,En ?
Solution.
v = IEn ,B vB ; IE−1
n ,B
v = vB ; IB,En = IE−1
n ,B
ILLINOIS
Department of Mathematics
Theorem 46. Let v1 , . . . , vm ∈ Rn be non-zero and pairwise orthogonal.Then v1 , . . . , vm are
linearly independent.
Proof.
Suppose that
c1 v1 + · · · + cm vm = 0.
Take the inner product of v1 on both sides.
0 = v1 · (c1 v1 + · · · + cm vm )
= c1 v1 · v1 + c2 v1 · v2 + · · · + cm v1 · vm
= c1 v1 · v1 = c1 kv1 k2
b1 · v = b1 · (c1 b1 + c2 b2 + · · · + cn bn )
= c1 b1 · b1 + c2 b1 · b2 + · · · + cn b1 · bn
v · b1
= c1 b1 · b1 ; c1 =
b1 · b1
−1
1 1 −1 1 1 1
IU ,E2 = IE−1
2 ,U
= √ =√ = IET2 ,U
2 1 1 2 −1 1
Theorem 48. Let U = (u1 , . . . , un ) be an orthonormal basis of Rn . Then
T
IU ,En = u1 . . . un .
Proof.
T T
u1 · v u1 v u1
.. .. .. T
vU = . = . = . v ; IU ,En = u1 . . . un
un · v uT
nv uTn
ILLINOIS
Department of Mathematics
Definition. Let V and W be vector spaces. A map T : V → W is a linear transformation if
The left-hand side is T (cv + dw) and the right-hand side is cT (v) + dT (w).
Example. Let Pn be the vector space of all polynomials of degree at most n. Consider the
d
map T : Pn → Pn−1 given by T (p(t)) := dt p(t). This map is linear! Why?
Solution.
Take any v ∈ V .
It can be written as v = c1 v1 + · · · + cn vn because (v1 , . . . , vn ) is a basis and hence spans V .
Hence by the linearity of T :
x2
T
1 1 0 v
T =T 1 +2 e2
2 0 1
x3 T (e1 )
1 0 e1 x1
=T + 2T
0 1
x2
x1
1 0 1
= 2 + 2 0 = 2
3 −2 −1 T (v)
T (e2 )
Theorem 50. Let T : Rn → Rm be a linear transformation. Then there is a m × n matrix A
such that
○ T (v) = Av, for all v ∈ Rn .
○ A = T (e1 ) T (e2 ) . . . T (en ) , where (e1 , e2 , . . . , en ) is the standard basis of Rn .
Proof.
v1
..
Let v = . . We can write v = v1 e1 + v2 e2 + · · · + vn en . Then
vn
T (v) = T (v1 e1 + v2 e2 + · · · + vn en )
= v1 T (e1 ) + v2 T (e2 ) + · · · + vn T (en )
v1
.
= T (e1 ) T (e2 ) . . . T (en ) .. = Av.
vn
Remark. We call this A the coordinate matrix of T with respect to the standard bases - we
write TEm ,En .
Example. Let Tα : R2 → R2 be the “rotation over α radians (counterclockwise)” map, that
is Tα (v) is the vector obtained by rotating v over angle α. Find the 2 × 2 matrix Aα such that
Tα (v) = Aα v for all v ∈ R2 .
Solution.
ILLINOIS
Department of Mathematics
Last time: Linear transformation is matrix multiplication. For every linear transformation
T : Rn → Rm , there is an m × n matrix A such that T (v) = Av.
Today: The same in abstract vector spaces.
Theorem 51. Let V , W be two vector space, let B = (b1 , . . . , bn ) be a basis of V and
C = (c1 , . . . , cm ) be a basis of W , and let T : V → W be a linear transformation. Then there
is a m × n matrix TC,B such that
○ T (v)C = TC,B vB , for all v ∈ V .
○ TC,B = T (b1 )C T (b2 )C . . . T (bn )C .
apply T
v : vector in V / vector in W : T (v)
2
0 1 0
DC,B = D(1)C D(t)C D(t )C =
0 0 2
We compute
ILLINOIS
Department of Mathematics
Definition. The determinant
of
a b
○ a 2 × 2 matrix is det = ad − bc,
c d
○ a 1 × 1 matrix is det([a]) = a.
Remark. Recall that −1
a b 1 d −b
= .
c d ad − bc −c a
A is invertible ⇐⇒ det(A) 6= 0.
a b a b
Notation. We will write both det and for the determinant.
c d c d
Definition. The determinant is the operation that assigns to each n × n-matrix a number
and satisfies the following conditions:
○ (Normalization) det In = 1,
○ It is affected by elementary row operations as follows:
o (Replacement) Adding a multiple of one row to another row does not change the
determinant.
o (Interchange) Interchanging two different rows reverses the sign of the determinant.
o (Scaling) Multiplying all entries in a row by s, multiplies the determinant by s.
2 3 3
Example. Compute det 0 1 2 .
0 0 6
Solution.
1 1
R2 →R2 − 3 R3 R1 → 2 R1
2 3 3 1
R1 →R1 − 2 R3
2 3 0 2 0 0 1
R3 → 6 R3
1 0 0
R1 →R1 −3R2
0 1 2 = 0 1 0 = 0 1 0 = 2 · (6) 0 1 0 = 12
0 0 6 0 0 6 0 0 6 0 0 1
Theorem 53. The determinant of a triangular matrix is the product of the diagonal entries.
1 2 0
Example. Compute 3 −1 2 .
2 0 1
Solution.
1 2 0 R2→R2−3R1 1 2 0
R3→R3−2R1
3 −1 2 = 0 −7 2
2 0 1 0 −4 1
R3→R3− 74 R2
1 2 0
= 0 −7 2
0 0 − 17
1
= 1 · (−7) · (− ) = 1
7
a b
Example. (Re)Discover the formula for .
c d
Solution.
a b R2→R2− ca R1 a b c
= = a d − b = ad − bc
c d 0 d − ca b a
NB: this only works if a 6= 0. What do you do if a = 0? Swap rows!
det(A) = 0 ⇐⇒ det(B) = 0,
because elementary row operations do not change whether the determinant is 0 or not.
1
1 = det(In ) = det(AA−1 ) = det(A) det(A−1 ) ; det(A−1 ) = .
det(A)
Theorem 56. Let A be an n × n-matrix. Then det(AT ) = det(A).
Proof.
Let P be a permutation matrix, L be a lower-triangular matrix and U be an upper triangular
matrix such that PA = LU.
For simplicity, assume P is the permutation matrix of a single row swap. Then P = P T .
ILLINOIS
Department of Mathematics
Notation. Let A be an n × n-matrix. We denote by Aij the matrix obtained from matrix A by
deleting the i-th row and j-th column of A.
1 2 3 4
5 6 7 8
Example. Let A = 9 10 11 12. Find A23 and A43 .
13 14 15 16
Solution.
1 2 4 1 2 4
A23 = 9 10 12 , A43 = 5 6 8
13 14 16 9 10 12
Definition. Let A be an n × n-matrix. The (i, j)-cofactor of A is the scalar Cij defined by
Cij = (−1)i+j det Aij .
Theorem 57. Let A be an n × n-matrix. Then for every i, j ∈ {1, . . . , n}
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin (expansion across row i)
= a1j C1j + a2j C2j + · · · + anj Cnj (expansion down column j)
1 2 0
Example. Compute 3 −1 2 by cofactor expansion across row 1.
2 0 1
Solution.
1 2 0 + − +
3 −1 2 = 1·(−1)1+1 · −1 2 +2·(−1)1+2 · 3 2 +0·(−1)1+3 · 3 −1
2 0 1 0 1 2 1 2 0
−1 2 3 2 3 −1
=1· −2· +0· = 1 · (−1) − 2 · (−1) + 0 = 1
0 1 2 1 2 0
1 2 0
Example. Compute 3 −1 2 by cofactor expansion down column 2 and by cofactor
2 0 1
expansion down column 3.
Solution.
1 2 0 − 1 0 1 0
1+2 2+2 3+2
3 −1 2 = 2·(−1) · 3 2 +(−1)·(−1) · + +0·(−1) · 3 2
2 0 1 2 1 2 1 −
= −2 · (−1) + (−1) · 1 − 0 = 1
1 2 0 + 1 2 1 2
3 −1 2 = 0·(−1)1+3 · 3 −1 +2·(−1)2+3 · − +1·(−1)3+3 · 3 −1
2 0 1 2 0 2 0 +
= 0 − 2 · (−4) + 1 · (−7) = 1
Question. Why is the method of cofactor expansion not practical for large n?
Solution.
To compute the determinant of a large n × n matrix,
o one reduces to n determinants of size (n − 1) × (n − 1),
o then n(n − 1) determinants of size (n − 2) × (n − 2),
o and so on.
In the end, we have n! = n(n − 1) · · · 3 · 2 · 1 many numbers to add.
WAY TOO MUCH WORK! Already
Remark. The cofactor expansion is also called Laplace expansion, because it is due
Pierre-Simon Laplace.
LINEAR ALGEBRA
Eigenvectors and Eigenvalues
ILLINOIS
Department of Mathematics
Definition. Let A be an n × n matrix. An eigenvector of A is a nonzero v ∈ Rn such that
Av = λv,
0 1 x1 x
= 2
1 0 x2 x1 x2
1 1 1
A = =1· v2 v1 Av1
1 1 1
1
; is an eigenvector with eigenvalue 1. x1
1
Av2
−1 1 −1
A = = −1 ·
1 −1 1
−1
; is an eigenvector with eigenvalue −1.
1
1 0
Example. Find the eigenvectors and the eigenvalues of B = .
0 0
Solution.
1 0 x1 x
= 1 x2
0 0 x2 0
1 1 1
B = =1· . v2
0 0 0
v1
1
; is an eigenvector with eigenvalue 1. Bv2 Bv1 x1
0
0 0 0
B = =0
1 0 1
0
; is an eigenvector with eigenvalue 0.
1
Definition. Let λ be an eigenvalue of A. The eigenspace of A associated with λ is the set
It consists of all the eigenvectors of A with eigenvalue λ and the zero vector.
0 1 1 0
Example. Draw the eigenspaces of the two matrices A = and B = .
1 0 0 0
Solution.
x2 x2
Eig−1 (A) Eig0 (A)
x x x x
A =1· B =1·
Eig1 (A) x x 0 0
Eig1 (A)
x1 x1
−x −x 0 0
A = −1· B =0·
x x x x
LINEAR ALGEBRA
Computing Eigenvalues and Eigenvectors
ILLINOIS
Department of Mathematics
Theorem 58. Let A be an n × n matrix and λ be a scalar. Then λ is an eigenvalue of A if
and only if det(A − λI ) = 0.
Proof.
λ is an eigenvalue of A.
⇔ there is a non-zero v ∈ Rn such that Av = λv.
⇔ there is a non-zero v ∈ Rn such that Av − λv = 0.
⇔ there is a non-zero v ∈ Rn such that (A − λI )v = 0.
⇔ Nul(A − λI ) 6= {0}.
⇔ A − λI is not invertible.
⇔ det(A − λI ) = 0.
3−λ 1
det(A − λI ) = = (3 − λ)2 − 1
1 3−λ
= λ2 − 6λ + 8 = 0 ; λ1 = 2, λ2 = 4
Eigenvalue λ2 = 4:
−1 1 1 −1
A − 4I = ;
1 −1 RREF 0 0
x1 x2 1 1
;x= = = x2 ; Nul(A − 4I ) = span
x2 x2 1 1
3 2 3
Example. Find the eigenvalues and eigenspaces of A = 0 6 10 .
0 0 2
Solution.
3−λ 2 3
det(A − λI ) = 0 6−λ 10 = (3 − λ)(6 − λ)(2 − λ)
0 0 2−λ
; A has eigenvalues 2, 3, 6. The eigenvalues of a triangular matrix are its diagonal entries.
1 2 3 1 0 −2 2
λ1 = 2 : A − 2I = 0 4 10 ; 0 1 2.5 ; Nul(A − 2I ) = span −5/2
RREF
0 0 0 0 0 0 1
0 2 3 0 1 0 1
λ2 = 3 : A − 3I = 0 3 10
; 0 0
1 ; Nul(A − 3I ) = span
0
RREF
0 0 −1 0 0 0 0
−2
2
−3 2 3 1 3 0 3
λ3 = 6 : A − 6I = 0 0 10
; 0 0 1 ; Nul(A − 6I ) = span 1
RREF
0 0 −4 0 0 0 0
LINEAR ALGEBRA
Properties of Eigenvectors and Eigenvalues
ILLINOIS
Department of Mathematics
Definition. Let A be an n × n matrix and let λ be an eigenvalue of A.
○ The algebraic multiplicity of λ is its multiplicity as a root of the characteristic
polynomial, that is, the largest integer k such that (t − λ)k divides pA (t).
○ The geometric multiplicity of λ is the dimension of the eigenspace Eigλ (A) of λ.
1 1
Example. Find the eigenvalues of A = and determine their algebraic and geometric
0 1
multiplicities.
Solution.
1−λ 1
○ det(A − λI ) = = (1 − λ)2
0 1−λ
So: λ = 1 is the only eigenvalue (it has algebraic multiplicity 2).
0 1 1
○ λ=1: A−I = ; Nul(A − I ) = span
0 0 0
Only dimension 1! ; Geometric multiplicity 1.
Theorem 61. Let A be an n × n matrix and let v1 , . . . , vm be eigenvectors of A
corresponding to different eigenvalues. Then v1 , . . . , vm are linearly independent.
Proof.
c1 (λ1 − λ2 )v1 = 0.
Since λ1 6= λ2 , we get that c1 = 0. Also c2 = 0, since v2 6= 0.
a11 a12 . . . a1n
a21 a22 . . . a2n
Definition. Let A =
..
.
... . . . ...
. The trace of A is the sum of the diagonal entries of A;
an1 an2 . . . ann
that is
Tr(A) = a11 + a22 + · · · + ann .
Solution.
a11 − λ a12
p(λ) = = (a11 −λ)(a22 −λ)−a12 a21 = λ2 −(a11 + a22 ) λ+(a11 a22 − a12 a21 )
a21 a22 − λ | {z } | {z }
Tr(A) det(A)
LINEAR ALGEBRA
Markov matrices
ILLINOIS
Department of Mathematics
Definition. An n × n matrix A is a Markov matrix (or: stochastic
matrix) if it has only non-negative entries, and the entries in each column
add up to 1.
A vector in Rn is a probability vector (or: stochastic vector) if has only
non-negative entries, and the entries add up to 1.
.1
0 .25 .4
.25 1 1
.1 .5 2 1 2
, 1 .25 .2 , .05 . Note that is not a probability vector, but is.
.9 .5 3 10 3
0 .5 .4 .5
4 4
.1
Question. Let A be n × n Markov matrix and v ∈ Rn be a probability vector. Is Av also a
probability vector?
Solution.
a11 . . . a1n v1 v1 a11 + · · · + vn a1n
.. . . ..
Av = . .. .. =
.
an1 . . . ann vn v1 an1 + · · · + vn ann
;(v1 a11 + · · · + vn a1n ) + · · · + (v1 an1 + · · · + vn ann )
= v1 (a11 + · · · + an1 ) + · · · + vn (a1n + · · · + ann ) = v1 + · · · + vn = 1.
| {z } | {z }
=1 =1
; Ak z = c1 λk1 v1 + · · · + cn λkn vn → c1 v1 ,
because |λi | < 1 for each i = 2, . . . , n. Why is c1 6= 0? Because Ak z is a probability vector.
Example. Consider a fixed population of people with or without job. Suppose that each year,
50% of those unemployed find a job while 10% of those employed lose their job.
What is the unemployment rate in the long term equilibrium?
Solution.
.5
x
.9 empl. umempl. .5 ; ∞ stationary probability vector
y∞
.1
−0.1 0.5 1 −5
A − 1I = ;
0.1 −0.5 RREF 0 0
xt : % of population employed at time t
yt : % of population unemployed at time t −0.1 0.5 5
; Nul = span
0.1 −0.5 1
xt+1 .9xt + .5yt .9 .5 xt
= = x∞ 5/6
yt+1 .1xt + .5yt .1 .5 yt Since x∞ + y∞ = 1 ; = .
y∞ 1/6
x∞ 0.9 0.5 x∞ Unemployment rate in the long term equilib-
=
y∞ 0.1 0.5 y∞ rium is 1/6
LINEAR ALGEBRA
Diagonalization
ILLINOIS
Department of Mathematics
Definition. A square matrix A is said to be diagonalizable if there is a invertible matrix P
and a diagonal matrix D such that A = PDP −1 .
Theorem 66. Let A be an n × n matrix that has n linearly independent eigenvectors
v1 , v2 , . . . , vn with associated eigenvalues λ1 , . . . , λn . Then A is diagonalizable as PDP −1 ,
where
λ1
P = v1 . . . vn and D =
..
.
λn
Proof.
AP = A v1 . . . vn = Av1 . . . Avn
λ1
= λ1 v1 . . . λn vn = v1
. . . vn .. = PD
.
λn
; A = PDP −1
6 −1
Example. Diagonalize A = .
2 3
Solution.
6 − λ −1
det(A − λI ) = = (6 − λ)(3 − λ) + 2 = λ2 − 9λ + 20 = (λ − 4)(λ − 5)
2 3−λ
1 − 12
1
2 −1
λ1 = 4 : A − 4I = ; ; Nul(A − 4I ) = span 2
2 −1 RREF 0 0 1
1 −1 1 −1 1
λ2 = 5 : A − 5I = ; ; Nul(A − 5I ) = span
2 −2 RREF 0 0 1
1
4 0 1
D= ,P = 2 ; A = PDP −1 .
0 5 1 1
Definition. Vectors v1 , . . . , vn form an eigenbasis of n × n matrix A if v1 , . . . , vn form a
basis of Rn and v1 , . . . , vn are all eigenvectors of A.
Theorem 67. The following are equivalent for n × n matrix A:
○ A has an eigenbasis.
○ A is diagonalizable.
○ The geometric multiplicities of all eigenvalues of A sums up to n.
Theorem 68. Let A be n × n matrix and let B = (v1 , . . . , vn ) be an eigenbasis of A. Then
there is a diagonal matrix D such that
A = IEn ,B DIB,En .
Proof.
Yes. Eigenvectors to different eigenvalue are linearly independent. Thus A has n linearly inde-
pendent eigenvectors.
0 1
Question. Is A = diagonalizable?
0 0
Solution.
−λ 1
det(A − λI ) = = λ2
0 −λ
0 1 1
λ1 , λ2 = 0 : A − 0I = ; Nul(A) = span
0 0 0
; Only one linearly independent eigenvector. No eigenbasis
LINEAR ALGEBRA
Powers of Matrices
ILLINOIS
Department of Mathematics
Idea. If A has an eigenbasis, then we can raise A to large powers easily!
Theorem 69. If A = PDP −1 , where D is a diagonal matrix, then for any m,
Am = PD m P −1
Proof.
Am = (PDP −1 )m
= (PDP −1 ) · (PDP −1 ) · · · (PDP −1 )
| {z }
m times
−1 −1
= (PD)(P · P)(DP ) · · · (PDP −1 )
= PD · DP −1 · · · PDP −1
= PD · D · · · D · P −1
= PD m P −1
ILLINOIS
Department of Mathematics
Definition. Let A be an n × n-matrix. We define the matrix exponential e At as
(At)2 (At)3
e At = I + At + + + ...
2! 3!
2 0
Example. Compute e At for A = .
0 1
Solution.
(2t)k 0
2t 0
At = ; (At) = k
0 t 0 tk
1 (2t)2 0
At 1 0 2t 0
e = + + + ...
0 1 0 1t 2! 0 t2
" 2
#
1 + 2t + (2t)
2t
2! + . . . 0 e 0
= =
0 t2
1 + t + 2! + . . . 0 et
e At = Pe Dt P −1 .
Proof.
Recall that
Ak = PD k P −1 .
(At)2 (At)3
e At = I + At + + + ...
2! 3!
PD 2 P −1 t 2 PD 3 P −1 t 3
= I + PDP −1 t + + + ...
2! 3!
D 2t 2 D 3t 3
= P I + Dt + + + . . . P −1
2! 3!
= Pe Dt P −1
−1
−2 1 1 1 −1 0 1 1
Example. Let A = = . Compute e At .
1 −2 1 −1 0 −3 1 −1
Solution.
e At = Pe Dt P −1
−t −1
1 1 e 0 1 1
=
1 −1 0 e −3t 1 −1
−t 1 1
1 1 e 0 2 2
=
1 −1 0 e −3t 12 − 12
−t
e −3t
1 1
e 2 2
= −t
e −e −3t 12 − 21
1 −t 1 −3t 1 −t 1 −3t
e + e e − e
= 21 −t 21 −3t 21 −t 21 −3t
2 e − 2 e 2e + 2e
LINEAR ALGEBRA
Linear Differential Equations
ILLINOIS
Department of Mathematics
Definition. A linear (first order) differential equation is an equation of the form
du
= Au
dt
where u is function from R to Rn . A further condition u(0) = v, for some v in Rn is called a
initial condition.
Theorem 72. Let A be an n × n matrix and v ∈ Rn . The solution of the differential
equation du At
dt = Au with initial condition u(0) = v is u(t) = e v.
du
Example. Let n = 1. Find the solution of the differential equation dt = −u and u(0) = 1.
Solution.
u(t)
In this case, A = −1 .Thus
u(t) = e −t · 1 = e −t
t
Theorem 73. Let A be an n × n-matrix, and let v ∈ Rn be an eigenvector of A with
eigenvalue λ. Then e At v = e λt v.
Proof.
(At)2 (At)3 t 2 A2 v t 3 A3 v
e At v = (I + At + + + . . . )v = (v + tAv + + + ...)
2! 3! 2! 3!
(λt)2 v (λt)3 v (λt)2 (λt)3
= (v + λtv + + + . . . ) = (1 + λt + + + . . . )v = e λt v
2! 3! 2! 3!
Theorem 74. Let A be an n × n-matrix and (v1 , . . . , vn ) be an eigenbasis of A with
eigenvalues λ1 , . . . , λn . If v = c1 v1 + · · · + cn vn ,then the unique solution to the differential
equation du
dt = Au with initial condition u(0) = v is
e At v = c1 e λ1 t v1 + · · · + cn e λn t vn .
Proof.
e At v = e At (c1 v1 + · · · + cn vn ) = c1 e At v1 + · · · + cn e At vn
= c1 e λ1 t v1 + · · · + cn e λn t vn .
0 1 du
Example. Let A = . Solve the differential equation dt = Au with initial condition
1 0
1
u(0) = .
0
Solution.
○ Eigenvalues: det(A − λI ) = λ2 − 1 ; λ1 = 1, λ2 = −1
1 1
○ Eigenvectors: eigenvector with eigenvalue 1, eigenvector with eigenvalue −1
1 −1
1 1 1 1 1 1
○ Express in terms of the eigenbasis: =2 +2 .
0 0 1 −1
○
At 1 At 1 1 1 1 1 At 1 1 At 1
e =e ( + )= e + e
0 2 1 2 −1 2 1 2 −1
1 t 1 −t
1 1 1 1 e + e
= et + e −t = 12 t 21 −t → ∞ (as t → ∞)
2 1 2 −1 2e − 2e
1 2 du
Example. Let A = . Solve the differential equation dt = Au with initial condition
2 1
0
u(0) = .
1
Solution.
○ Eigenvalues: det(A − λI ) = λ2 − 2λ − 3 = (λ − 3)(λ + 1) ; λ1 = 3, λ2 = −1
1 1
○ Eigenvectors: eigenvector with eigenvalue 3, eigenvector with eigenvalue −1
1 −1
0 0 1 1 1 1
○ Express in terms of the eigenbasis: =2 + −2 .
1 1 1 −1
○
At 0 At 1 1 1 1 1 At 1 1 At 1
e =e ( +− )= e +− e
1 2 1 2 −1 2 1 2 −1
1 3t 1 −t
1 1 1 1 e − e
= e 3t − e −t = 12 3t 21 −t → ∞ (as t → ∞)
2 1 2 −1 2e + 2e
LINEAR ALGEBRA
Orthogonal projection onto a line
ILLINOIS
Department of Mathematics
Definition. Let v, w ∈ R^n. The orthogonal projection of v onto the line spanned by w is
proj_w(v) := ((w·v)/(w·w)) w.
Theorem 75. Let v, w ∈ R^n. Then proj_w(v) is the point in span(w) closest to v; that is,
dist(v, proj_w(v)) = min_{u ∈ span(w)} dist(v, u).
Proof.
○ We know proj_w(v) ∈ span(w). So proj_w(v) = cw for some scalar c.
○ Observe that v - proj_w(v) is orthogonal to w.
○ So
0 = w · (v - proj_w(v)) = w · (v - cw) = w·v - c(w·w)
; c = (w·v)/(w·w)
(Figure: v, w and proj_w(v) in the plane, with the error v - proj_w(v) perpendicular to w.)
Remark. Note that v - proj_w(v) (called the error term) is in span(w)^⊥:
v = proj_w(v) + (v - proj_w(v)), with proj_w(v) ∈ span(w) and v - proj_w(v) ∈ span(w)^⊥.
Example. Find the orthogonal projection of v = [-2, 1] onto the line spanned by w = [3, 1].
Solution.
proj_w(v) = ((w·v)/(w·w)) w = ((-6 + 1)/(3^2 + 1^2)) [3, 1]
          = -(5/10)[3, 1] = [-1.5, -0.5]
The error term is
v - proj_w(v) = [-2, 1] - [-1.5, -0.5] = [-0.5, 1.5].
Theorem 76. Let w ∈ R^n. Then for all v ∈ R^n
proj_w(v) = (1/(w·w)) ww^T v.
Proof.
Trick: cu = uc for every scalar c and u ∈ R^n (think of c as a 1 × 1 matrix).
proj_w(v) = ((w·v)/(w·w)) w = (1/(w·w))(w^T v)w = (1/(w·w))w(w^T v)
          = (1/(w·w))(ww^T)v = (1/(w·w)) ww^T v.
Remark. Note that (1/(w·w)) ww^T is an n × n matrix, which we call the orthogonal projection matrix onto span(w).
Example. Let w = [1, 1]. Find the orthogonal projection matrix P onto span(w). Use it to calculate the projections of [1, 0], [1, 1], [1, -1] onto span(w).
Solution.
P = (1/(w·w)) ww^T = (1/2)[1, 1][1, 1]^T = (1/2)[[1, 1], [1, 1]].
○ If v = [1, 0], then proj_w(v) = Pv = (1/2)[[1, 1], [1, 1]][1, 0] = (1/2)[1, 1]
○ If v = [1, 1], then proj_w(v) = Pv = (1/2)[[1, 1], [1, 1]][1, 1] = [1, 1] = v
○ If v = [1, -1], then proj_w(v) = Pv = (1/2)[[1, 1], [1, 1]][1, -1] = [0, 0]
LINEAR ALGEBRA
Orthogonal projection onto a subspace
Theorem 77. Let W be a subspace of Rn and v ∈ Rn . Then v can be written uniquely as
v = v̂ + v⊥, with v̂ in W and v⊥ in W^⊥.
Proof.
Note that dim W + dim W^⊥ = n. Let (w1, ..., wℓ) be a basis of W and let (u1, ..., um) be a basis of W^⊥. Since ℓ + m = n, (w1, ..., wℓ, u1, ..., um) is a basis of R^n. Thus
v = (c1w1 + ··· + cℓwℓ) + (d1u1 + ··· + dmum), with v̂ := c1w1 + ··· + cℓwℓ and v⊥ := d1u1 + ··· + dmum.
Example. Let W = span(w1, w2) with w1 = [3, 0, 1] and w2 = [0, 1, 0], and let v = [0, 3, 3]. Since w1 and w2 are orthogonal,
proj_W(v) = ((v·w1)/(w1·w1)) w1 + ((v·w2)/(w2·w2)) w2    (this calculation only works for orthogonal w1, w2!)
          = (3/10)[3, 0, 1] + (3/1)[0, 1, 0] = [9/10, 3, 3/10]
v⊥ = v - proj_W(v) = [0, 3, 3] - [9/10, 3, 3/10] = [-9/10, 0, 27/10]
Theorem 78. The projection map projW : Rn → Rn that sends v to projW (v) is linear.
Definition. The matrix PW is the matrix (projW )En ,En that represents projW with respect to
the standard basis. We call PW the orthogonal projection matrix onto W .
Example. Compute PW for W = span([3, 0, 1], [0, 1, 0]).
Solution. Write w1 = [3, 0, 1], w2 = [0, 1, 0], and project each standard basis vector:
proj_W(e1) = ((e1·w1)/(w1·w1)) w1 + ((e1·w2)/(w2·w2)) w2 = (3/10)[3, 0, 1] + 0 = [9/10, 0, 3/10]
proj_W(e2) = 0 + (1/1)[0, 1, 0] = [0, 1, 0]
proj_W(e3) = (1/10)[3, 0, 1] + 0 = [3/10, 0, 1/10]
; PW = [[9/10, 0, 3/10], [0, 1, 0], [3/10, 0, 1/10]]
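A numpy check of this projection matrix (ours, not from the notes), using that PW is the sum of the two line projections for an orthogonal basis, and that projection matrices are idempotent:

    import numpy as np

    # Orthogonal (not orthonormal) basis of W from the example
    w1 = np.array([3.0, 0.0, 1.0])
    w2 = np.array([0.0, 1.0, 0.0])

    # P_W as the sum of the two line projections (valid since w1 ⟂ w2)
    P = np.outer(w1, w1) / (w1 @ w1) + np.outer(w2, w2) / (w2 @ w2)

    print(P)                        # [[0.9, 0, 0.3], [0, 1, 0], [0.3, 0, 0.1]]
    print(np.allclose(P @ P, P))    # True: projecting twice changes nothing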
Question. Let PW be the orthogonal projection matrix onto W in Rn , and let v ∈ Rn .
○ If PW v = v, what can you say about v? ○ If PW v = 0, what can you say about v?
Solution.
Let v = w + u, where w ∈ W and u ∈ W^⊥. Then w = PW v. So if PW v = v, then v = w ∈ W; and if PW v = 0, then w = 0, hence v = u ∈ W^⊥.
Goal. Suppose Ax = b is inconsistent. Can we still find something like a best solution?
Definition. Let A be an m × n matrix and b ∈ R^m. A least squares solution (short: LSQ solution) of the system Ax = b is a vector x̂ ∈ R^n such that
dist(Ax̂, b) = min_{x ∈ R^n} dist(Ax, b).
Remark. Let x̂ ∈ R^n. Then Ax̂ ∈ Col(A), and
dist(Ax̂, b) = min_{x ∈ R^n} dist(Ax, b) ⇐⇒ dist(Ax̂, b) = min_{w ∈ Col(A)} dist(w, b).
(Figure: b together with b̂, the point of Col(A) closest to b.)
Example. Are there β1, β2 ∈ R such that the data points
(x1, y1) = (2, 1), (x2, y2) = (5, 2), (x3, y3) = (7, 3), (x4, y4) = (8, 3)
all lie on the line y = β1 + β2x?
Solution.
That would require solving
[[1, 2], [1, 5], [1, 7], [1, 8]] [β1, β2] = [1, 2, 3, 3]
(design matrix X, observation vector y), which is inconsistent. Instead, find an LSQ solution.
Normal equations X^T X β̂ = X^T y:
[[4, 22], [22, 142]] [β̂1, β̂2] = [9, 57]
; [β̂1, β̂2] = [2/7, 5/14]
; LSQ line: y = 2/7 + (5/14)x
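The same fit in numpy (a sketch, not from the notes), once through the normal equations and once with the built-in least squares solver:

    import numpy as np

    X = np.array([[1, 2], [1, 5], [1, 7], [1, 8]], dtype=float)  # design matrix
    y = np.array([1, 2, 3, 3], dtype=float)                      # observations

    # Solve the normal equations XᵀX β = Xᵀy directly ...
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta)                          # [0.2857... 0.3571...] = [2/7, 5/14]

    # ... or use the built-in least squares solver
    beta2, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(beta, beta2))      # True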
Example. A scientist tries to find the relation between the mysterious quantities x and y. She measures the following values:
x | -2 |  -1 |    0 | 1 | 2
y |  5 | 2.5 | 2.25 | 2 | 5
(Figure: scatter plot of the five data points.)
Theorem 82. Every subspace of Rn has an orthonormal basis.
Algorithm. (Gram-Schmidt orthonormalization) Given a basis a1, ..., am, produce an orthogonal basis b1, ..., bm and an orthonormal basis q1, ..., qm:
b1 = a1,                                                      q1 = b1/‖b1‖
b2 = a2 - proj_span(q1)(a2) = a2 - (a2·q1)q1,                 q2 = b2/‖b2‖
b3 = a3 - proj_span(q1,q2)(a3) = a3 - (a3·q1)q1 - (a3·q2)q2,  q3 = b3/‖b3‖
···
Remark.
○ span(q1, ..., qi) = span(a1, ..., ai) for i = 1, ..., m.
○ qj ∉ span(a1, ..., ai) for all j > i.
Example. Let V = span([2, 1, 2], [0, 0, 3]). Use the Gram-Schmidt method to find an orthonormal basis of V.
Solution.
Set b1 = [2, 1, 2] and q1 = b1/‖b1‖ = (1/3)[2, 1, 2].
b2 = [0, 0, 3] - ([0, 0, 3]·q1)q1 = [0, 0, 3] - 2·(1/3)[2, 1, 2] = [-4/3, -2/3, 5/3]
; q2 = b2/‖b2‖ = (1/√45)[-4, -2, 5]
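A minimal Python implementation of Gram-Schmidt (ours; the function name gram_schmidt is made up), applied to this example:

    import numpy as np

    # Classical Gram-Schmidt on the columns of A; returns a matrix Q
    # whose columns are orthonormal and span the same subspace.
    def gram_schmidt(A):
        Q = []
        for a in A.T:
            b = a - sum((a @ q) * q for q in Q)   # subtract projections
            Q.append(b / np.linalg.norm(b))
        return np.column_stack(Q)

    V = np.array([[2.0, 0.0], [1.0, 0.0], [2.0, 3.0]])
    Q = gram_schmidt(V)
    print(Q)          # columns: (1/3)[2, 1, 2] and (1/√45)[-4, -2, 5]
    print(Q.T @ Q)    # identity matrix: the columns are orthonormal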
Theorem 83. (QR decomposition) Let A be an m × n matrix of rank n. There is an m × n matrix Q with orthonormal columns and an upper triangular n × n invertible matrix R such that A = QR.
Proof.
Let's do the 3 × 3 case: A = [a1 a2 a3]. Using Gram-Schmidt we obtain orthonormal vectors q1, q2, q3 such that
○ (q1, q2, q3) is a basis of Col(A)
○ a1 ∈ span(q1), a2 ∈ span(q1, q2) and a3 ∈ span(q1, q2, q3)
; A = [a1 a2 a3] = [q1 q2 q3] [[a1·q1, a2·q1, a3·q1], [0, a2·q2, a3·q2], [0, 0, a3·q3]]
Example. Find the QR decomposition of A = [[1, 2, 4], [0, 0, 5], [0, 3, 6]].
Solution.
Gram-Schmidt on the columns a1 = [1, 0, 0], a2 = [2, 0, 3], a3 = [4, 5, 6]:
q1 = a1 = [1, 0, 0]
b2 = a2 - (a2·q1)q1 = [0, 0, 3], so q2 = [0, 0, 1]
b3 = a3 - (a3·q1)q1 - (a3·q2)q2 = [0, 5, 0], so q3 = [0, 1, 0]
; Q = [[1, 0, 0], [0, 0, 1], [0, 1, 0]], R = [[a1·q1, a2·q1, a3·q1], [0, a2·q2, a3·q2], [0, 0, a3·q3]] = [[1, 2, 4], [0, 3, 6], [0, 0, 5]]
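numpy computes QR decompositions directly (sketch, not from the notes); note that its convention may flip the signs of columns of Q, and correspondingly rows of R, relative to a hand computation:

    import numpy as np

    A = np.array([[1.0, 2.0, 4.0], [0.0, 0.0, 5.0], [0.0, 3.0, 6.0]])
    Q, R = np.linalg.qr(A)
    print(Q)                        # orthonormal columns (signs may differ)
    print(R)                        # upper triangular
    print(np.allclose(Q @ R, A))    # True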
Theorem 84. Let A be a symmetric n × n matrix. Then A has an orthonormal basis of
eigenvectors.
Proof.
We will just show that eigenvectors to different eigenvalues are orthogonal. Let v, w ∈ R^n be such that
Av = λ1v and Aw = λ2w, for λ1 ≠ λ2.
λ1(v·w) = (λ1v)·w
        = (Av)·w
        = (Av)^T w
        = v^T A^T w
        = v^T Aw      ←− because A^T = A
        = v·(Aw)
        = λ2(v·w)
; v·w = 0, since λ1 ≠ λ2.
Theorem 85. Let A be a symmetric n × n matrix. Then there is a diagonal matrix D and a
matrix Q with orthonormal columns such that A = QDQ T .
Proof.
Let Q be the matrix whose columns are orthonormal eigenvectors of A, and D the diagonal matrix of the corresponding eigenvalues. Then Q is orthogonal, that is, Q^{-1} = Q^T. Thus
A = QDQ^{-1} = QDQ^T.
Conversely, any matrix of the form QDQ^T is symmetric:
A^T = (QDQ^T)^T = (Q^T)^T D^T Q^T = QDQ^T = A
Example. Let A = [[3, 1], [1, 3]]. Write A as QDQ^T, where D is diagonal and Q has orthonormal columns.
Solution.
det(A - λI) = λ^2 - 6λ + 8 = (λ - 4)(λ - 2) ; eigenvalues λ1 = 4, λ2 = 2.
Eigenvectors: [1, 1] for λ1 = 4 and [1, -1] for λ2 = 2; normalizing gives q1 = (1/√2)[1, 1], q2 = (1/√2)[1, -1].
; A = QDQ^T with Q = (1/√2)[[1, 1], [1, -1]] and D = [[4, 0], [0, 2]].
(Figure: A stretches q1 and q2 along their own directions, while a general v is rotated as well as scaled.)
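In numpy, symmetric matrices are diagonalized with numpy.linalg.eigh (a sketch, not from the notes):

    import numpy as np

    A = np.array([[3.0, 1.0], [1.0, 3.0]])
    lam, Q = np.linalg.eigh(A)   # symmetric A: real eigenvalues, orthonormal Q
    print(lam)                   # [2. 4.] (ascending order)
    print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # True: A = Q D Qᵀ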
LINEAR ALGEBRA
Singular Value Decomposition
Definition. Let A be an m × n matrix. A singular value decomposition of A is a
decomposition A = UΣV T where
○ U is an m × m matrix with orthonormal columns,
○ Σ is an m × n rectangular diagonal matrix with non-negative numbers on the diagonal,
○ V is an n × n matrix with orthonormal columns.
Remark. The positive diagonal entries σi = Σii are called the singular values of A.
We usually arrange them in decreasing order, that is, σ1 ≥ σ2 ≥ ...
Question. Let A be an m × n matrix with rank r. Recall why
○ Nul(A^T A) = Nul(A) and Nul(AA^T) = Nul(A^T),
○ A^T A is symmetric and has rank r.
Solution.
v ∈ Nul(A) ; Av = 0 ; A^T Av = A^T 0 = 0 ; v ∈ Nul(A^T A)
v ∈ Nul(A^T A) ; A^T Av = 0 ; (Av)·(Av) = (Av)^T Av = v^T A^T Av = v^T 0 = 0 ; Av = 0 ; v ∈ Nul(A)
This is how the SVD is constructed: take an orthonormal eigenbasis v1, ..., vn of A^T A with eigenvalues λ1 ≥ ··· ≥ λr > 0 = λr+1 = ··· = λn, set σi = √λi and ui = (1/σi)Avi. Then
AV = A[v1 ... vn] = [Av1 ... Avn] = [σ1u1 ... σrur 0 ... 0] = UΣ ; A = UΣV^T.
Example. Compute the SVD of A = [[-1, 1, 0], [0, -1, 1]].
Solution.
A^T A = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
Eigenvalues: λ1 = 3, λ2 = 1, λ3 = 0
Orthonormal eigenbasis:
v1 = (1/√6)[1, -2, 1], v2 = (1/√2)[-1, 0, 1], v3 = (1/√3)[1, 1, 1]
Singular values: σ1 := √3, σ2 = 1
u1 = (1/σ1)Av1 = (1/√3) A (1/√6)[1, -2, 1] = (1/√18)[-3, 3] = (1/√2)[-1, 1]
u2 = (1/σ2)Av2 = A (1/√2)[-1, 0, 1] = (1/√2)[1, 1]
; V = [[1/√6, -1/√2, 1/√3], [-2/√6, 0, 1/√3], [1/√6, 1/√2, 1/√3]], Σ = [[√3, 0, 0], [0, 1, 0]], U = [[-1/√2, 1/√2], [1/√2, 1/√2]]
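Checking with numpy.linalg.svd (a sketch; the signs of singular vectors may differ from the hand computation):

    import numpy as np

    A = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
    U, s, Vt = np.linalg.svd(A)          # s holds the singular values
    print(s)                             # [1.732... 1.] = [√3, 1]

    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)
    print(np.allclose(U @ Sigma @ Vt, A))   # True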
Theorem 86. Let A be an m × n matrix with rank r, and let U = [u1 ... um], V = [v1 ... vn], Σ be such that A = UΣV^T is an SVD of A. Then
u1 = (1/σ1)Av1, ..., ur = (1/σr)Avr,
(u1, ..., ur) is a basis of Col(A), and (vr+1, ..., vn) is a basis of Nul(A).
Proof sketch: u1, ..., ur ∈ Col(A) and dim Col(A) = r, so (u1, ..., ur) is a basis of Col(A). Also vr+1, ..., vn ∈ Nul(A^T A) = Nul(A), and since dim Nul(A) = n - r, they form a basis of Nul(A).
Theorem 87. Let A be an m × n matrix with rank r, and let U = [u1 ... um], V = [v1 ... vn] be matrices with orthonormal columns and Σ be a rectangular diagonal m × n matrix such that A = UΣV^T is an SVD of A. Then
A = σ1 u1v1^T + ··· + σr urvr^T.
Example. For A = [[-1, 1, 0], [0, -1, 1]] as above:
A = √3 · (1/√2)[-1, 1] · (1/√6)[1, -2, 1]^T + 1 · (1/√2)[1, 1] · (1/√2)[-1, 0, 1]^T
  = [[-1/2, 1, -1/2], [1/2, -1, 1/2]] + [[-1/2, 0, 1/2], [-1/2, 0, 1/2]]
  = [[-1, 1, 0], [0, -1, 1]]
Definition. For k ≤ r, define
Uc = [u1 ... uk], Σc = diag(σ1, ..., σk), Vc = [v1 ... vk].
For the example above (with k = r = 2):
Uc = [[-1/√2, 1/√2], [1/√2, 1/√2]], Σc = [[√3, 0], [0, 1]], Vc = [[1/√6, -1/√2], [-2/√6, 0], [1/√6, 1/√2]]
LINEAR ALGEBRA
The Pseudo-Inverse
Definition. Let A be an m × n matrix with rank r. Given the compact singular value decomposition A = UcΣcVc^T, where
○ Uc = [u1 ... ur] is an m × r matrix with orthonormal columns,
○ Σc is an r × r diagonal matrix with positive diagonal elements,
○ Vc = [v1 ... vr] is an n × r matrix with orthonormal columns,
we define the pseudoinverse A^+ of A as VcΣc^{-1}Uc^T.
Example. Recall that
A := [[-1, 1, 0], [0, -1, 1]] = UcΣcVc^T with Uc = [[-1/√2, 1/√2], [1/√2, 1/√2]], Σc = [[√3, 0], [0, 1]], Vc = [[1/√6, -1/√2], [-2/√6, 0], [1/√6, 1/√2]].
Determine A^+.
Solution.
A^+ = VcΣc^{-1}Uc^T = [[1/√6, -1/√2], [-2/√6, 0], [1/√6, 1/√2]] [[1/√3, 0], [0, 1]] [[-1/√2, 1/√2], [1/√2, 1/√2]]
    = [[1/√18, -1/√2], [-2/√18, 0], [1/√18, 1/√2]] [[-1/√2, 1/√2], [1/√2, 1/√2]]
    = [[-2/3, -1/3], [1/3, -1/3], [1/3, 2/3]]
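numpy's numpy.linalg.pinv computes this pseudoinverse directly (sketch, not from the notes):

    import numpy as np

    A = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
    print(np.linalg.pinv(A))
    # [[-2/3, -1/3],
    #  [ 1/3, -1/3],
    #  [ 1/3,  2/3]]   -- matches the hand computation above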
Theorem 88. Let v ∈ Col(A^T) and w ∈ Col(A). Then A^+Av = v and AA^+w = w.
Proof.
Since Col(Vc) = Col(A^T) and Vc has orthonormal columns,
A^+Av = VcΣc^{-1}Uc^T UcΣcVc^T v = VcVc^T v = proj_{Col(A^T)}(v) = v.
Since Col(Uc) = Col(A) and Uc has orthonormal columns,
AA^+w = UcΣcVc^T VcΣc^{-1}Uc^T w = UcUc^T w = proj_{Col(A)}(w) = w.
Theorem 89. Let A be an m × n matrix and let b ∈ R^m. Then A^+b is the LSQ solution of Ax = b (with minimum length).
Proof.
Observe that
AA^+b = UcΣcVc^T VcΣc^{-1}Uc^T b = UcUc^T b = proj_{Col(A)}(b).
So A(A^+b) is the point of Col(A) closest to b, which says exactly that A^+b is an LSQ solution of Ax = b.
Remark. This is particularly useful when solving many different LSQ problems of the form Ax = b, where A stays the same but b varies.
LINEAR ALGEBRA
PCA
Setup.
○ Given m objects, we measure the same n variables.
○ Thus m samples of n-dimensional data ; m × n matrix (each row is a sample)
○ Analyse this matrix to understand what drives the variance in the data.
Definition. Let X = [a1 ... am]^T be an m × n matrix. We define the column average µ(X) of X as
µ(X) := (1/m)(a1 + ··· + am).
We say X is centered if µ(X) = 0.
For X centered, we define the covariance matrix cov(X) of X as (1/(m-1)) X^T X.
Remark.
○ If X is not centered, replace X by [a1 - µ(X) ... am - µ(X)]^T.
○ If the columns of X are orthogonal, then cov(X) is a diagonal matrix ; each variable is independent.
○ What if not? Idea: pick an eigenbasis of X^T X.
Principal component analysis
○ Input: centered m × n-matrix X.
○ Compute cov(X).
○ Since cov(X) is symmetric, we can find an orthonormal eigenbasis v1, v2, ..., vn of cov(X) with eigenvalues λ1 ≥ ··· ≥ λn ≥ 0.
○ Write cov(X) as a sum of rank 1 matrices:
cov(X) = λ1 v1v1^T + ··· + λn vnvn^T.
○ Each principal component vi explains part of the variance of the data. The larger λi, the more of the variance is explained by vi.
Example. Let X be a centered 30 × 2-matrix. We plot the data and see that
cov(X) = [[7.515, 20.863], [20.863, 63.795]].
An orthonormal eigenbasis is
v1 = [0.314, 0.95], v2 = [0.95, -0.314]
with λ1 = 70.685 and λ2 = 0.625. Thus
cov(X) = λ1 v1v1^T + λ2 v2v2^T = [[6.97, 21.085], [21.085, 63.793]] + [[0.564, -0.186], [-0.186, 0.0616]].
(Figure: scatter plot of the data with the directions v1 and v2 drawn in.)
PCA using SVD.
○ Let X be a centered data matrix. Observe that X^T X = (m-1) cov(X).
○ To find an orthonormal eigenbasis of cov(X), it is enough to find an orthonormal eigenbasis of X^T X.
○ Compute the SVD of X ; X = UΣV^T.
○ The columns of V = [v1 ... vn] are the desired orthonormal eigenbasis.
○ If σi is the singular value for vi, then λi = σi^2/(m-1).
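A compact numpy sketch of PCA via the SVD (ours, not from the notes; the toy data is made up):

    import numpy as np

    rng = np.random.default_rng(0)                  # made-up toy data
    X = rng.normal(size=(30, 2)) @ np.array([[1.0, 3.0], [0.0, 2.0]])
    X = X - X.mean(axis=0)                          # center the columns

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    lam = s**2 / (X.shape[0] - 1)   # eigenvalues of cov(X); rows of Vt are the v_i

    print(lam)                                      # λ1 ≥ λ2 ≥ 0
    print(np.allclose(np.cov(X, rowvar=False),
                      Vt.T @ np.diag(lam) @ Vt))    # True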
LINEAR ALGEBRA
Review of Complex Numbers
Definition. C = {x + iy | x, y ∈ R}, where i^2 = -1, or i = √-1.
Any point (x, y) in R^2 can be viewed as a complex number: (x, y) ↔ x + iy.
Definition. Given z = x + iy, w = u + iv, we define
z + w = (x + u) + i(y + v)
zw = (x + iy)(u + iv) = xu + x(iv) + (iy)u + (iy)(iv) = (xu - yv) + i(xv + yu)
Example. Compute (3 + 2i) + (4 - i) and (3 + 2i)(4 - i).
Solution.
(3 + 2i) + (4 - i) = 7 + i
(3 + 2i)(4 - i) = 14 + 5i
Example. Compute i(x + iy), i^2(x + iy), i^3(x + iy) and i^4(x + iy).
Solution.
i(x + iy) = -y + ix
i^2(x + iy) = i(-y + ix) = -x - iy
i^3(x + iy) = i(-x - iy) = y - ix
i^4(x + iy) = i(y - ix) = x + iy
(Figure: multiplication by i rotates the point x + iy by 90° in the complex plane.)
Theorem 90. Let z ∈ C. Write z = x + iy, so that z̄ = x - iy and |z| = √(x^2 + y^2). Then
○ conj(z̄) = z
○ |z|^2 = zz̄
○ |z̄| = |z|
Proof.
conj(z̄) = conj(x - iy) = x + iy = z
zz̄ = (x + iy)(x - iy) = x^2 - x(iy) + (iy)x - (iy)(iy) = x^2 + y^2 = |z|^2
|z̄| = |x - iy| = √(x^2 + (-y)^2) = √(x^2 + y^2) = |x + iy| = |z|
(Figure: z and its conjugate z̄ mirrored across the real axis.)
LINEAR ALGEBRA
Complex Linear Algebra
Goal. Use complex numbers (instead of real numbers) as scalars.
Definition. The (complex) vector space C^n consists of all complex column vectors z = [z1, z2, ..., zn]^T, where z1, z2, ..., zn are complex numbers.
○ Now multiplication by a complex scalar makes sense.
○ We can define subspaces, span, independence, basis, dimension for C^n in the usual way.
○ We can multiply complex vectors by complex matrices. Column space and null space still make sense.
○ The only difference is the dot product: you need to use the complex conjugate to get a good notion of length:
[z1, ..., zn] · [w1, ..., wn] = z1w̄1 + z2w̄2 + ··· + znw̄n.
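In numpy, np.vdot implements a conjugating dot product; note it conjugates its first argument, while the notes conjugate the second, which makes no difference for lengths. A quick sketch (ours, not from the notes):

    import numpy as np

    z = np.array([1 + 1j, 2j])
    print(np.vdot(z, z))           # (6+0j): |1+i|^2 + |2i|^2 = 2 + 4
    print(np.linalg.norm(z)**2)    # 6.0, consistent with the conjugate dot product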
Example. Find the complex eigenvectors and eigenvalues of A = [[0, -1], [1, 0]].
Solution.
det(A - λI) = det([[-λ, -1], [1, -λ]]) = λ^2 + 1 ; λ1 = i, λ2 = -i
An eigenvector for λ1 = i is [1, -i] (check: A[1, -i] = [i, 1] = i[1, -i]); conjugating gives the eigenvector [1, i] for λ2 = -i.
Remark. For a real matrix A, eigenvalues and eigenvectors come in conjugate pairs: if Av = λv, then Av̄ = conj(Av) = conj(λv) = λ̄v̄.
Example. Do complex numbers help when A = [[1, 1], [0, 1]]?
○ det(A - λI) = det([[1 - λ, 1], [0, 1 - λ]]) = (1 - λ)^2
So: λ = 1 is the only eigenvalue (it has algebraic multiplicity 2).
○ λ = 1: A - I = [[0, 1], [0, 0]] =⇒ Nul(A - I) = span([1, 0])
So the eigenspace still has only dimension 1. Complex numbers didn't help.
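numpy handles both situations (sketch, not from the notes): it returns the complex eigenvalues ±i for the rotation matrix, while for the defective matrix it still finds only the repeated eigenvalue 1:

    import numpy as np

    A = np.array([[0.0, -1.0], [1.0, 0.0]])
    lam, V = np.linalg.eig(A)
    print(lam)            # [0.+1.j 0.-1.j]: the eigenvalues i and -i

    B = np.array([[1.0, 1.0], [0.0, 1.0]])
    lam2, V2 = np.linalg.eig(B)
    print(lam2)           # [1. 1.]: the eigenspace is still one-dimensional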