
MATH 257

Linear Algebra with Computational Applications

ILLINOIS
Department of Mathematics
Course description
• Linear algebra is used in most sciences and engineering areas, because it allows modeling many natural phenomena, and efficiently computing with such models.
• Mathematics is the language of science and engineering, but coding is believing it.
• But what is data science, if not linear algebra persevering?
This is a first course in linear algebra. It covers the basic definitions and algorithms of the subject needed in higher-level (engineering, science and economics) courses, as well as more sophisticated mathematical techniques such as the Singular Value Decomposition.
In this course you learn the mathematical theory and how to implement it in Python. You will discover many striking modern applications of linear algebra, such as Google's PageRank algorithm, image and audio compression schemes such as JPEG and MP3, automatic face recognition, and other data science and machine learning algorithms.
Three disclaimers
• This is not a course that only teaches you how to compute stuff. Computers will always be faster. Modern applications of linear algebra require a sophisticated understanding of the theory and methods of linear algebra, and learning these is the purpose of this course. Some of it might look like abstract linear algebra. However, through the applications we cover in the labs, you will realize that this indeed is applied linear algebra.
• If you already know some linear algebra, this course might look easy at the beginning. Don't be fooled into thinking it will stay like that. Even the material familiar to you will be covered in more depth here. Furthermore, the exams will require a deeper understanding of the concepts you already know something about. So it is a good idea to take this course seriously from the beginning.
• MATH 257 is a new course. A significant part of the material for this new course is, not surprisingly, new. So if you find a typo or an error in any part of this course, please let us know by sending an email to the instructors. We appreciate your help, and are also happy to hear any further comments or suggestions. Thank you!
Module videos
Module videos are available at https://mediaspace.illinois.edu/playlist/0 w39larp3/
Further resources
Books
• Philip N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science, first edition, Newtonian Press
• Feryal Alayont, Steven Schlicker, Linear Algebra and Applications: An Inquiry-Based Approach, scholarworks.gvsu.edu/books/21/
• David Cherney, Tom Denton, Rohit Thomas, Andrew Waldron, Linear Algebra, www.math.ucdavis.edu/~linear/
• Gilbert Strang, Linear Algebra and its Applications, fourth edition, Cengage.
Videos
• Essence of Linear Algebra by 3Blue1Brown, on YouTube (highly recommended!)
• MIT lectures by Gilbert Strang, MIT OpenCourseWare
• Coding the Matrix videos by Philip Klein, on YouTube
Table of Contents.
Module 1: Introduction to Linear Systems
Module 2: Matrices and Linear Systems
Module 3: Echelon forms of matrices
Module 4: Gaussian elimination
Module 5: Linear combinations
Module 6: Matrix vector multiplication
Module 7: Matrix multiplication
Module 8: Properties of matrix multiplication
Module 9: Elementary matrices
Module 10: Inverse of a matrix
Module 11: Computing an inverse
Module 12: LU decomposition
Module 13: Solving using LU decomposition
Module 14: Spring-mass systems
Module 15: Inner products and orthogonality
Module 16: Subspaces of Rn
Module 17: Column spaces and Nullspaces
Module 18: Abstract vector spaces
Module 19: Linear independence
Module 20: Basis and Dimension
Module 21: The four fundamental subspaces
Module 22: Graphs and adjacency matrices
Module 23: Orthogonal complements
Module 24: Coordinates
Module 25: Orthonormal basis
Module 26: Linear Transformations
Module 27: Coordinate matrix
Module 28: Determinants
Module 29: Cofactor expansion
Module 30: Eigenvectors and eigenvalues
Module 31: Computing eigenvalues
Module 32: Properties of eigenvectors
Table of Contents (ctd).
Module 33: Markov matrices
Module 34: Diagonalization
Module 35: Powers of Matrices
Module 36: Matrix exponential
Module 37: Linear Differential Equations
Module 38: Projections onto lines
Module 39: Projections onto subspaces
Module 40: Least squares solutions
Module 41: Linear Regression
Module 42: Gram-Schmidt Method
Module 43: Spectral theorem
Module 44: Singular Value Decomposition
Module 45: Low rank approximations
Module 46: The Pseudo-inverse
Module 47: PCA
Module 48: Review of complex numbers
Module 49: Complex linear algebra
LINEAR ALGEBRA
Introduction to Linear Systems

ILLINOIS
Department of Mathematics
Definition. A linear equation is an equation of the form
a1 x1 + · · · + an xn = b
where a1, . . . , an, b are numbers and x1, . . . , xn are variables.
Example. Which of the following equations are linear equations (or can be rearranged to become linear equations)?
Solution.
4x1 − 5x2 + 2 = x1  ⇔  3x1 − 5x2 = −2   Linear
x2 = 2(√6 − x1) + x3  ⇔  2x1 + x2 − x3 = 2√6   Linear
4x1 − 6x2 = x1 x2   Not linear (product of two variables)
x2 = 2√x1 − 7   Not linear (square root of a variable)
Definition. A linear system is a collection of one or more linear equations involving the
same set of variables, say, x1 , x2 , ..., xn .
A solution of a linear system is a list (s1 , s2 , ..., sn ) of numbers that makes each equation in
the system true when the values s1 , s2 , ..., sn are substituted for x1 , x2 , ..., xn , respectively.
Example. Two equations in two variables:
x1 + x2 = 1 (I)
−x1 + x2 = 0 (II)
What is a solution for this system of linear equations?
Solution.
Add (I) + (II): 2x2 = 1, so x2 = 1/2.
Plug x2 = 1/2 into (I): x1 + 1/2 = 1 ⇒ x1 = 1/2.
Thus (x1, x2) = (1/2, 1/2) is the only solution.
[Figure: the lines (I) and (II) cross in exactly one point.]

Example. Does every system of linear equations have a solution?
x1 − 2x2 = −3 (III)
2x1 − 4x2 = 8 (IV)
Solution.
Multiply (III) by 2: 2x1 − 4x2 = −6. (†)
Subtract (†) from (IV): 0 = 14.
The equation 0 = 14 is always false, so no solutions exist.
[Figure: the lines (III) and (IV) are parallel, so they never intersect.]
Example. How many solutions are there to the following system?
x1 + x2 = 3 (V)
−2x1 − 2x2 = −6 (VI)
Solution.
Multiply (V) by −2: −2x1 − 2x2 = −6, which is exactly (VI).
So (V) and (VI) have the same solutions. Any value of x1 works; just set x2 := 3 − x1.
The system has infinitely many solutions.
[Figure: (V) and (VI) describe the same line.]
Theorem 1. A linear system has either a unique solution, no solution, or infinitely many solutions.
Definition. The solution set of a linear system is the set of all solutions of the linear system.
Two linear systems are equivalent if they have the same solution set.
The general strategy is to replace one system with an equivalent system that is easier to solve.
Example. Consider
x1 − 3x2 = 1 (VII)
−x1 + 5x2 = 3 (VIII)
Transform this linear system into an equivalent system that is easier to solve.
Solution.
Add (VII) to (VIII):
x1 − 3x2 = 1 (VII)
2x2 = 4 (IX)
Thus x2 = 2 and x1 = 1 + 3·2 = 7.
LINEAR ALGEBRA
Matrices and Linear Systems

ILLINOIS
Department of Mathematics
Definition. An m × n matrix is a rectangular array of numbers with m rows and n columns.
Example. Let’s give a few examples.
Solution.
[1 2; 3 4],   [1 2 3; 2 1 4],   [1 2; 3 4; 5 6],   [1 + √5],   [1 −2 3 4; 5 −6 7 8; −9 −10 11 12]
These are a 2 × 2, a 2 × 3, a 3 × 2, a 1 × 1 and a 3 × 4 matrix, respectively.
In terms of the entries of A:
A = [a11 a12 · · · a1n; a21 a22 · · · a2n; . . . ; am1 am2 · · · amn],   where aij is in the ith row and jth column.
Definition. For a linear system, we define the coefficient and augmented matrix as follows:
linear system:
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
. . .
am1 x1 + am2 x2 + · · · + amn xn = bm
coefficient matrix:
[a11 a12 · · · a1n; a21 a22 · · · a2n; . . . ; am1 am2 · · · amn]
augmented matrix:
[a11 a12 · · · a1n b1; a21 a22 · · · a2n b2; . . . ; am1 am2 · · · amn bm]
Example. Determine the coefficient matrix and augmented matrix of the linear system
x1 − 3x2 = 1
−x1 + 5x2 = 3.
Solution.
Coefficient matrix: [1 −3; −1 5]
Augmented matrix: [1 −3 1; −1 5 3]
Definition. An elementary row operation is one of the following:
(Replacement) Add a multiple of one row to another row: Ri → Ri + cRj, where i ≠ j,
(Interchange) Interchange two rows: Ri ↔ Rj,
(Scaling) Multiply all entries in a row by a nonzero constant: Ri → cRi, where c ≠ 0.
Example. Give several examples of elementary row operations.
Solution.
Replacement: [1 0; 0 1] →(R2 → R2 + 3R1)→ [1 0; 3 1]
Interchange: [1 0; 0 1; 2 3] →(R1 ↔ R3)→ [2 3; 0 1; 1 0]
Scaling: [1 0; 0 1] →(R2 → 3R2)→ [1 0; 0 3]
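The three elementary row operations are easy to try out in Python. A minimal NumPy sketch (NumPy is an assumption here; note that rows are 0-indexed in code):

```python
import numpy as np

A = np.eye(2)                 # start from the 2x2 identity matrix

# Replacement: R2 -> R2 + 3*R1
B = A.copy()
B[1] = B[1] + 3 * B[0]        # B is now [[1, 0], [3, 1]]

# Interchange: R1 <-> R2
C = A.copy()
C[[0, 1]] = C[[1, 0]]         # fancy indexing swaps the two rows

# Scaling: R2 -> 3*R2
D = A.copy()
D[1] = 3 * D[1]               # D is now [[1, 0], [0, 3]]
```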
Example. Consider the elementary row operation R3 → R3 + 3R1 . Is there an elementary row
operation that reverse this row operation?
Solution.
Yes. The operation R3 → R3 − 3R1 reverses the row operation R3 → R3 + 3R1. For example,
[1 0 0; 0 1 0; 0 0 1] →(R3→R3+3R1)→ [1 0 0; 0 1 0; 3 0 1] →(R3→R3−3R1)→ [1 0 0; 0 1 0; 0 0 1]
Remark. Indeed, every row operation is reversible. We already saw how to reverse the replacement operation. The scaling operation R2 → cR2 is reversed by the scaling operation R2 → (1/c)R2. A row interchange R1 ↔ R2 is reversed by performing it a second time.
Definition. Two matrices are row equivalent, if one matrix can be transformed into the
other matrix by a sequence of elementary row operations.
Theorem 2. If the augmented matrices of two linear systems are row equivalent, then the
two systems have the same solution set.
LINEAR ALGEBRA
Echelon forms of matrices

ILLINOIS
Department of Mathematics
Definition. A matrix is in echelon form (or row echelon form) if
1. All nonzero rows (rows with at least one nonzero element) are above any rows of all zeros.
2. The leading entry (the first nonzero number from the left) of a nonzero row is always strictly to the right of the leading entry of the row above it.
Example. Are the following matrices in echelon form? Circle the leading entries.
a) [3 1 2 0 5; 0 2 0 1 4; 0 0 0 0 0; 0 0 0 0 0] — yes: the leading entries move to the right, and the zero rows are at the bottom.
b) [0 2 0 1 4; 3 1 2 0 5; 0 0 0 0 0; 0 0 0 0 0] — no: the leading entry of row 2 is to the left of the leading entry of row 1.
c) [2 −2 3; 0 5 0; 0 0 5/2] — yes.
d) [0 1 √3; 0 0 2; 0 0 0] — yes.
Definition. A matrix is in row reduced echelon form (or: reduced echelon form, or:
RREF) if it is in echelon form and
3. The leading entry in each nonzero row is 1.
4. Each leading entry is the only nonzero entry in its column.
Example. Are the following matrices in reduced echelon form?
a) [0 1 3 0 0 2 5 0 0 6; 0 0 1 12 0 0 0 0 0 −2; 0 0 0 0 1 −3 4 0 0 5; 0 0 0 0 0 0 0 1 0 0; 0 0 0 0 0 0 0 0 1 1]
b) [1 0 −2 3 2 −24; 0 1 −2 2 0 −7; 0 0 0 0 1 4]
Theorem 3. Each matrix is row-equivalent to one and only one matrix in reduced echelon
form.
Definition. We say a matrix B is the reduced echelon form (or: the RREF) of a matrix A
if A and B are row-equivalent and B is in reduced echelon form.
Question. Is each matrix also row-equivalent to one and only one matrix in echelon form?
Solution.
No. For example, [1 0; 0 1] and [1 2; 0 1] are row-equivalent and both in echelon form.
Example. Find the reduced echelon form of the matrix
[3 −9 12 −9 6 15; 3 −7 8 −5 8 9]
Solution.
[3 −9 12 −9 6 15; 3 −7 8 −5 8 9]
→(R2→R2−R1)→ [3 −9 12 −9 6 15; 0 2 −4 4 2 −6]
→(R1→(1/3)R1, R2→(1/2)R2)→ [1 −3 4 −3 2 5; 0 1 −2 2 1 −3]
→(R1→R1+3R2)→ [1 0 −2 3 5 −4; 0 1 −2 2 1 −3]
Definition. A pivot position is the position of a leading entry in an echelon form of a
matrix. A pivot column is a column that contains a pivot position.
Example. Locate the pivot columns of the following matrix.
A = [0 −3 −6 4 9; −1 −2 −1 3 1; 1 4 5 −9 −7]
Solution.
A →(R1↔R3)→ [1 4 5 −9 −7; −1 −2 −1 3 1; 0 −3 −6 4 9]
→(R2→R2+R1)→ [1 4 5 −9 −7; 0 2 4 −6 −6; 0 −3 −6 4 9]
→(R3→R3+1.5R2)→ [1 4 5 −9 −7; 0 2 4 −6 −6; 0 0 0 −5 0]
The pivot positions are in columns 1, 2 and 4. Thus columns 1, 2, 4 of A are the pivot columns of A.

Definition. A basic variable (or pivot variable) is a variable that corresponds to a pivot column in the coefficient matrix of a linear system. A free variable is a variable that is not a pivot variable.
Example. Consider the following system of linear equations, with augmented matrix
[1 6 0 3 0 0; 0 0 1 −8 0 5; 0 0 0 0 1 7]:
x1 + 6x2 + 3x4 = 0
x3 − 8x4 = 5
x5 = 7
Determine the basic variables and free variables of this system.
Solution.
The first, third and fifth column of the coefficient matrix are pivot columns. Thus x1, x3, x5 are basic variables and x2, x4 are free variables.

LINEAR ALGEBRA
Gaussian Elimination

ILLINOIS
Department of Mathematics
Goal. Solve linear systems for the pivot variables in terms of the free variables (if any) in the
equation.
Algorithm. (Gaussian Elimination) Given a linear system,
(1) Write down the augmented matrix.
(2) Find the reduced echelon form of the matrix.
(3) Write down the equations corresponding to the reduced echelon form.
(4) Express pivot variables in terms of free variables.
Example. Find the general solution of
3x1 − 7x2 + 8x3 − 5x4 + 8x5 = 9
3x1 − 9x2 + 12x3 − 9x4 + 6x5 = 15
[Portrait: C.F. Gauß (1777–1855)]
Solution.
Augmented matrix:
[3 −7 8 −5 8 9; 3 −9 12 −9 6 15]
RREF of the augmented matrix:
[1 0 −2 3 5 −4; 0 1 −2 2 1 −3]
Write down the equations corresponding to the reduced echelon form:
x1 − 2x3 + 3x4 + 5x5 = −4
x2 − 2x3 + 2x4 + x5 = −3
Express pivot variables in terms of free variables:
x1 = 2x3 − 3x4 − 5x5 − 4
x2 = 2x3 − 2x4 − x5 − 3
x3 = free
x4 = free
x5 = free
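A quick way to sanity-check a general solution is to plug in arbitrary values for the free variables and verify the original equations. A NumPy sketch (NumPy assumed):

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
A = np.array([[3.0, -7.0,  8.0, -5.0, 8.0],
              [3.0, -9.0, 12.0, -9.0, 6.0]])
b = np.array([9.0, 15.0])

rng = np.random.default_rng(0)
for _ in range(5):
    x3, x4, x5 = rng.standard_normal(3)   # arbitrary values for the free variables
    x1 = 2*x3 - 3*x4 - 5*x5 - 4           # pivot variables from the general solution
    x2 = 2*x3 - 2*x4 - x5 - 3
    x = np.array([x1, x2, x3, x4, x5])
    assert np.allclose(A @ x, b)          # every choice of free variables solves the system
print("general solution verified")
```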
Example. Is the following linear system consistent? Is the solution unique?
3x2 − 6x3 + 6x4 + 4x5 = −5
3x1 − 7x2 + 8x3 − 5x4 + 8x5 = 9
3x1 − 9x2 + 12x3 − 9x4 + 6x5 = 15
Solution.
[0 3 −6 6 4 −5; 3 −7 8 −5 8 9; 3 −9 12 −9 6 15] →(echelon form)→ [3 −9 12 −9 6 15; 0 2 −4 4 2 −6; 0 0 0 0 1 4]
The corresponding equations:
3x1 − 9x2 + 12x3 − 9x4 + 6x5 = 15
2x2 − 4x3 + 4x4 + 2x5 = −6
x5 = 4
Free variables: x3, x4. Set x3 = x4 = 0. Solve the third equation for x5, the second for x2 and the first for x1. This is possible! Thus the system is consistent. Indeed, it has infinitely many solutions (choose different values for x3, x4).
Only a row of the form [0 0 0 0 0 b], where b ≠ 0, would be a problem!
Theorem 4. A linear system is consistent if and only if an echelon form of the augmented matrix has no row of the form [0 . . . 0 b], where b is nonzero. If a linear system is consistent, then the linear system has
○ a unique solution (when there are no free variables) or
○ infinitely many solutions (when there is at least one free variable).
 
Example. Consider the linear system whose augmented matrix is
[3 4 −3; 3 4 −3; 6 8 −5].
What can you say about the number of solutions of this system?
Solution.
[3 4 −3; 3 4 −3; 6 8 −5] →(R2→R2−R1, R3→R3−2R1)→ [3 4 −3; 0 0 0; 0 0 1] →(R2↔R3)→ [3 4 −3; 0 0 1; 0 0 0]
No solution, because of the row [0 0 1]: the equation 0x1 + 0x2 = 1 has no solution!
LINEAR ALGEBRA
Linear combinations

ILLINOIS
Department of Mathematics
   
Definition. Consider m × n matrices
A = [a11 a12 · · · a1n; a21 a22 · · · a2n; . . . ; am1 am2 · · · amn]  and  B = [b11 b12 · · · b1n; b21 b22 · · · b2n; . . . ; bm1 bm2 · · · bmn].
a) The sum A + B is the m × n matrix with entries (A + B)ij = aij + bij:
A + B = [a11 + b11 a12 + b12 · · · a1n + b1n; . . . ; am1 + bm1 am2 + bm2 · · · amn + bmn]
b) The product cA for a scalar c is the m × n matrix with entries (cA)ij = c aij:
cA = [ca11 ca12 · · · ca1n; . . . ; cam1 cam2 · · · camn]
Example. Calculate
[1 0; 5 2] + [2 3; 3 1] = [3 3; 8 3],   5 · [2 1 0; 3 1 −1] = [10 5 0; 15 5 −5]
Warning. Addition is only defined if A has the same number of columns as B and the same number of rows as B.
Definition. A column vector is an m × 1 matrix. A row vector is a 1 × n matrix.
Example. Give a few examples of column and row vectors.
Solution. Column vectors: [1; 2], [0; −2; 3]. Row vectors: [1 2], [1 2 3 4 5 6 7 8 9].

Definition. If A is m × n, the transpose of A is the n × m matrix, denoted by A^T, whose columns are formed from the corresponding rows of A. In terms of matrix entries: (A^T)ij = Aji.
Example. What is the transpose of [1 2; 3 4; 5 6]?
Solution. [1 2; 3 4; 5 6]^T = [1 3 5; 2 4 6].
Remark. The transpose of a column vector is a row vector and vice versa.
Definition. The linear combination of m × n-matrices A1 , A2 , . . . , Ap with coefficients
c1 , c2 , . . . , cp is defined as
c1 A1 + c2 A2 + · · · + cp Ap .
Example. Consider m × n matrices A1 and A2. Give examples of linear combinations of these two matrices.
Solution.
3A1 + 2A2,   A1 − 2A2,   (1/3)A1 = (1/3)A1 + 0A2,   0 = 0A1 + 0A2

Definition. The span of A1, . . . , Ap is defined as the set of all linear combinations of A1, . . . , Ap. Stated another way:
span(A1, . . . , Ap) := {c1 A1 + c2 A2 + · · · + cp Ap : c1, . . . , cp scalars}.
Definition. We denote the set of all column vectors of length m by Rm .
     
Example. Let a1 = [1; 0; 3], a2 = [4; 2; 14] and b = [−1; 8; −5]. Is b a linear combination of a1, a2?
Solution.
Find x1, x2 such that x1 a1 + x2 a2 = b, i.e.
1x1 + 4x2 = −1
0x1 + 2x2 = 8
3x1 + 14x2 = −5
Row-reduce the augmented matrix:
[1 4 −1; 0 2 8; 3 14 −5] →(R3→R3−3R1)→ [1 4 −1; 0 2 8; 0 2 −2] →(R3→R3−R2)→ [1 4 −1; 0 2 8; 0 0 −10]
The last row says 0 = −10, so the system is inconsistent. Thus b is not a linear combination of a1, a2.
[Figure: b lies outside the plane span(a1, a2).]
Key observation from the previous example: Solving linear systems is the same as finding
linear combinations!
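This observation can be tested numerically: b is a linear combination of a1, a2 exactly when the system is consistent, i.e. when appending b to the coefficient matrix does not increase the rank. A NumPy sketch (NumPy assumed; rank is a concept we will meet formally later in the course):

```python
import numpy as np

a1 = np.array([1.0, 0.0, 3.0])
a2 = np.array([4.0, 2.0, 14.0])
b  = np.array([-1.0, 8.0, -5.0])

A   = np.column_stack([a1, a2])       # coefficient matrix [a1 a2]
Aug = np.column_stack([a1, a2, b])    # augmented matrix [a1 a2 | b]

# Appending b raises the rank from 2 to 3, so the system is
# inconsistent and b is NOT in span(a1, a2).
print(np.linalg.matrix_rank(A))       # 2
print(np.linalg.matrix_rank(Aug))     # 3
```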
Theorem 5. A vector equation
x1 a1 + x2 a2 + · · · + xn an = b
has the same solution set as the linear system whose augmented matrix is
[a1 a2 · · · an | b].
In particular, b can be generated by a linear combination of a1, a2, . . . , an if and only if there is a solution to the linear system corresponding to the augmented matrix.
Notation. We often define a matrix in terms of its columns or its rows:
A := [a1 a2 · · · an]   or   A := [R1; R2; . . . ; Rm].
LINEAR ALGEBRA
Matrix vector multiplication

ILLINOIS
Department of Mathematics
Definition. Let x be a vector in Rn and A = [a1 . . . an] an m × n matrix. We define the product Ax by
Ax = x1 a1 + x2 a2 + . . . + xn an.
Remark.
○ Ax is a linear combination of the columns of A using the entries in x as coefficients.
○ Ax is only defined if the number of entries of x is equal to the number of columns of A.
 
Example. Consider A = [2 0; 1 1], B = [1 2; 0 1; 3 5], x = [2; 3]. Determine Ax and Bx.
Solution.
Ax = [2 0; 1 1][2; 3] = 2[2; 1] + 3[0; 1] = [4; 5]
Bx = [1 2; 0 1; 3 5][2; 3] = 2[1; 0; 3] + 3[2; 1; 5] = [8; 3; 21]
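In Python (NumPy assumed), the `@` operator computes exactly this combination of columns:

```python
import numpy as np

B = np.array([[1, 2],
              [0, 1],
              [3, 5]])
x = np.array([2, 3])

# By definition, Bx is a combination of the columns of B
combo = x[0] * B[:, 0] + x[1] * B[:, 1]
print(combo)      # [ 8  3 21]
print(B @ x)      # the @ operator computes exactly the same vector
```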
Example. Consider the following vector equation
x1[1; 2] + x2[3; 4] = [0; 2].
Find a 2 × 2 matrix A such that (x1, x2) is a solution to the above equation if and only if
A[x1; x2] = [0; 2]?
Solution.
Take A = [1 3; 2 4]; then Ax = [1 3; 2 4][x1; x2] = x1[1; 2] + x2[3; 4].
So A[x1; x2] = [0; 2] ⇐⇒ x1[1; 2] + x2[3; 4] = [0; 2].
Theorem 6. Let A = [a1 . . . an] be an m × n matrix and b in Rm. Then the following are equivalent:
○ (x1, x2, . . . , xn) is a solution of the vector equation x1 a1 + x2 a2 + · · · + xn an = b.
○ [x1; . . . ; xn] is a solution to the matrix equation Ax = b.
○ (x1, x2, . . . , xn) is a solution of the system with augmented matrix [A | b].
Notation. We will write Ax = b for the system of equations with augmented matrix [A | b].

A matrix as machine. Let A be an m × n matrix.
○ Input: an n-component vector x ∈ Rn.
○ Output: an m-component vector b = Ax ∈ Rm.
x → [A] → Ax

Example. Consider the matrix A = [0 1; 1 0]. What does this machine do?
Solution.
Let x = [x1; x2] be our input. Then
Ax = [0 1; 1 0][x1; x2] = x1[0; 1] + x2[1; 0] = [x2; x1].
So the machine A switches the entries of the vector x. Geometrically speaking, this machine reflects across the line x1 = x2.
[Figure: a vector v and its image Av, mirror images across the line x1 = x2.]


Example. Consider the matrix B = [1 0; 0 0]. What does this machine do?
Solution.
Let x = [x1; x2] be our input. Then
Bx = [1 0; 0 0][x1; x2] = x1[1; 0] + x2[0; 0] = [x1; 0].
So the machine B replaces the second entry of the vector x by 0. Geometrically speaking, this machine projects a vector onto the x1-axis.
[Figure: vectors v, w and their projections Bv, Bw on the x1-axis.]
Composition of machines. Let A be an m × n matrix and B be a k × l matrix. Now we can compose the two machines:
x → [A] → Ax → [B] → B(Ax)
Question. This composition only works for some k, l, m, n. For which?
Solution.
If A is an m × n matrix and x is in Rn, then Ax is in Rm. In order to calculate B(Ax), we then need B to have m columns. So we need l = m. Both n and k can be arbitrary.
  
Example. Let A = [0 1; 1 0], B = [1 0; 0 0] be as before. Is A(Bx) = B(Ax)?
Solution.
No, projection and reflection do not commute! For example,
A(B[1; 2]) = A[1; 0] = [0; 1],   but   B(A[1; 2]) = B[2; 1] = [2; 0].
[Figure: left, v is projected to Bv and then reflected to ABv; right, v is reflected to Av and then projected to BAv.]
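The same computation in NumPy (an assumed choice; any Python setup works):

```python
import numpy as np

A = np.array([[0, 1],   # reflection across the line x1 = x2
              [1, 0]])
B = np.array([[1, 0],   # projection onto the x1-axis
              [0, 0]])
v = np.array([1, 2])

print(A @ (B @ v))      # [0 1] : project first, then reflect
print(B @ (A @ v))      # [2 0] : reflect first, then project
```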
LINEAR ALGEBRA
Matrix multiplication

ILLINOIS
Department of Mathematics
 
Definition. Let A be an m × n matrix and let B = [b1 . . . bp] be an n × p matrix. We define
AB := [Ab1 Ab2 · · · Abp]
Example. Compute AB where A = [4 −2; 3 −5; 0 1] and B = [2 −3; 6 −7].
Solution.
Ab1 = [4 −2; 3 −5; 0 1][2; 6] = 2[4; 3; 0] + 6[−2; −5; 1] = [−4; −24; 6]
Ab2 = [4 −2; 3 −5; 0 1][−3; −7] = −3[4; 3; 0] − 7[−2; −5; 1] = [2; 26; −7]
AB = [Ab1 Ab2] = [−4 2; −24 26; 6 −7]
Remark. Ab1 is a linear combination of the columns of A and Ab2 is a linear combination
of the columns of A. Each column of AB is a linear combination of the columns of A using
coefficients from the corresponding columns of B.
Definition. Let A be an m × n-matrix and let B be an n × p-matrix. We define

AB := [Ab1 Ab2 · · · Abp ]

Question. If C is 4 × 3 and D is 3 × 2, are CD and DC defined? What are their sizes?


Solution.

The product AB is only defined if B has as many rows as A has columns.

If this is the case, then AB has as many rows as A and as many columns as B.

Thus CD is defined and 4 × 2. DC is not defined.


We already learnt that we can think of matrices as machines.
○ Let B be n × p: input x ∈ Rp , output c = Bx ∈ Rn .
○ Let A be m × n: input y ∈ Rn , output b = Ay ∈ Rm .
We already saw that these machines can be composed.

x Bx A(Bx)
B A

Question. How does that compare to the machine given by AB?

x (AB)x
AB
Example. Consider
A = [2 0; 1 1],   B = [1 2; 0 1],   x = [x1; x2].
Compute (AB)x and A(Bx). Are these the same?
Solution.
Bx = [1 2; 0 1][x1; x2] = x1[1; 0] + x2[2; 1] = [x1 + 2x2; x2]
A(Bx) = [2 0; 1 1][x1 + 2x2; x2] = (x1 + 2x2)[2; 1] + x2[0; 1] = [2x1 + 4x2; x1 + 3x2]
(AB)x = [Ab1 Ab2][x1; x2] = x1 Ab1 + x2 Ab2 = x1(1[2; 1] + 0[0; 1]) + x2(2[2; 1] + 1[0; 1]) = x1[2; 1] + x2[4; 3] = [2x1 + 4x2; x1 + 3x2]
Yes, they are the same.
Theorem 7. Let A be an m × n matrix and B be an n × p matrix. Then for every x ∈ Rp


A(Bx) = (AB)x.
Row-Column Rule for computing AB. Let A be m × n with rows R1, . . . , Rm, and B be n × p with columns C1, . . . , Cp. Then
AB = [R1C1 . . . R1Cp; R2C1 . . . R2Cp; . . . ; RmC1 . . . RmCp]   and   (AB)ij = Ri Cj = ai1 b1j + ai2 b2j + · · · + ain bnj.
 
  2 −3
Example. A = [2 3 6; −1 0 1], B = [2 −3; 0 1; 4 −7]. Compute AB, if it is defined.
Solution.
AB = [2·2 + 3·0 + 6·4   2·(−3) + 3·1 + 6·(−7); −1·2 + 0·0 + 1·4   −1·(−3) + 0·1 + 1·(−7)] = [28 −45; 2 −4]
Outer Product Rule for computing AB. Let A be m × n with columns C1, . . . , Cn, and B be n × p with rows R1, . . . , Rn. Then
AB = C1R1 + C2R2 + · · · + CnRn
Example. A = [2 3 6; −1 0 1], B = [2 −3; 0 1; 4 −7]. Compute AB, if it is defined.
Solution.
AB = [2; −1][2 −3] + [3; 0][0 1] + [6; 1][4 −7]
   = [4 −6; −2 3] + [0 3; 0 0] + [24 −42; 4 −7] = [28 −45; 2 −4]
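The column rule and the outer product rule can both be compared against the built-in product in NumPy (assumed here):

```python
import numpy as np

A = np.array([[ 2, 3, 6],
              [-1, 0, 1]])
B = np.array([[2, -3],
              [0,  1],
              [4, -7]])

# Column rule: column j of AB is A times column j of B
col = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Outer product rule: AB = C1 R1 + ... + Cn Rn
outer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

# Both agree with the built-in product [[28, -45], [2, -4]]
print((col == A @ B).all(), (outer == A @ B).all())   # True True
```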
LINEAR ALGEBRA
Properties of matrix multiplication

ILLINOIS
Department of Mathematics
Definition. The identity matrix In of size n is defined as
In = [1 0 . . . 0; 0 1 . . . 0; . . . ; 0 0 . . . 1],
i.e. the n × n matrix with 1s on the diagonal and 0s everywhere else.

Theorem 8. Let A be an m × n matrix and let B and C be matrices for which the indicated
sums and products are defined.
(a) A (BC ) = (AB)C (associative law of multiplication)
(b) A (B + C ) = AB + AC , (B + C ) A = BA + CA (distributive laws)
(c) r (AB) = (rA)B = A(rB) for every scalar r ,
(d) A (rB + sC ) = rAB + sAC for every scalars r , s (linearity of matrix multiplication)
(e) Im A = A = AIn (identity for matrix multiplication)
Warning. The properties above are analogous to properties of real numbers. But NOT ALL properties of real numbers also hold for matrices.
   
Example. Let A = [1 1; 0 1], B = [1 0; 1 1]. Determine AB and BA. Are these matrices the same?
Solution.
AB = [1 1; 0 1][1 0; 1 1] = [2 1; 1 1]
BA = [1 0; 1 1][1 1; 0 1] = [1 1; 1 2]
No. These matrices are not the same: AB ≠ BA. Matrix multiplication is not commutative.
 
  1 2
Example. Let A = [1 2 0; 3 0 1], B = [1 2; 0 1; −2 4]. Compute (AB)^T, A^T B^T and B^T A^T.
Solution.
AB = [1 2 0; 3 0 1][1 2; 0 1; −2 4] = [1 4; 1 10]
(AB)^T = [1 1; 4 10]
A^T B^T = [1 3; 2 0; 0 1][1 0 −2; 2 1 4] = [7 3 10; 2 0 −4; 2 1 4]
B^T A^T = [1 0 −2; 2 1 4][1 3; 2 0; 0 1] = [1 1; 4 10]
Theorem 9. The transpose of a product is the product of the transposes in opposite order:
(AB)^T = B^T A^T
Definition. We write A^k for A · · · A, k times; that is, A^k is obtained by multiplying A k times with itself.
Question. For which matrices A does A^k make sense? If A is m × n, what can m and n be?
Solution.
To be able to multiply A with itself, we need m = n; that is, A must be square.

Example. Determine [1 0; 3 2]^3.
Solution.
[1 0; 3 2]^3 = [1 0; 3 2][1 0; 3 2][1 0; 3 2] = [1 0; 9 4][1 0; 3 2] = [1 0; 21 8]
Remark. In general, calculating high powers of a large matrix in this way is very hard. We will later learn a clever way to do this efficiently.
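For small experiments, NumPy (assumed) provides `numpy.linalg.matrix_power` for repeated multiplication:

```python
import numpy as np

A = np.array([[1, 0],
              [3, 2]])

# Repeated multiplication A @ A @ A, done for us by matrix_power
print(np.linalg.matrix_power(A, 3))
# [[ 1  0]
#  [21  8]]
```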
LINEAR ALGEBRA
Elementary matrices

ILLINOIS
Department of Mathematics
Definition. An elementary matrix is one that is obtained by performing a single elementary
row operation on an identity matrix.
A permutation matrix is one that is obtained by performing row exchanges on an identity
matrix.
Example. Consider E1 = [1 0 0; 0 3 0; 0 0 1], E2 = [1 0 0; 0 0 1; 0 1 0], E3 = [1 0 0; 0 1 0; 2 0 1]. Are these matrices elementary matrices?
Solution.
Yes:
I3 →(R2→3R2)→ E1
I3 →(R2↔R3)→ E2
I3 →(R3→R3+2R1)→ E3
Question. Let A be a 3 × 3-matrix. What happens to A if you multiply it by one of E1 , E2
and E3 ?
Solution.
E1 A = [1 0 0; 0 3 0; 0 0 1][a11 a12 a13; a21 a22 a23; a31 a32 a33] = [a11 a12 a13; 3a21 3a22 3a23; a31 a32 a33]
E2 A = [1 0 0; 0 0 1; 0 1 0][a11 a12 a13; a21 a22 a23; a31 a32 a33] = [a11 a12 a13; a31 a32 a33; a21 a22 a23]
E3 A = [1 0 0; 0 1 0; 2 0 1][a11 a12 a13; a21 a22 a23; a31 a32 a33] = [a11 a12 a13; a21 a22 a23; a31 + 2a11 a32 + 2a12 a33 + 2a13]
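A quick NumPy check (NumPy assumed) that multiplying by E1 and E2 really performs the corresponding row operations:

```python
import numpy as np

E1 = np.array([[1, 0, 0],    # R2 -> 3*R2
               [0, 3, 0],
               [0, 0, 1]])
E2 = np.array([[1, 0, 0],    # R2 <-> R3
               [0, 0, 1],
               [0, 1, 0]])
A = np.arange(1, 10).reshape(3, 3)   # [[1,2,3],[4,5,6],[7,8,9]]

print(E1 @ A)    # second row of A scaled by 3
print(E2 @ A)    # second and third rows of A exchanged
```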

Theorem 10. If an elementary row operation is performed on an m × n matrix A, the resulting matrix can be written as EA, where the m × m matrix E is created by performing the same row operation on Im.
Theorem 11. Let A, B be two row-equivalent m × n matrices. Then there is a sequence of m × m elementary matrices E1, . . . , E` such that
E` . . . E1 A = B.
   
Example. Consider A = [0 1; 1 2; 2 4] and B = [1 2; 0 1; 0 0]. Find two elementary matrices E1, E2 such that E2 E1 A = B.
Solution.
A = [0 1; 1 2; 2 4] →(R1↔R2)→ [1 2; 0 1; 2 4] →(R3→R3−2R1)→ [1 2; 0 1; 0 0] = B
Set E1 = [0 1 0; 1 0 0; 0 0 1] and E2 = [1 0 0; 0 1 0; −2 0 1]. Then
E2 E1 A = [1 0 0; 0 1 0; −2 0 1][0 1 0; 1 0 0; 0 0 1][0 1; 1 2; 2 4] = [1 2; 0 1; 0 0] = B
Example. From the previous example, we know
B = [1 2; 0 1; 0 0] = E2 E1 A = [1 0 0; 0 1 0; −2 0 1][0 1 0; 1 0 0; 0 0 1][0 1; 1 2; 2 4].
Find elementary matrices E1^-1 and E2^-1 such that A = E1^-1 E2^-1 B.
Solution.
B = [1 2; 0 1; 0 0] →(R3→R3+2R1)→ [1 2; 0 1; 2 4] →(R1↔R2)→ [0 1; 1 2; 2 4] = A
Set E1^-1 = [0 1 0; 1 0 0; 0 0 1] and E2^-1 = [1 0 0; 0 1 0; 2 0 1].
LINEAR ALGEBRA
Inverse of a matrix

ILLINOIS
Department of Mathematics
The inverse of a real number a is denoted by a^-1. For example, 7^-1 = 1/7 and
7 · 7^-1 = 7^-1 · 7 = 1.
Remark. Not all real numbers have an inverse. 0^-1 is not well defined, since there is no real number b such that 0 · b = 1.
Remember that the identity matrix In is the n × n matrix
In = [1 0 · · · 0; 0 1 · · · 0; . . . ; 0 0 · · · 1].
Definition. An n × n matrix A is said to be invertible if there is an n × n matrix C satisfying
CA = AC = In
where In is the n × n identity matrix. We call C the inverse of A.

 
Example. What is the inverse of [1 0 0; 5 1 0; 0 0 1]?
Solution.
Elementary matrices are invertible, because row operations are reversible. The matrix above is the elementary matrix for R2 → R2 + 5R1, so its inverse is the elementary matrix corresponding to R2 → R2 − 5R1:
[1 0 0; −5 1 0; 0 0 1]
Theorem 12. Let A be an invertible matrix. Then its inverse C is unique.
Proof. Assume B and C are both inverses of A. Then
BA = AB = In (I)
CA = AC = In (II)
Thus B = B In = B(AC) = (BA)C = In C = C, using (II) and then (I).
Theorem 13. Suppose A and B are invertible. Then:
(a) A^-1 is invertible and (A^-1)^-1 = A (i.e. A is the inverse of A^-1).
(b) AB is invertible and (AB)^-1 = B^-1 A^-1.
(c) A^T is invertible and (A^T)^-1 = (A^-1)^T.
Proof.
(a) AA^-1 = I = A^-1 A ✓
(b) (B^-1 A^-1)(AB) = B^-1 I B = B^-1 B = I and (AB)(B^-1 A^-1) = A I A^-1 = AA^-1 = I ✓
(c) A^T (A^-1)^T = (A^-1 A)^T = I^T = I and (A^-1)^T A^T = (AA^-1)^T = I^T = I ✓
○ We will write A^-1 for the inverse of A. Multiplying by A^-1 is like "dividing by A."
○ Do not write A/B. Why? It is unclear whether this means AB^-1 or B^-1 A, and these two matrices are different.
○ Fact: if AB = I, then A^-1 = B and so BA = I. (Not so easy to show at this stage.)
○ Not all n × n matrices are invertible. For example, the 2 × 2 matrix [0 1; 0 0] is not invertible. Try to find an inverse!
[0 1; 0 0][a b; c d] = [c d; 0 0] ≠ I2 and [a b; c d][0 1; 0 0] = [0 a; 0 c] ≠ I2.
Theorem 14. Let A be an invertible n × n matrix. Then for each b in Rn, the equation Ax = b has the unique solution x = A^-1 b.
Proof.
The vector A^-1 b is a solution, because
A(A^-1 b) = (AA^-1)b = In b = b.
Suppose there is another solution w, so that Aw = b. Then
w = In w = (A^-1 A)w = A^-1 (Aw) = A^-1 b.
[Figure: the machine A sends x = A^-1 b to b.]
Question. If A is an invertible n × n matrix, then A has how many pivot columns?
A must have n pivots, because otherwise Ax = b would not have a solution for each b.
LINEAR ALGEBRA
Computing the inverse

ILLINOIS
Department of Mathematics
 
Question. When is the 1 × 1 matrix [a] invertible?
Solution.
When a ≠ 0. Its inverse is [1/a].
 
a b
Theorem 15. Let A = . If ad − bc 6= 0, then A is invertible and
c d
 
−1 1 d −b
A = .
ad − bc −c a

If ad − bc = 0, then A is not invertible.


Proof.
    
1/(ad − bc) · [ d −b ] [ a b ] = 1/(ad − bc) · [ da − bc    db − bd ] = [ 1 0 ]
              [ −c a ] [ c d ]                 [ −ca + ac  −cb + ad ]   [ 0 1 ]
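The 2 × 2 formula is short enough to implement directly. A sketch (the function name and the test matrix are ours, chosen for illustration):

```python
import numpy as np

def inv2x2(A):
    """Inverse of a 2x2 matrix via the ad - bc formula of Theorem 15."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is not invertible")
    return (1 / det) * np.array([[d, -b], [-c, a]])

A = np.array([[1., 2.], [3., 4.]])      # ad - bc = -2, so invertible
assert np.allclose(inv2x2(A) @ A, np.eye(2))
assert np.allclose(A @ inv2x2(A), np.eye(2))
```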
Theorem 16. Let A be an n × n-matrix. The following are equivalent:
○ A is invertible.
○ the reduced echelon form of A is In .
Proof.
Suppose A can be row-reduced to the identity matrix.

A = A0 ; A1 ; · · · ; Am = In

Thus there are elementary matrices E1 , . . . , Em such that

Em Em−1 . . . E1 A = In .

Thus
A−1 = Em Em−1 . . . E1 = Em Em−1 . . . E1 In .

Theorem 17. Suppose A is invertible. Then every sequence of elementary row operations
that reduces A to In will also transform In to A−1 .
Algorithm.
○ Place A and I side-by-side to form an augmented matrix [ A | I ].
This is an n × 2n matrix (Big Augmented Matrix), instead of n × (n + 1).
○ Perform row operations on this matrix (which will produce identical operations on A and
I ).
○ By the Theorem: [ A | I ] will row reduce to [ I | A−1 ], or A is not invertible.
 
2 0 0
Example. Find the inverse of A = −3 0 1, if it exists.
0 1 0
Solution.
[ A | I ] =
[ 2 0 0 | 1 0 0 ]              [ 1 0 0 | 1/2 0 0 ]
[ −3 0 1 | 0 1 0 ]  ∼ · · · ∼  [ 0 1 0 | 0 0 1 ]
[ 0 1 0 | 0 0 1 ]              [ 0 0 1 | 3/2 1 0 ]
Example. Let’s do the previous example step by step.
Solution.
 
[ A | I ] =
[ 2 0 0 | 1 0 0 ]
[ −3 0 1 | 0 1 0 ]
[ 0 1 0 | 0 0 1 ]

R1 → (1/2)R1 :
[ 1 0 0 | 1/2 0 0 ]
[ −3 0 1 | 0 1 0 ]
[ 0 1 0 | 0 0 1 ]

R2 → R2 + 3R1 :
[ 1 0 0 | 1/2 0 0 ]
[ 0 0 1 | 3/2 1 0 ]
[ 0 1 0 | 0 0 1 ]

R2 ↔ R3 :
[ 1 0 0 | 1/2 0 0 ]
[ 0 1 0 | 0 0 1 ]
[ 0 0 1 | 3/2 1 0 ]
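The big-augmented-matrix algorithm can be coded directly. A sketch in NumPy (the function name is ours; partial pivoting is added for numerical stability, which the hand computation above does not need):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Row reduce [A | I] to [I | A^-1] (Theorem 17)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # the big augmented matrix
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))       # pick the largest pivot below row j
        if np.isclose(M[p, j], 0.0):
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]                     # row exchange
        M[j] = M[j] / M[j, j]                     # scale the pivot row
        for i in range(n):
            if i != j:
                M[i] = M[i] - M[i, j] * M[j]      # clear column j elsewhere
    return M[:, n:]

A = np.array([[2., 0., 0.],
              [-3., 0., 1.],
              [0., 1., 0.]])
Ainv = inverse_gauss_jordan(A)
# matches the hand computation above
assert np.allclose(Ainv, [[0.5, 0., 0.], [0., 0., 1.], [1.5, 1., 0.]])
```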
LINEAR ALGEBRA
LU decomposition

Definition. An n × n matrix A is called

upper triangular if it is of the form (all entries below the main diagonal are zero):
[ ∗ ∗ ∗ ∗ ∗ ]
[ 0 ∗ ∗ ∗ ∗ ]
[ 0 0 ∗ ∗ ∗ ]
[ 0 0 0 ∗ ∗ ]
[ 0 0 0 0 ∗ ]

lower triangular if it is of the form (all entries above the main diagonal are zero):
[ ∗ 0 0 0 0 ]
[ ∗ ∗ 0 0 0 ]
[ ∗ ∗ ∗ 0 0 ]
[ ∗ ∗ ∗ ∗ 0 ]
[ ∗ ∗ ∗ ∗ ∗ ]
Example. Give a few examples of upper and lower triangular matrices!
Solution.
Upper triangular matrices:
[ 1 0 ]   [ 2 3 1 ]   [ 0 5 ]
[ 0 1 ] , [ 0 3 4 ] , [ 0 1 ]
          [ 0 0 2 ]

Lower triangular matrices:
[ 0 0 ]   [ 2 0 0 ]
[ 2 1 ] , [ 0 3 0 ]
          [ 1 −2 2 ]

Neither upper nor lower triangular:
[ 0 2 ]
[ 2 1 ]
Theorem 18. The product of two lower (upper) triangular matrices is lower (upper)
triangular.
Example. Consider the row operation Ri → Ri + cRj where j < i. What can you say about
the elementary matrix E corresponding this row operation? What about E −1 ?
Solution.
Such a matrix is lower-triangular! Take R3 → R3 + cR2 , then
 
1 0 0
E = 0 1 0 .
0 c 1
 
1 0 0
Recall that E −1 = 0 1 0. So E −1 is also lower-triangular.
0 −c 1
Remark. The inverse of a lower (upper) triangular matrix (if it exists) is again lower (upper)
triangular.
Definition. A matrix A has an LU decomposition if there is a lower triangular matrix L and
an upper triangular matrix U such that A = LU.
Theorem 19. Let A be an n × n-matrix. If A can be brought to echelon form just using row
operations of the form Ri → Ri + cRj where j < i, then A has an LU-decomposition.
Proof.
A ; A1 ; A2 · · · ; Am = U
using row operations of the form Ri → Ri + cRj where j < i. Then there are lower-triangular
matrices E1 , . . . , Em such that
Em Em−1 . . . E1 A = U.
Multiply the equation by E1−1 E2−1 . . . Em−1 (the inverses, in reverse order); we get

A = (E1−1 E2−1 . . . Em−1 ) U,

and L := E1−1 E2−1 . . . Em−1 is lower triangular.

Remark. Not every matrix has an LU decomposition.


The LU decomposition of a matrix is not unique.
 
2 1 1
Example. Let A =  4 −6 0 . Determine its LU decomposition.
−2 7 2
Solution.

       
Row reduce A:
[ 2 1 1 ]     [ 2 1 1 ]      [ 2 1 1 ]     [ 2 1 1 ]
[ 4 −6 0 ]  ; [ 0 −8 −2 ]  ; [ 0 −8 −2 ] ; [ 0 −8 −2 ] =: U
[ −2 7 2 ]    [ −2 7 2 ]     [ 0 8 3 ]     [ 0 0 1 ]
using R2 → R2 − 2R1 , then R3 → R3 + R1 , then R3 → R3 + R2 .

Thus U = E3 E2 E1 A, where E1 , E2 , E3 are the elementary matrices of these row operations.
Multiplying by their inverses:
                       [ 1 0 0 ]   [ 2 1 1 ]
A = E1−1 E2−1 E3−1 U = [ 2 1 0 ]   [ 0 −8 −2 ] ,   so L := E1−1 E2−1 E3−1 .
                       [ −1 −1 1 ] [ 0 0 1 ]
 
Remark. In the previous example
    [ 1 0 0 ]
L = [ ℓ21 1 0 ]
    [ ℓ31 ℓ32 1 ]
where ℓij is the factor between the pivot and the entry you want to make zero in the
elimination process. This works in general if you do the row operations in the right order.
Step 1:
    [ 2 1 1 ]      [ 2 1 1 ]
A = [ 4 −6 0 ]  ;  [ 0 −8 −2 ]   (R2 → R2 − 2 R1 , R3 → R3 + 1 R1 )
    [ −2 7 2 ]     [ 0 8 3 ]
Then set ℓ21 = 2 and ℓ31 = −1.
Step 2:
[ 2 1 1 ]      [ 2 1 1 ]
[ 0 −8 −2 ] ;  [ 0 −8 −2 ]   (R3 → R3 + 1 · R2 )
[ 0 8 3 ]      [ 0 0 1 ]
Set ℓ32 = −1,
     [ 2 1 1 ]              [ 1 0 0 ]       [ 1 0 0 ]
U := [ 0 −8 −2 ] , and L := [ ℓ21 1 0 ]   = [ 2 1 0 ] .
     [ 0 0 1 ]              [ ℓ31 ℓ32 1 ]   [ −1 −1 1 ]
 
1 2 2
Example. Determine the LU-decomposition of 4 4 4.
4 4 8
Solution.

     
[ 1 2 2 ]    [ 1 2 2 ]     [ 1 2 2 ]
[ 4 4 4 ] ;  [ 0 −4 −4 ] ; [ 0 −4 −4 ]
[ 4 4 8 ]    [ 0 −4 0 ]    [ 0 0 4 ]
using R2 → R2 − 4 R1 , R3 → R3 − 4 R1 , then R3 → R3 − 1 R2 .

Thus
     [ 1 2 2 ]          [ 1 0 0 ]
U := [ 0 −4 −4 ] , L := [ 4 1 0 ] .
     [ 0 0 4 ]          [ 4 1 1 ]
Check that A = LU.
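The ℓij bookkeeping from the remark above translates directly into code. A sketch (the function name is ours; since there are no row exchanges, it fails if a zero pivot appears):

```python
import numpy as np

def lu_no_pivot(A):
    """LU decomposition without row exchanges; L[i, j] holds the
    elimination factor l_ij from the remark above."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]       # factor between pivot and entry to clear
            U[i, :] -= L[i, j] * U[j, :]      # R_i -> R_i - l_ij * R_j
    return L, U

A = np.array([[2., 1., 1.],
              [4., -6., 0.],
              [-2., 7., 2.]])
L, U = lu_no_pivot(A)
# matches the worked example above
assert np.allclose(L, [[1., 0., 0.], [2., 1., 0.], [-1., -1., 1.]])
assert np.allclose(U, [[2., 1., 1.], [0., -8., -2.], [0., 0., 1.]])
assert np.allclose(L @ U, A)
```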

Remark. It is important that you do the row operations in the right order!
LINEAR ALGEBRA
Solving linear systems using LU decomposition

Theorem 20. Let A be an n × n-matrix such that A = LU, where L is a lower triangular
matrix and U is an upper triangular matrix, and let b ∈ Rn . In order to find a solution of the
linear system
Ax = b,
it is enough to find a solution of the linear system

Ux = c,

where c satisfies Lc = b.
Proof.

If Lc = b and Ux = c, then

Ax = (LU)x = L(Ux) = Lc = b

On the other hand, suppose Ax = b. We take c to be the vector Ux. Then Lc is equal to
L(Ux) = Ax = b, so in total we have both Ux = c and Lc = b.
    
2 1 1 x1 5
Example. Find a solution to the linear system 4
 −6 0  x2 = −2.
 
−2 7 2 x3 9
Solution.
    
Recall the LU decomposition
[ 2 1 1 ]    [ 1 0 0 ]   [ 2 1 1 ]
[ 4 −6 0 ] = [ 2 1 0 ]   [ 0 −8 −2 ] .
[ −2 7 2 ]   [ −1 −1 1 ] [ 0 0 1 ]

First solve Lc = b:
[ 1 0 0 ]   [ c1 ]   [ 5 ]
[ 2 1 0 ]   [ c2 ] = [ −2 ]
[ −1 −1 1 ] [ c3 ]   [ 9 ]

Forward substitution:
Row 1 → c1 = 5.
Row 2 → 10 + c2 = −2; c2 = −12.
Row 3 → −5 + 12 + c3 = 9; c3 = 2.

Then solve Ux = c:
[ 2 1 1 ]  [ x1 ]   [ 5 ]
[ 0 −8 −2 ][ x2 ] = [ −12 ]
[ 0 0 1 ]  [ x3 ]   [ 2 ]

Backwards substitution:
Row 3 → x3 = 2.
Row 2 → −8x2 − 4 = −12; x2 = 1.
Row 1 → 2x1 + 1 + 2 = 5; x1 = 1.
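Forward and backwards substitution are each a few lines of code. A sketch using the L, U and b from this example (the function names are ours):

```python
import numpy as np

def forward_sub(L, b):
    """Solve Lc = b for lower triangular L, top row first."""
    n = len(b)
    c = np.zeros(n)
    for i in range(n):
        c[i] = (b[i] - L[i, :i] @ c[:i]) / L[i, i]
    return c

def back_sub(U, c):
    """Solve Ux = c for upper triangular U, bottom row first."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1., 0., 0.], [2., 1., 0.], [-1., -1., 1.]])
U = np.array([[2., 1., 1.], [0., -8., -2.], [0., 0., 1.]])
b = np.array([5., -2., 9.])

c = forward_sub(L, b)
x = back_sub(U, c)
assert np.allclose(c, [5., -12., 2.])   # as in the example
assert np.allclose(x, [1., 1., 2.])
```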
Question. Why do we care about LU decomposition if we already have Gaussian elimination?
Solution.

Imagine you solve many linear systems Ax = b where A always stays the same, but b varies.
You have to do the LU-factorization of A only once, and not for every b.
Finding solutions of the two triangular systems above takes 2n2 − n arithmetic operations.
You need (4n3 + 9n2 − 7n)/6 arithmetic operations for Gaussian elimination.

Theorem 21. Let A be n × n matrix. Then there is a permutation matrix P such that PA
has an LU-decomposition.
Proof.

If A can be brought to echelon form with the help of row exchanges, we can do those
exchanges first.
So there is a permutation matrix P such that PA can be brought to echelon form without row
exchanges.
 
0 0 1
Example. Let A = 1 1 0. Find a permutation matrix P such that PA has a LU
2 1 0
decomposition.
Solution.
     
    [ 0 0 1 ]     [ 2 1 0 ]                       [ 0 0 1 ]
A = [ 1 1 0 ]  ;  [ 1 1 0 ]   (R1 ↔ R3 ).  Set P = [ 0 1 0 ] .
    [ 2 1 0 ]     [ 0 0 1 ]                       [ 1 0 0 ]

     [ 2 1 0 ]     [ 2 1 0 ]
PA = [ 1 1 0 ]  ;  [ 0 .5 0 ] =: U   (R2 → R2 − .5R1 ).
     [ 0 0 1 ]     [ 0 0 1 ]

Thus
     [ 1 0 0 ]  [ 2 1 0 ]
PA = [ .5 1 0 ] [ 0 .5 0 ] = LU,
     [ 0 0 1 ]  [ 0 0 1 ]
with L lower and U upper triangular.
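We can verify this PA = LU factorization numerically; a pure-NumPy check of the worked example. (Libraries such as SciPy compute such factorizations via `scipy.linalg.lu`, though with the convention A = PLU, i.e. its permutation matrix is the inverse of the one used here.)

```python
import numpy as np

A = np.array([[0., 0., 1.],
              [1., 1., 0.],
              [2., 1., 0.]])
P = np.array([[0., 0., 1.],    # permutation exchanging rows 1 and 3
              [0., 1., 0.],
              [1., 0., 0.]])
L = np.array([[1., 0., 0.],
              [.5, 1., 0.],
              [0., 0., 1.]])
U = np.array([[2., 1., 0.],
              [0., .5, 0.],
              [0., 0., 1.]])

assert np.allclose(P @ A, L @ U)   # PA = LU, as computed above
```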
LINEAR ALGEBRA
Spring-mass systems

Example. Consider the following spring-mass system, consisting of five masses and six
springs fixed between two walls.

m1 m2 m3 m4 m5

Goal.
○ Add (steady) applied forces f1 , f2 , f3 , f4 , f5 , where fi is a (steady) applied force on mass i.
○ Compute the displacements u1 , u2 , u3 , u4 , u5 of the fives masses.

u1 u2 u3 u4 u5

m1 m2 m3 m4 m5

f1 f2 f3 f4 f5
Equilibrium. In the equilibrium, the forces at each mass add up to 0.

Hooke’s law. The force F needed to extend or compress a spring by some distance u is
proportional to that distance; that is F = −ku, where k is a constant factor characteristic of
the spring, called its stiffness. Let ki be the stiffness of the i-th spring.

Springs:
○ u1 > 0 ; spring 1 is extended.
○ u2 − u1 > 0 ; spring 2 is extended.
Forces at m1 :
○ applied force f1 (if positive, pushes m1 to the right).
○ spring 1: −k1 u1 (since u1 > 0, pulls m1 to the left).
○ spring 2: k2 (u2 − u1 ) (since u2 − u1 > 0, pulls m1 to the right).
Equilibrium:

f1 − k1 u1 + k2 (u2 − u1 ) = 0 ; (k1 + k2 )u1 − k2 u2 = f1


Springs:
○ u2 − u1 > 0 ; spring 2 is extended.
○ u3 − u2 > 0 ; spring 3 is extended.
Forces at m2 :
○ applied force f2 (if positive, pushes m2 to the right).
○ spring 2: −k2 (u2 − u1 ) (since u2 − u1 > 0, pulls m2 to the left).
○ spring 3: k3 (u3 − u2 ) (since u3 − u2 > 0, pulls m2 to the right).
Equilibrium at m2 :

f2 − k2 (u2 − u1 ) + k3 (u3 − u2 ) = 0
; −k2 u1 + (k2 + k3 )u2 − k3 u3 = f2
Springs:
○ u5 − u4 > 0 ; spring 5 is extended.
○ u5 > 0 ; spring 6 is compressed.
Forces at m5 :
○ applied force f5 (if positive, pushes m5 to the right).
○ spring 5: −k5 (u5 − u4 ) (since u5 − u4 > 0, pulls m5 to the left).
○ spring 6: −k6 u5 (since u5 > 0, pushes m5 to the left).
Equilibrium at m5 :

f5 − k5 (u5 − u4 ) − k6 u5 = 0
; (k5 + k6 )u5 − k5 u4 = f5
Equilibrium equations.
(k1 + k2 )u1 − k2 u2 = f1
−k2 u1 + (k2 + k3 )u2 − k3 u3 = f2
−k3 u2 + (k3 + k4 )u3 − k4 u4 = f3
−k4 u3 + (k4 + k5 )u4 − k5 u5 = f4
−k5 u4 + (k5 + k6 )u5 = f5

As an augmented matrix:
[ k1 + k2   −k2       0         0         0        | f1 ]
[ −k2       k2 + k3   −k3       0         0        | f2 ]
[ 0         −k3       k3 + k4   −k4       0        | f3 ]
[ 0         0         −k4       k4 + k5   −k5      | f4 ]
[ 0         0         0         −k5       k5 + k6  | f5 ]
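Assembling and solving the equilibrium system takes a few lines of NumPy. A sketch with made-up stiffnesses and forces (all numerical values are illustrative assumptions, not from the notes):

```python
import numpy as np

k = np.array([1., 2., 3., 4., 5., 6.])   # assumed stiffnesses k1..k6 (illustrative)
f = np.array([1., 0., 0., 0., -1.])      # assumed applied forces f1..f5 (illustrative)

K = np.zeros((5, 5))                     # the coefficient matrix above
for i in range(5):
    K[i, i] = k[i] + k[i + 1]            # diagonal entries k_i + k_{i+1}
for i in range(1, 5):
    K[i, i - 1] = K[i - 1, i] = -k[i]    # coupling through the shared spring

u = np.linalg.solve(K, f)                # equilibrium displacements u1..u5
assert np.allclose(K @ u, f)
assert np.allclose(K, K.T)               # the stiffness matrix is symmetric
```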
Remark.
○ The purpose of this example is not to find the precise solutions to these equations, but
rather to show you that linear equations appear naturally in engineering.
○ Finite element method: object is broken up into many small parts with connections
between the different parts ; gigantic spring-mass system where the forces from spring
correspond to the interaction between the different parts.
○ In practice: millions of equations, not just five!
○ Solve not for a single force vector f, but for many different vectors. Thus the coefficient
matrix stays the same ; LU-decomposition can make a difference in how quickly such
systems can be solved.
LINEAR ALGEBRA
Inner Product and Orthogonality

Definition. The inner product of v, w ∈ Rn is

v · w = vT w.
   
v1 w1
 ..   .. 
Example. If v =  .  and w =  . , then v · w is ...
vn wn
Solution.

v1 w1 + v2 w2 + · · · + vn wn

Question. Why is v · w = w · v?
Solution.
Note that (v · w)T = v · w, because it is a 1 × 1 matrix. Thus

v · w = vT w = (vT w)T = wT (vT )T = wT v = w · v.


Question. Why is v · v always larger or equal to 0? For which v is v · v = 0?
Solution.

v · v = v1 v1 + · · · + vn vn = v12 + · · · + vn2 ≥ 0.
|{z} |{z}
≥0 ≥0

Thus v · v = 0 if and only if v = 0.

Theorem 22. Let u, v and w be vectors in Rn , and let c be any scalar. Then
(a) u · v = v · u
(b) (u + v) · w = u · w + v · w
(c) (cu) · v =c (u · v) = u · (cv)
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.
Definition. Let v, w ∈ Rn .
The norm (or length) of v is
‖v‖ = √(v · v) = √(v1² + · · · + vn²).
The distance between v and w is
dist (v, w) = ‖v − w‖.

     
Example. Compute ‖(2, −1, 1)‖ and dist((2, 0, 1), (0, 1, 0)).
Solution.
‖(2, −1, 1)‖ = √(2² + (−1)² + 1²) = √6,  and  dist((2, 0, 1), (0, 1, 0)) = ‖(2, −1, 1)‖ = √6.
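These computations map directly onto NumPy. A short sketch using the vectors from the example:

```python
import numpy as np

v = np.array([2., -1., 1.])
assert np.isclose(v @ v, 6.)                        # inner product v . v = v^T v
assert np.isclose(np.linalg.norm(v), np.sqrt(6.))   # ||v|| = sqrt(v . v)

a = np.array([2., 0., 1.])
b = np.array([0., 1., 0.])
assert np.isclose(np.linalg.norm(a - b), np.sqrt(6.))  # dist(a, b) = ||a - b||

# orthogonality: the inner product is zero
assert np.array([1., 1.]) @ np.array([1., -1.]) == 0.
```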
Definition. Let v, w ∈ Rn . We say v and w are orthogonal if v · w = 0.
Theorem 23. Let v, w ∈ Rn be non-zero. Then v and w are orthogonal if and only if they
are perpendicular (ie. if they form a right angle).
Proof.

v ⊥ w ⟺ ‖v‖² + ‖w‖² = ‖v − w‖²        (Pythagoras)
      ⟺ v · v + w · w = (v − w) · (v − w)
      ⟺ v · v + w · w = v · v − 2v · w + w · w
      ⟺ v · w = 0
       
Example. Are (1, 1), (1, −1) orthogonal? Are (1, 1), (1, −2) orthogonal?
Solution.
(1, 1) · (1, −1) = 1 · 1 + 1 · (−1) = 0, so they are orthogonal.
(1, 1) · (1, −2) = 1 · 1 + 1 · (−2) = −1 ≠ 0, so they are not.

Definition. A set of vectors in Rn is pairwise orthogonal if each pairing of them is


orthogonal. We call such a set an orthogonal set.
  
Example. Find a non-zero v ∈ R³ such that (1, 1, 0), (1, −1, 0) and v form an orthogonal set.
Solution.
Find v = (v1 , v2 , v3 ) such that (1, 1, 0) · v = (1, −1, 0) · v = 0. So v1 + v2 = 0 and
v1 − v2 = 0. Thus v1 = v2 = 0. So we can take v = (0, 0, 1).
Definition. A unit vector in Rn is a vector of length 1.
Example. Let v = (1, 2). Is v a unit vector?
Solution.
No: v · v = 5, so ‖v‖ = √5 ≠ 1. However, u = v/‖v‖ = v/√5 is a unit vector. The unit
vector u is called the normalization of v.

Definition. A set of vectors form an orthonormal set if this set is an orthogonal set and all
vectors in the set are unit vectors.
   
Example. Let v1 = (1, 1), v2 = (1, −1). Do v1 and v2 form an orthonormal set?
Solution.
They are orthogonal but not orthonormal, since they are not unit vectors. Normalize to get
an orthonormal set: u1 = (1/√2)(1, 1), u2 = (1/√2)(1, −1).
LINEAR ALGEBRA
Subspaces of Rn

Definition. A non-empty subset H of Rn is a subspace of Rn if it satisfies the following two
conditions:
○ If u, v ∈ H, then the sum u + v ∈ H. (H is closed under vector addition).
○ If u ∈ H and c is a scalar, then cu ∈ H. (H is closed under scalar multiplication.)
Theorem 24. Let v1 , v2 , . . . , vm ∈ Rn . Then Span (v1 , v2 , . . . , vm ) is a subspace of Rn .
Proof.
Let
u = c1 v1 + · · · + cm vm , and w = d1 v1 + · · · + dm vm .
Then
u + w = (c1 + d1 )v1 + · · · + (cm + dm )vm
and
cu = (cc1 )v1 + · · · + (ccm )vm ,
so both are again in the span.
  
Example. Is H = {(x, x) : x ∈ R} a subspace of R² ?
Solution.
   
Yes. Let (a, a), (b, b) be in H.
○ (a, a) + (b, b) = (a + b, a + b) is in H.
○ c (a, a) = (ca, ca) is in H.
 
Example. Let Z = {(0, 0)}. Is Z a subspace of R² ?
Solution.
Yes:
(0, 0) + (0, 0) = (0, 0) ∈ Z , and c (0, 0) = (0, 0) ∈ Z .
  
Example. Let H = {(x, x + 1) : x ∈ R}. Is H a subspace of R² ?
Solution.
H is not a subspace of R² , because H is not closed under addition:
(0, 1), (−1, 0) ∈ H, but (0, 1) + (−1, 0) = (−1, 1) ∉ H.

  
Example. Is U = {(x, y ) ∈ R² : x² + y² < 1} a subspace of R² ?
Solution.
U is not a subspace, because U is not closed under scalar multiplication:
(−0.5, 0.5) ∈ U, but 2 · (−0.5, 0.5) = (−1, 1) ∉ U.
  
Example. Consider V = {(x, y ) ∈ R² : xy ≥ 0}.
Solution.
V is not a subspace, because it is not closed under addition:
(0, 1), (−1, 0) ∈ V , but (0, 1) + (−1, 0) = (−1, 1) ∉ V .

But V is closed under scalar multiplication.


Question. Is W = {(x, y ) ∈ R² : xy = 0} a subspace?
Solution.
W is not a subspace, because it is not closed under addition:
(0, 1), (1, 0) ∈ W , but (0, 1) + (1, 0) = (1, 1) ∉ W .
LINEAR ALGEBRA
Column spaces and Nullspaces

ILLINOIS
Department of Mathematics
Definition. The column space, written as Col(A),  of an m × n matrix A is the set of all
linear combinations of the columns of A. If A = a1 a2 · · · an , then
Col(A) = span (a1 , a2 , . . . , an ).
 
1 0
Example. Describe the column space of A = .
0 0
Solution.
     
1 0 1
Col(A) = span( , ) = span( ). This is the x1 axis in R2 .
0 0 0

Theorem 25. Let A be an m × n matrix. Then Col(A) is a subspace of Rm .


Proof.
Col(A) is a span; Spans are subspaces

It is a subspace of Rm , because the columns of A are in Rm .


Theorem 26. Let A be an m × n matrix and b ∈ Rm . Then b is in Col(A) if and only if the
linear system Ax = b has a solution.
Proof.  
x1
   .. 
Let A = a1 . . . an . Suppose Ax = b, where x =  . . Thus:
xn b is a linear combination
b = Ax = x1 a1 + x2 a2 + · · · + xn an of the columns of A ⇐⇒
| {z } Ax = b is consistent.
(lin. comb. of a1 ,...,an )

Question. Let A and B be two row-equivalent matrices. Is Col(A) = Col(B)?


Solution.
    x2
1 0 1 0
No. Take A = and B = .
1 0 0 0 Col(B)
   
1 1 x1
Then Col(A) = span( ) 6= span( ) = Col(B). Col(A)
1 0
Definition. The nullspace of an m × n matrix A, written as Nul(A), is the set of all solutions
to the homogeneous equation Ax = 0; that is, Nul(A) = {v ∈ Rn : Av = 0}.
Theorem 27. The nullspace of an m × n matrix A is a subspace of Rn .
Proof.
Nul(A) is non-empty since 0 ∈ Nul(A). Suppose Au = 0 and Av = 0. Then

A (u + v) = Au + Av = 0 + 0 = 0.

A(cu) = c (Au) = c (0) = 0.


Thus Nul(A) is closed under
  addition and scalar multiplication.
v1
Example. Let H := {v2  : v1 + v2 − v3 = 0}. Find a matrix A such that H = Nul(A).
v3
Solution.
 
  v1  
v1 + v2 − v3 = 0 if and only if 1 1 −1 v2  = 0. Thus Nul( 1 1 −1 ) = H.
v3
 
Example. Let A = 1 1 −1 . Find two vectors v, w such that Nul(A) = span(v, w).
Solution.
Let u ∈ Nul(A). Then
        Nul(A)
u1 −u2 + u3 −1 1 x2
u2  =  u2  = u2  1  + u3 0 .
v
u3 u3 0 1
=:v =:w
z }| { z}|{ w x1
 −1
 
1  x 3
Thus Nul(A) = span  1 , 0
0 1

Question. Is there a matrix B such that Nul(A) = Col(B)?


Solution.
 
  −1 1
Yes. Set B = v w =  1 0.
0 1
Theorem 28. Let A be an m × n matrix, let b ∈ Rm , and let w ∈ Rn such that Aw = b.
Then {v ∈ Rn : Av = b} = w + Nul(A).
Proof.
Let v ∈ Rn be such that Av = b. Then

A (v − w) = Av − Aw = b − b = 0.

Therefore, v − w ∈ Nul(A). Thus v ∈ w + Nul(A).


   T
Example. Let A = 1 1 −1 and let b = 1. Observe that A 1 0 0 = b. Use this to
describe {v ∈ Rn : Av = b}.
Solution.
x2
{v ∈ Rn : Av = b} Nul(A) w + Nul(A)
       
1 1 −1 1
x1
= 0 + Nul(A) = 0 + span  1  , 0 . x3
0 0 0 1
LINEAR ALGEBRA
Abstract vector spaces

○ The most important property of column vectors in Rn is that you can take linear
combinations of them.
○ There are many mathematical objects X , Y , . . . for which a linear combination cX + dY
make sense, and have the usual properties of linear combination in Rn .
○ We are going to define a vector space in general as a collection of objects for which linear
combinations make sense. The objects of such a set are called vectors.

[Figure: just as vectors in R² can be scaled and added, so can functions, e.g. f (x) = x + 1,
g (x) = sin x − 1, (f + g )(x) = sin x + x, 2g (x) = 2 sin x − 2.]
Definition. A vector space is a non-empty set V of objects, called vectors, for which linear
combinations make sense. More precisely: on V there are defined two operations, called
addition and multiplication by scalars (real numbers), satisfying the following axioms for all
u, v , w ∈ V and for all scalars c, d ∈ R:
○ u + v is in V . (V is “closed under addition”.)
○ u + v = v + u.
○ (u + v) + w = u + (v + w).
○ There is a vector (called the zero vector) 0V in V such that u + 0V = u.
○ For each u in V , there is a vector −u in V satisfying u + (−u) = 0V .
○ cu is in V . (V is “closed under scalar multiplication”.)
○ c(u + v) = cu + cv.
○ (c + d)u = cu + du.
○ (cd)u = c(du).
○ 1u = u.
In particular, we may talk about linear combinations and span within a vector space, e.g.
3u + 2v or span(u, v).
Example. Explain how the set of functions R → R is a vector space.
Solution.
Let f , g be two functions from R to R. We define f + g by

(f + g )(x) = f (x) + g (x).

For a scalar c define cf by

(cf )(x) = cf (x).

Example. Explain how the set of all 2 × 2 matrices is a vector space.


Solution.
Addition:
[ a b ] + [ e f ] = [ a+e b+f ]
[ c d ]   [ g h ]   [ c+g d+h ]
Scalar multiplication:
e · [ a b ] = [ ea eb ]
    [ c d ]   [ ec ed ]
Definition. Let V be a vector space. A non-empty subset W ⊆ V is a subspace of V if
○ u + v ∈ W for all u, v ∈ W (closed under addition)
○ cu ∈ W for all u ∈ W and c ∈ R (closed under scalar multiplication)
Example. Explain why the set of all symmetric 2 × 2 matrices is a subspace of the vector
space of 2 × 2 matrices?
Solution.
The set of symmetric matrices of a given size is non-empty since the zero matrix is symmetric.
Let A, B be two symmetric 2 × 2 matrices. So AT = A and B T = B.

(A + B)T = AT + B T = A + B (cA)T = cAT = cA.

Thus the set is closed under addition and scalar multiplication.


Question. Is the set of all invertible 2 × 2 matrices a subspace of the vector space of 2 × 2
matrices?
Solution.
No. The set is not closed under addition:
[ 1 0 ] + [ −1 0 ] = [ 0 0 ]
[ 0 1 ]   [ 0 −1 ]   [ 0 0 ]
Example. Let Pn be the set of all polynomials of degree at most n, that is

Pn = {a0 + a1 t + a2 t 2 + · · · + an t n : a0 , . . . , an ∈ R}.

Explain why it is a vector space. Can you think of it as a subspace of the vector space of all
functions R → R?
Solution.

Let p(t) = a0 + a1 t + · · · + an t n and q(t) = b0 + b1 t + · · · + bn t n be two polynomials of degree


at most n. Then

(p + q)(t) = (a0 + b0 ) + (a1 + b1 )t + · · · + (an + bn )t n

is also a polynomial of degree at most n. And

(cp)(t) = (ca0 ) + (ca1 )t + · · · + (can )t n

is also a polynomial of degree at most n.


LINEAR ALGEBRA
Linear Independence

Definition. Vectors v1 , . . . , vp are said to be linearly independent if the equation

x1 v1 + x2 v2 + · · · + xp vp = 0

has only the trivial solution (namely, x1 = x2 = · · · = xp = 0).


We say the vectors are linearly dependent if they are not linearly independent.
Example. Are (1, 1), (2, 2) linearly independent?
Solution.
No: the equation
x1 (1, 1) + x2 (2, 2) = (0, 0)
has the nontrivial solution (x1 , x2 ) = (−2, 1). The problem: (2, 2) ∈ span((1, 1)).
Theorem 29. Vectors v1 , . . . , vp are linearly dependent if and only if there is i ∈ {1, . . . , p}
such that vi ∈ span(v1 , . . . , vi−1 , vi+1 , . . . , vp ).
Question. A single non-zero vector v1 is always linearly independent. Why?
Solution.

Because x1 v1 = 0 only for x1 = 0.

Question. Two vectors v1 , v2 are linearly independent if and only if neither of the vectors is a
multiple of the other. Why?
Solution.

Because if x1 v1 + x2 v2 = 0 with, say, x2 ≠ 0, then v2 = −(x1 /x2 )v1 .

Question. Vectors v1 , . . . , vp containing the zero vector are linearly dependent.


Solution.

Because if, say, v1 = 0, then v1 + 0v2 + · · · + 0vp = 0.


     
Example. Consider (1, 1, 1), (1, 2, 3), (−1, 1, 3). Are these vectors linearly independent?
Solution.
Solve: x1 (1, 1, 1) + x2 (1, 2, 3) + x3 (−1, 1, 3) = (0, 0, 0).

Solve the corresponding linear system:
[ 1 1 −1 | 0 ]       [ 1 0 −3 | 0 ]
[ 1 2 1 | 0 ]   ;    [ 0 1 2 | 0 ]
[ 1 3 3 | 0 ]  RREF  [ 0 0 0 | 0 ]

Solution: x3 is free, x2 = −2x3 , and x1 = 3x3 . Take x3 = 1:
3 (1, 1, 1) − 2 (1, 2, 3) + (−1, 1, 3) = (0, 0, 0).
Question. In the previous example we determined the following linear dependence:
     
3 (1, 1, 1) − 2 (1, 2, 3) + (−1, 1, 3) = (0, 0, 0).
Can you write this as a matrix equation?
Solution.
  
[ 1 1 −1 ] [ 3 ]
[ 1 2 1 ]  [ −2 ] = 0
[ 1 3 3 ]  [ 1 ]

Idea. The Null space determines linear (in)dependence!


Theorem 30. Let A be an m × n matrix. The following are equivalent:
○ The columns of A are linearly independent.
○ Ax = 0 has only the solution x = 0.
○ A has n pivots.
○ there are no free variables for Ax = 0.
Question. Let v1 , . . . , vn ∈ Rm . If n > m, then v1 , . . . vn are linearly dependent. Why?
Solution.
 
Let A = v1 . . . vn . This is an m × n matrix.
; The matrix can have at most m pivots.
; Since n > m, A does not have n pivots.
; By Theorem, the columns of A have to be linearly dependent.
Question. Consider an m × n-matrix A in echelon form. The pivot columns of A are linearly
independent. Why?
Solution.

 
For example, let A = a1 a2 a3 and suppose that a1 , a3 are the pivot columns.
; The matrix a1 a3 has 2 pivots.
 

; The vectors a1 , a3 are linearly independent.


LINEAR ALGEBRA
Basis and Dimension

Definition. Let V be a vector space. A sequence of vectors (v1 , . . . , vp ) in V is a basis of V
if
○ V = span (v1 , . . . , vp ) , and
○ (v1 , . . . , vp ) are linearly independent.
       
Example. Check that both ((1, 0), (0, 1)) and ((1, 1), (1, −1)) are bases of R² .
Solution.
Linear independence:
[ 1 0 ]                   [ 1 1 ]     [ 1 1 ]
[ 0 1 ] ; 2 pivots.       [ 1 −1 ] ;  [ 0 −2 ] ; 2 pivots.
Spanning set:
(a, b) = a (1, 0) + b (0, 1) = ((a + b)/2) (1, 1) + ((a − b)/2) (1, −1).
Theorem 31. Every two bases in a vector space V contain the same numbers of vectors.
Definition. The number of vectors in a basis of V is the dimension of V .
Example. What is the dimension of Rn ?
Solution.
Let
e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1).
Note that (e1 , e2 , . . . , en ) is a basis of Rn :
[ e1 . . . en ] = In ; n pivots, and (v1 , . . . , vn ) = v1 e1 + · · · + vn en .
Thus dim Rn = n.
Theorem 32. Suppose that V has dimension d. Then
○ A sequence of d vectors in V are a basis if they span V .
○ A sequence of d vectors in V are a basis if they are linearly independent.
Proof.

○ Take d vectors that span V , say v1 , . . . vd .


So V = span(v1 , . . . vd ).
Suppose v1 , . . . vd are not linearly independent.
Then one of the them, say vd , is in the span of the other vectors.
So vd ∈ span(v1 , . . . , vd−1 ).
Drop vd !
Remaining d − 1 vectors still span V . Thus V = span(v1 , . . . , vd−1 ).
Repeat until the remaining vectors are linearly independent.
Eventually, we have v1 , . . . , v` linearly independent such that span(v1 , . . . , v` ) = V .
Contradicts dim V = d, since ` < d.
     
1 0 1
Example. Is 2 , 1 , 0 a basis of R3 ?
   
0 1 3
Solution.
The set has 3 elements. Since dim R³ = 3, it is a basis if and only if the vectors are
linearly independent.
[ 1 0 1 ]    [ 1 0 1 ]     [ 1 0 1 ]
[ 2 1 0 ] ;  [ 0 1 −2 ] ;  [ 0 1 −2 ]
[ 0 1 3 ]    [ 0 1 3 ]     [ 0 0 5 ]
Since each column contains a pivot, the three vectors are linearly independent.
     
Hence, (1, 2, 0), (0, 1, 1), (1, 0, 3) form a basis of R³ .
Theorem 33. A basis is a minimal spanning set of V ; that is the elements of the basis span
V but you cannot delete any of these elements and still get all of V .
Example. Produce a basis of R2 from the vectors
     
v1 = (1, 2), v2 = (1, 1), v3 = (−.5, −2).
Solution.
Three vectors in R² have to be linearly dependent.
Here, we notice that v2 = 1.5v1 + v3 .
The remaining vectors {v1 , v3 } are a basis for R² , because the two vectors are clearly
linearly independent.
 
Example. Produce a basis of R² from the vector (2, 1).
Solution.
Add any vector not in span((2, 1)). For example, (1, 0). So ((2, 1), (1, 0)) is a basis.
LINEAR ALGEBRA
Bases and dimensions of the four fundamental
subspaces

Algorithm. To find a basis for Nul(A):
○ Find the parametric form of the solutions to Ax = 0.
○ Express solutions x as a linear combination of vectors with the free variables as coefficients.
○ Use these vectors as a basis of Nul(A).
 
3 6 6 3 9
Example. Find a basis for Nul(A) where A = .
6 12 15 0 3
Solution.
   
[ 3 6 6 3 9 | 0 ]        [ 1 2 0 5 13 | 0 ]      x1 = −2x2 − 5x4 − 13x5
[ 6 12 15 0 3 | 0 ] ;RREF [ 0 0 1 −2 −5 | 0 ]     x3 = 2x4 + 5x5

The solutions to Ax = 0 are:
x = x2 (−2, 1, 0, 0, 0) + x4 (−5, 0, 2, 1, 0) + x5 (−13, 0, 5, 0, 1)

; basis: (−2, 1, 0, 0, 0), (−5, 0, 2, 1, 0), (−13, 0, 5, 0, 1).
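We can check this basis numerically: each vector must satisfy Ax = 0, the vectors must be independent, and there must be n − r of them. A NumPy sketch:

```python
import numpy as np

A = np.array([[3., 6., 6., 3., 9.],
              [6., 12., 15., 0., 3.]])
r = np.linalg.matrix_rank(A)          # r = 2
assert A.shape[1] - r == 3            # dim Nul(A) = n - r

# columns = the three basis vectors found above
basis = np.array([[-2., -5., -13.],
                  [1., 0., 0.],
                  [0., 2., 5.],
                  [0., 1., 0.],
                  [0., 0., 1.]])
assert np.allclose(A @ basis, 0.)           # each basis vector solves Ax = 0
assert np.linalg.matrix_rank(basis) == 3    # and they are linearly independent
```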
Definition. The rank of a matrix is the number of pivots it has.
Theorem 34. Let A be an m × n matrix with rank r . Then dim Nul(A) = n − r .
   
Remark. Let A = a1 . . . an and let U = u1 . . . un be an echelon form of A. Explain
why
x1 u1 + · · · + xn un = 0 ⇐⇒ x1 a1 + · · · + xn an = 0.
Solution.
Because A and U are row-equivalent:
x1 u1 + · · · + xn un = 0 ⇐⇒ Ux = 0 ⇐⇒ Ax = 0 ⇐⇒ x1 a1 + · · · + xn an = 0

Theorem 35. Let A be an m × n matrix with rank r . The pivot columns of A form a basis
of Col(A). In particular, dim Col(A) = r .
Proof.
Let U be RREF of A.
; Then the pivot columns of U are linearly independent.
; By Remark, pivot columns of A are linearly independent.
 
1 2 0 4
2 4 −1 3 
Example. Find a basis for Col(A) where A = 
3
.
6 2 22
4 8 0 16
Solution.
   
    [ 1 2 0 4 ]           [ 1 2 0 4 ]
A = [ 2 4 −1 3 ]   ;  U = [ 0 0 1 5 ]
    [ 3 6 2 22 ]  RREF    [ 0 0 0 0 ]
    [ 4 8 0 16 ]          [ 0 0 0 0 ]
The pivot columns are the first and the third.
Hence, a basis for Col(A) is (1, 2, 3, 4), (0, −1, 2, 0).
An aside: note that for U we have columns u2 = 2u1 and u4 = 4u1 + 5u3 .
u2 = 2u1 and u4 = 4u1 + 5u3 .
The same is true for the columns of A!
   
Example. Let
A = [ 1 3 ]       U = [ 1 3 ]
    [ 2 6 ] ,         [ 0 0 ]  (the RREF of A).
Is Col(A) = Col(U)? Is Col(AT ) = Col(U T )?
Solution.
The second columns of A and U are redundant:
Col(A) = span((1, 2)) ≠ span((1, 0)) = Col(U).
Row operations do not preserve the column space.
The second rows of A and U are redundant:
Col(AT ) = span((1, 3)) = Col(U T ).
Row operations preserve the row space.

Theorem 36. Let A, B be two row-equivalent matrices. Then Col(AT ) = Col(B T ).


Theorem 37. Let A be an m × n matrix with rank r . Then the non-zero rows in an echelon
form of A form a basis of Col(AT ), and thus dim Col(AT ) = r .
 
1 2 0 4
2 4 −1 3 
Example. Find a basis for Col(AT ) where A=3 6 2 22 .

4 8 0 16
Solution.
       
    [ 1 2 0 4 ]           [ 1 2 0 4 ]
A = [ 2 4 −1 3 ]   ;  U = [ 0 0 1 5 ]   ;  (1, 2, 0, 4), (0, 0, 1, 5) basis of Col(AT ).
    [ 3 6 2 22 ]  RREF    [ 0 0 0 0 ]
    [ 4 8 0 16 ]          [ 0 0 0 0 ]
Theorem 38. Let A be an m × n matrix with rank r . Then
○ dim Col(A) = dim Col(AT ) = r .
○ dim Nul(A) = n − r .
Question. What is dim Nul(AT )? What is dim Col(A) + dim Nul(A)?
Solution.

dim Nul(AT ) = m − dim Col(AT ) = m − r . dim Col(A) + dim Nul(A) = r + (n − r ) = n.
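A quick numerical illustration of these dimension formulas, using the 4 × 4 matrix from the column-space example above:

```python
import numpy as np

A = np.array([[1., 2., 0., 4.],
              [2., 4., -1., 3.],
              [3., 6., 2., 22.],
              [4., 8., 0., 16.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

assert r == 2
assert np.linalg.matrix_rank(A.T) == r    # dim Col(A) = dim Col(A^T) = r
dim_nul_A = n - r                          # dim Nul(A)   = n - r = 2
dim_nul_AT = m - r                         # dim Nul(A^T) = m - r = 2
assert r + dim_nul_A == n                  # rank-nullity
```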


LINEAR ALGEBRA
Graphs

Definition. A graph is a set of nodes (or: vertices) that are connected through edges.
Example. A graph with 4 nodes and 5 edges:

1 2  
0 1 1 0
1 1 1 0
 
1 1 0 1
0 0 1 0
3 4

Definition. Let G be a graph with n nodes. The adjacency matrix of G is the n × n-matrix
A = (aij ) such that
(
1 if there is an edge between node i and node j
aij =
0 otherwise .
Definition. A walk of length k on a graph is a sequence of k + 1 vertices and k edges
between two nodes, where vertices and edges (including the start and end) may repeat.
A path is a walk in which all vertices are distinct.
Example. Count the number of walks of length 2 from node 2 to node 3 and the number of
walks of length 3 from node 3 back to node 3:

1 2

○ Node 2 to Node 3: 2 walks of length 2


○ Node 3 to Node 3: 3 walks of length 3

3 4

Definition. A graph is connected if for every pair of nodes i and j there is a walk from node
i to node j. A graph is disconnected if it is not connected.
Theorem 39. Let G be a graph and let A be its adjacency matrix. Then the entry in the
i-th row and j-th column of A` is the number of walks of length ` from node j to node i on G.
Proof.
We just consider the case ` = 2. Then the entry in the i-th row and j-th column of A2 is:
 
(A² )ij = [ ai1 . . . ain ] (a1j , . . . , anj ) = ai1 a1j + ai2 a2j + · · · + ain anj

Each summand corresponds to a walk of length ` = 2 from node j to i. For example:

ai2 a2j = 1 ⇐⇒ there is an edge between node j and node 2 and an edge between node 2 and i.

Since each walk from node j to node i has go through some node,the sum

ai1 a1j + ai2 a2j + · · · + ain anj

is equal to the number of walks of length ` = 2 from node j to node i.
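Counting walks by powering the adjacency matrix is one line of NumPy. A sketch reproducing the counts from the example above:

```python
import numpy as np

# adjacency matrix of the 4-node graph above (note the loop at node 2)
A = np.array([[0, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)

# walks of length 2 from node 2 to node 3: row 3, column 2 of A^2
assert A2[2, 1] == 2
# walks of length 3 from node 3 back to node 3: row 3, column 3 of A^3
assert A3[2, 2] == 3
```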


Definition. A directed graph is a set of vertices connected by edges, where the edges have a
direction associated with them.
Example. A graph with 4 nodes and 5 edges:
1
1 2  
0 0 0 0
1 0 1 0
3  
2 4 1 0 0 0
0 1 1 0
5
3 4
We can also talk about adjacency matrices of directed graphs. We use the following
convention:
Definition. Let G be a directed graph with m edges and n nodes. The adjacency matrix of
G is the n × n matrix A = (ai,j )i,j with

1, if there is a directed edge from node j to node i
ai,j =
0, otherwise
Directed graphs have another important associated matrix:
Definition. Let G be a directed graph with m edges and n nodes. The edge-node incidence
matrix of G is the m × n matrix A = (ai,j )i,j with

 −1, if edge i leaves node j
ai,j = +1, if edge i enters node j
0, otherwise

Example. A graph with 4 nodes and 5 edges:


1
1 2
 
−1 1 0 0
−1 0 1 0
 
3 0 1 −1 0
2 4 
 0 −1 0

1
0 0 −1 1
5
3 4
Definition. A connected component of an undirected graph is a part in which any two
vertices are connected to each other by paths, and which is connected to no additional vertices
in the rest of the graph. The connected components of a directed graph are those of its
underlying undirected graph. A graph is connected if only has one connected component.
Example.
A graph with one connected component: the 4-node, 5-edge graph from before. A graph with two connected components: four nodes, where edge 1 joins nodes 1 and 3 and edge 2 joins nodes 2 and 4.
Theorem 40. Let G be a directed graph and let A be its edge-node incidence matrix. Then
dim Nul(A) is equal to the number of connected components of G.
Example. We explain the idea in the case of the graph G from before. Let A be its edge-node incidence matrix, a 5 × 4-matrix. Note that Nul(A) ⊆ R4 .
Think of x = (x1 , x2 , x3 , x4 )T as assigning a potential to each node. If x ∈ Nul(A):

0 = Ax = [−1 1 0 0; −1 0 1 0; 0 1 −1 0; 0 −1 0 1; 0 0 −1 1] (x1 , x2 , x3 , x4 )T
       = (−x1 + x2 , −x1 + x3 , x2 − x3 , −x2 + x4 , −x3 + x4 )T

Each edge forces the potentials of the two nodes it connects to be equal. Thus if two nodes are connected by a path, they have the same assigned potential. Since G is connected, all potentials agree: x = c (1, 1, 1, 1)T for some scalar c, so dim Nul(A) = 1, the number of connected components.
Example. Find a basis of the null space of the edge-node incidence matrix of the following graph: four nodes, where edge 1 joins nodes 1 and 3 and edge 2 joins nodes 2 and 4.
Solution.
Let A be the edge-node incidence matrix. It is the 2 × 4-matrix

A = [−1 0 1 0; 0 −1 0 1] ; RREF: [1 0 −1 0; 0 1 0 −1]

Let x = (x1 , x2 , x3 , x4 )T ∈ R4 . If Ax = 0, then x1 = x3 (connected by an edge) and x2 = x4 (connected by an edge).

; Nul(A) has the following basis: (1, 0, 1, 0)T , (0, 1, 0, 1)T
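Theorem 40 also gives a numerical recipe: dim Nul(A) = (number of nodes) − rank(A), by rank-nullity. A sketch for the two-edge graph above:

```python
import numpy as np

# Incidence matrix of the graph with edges joining nodes 1-3 and 2-4.
A = np.array([[-1, 0, 1, 0],
              [0, -1, 0, 1]])

# dim Nul(A) = number of columns - rank(A)  (rank-nullity theorem)
n_components = A.shape[1] - np.linalg.matrix_rank(A)
print(n_components)  # 2, matching the two connected components
```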
Definition. A cycle in an undirected graph is a path in which all edges are distinct and the
only repeated vertices are the first and last vertices. By cycles of a directed graph we mean
those of its underlying undirected graph.
Example. Find cycles in the graph G from before.
Solution.
Cycle1 : edge2 , edge3 , −edge1 ; (−1, 1, 1, 0, 0)T
Cycle2 : edge5 , −edge4 , −edge3 ; (0, 0, −1, −1, 1)T
Cycle3 : edge2 , edge5 , −edge4 , −edge1 ; (−1, 1, 0, −1, 1)T
; Cycle1 + Cycle2 = Cycle3
Definition. The span of all cycle vectors of a graph G is called the cycle space.
Theorem 41. Let G be a directed graph and let A be its edge-node incidence matrix. Then
the cycle space of G is equal to Nul(AT ).
Example. We explain the idea in the case of the graph G from before. Let A be its edge-node incidence matrix, a 5 × 4-matrix. Note that Nul(AT ) ⊆ R5 .
Think of y = (y1 , y2 , y3 , y4 , y5 )T as assigning a flow to each edge. If y ∈ Nul(AT ):

0 = AT y = [−1 −1 0 0 0; 1 0 1 −1 0; 0 1 −1 0 −1; 0 0 0 1 1] (y1 , y2 , y3 , y4 , y5 )T
         = (−y1 − y2 , y1 + y3 − y4 , y2 − y3 − y5 , y4 + y5 )T

; y ∈ Nul(AT ) if and only if the inflow equals the outflow at each node.
What is the simplest way to balance flow? Assign flow around cycles.
Example (ctd.).
Let’s solve AT y = 0 for the graph G from before.

AT = [−1 −1 0 0 0; 1 0 1 −1 0; 0 1 −1 0 −1; 0 0 0 1 1] ; RREF: [1 0 1 0 1; 0 1 −1 0 −1; 0 0 0 1 1; 0 0 0 0 0]

y ∈ Nul(AT ) ; y1 = −y3 − y5 , y2 = y3 + y5 , y4 = −y5

; y = (−y3 − y5 , y3 + y5 , y3 , −y5 , y5 )T = y3 (−1, 1, 1, 0, 0)T + y5 (−1, 1, 0, −1, 1)T = y3 Cycle1 + y5 Cycle3

; Cycle1 and Cycle3 form a basis of the cycle space of G.
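Both the dimension count and the two basis cycles can be verified numerically:

```python
import numpy as np

# Incidence matrix of the 5-edge, 4-node graph G.
A = np.array([[-1, 1, 0, 0],
              [-1, 0, 1, 0],
              [0, 1, -1, 0],
              [0, -1, 0, 1],
              [0, 0, -1, 1]])

# dim of the cycle space = dim Nul(A^T) = m - rank(A^T)
cycle_dim = A.shape[0] - np.linalg.matrix_rank(A.T)
print(cycle_dim)  # 2

# The two basis cycles found above really lie in Nul(A^T):
cycle1 = np.array([-1, 1, 1, 0, 0])
cycle3 = np.array([-1, 1, 0, -1, 1])
print((A.T @ cycle1 == 0).all(), (A.T @ cycle3 == 0).all())  # True True
```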


LINEAR ALGEBRA
Orthogonal complements

 
Example. Let A = [1 2; 2 4; 3 6]. Find bases of Nul(A) and Col(AT ).
Solution.
A ; RREF: [1 2; 0 0; 0 0]

Nul(A) = span{(−2, 1)T } , Col(AT ) = span{(1, 2)T }

In the plane, Nul(A) and Col(AT ) are perpendicular lines.
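A quick numerical check that the two basis vectors just found are orthogonal, as the definition below and Theorem 42 predict:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 4],
              [3, 6]])
n_basis = np.array([-2, 1])  # spans Nul(A)
r_basis = np.array([1, 2])   # spans Col(A^T)

print((A @ n_basis == 0).all())  # True: n_basis is in Nul(A)
print(np.dot(n_basis, r_basis))  # 0: orthogonal to Col(A^T)
```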

Definition. Let W be a subspace of Rn . The orthogonal complement of W is the


subspace W ⊥ of all vectors that are orthogonal to W ; that is
W ⊥ := {v ∈ Rn : v · w = 0 for all w ∈ W }.

Remark. Observe that (W ⊥ )⊥ = W .


Theorem 42. Let A be an m × n matrix. Then Nul(A) is the orthogonal complement of
Col(AT ); that is Nul(A) = Col(AT )⊥ .
Proof.
Suppose u ∈ Nul(A) and v ∈ Col(AT ). Then Au = 0 and there is w ∈ Rm such that v = AT w.

; v · u = (AT w)T u = wT Au = wT 0 = 0.

Suppose that v ∈ Col(AT )⊥ . Write AT = [R1 . . . Rm ], so that R1 , . . . , Rm are the rows of A. Then

(Av)T = vT AT = [vT R1 vT R2 . . . vT Rm ] = [0 0 . . . 0].

So Av = 0. Thus v ∈ Nul(A).
Remark. It follows that
○ Nul(A)⊥ = Col(AT ).
○ Nul(AT ) = Col(A)⊥ .
Theorem 43. Let V be a subspace of Rn . Then dim V + dim V ⊥ = n.
Proof.
Write V = Col(A) for some n × m matrix A.

dim V + dim V ⊥ = dim Col(A) + dim Nul(AT ) = dim Col(AT ) + dim Nul(AT ) = n

   
Example. Find a basis of the orthogonal complement of span{(1, 0, 0)T , (0, 1, 1)T }.
Solution.
Strategy: find A such that the span is Col(A). Then find a basis of Nul(AT ).

A = [1 0; 0 1; 0 1] ; AT = [1 0 0; 0 1 1]

AT x = 0 ; x1 = 0 and x2 = −x3 ; x = (0, −x3 , x3 )T = x3 (0, −1, 1)T

; (0, −1, 1)T is a basis of Nul(AT ) = Col(A)⊥ .
LINEAR ALGEBRA
Coordinates

Theorem 44. Let (v1 , . . . , vp ) be a basis of V . Then every vector w in V can be expressed
uniquely as
w = c1 v1 + · · · + cp vp .
Proof.
Let c1 , . . . , cp and d1 , . . . , dp such that w = c1 v1 + · · · + cp vp = d1 v1 + · · · + dp vp .

=⇒ (c1 − d1 )v1 + · · · + (cp − dp )vp = 0.


=⇒ c1 − d1 = · · · = cp − dp = 0 =⇒ c1 = d1 , . . . , cp = dp .

Definition. Let B = (v1 , v2 , . . . , vp ) be an (ordered) basis of V , and let w ∈ V . The coordinate vector wB of w with respect to the basis B is

wB = (c1 , c2 , . . . , cp )T , if w = c1 v1 + c2 v2 + · · · + cp vp .
Example. Let V = R2 , and consider the bases
B := (b1 = (1, 1)T , b2 = (1, −1)T ), E := (e1 = (1, 0)T , e2 = (0, 1)T ).
Let w = (3, −1)T . Determine wB and wE .
Solution.
(3, −1)T = c1 (1, 1)T + c2 (1, −1)T = 1 · (1, 1)T + 2 · (1, −1)T ; wB = (1, 2)T
(3, −1)T = 3 · (1, 0)T + (−1) · (0, 1)T ; wE = (3, −1)T
Definition. In Rn let ei denote the vector with a 1 in the i-th coordinate and 0’s elsewhere.
The standard basis of Rn is the ordered basis En := (e1 , . . . , en ).
Question. For all v ∈ Rn , we have v = vEn . Why?
Solution.
v = (v1 , . . . , vn )T = v1 e1 + · · · + vn en ; vEn = (v1 , . . . , vn )T = v
    
Example. Consider the basis B := (b1 = (1, 1)T , b2 = (1, −1)T ) of R2 . Let v ∈ R2 be such that vB = (2, 1)T . Can you determine v?
Solution.
v = 2b1 + 1b2 = (3, 1)T
Definition. Let B and C be two bases of Rn . The change of basis matrix IC,B is the matrix
such that for all v ∈ Rn
IC,B vB = vC

Theorem 45. Let B = (b1 , . . . , bn ) be a basis of Rn . Then

IEn ,B = [b1 . . . bn ]

That is, for all v ∈ Rn ,

v = [b1 . . . bn ] vB .
Proof.
Let vB = (vB,1 , . . . , vB,n )T . Then

[b1 . . . bn ] vB = vB,1 b1 + · · · + vB,n bn = v.
Question. Let B = (b1 , . . . , bn ) be a basis of Rn . How do you compute IB,En ?
Solution.
v = IEn ,B vB ; (IEn ,B )−1 v = vB ; IB,En = (IEn ,B )−1

Question. Let B and C be two bases of Rn . How do you compute IB,C ?
Solution.
IB,En IEn ,C vC = IB,En v = vB ; IB,C = IB,En IEn ,C

Remark. Another way to compute IC,B :

IC,B = [(b1 )C . . . (bn )C ]
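These change-of-basis formulas are easy to try out; a sketch for the basis B = ((1, 1)T , (1, −1)T ) from the earlier example:

```python
import numpy as np

b1, b2 = np.array([1, 1]), np.array([1, -1])
I_EB = np.column_stack([b1, b2])  # I_{E,B}: columns are the basis vectors
I_BE = np.linalg.inv(I_EB)        # I_{B,E} = (I_{E,B})^{-1}

w = np.array([3, -1])
w_B = I_BE @ w
print(w_B)  # [1. 2.]: the coordinates of w with respect to B
```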
LINEAR ALGEBRA
Orthogonal and Orthonormal bases

Theorem 46. Let v1 , . . . , vm ∈ Rn be non-zero and pairwise orthogonal. Then v1 , . . . , vm are linearly independent.
Proof.

Suppose that
c1 v1 + · · · + cm vm = 0.
Take the inner product of v1 on both sides.

0 = v1 · (c1 v1 + · · · + cm vm )
= c1 v1 · v1 + c2 v1 · v2 + · · · + cm v1 · vm
= c1 v1 · v1 = c1 kv1 k2

But kv1 k 6= 0 and so c1 = 0. Similarly, we find that c2 = 0, . . . , cm = 0. Therefore, the vectors


are linearly independent.

Remark. The theorem implies that a set of n orthonormal vectors in Rn is a basis of Rn .


Definition. An orthogonal basis (an orthonormal basis) is an orthogonal set of vectors (an
orthonormal set of vectors) that forms a basis.
Theorem 47. Let B := (b1 , b2 , . . . , bn ) be an orthogonal basis of Rn , and let v ∈ Rn . Then

v = ((v · b1 )/(b1 · b1 )) b1 + . . . + ((v · bn )/(bn · bn )) bn .
Solution.
Let c1 , . . . , cn be such that
v = c1 b1 + · · · + cn bn .
Just calculate

b1 · v = b1 · (c1 b1 + c2 b2 + · · · + cn bn )
       = c1 b1 · b1 + c2 b1 · b2 + · · · + cn b1 · bn
       = c1 b1 · b1 ; c1 = (v · b1 )/(b1 · b1 )

Remark. When B is orthonormal, then bi · bi = 1 for i = 1, . . . , n.


    
Example. Let U be the orthonormal basis (u1 = (1/√2) (1, 1)T , u2 = (1/√2) (−1, 1)T ) of R2 .
Let v = (2, 3)T . Determine vU using the formula from the previous theorem!
Solution.
vU = (u1 · v, u2 · v)T = (5/√2, 1/√2)T

Example. With U as above, compute the change of basis matrix IU ,E2 .
Solution.
IU ,E2 = (IE2 ,U )−1 = ( (1/√2) [1 −1; 1 1] )−1 = (1/√2) [1 1; −1 1] = (IE2 ,U )T
Theorem 48. Let U = (u1 , . . . , un ) be an orthonormal basis of Rn . Then
IU ,En = [u1 . . . un ]T .
Proof.
vU = (u1 · v, . . . , un · v)T = (u1T v, . . . , unT v)T = [u1 . . . un ]T v ; IU ,En = [u1 . . . un ]T

Definition. An n × n-matrix Q is orthogonal if Q −1 = Q T .


Remark. The columns of an orthogonal matrix form an orthonormal basis. Why?
Solution.
In = Q T Q = [q1 . . . qn ]T [q1 . . . qn ] has (i, j)-entry qiT qj
; qi · qj = 0 if i ≠ j and qi · qj = 1 if i = j, so the columns of Q are orthonormal.
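A check of both facts for the orthonormal basis U from the example above:

```python
import numpy as np

# Columns are u1 and u2; Q is orthogonal, so Q^{-1} = Q^T.
Q = np.array([[1.0, -1.0],
              [1.0, 1.0]]) / np.sqrt(2)
print(np.allclose(Q.T @ Q, np.eye(2)))  # True

# Coordinates with respect to U need no matrix inversion:
v = np.array([2.0, 3.0])
v_U = Q.T @ v
print(np.allclose(v_U, [5 / np.sqrt(2), 1 / np.sqrt(2)]))  # True
```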
LINEAR ALGEBRA
Linear Transformation

Definition. Let V and W be vector spaces. A map T : V → W is a linear transformation if

T (av + bw) = aT (v) + bT (w)

for all v, w ∈ V and all a, b ∈ R.


Example. Let V = R, W = R. Then the map f (x) = 3x is linear, g (x) = 2x − 2 is not.
Why?
Solution.
If x, y ∈ R, then
f (ax + by ) = 3(ax + by ) = a · 3x + b · 3y = af (x) + bf (y ).

What about the function g (x) = 2x − 2?

g (0) + g (0) = 2 · 0 − 2 + 2 · 0 − 2 = −4 ≠ −2 = g (0) = g (0 + 0).

Remark. T (0V ) = T (0 · 0V ) = 0 · T (0V ) = 0W ; T (0V ) = 0W


Question. Let A be an m × n matrix, and consider the map T : Rn → Rm given by
T (v) := Av. Is this a linear transformation?
Solution.
Because matrix multiplication is linear.

A(cv + dw) = cAv + dAw

The left-hand side is T (cv + dw) and the right-hand side is cT (v) + dT (w).
Example. Let Pn be the vector space of all polynomials of degree at most n. Consider the map T : Pn → Pn−1 given by T (p(t)) := (d/dt) p(t). This map is linear! Why?
Solution.

Because differentiation is linear:


d d d
[ap(t) + bq(t)] = a p(t) + b q(t).
dt dt dt
The left-hand side is T (ap(t) + bq(t)) and the right-hand side is aT (p(t)) + bT (q(t)).
Theorem 49. Let V , W be two vector spaces, let T : V → W be a linear transformation
and let (v1 , . . . , vn ) be a basis of V . Then T is completely determined by the values
T (v1 ), . . . , T (vn ).
Solution.

Take any v ∈ V .
It can be written as v = c1 v1 + · · · + cn vn because (v1 , . . . , vn ) is a basis and hence spans V .
Hence by the linearity of T :

T (v) = T (c1 v1 + · · · + cn vn ) = c1 T (v1 ) + · · · + cn T (vn ).

So we know how to write T (v) as long as we know T (v1 ), . . . , T (vn ).


 
Example. Let T : R2 → R3 be a linear transformation with T ((1, 0)T ) = (1, 2, 3)T and T ((0, 1)T ) = (0, 0, −2)T . What is T ((1, 2)T )?
Solution.
T ((1, 2)T ) = T ((1, 0)T + 2 (0, 1)T ) = T ((1, 0)T ) + 2 T ((0, 1)T )
            = (1, 2, 3)T + 2 (0, 0, −2)T = (1, 2, −1)T
Theorem 50. Let T : Rn → Rm be a linear transformation. Then there is a m × n matrix A
such that
○ T (v) = Av, for all v ∈ Rn .
○ A = T (e1 ) T (e2 ) . . . T (en ) , where (e1 , e2 , . . . , en ) is the standard basis of Rn .
 

Proof.  
v1
 .. 
Let v =  . . We can write v = v1 e1 + v2 e2 + · · · + vn en . Then
vn
T (v) = T (v1 e1 + v2 e2 + · · · + vn en )
= v1 T (e1 ) + v2 T (e2 ) + · · · + vn T (en )
 
v1
  . 
= T (e1 ) T (e2 ) . . . T (en )  ..  = Av.
vn

Remark. We call this A the coordinate matrix of T with respect to the standard bases - we
write TEm ,En .
Example. Let Tα : R2 → R2 be the “rotation over α radians (counterclockwise)” map, that
is Tα (v) is the vector obtained by rotating v over angle α. Find the 2 × 2 matrix Aα such that
Tα (v) = Aα v for all v ∈ R2 .
Solution.

x2 We just need to find what happens under (coun-


e2 terclockwise) rotation to the standard basis vectors.
By high school trigonometry,
Tα (e2 ) Tα (e1 )
       
1 cos(α) 0 − sin(α)
Tα = , Tα = ,
e1 x1 0 sin(α) 1 cos(α)
 
cos(α) − sin(α)
So our matrix is Aα = .
sin(α) cos(α)
This is called the rotation matrix for angle α.
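The rotation matrix is straightforward to code; a sketch rotating e1 by a quarter turn:

```python
import numpy as np

def rotation(alpha):
    """Counterclockwise rotation matrix for angle alpha (radians)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s],
                     [s, c]])

# Rotating e1 by 90 degrees should give e2 (up to roundoff).
v = rotation(np.pi / 2) @ np.array([1.0, 0.0])
print(np.allclose(v, [0.0, 1.0]))  # True
```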
LINEAR ALGEBRA
Coordinate matrices of linear transformations

Last time: Linear transformation is matrix multiplication. For every linear transformation
T : Rn → Rm , there is an m × n matrix A such that T (v) = Av.
Today: The same in abstract vector spaces.
Theorem 51. Let V , W be two vector space, let B = (b1 , . . . , bn ) be a basis of V and
C = (c1 , . . . , cm ) be a basis of W , and let T : V → W be a linear transformation. Then there
is a m × n matrix TC,B such that
○ T (v)C = TC,B vB , for all v ∈ V .
 
○ TC,B = T (b1 )C T (b2 )C . . . T (bn )C .

v : vector in V −−(apply T )−→ T (v) : vector in W
(write v in coordinates wrt B) (write T (v) in coordinates wrt C)
vB : coordinate vector in Rn −−(multiply by TC,B )−→ TC,B vB : coordinate vector in Rm
Example. Let D : P2 → P1 be given by D(p(t)) = (d/dt) p(t). Consider the bases B = (1, t, t 2 ) and C = (1, t) of P2 and P1 . Determine DC,B .
Solution.
D(1) = 0 = 0 · 1 + 0 · t, D(t) = 1 = 1 · 1 + 0 · t, D(t 2 ) = 2t = 0 · 1 + 2 · t

 
 2
 0 1 0
DC,B = D(1)C D(t)C D(t )C =
0 0 2

Question. Consider p(t) = 2 − t + 3t 2 in P2 . Compute D(p(t))C and DC,B p(t)B .


Solution.
D(2 − t + 3t 2 ) = −1 + 6t ; D(p(t))C = (−1, 6)T
p(t)B = (2, −1, 3)T ; DC,B p(t)B = [0 1 0; 0 0 2] (2, −1, 3)T = (−1, 6)T
 
Example. Let T : R2 → R2 be such that T (v) = [3 1; 1 3] v. Consider B := (b1 = (1, −1)T , b2 = (1, 1)T ). Compute TB,B . Let v = b1 + b2 . Use TB,B to compute T (v).
Solution.
TB,B = [T (b1 )B T (b2 )B ].
T (b1 ) = [3 1; 1 3] (1, −1)T = (2, −2)T = 2b1 + 0b2 ,
T (b2 ) = [3 1; 1 3] (1, 1)T = (4, 4)T = 0b1 + 4b2
This means that the coordinate matrix is simply TB,B = [2 0; 0 4].
Note that vB = (1, 1)T . Then
T (v)B = TB,B vB = [2 0; 0 4] (1, 1)T = (2, 4)T ; T (v) = 2b1 + 4b2 = (6, 2)T
Theorem 52. Let T : Rm → Rn be a linear transformation and A and B be two bases of
Rm and C, D be two bases of Rn . Then

TC,A = IC,D TD,B IB,A .


Proof.
We compute

IC,D TD,B IB,A vA = IC,D TD,B vB = IC,D T (v)D = T (v)C .

Since TC,A is the only matrix with TC,A vA = T (v)C for all v, we have

TC,A = IC,D TD,B IB,A .
       
Example. Consider E := (e1 = (1, 0)T , e2 = (0, 1)T ) and B := (b1 = (1, −1)T , b2 = (1, 1)T ) as before. Let T : R2 → R2 be again the linear transformation v 7→ [3 1; 1 3] v. Determine TB,B .
Solution.
TB,B = IB,E TE,E IE,B .
We already calculated that

IB,E = (1/2) [1 −1; 1 1] , IE,B = [1 1; −1 1] .

Since E is the standard basis, TE,E = [3 1; 1 3].
Therefore

TB,B = (1/2) [1 −1; 1 1] [3 1; 1 3] [1 1; −1 1] = [2 0; 0 4] .
LINEAR ALGEBRA
Determinants

Definition. The determinant of
○ a 2 × 2 matrix is det [a b; c d] = ad − bc,
○ a 1 × 1 matrix is det([a]) = a.
Remark. Recall that

[a b; c d]−1 = (1/(ad − bc)) [d −b; −c a].

Goal. Define the determinant of an n × n-matrix such that

A is invertible ⇐⇒ det(A) ≠ 0.

 
Notation. We will write both det [a b; c d] and |a b; c d| for the determinant.
Definition. The determinant is the operation that assigns to each n × n-matrix a number
and satisfies the following conditions:
○ (Normalization) det In = 1,
○ It is affected by elementary row operations as follows:
o (Replacement) Adding a multiple of one row to another row does not change the
determinant.
o (Interchange) Interchanging two different rows reverses the sign of the determinant.
o (Scaling) Multiplying all entries in a row by s, multiplies the determinant by s.
 
Example. Compute det [2 3 3; 0 1 2; 0 0 6].
Solution.
Use replacement operations to clear the entries above the diagonal, then scale each row:

|2 3 3; 0 1 2; 0 0 6| = |2 3 0; 0 1 0; 0 0 6| = |2 0 0; 0 1 0; 0 0 6| = 2 · 6 · |1 0 0; 0 1 0; 0 0 1| = 12
Theorem 53. The determinant of a triangular matrix is the product of the diagonal entries.
1 2 0
Example. Compute 3 −1 2 .
2 0 1
Solution.

1 2 0 R2→R2−3R1 1 2 0
R3→R3−2R1
3 −1 2 = 0 −7 2
2 0 1 0 −4 1

R3→R3− 74 R2
1 2 0
= 0 −7 2
0 0 − 17
1
= 1 · (−7) · (− ) = 1
7
Example. (Re)Discover the formula for |a b; c d|.
Solution.
Using R2 → R2 − (c/a)R1 :

|a b; c d| = |a b; 0 d − (c/a)b| = a (d − (c/a)b) = ad − bc

NB: this only works if a ≠ 0. What do you do if a = 0? Swap rows!

Example. Suppose A is a 3 × 3 matrix with det(A) = 5. What is det(2A)?


Solution.
A has three rows. Multiplying each of them by 2 produces 2A.
Hence, det(2A) = 23 det(A) = 8 · 5 = 40.


Theorem 54. Let A be an n × n-matrix. Then det(A) = 0 if and only if A is not invertible.
Proof.
Note that if A and B are row-equivalent, then

det(A) = 0 ⇐⇒ det(B) = 0,

because elementary row operations do not change whether the determinant is 0 or not.
Thus A is invertible if and only if A is row-equivalent to In if and only if det(A) ≠ 0.
Theorem 55. Let A, B be two n × n-matrices. Then det(AB) = det(A) det(B).
Example. If A is invertible, then det(A−1 ) = 1
det(A) . Why?
Solution.

1
1 = det(In ) = det(AA−1 ) = det(A) det(A−1 ) ; det(A−1 ) = .
det(A)
Theorem 56. Let A be an n × n-matrix. Then det(AT ) = det(A).
Proof.
Let P be a permutation matrix, L be a lower-triangular matrix and U be an upper triangular
matrix such that PA = LU.
For simplicity, assume P is the permutation matrix of a single row swap. Then P = P T .

det(P) det(A) = det(PA) = det(LU) = det(L) det(U),


det(AT ) det(P T ) = det((PA)T ) = det((LU)T ) = det(U T ) det(LT ).

However, det(LT ) = det(L) and det(U) = det(U T ) ; det(A) = det(AT ).


Remark. det(AT ) = det(A) means that everything you know about determinants in terms of
rows of A is also true for the columns of A. In particular:
○ If you exchange two columns in a determinant, the determinant changes by a factor of −1.
○ You can add a multiple of a column to another column without changing the determinant.
○ Multiplying each entry of a column by a scalar s changes the determinant by a factor of s.
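The product and transpose rules (Theorems 55 and 56) can be spot-checked numerically on the worked examples:

```python
import numpy as np

A = np.array([[1.0, 2, 0],
              [3, -1, 2],
              [2, 0, 1]])  # det computed above as 1
B = np.array([[2.0, 3, 3],
              [0, 1, 2],
              [0, 0, 6]])  # triangular: det = 2 * 1 * 6 = 12

print(round(np.linalg.det(A)))  # 1
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # True
```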
LINEAR ALGEBRA
Cofactor expansion

Notation. Let A be an n × n-matrix. We denote by Aij the matrix obtained from matrix A by
deleting the i-th row and j-th column of A.
 
1 2 3 4
5 6 7 8
Example. Let A =   9 10 11 12. Find A23 and A43 .

13 14 15 16
Solution.

   
1 2 4 1 2 4
A23 =  9 10 12 , A43 = 5 6 8 
13 14 16 9 10 12
Definition. Let A be an n × n-matrix. The (i, j)-cofactor of A is the scalar Cij defined by
Cij = (−1)i+j det Aij .
Theorem 57. Let A be an n × n-matrix. Then for every i, j ∈ {1, . . . , n}
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin (expansion across row i)
= a1j C1j + a2j C2j + · · · + anj Cnj (expansion down column j)
1 2 0
Example. Compute 3 −1 2 by cofactor expansion across row 1.
2 0 1
Solution.
1 2 0 + − +
3 −1 2 = 1·(−1)1+1 · −1 2 +2·(−1)1+2 · 3 2 +0·(−1)1+3 · 3 −1
2 0 1 0 1 2 1 2 0

−1 2 3 2 3 −1
=1· −2· +0· = 1 · (−1) − 2 · (−1) + 0 = 1
0 1 2 1 2 0
1 2 0
Example. Compute 3 −1 2 by cofactor expansion down column 2 and by cofactor
2 0 1
expansion down column 3.
Solution.

1 2 0 − 1 0 1 0
1+2 2+2 3+2
3 −1 2 = 2·(−1) · 3 2 +(−1)·(−1) · + +0·(−1) · 3 2
2 0 1 2 1 2 1 −

= −2 · (−1) + (−1) · 1 − 0 = 1

1 2 0 + 1 2 1 2
3 −1 2 = 0·(−1)1+3 · 3 −1 +2·(−1)2+3 · − +1·(−1)3+3 · 3 −1
2 0 1 2 0 2 0 +

= 0 − 2 · (−4) + 1 · (−7) = 1
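Cofactor expansion across the first row translates directly into a recursive function; a minimal sketch, fine only for tiny matrices:

```python
def det(A):
    """Determinant by cofactor expansion across the first row.

    A is a list of lists. The runtime grows like n!, so this is
    for illustration only.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 1 and column j+1.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2, 0], [3, -1, 2], [2, 0, 1]]))  # 1, as computed above
```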
Question. Why is the method of cofactor expansion not practical for large n?
Solution.
To compute the determinant of a large n × n matrix,
o one reduces to n determinants of size (n − 1) × (n − 1),
o then n(n − 1) determinants of size (n − 2) × (n − 2),
o and so on.
In the end, we have n! = n(n − 1) · · · 3 · 2 · 1 many numbers to add.
WAY TOO MUCH WORK! Already

25! = 15511210043330985984000000 ≈ 1.55 · 1025 .


Remark. The cofactor expansion is also called Laplace expansion, because it is due to Pierre-Simon Laplace (1749–1827).
LINEAR ALGEBRA
Eigenvectors and Eigenvalues

Definition. Let A be an n × n matrix. An eigenvector of A is a nonzero v ∈ Rn such that
Av = λv,

where λ is a scalar, known as the eigenvalue associated with v.
Example. Verify that (1, 1)T is an eigenvector of A = [0 −2; −4 2]. Is (0, 1)T an eigenvector?
Solution.
A (1, 1)T = [0 −2; −4 2] (1, 1)T = (−2, −2)T = −2 (1, 1)T
Hence, (1, 1)T is an eigenvector of A with eigenvalue −2.
A (0, 1)T = [0 −2; −4 2] (0, 1)T = (−2, 2)T ≠ λ (0, 1)T
for any scalar λ! Thus (0, 1)T is not an eigenvector of A.
 
Example. Find the eigenvectors and the eigenvalues of A = [0 1; 1 0].
Solution.
A (x1 , x2 )T = (x2 , x1 )T , so A swaps the two coordinates.
A (1, 1)T = (1, 1)T = 1 · (1, 1)T ; (1, 1)T is an eigenvector with eigenvalue 1.
A (−1, 1)T = (1, −1)T = −1 · (−1, 1)T ; (−1, 1)T is an eigenvector with eigenvalue −1.
 
Example. Find the eigenvectors and the eigenvalues of B = [1 0; 0 0].
Solution.
B (x1 , x2 )T = (x1 , 0)T , so B projects onto the first coordinate.
B (1, 0)T = (1, 0)T = 1 · (1, 0)T ; (1, 0)T is an eigenvector with eigenvalue 1.
B (0, 1)T = (0, 0)T = 0 · (0, 1)T ; (0, 1)T is an eigenvector with eigenvalue 0.
Definition. Let λ be an eigenvalue of A. The eigenspace of A associated with λ is the set

Eigλ (A) := {v : Av = λv}.

It consists of all the eigenvectors of A with eigenvalue λ and the zero vector.
   
Example. Draw the eigenspaces of the two matrices A = [0 1; 1 0] and B = [1 0; 0 0].
Solution.
Eig1 (A) is the line x2 = x1 , since A (x, x)T = 1 · (x, x)T ; Eig−1 (A) is the line x2 = −x1 , since A (−x, x)T = −1 · (−x, x)T .
Eig1 (B) is the x1 -axis, since B (x, 0)T = 1 · (x, 0)T ; Eig0 (B) is the x2 -axis, since B (0, x)T = 0 · (0, x)T .
LINEAR ALGEBRA
Computing Eigenvalues and Eigenvectors

Theorem 58. Let A be an n × n matrix and λ be a scalar. Then λ is an eigenvalue of A if
and only if det(A − λI ) = 0.
Proof.
λ is an eigenvalue of A.
⇔ there is a non-zero v ∈ Rn such that Av = λv.
⇔ there is a non-zero v ∈ Rn such that Av − λv = 0.
⇔ there is a non-zero v ∈ Rn such that (A − λI )v = 0.
⇔ Nul(A − λI ) ≠ {0}.
⇔ A − λI is not invertible.
⇔ det(A − λI ) = 0.

Theorem 59. Let A be an n × n matrix. Then pA (t) := det(A − tI ) is a polynomial of


degree n. Thus A has at most n eigenvalues.
Definition. We call pA (t) the characteristic polynomial of A.
 
Example. Find the eigenvalues of A = [3 1; 1 3].
Solution.
A − λI = [3 1; 1 3] − λ [1 0; 0 1] = [3 − λ 1; 1 3 − λ]

det(A − λI ) = (3 − λ)2 − 1 = λ2 − 6λ + 8 = 0 ; λ1 = 2, λ2 = 4

Theorem 60. Let A be n × n matrix and let λ be eigenvalue of A. Then

Eigλ (A) = Nul(A − λI ).


 
Example. Determine the eigenspaces of A = [3 1; 1 3].
Solution.
Eigenvalue λ1 = 2:
A − 2I = [1 1; 1 1] ; RREF: [1 1; 0 0]
; x = (x1 , x2 )T = (−x2 , x2 )T = x2 (−1, 1)T ; Nul(A − 2I ) = span{(−1, 1)T }
Eigenvalue λ2 = 4:
A − 4I = [−1 1; 1 −1] ; RREF: [1 −1; 0 0]
; x = (x2 , x2 )T = x2 (1, 1)T ; Nul(A − 4I ) = span{(1, 1)T }
 
Example. Find the eigenvalues and eigenspaces of A = [3 2 3; 0 6 10; 0 0 2].
Solution.
det(A − λI ) = |3 − λ 2 3; 0 6 − λ 10; 0 0 2 − λ| = (3 − λ)(6 − λ)(2 − λ)
; A has eigenvalues 2, 3, 6. The eigenvalues of a triangular matrix are its diagonal entries.
λ1 = 2 : A − 2I = [1 2 3; 0 4 10; 0 0 0] ; RREF: [1 0 −2; 0 1 5/2; 0 0 0] ; Nul(A − 2I ) = span{(2, −5/2, 1)T }
λ2 = 3 : A − 3I = [0 2 3; 0 3 10; 0 0 −1] ; RREF: [0 1 0; 0 0 1; 0 0 0] ; Nul(A − 3I ) = span{(1, 0, 0)T }
λ3 = 6 : A − 6I = [−3 2 3; 0 0 10; 0 0 −4] ; RREF: [1 −2/3 0; 0 0 1; 0 0 0] ; Nul(A − 6I ) = span{(2/3, 1, 0)T }
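In practice the eigenvalues are computed numerically; a sketch with `np.linalg.eig` for this triangular example:

```python
import numpy as np

A = np.array([[3.0, 2, 3],
              [0, 6, 10],
              [0, 0, 2]])
vals, vecs = np.linalg.eig(A)
print(sorted(vals))  # eigenvalues 2, 3, 6: the diagonal entries

# Each column of `vecs` is a (normalized) eigenvector: A v = lambda v.
for lam, v in zip(vals, vecs.T):
    print(np.allclose(A @ v, lam * v))  # True for every pair
```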
LINEAR ALGEBRA
Properties of Eigenvectors and Eigenvalues

Definition. Let A be an n × n matrix and let λ be an eigenvalue of A.
○ The algebraic multiplicity of λ is its multiplicity as a root of the characteristic
polynomial, that is, the largest integer k such that (t − λ)k divides pA (t).
○ The geometric multiplicity of λ is the dimension of the eigenspace Eigλ (A) of λ.
 
Example. Find the eigenvalues of A = [1 1; 0 1] and determine their algebraic and geometric multiplicities.
Solution.

1−λ 1
○ det(A − λI ) = = (1 − λ)2
0 1−λ
So: λ = 1 is the only eigenvalue (it has algebraic multiplicity 2).
   
0 1 1
○ λ=1: A−I = ; Nul(A − I ) = span
0 0 0
Only dimension 1! ; Geometric multiplicity 1.
Theorem 61. Let A be an n × n matrix and let v1 , . . . , vm be eigenvectors of A
corresponding to different eigenvalues. Then v1 , . . . , vm are linearly independent.
Proof.

For simplicity, assume m = 2. Let λ1 , λ2 be such that Av1 = λ1 v1 and Av2 = λ2 v2 .


Suppose there are scalars c1 , c2 such that
c1 v1 + c2 v2 = 0.
Now multiply this relation with A:
A(c1 v1 + c2 v2 ) = c1 Av1 + c2 Av2 = c1 λ1 v1 + c2 λ2 v2 = 0
Subtracting λ2 times the first equation from the second, we obtain

c1 (λ1 − λ2 )v1 = 0.
Since λ1 ≠ λ2 , we get that c1 = 0. Then also c2 = 0, since v2 ≠ 0.
 
a11 a12 . . . a1n
a21 a22 . . . a2n 
Definition. Let A = 
 ..
 .

... . . . ... 

. The trace of A is the sum of the diagonal entries of A;
an1 an2 . . . ann
that is
Tr(A) = a11 + a22 + · · · + ann .

Theorem 62. Let A be n × n matrix with eigenvalues λ1 , . . . , λn . Then


○ Tr(A) = λ1 + λ2 + · · · + λn .
○ det(A) = λ1 · λ2 · · · · · λn .
Theorem 63. Let A be a 2 × 2-matrix. Then the characteristic polynomial of A is
p(λ) = λ2 − Tr(A)λ + det(A).

Solution.
p(λ) = |a11 − λ a12 ; a21 a22 − λ| = (a11 − λ)(a22 − λ) − a12 a21
     = λ2 − (a11 + a22 )λ + (a11 a22 − a12 a21 ) = λ2 − Tr(A)λ + det(A)
LINEAR ALGEBRA
Markov matrices

Definition. An n × n matrix A is a Markov matrix (or: stochastic
matrix) if it has only non-negative entries, and the entries in each column
add up to 1.
A vector in Rn is a probability vector (or: stochastic vector) if has only
non-negative entries, and the entries add up to 1.

Example. Give examples of Markov matrices and probability vectors.


A. Markov (1856–1922)
Solution.
[.1 .5; .9 .5] and [0 .25 .4; 1 .25 .2; 0 .5 .4] are Markov matrices, and (.1, .25, .05, .5, .1)T is a probability vector.
Note that (1, 2, 3, 4)T is not a probability vector, but (1/10) (1, 2, 3, 4)T is.
Question. Let A be n × n Markov matrix and v ∈ Rn be a probability vector. Is Av also a
probability vector?
Solution.
Yes. The entries of Av are clearly non-negative, and they sum to

(v1 a11 + · · · + vn a1n ) + · · · + (v1 an1 + · · · + vn ann )
= v1 (a11 + · · · + an1 ) + · · · + vn (a1n + · · · + ann ) = v1 + · · · + vn = 1.

Theorem 64. Let A be a Markov matrix.Then


○ 1 is an eigenvalue of A and every other eigenvalue λ of A satisfies |λ| ≤ 1.
○ If A has only positive entries, then any other eigenvalue satisfies |λ| < 1.
Definition. A stationary probability vector of a Markov matrix A is a probability vector v that is an eigenvector of A corresponding to the eigenvalue 1.
Theorem 65. Let A be an n × n-Markov matrix with only positive entries and let z ∈ Rn be
a probability vector. Then
z∞ := lim Ak z exists,
k→∞

and z∞ is a stationary probability vector of A (ie. Az∞ = z∞ ).


Proof.
For simplicity, assume that A has n different eigenvalues λ1 , λ2 , . . . , λn .
Since A is a Markov matrix and has only positive entries, we can assume that λ1 = 1 and
|λi | < 1 for all i = 2, . . . , n.
Let v1 , v2 , . . . , vn ∈ Rn such that Avi = λi vi .
Since eigenvectors to different eigenvalues are linear independent, the above eigenvectors form
a basis of Rn .
Thus there are scalar c1 , . . . , cn such that z = c1 v1 + · · · + cn vn .

; Ak z = c1 λk1 v1 + · · · + cn λkn vn → c1 v1 ,
because |λi | < 1 for each i = 2, . . . , n. Why is c1 ≠ 0? Because Ak z is a probability vector.
Example. Consider a fixed population of people with or without job. Suppose that each year,
50% of those unemployed find a job while 10% of those employed lose their job.
What is the unemployment rate in the long term equilibrium?
Solution.
Let xt be the fraction of the population employed at time t, and yt the fraction unemployed at time t. Then

(xt+1 , yt+1 )T = (.9xt + .5yt , .1xt + .5yt )T = [.9 .5; .1 .5] (xt , yt )T

The long term equilibrium (x∞ , y∞ )T is a stationary probability vector:

[.9 .5; .1 .5] (x∞ , y∞ )T = (x∞ , y∞ )T

A − 1I = [−0.1 0.5; 0.1 −0.5] ; RREF: [1 −5; 0 0] ; Nul(A − I ) = span{(5, 1)T }

Since x∞ + y∞ = 1 ; (x∞ , y∞ )T = (5/6, 1/6)T .
The unemployment rate in the long term equilibrium is 1/6.
LINEAR ALGEBRA
Diagonalization

Definition. A square matrix A is said to be diagonalizable if there is an invertible matrix P and a diagonal matrix D such that A = PDP −1 .
Theorem 66. Let A be an n × n matrix that has n linearly independent eigenvectors
v1 , v2 , . . . , vn with associated eigenvalues λ1 , . . . , λn . Then A is diagonalizable as PDP −1 ,
where P = [v1 . . . vn ] and D = diag(λ1 , . . . , λn ).
Proof.

AP = A [v1 . . . vn ] = [Av1 . . . Avn ] = [λ1 v1 . . . λn vn ] = [v1 . . . vn ] diag(λ1 , . . . , λn ) = PD

; A = PDP −1
 
Example. Diagonalize A = [6 −1; 2 3].
Solution.
det(A − λI ) = |6 − λ −1; 2 3 − λ| = (6 − λ)(3 − λ) + 2 = λ2 − 9λ + 20 = (λ − 4)(λ − 5)

λ1 = 4 : A − 4I = [2 −1; 2 −1] ; RREF: [1 −1/2; 0 0] ; Nul(A − 4I ) = span{(1/2, 1)T }
λ2 = 5 : A − 5I = [1 −1; 2 −2] ; RREF: [1 −1; 0 0] ; Nul(A − 5I ) = span{(1, 1)T }

D = [4 0; 0 5] , P = [1/2 1; 1 1] ; A = PDP −1 .
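The factorization just found can be checked numerically:

```python
import numpy as np

A = np.array([[6.0, -1.0],
              [2.0, 3.0]])
P = np.array([[0.5, 1.0],
              [1.0, 1.0]])
D = np.diag([4.0, 5.0])
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True: A = P D P^{-1}
```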
Definition. Vectors v1 , . . . , vn form an eigenbasis of n × n matrix A if v1 , . . . , vn form a
basis of Rn and v1 , . . . , vn are all eigenvectors of A.
Theorem 67. The following are equivalent for n × n matrix A:
○ A has an eigenbasis.
○ A is diagonalizable.
○ The geometric multiplicities of all eigenvalues of A sums up to n.
Theorem 68. Let A be n × n matrix and let B = (v1 , . . . , vn ) be an eigenbasis of A. Then
there is a diagonal matrix D such that
A = IEn ,B DIB,En .
Proof.
Set P = [v1 . . . vn ]. Recall that IEn ,B = P and IB,En = (IEn ,B )−1 = P −1 .

A = PDP −1 = IEn ,B DIB,En .

; Diagonalization is base change to an eigenbasis!
Question. Let A be an n × n with n distinct eigenvalues. Is A diagonalizable?
Solution.

Yes. Eigenvectors to different eigenvalue are linearly independent. Thus A has n linearly inde-
pendent eigenvectors.

Question. Is A = [0 1; 0 0] diagonalizable?
Solution.
det(A − λI ) = |−λ 1; 0 −λ| = λ2

λ1 = λ2 = 0 : A − 0I = [0 1; 0 0] ; Nul(A) = span{(1, 0)T }
; Only one linearly independent eigenvector, so there is no eigenbasis and A is not diagonalizable.
LINEAR ALGEBRA
Powers of Matrices

Idea. If A has an eigenbasis, then we can raise A to large powers easily!
Theorem 69. If A = PDP −1 , where D is a diagonal matrix, then for any m,
Am = PD m P −1
Proof.
Am = (PDP −1 )m
= (PDP −1 ) · (PDP −1 ) · · · (PDP −1 )
| {z }
m times
−1 −1
= (PD)(P · P)(DP ) · · · (PDP −1 )
= PD · DP −1 · · · PDP −1
= PD · D · · · D · P −1
= PD m P −1

Remark. Finding D m is easy!

D m = diag(λ1 , . . . , λn )m = diag((λ1 )m , . . . , (λn )m )
Example. Let A = [1/2 0 0; −1/2 1 6; 0 0 2]. The matrix has the following eigenvectors:
v1 = (1, 1, 0)T with λ1 = 1/2 , v2 = (0, 1, 0)T with λ2 = 1, v3 = (0, 6, 1)T with λ3 = 2.
Find A100 .
Solution.
A = PDP −1 with P = [1 0 0; 1 1 6; 0 0 1], D = diag(1/2, 1, 2) and P −1 = [1 0 0; −1 1 −6; 0 0 1].

A100 = PD 100 P −1 = P diag((1/2)100 , 1, 2100 ) P −1 = [(1/2)100 0 0; (1/2)100 − 1 1 6 · 2100 − 6; 0 0 2100 ]
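The same computation in NumPy, using A^m = P D^m P^{-1}:

```python
import numpy as np

P = np.array([[1.0, 0, 0],
              [1, 1, 6],
              [0, 0, 1]])
D = np.diag([0.5, 1.0, 2.0])
A100 = P @ np.linalg.matrix_power(D, 100) @ np.linalg.inv(P)

print(np.isclose(A100[2, 2], 2.0 ** 100))  # True
print(np.isclose(A100[0, 0], 0.5 ** 100))  # True
print(np.isclose(A100[1, 1], 1.0))         # True
```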
Example. Let A be a 2 × 2 matrix such that (1, 1)T is an eigenvector with eigenvalue 1/2 and (1, 0)T is an eigenvector with eigenvalue 1.
Let v = (2, 1)T . Graphically determine how An v behaves as n → ∞.
Solution.

An v = An ((1, 1)T + (1, 0)T ) = An (1, 1)T + An (1, 0)T = (1/2)n (1, 1)T + 1n (1, 0)T → (1, 0)T as n → ∞
LINEAR ALGEBRA
Matrix exponential

Definition. Let A be an n × n-matrix. We define the matrix exponential e At as
e At = I + At + (At)2 /2! + (At)3 /3! + . . .
 
Example. Compute e At for A = [2 0; 0 1].
Solution.
At = [2t 0; 0 t] ; (At)k = [(2t)k 0; 0 t k ]

e At = I + At + (At)2 /2! + . . . = [1 + 2t + (2t)2 /2! + . . . 0; 0 1 + t + t 2 /2! + . . . ] = [e 2t 0; 0 e t ]

Theorem 70. Let A be an n × n matrix. Then


○ The series in the definition of e At always converges, ○ e At e As = e A(t+s) ,
○ e At e −At = In , ○ (d/dt) e At = Ae At .
Theorem 71. Let A be an n × n matrix such that A = PDP^{-1} for some invertible matrix P and some diagonal matrix D. Then

    e^{At} = P e^{Dt} P^{-1}.
Proof. Recall that A^k = P D^k P^{-1}. Thus

    e^{At} = I + At + (At)^2/2! + (At)^3/3! + . . .
           = I + PDP^{-1} t + P D^2 P^{-1} t^2/2! + P D^3 P^{-1} t^3/3! + . . .
           = P ( I + Dt + D^2 t^2/2! + D^3 t^3/3! + . . . ) P^{-1}
           = P e^{Dt} P^{-1}.
Example. Let

    A = [ -2 1 ; 1 -2 ] = [ 1 1 ; 1 -1 ] [ -1 0 ; 0 -3 ] [ 1 1 ; 1 -1 ]^{-1}.

Compute e^{At}.
Solution.

    e^{At} = P e^{Dt} P^{-1}
           = [ 1 1 ; 1 -1 ] [ e^{-t} 0 ; 0 e^{-3t} ] [ 1 1 ; 1 -1 ]^{-1}
           = [ 1 1 ; 1 -1 ] [ e^{-t} 0 ; 0 e^{-3t} ] [ 1/2 1/2 ; 1/2 -1/2 ]
           = [ e^{-t} e^{-3t} ; e^{-t} -e^{-3t} ] [ 1/2 1/2 ; 1/2 -1/2 ]
           = [ (1/2)e^{-t} + (1/2)e^{-3t}   (1/2)e^{-t} - (1/2)e^{-3t} ]
             [ (1/2)e^{-t} - (1/2)e^{-3t}   (1/2)e^{-t} + (1/2)e^{-3t} ].
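As a numerical sanity check in Python (a sketch assuming NumPy; the matrices are those of the example above), P e^{Dt} P^{-1} can be compared with a truncated power series:

```python
import numpy as np

# Sketch: e^{At} = P e^{Dt} P^{-1} for A = [[-2,1],[1,-2]] = P D P^{-1}.
P = np.array([[1.0, 1.0], [1.0, -1.0]])
D = np.diag([-1.0, -3.0])
A = P @ D @ np.linalg.inv(P)

t = 0.7  # an arbitrary time

expAt = P @ np.diag(np.exp(np.diag(D) * t)) @ np.linalg.inv(P)

# Truncated series I + At + (At)^2/2! + ... for comparison.
series = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 30):
    series += term
    term = term @ (A * t) / k   # next term (At)^k / k!
```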
Linear Differential Equations
Definition. A linear (first order) differential equation is an equation of the form

    du/dt = Au

where u is a function from R to R^n. A further condition u(0) = v, for some v in R^n, is called an initial condition.
Theorem 72. Let A be an n × n matrix and v ∈ R^n. The solution of the differential equation du/dt = Au with initial condition u(0) = v is u(t) = e^{At} v.
Example. Let n = 1. Find the solution of the differential equation du/dt = -u with u(0) = 1.
Solution. In this case, A = [-1]. Thus

    u(t) = e^{-t} · 1 = e^{-t},

which decays to 0 as t → ∞.
Theorem 73. Let A be an n × n matrix, and let v ∈ R^n be an eigenvector of A with eigenvalue λ. Then e^{At} v = e^{λt} v.
Proof.

    e^{At} v = (I + At + (At)^2/2! + (At)^3/3! + . . . ) v = v + tAv + t^2 A^2 v/2! + t^3 A^3 v/3! + . . .
             = v + λt v + (λt)^2 v/2! + (λt)^3 v/3! + . . . = (1 + λt + (λt)^2/2! + (λt)^3/3! + . . . ) v = e^{λt} v.
Theorem 74. Let A be an n × n matrix and (v1, . . . , vn) an eigenbasis of A with eigenvalues λ1, . . . , λn. If v = c1 v1 + · · · + cn vn, then the unique solution to the differential equation du/dt = Au with initial condition u(0) = v is

    e^{At} v = c1 e^{λ1 t} v1 + · · · + cn e^{λn t} vn.

Proof.

    e^{At} v = e^{At}(c1 v1 + · · · + cn vn) = c1 e^{At} v1 + · · · + cn e^{At} vn = c1 e^{λ1 t} v1 + · · · + cn e^{λn t} vn.
 
Example. Let A = [ 0 1 ; 1 0 ]. Solve the differential equation du/dt = Au with initial condition u(0) = (1, 0).
Solution.
○ Eigenvalues: det(A - λI) = λ^2 - 1 ; λ1 = 1, λ2 = -1
○ Eigenvectors: (1, 1) with eigenvalue 1, (1, -1) with eigenvalue -1
○ Express u(0) in terms of the eigenbasis: (1, 0) = (1/2)(1, 1) + (1/2)(1, -1).

    e^{At} (1, 0) = (1/2) e^{At} (1, 1) + (1/2) e^{At} (1, -1)
                  = (1/2) e^t (1, 1) + (1/2) e^{-t} (1, -1)
                  = ( (1/2)e^t + (1/2)e^{-t}, (1/2)e^t - (1/2)e^{-t} ) → ∞  (as t → ∞)
 
Example. Let A = [ 1 2 ; 2 1 ]. Solve the differential equation du/dt = Au with initial condition u(0) = (0, 1).
Solution.
○ Eigenvalues: det(A - λI) = λ^2 - 2λ - 3 = (λ - 3)(λ + 1) ; λ1 = 3, λ2 = -1
○ Eigenvectors: (1, 1) with eigenvalue 3, (1, -1) with eigenvalue -1
○ Express u(0) in terms of the eigenbasis: (0, 1) = (1/2)(1, 1) - (1/2)(1, -1).

    e^{At} (0, 1) = (1/2) e^{At} (1, 1) - (1/2) e^{At} (1, -1)
                  = (1/2) e^{3t} (1, 1) - (1/2) e^{-t} (1, -1)
                  = ( (1/2)e^{3t} - (1/2)e^{-t}, (1/2)e^{3t} + (1/2)e^{-t} ) → ∞  (as t → ∞)
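The closed-form solution of the last example can be checked numerically in Python (a sketch assuming NumPy): we verify the initial condition and that the formula satisfies du/dt = Au via a central difference.

```python
import numpy as np

# Sketch: u(t) = (1/2)e^{3t}(1,1) - (1/2)e^{-t}(1,-1) solves du/dt = Au, u(0) = (0,1).
A = np.array([[1.0, 2.0], [2.0, 1.0]])

def u(t):
    return 0.5 * np.exp(3 * t) * np.array([1.0, 1.0]) \
         - 0.5 * np.exp(-t) * np.array([1.0, -1.0])

# Central-difference approximation of du/dt at t = 0.5.
h = 1e-6
dudt = (u(0.5 + h) - u(0.5 - h)) / (2 * h)
```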
Orthogonal projection onto a line
Definition. Let v, w ∈ R^n. The orthogonal projection of v onto the line spanned by w is

    proj_w(v) := (w · v / w · w) w.

Theorem 75. Let v, w ∈ R^n. Then proj_w(v) is the point in span(w) closest to v; that is,

    dist(v, proj_w(v)) = min_{u ∈ span(w)} dist(v, u).

Proof.
○ We know proj_w(v) ∈ span(w), so proj_w(v) = cw for some scalar c.
○ Observe that v - proj_w(v) is orthogonal to w.
○ So

    0 = w · (v - proj_w(v)) = w · (v - cw) = w · v - c (w · w)  ;  c = (w · v)/(w · w).

Remark. Note that v - proj_w(v) (called the error term) is in span(w)^⊥:

    v = proj_w(v) + (v - proj_w(v)),  with proj_w(v) ∈ span(w) and v - proj_w(v) ∈ span(w)^⊥.
   
Example. Find the orthogonal projection of v = (-2, 1) onto the line spanned by w = (3, 1).
Solution.

    proj_w(v) = (w · v / w · w) w = ((-6 + 1)/(3^2 + 1^2)) (3, 1) = -(5/10)(3, 1) = (-1.5, -0.5)

The error term is

    v - proj_w(v) = (-2, 1) - (-1.5, -0.5) = (-0.5, 1.5).
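The same projection takes one line in Python (a sketch assuming NumPy; v and w are those of the example above):

```python
import numpy as np

# Sketch: proj_w(v) = (w.v / w.w) w, and the error term v - proj_w(v).
v = np.array([-2.0, 1.0])
w = np.array([3.0, 1.0])

proj = (w @ v) / (w @ w) * w
error = v - proj   # lies in span(w)-perp, so it is orthogonal to w
```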
Theorem 76. Let w ∈ R^n. Then for all v ∈ R^n

    proj_w(v) = (1/(w · w)) w w^T v.

Proof. Trick: cu = uc for every scalar c and u ∈ R^n (think of c as a 1 × 1 matrix). Rewrite the formula for proj_w(v):

    proj_w(v) = (w · v / w · w) w = (1/(w · w)) (w^T v) w = (1/(w · w)) w (w^T v)
              = (1/(w · w)) (w w^T) v.

Remark. Note that (1/(w · w)) w w^T is an n × n matrix, which we call the orthogonal projection matrix onto span(w).
 
Example. Let w = (1, 1). Find the orthogonal projection matrix P onto span(w). Use it to calculate the projections of (1, 0), (1, 1), (1, -1) onto span(w).
Solution.

    P = (1/(w · w)) w w^T = (1/2) [ 1 1 ; 1 1 ].

○ If v = (1, 0), then proj_w(v) = Pv = (1/2)[ 1 1 ; 1 1 ](1, 0) = (1/2)(1, 1)
○ If v = (1, 1), then proj_w(v) = Pv = (1/2)[ 1 1 ; 1 1 ](1, 1) = (1, 1) = v
○ If v = (1, -1), then proj_w(v) = Pv = (1/2)[ 1 1 ; 1 1 ](1, -1) = (0, 0)
Orthogonal projection onto a subspace
Theorem 77. Let W be a subspace of R^n and v ∈ R^n. Then v can be written uniquely as

    v = v̂ + v^⊥,  with v̂ ∈ W and v^⊥ ∈ W^⊥.

Proof. Note that dim W + dim W^⊥ = n. Let (w1, . . . , wℓ) be a basis of W and let (u1, . . . , um) be a basis of W^⊥. Since ℓ + m = n, (w1, . . . , wℓ, u1, . . . , um) is a basis of R^n. Thus

    v = (c1 w1 + · · · + cℓ wℓ) + (d1 u1 + · · · + dm um),

where the first bracket is v̂ ∈ W and the second is v^⊥ ∈ W^⊥.

Definition. We say v̂ is the orthogonal projection of v onto W - written proj_W(v).


Remark. If (w1, . . . , wm) is an orthogonal basis of W, then

    proj_W(v) = (v · w1 / w1 · w1) w1 + · · · + (v · wm / wm · wm) wm.
     
Example. Let W = span(w1, w2) with w1 = (3, 0, 1) and w2 = (0, 1, 0), and let v = (0, 3, 10). Compute proj_W(v).
Solution. Since w1 and w2 are orthogonal (this calculation only works for an orthogonal basis!),

    proj_W(v) = (v · w1 / w1 · w1) w1 + (v · w2 / w2 · w2) w2
              = (10/10)(3, 0, 1) + (3/1)(0, 1, 0) = (3, 3, 1).

The error term is

    v^⊥ = (0, 3, 10) - (3, 3, 1) = (-3, 0, 9).
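The subspace projection above can be reproduced directly in Python (a sketch assuming NumPy; the vectors are from the example, and the formula requires the orthogonal basis used there):

```python
import numpy as np

# Sketch: projection onto W = span(w1, w2) for an orthogonal basis (w1, w2).
w1 = np.array([3.0, 0.0, 1.0])
w2 = np.array([0.0, 1.0, 0.0])
v = np.array([0.0, 3.0, 10.0])

# Sum of the two line projections (valid because w1 . w2 = 0).
proj = (v @ w1) / (w1 @ w1) * w1 + (v @ w2) / (w2 @ w2) * w2
```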
Theorem 78. The projection map projW : Rn → Rn that sends v to projW (v) is linear.
Definition. The matrix P_W is the matrix (proj_W)_{E_n, E_n} that represents proj_W with respect to the standard basis. We call P_W the orthogonal projection matrix onto W.
   
Example. Compute P_W for W = span( (3, 0, 1), (0, 1, 0) ).
Solution. Project each standard basis vector onto W, again writing w1 = (3, 0, 1), w2 = (0, 1, 0):

    proj_W(e1) = (e1 · w1 / w1 · w1) w1 + (e1 · w2 / w2 · w2) w2 = (3/10)(3, 0, 1) + 0 = (9/10, 0, 3/10)
    proj_W(e2) = 0 + (1/1)(0, 1, 0) = (0, 1, 0)
    proj_W(e3) = (1/10)(3, 0, 1) + 0 = (3/10, 0, 1/10)

    ; P_W = [ 9/10 0 3/10 ; 0 1 0 ; 3/10 0 1/10 ]
Question. Let PW be the orthogonal projection matrix onto W in Rn , and let v ∈ Rn .
○ If PW v = v, what can you say about v? ○ If PW v = 0, what can you say about v?
Solution. Let v = w + u, where w ∈ W and u ∈ W^⊥. Then w = P_W v.
If P_W v = v, we get that w = v and thus v is in W.
If P_W v = 0, then u = v and thus v is in W^⊥.
If PW v = 0, then u = v and thus v in W ⊥ .


Question. What is the orthogonal projection matrix PW ⊥ for projecting onto W ⊥ ?
Solution.
Set Q = I − PW , where I is the identity. Then PW ⊥ = Q. Why?
Let v be in Rn . In order to show that Qv is the projection of v onto W ⊥ , we need to check
that Qv ∈ W ⊥ and v − Qv ∈ (W ⊥ )⊥ .
Since PW v is the projection of v onto W , Qv = v − PW v is in W ⊥ .
Now observe that v − Qv = PW v. Thus v − Qv is in W = (W ⊥ )⊥ .
Least squares solutions
Goal. Suppose Ax = b is inconsistent. Can we still find something like a best solution?
Definition. Let A be an m × n matrix and b ∈ R^m. A least squares solution (short: LSQ solution) of the system Ax = b is a vector x̂ ∈ R^n such that

    dist(Ax̂, b) = min_{x ∈ R^n} dist(Ax, b).

Theorem 79. Let A be an m × n matrix and b ∈ R^m. Then x̂ is an LSQ solution to Ax = b if and only if Ax̂ = proj_{Col(A)}(b).
Proof. Recall that Col(A) = {Ax : x ∈ R^n}. Let x̂ ∈ R^n, so Ax̂ ∈ Col(A). Then

    dist(Ax̂, b) = min_{x ∈ R^n} dist(Ax, b)  ⇐⇒  dist(Ax̂, b) = min_{w ∈ Col(A)} dist(w, b)
                                              ⇐⇒  Ax̂ = proj_{Col(A)}(b)


   
Example. Let A = [ 1 1 ; -1 1 ; 0 0 ] and b = (2, 1, 1). Find an LSQ solution of Ax = b.
Solution. The columns a1 = (1, -1, 0) and a2 = (1, 1, 0) of A are orthogonal, so

    b̂ = proj_{Col(A)}(b) = (b · a1 / a1 · a1) a1 + (b · a2 / a2 · a2) a2
       = (1/2)(1, -1, 0) + (3/2)(1, 1, 0) = (2, 1, 0).

Solve Ax̂ = b̂:

    [ 1 1 ; -1 1 ; 0 0 ] (x̂1, x̂2) = (2, 1, 0).

We have already solved this equation in the process of computing b̂: x̂ = (1/2, 3/2).
Theorem 80. Let A be an m × n matrix and b ∈ Rm . Then x̂ is an LSQ solution to Ax = b
if and only if AT Ax̂ = AT b.
Proof.
x̂ is a least squares solution of Ax = b
⇐⇒ ‖Ax̂ - b‖ is as small as possible
⇐⇒ Ax̂ - b is orthogonal to Col(A)
⇐⇒ Ax̂ - b is in Nul(A^T)   (by the FTLA)
⇐⇒ A^T(Ax̂ - b) = 0
⇐⇒ A^T A x̂ = A^T b
Theorem 81. Let A be an m × n matrix with linearly independent columns and b ∈ R^m. Then proj_{Col(A)}(b) = A(A^T A)^{-1} A^T b.
Proof. Let x̂ be an LSQ solution of Ax = b. The hypothesis implies that A^T A is invertible. Then

    A^T A x̂ = A^T b  ;  x̂ = (A^T A)^{-1} A^T b  ;  proj_{Col(A)}(b) = Ax̂ = A(A^T A)^{-1} A^T b.
   
Example. Let A = [ 4 0 ; 0 2 ; 1 1 ] and b = (2, 0, 11). Find an LSQ solution of Ax = b.
Solution.

    A^T A = [ 4 0 1 ; 0 2 1 ] [ 4 0 ; 0 2 ; 1 1 ] = [ 17 1 ; 1 5 ]

    A^T b = [ 4 0 1 ; 0 2 1 ] (2, 0, 11) = (19, 11)

The normal equations A^T A x̂ = A^T b are [ 17 1 ; 1 5 ] x̂ = (19, 11). Solving, we find x̂ = (1, 2).

The projection of b onto Col(A) is Ax̂ = [ 4 0 ; 0 2 ; 1 1 ] (1, 2) = (4, 4, 3).
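In Python (a sketch assuming NumPy, with A and b from the example above), solving the normal equations agrees with NumPy's built-in least-squares routine:

```python
import numpy as np

A = np.array([[4.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

# Solving the normal equations A^T A x = A^T b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... agrees with numpy's dedicated LSQ solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```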
Linear Regression
Example. Are there β1, β2 ∈ R such that the data points

    (x1, y1) = (2, 1), (x2, y2) = (5, 2), (x3, y3) = (7, 3), (x4, y4) = (8, 3)

all lie on the line y = β1 + β2 x?
Solution. We need y_i = β1 + β2 x_i for i = 1, 2, 3, 4. In matrix form:

    [ 1 x1 ; 1 x2 ; 1 x3 ; 1 x4 ] (β1, β2) = (y1, y2, y3, y4),

i.e. the design matrix X = [ 1 2 ; 1 5 ; 1 7 ; 1 8 ] times (β1, β2) should equal the observation vector y = (1, 2, 3, 3). Row reducing the augmented matrix produces a pivot in the last column: the system is inconsistent.
Example (ctd). Find β1, β2 such that the line y = β1 + β2 x best fits the data points:

    (x1, y1) = (2, 1), (x2, y2) = (5, 2), (x3, y3) = (7, 3), (x4, y4) = (8, 3)

Solution. Find an LSQ solution to X(β1, β2) = y with design matrix X = [ 1 2 ; 1 5 ; 1 7 ; 1 8 ] and observation vector y = (1, 2, 3, 3). The normal equations X^T X β̂ = X^T y are

    [ 4 22 ; 22 142 ] (β̂1, β̂2) = (9, 57)  ;  (β̂1, β̂2) = (2/7, 5/14)

; LSQ-line: y = 2/7 + (5/14) x
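The line fit above can be reproduced in a few lines of Python (a sketch assuming NumPy; the data is from the example):

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x_i]
beta = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations
```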
Example. A scientist tries to find the relation between the mysterious quantities x and y. She measures the following values:

    x: -2, -1, 0, 1, 2
    y: 5, 2.5, 2.25, 2, 5

Find β1, β2, β3 such that y = β1 + β2 x + β3 x^2 best fits the data.
Solution. We need β1 + x_i β2 + x_i^2 β3 = y_i for i = 1, . . . , 5, i.e. Xβ = y with design matrix X = [ 1 -2 4 ; 1 -1 1 ; 1 0 0 ; 1 1 1 ; 1 2 4 ] and observation vector y = (5, 2.5, 2.25, 2, 5). The normal equations X^T X β̂ = X^T y are

    [ 5 0 10 ; 0 10 0 ; 10 0 34 ] β̂ = (16.75, -0.5, 44.5)  ;  β̂ ≈ (1.78, -0.05, 0.79)

; LSQ-quadratic: y = 1.78 - 0.05x + 0.79x^2
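The quadratic fit works the same way, just with an extra column x^2 in the design matrix (a Python sketch assuming NumPy, using the data above):

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([5.0, 2.5, 2.25, 2.0, 5.0])

X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix [1, x_i, x_i^2]
beta = np.linalg.solve(X.T @ X, X.T @ y)          # normal equations
```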
Question. Suppose you are given a data set (u_i, v_i, y_i), i = 1, . . . , n. You expect that y depends linearly on u and v. Which linear system do you have to solve to determine this dependency?
Solution. Need: y_i = β1 + β2 u_i + β3 v_i for each i = 1, . . . , n.

    ; [ 1 u1 v1 ; 1 u2 v2 ; . . . ; 1 un vn ] (β1, β2, β3) = (y1, y2, . . . , yn)

with the matrix on the left the design matrix and the right-hand side the observation vector.

Question. Suppose you are given a data set (x_i, y_i), i = 1, . . . , n. You expect that y depends linearly on x and sin(x). Which linear system do you have to solve to determine this dependency?
Solution. Need: y_i = β1 + β2 x_i + β3 sin(x_i) for each i = 1, . . . , n.

    ; [ 1 x1 sin(x1) ; 1 x2 sin(x2) ; . . . ; 1 xn sin(xn) ] (β1, β2, β3) = (y1, y2, . . . , yn)

with the matrix on the left the design matrix and the right-hand side the observation vector.
Gram-Schmidt Method
Theorem 82. Every subspace of R^n has an orthonormal basis.
Algorithm. (Gram-Schmidt orthonormalization) Given a basis a1, . . . , am, produce an orthogonal basis b1, . . . , bm and an orthonormal basis q1, . . . , qm.

    b1 = a1,                                                     q1 = b1/‖b1‖
    b2 = a2 - proj_{span(q1)}(a2) = a2 - (a2 · q1) q1,           q2 = b2/‖b2‖
    b3 = a3 - proj_{span(q1,q2)}(a3) = a3 - (a3 · q1) q1 - (a3 · q2) q2,   q3 = b3/‖b3‖
    · · ·

Remark.
○ span(q1, . . . , qi) = span(a1, . . . , ai) for i = 1, . . . , m.
○ qj ∉ span(a1, . . . , ai) for all j > i.
   
Example. Let V = span( (2, 1, 2), (0, 0, 3) ). Use the Gram-Schmidt method to find an orthonormal basis of V.
Solution. Set b1 = (2, 1, 2) and q1 = b1/‖b1‖ = (1/3)(2, 1, 2).

    b2 = (0, 0, 3) - ((0, 0, 3) · q1) q1 = (0, 0, 3) - 2 · (1/3)(2, 1, 2) = (-4/3, -2/3, 5/3)

    ; q2 = b2/‖b2‖ = (1/√45)(-4, -2, 5)
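The procedure above translates almost line-by-line into Python (a minimal sketch assuming NumPy; `gram_schmidt` is a hypothetical helper name, applied to the basis from the example):

```python
import numpy as np

# A minimal Gram-Schmidt sketch: subtract projections, then normalize.
def gram_schmidt(vectors):
    qs = []
    for a in vectors:
        b = a - sum((a @ q) * q for q in qs)   # remove components along earlier q's
        qs.append(b / np.linalg.norm(b))
    return qs

q1, q2 = gram_schmidt([np.array([2.0, 1.0, 2.0]), np.array([0.0, 0.0, 3.0])])
```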
Theorem 83. (QR decomposition) Let A be an m × n matrix of rank n. There is an m × n matrix Q with orthonormal columns and an upper triangular n × n invertible matrix R such that A = QR.
Proof. Let's do the 3 × 3 case: A = [ a1 a2 a3 ]. Using Gram-Schmidt we obtain orthonormal vectors q1, q2, q3 such that
○ (q1, q2, q3) is a basis of Col(A)
○ a1 ∈ span(q1), a2 ∈ span(q1, q2) and a3 ∈ span(q1, q2, q3)

    ; a1 = (a1 · q1) q1,   a2 = (a2 · q1) q1 + (a2 · q2) q2,   a3 = (a3 · q1) q1 + (a3 · q2) q2 + (a3 · q3) q3

    ; A = [ a1 a2 a3 ] = [ q1 q2 q3 ] [ a1·q1 a2·q1 a3·q1 ; 0 a2·q2 a3·q2 ; 0 0 a3·q3 ]
 
Example. Find the QR decomposition of A = [ 1 2 4 ; 0 0 5 ; 0 3 6 ].
Solution. Apply Gram-Schmidt to the columns of A:

    q1 = (1, 0, 0)
    b2 = (2, 0, 3) - ((2, 0, 3) · q1) q1 = (0, 0, 3)  ;  q2 = (0, 0, 1)
    b3 = (4, 5, 6) - ((4, 5, 6) · q1) q1 - ((4, 5, 6) · q2) q2 = (0, 5, 0)  ;  q3 = (0, 1, 0)

    ; Q = [q1 q2 q3] = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]

    ; R = Q^T A = [ 1 0 0 ; 0 0 1 ; 0 1 0 ] [ 1 2 4 ; 0 0 5 ; 0 3 6 ] = [ 1 2 4 ; 0 3 6 ; 0 0 5 ]
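The factorization just computed can be verified numerically (a Python sketch assuming NumPy, with A, Q from the example above):

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0], [0.0, 0.0, 5.0], [0.0, 3.0, 6.0]])
Q = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
R = Q.T @ A   # as in the example: R = Q^T A
```

(NumPy's own `np.linalg.qr(A)` returns an equivalent factorization, possibly with different sign conventions on the columns of Q.)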
Spectral Theorem
Theorem 84. Let A be a symmetric n × n matrix. Then A has an orthonormal basis of
eigenvectors.
Proof.
We will just show that eigenvectors to different eigenvalues are orthogonal. Let v, w ∈ Rn be
such that
Av = λ1 v and Aw = λ2 w, for λ1 ≠ λ2. Then

    λ1 (v · w) = (λ1 v) · w = (Av) · w = (Av)^T w = v^T A^T w
               = v^T A w    (because A^T = A)
               = v · (Aw) = λ2 (v · w)

; v · w = 0, since λ1 ≠ λ2.
Theorem 85. Let A be a symmetric n × n matrix. Then there is a diagonal matrix D and a
matrix Q with orthonormal columns such that A = QDQ T .
Proof.
Let Q be the matrix whose columns are orthonormal eigenvectors.
Then Q is orthogonal, that is Q −1 = Q T . Thus,

A = QDQ −1 = QDQ T .

Question. If A is an n × n matrix with an orthonormal eigenbasis, is it symmetric?


Solution.
Since A has an orthonormal eigenbasis, there is a diagonal matrix D and a matrix Q with
orthonormal columns such that A = QDQ T .

AT = (QDQ T )T = (Q T )T D T Q T = QDQ T = A

Example. Let A = [ 3 1 ; 1 3 ]. Write A as QDQ^T, where D is diagonal and Q has orthonormal columns.
Solution. A has eigenvalues 2 and 4.
λ1 = 2: eigenvector v1 = (1, -1); normalized, we get q1 = (1/√2)(1, -1).
λ2 = 4: eigenvector v2 = (1, 1); normalized, we get q2 = (1/√2)(1, 1).

    D = [ 2 0 ; 0 4 ],   Q = (1/√2)[ 1 1 ; -1 1 ]

    A = [ 3 1 ; 1 3 ] = (1/√2)[ 1 1 ; -1 1 ] [ 2 0 ; 0 4 ] (1/√2)[ 1 -1 ; 1 1 ]
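NumPy's `eigh` routine is tailored to symmetric matrices and returns exactly this kind of orthonormal eigenbasis (a sketch using the matrix from the example above):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])

# eigh: for symmetric matrices; eigenvalues come back in ascending order,
# and the columns of Q form an orthonormal eigenbasis.
eigvals, Q = np.linalg.eigh(A)
```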
        
Example. Recall that A = [ 3 1 ; 1 3 ] = QDQ^T as above, with q1 = (1/√2)(1, -1) and q2 = (1/√2)(1, 1). Suppose v = -q1 + (1/2) q2. Graphically, determine Av.
Solution. In the eigenbasis, A stretches the q1-direction by 2 and the q2-direction by 4:

    Av = A(-q1 + (1/2) q2) = -2 q1 + 2 q2.
Singular Value Decomposition
Definition. Let A be an m × n matrix. A singular value decomposition of A is a
decomposition A = UΣV T where
○ U is an m × m matrix with orthonormal columns,
○ Σ is an m × n rectangular diagonal matrix with non-negative numbers on the diagonal,
○ V is an n × n matrix with orthonormal columns.
Remark. The diagonal entries σi = Σii which are positive are called the singular values of A.
We usually arrange them in decreasing order, that is σ1 ≥ σ2 ≥ . . .
Question. Let A be an m × n matrix with rank r. Recall why
○ Nul(A^T A) = Nul(A) and Nul(AA^T) = Nul(A^T),
○ A^T A is symmetric and has rank r.
Solution.
v ∈ Nul(A) ; Av = 0 ; A^T Av = A^T 0 = 0 ; v ∈ Nul(A^T A)
v ∈ Nul(A^T A) ; A^T Av = 0 ; (Av) · (Av) = (Av)^T Av = v^T A^T Av = v^T 0 = 0 ; v ∈ Nul(A).
Further, by rank-nullity,

    n = dim Col(A^T A) + dim Nul(A^T A) = dim Col(A^T A) + dim Nul(A) = dim Col(A^T A) + n - r

    ; dim Col(A^T A) = r.
Algorithm. Let A be an m × n matrix with rank r.
○ Find an orthonormal eigenbasis (v1, . . . , vn) of A^T A with eigenvalues λ1 ≥ · · · ≥ λr > λ_{r+1} = 0 = · · · = λn.
○ Set σi = √λi for i = 1, . . . , n.
○ Set u1 = (1/σ1) Av1, . . . , ur = (1/σr) Avr. (Magic: orthonormal!)
○ Find u_{r+1}, . . . , um ∈ R^m such that (u1, . . . , um) is an orthonormal basis of R^m.
○ Set U = [ u1 . . . um ], Σ the m × n rectangular diagonal matrix with diagonal entries σ1, . . . , σ_{min{m,n}}, and V = [ v1 . . . vn ].

Proof that A = UΣV^T. Note that v_{r+1}, . . . , vn ∈ Nul(A^T A). Thus v_{r+1}, . . . , vn ∈ Nul(A).

    ; AV = A [ v1 . . . vn ] = [ Av1 . . . Avn ] = [ σ1 u1 . . . σr ur 0 . . . 0 ] = UΣ  ;  A = UΣV^T.
 
Example. Compute the SVD of A = [ -1 1 0 ; 0 -1 1 ].
Solution.

    A^T A = [ 1 -1 0 ; -1 2 -1 ; 0 -1 1 ]

Eigenvalues: λ1 = 3, λ2 = 1, λ3 = 0
Eigenbasis: v1 = (1/√6)(1, -2, 1),  v2 = (1/√2)(-1, 0, 1),  v3 = (1/√3)(1, 1, 1)
Singular values: σ1 = √3, σ2 = 1

    u1 = (1/σ1) Av1 = (1/√3) A (1/√6)(1, -2, 1) = (1/√18)(-3, 3) = (1/√2)(-1, 1)
    u2 = (1/σ2) Av2 = A (1/√2)(-1, 0, 1) = (1/√2)(1, 1)

    ; U = (1/√2)[ -1 1 ; 1 1 ],   Σ = [ √3 0 0 ; 0 1 0 ],
      V = [ v1 v2 v3 ] = [ 1/√6 -1/√2 1/√3 ; -2/√6 0 1/√3 ; 1/√6 1/√2 1/√3 ]
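NumPy computes the same decomposition directly (a sketch, using the matrix from the example; `np.linalg.svd` returns the singular values in decreasing order and V already transposed):

```python
import numpy as np

A = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
U, s, Vt = np.linalg.svd(A)

# Rebuild the rectangular diagonal Sigma to check A = U Sigma V^T.
Sigma = np.zeros((2, 3))
Sigma[[0, 1], [0, 1]] = s
```

(Individual columns of U and rows of Vt may differ from the hand computation by a sign, which is still a valid SVD.)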
 
Theorem 86. Let A be an m × n matrix with rank r, and let U = [ u1 . . . um ], V = [ v1 . . . vn ], Σ be such that A = UΣV^T is an SVD of A. Then
○ (u1, . . . , ur) is a basis of Col(A).
○ (u_{r+1}, . . . , um) is a basis of Nul(A^T).
○ (v1, . . . , vr) is a basis of Col(A^T).
○ (v_{r+1}, . . . , vn) is a basis of Nul(A).
Proof. For simplicity, assume U, V, Σ are constructed using our algorithm.
Note that u1 = (1/σ1) Av1, . . . , ur = (1/σr) Avr, so u1, . . . , ur ∈ Col(A); since dim Col(A) = r, (u1, . . . , ur) is a basis of Col(A). Hence (u_{r+1}, . . . , um) is a basis of Col(A)^⊥ = Nul(A^T).
Recall that v_{r+1}, . . . , vn ∈ Nul(A^T A) = Nul(A). Since dim Nul(A) = n - r, (v_{r+1}, . . . , vn) is a basis of Nul(A), and hence (v1, . . . , vr) is a basis of Nul(A)^⊥ = Col(A^T).
Low rank approximation via SVD
 
Theorem 87. Let A be an m × n matrix with rank r, and let U = [ u1 . . . um ], V = [ v1 . . . vn ] be matrices with orthonormal columns and Σ a rectangular diagonal m × n matrix such that A = UΣV^T is an SVD of A. Then

    A = σ1 u1 v1^T + σ2 u2 v2^T + · · · + σr ur vr^T = [ u1 . . . ur ] diag(σ1, . . . , σr) [ v1 . . . vr ]^T.

Proof. Suppose m ≤ n. Then

    A = [ u1 . . . um ] Σ [ v1 . . . vn ]^T = [ σ1 u1 . . . σm um 0 . . . 0 ] [ v1 . . . vn ]^T
      = σ1 u1 v1^T + · · · + σm um vm^T
      = σ1 u1 v1^T + · · · + σr ur vr^T    since σ_{r+1} = · · · = σm = 0.


Example. Use the SVD

    [ -1 1 0 ; 0 -1 1 ] = (1/√2)[ -1 1 ; 1 1 ] · [ √3 0 0 ; 0 1 0 ] · [ 1/√6 -2/√6 1/√6 ; -1/√2 0 1/√2 ; 1/√3 1/√3 1/√3 ]

to write [ -1 1 0 ; 0 -1 1 ] as a sum of rank 1 matrices.
Solution. With u1 = (1/√2)(-1, 1), v1 = (1/√6)(1, -2, 1), u2 = (1/√2)(1, 1), v2 = (1/√2)(-1, 0, 1):

    [ -1 1 0 ; 0 -1 1 ] = √3 u1 v1^T + 1 · u2 v2^T
                        = [ -1/2 1 -1/2 ; 1/2 -1 1/2 ] + [ -1/2 0 1/2 ; -1/2 0 1/2 ].
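The rank-1 expansion is easy to reproduce from any SVD (a Python sketch assuming NumPy, using the same matrix):

```python
import numpy as np

A = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
U, s, Vt = np.linalg.svd(A)

# Sum of rank-1 terms sigma_i u_i v_i^T rebuilds A exactly.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))

# Keeping only the largest singular value gives a rank-1 approximation A_1.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```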
Definition. For k ≤ r, define

    A_k = σ1 u1 v1^T + σ2 u2 v2^T + · · · + σk uk vk^T.

Idea. If σ1 ≫ σ2 ≫ . . . , then A_k is a good approximation of A.

Definition. Let A be an m × n matrix with rank r. A compact singular value decomposition of A is a decomposition A = U_c Σ_c V_c^T where
○ U_c = [ u1 . . . ur ] is an m × r matrix with orthonormal columns,
○ Σ_c is an r × r diagonal matrix with positive diagonal elements,
○ V_c = [ v1 . . . vr ] is an n × r matrix with orthonormal columns.
Question. What is Col(Vc )? What is Col(Uc )?
Solution.
If we use our usual algorithm to compute an SVD, then

Col(Vc ) = Col(AT ) and Col(Uc ) = Col(A).


This is also true if you use a different algorithm.
Example. Use the SVD

    [ -1 1 0 ; 0 -1 1 ] = (1/√2)[ -1 1 ; 1 1 ] · [ √3 0 0 ; 0 1 0 ] · [ 1/√6 -2/√6 1/√6 ; -1/√2 0 1/√2 ; 1/√3 1/√3 1/√3 ]

to find a compact SVD of [ -1 1 0 ; 0 -1 1 ].
Solution. Note that [ -1 1 0 ; 0 -1 1 ] has rank 2. Set

    U_c = (1/√2)[ -1 1 ; 1 1 ],   Σ_c = [ √3 0 ; 0 1 ],   V_c = [ 1/√6 -1/√2 ; -2/√6 0 ; 1/√6 1/√2 ].
The Pseudo-Inverse
Definition. Let A be an m × n matrix with rank r. Given the compact singular value decomposition A = U_c Σ_c V_c^T where
○ U_c = [ u1 . . . ur ] is an m × r matrix with orthonormal columns,
○ Σ_c is an r × r diagonal matrix with positive diagonal elements,
○ V_c = [ v1 . . . vr ] is an n × r matrix with orthonormal columns,
we define the pseudoinverse A^+ of A as V_c Σ_c^{-1} U_c^T.

Example. Recall the compact SVD

    A := [ -1 1 0 ; 0 -1 1 ] = (1/√2)[ -1 1 ; 1 1 ] · [ √3 0 ; 0 1 ] · ( [ 1/√6 -1/√2 ; -2/√6 0 ; 1/√6 1/√2 ] )^T.

Determine A^+.
Solution.

    A^+ = V_c Σ_c^{-1} U_c^T = [ 1/√6 -1/√2 ; -2/√6 0 ; 1/√6 1/√2 ] · [ 1/√3 0 ; 0 1 ] · (1/√2)[ -1 1 ; 1 1 ]
        = [ -2/3 -1/3 ; 1/3 -1/3 ; 1/3 2/3 ].
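NumPy's `pinv` computes the Moore-Penrose pseudoinverse directly (a sketch checking the hand computation above):

```python
import numpy as np

A = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
A_plus = np.linalg.pinv(A)

# The value computed by hand in the example above.
expected = np.array([[-2/3, -1/3], [1/3, -1/3], [1/3, 2/3]])
```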
Theorem 88. Let v ∈ Col(A^T) and w ∈ Col(A). Then A^+ A v = v and A A^+ w = w.
Proof. Since Col(V_c) = Col(A^T) and V_c has orthonormal columns, we have for every x ∈ R^n

    proj_{Col(A^T)}(x) = proj_{Col(V_c)}(x) = V_c V_c^T x.

Thus, since v ∈ Col(A^T), we get

    A^+ A v = V_c Σ_c^{-1} U_c^T U_c Σ_c V_c^T v = V_c V_c^T v = proj_{Col(A^T)}(v) = v.

Since Col(U_c) = Col(A) and U_c has orthonormal columns, we have for every x ∈ R^m

    proj_{Col(A)}(x) = proj_{Col(U_c)}(x) = U_c U_c^T x.

Thus

    A A^+ w = U_c Σ_c V_c^T V_c Σ_c^{-1} U_c^T w = U_c U_c^T w = proj_{Col(A)}(w) = w.

Remark. If A is n × n and invertible, then Col(A) = R^n. Thus A^{-1} = A^+.


Question. Let v ∈ R^n be written as v = v_r + v_n with v_r ∈ Col(A^T) and v_n ∈ Nul(A). What is A^+ A v?
Solution.

    A^+ A v = A^+ A (v_r + v_n) = A^+ A v_r + A^+ A v_n = v_r.

Theorem 89. Let A be an m × n matrix and let b ∈ R^m. Then A^+ b is the LSQ solution of Ax = b (with minimum length).
Proof. Observe that

    A A^+ b = U_c Σ_c V_c^T V_c Σ_c^{-1} U_c^T b = U_c U_c^T b = proj_{Col(A)}(b).

Thus A^+ b is an LSQ solution of Ax = b.

Remark. This is particularly useful when solving many different LSQ problems of the form Ax = b, where A stays the same but b varies.
PCA
Setup.
○ Given m objects, we measure the same n variables.
○ Thus m samples of n-dimensional data ; m × n matrix (each row is a sample)
○ Analyse this matrix to understand what drives the variance in the data.

Definition. Let X = [ a1 . . . am ]^T be an m × n matrix. We define the column average μ(X) of X as

    μ(X) := (1/m)(a1 + · · · + am).

We say X is centered if μ(X) = 0. For X centered, we define the covariance matrix cov(X) of X as (1/(m-1)) X^T X.

Remark.
○ If X is not centered, replace X by [ a1 - μ(X) . . . am - μ(X) ]^T.
○ If the columns of X are orthogonal, then cov(X) is a diagonal matrix ; each variable is independent.
○ What if not? Idea: pick an eigenbasis of X^T X.
Principal component analysis
○ Input: centered m × n-matrix X .
○ Compute cov(X ).
○ Since cov(X ) is symmetric, we can find an orthonormal eigenbasis v1 , v2 , . . . , vn of cov(X )
with eigenvalues λ1 ≥ · · · ≥ λn ≥ 0.
○ Write cov(X ) as a sum of rank 1 matrices:

cov(X ) = λ1 v1 v1T + · · · + λn vn vnT .

○ Each principal component vi explains part of the variance of the data. The larger λi, the more of the variance is explained by vi.
Example. Let X be a centered 30 × 2 matrix. We plot the data and see that

    cov(X) = [ 7.515 20.863 ; 20.863 63.795 ].

An orthonormal eigenbasis is

    v1 = (0.314, 0.95),   v2 = (0.95, -0.314)

with λ1 = 70.685 and λ2 = 0.625. Thus

    cov(X) = λ1 v1 v1^T + λ2 v2 v2^T = [ 6.97 21.085 ; 21.085 63.793 ] + [ 0.564 -0.186 ; -0.186 0.0616 ].
PCA using SVD.
○ Let X be a centered data matrix. Observe that X^T X = (m - 1) cov(X).
○ To find an orthonormal eigenbasis of cov(X), it is enough to find an orthonormal eigenbasis of X^T X.
○ Compute the SVD of X ; X = UΣV^T.
○ The columns of V = [ v1 . . . vn ] are the desired orthonormal eigenbasis.
○ If σi is the singular value for vi, then λi = σi^2 / (m - 1).
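The steps above can be sketched in Python (assuming NumPy; the data here is synthetic, made up purely for illustration):

```python
import numpy as np

# Synthetic 30 x 2 data with correlated columns, then centered.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2)) @ np.array([[1.0, 2.0], [0.0, 3.0]])
X = X - X.mean(axis=0)

cov = X.T @ X / (X.shape[0] - 1)          # covariance matrix
U, s, Vt = np.linalg.svd(X)

# Rows of Vt are the principal components; lambda_i = sigma_i^2 / (m - 1).
lams = s**2 / (X.shape[0] - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues of cov, descending
```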
Review of Complex Numbers
Definition. C = {x + iy | x, y ∈ R}, where i = √(-1), i.e. i^2 = -1.
○ The real part of z, denoted Re(z), is defined by Re(z) = x.
○ The imaginary part of z, denoted Im(z), is defined by Im(z) = y.
○ The complex conjugate of z, denoted z̄, is defined by z̄ = x - iy.
○ The absolute value, or magnitude, of z, denoted |z| or ‖z‖, is given by

    |z| = √(x^2 + y^2).

In the complex plane, z = x + iy sits at the point (x, y), its conjugate z̄ = x - iy is the reflection across the real axis, and |z| is the distance from the origin. Any point (x, y) in R^2 can be viewed as a complex number: (x, y) ↔ x + iy.
Definition. Given z = x + iy, w = u + iv, we define

    z + w = (x + u) + i(y + v)
    zw = (x + iy)(u + iv) = xu + x(iv) + (iy)u + (iy)(iv) = (xu - yv) + i(xv + yu)

Example. Compute (3 + 2i) + (4 - i) and (3 + 2i)(4 - i).
Solution.

    (3 + 2i) + (4 - i) = 7 + i
    (3 + 2i)(4 - i) = 14 + 5i

Example. Compute i(x + iy), i^2(x + iy), i^3(x + iy) and i^4(x + iy).
Solution.

    i(x + iy) = -y + ix
    i^2(x + iy) = i(-y + ix) = -x - iy
    i^3(x + iy) = i(-x - iy) = y - ix
    i^4(x + iy) = i(y - ix) = x + iy

Geometrically, multiplication by i rotates a point of the complex plane by 90°.
Theorem 90. Let z ∈ C.
○ The conjugate of z̄ is z.
○ |z|^2 = z z̄
○ |z̄| = |z|
Proof. Given z = x + iy: z̄ = x - iy and |z| = √(x^2 + y^2).

    Conjugating z̄ = x - iy gives x + iy = z.

    z z̄ = (x + iy)(x - iy) = x^2 - x(iy) + (iy)x - (iy)(iy) = x^2 + y^2 = |z|^2.

    |z̄| = |x - iy| = √(x^2 + (-y)^2) = √(x^2 + y^2) = |x + iy| = |z|.
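Python has complex numbers built in, with `j` playing the role of i; the arithmetic above checks out directly (a quick sketch):

```python
# Built-in complex type: 3 + 2j means 3 + 2i.
z = 3 + 2j
w = 4 - 1j

s = z + w              # (3+2i) + (4-i) = 7 + i
p = z * w              # (3+2i)(4-i) = 14 + 5i
conj = z.conjugate()   # complex conjugate, 3 - 2i
mag_sq = abs(z) ** 2   # |z|^2, which should equal z * conj(z)
```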
Complex Linear Algebra
Goal. Use complex numbers (instead of real numbers) as scalars.
Definition. The (complex) vector space C^n consists of all complex column vectors z = (z1, z2, . . . , zn), where z1, z2, . . . , zn are complex numbers.
○ Now multiplication by a complex scalar makes sense.
○ We can define subspaces, span, independence, basis, dimension for C^n in the usual way.
○ We can multiply complex vectors by complex matrices. Column space and null space still make sense.
○ The only difference is the dot product; you need to use the complex conjugate to get a good notion of length:

    (z1, . . . , zn) · (w1, . . . , wn) = z1 w̄1 + z2 w̄2 + · · · + zn w̄n.
 
Example. Find the complex eigenvectors and eigenvalues of A = [ 0 -1 ; 1 0 ].
Solution.

    det(A - λI) = det [ -λ -1 ; 1 -λ ] = λ^2 + 1

No real eigenvalues! The (complex) eigenvalues are λ1 = i and λ2 = -i.

λ1 = i:  A - iI = [ -i -1 ; 1 -i ]. Adding i times row 2 to row 1 (R1 → R1 + iR2) and then swapping rows (R1 ↔ R2) gives [ 1 -i ; 0 0 ]  ;  Nul(A - iI) = span( (i, 1) ).

λ2 = -i: A + iI = [ i -1 ; 1 i ]. Subtracting i times row 2 from row 1 (R1 → R1 - iR2) and swapping rows gives [ 1 i ; 0 0 ]  ;  Nul(A + iI) = span( (-i, 1) ).

Let's check: [ 0 -1 ; 1 0 ] (i, 1) = (-1, i) = i (i, 1).
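NumPy handles complex eigenvalues of real matrices automatically (a sketch using the rotation matrix from the example):

```python
import numpy as np

# The rotation matrix above has no real eigenvalues; numpy returns i and -i.
A = np.array([[0.0, -1.0], [1.0, 0.0]])
eigvals, eigvecs = np.linalg.eig(A)

v = eigvecs[:, 0]   # an eigenvector for eigvals[0]
```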
Definition. Let A be an m × n matrix. The conjugate matrix Ā of A is obtained from A by taking the complex conjugate of each entry of A.
Theorem 91. Let A be a matrix with real entries and suppose λ is an eigenvalue of A. Then λ̄ is also an eigenvalue. Furthermore, if v is an eigenvector with eigenvalue λ, then v̄ is an eigenvector with eigenvalue λ̄.
Proof. Suppose Av = λv. Observe that Ā = A, because A only has real entries. Taking conjugates,

    A v̄ = Ā v̄ = conj(Av) = conj(λv) = λ̄ v̄.

Thus v̄ is an eigenvector of A with eigenvalue λ̄.


Definition. Let A be an m × n matrix. The conjugate transpose A^H of A is defined as (Ā)^T. We say the matrix A is Hermitian if A = A^H.
 
1 1
Example. Find the eigenvectors and eigenvalues of A = . Can complex numbers help
0 1
you find an eigenbasis of A?

1−λ 1
○ det(A − λI ) = = (1 − λ)2
0 1−λ
So: λ = 1 is the only eigenvalue (it has algebraic multiplicity 2).
   
0 1 1
○ λ=1: A−I = =⇒ Nul(A) = span( )
0 0 0
So the eigenspace still has only dimension 1. Complex numbers didn’t help.

○ We can not find an eigenbasis for this matrix.


This kind of problem cannot really be fixed.
We have to lower our expectations and look for generalized eigenvectors.
These are solutions to (A − λI )2 x = 0, (A − λI )3 x = 0, . . .
