DDA3020 Machine Learning
Lecture 02 Linear Algebra
Jicong Fan
School of Data Science, CUHK-SZ
September 14, 2022
Outline
1 Notations, vectors, matrices
2 Matrix inverse, determinant, independence
3 Systems of linear equations
Reference
References for this lecture:
[Book1] Stephen Boyd and Lieven Vandenberghe, “Introduction to Applied Linear Algebra”, Cambridge University Press, 2018 (available online).
[Book2] Andreas C. Müller and Sarah Guido, “Introduction to Machine Learning with Python: A Guide for Data Scientists”, O’Reilly Media, Inc., 2017.
Notations, vectors, matrices
A scalar is a simple numerical value, like 15 or −3.2.
Variables or constants that take scalar values are denoted by an italic
letter, like x or a.
We shall focus on real numbers.
The summation over a collection $\{x_1, x_2, x_3, \ldots, x_m\}$ is denoted like this:
$$\sum_{i=1}^{m} x_i = x_1 + x_2 + \ldots + x_m$$
The product over a collection $\{x_1, x_2, x_3, \ldots, x_m\}$ is denoted like this:
$$\prod_{i=1}^{m} x_i = x_1 \cdot x_2 \cdot \ldots \cdot x_m$$
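Both notations map directly onto code. Below is a minimal Python/NumPy sketch (in the spirit of [Book2]; the numbers are arbitrary illustrations, not from the lecture):

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0])   # the collection {x1, x2, x3}

print(np.sum(x))    # x1 + x2 + x3 = 9.0
print(np.prod(x))   # x1 * x2 * x3 = 24.0
```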
Notations, vectors, matrices
A vector is an ordered list of scalar values, called attributes. We denote a
vector as a bold character, for example, x or w.
Vectors can be visualized as arrows pointing in certain directions, as well as points in a multi-dimensional space.
In many books, vectors are written column-wise:
$$\mathbf{a} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} -2 \\ 5 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
Notations, vectors, matrices
Illustrations of the three two-dimensional vectors $\mathbf{a} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} -2 \\ 5 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ are given in Figure 1 below.
Figure: Three vectors visualized as directions and as points.
Notations, vectors, matrices
We denote an attribute of a vector as an italic value with an index, like this: $w^{(j)}$ or $x^{(j)}$. The index j denotes a specific dimension of the vector, the position of an attribute in the list.
For instance, in the vector a shown in red in Figure 1,
$$\mathbf{a} = \begin{bmatrix} a^{(1)} \\ a^{(2)} \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \text{or more commonly,} \quad \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$
Note:
The notation $x^{(j)}$ should not be confused with the power operator, such as the 2 in $x^2$ (squared) or the 3 in $x^3$ (cubed).
If we want to apply a power operator, say square, to an indexed attribute of a vector, we write it like this: $(x^{(j)})^2$.
Notations, vectors, matrices
A matrix is a rectangular array of numbers arranged in rows and columns.
Below is an example of a matrix with two rows and three columns,
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & -3 \\ 21 & -6 & -1 \end{bmatrix}$$
Matrices are denoted with bold capital letters, such as X or W.
Note:
A variable can have two or more indices, like this: $x_i^{(j)}$ or like this: $x_{i,j}^{(k)}$.
For example, in neural networks, we denote as $x_{l,u}^{(j)}$ the input feature j of unit u in layer l.
Notations, vectors, matrices
Operations on Vectors:
$$\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \end{bmatrix}$$
$$\mathbf{x} - \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 - y_1 \\ x_2 - y_2 \end{bmatrix}$$
$$a\mathbf{x} = a\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} ax_1 \\ ax_2 \end{bmatrix}$$
$$\frac{1}{a}\mathbf{x} = \frac{1}{a}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{a}x_1 \\ \frac{1}{a}x_2 \end{bmatrix}$$
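These element-wise operations are available directly in NumPy. A minimal sketch using the vectors a and b from Figure 1 (the scalar is named alpha to avoid clashing with the vector a):

```python
import numpy as np

x = np.array([2.0, 3.0])    # vector a from Figure 1
y = np.array([-2.0, 5.0])   # vector b from Figure 1
alpha = 2.0                 # a scalar

print(x + y)       # element-wise sum: [0. 8.]
print(x - y)       # element-wise difference: [ 4. -2.]
print(alpha * x)   # scalar multiplication: [4. 6.]
print(x / alpha)   # (1/alpha) * x: [1.  1.5]
```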
Notations, vectors, matrices
Matrix or Vector Transpose:
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{x}^{\top} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$$
$$\mathbf{X} = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{bmatrix}, \quad \mathbf{X}^{\top} = \begin{bmatrix} x_{1,1} & x_{2,1} & x_{3,1} \\ x_{1,2} & x_{2,2} & x_{3,2} \\ x_{1,3} & x_{2,3} & x_{3,3} \end{bmatrix}$$
Dot Product or Inner Product of Vectors:
$$\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^{\top}\mathbf{y} = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = x_1 y_1 + x_2 y_2$$
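In NumPy, transposition and the inner product are one-liners; a small sketch with arbitrary example values:

```python
import numpy as np

x = np.array([2.0, 3.0])
y = np.array([-2.0, 5.0])
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

print(X.T)          # transpose: rows and columns are swapped
print(x @ y)        # inner product x1*y1 + x2*y2 = -4 + 15 = 11.0
print(np.dot(x, y)) # equivalent to x @ y
```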
Notations, vectors, matrices
Matrix-Vector Product
$$\mathbf{X}\mathbf{w} = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} x_{1,1}w_1 + x_{1,2}w_2 + x_{1,3}w_3 \\ x_{2,1}w_1 + x_{2,2}w_2 + x_{2,3}w_3 \\ x_{3,1}w_1 + x_{3,2}w_2 + x_{3,3}w_3 \end{bmatrix}$$
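A quick check of this formula in NumPy (the values of X and w below are arbitrary):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
w = np.array([1.0, 0.0, -1.0])

print(X @ w)   # [-2. -2. -2.]
# Same result computed row by row, as in the formula above:
print(np.array([X[i, :] @ w for i in range(3)]))
```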
Notations, vectors, matrices
Vector-Matrix Product
$$\mathbf{x}^{\top}\mathbf{W} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} \\ w_{2,1} & w_{2,2} & w_{2,3} \\ w_{3,1} & w_{3,2} & w_{3,3} \end{bmatrix} = \begin{bmatrix} (w_{1,1}x_1 + w_{2,1}x_2 + w_{3,1}x_3) & (w_{1,2}x_1 + w_{2,2}x_2 + w_{3,2}x_3) & (w_{1,3}x_1 + w_{2,3}x_2 + w_{3,3}x_3) \end{bmatrix}$$
Notations, vectors, matrices
Matrix-Matrix Product
$$\mathbf{X}\mathbf{W} = \begin{bmatrix} x_{1,1} & \ldots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \ldots & x_{m,d} \end{bmatrix} \begin{bmatrix} w_{1,1} & \ldots & w_{1,h} \\ \vdots & \ddots & \vdots \\ w_{d,1} & \ldots & w_{d,h} \end{bmatrix} = \begin{bmatrix} (x_{1,1}w_{1,1} + \ldots + x_{1,d}w_{d,1}) & \ldots & (x_{1,1}w_{1,h} + \ldots + x_{1,d}w_{d,h}) \\ \vdots & \ddots & \vdots \\ (x_{m,1}w_{1,1} + \ldots + x_{m,d}w_{d,1}) & \ldots & (x_{m,1}w_{1,h} + \ldots + x_{m,d}w_{d,h}) \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{d} x_{1,i}w_{i,1} & \ldots & \sum_{i=1}^{d} x_{1,i}w_{i,h} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{d} x_{m,i}w_{i,1} & \ldots & \sum_{i=1}^{d} x_{m,i}w_{i,h} \end{bmatrix}$$
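The entry-wise sum formula can be verified against NumPy's built-in product; a sketch with random matrices of the shapes used above (m × d times d × h):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, h = 4, 3, 2
X = rng.normal(size=(m, d))
W = rng.normal(size=(d, h))

P = X @ W                                  # matrix-matrix product, shape (m, h)

# Entry (i, j) is sum_k X[i, k] * W[k, j], as in the formula above
P_manual = np.zeros((m, h))
for i in range(m):
    for j in range(h):
        P_manual[i, j] = np.sum(X[i, :] * W[:, j])

print(np.allclose(P, P_manual))            # True
print(X[0:1, :] @ W)                       # a row vector times a matrix (x^T W) is the m = 1 case
```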
Matrix inverse
Matrix Inverse
Definition:
A d × d square matrix A is called invertible (also nonsingular) if there
exists a d × d square matrix B such that AB = BA = I (Identity matrix)
given by
$$\mathbf{I} = \begin{bmatrix} 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & 1 & 0 \\ 0 & 0 & \ldots & 0 & 1 \end{bmatrix} \quad \text{of } d \times d \text{ dimension}$$
Matrix inverse
Matrix Inverse Computation
$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\,\mathrm{adj}(\mathbf{A})$$
det(A) is the determinant of A.
adj(A) is the adjugate (or adjoint) of A, which is the transpose of its cofactor matrix C, i.e., adj(A) = $\mathbf{C}^{\top}$.
The cofactor $C_{i,j}$ of a matrix is the (i, j)-minor $M_{i,j}$ times a sign factor $(-1)^{i+j}$, i.e., $C_{i,j} = M_{i,j} \times (-1)^{i+j}$.
$M_{i,j}$ is computed in two steps: remove the i-th row and j-th column from the original matrix to obtain a smaller matrix; then compute the determinant of that smaller matrix.
For example, if $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $\mathbf{C} = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$ and $\mathrm{adj}(\mathbf{A}) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
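For the 2 × 2 case this formula can be checked directly against NumPy's built-in inverse; the numbers below are arbitrary illustrative values:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

det_A = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]     # ad - bc
adj_A = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]])          # adjugate [[d, -b], [-c, a]]
A_inv = adj_A / det_A                             # A^{-1} = adj(A) / det(A)

print(np.allclose(A_inv, np.linalg.inv(A)))       # True
print(np.allclose(A @ A_inv, np.eye(2)))          # True: A A^{-1} = I
```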
Determinant
Determinant computation
Example: 2 × 2 matrix
$$\det(\mathbf{A}) = |\mathbf{A}| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$
Determinant
Determinant computation
Example: 3 × 3 matrix
$$|\mathbf{A}| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg)$$
Each determinant of a 2 × 2 matrix in this equation is called a minor of the matrix A. This procedure can be extended to give a recursive definition for the determinant of an n × n matrix, known as the Laplace expansion.
The determinant has an elegant geometric interpretation. If interested, please refer to https://spaces.ac.cn/archives/1770.
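The Laplace expansion translates into a short recursive function; a sketch (assuming NumPy) that agrees with np.linalg.det up to floating-point error:

```python
import numpy as np

def det_laplace(A):
    """Determinant via Laplace expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # (0, j)-minor: delete row 0 and column j, then take its determinant
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
print(det_laplace(A))      # -3.0
print(np.linalg.det(A))    # matches up to floating-point error
```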
Co-factor matrix
(Worked example of the cofactor matrix computation, shown in figures.)
Linear dependence and independence
Linear dependence and independence
A collection of d-vectors $\mathbf{x}_1, \ldots, \mathbf{x}_m$ (with m ≥ 1) is called linearly dependent if
$$\beta_1 \mathbf{x}_1 + \ldots + \beta_m \mathbf{x}_m = \mathbf{0}$$
holds for some $\beta_1, \ldots, \beta_m$ that are not all zero.
A collection of d-vectors $\mathbf{x}_1, \ldots, \mathbf{x}_m$ (with m ≥ 1) is called linearly independent if
$$\beta_1 \mathbf{x}_1 + \ldots + \beta_m \mathbf{x}_m = \mathbf{0}$$
only holds for $\beta_1 = \ldots = \beta_m = 0$.
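In practice, linear independence can be checked numerically via the matrix rank: stack the vectors as columns and compare the rank with the number of vectors. A small sketch with made-up vectors:

```python
import numpy as np

x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = x1 + 2 * x2                        # deliberately a combination of x1 and x2

V = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(V))         # 2 < 3  -> {x1, x2, x3} is linearly dependent

W = np.column_stack([x1, x2])
print(np.linalg.matrix_rank(W))         # 2 = 2  -> {x1, x2} is linearly independent
```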
Linear dependence and independence
Geometry of linear dependence and independence (illustrated by figures on the slides).
Systems of linear equations
Consider a system of m linear equations in d variables $w_1, \ldots, w_d$:
$$\begin{aligned}
x_{1,1} w_1 + x_{1,2} w_2 + \ldots + x_{1,d} w_d &= y_1 \\
x_{2,1} w_1 + x_{2,2} w_2 + \ldots + x_{2,d} w_d &= y_2 \\
&\;\;\vdots \\
x_{m,1} w_1 + x_{m,2} w_2 + \ldots + x_{m,d} w_d &= y_m
\end{aligned}$$
Systems of linear equations
These equations can be written compactly in matrix-vector notation:
Xw = y,
where
$$\mathbf{X} = \begin{bmatrix} x_{1,1} & \ldots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \ldots & x_{m,d} \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} w_1 \\ \vdots \\ w_d \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$
Note: X is of m × d dimension.
Systems of linear equations
(i) Square or even-determined system: m = d in Xw = y, $\mathbf{X} \in \mathbb{R}^{m \times d}$
(equal number of equations and unknowns, i.e., $\mathbf{X} \in \mathbb{R}^{d \times d}$)
If X is invertible (i.e., $\mathbf{X}^{-1}\mathbf{X} = \mathbf{I}$), then pre-multiplying both sides by $\mathbf{X}^{-1}$ gives
$$\mathbf{X}^{-1}\mathbf{X}\mathbf{w} = \mathbf{X}^{-1}\mathbf{y} \;\Rightarrow\; \mathbf{w} = \mathbf{X}^{-1}\mathbf{y}$$
If all rows or columns of X are linearly independent, then X is invertible.
Systems of linear equations
Example (two unknowns and two equations):
$$w_1 + w_2 = 4 \quad (1)$$
$$w_1 - 2w_2 = 1 \quad (2)$$
In matrix form,
$$\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}}_{\mathbf{X}} \underbrace{\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}}_{\mathbf{w}} = \underbrace{\begin{bmatrix} 4 \\ 1 \end{bmatrix}}_{\mathbf{y}}$$
$$\mathbf{w} = \mathbf{X}^{-1}\mathbf{y} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}^{-1} \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$
Here, the rows or columns of X are linearly independent, hence X is invertible.
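The same example in NumPy; np.linalg.solve is preferred over forming the inverse explicitly, but both give [3, 1]:

```python
import numpy as np

X = np.array([[1.0,  1.0],
              [1.0, -2.0]])
y = np.array([4.0, 1.0])

print(np.linalg.solve(X, y))    # [3. 1.]
print(np.linalg.inv(X) @ y)     # same result via w = X^{-1} y
```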
Systems of linear equations
(ii) Over-determined system: m > d in Xw = y, $\mathbf{X} \in \mathbb{R}^{m \times d}$
(i.e., there are more equations than unknowns)
This set of linear equations generally has NO exact solution (X is non-square and hence not invertible). However, an approximate solution is still available.
If the left-inverse $\mathbf{X}^{\dagger}$ of X exists such that $\mathbf{X}^{\dagger}\mathbf{X} = \mathbf{I}$, then pre-multiplying both sides by $\mathbf{X}^{\dagger}$ gives
$$\mathbf{X}^{\dagger}\mathbf{X}\mathbf{w} = \mathbf{X}^{\dagger}\mathbf{y} \;\Rightarrow\; \mathbf{w} = \mathbf{X}^{\dagger}\mathbf{y}$$
Definition: a matrix B that satisfies BA = I (identity matrix) is called a left-inverse of A. (Note: A is m-by-d and B is d-by-m.)
Note: The left-inverse can be computed as $\mathbf{X}^{\dagger} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$, given that $\mathbf{X}^{\top}\mathbf{X}$ is invertible.
Systems of linear equations
Example (two unknowns and three equations):
$$w_1 + w_2 = 1 \quad (1)$$
$$w_1 - w_2 = 0 \quad (2)$$
$$w_1 = 2 \quad (3)$$
In matrix form,
$$\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{bmatrix}}_{\mathbf{X}} \underbrace{\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}}_{\mathbf{w}} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}}_{\mathbf{y}}$$
This set of linear equations has NO exact solution.
$$\mathbf{w} = \mathbf{X}^{\dagger}\mathbf{y} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0.5 \end{bmatrix} \quad \text{(approximation)}$$
Here $\mathbf{X}^{\top}\mathbf{X}$ is invertible.
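The same example in NumPy, once via the left-inverse formula and once via the built-in least-squares solver (both give the approximation [1, 0.5]):

```python
import numpy as np

X = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  0.0]])
y = np.array([1.0, 0.0, 2.0])

w_left = np.linalg.inv(X.T @ X) @ X.T @ y        # w = (X^T X)^{-1} X^T y
print(w_left)                                    # [1.  0.5]

w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
print(w_lstsq)                                   # [1.  0.5]
```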
2
Systems of linear equations
(iii) Under-determined system: m < d in Xw = y, $\mathbf{X} \in \mathbb{R}^{m \times d}$
(i.e., there are more unknowns than equations ⇒ infinitely many solutions)
If the right-inverse $\mathbf{X}^{\dagger}$ of X exists such that $\mathbf{X}\mathbf{X}^{\dagger} = \mathbf{I}$, then the d-vector $\mathbf{w} = \mathbf{X}^{\dagger}\mathbf{y}$ (one of the infinitely many solutions) satisfies the equation Xw = y, i.e.,
$$\mathbf{X}\mathbf{X}^{\dagger}\mathbf{y} = \mathbf{y} \;\Rightarrow\; \mathbf{y} = \mathbf{y}$$
Definition: a matrix B that satisfies AB = I (identity matrix) is called a right-inverse of A. (Note: A is m-by-d and B is d-by-m.)
Note: The right-inverse can be computed as $\mathbf{X}^{\dagger} = \mathbf{X}^{\top}(\mathbf{X}\mathbf{X}^{\top})^{-1}$, given that $\mathbf{X}\mathbf{X}^{\top}$ is invertible.
Systems of linear equations
Derivation:
Under-determined system: m < d in Xw = y, $\mathbf{X} \in \mathbb{R}^{m \times d}$
(i.e., there are more unknowns than equations ⇒ infinitely many solutions ⇒ but a unique solution is still possible by constraining the search using $\mathbf{w} = \mathbf{X}^{\top}\mathbf{a}$!)
If $\mathbf{X}\mathbf{X}^{\top}$ is invertible, let $\mathbf{w} = \mathbf{X}^{\top}\mathbf{a}$; then
$$\mathbf{X}\mathbf{X}^{\top}\mathbf{a} = \mathbf{y} \;\Rightarrow\; \mathbf{a} = (\mathbf{X}\mathbf{X}^{\top})^{-1}\mathbf{y}$$
$$\mathbf{w} = \mathbf{X}^{\top}\mathbf{a} = \underbrace{\mathbf{X}^{\top}(\mathbf{X}\mathbf{X}^{\top})^{-1}}_{\mathbf{X}^{\dagger}}\,\mathbf{y}$$
Systems of linear equations
Example (three unknowns and two equations):
$$w_1 + 2w_2 + 3w_3 = 2 \quad (1)$$
$$w_1 - 2w_2 + 3w_3 = 1 \quad (2)$$
In matrix form,
$$\underbrace{\begin{bmatrix} 1 & 2 & 3 \\ 1 & -2 & 3 \end{bmatrix}}_{\mathbf{X}} \underbrace{\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}}_{\mathbf{w}} = \underbrace{\begin{bmatrix} 2 \\ 1 \end{bmatrix}}_{\mathbf{y}}$$
This set of linear equations has infinitely many solutions along the intersection line.
$$\mathbf{w} = \mathbf{X}^{\top}(\mathbf{X}\mathbf{X}^{\top})^{-1}\mathbf{y} = \begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} 14 & 6 \\ 6 & 14 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.15 \\ 0.25 \\ 0.45 \end{bmatrix} \quad \text{(constrained solution)}$$
Here $\mathbf{X}\mathbf{X}^{\top}$ is invertible.
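The same constrained solution in NumPy, via the right-inverse formula; np.linalg.pinv gives the identical minimum-norm answer here:

```python
import numpy as np

X = np.array([[1.0,  2.0, 3.0],
              [1.0, -2.0, 3.0]])
y = np.array([2.0, 1.0])

w = X.T @ np.linalg.inv(X @ X.T) @ y   # w = X^T (X X^T)^{-1} y
print(w)                               # [0.15 0.25 0.45]
print(X @ w)                           # [2. 1.] -> the equations hold exactly
print(np.linalg.pinv(X) @ y)           # same minimum-norm solution
```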
Systems of linear equations
Example (three unknowns and two equations):
$$w_1 + 2w_2 + 3w_3 = 2 \quad (1)$$
$$3w_1 + 6w_2 + 9w_3 = 1 \quad (2)$$
In matrix form,
$$\underbrace{\begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \end{bmatrix}}_{\mathbf{X}} \underbrace{\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}}_{\mathbf{w}} = \underbrace{\begin{bmatrix} 2 \\ 1 \end{bmatrix}}_{\mathbf{y}}$$
Here both $\mathbf{X}\mathbf{X}^{\top}$ and $\mathbf{X}^{\top}\mathbf{X}$ are NOT invertible!
There is NO solution for the system: the left-hand side of equation (2) is three times that of equation (1), but its right-hand side is not, so the two equations are inconsistent.