Gaussian Elimination and its Variants
Matrix Multiplication
Abdellatif Serghini
Laurentian University
January 8th, 2025
Vectors and Matrices.
1. Vectors
2. Inner products
3. Matrices
4. Matrix-Vector product
5. Matrix-Matrix product
6. Block matrix and block matrix operations
7. Linear equations
Vectors
A vector of dimension n is an ordered collection of n elements, which are called components.
A column vector:

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

◮ Also written as a row vector: $x = (x_1, x_2, \cdots, x_n)$
◮ The set of n-vectors is denoted $\mathbb{R}^n$
◮ $x_i$: the ith element or component of $x$
Special vectors:
◮ $x = 0$ (zero vector): $x_i = 0$, $i = 1, \cdots, n$
◮ $x = e_i$ (ith basis vector or ith unit vector): $x_i = 1$, $x_k = 0$ for $k \neq i$
Vector operations
◮ Scalar multiplication of a vector $x$ with a scalar $\alpha$:

$$\alpha x = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix}$$

◮ Addition and subtraction of two n-vectors $x$, $y$:

$$x + y = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}, \qquad x - y = \begin{bmatrix} x_1 - y_1 \\ x_2 - y_2 \\ \vdots \\ x_n - y_n \end{bmatrix}$$
Example of vectors
Image: $N \times N$ pixels, numbered 1 to $N^2$:

$$\begin{matrix} 1 & 2 & 3 & \cdots & N \\ N+1 & N+2 & N+3 & \cdots & 2N \\ 2N+1 & 2N+2 & 2N+3 & \cdots & 3N \\ \vdots & \vdots & \vdots & & \vdots \\ N^2-N+1 & N^2-N+2 & N^2-N+3 & \cdots & N^2 \end{matrix}$$

◮ Grayscale image: can be represented by a vector $x$ of size $n = N^2$; $x_i$ is the grayscale level of pixel $i$
◮ Grayscale is a range of shades of gray without apparent color.
◮ Black is represented by 0, and white is represented by 255.
◮ Color image: the color of each pixel is given by three values (the amounts of red, green and blue); the image can be represented by three vectors $x$, $y$, $z$ of size $N^2$. Thus a color image uses three times as much memory as a grayscale image.
Inner products.
The inner product of two n-vectors $x$, $y$ is defined as

$$x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

Notation: $x^T y$, where $x^T$ is the transpose of the column vector $x$ (the transpose of a column vector is a row vector, and vice versa).
Properties
◮ $(\alpha x)^T y = \alpha (x^T y)$ for all scalars $\alpha$
◮ $(x + y)^T z = (x^T z) + (y^T z)$
◮ $x^T y = y^T x$
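The definition and the three properties can be checked numerically; a minimal sketch in plain Python (the function name `inner` is our own):

```python
def inner(x, y):
    """Inner product x^T y of two n-vectors given as lists."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))

x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
alpha = 2.0

print(inner(x, y))   # 4 + 10 + 18 = 32.0
# symmetry: x^T y = y^T x
print(inner(x, y) == inner(y, x))
# scaling: (alpha x)^T y = alpha (x^T y)
print(inner([alpha * xi for xi in x], y) == alpha * inner(x, y))
```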
Matrices
An $m \times n$ matrix $A$:

$$A_{m,n} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}$$

◮ the $a_{i,j}$ are the elements or coefficients
◮ the set of $m \times n$ matrices is denoted $\mathbb{R}^{m \times n}$
◮ $m$, $n$ are the dimensions
Special matrices
◮ $A = 0$ (zero matrix): $a_{i,j} = 0$ for $i = 1, \cdots, m$, $j = 1, \cdots, n$
◮ $A = I$ (identity matrix): $m = n$, $a_{i,i} = 1$, $a_{i,j} = 0$ for $i \neq j$
Matrix transpose
The transpose of an $m \times n$ matrix $A$:

$$A^T_{n,m} = \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1} \\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1,n} & a_{2,n} & \cdots & a_{m,n} \end{bmatrix}$$

Properties
◮ $A^T$ is $n \times m$ if $A$ is $m \times n$
◮ $(A^T)^T = A$
◮ a square matrix $A$ is symmetric if $A = A^T$, i.e., $a_{i,j} = a_{j,i}$
Scalar multiplication and addition
◮ Scalar multiplication of an $m \times n$ matrix $A$ with a scalar $\alpha$:

$$\alpha A_{m,n} = \begin{bmatrix} \alpha a_{1,1} & \alpha a_{1,2} & \cdots & \alpha a_{1,n} \\ \alpha a_{2,1} & \alpha a_{2,2} & \cdots & \alpha a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{m,1} & \alpha a_{m,2} & \cdots & \alpha a_{m,n} \end{bmatrix}$$

◮ Addition and subtraction of two $m \times n$ matrices $A$ and $B$:

$$A_{m,n} \pm B_{m,n} = \begin{bmatrix} a_{1,1} \pm b_{1,1} & a_{1,2} \pm b_{1,2} & \cdots & a_{1,n} \pm b_{1,n} \\ a_{2,1} \pm b_{2,1} & a_{2,2} \pm b_{2,2} & \cdots & a_{2,n} \pm b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} \pm b_{m,1} & a_{m,2} \pm b_{m,2} & \cdots & a_{m,n} \pm b_{m,n} \end{bmatrix}$$
Matrix-vector product
Product of an $m \times n$ matrix $A$ with an n-vector $x$:

$$A_{m,n}\, x = \begin{bmatrix} a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n \\ a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n \\ \vdots \\ a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = b$$

◮ Dimensions must be compatible: the number of columns of $A$ ($n$) must equal the number of elements of $x$ ($n$)
◮ the number of elements of the vector $b$ is $m$
◮ The ith component of $b$ is

$$b_i = a_{i,1} x_1 + a_{i,2} x_2 + \cdots + a_{i,n} x_n = \sum_{j=1}^{n} a_{i,j} x_j$$
Matrix-vector product
From the equation

$$b_i = a_{i,1} x_1 + a_{i,2} x_2 + \cdots + a_{i,n} x_n = \sum_{j=1}^{n} a_{i,j} x_j$$

we can obtain pseudocode to perform matrix-vector multiplication (note that $b_i$ must be zeroed inside the outer loop, once per row):

for i = 1 → m do
    bi ← 0
    for j = 1 → n do
        bi ← bi + ai,j xj
    end for
end for
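The pseudocode translates directly to Python; a minimal sketch (the function name `matvec` is our own), representing $A$ as a list of rows:

```python
def matvec(A, x):
    """b = A x for an m-by-n matrix A (list of rows) and n-vector x."""
    m, n = len(A), len(x)
    b = [0.0] * m                        # b_i <- 0 for every row
    for i in range(m):
        for j in range(n):
            b[i] += A[i][j] * x[j]       # b_i <- b_i + a_{i,j} x_j
    return b

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2 matrix
x = [1.0, -1.0]
print(matvec(A, x))   # [-1.0, -1.0, -1.0]
```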
Matrix-vector product
◮ Another way to view matrix-vector multiplication:

$$A_{m,n}\, x = \begin{bmatrix} a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n \\ a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n \\ \vdots \\ a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n \end{bmatrix} = \begin{bmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{bmatrix} x_1 + \begin{bmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \end{bmatrix} x_2 + \cdots + \begin{bmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{bmatrix} x_n = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

◮ $b$ is a linear combination of the columns of $A$
◮ If we let $A_j$ denote the jth column of $A$, we have

$$b = \sum_{j=1}^{n} A_j x_j$$
The cost of matrix-vector operations
Evaluated by counting the number of floating-point operations (real numbers are stored in computers in a floating-point format)
◮ one floating-point operation (flop): +, −, /, or ×
Example: matrix-vector product $b = Ax$ for an $m \times n$ matrix $A$

for i = 1 → m do
    bi ← 0
    for j = 1 → n do
        bi ← bi + ai,j xj
    end for
end for

◮ $b_i + a_{i,j} x_j$ involves two flops (one multiplication and one addition)
◮ the inner loop over $j$ therefore involves $\sum_{j=1}^{n} 2 = 2n$ flops
◮ to calculate $b = Ax$, the total number of flops is $\sum_{i=1}^{m} 2n = m \times (2n) = 2mn$
Matrix-matrix product.
If $A$ is $m \times n$ and $B$ is $n \times p$, the product $C = AB$ is $m \times p$.
The entry $(i, j)$ of $C$ is

$$c_{i,j} = \sum_{k=1}^{n} a_{i,k}\, b_{k,j}$$

= the dot product of the ith row of $A$ with the jth column of $B$
Example: pseudocode to multiply $A$ by $B$

C ← 0
for i = 1 → m do
    for j = 1 → p do
        for k = 1 → n do
            ci,j ← ci,j + ai,k bk,j
        end for
    end for
end for

◮ Cost of the matrix-matrix product:

$$\sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} 2 = m \times (p \times (2n)) = 2mnp$$
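The triple loop above can be written out in Python; a minimal sketch (the function name `matmul` is our own):

```python
def matmul(A, B):
    """C = A B for an m-by-n A and n-by-p B (lists of rows); 2mnp flops."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]    # C <- 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]   # c_{i,j} <- c_{i,j} + a_{i,k} b_{k,j}
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
print(matmul(A, I))   # [[1.0, 2.0], [3.0, 4.0]]  (A I = A)
```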
Block matrix notation.
" # " #
2 2 0 2 3 h i h i
B= ,C = ,D = 1 0 , E = −1 6 0
1 3 5 4 7
" # 2 22 30
B C
A= = 1 3 5 4 7
D E
1 0 −1 6 0
◮ A is a block matrix; B, C, D, E are the blocks of A
◮ Dimensions of the blocks must be compatible: B and D have the
same number of columns; B and C have the same number of
rows, etc.
◮ Partitioning matrices into blocks is a useful tool for: constructing
algorithms, developing faster variants of algorithms and proving
theorems.
Block matrix notation.
◮ Adding block matrices

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} + \begin{bmatrix} \tilde{A} & \tilde{B} \\ \tilde{C} & \tilde{D} \end{bmatrix} = \begin{bmatrix} A + \tilde{A} & B + \tilde{B} \\ C + \tilde{C} & D + \tilde{D} \end{bmatrix}$$

dimensions must conform (the additions make sense)
◮ Multiplying block matrices

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} W & X \\ Y & Z \end{bmatrix} = \begin{bmatrix} AW + BY & AX + BZ \\ CW + DY & CX + DZ \end{bmatrix}$$

dimensions must conform (the multiplications and additions make sense)
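The block-product formula can be verified on small matrices; a plain-Python sketch (all helper names are our own) that computes the blockwise product and compares it with the product of the assembled matrices:

```python
def matmul(A, B):
    """C = A B for matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def assemble(TL, TR, BL, BR):
    """Stack four conforming blocks [[TL, TR], [BL, BR]] into one matrix."""
    top = [ra + rb for ra, rb in zip(TL, TR)]
    bot = [ra + rb for ra, rb in zip(BL, BR)]
    return top + bot

A, B = [[2.0, 2.0]], [[0.0]]              # 1x2 and 1x1 blocks
C, D = [[1.0, 3.0]], [[5.0]]
W, X = [[1.0], [0.0]], [[2.0], [1.0]]     # 2x1 blocks
Y, Z = [[1.0]], [[4.0]]                   # 1x1 blocks

blockwise = assemble(matadd(matmul(A, W), matmul(B, Y)),
                     matadd(matmul(A, X), matmul(B, Z)),
                     matadd(matmul(C, W), matmul(D, Y)),
                     matadd(matmul(C, X), matmul(D, Z)))
flat = matmul(assemble(A, B, C, D), assemble(W, X, Y, Z))
print(blockwise == flat)   # True: blockwise product equals the flat product
```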
Linear equations
$m$ equations in $n$ variables $x_1, x_2, \cdots, x_n$:

$$\begin{aligned} a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n &= b_1 \\ a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n &= b_m \end{aligned}$$

In matrix form: $Ax = b$, where

$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$
Curve interpolation
◮ Curve interpolation is a problem that arises frequently in
computer graphics and in robotics (path planning).
◮ Interpolation problems require finding curves passing through
some given data points and possibly satisfying some extra
constraints.
◮ There are a number of interpolation problems, and we consider
one of the most common problems which can be stated as
follows:
Problem: Given $n$ data points $(t_1, y_1), \ldots, (t_n, y_n)$, find a curve $p$ such that $p(t_i) = y_i$ for all $i$, $1 \le i \le n$.
Polynomial interpolation
Fit a polynomial

$$p(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}$$

through $n$ points $(t_1, y_1), \cdots, (t_n, y_n)$
◮ problem data: $t_1, t_2, \cdots, t_n, y_1, y_2, \cdots, y_n$
◮ problem variables: $x_1, x_2, \cdots, x_n$
Write out the conditions on $x$:

$$\begin{aligned} p(t_1) &= x_1 + x_2 t_1 + \cdots + x_n t_1^{n-1} = y_1 \\ p(t_2) &= x_1 + x_2 t_2 + \cdots + x_n t_2^{n-1} = y_2 \\ &\;\;\vdots \\ p(t_n) &= x_1 + x_2 t_n + \cdots + x_n t_n^{n-1} = y_n \end{aligned}$$

This gives $n$ linear equations in $n$ variables; in matrix form $Ax = b$, where

$$A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-1} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
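The structure of this (Vandermonde) matrix is easy to build and check in code; a sketch (the function name `vandermonde` is our own) that verifies $Ax = b$ for a polynomial with known coefficients:

```python
def vandermonde(t):
    """Interpolation matrix A with rows [1, t_i, t_i^2, ..., t_i^{n-1}]."""
    n = len(t)
    return [[ti ** j for j in range(n)] for ti in t]

# known polynomial p(t) = 1 + 2t + 3t^2, i.e. coefficients x = [1, 2, 3]
t = [0.0, 1.0, 2.0]
x = [1.0, 2.0, 3.0]
A = vandermonde(t)
# b = A x should reproduce p(t_i) at each node
b = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(t))]
print(b)   # [1.0, 6.0, 17.0], i.e. p(0), p(1), p(2)
```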
Electrical circuits
Consider the electrical circuit (see page 14). Suppose all the voltage
and currents are constant. We need to determine the four nodal
voltages x1 , x2 , x3 , x4 .
◮ Kirchhoff’s current law: the sum of the currents away from a node must be zero ($\sum I_{in} = \sum I_{out}$ at any node).
◮ Ohm’s law: $V = IR$; the voltage drop $V$ (volts) equals the current $I$ (amperes) times the resistance $R$ (ohms).
Write out the conditions on $x$:
$I_{3,4}$: the current from node 3 to node 4 through the 5 ohm resistor; by Ohm’s law, $x_3 - x_4 = 5 I_{3,4}$, or $I_{3,4} = \frac{1}{5}(x_3 - x_4)$
$I_{1,3}$: the current from node 1 to node 3 through the 1 ohm resistor; by Ohm’s law, $x_1 - x_3 = I_{1,3}$
$I_{5,3}$: the current from node 5 to node 3 through the 2 ohm resistor; by Ohm’s law, $x_5 - x_3 = 2 I_{5,3}$
By Kirchhoff’s current law, $I_{5,3} + I_{1,3} = I_{3,4}$, i.e. $\frac{1}{2}(x_5 - x_3) + (x_1 - x_3) = \frac{1}{5}(x_3 - x_4)$, with $x_5 = 6$ V (this yields the third equation below)
Applying the same procedure to nodes 1, 2 and 4, we get the following system (4 linear equations in 4 unknowns):

$$\begin{aligned} 2x_1 - x_2 - x_3 &= 0 \\ -x_1 + 1.5x_2 - 0.5x_4 &= 0 \\ -x_1 + 1.7x_3 - 0.2x_4 &= 3 \\ -0.5x_2 - 0.2x_3 + 1.7x_4 &= 0 \end{aligned}$$

The above system can be written in matrix form $Ax = b$, with $b = [0, 0, 3, 0]^T$. The system has exactly one solution (the matrix $A$ is nonsingular).
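Such a small dense system can be solved directly; the sketch below (function name and implementation are our own, previewing the Gaussian elimination this course develops) solves the circuit system and checks that the computed nodal voltages satisfy all four equations:

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (sketch)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):                          # eliminate below pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                         # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

A = [[ 2.0, -1.0, -1.0,  0.0],
     [-1.0,  1.5,  0.0, -0.5],
     [-1.0,  0.0,  1.7, -0.2],
     [ 0.0, -0.5, -0.2,  1.7]]
b = [0.0, 0.0, 3.0, 0.0]
x = solve(A, b)
residual = max(abs(sum(A[i][j] * x[j] for j in range(4)) - b[i]) for i in range(4))
print(residual < 1e-12)   # True: the nodal voltages satisfy all four equations
```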
Nonsingularity and Uniqueness of Solutions
Recall that the $n \times n$ identity matrix is denoted by $I$ (it is the unique matrix such that $AI = IA = A$).
For a given matrix $A$, if there is a unique matrix $B$ such that $AB = BA = I$, then $B$ is called the inverse of $A$ and is denoted by $A^{-1}$.
Theorem
The following six conditions are equivalent; if any one holds, they all hold (and $A$ is called nonsingular):
◮ $A^{-1}$ exists
◮ There is no nonzero $x$ such that $Ax = 0$
◮ The columns of $A$ are linearly independent
◮ The rows of $A$ are linearly independent
◮ $\det(A) \neq 0$ (the determinant of $A$ is nonzero)
◮ Given any vector $b$, there is exactly one vector $x$ such that $Ax = b$
Numerical Solution of Differential Equations
Differential equations are widely used to model natural phenomena, engineering systems and many other situations. Solving differential equations is a fundamental problem in science and engineering.
Consider the differential equation

$$-u''(x) + c\,u'(x) + d\,u(x) = f(x), \quad 0 < x < 1,$$

with boundary conditions

$$u(0) = 0, \quad u(1) = 0.$$

The problem is to solve for the function $u$, given the constants $c$ and $d$ and the function $f$.
◮ The finite difference method obtains an approximate solution for $u(x)$ at a finite set of $x$ values.
◮ Here we take the discrete $x$ uniformly spaced in the interval $0 \le x \le 1$:

$$x_j = jh, \quad j = 0, 1, \cdots, m$$

so there are $m + 1$ spatial nodes in total, including those on the boundary.
Given $m$, the spacing between the $x_j$ is $h = \frac{1}{m}$.
Numerical Solution of Differential Equations
◮ The finite difference method uses discrete approximations like

$$u'(x_i) \simeq \frac{u(x_i + h) - u(x_i - h)}{2h} = \frac{u(x_{i+1}) - u(x_{i-1})}{2h}$$

and

$$u''(x_i) \simeq \frac{u(x_i + h) - 2u(x_i) + u(x_i - h)}{h^2} = \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{h^2}$$

◮ Now, since the differential equation holds at each grid point, we have

$$-u''(x_i) + c\, u'(x_i) + d\, u(x_i) = f(x_i)$$

for $i = 1, 2, \cdots, m - 1$
Numerical Solution of Differential Equations
◮ Substituting these approximations for the derivatives into the differential equation, we obtain

$$-\frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{h^2} + c\, \frac{u(x_{i+1}) - u(x_{i-1})}{2h} + d\, u(x_i) \simeq f(x_i)$$

for $i = 1, 2, \cdots, m - 1$
◮ Next, we replace $u(x_i)$ by the symbol $u_i$, an approximation of $u(x_i)$, and $f(x_i)$ by $f_i$, to get

$$-\frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} + c\, \frac{u_{i+1} - u_{i-1}}{2h} + d\, u_i = f_i$$

◮ We obtain a linear system of $m - 1$ equations in the unknowns $u_0, u_1, \cdots, u_m$; but $u_0 = u(0) = 0$ and $u_m = u(x_m) = 0$, leaving $m - 1$ unknowns $u_1, u_2, \cdots, u_{m-1}$
Numerical Solution of Differential Equations
For $m = 6$ we get a system of 5 equations in 5 unknowns, which can be represented in matrix form as

$$\begin{bmatrix} a & b_1 & 0 & 0 & 0 \\ b_2 & a & b_1 & 0 & 0 \\ 0 & b_2 & a & b_1 & 0 \\ 0 & 0 & b_2 & a & b_1 \\ 0 & 0 & 0 & b_2 & a \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \\ f_5 \end{bmatrix}$$

where $a = 72 + d$, $b_1 = -36 + 3c$ and $b_2 = -36 - 3c$ (with $h = 1/6$, so $1/h^2 = 36$ and $1/(2h) = 3$).
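Assembling this tridiagonal system for general $m$, $c$, $d$ is mechanical; a sketch (the function name `assemble_fd` is our own):

```python
def assemble_fd(m, c, d):
    """Tridiagonal matrix for -u'' + c u' + d u = f on (0,1), h = 1/m (sketch)."""
    h = 1.0 / m
    a  = 2.0 / h**2 + d            # diagonal
    b1 = -1.0 / h**2 + c / (2*h)   # superdiagonal (coefficient of u_{i+1})
    b2 = -1.0 / h**2 - c / (2*h)   # subdiagonal  (coefficient of u_{i-1})
    n = m - 1                      # interior unknowns u_1, ..., u_{m-1}
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = a
        if i + 1 < n:
            A[i][i + 1] = b1
            A[i + 1][i] = b2
    return A

A = assemble_fd(6, c=1.0, d=2.0)
print(A[0][0], A[0][1], A[1][0])   # approximately 74, -33, -39 (a = 72+d, b1 = -36+3c, b2 = -36-3c)
```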
Triangular Systems
◮ A square matrix $A$ is lower triangular if $a_{i,j} = 0$ for $j > i$:

$$A_{n,n} = \begin{bmatrix} a_{1,1} & 0 & \cdots & 0 & 0 \\ a_{2,1} & a_{2,2} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n-1} & 0 \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n-1} & a_{n,n} \end{bmatrix}$$

◮ $A$ is upper triangular if $A^T$ is lower triangular (i.e., if $a_{i,j} = 0$ for $j < i$)
◮ A triangular matrix is nonsingular if and only if its diagonal elements are all nonzero ($a_{i,i} \neq 0$, $i = 1, 2, \cdots, n$).
◮ A triangular system is a system of the form $Ax = b$ with $A$ a triangular matrix.
Solving a lower triangular system
Example
Solve $Ax = b$ with $A$ lower triangular and nonsingular:

$$\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ -7 \\ -6 \end{bmatrix}$$

Using forward substitution:

$$\begin{aligned} x_1 &= -1 \\ x_2 &= -7 + 2x_1 = -9 \\ x_3 &= -6 + x_1 - x_2 = 2 \end{aligned}$$
Solving a lower triangular system
Solve $Ax = b$ with $A$ lower triangular and nonsingular:

$$\begin{bmatrix} a_{1,1} & 0 & \cdots & 0 \\ a_{2,1} & a_{2,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

Algorithm

$$\begin{aligned} x_1 &= b_1 / a_{1,1} \\ x_2 &= (b_2 - a_{2,1} x_1)/a_{2,2} \\ x_3 &= (b_3 - a_{3,1} x_1 - a_{3,2} x_2)/a_{3,3} \\ &\;\;\vdots \\ x_i &= (b_i - a_{i,1} x_1 - a_{i,2} x_2 - \cdots - a_{i,i-1} x_{i-1})/a_{i,i} = a_{i,i}^{-1}\Big(b_i - \sum_{j=1}^{i-1} a_{i,j} x_j\Big) \\ &\;\;\vdots \\ x_n &= (b_n - a_{n,1} x_1 - a_{n,2} x_2 - \cdots - a_{n,n-1} x_{n-1})/a_{n,n} \end{aligned}$$
Algorithm
It is usual for a computer program to store $x$ over $b$, since once we have calculated $x_i$ we no longer need $b_i$ ($b_1$ appears only in the formula for $x_1$, $b_2$ only in the formula for $x_2$, and so on).
for i = 1 → n do
for j = 1 → i − 1 do
bi ← bi − ai,j bj
end for
if ai,i = 0 then
set error, exit
end if
bi ← bi /ai,i
end for
Cost

$$\sum_{i=1}^{n} \sum_{j=1}^{i-1} 2 = \sum_{i=1}^{n} 2(i - 1) = n(n - 1) = n^2 - n$$
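The in-place pseudocode maps directly to Python; a sketch (the function name `forward_substitution` is our own) that overwrites $b$ with $x$, checked on the earlier $3 \times 3$ example:

```python
def forward_substitution(A, b):
    """Solve lower-triangular Ax = b, overwriting b with x (in-place sketch)."""
    n = len(b)
    for i in range(n):
        for j in range(i):
            b[i] -= A[i][j] * b[j]   # b_j already holds x_j
        if A[i][i] == 0:
            raise ZeroDivisionError("triangular matrix is singular")
        b[i] /= A[i][i]
    return b

# the 3x3 example with solution x = (-1, -9, 2)
A = [[1.0, 0.0, 0.0], [-2.0, 1.0, 0.0], [-1.0, 1.0, 1.0]]
b = [-1.0, -7.0, -6.0]
print(forward_substitution(A, b))   # [-1.0, -9.0, 2.0]
```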
Algorithm
Cost
◮ The operations performed outside the $j$ loop: $a_{i,i}$ is compared with zero $n$ times, and there are $n$ divisions ($b_i / a_{i,i}$). The total is proportional to $n$.
◮ Terms proportional to $n$ are insignificant compared to $n^2$ when $n$ is large.
◮ Ignoring these lesser costs, the cost of forward substitution is $n^2$ flops ($n^2 - n \simeq n^2$).
Algorithm
Cost
◮ With the operation count growing like $O(n^2)$, we can write

$$T_n = k n^2$$

for some unknown constant $k$.
◮ To determine how long solving $A_{2n,2n}\, x = b$ will take, we write

$$T_{2n} = k(2n)^2 = 4kn^2 = 4T_n$$

◮ Doubling the size of the matrix is expected to increase the computation time by a factor of $2^2 = 4$.
Solving an upper triangular system
Solve $Ax = b$ with $A$ upper triangular and nonsingular:

$$\begin{bmatrix} a_{1,1} & \cdots & a_{1,n-1} & a_{1,n} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ 0 & \cdots & 0 & a_{n,n} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{bmatrix}$$

Algorithm (back substitution)

$$\begin{aligned} x_n &= b_n / a_{n,n} \\ x_{n-1} &= (b_{n-1} - a_{n-1,n} x_n)/a_{n-1,n-1} \\ x_{n-2} &= (b_{n-2} - a_{n-2,n-1} x_{n-1} - a_{n-2,n} x_n)/a_{n-2,n-2} \\ &\;\;\vdots \\ x_1 &= (b_1 - a_{1,2} x_2 - a_{1,3} x_3 - \cdots - a_{1,n} x_n)/a_{1,1} \end{aligned}$$

Cost: $n^2$ flops
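Back substitution mirrors forward substitution but runs from the last row up; a sketch (the function name `back_substitution` is our own):

```python
def back_substitution(A, b):
    """Solve upper-triangular Ax = b by back substitution (sketch), n^2 flops."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # last equation first
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

A = [[2.0, 1.0, -1.0],
     [0.0, 3.0,  2.0],
     [0.0, 0.0,  4.0]]
b = [1.0, 7.0, 8.0]
print(back_substitution(A, b))   # [1.0, 1.0, 2.0]
```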
Positive Definite Systems; Cholesky Decomposition
Cholesky decomposition is widely utilized in various fields. For example, the Relevance Vector Machine (RVM) algorithm (M. E. Tipping, "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research, vol. 1, pp. 211-244, 2001), applied in areas such as machine learning and image pattern recognition, relies on intensive matrix operations: matrix-matrix multiplication, Cholesky decomposition, and matrix inversion.
Positive Definite Matrix
◮ $A$ is a positive definite matrix if $A$ is symmetric and $x^T A x > 0$ for all $x \neq 0$
◮ If $A$ is symmetric of order $n$, then

$$x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j}\, x_i x_j = \sum_{i=1}^{n} a_{i,i}\, x_i^2 + 2 \sum_{i > j} a_{i,j}\, x_i x_j$$
Positive Definite Systems; Cholesky Decomposition
Examples
◮

$$A = \begin{bmatrix} 9 & 6 \\ 6 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 9 & 6 \\ 6 & 3 \end{bmatrix}$$

◮ $A$ is positive definite:

$$x^T A x = 9x_1^2 + 12x_1 x_2 + 5x_2^2 = (3x_1 + 2x_2)^2 + x_2^2$$

◮ $B$ is not positive definite:

$$x^T B x = 9x_1^2 + 12x_1 x_2 + 3x_2^2 = (3x_1 + 2x_2)^2 - x_2^2$$
Positive Definite Systems; Cholesky Decomposition
Theorem
Let M be any n × n nonsingular matrix, and let A = M T M. Then A is
positive definite.
Proof
◮ $A^T = (M^T M)^T = M^T (M^T)^T = M^T M = A$, so $A$ is symmetric
◮ We must show $x^T A x > 0$ for all $x \neq 0$:
$x^T A x = x^T (M^T M) x = (Mx)^T (Mx)$
Let $y = Mx$; then $x^T A x = y^T y = \sum_{i=1}^{n} y_i^2 > 0$ for all $y \neq 0$.
◮ But $y = Mx \neq 0$, since $M$ is nonsingular and $x \neq 0$.
◮ Hence $x^T A x > 0$ for all nonzero $x$.
Theorem
If A is positive definite, then A is nonsingular.
Proof
See p. 34
Cholesky Decomposition
Theorem
If $A$ is positive definite, then $A$ can be decomposed in exactly one way into a product

$$A = R^T R$$

where $R$ is upper triangular with $r_{i,i} > 0$ (all main-diagonal entries are positive).
Proof
See p. 34
◮ The Cholesky decomposition is useful because $R$ and $R^T$ are triangular.
◮ To solve $Ax = b$ with $A = R^T R$:
◮ Let $y = Rx$.
◮ Then $R^T y = b$; since $R^T$ is lower triangular, we can solve for $y$ by forward substitution.
◮ Now from $Rx = y$, with $R$ upper triangular, we can solve for $x$ by back substitution.
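This factor-then-substitute pipeline can be sketched end to end in plain Python (function names are our own, and the factorization loop anticipates the algorithm derived next), using the positive definite example $A = \begin{bmatrix} 9 & 6 \\ 6 & 5 \end{bmatrix}$ from above:

```python
import math

def cholesky(A):
    """Return upper-triangular R with A = R^T R (A symmetric positive definite)."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = A[i][i] - sum(R[k][i] ** 2 for k in range(i))
        R[i][i] = math.sqrt(s)   # fails (s <= 0) if A is not positive definite
        for j in range(i + 1, n):
            R[i][j] = (A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

def solve_spd(A, b):
    """Solve Ax = b via A = R^T R: forward substitution, then back substitution."""
    R = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                   # R^T y = b  (R^T is lower triangular)
        y[i] = (b[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # R x = y    (R is upper triangular)
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

A = [[9.0, 6.0], [6.0, 5.0]]             # the positive definite example above
b = [21.0, 16.0]
print(solve_spd(A, b))   # [1.0, 2.0]: check 9+12 = 21, 6+10 = 16
```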
Cholesky Decomposition
Algorithm
Write out the decomposition $A = R^T R$:

$$\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\ a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & a_{n,3} & \cdots & a_{n,n} \end{bmatrix} = \begin{bmatrix} r_{1,1} & 0 & 0 & \cdots & 0 \\ r_{1,2} & r_{2,2} & 0 & \cdots & 0 \\ r_{1,3} & r_{2,3} & r_{3,3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{1,n} & r_{2,n} & r_{3,n} & \cdots & r_{n,n} \end{bmatrix} \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & \cdots & r_{1,n} \\ 0 & r_{2,2} & r_{2,3} & \cdots & r_{2,n} \\ 0 & 0 & r_{3,3} & \cdots & r_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & r_{n,n} \end{bmatrix}$$