Lecture 2 Basics of Matrix Algebra
Shaojian Chen
September 23, 2021
1 Basic Definitions
Definition 1.1 (Matrix). A matrix is a rectangular array of numbers. More precisely, an $m \times n$ matrix has m rows and n columns. The positive integer m is called the row dimension, and n is called the column dimension.
We use uppercase boldface letters to denote matrices. We can write an $m \times n$ matrix generically as
$$\mathbf{A} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}$$
where $a_{ij}$ represents the element in the ith row and the jth column. For example, $a_{25}$ stands for the number in the second row and the fifth column of A. A specific example of a $2 \times 3$ matrix is
$$\mathbf{A} = \begin{bmatrix} 2 & -1 & 7 \\ -4 & 5 & 0 \end{bmatrix} \tag{A.1}$$
where $a_{13} = 7$. The shorthand $\mathbf{A} = [a_{ij}]$ is often used to define matrix operations.
Definition 1.2 (Square Matrix). A square matrix has the same number of rows and
columns. The dimension of a square matrix is its number of rows and columns.
Definition 1.3 (Vectors).
(i) A $1 \times m$ matrix is called a row vector (of dimension m) and can be written as $\mathbf{x} \equiv (x_1, x_2, \ldots, x_m)$.
(ii) An $n \times 1$ matrix is called a column vector and can be written as
$$\mathbf{x} \equiv \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
Definition 1.4 (Diagonal Matrix). A square matrix A is a diagonal matrix when all of its off-diagonal elements are zero, that is, $a_{ij} = 0$ for all $i \neq j$. We can always write a diagonal matrix as
$$\mathbf{A} = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}$$
Definition 1.5 (Identity and Zero Matrices).
(i) The $n \times n$ identity matrix, denoted I, or sometimes $\mathbf{I}_n$ to emphasize its dimension, is the diagonal matrix with unity (one) in each diagonal position and zero elsewhere:
$$\mathbf{I} \equiv \mathbf{I}_n \equiv \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
(ii) The $m \times n$ zero matrix, denoted 0, is the $m \times n$ matrix with zero for all entries. This need not be a square matrix.
2 Matrix Operations
2.1 Matrix Addition
Two matrices A and B, each having dimension $m \times n$, can be added element by element: $\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}]$. More precisely,
$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix}$$
For example,
$$\begin{bmatrix} 2 & -1 & 7 \\ -4 & 5 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & -4 \\ 4 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 3 & -1 & 3 \\ 0 & 7 & 3 \end{bmatrix}$$
Matrices of different dimensions cannot be added.
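The addition rule is easy to verify numerically; a minimal NumPy sketch using the matrices from the example above:

```python
import numpy as np

# The two 2x3 matrices from the example above
A = np.array([[2, -1, 7],
              [-4, 5, 0]])
B = np.array([[1, 0, -4],
              [4, 2, 3]])

# Addition is element by element: (A + B)[i, j] = a_ij + b_ij
S = A + B
print(S)  # [[ 3 -1  3]
          #  [ 0  7  3]]
```

NumPy likewise refuses to add arrays whose dimensions do not match (except via broadcasting, a NumPy convenience rather than a matrix-algebra rule).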
2.2 Scalar Multiplication
Given any real number $\gamma$ (often called a scalar), scalar multiplication is defined as $\gamma\mathbf{A} \equiv [\gamma a_{ij}]$, or
$$\gamma\mathbf{A} = \begin{bmatrix} \gamma a_{11} & \gamma a_{12} & \cdots & \gamma a_{1n} \\ \gamma a_{21} & \gamma a_{22} & \cdots & \gamma a_{2n} \\ \vdots & \vdots & & \vdots \\ \gamma a_{m1} & \gamma a_{m2} & \cdots & \gamma a_{mn} \end{bmatrix}$$
For example, if $\gamma = 2$ and A is the matrix in equation (A.1), then
$$\gamma\mathbf{A} = \begin{bmatrix} 4 & -2 & 14 \\ -8 & 10 & 0 \end{bmatrix}$$
2.3 Matrix Multiplication
To multiply matrix A by matrix B to form the product AB, the column dimension of A must equal the row dimension of B. Therefore, let A be an $m \times n$ matrix and let B be an $n \times p$ matrix. Then, matrix multiplication is defined as
$$\mathbf{AB} = \left[ \sum_{k=1}^{n} a_{ik} b_{kj} \right]$$
In other words, the $(i, j)$th element of the new matrix AB is obtained by multiplying each element in the ith row of A by the corresponding element in the jth column of B and adding these n products together.
For example,
$$\begin{bmatrix} 2 & -1 & 0 \\ -4 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 6 & 0 \\ -1 & 2 & 0 & 1 \\ 3 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 12 & -1 \\ -1 & -2 & -24 & 1 \end{bmatrix}$$
We can also multiply a matrix and a vector. If A is an $n \times m$ matrix and y is an $m \times 1$ vector, then Ay is an $n \times 1$ vector. If x is a $1 \times n$ vector, then xA is a $1 \times m$ vector.
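The entry-by-entry definition and the shape rules can be checked directly; a minimal NumPy sketch with small matrices:

```python
import numpy as np

A = np.array([[2, -1, 0],
              [-4, 1, 0]])        # 2x3
B = np.array([[0, 1, 6, 0],
              [-1, 2, 0, 1],
              [3, 0, 0, 0]])      # 3x4

AB = A @ B                        # 2x4 product

# The (i, j) entry is the sum over k of a_ik * b_kj; check entry (0, 2)
assert AB[0, 2] == sum(A[0, k] * B[k, 2] for k in range(3))

# A matrix times a column vector gives a column vector
y = np.array([[1], [0], [2]])     # 3x1
print(A @ y)                      # a 2x1 vector
```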
Matrix addition, scalar multiplication, and matrix multiplication can be combined in
various ways, and these operations satisfy several rules that are familiar from basic
operations on numbers. In the following list of properties, A, B, and C are matrices with
appropriate dimensions for applying each operation, and $\alpha$ and $\beta$ are real numbers. Most of these properties are easy to illustrate from the definitions.
Properties of Matrix Operations. (1) $(\alpha + \beta)\mathbf{A} = \alpha\mathbf{A} + \beta\mathbf{A}$; (2) $\alpha(\mathbf{A} + \mathbf{B}) = \alpha\mathbf{A} + \alpha\mathbf{B}$; (3) $(\alpha\beta)\mathbf{A} = \alpha(\beta\mathbf{A})$; (4) $\alpha(\mathbf{AB}) = (\alpha\mathbf{A})\mathbf{B}$; (5) $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$; (6) $(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC})$; (7) $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$; (8) $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$; (9) $(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}$; (10) $\mathbf{IA} = \mathbf{AI} = \mathbf{A}$; (11) $\mathbf{A} + \mathbf{0} = \mathbf{0} + \mathbf{A} = \mathbf{A}$; (12) $\mathbf{A} - \mathbf{A} = \mathbf{0}$; (13) $\mathbf{A0} = \mathbf{0A} = \mathbf{0}$; and (14) $\mathbf{AB} \neq \mathbf{BA}$, even when both products are defined.
The last property deserves further comment. If A is $n \times m$ and B is $m \times p$, then AB is defined, but BA is defined only if $n = p$ (the row dimension of A equals the column dimension of B). If A is $m \times n$ and B is $n \times m$, then AB and BA are both defined, but they are not usually the same; in fact, they have different dimensions, unless A and B are both square matrices. Even when A and B are both square, $\mathbf{AB} \neq \mathbf{BA}$, except under special circumstances.
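Noncommutativity is worth seeing concretely; a minimal NumPy sketch with two square matrices chosen for illustration:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# Both products are defined (both matrices are 2x2), yet they differ:
print(A @ B)  # [[2 1]
              #  [4 3]]
print(B @ A)  # [[3 4]
              #  [1 2]]
assert not np.array_equal(A @ B, B @ A)
```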
2.4 Transpose
Definition 2.1 (Transpose). Let $\mathbf{A} = [a_{ij}]$ be an $m \times n$ matrix. The transpose of A, denoted $\mathbf{A}'$ (called A prime), is the $n \times m$ matrix obtained by interchanging the rows and columns of A. We can write this as $\mathbf{A}' \equiv [a_{ji}]$.
For example,
$$\mathbf{A} = \begin{bmatrix} 2 & -1 & 7 \\ -4 & 5 & 0 \end{bmatrix}, \quad \mathbf{A}' = \begin{bmatrix} 2 & -4 \\ -1 & 5 \\ 7 & 0 \end{bmatrix}$$
Properties of Transpose. (1) $(\mathbf{A}')' = \mathbf{A}$; (2) $(\alpha\mathbf{A})' = \alpha\mathbf{A}'$ for any scalar $\alpha$; (3) $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$; (4) $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$, where A is $m \times n$ and B is $n \times k$; (5) $\mathbf{x}'\mathbf{x} = \sum_{i=1}^{n} x_i^2$, where x is an $n \times 1$ vector; and (6) if A is an $n \times k$ matrix with rows given by the $1 \times k$ vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$, so that we can write
$$\mathbf{A} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix}$$
then $\mathbf{A}' = (\mathbf{a}_1', \mathbf{a}_2', \ldots, \mathbf{a}_n')$.
Definition 2.2 (Symmetric Matrix). A square matrix A is a symmetric matrix if, and only if, $\mathbf{A} = \mathbf{A}'$.
If X is any $n \times k$ matrix, then $\mathbf{X}'\mathbf{X}$ is always defined and is a symmetric matrix, as can be seen by applying the first and fourth transpose properties.
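Transpose property (4) and the symmetry of X'X can be confirmed numerically; a minimal NumPy sketch with randomly generated matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))

# Property 4: (AB)' = B'A'
assert np.allclose((A @ B).T, B.T @ A.T)

# X'X is defined for any n x k matrix X and is symmetric,
# since (X'X)' = X'(X')' = X'X by properties (4) and (1)
X = rng.normal(size=(5, 2))
XtX = X.T @ X
assert np.allclose(XtX, XtX.T)
```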
2.5 Partitioned Matrix Multiplication
Let A be an $n \times k$ matrix with rows given by the $1 \times k$ vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$, and let B be an $n \times m$ matrix with rows given by the $1 \times m$ vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$:
$$\mathbf{A} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_n \end{bmatrix}$$
Then,
$$\mathbf{A}'\mathbf{B} = \sum_{i=1}^{n} \mathbf{a}_i'\mathbf{b}_i$$
where for each i, $\mathbf{a}_i'\mathbf{b}_i$ is a $k \times m$ matrix. Therefore, $\mathbf{A}'\mathbf{B}$ can be written as the sum of n matrices, each of which is $k \times m$. As a special case, we have
$$\mathbf{A}'\mathbf{A} = \sum_{i=1}^{n} \mathbf{a}_i'\mathbf{a}_i$$
where $\mathbf{a}_i'\mathbf{a}_i$ is a $k \times k$ matrix for all i.
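The row-by-row decomposition can be verified directly; a minimal NumPy sketch (dimensions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, m = 6, 3, 2
A = rng.normal(size=(n, k))   # rows a_1, ..., a_n, each 1 x k
B = rng.normal(size=(n, m))   # rows b_1, ..., b_n, each 1 x m

# A'B equals the sum of the n outer products a_i' b_i, each k x m
outer_sum = sum(np.outer(A[i], B[i]) for i in range(n))
assert np.allclose(outer_sum, A.T @ B)

# Special case: A'A = sum of a_i' a_i, each k x k
assert np.allclose(sum(np.outer(A[i], A[i]) for i in range(n)), A.T @ A)
```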
A more general form of partitioned matrix multiplication holds when we have matrices A ($m \times n$) and B ($n \times p$) written as
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix}$$
where $\mathbf{A}_{11}$ is $m_1 \times n_1$, $\mathbf{A}_{12}$ is $m_1 \times n_2$, $\mathbf{A}_{21}$ is $m_2 \times n_1$, $\mathbf{A}_{22}$ is $m_2 \times n_2$, $\mathbf{B}_{11}$ is $n_1 \times p_1$, $\mathbf{B}_{12}$ is $n_1 \times p_2$, $\mathbf{B}_{21}$ is $n_2 \times p_1$, and $\mathbf{B}_{22}$ is $n_2 \times p_2$. Naturally, $m_1 + m_2 = m$, $n_1 + n_2 = n$, and $p_1 + p_2 = p$.
When we form the product AB, the expression looks just as it does when the entries are scalars:
$$\mathbf{AB} = \begin{bmatrix} \mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \\ \mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22} \end{bmatrix}$$
Note that each of the matrix multiplications that form the partition on the right is well
defined because the column and row dimensions are compatible for multiplication.
2.6 Trace
The trace of a matrix is a very simple operation defined only for square matrices.
Definition 2.3 (Trace). For any $n \times n$ matrix A, the trace of A, denoted $\operatorname{tr}(\mathbf{A})$, is the sum of its diagonal elements. Mathematically,
$$\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}$$
Properties of Trace. (1) $\operatorname{tr}(\mathbf{I}_n) = n$; (2) $\operatorname{tr}(\mathbf{A} + \mathbf{B}) = \operatorname{tr}(\mathbf{A}) + \operatorname{tr}(\mathbf{B})$; (3) $\operatorname{tr}(\mathbf{A}') = \operatorname{tr}(\mathbf{A})$; (4) $\operatorname{tr}(\alpha\mathbf{A}) = \alpha\operatorname{tr}(\mathbf{A})$, for any scalar $\alpha$; and (5) $\operatorname{tr}(\mathbf{AB}) = \operatorname{tr}(\mathbf{BA})$, where A is $m \times n$ and B is $n \times m$.
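Property (5) is the least obvious of these and is easy to confirm numerically; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
B = rng.normal(size=(5, 3))

# tr(AB) = tr(BA), even though AB is 3x3 while BA is 5x5
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# tr(I_n) = n
assert np.trace(np.eye(4)) == 4.0
```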
2.7 Inverse
The notion of a matrix inverse is very important for square matrices.
Definition 2.4 (Inverse). An $n \times n$ matrix A has an inverse, denoted $\mathbf{A}^{-1}$, provided that $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_n$ and $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_n$. In this case, A is said to be invertible or nonsingular. Otherwise, it is said to be noninvertible or singular.
Properties of Inverse. (1) If an inverse exists, it is unique; (2) $(\alpha\mathbf{A})^{-1} = (1/\alpha)\mathbf{A}^{-1}$, if $\alpha \neq 0$ and A is invertible; (3) $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$, if A and B are both $n \times n$ and invertible; and (4) $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$.
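These properties can be checked with NumPy's linear-algebra routines; a minimal sketch (random Gaussian matrices are invertible with probability one, and `np.linalg.inv` raises an error on a singular input):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, np.eye(4)) and np.allclose(A @ A_inv, np.eye(4))

# Property 3: (AB)^{-1} = B^{-1} A^{-1} (note the reversed order)
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv)

# Property 4: (A')^{-1} = (A^{-1})'
assert np.allclose(np.linalg.inv(A.T), A_inv.T)
```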
3 Linear Independence and Rank of a Matrix
Definition 3.1 (Linear Independence). Let $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r\}$ be a set of $n \times 1$ vectors. These are linearly independent vectors if, and only if,
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \cdots + \alpha_r\mathbf{x}_r = \mathbf{0} \tag{A.2}$$
implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$. If (A.2) holds for a set of scalars that are not all zero, then $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r\}$ is linearly dependent.
The statement that {x1 , x 2 , , x r } is linearly dependent is equivalent to saying that at
least one vector in this set can be written as a linear combination of the others.
Definition 3.2 (Rank).
(i) Let A be an $n \times m$ matrix. The rank of A, denoted $\operatorname{rank}(\mathbf{A})$, is the maximum number of linearly independent columns of A.
(ii) If A is $n \times m$ and $\operatorname{rank}(\mathbf{A}) = m$, then A has full column rank.
If A is $n \times m$, its rank can be at most m. A matrix has full column rank if its columns form a linearly independent set. For example, the $3 \times 2$ matrix
$$\begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 0 & 0 \end{bmatrix}$$
can have at most rank two. In fact, its rank is only one because the second column is three times the first column.
Properties of Rank. (1) $\operatorname{rank}(\mathbf{A}') = \operatorname{rank}(\mathbf{A})$; (2) if A is $n \times k$, then $\operatorname{rank}(\mathbf{A}) \leq \min(n, k)$; and (3) if A is $k \times k$ and $\operatorname{rank}(\mathbf{A}) = k$, then A is invertible.
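The rank of the $3 \times 2$ example above can be computed directly; a minimal NumPy sketch:

```python
import numpy as np

# Second column is three times the first, so the columns are dependent
A = np.array([[1, 3],
              [2, 6],
              [0, 0]])

assert np.linalg.matrix_rank(A) == 1        # not full column rank
assert np.linalg.matrix_rank(A.T) == 1      # Property 1: rank(A') = rank(A)

# A full-column-rank example for contrast
B = np.array([[1, 0],
              [0, 1],
              [0, 0]])
assert np.linalg.matrix_rank(B) == 2
```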
4 Quadratic Forms and Positive Definite Matrices
Definition 4.1 (Quadratic Form). Let A be an $n \times n$ symmetric matrix. The quadratic form associated with the matrix A is the real-valued function defined for all $n \times 1$ vectors x:
$$f(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^{n} a_{ii} x_i^2 + 2\sum_{i=1}^{n}\sum_{j>i} a_{ij} x_i x_j$$
Definition 4.2 (Positive Definite and Positive Semi-Definite).
A symmetric matrix A is said to be positive definite (p.d.) if
$$\mathbf{x}'\mathbf{A}\mathbf{x} > 0 \text{ for all } n \times 1 \text{ vectors } \mathbf{x} \text{ except } \mathbf{x} = \mathbf{0}.$$
A symmetric matrix A is positive semi-definite (p.s.d.) if
$$\mathbf{x}'\mathbf{A}\mathbf{x} \geq 0 \text{ for all } n \times 1 \text{ vectors } \mathbf{x}.$$
When a matrix is said to be positive definite or positive semi-definite, it is automatically assumed to be symmetric.
Properties of Positive Definite and Positive Semi-Definite Matrices. (1) A p.d. matrix has diagonal elements that are strictly positive, while a p.s.d. matrix has nonnegative diagonal elements; (2) if A is p.d., then $\mathbf{A}^{-1}$ exists and is p.d.; (3) if X is $n \times k$, then $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}\mathbf{X}'$ are p.s.d.; and (4) if X is $n \times k$ and $\operatorname{rank}(\mathbf{X}) = k$, then $\mathbf{X}'\mathbf{X}$ is p.d. (and therefore nonsingular).
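Properties (3) and (4) can be illustrated numerically; a minimal NumPy sketch (a random Gaussian X has full column rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(10, 3))

XtX = X.T @ X
# For a symmetric matrix, positive definiteness is equivalent to all
# eigenvalues being strictly positive
eigvals = np.linalg.eigvalsh(XtX)
assert np.all(eigvals > 0)

# The quadratic form x'(X'X)x = (Xx)'(Xx) is a sum of squares, hence >= 0
x = rng.normal(size=3)
assert x @ XtX @ x >= 0
```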
5 Idempotent Matrices
Definition 5.1 (Idempotent Matrix). Let A be an $n \times n$ symmetric matrix. Then A is said to be an idempotent matrix if, and only if, $\mathbf{AA} = \mathbf{A}$.
For example,
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
is an idempotent matrix, as direct multiplication verifies.
Properties of Idempotent Matrices. Let A be an $n \times n$ idempotent matrix. (1) $\operatorname{rank}(\mathbf{A}) = \operatorname{tr}(\mathbf{A})$; (2) A is positive semi-definite; and (3) its characteristic roots (eigenvalues) are either zero or one.
We can construct idempotent matrices very generally. Let X be an $n \times k$ matrix with $\operatorname{rank}(\mathbf{X}) = k$. Define
$$\mathbf{P} \equiv \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$
$$\mathbf{M} \equiv \mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{I}_n - \mathbf{P}$$
Then P and M are symmetric, idempotent matrices with $\operatorname{rank}(\mathbf{P}) = k$ and $\operatorname{rank}(\mathbf{M}) = n - k$. The ranks are most easily obtained by using Property 1: $\operatorname{tr}(\mathbf{P}) = \operatorname{tr}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}]$ (from Property 5 for trace) $= \operatorname{tr}(\mathbf{I}_k) = k$ (by Property 1 for trace). It easily follows that $\operatorname{tr}(\mathbf{M}) = \operatorname{tr}(\mathbf{I}_n) - \operatorname{tr}(\mathbf{P}) = n - k$.
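The projection matrices P and M above can be built and checked directly; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
X = rng.normal(size=(n, k))          # rank k with probability one

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P

# Symmetric and idempotent
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.allclose(M, M.T) and np.allclose(M @ M, M)

# Property 1: rank equals trace, so rank(P) = k and rank(M) = n - k
assert np.linalg.matrix_rank(P) == round(np.trace(P)) == k
assert np.linalg.matrix_rank(M) == round(np.trace(M)) == n - k
```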
6 Differentiation of Linear and Quadratic Forms
Let $g(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x}$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ and $\mathbf{a}$ are $n \times 1$ vectors and A is an $n \times n$ symmetric matrix. Then:
(1) $\partial \mathbf{a}'\mathbf{x}/\partial \mathbf{x} = \partial \mathbf{x}'\mathbf{a}/\partial \mathbf{x} = \mathbf{a}$
(2) $\partial g(\mathbf{x})/\partial \mathbf{x} = \begin{bmatrix} \partial g(\mathbf{x})/\partial x_1 \\ \vdots \\ \partial g(\mathbf{x})/\partial x_n \end{bmatrix}$
(3) $\partial g(\mathbf{x})/\partial \mathbf{x}' = \left[ \partial g(\mathbf{x})/\partial x_1 \;\; \cdots \;\; \partial g(\mathbf{x})/\partial x_n \right]$
(4) $\partial(\mathbf{A}\mathbf{x})/\partial \mathbf{x}' = \mathbf{A}$
(5) $\partial(\mathbf{x}'\mathbf{A}\mathbf{x})/\partial \mathbf{x} = (\mathbf{A} + \mathbf{A}')\mathbf{x}$
(6) $\partial^2(\mathbf{x}'\mathbf{A}\mathbf{x})/\partial \mathbf{x}\,\partial \mathbf{x}' = \mathbf{A} + \mathbf{A}'$
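Rule (5) can be verified against a finite-difference approximation of the gradient; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n))
A = A + A.T                       # symmetrize, as in the setup above
x = rng.normal(size=n)

g = lambda v: v @ A @ v           # the quadratic form g(x) = x'Ax

# Rule (5): gradient is (A + A')x, which equals 2Ax for symmetric A
analytic = (A + A.T) @ x

# Central finite differences, one coordinate at a time
h = 1e-6
numeric = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h)
                    for e in np.eye(n)])
assert np.allclose(numeric, analytic, atol=1e-4)
```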
7 Moments and Distributions of Random Vectors
In order to derive the expected value and variance of the OLS estimators using matrices,
we need to define the expected value and variance of a random vector. As its name
suggests, a random vector is simply a vector of random variables. We also need to define
the multivariate normal distribution.
7.1 Expected Value
Definition 7.1 (Expected Value).
(i) If x is an $n \times 1$ random vector, the expected value of x, denoted $E(\mathbf{x})$, is the vector of expected values: $E(\mathbf{x}) = [E(x_1), E(x_2), \ldots, E(x_n)]$.
(ii) If Z is an $n \times m$ random matrix, $E(\mathbf{Z})$ is the $n \times m$ matrix of expected values: $E(\mathbf{Z}) = [E(z_{ij})]$.
Properties of Expected Value. (1) If A is an $m \times n$ matrix and b is an $m \times 1$ vector, where both are nonrandom, then $E(\mathbf{Ax} + \mathbf{b}) = \mathbf{A}E(\mathbf{x}) + \mathbf{b}$; (2) if A is $p \times n$ and B is $m \times k$, where both are nonrandom, then $E(\mathbf{AZB}) = \mathbf{A}E(\mathbf{Z})\mathbf{B}$.
7.2 Variance-Covariance Matrix
Definition 7.2 (Variance-Covariance Matrix). If x is an $n \times 1$ random vector, its variance-covariance matrix, denoted $\operatorname{Var}(\mathbf{x})$, is defined as
$$\operatorname{Var}(\mathbf{x}) = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix}$$
where $\sigma_j^2 = \operatorname{Var}(x_j)$ and $\sigma_{ij} = \operatorname{Cov}(x_i, x_j)$. In other words, the variance-covariance matrix has the variances of each element of x down its diagonal, with covariance terms in the off diagonals. Because $\operatorname{Cov}(x_i, x_j) = \operatorname{Cov}(x_j, x_i)$, it immediately follows that a variance-covariance matrix is symmetric.
Properties of Variance. (1) If a is an $n \times 1$ nonrandom vector, then $\operatorname{Var}(\mathbf{a}'\mathbf{x}) = \mathbf{a}'[\operatorname{Var}(\mathbf{x})]\mathbf{a} \geq 0$; (2) if $\operatorname{Var}(\mathbf{a}'\mathbf{x}) > 0$ for all $\mathbf{a} \neq \mathbf{0}$, then $\operatorname{Var}(\mathbf{x})$ is positive definite; (3) $\operatorname{Var}(\mathbf{x}) = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})']$, where $\boldsymbol{\mu} = E(\mathbf{x})$; (4) if the elements of x are uncorrelated, $\operatorname{Var}(\mathbf{x})$ is a diagonal matrix; if, in addition, $\operatorname{Var}(x_j) = \sigma^2$ for $j = 1, 2, \ldots, n$, then $\operatorname{Var}(\mathbf{x}) = \sigma^2\mathbf{I}_n$; and (5) if A is an $m \times n$ nonrandom matrix and b is an $m \times 1$ nonrandom vector, then $\operatorname{Var}(\mathbf{Ax} + \mathbf{b}) = \mathbf{A}[\operatorname{Var}(\mathbf{x})]\mathbf{A}'$.
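Properties (1) and (5) reduce to matrix arithmetic once Var(x) is known; a minimal NumPy sketch with a hypothetical 2x2 variance-covariance matrix:

```python
import numpy as np

# A hypothetical variance-covariance matrix (symmetric, positive definite)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, -1.0],
              [0.0, 2.0]])

# Property 5: Var(Ax + b) = A Var(x) A' (the nonrandom b drops out)
var_Ax = A @ Sigma @ A.T
assert np.allclose(var_Ax, var_Ax.T)  # still symmetric, as any variance matrix

# Property 1: Var(a'x) = a' Var(x) a >= 0 for any nonrandom a
a = np.array([1.0, -2.0])
print(a @ Sigma @ a)  # 4.0, a nonnegative scalar variance
```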
7.3 Multivariate Normal Distribution
If x is an $n \times 1$ multivariate normal random vector with mean $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$, we write $\mathbf{x} \sim \text{Normal}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. We now state several useful properties of the multivariate normal distribution.
Properties of the Multivariate Normal Distribution. (1) If $\mathbf{x} \sim \text{Normal}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then each element of x is normally distributed; (2) if $\mathbf{x} \sim \text{Normal}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $x_i$ and $x_j$, any two elements of x, are independent if, and only if, they are uncorrelated, that is, $\sigma_{ij} = 0$; (3) if $\mathbf{x} \sim \text{Normal}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{Ax} + \mathbf{b} \sim \text{Normal}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$, where A and b are nonrandom; (4) if $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \boldsymbol{\Sigma})$, then, for nonrandom matrices A and B, Ax and Bx are independent if, and only if, $\mathbf{A}\boldsymbol{\Sigma}\mathbf{B}' = \mathbf{0}$; in particular, if $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}_n$, then $\mathbf{AB}' = \mathbf{0}$ is necessary and sufficient for independence of Ax and Bx; (5) if $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \sigma^2\mathbf{I}_n)$, A is a $k \times n$ nonrandom matrix, and B is an $n \times n$ symmetric, idempotent matrix, then Ax and $\mathbf{x}'\mathbf{B}\mathbf{x}$ are independent if, and only if, $\mathbf{AB} = \mathbf{0}$; (6) if $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \sigma^2\mathbf{I}_n)$ and A and B are nonrandom symmetric, idempotent matrices, then $\mathbf{x}'\mathbf{A}\mathbf{x}$ and $\mathbf{x}'\mathbf{B}\mathbf{x}$ are independent if, and only if, $\mathbf{AB} = \mathbf{0}$.
7.4 Chi-Square Distribution
We defined a chi-square random variable as the sum of squared independent standard normal random variables. In vector notation, if $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \mathbf{I}_n)$, then $\mathbf{x}'\mathbf{x} \sim \chi_n^2$.
Properties of the Chi-Square Distribution. (1) If $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \mathbf{I}_n)$ and A is an $n \times n$ symmetric, idempotent matrix with $\operatorname{rank}(\mathbf{A}) = q$, then $\mathbf{x}'\mathbf{A}\mathbf{x} \sim \chi_q^2$; (2) if $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \mathbf{I}_n)$ and A and B are $n \times n$ symmetric, idempotent matrices such that $\mathbf{AB} = \mathbf{0}$, then $\mathbf{x}'\mathbf{A}\mathbf{x}$ and $\mathbf{x}'\mathbf{B}\mathbf{x}$ are independent chi-square random variables; and (3) if $\mathbf{z} \sim \text{Normal}(\mathbf{0}, \mathbf{C})$, where C is an $m \times m$ nonsingular matrix, then $\mathbf{z}'\mathbf{C}^{-1}\mathbf{z} \sim \chi_m^2$.
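Property (3) rests on an algebraic identity: if $\mathbf{C} = \mathbf{L}\mathbf{L}'$ (a Cholesky factorization) and $\mathbf{z} = \mathbf{L}\mathbf{x}$ with x standard normal, then z has variance matrix C and $\mathbf{z}'\mathbf{C}^{-1}\mathbf{z} = \mathbf{x}'\mathbf{x}$, a sum of m squared standard normals. A minimal NumPy sketch of that identity (the particular C below is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(9)
m = 3
C = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.5]])    # a nonsingular variance matrix
L = np.linalg.cholesky(C)          # C = L L'

x = rng.standard_normal(m)         # x ~ Normal(0, I_m)
z = L @ x                          # z ~ Normal(0, C)

# z'C^{-1}z = x'L'(LL')^{-1}Lx = x'x, a chi-square(m) random variable
quad = z @ np.linalg.inv(C) @ z
assert np.isclose(quad, x @ x)
```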
7.5 t Distribution
Property of the t Distribution. If $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \mathbf{I}_n)$, c is an $n \times 1$ nonrandom vector, A is a nonrandom $n \times n$ symmetric, idempotent matrix with rank q, and $\mathbf{Ac} = \mathbf{0}$, then
$$\{\mathbf{c}'\mathbf{x}/(\mathbf{c}'\mathbf{c})^{1/2}\}/(\mathbf{x}'\mathbf{A}\mathbf{x}/q)^{1/2} \sim t_q.$$
7.6 F Distribution
Recall that an F random variable is obtained by taking two independent chi-square random variables and forming their ratio, with each standardized by its degrees of freedom.
Property of the F Distribution. If $\mathbf{x} \sim \text{Normal}(\mathbf{0}, \mathbf{I}_n)$ and A and B are $n \times n$ nonrandom symmetric, idempotent matrices with $\operatorname{rank}(\mathbf{A}) = k_1$, $\operatorname{rank}(\mathbf{B}) = k_2$, and $\mathbf{AB} = \mathbf{0}$, then
$$(\mathbf{x}'\mathbf{A}\mathbf{x}/k_1)/(\mathbf{x}'\mathbf{B}\mathbf{x}/k_2) \sim F_{k_1, k_2}.$$