Fundamentals of
Multivariate Analysis
Omar Abdulmohsin Ali
&
Asmaa Ghalib Jabir
Lecture One
Matrices
Matrix Algebra
The Matrix
A matrix is an ordered array of a set of observations arranged in m "rows" and n "columns"; those observations are called "elements":
A = ((aij)) , where i = 1, 2, 3, …, m and j = 1, 2, 3, …, n.
Trace of a Matrix
The sum of the elements on the main diagonal of a square matrix (A) is called the "trace" of the matrix. For an (n×n) matrix (A):
tr(A) = a11 + a22 + … + ann
Identity Matrix
The identity matrix is a square matrix with (1) one in each main diagonal
position and zero elements. Identity matrix is a special case of a diagonal
matrix.
The Vector
A vector is an n×1 matrix, that is, a matrix consisting of a single column of n elements.
Diagonal Matrix
It is a square matrix with non-zero elements only on its main diagonal, although some of the diagonal elements aii may themselves be zero.
Triangular Matrix
Upper Triangular Matrix
If a square matrix has non-zero elements only on and above its main diagonal, it is called an "upper triangular" matrix.
Lower Triangular Matrix
If a square matrix has non-zero elements only on and below its main diagonal, it is called a "lower triangular" matrix.
Null Matrix
The (n×m) null matrix has zero in each of its positions and is denoted by (O).
Addition and Subtraction of Matrices
The sum of two matrices of like dimensions is the matrix of the sums of corresponding elements. Let A and B be matrices of the same order (n×n); then their sum is:
A + B = ((aij + bij))
Multiplication of Matrices
It is necessary that the number of columns of matrix A equal the number of rows of matrix B. So, if A is of dimension p×r and B is of dimension r×q, then the product C = AB is of dimension p×q, with the "ij th" element of C computed as:
cij = Σk aik bkj , k = 1, 2, …, r
In general, matrix multiplication is not commutative: AB ≠ BA.
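A quick numerical sketch of these rules (a numpy illustration, with matrices chosen here rather than taken from the slides):

```python
import numpy as np

# A is 2x3 and B is 3x2: columns of A match rows of B, so AB is 2x2.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])

C = A @ B                      # c_ij = sum_k a_ik * b_kj
print(C.shape)                 # (2, 2)

# BA is 3x3, so AB != BA in general (here they differ even in shape).
print((B @ A).shape)           # (3, 3)

# Trace of a square matrix: sum of main-diagonal elements.
print(np.trace(C))
```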
Lecture Two
Matrix Algebra
Linear Transformation
A matrix can be regarded as specifying a linear transformation of the vectors in one space to another. If x has m components and y has n components, it is possible to express a transformation from the m-dimensional coordinate system of the elements of x to the n-dimensional space of y in matrix form:
y = Ax
Length of a Vector: Norm
Denoted by ||x||, it is the square root of the inner product of the vector with itself:
||x|| = √(x'x)
Normalization
A vector is normalized by dividing it by its length, giving a vector of unit length. The inner product of two vectors x and y is the product of their lengths by the cosine of the angle between them:
x'y = ||x|| ||y|| cos θ
Orthogonal and Orthonormal Vectors
Two vectors are said to be orthogonal if their inner product is zero, x'y = 0; i.e., θ = 90°. A set of orthogonal vectors is called an orthonormal set if each vector also has unit length, i.e., ||xi|| = 1.
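A small numpy sketch of norm, normalization, and orthogonality (the vectors are illustrative choices):

```python
import numpy as np

x = np.array([3.0, 4.0])
norm_x = np.sqrt(x @ x)        # ||x|| = sqrt(x'x) = 5.0
u = x / norm_x                 # normalization: u has unit length

y = np.array([-4.0, 3.0])
print(x @ y)                   # 0.0 -> x and y are orthogonal (theta = 90 deg)
print(u @ u)                   # 1.0 -> u is normalized
```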
Lecture Three
Linearly Independent Vectors
The vectors x1, x2, …, xn are linearly dependent if there are scalars k1, k2, …, kn, not all zero, such that:
k1x1 + k2x2 + … + knxn = 0
Otherwise, the set of vectors is linearly independent; i.e., it is impossible to find non-zero k1, k2, …, kn such that k1x1 + k2x2 + … + knxn = 0.
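A quick numerical check of independence (a numpy sketch; the vectors are chosen here for illustration):

```python
import numpy as np

# Stack the vectors as columns; they are linearly independent
# exactly when the rank equals the number of vectors.
x1 = np.array([1., 0., 0.])
x2 = np.array([0., 1., 0.])
x3 = np.array([1., 1., 0.])

X = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(X))   # 2 < 3 -> dependent (x3 = x1 + x2)
```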
The Determinant of a Square Matrix
Associated with every square matrix is a unique scalar called its "determinant". The formal definition of the determinant of an n×n matrix A is the sum over all products consisting of one element from each row and each column,
|A| = Σ (±1) a1j1 a2j2 … anjn
where a product is multiplied by (−1) if the number of inversions of the particular permutation j1, j2, j3, …, jn from the standard order 1, 2, 3, …, n is odd.
Singular Matrix
A square matrix is called "singular" if its determinant is zero, and "non-singular" if its determinant is non-zero.
Minors
The minor of element (aij) of A is the determinant of the matrix formed by deleting the ith row and the jth column of A. The cofactor Aij is the minor multiplied by (−1)^(i+j). Thus we can compute minors with respect to the rows (i = 1, 2, 3, …, n).
Principal minors
A minor of a matrix An×n is said to be a principal minor
if it is obtained by deleting certain rows and the same
columns of A. Thus, the diagonal elements of a
principal minor of A are the diagonal elements of A.
Inverse Matrix
The inverse of a square matrix A is the unique matrix A⁻¹ such that:
AA⁻¹ = A⁻¹A = I
It is possible that A⁻¹ does not exist, just as it is not possible to perform scalar division by zero; in that case A is said to be "singular".
Rank of a Matrix
The rank of a matrix, whether square or not, is defined as the order of the largest non-zero determinant that can be calculated from the matrix.
Furthermore, the rank of a matrix can also be defined as the maximum number of linearly independent vectors (either rows or columns) in the matrix.
Note: If the number of rows (columns) in a matrix exceeds the number of columns (rows), then the rank of the matrix is the number of linearly independent columns (rows); i.e.,
rank(A(m×n)) ≤ min{m , n}
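A one-line numerical illustration of the rank bound (numpy sketch with a matrix chosen here):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])   # second row = 2 * first row

print(np.linalg.matrix_rank(A))   # 1
print(min(A.shape))               # 2 -> rank(A) <= min(m, n) holds
```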
Lecture Four
Linear Algebra
Simultaneous Linear Equations
The set of equations with unknowns x1, x2, x3, …, xn is called a system of m simultaneous linear equations in (n) unknowns, and can be written in matrix form as:
Ax = C
If C ≠ 0, the system is called "non-homogeneous"; there are (n − r + 1) linearly independent solutions. For a non-homogeneous system in which (An×n) is square and non-singular, there is a unique solution of the system:
x = A⁻¹C
That means matrix A has full rank, i.e., rank(A) = n.
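A numpy sketch of the non-homogeneous case (values chosen here for illustration):

```python
import numpy as np

# Non-homogeneous system Ax = C with A square and non-singular:
# the unique solution is x = A^{-1} C.
A = np.array([[3., 1.],
              [1., 2.]])
C = np.array([9., 8.])

x = np.linalg.solve(A, C)      # preferred over forming A^{-1} explicitly
print(x)                       # [2. 3.]
print(np.allclose(A @ x, C))   # True
```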
Homogeneous System of Equations
If C = 0, the system is called "homogeneous", and it is always consistent, since the trivial solution x = 0 exists. If and only if rank(A) = r < n are there (n − r) linearly independent non-trivial solutions, and the solution of the system can then be obtained using the generalized inverse.
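A numerical sketch of the homogeneous case, recovering the (n − r) independent solutions from the SVD (a numpy illustration, not the generalized-inverse derivation in the slides):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])   # rank 1, n = 3 -> 2 independent solutions

U, s, Vt = np.linalg.svd(A)
null_space = Vt[np.sum(s > 1e-10):]      # basis vectors as rows
print(null_space.shape[0])               # 2 = n - r
print(np.allclose(A @ null_space.T, 0))  # True: each satisfies Ax = 0
```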
Orthogonal Matrix
An orthogonal matrix A is a square matrix whose rows form a set of orthonormal vectors, so that A'A = AA' = I. Hence A' is orthogonal too.
Quadratic Form
A quadratic form can be represented as:
Q = x'Ax
To maximize (or minimize) some function f(x) subject to a constraint g(x) = c on the values of x, the most general method is that of the "Lagrange multiplier".
Types of Quadratic Forms (Q.F.)
Quadratic forms can be classified according to the nature of the characteristic roots λi of the matrix of the quadratic form itself.
Positive Definite Quadratic Form
A real symmetric square matrix An×n is positive definite if the quadratic function represented by A is always positive, except for x = 0. That is: x'Ax > 0 for all x ≠ 0. Consequently, matrix A has only positive non-zero characteristic roots (λi > 0), i = 1, 2, 3, …, n, and A has full rank, i.e., rank(A) = n. Furthermore, all principal minors are positive (non-zero).
Semi-Positive Definite Quadratic Form
A real symmetric square matrix An×n is semi-positive definite if the quadratic function represented by A is never negative but is zero for some x ≠ 0. That is: x'Ax ≥ 0 for all x. Consequently, matrix A has non-negative characteristic roots (λi ≥ 0), i = 1, 2, 3, …, n, and A has less than full rank, i.e., rank(A) < n.
Negative Definite Quadratic Form
A real symmetric square matrix An×n is negative definite if the quadratic function represented by A is always negative, except for x = 0. That is: x'Ax < 0 for all x ≠ 0. Consequently, matrix A has only negative characteristic roots (λi < 0), i = 1, 2, 3, …, n.
Indefinite Quadratic Form
A quadratic form with real symmetric square matrix An×n is said to be indefinite if the characteristic roots λi are a mixture of positive, negative, or zero values.
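A small numpy sketch classifying a quadratic form by the characteristic roots of its matrix (the function and test matrices are illustrative assumptions, not from the slides):

```python
import numpy as np

def classify_quadratic_form(A, tol=1e-10):
    """Classify x'Ax from the characteristic roots of symmetric A."""
    lam = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semi-definite"
    if np.all(lam < -tol):
        return "negative definite"
    return "indefinite"

print(classify_quadratic_form(np.array([[2., 0.], [0., 3.]])))   # positive definite
print(classify_quadratic_form(np.array([[1., 1.], [1., 1.]])))   # positive semi-definite
print(classify_quadratic_form(np.array([[1., 0.], [0., -2.]])))  # indefinite
```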
Lecture Five
Characteristic roots
& Characteristic vectors
of a matrix
The characteristic roots of the (p×p) matrix A are the solutions of the following determinant equation:
|A − λI| = 0
Laplace expansion is used to write the characteristic polynomial as:
λ^p − S1λ^(p−1) + S2λ^(p−2) − … + (−1)^p Sp = 0
where (Si) is the sum of all (i×i) principal minor determinants; in particular, S1 = tr(A) and Sp = |A|.
Associated with every characteristic root (latent, eigen, proper value) λi of the square matrix A is a characteristic vector xi whose elements satisfy the homogeneous system of equations:
(A − λiI)xi = 0
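A numpy sketch of computing characteristic roots and vectors and checking the defining equation (the matrix is an illustrative choice):

```python
import numpy as np

A = np.array([[4., 1.],
              [1., 4.]])

lam, X = np.linalg.eig(A)      # roots of |A - lambda*I| = 0 and their vectors
print(lam)                     # e.g. [5. 3.] (ordering may vary)

# Each column x_i satisfies the homogeneous system (A - lambda_i I) x_i = 0.
for l, x in zip(lam, X.T):
    print(np.allclose((A - l * np.eye(2)) @ x, 0))   # True, True
```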
Properties of Characteristic Roots and Characteristic
Vectors
The characteristic roots of a positive definite symmetric matrix A are all positive (λi > 0); i.e., there are p non-zero characteristic roots.
The characteristic roots of a positive semi-definite symmetric matrix A of rank r satisfy (λi ≥ 0); i.e., there are r non-zero characteristic roots and (p − r) zero roots.
For any positive definite symmetric matrix A, the characteristic roots of (A⁻¹) are the reciprocals of the characteristic roots of (A).
If we have the matrix (A^k), where k is a positive integer, then it has the same characteristic vectors as (A), and (λi^k) is the corresponding characteristic root of (A^k), where (λi) is the (ith) characteristic root of (A).
For every real symmetric matrix (A), there exists an
orthogonal matrix (P) such that: D = P'AP
where: D is the diagonal matrix of the characteristic
roots of (A).
The normalized characteristic vectors of (A) can be
taken as the columns of (P) , such that: P'P=I.
Any real quadratic form (Q.F.) can be reduced to a weighted sum of squares by computing the characteristic roots and vectors of its matrix. Using the orthogonal transformation x = Py, we have x'Ax = y'P'APy, so that:
x'Ax = y'Dy = λ1y1² + λ2y2² + … + λryr²
where the λi are the characteristic roots of the coefficient matrix and r is the rank of the form.
The characteristic roots of an idempotent matrix (AA = A) are either zero (0) or one (1), and a quadratic form with such a matrix can be reduced to a sum of (r) squared terms.
If (C) is the matrix whose columns are the (n) independent characteristic vectors of (A), then B = C⁻¹AC is the diagonal matrix with the characteristic roots of (A) on its diagonal, where C is non-singular (full rank).
If λi ≠ λj for a symmetric matrix (A) , their
corresponding vectors xi & xj are orthogonal.
If (A) is an orthogonal matrix, then all of its characteristic roots have absolute value (1), i.e., (±1).
If all λi > 0, then x'Ax is positive definite, and if all λi ≥ 0, then x'Ax is positive semi-definite.
The non-zero characteristic roots of (AB) are identical to the non-zero characteristic roots of (BA); hence r(AB) = r(BA), as well as tr(AB) = tr(BA). Furthermore, tr(ABC) = tr(BCA).
Note: If the Q.F. is to be maximized, then λ must be the greatest characteristic root of A and x its associated vector. Similarly, if the Q.F. is to be minimized, then λ must be the smallest characteristic root of A.
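A numpy sketch verifying several of these properties at once (the matrices are illustrative choices):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

# D = P'AP with P the orthonormal characteristic vectors (A symmetric).
lam, P = np.linalg.eigh(A)
print(np.allclose(P.T @ A @ P, np.diag(lam)))    # True
print(np.allclose(P.T @ P, np.eye(2)))           # True: P'P = I

# Roots of A^{-1} are reciprocals of the roots of A.
print(np.allclose(np.linalg.eigvalsh(np.linalg.inv(A)), np.sort(1 / lam)))  # True

# tr(AB) = tr(BA) for conformable A, B.
B = np.array([[0., 1.], [2., 3.]])
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
```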
Lecture Six
Partitioned Matrices
We can write matrix A in partitioned form as follows:
A = | A11 A12 |
    | A21 A22 |
where the submatrix Aij consists of ri rows and cj columns. All submatrices in a given row of blocks of matrix A must contain the same number of rows, and all submatrices in a given column of blocks must contain the same number of columns.
The Sum of Partitioned Matrices
If we have two matrices A & B which are partitioned in such a way that corresponding sub-matrices have the same dimensions, then:
A + B = ((Aij + Bij))
The Product of Partitioned Matrices
If we have two matrices A & B which are partitioned in such a way that the sub-matrices have conformable dimensions, the blocks of C = AB are computed as:
Cij = Σk Aik Bkj
The main goal is therefore how to partition the two matrices A & B adequately to get a proper multiplication.
Determinant of Partitioned Matrices
Suppose we partition the matrix (A) as follows:
A = | A11 A12 |
    | A21 A22 |
where A11 & A22 are both square and non-singular. It is necessary to compute |A|. If (A11) is non-singular, then:
|A| = |A11| · |A22 − A21A11⁻¹A12|
Inverse of Partitioned Matrices
Suppose we partition the matrix (A) as above, where A11 & A22 are both square and non-singular; then the elements of the inverse are given by:
A⁻¹ = | A11⁻¹ + A11⁻¹A12BA21A11⁻¹   −A11⁻¹A12B |
      | −BA21A11⁻¹                  B          |
where: B = (A22 − A21A11⁻¹A12)⁻¹
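A numerical check of the block determinant and block inverse formulas (a numpy sketch; the blocks are illustrative values chosen here):

```python
import numpy as np

A11 = np.array([[4., 1.], [1., 3.]])
A12 = np.array([[1.], [0.]])
A21 = np.array([[0., 2.]])
A22 = np.array([[5.]])

A = np.block([[A11, A12],
              [A21, A22]])

# |A| = |A11| * |A22 - A21 A11^{-1} A12|
lhs = np.linalg.det(A)
rhs = np.linalg.det(A11) * np.linalg.det(A22 - A21 @ np.linalg.inv(A11) @ A12)
print(np.isclose(lhs, rhs))    # True

# Top-left block of the inverse, checked against np.linalg.inv(A).
B = np.linalg.inv(A22 - A21 @ np.linalg.inv(A11) @ A12)
top_left = np.linalg.inv(A11) + np.linalg.inv(A11) @ A12 @ B @ A21 @ np.linalg.inv(A11)
print(np.allclose(np.linalg.inv(A)[:2, :2], top_left))   # True
```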
Lecture Seven
Differentiation with Vectors
and Matrices
The vector of partial derivatives of f(x) with respect to x is:
∂f/∂x = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)'
If f(x) is constant for all x, then: ∂f/∂x = 0
If f(x) = a'x = x'a, then: ∂f/∂x = a
If f(x) = x'Ax (A is symmetric), then: ∂f/∂x = 2Ax
If A is not symmetric (A ≠ A'), then: ∂f/∂x = (A + A')x
If f(x) = x'Ax (A is the identity), then: ∂f/∂x = 2x
Jacobi Matrix
It represents the first derivative of some vector function with respect to the x vector: Jij = ∂fi/∂xj.
Hessian Matrix
It represents the second derivative of some function with respect to the x' and x vectors respectively: H = ∂²f/∂x∂x'.
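A finite-difference check of the gradient rule for a non-symmetric A (a numpy sketch; the helper numeric_gradient is an illustrative function written here, not from the slides):

```python
import numpy as np

def numeric_gradient(f, x, h=1e-6):
    """Central-difference approximation to the vector of partial derivatives."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

A = np.array([[1., 2.], [0., 3.]])       # deliberately non-symmetric
x = np.array([1.5, -0.5])

# d(x'Ax)/dx = (A + A')x when A is not symmetric (2Ax when it is).
print(numeric_gradient(lambda v: v @ A @ v, x))
print((A + A.T) @ x)                     # matches to ~1e-6
```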
Lecture Eight
Determination of Maxima and Minima: Maximization of a Q.F.
A quadratic form can be represented as:
Q = x'Ax
To maximize (or minimize) some function f(x) subject to a constraint g(x) = c on the values of x, the most general method is that of the "Lagrange multiplier". Form a new function:
φ(x, λ) = f(x) − λ(g(x) − c)
Note
If the Q.F. is to be maximized, then λ must be the greatest characteristic root of A and x its associated vector. Similarly, if the Q.F. is to be minimized, then λ must be the smallest characteristic root of A.
As a special case, maximizing f(x) = x'Ax subject to x'x = 1 leads to the characteristic equation (A − λI)x = 0.
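A numpy sketch of this special case, solving the constrained maximization through the characteristic roots (the matrix is an illustrative choice):

```python
import numpy as np

# Maximize Q = x'Ax subject to x'x = 1: the Lagrange conditions reduce to
# Ax = lambda*x, so the maximum is the largest characteristic root,
# attained at its associated (unit-length) vector.
A = np.array([[3., 1.],
              [1., 3.]])

lam, P = np.linalg.eigh(A)               # roots in ascending order
x_max = P[:, -1]                         # vector of the largest root
print(lam[-1])                           # 4.0
print(x_max @ A @ x_max)                 # 4.0, attained on the unit sphere
```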
Lecture Nine
Multivariate Normal
Distribution (MVN)
Let xi ~ N(μi, σi²); then the probability density function is defined as:
f(xi) = (1/√(2πσi²)) exp(−(xi − μi)²/(2σi²))
Letting x1, x2, …, xp be independent identically distributed normal variables, the joint distribution of x will be the product of the marginal densities:
f(x) = Π f(xi) , i = 1, 2, …, p
Definition:
For the more general case of the above, the random vector x with density
f(x) = (2π)^(−p/2) |Σ|^(−1/2) exp(−½ (x − μ)'Σ⁻¹(x − μ))
is said to have a multivariate normal distribution with mean vector (μ) and variance-covariance matrix (Σ):
x ~ Np(μ , Σ)
Σ is a symmetric (σij = σji), positive definite, non-singular matrix, with (σii = σi²).
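A numpy sketch sampling from Np(μ, Σ) and evaluating the density above (μ and Σ are illustrative values chosen here):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1., 2.])
Sigma = np.array([[2., 0.5],
                  [0.5, 1.]])            # symmetric, positive definite

# Draw from N_p(mu, Sigma).
x = rng.multivariate_normal(mu, Sigma, size=5)

def mvn_pdf(x, mu, Sigma):
    p = len(mu)
    d = x - mu
    quad = d @ np.linalg.inv(Sigma) @ d  # (x - mu)' Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

print(mvn_pdf(mu, mu, Sigma))            # density at the mean (its maximum)
```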
Some properties of the mean
and variance for a vector
Let a, b, c, and d be constants (scalars or constant vectors, as indicated), and A & B constant matrices.
1- E(ax) = a·E(x)
2- E(ax ± by) = a·E(x) ± b·E(y)
3- E(a'x) = a'μ
4- σij = Cov(xi, xj) = E[(xi − μi)(xj − μj)] , i, j = 1, 2, …, p , i ≠ j
   σii = Cov(xi, xi) = Var(xi) = E[(xi − μi)²]
5- Cov(x, x) = Var(x)
6- Var(x ± a) = Var(x) = Σ
7- Cov(y , x) = Cov(x , y)
8- Cov(xi±a , xj±b) = Cov(xi , xj)
9-
10- Cov(cxi , dxj) = cd Cov(xi , xj)
11-
12-
13- If A is a matrix with all elements constant, then:
Var(Ax) = A Var(x) A' = AΣA'
14-
15- If xi & xj are independent, then:
cov(xi , xj) = 0 , but ingeneral the converse is not
true
→ ρij = 0
16- If Σ is the covariance matrix and ρ is the correlation matrix (p.d.), then they are related as:
Σ = DρD
where D = diag(σ1, …, σp) is the diagonal matrix of the standard deviations of the variates.
17- If we standardize each variate by zi = (xi − μi)/σi, then the density f(z) is given by:
f(z) = (2π)^(−p/2) |ρ|^(−1/2) exp(−½ z'ρ⁻¹z)
where ρ is the (p.d.) correlation matrix.
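A numpy sketch of the relation Σ = DρD in item 16 (Σ is an illustrative value chosen here):

```python
import numpy as np

Sigma = np.array([[4., 1.2],
                  [1.2, 1.]])

D = np.diag(np.sqrt(np.diag(Sigma)))     # diagonal matrix of standard deviations
rho = np.linalg.inv(D) @ Sigma @ np.linalg.inv(D)
print(rho)                               # unit diagonal, rho_12 = 1.2 / (2 * 1) = 0.6

print(np.allclose(D @ rho @ D, Sigma))   # True: Sigma = D rho D
```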
Lecture Ten
Some Characteristics of the Multivariate Normal Distribution
Representing the Exponent of MVN as a Quadratic Form
For the bivariate case (p = 2) of the standard normal, the above f(z) is given by:
f(z) = (2π)^(−1) |ρ|^(−1/2) exp(−½ z'ρ⁻¹z)
where:
z'ρ⁻¹z = (z1² − 2ρ12z1z2 + z2²) / (1 − ρ12²) , |ρ| = 1 − ρ12²
Moment Generating Function of MVN
The moment generating function (m.g.f.) of x is a function from R^p → [0, ∞), given by:
Mx(t) = Mx(t1, t2, …, tp) = E[e^(t1x1 + … + tpxp)] = E[e^(t'x)]
for t = (t1, …, tp)'. For x ~ Np(μ, Σ) it is:
Mx(t) = exp(t'μ + ½ t'Σt)
It is very useful in distribution theory, especially for convolutions (sums of independent random variables) and asymptotics, and for generating moments. The main use we have is that the m.g.f. determines the distribution.
Two important properties of the moment generating function:
1- Two random vectors have the same moment generating function if and only if they have the same density. This property is called the "uniqueness" of the moment generating function.
2- Two random vectors are independent if and only if their joint moment generating function factors into the product of their two separate m.g.f.s; that is, if:
x' = (x1', x2') and t' = (t1', t2'),
then x1 and x2 are independent if and only if:
Mx(t) = Mx1(t1) Mx2(t2).
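A Monte Carlo sketch of the factorization property for two independent standard normals (illustrative values; the sample estimates only approximate the true m.g.f.s):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Independent components: the joint m.g.f. factors, M_x(t) = M_x1(t1) M_x2(t2).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
t1, t2 = 0.3, -0.2

joint = np.mean(np.exp(t1 * x1 + t2 * x2))
product = np.mean(np.exp(t1 * x1)) * np.mean(np.exp(t2 * x2))
print(joint, product)          # close; both near exp((t1**2 + t2**2) / 2)
```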
The Standard Normal Distribution
The adjective '' standard '' indicates the
special case in which the mean is equal to
zero and the variance is equal to one.
Definition: Let x be a continuous random variable. Let its support (the set of values that the random variable can take) be the whole set of real numbers:
Rx = ℝ
We say that x has a standard normal distribution if and only if its probability density function is:
f(x) = (1/√(2π)) exp(−½ x²)
The function f(x) is a legitimate probability density function if it is non-negative and its integral over the support equals 1. It is not hard to show that if z ~ N(0, 1), then:
E(z) = 0 , Var(z) = 1 , and Mz(t) = e^(½t²)
Definition: The collection of random variables z = (z1, …, zp)' is a standard normal collection if the zi's are mutually independent standard normal random variables. Because the variables in a standard normal collection are independent:
E(z) = 0 , Cov(z) = Ip , and
Mz(t) = e^(½(t1² + … + tp²)) = e^(½ t't)
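A closing numpy sketch generating a standard normal collection and checking its moments empirically (sample size and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 100_000

# A standard normal collection: p mutually independent N(0, 1) variables.
z = rng.standard_normal(size=(n, p))

print(np.round(z.mean(axis=0), 2))           # ~ 0 vector: E(z) = 0
print(np.round(np.cov(z, rowvar=False), 2))  # ~ I_p: Cov(z) = I
```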