Multivariate Normal Distribution

Multivariate Analysis
Multivariate analysis refers to a set of statistical techniques used to analyze data involving more than one dependent (response) variable at the same time. It allows researchers to understand relationships among multiple variables simultaneously and how they interact with each other.
In a broad sense, multivariate analysis is the set of statistical methods aimed at analyzing several variables simultaneously: for each individual or object being studied, several variables are measured and analyzed together. The essence of multivariate thinking is to expose the inherent structure and meaning within these sets of variables through the application and interpretation of suitable statistical methods.
More concretely, multivariate analysis is a statistical procedure for analyzing data involving more than one type of measurement or observation. It may also mean solving problems where more than one dependent variable is analyzed simultaneously with other variables.
It is used when a single analysis involves several variables of different types (e.g., continuous, categorical).
It allows researchers to capture complex interrelationships between variables.
It is especially useful when dependent variables are correlated and should not be analyzed separately.
For example, suppose you have 100 patients in a medical study. You measure 10 different body characteristics (e.g., height, weight, LDL cholesterol) and then monitor each patient for 20 different symptoms over the next 2 years. You would use multivariate analysis to see which groups of body characteristics correlate with which sets of symptoms.
Applications span many fields, for example:
Education.
Finance: portfolio analysis using multiple financial indicators (e.g., risk, return, volatility); market segmentation and investment profiling.
Economics: time series modeling involving multiple economic indicators (e.g., GDP, inflation, interest rates).
Manufacturing: process optimization where multiple outcomes (e.g., strength, durability, cost) are tracked; multivariate control charts for monitoring multiple product characteristics; reducing the dimensionality of sensor data or manufacturing parameters.
The main objectives of multivariate analysis include the following.
Analyze Multiple Outcomes
Goal: To examine how several outcome variables change in response to one or more predictors.
Why: Many real-world situations involve more than one outcome (e.g., health indicators, academic performance, customer satisfaction dimensions).
Explore Interdependence
Goal: To explore the correlation and interdependence among multiple variables (both dependent and independent).
Why: Helps uncover hidden patterns, shared variance, or redundancy among variables.
Reduce Dimensionality
Goal: To simplify complex datasets by reducing the number of variables while retaining most of the information (e.g., using Principal Component Analysis, PCA; see the sketch after this list of objectives).
Why: Makes analysis easier, reduces noise, and improves model efficiency.
Identify Groupings
Goal: To identify natural groupings or clusters in data (e.g., market segments, patient profiles).
Why: Enables targeted interventions, tailored marketing, or group-based decision-making.
Predict Multiple Outcomes
Goal: To build models that can simultaneously predict several dependent variables based on a set of predictors.
Why: Enhances predictive accuracy when outcomes are correlated.
Uncover Latent Structure
Goal: To detect latent variables or underlying constructs from observed data (e.g., in Factor Analysis).
Why: Useful in psychology, education, and marketing to understand abstract concepts like intelligence, satisfaction, or brand perception.
Compare Groups Across Outcomes
Goal: To test whether groups differ across multiple outcomes at once (e.g., using MANOVA instead of multiple ANOVAs; see the sketch after this list).
Why: Increases statistical power and controls the Type I error rate.
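As referenced under Reduce Dimensionality above, here is a minimal PCA sketch in numpy; the data are randomly generated and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 observations, 10 variables

# Center the data, then take the eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues

# Keep the 2 components with the largest variance.
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]               # 10 x 2 loading matrix
scores = Xc @ W                         # 100 x 2 reduced representation

explained = eigvals[order[:2]].sum() / eigvals.sum()
print(f"variance retained by 2 components: {explained:.1%}")
```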
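And for the last objective, a minimal MANOVA sketch using statsmodels (assuming it is installed; the data frame and column names are made up for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], n // 3),
    "y1": rng.normal(size=n),
    "y2": rng.normal(size=n),
})
# Shift group C on both outcomes so there is a real effect to detect.
df.loc[df["group"] == "C", ["y1", "y2"]] += 1.0

# One MANOVA on (y1, y2) jointly instead of two separate ANOVAs.
fit = MANOVA.from_formula("y1 + y2 ~ group", data=df)
print(fit.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```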
In one dimension, the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ has the probability density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}; \quad -\infty < x < \infty$$

$$= k\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \quad \cdots (1)$$

$$= k\, e^{-\frac{1}{2}(x-\mu)(\sigma^2)^{-1}(x-\mu)}$$

where $\sigma^2$ is positive and $k$ is chosen so that the integral of (1) over the entire $x$-axis is unity.

In $p$ dimensions, the quadratic form $(x-\mu)(\sigma^2)^{-1}(x-\mu)$ is replaced by $(x-\mu)'\Sigma^{-1}(x-\mu)$ and the constant term becomes $(2\pi)^{p/2}|\Sigma|^{1/2}$, giving

$$f(x_1, x_2, \ldots, x_p) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}; \quad -\infty < x_i < \infty, \; i = 1, 2, \ldots, p$$

where the variance-covariance matrix is

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}_{p \times p}.$$
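As a quick numerical check of this density formula, here is a minimal sketch comparing the closed-form expression against scipy (the parameter values are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])

# Density from the closed-form expression above.
p = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
pdf_hand = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

# Density from scipy for comparison.
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(pdf_hand, pdf_scipy)  # the two values agree
```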
Properties:
1. Let $X \sim N_p(\mu, \Sigma)$ and $Y = \Sigma^{-1/2}(X - \mu)$; then $y_1, y_2, \ldots, y_p$ will be independent standardized normal variates.
2. Let $X \sim N_p(\mu, \Sigma)$; then $\sum_j a_j x_j$ will follow a univariate normal distribution, where the $a_j$ $(j = 1, 2, \ldots, p)$ are arbitrary constants, and its mean and variance are $a'\mu$ and $a'\Sigma a$ respectively.
3. Any subset $x_1, x_2, \ldots, x_q$ of $x$, where $q \le p$, again follows a multivariate normal distribution. The necessary and sufficient condition is given by the solution of $\partial Q / \partial X = 0$.
8. If $X \sim N_p(0, I)$ and $a$ is a non-null vector of order $p$, then $\dfrac{a'X}{\sqrt{a'a}}$ will be a standardized univariate normal variate.
9. If $X \sim N_p(\mu, \sigma^2 I)$ and $B_{q \times p}$ is an orthonormal matrix, i.e. $BB' = I$, then $BX \sim N_q(B\mu, \sigma^2 I)$.
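Property 1 can be checked by simulation. A minimal sketch with numpy, assuming arbitrary illustrative values for $\mu$ and $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])

# Draw a large sample from N_p(mu, Sigma).
X = rng.multivariate_normal(mu, Sigma, size=100_000)

# Property 1: Y = Sigma^{-1/2}(X - mu) should have mean 0 and identity covariance.
vals, vecs = np.linalg.eigh(Sigma)
Sigma_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
Y = (X - mu) @ Sigma_inv_half.T

print(np.round(Y.mean(axis=0), 2))           # approx [0, 0, 0]
print(np.round(np.cov(Y, rowvar=False), 2))  # approx identity matrix
```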
Many multivariate methods assume multivariate normality for mathematical tractability and statistical inference; examples include MANOVA and the other inferential techniques discussed above. These techniques rely on the properties of the multivariate normal distribution for deriving sampling distributions, test statistics, and confidence regions.
The multivariate normal distribution provides a complete description of the joint distribution of multiple continuous variables using just two parameters: the mean vector $\mu$ and the variance-covariance matrix $\Sigma$.
The contours (level curves) of the multivariate normal density are ellipsoids. This geometric interpretation is useful, for example, in constructing confidence regions and in detecting outliers.
Many estimation and inference procedures (e.g., Maximum Likelihood Estimation, MLE) in multivariate analysis are based on the assumption of multivariate normality; the MLE derivation below is one example. These properties are essential in structural equation modeling, Bayesian inference, and multivariate regression.
Derivation of the density function:
Start from the univariate density

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} = k\, e^{-\frac{1}{2}(x-\mu)(\sigma^2)^{-1}(x-\mu)}, \quad \text{where } k = \frac{1}{\sigma\sqrt{2\pi}}.$$

In $p$ dimensions the scalar mean $\mu$ is replaced by a vector $\mu_{p \times 1}$, and the positive constant $(\sigma^2)^{-1}$ can be replaced by a positive definite symmetric matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}_{p \times p},$$

which plays the role of the inverse of the variance-covariance matrix, so that

$$f(X_1, \ldots, X_p) = k\, e^{-\frac{1}{2}(X-\mu)' A (X-\mu)}.$$

Since $A$ is a positive definite symmetric matrix, there exists a non-singular matrix $C = (C_{ij})_{p \times p}$ such that

$$C'AC = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

Make the substitution $X - \mu = CY$. Then

$$(X-\mu)' A (X-\mu) = (CY)' A (CY) = Y' C'AC\, Y = Y'Y \quad (\text{since } C'AC = I).$$

Written out, $X - \mu = CY$ means

$$\begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \\ \vdots \\ X_p - \mu_p \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1p} \\ C_{21} & C_{22} & \cdots & C_{2p} \\ \vdots & \vdots & & \vdots \\ C_{p1} & C_{p2} & \cdots & C_{pp} \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_p \end{pmatrix},$$

that is,

$$X_i - \mu_i = C_{i1}Y_1 + C_{i2}Y_2 + \cdots + C_{ip}Y_p; \quad i = 1, 2, \ldots, p.$$

Since $C'AC = I$,

$$|C'AC| = |I| \;\Rightarrow\; |C'|\,|A|\,|C| = 1 \;\Rightarrow\; |C|^2\,|A| = 1 \;\Rightarrow\; |C| = \frac{1}{|A|^{1/2}}.$$

The normalizing condition, after the change of variables (whose Jacobian is $|C|$), becomes

$$k\,|C| \int \cdots \int e^{-\frac{1}{2} Y'Y}\, dY_1 \cdots dY_p = 1$$

$$\Rightarrow \frac{k}{|A|^{1/2}} \int \cdots \int e^{-\frac{1}{2}(Y_1^2 + Y_2^2 + \cdots + Y_p^2)}\, dY_1 \cdots dY_p = 1$$

$$\Rightarrow \frac{k}{|A|^{1/2}} \prod_{i=1}^{p} \int_{-\infty}^{\infty} e^{-\frac{1}{2} Y_i^2}\, dY_i = 1.$$

Since each $\int_{-\infty}^{\infty} e^{-\frac{1}{2} Y_i^2}\, dY_i = \sqrt{2\pi}$, this gives

$$\frac{k}{|A|^{1/2}}\, (2\pi)^{p/2} = 1 \;\Rightarrow\; k = \frac{|A|^{1/2}}{(2\pi)^{p/2}}.$$

Hence

$$f(X_1, \ldots, X_p) = \frac{|A|^{1/2}}{(2\pi)^{p/2}}\, e^{-\frac{1}{2}(X-\mu)' A (X-\mu)}; \quad -\infty < X_i < \infty.$$

Therefore, writing $A = \Sigma^{-1}$, the density function of a multivariate normal distribution with mean vector $\mu$ and variance-covariance matrix $\Sigma$ is given by

$$f(X_1, \ldots, X_p) = \frac{|\Sigma^{-1}|^{1/2}}{(2\pi)^{p/2}}\, e^{-\frac{1}{2}(X-\mu)' \Sigma^{-1} (X-\mu)} = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X-\mu)' \Sigma^{-1} (X-\mu)}; \quad -\infty < X_i < \infty.$$
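The value of $k$ derived above can be verified numerically for $p = 2$ by brute-force integration over a grid; a minimal sketch (the matrix $A$ below is an arbitrary illustrative choice):

```python
import numpy as np

# Check numerically that k = |A|^{1/2} / (2*pi)^{p/2} normalizes
# k * exp(-(1/2)(x-mu)' A (x-mu)) in two dimensions (A = Sigma^{-1}).
A = np.array([[1.5, -0.4],
              [-0.4, 0.8]])          # positive definite symmetric
mu = np.zeros(2)
k = np.sqrt(np.linalg.det(A)) / (2 * np.pi)

# Brute-force integration over a grid wide enough to capture the mass.
g = np.linspace(-10, 10, 801)
dx = g[1] - g[0]
X1, X2 = np.meshgrid(g, g)
pts = np.stack([X1.ravel(), X2.ravel()], axis=1) - mu
quad = np.einsum("ni,ij,nj->n", pts, A, pts)
total = (k * np.exp(-0.5 * quad)).sum() * dx * dx
print(total)  # approx 1.0
```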
Characteristic Function:
The characteristic function of a random vector $X$ is

$$\phi(t) = E\left(e^{it'X}\right), \quad \text{defined for every real vector } t.$$

$$E\left(e^{it'X}\right) = \int \cdots \int e^{it'X} f(X_1, \ldots, X_p)\, dX_1 \cdots dX_p; \quad \text{where } t' = (t_1, t_2, \ldots, t_p).$$

With the substitution $X = \mu + CY$ (where $C'\Sigma^{-1}C = I$),

$$E\left(e^{it'X}\right) = E\left(e^{it'(\mu + CY)}\right) = e^{it'\mu}\, E\left(e^{it'CY}\right) = e^{it'\mu}\, E\left(e^{iv'Y}\right); \quad \text{where } v' = t'C.$$

We have

$$f(X) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X-\mu)' \Sigma^{-1} (X-\mu)}.$$

Let

$$Q = (X-\mu)' \Sigma^{-1} (X-\mu) = (CY)' \Sigma^{-1} (CY) = Y' C' \Sigma^{-1} C\, Y = Y' I Y = Y'Y.$$

Now, $|C' \Sigma^{-1} C| = |I| \Rightarrow \dfrac{|C|^2}{|\Sigma|} = 1 \Rightarrow |C| = |\Sigma|^{1/2}$, so

$$f(Y) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2} Y'Y}\, |C| = \frac{1}{(2\pi)^{p/2}}\, e^{-\frac{1}{2} Y'Y}.$$

Now,

$$E\left(e^{iv'Y}\right) = \int \cdots \int e^{iv'Y}\, \frac{1}{(2\pi)^{p/2}}\, e^{-\frac{1}{2} Y'Y}\, dY = \prod_{j=1}^{p} \int_{-\infty}^{\infty} e^{iv_j Y_j}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} Y_j^2}\, dY_j$$

$$= e^{-\frac{1}{2} v_1^2}\, e^{-\frac{1}{2} v_2^2} \cdots e^{-\frac{1}{2} v_p^2} = e^{-\frac{1}{2} v'v} = e^{-\frac{1}{2} t'CC't} = e^{-\frac{1}{2} t' \Sigma t}.$$

So we have

$$\phi(t) = e^{it'\mu - \frac{1}{2} t' \Sigma t}.$$
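The closed form $\phi(t) = e^{it'\mu - \frac{1}{2} t'\Sigma t}$ can be checked against a Monte Carlo estimate of $E(e^{it'X})$; a minimal sketch (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
t = np.array([0.4, -0.2])

# Monte Carlo estimate of E[exp(i t'X)].
X = rng.multivariate_normal(mu, Sigma, size=200_000)
phi_mc = np.exp(1j * X @ t).mean()

# Closed form: exp(i t'mu - (1/2) t' Sigma t).
phi_cf = np.exp(1j * t @ mu - 0.5 * t @ Sigma @ t)
print(phi_mc, phi_cf)  # the two values agree closely
```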
Theorem:
If $Y = DX + f$, where $D$ is a constant matrix and $f$ is a constant vector, then $E(Y) = D\,E(X) + f$ and $\mathrm{Cov}(Y) = D\,\mathrm{Cov}(X)\,D'$.
Proof:
Given that $Y = DX + f$,

$$E(Y) = E(DX + f) = D\,E(X) + f.$$

$$\mathrm{Cov}(Y) = E\left[(DX + f - D\,E(X) - f)(DX + f - D\,E(X) - f)'\right] = D\, E\left[(X - E(X))(X - E(X))'\right] D' = D\,\mathrm{Cov}(X)\,D'.$$
Theorem:
Let $X$ be distributed as $N_p(\mu, \Sigma)$; then $Y = CX$ is distributed according to $N_p(C\mu, C\Sigma C')$ for $C$ a non-singular matrix.
Proof:
Since $C$ is non-singular, $C^{-1}$ exists, so

$$Y = CX \;\Rightarrow\; X = C^{-1}Y, \quad dX = |C^{-1}|\, dY.$$

The density of $X$ is

$$f(X; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X-\mu)' \Sigma^{-1} (X-\mu)}.$$

Let

$$Q = (X-\mu)' \Sigma^{-1} (X-\mu)$$
$$= (C^{-1}Y - C^{-1}C\mu)' \Sigma^{-1} (C^{-1}Y - C^{-1}C\mu)$$
$$= (Y - C\mu)' (C^{-1})' \Sigma^{-1} C^{-1} (Y - C\mu)$$
$$= (Y - C\mu)' (C\Sigma C')^{-1} (Y - C\mu).$$

Substituting into the density and multiplying by the Jacobian $|C^{-1}| = 1/|C|$, and noting that $|\Sigma|^{1/2}\,|C| = |C\Sigma C'|^{1/2}$, we get

$$f(Y) = \frac{1}{(2\pi)^{p/2}\,|C\Sigma C'|^{1/2}}\, e^{-\frac{1}{2}(Y - C\mu)' (C\Sigma C')^{-1} (Y - C\mu)}.$$

Therefore, $Y = CX$ is distributed as $N_p(C\mu, C\Sigma C')$.
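Both theorems can be illustrated by simulation; a minimal numpy sketch with arbitrary illustrative values of $C$, $\mu$, and $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])
C = np.array([[2.0, 1.0],
              [0.0, 3.0]])  # non-singular

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ C.T  # Y = C X for each row

# The theorem predicts mean C mu and covariance C Sigma C'.
print(np.round(Y.mean(axis=0), 2), C @ mu)
print(np.round(np.cov(Y, rowvar=False), 2))
print(C @ Sigma @ C.T)
```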
Maximum Likelihood Estimation of $\mu$ and $\Sigma$:
Let $x_1, x_2, \ldots, x_n$ be a random sample in which each observation has the distribution $N_p(\mu, \Sigma)$; the joint density function of all the observations is the product of the marginal densities:

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}(x_i - \mu)' \Sigma^{-1} (x_i - \mu)\right\} = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\left\{-\tfrac{1}{2} \sum_{i=1}^{n} (x_i - \mu)' \Sigma^{-1} (x_i - \mu)\right\} \quad \cdots (1)$$

Maximizing the likelihood function is equivalent to minimizing $\sum_{i=1}^{n} (x_i - \mu)' \Sigma^{-1} (x_i - \mu)$. Now

$$\sum_{i=1}^{n} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) = \sum_{i=1}^{n} \operatorname{trace}\left[(x_i - \mu)' \Sigma^{-1} (x_i - \mu)\right] = \sum_{i=1}^{n} \operatorname{trace}\left[\Sigma^{-1} (x_i - \mu)(x_i - \mu)'\right] = \operatorname{trace}\left[\Sigma^{-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)'\right] \quad \cdots (2)$$

Again,

$$\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)' = \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - \mu)(x_i - \bar{x} + \bar{x} - \mu)' = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)' \quad \cdots (3)$$

since the cross products are zero. Substituting (3) in (2) we get

$$\sum_{i=1}^{n} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) = \operatorname{trace}\left[\Sigma^{-1}\left(\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)'\right)\right]$$
$$= \operatorname{trace}\left[\Sigma^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right] + n\,\operatorname{trace}\left[\Sigma^{-1} (\bar{x} - \mu)(\bar{x} - \mu)'\right]$$
$$= \operatorname{trace}\left[\Sigma^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right] + n\,(\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu).$$

Since $\Sigma^{-1}$ is positive definite, the distance $n(\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu) > 0$ unless $\mu = \bar{x}$. Thus the likelihood

$$L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\left\{-\tfrac{1}{2}\left(\operatorname{trace}\left[\Sigma^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right] + n(\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu)\right)\right\}$$

is maximized with respect to $\mu$ at $\hat{\mu} = \bar{x}$, giving

$$L(\hat{\mu}, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\left\{-\tfrac{1}{2} \operatorname{trace}\left[\Sigma^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right]\right\}.$$

Now we have to maximize this over $\Sigma$, using the lemma: given a $p \times p$ symmetric positive definite matrix $B$ and a scalar $b > 0$,

$$\frac{1}{|\Sigma|^{b}} \exp\left\{-\tfrac{1}{2} \operatorname{trace}\left(\Sigma^{-1} B\right)\right\} \le \frac{1}{|B|^{b}}\,(2b)^{pb}\, e^{-pb},$$

with equality when $\Sigma = \frac{1}{2b} B$. Applying the lemma with $b = n/2$ and $B = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'$, the maximum occurs at

$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'.$$

The maximized likelihood is

$$L(\hat{\mu}, \hat{\Sigma}) = \frac{1}{(2\pi)^{np/2}\,|\hat{\Sigma}|^{n/2}} \exp\left\{-\frac{np}{2}\right\} = \text{constant} \times (\text{generalized variance})^{-n/2}.$$
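The resulting estimators are straightforward to compute; a minimal sketch (parameter values are illustrative), noting that $\hat{\Sigma}$ uses divisor $n$ rather than $n - 1$:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true = np.array([1.0, -0.5, 2.0])
Sigma_true = np.array([[1.0, 0.2, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.3, 0.8]])
x = rng.multivariate_normal(mu_true, Sigma_true, size=500)

# MLEs derived above: mu_hat = sample mean, Sigma_hat uses divisor n (not n-1).
n = len(x)
mu_hat = x.mean(axis=0)
centered = x - mu_hat
Sigma_hat = centered.T @ centered / n

print(np.round(mu_hat, 2))
print(np.round(Sigma_hat, 2))
# np.cov(x, rowvar=False, ddof=0) gives the same Sigma_hat.
```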
A more formal method for judging the joint normality of a data set is a $\chi^2$ plot or gamma plot, which is based on the squared generalized distances

$$d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x}); \quad j = 1, 2, \ldots, n \quad \cdots (1)$$

i) Compute the squared generalized distances for all observations using equation (1).
ii) Order the squared distances in (1) from smallest to largest as $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.
iii) Graph the pairs $\left(q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right),\, d_{(j)}^2\right)$, where $q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right)$ is the $100\left(\frac{j - \frac{1}{2}}{n}\right)$ quantile of the chi-square distribution with $p$ degrees of freedom.
iv) The quantile $q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right)$ is related to the upper percentiles of a chi-squared distribution; in particular, it equals the upper $100\left(\frac{n - j + \frac{1}{2}}{n}\right)$ percentile of the $\chi_p^2$ distribution.

The plot should resemble a straight line through the origin having slope 1. A systematic curved pattern suggests lack of normality.
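A minimal sketch of the computation behind such a $\chi^2$ plot, printing a few (quantile, ordered distance) pairs instead of drawing the graph (data are simulated for illustration):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=100)
n, p = X.shape

# Squared generalized (Mahalanobis) distances using the sample mean and S.
xbar = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ni,ij,nj->n", X - xbar, S_inv, X - xbar)

# Ordered distances vs chi-square quantiles at probabilities (j - 1/2)/n.
d2_sorted = np.sort(d2)
probs = (np.arange(1, n + 1) - 0.5) / n
q = chi2.ppf(probs, df=p)

# For multivariate normal data the points (q_j, d2_(j)) fall near the line y = x.
for qj, dj in list(zip(q, d2_sorted))[:5]:
    print(f"{qj:.3f}  {dj:.3f}")
```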
Q-Q Plots:
A Q-Q plot compares the ordered sample values against the theoretical quantiles corresponding to the probability values

$$\frac{1 - \frac{1}{2}}{n},\; \frac{2 - \frac{1}{2}}{n},\; \ldots,\; \frac{n - \frac{1}{2}}{n}.$$
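A minimal sketch of a normal Q-Q computation using these probability values (simulated data; for normal data the fitted line recovers the mean and standard deviation):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(loc=10, scale=2, size=50)
n = len(x)

# Standard normal quantiles at the probability values (j - 1/2)/n.
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = norm.ppf(probs)
observed = np.sort(x)

# For normal data, observed vs theoretical quantiles lie near a straight line
# with intercept approx the mean and slope approx the standard deviation.
slope, intercept = np.polyfit(theoretical, observed, 1)
print(f"slope ~ {slope:.2f} (sd), intercept ~ {intercept:.2f} (mean)")
```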