Multivariate Normal Distribution

The document explains the concepts of multivariable and multivariate analysis, highlighting their differences, such as the number of independent and dependent variables involved. It details the applications of multivariate analysis across various fields, including social sciences, medicine, education, and business, emphasizing its ability to analyze multiple outcomes simultaneously and understand complex relationships among variables. Additionally, it discusses the properties and importance of the multivariate normal distribution in statistical analysis, including its role in various statistical techniques and methods.

Multivariable

- Multiple independent variables (predictors).
- A model or function that includes more than one independent (input) variable but usually only one dependent (outcome) variable.
- Example: a regression model predicting blood pressure (dependent variable) using age, weight, and exercise level (independent variables):

$$\text{Blood Pressure} = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Weight} + \beta_3\,\text{Exercise}$$

- Typical examples: multiple regression, multivariable calculus.

Multivariate

- Multiple dependent variables (outcomes).
- A model that includes more than one dependent (response) variable and possibly multiple independent variables.
- Example: a study measuring the effect of diet and exercise on both blood pressure and cholesterol levels (two dependent variables).
- Typical methods: multivariate analysis (e.g., MANOVA, factor analysis, PCA, canonical correlation).

Feature        | Multivariable                  | Multivariate
---------------|--------------------------------|------------------------------------------
Variables type | Multiple independent variables | Multiple dependent variables
Outcome        | Single                         | Multiple
Example        | Multiple regression            | MANOVA, PCA, factor analysis
Focus          | Explaining one outcome         | Explaining patterns in multiple outcomes
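The distinction can be made concrete with a small simulation. The sketch below (pure Python; the outcome names and coefficients are made up for illustration) fits a separate simple regression to each of two outcomes driven by one predictor. A multivariate analysis would model both outcomes jointly, but with a common design the per-outcome point estimates coincide with these univariate fits; what the multivariate view adds is the joint treatment of the correlated outcomes.

```python
import random

random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
# Two outcomes driven by the same predictor (coefficients are made up):
y1 = [2 + 3 * xi + random.gauss(0, 0.5) for xi in x]   # e.g. blood pressure
y2 = [1 - 2 * xi + random.gauss(0, 0.5) for xi in x]   # e.g. cholesterol

def ols(x, y):
    """Closed-form simple-regression fit: returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

print(ols(x, y1))   # approximately (2, 3)
print(ols(x, y2))   # approximately (1, -2)
```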

Multivariate Analysis

Multivariate analysis refers to a set of statistical techniques used to analyze data that involves
more than one dependent (response) variable at the same time. It allows researchers to understand
relationships between multiple variables simultaneously and how they interact with each other.

In a broad sense, multivariate analysis is the set of statistical methods aimed at analyzing datasets simultaneously; that is, for each individual or object being studied, several variables are analyzed. The essence of multivariate thinking is to expose the inherent structure and meaning revealed within these sets of variables through the application and interpretation of various statistical methods.

It is a statistical procedure for the analysis of data involving more than one type of measurement or observation. It may also mean solving problems in which more than one dependent variable is analyzed simultaneously with other variables.

Md Iftakhar Parvej, Department of Statistics, NSTU


Explanation:

- It is used when a single analysis involves several variables of different types (e.g., continuous, categorical).
- It allows researchers to capture complex interrelationships between variables.
- It is especially useful when dependent variables are correlated and should not be analyzed separately.

Real-life examples

- You have 100 patients for a medical study. You measure 10 different body characteristics (e.g., height, weight, LDL cholesterol) and then monitor each patient for 20 different symptoms over the next 2 years. You would use multivariate analysis to see which groups of body characteristics correlate with which sets of symptoms.
- Suppose a researcher studies the effect of a teaching method on students' performance. The outcomes measured include test score, class participation, and self-reported motivation. Instead of analyzing each of these outcomes separately, a multivariate analysis (like MANOVA or multivariate regression) considers them together in one model.

Applications of Multivariate Analysis

Social Sciences and Psychology

- Understanding behavioral patterns (e.g., depression, anxiety, and stress).
- Analyzing survey data with multiple questions measuring different traits.
- Factor analysis to identify underlying psychological constructs.
- Discriminant analysis for classifying individuals (e.g., by personality type or diagnosis).

Medical and Health Sciences

- Assessing multiple health outcomes (e.g., blood pressure, cholesterol, BMI).
- Clinical trials measuring treatment effects on various biological indicators.
- Patient clustering using multiple clinical features.
- Multivariate regression to control for multiple risk factors.

Education

- Evaluating student performance across multiple subjects or skills.
- Exploring the impact of teaching methods on several learning outcomes.
- Identifying learning styles using factor or cluster analysis.

Business and Marketing

- Customer segmentation using demographic, behavioral, and psychographic variables.
- Product positioning through perceptual mapping (e.g., via factor analysis).
- Forecasting sales and demand with multiple influencing variables.
- Analyzing customer satisfaction across several dimensions.

Economics and Finance

- Portfolio analysis using multiple financial indicators (e.g., risk, return, volatility).
- Market segmentation and investment profiling.
- Time series modeling involving multiple economic indicators (e.g., GDP, inflation, interest rates).

Environmental and Agricultural Studies

- Analyzing climate data (temperature, humidity, rainfall).
- Soil classification using various physical and chemical properties.
- Evaluating crop performance under different treatment conditions.

Industrial and Quality Control

- Process optimization where multiple outcomes (e.g., strength, durability, cost) are tracked.
- Multivariate control charts for monitoring multiple product characteristics.
- Reducing dimensionality of sensor data or manufacturing parameters.

Machine Learning and Data Science

- Dimensionality reduction using PCA or t-SNE before modeling.
- Multivariate prediction models for complex outcomes (e.g., image tagging).
- Clustering and classification tasks involving multiple features.
- Feature extraction for complex datasets like text, image, or genomics.

Objectives of Multivariate Analysis

Analyze Multiple Dependent Variables Simultaneously

- Goal: To examine how several outcome variables change in response to one or more predictors.
- Why: Many real-world situations involve more than one outcome (e.g., health indicators, academic performance, customer satisfaction dimensions).

Understand Relationships among Variables

- Goal: To explore the correlation and interdependence among multiple variables (both dependent and independent).
- Why: Helps uncover hidden patterns, shared variance, or redundancy among variables.

Reduce Dimensionality

- Goal: To simplify complex datasets by reducing the number of variables while retaining most of the information (e.g., using Principal Component Analysis, PCA).
- Why: Makes analysis easier, reduces noise, and improves model efficiency.

Classify and Group Observations

- Goal: To identify natural groupings or clusters in data (e.g., market segments, patient profiles).
- Why: Enables targeted interventions, tailored marketing, or group-based decision-making.

Predict Multiple Outcomes

- Goal: To build models that can simultaneously predict several dependent variables based on a set of predictors.
- Why: Enhances predictive accuracy when outcomes are correlated.

Identify Underlying Structures or Factors

- Goal: To detect latent variables or underlying constructs from observed data (e.g., in factor analysis).
- Why: Useful in psychology, education, and marketing to understand abstract concepts like intelligence, satisfaction, or brand perception.

Test Hypotheses Involving Multiple Variables

- Goal: To test whether groups differ across multiple outcomes at once (e.g., using MANOVA instead of multiple ANOVAs).
- Why: Increases statistical power and controls the Type I error rate.

Support Complex Decision-Making

- Goal: To aid decision-making by incorporating several criteria or performance indicators at once.
- Why: Crucial in multi-criteria analysis, policy research, and resource allocation.

Multivariate Normal Distribution

The multivariate normal density is a generalization of the univariate normal density to $p \ge 2$ dimensions. The univariate normal distribution with mean $\mu$ and variance $\sigma^2$ has the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}; \qquad -\infty < x < \infty$$

$$= k\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} = k\, e^{-\frac{1}{2}(x-\mu)\left(\sigma^2\right)^{-1}(x-\mu)}$$

where $\sigma^2$ is positive and $k$ is chosen so that the integral of the density over the entire $x$-axis is unity.



The density function of the multivariate normal distribution of $X' = (x_1, x_2, \ldots, x_p)$ has an analogous form. For $X' = (x_1, x_2, \ldots, x_p)$ the exponent term can be written as

$$(x-\mu)'\,\Sigma^{-1}\,(x-\mu)$$

and the constant term becomes $(2\pi)^{-p/2}\,|\Sigma|^{-1/2}$.

Thus the density function of the $p$-variate normal distribution is

$$f(x_1, x_2, \ldots, x_p) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}; \qquad -\infty < x_i < \infty,\; i = 1, 2, \ldots, p$$

We denote this $p$-dimensional normal density by $N_p(\mu, \Sigma)$, where $\mu' = (\mu_1, \mu_2, \ldots, \mu_p)$ represents the expected value of the random vector $X$ and

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}_{p \times p}$$

is the variance-covariance matrix of $X$.
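As a quick check on the formula, here is a minimal sketch (pure Python, $p = 2$; the function names and parameterization are ours) that evaluates the bivariate case by writing out $\Sigma^{-1}$ and $|\Sigma|$ explicitly. With correlation $\rho = 0$ the joint density factors into the product of two univariate normal densities, which the test below confirms.

```python
import math

def bvn_pdf(x1, x2, mu1=0.0, mu2=0.0, s1=1.0, s2=1.0, rho=0.0):
    """Bivariate normal density: the p = 2 case of the formula above."""
    det = (s1 * s2) ** 2 * (1 - rho ** 2)   # |Sigma| for this parameterization
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    # (x - mu)' Sigma^{-1} (x - mu) written out explicitly for p = 2:
    q = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))

def uni_pdf(z):
    """Standard univariate normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# With rho = 0 the joint density factors into two univariate normals:
print(bvn_pdf(0.5, -1.0))   # equals uni_pdf(0.5) * uni_pdf(-1.0)
```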

Properties of the Multivariate Normal Distribution:

The important properties of the multivariate normal distribution are given below.

1. Let $X \sim N_p(\mu, \Sigma)$ and $Y = \Sigma^{-1/2}(X - \mu)$; then $y_1, y_2, \ldots, y_p$ will be independent standardized normal variates.

2. Let $X \sim N_p(\mu, \Sigma)$; then $\sum_j a_j x_j$ will follow a univariate normal distribution, where the $a_j\ (j = 1, 2, \ldots, p)$ are arbitrary constants, with mean $\sum_j a_j \mu_j$ and variance $\sum_j \sum_k a_j a_k \sigma_{jk}$ respectively.

3. Let $X \sim N_p(\mu, \Sigma)$; then the components $x_j\ (j = 1, 2, \ldots, p)$ of $X$ will be jointly independent iff the covariance of $x_j$ and $x_k\ (j \ne k)$ is zero.

4. Let $X \sim N_p(\mu, \Sigma)$ and suppose $X$ can be partitioned as

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{pmatrix}$$

where $q \le p$ and $x_1, x_2, \ldots, x_q$ are subvectors of $X$. The necessary and sufficient condition for all the subsets to be jointly independent is that $\Sigma_{ij} = 0\ (i \ne j)$.

5. Let $X \sim N_p(\mu, \Sigma)$ and let $Q = (X-\mu)'\Sigma^{-1}(X-\mu)$ be a quadratic form; then $\mu$ can be obtained as the solution of $\dfrac{\partial Q}{\partial X} = 0$.

6. Let $X \sim N_p(\mu, \Sigma)$ and $Y = AX + C$, where $A$ is of order $q \times p$ and $C$ is any vector of order $q$; then the distribution of $Y$ will be $q$-variate normal.

7. Let $X \sim N_p(\mu, \Sigma)$; then $(X-\mu)'\Sigma^{-1}(X-\mu)$ will be distributed as $\chi^2_p$.

8. If $X \sim N_p(0, I)$ and $a$ is a nonzero vector of order $p$, then $\dfrac{a'X}{\sqrt{a'a}}$ will be standardized univariate normal.

9. If $X \sim N_p(\mu, \sigma^2 I)$ and $B_{(q \times p)}$ is an orthonormal matrix, i.e. $BB' = I$, then $BX \sim N_q(B\mu, \sigma^2 I)$.

10. If $X \sim N_p(\mu, \sigma^2 I)$ and $B$ is an orthonormal matrix of order $q \times p$, then $BX$ and $(I - B'B)X$ will be independently distributed.

11. If $X \sim N_p(\mu, \Sigma)$, then $AX$ and $BX$ will be independent if $A\Sigma B' = 0$.
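Property 7 can be checked by simulation. The minimal sketch below (pure Python; the seed and sample size are arbitrary choices) draws from $N_2(0, I)$, for which the quadratic form reduces to a sum of $p$ squared standard normals, and compares its sample mean and tail coverage with those of a $\chi^2_2$ distribution (mean $p = 2$, 0.95 quantile $\approx 5.991$).

```python
import random

random.seed(1)
p, n = 2, 50_000
# With mu = 0 and Sigma = I the quadratic form (X - mu)' Sigma^{-1} (X - mu)
# is just the sum of p squared standard normals.
d2 = [sum(random.gauss(0, 1) ** 2 for _ in range(p)) for _ in range(n)]

mean_d2 = sum(d2) / n                          # E[chi^2_p] = p = 2
frac = sum(1 for q in d2 if q <= 5.991) / n    # P(chi^2_2 <= 5.991) ~ 0.95
print(round(mean_d2, 1), round(frac, 2))
```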

Discuss the roles of the multivariate normal distribution in multivariate statistical analysis.

1. Theoretical Foundation for Many Techniques

Many multivariate methods assume multivariate normality for mathematical tractability and statistical inference. Examples include:

- MANOVA (Multivariate Analysis of Variance)
- Hotelling's $T^2$ test
- Linear Discriminant Analysis (LDA)
- Canonical Correlation Analysis
- Multivariate Regression

These techniques rely on the properties of the multivariate normal distribution for deriving sampling distributions, test statistics, and confidence regions.

2. Characterizes Joint Distributions

The multivariate normal distribution provides a complete description of the joint distribution of multiple continuous variables using just:

- a mean vector (location),
- a covariance matrix (spread and correlation).

This simplicity is powerful for modeling and inference in high dimensions.

3. Elliptical Contours and Geometric Interpretation

The contours (level curves) of the multivariate normal distribution are ellipsoids. This geometric interpretation is useful in:

- data visualization (e.g., confidence ellipses),
- discriminant analysis (linear boundaries arise when classes are normally distributed),
- outlier detection (e.g., using the Mahalanobis distance).

4. Basis for Likelihood-Based Methods

Many estimation and inference procedures (e.g., Maximum Likelihood Estimation, MLE) in multivariate analysis are based on the assumption of multivariate normality. For example:

- estimating means and covariances,
- model comparisons using likelihood ratio tests.

5. Enables Inference under Known Properties

If data follow a multivariate normal distribution:

- linear combinations of variables are also normally distributed,
- marginal distributions are normal,
- conditional distributions are also normal.

These properties are essential in structural equation modeling, Bayesian inference, and multivariate regression.

6. Useful in Simulations and Data Generation

Multivariate normal distributions are widely used to:

- generate synthetic data for simulation studies,
- model uncertainties in multiple variables,
- evaluate algorithms under controlled conditions.


Derive the multivariate normal distribution as an extension of the univariate normal distribution.

The form of the multivariate normal distribution is easily obtained from that of the univariate normal distribution. The p.d.f. of the univariate normal distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} = k\, e^{-\frac{1}{2}(x-\mu)\,\alpha\,(x-\mu)}, \qquad \text{where } k = \frac{1}{\sigma\sqrt{2\pi}} \text{ and } \alpha = \frac{1}{\sigma^2}$$

i.e. $x$, $\mu$ and $\alpha$ are scalars and $k$ is a constant.

In the case of the multivariate normal distribution the scalar quantity $x$ is replaced by a vector

$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

the scalar constant $\mu$ is replaced by a vector $\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$, and the positive constant $\alpha$ is replaced by a positive definite symmetric matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}_{p \times p}$$

which (as shown below) is the inverse of the variance-covariance matrix, and $k$ is a positive constant.

The square $(x-\mu)^2\,\alpha = (x-\mu)\,\alpha\,(x-\mu)$ is replaced by the quadratic form $(X-\mu)'A(X-\mu)$.

Thus the density function of a $p$-variate normal distribution is

$$f(X_1, \ldots, X_p) = k\, e^{-\frac{1}{2}(X-\mu)'A(X-\mu)}$$
Now we have to find the value of $k$ such that

$$k \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}(X-\mu)'A(X-\mu)}\, dX_1\, dX_2 \cdots dX_p = 1 \qquad (1)$$

$$\Rightarrow \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}(X-\mu)'A(X-\mu)}\, dX_1 \cdots dX_p = \frac{1}{k} = k^{*} \qquad (2)$$

Since $A$ is a positive definite symmetric matrix, there exists a non-singular matrix $C$ such that

$$C'AC = I$$

where

$$C = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1p} \\ C_{21} & C_{22} & \cdots & C_{2p} \\ \vdots & \vdots & & \vdots \\ C_{p1} & C_{p2} & \cdots & C_{pp} \end{pmatrix}, \qquad I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

Now let us make the transformation

$$X - \mu = CY$$

$$\Rightarrow (X-\mu)'A(X-\mu) = (CY)'A(CY) = Y'C'ACY = Y'Y \qquad \text{since } C'AC = I$$

Writing $X = \mu + CY$ out componentwise,

$$X = \begin{pmatrix} \mu_1 + C_{11}Y_1 + C_{12}Y_2 + \cdots + C_{1p}Y_p \\ \mu_2 + C_{21}Y_1 + C_{22}Y_2 + \cdots + C_{2p}Y_p \\ \vdots \\ \mu_p + C_{p1}Y_1 + C_{p2}Y_2 + \cdots + C_{pp}Y_p \end{pmatrix}$$

The Jacobian of the transformation is

$$J = \operatorname{mod}\begin{vmatrix} \dfrac{\partial X_1}{\partial Y_1} & \dfrac{\partial X_2}{\partial Y_1} & \cdots & \dfrac{\partial X_p}{\partial Y_1} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial X_1}{\partial Y_p} & \dfrac{\partial X_2}{\partial Y_p} & \cdots & \dfrac{\partial X_p}{\partial Y_p} \end{vmatrix} = \operatorname{mod}\begin{vmatrix} C_{11} & C_{12} & \cdots & C_{1p} \\ C_{21} & C_{22} & \cdots & C_{2p} \\ \vdots & \vdots & & \vdots \\ C_{p1} & C_{p2} & \cdots & C_{pp} \end{vmatrix} = \operatorname{mod} C$$

where "$\operatorname{mod} C$" indicates the absolute value of the determinant of $C$.

Since $C'AC = I$,

$$|C'AC| = |I| \;\Rightarrow\; |C'|\,|A|\,|C| = 1 \;\Rightarrow\; |C|^2 = \frac{1}{|A|} \;\Rightarrow\; |C| = \frac{1}{|A|^{1/2}}$$

Now from the equation 1 we have,

  1
k  Y Y
  A
1
2
e 2 dY1 dY p  1
 

1  1  1
k  Y12  Y22  Yp2

A
1
2 

2 dY
1 e e 2 dY
2   e 2 dY
p 1
 

 1
1  Y2

 1
p 1
 Yi 2 e 2 dY

k
 1
e 2 dYi 1 
2
A 2 i 1 
 1
 Y2


k
 2 
p
1
  e 2 dY  2
1 
A 2
p  1

 
 Yi 2

p
A
1
2  e 2 dY
i  2
 k p
i 1 

 2  2

Therefore, the p -variate normal distribution can be written as

1
 X    A X   
1

 
A 2
f X1 , , Xp  p
e 2 ;   X  
 2  2



Again, $C'AC = I$ gives

$$(C')^{-1}\,C'AC\,C^{-1} = (C')^{-1}\, I\, C^{-1} \;\Rightarrow\; A = (CC')^{-1} \;\Rightarrow\; A^{-1} = CC' = \Sigma$$

Therefore, the density function of a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$ is given by

$$f(X_1, \ldots, X_p) = \frac{|\Sigma^{-1}|^{1/2}}{(2\pi)^{p/2}}\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)} = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}; \qquad -\infty < X < \infty$$

Characteristic Function:

The characteristic function of a random vector $X$ is

$$\phi(t) = E\left(e^{it'X}\right)$$

defined for every real vector $t$, where

$$E\left(e^{it'X}\right) = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} e^{it'X}\, f(X_1, \ldots, X_p)\, dX_1 \cdots dX_p; \qquad t' = (t_1, t_2, \ldots, t_p)$$

Let us make a non-singular transformation $X - \mu = CY$ such that $C'\Sigma^{-1}C = I$ (such a $C$ exists because $\Sigma$ is positive definite and symmetric). Then

$$E\left(e^{it'X}\right) = E\left(e^{it'(\mu + CY)}\right) = e^{it'\mu}\, E\left(e^{it'CY}\right) = e^{it'\mu}\, E\left(e^{iv'Y}\right), \qquad \text{where } v' = t'C$$

We have

$$f(X) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}$$

Let $Q = (X-\mu)'\Sigma^{-1}(X-\mu)$. Then

$$Q = (CY)'\Sigma^{-1}(CY) = Y'C'\Sigma^{-1}CY = Y'IY = Y'Y$$

The Jacobian of the transformation $X = \mu + CY$ is $|C|$. Now $C'\Sigma^{-1}C = I$ gives

$$|C'|\,|\Sigma^{-1}|\,|C| = 1 \;\Rightarrow\; |C|^2 = |\Sigma| \;\Rightarrow\; |C| = |\Sigma|^{1/2}$$

The density function of $Y$ is therefore

$$f(Y) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}Y'Y}\, |C| = \frac{1}{(2\pi)^{p/2}}\, e^{-\frac{1}{2}Y'Y}$$

Now,

$$E\left(e^{iv'Y}\right) = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} e^{iv'Y}\, \frac{1}{(2\pi)^{p/2}}\, e^{-\frac{1}{2}Y'Y}\, dY$$

$$= \prod_{j=1}^{p} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iv_j Y_j - \frac{1}{2}Y_j^2}\, dY_j = e^{-\frac{1}{2}v_1^2}\, e^{-\frac{1}{2}v_2^2} \cdots e^{-\frac{1}{2}v_p^2} = e^{-\frac{1}{2}v'v} = e^{-\frac{1}{2}t'CC't} = e^{-\frac{1}{2}t'\Sigma t}$$

using $CC' = \Sigma$, which follows from $C'\Sigma^{-1}C = I$. So we have

$$E\left(e^{it'X}\right) = e^{it'\mu}\, E\left(e^{iv'Y}\right) = e^{it'\mu}\, e^{-\frac{1}{2}t'\Sigma t} = e^{it'\mu - \frac{1}{2}t'\Sigma t}$$

which is the characteristic function of the multivariate normal distribution.
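The result can be verified numerically. The Monte Carlo sketch below (pure Python; the vector $t$, seed, and sample size are arbitrary illustrative choices) estimates the real part of $E(e^{it'X})$ for $X \sim N_2(0, I)$, for which the theoretical value is $e^{-\frac{1}{2}t'\Sigma t} = e^{-\frac{1}{2}t't}$ (the $e^{it'\mu}$ factor is 1 since $\mu = 0$).

```python
import math
import random

random.seed(2)
t1, t2 = 1.0, 0.5    # an arbitrary real vector t
n = 200_000
# X ~ N_2(0, I): the real part of E[exp(i t'X)] is E[cos(t'X)]
mc = sum(math.cos(t1 * random.gauss(0, 1) + t2 * random.gauss(0, 1))
         for _ in range(n)) / n
theory = math.exp(-(t1 ** 2 + t2 ** 2) / 2)   # exp(-t' Sigma t / 2), Sigma = I
print(round(mc, 2), round(theory, 2))
```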



Theorem:
If $Y = DX + f$, where $X$ is a random vector, then

$$E(Y) = D\,E(X) + f, \qquad V(Y) = D\,V(X)\,D'$$

Proof:
Given that $Y = DX + f$,

$$E(Y) = E(DX + f) = D\,E(X) + f$$

$$V(Y) = E\left[\left(Y - E(Y)\right)\left(Y - E(Y)\right)'\right]$$
$$= E\left[\left(DX + f - D\,E(X) - f\right)\left(DX + f - D\,E(X) - f\right)'\right]$$
$$= E\left[D\left(X - E(X)\right)\left(X - E(X)\right)'D'\right]$$
$$= D\, E\left[\left(X - E(X)\right)\left(X - E(X)\right)'\right] D' = D\,V(X)\,D' \qquad \text{(Proved)}$$

Theorem:
Let $X$ be distributed as $N_p(\mu, \Sigma)$; then $Y = CX$ is distributed according to $N_p(C\mu, C\Sigma C')$ for any non-singular matrix $C$.

Proof:
Since $C$ is non-singular, $C^{-1}$ exists, so

$$Y = CX \;\Rightarrow\; X = C^{-1}Y$$

The Jacobian of the transformation is

$$J = \left|C^{-1}\right| = \frac{1}{|C|} = \left(\frac{1}{|C|^2}\right)^{1/2} = \left(\frac{1}{|C|\,|C'|}\right)^{1/2} = \frac{1}{|CC'|^{1/2}}$$

Now our density function is

$$f(X; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}$$

Let

$$Q = (X-\mu)'\Sigma^{-1}(X-\mu)$$
$$= \left(C^{-1}Y - C^{-1}C\mu\right)'\Sigma^{-1}\left(C^{-1}Y - C^{-1}C\mu\right)$$
$$= \left[C^{-1}(Y - C\mu)\right]'\Sigma^{-1}\left[C^{-1}(Y - C\mu)\right]$$
$$= (Y - C\mu)'\,(C^{-1})'\,\Sigma^{-1}\,C^{-1}\,(Y - C\mu)$$
$$= (Y - C\mu)'\,(C\Sigma C')^{-1}\,(Y - C\mu)$$

Therefore the density of $Y$ is

$$f(Y) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}\,|CC'|^{1/2}}\, e^{-\frac{1}{2}(Y - C\mu)'(C\Sigma C')^{-1}(Y - C\mu)} = \frac{1}{(2\pi)^{p/2}\,|C\Sigma C'|^{1/2}}\, e^{-\frac{1}{2}(Y - C\mu)'(C\Sigma C')^{-1}(Y - C\mu)}$$

Therefore, $Y = CX$ is distributed as $N_p(C\mu, C\Sigma C')$.
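A simulation check of this theorem (pure Python; the matrix $C$, seed, and sample size are arbitrary illustrative choices): drawing $X \sim N_2(0, I)$ and forming $Y = CX$, the sample covariance of $Y$ should approach $C\Sigma C' = CC'$.

```python
import random

random.seed(3)
n = 100_000
c11, c12, c21, c22 = 2.0, 1.0, 0.0, 3.0   # an arbitrary non-singular C
# X ~ N_2(0, I), so Y = CX should have covariance C C' = [[5, 3], [3, 9]]
y1, y2 = [], []
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y1.append(c11 * x1 + c12 * x2)
    y2.append(c21 * x1 + c22 * x2)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

print(round(cov(y1, y1), 1), round(cov(y1, y2), 1), round(cov(y2, y2), 1))
# approximately 5, 3, 9
```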

Maximum Likelihood Estimates of the Mean Vector and Covariance Matrix:

Suppose our sample of $n$ observations on $X$ distributed according to $N_p(\mu, \Sigma)$ is $x_1, x_2, \ldots, x_n$, where $n > p$. We have to estimate $\mu$ and $\Sigma$. Since $x_1, x_2, \ldots, x_n$ are mutually independent and each has distribution $N_p(\mu, \Sigma)$, the joint density function of all the observations is the product of the marginal densities

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, \exp\left\{-\frac{1}{2}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\}$$

So the likelihood function is

$$L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\, \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\} \qquad (1)$$

$$\ln L(\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)$$

Maximizing the likelihood with respect to $\mu$ is equivalent to minimizing $\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)$. Now

$$\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu) = \sum_{i=1}^{n}\operatorname{trace}\left[(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right] = \sum_{i=1}^{n}\operatorname{trace}\left[\Sigma^{-1}(x_i - \mu)(x_i - \mu)'\right] = \operatorname{trace}\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)'\right] \qquad (2)$$

Again,

$$\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)' = \sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]\left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]' = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)' \qquad (3)$$

since the cross-product terms are zero.

Substituting (3) in (2) we get

$$\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu) = \operatorname{trace}\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right] + n\,\operatorname{trace}\left[\Sigma^{-1}(\bar{x} - \mu)(\bar{x} - \mu)'\right]$$
$$= \operatorname{trace}\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right] + n\,(\bar{x} - \mu)'\Sigma^{-1}(\bar{x} - \mu)$$

Since $\Sigma^{-1}$ is positive definite, the distance $(\bar{x} - \mu)'\Sigma^{-1}(\bar{x} - \mu) > 0$ unless $\mu = \bar{x}$. Thus the likelihood is maximized with respect to $\mu$ at $\hat{\mu} = \bar{x}$.

Again, using equations (2) and (3) in equation (1) we get

$$L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\, \exp\left\{-\frac{1}{2}\left(\operatorname{trace}\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right] + n(\bar{x} - \mu)'\Sigma^{-1}(\bar{x} - \mu)\right)\right\}$$

$$\Rightarrow L(\hat{\mu}, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\, \exp\left\{-\frac{1}{2}\operatorname{trace}\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right]\right\}$$

Now we have to maximize this over $\Sigma$, using the lemma: given a $p \times p$ symmetric positive definite matrix $B$ and a scalar $b > 0$, it follows that

$$\frac{1}{|\Sigma|^{b}}\, \exp\left\{-\frac{1}{2}\operatorname{trace}\left(\Sigma^{-1}B\right)\right\} \le \frac{1}{|B|^{b}}\,(2b)^{pb}\, e^{-pb}$$

for all positive definite $\Sigma_{p \times p}$, with equality holding only for $\Sigma = \frac{1}{2b}B$. With $b = \frac{n}{2}$ and $B = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$, the maximum occurs at

$$\hat{\Sigma} = \frac{1}{2\left(\frac{n}{2}\right)}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$$

So the maximum likelihood estimates of $\mu$ and $\Sigma$ are respectively

$$\hat{\mu} = \bar{x} \qquad \text{and} \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$$

Further, substituting $\hat{\mu}$ and $\hat{\Sigma}$ back into the likelihood,

$$L(\hat{\mu}, \hat{\Sigma}) = \frac{1}{(2\pi)^{np/2}\,|\hat{\Sigma}|^{n/2}}\, \exp\left(-\frac{np}{2}\right) = \text{constant} \times (\text{generalized variance})^{-n/2}$$

i.e. the maximum of the likelihood is inversely proportional to $(\text{generalized variance})^{n/2}$.
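The estimates are easy to compute directly. The sketch below (pure Python, $p = 2$; the chosen $\mu$, $\Sigma$, seed, and sample size are illustrative) samples from a bivariate normal via a hand-computed Cholesky factor and then forms $\hat{\mu} = \bar{x}$ and $\hat{\Sigma} = \frac{1}{n}\sum_i (x_i - \bar{x})(x_i - \bar{x})'$. Note the MLE divides by $n$, not $n - 1$.

```python
import random

random.seed(4)
n = 20_000
# Draw from N_2(mu, Sigma) with mu = (1, -1), Sigma = [[2, 0.6], [0.6, 1]]
# via x = mu + L z, where L is the Cholesky factor of Sigma (done by hand).
l11 = 2 ** 0.5
l21 = 0.6 / l11
l22 = (1 - l21 ** 2) ** 0.5
xs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append((1 + l11 * z1, -1 + l21 * z1 + l22 * z2))

# MLEs: mu_hat = sample mean; Sigma_hat divides by n (not n - 1)
m1 = sum(a for a, _ in xs) / n
m2 = sum(b for _, b in xs) / n
s11 = sum((a - m1) ** 2 for a, _ in xs) / n
s12 = sum((a - m1) * (b - m2) for a, b in xs) / n
s22 = sum((b - m2) ** 2 for _, b in xs) / n
print(round(m1, 2), round(m2, 2))                    # approximately 1 and -1
print(round(s11, 2), round(s12, 2), round(s22, 2))   # approximately 2, 0.6, 1
```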

Difference between Univariate and Multivariate Normal Distribution

Univariate distribution | Multivariate distribution
------------------------|---------------------------
When the correlation between the samples is constant, the univariate distribution is more applicable. | The multivariate distribution is the distribution of observations on several correlated random variables.
The mean is used as a measure of location and the standard deviation as a measure of variation in the population. | The mean vector, the set of standard deviations, and the covariance matrix describe respectively the location, variability, and dependency in the population.
If $X$ is distributed as univariate normal with mean $\mu$ and variance $\sigma^2$, then $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, $-\infty < x < \infty$. | If $X$ is a $p \times 1$ vector of random normal variables, then the p.d.f. of $X$ is $f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}$, $-\infty < x < \infty$.
In univariate analysis covariance never arises. | In multivariate analysis covariance exists.
Errors: $e_{ij} \sim N(0, \sigma^2)$ | Errors: $E \sim N(0, I)$

Discuss the steps of assessing normality of a multivariate normal distribution.

A more formal method for judging the joint normality of a data set is a $\chi^2$ plot or gamma plot, which is based on the squared generalized distances

$$d_j^2 = \left(X_j - \bar{X}\right)' S^{-1} \left(X_j - \bar{X}\right); \qquad j = 1, 2, \ldots, n \qquad (1)$$

where $X_1, X_2, \ldots, X_n$ are the sample observations.

To construct the chi-square plot we follow these steps:

i) First compute the generalized squared distances $d_j^2$ using the formula in equation (1).

ii) Order the squared distances in (1) from smallest to largest as $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.

iii) Graph the pairs $\left(q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right),\, d_{(j)}^2\right)$, where $q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right)$ is the $100\left(\frac{j - \frac{1}{2}}{n}\right)$ quantile of the chi-square distribution with $p$ degrees of freedom.

iv) The quantile $q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right)$ is related to the upper percentiles of a chi-squared distribution. In particular,

$$q_{c,p}\!\left(\frac{j - \frac{1}{2}}{n}\right) = \chi_p^2\!\left(\frac{n - j + \frac{1}{2}}{n}\right)$$

where $\chi_p^2(\alpha)$ denotes the upper $100\alpha$th percentile of the $\chi_p^2$ distribution.

The plot should resemble a straight line through the origin having slope 1. A systematic curved pattern suggests lack of normality.
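The distances $d_j^2$ are straightforward to compute for $p = 2$, since the $2 \times 2$ inverse of $S$ can be written out by hand. The sketch below (pure Python; the data are simulated, seed and sample size arbitrary) also checks a useful identity: because $\sum_j (x_j - \bar{x})(x_j - \bar{x})' = (n-1)S$, the $d_j^2$ always average to $(n-1)p/n$, whatever the data. The chi-square quantiles for the horizontal axis of the plot can then be obtained from, e.g., scipy.stats.chi2.ppf.

```python
import random

random.seed(5)
n, p = 500, 2
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

# Sample mean vector and sample covariance matrix S (divisor n - 1)
m1 = sum(a for a, _ in xs) / n
m2 = sum(b for _, b in xs) / n
s11 = sum((a - m1) ** 2 for a, _ in xs) / (n - 1)
s12 = sum((a - m1) * (b - m2) for a, b in xs) / (n - 1)
s22 = sum((b - m2) ** 2 for _, b in xs) / (n - 1)

# 2x2 inverse of S written out by hand
det = s11 * s22 - s12 ** 2
i11, i12, i22 = s22 / det, -s12 / det, s11 / det

d2 = sorted(i11 * (a - m1) ** 2 + 2 * i12 * (a - m1) * (b - m2)
            + i22 * (b - m2) ** 2 for a, b in xs)

# Identity: the d_j^2 always average to (n - 1) p / n, whatever the data
print(round(sum(d2) / n, 4))   # 1.996
```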

Q-Q Plots:

The steps leading to a Q-Q plot are as follows:

i) Order the original observations to get $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ and compute their corresponding probability values $\frac{1 - \frac{1}{2}}{n}, \frac{2 - \frac{1}{2}}{n}, \ldots, \frac{n - \frac{1}{2}}{n}$.

ii) Calculate the standard normal quantiles $q_{(1)}, q_{(2)}, \ldots, q_{(n)}$.

iii) Plot the pairs of observations $\left(q_{(1)}, x_{(1)}\right), \left(q_{(2)}, x_{(2)}\right), \ldots, \left(q_{(n)}, x_{(n)}\right)$ and examine the straightness of the result.
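The steps above can be sketched in pure Python using the standard library's statistics.NormalDist for the normal quantiles (the data, seed, and sample size are illustrative). For normal data the plotted pairs fall near a line with slope $\sigma$ and intercept $\mu$, which the least-squares fit below recovers approximately.

```python
import random
from statistics import NormalDist

random.seed(6)
n = 200
x = sorted(random.gauss(10, 2) for _ in range(n))   # ordered observations x_(j)
probs = [(j - 0.5) / n for j in range(1, n + 1)]    # probability values (j - 1/2)/n
q = [NormalDist().inv_cdf(pr) for pr in probs]      # standard normal quantiles q_(j)

# For normal data the pairs (q_(j), x_(j)) lie near a line with slope sigma
# and intercept mu; a least-squares fit recovers them approximately.
mq = sum(q) / n
mx = sum(x) / n
slope = (sum((qi - mq) * (xi - mx) for qi, xi in zip(q, x))
         / sum((qi - mq) ** 2 for qi in q))
intercept = mx - slope * mq
print(round(slope, 1), round(intercept, 1))   # approximately 2 and 10
```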
