Descriptive Multivariate Statistics
By
Dr. Richard Tuyiragize
School of Statistics and Planning
Makerere University
February 22, 2022
1 Introduction
Before embarking on any statistical analysis of an $n \times p$ multivariate data set, a preliminary
data analysis should be done. This includes computing multivariate descriptive statistics
such as measures of location, measures of spread, sample covariances, and sample correlation
coefficients. Consider the following data set:
\[
\begin{array}{c|cccccc}
 & \text{Variable } 1 & \text{Variable } 2 & \cdots & \text{Variable } j & \cdots & \text{Variable } p \\
\hline
\text{Item } 1 & x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
\text{Item } 2 & x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } n & x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
\end{array}
\]
2 Measures of central tendency
Measures of central tendency help you find the middle, or the average, of a data set.
For any variable $x_j$, we compute the sample mean as
\[
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}
\]
The measure of central tendency for the multivariate data set is defined as the vector of
sample means for all the $p$ variables,
\[
\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}
\]
called the sample mean vector.
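As a quick numerical illustration (the data matrix below is hypothetical, chosen only for this sketch), the sample mean vector is obtained in Python/numpy by averaging each column:

```python
import numpy as np

# Hypothetical n x p data matrix: n = 4 items, p = 3 variables.
X = np.array([[2.0, 1.0, 5.0],
              [4.0, 3.0, 7.0],
              [6.0, 2.0, 4.0],
              [8.0, 6.0, 8.0]])

# Sample mean vector: the mean of each variable (column) over the n items.
x_bar = X.mean(axis=0)
print(x_bar)  # [5. 3. 6.]
```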
3 Measures of dispersion (variation)
Taking any one of the variables $x_j$, a usual measure of dispersion on this variable is the
sample variance, denoted $S_{jj}$, computed from the squared deviations of the $n$ observations
from their mean:
\[
S_{jj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2,
\qquad \text{where } \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}
\]
Equivalently,
\[
S_{jj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)(x_{ij} - \bar{x}_j) = \frac{SS(x_j)}{n-1},
\]
where $SS$ = sum of squares of deviations.
As an extension, take any two variables in the multivariate data set, say $x_j$ and $x_k$. A
measure of their joint dispersion/variance is the sample covariance, denoted $S_{jk}$, such that
\[
S_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) = \frac{SCP(x_j, x_k)}{n-1},
\]
where $SCP$ = sum of cross products of deviations.
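Both formulas can be checked against numpy's built-in estimators, which also use the $n-1$ divisor; a minimal sketch with made-up data:

```python
import numpy as np

xj = np.array([2.0, 4.0, 6.0, 8.0])
xk = np.array([1.0, 3.0, 2.0, 6.0])
n = len(xj)

# Sample variance: SS(xj) / (n - 1).
S_jj = np.sum((xj - xj.mean()) ** 2) / (n - 1)

# Sample covariance: SCP(xj, xk) / (n - 1).
S_jk = np.sum((xj - xj.mean()) * (xk - xk.mean())) / (n - 1)

print(S_jj, np.var(xj, ddof=1))    # 6.666... from both
print(S_jk, np.cov(xj, xk)[0, 1])  # 4.666... from both
```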
The measure of dispersion for a multivariate data set is a square matrix of order $p$:
\[
S = \begin{pmatrix}
\mathrm{Var}(x_1) & \mathrm{Cov}(x_1, x_2) & \cdots & \mathrm{Cov}(x_1, x_p) \\
\mathrm{Cov}(x_2, x_1) & \mathrm{Var}(x_2) & \cdots & \mathrm{Cov}(x_2, x_p) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(x_p, x_1) & \mathrm{Cov}(x_p, x_2) & \cdots & \mathrm{Var}(x_p)
\end{pmatrix}
\]
Generally,
\[
S = \begin{pmatrix}
S_{11} & S_{12} & \cdots & S_{1p} \\
S_{21} & S_{22} & \cdots & S_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
S_{p1} & S_{p2} & \cdots & S_{pp}
\end{pmatrix}
\]
The sample variance-covariance matrix can be expressed in vector terms:
\[
S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n-1}A,
\]
where
$x_i$ is the $i$-th observation, i.e. the $i$-th row of the data matrix written as a $p \times 1$ column vector;
$\bar{x}$ is the sample mean vector;
$A$ is the sample sums of squares and cross products matrix (SSCP matrix).
The determinant of the sample variance-covariance matrix summarizes the dispersion and is
called the generalized sample variance of the multivariate data.
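In code, this amounts to forming the SSCP matrix $A$ from the centred data and dividing by $n-1$; a minimal sketch with hypothetical data, with the determinant giving the generalized sample variance:

```python
import numpy as np

X = np.array([[2.0, 1.0],   # hypothetical n x p data matrix
              [4.0, 3.0],
              [6.0, 2.0],
              [8.0, 6.0]])
n, p = X.shape
x_bar = X.mean(axis=0)

# SSCP matrix: A = sum over i of (x_i - x_bar)(x_i - x_bar)'.
A = (X - x_bar).T @ (X - x_bar)
S = A / (n - 1)

print(np.allclose(S, np.cov(X, rowvar=False)))  # True
print(np.linalg.det(S))                         # generalized sample variance
```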
Properties of matrix S
1. Its diagonal entries are variances and its off-diagonal entries are covariances.
2. If the $p$ variables are all pairwise uncorrelated, the off-diagonal entries will be zero,
i.e. $S = \mathrm{diag}_p(S_{jj})$, for $j = 1, 2, \ldots, p$.
3. For any two variables $x_j$ and $x_k$, $\mathrm{Cov}(x_j, x_k) = \mathrm{Cov}(x_k, x_j)$, so the matrix is symmetric.
4. Given a sample of size $N$ with $N > p$, the matrix $S$ is positive definite.
Properties 5–10 below are stated for random samples from a multivariate normal population $N_p(\mu, \Sigma)$; here $A = (N-1)S$ denotes the SSCP matrix.
5. $\bar{x}$ and $S$ are independently distributed.
6. $\bar{x}$ and $S$ are jointly sufficient statistics for $(\mu, \Sigma)$.
7. $\bar{x}$ is an unbiased estimator of $\mu$, and $S = \frac{A}{N-1}$ is an unbiased estimator of $\Sigma$.
8. $\bar{x}$ and $\frac{A}{N}$ are the maximum likelihood estimates (MLEs) of $\mu$ and $\Sigma$, respectively.
9. $\bar{x}$ is distributed as $N_p\!\left(\mu, \frac{\Sigma}{N}\right)$.
10. The distribution of $A$ is known as the Wishart distribution, denoted $W_p(n, \Sigma)$,
where $n = N - 1$; hence $A$ is called the Wishart matrix. The Wishart distribution
is a generalization of the $\chi^2$ distribution: in the univariate case, $A \sim \sigma^2\chi^2_{N-1}$.
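Properties 3 and 4 can be checked numerically. A small sketch, reusing the $S$ from the previous sketch (the eigenvalues of a positive definite matrix are all strictly positive):

```python
import numpy as np

S = np.array([[20/3, 14/3],   # the S computed in the previous sketch
              [14/3, 14/3]])

print(np.allclose(S, S.T))        # True: S is symmetric
print(np.linalg.eigvalsh(S) > 0)  # [ True  True ]: S is positive definite
```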
4 Measures of correlation
The correlation coefficient, denoted by $r$, is a measure of the strength of the straight-line or
linear relationship between two continuous variables. For any two variables $x_j$ and $x_k$ in
the multivariate data set, the sample Pearson correlation coefficient (PCC) is given by
\[
r = \frac{\text{sample } \mathrm{Cov}(x_j, x_k)}{\text{sample } \mathrm{SD}(x_j)\,\text{sample } \mathrm{SD}(x_k)} = \frac{S_{jk}}{\sqrt{S_{jj}}\sqrt{S_{kk}}}
\]
Note that for $j = k$, $r = 1$.
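A quick numeric check of this formula against numpy's built-in correlation routine (data made up for illustration):

```python
import numpy as np

xj = np.array([2.0, 4.0, 6.0, 8.0])
xk = np.array([1.0, 3.0, 2.0, 6.0])

S = np.cov(xj, xk)  # 2 x 2 sample covariance matrix
r = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

print(r)                          # 0.8366...
print(np.corrcoef(xj, xk)[0, 1])  # same value
```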
As a measure of correlation for the $n \times p$ multivariate data set, we summarize the sample PCCs
into a square matrix of order $p \times p$, called the sample correlation matrix, denoted $R$:
\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{pmatrix}
\]
Properties of R
1. If the $p$ variables in the data are pairwise uncorrelated, then the off-diagonal
entries will be zero; hence $R$ takes on the form $R = I_p$, the identity matrix.
2. The matrix $R$ is symmetric and always positive semi-definite,
i.e. $x'Rx \geq 0$ for all $x \neq 0$.
3. Given some sample covariance matrix $S$, the corresponding sample correlation
matrix $R$ is computed as
\[
R = D^{-1} S D^{-1}, \qquad \text{where } D = \mathrm{diag}_p\!\left(\sqrt{S_{jj}}\right)
\]
Example For some bivariate data set, the sample covariance matrix has been found to be
\[
S = \begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix}
\]
Compute the sample correlation matrix, $R$.
Solution
\[
S = \begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix}; \qquad
R = D^{-1} S D^{-1}; \qquad
D^{-1} = \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix}
\]
\[
R = D^{-1} S D^{-1}
= \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix}
\begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix}
\begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix}
= \begin{pmatrix} 1 & \frac{3}{20} \\ \frac{3}{20} & 1 \end{pmatrix}
\]
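The same computation in numpy, verifying the worked example:

```python
import numpy as np

S = np.array([[16.0, 3.0],
              [3.0, 25.0]])

# D^{-1} has 1/sqrt(S_jj) on its diagonal.
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv

print(R)  # [[1.   0.15]
          #  [0.15 1.  ]]   (3/20 = 0.15)
```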
Question Find the sample mean vector, covariance and correlation matrices for the following
data matrix:
\[
X = \begin{pmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{pmatrix}
\]
5 Random Vectors and Matrices
A random vector is a vector whose components are random variables, and a random matrix
is a matrix whose elements are random variables.
A linear array of $p \geq 2$ random variables $x_1, x_2, x_3, \ldots, x_p$ in the form of a column or
row, i.e.
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
\quad \text{or} \quad
x' = (x_1, x_2, \ldots, x_p),
\]
is called a $p$-dimensional random vector of $p$ components.
If we have a column vector of $p$ random components and a row vector of $q$ random components,
then their product is a $p \times q$ rectangular random matrix $U = (u_{ij})$, for $i = 1, 2, \ldots, p$ and
$j = 1, 2, \ldots, q$, whose $pq$ elements $u_{ij}$ are random variables.
For this course, we shall deal with a column or row vector X with p variables.
We shall take the $p$ variables in the multivariate data set as realizations of some $p$ random
variables $x_1, x_2, x_3, \ldots, x_p$, whose simultaneous probabilistic or stochastic behaviour we
need to investigate; and we take the $p$ random variables as forming the row vector
\[
x' = (x_1, x_2, \ldots, x_p)
\]
5.1 Expectation of a random vector or matrix
Let $x$ be a $p \times 1$ random vector, i.e. $x' = (x_1, x_2, \ldots, x_p)$. Then the mean vector is
\[
E(x) = E\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
= \begin{pmatrix} E(x_1) \\ E(x_2) \\ \vdots \\ E(x_p) \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \mu
\]
$E(x)$ is the vector of expectations of the random variables.
5.2 Variance-covariance matrix of a random vector
For a univariate random variable $x$, a common measure of dispersion is the population
variance
\[
\sigma_x^2 = E(x - \mu)^2, \qquad \text{for } \mu = E(x)
\]
For two random variables $x$ and $y$, a common measure of joint dispersion is the population
covariance
\[
\sigma_{xy} = E\big[(x - \mu_x)(y - \mu_y)\big]
\]
The measure of dispersion for a $p$-variate random vector $x$ is the population $p$-variate
variance-covariance matrix, denoted $\Sigma$:
\[
\Sigma = E\big[(x - \mu)(x - \mu)'\big], \qquad \text{where } \mu = E(x)
\]
\[
\Sigma = E\left[\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \\ \vdots \\ x_p - \mu_p \end{pmatrix}
\begin{pmatrix} x_1 - \mu_1 & x_2 - \mu_2 & \cdots & x_p - \mu_p \end{pmatrix}\right]
\]
\[
= E\begin{pmatrix}
(x_1 - \mu_1)^2 & (x_1 - \mu_1)(x_2 - \mu_2) & \cdots & (x_1 - \mu_1)(x_p - \mu_p) \\
(x_2 - \mu_2)(x_1 - \mu_1) & (x_2 - \mu_2)^2 & \cdots & (x_2 - \mu_2)(x_p - \mu_p) \\
\vdots & \vdots & \ddots & \vdots \\
(x_p - \mu_p)(x_1 - \mu_1) & (x_p - \mu_p)(x_2 - \mu_2) & \cdots & (x_p - \mu_p)^2
\end{pmatrix}
\]
\[
= \begin{pmatrix}
\mathrm{Var}(x_1) & \mathrm{Cov}(x_1, x_2) & \cdots & \mathrm{Cov}(x_1, x_p) \\
\mathrm{Cov}(x_2, x_1) & \mathrm{Var}(x_2) & \cdots & \mathrm{Cov}(x_2, x_p) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(x_p, x_1) & \mathrm{Cov}(x_p, x_2) & \cdots & \mathrm{Var}(x_p)
\end{pmatrix}
\]
\[
\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{pmatrix}
\]
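A short simulation sketch ($\mu$ and $\Sigma$ chosen arbitrarily) illustrating these population quantities: with many draws, the sample mean vector and sample covariance matrix settle near $\mu$ and $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 9.0]])

# 100,000 draws from N_2(mu, Sigma).
X = rng.multivariate_normal(mu, Sigma, size=100_000)

print(X.mean(axis=0))           # close to mu
print(np.cov(X, rowvar=False))  # close to Sigma
```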
6 Linear Combinations of Random Vectors
1. Univariate case:
\[
E(a_1 x_1) = a_1 E(x_1) = a_1 \mu_1
\]
\[
\mathrm{Var}(a_1 x_1) = a_1^2\,\mathrm{Var}(x_1) = a_1^2 \sigma_{11}
\]
2. Bivariate case:
\[
\mathrm{Cov}(a_1 x_1, a_2 x_2) = a_1 a_2\,\mathrm{Cov}(x_1, x_2) = a_1 a_2 \sigma_{12}
\]
Given $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, we have $a_1 x_1 + a_2 x_2 = (a_1, a_2)\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = a'X$.
\[
E(a'X) = E(a_1 x_1 + a_2 x_2) = a_1 E(x_1) + a_2 E(x_2) = a_1 \mu_1 + a_2 \mu_2 = (a_1, a_2)\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
\]
\[
\implies E(a'X) = a'\mu
\]
\[
\mathrm{Var}(a'X) = \mathrm{Var}(a_1 x_1 + a_2 x_2) = \mathrm{Var}(a_1 x_1) + \mathrm{Var}(a_2 x_2) + 2\,\mathrm{Cov}(a_1 x_1, a_2 x_2)
\]
\[
= a_1^2 \sigma_{11} + a_2^2 \sigma_{22} + 2 a_1 a_2 \sigma_{12}
= (a_1, a_2)\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
\]
\[
\implies \mathrm{Var}(a'X) = a'\Sigma a
\]
(A numerical check of these two results appears after this list.)
3. Multivariate case:
If $X$ is a $p$-dimensional random vector and $a \in \mathbb{R}^p$, then the linear combination $a'X$ is a
one-dimensional random variable. That is, for
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix},
\]
the linear combination is
\[
a_1 X_1 + a_2 X_2 + \cdots + a_p X_p = (a_1, a_2, \ldots, a_p)\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = a'X,
\]
with
\[
E(a'X) = a'E(X) = a'\mu, \qquad \mathrm{Var}(a'X) = a'\Sigma a
\]
Here
\[
\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}
\]
4. Consider $q$ linear combinations of the $p$ random variables:
\[
Z_1 = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p = \sum_{j=1}^{p} a_{1j}X_j = a_1'X
\]
\[
Z_2 = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p = \sum_{j=1}^{p} a_{2j}X_j = a_2'X
\]
\[
\vdots
\]
\[
Z_q = a_{q1}X_1 + a_{q2}X_2 + \cdots + a_{qp}X_p = \sum_{j=1}^{p} a_{qj}X_j = a_q'X
\]
In matrix form:
\[
Z = \begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_q \end{pmatrix}
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}
\iff Z = AX
\]
\[
E(Z) = E(AX) = AE(X) = A\mu
\]
\[
\mathrm{Cov}(Z) = \mathrm{Cov}(AX) = A\Sigma A'
\]
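Returning to the bivariate case (item 2 above), here is a minimal simulation check of $E(a'X) = a'\mu$ and $\mathrm{Var}(a'X) = a'\Sigma a$; the vector $a$ and the parameters $\mu$ and $\Sigma$ below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([2.0, -1.0])
mu = np.array([0.0, 3.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 9.0]])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
z = X @ a  # realizations of a'X = a1*x1 + a2*x2

print(z.mean(), a @ mu)              # both near -3.0
print(z.var(ddof=1), a @ Sigma @ a)  # both near 21.0
```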
Example
Find the mean vector and covariance matrix for the linear combinations $Z_1 = X_1 - X_2$ and
$Z_2 = X_1 + X_2$.
\[
Z = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = AX
\]
\[
E(Z) = AE(X) = A\mu = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
= \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 + \mu_2 \end{pmatrix}
\]
\[
\mathrm{Cov}(Z) = A\,\mathrm{Cov}(X)\,A'
= \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
= \begin{pmatrix} \sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{11} - \sigma_{22} \\ \sigma_{11} - \sigma_{22} & \sigma_{11} + 2\sigma_{12} + \sigma_{22} \end{pmatrix}
\]
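The example can be confirmed numerically for any particular $\mu$ and $\Sigma$ (the values below are hypothetical):

```python
import numpy as np

A = np.array([[1.0, -1.0],   # Z1 = X1 - X2
              [1.0,  1.0]])  # Z2 = X1 + X2
mu = np.array([2.0, 5.0])        # hypothetical mu1, mu2
Sigma = np.array([[4.0, 1.0],    # hypothetical sigma11, sigma12
                  [1.0, 9.0]])   # sigma21, sigma22

print(A @ mu)           # [-3.  7.] = [mu1 - mu2, mu1 + mu2]
print(A @ Sigma @ A.T)  # [[11. -5.]  = [[s11 - 2*s12 + s22, s11 - s22],
                        #  [-5. 15.]]    [s11 - s22, s11 + 2*s12 + s22]]
```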