Class 4 Part 2
Overview
Principal Component Analysis
• Purpose
• Derivation
• Examples
• Literature
[Figure: 2-D data cloud with candidate projection directions $\mathbf{u}_A, \mathbf{u}_B$ and principal directions $\mathbf{u}_1, \mathbf{u}_2$]

Dimensionality reduction: $\mathbf{y} = U\mathbf{x}, \quad U : p \times N, \; p < N$

Find the vector $\mathbf{u}_1$ such that the variance of the data along this direction is maximized:

$$\sigma_{u_1}^2 = E\{(\mathbf{u}_1^T\mathbf{x})^2\}, \quad E\{\mathbf{x}\} = 0, \quad \|\mathbf{u}_1\| = 1$$
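As a numerical sketch of this criterion (data, dimensions, and names are invented for illustration), the maximizing direction is the unit eigenvector of $C = E\{\mathbf{x}\mathbf{x}^T\}$ with the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean 2-D data with one elongated direction (illustrative).
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=5000).T

C = (X @ X.T) / X.shape[1]           # sample covariance, E{x} = 0 assumed
eigvals, eigvecs = np.linalg.eigh(C)
u1 = eigvecs[:, -1]                  # unit eigenvector, largest eigenvalue

# The variance along u1 matches the largest eigenvalue (up to sampling)
# and exceeds the variance along any other unit direction.
print(np.var(u1 @ X), eigvals[-1])
```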
Equivalently, choose $p$ orthonormal basis vectors that minimize the mean squared error of the reconstruction:

$$\mathrm{mse} = E\big\{\|\mathbf{x} - \hat{\mathbf{x}}\|^2\big\}, \quad \hat{\mathbf{x}} = \sum_{k=1}^{p} \mathbf{u}_k(\mathbf{x}^T\mathbf{u}_k), \quad \mathbf{u}_k^T\mathbf{u}_m = \delta_{k,m}$$

[Figure: orthogonal projection of $\mathbf{x}$ onto $\mathbf{u}_k$, giving $\hat{\mathbf{x}}$]
$$\mathrm{mse} = E\big\{\|\mathbf{x} - \hat{\mathbf{x}}\|^2\big\} = E\Big\{\big\|\mathbf{x} - \sum_{k=1}^{p}\mathbf{u}_k(\mathbf{x}^T\mathbf{u}_k)\big\|^2\Big\}$$

$$= E\big\{\|\mathbf{x}\|^2\big\} - 2E\Big\{\sum_{k=1}^{p}\mathbf{x}^T\mathbf{u}_k(\mathbf{x}^T\mathbf{u}_k)\Big\} + E\Big\{\sum_{n=1}^{p}(\mathbf{x}^T\mathbf{u}_n)\mathbf{u}_n^T \sum_{k=1}^{p}\mathbf{u}_k(\mathbf{x}^T\mathbf{u}_k)\Big\} \qquad (\mathbf{x}^T\mathbf{u}_k \text{ is a scalar})$$

$$= E\big\{\|\mathbf{x}\|^2\big\} - 2E\Big\{\sum_{k=1}^{p}(\mathbf{x}^T\mathbf{u}_k)^2\Big\} + E\Big\{\sum_{k=1}^{p}(\mathbf{x}^T\mathbf{u}_k)^2\Big\} \qquad (\text{using } \mathbf{u}_n^T\mathbf{u}_k = \delta_{n,k})$$

$$= E\big\{\|\mathbf{x}\|^2\big\} - E\Big\{\sum_{k=1}^{p}(\mathbf{x}^T\mathbf{u}_k)^2\Big\}$$

$$= \mathrm{trace}(C) - \sum_{k=1}^{p}\mathbf{u}_k^T C\,\mathbf{u}_k, \quad C = E\{\mathbf{x}\mathbf{x}^T\}$$

To minimize the mse, maximize $\sum_{k=1}^{p}\mathbf{u}_k^T C\,\mathbf{u}_k$ subject to $\mathbf{u}_k^T\mathbf{u}_k = 1$. Setting the derivative of the constrained objective to zero gives $C\mathbf{u}_k = \lambda_k\mathbf{u}_k$: the optimal $\mathbf{u}_k$ are the eigenvectors of $C$ with the $p$ largest eigenvalues.
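A quick numerical check of the last identity and of the eigenvector solution (a sketch; the data and dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, n = 5, 2, 20000

# Zero-mean data with an anisotropic covariance (illustrative).
A = rng.standard_normal((N, N))
X = A @ rng.standard_normal((N, n))
C = (X @ X.T) / n

def mse(U):
    Xhat = U @ (U.T @ X)                  # projection onto span{u_1..u_p}
    return np.mean(np.sum((X - Xhat) ** 2, axis=0))

U_pca = np.linalg.eigh(C)[1][:, -p:]      # top-p eigenvectors of C
U_rand = np.linalg.qr(rng.standard_normal((N, p)))[0]  # random orthonormal basis

print(mse(U_pca), np.trace(C) - np.trace(U_pca.T @ C @ U_pca))  # equal
print(mse(U_pca) < mse(U_rand))                                 # True
```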
Examples
Training:
- Calculate the eigenspace for all faces in the training database
- Project each face into the eigenspace → feature reduction
Classification:
- Project new face into eigenspace
- Nearest neighbor in the eigenspace
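A minimal sketch of this pipeline (matrix shapes, function names, and the plain eigendecomposition are illustrative choices, not the slides' code):

```python
import numpy as np

def train_eigenspace(faces, p):
    # faces: N x M matrix, one vectorized face image per column.
    mean = faces.mean(axis=1, keepdims=True)
    Xc = faces - mean
    # Eigenvectors of the N x N covariance; for large N the usual
    # shortcut is the M x M "snapshot" trick on Xc.T @ Xc.
    C = (Xc @ Xc.T) / Xc.shape[1]
    U = np.linalg.eigh(C)[1][:, -p:]      # top-p eigenvectors
    coeffs = U.T @ Xc                     # p x M training features
    return mean, U, coeffs

def classify(face, mean, U, coeffs, labels):
    y = U.T @ (face.reshape(-1, 1) - mean)   # project new face
    d = np.linalg.norm(coeffs - y, axis=0)   # distances in eigenspace
    return labels[int(np.argmin(d))]         # nearest neighbor
```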
Feature reduction/extraction
[Figure: original face image and its reconstruction with 20 principal components; face image from http://www.nist.gov/]
Generative model

$$\mathbf{x} = A\mathbf{s}, \quad A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,N} \\ \vdots & \ddots & \vdots \\ a_{N,1} & \cdots & a_{N,N} \end{pmatrix} \; \text{unknown mixing matrix}, \quad \mathbf{a}_i = (a_{1,i}, \ldots, a_{N,i})^T$$
[Figure: three source signals $s_1, s_2, s_3$ and their observed mixtures $x_1, x_2, x_3$]

Blind source separation: separate the three original signals given only the observed mixtures.
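A minimal sketch of the generative model (the source signals and the mixing matrix are invented for illustration):

```python
import numpy as np

t = np.linspace(0, 1, 1000)
# Three made-up source signals s_1, s_2, s_3.
S = np.vstack([np.sin(2 * np.pi * 5 * t),            # sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t)),   # square wave
               2 * (t % 0.25) - 0.25])               # sawtooth

A = np.array([[0.8, 0.3, 0.2],    # "unknown" mixing matrix (made up here)
              [0.2, 0.7, 0.4],
              [0.3, 0.3, 0.9]])

X = A @ S   # observed mixtures x_1, x_2, x_3; BSS must recover S from X alone
```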
Restrictions
The sources must be statistically independent: for all functions $g_1, \ldots, g_N$,

$$E\{g_1(s_1)\,g_2(s_2)\cdots g_N(s_N)\} = E\{g_1(s_1)\}\,E\{g_2(s_2)\}\cdots E\{g_N(s_N)\}$$

Independence implies uncorrelatedness: for $i \neq n$,

$$E\{(s_i - \mu_i)(s_n - \mu_n)\} = E\{s_i - \mu_i\}\,E\{s_n - \mu_n\} = 0$$
Ambiguities

1.) Scale

$$\mathbf{x} = \sum_{i=1}^{N} \mathbf{a}_i s_i = \sum_{i=1}^{N} \Big(\tfrac{1}{\alpha_i}\mathbf{a}_i\Big)(\alpha_i s_i)$$

Equivalently, $\mathbf{x} = A\mathbf{s} = (AS^{-1})(S\mathbf{s})$ for any invertible diagonal scaling matrix $S$, so the sources are only defined up to scale. Reduce the ambiguity by enforcing $E\{s_i^2\} = 1$.

2.) Order

We cannot determine an order of the independent components.
Computing ICA

a) Minimizing mutual information:

$$\hat{\mathbf{s}} = \hat{A}^{-1}\mathbf{x}$$

Mutual information: $I(\hat{\mathbf{s}}) = \sum_{i=1}^{N} H(\hat{s}_i) - H(\hat{\mathbf{s}})$

$I(\hat{\mathbf{s}}) \geq 0$, with equality exactly when the components $\hat{s}_i$ are independent, so the demixing matrix $\hat{A}^{-1}$ is chosen to minimize $I(\hat{\mathbf{s}})$.
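As a rough illustration (not from the slides), mutual information can be estimated with plug-in histogram entropies: near zero for independent components, clearly positive after mixing. The data and bin count here are invented:

```python
import numpy as np

def entropy_hist(samples, bins=30):
    # Plug-in entropy (nats) of the discretized variables.
    counts, _ = np.histogramdd(samples, bins=bins)
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(10000, 2))          # independent components
X = S @ np.array([[1.0, 0.8], [0.2, 1.0]]).T     # mixed components

for name, D in [("independent", S), ("mixed", X)]:
    marg = sum(entropy_hist(D[:, [i]]) for i in range(2))
    joint = entropy_hist(D)
    print(name, "I ~", marg - joint)             # near 0 vs. clearly > 0
```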
b) Maximizing nongaussianity:

$$\mathbf{s} = A^{-1}\mathbf{x}$$

Introduce $y$ and $\mathbf{b}$: $y = \mathbf{b}^T\mathbf{x} = \mathbf{b}^T A\mathbf{s} = \mathbf{q}^T\mathbf{s}$

From the central limit theorem: $y = \mathbf{q}^T\mathbf{s}$ is more Gaussian than any of the $s_i$ and becomes least Gaussian if $y = s_i$.

Iteratively modify $\mathbf{b}^T$ such that the "Gaussianity" of $y$ is minimized. When a local minimum is reached, $\mathbf{b}^T$ is a row vector of $A^{-1}$.
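A common concrete instance of this scheme is the one-unit, kurtosis-based fixed-point iteration (the FastICA rule) applied to whitened data; here kurtosis stands in for "Gaussianity", and the sources and mixing matrix are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nongaussian (uniform) unit-variance sources, mixed by a made-up A.
n = 10000
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Whiten the mixtures: Z = V X with E{z z^T} = I.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
V = E @ np.diag(d ** -0.5) @ E.T
Z = V @ X

# One-unit kurtosis fixed point: w <- E{z (w^T z)^3} - 3w, renormalize.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(200):
    y = w @ Z
    w_new = (Z * y ** 3).mean(axis=1) - 3.0 * w
    w_new /= np.linalg.norm(w_new)
    if abs(w_new @ w) > 1.0 - 1e-12:   # converged (up to sign)
        w = w_new
        break
    w = w_new

# w @ Z recovers one source up to sign/scale; w @ V corresponds to
# a row of A^{-1} after the whitening change of coordinates.
print(np.corrcoef(w @ Z, S[0])[0, 1], np.corrcoef(w @ Z, S[1])[0, 1])
```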
PCA Applied to Faces

$$X = \begin{pmatrix} x_{1,1} & \cdots & x_{1,M} \\ \vdots & & \vdots \\ x_{N,1} & \cdots & x_{N,M} \end{pmatrix}, \quad \text{columns } \mathbf{x}_1, \ldots, \mathbf{x}_M \text{ are the vectorized face images}$$

Each pixel is a feature; each face image is a point in the feature space. The dimension of the feature vector is given by the size of the image.

The $\mathbf{u}_i$ are the eigenvectors, which can be represented as pixel images in the original coordinate system $x_1 \ldots x_N$.

[Figure: eigenvectors $\mathbf{u}_1, \mathbf{u}_2$ drawn in the pixel coordinate system $x_1, x_2, x_3$]
ICA Applied to Faces
$$X = \begin{pmatrix} x_{1,1} & \cdots & x_{N,1} \\ \vdots & & \vdots \\ x_{1,M} & \cdots & x_{N,M} \end{pmatrix}, \quad \text{each row is one vectorized face image}$$

Here the data matrix is the transpose of the PCA case: the face images are the rows, and each image is modeled as a mixture of independent source images $s_1, s_2, \ldots$
Image removed due to copyright considerations. See Figure 1 in: Baek, Kyungim, et al. "PCA vs. ICA: A Comparison on the FERET Data Set." International Conference on Computer Vision, Pattern Recognition, and Image Processing, in conjunction with the 6th JCIS, Durham, NC, March 8-14, 2002.