Factor Analysis
From PCA to factor analysis
First story: Measurement error
Suppose the numbers we record in our X matrix aren’t accurate.
[That is, there is measurement error in the explanatory variables]
This is tricky: we want to keep the signal variance, but would like
to get rid of the noise variance.
First story: Measurement error
$$X_j = \underbrace{\mu_j}_{\text{signal}} + \underbrace{\epsilon_j}_{\text{noise}}$$
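To make the variance decomposition concrete, here is a minimal R sketch (the signal and noise scales here are made up for illustration):

set.seed(1)
n = 1000
mu = rnorm(n, sd = 2)     # signal: variance 4
eps = rnorm(n, sd = 1)    # measurement noise: variance 1
x = mu + eps              # what actually lands in the X matrix
var(x)                    # approximately 4 + 1 = 5
var(mu) / var(x)          # the fraction of variance that is signal (~0.8)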
Second story: Maintaining correlation
Example of a latent variable
Suppose I record all the driving routes you take from your house to
another location and then back to your house.
(Imagine the destination location isn’t recorded.)
Suppose one group of routes all ended near a grocery store. Then I
could say there is a latent variable behind some of your movements:
“going to get food/supplies”.
Roots of factor analysis: Causal discovery
Spearman’s model was
$$X_{ij} = G_i W_j + \epsilon_{ij}$$
where
• i indexes children in the study
• j indexes school subjects
• $\epsilon_{ij}$ is uncorrelated noise
• $G_i$ is the i-th child’s intelligence value. Think of G as a function evaluated at that person, giving her some amount of intelligence
• $W_j$ is the amount subject j is influenced by G.
The idea: given a value of intelligence $G_i$, a person’s grades are uncorrelated and merely a (random) function of how much intelligence affects achievement in that subject.
Roots of factor analysis: Causal discovery
Spearman and other researchers eventually decided that one factor
wasn’t enough. Thus factor analysis was born.
Factor analysis (FA)
The model is
$$X_{ij} = \sum_{k=1}^{K} F_{ik} W_{kj} + \epsilon_{ij}$$
where
• the factors $F_k$ are mean zero and unit variance
• the $\epsilon_{ij}$ are uncorrelated with each other and with the factors $F_k$
• the $X_{\cdot j}$ are mean zero
(That is, the covariates have already had their means subtracted off: $X - \bar{X}$)
Factor analysis (FA)
$$X_{ij} = \sum_{k=1}^{K} F_{ik} W_{kj} + \epsilon_{ij} \quad\Longleftrightarrow\quad X = FW + \epsilon$$
Some terminology:
• The $F_{ik}$ are the factor scores of the i-th observation on the k-th factor
• The $W_{kj}$ are the factor loadings of the j-th explanatory variable on the k-th factor
There’s a problem...
Reminder: we call a matrix O orthogonal if its inverse is its transpose:
$$O^\top O = I = OO^\top$$
[Think about the SVD, where $X = UDV^\top$ and $V^\top V = I$.]
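Here is the problem in a small R sketch: rotating the scores by any orthogonal O and the loadings by its transpose reproduces the data exactly, so W is only identified up to rotation. (The dimensions and rotation angle are arbitrary choices for illustration.)

set.seed(1)
n = 100; K = 2; p = 6
Fmat = matrix(rnorm(n * K), n, K)   # factor scores F
W = matrix(rnorm(K * p), K, p)      # loadings W
theta = pi / 6                      # an arbitrary rotation angle
O = matrix(c(cos(theta), sin(theta),
             -sin(theta), cos(theta)), 2, 2)
# F W and (F O)(O^T W) are identical, so the data can't tell them apart
max(abs(Fmat %*% W - (Fmat %*% O) %*% (t(O) %*% W)))   # ~ 1e-16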
Factor analysis model
$$\sigma_j^2 = \operatorname{Var}(X_{ij}) = \underbrace{\sum_{k=1}^{K} W_{kj}^2}_{\text{Factors' variance}} + \underbrace{\psi_j}_{\text{Error variance}}$$
Factor analysis model
Note that additionally,
$$\sigma_{jl} = \operatorname{Cov}(X_{ij}, X_{il}) = \sum_{k=1}^{K} W_{kj} W_{kl}$$
This implies
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_p^2 \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^K W_{k1}^2 + \psi_1 & \cdots & \sum_{k=1}^K W_{k1} W_{kp} \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^K W_{kp} W_{k1} & \cdots & \sum_{k=1}^K W_{kp}^2 + \psi_p \end{pmatrix} = W^\top W + \Psi$$
That is,
$$\Sigma = W^\top W + \Psi$$
(One note: we never observe Σ. We need to estimate it with $\hat{\Sigma} = \frac{1}{n} X^\top X$.)
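A quick simulation check of this identity (the dimensions and parameter values below are made up): generate data from the factor model and compare the sample covariance to $W^\top W + \Psi$.

set.seed(1)
n = 1e5; K = 2; p = 4
W = matrix(rnorm(K * p), K, p)          # K x p loadings
Psi = diag(runif(p, 0.5, 1))            # diagonal error variances
Fmat = matrix(rnorm(n * K), n, K)       # mean-zero, unit-variance factors
eps = matrix(rnorm(n * p), n, p) %*% sqrt(Psi)   # noise with variances Psi
Xsim = Fmat %*% W + eps                 # X = FW + epsilon
max(abs(cov(Xsim) - (t(W) %*% W + Psi)))   # small, and shrinks as n grows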
Find Ψ & W : Maximum likelihood
In order to get at the W, we need a good estimate of Ψ. If we additionally assume the factors and the noise are Gaussian, the model implies
$$X_{i\cdot} \overset{\text{i.i.d.}}{\sim} N(0, \Psi + W^\top W)$$
Hence, the closer $\Psi + W^\top W$ is to $\hat{\Sigma}$, the higher the likelihood.
(The details of the maximization are tedious and we won’t discuss them here.)
(Note that implicit in ML is a likelihood ratio test for K. We will discuss this later when covering methods for choosing K.)
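As a usage sketch, base R’s factanal fits exactly this Gaussian likelihood (assuming a data matrix X); it also prints the likelihood ratio test for K mentioned above.

out.ml = factanal(X, factors = 2, rotation = "none")
out.ml$loadings       # the ML estimate of W
out.ml$uniquenesses   # the ML estimate of the diagonal of Psi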
Find Ψ & W : Principal factor analysis
Principal factor analysis (PA) has strong ties to PCA. The difference is the iteration over $\hat{\Psi}$ (a code sketch follows the list):
1. Guess a $\hat{\Psi}$
2. Form the reduced covariance matrix $\hat{\Sigma}^* = \hat{\Sigma} - \hat{\Psi}$. Hence, the diagonals are the communalities we wish to estimate.
(Remember, we are trying to solve $\hat{\Sigma} = W^\top W + \Psi$)
3. Compute the eigendecomposition of $\hat{\Sigma}^*$
4. Form $\hat{W}$ from the top K eigenvectors, scaled by the square roots of their eigenvalues (just as in PCA)
5. Re-estimate $\hat{\Psi} = \hat{\Sigma} - \hat{W}^\top \hat{W}$
6. Keep returning to step 2 until $\hat{\Psi}$ and $\hat{W}$ don’t change much between cycles
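Here is a minimal sketch of that loop, assuming a correlation matrix Sigma.hat and a chosen K. (The half-the-diagonal starting guess is one common convention, not the only one.)

pa_iterate = function(Sigma.hat, K, maxit = 100, tol = 1e-6) {
  Psi.hat = diag(diag(Sigma.hat) / 2)           # step 1: initial guess
  for (it in 1:maxit) {
    Sigma.star = Sigma.hat - Psi.hat            # step 2: reduced covariance
    eig = eigen(Sigma.star, symmetric = TRUE)   # step 3: eigendecomposition
    # step 4: top-K eigenvectors scaled by root eigenvalues, so that
    # t(W.hat) %*% W.hat is the best rank-K approximation to Sigma.star
    W.hat = t(eig$vectors[, 1:K, drop = FALSE] %*%
                diag(sqrt(pmax(eig$values[1:K], 0)), K))
    Psi.new = diag(diag(Sigma.hat - t(W.hat) %*% W.hat))      # step 5
    if (max(abs(diag(Psi.new) - diag(Psi.hat))) < tol) break  # step 6
    Psi.hat = Psi.new
  }
  list(W = W.hat, Psi = Psi.hat)
}

On the grades example below, pa_iterate(Xcorr, 2)$W should be close (up to sign) to the loadings psych reports with fm = "pa".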
What about estimating F ?
In PCA, arguably the most important quantities were the principal
component scores given by UD.
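For FA, a usage sketch: if you hand psych::fa the raw data matrix (rather than just a correlation matrix), it also returns estimated factor scores, regression-based by default.

out.fa = fa(X, nfactors = 2, rotate = "none", fm = "pa",
            scores = "regression")
head(out.fa$scores)   # the estimated F: one row per observation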
Uses of factor analysis
Usually, you split your data into two parts, run exploratory factor analysis (EFA) on one part, and test your conclusions using confirmatory factor analysis (CFA) on the other part.
Factor analysis in R
Like usual, someone has done the heavy lifting for you and made a
nice R package called psych
library(psych)
Xcorr = cor(X)                              # work with the correlation matrix
out = fa(r = Xcorr, nfactors = nfactors,    # nfactors: the chosen K
         rotate = "none", fm = "pa")        # fm = "pa": principal factor
The fm (method) parameter in fa
My recommendation:
Let’s look at an example
library(psych)
X = read.table("../data/gradesFA.txt", header = TRUE)
Xcorr = cor(X)
> round(Xcorr,2)
BIO GEO CHEM ALG CALC STAT
BIO 1.00 0.66 0.74 -0.02 0.27 0.09
GEO 0.66 1.00 0.53 0.10 0.40 0.16
CHEM 0.74 0.53 1.00 0.00 0.10 -0.13
ALG -0.02 0.10 0.00 1.00 0.42 0.32
CALC 0.27 0.40 0.10 0.42 1.00 0.72
STAT 0.09 0.16 -0.13 0.32 0.72 1.00
Alternate visualizations
[Figure: left, the scatterplot matrix of the six courses from pairs(X); right, the correlation plot of Xcorr on a colour scale from −1 to 1.]
##Left plot
pairs(X)
##Right plot
cor.plot(Xcorr) #in psych package
Let’s look at an example (Output)
out.fa = fa(X, nfactors = 2, rotate = "none", fm = "pa")
> out.fa
Factor Analysis using method = pa
[omitted]
Standardized loadings
PA1 PA2 h2 u2
BIO 0.81 -0.46 0.86 0.142
GEO 0.70 -0.19 0.53 0.470
CHEM 0.61 -0.54 0.67 0.330
ALG 0.24 0.36 0.19 0.815
CALC 0.73 0.66 0.98 0.023
STAT 0.42 0.63 0.57 0.429
Here K = 2, and
• The ‘Standardized loadings’ are the entries in W
• h2 are the communalities ($\sum_{k=1}^{K} W_{kj}^2$)
• u2 are the specific variances ($\hat{\psi}_j$)
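These columns can be checked directly from the loadings (a sketch; since we factored a correlation matrix, h2 + u2 = 1 for each variable):

L = unclass(out.fa$loadings)   # p x K matrix of standardized loadings
rowSums(L^2)                   # the communalities (the h2 column)
1 - rowSums(L^2)               # the specific variances (the u2 column)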
Let’s look at an example (Output continued)
PA1 PA2
SS loadings 2.28 1.51
Proportion Var 0.38 0.25
Cumulative Var 0.38 0.63
[omitted]
These are
• SS loadings: The sum of the squared loadings on each factor
• Proportion Var: The amount of total variation explained by
each factor
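These are also easy to recompute from the loadings matrix L defined in the previous sketch:

colSums(L^2)                    # SS loadings, one per factor
colSums(L^2) / ncol(X)          # Proportion Var (p = 6 variables here)
cumsum(colSums(L^2) / ncol(X))  # Cumulative Var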
Return to the loadings table
Standardized loadings
PA1 PA2 h2 u2
BIO 0.81 -0.46 0.86 0.142
GEO 0.70 -0.19 0.53 0.470
CHEM 0.61 -0.54 0.67 0.330
ALG 0.24 0.36 0.19 0.815
CALC 0.73 0.66 0.98 0.023
STAT 0.42 0.63 0.57 0.429
Graphical representation
[Figure: the standardized loadings from the table plotted in the (Factor 1, Factor 2) plane, one point per course.]
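One way to draw this picture yourself (a sketch, reusing L from above):

plot(L[, 1], L[, 2], type = "n", xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "Factor 1", ylab = "Factor 2")
text(L[, 1], L[, 2], labels = rownames(L))   # label each course
abline(h = 0, v = 0, lty = 2)                # reference axes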
Example rotations
Remember: the O matrix was arbitrary!
[Figure: the loadings plotted in the (Factor 1, Factor 2) plane under several example rotations: the unrotated solution (‘none’) in black, plus non-orthogonal rotations I made up in red.]
Automated rotations?
Any solution we estimate is just one of an infinite number of
solutions
(Given by all possible rotations $W' = O^\top W$)
Varimax and Oblimin rotation
Varimax: finds an O that maximizes the variance of the diagonals of $W^\top W$ while keeping the latent factors uncorrelated.
Oblimin: an oblique rotation, i.e. one that allows the latent factors to become correlated.
[Figure: four panels plotting the loadings in the (Factor 1, Factor 2) plane. Black: ‘none’; red: a non-orthogonal rotation I made up; green: varimax rotation; blue: oblimin oblique rotation. (The oblique rotations need the GPArotation package.)]
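As a usage sketch, the same fit with the automated rotations (oblimin requires GPArotation):

library(GPArotation)
out.vm = fa(X, nfactors = 2, rotate = "varimax", fm = "pa")
out.ob = fa(X, nfactors = 2, rotate = "oblimin", fm = "pa")
out.vm$loadings   # orthogonal rotation: factors stay uncorrelated
out.ob$loadings   # oblique rotation: factors may correlate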
Comparison of factor solutions
[Figure: the loadings in the (Factor 1, Factor 2) plane for the different factor solutions, shown side by side.]
Which rotation?
How to choose K (the number of factors)?
Scree plot
Scree results on grades data
[Figure: scree plot of the eigenvalues of Xcorr against the number of factors (1 through 6); the eigenvalues drop off sharply after the second.]
The idea is just to pick a threshold (very often 60%), and keep all
the factors needed to explain that much variance
p = ncol(X)        # p = the number of variables (here 6)
ev = eigen(Xcorr)
> ev$values/p
[1] 0.46445471 0.29629698 0.10507729 0.05610791 0.04368477 0.03437835
> cumsum(ev$values/p)
[1] 0.4644547 0.7607517 0.8658290 0.9219369 0.9656217 1.0000000
Pick 2 factors: the first two already explain 76% of the variance.
(Horn’s) parallel analysis
A Monte-Carlo based simulation method that compares the
observed eigenvalues with those obtained from uncorrelated normal
variables.
library(nFactors)
# parallel() simulates correlation matrices from the null distribution
ap = parallel(subject = nrow(X), var = ncol(X),
              rep = 100, cent = 0.05)
ns = nScree(ev$values, ap$eigen$qevpea)
plotnScree(ns)
(Horn’s) parallel analysis
If we type
ns = nScree(ev$values,ap$eigen$qevpea)
ns
> ns
noc naf nparallel nkaiser
2 2 2 2
(The columns are the number of factors chosen by optimal coordinates, the acceleration factor, parallel analysis, and the Kaiser rule, respectively.)
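To see what parallel analysis is doing under the hood, here is a hand-rolled sketch (100 replications and a 95th-percentile cutoff are conventional but arbitrary choices):

set.seed(1)
null.eigs = replicate(100, {
  Z = matrix(rnorm(nrow(X) * ncol(X)), nrow(X), ncol(X))  # uncorrelated normals
  eigen(cor(Z))$values
})
cutoff = apply(null.eigs, 1, quantile, probs = 0.95)  # null eigenvalue quantiles
keep = ev$values > cutoff    # does each observed eigenvalue beat the null?
sum(cumprod(keep))           # number of leading factors retained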
Parallel analysis
[Figure: “Non Graphical Solutions to Scree Test”: the eigenvalues plotted against the number of components, with each rule’s choice marked: Eigenvalues > mean (n = 2), Parallel Analysis (n = 2), Optimal Coordinates (n = 2), Acceleration Factor (n = 2).]
Figure note: there is an error in this package’s code for computing the AF; it ignores the fact that the AF is undefined for 1 factor, so the legend should say n = 3 for the AF.