Canonical Correlation Analysis
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)
Updated 16-Mar-2017
Nathaniel E. Helwig (U of Minnesota) Canonical Correlation Analysis Updated 16-Mar-2017 : Slide 1
Copyright
© 2017 by Nathaniel E. Helwig
Outline of Notes
1) Canonical Correlations
   Overview
   Population Definition
   Sample Estimates
   Large Samples
2) Decathlon Example
   Data Overview
   Two Sets of Variables
   CCA of Unstandardized Data
   CCA of Standardized Data
Canonical Correlations
Canonical Correlations Overview
Purpose of Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) connects two sets of variables
by finding linear combinations of variables that maximally correlate.
There are two typical purposes of CCA:
1. Data reduction: explain covariation between two sets of variables
   using a small number of linear combinations
2. Data interpretation: find features (i.e., canonical variates) that are
   important for explaining covariation between sets of variables
Canonical Correlations Population Definition
Linear Combinations of Two Sets of Variables
Let $X = (X_1, \ldots, X_p)'$ and $Y = (Y_1, \ldots, Y_q)'$ denote random vectors
with mean vectors $\mu_X$ and $\mu_Y$ and covariance matrices $\Sigma_X$ and $\Sigma_Y$.
Let $Z' = (X', Y')$ and note that $Z \sim (\mu, \Sigma)$ where $\mu' = (\mu_X', \mu_Y')$ and
$$\Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}$$
where $\Sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)']$ is the covariance between X and Y.
Define new variables U and V via linear combinations of X and Y:
$$U = a'X, \qquad V = b'Y$$
Canonical Correlations Population Definition
Defining Canonical Variates (and Correlations)
Note that $U = a'X$ and $V = b'Y$ have properties
$$\mathrm{Var}(U) = a'\Sigma_X a$$
$$\mathrm{Var}(V) = b'\Sigma_Y b$$
$$\mathrm{Cov}(U, V) = a'\Sigma_{XY} b$$
The first pair of canonical variates $(U_1, V_1)$ is defined via the pair of
linear combination vectors $\{a_1, b_1\}$ that maximize
$$\mathrm{Cor}(U, V) = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)}\sqrt{\mathrm{Var}(V)}} = \frac{a'\Sigma_{XY} b}{\sqrt{a'\Sigma_X a}\,\sqrt{b'\Sigma_Y b}}$$
subject to $U_1$ and $V_1$ having unit variance.
Remaining canonical variates $(U_\ell, V_\ell)$ maximize the above subject to
having unit variance and being uncorrelated with $(U_k, V_k)$ for all $k < \ell$.
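The unit-variance constraint is needed because the correlation does not change when a and b are rescaled by positive constants. A minimal R sketch of this point, assuming simulated data and using sample covariances as stand-ins for $\Sigma_X$, $\Sigma_Y$, and $\Sigma_{XY}$ (the data, the vectors `a` and `b`, and the helper `cor_uv` are all hypothetical):

```r
# Simulated data (an assumption for illustration; not the decathlon data)
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 2), n, 2)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n), rnorm(n))
Sx <- cov(X); Sy <- cov(Y); Sxy <- cov(X, Y)

# Correlation between U = a'X and V = b'Y computed from the covariance blocks
cor_uv <- function(a, b) {
  num <- drop(t(a) %*% Sxy %*% b)
  den <- sqrt(drop(t(a) %*% Sx %*% a)) * sqrt(drop(t(b) %*% Sy %*% b))
  num / den
}

a <- c(1, -0.5)   # arbitrary (hypothetical) weight vectors
b <- c(0.3, 1, 2)
r_formula  <- cor_uv(a, b)
r_rescaled <- cor_uv(10 * a, 0.1 * b)             # positive rescaling of a and b
r_direct   <- cor(drop(X %*% a), drop(Y %*% b))   # correlation of the scores
```

All three quantities should agree up to rounding, which is why some normalization of a and b has to be fixed before the maximizer is well defined.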
Canonical Correlations Population Definition
Computing Canonical Variates (and Correlations)
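The k-th pair of canonical variates is given by
$$U_k = \underbrace{u_k' \Sigma_X^{-1/2}}_{a_k'} X \quad \text{and} \quad V_k = \underbrace{v_k' \Sigma_Y^{-1/2}}_{b_k'} Y$$
where
$u_k$ is the k-th eigenvector of $\Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX} \Sigma_X^{-1/2}$
$v_k$ is the k-th eigenvector of $\Sigma_Y^{-1/2} \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY} \Sigma_Y^{-1/2}$
The k-th canonical correlation is given by
$$\mathrm{Cor}(U_k, V_k) = \rho_k$$
where $\rho_k^2$ is the k-th eigenvalue of $\Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX} \Sigma_X^{-1/2}$
[$\rho_k^2$ is also the k-th eigenvalue of $\Sigma_Y^{-1/2} \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY} \Sigma_Y^{-1/2}$]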
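The eigen-decomposition above can be sketched in R for a small, made-up population covariance matrix. The blocks `SigX`, `SigY`, `SigXY` and the helper `inv_sqrt` are hypothetical and chosen only so that the full $\Sigma$ is positive definite:

```r
# Hypothetical population covariance blocks (p = 2, q = 3)
SigX  <- matrix(c(1, 0.4, 0.4, 1), 2, 2)
SigY  <- diag(3)
SigXY <- matrix(c(0.5, 0.1, 0.2, 0.3, 0.0, 0.1), 2, 3)

# Symmetric inverse square root via the eigen-decomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

SXi <- inv_sqrt(SigX)
SYi <- inv_sqrt(SigY)
Mx <- SXi %*% SigXY %*% solve(SigY) %*% t(SigXY) %*% SXi
My <- SYi %*% t(SigXY) %*% solve(SigX) %*% SigXY %*% SYi
ex <- eigen(Mx, symmetric = TRUE)
ey <- eigen(My, symmetric = TRUE)

rho <- sqrt(ex$values)            # canonical correlations
A <- SXi %*% ex$vectors           # columns are a_k
B <- SYi %*% ey$vectors[, 1:2]    # first p columns are b_k

# Checks: a_k' SigX a_k = 1, and |a_k' SigXY b_k| = rho_k (sign is arbitrary)
unit_var <- diag(t(A) %*% SigX %*% A)
cross    <- abs(diag(t(A) %*% SigXY %*% B))
```

Because eigenvectors are only defined up to sign, `cross` is compared with `rho` in absolute value.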
Canonical Correlations Population Definition
Covariance of Original and Canonical Variables
$U = A'X$ and $V = B'Y$ where $A = [a_1, \ldots, a_p]$ and $B = [b_1, \ldots, b_q]$.
$U = (U_1, \ldots, U_p)'$ contains the p canonical variates from X
$V = (V_1, \ldots, V_q)'$ contains the q canonical variates from Y
If $p \le q$, we are interested in the first p canonical variates from Y
The canonical variates and original variables have covariance matrices
$$\mathrm{Cov}(U, X) = \mathrm{Cov}(A'X, X) = A'\Sigma_X$$
$$\mathrm{Cov}(U, Y) = \mathrm{Cov}(A'X, Y) = A'\Sigma_{XY}$$
$$\mathrm{Cov}(V, X) = \mathrm{Cov}(B'Y, X) = B'\Sigma_{YX}$$
$$\mathrm{Cov}(V, Y) = \mathrm{Cov}(B'Y, Y) = B'\Sigma_Y$$
Canonical Correlations Population Definition
Correlation of Original and Canonical Variables
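The canonical variates and original variables have correlation matrices
$$\mathrm{Cor}(U, X) = \mathrm{Cov}(A'X, \tilde{\Sigma}_X^{-1/2} X) = A'\Sigma_X \tilde{\Sigma}_X^{-1/2}$$
$$\mathrm{Cor}(U, Y) = \mathrm{Cov}(A'X, \tilde{\Sigma}_Y^{-1/2} Y) = A'\Sigma_{XY} \tilde{\Sigma}_Y^{-1/2}$$
$$\mathrm{Cor}(V, X) = \mathrm{Cov}(B'Y, \tilde{\Sigma}_X^{-1/2} X) = B'\Sigma_{YX} \tilde{\Sigma}_X^{-1/2}$$
$$\mathrm{Cor}(V, Y) = \mathrm{Cov}(B'Y, \tilde{\Sigma}_Y^{-1/2} Y) = B'\Sigma_Y \tilde{\Sigma}_Y^{-1/2}$$
given that $\mathrm{Var}(U_k) = \mathrm{Var}(V_\ell) = 1$ for all $k, \ell$.
$\tilde{\Sigma}_X = \mathrm{diag}(\Sigma_X)$ is a diagonal matrix containing X variances
$\tilde{\Sigma}_Y = \mathrm{diag}(\Sigma_Y)$ is a diagonal matrix containing Y variances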
Canonical Correlations Population Definition
Canonical Variates and Summarizing Variability
The linear transformations $U = A'X$ and $V = B'Y$ are defined to
maximize the correlations between the canonical variables.
This is not the same as maximizing the explained variance in $\Sigma_X$ or $\Sigma_Y$.
If the first few pairs of canonical variables do not explain the
variability in $\Sigma_X$ and $\Sigma_Y$ well, then the interpretation becomes less clear.
Canonical Correlations Sample Estimates
Moving to the Sample Situation
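Assume that $z_i = (x_i', y_i')' \overset{iid}{\sim} N(\mu, \Sigma)$ where
$$\mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}$$
and let the sample mean vector and covariance matrix be denoted by
$$\bar{z} = \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} \quad \text{and} \quad S = \begin{pmatrix} S_X & S_{XY} \\ S_{YX} & S_Y \end{pmatrix}$$
where
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \quad \text{and} \quad \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$$
$$S_X = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})'$$
$$S_Y = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})'$$
$$S_{XY} = S_{YX}' = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})'$$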
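As a quick check of these definitions, the blocks of S can be computed by hand and compared with R's `cov()`. This is a sketch on simulated data (the data and object names are assumptions for illustration):

```r
# Simulated data (hypothetical): n = 50 observations, p = 2, q = 3
set.seed(3)
n <- 50
X <- matrix(rnorm(n * 2), n, 2)
Y <- matrix(rnorm(n * 3), n, 3)

# Center each column at its sample mean
Xc <- sweep(X, 2, colMeans(X))
Yc <- sweep(Y, 2, colMeans(Y))

# Sample covariance blocks from the definitions (divisor n - 1)
SX  <- crossprod(Xc) / (n - 1)
SY  <- crossprod(Yc) / (n - 1)
SXY <- crossprod(Xc, Yc) / (n - 1)

# Joint covariance matrix of z = (x', y')'
S <- cov(cbind(X, Y))
```

The upper-right 2 x 3 block of `S` should match `SXY`, and `SX` and `SY` should match `cov(X)` and `cov(Y)`.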
Canonical Correlations Sample Estimates
Defining Canonical Variates (and Correlations)
Note that $U = a'X$ and $V = b'Y$ have sample properties
$$\widehat{\mathrm{Var}}(U) = a'S_X a$$
$$\widehat{\mathrm{Var}}(V) = b'S_Y b$$
$$\widehat{\mathrm{Cov}}(U, V) = a'S_{XY} b$$
The first pair of sample canonical variates $(U_1, V_1)$ is defined via the
pair of linear combination vectors $\{a_1, b_1\}$ that maximize
$$\widehat{\mathrm{Cor}}(U, V) = \frac{\widehat{\mathrm{Cov}}(U, V)}{\sqrt{\widehat{\mathrm{Var}}(U)}\sqrt{\widehat{\mathrm{Var}}(V)}} = \frac{a'S_{XY} b}{\sqrt{a'S_X a}\,\sqrt{b'S_Y b}}$$
subject to $U_1$ and $V_1$ having unit variance.
Remaining canonical variates $(U_\ell, V_\ell)$ maximize the above subject to
having unit variance and being uncorrelated with $(U_k, V_k)$ for all $k < \ell$.
Canonical Correlations Sample Estimates
Calculating Canonical Variates (and Correlations)
The sample estimate of the k-th pair of canonical variates is given by
$$\hat{U}_k = \underbrace{\hat{u}_k' S_X^{-1/2}}_{\hat{a}_k'} X \quad \text{and} \quad \hat{V}_k = \underbrace{\hat{v}_k' S_Y^{-1/2}}_{\hat{b}_k'} Y$$
where
$\hat{u}_k$ is the k-th eigenvector of $S_X^{-1/2} S_{XY} S_Y^{-1} S_{YX} S_X^{-1/2}$
$\hat{v}_k$ is the k-th eigenvector of $S_Y^{-1/2} S_{YX} S_X^{-1} S_{XY} S_Y^{-1/2}$
The sample estimate of the k-th canonical correlation is given by
$$\widehat{\mathrm{Cor}}(U_k, V_k) = \hat{\rho}_k$$
where $\hat{\rho}_k^2$ is the k-th eigenvalue of $S_X^{-1/2} S_{XY} S_Y^{-1} S_{YX} S_X^{-1/2}$
[$\hat{\rho}_k^2$ is also the k-th eigenvalue of $S_Y^{-1/2} S_{YX} S_X^{-1} S_{XY} S_Y^{-1/2}$]
Canonical Correlations Sample Estimates
Covariance of Original and Canonical Variables
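$\hat{U} = \hat{A}'X$ and $\hat{V} = \hat{B}'Y$ where $\hat{A} = [\hat{a}_1, \ldots, \hat{a}_p]$ and $\hat{B} = [\hat{b}_1, \ldots, \hat{b}_q]$.
$\hat{U} = (\hat{U}_1, \ldots, \hat{U}_p)'$ contains the p canonical variates from X
$\hat{V} = (\hat{V}_1, \ldots, \hat{V}_q)'$ contains the q canonical variates from Y
If $p \le q$, we are interested in the first p canonical variates from Y
The sample canonical variates and original variables have covariances
$$\widehat{\mathrm{Cov}}(\hat{U}, X) = \widehat{\mathrm{Cov}}(\hat{A}'X, X) = \hat{A}'S_X$$
$$\widehat{\mathrm{Cov}}(\hat{U}, Y) = \widehat{\mathrm{Cov}}(\hat{A}'X, Y) = \hat{A}'S_{XY}$$
$$\widehat{\mathrm{Cov}}(\hat{V}, X) = \widehat{\mathrm{Cov}}(\hat{B}'Y, X) = \hat{B}'S_{YX}$$
$$\widehat{\mathrm{Cov}}(\hat{V}, Y) = \widehat{\mathrm{Cov}}(\hat{B}'Y, Y) = \hat{B}'S_Y$$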
Canonical Correlations Sample Estimates
Correlation of Original and Canonical Variables
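The sample canonical variates and original variables have correlations
$$\widehat{\mathrm{Cor}}(\hat{U}, X) = \widehat{\mathrm{Cov}}(\hat{A}'X, \tilde{S}_X^{-1/2} X) = \hat{A}'S_X \tilde{S}_X^{-1/2}$$
$$\widehat{\mathrm{Cor}}(\hat{U}, Y) = \widehat{\mathrm{Cov}}(\hat{A}'X, \tilde{S}_Y^{-1/2} Y) = \hat{A}'S_{XY} \tilde{S}_Y^{-1/2}$$
$$\widehat{\mathrm{Cor}}(\hat{V}, X) = \widehat{\mathrm{Cov}}(\hat{B}'Y, \tilde{S}_X^{-1/2} X) = \hat{B}'S_{YX} \tilde{S}_X^{-1/2}$$
$$\widehat{\mathrm{Cor}}(\hat{V}, Y) = \widehat{\mathrm{Cov}}(\hat{B}'Y, \tilde{S}_Y^{-1/2} Y) = \hat{B}'S_Y \tilde{S}_Y^{-1/2}$$
given that $\mathrm{Var}(\hat{U}_k) = \mathrm{Var}(\hat{V}_\ell) = 1$ for all $k, \ell$.
$\tilde{S}_X = \mathrm{diag}(S_X)$ is a diagonal matrix containing X variances
$\tilde{S}_Y = \mathrm{diag}(S_Y)$ is a diagonal matrix containing Y variances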
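The first of these formulas can be checked numerically. The sketch below assumes simulated data and obtains $\hat{A}$ from `cancor()`, rescaled by `sqrt(n-1)` so that the canonical variates have unit sample variance (the rescaling is the one noted in the decathlon example of these notes):

```r
# Simulated data (hypothetical), with X columns on very different scales
set.seed(2)
n <- 100
X <- matrix(rnorm(n * 3), n, 3) %*% diag(c(1, 5, 0.2))
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + 3 * rnorm(n), rnorm(n), rnorm(n))

fit  <- cancor(X, Y)
Ahat <- fit$xcoef * sqrt(n - 1)   # coefficients giving unit-variance variates
U    <- X %*% Ahat                # canonical variables for the X set

Sx       <- cov(X)
Dx_isqrt <- diag(1 / sqrt(diag(Sx)))   # inverse square root of diag(S_X)

cor_formula <- t(Ahat) %*% Sx %*% Dx_isqrt   # A' S_X diag(S_X)^{-1/2}
cor_direct  <- cor(U, X)                     # direct sample correlations
```

Row k of `cor_formula` holds the correlations between the k-th canonical variate and each original X variable; these are often easier to interpret than the raw coefficients when the variables are on different scales.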
Canonical Correlations Sample Estimates
Covariance Matrix Implied by CCA for X
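Note that we have the following properties
$$\widehat{\mathrm{Cov}}(\hat{U}) = \hat{A}'S_X\hat{A} = I_p$$
This implies that we can write
$$\hat{A}'S_X\hat{A} = I_p$$
$$(\hat{A}')^{-1}\hat{A}'S_X\hat{A}(\hat{A}^{-1}) = (\hat{A}')^{-1}(\hat{A}^{-1})$$
$$S_X = (\hat{A}^{-1})'(\hat{A}^{-1}) = \sum_{j=1}^{p} (\hat{a}^{(j)})(\hat{a}^{(j)})'$$
where $\hat{a}^{(j)}$ denotes the j-th column of $(\hat{A}^{-1})'$.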
Canonical Correlations Sample Estimates
Covariance Matrix Implied by CCA for Y
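Note that we have the following properties
$$\widehat{\mathrm{Cov}}(\hat{V}) = \hat{B}'S_Y\hat{B} = I_q$$
This implies that we can write
$$\hat{B}'S_Y\hat{B} = I_q$$
$$(\hat{B}')^{-1}\hat{B}'S_Y\hat{B}(\hat{B}^{-1}) = (\hat{B}')^{-1}(\hat{B}^{-1})$$
$$S_Y = (\hat{B}^{-1})'(\hat{B}^{-1}) = \sum_{j=1}^{q} (\hat{b}^{(j)})(\hat{b}^{(j)})'$$
where $\hat{b}^{(j)}$ denotes the j-th column of $(\hat{B}^{-1})'$.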
Canonical Correlations Sample Estimates
Covariance Matrix Implied by CCA for (X , Y )
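Note that we have the following properties (assuming $p < q$)
$$\widehat{\mathrm{Cov}}(\hat{U}, \hat{V}) = \hat{A}'S_{XY}\hat{B} = \begin{pmatrix} \hat{\rho}_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \hat{\rho}_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \hat{\rho}_p & 0 & \cdots & 0 \end{pmatrix} = \hat{\rho}$$
This implies that we can write
$$\hat{A}'S_{XY}\hat{B} = \hat{\rho}$$
$$(\hat{A}')^{-1}\hat{A}'S_{XY}\hat{B}(\hat{B}^{-1}) = (\hat{A}')^{-1}\hat{\rho}(\hat{B}^{-1})$$
$$S_{XY} = (\hat{A}^{-1})'\hat{\rho}(\hat{B}^{-1}) = \sum_{j=1}^{p} \hat{\rho}_j (\hat{a}^{(j)})(\hat{b}^{(j)})'$$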
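The three implied decompositions of $S_X$, $S_Y$, and $S_{XY}$ can be verified term by term. The following is a sketch on simulated data, assuming $\hat{A}$ and $\hat{B}$ are taken from `cancor()` and rescaled by `sqrt(n-1)`; note that the j-th column of $(\hat{A}^{-1})'$ is the j-th row of $\hat{A}^{-1}$:

```r
# Simulated data (hypothetical): p = 2, q = 3
set.seed(4)
n <- 120; p <- 2; q <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n), rnorm(n))

fit  <- cancor(X, Y)
Ahat <- fit$xcoef * sqrt(n - 1)   # p x p
Bhat <- fit$ycoef * sqrt(n - 1)   # q x q
rho  <- fit$cor                   # p canonical correlations
Ainv <- solve(Ahat)
Binv <- solve(Bhat)

# Sums of outer products of the rows of Ainv and Binv
Sx_sum  <- Reduce(`+`, lapply(seq_len(p), function(j) outer(Ainv[j, ], Ainv[j, ])))
Sy_sum  <- Reduce(`+`, lapply(seq_len(q), function(j) outer(Binv[j, ], Binv[j, ])))
Sxy_sum <- Reduce(`+`, lapply(seq_len(p), function(j) rho[j] * outer(Ainv[j, ], Binv[j, ])))
```

Each sum should reproduce the corresponding sample covariance block (`cov(X)`, `cov(Y)`, `cov(X, Y)`) up to rounding error.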
Canonical Correlations Sample Estimates
CCA Error of Approximation Matrices
Using $r < p$ canonical variates, the approximation error matrices are
$$E_X = S_X - \sum_{j=1}^{r} (\hat{a}^{(j)})(\hat{a}^{(j)})' = \sum_{k=r+1}^{p} (\hat{a}^{(k)})(\hat{a}^{(k)})'$$
$$E_Y = S_Y - \sum_{j=1}^{r} (\hat{b}^{(j)})(\hat{b}^{(j)})' = \sum_{k=r+1}^{q} (\hat{b}^{(k)})(\hat{b}^{(k)})'$$
$$E_{XY} = S_{XY} - \sum_{j=1}^{r} \hat{\rho}_j (\hat{a}^{(j)})(\hat{b}^{(j)})' = \sum_{k=r+1}^{p} \hat{\rho}_k (\hat{a}^{(k)})(\hat{b}^{(k)})'$$
The error matrices provide a descriptive measure of how well the first r
pairs of canonical variates explain the covariation in the data.
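The two expressions for $E_X$ (the residual after removing the first r terms, and the sum of the remaining terms) can be compared directly. A minimal sketch on simulated data with r = 1, again assuming $\hat{A}$ comes from `cancor()` rescaled by `sqrt(n-1)`:

```r
# Simulated data (hypothetical): p = 3 X variables, keep r = 1 canonical variate
set.seed(7)
n <- 150; p <- 3; r <- 1
X <- matrix(rnorm(n * p), n, p)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n), rnorm(n), rnorm(n))

Ahat <- cancor(X, Y)$xcoef * sqrt(n - 1)
Ainv <- solve(Ahat)

# crossprod(M) = M'M is the sum of outer products of the rows of M
Ex_head <- cov(X) - crossprod(Ainv[1:r, , drop = FALSE])   # S_X minus first r terms
Ex_tail <- crossprod(Ainv[(r + 1):p, , drop = FALSE])      # remaining p - r terms

rmse_x <- sqrt(mean(Ex_head^2))   # a simple norm of the error matrix
```

The `drop = FALSE` keeps the row selection a matrix when r = 1, which `crossprod()` needs in order to form the intended outer product.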
Canonical Correlations Large Sample Inference
Likelihood Ratio Test: Is CCA Worthwhile?
Note that if $\Sigma_{XY} = 0_{p \times q}$, then $\mathrm{Cov}(U, V) = a'\Sigma_{XY} b = 0$ for all a and b.
This implies that all canonical correlations must be zero
Then there is no point in pursuing CCA
For large n, we reject $H_0: \Sigma_{XY} = 0_{p \times q}$ in favor of $H_1: \Sigma_{XY} \neq 0_{p \times q}$ if
$$-2\ln(\Lambda) = n \ln\left(\frac{|S_X||S_Y|}{|S|}\right) = -n \sum_{j=1}^{p} \ln(1 - \hat{\rho}_j^2)$$
is larger than $\chi^2_{pq}(\alpha)$.
For an improvement to the $\chi^2$ approximation, Bartlett suggested
replacing the scaling factor of n by $n - 1 - (1/2)(p + q + 1)$
$$-2\ln(\Lambda) \approx -[n - 1 - (1/2)(p + q + 1)] \sum_{j=1}^{p} \ln(1 - \hat{\rho}_j^2)$$
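The test can be sketched in R as follows. The data are simulated (an assumption for illustration), $\chi^2_{pq}(\alpha)$ is taken to be the upper $\alpha$ quantile, and `cancor()` supplies the sample canonical correlations:

```r
# Simulated data (hypothetical): p = 2, q = 3, with some X-Y dependence
set.seed(5)
n <- 80; p <- 2; q <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- cbind(X[, 1] + rnorm(n), rnorm(n), rnorm(n))

rho <- cancor(X, Y)$cor
S   <- cov(cbind(X, Y))

# Two equivalent forms of the likelihood ratio statistic
lrt_det <- n * log(det(cov(X)) * det(cov(Y)) / det(S))
lrt_rho <- -n * sum(log(1 - rho^2))

# Bartlett's modification of the scaling factor
bartlett <- -(n - 1 - (p + q + 1) / 2) * sum(log(1 - rho^2))

alpha <- 0.05
crit  <- qchisq(1 - alpha, df = p * q)                    # upper-alpha critical value
pval  <- pchisq(bartlett, df = p * q, lower.tail = FALSE)
reject <- bartlett > crit
```

Since Bartlett's factor is smaller than n, the corrected statistic is always smaller than the uncorrected one, making the test slightly more conservative in small samples.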
Decathlon Example
Decathlon Example Data Overview
Men’s Olympic Decathlon Data from 1988
Data from men’s 1988 Olympic decathlon
Total of n = 34 athletes
Have p = 10 variables giving score for each decathlon event
Have overall decathlon score also (score)
> decathlon[1:9,]
run100 long.jump shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk 11.25 7.43 15.48 2.27 48.90 15.13 49.28 4.7 61.32 268.95 8488
Voss 10.87 7.45 14.97 1.97 47.71 14.46 44.36 5.1 61.76 273.02 8399
Steen 11.18 7.44 14.20 1.97 48.29 14.81 43.66 5.2 64.16 263.20 8328
Thompson 10.62 7.38 15.02 2.03 49.06 14.72 44.80 4.9 64.04 285.11 8306
Blondel 11.02 7.43 12.92 1.97 47.44 14.40 41.20 5.2 57.46 256.64 8286
Plaziat 10.83 7.72 13.58 2.12 48.34 14.18 43.06 4.9 52.18 274.07 8272
Bright 11.18 7.05 14.12 2.06 49.34 14.39 41.68 5.7 61.60 291.20 8216
De.Wit 11.05 6.95 15.34 2.00 48.21 14.36 41.32 4.8 63.00 265.86 8189
Johnson 11.15 7.12 14.52 2.03 49.15 14.66 42.36 4.9 66.46 269.62 8180
Decathlon Example Data Overview
Reversing the Sign of Running Events
For the running events (run100, run400, run1500, and hurdle),
lower scores correspond to better performance, whereas higher scores
represent better performance for the other events.
To make interpretation simpler, we will reverse the sign of the running events:
> decathlon[,c(1,5,6,10)] <- (-1)*decathlon[,c(1,5,6,10)]
> decathlon[1:9,]
run100 long.jump shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk -11.25 7.43 15.48 2.27 -48.90 -15.13 49.28 4.7 61.32 -268.95 8488
Voss -10.87 7.45 14.97 1.97 -47.71 -14.46 44.36 5.1 61.76 -273.02 8399
Steen -11.18 7.44 14.20 1.97 -48.29 -14.81 43.66 5.2 64.16 -263.20 8328
Thompson -10.62 7.38 15.02 2.03 -49.06 -14.72 44.80 4.9 64.04 -285.11 8306
Blondel -11.02 7.43 12.92 1.97 -47.44 -14.40 41.20 5.2 57.46 -256.64 8286
Plaziat -10.83 7.72 13.58 2.12 -48.34 -14.18 43.06 4.9 52.18 -274.07 8272
Bright -11.18 7.05 14.12 2.06 -49.34 -14.39 41.68 5.7 61.60 -291.20 8216
De.Wit -11.05 6.95 15.34 2.00 -48.21 -14.36 41.32 4.8 63.00 -265.86 8189
Johnson -11.15 7.12 14.52 2.03 -49.15 -14.66 42.36 4.9 66.46 -269.62 8180
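Selecting the running events by column name rather than by position is a little more robust if the column order ever changes. A small sketch, using a toy data frame built from the first two rows shown earlier (the object `toy` and the vector `running` are hypothetical names):

```r
# Toy data frame with a subset of the decathlon columns (first two athletes)
toy <- data.frame(
  run100    = c(11.25, 10.87),
  long.jump = c(7.43, 7.45),
  run400    = c(48.90, 47.71),
  hurdle    = c(15.13, 14.46),
  run1500   = c(268.95, 273.02),
  row.names = c("Schenk", "Voss")
)

# Flip the sign of the timed events so that larger values mean better performance
running <- c("run100", "run400", "hurdle", "run1500")
toy[, running] <- -toy[, running]
```

Non-running columns such as `long.jump` are left untouched.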
Decathlon Example Two Sets of Variables
Split Events into Two Sets: Arms vs Legs
We will split the decathlon events into two different sets:
X : shot, discus, javelin, pole.vault
Y : run100, run400, run1500, hurdle, long.jump, high.jump
Note that X are “arm events” (throwing/vaulting), whereas Y are “leg
events” (running/jumping).
R code to split the decathlon data into two sets of events
> X <- as.matrix(decathlon[,c("shot", "discus", "javelin", "pole.vault")])
> Y <- as.matrix(decathlon[,c("run100", "run400", "run1500", "hurdle", "long.jump", "high.jump")])
> n <- nrow(X) # n = 34
> p <- ncol(X) # p = 4
> q <- ncol(Y) # q = 6
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
CCA of Decathlon Data in R
R code to conduct canonical correlation analysis
# canonical correlations of covariance (unstandardized data)
> cca <- cancor(X, Y)
# cca (the normal way)
> Sx <- cov(X)
> Sy <- cov(Y)
> Sxy <- cov(X,Y)
> Sxeig <- eigen(Sx, symmetric=TRUE)
> Sxisqrt <- Sxeig$vectors %*% diag(1/sqrt(Sxeig$values)) %*% t(Sxeig$vectors)
> Syeig <- eigen(Sy, symmetric=TRUE)
> Syisqrt <- Syeig$vectors %*% diag(1/sqrt(Syeig$values)) %*% t(Syeig$vectors)
> Xmat <- Sxisqrt %*% Sxy %*% solve(Sy) %*% t(Sxy) %*% Sxisqrt
> Ymat <- Syisqrt %*% t(Sxy) %*% solve(Sx) %*% Sxy %*% Syisqrt
> Xeig <- eigen(Xmat, symmetric=TRUE)
> Yeig <- eigen(Ymat, symmetric=TRUE)
# compare correlations (same)
> cca$cor
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> rho <- sqrt(Xeig$values)
> rho
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> sqrt(Yeig$values[1:p])
[1] 0.7702006 0.5033532 0.4184145 0.3052556
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
CCA of Decathlon Data in R (continued)
R code to compare the CCA coefficients:
# compare linear combinations (different!)
> Ahat <- Sxisqrt %*% Xeig$vectors
> Bhat <- Syisqrt %*% Yeig$vectors
> sum((cca$xcoef - Ahat)^2)
[1] 6.710414
> sum((cca$ycoef[,1:p] - Bhat[,1:p])^2)
[1] 42.98483
# NOTE: you need to multiply R’s xcoef and ycoef by sqrt(n-1)
# to obtain the results we are expecting...
# compare linear combinations (same!)
> Ahat <- Sxisqrt %*% Xeig$vectors
> Bhat <- Syisqrt %*% Yeig$vectors
> sum((cca$xcoef * sqrt(n-1) - Ahat)^2)
[1] 3.031301e-28
> sum((cca$ycoef[,1:p] * sqrt(n-1) - Bhat[,1:p])^2)
[1] 2.414499e-25
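The `sqrt(n-1)` factor is consistent with `cancor()` scaling its coefficients so that the centered canonical variables have unit sum of squares rather than unit sample variance. The check below, on simulated data (an assumption for illustration), suggests this is the case:

```r
# Simulated data (hypothetical)
set.seed(6)
n <- 40
X <- matrix(rnorm(n * 2), n, 2)
Y <- matrix(rnorm(n * 3), n, 3)
Y[, 1] <- Y[, 1] + X[, 1]

fit <- cancor(X, Y)
Xc  <- sweep(X, 2, fit$xcenter)   # center X at the means used by cancor

# Sum-of-squares matrix of the canonical variables with cancor's raw scaling
ss_raw <- crossprod(Xc %*% fit$xcoef)

# Sample covariance after multiplying the coefficients by sqrt(n - 1)
cov_scaled <- cov(X %*% (fit$xcoef * sqrt(n - 1)))
```

Both `ss_raw` and `cov_scaled` should be (numerically) the 2 x 2 identity matrix, so the raw coefficients differ from the unit-variance convention of these notes only by the constant `sqrt(n-1)`.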
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
Plot the CCA Coefficients
[Figure: two scatterplots of the CCA coefficients. Left panel, "X Coefficients": A2 Coefficients versus A1 Coefficients, with points labeled shot, discus, javelin, pole.vault. Right panel, "Y Coefficients": B2 Coefficients versus B1 Coefficients, with points labeled run100, run400, run1500, hurdle, long.jump, high.jump.]
R code for left plot:
plot(Ahat[,1:2], xlab="A1 Coefficients", ylab="A2 Coefficients",
type="n", main="X Coefficients", xlim=c(-2, 0.1), ylim=c(-1.1, 0.5))
text(Ahat[,1:2], labels=colnames(X))
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
Define the Canonical Variables
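If $X = \{x_{ij}\}_{n \times p}$ and $Y = \{y_{ij}\}_{n \times q}$, then
$\hat{U} = X\hat{A} = \{\hat{u}_{ij}\}_{n \times p}$ where columns of $\hat{U}$ contain the canonical
variables for the X set
$\hat{V} = Y\hat{B} = \{\hat{v}_{ij}\}_{n \times q}$ where columns of $\hat{V}$ contain the canonical
variables for the Y set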
R code to define canonical variables:
> U <- X %*% Ahat
> V <- Y %*% Bhat
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
Covariance Matrices of Canonical Variables
R code to check covariance matrices of the canonical variables:
# canonical variable covariances
> round(cov(U),4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
> round(cov(V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 0 0
[5,] 0 0 0 0 1 0
[6,] 0 0 0 0 0 1
> round(cov(U,V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.7702 0.0000 0.0000 0.0000 0 0
[2,] 0.0000 0.5034 0.0000 0.0000 0 0
[3,] 0.0000 0.0000 0.4184 0.0000 0 0
[4,] 0.0000 0.0000 0.0000 0.3053 0 0
> rho
[1] 0.7702006 0.5033532 0.4184145 0.3052556
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
Covariances of Canonical and Observed Variables
R code to check covariances of the canonical and observed variables:
# covariance of original and canonical variables (U and X)
> Ainv <- solve(Ahat)
> sum( ( cov(U, X) - crossprod(Ahat, Sx) )^2 )
[1] 3.396329e-30
> sum( ( Sx - crossprod(Ainv) )^2 )
[1] 4.364327e-27
# covariance of original and canonical variables (V and Y)
> Binv <- solve(Bhat)
> sum( ( cov(V, Y) - crossprod(Bhat, Sy) )^2 )
[1] 1.696269e-28
> sum( ( Sy - crossprod(Binv) )^2 )
[1] 3.027024e-26
# covariance of original and canonical variables (U and Y)
> sum( (cov(U, Y) - crossprod(Ahat, Sxy))^2 )
[1] 2.071712e-29
# covariance of original and canonical variables (V and X)
> sum( (cov(V, X) - crossprod(Bhat, t(Sxy)))^2 )
[1] 2.943246e-28
# covariance of canonical variables (U and V)
> rhomat <- cbind(diag(rho), matrix(0, p, q-p))
> sum( (cov(U, V) - rhomat)^2 )
[1] 1.241068e-27
> sum( (Sxy - crossprod(Ainv, rhomat) %*% Binv)^2 )
[1] 1.355523e-25
Decathlon Example Canonical Correlation Analysis of Unstandardized Data
Error of Approximation Matrices (r = 2)
R code to calculate error of approximation matrices with r = 2:
# error of approximation matrices (with r=2)
> Ainv <- solve(Ahat)
> Binv <- solve(Bhat)
> r <- 2
> Ex <- Sx - crossprod(Ainv[1:r,])
> Ey <- Sy - crossprod(Binv[1:r,])
> Exy <- Sxy - crossprod(diag(rho[1:r]) %*% Ainv[1:r,], Binv[1:r,])
# get norms of error matrices
> sqrt(mean(Ex^2))
[1] 6.561393
> sqrt(mean(Ey^2))
[1] 18.37339
> sqrt(mean(Exy^2))
[1] 1.725392
Decathlon Example Canonical Correlation Analysis of Standardized Data
CCA of Standardized Decathlon Data in R
R code to conduct standardized canonical correlation analysis
# standardize data
> Xs <- scale(X)
> Ys <- scale(Y)
# canonical correlations of correlation (standardized data)
> ccas <- cancor(Xs, Ys)
# cca (the normal way)
> Sx <- cov(Xs)
> Sy <- cov(Ys)
> Sxy <- cov(Xs,Ys)
> Sxeig <- eigen(Sx, symmetric=TRUE)
> Sxisqrt <- Sxeig$vectors %*% diag(1/sqrt(Sxeig$values)) %*% t(Sxeig$vectors)
> Syeig <- eigen(Sy, symmetric=TRUE)
> Syisqrt <- Syeig$vectors %*% diag(1/sqrt(Syeig$values)) %*% t(Syeig$vectors)
> Xmat <- Sxisqrt %*% Sxy %*% solve(Sy) %*% t(Sxy) %*% Sxisqrt
> Ymat <- Syisqrt %*% t(Sxy) %*% solve(Sx) %*% Sxy %*% Syisqrt
> Xeig <- eigen(Xmat, symmetric=TRUE)
> Yeig <- eigen(Ymat, symmetric=TRUE)
# compare correlations (same)
> ccas$cor
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> sqrt(Xeig$values)
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> sqrt(Yeig$values[1:p])
[1] 0.7702006 0.5033532 0.4184145 0.3052556
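The canonical correlations are unchanged by standardization because they are invariant to rescaling the columns of X and Y; only the coefficients change, by the factor of each variable's standard deviation. A sketch on simulated data with deliberately mismatched scales (the data and object names are hypothetical):

```r
# Simulated data (hypothetical) with columns on very different scales
set.seed(8)
n <- 100
X <- matrix(rnorm(n * 2), n, 2) %*% diag(c(1, 20))
Y <- cbind(X[, 1] + rnorm(n), 0.1 * X[, 2] + rnorm(n), rnorm(n)) %*% diag(c(3, 1, 0.5))

fit_raw <- cancor(X, Y)                 # unstandardized data
fit_std <- cancor(scale(X), scale(Y))   # standardized data

# Coefficients for standardized data equal the raw coefficients times the
# standard deviation of each variable (row-wise), up to the sign of each column
sdx <- apply(X, 2, sd)
coef_rescaled <- fit_raw$xcoef * sdx
```

Comparing `abs(coef_rescaled)` with `abs(fit_std$xcoef)` sidesteps the arbitrary sign of each canonical variate.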
Decathlon Example Canonical Correlation Analysis of Standardized Data
CCA of Standardized Decathlon Data in R (continued)
R code to compare the CCA coefficients:
# compare linear combinations (different?)
> Ahat <- Sxisqrt %*% Xeig$vectors
> Bhat <- Syisqrt %*% Yeig$vectors
> sum((ccas$xcoef * sqrt(n-1) - Ahat)^2)
[1] 3.332536e-29
> sum((ccas$ycoef[,1:p] * sqrt(n-1) - Bhat[,1:p])^2)
[1] 11.59453
# note that the signing is arbitrary!!
> ccas$ycoef[,1:p] * sqrt(n-1)
[,1] [,2] [,3] [,4]
run100 -0.1439138 -0.2404940 0.5274876 -0.13754449
run400 -0.1373435 0.7655659 -1.2826821 0.96359176
run1500 0.3023537 -1.0519285 -0.1514027 -0.52923644
hurdle -0.4396044 -1.0374417 0.6303782 0.49905604
long.jump -0.3564702 0.4110878 -0.0253127 -1.09325282
high.jump -0.1855627 0.5731149 -0.2615838 -0.09007821
> Bhat[,1:p]
[,1] [,2] [,3] [,4]
[1,] 0.1439138 -0.2404940 -0.5274876 -0.13754449
[2,] 0.1373435 0.7655659 1.2826821 0.96359176
[3,] -0.3023537 -1.0519285 0.1514027 -0.52923644
[4,] 0.4396044 -1.0374417 -0.6303782 0.49905604
[5,] 0.3564702 0.4110878 0.0253127 -1.09325282
[6,] 0.1855627 0.5731149 0.2615838 -0.09007821
> Bhat[,1:p] <- Bhat[,1:p] %*% diag(c(-1,1,-1,1))
> sum((ccas$ycoef[,1:p] * sqrt(n-1) - Bhat[,1:p])^2)
[1] 1.132493e-28
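Rather than reading the sign pattern off the printed output, the columns can be aligned automatically by flipping any column whose inner product with the reference column is negative. A minimal sketch; the helper `align_signs` and the toy matrices are hypothetical:

```r
# Flip the sign of each column of B so that it points the same way as the
# corresponding column of ref (columns are only defined up to sign)
align_signs <- function(B, ref) {
  s <- sign(colSums(B * ref))   # +1 or -1 per column (0 if orthogonal)
  s[s == 0] <- 1                # leave orthogonal columns unchanged
  sweep(B, 2, s, `*`)
}

# Toy example: B equals ref except that its first column has the opposite sign
ref <- matrix(c(1, 2, 3, 4), 2, 2)
B   <- ref %*% diag(c(-1, 1))
B_aligned <- align_signs(B, ref)
```

In the transcript above, a call such as `align_signs(Bhat[,1:p], ccas$ycoef[,1:p])` would play the role of the manual multiplication by `diag(c(-1,1,-1,1))`.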
Decathlon Example Canonical Correlation Analysis of Standardized Data
Plot the Standardized CCA Coefficients
[Figure: two scatterplots of the standardized CCA coefficients. Left panel, "X Coefficients": A2 Coefficients versus A1 Coefficients, with points labeled shot, discus, javelin, pole.vault. Right panel, "Y Coefficients": B2 Coefficients versus B1 Coefficients, with points labeled run100, run400, run1500, hurdle, long.jump, high.jump.]
R code for left plot:
plot(Ahat[,1:2], xlab="A1 Coefficients", ylab="A2 Coefficients",
type="n", main="X Coefficients", xlim=c(-2, 0.1), ylim=c(-1.1, 0.5))
text(Ahat[,1:2], labels=colnames(X))
Decathlon Example Canonical Correlation Analysis of Standardized Data
Define the Canonical Variables
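If $X_s = \{(x_{ij} - \bar{x}_j)/s_{x_j}\}_{n \times p}$ and $Y_s = \{(y_{ij} - \bar{y}_j)/s_{y_j}\}_{n \times q}$, then
$\hat{U} = X_s\hat{A} = \{\hat{u}_{ij}\}_{n \times p}$ where columns of $\hat{U}$ contain the canonical
variables for the $X_s$ set
$\hat{V} = Y_s\hat{B} = \{\hat{v}_{ij}\}_{n \times q}$ where columns of $\hat{V}$ contain the canonical
variables for the $Y_s$ set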
R code to define canonical variables:
> U <- Xs %*% Ahat
> V <- Ys %*% Bhat
Decathlon Example Canonical Correlation Analysis of Standardized Data
Covariance Matrices of Canonical Variables
R code to check covariance matrices of the canonical variables:
# canonical variable covariances
> round(cov(U),4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
> round(cov(V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 0 0
[5,] 0 0 0 0 1 0
[6,] 0 0 0 0 0 1
> round(cov(U,V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.7702 0.0000 0.0000 0.0000 0 0
[2,] 0.0000 0.5034 0.0000 0.0000 0 0
[3,] 0.0000 0.0000 0.4184 0.0000 0 0
[4,] 0.0000 0.0000 0.0000 0.3053 0 0
> rho
[1] 0.7702006 0.5033532 0.4184145 0.3052556
Decathlon Example Canonical Correlation Analysis of Standardized Data
Covariances of Canonical and Observed Variables
R code to check covariances of the canonical and observed variables:
# covariance of original and canonical variables (U and Xs)
> Ainv <- solve(Ahat)
> sum( ( cov(U, Xs) - crossprod(Ahat, Sx) )^2 )
[1] 2.759323e-31
> sum( ( Sx - crossprod(Ainv) )^2 )
[1] 6.569732e-30
# covariance of original and canonical variables (V and Ys)
> Binv <- solve(Bhat)
> sum( ( cov(V, Ys) - crossprod(Bhat, Sy) )^2 )
[1] 2.406961e-31
> sum( ( Sy - crossprod(Binv) )^2 )
[1] 3.136492e-29
# covariance of original and canonical variables (U and Ys)
> sum( (cov(U, Ys) - crossprod(Ahat, Sxy))^2 )
[1] 5.477785e-32
# covariance of original and canonical variables (V and Xs)
> sum( (cov(V, Xs) - crossprod(Bhat, t(Sxy)))^2 )
[1] 1.336149e-31
# covariance of canonical variables (U and V)
> rhomat <- cbind(diag(rho), matrix(0, p, q-p))
> sum( (cov(U, V) - rhomat)^2 )
[1] 1.272906e-29
> sum( (Sxy - crossprod(Ainv, rhomat) %*% Binv)^2 )
[1] 7.505349e-30
Decathlon Example Canonical Correlation Analysis of Standardized Data
Error of Approximation Matrices (r = 2)
R code to calculate error of approximation matrices with r = 2:
# error of approximation matrices (with r=2)
> Ainv <- solve(Ahat)
> Binv <- solve(Bhat)
> r <- 2
> Ex <- Sx - crossprod(Ainv[1:r,])
> Ey <- Sy - crossprod(Binv[1:r,])
> Exy <- Sxy - crossprod(diag(rho[1:r]) %*% Ainv[1:r,], Binv[1:r,])
# get norms of error matrices
> sqrt(mean(Ex^2))
[1] 0.2432351
> sqrt(mean(Ey^2))
[1] 0.2296716
> sqrt(mean(Exy^2))
[1] 0.07458264