Lecture 3
Multivariate Multiple Regression
➢ When there is more than one response variable, we must use a multivariate
multiple linear regression model.
➢ Example 1:
We might want to model both math and reading SAT scores as a function of
gender, race, parent income, and so forth.
➢ Example 2:
Dependent Variable 1: Revenue
Dependent Variable 2: Customer traffic
Independent Variable 1: Dollars spent on advertising by city
Independent Variable 2: City Population
➢ A data table containing multivariate multiple regression data would be
organized like the following Table.
Table: Multivariate Multiple Regression Data
Observation i | Response 𝑌1 | Response 𝑌2 | … | Response 𝑌𝑚 | Predictor 𝑧1 | Predictor 𝑧2 | … | Predictor 𝑧𝑟
1 | 𝑌11 | 𝑌12 | … | 𝑌1𝑚 | 𝑧11 | 𝑧12 | … | 𝑧1𝑟
2 | 𝑌21 | 𝑌22 | … | 𝑌2𝑚 | 𝑧21 | 𝑧22 | … | 𝑧2𝑟
⋮ | ⋮ | ⋮ | … | ⋮ | ⋮ | ⋮ | … | ⋮
n | 𝑌𝑛1 | 𝑌𝑛2 | … | 𝑌𝑛𝑚 | 𝑧𝑛1 | 𝑧𝑛2 | … | 𝑧𝑛𝑟
➢ Now that we are considering a problem with multiple predictors and
several response variables, we must be especially mindful of the notation.
➢ For example, the entry 𝑌21 in the table represents the value of the first
response variable (𝑌1) observed on the second trial.
➢ The table contains m response variables and r predictor variables, and there
were n trials performed (or n observations collected) for this experiment.
➢ Extending the multiple linear regression case to the multivariate multiple
linear case, we can write a regression model for each response (there are m of
them) on the ith observation, where i = 1, … , n.
Y_{i1} = [\beta_{01} + \beta_{11} z_{i1} + \cdots + \beta_{r1} z_{ir}] + \varepsilon_{i1}
Y_{i2} = [\beta_{02} + \beta_{12} z_{i1} + \cdots + \beta_{r2} z_{ir}] + \varepsilon_{i2}
\vdots
Y_{im} = [\beta_{0m} + \beta_{1m} z_{i1} + \cdots + \beta_{rm} z_{ir}] + \varepsilon_{im}
➢ Now we can construct each matrix component of our multivariate multiple
regression model. We will consider the jth response variable.
Y = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1m} \\ Y_{21} & Y_{22} & \cdots & Y_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{nm} \end{bmatrix} = \left[\, Y_{(1)} \mid Y_{(2)} \mid \cdots \mid Y_{(m)} \,\right], \quad \text{where } Y_{(j)} = \begin{bmatrix} Y_{1j} \\ Y_{2j} \\ \vdots \\ Y_{nj} \end{bmatrix}

Z = \begin{bmatrix} 1 & z_{11} & z_{12} & \cdots & z_{1r} \\ 1 & z_{21} & z_{22} & \cdots & z_{2r} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & z_{n1} & z_{n2} & \cdots & z_{nr} \end{bmatrix}

\beta = \begin{bmatrix} \beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rm} \end{bmatrix} = \left[\, \beta_{(1)} \mid \beta_{(2)} \mid \cdots \mid \beta_{(m)} \,\right], \quad \text{where } \beta_{(j)} = \begin{bmatrix} \beta_{0j} \\ \beta_{1j} \\ \vdots \\ \beta_{rj} \end{bmatrix}

\varepsilon = \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm} \end{bmatrix} = \left[\, \varepsilon_{(1)} \mid \varepsilon_{(2)} \mid \cdots \mid \varepsilon_{(m)} \,\right], \quad \text{where } \varepsilon_{(j)} = \begin{bmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \\ \vdots \\ \varepsilon_{nj} \end{bmatrix}
➢ The 𝑌(𝑗) vectors are column vectors that contain the values of the jth response
variable for each of the n trials or observations.
➢ Similarly, each 𝜀(𝑗) vector contains the random error terms obtained for each
of the n trials when considering the jth response variable.
➢ Each 𝛽(𝑗) vector is comprised of the unknown regression coefficients for the
regression model obtained for the jth response variable.
➢ Z is the design matrix.
We can generalize a multiple linear regression model for each response using the
notation above as
Y_{(j)} = Z \beta_{(j)} + \varepsilon_{(j)}, \quad j = 1, 2, \ldots, m \qquad (3)
Combining each single response model, the following matrix model for multivariate
linear regression can be constructed.
\begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1m} \\ Y_{21} & Y_{22} & \cdots & Y_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{nm} \end{bmatrix} = \begin{bmatrix} 1 & z_{11} & z_{12} & \cdots & z_{1r} \\ 1 & z_{21} & z_{22} & \cdots & z_{2r} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & z_{n1} & z_{n2} & \cdots & z_{nr} \end{bmatrix} \begin{bmatrix} \beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rm} \end{bmatrix} + \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm} \end{bmatrix}
For simplicity, we will write the multivariate linear regression model as

\underset{(n \times m)}{Y} = \underset{(n \times (r+1))}{Z} \; \underset{((r+1) \times m)}{\beta} + \underset{(n \times m)}{\varepsilon}
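The dimensions above can be sanity-checked by simulating the model with made-up data. The following NumPy sketch uses arbitrary sizes (n = 6, r = 2, m = 3) purely to illustrate the shapes; it is not part of the lecture's example.

```python
import numpy as np

n, r, m = 6, 2, 3                 # n observations, r predictors, m responses (arbitrary)
rng = np.random.default_rng(0)

Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])  # n x (r+1), intercept column first
beta = rng.normal(size=(r + 1, m))                          # (r+1) x m coefficient matrix
eps = rng.normal(size=(n, m))                               # n x m error matrix
Y = Z @ beta + eps                                          # n x m response matrix

print(Y.shape)  # (6, 3)
```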
Parameter Estimation
➢ Given the outcomes 𝑌 and the values of the predictor variables Z with full
column rank, we determine the least squares estimates 𝛽̂(𝑗) exclusively from
the observations 𝑌(𝑗) on the jth response.
➢ In conformity with the single-response solution, we take
\hat{\beta}_{(j)} = \begin{bmatrix} \hat{\beta}_{0j} \\ \hat{\beta}_{1j} \\ \vdots \\ \hat{\beta}_{rj} \end{bmatrix} = (Z'Z)^{-1} Z' Y_{(j)}
➢ Next, we can collect all of the univariate 𝛽̂(𝑗) estimates and form the estimated
parameter matrix for the multivariate case,
\hat{\beta} = \left[\, \hat{\beta}_{(1)} \mid \hat{\beta}_{(2)} \mid \cdots \mid \hat{\beta}_{(m)} \,\right]
Our final result becomes
\hat{\beta} = (Z'Z)^{-1} Z' \left[\, Y_{(1)} \mid Y_{(2)} \mid \cdots \mid Y_{(m)} \,\right] = (Z'Z)^{-1} Z' Y
Predicted values: \hat{Y} = Z\hat{\beta} = Z(Z'Z)^{-1} Z' Y
Residuals: \hat{\varepsilon} = Y - \hat{Y} = \left[ I - Z(Z'Z)^{-1} Z' \right] Y
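The estimator, fitted values, and residuals translate directly into code. The NumPy sketch below (with a small made-up data set; `fit_multivariate` is a hypothetical helper name) fits the multivariate model and verifies that each column of the joint estimate equals the univariate estimate obtained from that response alone, as in equation (3). `np.linalg.solve` is used in place of an explicit inverse for numerical stability.

```python
import numpy as np

def fit_multivariate(Z, Y):
    """Least squares fit: beta_hat = (Z'Z)^{-1} Z'Y, plus fitted values and residuals."""
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)  # solve normal equations; avoids explicit inverse
    Y_hat = Z @ beta_hat                          # predicted values
    resid = Y - Y_hat                             # residual matrix
    return beta_hat, Y_hat, resid

# Small illustration with made-up numbers: n = 4, one predictor, m = 2 responses
Z = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
Y = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 0.0], [5.0, -1.0]])
beta_hat, Y_hat, resid = fit_multivariate(Z, Y)

# Each column of beta_hat matches a separate univariate regression on that response
for j in range(Y.shape[1]):
    b_j = np.linalg.solve(Z.T @ Z, Z.T @ Y[:, j])
    assert np.allclose(beta_hat[:, j], b_j)
```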
Example: Fit the multivariate regression model to the following data with two
responses:
𝑧1 0 1 2 3 4
𝑦1 1 4 3 8 9
𝑦2 -1 -1 2 3 2
Solution: The design matrix is
Z = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad Z' = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix}

Z'Z = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 10 \\ 10 & 30 \end{bmatrix}
|Z'Z| = 150 - 100 = 50

(Z'Z)^{-1} = \frac{1}{|Z'Z|}\,\mathrm{adj}(Z'Z) = \frac{1}{50} \begin{bmatrix} 30 & -10 \\ -10 & 5 \end{bmatrix} = \begin{bmatrix} 0.6 & -0.2 \\ -0.2 & 0.1 \end{bmatrix}
and
Z' y_{(1)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \\ 3 \\ 8 \\ 9 \end{bmatrix} = \begin{bmatrix} 25 \\ 70 \end{bmatrix}

Z' y_{(2)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ 2 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 20 \end{bmatrix}
\hat{\beta}_{(1)} = (Z'Z)^{-1} Z' y_{(1)} = \begin{bmatrix} 0.6 & -0.2 \\ -0.2 & 0.1 \end{bmatrix} \begin{bmatrix} 25 \\ 70 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}

\hat{\beta}_{(2)} = (Z'Z)^{-1} Z' y_{(2)} = \begin{bmatrix} 0.6 & -0.2 \\ -0.2 & 0.1 \end{bmatrix} \begin{bmatrix} 5 \\ 20 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}
Hence,
\hat{\beta} = \left[\, \hat{\beta}_{(1)} \mid \hat{\beta}_{(2)} \,\right] = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} = (Z'Z)^{-1} Z' \left[\, y_{(1)} \mid y_{(2)} \,\right]
The fitted values are generated from \hat{y}_1 = 1 + 2z_1 and \hat{y}_2 = -1 + z_1.
Collectively,
\hat{Y} = Z\hat{\beta} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 3 & 0 \\ 5 & 1 \\ 7 & 2 \\ 9 & 3 \end{bmatrix}
and
\hat{\varepsilon} = Y - \hat{Y} = \begin{bmatrix} 1 & -1 \\ 4 & -1 \\ 3 & 2 \\ 8 & 3 \\ 9 & 2 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 0 \\ 5 & 1 \\ 7 & 2 \\ 9 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & -1 \\ -2 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix}
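As a check, the worked example can be reproduced numerically. This minimal NumPy sketch recomputes \hat{\beta}, the fitted values, and the residuals from the data above.

```python
import numpy as np

# Data from the example: one predictor z1, two responses y1 and y2
z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([[1, -1], [4, -1], [3, 2], [8, 3], [9, 2]], dtype=float)
Z = np.column_stack([np.ones(5), z1])   # design matrix with intercept column

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)  # (Z'Z)^{-1} Z'Y via the normal equations
Y_hat = Z @ beta_hat                          # fitted values
resid = Y - Y_hat                             # residuals

print(beta_hat)  # [[ 1. -1.] [ 2.  1.]], matching beta_hat in the solution above
```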