FACULTY OF NATURAL SCIENCES
DEPARTMENT OF MATHEMATICAL SCIENCES AND COMPUTING
SICK TEST 1
LINEAR REGRESSION & MULTIVARIATE STATISTICS
STA37W1
EXAMINER : MR. I. JUBANE
INTERNAL MODERATOR : DR O. BODHLYERA
DATE : 09 MAY 2025
MARKS : 78
This test consists of 9 pages including the cover page
INSTRUCTIONS
1. Write neatly and legibly.
2. Once completed, scan all pages into a single PDF file.
3. Do not upload image files (e.g., JPG, PNG). Only one PDF file will be accepted.
4. Ensure that your scanned PDF is clear, and that all pages are visible. Use a well-lit environment
when using a camera.
5. Each page of your test must clearly display your full name and student number.
6. Begin your first page with a cover section that includes your full name, student number, and the
title of the test.
7. Clearly number your answers according to the question numbers (e.g., 1.1, 1.2, etc.).
8. Read each question carefully and answer only what is asked. Avoid adding unnecessary
information.
9. Unless otherwise specified, round all numerical answers to four decimal places.
QUESTION ONE [12 MARKS]
Consider a regression problem in which, for each value x of a certain variable X, the random variable
Y has the normal distribution with mean $\beta x$ and variance $\sigma^2$, where
the values of $\beta$ and $\sigma^2$ are unknown. Suppose that n independent pairs of observations $(x_i, Y_i)$ are
obtained.
1.1 Show that the M.L.E. of $\beta$ is $\hat{\beta} = \dfrac{\sum_{i} x_i Y_i}{\sum_{i} x_i^2}$. [6]
1.2 Given that the measured values of x and y are given in the table below,
x 1.9 0.8 1.1 0.1 -0.1 4.4 4.6 1.6 5.5 3.4
y 0.7 -1 -0.2 -1.2 -0.1 3.4 0 0.8 3.7 2
Determine the values of $\hat{\beta}$ and $\mathrm{var}(\hat{\beta})$. [6]
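The arithmetic for the tabulated data can be sketched in a few lines of Python. This is a minimal illustration, not a model answer: it assumes the no-intercept model of Question One, and because $\sigma^2$ is unknown it plugs in the M.L.E. $\hat{\sigma}^2 = \mathrm{RSS}/n$ when estimating $\mathrm{var}(\hat{\beta}) = \sigma^2 / \sum_i x_i^2$. The function name `fit_through_origin` is made up for this example.

```python
# Data from the table in 1.2.
x = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4]
y = [0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 0.0, 0.8, 3.7, 2.0]


def fit_through_origin(x, y):
    """Fit Y = beta * x + error (no intercept) by maximum likelihood."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)               # sum of x_i^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i * y_i
    beta_hat = sxy / sxx
    rss = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    # Assumption: sigma^2 is replaced by its M.L.E., RSS / n.
    # Dividing by n - 1 instead would give the unbiased estimate.
    sigma2_hat = rss / n
    var_beta_hat = sigma2_hat / sxx              # plug-in for sigma^2 / sum(x_i^2)
    return beta_hat, sigma2_hat, var_beta_hat


beta_hat, sigma2_hat, var_beta_hat = fit_through_origin(x, y)
print(round(beta_hat, 4), round(var_beta_hat, 4))
```

If the unbiased variance estimate is wanted instead, only the divisor in `sigma2_hat` changes.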
QUESTION TWO [15 MARKS]
Let $Y \sim \mathrm{MVN}(X\beta, \sigma^2 I)$, where X is an $n \times p$ matrix with linearly independent columns. For least
squares estimation, recall that $\hat{\varepsilon} = (I - H)Y$.
2.1 What is the distribution of $(I - H)Y$, where H is the projection onto the column space of X?
[6]
2.2 What is the distribution of $\dfrac{Y^{\top}(I - H)Y}{\sigma^2}$? [6]
2.3 Evaluate $E\left[\left(\dfrac{1}{n - p}\, Y^{\top}AY - \sigma^2\right)^2\right]$ for $A = I - X(X^{\top}X)^{-1}X^{\top}$. [5]
Derive:
2.4 $E[\hat{\varepsilon}]$ [3]
2.5 $\mathrm{cov}[\hat{\varepsilon}]$ [6]
2.6 $\mathrm{cov}[\hat{\varepsilon}, HY]$ [6]
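The algebraic properties of $H$ and $I - H$ used throughout Question Two can be checked numerically. The sketch below is only a sanity check under stated assumptions: the 4 x 2 design matrix `X` (an intercept plus one covariate) is hypothetical and not taken from the test, and the helpers `matmul`, `transpose`, `inv2` and `hat_matrix` are written just for this example in plain Python.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix (enough for p = 2)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def hat_matrix(X):
    """H = X (X'X)^{-1} X', the projection onto the column space of X."""
    Xt = transpose(X)
    return matmul(matmul(X, inv2(matmul(Xt, X))), Xt)


def max_abs_diff(a, b):
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


# Hypothetical design: intercept plus one covariate (n = 4, p = 2).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
n, p = len(X), len(X[0])

H = hat_matrix(X)
M = [[(1.0 if i == j else 0.0) - H[i][j] for j in range(n)]
     for i in range(n)]                      # M = I - H

zero = [[0.0] * n for _ in range(n)]
print(max_abs_diff(matmul(M, M), M))         # idempotent: (I - H)^2 = I - H
print(max_abs_diff(M, transpose(M)))         # symmetric
print(sum(M[i][i] for i in range(n)))        # trace equals n - p
print(max_abs_diff(matmul(M, H), zero))      # (I - H) H = 0
```

The last line is the identity behind 2.6: the covariance of $(I - H)Y$ and $HY$ is $\sigma^2 (I - H)H$, which vanishes.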
QUESTION THREE [16 MARKS]
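3.1 Let $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ $(i = 1, \ldots, n)$, where $E[\varepsilon] = 0$ and $\mathrm{var}[\varepsilon] = \sigma^2 I$. Show that the least
squares estimates of $\beta_0$ and $\beta_1$ are uncorrelated if and only if $\bar{x} = 0$. [4]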
3.2 What does BLUE mean? Show that $\hat{\beta}$ is a BLU estimator of $\beta$. [6]
3.3 Suppose that $\beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$. Express the F-statistic in terms of $R^2$ and hence show
that
$E[R^2] = \dfrac{p - 1}{n - 1}$.
Show all working. [6]
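For orientation, the standard link between the overall F-statistic and $R^2$ can be written out as below. This is a sketch under the usual assumptions (a model with an intercept and $p - 1$ regressors, and normally distributed errors), not a full worked answer.

```latex
\[
F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}
\quad\Longleftrightarrow\quad
R^2 = \frac{(p-1)F}{(p-1)F + (n-p)} .
\]
% Under H_0 with normal errors, F ~ F_{p-1, n-p}, so R^2 follows a beta distribution:
\[
R^2 \sim \mathrm{Beta}\!\left(\tfrac{p-1}{2}, \tfrac{n-p}{2}\right),
\qquad
E[R^2] = \frac{(p-1)/2}{(p-1)/2 + (n-p)/2} = \frac{p-1}{n-1} .
\]
```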
QUESTION FOUR [20 MARKS]
Let
2 1 2 1
1 21 22 2
1 1 2 3
0 21 32 4
where $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$, $i = 1, 2, 3, 4$.
4.1 Write the regression model in matrix notation. [1]
4.2 Compute the estimate $\hat{\beta}$. [5]
4.3 Write down the hat matrix H and show that H is idempotent. Hence test the
hypothesis $H_0 : \beta_2 = 1$. [8]
4.4 Derive the F-statistic for testing the hypothesis $H_0 : \beta_1 = \beta_2$. [6]
QUESTION FIVE [15 MARKS]
An experiment involved a quantitative analysis of factors found in high-density lipoprotein (HDL) in
a sample of human blood serum. The dataset consists of 22 observations in which three variables
thought to be predictive of or associated with HDL measurements were recorded: total cholesterol,
total triglyceride concentration, and the presence or absence of a sticky component called sinking
pre-beta (SPB). The data correspond specifically to samples where SPB was absent.
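Given that
$X^{\top}Y = \begin{pmatrix} 963 \\ 250611 \\ 141832 \end{pmatrix}$,
$(X^{\top}X)^{-1} = \begin{pmatrix} 0.6165839376 & 0.0017115921 & 0.0008212202 \\ 0.0017115921 & 0.6165839376 & 8.951846 \times 10^{-6} \\ 0.0008212202 & 8.951846 \times 10^{-6} & 2.155576 \times 10^{-5} \end{pmatrix}$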
and
F-statistic: 0.4394 on 2 and 19 DF, p-value: 0.6508
Calculate the $R^2$ value and obtain a 95% confidence interval for the mean HDL level when total
cholesterol is 250 and total triglyceride concentration is 100. [15]
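The $R^2$ part of this question needs only the reported F-statistic and its degrees of freedom. The Python sketch below inverts the identity $F = \dfrac{R^2 / \mathrm{df}_1}{(1 - R^2) / \mathrm{df}_2}$, assuming the reported F is the overall test of the two slopes in a model with an intercept. The function name `r_squared_from_f` is made up for this example, and the confidence interval is not attempted here.

```python
def r_squared_from_f(f_stat, df_num, df_den):
    """Invert F = (R^2 / df_num) / ((1 - R^2) / df_den) for R^2."""
    return df_num * f_stat / (df_num * f_stat + df_den)


# Values quoted in the question: F = 0.4394 on 2 and 19 DF.
r2 = r_squared_from_f(0.4394, 2, 19)
print(round(r2, 4))
```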