Statistical Theory Solutions and Proofs
Solution to Theory Part
Question T.1)
Prove that the mean of Z is zero and the variance of Z is 1.
We are given that $Z = \frac{X - \mu_X}{\sigma_X}$, where $\mu_X$ is the mean of X and $\sigma_X$ is the standard
deviation of X.
By definition, the mean of Z is given by:
$$E[Z] = E\left[\frac{X - \mu_X}{\sigma_X}\right] = \frac{1}{\sigma_X}\left(E[X] - \mu_X\right) = 0$$
The variance of Z is given by:
$$Var(Z) = E[Z^2] - (E[Z])^2 = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^2\right] - 0 = \frac{1}{\sigma_X^2} E\left[(X - \mu_X)^2\right] = \frac{\sigma_X^2}{\sigma_X^2} = 1$$
Question T.2)
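We are to prove that $Var(Y) = a^2\sigma_X^2 + b^2\sigma_Z^2 + 2ab\,\sigma_{XZ}$
Given that $Y = aX + bZ$:
$$Var(Y) = Var(aX + bZ)$$
By the definition of variance, and since $E[aX + bZ] = a\mu_X + b\mu_Z$, we have:
$$Var(aX + bZ) = E\left[\left(a(X - \mu_X) + b(Z - \mu_Z)\right)^2\right] = a^2 E[(X - \mu_X)^2] + b^2 E[(Z - \mu_Z)^2] + 2ab\,E[(X - \mu_X)(Z - \mu_Z)]$$
Identifying each expectation gives the variance of a linear combination of random variables:
$$Var(aX + bZ) = a^2 Var(X) + b^2 Var(Z) + 2ab\,Cov(X, Z)$$
Writing $\sigma_X^2 = Var(X)$, $\sigma_Z^2 = Var(Z)$ and $\sigma_{XZ} = Cov(X, Z)$:
$$Var(aX + bZ) = a^2\sigma_X^2 + b^2\sigma_Z^2 + 2ab\,\sigma_{XZ}$$
This shows that $Var(Y) = a^2\sigma_X^2 + b^2\sigma_Z^2 + 2ab\,\sigma_{XZ}$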
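As an illustration, the identity can be verified on data. This is a sketch only: the lists `x` and `z` and the constants `a` and `b` are hypothetical, and population (divide-by-n) moments are used throughout so that both sides are computed consistently.

```python
from statistics import fmean

def pvar(u):
    """Population variance: mean squared deviation from the mean."""
    m = fmean(u)
    return fmean([(v - m) ** 2 for v in u])

def pcov(u, v):
    """Population covariance of two equally long sequences."""
    mu, mv = fmean(u), fmean(v)
    return fmean([(p - mu) * (q - mv) for p, q in zip(u, v)])

# Hypothetical data and coefficients, for illustration only.
x = [1.0, 3.0, 4.0, 7.0, 10.0]
z = [2.0, 1.0, 5.0, 3.0, 8.0]
a, b = 2.0, -3.0

y = [a * xi + b * zi for xi, zi in zip(x, z)]
lhs = pvar(y)
rhs = a**2 * pvar(x) + b**2 * pvar(z) + 2 * a * b * pcov(x, z)
print(lhs, rhs)  # the two sides agree up to floating-point rounding
```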
Question T.3)
We are to derive the coefficient $\beta$ in the linear model $y_i = \beta x_i + \varepsilon_i$ using OLS.
The OLS objective is to minimize the sum of squared residuals:
$$\min_{\beta} \sum_i \varepsilon_i^2 = \min_{\beta} \sum_i (y_i - \beta x_i)^2$$
For the first-order condition, we take the derivative with respect to β and set it to zero:
$$\frac{d}{d\beta} \sum_i (y_i - \beta x_i)^2 = -2 \sum_i x_i (y_i - \beta x_i) = 0$$
We now solve for β:
$$\sum_i x_i y_i = \beta \sum_i x_i^2$$
$$\hat{\beta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$
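The closed-form solution is short enough to compute directly. The sketch below assumes hypothetical data (`xs`, `ys`) and a helper name `ols_through_origin` chosen for illustration; it also checks that the first-order condition holds at the estimate.

```python
def ols_through_origin(xs, ys):
    """Slope of the no-intercept model y = beta * x: sum(x*y) / sum(x^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; the slope is not identified")
    return sxy / sxx

# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

beta_hat = ols_through_origin(xs, ys)
# First-order condition: sum of x_i * residual_i should be zero.
foc = sum(x * (y - beta_hat * x) for x, y in zip(xs, ys))
print(beta_hat, foc)
```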
Question T.4)
We are to show that the OLS estimator β is unbiased and state the required assumptions.
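Given the population model $Y_i = \alpha + \beta X_i + \varepsilon_i$, the OLS estimator is:
$$\hat{\beta} = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$
The first step of the unbiasedness proof is to substitute $Y_i = \alpha + \beta X_i + \varepsilon_i$ into the estimator.
Since $Y_i - \bar{Y} = \beta (X_i - \bar{X}) + (\varepsilon_i - \bar{\varepsilon})$ and $\sum_i (X_i - \bar{X}) = 0$:
$$\hat{\beta} = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2} = \beta + \frac{\sum_i \varepsilon_i (X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$
We take the expectation (conditioning on the X values first, then applying the law of iterated
expectations) to get the following expression:
$$E[\hat{\beta}] = \beta + E\left[\frac{\sum_i \varepsilon_i (X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2}\right] = \beta \quad \text{if } E[\varepsilon_i \mid X_i] = 0$$
The required assumptions are as follows:
Linearity, that is, $Y_i = \alpha + \beta X_i + \varepsilon_i$
Exogeneity, that is, $E[\varepsilon_i \mid X_i] = 0$
Random sampling, that is, $\{(X_i, Y_i)\}$ is i.i.d.
No perfect multicollinearity (for multiple regression). In this simple regression it means the $X_i$
are not all equal, so that $\sum_i (X_i - \bar{X})^2 > 0$.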
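Unbiasedness is a statement about the average of the estimator over repeated samples, so it can be illustrated by simulation. The sketch below is not part of the proof: the parameter values, the uniform design for X, the standard normal errors, and the function names are all assumptions made for illustration.

```python
import random
from statistics import fmean

def ols_slope(xs, ys):
    """OLS slope for a model with intercept."""
    xbar, ybar = fmean(xs), fmean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

def simulate_mean_slope(alpha, beta, n=50, reps=2000, seed=0):
    """Average the OLS slope over many samples drawn from the model."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        # Errors are drawn independently of X, so E[eps | X] = 0 holds.
        ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return fmean(slopes)

mean_slope = simulate_mean_slope(alpha=1.0, beta=2.0)
print(mean_slope)  # close to the true beta of 2.0
```

If the errors were instead generated to depend on X, the average slope would drift away from the true β, which is the exogeneity assumption at work.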
Application Part
Solution to Question A.1
Part 1: Probability that at least 2 people share the same birthday
To find the probability that at least 2 people in a room of n share the same birthday, it is
easier to calculate the complement (that is, the probability that all n people have unique
birthdays) and subtract it from 1.
Total possible birthday combinations: Each person can have any of the 365 days as their
birthday. For n people, the total number of possible birthday combinations is given by
$365^n$.
Number of unique birthday combinations: The first person has 365 choices, the second has
364 (to avoid matching the first), the third has 363, and so on. This is given by the
permutation:
$$P(365, n) = 365 \times 364 \times 363 \times \dots \times (365 - n + 1)$$
The probability that all birthdays are unique is given by:
$$P(\text{Unique}) = \frac{P(365, n)}{365^n}$$
The probability that at least two people share a birthday is then:
$$P(\text{at least two share}) = 1 - P(\text{Unique}) = 1 - \frac{P(365, n)}{365^n}$$
Part 2: Minimum number of participants for at least 40% probability
In this case, we need to determine the smallest n such that:
$$1 - \frac{P(365, n)}{365^n} \geq 0.40$$
We can solve this by testing successive values of n:
For n=1, $P(\text{at least two share}) = 0$
For n=10, $P(\text{at least two share}) = 1 - \frac{P(365, 10)}{365^{10}} = 0.11695$ (11.695%)
For n=19, $P(\text{at least two share}) = 1 - \frac{P(365, 19)}{365^{19}} = 0.3791$ (37.91%)
For n=20, $P(\text{at least two share}) = 1 - \frac{P(365, 20)}{365^{20}} = 0.4114$ (41.14%)
The smallest n for which the probability reaches 40% is therefore n=20.
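The search over successive values of n can be automated. This is a minimal sketch under the same assumption of 365 equally likely birthdays; the function names are chosen for illustration.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_unique = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_unique *= (days - k) / days
    return 1.0 - p_unique

def smallest_group(target, days=365):
    """Smallest n whose shared-birthday probability is at least target."""
    n = 1
    while p_shared_birthday(n, days) < target:
        n += 1
    return n

for n in (1, 10, 19, 20):
    print(n, round(p_shared_birthday(n), 4))
print(smallest_group(0.40))
```

Multiplying the ratios one at a time avoids forming the very large integers $P(365, n)$ and $365^n$ explicitly.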
Solution to Question A.2
i. Expected Values of X and Y
The first step is to calculate the marginal probabilities for X and Y by summing the joint
probabilities over the other variable.
Marginal probabilities for X
𝑃(𝑋 = −3) = 0.05 + 0 + 0 + 0 + 0.05 = 0.10
𝑃(𝑋 = 5) = 0 + 0.15 + 0 + 0.15 + 0 = 0.30
𝑃(𝑋 = 10) = 0 + 0 + 0.2 + 0 + 0 = 0.20
𝑃(𝑋 = 15) = 0 + 0.15 + 0 + 0.15 + 0 = 0.30
𝑃(𝑋 = 23) = 0.05 + 0 + 0 + 0 + 0.05 = 0.10
Marginal probabilities for Y
𝑃(𝑌 = 7) = 0.05 + 0 + 0 + 0 + 0.05 = 0.10
𝑃(𝑌 = 9) = 0 + 0.15 + 0 + 0.15 + 0 = 0.30
𝑃(𝑌 = 12) = 0 + 0 + 0.2 + 0 + 0 = 0.20
𝑃(𝑌 = 15) = 0 + 0.15 + 0 + 0.15 + 0 = 0.30
𝑃(𝑌 = 17) = 0.05 + 0 + 0 + 0 + 0.05 = 0.10
The second step is to compute the expected values using the marginal probabilities.
𝐸 [𝑋] = (−3)(0.10) + 5(0.30) + 10(0.20) + 15(0.30) + 23(0.10)=10
𝐸 [𝑌] = (7)(0.10) + 9(0.30) + 12(0.20) + 15(0.30) + 17(0.10)=12
Hence E[X]=10 and E[Y]=12
ii. The Variance of X and Y
The first step is to compute $E[X^2]$ and $E[Y^2]$ using the marginal probabilities.
$E[X^2] = (-3)^2(0.10) + 5^2(0.30) + 10^2(0.20) + 15^2(0.30) + 23^2(0.10) = 148.8$
$E[Y^2] = 7^2(0.10) + 9^2(0.30) + 12^2(0.20) + 15^2(0.30) + 17^2(0.10) = 154.4$
The next step is to calculate the variances using $Var(X) = E[X^2] - (E[X])^2$ and
$Var(Y) = E[Y^2] - (E[Y])^2$:
$Var(X) = 148.8 - 10^2 = 148.8 - 100 = 48.8$
$Var(Y) = 154.4 - 12^2 = 154.4 - 144 = 10.4$
iii. Covariance Between X and Y.
The first step in this case is to compute $E[XY]$ by summing over all the possible pairs $(x, y)$
multiplied by their joint probabilities as follows:
$E[XY] = (-3)(7)(0.05) + (-3)(17)(0.05) + 5(9)(0.15) + 5(15)(0.15) + 10(12)(0.2)$
$\quad + 15(9)(0.15) + 15(15)(0.15) + 23(7)(0.05) + 23(17)(0.05)$
$E[XY] = -1.05 - 2.55 + 6.75 + 11.25 + 24 + 20.25 + 33.75 + 8.05 + 19.55 = 120$
We now calculate the covariance using the formula:
$Cov(X, Y) = E[XY] - E[X]E[Y]$
$Cov(X, Y) = 120 - (10)(12)$
$Cov(X, Y) = 0$
iv. Conditional Probability Distribution of Y Given XY≤120
We start by identifying all the possible pairs (x,y) where XY≤120 and their joint
probabilities:
(-3,7): (-3)(7)= -21≤120, P=0.05
(-3,17): (-3)(17)= -51≤120, P=0.05
(5,9): (5)(9)= 45≤120, P=0.15
(5,15): (5)(15)= 75≤120, P=0.15
(10,12): (10)(12)= 120≤120, P=0.2
The remaining pairs (15,9), (15,15), (23,7) and (23,17) all have XY>120.
We now compute the probability P(XY≤120) as follows:
P(XY≤120) =0.05+0.05+0.15+0.15+0.2= 0.6
We now calculate the conditional probability of each Y value, using only the pairs listed above:
$P(Y = 7 \mid XY \leq 120) = \frac{0.05}{0.6} = \frac{1}{12}$
$P(Y = 9 \mid XY \leq 120) = \frac{0.15}{0.6} = \frac{1}{4}$
$P(Y = 12 \mid XY \leq 120) = \frac{0.2}{0.6} = \frac{1}{3}$
$P(Y = 15 \mid XY \leq 120) = \frac{0.15}{0.6} = \frac{1}{4}$
$P(Y = 17 \mid XY \leq 120) = \frac{0.05}{0.6} = \frac{1}{12}$
Hence the conditional distribution of Y given XY≤120 assigns probabilities 1/12, 1/4, 1/3, 1/4
and 1/12 to Y = 7, 9, 12, 15 and 17 respectively, which sum to 1.
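The whole of Question A.2 can be recomputed from the joint table. The sketch below assumes the joint probabilities implied by the marginal sums and the E[XY] expansion above (the table itself is not reproduced in this solution), and uses exact fractions to avoid rounding.

```python
from fractions import Fraction as F

# Joint probabilities P(X = x, Y = y), as inferred from the sums above.
joint = {
    (-3, 7): F("0.05"), (-3, 17): F("0.05"),
    (5, 9): F("0.15"), (5, 15): F("0.15"),
    (10, 12): F("0.2"),
    (15, 9): F("0.15"), (15, 15): F("0.15"),
    (23, 7): F("0.05"), (23, 17): F("0.05"),
}

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex**2
var_y = expect(lambda x, y: y * y) - ey**2
cov_xy = expect(lambda x, y: x * y) - ex * ey

# Condition on the event XY <= 120 and renormalize.
event = {xy: p for xy, p in joint.items() if xy[0] * xy[1] <= 120}
p_event = sum(event.values())
cond_y = {}
for (x, y), p in event.items():
    cond_y[y] = cond_y.get(y, 0) + p / p_event

print(ex, ey, var_x, var_y, cov_xy, p_event)
print(sorted(cond_y.items()))
```

Checking that the conditional probabilities sum to 1 is a quick way to catch a pair assigned to the wrong Y value.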
Solution to Question A.3
i. Calculate $P(9.05 < X \leq 73.27)$
We are given that $X \sim N(\mu_X = 29, \sigma_X^2 = 361)$, so $\sigma_X = 19$.
We standardize the bounds as follows:
$$Z_1 = \frac{9.05 - 29}{19} = -1.05$$
$$Z_2 = \frac{73.27 - 29}{19} = 2.33$$
We use the standard normal tables to obtain the following:
$P(Z \leq -1.05) = 0.1469$
$P(Z \leq 2.33) = 0.9901$
We now calculate the probability as follows:
$P(9.05 < X \leq 73.27) = 0.9901 - 0.1469 = 0.8432$
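The table lookup can be reproduced with the standard library. This sketch assumes $\mu_X = 29$ and $\sigma_X = 19$, the values used in the standardization above.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=29, sigma=19)

# Standardized bounds, for comparison with the table lookup.
z1 = (9.05 - x_dist.mean) / x_dist.stdev
z2 = (73.27 - x_dist.mean) / x_dist.stdev

# P(9.05 < X <= 73.27) as a difference of two CDF values.
prob = x_dist.cdf(73.27) - x_dist.cdf(9.05)
print(round(z1, 2), round(z2, 2), round(prob, 4))
```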
ii. Distribution of $\bar{X}$ for $n_X = 17$
The sample mean $\bar{X}$ follows a normal distribution:
$$\bar{X} \sim N\left(\mu_X = 29, \frac{\sigma_X^2}{n_X} = \frac{361}{17}\right)$$
iii. Covariance and Correlation between X and Y
Given that $\mu_X = 29$, $\mu_Y = 25$, $\sigma_X = 19$, $\sigma_Y = 23$ and $E[XY] = 441$
The covariance: $Cov(X, Y) = E[XY] - \mu_X \mu_Y = 441 - (29)(25) = -284$
The correlation is given by:
$$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{-284}{(19)(23)} = -0.65$$
iv. Mean and Variance of W=2X−3Y
Mean: $E[W] = 2\mu_X - 3\mu_Y = 2(29) - 3(25) = -17$
Variance: $Var(W) = 4\sigma_X^2 + 9\sigma_Y^2 - 12\,Cov(X, Y)$
= 4(361)+9(529)−12(−284)
=1444+4761+3408
=9613.
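The covariance, correlation, and moments of W follow mechanically from the given values. A small sketch, with the helper name `linear_combo_moments` chosen for illustration:

```python
def linear_combo_moments(a, b, mu_x, mu_y, var_x, var_y, cov_xy):
    """Mean and variance of W = a*X + b*Y."""
    mean = a * mu_x + b * mu_y
    var = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return mean, var

mu_x, mu_y, sd_x, sd_y, e_xy = 29, 25, 19, 23, 441

cov_xy = e_xy - mu_x * mu_y
rho = cov_xy / (sd_x * sd_y)
# W = 2X - 3Y, so a = 2 and b = -3; the cross term is 2*a*b = -12.
mean_w, var_w = linear_combo_moments(2, -3, mu_x, mu_y, sd_x**2, sd_y**2, cov_xy)
print(cov_xy, round(rho, 2), mean_w, var_w)
```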
Solution to Question A.4
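a. We are to calculate $P\left(-\frac{39}{7} < \bar{X} \leq -\frac{3}{7}\right)$
We are given that $\bar{X} \sim N\left(\mu_X = -3, \frac{\sigma^2}{n} = \frac{81}{49}\right)$, implying that $\sigma_{\bar{X}} = \frac{9}{7}$.
We standardize the bounds as follows:
$$Z_1 = \frac{-\frac{39}{7} - (-3)}{\frac{9}{7}} = -2$$
$$Z_2 = \frac{-\frac{3}{7} - (-3)}{\frac{9}{7}} = 2$$
We use the standard normal tables to obtain the following:
$P(Z \leq -2) = 0.0228$
$P(Z \leq 2) = 0.9772$
We now calculate the probability as follows:
$$P\left(-\frac{39}{7} < \bar{X} \leq -\frac{3}{7}\right) = 0.9772 - 0.0228 = 0.9544$$
b. We are to find a such that $P(-3 - a < \bar{X} \leq -3 + a) = 0.98$
The interval is symmetric around the mean −3, so 1% of the probability lies in each tail and:
$$P(\bar{X} \leq -3 + a) = 0.99$$
The 0.99 quantile of the standard normal distribution is Z ≈ 2.33.
We now solve for a:
$$a = Z \cdot \sigma_{\bar{X}} = 2.33 \cdot \frac{9}{7} \approx 3.00$$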
c. 98% Confidence Interval for μ (Unknown Variance)
Given that n=41, $\bar{X} = -5$, $S = 10$ and $\alpha = 0.02$
We use the t distribution with df=n-1=40; $t_{0.01, 40} = 2.42$
The margin of error (MOE):
$$E = t_{\alpha/2} \cdot \frac{S}{\sqrt{n}} = 2.42 \cdot \frac{10}{\sqrt{41}} = 3.78$$
The confidence interval is given by:
$$-5 \pm 3.78 = (-8.78, -1.22)$$
d. One-Tailed Test for H0:μ≥−3 at α=1%
The test statistic:
$$t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{-5 - (-3)}{10 / \sqrt{41}} = -1.28$$
The critical value (left-tailed): $-t_{0.01, 40} \approx -2.42$
The decision is that we fail to reject the null hypothesis, as the test statistic (-1.28) is greater
than the critical value (-2.42) at the 1% level of significance.
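Parts c and d use the same standard error, which the following sketch makes explicit. The critical value 2.42 is taken from the t table as quoted above rather than computed, since the Python standard library has no t quantile function.

```python
from math import sqrt

n, xbar, s = 41, -5.0, 10.0
mu_0 = -3.0
t_crit = 2.42  # table value t_{0.01, 40}, copied from the text, not computed

se = s / sqrt(n)  # standard error of the sample mean

# Part c: 98% confidence interval for the mean.
margin = t_crit * se
ci = (xbar - margin, xbar + margin)

# Part d: left-tailed test of H0: mu >= -3.
t_stat = (xbar - mu_0) / se
reject_h0 = t_stat < -t_crit

print(tuple(round(v, 2) for v in ci), round(t_stat, 2), reject_h0)
```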
Solution to Question A.5
Part i) Two-tailed test for $H_0: \mu_X = \mu_Y$ at α=1%
We are given that:
Young patients (X): $\bar{X} = 1$, $S_X^2 = 0.5$, $n_X = 61$
Old patients (Y): $\bar{Y} = 0.7$, $S_Y^2 = 0.45$, $n_Y = 41$
and that $\sigma_X^2 = \sigma_Y^2$ (equal variances).
The pooled variance is given by:
$$S_p^2 = \frac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2} = \frac{(60)(0.5) + (40)(0.45)}{100} = 0.48$$
The test statistic is given by:
$$t = \frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} = \frac{1 - 0.7}{\sqrt{0.48\left(\frac{1}{61} + \frac{1}{41}\right)}} = \frac{0.3}{0.1399} \approx 2.14$$
The critical value:
Degrees of freedom are given by:
$n_X + n_Y - 2 = 100$
For α=1%, two-tailed, t critical=2.626
Since the absolute value of t is less than the critical value (2.14<2.626), we fail to reject
the null hypothesis.
Part ii) Conclusion for part i)
There is not sufficient evidence at the 1% level to conclude that the mean plasma concentration
differs between young and old patients.
Part iii) Two-tailed test for $H_0: p_X = p_Y$ at α=1%.
We are given that:
Young patients: 10/61 had side effects ($\bar{P}_X = \frac{10}{61} = 0.164$)
Old patients: 17/41 had side effects ($\bar{P}_Y = \frac{17}{41} = 0.415$)
The pooled proportion is given by:
$$\bar{P} = \frac{10 + 17}{61 + 41} = \frac{27}{102} = 0.265$$
The test statistic (Z test) is given by:
$$Z = \frac{\bar{P}_X - \bar{P}_Y}{\sqrt{\bar{P}(1 - \bar{P})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} = \frac{0.1639 - 0.4146}{\sqrt{(0.2647)(0.7353)\left(\frac{1}{61} + \frac{1}{41}\right)}} \approx -2.81$$
The critical value for α=1%, two-tailed, is Zcritical= ±2.576.
The decision is that since |z|=2.81>2.576, we reject the null hypothesis.
Part iv) Conclusion for part iii)
There is significant evidence at the 1% level to conclude that the proportion of patients with
side effects differs between young and old patients.
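Both test statistics in Question A.5 can be recomputed from the summary figures stated in the question. The following is a sketch only; the function names are chosen for illustration, and the critical values are not computed here.

```python
from math import sqrt

def pooled_t(xbar, ybar, s2x, s2y, nx, ny):
    """Two-sample t statistic assuming equal variances; returns (t, pooled variance)."""
    s2p = ((nx - 1) * s2x + (ny - 1) * s2y) / (nx + ny - 2)
    t = (xbar - ybar) / sqrt(s2p * (1 / nx + 1 / ny))
    return t, s2p

def two_proportion_z(kx, nx, ky, ny):
    """Z statistic for H0: p_x = p_y, using the pooled proportion."""
    px, py = kx / nx, ky / ny
    p_pool = (kx + ky) / (nx + ny)
    return (px - py) / sqrt(p_pool * (1 - p_pool) * (1 / nx + 1 / ny))

# Summary data from the question: young patients (X) and old patients (Y).
t_stat, s2p = pooled_t(1.0, 0.7, 0.5, 0.45, 61, 41)
z_stat = two_proportion_z(10, 61, 17, 41)
print(round(s2p, 2), round(t_stat, 2), round(z_stat, 2))
```

Each statistic is then compared in absolute value with the relevant two-tailed critical value (2.626 for t with 100 degrees of freedom, 2.576 for Z).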
Solution to Question A.6
We are to find the probability that the patient actually has the disease given a positive test
result.
Given that P(D)=0.01 and that P(Positive|D)=0.95
Sensitivity, P(Test+ | D): 95% (0.95)
False positive rate, P(Test+ | ¬D): not given in the question; we assume 5% (0.05).
We need to find P(D | Test+), that is, the probability that the patient has the disease given a
positive test result.
We first calculate the probabilities as follows;
P(D) = 0.01
P(¬D) = 1 - P(D) = 0.99
P(Test+ | D) = 0.95
P(Test+ | ¬D) = 0.05 (assuming)
We apply Bayes' theorem as shown:
$$P(D \mid Test+) = \frac{P(Test+ \mid D) \cdot P(D)}{P(Test+)}$$
where $P(Test+) = P(Test+ \mid D) \cdot P(D) + P(Test+ \mid \neg D) \cdot P(\neg D)$
We compute $P(Test+)$:
$P(Test+) = (0.95 \times 0.01) + (0.05 \times 0.99)$
=0.0095+0.0495
=0.059
We compute $P(D \mid Test+) = \frac{0.95 \times 0.01}{0.059} = \frac{0.0095}{0.059} \approx 0.161$ or 16.1%
The probability that the patient actually has the disease given a positive test result is
approximately 16.1%.
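Because the false positive rate is an assumption, it helps to see how the answer depends on it. A minimal sketch, with the function name `posterior` chosen for illustration:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(D | Test+) by Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Values from the solution; the 5% false positive rate is assumed, not given.
p_disease_given_pos = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p_disease_given_pos, 3))

# Sensitivity of the answer to the assumed false positive rate.
for fpr in (0.01, 0.05, 0.10):
    print(fpr, round(posterior(0.01, 0.95, fpr), 3))
```

A lower false positive rate raises the posterior sharply, since most positives come from the large healthy group when the disease is rare.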
Solution to Question A.7
Binomial distribution: $X \sim Binomial(n = 100, p = 0.05)$
i) Probability of at least one success in 100 trials:
Calculate the probability of no successes:
$P(X = 0) = (1 - p)^n = 0.95^{100}$
Subtract from 1 to get P(X≥1):
$P(X \geq 1) = 1 - 0.95^{100}$
$= 1 - 0.00592 \approx 0.9941$ or 99.41%
The probability of at least one success in 100 trials is approximately 99.41%.
ii) Probability of at least one success over one time unit:
Assuming that the 100 trials are uniformly distributed over one time unit (e.g., one hour), the
number of successes in one time unit can be modeled by a Poisson distribution with rate:
$\lambda = n \cdot p = 100 \times 0.05 = 5$
The probability of at least one success in one time unit is:
$P(\text{at least one success}) = 1 - P(\text{no success}) = 1 - e^{-\lambda}$
$= 1 - e^{-5} = 1 - 0.0067$
The probability of at least one success over one time unit is approximately 0.9933
iii) Probability that the time between two successes is less than one time unit:
The time T between successes in a Poisson process follows an exponential distribution with
rate λ=5. The probability that the time between two successes is less than one time unit is:
$P(T < 1) = 1 - e^{-\lambda \cdot 1}$
$= 1 - e^{-5 \cdot 1} = 0.9933$
The probability that the time between two successes is less than one time unit is
approximately 0.9933
iv) Comparison of probabilities from part ii and part iii.
The probabilities in part ii and part iii are the same because both scenarios are described by
the same Poisson process, where the probability of an event occurring within a time interval
is equal to the probability that the waiting time between events is less than that interval.
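The three probabilities can be computed side by side. This sketch uses only the values from the question; the helper name `expon_cdf` is chosen for illustration.

```python
from math import exp

n, p = 100, 0.05

# Part i: exact binomial probability of at least one success.
p_binomial = 1 - (1 - p) ** n

# Part ii: Poisson model with rate lambda = n * p per time unit.
lam = n * p
p_poisson = 1 - exp(-lam)

def expon_cdf(t, rate):
    """P(T <= t) for an exponential waiting time with the given rate."""
    return 1 - exp(-rate * t)

# Part iii: waiting time between successes shorter than one time unit.
p_wait = expon_cdf(1.0, lam)

print(round(p_binomial, 4), round(p_poisson, 4), round(p_wait, 4))
```

The binomial and Poisson values differ slightly because the Poisson model is an approximation to the binomial, while the Poisson and exponential values coincide by construction.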
Solution to Question A.8
Given that:
Old Train (X): 22, 23, 22, 24, 21, 20, 20, 23, 21, 21
New Train (Y): 20, 20, 21, 22, 17, 18, 19, 20, 22, 19
The summary statistics are given by:
Sample sizes: $n_X = 10$ and $n_Y = 10$
The sample means are:
$$\bar{X} = \frac{22 + 23 + 22 + 24 + 21 + 20 + 20 + 23 + 21 + 21}{10} = 21.7$$
$$\bar{Y} = \frac{20 + 20 + 21 + 22 + 17 + 18 + 19 + 20 + 22 + 19}{10} = 19.8$$
Train              Old Train (X)    New Train (Y)
1                  22               20
2                  23               20
3                  22               21
4                  24               22
5                  21               17
6                  20               18
7                  20               19
8                  23               20
9                  21               22
10                 21               19
Sample Mean        21.7             19.8
Sample Variance    1.7889           2.6222
i) One-Tailed Test with Known Variances
Null Hypothesis: $H_0: \mu_X \leq \mu_Y$
Alternative Hypothesis: $H_1: \mu_X > \mu_Y$
Significance level α=0.01
The test statistic, with known variances $\sigma_X^2 = \sigma_Y^2 = 1.96$:
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} = \frac{21.7 - 19.8}{\sqrt{\frac{1.96}{10} + \frac{1.96}{10}}} = 3.035$$
The critical value is $Z_{0.01} = 2.326$ (for a right-tailed test).
The decision is to reject the null hypothesis since 3.035>2.326, and we conclude that the new
trains run faster than the old trains at the 1% significance level.
ii) One-Tailed Test with Unknown but Equal Variances
Null Hypothesis: $H_0: \mu_X \leq \mu_Y$
Alternative Hypothesis: $H_1: \mu_X > \mu_Y$
Significance level α=0.01
The pooled variance is given by the expression:
$$S_p^2 = \frac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2} = \frac{9 \times 1.789 + 9 \times 2.622}{10 + 10 - 2} = 2.206$$
The test statistic:
$$t = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} = \frac{1.9}{\sqrt{2.206 \times 0.2}} = 2.860$$
Degrees of freedom: $df = n_X + n_Y - 2 = 18$
Critical value: $t_{0.01, 18} = 2.552$
The decision is to reject the null hypothesis because the test statistic (2.860) is greater than
the critical value (2.552). The conclusion is that the new trains run faster than the old trains at
the 1% significance level.
iii) Two-Tailed Test with Unknown but Equal Variances
Null Hypothesis: 𝐻𝑜: µ𝑋 = µ𝑌
Alternative Hypothesis: 𝐻1: µ𝑋 ≠ µ𝑌
Significance level α=0.01
The test statistic is the same as the one in part (ii), which is 2.860
The critical values are $\pm t_{0.005, 18} = \pm 2.878$
Since the absolute value of the test statistic (2.860) is less than the critical value (2.878), we
fail to reject the null hypothesis and conclude that there is no significant difference in speeds
between the old and new trains at the 1% significance level.
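The three tests in parts i to iii can be reproduced from the raw data. This is a sketch under the same assumptions as the solution: a known variance of 1.96 for part i, and critical values copied from the tables quoted above rather than computed.

```python
from math import sqrt
from statistics import fmean, variance

old = [22, 23, 22, 24, 21, 20, 20, 23, 21, 21]  # X
new = [20, 20, 21, 22, 17, 18, 19, 20, 22, 19]  # Y
nx, ny = len(old), len(new)
diff = fmean(old) - fmean(new)

# Part i: known variances (1.96 each, as used in the solution).
z_stat = diff / sqrt(1.96 / nx + 1.96 / ny)

# Parts ii and iii: pooled sample variance, equal variances assumed.
s2p = ((nx - 1) * variance(old) + (ny - 1) * variance(new)) / (nx + ny - 2)
t_stat = diff / sqrt(s2p * (1 / nx + 1 / ny))

# Table values for df = 18, copied from the text.
t_one_tailed, t_two_tailed = 2.552, 2.878
reject_one_tailed = t_stat > t_one_tailed
reject_two_tailed = abs(t_stat) > t_two_tailed

print(round(z_stat, 3), round(s2p, 3), round(t_stat, 3))
print(reject_one_tailed, reject_two_tailed)
```

The same t statistic is compared with two different critical values, which is why the two decisions can differ.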
iv) Conclusions from Part ii and Part iii
Part ii (One-Tailed): We reject H0 and conclude that the new trains are faster than the old trains.
Part iii (Two-Tailed): We fail to reject H0, so there is no significant evidence of a difference in
speed. This does not show that the speeds are equal.
These conclusions appear inconsistent: the same data and the same test statistic support a
difference in the one-tailed test but not in the two-tailed test.
v) Dilemma Intuition
The dilemma arises because the test statistic (2.860) lies between the one-tailed critical value
(2.552) and the two-tailed critical value (2.878). The one-tailed test places the entire 1%
rejection region in the right tail, whereas the two-tailed test splits it into 0.5% in each tail,
which pushes the critical value further out. A result that is significant in a pre-specified
direction can therefore fail to be significant when both directions are tested.
Solution to Question A.9)
i) Probability of gaining a unit of invertebrate from one die:
The die has 6 faces: invertebrate, seed, fish, fruit, rodent, and "invertebrate or seed".
The probability of rolling "invertebrate" is $\frac{1}{6}$.
The probability of rolling "invertebrate or seed" is $\frac{1}{6}$, and the player can choose to take
invertebrate.
The probability is therefore $\frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$.
The answer is $\frac{1}{3}$.
ii) Probability of at least two uppermost faces being rodents
The probability of rolling "rodent" on one die is $\frac{1}{6}$.
With five dice, we use the complement rule:
$P(\text{at least 2 rodents}) = 1 - P(\text{0 rodents}) - P(\text{1 rodent})$
$P(\text{0 rodents}) = \left(\frac{5}{6}\right)^5 = 0.4019$
$P(\text{1 rodent}) = \binom{5}{1}\left(\frac{1}{6}\right)^1\left(\frac{5}{6}\right)^4 = 0.4019$
Therefore, $P(\text{at least 2 rodents}) = 1 - 0.4019 - 0.4019 = 0.1962$
iii) Expected value of X (maximal units of rodent)
X can only be 0, 1, or 2 (since the player picks 2 dice).
$P(X = 0)$: no rodent on any of the five dice, $\left(\frac{5}{6}\right)^5 = 0.4019$
$P(X = 1)$: exactly one rodent among the five dice, $= 0.4019$
$P(X = 2)$: at least two rodents among the five dice, $= 1 - 0.4019 - 0.4019 = 0.1962$
The formula for the expected value is given by:
$E[X] = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2)$
$E[X] = 0 \times 0.4019 + 1 \times 0.4019 + 2 \times 0.1962 \approx 0.794$
With unrounded probabilities, $E[X] = \frac{6177}{7776} \approx 0.7944$
iv) Expected value of Y (maximal units of seed)
Y is the maximal number of seed units the player can gain by selecting any two dice from the
five. The player can gain seed from a "seed" face or an "invertebrate or seed" face, so each die
contributes seed with probability $\frac{2}{6} = \frac{1}{3}$.
Possible values of Y:
0: No seed-contributing faces.
1: Exactly one seed-contributing face.
2: At least two seed-contributing faces.
We calculate the probabilities:
$P(Y = 0) = \left(\frac{2}{3}\right)^5 = 0.1317$
$P(Y = 1) = \binom{5}{1}\left(\frac{1}{3}\right)^1\left(\frac{2}{3}\right)^4 = 0.3292$
$P(Y = 2) = 1 - P(Y = 0) - P(Y = 1) = 1 - 0.1317 - 0.3292 = 0.5391$
$E[Y] = 0 \times 0.1317 + 1 \times 0.3292 + 2 \times 0.5391$
E[Y]= 1.4074
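Parts iii and iv share one structure: the number of useful faces among five dice is binomial, and the player keeps at most two of them. The sketch below computes the expected value exactly with fractions; the function name `expected_max_units` is chosen for illustration.

```python
from fractions import Fraction
from math import comb

def expected_max_units(p, n_dice=5, picks=2):
    """E[min(K, picks)] where K ~ Binomial(n_dice, p) counts useful faces."""
    total = Fraction(0)
    for k in range(n_dice + 1):
        prob_k = comb(n_dice, k) * p**k * (1 - p) ** (n_dice - k)
        total += min(k, picks) * prob_k
    return total

e_rodent = expected_max_units(Fraction(1, 6))  # only the "rodent" face counts
e_seed = expected_max_units(Fraction(1, 3))    # "seed" or "invertebrate or seed"

print(e_rodent, float(e_rodent))
print(e_seed, float(e_seed))
```

Working with exact fractions avoids the small discrepancies that come from adding probabilities already rounded to four decimals.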
Solution to Question A.10)
i) Name of the Analysis Technique
The analysis technique used is Multiple Linear Regression.
ii) Interpretation of Coefficients
The coefficients represent the estimated effect of each independent variable on the housing
price, holding other variables constant:
sqr_feet: For each additional square foot, the housing price increases by
approximately $40.051 thousand.
age: For each additional year of age, the housing price decreases by approximately
$1.218 thousand.
dtown_dum: Being close to downtown Boston increases the housing price by
approximately $250.628 thousand.
crime_rate: A one-unit increase in the crime rate decreases the housing price by
approximately $75.699 thousand.
educ_qual: A one-unit increase in school quality increases the housing price by
approximately $4.565 thousand.
The intercept (_cons) is $150.742 thousand, representing the baseline price when all other
variables are zero.
iii) Defending the Estimation Results
To support the validity of the results:
Statistical Significance: All coefficients have p-values reported as 0.000 (that is, below
0.0005), indicating they are statistically significant at conventional levels.
Model Fit: The high R-squared (0.756) suggests that 75.6% of the variation in
housing prices is explained by the model.
Assumptions: The model assumes linearity, independence of errors,
homoscedasticity, and normality of residuals. Diagnostic tests (e.g., residual plots)
would be needed to verify these.
iv) Range of Possible Effects
The 95% confidence intervals for each coefficient provide a range of plausible effects:
sqr_feet: [35.202, 44.90] thousand dollars per square foot.
age: [-1.2894, -1.1473] thousand dollars per year.
dtown_dum: [209.468, 291.788] thousand dollars.
crime_rate: [-81.7312, -69.6666] thousand dollars per unit.
educ_qual: [3.872, 5.257] thousand dollars per unit.
These intervals reflect uncertainty in the estimates.
v) Defending Model Validity
To address concerns about the model:
Explanatory Power: The high R-squared and adjusted R-squared indicate strong
explanatory power.
F-statistic: The significant F-statistic (Prob > F = 0.0000) confirms the overall model
fit.
Robustness: Sensitivity analyses (e.g., adding or removing variables) could be
performed to ensure results are robust.
vi) Predicting Housing Price
For an apartment with:
sqr_feet = 50,
age = 10,
dtown_dum = 1,
crime_rate = 0.2,
educ_qual = 9,
The predicted price is calculated as:
$\hat{y} = 150.742 + 40.051(50) - 1.218(10) + 250.628(1) - 75.699(0.2) + 4.565(9)$
$\hat{y} = 150.742 + 2002.55 - 12.18 + 250.628 - 15.1398 + 41.085$
$\hat{y} = 2417.6852$ thousand dollars
Thus, the estimated price is approximately $2,417,685.
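The prediction is a dot product of the coefficients with the feature values plus the intercept. A minimal sketch using the coefficient estimates quoted in part ii; the dictionary layout and the function name `predict` are choices made for illustration.

```python
# Coefficient estimates as quoted in the interpretation above (thousand dollars).
coefficients = {
    "_cons": 150.742,
    "sqr_feet": 40.051,
    "age": -1.218,
    "dtown_dum": 250.628,
    "crime_rate": -75.699,
    "educ_qual": 4.565,
}

apartment = {
    "sqr_feet": 50,
    "age": 10,
    "dtown_dum": 1,
    "crime_rate": 0.2,
    "educ_qual": 9,
}

def predict(coef, features):
    """Intercept plus the sum of coefficient * feature value."""
    return coef["_cons"] + sum(coef[name] * value for name, value in features.items())

price_thousands = predict(coefficients, apartment)
print(round(price_thousands, 4))
```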
Solution to Bonus Question [15 Points]
a. Z if BVB does not go bankrupt ($X = 0$)
Then Z=Y
b. Z if BVB goes bankrupt ($X = 1$)
Then Z=R*Y
Part (ii) on the conditional probability $P(Z = z \mid X = 0)$
Since Z=Y, the distribution is the same as that of Y:
$P(Z = 0.5I \mid X = 0) = 0.1$
$P(Z = I \mid X = 0) = 0.7$
$P(Z = 1.5I \mid X = 0) = 0.2$
Part (iii) on the conditional probability $P(Z = z \mid X = 1)$
Z=R*Y, where R is 1 with probability 0.4 and 0 otherwise, and R is treated as independent of Y.
Possible outcomes:
$Z = 0$ (if $R = 0$): $P = 0.6$
$Z = 0.5I$ (if $R = 1$ and $Y = 0.5I$): $P = 0.4 \times 0.1 = 0.04$
$Z = I$ (if $R = 1$ and $Y = I$): $P = 0.4 \times 0.7 = 0.28$
$Z = 1.5I$ (if $R = 1$ and $Y = 1.5I$): $P = 0.4 \times 0.2 = 0.08$
Part (iv) on the unconditional probability 𝑷(𝒁 = 𝒛)
We use the law of total probability as follows;
𝑃(𝑍 = 𝑧) = 𝑃 (𝑍 = 𝑧|𝑋 = 0)𝑃(𝑋 = 0) + 𝑃(𝑍 = 𝑧|𝑋 = 1)𝑃(𝑋 = 1)
𝑃(𝑋 = 0) = 0.7, 𝑃(𝑋 = 1) = 0.3
We then calculate for each 𝑧;
$P(Z = 0) = 0 \times 0.7 + 0.6 \times 0.3 = 0.18$
$P(Z = 0.5I) = 0.1 \times 0.7 + 0.04 \times 0.3 = 0.082$
$P(Z = I) = 0.7 \times 0.7 + 0.28 \times 0.3 = 0.574$
$P(Z = 1.5I) = 0.2 \times 0.7 + 0.08 \times 0.3 = 0.164$
These four probabilities sum to 1, as required.
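The law of total probability can be applied programmatically, which also gives a check that the result is a valid distribution. In this sketch payoffs are expressed as multiples of I, R is assumed independent of Y, and the variable names are chosen for illustration.

```python
# Probabilities from the question; payoffs are multiples of I.
p_x = {0: 0.7, 1: 0.3}               # X = 1 means bankruptcy
p_y = {0.5: 0.1, 1.0: 0.7, 1.5: 0.2}
p_r1 = 0.4                           # P(R = 1); R assumed independent of Y

# Conditional distributions of Z given X.
cond_z = {
    0: dict(p_y),                                                 # Z = Y
    1: {0.0: 1 - p_r1, **{y: p_r1 * p for y, p in p_y.items()}},  # Z = R * Y
}

# Law of total probability: P(Z = z) = sum over x of P(Z = z | X = x) P(X = x).
p_z = {}
for x, px in p_x.items():
    for z, pz in cond_z[x].items():
        p_z[z] = p_z.get(z, 0.0) + px * pz

for z in sorted(p_z):
    print(z, round(p_z[z], 3))
print(round(sum(p_z.values()), 10))
```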