
Stats 2 Week 5 8 Paga

The document contains solutions to various statistical problems related to random variables, including Bernoulli, Uniform, and Normal distributions. It provides detailed calculations for marginal densities, expected values, and variances, along with multiple-choice questions and answers. The content is aimed at students studying statistics for data science at the Indian Institute of Technology Madras.

Uploaded by

amritkrkr6

Stats 2 Week 5-8 PA+GA 🔥

Programming and data science (Indian Institute of Technology Madras)



Downloaded by Amrit Ranjan (amritkrkr6@gmail.com)

Statistics for Data Science - 2

Week 5 Practice Assignment Solution

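1. Let $X \sim \text{Bernoulli}(0.6)$. Let $(Y \mid X = 0) \sim \text{Exp}(1)$ and $(Y \mid X = 1) \sim \text{Exp}(3)$. Find the marginal of $Y$.

a) $0.6e^{-y} + 0.4e^{-3y}$
b) $0.4e^{-y} + 0.6e^{-3y}$
c) $0.6e^{-y} + 1.2e^{-3y}$
d) $0.4e^{-y} + 1.8e^{-3y}$

Solution:
Given that $X \sim \text{Bernoulli}(0.6)$, we have $p_X(1) = 0.6$ and $p_X(0) = 0.4$.
The marginal density of $Y$ (for $y > 0$) is given by
\[
\begin{aligned}
f_Y(y) &= \sum_{x \in T_X} p_X(x)\, f_{Y|X=x}(y) \\
&= p_X(1)\, f_{Y|X=1}(y) + p_X(0)\, f_{Y|X=0}(y) \\
&= 0.6 \times 3e^{-3y} + 0.4\, e^{-y} \\
&= 1.8e^{-3y} + 0.4e^{-y}
\end{aligned}
\]
Hence option d is correct.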
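A quick numerical sanity check of the mixture density can be sketched as follows. The helper names (`marginal_density`, `integrate`) are illustrative and not part of the assignment; the check confirms only that the weighted sum of the two exponential densities integrates to approximately 1.

```python
import math

def marginal_density(y, p1=0.6, rate0=1.0, rate1=3.0):
    """f_Y(y) = p_X(0) * Exp(rate0) density + p_X(1) * Exp(rate1) density, for y >= 0."""
    if y < 0:
        return 0.0
    return (1 - p1) * rate0 * math.exp(-rate0 * y) + p1 * rate1 * math.exp(-rate1 * y)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The tail beyond y = 50 is negligible for these rates.
total = integrate(marginal_density, 0.0, 50.0)
print(round(total, 4))
```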

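2. Let $X \sim \text{Uniform}\{1, 2, 3\}$. Let $(Y \mid X = 1) \sim \text{Exp}(1)$, $(Y \mid X = 2) \sim \text{Exp}(2)$ and $(Y \mid X = 3) \sim \text{Normal}(0, 4)$. What is the marginal of $Y$?

a) $e^{-y} + 2e^{-2y} + \dfrac{1}{2\sqrt{2\pi}}\, e^{-y^2/8}$
b) $\dfrac{1}{3}\left[e^{-y} + 2e^{-2y} + \dfrac{1}{2\sqrt{2\pi}}\, e^{-y^2/8}\right]$
c) $\dfrac{1}{3}\left[e^{-y} + e^{-2y} + \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/4}\right]$
d) $e^{-y} + e^{-2y} + \dfrac{1}{2\sqrt{2\pi}}\, e^{-y^2/4}$

Solution:
Given that $X \sim \text{Uniform}\{1, 2, 3\}$, we have $p_X(1) = p_X(2) = p_X(3) = \frac{1}{3}$.
The marginal density of $Y$ is given by
\[
\begin{aligned}
f_Y(y) &= \sum_{x \in T_X} p_X(x)\, f_{Y|X=x}(y) \\
&= p_X(1)\, f_{Y|X=1}(y) + p_X(2)\, f_{Y|X=2}(y) + p_X(3)\, f_{Y|X=3}(y) \\
&= \frac{1}{3} \times e^{-y} + \frac{1}{3} \times 2e^{-2y} + \frac{1}{3} \times \frac{e^{-y^2/8}}{2\sqrt{2\pi}} \\
&= \frac{1}{3}\left[e^{-y} + 2e^{-2y} + \frac{1}{2\sqrt{2\pi}}\, e^{-y^2/8}\right]
\end{aligned}
\]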

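3. Let $X \sim \text{Uniform}\{1, 2\}$. Let $(Y \mid X = 1) \sim \text{Exp}(2)$ and $(Y \mid X = 2) \sim \text{Exp}(4)$. Find the value of $f_{X|Y=3}(2)$.

a) $\dfrac{2e^{-12}}{e^{-6} + 2e^{-12}}$
b) $\dfrac{e^{-6}}{e^{-6} + 2e^{-12}}$
c) $\dfrac{e^{-12}}{e^{-6} + e^{-12}}$
d) $\dfrac{e^{-6}}{e^{-6} + e^{-12}}$

Solution:
Given that $X \sim \text{Uniform}\{1, 2\}$, we have $p_X(1) = p_X(2) = \frac{1}{2}$.
The marginal density of $Y$ is given by
\[
\begin{aligned}
f_Y(y) &= \sum_{x \in T_X} p_X(x)\, f_{Y|X=x}(y) \\
&= p_X(1)\, f_{Y|X=1}(y) + p_X(2)\, f_{Y|X=2}(y) \\
&= \frac{1}{2} \times 2e^{-2y} + \frac{1}{2} \times 4e^{-4y} \\
&= e^{-2y} + 2e^{-4y}
\end{aligned}
\]
And
\[
f_{X|Y=3}(2) = \frac{p_X(2)\, f_{Y|X=2}(3)}{f_Y(3)}
= \frac{\frac{1}{2} \times 4e^{-4 \times 3}}{e^{-2 \times 3} + 2e^{-4 \times 3}}
= \frac{2e^{-12}}{e^{-6} + 2e^{-12}}
\]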
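The Bayes step above (prior pmf times conditional density, normalised by the marginal) can be sketched in a few lines. The function names `exp_pdf` and `posterior` are illustrative assumptions, not from the course material.

```python
import math

def exp_pdf(y, rate):
    """Density of Exp(rate) at y."""
    return rate * math.exp(-rate * y) if y >= 0 else 0.0

def posterior(prior, likelihoods, y):
    """Return {x: f_{X|Y=y}(x)} from a prior pmf and the conditional densities of Y given X = x."""
    joint = {x: prior[x] * likelihoods[x](y) for x in prior}
    total = sum(joint.values())  # this is f_Y(y)
    return {x: value / total for x, value in joint.items()}

prior = {1: 0.5, 2: 0.5}
likelihoods = {1: lambda y: exp_pdf(y, 2.0), 2: lambda y: exp_pdf(y, 4.0)}
post = posterior(prior, likelihoods, 3.0)
print(post[2])  # should agree with 2e^{-12} / (e^{-6} + 2e^{-12})
```

The same helper applies to any problem of this type: only the prior and the conditional densities change.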


\[ F_X(20) = 1 - e^{-0.1 \times 20} = 1 - e^{-2} \]

The probability that the athlete takes more than 10 minutes but less than 20 minutes to complete the race is $F_X(20) - F_X(10) = e^{-1} - e^{-2} \approx 0.232$.

10.
5 The PDFs of random variables X1, X2, X3, X4, and X5 are shown in Figure 4.2.P.
Based on the information, choose the correct option(s) from below.

Figure 4.2.P: PDF of Normal Distributions for different variables.

(a) E(X1) ≈ E(X5) < E(X2) < E(X4) < E(X3)


(b) E(X1) < E(X5) < E(X2) < E(X4) < E(X3)
(c) E(X1) < E(X5) = E(X2) < E(X4) < E(X3)
(d) Var(X1) < Var(X3) < Var(X4) < Var(X5)
(e) Var(X1) ≈ Var(X2) < Var(X3) < Var(X4) < Var(X5)

Answer: a, d, and e
Solution:
We know that the PDF of a normal distribution attains its peak at the mean, so the location of the peak gives $E[X] = \mu$.
Also, the value of the PDF at the mean is inversely proportional to the standard deviation, since $f_X(\mu) = \dfrac{1}{\sqrt{2\pi}\,\sigma}$.
From the figure, the peaks of X1, X2, X3, X4, and X5 occur approximately at -10, 0, 20, 10, and -10 respectively.
Therefore, E(X1) ≈ E(X5) < E(X2) < E(X4) < E(X3).
The peak heights $f_X(\mu)$ of X1, X2, X3, X4, and X5 satisfy $f_{X1}(\mu) \approx f_{X2}(\mu) > f_{X3}(\mu) > f_{X4}(\mu) > f_{X5}(\mu)$, and a higher peak means a smaller standard deviation.
Therefore, Var(X1) ≈ Var(X2) < Var(X3) < Var(X4) < Var(X5), which also gives the chain in option d.
Hence, options a, d, and e are correct.


11.
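6 The PDF of a continuous random variable is given as
\[
f_X(x) = \begin{cases} 4x^3 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
What is the value of Var(X)? (Hint: $\int x^a\, dx = \dfrac{x^{a+1}}{a+1}$)
(a) $\frac{1}{75}$
(b) $\frac{2}{75}$
(c) $\frac{3}{75}$
(d) $\frac{4}{75}$
Answer: b
We know that $\text{Var}(X) = E[X^2] - E[X]^2$.
\[
E[X] = \int x f_X(x)\, dx = \int_0^1 x \cdot 4x^3\, dx = \int_0^1 4x^4\, dx = \left.\frac{4x^5}{5}\right|_0^1 = \frac{4}{5}
\]
\[
E[X^2] = \int x^2 f_X(x)\, dx = \int_0^1 x^2 \cdot 4x^3\, dx = \int_0^1 4x^5\, dx = \left.\frac{4x^6}{6}\right|_0^1 = \frac{2}{3}
\]
Therefore,
\[
\text{Var}(X) = E[X^2] - E[X]^2 = \frac{2}{3} - \left(\frac{4}{5}\right)^2 = \frac{2}{3} - \frac{16}{25} = \frac{50 - 48}{75} = \frac{2}{75}
\]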
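Moments of a polynomial density can be checked with exact rational arithmetic. The sketch below assumes the density is given as a map from power to coefficient; the helper name `moment` is hypothetical.

```python
from fractions import Fraction

def moment(coeffs, k, a=0, b=1):
    """E[X^k] for a density sum(c * x**p) on [a, b]; coeffs maps power p -> coefficient c."""
    total = Fraction(0)
    for p, c in coeffs.items():
        n = p + k + 1  # integrate c * x^(p + k) term by term
        total += Fraction(c) * (Fraction(b) ** n - Fraction(a) ** n) / n
    return total

density = {3: 4}                 # f_X(x) = 4x^3 on [0, 1]
mean = moment(density, 1)
second = moment(density, 2)
variance = second - mean ** 2
print(mean, second, variance)    # 4/5 2/3 2/75
```

Calling `moment(density, 0)` returns 1, which confirms that the density is properly normalised.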

12.
7 Let X ∼ Uniform(a1 , b1 ) and Y ∼ Uniform(a2 , b2 ). Based on this information, choose
the correct option(s) from below.


(a) If b2 − a2 = b1 − a1 , then Var(X) = Var(Y ).


(b) If b2 + a2 = b1 + a1 , then Var(X) = Var(Y ).
(c) If b2 − a2 = b1 − a1 , then E(X) = E(Y ).
(d) If b2 − b1 = a1 − a2 , then E(X) = E(Y ).
Answer: a and d
Solution:
We know that the mean $E(X)$ and variance $\text{Var}(X)$ of a uniform random variable $X \sim \text{Uniform}(a, b)$ are $\dfrac{a+b}{2}$ and $\dfrac{(b-a)^2}{12}$ respectively.
Given $X \sim \text{Uniform}(a_1, b_1)$ and $Y \sim \text{Uniform}(a_2, b_2)$,
$E(X) = \dfrac{a_1 + b_1}{2}$ and $E(Y) = \dfrac{a_2 + b_2}{2}$. So, for $E(X)$ to be equal to $E(Y)$, we need $a_1 + b_1 = a_2 + b_2$, i.e. $b_2 - b_1 = a_1 - a_2$. Hence option d is correct and option c is incorrect.
Similarly, for $\text{Var}(X)$ to be equal to $\text{Var}(Y)$, we need $\dfrac{(b_1 - a_1)^2}{12} = \dfrac{(b_2 - a_2)^2}{12}$, i.e. $b_1 - a_1 = b_2 - a_2$ (both widths are positive). Hence option a is correct and option b is incorrect.
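13. The CDF of a random variable X is given as:
10
\[
F_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{\ln 4} & 0 \le x \le \ln 2 \\ 1 - e^{-x} & \ln 2 \le x < \infty \end{cases}
\]
Derivative formulas required to solve the problem:
\[
\frac{d(ax)}{dx} = a, \qquad \frac{d(e^{-ax})}{dx} = -ae^{-ax}
\]
The PDF of the random variable X is:

(a) $f_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{1}{\ln 4} & 0 \le x < \ln 2 \\ e^{-x} & \ln 2 \le x < \infty \end{cases}$

(b) $f_X(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 \le x < \ln 2 \\ e^{-x} & \ln 2 \le x < \infty \end{cases}$

(c) $f_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{1}{\ln 2} & 0 \le x \le \ln 2 \\ e^{-x} & \ln 2 < x < \infty \end{cases}$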


4. If $X \sim \text{Normal}(10, 25)$, what is the value of $E[2X^2]$?
8
Answer: 250

Solution:
Given $E[X] = 10$ and $\text{Var}(X) = 25$.
We know that $\text{Var}(X) = E[X^2] - E[X]^2$
⇒ $E[X^2] = \text{Var}(X) + E[X]^2$
⇒ $E[X^2] = 25 + 10^2 = 125$
We know that $E[cX] = cE[X]$, where c is a constant.
⇒ $E[2X^2] = 2E[X^2]$
⇒ $E[2X^2] = 2 \times 125 = 250$
9
5. If X ∼ Normal(10, 4), then what is the value of P (X ≥ 8|X ≤ 9)? Use the standard
normal distribution tables if necessary. Enter the answer up to two decimals accuracy.
Use the following CDF values of standard normal distribution.
FZ (−2) = 0.02275, FZ (−1.5) = 0.06681, FZ (−1) = 0.15866, FZ (−0.5) = 0.30854, FZ (0) =
0.5, FZ (0.5) = 0.69146, and FZ (1) = 0.84134
Answer: 0.485 accepted range 0.48 to 0.49

Solution:
Given $\mu = 10$, $\sigma^2 = 4 \Rightarrow \sigma = 2$.
We need to find $P(X \ge 8 \mid X \le 9)$.
\[
P(X \ge 8 \mid X \le 9) = \frac{P(X \ge 8 \cap X \le 9)}{P(X \le 9)} = \frac{F_X(9) - F_X(8)}{F_X(9)}
\]
Convert the given normal distribution to the standard normal distribution to get the values of $F_X(x)$.
For $x = 8$: $z = \dfrac{x - \mu}{\sigma} = \dfrac{8 - 10}{2} = -1 \Rightarrow F_X(8) = F_Z(-1)$
For $x = 9$: $z = \dfrac{x - \mu}{\sigma} = \dfrac{9 - 10}{2} = -0.5 \Rightarrow F_X(9) = F_Z(-0.5)$
\[
P(X \ge 8 \mid X \le 9) = \frac{F_Z(-0.5) - F_Z(-1)}{F_Z(-0.5)} = \frac{0.30854 - 0.15866}{0.30854} \approx 0.485
\]
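The table lookup can be cross-checked with the error function from the standard library. This is a sketch; `normal_cdf` is an illustrative helper built on the identity $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 10.0, 2.0            # variance 4, so sigma = 2
numerator = normal_cdf(9, mu, sigma) - normal_cdf(8, mu, sigma)   # P(8 <= X <= 9)
cond_prob = numerator / normal_cdf(9, mu, sigma)                  # divide by P(X <= 9)
print(round(cond_prob, 3))
```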

6. A random variable X has the following PDF
\[
f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Define $Y = e^X$. What is the PDF $f_Y(y)$ of Y?

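(a) $f_Y(y) = \begin{cases} \dfrac{2\log(y)}{y} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(b) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{2e^{y}} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(c) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{y} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(d) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{e^{y}} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(e) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{2y} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$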

Answer: a

Solution:
Given $Y = g(X) = e^X$
⇒ $\log y = x = g^{-1}(y)$
Therefore $g^{-1}(y) = \log(y)$.
$g(x) = e^x \Rightarrow g'(x) = e^x$, since $\dfrac{d(e^x)}{dx} = e^x$.
We know that $e^x$ is monotonic (an increasing function) on the range 0 to 1.
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|}\, f_X(g^{-1}(y))$.
$g'(g^{-1}(y)) = g'(\log y) = e^{\log y} = y$
$|g'(g^{-1}(y))| = y$, since y is positive in the range $[1, e]$
$f_X(g^{-1}(y)) = f_X(\log y) = 2\log y$
Therefore, for $1 \le y \le e$,
\[
f_Y(y) = \frac{1}{y} \times 2\log y = \frac{2\log y}{y}
\]
Hence option a is correct.
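The change-of-variable result can be checked two ways: by integrating the derived density numerically, and by simulating $Y = e^X$ directly. The sketch below assumes inverse-CDF sampling for X (since $F_X(x) = x^2$ on $[0, 1]$, $X = \sqrt{U}$ with U uniform); the names `f_Y` and `cdf_Y` are illustrative.

```python
import math
import random

def f_Y(y):
    """Derived density of Y = e^X when f_X(x) = 2x on [0, 1]."""
    return 2.0 * math.log(y) / y if 1.0 <= y <= math.e else 0.0

def cdf_Y(y, n=20_000):
    """Midpoint-rule integral of f_Y from 1 to y."""
    h = (y - 1.0) / n
    return h * sum(f_Y(1.0 + (i + 0.5) * h) for i in range(n))

# Simulate X by inverse CDF (X = sqrt(U)), then transform to Y = e^X.
random.seed(0)
samples = [math.exp(math.sqrt(random.random())) for _ in range(200_000)]
empirical = sum(s <= 2.0 for s in samples) / len(samples)

print(cdf_Y(math.e))            # total mass, close to 1
print(cdf_Y(2.0), empirical)    # both close to (log 2)^2
```

Both estimates of $P(Y \le 2)$ should be near $(\log 2)^2 \approx 0.48$, because $P(Y \le 2) = P(X \le \log 2) = (\log 2)^2$.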

Use the following information to answer the questions 7 and 8.


Statistics for Data Science - 2


Week 5 graded Assignment
Solution

1. A person randomly chooses a battery from a store which has 40 batteries of type A and 60 batteries of type B. The battery lives of type A and type B batteries are exponentially distributed with average lives of 4 years and 6 years, respectively. If the chosen battery lasts for 5 years, what is the probability that the battery is of type A?
(a) $\dfrac{1}{1 + e^{5/12}}$
(b) $\dfrac{1}{1 + e^{-5/12}}$
(c) $\dfrac{e^{-4/5}}{1 + e^{-6/5}}$
(d) $\dfrac{e^{-6/5}}{1 + e^{-4/5}}$

Solution:
Define a random variable X as follows:
\[
X = \begin{cases} 1 & \text{if the chosen battery is of type A} \\ 0 & \text{if the chosen battery is of type B} \end{cases}
\]
Let Y denote the battery life of the chosen battery.

By the given information, we have
$Y \mid X = 1 \sim \text{Exp}(\frac{1}{4})$ and $Y \mid X = 0 \sim \text{Exp}(\frac{1}{6})$.

It implies that
$f_{Y|X=1}(y) = \frac{1}{4} e^{-y/4},\ y > 0$ and $f_{Y|X=0}(y) = \frac{1}{6} e^{-y/6},\ y > 0$.

Also given that
$P(X = 1) = \dfrac{40}{100} = \dfrac{2}{5}$ and $P(X = 0) = \dfrac{60}{100} = \dfrac{3}{5}$.

To find: $f_{X|Y=5}(1)$. Now,
\[
\begin{aligned}
f_{X|Y=5}(1) &= \frac{f_{Y|X=1}(5)\, P(X = 1)}{f_Y(5)} \\
&= \frac{f_{Y|X=1}(5)\, P(X = 1)}{f_{Y|X=1}(5)\, P(X = 1) + f_{Y|X=0}(5)\, P(X = 0)} \\
&= \frac{\frac{1}{4} e^{-5/4} \cdot \frac{2}{5}}{\frac{1}{4} e^{-5/4} \cdot \frac{2}{5} + \frac{1}{6} e^{-5/6} \cdot \frac{3}{5}} \\
&= \frac{\frac{1}{10} e^{-5/4}}{\frac{1}{10} e^{-5/4} + \frac{1}{10} e^{-5/6}} \\
&= \frac{e^{-5/4}}{e^{-5/4} + e^{-5/6}} \\
&= \frac{1}{1 + e^{5/12}}
\end{aligned}
\]

2. Let $Y = XZ + X$, where $X \sim \text{Uniform}\{1, 2, 3\}$ and $Z \sim \text{Normal}(1, 4)$ are independent. Find the value of $f_{X|Y=2}(2)$.

(a) $\dfrac{3\exp(\frac{1}{8})}{3\exp(\frac{1}{8}) + 6 + 2\exp(\frac{2}{9})}$
(b) $\dfrac{3\exp(\frac{-1}{8})}{3\exp(\frac{-1}{8}) + 6 + 2\exp(\frac{-2}{9})}$
(c) $\dfrac{2\exp(\frac{-2}{9})}{3\exp(\frac{-1}{8}) + 6 + 2\exp(\frac{-2}{9})}$
(d) $\dfrac{6}{3\exp(\frac{-1}{32}) + 6 + 2\exp(\frac{-1}{18})}$
Solution:

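Given that $X \sim \text{Uniform}\{1, 2, 3\}$ and $Z \sim \text{Normal}(1, 4)$ are independent, and $Y = XZ + X$.
It implies that
$Y \mid X = 1 = Z + 1 \sim \text{Normal}(2, 4)$
$Y \mid X = 2 = 2Z + 2 \sim \text{Normal}(4, 16)$
$Y \mid X = 3 = 3Z + 3 \sim \text{Normal}(6, 36)$

Therefore,
\[
f_{Y|X=1}(y) = \frac{1}{2\sqrt{2\pi}} \exp\left(\frac{-(y-2)^2}{8}\right), \quad
f_{Y|X=2}(y) = \frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(y-4)^2}{32}\right), \quad
f_{Y|X=3}(y) = \frac{1}{6\sqrt{2\pi}} \exp\left(\frac{-(y-6)^2}{72}\right)
\]
To find: $f_{X|Y=2}(2)$.
\[
\begin{aligned}
f_{X|Y=2}(2) &= \frac{f_{Y|X=2}(2)\, f_X(2)}{f_{Y|X=2}(2)\, f_X(2) + f_{Y|X=1}(2)\, f_X(1) + f_{Y|X=3}(2)\, f_X(3)} \\
&= \frac{\frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(2-4)^2}{32}\right) \cdot \frac{1}{3}}{\frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(2-4)^2}{32}\right) \cdot \frac{1}{3} + \frac{1}{2\sqrt{2\pi}} \exp\left(\frac{-(2-2)^2}{8}\right) \cdot \frac{1}{3} + \frac{1}{6\sqrt{2\pi}} \exp\left(\frac{-(2-6)^2}{72}\right) \cdot \frac{1}{3}} \\
&= \frac{\frac{1}{4} \exp\left(\frac{-1}{8}\right)}{\frac{1}{4} \exp\left(\frac{-1}{8}\right) + \frac{1}{2} \exp(0) + \frac{1}{6} \exp\left(\frac{-2}{9}\right)} \\
&= \frac{3\exp(\frac{-1}{8})}{3\exp(\frac{-1}{8}) + 6 + 2\exp(\frac{-2}{9})}
\end{aligned}
\]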
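The posterior for this scaled-normal mixture can be reproduced numerically. The sketch below uses only the fact that, given $X = x$, $Y = xZ + x$ is Normal with mean $x \cdot E[Z] + x$ and variance $x^2 \cdot \text{Var}(Z)$; the function names are illustrative.

```python
import math

def normal_pdf(y, mean, var):
    """Density of Normal(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_x_given_y(y, z_mean=1.0, z_var=4.0, support=(1, 2, 3)):
    """f_{X|Y=y}(x) for Y = X*Z + X, with X uniform on `support` and Z ~ Normal(z_mean, z_var)."""
    weights = {
        x: normal_pdf(y, x * z_mean + x, x * x * z_var) / len(support)
        for x in support
    }
    total = sum(weights.values())  # marginal density f_Y(y)
    return {x: w / total for x, w in weights.items()}

post = posterior_x_given_y(2.0)
closed_form = 3 * math.exp(-1 / 8) / (3 * math.exp(-1 / 8) + 6 + 2 * math.exp(-2 / 9))
print(post[2], closed_form)
```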

3. Let X be a continuous random variable with the following PDF:
\[
f_X(x) = \begin{cases} 3(1 - x)^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}
\]
Define $Y = (1 - X)^3$. Find the PDF of the random variable Y.

a) $f_Y(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_Y(y) = \begin{cases} (1 - y)^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_Y(y) = \begin{cases} y^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_Y(y) = \begin{cases} 3y^{2/3} & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Hint:
Apply the monotonic, differentiable function theorem and $\dfrac{d}{dx}(1 - x)^3 = -3(1 - x)^2$.

Solution:
We know that in the range (0, 1), $(1 - x)^3$ is monotonic (a decreasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|}\, f_X(g^{-1}(y))$.

Let $Y = (1 - X)^3 = g(X)$
⇒ $y^{1/3} = 1 - x$ ⇒ $x = 1 - y^{1/3} = g^{-1}(y)$
Therefore $g^{-1}(y) = 1 - y^{1/3}$.
$g(x) = (1 - x)^3 \Rightarrow g'(x) = -3(1 - x)^2$

And
$g'(g^{-1}(y)) = g'(1 - y^{1/3}) = -3(1 - (1 - y^{1/3}))^2 = -3y^{2/3}$
$|g'(g^{-1}(y))| = 3y^{2/3}$, since $y^{2/3}$ is positive in the range (0, 1).
$f_X(g^{-1}(y)) = f_X(1 - y^{1/3}) = 3(1 - (1 - y^{1/3}))^2 = 3y^{2/3}$
Therefore, $f_Y(y) = \dfrac{3y^{2/3}}{3y^{2/3}} = 1$.

Therefore
\[
f_Y(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}
\]

4. Let X be a continuous random variable with the following PDF:
\[
f_X(x) = \begin{cases} x^2/81 & -6 < x < 3 \\ 0 & \text{otherwise} \end{cases}
\]
Define $Y = \frac{1}{3}(12 - X)$. Find the PDF of the random variable Y.

a) $f_Y(y) = \begin{cases} (12 - 3y)^2/27 & -6 < y < 3 \\ 0 & \text{otherwise} \end{cases}$

b) $f_Y(y) = \begin{cases} (12 - 3y)^2/27 & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$

c) $f_Y(y) = \begin{cases} (12 - 3y)/27 & -6 < y < 3 \\ 0 & \text{otherwise} \end{cases}$

d) $f_Y(y) = \begin{cases} (12 - 3y)/27 & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$

Solution:
We know that in the range (-6, 3), $\frac{1}{3}(12 - x)$ is monotonic (a decreasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|}\, f_X(g^{-1}(y))$.

Let $Y = \frac{1}{3}(12 - X) = g(X)$
⇒ $3y = 12 - x$ ⇒ $x = 12 - 3y = g^{-1}(y)$
Therefore $g^{-1}(y) = 12 - 3y$.
$g(x) = \frac{1}{3}(12 - x) \Rightarrow g'(x) = -\frac{1}{3}$

And
$g'(g^{-1}(y)) = g'(12 - 3y) = -\frac{1}{3}$
$|g'(g^{-1}(y))| = \frac{1}{3}$
$f_X(g^{-1}(y)) = f_X(12 - 3y) = \dfrac{(12 - 3y)^2}{81}$
Therefore, $f_Y(y) = \dfrac{(12 - 3y)^2/81}{1/3} = \dfrac{(12 - 3y)^2}{27}$.
When $x = -6$, $y = 6$ and when $x = 3$, $y = 3$.

Therefore
\[
f_Y(y) = \begin{cases} \dfrac{(12 - 3y)^2}{27} & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}
\]

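5. Let X be a continuous random variable with the following PDF:
\[
f_X(x) = \begin{cases} x^3(6x^2 + 5x - 4) & 0 < x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of E[X].

a) $\frac{523}{210}$
b) $\frac{23}{210}$
c) $\frac{173}{210}$
d) $\frac{187}{210}$

Hint: Use $\int_a^b x^n\, dx = \dfrac{1}{n+1}\left(b^{n+1} - a^{n+1}\right)$
Solution:
\[
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\, dx \\
&= \int_0^1 x \times x^3(6x^2 + 5x - 4)\, dx \\
&= \int_0^1 (6x^6 + 5x^5 - 4x^4)\, dx \\
&= \left[\frac{6x^7}{7}\right]_0^1 + \left[\frac{5x^6}{6}\right]_0^1 - \left[\frac{4x^5}{5}\right]_0^1 \\
&= \frac{6}{7} + \frac{5}{6} - \frac{4}{5} \\
&= \frac{187}{210}
\end{aligned}
\]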

6. Let X be a continuous random variable with the following PDF:
\[
f_X(x) = \begin{cases} x & 0 \le x \le 1 \\ 2 - x & 1 < x \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Define $Y = 6X + 5$. Find the variance of Y.

Use $\int_a^b x^n\, dx = \dfrac{1}{n+1}\left(b^{n+1} - a^{n+1}\right)$

Also, $\int_a^b x^n\, dx = \int_a^c x^n\, dx + \int_c^b x^n\, dx$ where $a < c < b$.

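Solution:
$\text{Var}(Y) = \text{Var}(6X + 5) = 36\,\text{Var}(X)$
And $\text{Var}(X) = E[X^2] - (E[X])^2$
\[
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\, dx = \int_0^1 x f_X(x)\, dx + \int_1^2 x f_X(x)\, dx \\
&= \int_0^1 x \cdot x\, dx + \int_1^2 x(2 - x)\, dx \\
&= \left[\frac{x^3}{3}\right]_0^1 + \left[\frac{2x^2}{2}\right]_1^2 - \left[\frac{x^3}{3}\right]_1^2 \\
&= \frac{1}{3} + (2^2 - 1^2) - \frac{(2^3 - 1^3)}{3} \\
&= \frac{1}{3} + 3 - \frac{7}{3} = 1
\end{aligned}
\]
\[
\begin{aligned}
E[X^2] &= \int_{-\infty}^{\infty} x^2 f_X(x)\, dx = \int_0^1 x^2 f_X(x)\, dx + \int_1^2 x^2 f_X(x)\, dx \\
&= \int_0^1 x^2 \cdot x\, dx + \int_1^2 x^2(2 - x)\, dx \\
&= \left[\frac{x^4}{4}\right]_0^1 + \left[\frac{2x^3}{3}\right]_1^2 - \left[\frac{x^4}{4}\right]_1^2 \\
&= \frac{1}{4} + \frac{2}{3}(2^3 - 1^3) - \frac{1}{4}(2^4 - 1^4) \\
&= \frac{1}{4} + \frac{14}{3} - \frac{15}{4} = \frac{7}{6}
\end{aligned}
\]
Therefore,
$\text{Var}(X) = \frac{7}{6} - 1 = \frac{1}{6}$
⇒ $\text{Var}(Y) = 36 \times \frac{1}{6} = 6$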
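A simulation gives an independent check of this variance. The sketch below relies on a standard fact that is not stated in the solution: the sum of two independent Uniform(0, 1) variables has exactly the triangular density $f_X$ used here. The sample size and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)
n = 100_000

# Assumption (standard fact): U1 + U2, with U1 and U2 independent Uniform(0, 1),
# has density x on [0, 1] and 2 - x on (1, 2].
xs = [random.random() + random.random() for _ in range(n)]
ys = [6 * x + 5 for x in xs]

var_x = statistics.pvariance(xs)
var_y = statistics.pvariance(ys)
print(var_x, var_y)   # close to 1/6 and 6
```

The ratio `var_y / var_x` is 36 up to floating-point error, which illustrates $\text{Var}(aX + b) = a^2\,\text{Var}(X)$: the shift by 5 has no effect.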

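7. Suppose $X \sim \text{Normal}(3, 4)$. Find the PDF of $Y = 2X + 9$.

1. $f_Y(y) = \dfrac{1}{\sqrt{16\pi}}\, e^{-(y - 15)^2/16}, \quad -\infty < y < \infty$
2. $f_Y(y) = \dfrac{1}{\sqrt{32\pi}}\, e^{-(y - 6)^2/32}, \quad -\infty < y < \infty$
3. $f_Y(y) = \dfrac{1}{\sqrt{32\pi}}\, e^{-(y - 15)^2/32}, \quad -\infty < y < \infty$
4. $f_Y(y) = \dfrac{1}{\sqrt{16\pi}}\, e^{-(y - 6)^2/16}, \quad -\infty < y < \infty$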
Solution:
We know that $2x + 9$ is monotonic on $\mathbb{R}$.
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|}\, f_X(g^{-1}(y))$.
Let $Y = 2X + 9 = g(X)$
⇒ $x = \dfrac{y - 9}{2} = g^{-1}(y)$
$g(x) = 2x + 9 \Rightarrow g'(x) = 2$

Therefore,
\[
\begin{aligned}
f_Y(y) &= \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|} = \frac{f_X\left(\frac{y-9}{2}\right)}{\left|g'\left(\frac{y-9}{2}\right)\right|} \\
&= \frac{1}{2} \cdot \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\frac{y-9}{2} - 3}{2}\right)^2\right) \\
&= \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(y - 15)^2}{32}\right)
\end{aligned}
\]
Since $4\sqrt{2\pi} = \sqrt{32\pi}$, this is option 3.

8. Suppose X is a continuous random variable with mean 50 and variance 16. Using the Chebyshev inequality, find the greatest lower bound of the probability that X takes a value between 42 and 58.
(Enter the answer correct to two decimal places)

Solution:
Using the Chebyshev inequality, we know that for any random variable X and any $c > 0$,
\[
P(|X - E[X]| < c) \ge 1 - \frac{\text{Var}(X)}{c^2} \qquad \ldots (1)
\]
Since $E[X] = 50$ and $\text{Var}(X) = 16$, (1) becomes
\[
P(|X - 50| < c) \ge 1 - \frac{16}{c^2}
\implies P(50 - c < X < 50 + c) \ge 1 - \frac{16}{c^2} \qquad \ldots (2)
\]
We have to find a lower bound on the probability $P(42 < X < 58)$. Comparing it with equation (2), we get $c = 8$.

Therefore, the greatest lower bound is $1 - \dfrac{16}{8^2} = 1 - \dfrac{1}{4} = \dfrac{3}{4} = 0.75$.
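The bound is distribution-free, so for any particular distribution with this mean and variance the true probability can only be larger. The sketch below computes the bound and, purely as a hypothetical comparison, the exact probability if X happened to be Normal(50, 16); the problem itself does not specify the distribution.

```python
import math

def chebyshev_lower_bound(variance, c):
    """Lower bound on P(|X - E[X]| < c) given by Chebyshev's inequality."""
    return max(0.0, 1.0 - variance / c ** 2)

bound = chebyshev_lower_bound(16, 8)
print(bound)  # 0.75

# Hypothetical case only: for a Normal(50, 16) variable, 42 < X < 58 means |Z| < 2.
exact_if_normal = math.erf(2 / math.sqrt(2))
print(round(exact_if_normal, 4))
```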


9. A passenger train arrives punctually at a station every 20 minutes. Each morning, a passenger walks into the train station. Let X denote the amount of time (in minutes) the passenger waits for the train from the time he reaches the train station. It is known that the probability density function of X is
\[
f_X(x) = \begin{cases} \dfrac{1}{20} & \text{if } 0 < x < 20 \\ 0 & \text{otherwise} \end{cases}
\]
Find the expected value of $Y = X^3 + 22$.

Solution:
The expectation of a function $g(X)$ of a random variable is given by
\[
E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx
\]
$E[Y] = E[X^3 + 22] = E[X^3] + 22$

The expectation of $X^3$ is given by
\[
E[X^3] = \int_{-\infty}^{\infty} x^3 f_X(x)\, dx = \int_0^{20} \frac{1}{20}\, x^3\, dx = \frac{1}{20}\left[\frac{x^4}{4}\right]_0^{20} = \frac{1}{20} \times \frac{20^4}{4} = 2000
\]
Therefore, $E[Y] = 2000 + 22 = 2022$.

10. 60% of the total people in a city were male and 40% were female. The age of the males is Normal(60, 25) and the age of the females is Normal(55, 36). If the age of a randomly selected person is 60, what is the probability that the selected person is male?
(a) $\dfrac{9}{9 + 5\exp\left(\frac{-25}{72}\right)}$
(b) $\dfrac{3}{3 + 5\exp\left(\frac{-25}{72}\right)}$
(c) $\dfrac{9}{9 + 5\exp\left(\frac{-5}{6}\right)}$
(d) $\dfrac{3}{3 + \exp\left(\frac{-5}{6}\right)}$
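Solution:
Define a random variable X as follows:
\[
X = \begin{cases} 1 & \text{if the person chosen is a male} \\ 0 & \text{if the person chosen is a female} \end{cases}
\]
Let Y denote the age of the chosen person.

By the given information, we have
$Y \mid X = 1 \sim \text{Normal}(60, 25)$ and $Y \mid X = 0 \sim \text{Normal}(55, 36)$.

It implies that
\[
f_{Y|X=1}(y) = \frac{1}{5\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - 60}{5}\right)^2\right)
\quad \text{and} \quad
f_{Y|X=0}(y) = \frac{1}{6\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - 55}{6}\right)^2\right)
\]
Also given that
$P(X = 1) = \dfrac{60}{100} = \dfrac{3}{5}$ and $P(X = 0) = \dfrac{40}{100} = \dfrac{2}{5}$.

To find: $f_{X|Y=60}(1)$. Now,
\[
\begin{aligned}
f_{X|Y=60}(1) &= \frac{f_{Y|X=1}(60)\, P(X = 1)}{f_Y(60)} \\
&= \frac{f_{Y|X=1}(60)\, P(X = 1)}{f_{Y|X=1}(60)\, P(X = 1) + f_{Y|X=0}(60)\, P(X = 0)} \\
&= \frac{\frac{1}{5\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{60-60}{5}\right)^2\right) \cdot \frac{3}{5}}{\frac{1}{5\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{60-60}{5}\right)^2\right) \cdot \frac{3}{5} + \frac{1}{6\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{60-55}{6}\right)^2\right) \cdot \frac{2}{5}} \\
&= \frac{\frac{1}{5\sqrt{2\pi}} \cdot \frac{3}{5}}{\frac{1}{5\sqrt{2\pi}} \cdot \frac{3}{5} + \frac{1}{6\sqrt{2\pi}} \exp\left(-\frac{25}{72}\right) \cdot \frac{2}{5}} \\
&= \frac{9}{9 + 5\exp\left(\frac{-25}{72}\right)}
\end{aligned}
\]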


Statistics for data science - II


Week 6 practice assignment solutions

4. The joint density function of two continuous random variables X and Y is given as
1.
\[
f_{XY}(x, y) = \begin{cases} kxy & 0 < x < 4,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of k. Enter your answer correct to two decimal places.
Solution:
We know that for a joint PDF, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = 1$.
Since $f_{XY}(x, y)$ is nonzero only in the region $0 < x < 4$, $0 < y < 1$,
\[
\begin{aligned}
&\Rightarrow \int_0^1 \int_0^4 kxy\, dx\, dy = 1 \\
&\Rightarrow \int_0^1 ky \left[\frac{x^2}{2}\right]_0^4 dy = 1 \\
&\Rightarrow \int_0^1 8ky\, dy = 1 \\
&\Rightarrow 8k \left[\frac{y^2}{2}\right]_0^1 = 1 \\
&\Rightarrow k = \frac{1}{4} = 0.25
\end{aligned}
\]

2.
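5. Let $(X, Y) \sim \text{Uniform}(D)$, where $D = \{(x, y) : x + y < 4,\ x > 0,\ y > 0\}$. Find the value of $P(2X + Y > 2)$.
a) $\frac{1}{8}$
b) $\frac{7}{8}$
c) $\frac{3}{4}$
d) $\frac{1}{4}$

Solution:

D is a right triangle with both legs of length 4, so its area is $|D| = \frac{1}{2} \times 4 \times 4 = 8$. Since $(X, Y) \sim \text{Uniform}(D)$,
\[
f_{XY}(x, y) = \begin{cases} \dfrac{1}{8} & (x, y) \in D \\ 0 & \text{otherwise} \end{cases}
\]
The lower region $A = \{(x, y) \in D : 2x + y \le 2\}$ is the triangle with vertices (0, 0), (1, 0) and (0, 2), so its area is $\frac{1}{2} \times 1 \times 2 = 1$.
\[
P(2X + Y > 2) = 1 - P(2X + Y \le 2) = 1 - \frac{|A|}{|D|} = 1 - \frac{1}{8} = \frac{7}{8}
\]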

6. The joint density function of the random variables X and Y is given by
3.
\[
f_{XY}(x, y) = \begin{cases} x + y & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of $P(X + Y < 1)$.

a) $\frac{1}{3}$
b) $\frac{2}{3}$
c) $\frac{1}{6}$
d) $\frac{3}{4}$

Solution:
\[
\begin{aligned}
P(X + Y < 1) &= \int_0^1 \int_0^{1-y} (x + y)\, dx\, dy \\
&= \int_0^1 \left[\frac{x^2}{2} + xy\right]_0^{1-y} dy \\
&= \int_0^1 \left(\frac{(1 - y)^2}{2} + (1 - y)y\right) dy \\
&= \left[-\frac{(1 - y)^3}{6} + \frac{y^2}{2} - \frac{y^3}{3}\right]_0^1 \\
&= \left(\frac{1}{2} - \frac{1}{3}\right) - \left(-\frac{1}{6}\right) \\
&= \frac{1}{3}
\end{aligned}
\]
7. The joint PDF of two continuous random variables X and Y is given by
4.
\[
f_{XY}(x, y) = \begin{cases} \dfrac{2}{7}(5x + 2y) & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Find the marginal PDF of X.

a) $f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_X(x) = \begin{cases} \frac{2}{7}(5x + 1) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_X(x) = \begin{cases} \frac{2}{7}(3x + 2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_X(x) = \begin{cases} \frac{2}{7}(5y + 1) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

Solution:
For $0 \le x \le 1$,
\[
f_X(x) = \int_0^1 \frac{2}{7}(5x + 2y)\, dy = \frac{2}{7}\left[5xy + \frac{2y^2}{2}\right]_0^1 = \frac{2}{7}(5x + 1)
\]

5.
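8. Let X and Y be jointly continuous random variables with joint PDF
\[
f_{XY}(x, y) = \begin{cases} k(2 - y) & 0 < x < 4,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}
\]
Find the marginal PDF of Y.

a) $f_Y(y) = \begin{cases} \frac{3}{2}\, y(2 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_Y(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_Y(y) = \begin{cases} \frac{3}{2}(1 - y^2) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_Y(y) = \begin{cases} \frac{2}{3}(2 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Solution:
We know that for a joint PDF, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = 1$.
Since $f_{XY}(x, y)$ is nonzero only in the region $0 < x < 4$, $0 < y < 1$,
\[
\begin{aligned}
&\Rightarrow \int_0^1 \int_0^4 k(2 - y)\, dx\, dy = 1 \\
&\Rightarrow \int_0^1 k(2 - y)\Big[x\Big]_0^4\, dy = 1 \\
&\Rightarrow \int_0^1 4k(2 - y)\, dy = 1 \\
&\Rightarrow 4k\left[2y - \frac{y^2}{2}\right]_0^1 = 1 \\
&\Rightarrow 4k \times \frac{3}{2} = 1 \\
&\Rightarrow k = \frac{1}{6}
\end{aligned}
\]
For $0 < y < 1$,
\[
f_Y(y) = \int_0^4 \frac{1}{6}(2 - y)\, dx = \frac{1}{6}(2 - y)\Big[x\Big]_0^4 = \frac{2}{3}(2 - y)
\]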

9. Let X and Y be two independent continuous random variables with PDFs $f_X(x)$ and $f_Y(y)$ given as
6.
\[
f_X(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} y/2 & 0 \le y < 2 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of $P(2X + Y > 1)$.
a) $\frac{1}{24}$
b) $\frac{11}{12}$
c) $\frac{1}{12}$
d) $\frac{23}{24}$

Solution:
Given that X and Y are two independent continuous random variables, $f_{XY}(x, y) = f_X(x) f_Y(y)$:
\[
f_{XY}(x, y) = \begin{cases} y/2 & 0 \le x < 1,\ 0 \le y < 2 \\ 0 & \text{otherwise} \end{cases}
\]
We have to find the value of $P(2X + Y > 1) = 1 - P(2X + Y \le 1)$.
\[
\begin{aligned}
P(2X + Y \le 1) &= \int_0^1 \int_0^{\frac{1-y}{2}} \frac{y}{2}\, dx\, dy \\
&= \int_0^1 \frac{y}{2}\Big[x\Big]_0^{\frac{1-y}{2}}\, dy \\
&= \int_0^1 \frac{1}{4}\, y(1 - y)\, dy \\
&= \frac{1}{4}\left[\frac{y^2}{2} - \frac{y^3}{3}\right]_0^1 \\
&= \frac{1}{24}
\end{aligned}
\]
⇒ $P(2X + Y > 1) = 1 - \frac{1}{24} = \frac{23}{24}$

10.
7 The joint density function of two random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} 8xy & 0 \le x \le 1,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}
\]
Are X and Y independent?

a) Yes
b) No

Solution:
For $0 \le x \le 1$,
\[
f_X(x) = \int_0^x 8xy\, dy = 8x\left[\frac{y^2}{2}\right]_0^x = 4x^3
\]
For $0 \le y \le 1$, the support requires $y \le x \le 1$, so
\[
f_Y(y) = \int_y^1 8xy\, dx = 8y\left[\frac{x^2}{2}\right]_y^1 = 4y(1 - y^2)
\]
$f_X(x) f_Y(y) = 4x^3 \times 4y(1 - y^2) = 16x^3 y(1 - y^2) \ne f_{XY}(x, y)$.

Hence X and Y are not independent.
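When the support is not a rectangle, the integration limits of each marginal depend on the other variable. A small numerical sketch makes this concrete: the marginals are obtained by integrating the joint density, written with its triangular support $0 \le y \le x \le 1$, and then compared with the joint at a point. The helper names are illustrative.

```python
def joint(x, y):
    """Joint density 8xy on the triangle 0 <= y <= x <= 1, zero elsewhere."""
    return 8.0 * x * y if 0.0 <= y <= x <= 1.0 else 0.0

def integrate(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def marginal_x(x):
    return integrate(lambda y: joint(x, y), 0.0, 1.0)   # effectively y from 0 to x

def marginal_y(y):
    return integrate(lambda x: joint(x, y), 0.0, 1.0)   # effectively x from y to 1

fx, fy = marginal_x(0.5), marginal_y(0.5)
print(fx, fy, fx * fy, joint(0.5, 0.5))
# Outside the triangle the joint is 0 while both marginals are positive.
print(joint(0.2, 0.8), marginal_x(0.2) * marginal_y(0.8))
```

At $x = y = 0.5$ the marginals evaluate to about 0.5 and 1.5, matching $4x^3$ and $4y(1 - y^2)$, and their product differs from the joint value 2.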

11.
8 Let (X, Y ) ∼ Uniform(D), where D = [3, 5] × [2, 4]. Are X and Y independent?

a) Yes
b) No

Solution:
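$(X, Y) \sim \text{Uniform}(D)$, therefore
\[
f_{XY}(x, y) = \begin{cases} \dfrac{1}{4} & 3 \le x \le 5,\ 2 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}
\]
For $3 \le x \le 5$,
\[
f_X(x) = \int_2^4 \frac{1}{4}\, dy = \frac{1}{4}\Big[y\Big]_2^4 = \frac{1}{2}
\]
For $2 \le y \le 4$,
\[
f_Y(y) = \int_3^5 \frac{1}{4}\, dx = \frac{1}{4}\Big[x\Big]_3^5 = \frac{1}{2}
\]
$f_X(x) f_Y(y) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} = f_{XY}(x, y)$.
Hence X and Y are independent.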

12.
9 The joint PDF of two random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} 4xy & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}
\]
Find the distribution of $X \mid Y = 0.5$, that is, $f_{X|Y=0.5}(x)$.

a) $f_{X|Y=0.5}(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_{X|Y=0.5}(x) = \begin{cases} 3x^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_{X|Y=0.5}(x) = \begin{cases} 4x^3 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_{X|Y=0.5}(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

Solution:
For $0 < y < 1$,
\[
f_Y(y) = \int_0^1 4xy\, dx = 4y\left[\frac{x^2}{2}\right]_0^1 = 2y
\]
The distribution of $X \mid Y = 0.5$, for $0 < x < 1$, is given by
\[
f_{X|Y=0.5}(x) = \frac{f_{XY}(x, 0.5)}{f_Y(0.5)} = \frac{4x \times 0.5}{2 \times 0.5} = 2x
\]
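13. The joint PDF of two random variables X and Y is given by
10.
\[
f_{XY}(x, y) = \begin{cases} x^2 + \dfrac{xy}{3} & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of $P\left(\frac{1}{4} < X < \frac{1}{2} \mid Y = 1\right)$.
a) $\frac{83}{96}$
b) $\frac{13}{96}$
c) $\frac{13}{48}$
d) $\frac{35}{48}$

Solution:
For $0 \le y \le 2$,
\[
f_Y(y) = \int_0^1 \left(x^2 + \frac{xy}{3}\right) dx = \left[\frac{x^3}{3} + \frac{x^2 y}{6}\right]_0^1 = \frac{1}{3} + \frac{1}{6}\, y
\]
\[
f_{X|Y=1}(x) = \frac{f_{XY}(x, 1)}{f_Y(1)} = \frac{x^2 + \frac{x \times 1}{3}}{\frac{1}{3} + \frac{1}{6} \times 1} = 2\left(x^2 + \frac{x}{3}\right)
\]
\[
\begin{aligned}
P\left(\frac{1}{4} < X < \frac{1}{2} \,\Big|\, Y = 1\right) &= \int_{1/4}^{1/2} 2\left(x^2 + \frac{x}{3}\right) dx \\
&= 2\left[\frac{x^3}{3} + \frac{x^2}{6}\right]_{1/4}^{1/2} \\
&= 2\left[\left(\frac{1}{24} + \frac{1}{24}\right) - \left(\frac{1}{192} + \frac{1}{96}\right)\right] \\
&= 2\left[\frac{1}{12} - \frac{1}{64}\right] \\
&= \frac{13}{96}
\end{aligned}
\]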

Statistics for Data Science - 2


Week 6 graded Assignment
Solution

1. The joint pdf of two continuous random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} 4xy & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Are X and Y independent?

1. Yes
2. No

Solution:
First we will calculate the marginal densities of X and Y.
For $0 \le x \le 1$,
\[
f_X(x) = \int_0^1 f_{XY}(x, y)\, dy = \int_0^1 4xy\, dy = \Big[2xy^2\Big]_0^1 = 2x
\]
For $0 \le y \le 1$,
\[
f_Y(y) = \int_0^1 f_{XY}(x, y)\, dx = \int_0^1 4xy\, dx = \Big[2x^2 y\Big]_0^1 = 2y
\]
Therefore,
$f_X(x) \cdot f_Y(y) = 4xy = f_{XY}(x, y)$.
It implies that X and Y are independent random variables.


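2. Let $(X, Y) \sim \text{Uniform}(D)$, where $D = \{(x, y) : (x - k)^2 + (y - k)^2 \le r\}$. Calculate $P(X \ge Y)$.
Solution:

(Figure: the disc D centred at $(k, k)$, cut by the line $y = x$.)

The line $y = x$ passes through the centre $(k, k)$ of the disc, so the region $X \ge Y$ is the half of the disc lying below that line.

Therefore,
\[
P(X \ge Y) = \frac{\text{Area of lower half of the disc}}{\text{Area of the disc}} = \frac{1}{2}
\]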

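3. Let $(X, Y) \sim \text{Uniform}(D)$, where $D = \{(x, y) : y \le 2x,\ 0 < x < 1,\ 0 < y < 2\} \cup [1, 2] \times [0, 2]$. Find the marginal density of X.

(a) $f_X(x) = \begin{cases} \dfrac{2x}{3} + \dfrac{2}{3} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(b) $f_X(x) = \begin{cases} \dfrac{2x}{3} + \dfrac{1}{3} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(c) $f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{2}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(d) $f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{1}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(Figure: the region D, bounded above by the line $y = 2x$ for $0 < x < 1$ and by $y = 2$ for $1 \le x \le 2$.)

D is the support of (X, Y).
Area of D $= \frac{1}{2} \times 1 \times 2 + 1 \times 2 = 3$
Since $(X, Y) \sim \text{Uniform}(D)$, it implies that
\[
f_{XY}(x, y) = \frac{1}{3}, \quad (x, y) \in D
\]
We know that $f_X(x) = \int f_{XY}(x, y)\, dy$.

For $0 < x < 1$,
\[
f_X(x) = \int_0^{2x} \frac{1}{3}\, dy = \frac{1}{3}\Big[y\Big]_0^{2x} = \frac{2x}{3}
\]
For $1 < x < 2$,
\[
f_X(x) = \int_0^2 \frac{1}{3}\, dy = \frac{1}{3}\Big[y\Big]_0^2 = \frac{2}{3}
\]
Therefore, the marginal density of X is given by
\[
f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{2}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}
\]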

4. The joint pdf of two random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Choose the correct option(s).

(a) $P(X + Y \le \frac{1}{4}) = \frac{1}{2}$
(b) $P(X + Y \le \frac{1}{2}) = \frac{1}{16}$
(c) X and Y are independent random variables.
(d) X and Y are dependent random variables.

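Solution:
Option (a)

(Figure: the triangle $x + y \le 1$ in the first quadrant, with the smaller triangle $x + y \le \frac{1}{4}$ shaded orange.)

The orange region denotes $X + Y \le \frac{1}{4}$. Now,
\[
\begin{aligned}
P\left(X + Y \le \frac{1}{4}\right) &= \int_{y=0}^{1/4} \int_{x=0}^{1/4 - y} f_{XY}(x, y)\, dx\, dy \\
&= \int_{y=0}^{1/4} \int_{x=0}^{1/4 - y} 24xy\, dx\, dy \\
&= \int_{y=0}^{1/4} \Big[12x^2 y\Big]_{x=0}^{1/4 - y}\, dy \\
&= \int_{y=0}^{1/4} 12y\left(\frac{1}{4} - y\right)^2 dy \\
&= \int_{y=0}^{1/4} \frac{12}{16}\, y(1 - 4y)^2\, dy \\
&= \frac{3}{4} \int_{y=0}^{1/4} y(1 + 16y^2 - 8y)\, dy \\
&= \frac{3}{4}\left[\frac{y^2}{2} + 4y^4 - \frac{8y^3}{3}\right]_{y=0}^{1/4} \\
&= \frac{3}{4}\left[\frac{1}{32} + \frac{1}{64} - \frac{1}{24}\right] \\
&= \frac{3}{4} \cdot \frac{1}{192} = \frac{1}{256}
\end{aligned}
\]
Hence, option (a) is wrong.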

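Option (b)

(Figure: the triangle $x + y \le 1$ in the first quadrant, with the smaller triangle $x + y \le \frac{1}{2}$ shaded orange.)

The orange region denotes $X + Y \le \frac{1}{2}$. Now,
\[
\begin{aligned}
P\left(X + Y \le \frac{1}{2}\right) &= \int_{y=0}^{1/2} \int_{x=0}^{1/2 - y} f_{XY}(x, y)\, dx\, dy \\
&= \int_{y=0}^{1/2} \int_{x=0}^{1/2 - y} 24xy\, dx\, dy \\
&= \int_{y=0}^{1/2} \Big[12x^2 y\Big]_{x=0}^{1/2 - y}\, dy \\
&= \int_{y=0}^{1/2} 12y\left(\frac{1}{2} - y\right)^2 dy \\
&= \int_{y=0}^{1/2} \frac{12}{4}\, y(1 - 2y)^2\, dy \\
&= 3 \int_{y=0}^{1/2} y(1 + 4y^2 - 4y)\, dy \\
&= 3\left[\frac{y^2}{2} + y^4 - \frac{4y^3}{3}\right]_{y=0}^{1/2} \\
&= 3\left[\frac{1}{8} + \frac{1}{16} - \frac{1}{6}\right] \\
&= 3 \times \frac{2}{96} = \frac{1}{16}
\end{aligned}
\]
Hence, option (b) is correct.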

Option (c) and (d)

(Figure: the triangular support $x + y \le 1$ in the first quadrant.)

For $0 < x < 1$,
\[
f_X(x) = \int_{y=0}^{1-x} f_{XY}(x, y)\, dy = \int_{y=0}^{1-x} 24xy\, dy = \Big[12xy^2\Big]_{y=0}^{1-x} = 12x(1 - x)^2
\]
For $0 < y < 1$,
\[
f_Y(y) = \int_{x=0}^{1-y} f_{XY}(x, y)\, dx = \int_{x=0}^{1-y} 24xy\, dx = \Big[12x^2 y\Big]_{x=0}^{1-y} = 12y(1 - y)^2
\]
Therefore, $f_X(x) \cdot f_Y(y) = 144xy(1 - x)^2(1 - y)^2 \ne f_{XY}(x, y)$.

Hence, X and Y are not independent.


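5. The joint pdf of two random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} 3xy(1 - x) & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Calculate $P(X > \frac{1}{2} \mid Y = 1)$.

Solution:
We know that
\[
P(a < X < b \mid Y = y) = \frac{\int_a^b f_{XY}(x, y)\, dx}{f_Y(y)}
\]
Now,
\[
f_Y(y) = \int_0^1 3xy(1 - x)\, dx = \int_0^1 (3xy - 3x^2 y)\, dx = \left[\frac{3x^2 y}{2} - x^3 y\right]_0^1 = \frac{3y}{2} - y = \frac{y}{2}
\]
Therefore, $f_Y(1) = \frac{1}{2}$.
Now,
\[
\begin{aligned}
P\left(X > \frac{1}{2} \,\Big|\, Y = 1\right) &= \frac{\int_{1/2}^1 f_{XY}(x, 1)\, dx}{f_Y(1)} \\
&= 2\int_{1/2}^1 3x(1 - x)\, dx \\
&= 6\int_{1/2}^1 (x - x^2)\, dx \\
&= 6\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{1/2}^1 \\
&= 6\left(\frac{1}{2} - \frac{1}{3}\right) - 6\left(\frac{1}{8} - \frac{1}{24}\right) = 1 - \frac{1}{2} = \frac{1}{2}
\end{aligned}
\]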

6. The amount of milk (in litres) in a shop at the beginning of any day is a random amount X from which a random amount Y (in litres) is sold during that day. Assume that the joint density function of X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} \dfrac{1}{50} & 0 \le x \le 10,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}
\]
Find the probability that the amount of milk left at the end of the day is less than 5 litres. Write your answer correct to two decimal places.
Solution:

(Figure: the triangular support below the line $y = x$ for $0 \le x \le 10$, split by the line $x - y = 5$ into a brown region and a blue region.)

X denotes the amount of milk at the beginning of any day and Y denotes the amount of milk which is sold during that day.
Therefore, the amount of milk left at the end of the day is $X - Y$.

To find: $P(X - Y < 5)$

In the diagram, the brown region denotes $X - Y < 5$ and the brown and blue regions together denote the support of X and Y.

Area of the support of $(X, Y)$ $= \frac{1}{2} \times 10 \times 10 = 50$.

Area of brown region = area of the support of $(X, Y)$ − area of blue region

⇒ area of brown region $= 50 - \frac{1}{2} \times 5 \times 5 = \frac{75}{2}$

Therefore,
\[
P(X - Y < 5) = \frac{\text{area of brown region}}{\text{area of support}} = \frac{75/2}{50} = \frac{75}{100} = 0.75
\]


7. The joint pdf of two continuous random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} ke^{-(x+y)} & x \ge 0,\ y \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of $P(X \ge 5, Y \le 5)$.

(a) $e^{-10}$
(b) $(e^{-5} - 1)e^{-5}$
(c) $(1 - e^{-5})e^{-5}$
(d) $(e^{-5} + 1)e^{-5}$

Solution:
We know that
\[
\iint_{\text{Supp}(X,Y)} f_{XY}\, dx\, dy = 1
\]
Therefore,
\[
\begin{aligned}
&\int_{y=0}^{\infty} \int_{x=0}^{\infty} ke^{-(x+y)}\, dx\, dy = 1 \\
&\Rightarrow k\int_{y=0}^{\infty} e^{-y} \int_{x=0}^{\infty} e^{-x}\, dx\, dy = 1 \\
&\Rightarrow k\int_{y=0}^{\infty} e^{-y}\Big[-e^{-x}\Big]_0^{\infty}\, dy = 1 \\
&\Rightarrow k\int_{y=0}^{\infty} e^{-y}(0 + 1)\, dy = 1 \\
&\Rightarrow k\Big[-e^{-y}\Big]_0^{\infty} = 1 \\
&\Rightarrow k(0 + 1) = 1 \\
&\Rightarrow k = 1
\end{aligned}
\]
To find: $P(X \ge 5, Y \le 5)$

Now,
\[
\begin{aligned}
P(X \ge 5, Y \le 5) &= \int_{y=0}^{5} \int_{x=5}^{\infty} e^{-(x+y)}\, dx\, dy \\
&= \int_{y=0}^{5} e^{-y}\Big[-e^{-x}\Big]_5^{\infty}\, dy \\
&= \int_{y=0}^{5} e^{-y}(0 + e^{-5})\, dy \\
&= e^{-5}\Big[-e^{-y}\Big]_0^5 \\
&= e^{-5}(-e^{-5} + 1) \\
&= e^{-5}(1 - e^{-5})
\end{aligned}
\]

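8. The joint pdf of two random variables X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} \dfrac{1}{8}(x + y) & 0 \le x \le 2,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Find the value of $P\left(\frac{1}{2} \le Y \le 1 \mid X = \frac{1}{2}\right)$. Write your answer correct to two decimal places.
Solution:
We know that
\[
P(a < Y < b \mid X = x) = \frac{\int_a^b f_{XY}(x, y)\, dy}{f_X(x)}
\]
Now,
\[
f_X(x) = \int_0^2 \frac{1}{8}(x + y)\, dy = \frac{1}{8}\left[xy + \frac{y^2}{2}\right]_0^2 = \frac{2x + 2}{8} = \frac{x + 1}{4}
\]
Therefore, $f_X(\frac{1}{2}) = \frac{3}{8}$.

Now,
\[
\begin{aligned}
P\left(\frac{1}{2} \le Y \le 1 \,\Big|\, X = \frac{1}{2}\right) &= \frac{\int_{1/2}^1 f_{XY}\left(\frac{1}{2}, y\right) dy}{f_X\left(\frac{1}{2}\right)} \\
&= \int_{1/2}^1 \frac{8}{3} \cdot \frac{1}{8}\left(\frac{1}{2} + y\right) dy \\
&= \int_{1/2}^1 \frac{1}{3}\left(\frac{1}{2} + y\right) dy \\
&= \left[\frac{y}{6} + \frac{y^2}{6}\right]_{1/2}^1 \\
&= \left(\frac{1}{6} + \frac{1}{6}\right) - \left(\frac{1}{12} + \frac{1}{24}\right) \\
&= \frac{1}{3} - \frac{1}{8} = \frac{5}{24} \approx 0.21
\end{aligned}
\]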


Use the following information to answer questions (9) and (10):

The joint density function of two random variables X and Y is given by



 2 (x + 1), for 0 < x < 1, 0 < y < 1
fXY (x, y) = 3
0, otherwise

9. Are X and Y independent?

(a) Yes
(b) No

Solution:
First we will calculate the marginal densities of X and Y .
For 0 < x < 1,
\begin{align*}
f_X(x) &= \int_0^1 f_{XY}(x, y)\,dy \\
&= \int_0^1 \frac{2}{3}(x + 1)\,dy \\
&= \frac{2}{3}(x + 1)\Big[y\Big]_{y=0}^{1} \\
&= \frac{2}{3}(x + 1)
\end{align*}

For 0 < y < 1,
\begin{align*}
f_Y(y) &= \int_0^1 f_{XY}(x, y)\,dx \\
&= \int_0^1 \frac{2}{3}(x + 1)\,dx \\
&= \frac{2}{3}\left[\frac{x^2}{2} + x\right]_0^1 \\
&= \frac{2}{3} \times \frac{3}{2} = 1
\end{align*}

Therefore,
\[
f_X(x)\,f_Y(y) = \frac{2}{3}(x + 1) = f_{XY}(x, y)
\]
It implies that X and Y are independent random variables.

10. Find the value of P (X > Y ). Write your answer correct to two decimal places.

Solution:

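[Figure: the support of (X, Y) with the region X > Y shaded.]

The shaded region here represents X > Y.

\begin{align*}
P(X > Y) &= \int_0^1\int_y^1 \frac{2}{3}(x + 1)\,dx\,dy \\
&= \frac{2}{3}\int_0^1 \left[\frac{x^2}{2} + x\right]_y^1 dy \\
&= \frac{2}{3}\int_0^1 \left(\frac{3}{2} - \frac{y^2}{2} - y\right)dy \\
&= \frac{2}{3}\left[\frac{3}{2}y - \frac{y^3}{6} - \frac{y^2}{2}\right]_0^1 \\
&= \frac{2}{3} \times \frac{5}{6} = \frac{5}{9} \approx 0.56
\end{align*}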
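As a rough check of the double integral, the inner integral can be evaluated in closed form and the outer one approximated with a midpoint rule. This is only a sketch; `inner_integral` and `prob_x_greater_than_y` are names chosen here for illustration, and the step count is an arbitrary choice.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative in x of the joint pdf (2/3)(x + 1)."""
    return Fraction(2, 3) * (x * x / 2 + x)

def inner_integral(y):
    """Integral of (2/3)(x + 1) over x from y to 1."""
    return antiderivative(Fraction(1)) - antiderivative(y)

def prob_x_greater_than_y(steps=1000):
    """Midpoint-rule approximation of the outer integral over y in (0, 1)."""
    h = Fraction(1, steps)
    total = sum(inner_integral((i + Fraction(1, 2)) * h) for i in range(steps))
    return float(h * total)

print(prob_x_greater_than_y())
```

The result should be very close to 5/9 ≈ 0.5556, consistent with the derivation.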


Statistics for Data Science - 2


Week 7 practice Assignment
Statistics from samples and Limit theorems

1. A population has mean 60 and standard deviation 6. Random samples of size 100 from
this population are collected independently. Find the expected value of the sample mean.
Solution:
We know that the expected value of the sample mean $\bar{X}$ is given by

\[
E[\bar{X}] = \mu = 60
\]

2. Let X1, X2, X3, X4 and X5 ∼ i.i.d. Normal(2, 25). Calculate P(2X1 + X2 + 3X3 + X4 + X5 ≥ 10).

1. FZ (0.3)
2. 1 − FZ (0.3)
3. FZ (−0.3)
4. 1 − FZ (−0.3)

Solution:

We know that a linear combination of independent Normal random variables is again Normally distributed.

Hence, 2X1 + X2 + 3X3 + X4 + X5 will follow a Normal distribution.

Let Y = 2X1 + X2 + 3X3 + X4 + X5.
\begin{align*}
E[Y] &= E[2X_1 + X_2 + 3X_3 + X_4 + X_5] = (2 + 1 + 3 + 1 + 1)E[X] = 8 \times 2 = 16 \\
\mathrm{Var}(Y) &= \mathrm{Var}(2X_1 + X_2 + 3X_3 + X_4 + X_5) = (4 + 1 + 9 + 1 + 1)\mathrm{Var}(X) = 16 \times 25 = 400
\end{align*}

It implies that $Y \sim \mathrm{Normal}(16, 20^2)$.

To find: P(Y ≥ 10)

Now,
\begin{align*}
P(Y \ge 10) &= P(Y - 16 \ge -6) \\
&= P\left(\frac{Y - 16}{20} \ge \frac{-6}{20}\right) \\
&= P\left(\frac{Y - 16}{20} \ge -0.3\right) \\
&= P(Z \ge -0.3) \\
&= 1 - P(Z < -0.3) \\
&= 1 - F_Z(-0.3)
\end{align*}

Therefore, option 4 is correct.
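The mean and variance of the linear combination, and the resulting probability, can be checked with Python's standard library. This is a sketch for illustration; the variable names are not from the original solution.

```python
from statistics import NormalDist

coeffs = [2, 1, 3, 1, 1]      # coefficients of X1..X5 in Y
mu_x, var_x = 2, 25           # each Xi ~ Normal(2, 25)

mean_y = sum(coeffs) * mu_x                  # E[Y]
var_y = sum(c * c for c in coeffs) * var_x   # Var(Y), using independence

y = NormalDist(mu=mean_y, sigma=var_y ** 0.5)
p = 1 - y.cdf(10)             # P(Y >= 10)

print(mean_y, var_y, p)
```

Numerically, 1 − F_Z(−0.3) is about 0.62.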

3. Random samples of size 100 are collected from a population of unknown parameters. If
the variance of the sample mean is 36, what will be the standard deviation of the actual
population?
Solution:
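We know that the variance of the sample mean is given by $\frac{\sigma^2}{n}$, where σ is the standard
deviation of the actual population and n is the sample size.

By the given information, we have
\begin{align*}
\frac{\sigma^2}{n} &= 36 \\
\Rightarrow \frac{\sigma^2}{100} &= 36 \\
\Rightarrow \sigma^2 &= 3600 \\
\Rightarrow \sigma &= 60
\end{align*}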

Therefore, standard deviation of the actual population is 60.

4. A random sample of size 50 is collected from a population with a standard deviation of
5. Find the upper bound on the probability that the sample mean will be at least 10
away from the actual mean using the weak law of large numbers. Write your answer
correct to three decimal places.
Solution:
Given: standard deviation of the population, σ = 5
Sample size, n = 50

To find: upper bound on $P(|\bar{X} - \mu| \ge 10)$, where $\bar{X}$ and µ are the sample mean and population mean, respectively.

Now, by the weak law of large numbers, we have
\begin{align*}
P(|\bar{X} - \mu| \ge \delta) &\le \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P(|\bar{X} - \mu| \ge 10) &\le \frac{25}{100 \times 50} \\
\Rightarrow P(|\bar{X} - \mu| \ge 10) &\le 0.005
\end{align*}
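The bound used here is simply σ²/(nδ²). A minimal helper, with a name (`wlln_upper_bound`) chosen only for this illustration, makes the arithmetic explicit:

```python
import math

def wlln_upper_bound(variance, n, delta):
    """Upper bound on P(|sample mean - mu| >= delta) from the weak law."""
    return variance / (n * delta ** 2)

# Question 4: sigma = 5, n = 50, delta = 10.
bound_q4 = wlln_upper_bound(variance=25, n=50, delta=10)

# Pizza question: sigma = 10, n = 100, delta = sqrt(5) gives a bound of 1/5.
bound_pizza = wlln_upper_bound(variance=100, n=100, delta=math.sqrt(5))

print(bound_q4, bound_pizza)
```

The same function also gives the lower bound on P(|X̄ − µ| < δ) as one minus its return value.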

5. A study says that the delivery time of pizzas has a standard deviation of 10 minutes. A
pizza shop collected the data of some deliveries and their delivery time. The probability
that the mean delivery time of this sample is at least $\sqrt{5}$ minutes away from the actual
mean delivery time is at most $\frac{1}{5}$ as per the weak law of large numbers. What is the size
of the sample?
Solution:
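Let X denote the delivery time of pizzas.
Given that σ = 10.
To find: the sample size n such that $P(|\bar{X} - \mu| \ge \sqrt{5}) \le \frac{1}{5}$ ...(1)
By the weak law of large numbers, we have
\begin{align*}
P(|\bar{X} - \mu| \ge \delta) &\le \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P(|\bar{X} - \mu| \ge \sqrt{5}) &\le \frac{100}{n \times 5} \quad ...(2)
\end{align*}

By equations (1) and (2), we have
\begin{align*}
\frac{1}{5} &= \frac{100}{5n} \\
\Rightarrow n &= 100
\end{align*}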

6. A random sample of size 100 is collected from a normal population with mean µ and
variance σ 2 . Suppose the expected value and the variance of the sample mean is 50 and
0.25, respectively. Find the value of µ + 10σ.
Solution:
Given, the population has mean µ and variance σ².
Let X denote the population. Then
E[X] = µ, Var(X) = σ²
Sample size, n = 100.
Let $\bar{X} = \frac{X_1 + \dots + X_{100}}{100}$ denote the sample mean.
We know that $E[\bar{X}] = E[X] = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X)}{n} = \frac{\sigma^2}{n}$.

Since $E[\bar{X}] = 50$, we get µ = 50. Since $\mathrm{Var}(\bar{X}) = 0.25$, we get $\frac{\sigma^2}{n} = 0.25$, so σ² = 25 and σ = 5.

Now, the value of $\mu + 10\sigma = 50 + 10\sqrt{25} = 100$.

7. Which among the following is a random variable? Select all that apply.
(i) Population mean
(ii) Sample mean
(iii) Population variance
(iv) Sample variance
(v) Variance of the sample mean
Solution:
We know that the sample mean is given by $\bar{X} = \frac{X_1 + \dots + X_n}{n}$. So, every time we collect a
sample, the sample mean will give a different value, and it will depend on the samples taken.
So, it is a random variable.

Similarly, the value of the sample variance will depend on the samples taken, so it is a random variable.

The expected value and variance of a population X are fixed, so they are not random variables.

We know that $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$, where σ² is the population variance and n is the sample size,
which are fixed, so the variance of the sample mean is not a random variable.

Therefore, options (ii) and (iv) are correct.

8. In a company, a testing of the product is done to check whether the product is defective
or not. The probability that the product is defective is 0.2 independent of each other.
20 such products are randomly chosen for testing. Which of the following statements
are true?
Hint: Consider the product being non-defective as the success.
(a) The expectation of a sample mean is 0.8.
(b) The expectation of a sample mean is 16.
(c) The variance of a sample mean is 0.008.
(d) The variance of a sample mean is 0.16.
Solution:
Define a random variable X as
\[
X =
\begin{cases}
1, & \text{if the product is non-defective} \\
0, & \text{otherwise}
\end{cases}
\]

Now, P(X = 1) = 0.8 and P(X = 0) = 0.2.

Since X is Bernoulli(0.8), E[X] = µ = 0.8 and Var[X] = σ² = 0.8 × 0.2 = 0.16.

A sample X1, ..., X20 is chosen for testing and X1, ..., X20 ∼ i.i.d. X.

Let $\bar{X}$ denote the sample mean. We know that $E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$. Therefore,
$E[\bar{X}] = 0.8$.
Hence, option (a) is correct and (b) is incorrect.
And, $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{(0.8)(0.2)}{20} = 0.008$. Therefore, option (c) is correct and (d) is
incorrect.

Use the following information to answer (9) and (10).

Suppose for the college president, two candidates are standing for the election, Candidate
A and Candidate B. Of the students in the college, a proportion p will vote for Candidate
A and a proportion (1 − p) will vote for Candidate B. During the election, a random
sample of students is independently interviewed about the candidate they are going
to vote for, where Xi = 1 if the i-th person interviewed will vote for Candidate A and Xi = 0 otherwise.

9. Find the minimum number of people who should be interviewed such that the probability
that $\frac{\sum_{i=1}^{n} X_i}{n}$ is within 0.2 of the true p is at least 0.9, for p = 0.4?
Answer: 60

Solution:
Let the number of people who are interviewed be n. Each student has two choices:
Candidate A and Candidate B.
Let Xi be defined as
\[
X_i =
\begin{cases}
1, & \text{if student } i \text{ votes for Candidate A} \\
0, & \text{otherwise}
\end{cases}
\]

Since the Xi's follow the Bernoulli(p) distribution, with E[Xi] = p and Var[Xi] = p(1 − p), we
will have $E[\bar{X}] = p$ and $\mathrm{Var}[\bar{X}] = \frac{p(1 - p)}{n}$, where
\[
p = \frac{\text{Number of students who voted for Candidate A}}{\text{Total number of students who voted}}
\]

Using the Weak Law of Large Numbers, we know that
\[
P(|\bar{X} - \mu| \le c) \ge 1 - \frac{\sigma^2/n}{c^2}
\]
Therefore,
\[
P\left(\left|\frac{\sum_{i=1}^{n} X_i}{n} - p\right| \le 0.2\right) \ge 1 - \frac{(0.4)(0.6)/n}{0.2^2} \ge 0.9
\]

Solving $1 - \frac{(0.4)(0.6)/n}{0.2^2} \ge 0.9$, we get
\[
\frac{0.24}{0.04n} \le 0.1 \implies n \ge 60
\]
Therefore, the minimum number of people who should be interviewed is 60.

10. Find the minimum number of people who should be interviewed such that the probability
that $\frac{\sum_{i=1}^{n} X_i}{n}$ is within 0.2 of the true p is at least 0.95, for p = 0.4?
Answer: 120

Solution:
Using the Weak Law of Large Numbers, we know that
\[
P(|\bar{X} - \mu| \le c) \ge 1 - \frac{\sigma^2/n}{c^2}
\]
\[
P\left(\left|\frac{\sum_{i=1}^{n} X_i}{n} - p\right| \le 0.2\right) \ge 1 - \frac{(0.4)(0.6)/n}{0.2^2} \ge 0.95
\]

Solving $1 - \frac{(0.4)(0.6)/n}{0.2^2} \ge 0.95$, we get
\[
\frac{0.24}{0.04n} \le 0.05 \implies n \ge 120
\]
Therefore, the minimum number of people who should be interviewed is 120.
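The two sample-size calculations above follow the same pattern, n ≥ p(1 − p) / (c²(1 − confidence)). The sketch below uses exact fractions to avoid floating-point rounding at the boundary; `min_sample_size` is a hypothetical helper name introduced only for this example.

```python
import math
from fractions import Fraction

def min_sample_size(p, c, confidence):
    """Smallest n with 1 - p(1-p)/(n c^2) >= confidence (weak-law bound)."""
    p, c, confidence = Fraction(p), Fraction(c), Fraction(confidence)
    variance = p * (1 - p)
    return math.ceil(variance / (c * c * (1 - confidence)))

n_90 = min_sample_size("0.4", "0.2", "0.9")
n_95 = min_sample_size("0.4", "0.2", "0.95")
print(n_90, n_95)
```

Passing the inputs as strings lets `Fraction` parse them exactly, so the boundary cases 60 and 120 are not pushed up by rounding error.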


Statistics for Data Science - 2


Week 7 Graded assignment

1. Let X1, X2, X3 be three independent and identically distributed random variables with
mean µ and variance σ². Given below are 3 different formulations of the sample mean.
(Observe that E[A] = E[B] = E[C]).
\begin{align*}
A &= \frac{X_1 + X_2 + X_3}{3} \\
B &= 0.1X_1 + 0.3X_2 + 0.6X_3 \\
C &= 0.2X_1 + 0.3X_2 + 0.5X_3
\end{align*}

Choose the correct option from the following:

(a) Var(A) = Var(B) = Var(C)


(b) Var(A) ≥ Var(B) ≥ Var(C)
(c) Var(A) ≤ Var(B) ≤ Var(C)
(d) Var(A) ≤ Var(C) ≤ Var(B)

Solution:
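Let X1, X2, X3 ∼ i.i.d. X, where E[X] = µ, Var(X) = σ².
\begin{align*}
\mathrm{Var}(A) &= \mathrm{Var}\left(\frac{X_1 + X_2 + X_3}{3}\right) \\
&= \frac{1}{9}\left(\mathrm{Var}[X_1] + \mathrm{Var}[X_2] + \mathrm{Var}[X_3]\right) \\
&= \frac{1}{9}(3\sigma^2) = \frac{\sigma^2}{3}
\end{align*}
\begin{align*}
\mathrm{Var}(B) &= \mathrm{Var}(0.1X_1 + 0.3X_2 + 0.6X_3) \\
&= 0.01\,\mathrm{Var}[X_1] + 0.09\,\mathrm{Var}[X_2] + 0.36\,\mathrm{Var}[X_3] \\
&= 0.46\sigma^2
\end{align*}
\begin{align*}
\mathrm{Var}(C) &= \mathrm{Var}(0.2X_1 + 0.3X_2 + 0.5X_3) \\
&= 0.04\,\mathrm{Var}[X_1] + 0.09\,\mathrm{Var}[X_2] + 0.25\,\mathrm{Var}[X_3] \\
&= 0.38\sigma^2
\end{align*}

Therefore, Var(B) ≥ Var(C) ≥ Var(A), i.e. option (d) is correct.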
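For independent samples with common variance σ², the variance of a weighted sum is σ² times the sum of squared weights. The sketch below computes that factor for A, B and C; `variance_factor` is an illustrative name, not something defined in the course material.

```python
from fractions import Fraction as F

def variance_factor(weights):
    """Var(sum w_i X_i) / sigma^2 for i.i.d. X_i: the sum of squared weights."""
    return sum(w * w for w in weights)

factor_a = variance_factor([F(1, 3)] * 3)
factor_b = variance_factor([F(1, 10), F(3, 10), F(6, 10)])
factor_c = variance_factor([F(2, 10), F(3, 10), F(5, 10)])

print(factor_a, factor_b, factor_c)   # multiples of sigma^2
```

Equal weights give the smallest factor, which is why the ordinary sample mean A has the lowest variance of the three.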


2. A random sample of size 25 is collected from a normal population with mean of 50 and
standard deviation of 5. Find the variance of the sample mean.
Solution:
We know that the variance of the sample mean $\bar{X}$ is given by
\[
\mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n} = \frac{5^2}{25} = 1
\]

3. A fair die is rolled 100 times. Let X denote the number of times six is obtained. Find
a bound for the probability that $\frac{X}{100}$ differs from $\frac{1}{6}$ by less than 0.1 using the weak law of
large numbers.
(a) at least $\frac{5}{36}$
(b) at least $\frac{31}{36}$
(c) at most $\frac{5}{36}$
(d) at most $\frac{31}{36}$
Solution:
X denotes the number of times six is obtained on rolling a fair die 100 times.
Let X1, X2, ..., X100 be 100 i.i.d. samples such that
\[
X_i =
\begin{cases}
1, & \text{if six appears on rolling a fair die} \\
0, & \text{otherwise}
\end{cases}
\]

$E[X_i] = \mu = \frac{1}{6}$ and $\mathrm{Var}(X_i) = \sigma^2 = \frac{5}{36}$.
Notice that X = X1 + X2 + X3 + ... + X100.

To find: Bound on $P\left(\left|\frac{X}{100} - \frac{1}{6}\right| < 0.1\right)$.

By the weak law of large numbers, we have
\begin{align*}
P(|\bar{X} - \mu| < \delta) &\ge 1 - \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P\left(\left|\frac{X}{100} - \frac{1}{6}\right| < 0.1\right) &\ge 1 - \frac{5}{36 \times 100 \times 0.01} \\
\Rightarrow P\left(\left|\frac{X}{100} - \frac{1}{6}\right| < 0.1\right) &\ge 1 - \frac{5}{36} = \frac{31}{36}
\end{align*}

4. Let X1, X2, ..., X5 be i.i.d. samples whose distribution has mean 20 and variance 4.
Suppose the sample variance is defined as
\[
S^2 = \frac{(X_1 - \bar{X})^2 + \dots + (X_5 - \bar{X})^2}{5},
\]
where $\bar{X} = \frac{X_1 + X_2 + \dots + X_5}{5}$. Find the expected value of $S^2$.

Solution:
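$E[\bar{X}] = \mu = 20$ and $\mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n} = \frac{4}{5} = 0.8$.

\begin{align*}
E[S^2] &= E\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}\left(X_i^2 + \bar{X}^2 - 2X_i\bar{X}\right)\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 + n\bar{X}^2 - 2\bar{X}\sum_{i=1}^{n}X_i\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 + n\bar{X}^2 - 2n\bar{X}^2\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right] \\
&= \frac{1}{n}\left[\sum_{i=1}^{n}E[X_i^2] - nE[\bar{X}^2]\right] \\
&= \frac{1}{n}\left[\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] \\
&= \frac{1}{n}\left[(n\sigma^2 + n\mu^2) - (\sigma^2 + n\mu^2)\right] \\
&= \frac{(n - 1)\sigma^2}{n}
\end{align*}
Here, n = 5, therefore, $E[S^2] = \frac{4}{5} \times 4 = 3.2$.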
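The identity E[S²] = (n − 1)σ²/n holds for any distribution with variance σ². As an illustration only, the sketch below assumes a two-point distribution taking the values 18 and 22 with probability 1/2 each (mean 20, variance 4; this distribution is a hypothetical choice, not given in the question) and enumerates all 2⁵ samples exactly.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution: 18 or 22 with equal probability,
# which has mean 20 and variance 4 as required by the question.
values = [Fraction(18), Fraction(22)]
n = 5

def sample_variance_div_n(sample):
    """Sample variance with divisor n, as S^2 is defined in the question."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)

# Every sample of size n is equally likely, so E[S^2] is a plain average.
all_samples = list(product(values, repeat=n))
expected_s2 = sum(sample_variance_div_n(s) for s in all_samples) / len(all_samples)

print(expected_s2, float(expected_s2))
```

The exact average comes out to 16/5 = 3.2, matching (n − 1)σ²/n with n = 5 and σ² = 4.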
 
5. Suppose $X_i \sim \mathrm{Normal}\left(0, \frac{1}{i^2}\right)$, where i = 1, 2, ..., 9, and X1, X2, ..., X9 are independent
of each other. Let Y be a random variable defined as $Y = \sum_{i=1}^{9} iX_i$. Find the
variance of Y.
Answer: 9


Solution:

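\begin{align*}
\mathrm{Var}(Y) &= \mathrm{Var}\left(\sum_{i=1}^{9} iX_i\right) \\
&= \mathrm{Var}(X_1 + 2X_2 + 3X_3 + \dots + 9X_9) \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(2X_2) + \dots + \mathrm{Var}(9X_9) \\
&= \mathrm{Var}(X_1) + 4\,\mathrm{Var}(X_2) + \dots + 81\,\mathrm{Var}(X_9) \\
&= \frac{1}{1^2} + 4\left(\frac{1}{2^2}\right) + \dots + 81\left(\frac{1}{9^2}\right) \\
&= 9
\end{align*}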

6. A random sample of size 50 is collected from a population P, where P ∼ Uniform[0, 12].
Find a lower bound on the probability that the sample mean will be at most 3 units
away from the actual mean using the weak law of large numbers.
Answer: 0.973
Solution:
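P ∼ Uniform[0, 12]
\[
E[P] = \mu = \frac{0 + 12}{2} = 6, \qquad \mathrm{Var}(P) = \sigma^2 = \frac{(12 - 0)^2}{12} = \frac{144}{12} = 12
\]

By the weak law of large numbers, we have
\begin{align*}
P(|\bar{X} - \mu| < \delta) &\ge 1 - \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P(|\bar{X} - \mu| < 3) &\ge 1 - \frac{12}{50 \times 9} = \frac{73}{75} = 0.9733
\end{align*}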

7. Suppose a random sample is used to estimate the proportion of voters in a city. If the
sample proportion is roughly 0.45, what sample size is necessary so that the standard
deviation of the sample proportion is 0.02?
Answer: 619

Solution:
Let the random variable X represent whether the selected candidate is a voter.

Let Xi be defined as
\[
X_i =
\begin{cases}
1, & \text{if the selected candidate is a voter} \\
0, & \text{otherwise}
\end{cases}
\]

Define an event A as A : X = 1.
It is given that P(A) = 0.45.
We know that $\mathrm{Var}(S(A)) = \frac{P(A)(1 - P(A))}{n}$, so the standard deviation of the sample proportion satisfies
\[
\sqrt{\frac{p(1 - p)}{n}} = 0.02 \implies \sqrt{\frac{(0.45)(0.55)}{n}} = 0.02 \implies n = 618.75 \approx 619
\]

8. The average life (in years) of an electronic watch follows an exponential distribution
with parameter $\frac{1}{2}$. Find the lower bound on the probability that the mean life of a
random sample of 50 such watches falls between 1 and 3 years. Enter your answer
correct to two decimals.
Hint: Use weak law of large numbers.
Answer: 0.92

Solution:
Let the random variable X represent the life of an electronic watch.
It is given that X ∼ Exp(1/2) and 50 such samples are taken.
E[X] = µ = 2, Var(X) = σ² = 4

To find: a lower bound on $P(1 < \bar{X} < 3)$.

By the weak law of large numbers, we have
\begin{align*}
P(|\bar{X} - \mu| < \delta) &\ge 1 - \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P(|\bar{X} - 2| < 1) &\ge 1 - \frac{4}{50 \times 1} = \frac{23}{25} = 0.92
\end{align*}


Statistics for Data Science - 2


Week 8 practice Assignment
Statistics from samples and Limit theorems

1. If X, Y ∼ i.i.d. Normal(0, 4), what will be the variance of $\frac{X}{Y}$?
(a) 4
(b) 2
(c) 1
(d) Undefined

Solution:
We know that if X, Y ∼ i.i.d. Normal(0, σ²), then $\frac{X}{Y} \sim \mathrm{Cauchy}(0, 1)$, and the variance of the Cauchy
distribution is undefined.
Therefore, option (d) is correct.

2. A study shows that the average daily sleeping hours of teenagers is ten hours with a
standard deviation of two hours. If a sample of 100 teenagers is collected, what will be
the probability that the mean of the sleeping hours of these 100 teenagers is at least 0.4
hours away from the population mean? Assume that each observation in the sample is
independent. Assume that FZ denotes the CDF of standard normal distribution.

(a) 1 + FZ (−2) − FZ (2)


(b) 1 − FZ (−2) + FZ (2)
(c) FZ (2) − FZ (−2)
(d) FZ (2)

Solution:
Let X denote the average daily sleeping hours of teenagers.
Given: standard deviation of X, σ = 2
Sample size, n = 100

To find: $P(|\bar{X} - \mu| \ge 0.4)$, where $\bar{X}$ and µ are the sample mean and population mean,
respectively.

Let S = X1 + X2 + ... + X100, where Xi denotes the ith sample.

By CLT, we know that $\frac{S - n\mu}{\sigma\sqrt{n}} \sim \mathrm{Normal}(0, 1) \Rightarrow \frac{S - 100\mu}{20} \sim Z$ (Standard Normal)

Now,
\begin{align*}
P(|\bar{X} - \mu| \ge 0.4) &= P\left(\left|\frac{S}{n} - \mu\right| \ge 0.4\right) \\
&= P\left(\left|\frac{S - n\mu}{n}\right| \ge 0.4\right) \\
&= P\left(\left|\frac{S - n\mu}{\sigma\sqrt{n}}\right| \ge \frac{0.4\sqrt{n}}{\sigma}\right) \\
&= P(|Z| \ge 2) \\
&= P(Z \ge 2) + P(Z \le -2) \\
&= 1 - P(Z \le 2) + P(Z \le -2) \\
&= 1 - F_Z(2) + F_Z(-2)
\end{align*}
3. What is the fourth moment of the Normal(0, 4) distribution?


Solution:
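\begin{align*}
M_X(\lambda) = E[e^{\lambda X}] &= E\left[1 + \lambda X + \frac{\lambda^2 X^2}{2!} + \frac{\lambda^3 X^3}{3!} + \dots\right] \\
&= 1 + \lambda E[X] + \frac{\lambda^2 E[X^2]}{2!} + \frac{\lambda^3 E[X^3]}{3!} + \dots
\end{align*}
In the moment generating function, the coefficient of λ gives the first moment (E[X]), the
coefficient of $\frac{\lambda^2}{2!}$ gives the second moment ($E[X^2]$) and similarly, the coefficient of $\frac{\lambda^k}{k!}$
gives the kth moment ($E[X^k]$).

The moment generating function of Normal(0, σ²) is given by $e^{\lambda^2\sigma^2/2}$.
Let $N \sim \mathrm{Normal}(0, 2^2)$.
\begin{align*}
M_N(\lambda) &= e^{\lambda^2 2^2/2} \\
&= 1 + \frac{\lambda^2 2^2}{2} + \frac{\lambda^4 2^4}{2!(4)} + \dots \\
&= 1 + \frac{\lambda^2 2^2}{2} + 48\,\frac{\lambda^4}{4!} + \dots
\end{align*}

Therefore, the 4th moment of $\mathrm{Normal}(0, 2^2)$ = coefficient of $\frac{\lambda^4}{4!}$ = 48.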
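The coefficient extraction can be mirrored in a few lines. Since $e^{\lambda^2\sigma^2/2} = \sum_k (\sigma^2/2)^k \lambda^{2k} / k!$, the moment of order 2k is $(2k)!\,(\sigma^2/2)^k / k!$. The helper name below is chosen for this sketch only.

```python
from fractions import Fraction
from math import factorial

def zero_mean_normal_moment(order, variance):
    """Moment E[X^order] of Normal(0, variance), read off the MGF series."""
    if order % 2 == 1:
        return Fraction(0)        # odd powers of lambda do not appear
    k = order // 2
    # Coefficient of lambda^(2k) in exp(variance * lambda^2 / 2)
    coefficient = Fraction(variance, 2) ** k / factorial(k)
    return coefficient * factorial(order)

print(zero_mean_normal_moment(2, 4), zero_mean_normal_moment(4, 4))
```

For variance 4 this gives a second moment of 4 and a fourth moment of 48, in agreement with the expansion above.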

4. Let $X \sim \mathrm{Gamma}(2, \frac{1}{2})$ and $Y \sim \mathrm{Gamma}(5, \frac{1}{2})$ be two independent random variables.
What will be the expected value of $\frac{X}{X + Y}$? Write your answer correct to two decimal
places.
Solution:
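We know that if X ∼ Gamma(α, k) and Y ∼ Gamma(β, k) are two independent random
variables, then $\frac{X}{X + Y} \sim \mathrm{Beta}(\alpha, \beta)$.

Given that $X \sim \mathrm{Gamma}(2, \frac{1}{2})$ and $Y \sim \mathrm{Gamma}(5, \frac{1}{2})$ are two independent random
variables. It implies that
\[
\frac{X}{X + Y} \sim \mathrm{Beta}(2, 5)
\]
Therefore, $E\left[\frac{X}{X + Y}\right] = \frac{2}{2 + 5} = \frac{2}{7} \approx 0.29$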

5. A company sells eggs whose weights are normally distributed with a mean of 70g and a
standard deviation of 2g. Suppose that these eggs are sold in packages that each contain
four eggs. Assume that the weight of each egg is independent. What is the probability
that the mean weight of the four eggs in a package is greater than 68.5g? Write your
answer correct to two decimal places.
(Hint: Use the fact that linear combination of normal distributions is again a normal
distribution. FZ (−1.5) = 0.066)
Solution:
Let X denote the weight of an egg.
Given that E[X] = µ = 70
SD(X) = σ = 2
$X \sim \mathrm{Normal}(70, 2^2)$
Let X1, X2, X3 and X4 denote the weights of the four eggs in a package.

Suppose that
\[
\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}
\]

To find: $P(\bar{X} > 68.5)$

We know that a linear combination of independent Normal random variables is again Normally distributed.
It implies that $\bar{X}$ follows a Normal distribution.

$E[\bar{X}] = \mu = 70$ and
$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{4}{4} = 1$

It implies that $\bar{X} \sim \mathrm{Normal}(70, 1) \Rightarrow \bar{X} - 70 \sim \mathrm{Normal}(0, 1)$

Now,
\begin{align*}
P(\bar{X} > 68.5) &= P(\bar{X} - 70 > -1.5) \\
&= P(Z > -1.5) \\
&= 1 - F_Z(-1.5) \\
&= 1 - 0.066 = 0.93
\end{align*}

6. Let X1, X2, X3, ..., Xn be i.i.d. Poisson(4). What should be the value of n such that
$P(3.8 \le \bar{X} \le 4.2) \ge 0.95$? [2 marks]
(Hint: Use FZ(1.96) = 0.975)

1. at least 200
2. at least 385
3. at least 450
4. at least 585

Solution:
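Given that X1, X2, X3, ..., Xn ∼ i.i.d. Poisson(4)

Mean of the distribution = µ = 4
Variance of the distribution = σ² = 4
Let S = X1 + X2 + ... + Xn and
\[
\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}
\]

To find: value of n such that $P(3.8 \le \bar{X} \le 4.2) \ge 0.95$

By CLT, we know that
\begin{align*}
\frac{S - n\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{S - 4n}{2\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \quad ...(1)
\end{align*}

\begin{align*}
& P(3.8 \le \bar{X} \le 4.2) \ge 0.95 \\
\Rightarrow\ & P\left(3.8 \le \frac{S}{n} \le 4.2\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.2 \le \frac{S}{n} - 4 \le 0.2\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.2 \le \frac{S - 4n}{n} \le 0.2\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.1 \le \frac{S - 4n}{2n} \le 0.1\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.1\sqrt{n} \le \frac{S - 4n}{2\sqrt{n}} \le 0.1\sqrt{n}\right) \ge 0.95 \\
\Rightarrow\ & F_Z(0.1\sqrt{n}) - F_Z(-0.1\sqrt{n}) \ge 0.95 \\
\Rightarrow\ & F_Z(0.1\sqrt{n}) - (1 - F_Z(0.1\sqrt{n})) \ge 0.95 \\
\Rightarrow\ & 2F_Z(0.1\sqrt{n}) - 1 \ge 0.95 \\
\Rightarrow\ & F_Z(0.1\sqrt{n}) \ge 0.975 \\
\Rightarrow\ & 0.1\sqrt{n} \ge 1.96 \\
\Rightarrow\ & n \ge 384.16
\end{align*}

Therefore, n should be at least 385, i.e. option 2 is correct.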
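The last few steps amount to n ≥ (zσ/δ)², where z is the 0.975 quantile of the standard normal. A sketch using `statistics.NormalDist.inv_cdf` in place of the table value 1.96 (the function name `clt_sample_size` is an illustrative choice):

```python
import math
from statistics import NormalDist

def clt_sample_size(sigma, half_width, coverage):
    """Smallest n with P(|sample mean - mu| <= half_width) >= coverage under the CLT."""
    # Two-sided interval: put (1 - coverage)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
    return math.ceil((z * sigma / half_width) ** 2)

n_required = clt_sample_size(sigma=2, half_width=0.2, coverage=0.95)
print(n_required)
```

With the exact quantile (about 1.95996) the threshold is marginally below 384.16, and rounding up still gives 385.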

7. Let the moment generating function of a random variable X be given by


         
\[
M_X(\lambda) = \frac{1}{8}e^{-4\lambda} + \frac{1}{6}e^{-2\lambda} + \frac{1}{6}e^{2\lambda} + \frac{1}{8}e^{4\lambda} + \frac{5}{12}
\]

Find the distribution of X. [1 mark]

1.
X          −4     −2     0      2      4
P(X = x)   1/8    1/6    1/6    1/8    5/12

2.
X          −4     −2     0      2      4
P(X = x)   5/12   1/8    1/6    1/6    1/8

3.
X          −4     −2     0      2      4
P(X = x)   1/8    1/6    5/12   1/6    1/8

4.
X          −4     −2     0      2      4
P(X = x)   1/8    1/6    5/12   1/8    1/6

Solution:
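The MGF of a discrete random variable X with the PMF $f_X(x) = P(X = x)$, $x \in T_X$,
is given by
\begin{align*}
M_X(\lambda) &= E[e^{\lambda X}] \\
&= \sum_{x \in T_X} P(X = x)\,e^{\lambda x}
\end{align*}

Now, the MGF of the random variable X is given by
\[
M_X(\lambda) = \frac{1}{8}e^{-4\lambda} + \frac{1}{6}e^{-2\lambda} + \frac{1}{6}e^{2\lambda} + \frac{1}{8}e^{4\lambda} + \frac{5}{12}
\]

Therefore, the distribution of X is given by

X          −4     −2     0      2      4
P(X = x)   1/8    1/6    5/12   1/6    1/8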

8. A fair die is rolled 3600 times. Use CLT to compute the probability that six appears at
most 630 times. Enter the answer correct to two decimal places.
(Hint: Use FZ (1.341) = 0.91)
Solution:
Define a random variable X such that
\[
X =
\begin{cases}
1, & \text{if six appears on rolling a fair die} \\
0, & \text{otherwise}
\end{cases}
\]

Therefore, $E[X] = \mu = \frac{1}{6}$ and
$\mathrm{Var}(X) = \sigma^2 = \frac{1}{6} \cdot \frac{5}{6} = \frac{5}{36}$

Let X1, X2, ..., X3600 be the outcomes on rolling the fair die 3600 times.

Notice that X1 + X2 + ... + X3600 will denote the number of times six appears in 3600 rolls.

Let S = X1 + X2 + ... + X3600

To find: P(S ≤ 630)

By CLT, we know that
\begin{align*}
\frac{S - 3600\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{S - 600}{10\sqrt{5}} &\sim \mathrm{Normal}(0, 1)
\end{align*}
Now,
\begin{align*}
P(S \le 630) &= P(S - 600 \le 30) \\
&= P\left(\frac{S - 600}{10\sqrt{5}} \le \frac{30}{10\sqrt{5}}\right) \\
&= P(Z \le 1.34) \\
&= 0.91
\end{align*}
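The CLT approximation for a count of successes can be written as a small helper. This sketch follows the solution in using no continuity correction; `clt_prob_at_most` is a name introduced here for illustration.

```python
import math
from statistics import NormalDist

def clt_prob_at_most(n, p, k):
    """CLT approximation of P(S <= k) for S ~ Binomial(n, p), no continuity correction."""
    mean = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    return NormalDist(mu=mean, sigma=std_dev).cdf(k)

prob_six = clt_prob_at_most(n=3600, p=1 / 6, k=630)
print(prob_six)
```

The value is about 0.91, matching the table lookup F_Z(1.341) given in the hint.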

9. Consider the following PDF curves and match them with the correct distribution. [1
mark]

[Figure: four PDF curves, labelled Graph 1, Graph 2, Graph 3 and Graph 4.]

(a) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Gamma, Graph 4 → Beta.


(b) Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 → Gamma.


(c) Graph 1 → Beta, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Gamma.
(d) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Beta.

Solution:
Graph 1: The range of the distribution is [0, 1] and the shape of the graph resembles the Beta
distribution.

Graph 2: The PDF curve is not symmetric about the mean and the shape of the graph resembles
the Gamma distribution.

Graph 3: The PDF curve is symmetric about the mean and the shape of the graph resembles the
Normal distribution.

Graph 4: The PDF curve is not symmetric about the mean and the shape of the graph resembles
the Gamma distribution.
Therefore, Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 →
Gamma, i.e. option (b) is correct.

10. Let X1 , X2 and X3 ∼ i.i.d. X where X has the following probability mass function:

x          −1     2
fX(x)      2/3    1/3

Table 7.1.P: PMF of X

Find the distribution of Y = X1 + X2 + X3. [1 mark]

(a)
Y          −3     0      3      6
P(Y = y)   1/6    1/6    1/3    1/3

(b)
Y          −3     0      3      6
P(Y = y)   8/27   4/9    2/9    1/27

(c)
Y          −3     0      3      6
P(Y = y)   8/27   1/27   4/9    2/9

(d)
Y          −3     0      3      6
P(Y = y)   2/9    8/27   1/27   4/9

Solution:
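The PMF of X is given by

x          −1     2
fX(x)      2/3    1/3

Given that Y = X1 + X2 + X3 where X1, X2 and X3 ∼ i.i.d. X.

To find: Distribution of Y.

We will find the distribution of Y by finding the MGF of Y.
\begin{align*}
M_Y(\lambda) &= E[e^{\lambda Y}] \\
&= E[e^{\lambda(X_1 + X_2 + X_3)}] \\
&= E[e^{\lambda X_1}e^{\lambda X_2}e^{\lambda X_3}] \\
&= E[e^{\lambda X_1}]E[e^{\lambda X_2}]E[e^{\lambda X_3}] && \text{(since } X_1, X_2 \text{ and } X_3 \text{ are independent)} \\
&= E[e^{\lambda X}]E[e^{\lambda X}]E[e^{\lambda X}] && \text{(since } X_1, X_2 \text{ and } X_3 \sim \text{i.i.d. } X\text{)} \\
&= [M_X(\lambda)]^3 \quad ...(1)
\end{align*}

Now,
\begin{align*}
M_X(\lambda) &= E[e^{\lambda X}] \\
&= e^{-\lambda}\,P(X = -1) + e^{2\lambda}\,P(X = 2) \\
&= \frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3} \quad ...(2)
\end{align*}

From equations (1) and (2), we have
\begin{align*}
M_Y(\lambda) &= \left(\frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3}\right)^3 \\
&= \frac{1}{27}\left(2e^{-\lambda} + e^{2\lambda}\right)^3 \\
&= \frac{1}{27}\left(8e^{-3\lambda} + e^{6\lambda} + 12e^{-2\lambda}e^{2\lambda} + 6e^{-\lambda}e^{4\lambda}\right) && \text{(since } (a + b)^3 = a^3 + b^3 + 3a^2b + 3ab^2\text{)} \\
&= \frac{8}{27}e^{-3\lambda} + \frac{1}{27}e^{6\lambda} + \frac{4}{9} + \frac{2}{9}e^{3\lambda}
\end{align*}

Therefore, the distribution of Y is given by

Y          −3     0      3      6
P(Y = y)   8/27   4/9    2/9    1/27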

Statistics for Data Science - 2


Week 8 Graded assignment

1. Let X1, X2, ..., X50 ∼ i.i.d. Poisson(0.04) and let $Y = \sum_{i=1}^{50} X_i$. Use Central Limit
theorem to find P(Y > 3). Enter the answer correct to 2 decimal places.
Solution:
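Let X ∼ Poisson(0.04).
Consider the samples X1, X2, ..., X50 from X.
E[X] = Var[X] = 0.04
\[
E[Y] = E\left[\sum_{i=1}^{50} X_i\right] = 50 \times 0.04 = 2, \qquad \mathrm{Var}[Y] = \mathrm{Var}\left[\sum_{i=1}^{50} X_i\right] = 50 \times 0.04 = 2
\]

To find: P(Y > 3).

By CLT, we know that
\begin{align*}
\frac{Y - n\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{Y - 2}{\sqrt{2}} &\sim \mathrm{Normal}(0, 1)
\end{align*}
Now,
\begin{align*}
P(Y > 3) &= P(Y - 2 > 1) \\
&= P\left(\frac{Y - 2}{\sqrt{2}} > \frac{3 - 2}{\sqrt{2}}\right) \\
&= P(Z > 0.707) \\
&= 1 - F_Z(0.707) = 1 - 0.76 = 0.24
\end{align*}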
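Because a sum of independent Poisson random variables is again Poisson, Y here is exactly Poisson(2), which allows a comparison between the CLT answer and the exact tail probability. The sketch below is only an illustration of that comparison; the question itself asks for the CLT value.

```python
import math
from statistics import NormalDist

lam = 2.0   # 50 * 0.04, the mean (and variance) of Y

# CLT approximation used in the solution (no continuity correction)
clt_value = 1 - NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(3)

# Exact tail of Poisson(2): P(Y > 3) = 1 - P(Y <= 3)
exact_value = 1 - sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(4))

print(round(clt_value, 2), round(exact_value, 2))
```

The CLT gives about 0.24 while the exact Poisson tail is about 0.14, a reminder that the normal approximation is rough when the mean of the sum is this small.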

2. Let the moment generating function of a random variable X be given by


         
\[
M_X(\lambda) = \frac{1}{4}e^{-2\lambda} + \frac{1}{40} + \frac{3}{10}e^{-\lambda} + \frac{3}{40}e^{2\lambda} + \frac{7}{20}e^{\lambda}
\]
Find the distribution of X.

(a)
X          −2     −1     0      1      2
P(X = x)   1/4    3/40   3/10   1/40   7/20

(b)
X          −2     −1     0      1      2
P(X = x)   1/4    1/40   3/10   3/40   7/20

(c)
X          −2     −1     0      1      2
P(X = x)   1/4    3/10   1/40   7/20   3/40

(d)
X          −2     −1     0      1      2
P(X = x)   1/4    3/40   1/40   7/20   3/10

Solution:
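The MGF of a discrete random variable X with the PMF $f_X(x) = P(X = x)$, $x \in T_X$,
is given by
\begin{align*}
M_X(\lambda) &= E[e^{\lambda X}] \\
&= \sum_{x \in T_X} P(X = x)\,e^{\lambda x}
\end{align*}

Now, the MGF of the random variable X is given as
\[
M_X(\lambda) = \frac{1}{4}e^{-2\lambda} + \frac{1}{40} + \frac{3}{10}e^{-\lambda} + \frac{3}{40}e^{2\lambda} + \frac{7}{20}e^{\lambda}
\]

Therefore, the distribution of X is given by

X          −2     −1     0      1      2
P(X = x)   1/4    3/10   1/40   7/20   3/40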

3. A fair coin is tossed 1000 times. Use CLT to compute the probability that head appears
at most 520 times. Enter the answer correct to 3 decimal places.
Solution:
Define a random variable X such that
\[
X =
\begin{cases}
1, & \text{if head appears on tossing a fair coin} \\
0, & \text{otherwise}
\end{cases}
\]

Therefore, $E[X] = \mu = \frac{1}{2}$ and
$\mathrm{Var}(X) = \sigma^2 = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

Let X1, X2, ..., X1000 be the outcomes on tossing the fair coin 1000 times.

Notice that X1 + X2 + ... + X1000 will denote the number of times head appears in
1000 tosses.
Let S = X1 + X2 + ... + X1000
To find: P(S ≤ 520)
By CLT, we know that
\begin{align*}
\frac{S - 1000\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{S - 500}{5\sqrt{10}} &\sim \mathrm{Normal}(0, 1)
\end{align*}
Now,
\begin{align*}
P(S \le 520) &= P(S - 500 \le 20) \\
&= P\left(\frac{S - 500}{5\sqrt{10}} \le \frac{20}{5\sqrt{10}}\right) \\
&= P(Z \le 1.26) \\
&= 0.896
\end{align*}

4. Let X1, X2, ..., X500 ∼ i.i.d. Normal(0, 1). Evaluate $P(X_1^2 + X_2^2 + \dots + X_{500}^2 > 550)$
using Central Limit theorem. Enter the answer correct to 2 decimal places.
Hint: $(X_1^2 + X_2^2 + \dots + X_{500}^2) \sim \mathrm{Gamma}(250, 0.5)$.
Solution:
Given X1, ..., X500 ∼ i.i.d. Normal(0, 1).
We know that if X ∼ Normal(0, 1), then $X^2 \sim \mathrm{Gamma}\left(\frac{1}{2}, \frac{1}{2}\right)$.
Also, the sum of n independent Gamma(α, β) random variables is Gamma(nα, β).
Therefore, $X_i^2 \sim \mathrm{Gamma}\left(\frac{1}{2}, \frac{1}{2}\right)$ for all i,
and $(X_1^2 + X_2^2 + \dots + X_{500}^2) \sim \mathrm{Gamma}(250, 0.5)$.
Let Y = Y1 + Y2 + ... + Y500, where $Y_i = X_i^2$ for i = 1, ..., 500.

$E[Y_i] = \frac{0.5}{0.5} = 1$ and $\mathrm{Var}[Y_i] = \frac{0.5}{0.25} = 2$, for i = 1, ..., 500

$E[Y] = \frac{250}{0.5} = 500$ and $\mathrm{Var}[Y] = \frac{250}{0.5^2} = 1000$

To find: P(Y > 550)

By CLT, with µ = E[Yi] = 1, σ² = Var[Yi] = 2 and n = 500, we know that
\begin{align*}
\frac{Y - 500\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{Y - 500}{10\sqrt{10}} &\sim \mathrm{Normal}(0, 1)
\end{align*}
Now,
\begin{align*}
P(Y > 550) &= P(Y - 500 > 50) \\
&= P\left(\frac{Y - 500}{10\sqrt{10}} > \frac{5}{\sqrt{10}}\right) \\
&= P(Z > 1.58) \\
&= 1 - F_Z(1.58) = 1 - 0.94 = 0.06
\end{align*}

Use the below information to answer questions 5 and 6.

Let X be a random variable having the gamma distribution with the parameters α = 2n
and β = 1.
Hint:
• If X ∼ Gamma(α, β), $E[X] = \frac{\alpha}{\beta}$ and $\mathrm{Var}[X] = \frac{\alpha}{\beta^2}$
• Sum of n independent Gamma(α, β) is Gamma(nα, β)

5. Use the Weak Law of Large Numbers to find the value of n such that
\[
P\left(\left|\frac{X}{2n} - 1\right| > 0.01\right) < 0.01
\]

(a) 505000
(b) 470000
(c) 498000
(d) 482000

Solution:
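Given X ∼ Gamma(2n, 1)
Let X = X1 + X2 + X3 + ... + X2n, where the Xi ∼ Gamma(1, 1) are independent.

E[Xi] = µ = 1 and Var(Xi) = σ² = 1

With $\bar{X} = \frac{X}{2n}$, $E[\bar{X}] = 1$ and $\mathrm{Var}[\bar{X}] = \frac{1}{2n}$

To find: The value of n such that $P\left(\left|\frac{X}{2n} - 1\right| > 0.01\right) < 0.01$.

By the weak law of large numbers applied to the 2n samples, we have
\begin{align*}
P(|\bar{X} - \mu| > \delta) &\le \frac{\sigma^2}{2n\delta^2} \\
\Rightarrow P\left(\left|\frac{X}{2n} - 1\right| > 0.01\right) &\le \frac{1}{2n \times 0.01^2}
\end{align*}

Therefore, $\frac{1}{2n \times 0.01^2} < 0.01 \implies 2n > \frac{1}{0.01^3} \implies n > 500000$.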

6. Use CLT to find the value of n such that
\[
P\left(\left|\frac{X}{2n} - 1\right| > 0.01\right) < 0.01
\]

Hint: Use FZ (2.58) = 0.995, FZ (1.96) = 0.975 if needed.

(a) 34570
(b) 33500
(c) 32500
(d) 30000

Solution:
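E[X1 + ... + X2n] = 2n and Var[X1 + ... + X2n] = 2n

To find: The value of n such that $P\left(\left|\frac{X}{2n} - 1\right| > 0.01\right) < 0.01$.

By CLT applied to the 2n samples (with µ = 1 and σ = 1), we know that
\begin{align*}
\frac{X - 2n\mu}{\sigma\sqrt{2n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{X - 2n}{\sqrt{2n}} &\sim \mathrm{Normal}(0, 1)
\end{align*}

Now,
\begin{align*}
& P\left(\left|\frac{X}{2n} - 1\right| > 0.01\right) < 0.01 \\
\implies\ & P\left(\left|\frac{X_1 + \dots + X_{2n}}{2n} - 1\right| > 0.01\right) < 0.01 \\
\implies\ & P\left(\left|\frac{X_1 + \dots + X_{2n} - 2n}{\sqrt{2n}}\right| > 0.01\sqrt{2n}\right) < 0.01 \\
\implies\ & P(|Z| > 0.01\sqrt{2n}) < 0.01 \\
\implies\ & 2P(Z > 0.01\sqrt{2n}) < 0.01 \\
\implies\ & 1 - F_Z(0.01\sqrt{2n}) < \frac{0.01}{2} \\
\implies\ & F_Z(0.01\sqrt{2n}) > 0.995 \\
\implies\ & F_Z(0.01\sqrt{2n}) > F_Z(2.58) \\
\implies\ & n > 33282
\end{align*}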
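The two thresholds from questions 5 and 6 can be compared side by side. The sketch below uses exact fractions and the table value z = 2.58 from the hint; the variable names are illustrative.

```python
from fractions import Fraction

delta = Fraction("0.01")    # allowed deviation of X/(2n) from 1
alpha = Fraction("0.01")    # allowed probability of exceeding it
z = Fraction("2.58")        # table value with F_Z(2.58) = 0.995

# Weak law: 1 / (2n * delta^2) < alpha   =>  n > 1 / (2 * alpha * delta^2)
wlln_threshold = 1 / (2 * alpha * delta ** 2)

# CLT: delta * sqrt(2n) > z              =>  n > (z / delta)^2 / 2
clt_threshold = (z / delta) ** 2 / 2

print(wlln_threshold, clt_threshold)
```

The weak-law requirement (n > 500000) is about fifteen times larger than the CLT requirement (n > 33282), which reflects how conservative the Chebyshev-type bound is.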

7. Let the time taken (in hours) for failure of an electric bulb follow the exponential distri-
bution with the parameter 0.05. Suppose that 100 such light bulbs say L1 , L2 , . . . , L100
are used in the following manner: For every i, as soon as the light Li fails, Li+1 be-
comes operative, where i : 1 → 99 (i.e. If L1 fails, L2 becomes operative, if L2 fails, L3
becomes operative, and so on). Let the total time of operation of 100 bulbs be denoted
by T. Using CLT, compute the probability that T exceeds 2500 hours.
(a) FZ (1.5)
(b) 1 − FZ (1.5)
(c) FZ (2.5)
(d) 1 − FZ (2.5)
Solution:
Given, the time to failure (in hours) of an electric bulb has the exponential distribution
with the parameter λ = 0.05.
Since the bulbs are used in such a way that as soon as light L1 fails, L2 becomes
operative, when L2 fails, L3 becomes operative, and so on, the total time of operation is the sum of the individual lifetimes.
We know that if X ∼ Gamma(α, β) with parameter α = 1, then X ∼ Exp(β).
Also, the sum of n i.i.d. Exp(λ) random variables is Gamma(n, λ).
Since each of the Li's is exponentially distributed with parameter 0.05,
L1 + ... + L100 ∼ Gamma(nα, β) = Gamma(100, 0.05)

Let T = L1 + ... + L100

$E[L_i] = \mu = \frac{1}{0.05} = 20$ and $SD[L_i] = \sigma = \frac{1}{0.05} = 20$

To find: P(T ≥ 2500)
By CLT, we know that
\begin{align*}
\frac{T - 100\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{T - 2000}{20\sqrt{100}} &\sim \mathrm{Normal}(0, 1)
\end{align*}
Now,
\begin{align*}
P(T \ge 2500) &= P(T - 2000 \ge 500) \\
&= P\left(\frac{T - 2000}{200} \ge \frac{500}{200}\right) \\
&= P(Z \ge 2.5) \\
&= 1 - F_Z(2.5)
\end{align*}

8. Suppose speeds of vehicles on a particular road are normally distributed with mean 36
mph and standard deviation 2 mph. Find the probability that the mean speed X of
20 randomly selected vehicles is between 35 and 38 mph.
(a) $F_Z(\sqrt{5}) - F_Z(-\sqrt{5})$
(b) $F_Z(\sqrt{20}) - F_Z(-\sqrt{20})$
(c) $F_Z(\sqrt{38}) - F_Z(-\sqrt{35})$
(d) $F_Z(\sqrt{20}) - F_Z(-\sqrt{5})$

Solution:
Let X denote the speed of a vehicle on a particular road.
Given that $X \sim \mathrm{Normal}(36, 2^2)$.
Therefore, µ = 36 and σ = 2
Select samples X1, X2, ..., X20 such that X1, X2, ..., X20 ∼ i.i.d. X

Let $\bar{X} = \frac{X_1 + X_2 + \dots + X_{20}}{20}$ and S = X1 + X2 + ... + X20

To find: $P(35 < \bar{X} < 38)$

From CLT, we know that
\begin{align*}
\frac{X_1 + X_2 + \dots + X_n - nE[X]}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{S - n\mu}{\sigma\sqrt{n}} &\sim \mathrm{Normal}(0, 1) \\
\Rightarrow \frac{S - 36(20)}{2\sqrt{20}} &\sim \mathrm{Normal}(0, 1)
\end{align*}

Now,
\begin{align*}
P(35 < \bar{X} < 38) &= P\left(35 < \frac{S}{20} < 38\right) \\
&= P\left(-1 < \frac{S}{20} - 36 < 2\right) \\
&= P\left(-1 < \frac{S - 36(20)}{20} < 2\right) \\
&= P\left(\frac{-\sqrt{20}}{2} < \frac{S - 36(20)}{2\sqrt{20}} < \sqrt{20}\right) \\
&= P\left(-\sqrt{5} < \frac{S - 36(20)}{2\sqrt{20}} < \sqrt{20}\right) \\
&= F_Z(\sqrt{20}) - F_Z(-\sqrt{5})
\end{align*}
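The standardisation can be double-checked by computing the probability both directly from the distribution of the sample mean and from the standard normal expression. This is a sketch using the standard library; the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 36, 2, 20

# Sample mean of n i.i.d. Normal(mu, sigma^2) values: Normal(mu, sigma^2 / n)
sample_mean = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
direct = sample_mean.cdf(38) - sample_mean.cdf(35)

# Same probability through the standardised form F_Z(sqrt(20)) - F_Z(-sqrt(5))
std_normal = NormalDist()
standardised = std_normal.cdf(math.sqrt(20)) - std_normal.cdf(-math.sqrt(5))

print(direct, standardised)
```

Both routes give about 0.987. Since the population is itself normal, the sample mean is exactly normal here, so the expression is not merely a large-sample approximation.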
